Sequence data generation
DNA extraction and sequencing are as stated in Brandt et al. (2023). All
sequences generated were assembled in CLC Bio Main Workbench Version 6.9
(http://www.clcbio.com). All gene regions were submitted to GenBank and
the accession numbers are recorded in Table S1. The CO1, 16S and EF-1ɣ
datasets were concatenated using FASconCAT v1.11 (Kück and Meusemann,
2010). The edited sequences for each gene region as well as the
concatenated dataset, were aligned using MAFFT online (Katoh, 2005;
Katoh & Toh, 2008). The ‘Auto’ strategy for alignment was used in
MAFFT, this inspects the direction of the sequences and adjust the
alignment in correlation to the first sequence.
Divergence dating
Dating the phylogeny was done
using BEAST v1.8.4 . All identical sequences were removed from the
analysis. In order to determine the most appropriate fossil calibration
points, numerous combinations were tested varying the fossil
calibrations used. The final
fossil calibration points used were as follows: the Hexathelidae fossil,Rosamygale grauvogely (Gresa-Voltzia formation, France, Triassic)
. Rosamygale grauvogely is thus, the first mygalomorph appearance
in the fossil record, dating to 250 – 240 MYA . The Nemesiidae fossil,Cretamygale chasei (Isle of Wight, Cretaceous) , two
Ctenizidae/Halonoproctidae fossils, Ummidia damzeni and U.
malonowskii (both from Baltic amber, Palaeogene) . Lastly, a fossil
from the family Cyrtaucheniidae, Bolostromus destructus(Dominican amber, Neogene) . Each family was represented by available
sequence data from GenBank for 16S rDNA, cytochrome oxidase 1 (CO1) and
elongation factor 1 gamma (EF-1ɣ) (Table S5). Each calibration point was
set by setting a hard minimum bound (the youngest possible age of the
fossil based on the deposit in which it was found) and a soft upper
bound (the oldest possible age).
The nucleotide substitution rate
was determined using jModelTest v2.1.7 . This was done for each gene
region and the best model to account for varying base pair substitution
rates was selected based on the Bayesian information criterion (BIC) . A
summary of the number of samples for each gene region, sequence length
and substitution model are given in Table S6. This was used to set the
prior in BEAUTi v1.8.4 . Other priors set included, the rate of
molecular evolution to a relaxed clock with a lognormal distribution
(this allows mutational rates to vary over the tree) . As the dataset
includes intraspecific and interspecific sampling the study follows the
protocol of by testing four different tree priors. All analyses were
thus repeated using ‘Speciation: Birth-Death Incomplete Sampling’,
‘Speciation: Birth-Death’, ‘Speciation: Yule Process’ and ‘Coalescent:
Constant Size’ at tree priors. This allowed testing at both the
population level of diversity and species level diversity. An
uncorrelated relaxed clock was used, as this allows for each branch of
the phylogeny to have a different evolutionary rate . BEAST was then run
for 200,000,000 (250,000,000 for the yule model) generations and sampled
every 5000 generations. This was repeated twice for each analysis to
ensure convergence. BEAST was run in conjunction with BEAGLE .
Tracer v1.7.1 was used to confirm convergence . This was tested by
checking the log file for each BEAST run and ensuring the ESS values
were above 200. The individual runs were then compiled to ensure a
normal distribution. Log Combiner v1.8.4 was used to combine the
multiple tree files into one file. The subsampling number was set to
50,000 and the burn-in for each tree to 50,000,000.
The tree files viewed, annotated and edited in FigTree .