2.6. Phylogeographic pattern
TreeMix v. 1.13 (Pickrell and Pritchard, 2012) was used to infer patterns of population splitting and mixture. The method uses allele frequencies to build a graph-based model of the population network (as opposed to a strictly bifurcating tree) by first building a Maximum Likelihood (ML) tree and then searching for migration events that increase the composite likelihood (Flesch et al., 2020; Pickrell and Pritchard, 2012). The program uses a Gaussian approximation to model genetic drift (drift parameter) along each population (Flesch et al., 2020; Pickrell and Pritchard, 2012). Using the combined data set filtered for LD, an ML tree was built with a window size (k) of 500 SNPs, evaluating from 0 to 16 migration edges (m) with 10 iterations per edge and using the “-noss” option to prevent overcorrection for sample size. The optimal number of significant migration edges was then inferred from the second-order rate of change in likelihood (Δm) across incremental values of m with the OptM package in R (Fitak, 2021).
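As an illustration of the Δm criterion, the following minimal Python sketch computes an Evanno-style second-order rate of change from replicate log-likelihoods. It assumes Δm is the absolute second difference of the mean likelihood divided by the standard deviation across iterations; the exact computation in OptM may differ in detail, and the function name and input layout are hypothetical.

```python
from statistics import mean, stdev


def delta_m(loglik_by_m):
    """Evanno-style second-order rate of change in likelihood.

    loglik_by_m maps each number of migration edges m to the list of
    log-likelihoods from replicate runs (hypothetical input layout).
    Assumed formula: |L(m+1) - 2 L(m) + L(m-1)| / sd(L(m)), with L the
    mean over replicates. Returns {m: delta_m} for interior m only.
    """
    avg = {m: mean(values) for m, values in loglik_by_m.items()}
    result = {}
    for m in sorted(loglik_by_m):
        # The second difference needs both neighbours, so the smallest
        # and largest m values cannot be scored.
        if m - 1 not in avg or m + 1 not in avg:
            continue
        sd = stdev(loglik_by_m[m])
        if sd == 0:
            # The ratio is undefined when replicates do not vary.
            continue
        result[m] = abs(avg[m + 1] - 2 * avg[m] + avg[m - 1]) / sd
    return result


def best_m(loglik_by_m):
    """Return the m with the largest delta_m."""
    scores = delta_m(loglik_by_m)
    return max(scores, key=scores.get)
```

Under this criterion the preferred m is the one at which the likelihood curve bends most sharply relative to the run-to-run variation, which is why several iterations per value of m are needed.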
In addition to the network analyses, and as a complementary result, a Maximum Likelihood (ML) tree was built using the concatenated SNP dataset (2867 bp). Prior to concatenation, the combined VCF file was filtered using BCFtools (Li, 2011) to remove individuals with more than 10% missing data (N = 8) and candidate SNP outliers (N = 18, see section 2.7). The filtered VCF file was then converted to PHYLIP format using the vcf2phylip.py script (Ortiz, 2019). Consensus sequences for each population were estimated with the function consensusString in the R package ‘Biostrings’. The two Cabrera localities were treated as a single population (Fst = 0.03) and their data were merged. An ML tree was then built on the consensus alignment with IQ-TREE 2 v2.2.0.8 (Minh et al., 2020) using variable sites only and applying an ascertainment bias correction for SNP data (model GTR+ASC) (Lewis, 2001), with 10,000 pseudo-replicates.
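The filtering and consensus steps can be sketched in Python as follows. This is a simplified illustration and not the BCFtools or Biostrings implementation: it assumes sequences are already held as strings, treats N, - and ? as missing, and uses a plain majority rule per site, whereas consensusString can emit IUPAC ambiguity codes depending on its settings. All function names are hypothetical.

```python
from collections import Counter

# Characters treated as missing data (an assumption of this sketch).
MISSING = {"N", "-", "?"}


def missing_fraction(seq):
    """Proportion of sites in a sequence that are missing."""
    return sum(base in MISSING for base in seq.upper()) / len(seq)


def filter_individuals(seqs, max_missing=0.10):
    """Keep individuals whose missing fraction does not exceed max_missing."""
    return {
        name: seq
        for name, seq in seqs.items()
        if missing_fraction(seq) <= max_missing
    }


def majority_consensus(seqs):
    """Majority-rule consensus of equal-length sequences.

    Missing characters are ignored at each site; a site with no called
    base becomes N. Ties go to the base seen first at that site.
    """
    consensus = []
    for column in zip(*seqs):
        counts = Counter(
            base.upper() for base in column if base.upper() not in MISSING
        )
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)
```

Applying `filter_individuals` before `majority_consensus` mirrors the order of operations described above, with one consensus string produced per population.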
To identify the most ancestral population in our dataset, we used IQ-TREE 2 with non-reversible substitution models (model 12.12) (Naser-Khdour et al., 2022) and 1,000 ultrafast bootstrap replicates, using both the consensus alignment and an alignment of one random specimen per population (all sites or variable sites only). The program performs a bootstrap analysis to obtain a set of rooted ML bootstrap trees; it then computes rootstrap support values for each branch in the tree as the proportion of rooted bootstrap trees that have the root on that branch. A root test was then performed with the option --root-test to compare the log-likelihoods of the trees rooted on every branch of the ML tree. The resulting trees were visualized and edited with FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
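The rootstrap calculation itself is a simple proportion, as the following Python sketch shows. It assumes each rooted bootstrap tree has already been reduced to the set of taxa on one side of its root, which is a hypothetical input format chosen for illustration; IQ-TREE derives this internally from the trees. A branch is identified by the bipartition of taxa it induces, so the two sides of the same root give the same key.

```python
from collections import Counter


def root_split(side, all_taxa):
    """Canonical key for a root position: the unordered pair of taxon
    sets on either side of the root branch."""
    one_side = frozenset(side)
    return frozenset({one_side, frozenset(all_taxa) - one_side})


def rootstrap_support(bootstrap_root_sides, all_taxa):
    """Proportion of rooted bootstrap trees with the root on each branch.

    bootstrap_root_sides holds, for every bootstrap tree, the taxa found
    on one side of its root (either side may be given).
    """
    counts = Counter(
        root_split(side, all_taxa) for side in bootstrap_root_sides
    )
    total = len(bootstrap_root_sides)
    return {split: count / total for split, count in counts.items()}
```

Branches that never carry the root in any bootstrap tree are absent from the returned dictionary and have a rootstrap support of zero.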