Curating Datasets: A Guide to Making Custom Datasets for Analysis of Admixture
All population genetics and archaeogenetics analyses require a dataset: the genetic data of a collection of individuals, together with any associated data. Every program mentioned in this article needs one, and the format of the dataset depends on the program being used.
As stated in this article, ADMIXTOOLS is a collection of programs that use direct genetic data (SNPs) to infer genetic relationships between populations. There are two versions: the original ADMIXTOOLS, developed by David Reich and Nick Patterson for Linux and Mac, and ADMIXTOOLS 2, an improved, faster version developed by Robert Maier, Pavel Flegontov, Ulas Isildak, David Reich, and Nick Patterson for Linux, Mac, and Windows. Please note that ADMIXTOOLS tutorials on this site are specific to ADMIXTOOLS 2, as it is more accessible.
As stated in this article, ADMIXTURE estimates the ancestry of a population using K hypothetical populations, where K is a number chosen by the user. It does so in a model-based manner. Given a dataset of samples (ancient or modern, sequenced in any way), it estimates how much of each sample's genetic ancestry derives from each of the K populations, though it does not model genetic drift. These populations are hypothetical and not designated by the user, so ADMIXTURE does not directly test for admixture between populations; it examines individual samples rather than populations as a whole. ADMIXTURE is useful for differentiating between clusters of populations. Additionally, unlike ADMIXTOOLS, it shows little bias with respect to the type of file (though the way a sample was sequenced can affect its composition at higher values of K). ADMIXTURE results can be plotted as a bar chart, or even used as PCA-like coordinates. Note that ADMIXTURE can only be used on Linux and macOS. This tutorial pertains to Linux only, as I have no experience with macOS and do not want to disseminate any misinformation relating to the OS.
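To make the output more concrete: ADMIXTURE writes its ancestry estimates to a .Q file with one row per individual and K whitespace-separated proportions per row. The sketch below, which assumes that layout, averages those rows by population so they can be compared or plotted. The function names, the population labels, and the numbers are made up for illustration and are not part of ADMIXTURE itself.

```python
def read_q_matrix(text):
    """Parse .Q-style text: one row per individual, K proportions per row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([float(value) for value in fields])
    return rows


def mean_by_population(q_rows, labels):
    """Average the ancestry proportions of all individuals sharing a label."""
    if len(q_rows) != len(labels):
        raise ValueError("need exactly one population label per Q row")
    sums, counts = {}, {}
    for row, label in zip(q_rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


# Made-up K=2 proportions for four individuals (not real data).
example_q = """\
0.90 0.10
0.80 0.20
0.30 0.70
0.10 0.90
"""
# Hypothetical labels, listed in the same order as the rows above.
example_labels = ["PopA", "PopA", "PopB", "PopB"]

population_means = mean_by_population(read_q_matrix(example_q), example_labels)
for population, means in population_means.items():
    print(population, [round(m, 3) for m in means])
```

The labels have to be supplied separately because the .Q file carries no sample names; the rows follow the order of the individuals in the input dataset.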
As stated in this article, LINADMIX was developed by Lily Agranat-Tamir, Shamam Waldman, Naomi Rosen, Benjamin Yakir, Shai Carmi, and Liran Carmel as an alternative to qpAdm, since it is advised that qpAdm not be used when modern and ancient samples are co-examined. LINADMIX works in tandem with ADMIXTURE, relying on ADMIXTURE's output. It uses a linear regression model to estimate admixture proportions for a target population, taking the ADMIXTURE results of the source populations as mixing coefficients. It also computes a plausibility value for each model, so it can be used to identify which models are plausible. LINADMIX can be used to model modern populations, and it can handle missing data and genetic drift (whereas ADMIXTURE cannot model genetic drift, LINADMIX is robust to it). Although LINADMIX performs better when source populations are highly diverged, genetically similar source populations can still be used.
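The regression idea can be illustrated with a toy two-source case. The sketch below is not LINADMIX's code and does not reproduce its plausibility value. It only assumes that each population is summarised by its mean ADMIXTURE proportions and finds the mixing fraction between 0 and 1 that best reproduces the target in the least-squares sense. All names and numbers are hypothetical.

```python
def two_source_mix(target, source_a, source_b):
    """Least-squares fit of target ~ alpha * source_a + (1 - alpha) * source_b.

    Returns (alpha, residual_sum_of_squares), with alpha clipped to [0, 1].
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("all ancestry vectors must have the same length K")
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("sources are identical, so alpha is not identifiable")
    numer = sum((t - b) * d for t, b, d in zip(target, source_b, diff))
    # The objective is a convex quadratic in alpha, so clipping the
    # unconstrained optimum gives the constrained optimum on [0, 1].
    alpha = min(1.0, max(0.0, numer / denom))
    fitted = [alpha * a + (1 - alpha) * b for a, b in zip(source_a, source_b)]
    rss = sum((t - f) ** 2 for t, f in zip(target, fitted))
    return alpha, rss


# Made-up mean K=2 ancestry proportions (not real data).
source_a = [0.85, 0.15]
source_b = [0.20, 0.80]
target = [0.525, 0.475]

alpha, rss = two_source_mix(target, source_a, source_b)
print(f"share from source A: {alpha:.3f}, residual: {rss:.6f}")
```

The residual here is only a rough indicator of fit. With more than two sources, the same problem becomes a constrained least-squares fit over several coefficients that are non-negative and sum to one.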