Assessing population differentiation and isolation from single-nucleotide polymorphism data
Nicholson G., Smith AV., Jónsson F., Gústafsson O., Stefánsson K., Donnelly P.
We introduce a new hierarchical model for single-nucleotide polymorphism allele frequencies in a structured population, which is naturally fitted via Markov chain Monte Carlo methods. There is one parameter for each population, closely analogous to a population-specific version of Wright's FST, which can be interpreted as measuring how isolated the relevant population has been. Our model includes the effects of single-nucleotide polymorphism ascertainment and is motivated by population genetics considerations, explicitly in the transient setting after divergence of populations, rather than as the equilibrium of a stochastic model, as is traditionally the case. For data sets of the sizes that we consider, the method provides good parameter estimates and considerably outperforms estimation methods analogous to those currently used in practice. We apply the method to one new and one existing human data set with rather different characteristics: the first consists of three closely related European populations, the second of four populations taken from across the globe. A novelty of our framework is that the fit of the underlying model can be assessed easily, and the results of this assessment are encouraging for both data sets analysed. Our analysis suggests that Iceland is more differentiated than the other two European populations (France and Utah), a finding that is consistent with the historical record but not obvious from comparisons of simple summary statistics.
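To make the role of the population-specific parameter concrete, the following is a minimal simulation sketch, not the fitting procedure described above. It assumes a normal approximation to drift (an assumption of this sketch; the abstract does not state the distributional form): each SNP has an ancestral frequency `pi`, each population's frequency is drawn around `pi` with variance `c_j * pi * (1 - pi)`, and sampled allele counts are binomial. The names `simulate_counts`, `moment_estimate_c` and the parameter `c` are hypothetical, SNP ascertainment is ignored, and the moment estimator treats `pi` as known, which is possible only in simulation.

```python
import random


def simulate_counts(num_snps, c, sample_size, seed=0):
    """Simulate allele counts for several populations (illustrative only).

    c is a list of hypothetical FST-like drift parameters, one per population.
    Returns (pi, counts), where counts[i][j] is the allele count at SNP i
    in population j out of sample_size sampled chromosomes.
    """
    rng = random.Random(seed)
    # Ancestral frequencies; the uniform range is an arbitrary choice for this sketch.
    pi = [rng.uniform(0.05, 0.95) for _ in range(num_snps)]
    counts = []
    for p in pi:
        row = []
        for c_j in c:
            sd = (c_j * p * (1.0 - p)) ** 0.5
            # Clamp to [0, 1]: draws outside the interval are treated as loss or fixation.
            alpha = min(1.0, max(0.0, rng.gauss(p, sd)))
            # Binomial sampling of sample_size chromosomes.
            row.append(sum(rng.random() < alpha for _ in range(sample_size)))
        counts.append(row)
    return pi, counts


def moment_estimate_c(pi, counts, sample_size):
    """Crude moment estimate of each c_j, assuming pi is known.

    Ignoring clamping, E[(x/n - pi)^2] / (pi (1 - pi)) = c + (1 - c) / n,
    which is solved for c after averaging over SNPs.
    """
    n = sample_size
    num_pops = len(counts[0])
    estimates = []
    for j in range(num_pops):
        ratio = sum(
            (row[j] / n - p) ** 2 / (p * (1.0 - p)) for p, row in zip(pi, counts)
        ) / len(pi)
        estimates.append((ratio - 1.0 / n) / (1.0 - 1.0 / n))
    return estimates


if __name__ == "__main__":
    pi, counts = simulate_counts(1000, [0.01, 0.1], 100, seed=1)
    print(moment_estimate_c(pi, counts, 100))
```

In this sketch a population simulated with a larger `c` shows larger standardized deviations from the ancestral frequencies, which is the sense in which such a parameter can be read as a measure of isolation. With real data the ancestral frequencies are unknown, so a joint fit over all parameters is needed rather than this shortcut.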