Population Structure

Lana S. Martin1, Eleazar Eskin2
Author 1 ORCID: 0000-0003-2311-7191
Author 2 ORCID: 0000-0003-1149-4758


Population structure (or population stratification) is the presence of a systematic difference in allele frequencies between subpopulations in a population, possibly due to different ancestry; the term is used especially in the context of association studies. Population structure arises when there is relatedness among individuals in a study cohort. Unless accounted for in the methodology, population structure can create false positive associations when conducting a genome-wide association study.

Causes of Population Structure


The basic cause of population structure is nonrandom mating between groups, often due to their physical separation (e.g., for populations of African and European descent) followed by genetic drift of allele frequencies in each group. In some contemporary populations there has been recent admixture between individuals from different populations, leading to populations in which ancestry is variable (as in African Americans). Over tens of generations, random mating can eliminate this type of structure. In some parts of the globe (e.g., in Europe), population structure is best modeled by isolation-by-distance, in which allele frequencies tend to vary smoothly with location.
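The drift step described above can be illustrated with a minimal Wright-Fisher sketch. It assumes two groups that start from the same allele frequency and then reproduce in complete isolation, with no selection, mutation, or migration; the function name, group size, and number of generations are illustrative choices, not values from any study.

```python
import random


def wright_fisher(p0, n_individuals, generations, rng):
    """Return an allele-frequency trajectory under pure genetic drift.

    Each generation, the 2N allele copies of the next generation are drawn
    independently, each carrying the allele with probability equal to its
    current frequency (binomial sampling).
    """
    n_alleles = 2 * n_individuals
    freqs = [p0]
    p = p0
    for _generation in range(generations):
        count = sum(1 for _copy in range(n_alleles) if rng.random() < p)
        p = count / n_alleles
        freqs.append(p)
    return freqs


# Two isolated groups founded from one ancestral population (hypothetical sizes).
rng = random.Random(1)
group_a = wright_fisher(0.5, 100, 200, rng)
group_b = wright_fisher(0.5, 100, 200, rng)

print("final frequency in group A:", group_a[-1])
print("final frequency in group B:", group_b[-1])
```

Because the two trajectories are sampled independently, the final frequencies generally differ between the groups, which is the kind of systematic allele-frequency difference that defines population structure.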

Population structure and association studies

Population structure can be a problem for association studies, such as case-control studies, where an association may be detected because of the underlying structure of the population rather than because of a disease-associated locus.
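A small worked example may make this concrete. The sketch below uses hypothetical expected allele counts for two subpopulations in which the variant has no effect on disease (cases and controls carry it at the same frequency within each subpopulation), but both the allele frequency and the proportion of cases differ between subpopulations. All numbers and function names are illustrative assumptions.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 d.f.) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom


def allele_table(n_cases, n_controls, freq):
    """Expected allele counts (two alleles per person).

    Returns (case alt, case ref, control alt, control ref), using the SAME
    allele frequency for cases and controls, i.e. no true association.
    """
    case_alt = round(2 * n_cases * freq)
    ctrl_alt = round(2 * n_controls * freq)
    return (case_alt, 2 * n_cases - case_alt,
            ctrl_alt, 2 * n_controls - ctrl_alt)


# Hypothetical subpopulations: A has a high allele frequency and many cases,
# B has a low allele frequency and few cases.
pop_a = allele_table(n_cases=500, n_controls=500, freq=0.8)
pop_b = allele_table(n_cases=100, n_controls=900, freq=0.2)
pooled = tuple(x + y for x, y in zip(pop_a, pop_b))

chi_a = chi_square_2x2(*pop_a)
chi_b = chi_square_2x2(*pop_b)
chi_pooled = chi_square_2x2(*pooled)

print("within A:", chi_a)
print("within B:", chi_b)
print("pooled:  ", chi_pooled)
```

Within each subpopulation the statistic is zero, yet the pooled table gives a very large chi-square value. The apparent association comes entirely from mixing the two groups, not from any effect of the variant.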

Geneticists link genetic traits with disease risk and development using a genome-wide association study (GWAS). One challenge to producing accurate GWAS results is that of relatedness, termed “population structure,” within a study cohort. Population structure can produce many false positive associations in GWAS results; in other words, population structure may cause a GWAS method to incorrectly identify genetic variants as associated with a disease. Over the past 10 years, new approaches have used mixed models to mitigate the biasing effects of population structure and relatedness in association studies. However, developing GWAS techniques to effectively test for association while correcting for population structure is a computational and statistical challenge. Using laboratory mouse strains as an example, our review characterizes the problem of population structure in association studies and describes how it can cause false positive associations. We then show how mixed models, extended with particular algorithms, can correct for these confounding genetic relationships.
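For orientation, one common way of writing the linear mixed model used in such corrections is sketched below. The notation is an assumption introduced here for illustration (it does not appear in the text above): y is the vector of phenotypes, x the genotype vector at the tested variant, K a kinship (genetic relatedness) matrix estimated from the genotypes, and I the identity matrix.

```latex
% Phenotype = mean + effect of tested variant + polygenic background + noise
y = \mathbf{1}\mu + x\beta + u + e,
\qquad
u \sim \mathcal{N}\!\left(0,\ \sigma_g^{2} K\right),
\qquad
e \sim \mathcal{N}\!\left(0,\ \sigma_e^{2} I\right)
```

Under this formulation, the association test asks whether the effect β differs from zero, while the random effect u absorbs phenotypic similarity that is explained by relatedness among individuals.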

By analogy, one might imagine a scenario in which certain small beads are made of a unique type of foam and children tend to choke on these beads; one might wrongly conclude that the foam causes choking when in fact the small size of the beads is responsible. Also, the real disease-causing locus might not be found in the study if the locus is less prevalent in the population from which the case subjects are chosen. For this reason, it was common in the 1990s to use family-based data, where the effect of population stratification can easily be controlled for using methods such as the transmission disequilibrium test (TDT). But if the structure is known or a putative structure is found, there are a number of possible ways to incorporate this structure into the association analysis and thus compensate for any population bias. Most contemporary genome-wide association studies take the view that the problem of population stratification is manageable,[1] and that the logistical advantages of using unrelated cases and controls make these studies preferable to family-based association studies.
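The TDT mentioned above can be sketched in a few lines. In its basic biallelic form, the test counts how often heterozygous parents transmit a given allele to an affected child versus how often they do not, and compares the two counts with a McNemar-type statistic that is approximately chi-square distributed with one degree of freedom under the null hypothesis. The function name and the counts below are hypothetical.

```python
def tdt_statistic(transmitted, not_transmitted):
    """Basic TDT statistic: (b - c)^2 / (b + c).

    transmitted:     times heterozygous parents passed the allele of interest
                     to an affected child
    not_transmitted: times heterozygous parents passed the other allele
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - not_transmitted) ** 2 / total


# Hypothetical counts from 100 heterozygous parents of affected children.
stat = tdt_statistic(transmitted=60, not_transmitted=40)
print("TDT chi-square:", stat)  # compare against 3.84 for a 0.05 level test
```

Because each comparison is made within a family, both alleles of a heterozygous parent share the same ancestry, which is why the test is not confounded by population stratification.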

The two most widely used approaches to this problem are genomic control, which is a relatively nonparametric method for controlling the inflation of test statistics,[2] and structured association methods,[3] which use genetic information to estimate and control for population structure. Currently, the most widely used structured association method is Eigenstrat, developed by Alkes Price and colleagues.[4]
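As a rough illustration of genomic control, the sketch below estimates an inflation factor from the median of the observed one-degree-of-freedom chi-square statistics and divides every statistic by it. This is a simplified version of the idea under stated assumptions: the input statistics are made up for illustration, and the function name is hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (inflation factor, adjusted statistics).

    The inflation factor is the observed median divided by the median
    expected under the null; it is floored at 1 so that statistics are
    never inflated by the correction.
    """
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)
    return lam, [s / lam for s in chi2_stats]


# Hypothetical test statistics from five variants.
observed = [0.2, 0.5, 0.91, 1.3, 12.0]
lam, adjusted = genomic_control(observed)

print("inflation factor:", lam)
print("adjusted statistics:", adjusted)
```

A real analysis would estimate the factor from many thousands of variants. The design assumption is that most variants are not associated with the trait, so the bulk of the distribution reflects inflation rather than true signal.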

True genetic model

An example of population structure confounding from mouse genetics

Why we observe false positives in mouse genetic studies

Correcting for population structure using mixed model methods

Correcting for population structure in mouse association studies

Correcting for population structure in human association studies


See also

Wikipedia pages that should link here


  1. ^ Population stratification#cite note-1
  2. ^ Population stratification#cite note-2
  3. ^ Population stratification#cite note-3
  4. ^ Population stratification#cite note-4