Institute of Information Science, Academia Sinica
Topic: Population Stratification: Simpson’s Paradox In Genetic Association
Speaker: Prof. SAURABH GHOSH (Human Genetics Unit, Indian Statistical Institute, Kolkata)
Date: 2019-08-07 (Wed) 14:00 – 16:00
Location: Auditorium 107, IIS New Building
Host: Huai-Kuang Tsai


The effect of population stratification (usually characterized by the presence of genetic heterogeneity) on tests of genetic association, for both binary and quantitative traits, has traditionally been studied in the context of false positive rates. It is well known that a test based on a sample comprising genetically or phenotypically heterogeneous subpopulations is susceptible to an inflated rate of false positives for population-based designs but not for family-based designs. On the other hand, the effects of population stratification on the false negative rates of association tests based on either study design have remained largely unexplored. In this talk, we first investigate, both analytically and empirically, the possible marginal and joint effects of genetic and phenotypic heterogeneity on the false negative rates (and hence power) of both population-based and family-based tests for quantitative traits with controlled false positive rates. We study the marginal effect of either phenotypic or genetic heterogeneity on the power of some popular population-based association tests for quantitative traits, and evaluate both the marginal and joint effects of genetic and phenotypic heterogeneity on some model-free family-based tests for transmission disequilibrium.
On a related issue, we discuss a novel Bayesian semi-parametric algorithm to correct for population stratification in tests of association. While several methods have been developed to infer population structure and correct for stratification in tests of association, the estimation of the number of underlying subpopulations (K), which is of additional interest from an evolutionary perspective, has not been adequately addressed, except in STRUCTURE. In order to circumvent the problem of estimating parameters in high-dimensional spaces, STRUCTURE adopts an ad hoc approach based on Bayesian deviance that tends to overestimate K and may lead to reduced power in detecting association. Following the approach of Bhattacharya (2008), we use an MCMC procedure to estimate population structure under the assumption that K is random. Based on extensive simulations under a set-up of no admixture and an unlinked set of markers, we found that our method provides more accurate estimates of K than STRUCTURE and is marginally more powerful than STRAT after controlling for the overall false positive rate. We have also analyzed the Human Genome Diversity Panel data using our model and have obtained very good clustering of the individuals in the panel.
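
As an illustration of the well-known false positive effect described above, the following is a minimal sketch on invented toy data. It is not the analysis presented in the talk. The genotype counts, trait means, and the helper names `pearson` and `make_subpopulation` are all hypothetical. Two subpopulations are built so that genotype and trait are uncorrelated within each one, yet pooling them produces a clear correlation because both allele frequency and trait mean differ between the subpopulations.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def make_subpopulation(genotype_counts, trait_mean):
    """Build toy genotype/trait lists for one subpopulation.

    Each genotype is paired with the trait offsets -1, 0, +1 equally
    often (counts must be multiples of 3), so genotype and trait are
    uncorrelated inside the subpopulation by construction.
    """
    genotypes, traits = [], []
    for genotype, count in genotype_counts.items():
        for i in range(count):
            genotypes.append(genotype)
            traits.append(trait_mean + (i % 3) - 1)
    return genotypes, traits


# Hypothetical toy data (not from the talk):
# subpopulation A has a low allele frequency and a low trait mean,
# subpopulation B has a high allele frequency and a high trait mean.
g_a, y_a = make_subpopulation({0: 6, 1: 3}, trait_mean=10)
g_b, y_b = make_subpopulation({1: 3, 2: 6}, trait_mean=12)

r_a = pearson(g_a, y_a)                     # within A: no association
r_b = pearson(g_b, y_b)                     # within B: no association
r_pooled = pearson(g_a + g_b, y_a + y_b)    # pooled: spurious association

print(f"within A: {r_a:.3f}, within B: {r_b:.3f}, pooled: {r_pooled:.3f}")
```

In this toy set-up the within-subpopulation correlations are zero while the pooled correlation is about 0.63, which is the Simpson's paradox pattern behind inflated false positive rates in population-based designs.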

(This is joint work with Tanushree Haldar and Arunabha Majumdar.)