Seminar

TIGP -- On the Use of GWAS Data

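Speaker: Dr. 范盛娟 (Institute of Biomedical Sciences, Academia Sinica)
Invited by: TIGP Bioinformatics Program
Time: 2012-10-18 (Thu.) 14:00 ~ 15:30
Venue: Lecture Hall 106, New Building, Institute of Information Science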

Abstract

Current genotyping technology has generated an abundance of data, creating a golden opportunity for genomic data-mining research. Although dissecting the etiology of complex diseases now seems possible, many challenges remain. For example, GWAS data explain only a small portion of heritability, genetic structural variants do not receive enough consideration, and phenotype definitions usually cover a broad spectrum, which makes replication results inconsistent.

In previous work, we proposed a constrained two-way model (CTWM) to search for expression quantitative trait loci (eQTL) using genome-wide gene expression and SNP data from two ethnic populations.

In addition, because most odds ratios obtained from GWAS range from 1.1 to 1.5, it has been suggested that gene-gene interaction is a prevalent phenomenon in the etiology of common diseases. We explored haplotype-based approaches because they may have greater power than single-locus analyses when SNPs are in strong linkage disequilibrium with the risk locus. Two data-mining approaches, multifactor dimensionality reduction (MDR) and classification and regression trees (CART), were evaluated using haplotypes while accounting for haplotype uncertainty.

High-density genotyping arrays can now screen more than five million genetic markers, which makes multiple comparisons an important issue. We recently proposed a two-stage maximal segmental score (MSS) procedure, which uses region-specific empirical p-values to identify the genomic segments most likely to harbor the disease gene. Our simulations indicate that MSS has greater power to detect genetic associations than conventional methods.

Common diseases are likely caused by a complex interplay between many genes and environmental factors. Patients with the same diagnosis may differ greatly in the number and severity of their symptoms, suggesting heterogeneity in causal pathways. One alternative approach to circumvent this heterogeneity is to use endophenotypes in association studies. We proposed an analytical procedure for identifying endophenotypes that uses non-negative matrix factorization (NMF) to explore potential molecular dissimilarities within a complex disease based on microarray data; the adjusted Rand index was also used to select informative transcripts for each molecular subtype.
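The abstract does not specify which NMF algorithm, rank-selection rule, or subtype-assignment rule the procedure uses. Purely as an illustration of the general technique, the sketch below implements the standard Lee and Seung multiplicative updates for the Frobenius loss in pure Python and assigns each sample to the metagene with the largest coefficient in H, a common convention for defining molecular subtypes. The orientation (rows are genes, columns are samples), the function names `nmf`, `assign_subtypes` and `frob_error`, the toy matrix, and the iteration count are all hypothetical assumptions, not the speaker's implementation.

```python
import random

def transpose(m):
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Hypothetical sketch: factor non-negative V (genes x samples) as W (genes x k) @ H (k x samples)
    with Lee-Seung multiplicative updates, which never increase the Frobenius error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def frob_error(v, w, h):
    """Frobenius norm of the reconstruction error V - W H."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb)) ** 0.5

def assign_subtypes(h):
    """Label each sample (column of H) with the index of its largest metagene coefficient."""
    return [max(range(len(h)), key=lambda i: h[i][j]) for j in range(len(h[0]))]

# Toy data: 4 genes x 6 samples, with two blocks of co-expressed genes.
V = [
    [5, 5, 5, 0.1, 0.1, 0.1],
    [4, 4, 4, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 6, 6, 6],
    [0.1, 0.1, 0.1, 3, 3, 3],
]
W, H = nmf(V, k=2)
print(assign_subtypes(H))  # samples 0-2 and 3-5 should fall into two different groups
```

Because every factor in the multiplicative updates is non-negative, W and H stay non-negative without explicit projection. A real analysis would typically run several random restarts and choose the rank k with a stability criterion, which this sketch omits.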
A simulation study, in which genotype information was added to gene expression data sets, was conducted to compare the performance of the proposed method with that of principal component analysis followed by k-means clustering (PCA-K). The results demonstrated that the proposed procedure provides higher power than PCA-K across different scenarios.
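The abstract uses the adjusted Rand index (ARI) to select informative transcripts but does not describe the selection rule. For reference only, the sketch below computes the ARI itself, the chance-corrected agreement between two partitions of the same items in the form given by Hubert and Arabie. The function name and the example label vectors are hypothetical; the sketch only compares two sets of subtype labels and does not reproduce the transcript-selection step.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions (1.0 = identical up to relabeling)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # convention: fewer than two items cannot disagree
    # Pairs of items grouped together in both partitions (contingency cells n_ij).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together within each partition (row and column marginals).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # only when both partitions are trivial and identical
    return (sum_ij - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))            # 1.0: same grouping, labels swapped
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]), 4))  # 0.5714
```

Unlike the raw Rand index, the ARI has an expected value of 0 for random labelings, so it stays informative when the number of subtypes differs between the two partitions.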

Institute of Information Science, Academia Sinica
