Decode life, Explore the unknown.
Bioinformatics Lab
Research Faculty
Ting-Yi Sung, Research Fellow
Jan-Ming Ho, Research Fellow
Wen-Lian Hsu, Distinguished Research Fellow
Chung-Yen Lin, Associate Research Fellow
Arthur Chun-Chieh Shih, Research Fellow
Huai-Kuang Tsai, Associate Research Fellow

Postdoctoral
Yu-Jung Chang
Chia-Ying Cheng
Te-Chin Chu
Chan-Hsien Lin
Hsin-Nan Lin
Ke-Shiuan Lynn
Zing Tsung-Yeh Tsai
Yu-Wei Tsay

Our current research is focused on bioinformatics for “omics” studies, classified into two
main areas: (i) genomics and transcriptomics, and (ii) proteomics and metabolomics.
These areas are described below.

1. Genomics and Transcriptomics Studies
With the ascension of next-generation sequencing (NGS) as a predominant technology
for genome and transcriptome studies, we have devoted ourselves to developing new
methodologies and tools for analyzing NGS data. First, we have proposed computational
methods to assemble high-throughput short read sequences; to this end, we have developed
a de novo assembler, called JR-Assembler. This tool can assemble a giga-base-pair
genome from Illumina short reads, and is effective in memory usage and efficient in CPU
time. Second, we have proposed an automated metagenomic data-processing pipeline,
called MetaABC, which integrates several binning tools coupled with methods for removing
artifacts, analyzing unassigned reads, and controlling sampling biases, to achieve less
biased analysis. Third, in order to uncover secrets within the massive collection of omics
data from model and non-model organisms, we have integrated several open source
software packages with our own tools, enabling NGS and other omics data to be combined
and analyzed at our web server, called Multi-Omics Online Analysis System (available
at http://molas.iis.sinica.edu.tw). Fourth, we are working on read alignment; NGS
reads are getting longer, but most existing short-read aligners were developed and optimized
for 100 bp reads or shorter. We have developed a new alignment algorithm, called
Kart, which can efficiently produce reliable, longer alignments with a low error rate, and
can tackle PacBio reads with high accuracy. Fifth, to address the increase in computation
power required for biological research, we have collaborated with colleagues in our
institute to implement a user-friendly tool for biologists, called CloudDOE
(http://clouddoe.iis.sinica.edu.tw/); this tool involves a Hadoop cloud which can substantially reduce
the complexity and costs of deployment, execution, enhancement, and management of
computation resources.
We have been using the aforementioned methods and tools to tackle various biological
problems, and the following in particular: (a) gene duplication in C4 plant leaf evolution,
(b) reconstruction of regulatory networks of maize leaf development, (c) integration of
transcription factors, miRNAs, and epigenetic information to study gene regulation, (d)
reconstruction of miRNA-gene regulatory networks in cardiac hypertrophy and B cell
differentiation, (e) identification of structural variations in the autism genome, (f) functional
analysis of non-coding RNAs in humans, (g) identification of druggable oncogene fusions
and the underlying mechanisms, and (h) viral genome recombination and genotyping.
2. Proteomics and Metabolomics
Mass Spectrometry (MS)-based proteomics and metabolomics. MS has become the
predominant technology for proteomics research. Protein identification and quantitation are
the two main purposes of mass spectral analysis. Previously, we focused on developing
bioinformatics systems for quantitation analysis, through which we created three tools,
i.e., Multi-Q, MaXIC-Q, and IDEAL-Q, for various experimental quantitation approaches.
Currently, we are working on improving protein identification. First, we have recently
proposed novel methods for glycoprotein identification (including the implementation
of an automated tool called MAGIC), since glycosylation is considered to be the most
important post-translational modification (PTM), and analysis of MS/MS data acquired
from glycoproteomic experiments is challenging. Second, we have proposed a method
to utilize SWATH-MS data for protein identification. SWATH is a data-independent
acquisition method developed in recent years primarily for targeted proteomics analysis,
and this method has since attracted considerable attention. Since the high-throughput
data generated from SWATH-MS is mainly used for targeted proteomics analysis, we pro-