Bioinformatics Lab
as a universal predictor for proteins, regardless
of organism. UniLoc uses natural language
processing techniques to define protein syn-
onyms. A protein synonym is a peptide of n
amino acids that indicates a possible sequence
variation in the evolution of a protein. UniLoc
is built on a proteome-scale database and
includes localization sites in prokaryotic and
eukaryotic organisms. It can efficiently distin-
guish between single- and multi-localized pro-
teins and predict localizations with high preci-
sion and recall, outperforming most existing
predictors. Furthermore, UniLoc can also be
used to interpret a prediction with identified
template sequences in the database.
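As a rough illustration of the protein-synonym idea described above, the following Python sketch enumerates every overlapping peptide of n amino acids in a sequence. The function name peptide_ngrams, the toy sequence, and the choice of n = 3 are assumptions made purely for illustration. The text does not say how UniLoc derives, weights, or matches synonyms, so this shows only a generic sliding-window step, not UniLoc's implementation.

```python
from collections import Counter


def peptide_ngrams(sequence: str, n: int) -> list[str]:
    """Return all overlapping peptides of length n from a protein sequence."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    seq = sequence.upper()
    # A sequence of length L yields L - n + 1 windows (none if L < n).
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


# Hypothetical toy sequence, not taken from any database.
example = "MKTAYIAKQR"

# Count how often each 3-residue peptide occurs in the toy sequence.
counts = Counter(peptide_ngrams(example, 3))
```

Counting the peptides, as in the last line, is one simple way to turn a sequence into features that can be compared against sequences in a database. Any real predictor would need a far richer representation.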
Disease-centric human proteome database. The ultimate goal of our
MS-based proteomics research is biomarker discovery. To achieve this
goal, we have constructed a human proteome database, which contains
comprehensive information on the human membrane proteome. Using this
database, we have joined other researchers in Taiwan to work on human
chromosome 4 in the Chromosome-centric Human Proteome Project (c-HPP),
an international project orchestrated by the Human Proteome
Organization. In the current stage, the main goal of c-HPP is to
detect missing proteins. Using our bioinformatics expertise, we have
determined a list of missing proteins in chromosome 4 for experimental
detection by our collaborators.

Collaborators: Since bioinformatics is an interdisciplinary research
area, we have been collaborating with principal investigators from
the Biodiversity Research Center (BRC), the Genomics Research Center
(GRC), the Institute of Biomedical Science (IBMS), the Institute of
Chemistry (IC), the Institute of Plant Science and Microbiology
(IPSM), and the Institute of Cellular and Organismic Biology (ICOB)
at Academia Sinica; the National Health Research Institute; the
School of Life Sciences at Yang Ming University; and the College of
Bioresources and Agriculture and the College of Life Science at
National Taiwan University; furthermore, we have also been working
with physicians from National Taiwan University Hospital. In
addition, we have ongoing collaborative research projects with
investigators in the Department of Plant Biology and Medical School,
Michigan State University, David Geffen School of Medicine at UCLA,
and Microsoft Inc.

Figure 1. The Web framework for Integrated Omic Data to reveal the
hidden biological regulations and pathways.

posed a method, called ProDIA, to generate in silico MS/MS spectra
from SWATH-MS datasets; this would enable MS/MS datasets to be
searched using database sequence searching tools, e.g., MASCOT, for
protein identification. Third, we are developing a method to generate
peak list files from raw data, enabling the generation of peaks
consistent with raw data and providing charge information for each
peak; our glycoproteomics research has shown that the majority of the
currently available converters for generating peak list files from
spectra raw data suffer some serious limitations, including the
failure to provide the charge state or intensity of each peak in a
spectrum, and inconsistency between the m/z or intensity of some
peaks and the raw data.

In addition to proteomics, we have also worked on MS-based
metabolomics. Since few tools are available for metabolite
quantitation, we have developed an automated and highly accurate
metabolite quantitation tool. Moreover, we have proposed a
computational method for metabolite identification, which involves an
effective clustering technique to group a metabolite with its
fragments, and then enables searches against different metabolite
databases. The proposed method can lead to identification with high
sensitivity and accuracy.

Protein structure and subcellular localization predictions. We work
on structure prediction specifically for transmembrane (TM) pro-
teins, since (i) membrane proteins are prominent drug targets, and
(ii) TM proteins are a major type of such proteins. We have devel-
oped methods for predicting TM topology, helix-helix interaction
and contacts, and lipid exposure of each TM residue. Furthermore,
we have developed a method for predicting signal peptides, as
these can be mistakenly predicted as TM helices.
Since determination of protein subcellular localization (PSL) sites
through wet-lab experiments is labor intensive and time consum-
ing, we have developed a computational approach, called UniLoc,
Figure 2. Omics Database for Model and non-Model Organisms.