
Bioinformatics Lab












Figure 1. The Web framework for Integrated Omic Data to reveal the hidden biological regulations and pathways.

posed a method, called ProDIA, to generate in silico MS/MS spectra from SWATH-MS datasets; this would enable MS/MS datasets to be searched using database sequence searching tools, e.g., MASCOT, for protein identification. Third, we are developing a method to generate peak list files whose peaks are consistent with the raw data and carry charge information for each peak. Our glycoproteomics research has shown that most currently available converters for generating peak list files from raw spectral data suffer from serious limitations, including failure to provide the charge state or intensity of each peak in a spectrum, and inconsistencies between the m/z or intensity of some peaks and the raw data.

In addition to proteomics, we have also worked on MS-based metabolomics. Since few tools are available for metabolite quantitation, we have developed an automated and highly accurate metabolite quantitation tool. Moreover, we have proposed a computational method for metabolite identification, which uses an effective clustering technique to group a metabolite with its fragments and then enables searches against different metabolite databases. The proposed method can lead to identification with high sensitivity and accuracy.

Protein structure and subcellular localization predictions. We work on structure prediction specifically for transmembrane (TM) proteins, since (i) membrane proteins are prominent drug targets, and (ii) TM proteins are a major type of such proteins. We have developed methods for predicting TM topology, helix-helix interaction and contacts, and lipid exposure of each TM residue. Furthermore, we have developed a method for predicting signal peptides, as these can be mistakenly predicted as TM helices.

Since determination of protein subcellular localization (PSL) sites through wet-lab experiments is labor intensive and time consuming, we have developed a computational approach, called UniLoc, as a universal predictor for proteins, regardless of organism. UniLoc uses natural language processing techniques to define protein synonyms. A protein synonym is a peptide of n amino acids that indicates a possible sequence variation in the evolution of a protein. UniLoc is built on a proteome-scale database and includes localization sites in prokaryotic and eukaryotic organisms. It can efficiently distinguish between single- and multi-localized proteins and predict localizations with high precision and recall, outperforming most existing predictors. Furthermore, UniLoc can also be used to interpret a prediction with identified template sequences in the database.

Disease-centric human proteome database. The ultimate goal of our MS-based proteomics research is biomarker discovery. To achieve this goal, we have constructed a human proteome database, which contains comprehensive information on the human membrane proteome. Using this database, we have joined other researchers in Taiwan to work on human chromosome 4 in the Chromosome-centric Human Proteome Project (c-HPP), an international project orchestrated by the Human Proteome Organization. In the current stage, the main goal of c-HPP is to detect missing proteins. Using our bioinformatics expertise, we have determined a list of missing proteins in chromosome 4 for experimental detection by our collaborators.

Collaborators: Since bioinformatics is an interdisciplinary research area, we have been collaborating with principal investigators from the Biodiversity Research Center (BRC), the Genomics Research Center (GRC), the Institute of Biomedical Science (IBMS), the Institute of Chemistry (IC), the Institute of Plant Science and Microbiology (IPSM), and the Institute of Cellular and Organismic Biology (ICOB) at Academia Sinica; the National Health Research Institute; the School of Life Sciences at Yang Ming University; and the College of Bioresources and Agriculture and the College of Life Science at National Taiwan University. We have also been working with physicians from National Taiwan University Hospital. In addition, we have ongoing collaborative research projects with investigators in the Department of Plant Biology and Medical School, Michigan State University, David Geffen School of Medicine at UCLA, and Microsoft Inc.

Figure 2. Omics Database for Model and non-Model Organisms.
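
The protein-synonym idea described for UniLoc (a peptide of n amino acids that may reflect sequence variation) can be illustrated with a small sketch. This is not UniLoc's implementation: the function names, the toy sequences, and the use of Hamming distance as the similarity criterion are all assumptions made for illustration only.

```python
from collections import defaultdict


def peptides(sequence, n):
    """Return every overlapping peptide of n amino acids in a sequence."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def hamming(a, b):
    """Count the positions at which two equal-length peptides differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_synonyms(query, database, n, max_mismatches=1):
    """Map each query peptide to similar peptides found in database proteins.

    Hamming distance is an illustrative stand-in (an assumption), not the
    similarity criterion actually used by UniLoc.
    """
    matches = defaultdict(set)
    for q in peptides(query, n):
        for name, seq in database.items():
            for p in peptides(seq, n):
                if hamming(q, p) <= max_mismatches:
                    matches[q].add((name, p))
    return dict(matches)


# Toy sequences, invented for illustration only.
database = {"protA": "MKTLLV", "protB": "MKSLLV"}
result = candidate_synonyms("MKTLL", database, n=4)
```

Here each peptide of the query is paired with database peptides that differ in at most one residue, so the database proteins that contribute matches play the role of candidate template sequences for interpreting a prediction.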

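The importance of per-peak charge information in peak list files, noted above, can be seen from the standard relation between m/z and neutral mass. The sketch below assumes positively charged, protonated ions; the function name and the example m/z value are hypothetical and are not taken from any converter discussed here.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons (physical constant)


def neutral_mass(mz, charge):
    """Neutral mass of a protonated ion observed at the given m/z and charge.

    Assumes an [M + zH]z+ ion; other adducts would need a different offset.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


# The same m/z value implies a different mass for each charge state,
# so a peak list without charge information leaves the mass ambiguous.
for z in (1, 2, 3):
    print(z, round(neutral_mass(500.0, z), 4))
```

Without the charge state, a database search tool has to try several candidate masses for the same peak, which is one reason a missing charge field limits downstream identification.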

