Page 50 - 2017 Brochure

Distinguished Research Fellows

許聞廉 Wen-Lian Hsu

Distinguished Research Fellow
Ph.D., Operations Research, Cornell University

Tel: +886-2-2788-3799 ext. 1804 Fax: +886-2-2782-4814
Email: hsu@iis.sinica.edu.tw
http://iasl.iis.sinica.edu.tw/hsu

Research Description

My main research interests include graph algorithms, natural language understanding and bioinformatics.

1. Graph algorithm: We have performed extensive and ground-breaking work on two fundamental classes of special graphs, namely planar graphs and interval graphs. A new data structure, PC-tree, was introduced to greatly simplify the recognition of these two classes of graphs. PC-tree is a natural representation for planar graph embedding. Our new planarity test, based on PC-trees, is simple, elegant, and yields a linear time algorithm for finding maximal planar subgraphs.

2. Natural language understanding (NLU): We have developed the following:

a) A Chinese input system, GOING 自然輸入法, which automatically translates a Pinyin sequence into characters based on context. With an accuracy rate close to 96%, it received the Distinguished Chinese Information Product Award (中文傑出資訊產品獎) in 1993. It is being used by more than a million people in Taiwan.

b) A subsequence input method, déjà vu, which can be applied to any language to speed up typing, especially on mobile devices. Using déjà vu, one can get “bureaucracy” by typing “bucy”, and “machine learning” by typing “malr”. The shorthand is created on-the-fly.

c) A knowledge annotation and inference kernel, InfoMap, which supports the implementation of our NLU modules. Utilizing InfoMap, we won 1st place in the Chinese Question Answering contest in NTCIR held in Tokyo, Japan in 2005 and 2007; 1st place in 2006 SIGHAN CityU Word Segmentation; 2nd place in 2006 SIGHAN Named Entity Recognition (NER) competition; and 1st place in 2013 NTCIR-10 Recognizing Inference in Text Task. In biological literature mining, we also won 1st place in the BioCreative II.5 Gene Normalization task in 2009.

d) A statistical principle-based approach (SPBA) for NLU. SPBA is a pattern-based machine learning (ML) approach that exhibits the benefits of both rule-based and traditional statistical ML approaches. SPBA automatically clusters linguistic patterns and aligns the input to representative patterns (principle) from each cluster. The insertion and deletion scores can be pre-trained by logistic regression. SPBA automatically takes care of rule variations and exceptions, and is highly interpretable.

3. Bioinformatics: We have developed the following:

a) A suite of software for protein quantitation (Multi-Q, MaxiQ, and Ideal-Q), which has received a lot of attention and users.

b) An ultra-efficient divide-and-conquer algorithm, called Kart, for NGS read mapping, which divides a read into small fragments that can be aligned independently in parallel. Kart is 3- to 10-times faster than other aligners. The same strategy has also been applied to an RNA-seq mapper, generating superior results.

c) An automated method for programmable one-pot oligosaccharide synthesis, which drastically reduces the complexity of intermediate separation steps and protecting group manipulation.

d) Several machine learning and knowledge-based algorithms for protein structure and function prediction, as well as protein interaction interface prediction. Our new protein sequence aligner, SymAlign, has much better precision than other aligners (40% compared to 6% or less for the others). SymAlign regards protein sequences as a language and identifies protein synonyms from a sizeable database.
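
The subsequence idea behind the déjà vu examples ("bucy" for "bureaucracy", "malr" for "machine learning") can be illustrated with a short sketch. The brochure does not describe how déjà vu matches or ranks candidates, so the following assumes plain in-order subsequence matching against a small hypothetical vocabulary, with candidates ordered by length purely for illustration. The names `is_subsequence`, `expand`, and `VOCABULARY` are invented for this example and are not part of the actual system.

```python
def is_subsequence(short: str, word: str) -> bool:
    """Return True if the characters of `short` occur in `word` in the same order."""
    remaining = iter(word)
    # `ch in remaining` advances the iterator, so each match must come
    # after the previous one.
    return all(ch in remaining for ch in short)


def expand(short: str, vocabulary: list[str]) -> list[str]:
    """List vocabulary entries that `short` could abbreviate, shortest first.

    Ordering by length is an assumption made for this sketch only.
    """
    key = short.lower()
    matches = [w for w in vocabulary if is_subsequence(key, w.lower())]
    return sorted(matches, key=len)


# Hypothetical vocabulary, chosen only to exercise the brochure's two examples.
VOCABULARY = ["bureaucracy", "bucket", "machine learning", "mailbox"]

print(expand("bucy", VOCABULARY))
print(expand("malr", VOCABULARY))
```

With a realistic vocabulary a shorthand would usually match many entries (for example, "bucy" is also a subsequence of "buoyancy"), so a practical system would need context or frequency information to choose among them.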
