Research Laboratories
Natural Language and Knowledge Processing Laboratory
Research Faculty
Wen-Lian Hsu, Distinguished Research Fellow
Fu Chang, Associate Research Fellow
Keh-Jiann Chen, Research Fellow
Hsin-Min Wang, Associate Research Fellow

Technical Faculty
Der-Ming Juang, Assistant Research Engineer
Group Profile
We focus on problems in knowledge-based information processing, an area strongly motivated by the flood of information on the Internet. Our work covers knowledge acquisition, utilization, and representation.

Basic natural language understanding means identifying the persons, times, locations, artifacts, and events in a sentence. This is especially important in Chinese language processing, because Chinese text has no explicit word boundaries.

1. Knowledge Acquisition

Our focus is on strategies and methodologies for automating knowledge acquisition processes.

a) Construction of linguistic knowledge bases
Over the past twenty-some years, we have developed an infrastructure for Chinese language processing that includes a part-of-speech tagged corpus, treebanks, Chinese lexical databases, Chinese grammars, InfoMap, word identification systems, and sentence parsers. We have also developed basic techniques for knowledge extraction, such as named entity recognition (NER), semantic role labeling, and relation extraction, for both Chinese text and the biological literature. We plan to extract linguistic and domain knowledge from the web through crowdsourcing.

b) Machine learning and pattern classification
We have proposed an extremely efficient tree decomposition approach for training non-linear support vector machines. It achieves speedups of hundreds, and sometimes thousands, of times while maintaining comparable test accuracy. We are also pioneering a new method for ranking and selecting features using multiple feature subsets. The method has shown advantages in computing speed, test accuracy, the number of essential features ranked above all irrelevant features, and the number of essential features among the selected features.

c) Pattern-based information extraction (IE)
Most pattern-based IE approaches start from manually provided seed instances. We have proposed two mechanisms that remove the human effort at this initial stage. First, we applied a semi-supervised method that can take a large quantity of seed instances of varying quality. Second, we proposed a weakly supervised approach for extracting instances of semantic classes, which uses a compression model to assess the contextual evidence for each extraction.

2. Knowledge Utilization

Our Chinese input system, GOING, is used by over one million people in Taiwan. Our knowledge representation kernel, InfoMap, has been applied in a wide variety of application systems. In the future, we will design event frames as a major building block of our learning system. We will also develop basic technologies for processing spoken language and music to support various applications.

a) Knowledge-based Chinese language processing
We will focus on the conceptual processing of Chinese documents. Our system will use the statistical, linguistic, and common sense knowledge derived from our evolving Knowledge Web and E-HowNet to parse the conceptual structures of sentences and interpret their meanings.

b) Audio (speech / music / song) processing & retrieval
We focus on speech recognition, speaker recognition/segmentation/clustering, and spoken document retrieval/summarization. Our speaker verification system was ranked 2nd at the 2006 International Symposium on Chinese Spoken Language Processing. We have developed a prototype TV news retrieval system. In music, our research focuses on vocal melody extraction, query by singing/humming, music tag annotation, and tag-based music retrieval. Our audio-tagging system was ranked 1st in the 2009 Music Information Retrieval Evaluation eXchange (MIREX2009). We have also developed a novel query-by-multi-tags music search system.

c) Chinese question answering system
We integrated several Chinese NLP techniques to construct a Chinese factoid QA system, which won first place in NTCIR-5 and NTCIR-6. In the future, we will extend the system to answer “how” and “what” types of questions.

d) Named entity recognition (NER)
Identifying person, location, and organization names in documents is very important for natural language understanding. We previously developed a machine-learning-based NER system, which won second place in the 2006 SIGHAN competition and first place in the 2009 BioCreative II.5 gene name normalization shared task. In recent years, we have focused on using semantic rules and language patterns for NER with Markov Logic Networks, which provide more flexibility in NER.

e) Chinese Textual Entailment (TE)
TE is the task of identifying inferences between sentences. We have integrated several NLP tools and resources, focusing on deeper semantic and syntactic analysis, to construct a Chinese TE recognition system, which achieved good performance in the 2011 NTCIR-9 TE shared task.

3. Knowledge Representation

We will remodel the current ontology structures of WordNet, HowNet, and FrameNet to achieve a more unified representation. We designed a universal concept representation mechanism called E-HowNet, which is a frame-based entity-relation model. E-HowNet has semantic composition and decomposition capabilities, which are intended to derive near-canonical sense representations of sentences through the semantic composition of lexical senses.