
Research Laboratories  研究群



Natural Language and Knowledge Processing Laboratory (語言與知識處理實驗室)




Research Faculty
  Wen-Lian Hsu, Distinguished Research Fellow
  Fu Chang, Associate Research Fellow
  Keh-Jiann Chen, Research Fellow
  Hsin-Min Wang, Associate Research Fellow

Technical Faculty
  Der-Ming Juang, Assistant Research Engineer





               Group Profile
We focus on problems concerning knowledge-based information processing, work that is strongly motivated by the flood of information on the Internet. Our research covers knowledge acquisition, utilization, and representation.

1. Knowledge Acquisition

Our focus is on strategies and methodologies for automating knowledge acquisition processes.

a) Construction of linguistic knowledge bases
Over the past twenty-odd years, we have developed an infrastructure for Chinese language processing that includes part-of-speech tagged corpora, treebanks, Chinese lexical databases, Chinese grammars, InfoMap, word identification systems, sentence parsers, etc. We have also developed some basic techniques for knowledge extraction, such as named-entity recognition (NER), semantic role labeling, and relation extraction, in both Chinese and biological literature. We plan to extract linguistic and domain knowledge from the web with crowdsourcing.

b) Machine learning and pattern classification
We have proposed an extremely efficient tree decomposition approach to train non-linear support vector machines at a speedup factor of hundreds, or sometimes even thousands, while still achieving comparable test accuracy. We are also pioneering a new method for ranking and selecting features using multiple feature subsets, and have gained advantages in computing speed, test accuracy, the number of essential features that are ranked above all irrelevant features, and the number of essential features among the selected features.

c) Pattern-based information extraction (IE)
Most pattern-based IE approaches start from manually provided seed instances. We have proposed two mechanisms to remove this human effort at the initial stage. First, we applied a semi-supervised method that can take a large quantity of seed instances of diverse quality. Second, we proposed a weakly-supervised approach for extracting instances of semantic classes, which uses a compression model to assess the contextual evidence of its extractions.

2. Knowledge Utilization

Our Chinese input system, GOING, is used by over one million people in Taiwan. Our knowledge representation kernel, InfoMap, has been applied to a wide variety of application systems. In the future, we will design event frames as a major building block of our learning system. We will also develop basic technologies for processing spoken language and music to support various applications.

a) Knowledge-based Chinese language processing
We will focus on the conceptual processing of Chinese documents. Our system will utilize the statistical, linguistic, and common sense knowledge derived by our evolving Knowledge Web and E-HowNet to parse the conceptual structures of sentences and interpret the sentence meanings.

b) Audio (speech / music / song) processing & retrieval
We focus on speech recognition, speaker recognition/segmentation/clustering, and spoken document retrieval/summarization. Our speaker verification system was ranked 2nd in the 2006 International Symposium on Chinese Spoken Language Processing. We have developed a prototype TV news retrieval system. With regard to music, our research focuses on vocal melody extraction, query by singing/humming, music tag annotation, and tag-based music retrieval. Our audio-tagging system was ranked 1st in the 2009 Music Information Retrieval Evaluation eXchange (MIREX2009). We have developed a novel query-by-multi-tags music search system.

c) Chinese question answering system
We integrated several Chinese NLP techniques to construct a Chinese factoid QA system, which won first place in NTCIR-5 and NTCIR-6. In the future, we will extend the system to answer “how” and “what” types of questions.

d) Named entity recognition (NER)
Identifying person, location, and organization names in documents is very important for natural language understanding. In the past, we developed a machine-learning based NER system, which won second place in the 2006 SIGHAN competition and 1st place in the 2009 BioCreative II.5 gene name normalization shared task. In recent years, we have focused on using semantic rules and language patterns for NER with Markov Logic Networks, which provide more flexibility in NER.

e) Chinese Textual Entailment (TE)
TE is the task of identifying inferences between sentences. We have integrated several NLP tools and resources, focusing on deeper semantic and syntactic analysis, to construct a Chinese TE recognition system, which achieved good performance in the 2011 NTCIR-9 TE shared task.

3. Knowledge Representation

We will remodel the current ontology structures of WordNet, HowNet, and FrameNet to achieve a more unified representation. We designed a universal concept representational mechanism called E-HowNet, which is a frame-based entity-relation model. E-HowNet has semantic composition and decomposition capabilities, which are intended to derive near-canonical sense representations of sentences through semantic composition of lexical senses.

Basic natural language understanding means identifying the person, time, location, artifact, and event in a sentence, which is especially important in Chinese language processing, since Chinese has no explicit word boundaries.


