Natural Language and Knowledge Processing Lab
We focus on problems in knowledge-based information processing, motivated largely by the flood of information on the Internet. We are currently studying knowledge acquisition, representation, and utilization, with a special emphasis on Chinese processing.

1. Knowledge Base

Our focus is on strategies and methodologies for automating knowledge acquisition processes.

a) Construction of linguistic knowledge bases
Over the past twenty-some years, we have developed an infrastructure for Chinese language processing that includes a part-of-speech-tagged corpus, treebanks, Chinese lexical databases, a Chinese grammar, InfoMap, word identification systems, sentence parsers, and more. We plan to utilize these tools, in combination with crowd-sourcing, to extract linguistic and domain knowledge from the web. Various knowledge bases, including general and special-domain ontologies as well as lexical items and named entities from Wikipedia, are connected to form a complete concept net.
2. Natural Language Understanding

We will remodel the current ontology structures of WordNet, HowNet, and FrameNet to achieve a more unified representation. To this end, we designed a universal concept representation mechanism called E-HowNet, a frame-based entity-relation model. E-HowNet has semantic composition and decomposition capabilities that can derive near-canonical representations of sentences through the semantic composition of lexical senses. To connect with other well-developed ontologies, senses in E-HowNet have been manually mapped to the corresponding WordNet synsets, and E-HowNet lexical entries are automatically linked to WordNet synsets.
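To make the frame-based idea concrete, the following is a minimal Python sketch of how lexical senses might be composed into a sentence-level frame. The Frame class, concept labels, and role names are illustrative assumptions, not the actual E-HowNet implementation or notation.

    # A minimal sketch of frame-based semantic composition, in the spirit of
    # E-HowNet; the concept labels and role names are illustrative only.
    class Frame:
        """A concept head plus a set of role -> filler slots."""
        def __init__(self, head, roles=None):
            self.head = head                # e.g. "drink|喝"
            self.roles = dict(roles or {})  # e.g. {"theme": Frame("tea|茶")}

        def compose(self, role, filler):
            """Attach a filler frame to a role slot, returning a new frame."""
            merged = dict(self.roles)
            merged[role] = filler
            return Frame(self.head, merged)

        def __repr__(self):
            if not self.roles:
                return "{%s}" % self.head
            inner = ",".join("%s=%r" % (r, f) for r, f in sorted(self.roles.items()))
            return "{%s:%s}" % (self.head, inner)

    # Simplified lexical senses, composed into a frame for "he drinks tea".
    he, tea, drink = Frame("human|人"), Frame("tea|茶"), Frame("drink|喝")
    sentence = drink.compose("agent", he).compose("theme", tea)
    print(sentence)  # {drink|喝:agent={human|人},theme={tea|茶}}

Decomposition would run in the opposite direction, expanding a lexical sense into the more primitive frames that define it.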
a) Knowledge-based Chinese language processing
We will focus on the conceptual processing of Chinese documents. Our system will utilize statistical, linguistic, and common-sense knowledge, derived from our evolving Knowledge Web and E-HowNet, to parse the conceptual structures of sentences and interpret sentence meanings.

b) Statistical Principle-based Model
Most pattern-based information extraction (IE) approaches are initiated by manually providing seed instances. We have proposed a semi-supervised method that can take a large quantity of seed instances of diverse quality. Our strategy provides flexible frame-based pattern matching and summarization.
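As a rough illustration of seed-driven, semi-supervised extraction, the Python sketch below bootstraps "capital-of" patterns from two seed pairs and then uses the surviving patterns to harvest new instances. The toy corpus, seed pairs, and frequency-based quality filter are assumptions made for the example, not the lab's actual method.

    # Illustrative bootstrapping loop for pattern-based information extraction.
    import re
    from collections import Counter

    corpus = [
        "Taipei is the capital of Taiwan .",
        "Tokyo is the capital of Japan .",
        "Paris is the capital of France and hosted the event .",
    ]
    seeds = {("Taipei", "Taiwan"), ("Tokyo", "Japan")}

    def patterns_from(pair, sentence):
        """Yield the text between a known (x, y) pair as a candidate pattern."""
        x, y = pair
        m = re.search(re.escape(x) + r"(.{1,40}?)" + re.escape(y), sentence)
        if m:
            yield m.group(1).strip()

    # 1. Induce patterns from the seed instances.
    pattern_counts = Counter(
        p for s in corpus for pair in seeds for p in patterns_from(pair, s)
    )
    # Keep patterns supported by at least two seeds: a crude quality filter
    # for seed instances of diverse quality.
    good_patterns = [p for p, c in pattern_counts.items() if c >= 2]

    # 2. Apply the surviving patterns to harvest new (x, y) instances.
    new_instances = set()
    for s in corpus:
        for p in good_patterns:
            m = re.search(r"(\w+) " + re.escape(p) + r" (\w+)", s)
            if m:
                new_instances.add((m.group(1), m.group(2)))

    print(good_patterns)   # ['is the capital of']
    print(new_instances)   # the two seeds plus ('Paris', 'France')

A real system would iterate this loop, score patterns and instances more carefully, and match against frames rather than raw surface strings.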
c) Chinese question answering system
We integrated several Chinese NLP techniques to construct a Chinese factoid QA system, which won first prize in NTCIR-5 and NTCIR-6. In the future, we will extend the system to answer “how” and “what” questions.

d) Named entity recognition (NER)
Identifying person, location, and organization names in documents is extremely important for natural language understanding. We have developed a machine-learning-based NER system, which won second prize in the 2006 SIGHAN competition and first prize in the 2009 BioCreative II.5 gene name normalization shared task.
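The sketch below shows the general shape of machine-learning-based NER: each token is mapped to surface features and classified into a BIO label. The tiny training set, feature set, and plain logistic-regression classifier are illustrative assumptions standing in for the richer models and features a real system would use.

    # Toy machine-learning NER: per-token BIO classification from surface features.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train_tokens = ["John", "Smith", "visited", "Taipei", "yesterday"]
    train_labels = ["B-PER", "I-PER", "O", "B-LOC", "O"]

    def features(tokens, i):
        """Simple surface features for the token at position i."""
        return {
            "word": tokens[i].lower(),
            "is_title": tokens[i].istitle(),
            "prev": tokens[i - 1].lower() if i > 0 else "<s>",
            "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
        }

    X = [features(train_tokens, i) for i in range(len(train_tokens))]
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X, train_labels)

    test_tokens = ["Mary", "Smith", "visited", "Tokyo"]
    X_test = [features(test_tokens, i) for i in range(len(test_tokens))]
    print(list(zip(test_tokens, model.predict(X_test))))

In practice, sequence models such as CRFs over much richer features would replace the independent per-token classifier shown here.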
e) Chinese Textual Entailment (TE)
TE is the task of identifying inferences between sentences. We have integrated several NLP tools and resources, focusing on deeper semantic and syntactic analysis, to construct a Chinese TE recognition system, which performed well in the 2011 NTCIR-9 TE shared task.
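To make the task concrete, here is a deliberately naive word-overlap baseline; its behaviour on the second example shows why the deeper semantic and syntactic analysis mentioned above matters. The examples and threshold are illustrative assumptions only.

    # Naive word-overlap baseline for textual entailment (illustration only).
    def entails(text, hypothesis, threshold=0.8):
        """Guess entailment from the fraction of hypothesis words covered by the text."""
        t = set(text.lower().split())
        h = set(hypothesis.lower().split())
        coverage = len(h & t) / len(h) if h else 0.0
        return coverage >= threshold

    # All hypothesis words appear in the text, so the baseline answers True.
    print(entails("The company acquired the startup in 2010 for a record price",
                  "The company acquired the startup"))
    # Also True, even though the roles are reversed: pure word overlap cannot
    # tell who did what to whom, which is where deeper analysis is needed.
    print(entails("The company acquired the startup in 2010",
                  "The startup acquired the company in 2010"))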
f) Distributional Word Representation
Recently, distributional word representations have become widely used in NLP tasks. Compared with traditional symbolic representations of word meaning, distributional representations are trained from a corpus and encode word meanings as vectors, providing additional computational power and better generalization. However, such representations lack explanatory ability, so fully exploiting the strengths of each kind of representation is crucial for practical NLP tasks. For instance, we developed a lexical sentiment analyzer that uses both distributional word representations and E-HowNet to predict the sentiment of a given word; this system won first prize for valence prediction at an international contest held by IALP in 2016. We have also studied how to infuse information from knowledge bases into distributional word representations; these results were published at EACL 2017.
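A small sketch of the general idea is given below: a word's valence score is predicted from its vector with a simple regressor. The vectors and seed valence scores are made up for illustration, and an actual system could additionally concatenate lexicon-derived (e.g. E-HowNet) features to each vector.

    # Predicting word valence from distributional vectors (toy illustration).
    import numpy as np
    from sklearn.linear_model import Ridge

    # Pretend these are pre-trained word vectors (in practice, word2vec or GloVe).
    vectors = {
        "good":      np.array([0.90, 0.10, 0.20]),
        "excellent": np.array([0.80, 0.20, 0.10]),
        "bad":       np.array([-0.70, 0.10, 0.30]),
        "terrible":  np.array([-0.90, 0.00, 0.20]),
        "great":     np.array([0.85, 0.15, 0.10]),
    }
    # Seed valence ratings from a hand-labelled lexicon (1 = negative, 9 = positive).
    valence = {"good": 7.5, "excellent": 8.6, "bad": 2.4, "terrible": 1.3}

    X = np.stack([vectors[w] for w in valence])
    y = np.array([valence[w] for w in valence])

    model = Ridge(alpha=1.0).fit(X, y)
    # "great" was not in the seed lexicon; its vector alone yields a high score.
    print(model.predict(vectors["great"].reshape(1, -1)))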
3. Natural Language Applications

Our Chinese input system, GOING, is used by over one million people in Taiwan, and InfoMap, our knowledge representation kernel, has been applied to a wide variety of systems. In the future, we will design event frames as a major building block of our learning system. We will also develop basic technologies for processing spoken language and music to support various applications.

a) Sentiment Analysis and Opinion Mining
Processing subjective information requires deep understanding. We have studied opinions, sentiments, subjectivities, affects, emotions and views in texts, such as news articles, blogs, forums, reviews, comments, dialogs