
Research Laboratories

Natural Language and Knowledge Processing Laboratory

Research Faculty

Wen-Lian Hsu
Distinguished Research Fellow
Operations Research, Cornell University

Fu Chang
Associate Research Fellow
Mathematical Statistics, Columbia University

Keh-Jiann Chen
Research Fellow
Computer Science, State University of New York at Buffalo

Chun-Nan Hsu
Research Fellow
Computer Science, University of Southern California

Hsin-Min Wang
Associate Research Fellow
Electrical Engineering, National Taiwan University

Technical Faculty

Der-Ming Juang
Assistant Research Engineer
The Institute of Computer Management, National Tsing Hua University

Group Profile

We focus on problems concerning knowledge-based information processing. This area of research is strongly motivated by the flood of information on the Internet, for which effective and autonomous information processing tools are still lacking. To achieve high-level intelligent information processing, many challenging research problems in the areas of knowledge acquisition, knowledge representation, and knowledge utilization must be addressed.

1. Knowledge Acquisition

For the task of acquiring linguistic and common sense knowledge, we will focus on strategies and methodologies for automating knowledge acquisition processes. We expect that in the future, knowledge bases will be enhanced automatically, using established and yet-to-be-developed processing technologies to extract new knowledge from the Internet and from various text sources, such as XML documents and tagged corpora.

●  Construction of linguistic knowledge bases

Over the past twenty-some years, we have developed an infrastructure for Chinese language processing which includes part-of-speech tagged corpora, treebanks, a Chinese lexical database, Chinese grammars, InfoMap, Chinese glyph structure databases, word identification systems, sentence parsers, etc. We have also developed some basic techniques for knowledge extraction, such as named-entity recognition (NER), semantic role labeling, and relation extraction in both Chinese and biological literature. We won 1st place in Chinese word segmentation and 2nd place in Chinese NER at the 2006 SIGHAN contests, and 1st place in gene normalization in the 2009 BioCreative II.5 contest. In the future, we plan to use this infrastructure to extract linguistic and domain knowledge from various corpora and texts on the web, and to enhance current knowledge bases. In particular, we have collected 40 million high-frequency meaningful word pairs in Chinese. Based on these, our future research involves automatically collecting useful event frames in order to better understand natural language texts.

●  Machine learning and data mining

We have focused on machine learning and its applications to document image analysis, optical character recognition, and bioinformatics, and we will continue our work on enhancing the applicability of learning machines to large-scale problems. We deal with three types of scale problems: large scale in training samples, large scale in class types, and large scale in (irrelevant) features. For the first problem, we have proposed an extremely efficient tree decomposition approach to train non-linear support vector machines at a speedup factor of hundreds, sometimes even thousands, while achieving comparable test accuracy. This method has been used effectively on a large protein-protein interface prediction problem with a 300-fold speedup. The tree decomposition method can be extended to an equally powerful forest decomposition in order to speed up machine learning on data sets that scale up in both training samples and class types, thereby solving the first and the second problems simultaneously. For the third problem, we are pioneering a new method for ranking and selecting features using multiple feature subsets, and have gained advantages in computing speed, test accuracy, the number of essential features that are ranked above all irrelevant features, and the number of essential features among the selected features. While endeavoring to develop new methods, we also make public both our implementations and the data sets that were created in our applications, so as to benefit potential users of our methods.

2. Knowledge Utilization

We have designed a Chinese input system, GOING, which automatically translates a phonetic (or Pinyin) sequence into characters with a hit ratio close to 96%. This system is widely used in Taiwan. It received the Distinguished Chinese Information Product Award (中文傑出資訊產品獎) in 1993. In the area of PC Home software downloads, GOING has been downloaded about one million times. It is one of two domestic software programs ranked within the top 20 for downloads. Our knowledge representation kernel, InfoMap, has been applied to a wide variety of application systems in natural language processing, biological knowledge bases, and e-learning. In the future, we will design an event frame, which is the key technology for language understanding and also acts as a major building block of our learning system. We will also develop basic technologies for processing spoken languages, and support various applications. Future major research topics include: knowledge-based language processing; information extraction and retrieval from text, audio, and video; intelligent search; cross-language information retrieval; computer processing for Taiwanese; question answering; dialog; and intelligent tutoring.

●  Knowledge-based Chinese language processing

We will focus on the conceptual processing of Chinese documents. Our knowledge-based language processing system will utilize statistical, linguistic, and common sense knowledge derived from our evolving Knowledge Web and E-HowNet to parse the conceptual structures of sentences and interpret sentence meanings. The knowledge-based language processing systems incorporate various knowledge bases to form a learning system. The processing power of the language processing systems increases as the knowledge bases they use are enhanced. In turn, these knowledge bases evolve through the automatic knowledge extraction made possible by the language processing systems.

●  Audio (speech / music / song) processing & retrieval

Our goal is to develop methods for analyzing, extracting, recognizing, indexing, and retrieving information from audio data. In the area of speech, our research has focused mainly on speech recognition, speaker recognition, and speech information retrieval. We have published several papers in prestigious journals, such as IEEE TASLP and ACM TALIP. In addition, we have successfully implemented several prototype systems, such as a TV news retrieval system and a speaker verification system. Our speaker verification system was ranked 2nd out of 6 participants in the ISCSLP2006 speaker recognition evaluation. Our ongoing research includes attribute-detection-based speech recognition, spoken document summarization, speaker diarization, and language modeling. In the area of music, our research has focused mainly on vocal melody extraction, query by singing/humming, and solo vocal modeling. We have successfully implemented several prototype systems, such as a music retrieval system and a singer identification system, and published papers in IEEE TASLP, IEEE TMM, Computer Music Journal, and others. We participated in the audio tag classification task of MIREX2009, and our system was ranked 1st out of 12 systems. Future research directions include continuous improvement of our own technologies and systems, feature analysis, vocal separation, and finally automatic music structure analysis and summarization, so as to facilitate the management and retrieval of a large music database.

●  Chinese question answering system

In a natural language question answering system, the user can ask a computer questions in an ordinary fashion, such as “Who is the President of the United States?” Such a system would greatly enhance search efficiency. We integrate several Chinese NLP techniques, such as question type classification, passage retrieval, named entity recognition, and answer ranking, to construct a Chinese QA system. Our system won first place in NTCIR-5 (2005) and NTCIR-6 (2007). In the future, we will extend the types of questions asked and add the ability to engage in dialogues.

●  Integration of the knowledge about Chinese characters

We have established a platform to integrate knowledge about Chinese characters with the features listed below:
(1) Our platform has various means to retrieve Chinese characters, such as by glyph structures and by pronunciations.
(2) It addresses the issues of un-encoded Chinese characters, including displaying, retrieval, input, registration, and printing.
(3) It organizes information about Chinese characters and allows customization to meet personal preference.

3. Knowledge Representation

We study the logical foundation of ontology as well as fine-grained semantic representation, which enables us to better understand meaning representation and composition. We will remodel the current ontology structures of WordNet, HowNet, and FrameNet to achieve a better and more unified representation.

●  E-HowNet

Natural language is a means of denoting concepts. However, word sense ambiguities make natural language processing and conceptual processing almost impossible. To bridge the gaps between natural language representations and conceptual representations, we propose a universal concept representational mechanism called E-HowNet, which evolved from HowNet. It extends the word sense definition mechanism of HowNet and uses WordNet synsets as vocabulary to describe concepts. Each word sense (or concept) is defined by some simpler concept. The simple concepts used in the definitions can be further decomposed into even simpler concepts, until primitive or basic concepts are derived. Therefore, definitions can be dynamically decomposed and unified into E-HowNet representations at different levels. E-HowNet is language independent. Thus, any word sense of any language can be defined and a near-canonical representation can be achieved. The semantic distance between any two concepts, as well as their sense similarity and difference, can be determined by checking their definitions. In addition to taxonomy links, concepts are also associated by their shared conceptual features, and fine-grained differences between near-synonyms can be differentiated by adding new features.

●  Expression of the knowledge about the glyph of Chinese characters

The Chinese Glyph Structure Database is designed to record the knowledge of Chinese characters, including time-variant shapes, structures, and the relationships across variants in practical usage. The database has the following features:
(1) It reflects the evolution of Chinese characters.
(2) It expresses the cross-era relationship among Chinese character variants.
(3) It demonstrates an exclusive shape-by-definition feature of Chinese characters.
(4) It uses “glyph expressions” and “style codes” to solve the problem of Chinese character encoding.
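
As a rough illustration of the E-HowNet idea described in this profile (definitions decomposed step by step until primitive concepts are reached, then compared by checking those definitions), the following minimal Python sketch may help. Everything in it is an assumption made for illustration: the toy entries in DEFINITIONS, the function names expand and similarity, and the use of set overlap (Jaccard) as the comparison are not taken from E-HowNet itself, whose actual representation and distance measure are richer.

```python
# Toy definition table: each concept maps to the simpler concepts defining it.
# These entries are invented for illustration; they are NOT E-HowNet data.
DEFINITIONS = {
    "surgeon": ["doctor", "operate"],
    "dentist": ["doctor", "tooth"],
    "doctor": ["human", "heal"],
    "operate": ["cut", "heal"],
}


def expand(concept, definitions):
    """Decompose a concept until only primitive (undefined) concepts remain."""
    primitives = set()
    seen = set()          # guards against circular definitions
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parts = definitions.get(current)
        if parts:
            stack.extend(parts)      # still decomposable: go one level deeper
        else:
            primitives.add(current)  # no definition: treat as primitive
    return primitives


def similarity(a, b, definitions):
    """Jaccard overlap of primitive sets (a stand-in measure, assumed here)."""
    pa, pb = expand(a, definitions), expand(b, definitions)
    return len(pa & pb) / len(pa | pb)


if __name__ == "__main__":
    print(sorted(expand("surgeon", DEFINITIONS)))
    print(similarity("surgeon", "dentist", DEFINITIONS))
```

In this toy table, "surgeon" and "dentist" share the primitives inherited from "doctor" and differ in one primitive each, which mirrors the point that near-synonyms can be told apart by the features in their definitions.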
