王新民 Hsin-Min Wang

Research Fellow
Ph.D., Electrical Engineering, National Taiwan University
Tel: +886-2-2788-3799 ext. 1714 Fax: +886-2-2782-4814
Email: whm@iis.sinica.edu.tw
http://www.iis.sinica.edu.tw/pages/whm

● Deputy Director, IIS, Academia Sinica (2011 - )
● Research Fellow, IIS, Academia Sinica (2010 - )
● Associate Research Fellow, IIS, Academia Sinica (2002 - 2010)
● Assistant Research Fellow, IIS, Academia Sinica (1996 - 2002)
● Ph.D., EE, National Taiwan University (1995)
● B.S., EE, National Taiwan University (1989)
● President, Association for Computational Linguistics and Chinese Language Processing (2013 - )
● Editorial board member, IJCLCLP (2004 - ), JISE (2012 - ), APSIPA TSIP (2014 - )

Research Description

My research interests include speech processing, natural language processing, multimedia information retrieval, machine learning, and pattern recognition. Our research goal is to develop methods for analyzing, extracting, recognizing, indexing, and retrieving information from audio data, with special emphasis on speech and music.

In the field of speech, research has been focused mainly on speaker recognition, spoken language recognition, voice conversion, and spoken document retrieval/summarization. Our recent achievements include a new maximum mutual information-based framework for GMM-based voice conversion, subspace-based spoken language identification, and i-vector-based language modeling for spoken document retrieval. Our ongoing research includes language modeling for speech recognition/document classification/information retrieval, subspace-based speaker/spoken language recognition, discriminative training for GMM-based voice conversion, and expressive speech synthesis.

In the music field, research has been focused mainly on vocal melody extraction, automatic music tagging, music emotion recognition, and music search. Our recent achievements in this field include a novel cost-sensitive multi-label (CSML) learning framework for music tagging, a novel query by multiple tags with multiple levels of preference (denoted as an MTML query) scenario and a corresponding tag cloud-based query interface for music search, and an acoustic emotion Gaussians model for emotion-based music annotation and retrieval. Our extended work on acoustic visual emotion Gaussians modeling for automatic music video generation won the ACM Multimedia 2012 Grand Challenge First Prize. Our ongoing research includes continuous improvement of our own technologies and systems, audio feature analysis, semantic visualization of music tags, and vocal separation, so as to facilitate the management and retrieval of a large music database. Future research directions also include singing voice synthesis, context-aware music retrieval/recommendation, and music structure analysis/summarization.

Publications

1. Wei-Ho Tsai and Hsin-Min Wang, “Automatic singer recognition of popular music recordings via estimation and modeling of singer vocal signal,” IEEE Trans. on Audio, Speech, and Language Processing, 14(1), pp. 330-341, January 2006.
2. Wei-Ho Tsai, Shih-Sian Cheng, and Hsin-Min Wang, “Automatic speaker clustering using a voice characteristic reference space and maximum purity estimation,” IEEE Trans. on Audio, Speech, and Language Processing, 15(4), pp. 1461-1474, May 2007.
3. Yi-Hsiang Chao, Wei-Ho Tsai, Hsin-Min Wang, and Ruei-Chuan Chang, “Using kernel discriminant analysis to improve the characterization of the alternative hypothesis for speaker verification,” IEEE Trans. on Audio, Speech, and Language Processing, 16(8), pp. 1675-1684, November 2008.
4. Hung-Ming Yu, Wei-Ho Tsai, and Hsin-Min Wang, “A query-by-singing system for retrieving karaoke music,” IEEE Trans. on Multimedia, 10(8), pp. 1626-1637, December 2008.
5. Yi-Ting Chen, Berlin Chen, and Hsin-Min Wang, “A probabilistic generative framework for extractive broadcast news speech summarization,” IEEE Trans. on Audio, Speech, and Language Processing, 17(1), pp. 95-106, January 2009.
6. Shih-Sian Cheng, Hsin-Chia Fu, and Hsin-Min Wang, “Model-based clustering by probabilistic self-organizing maps,” IEEE Trans. on Neural Networks, 20(5), pp. 805-826, May 2009.
7. Shih-Sian Cheng, Hsin-Min Wang, and Hsin-Chia Fu, “BIC-based speaker segmentation using divide-and-conquer strategies with application to speaker diarization,” IEEE Trans. on Audio, Speech, and Language Processing, 18(1), pp. 141-157, January 2010.
8. Chih-Yi Chiu and Hsin-Min Wang, “Time-series linear search for video copies based on compact signature manipulation and containment relation modeling,” IEEE Trans. on Circuits and Systems for Video Technology, 20(11), pp. 1603-1613, November 2010.
9. Hung-Yi Lo, Ju-Chiang Wang, Hsin-Min Wang, and Shou-De Lin, “Cost-sensitive multi-label learning for audio tag annotation and retrieval,” IEEE Trans. on Multimedia, 13(3), pp. 518-529, June 2011.
10. Hung-Yi Lo, Shou-De Lin, and Hsin-Min Wang, “Generalized k-labelsets ensemble for multi-label and cost-sensitive classification,” accepted to appear in IEEE Trans. on Knowledge and Data Engineering.
