Page 61 - 2017 Brochure

王新民 Hsin-Min Wang

Research Fellow
Ph.D., Electrical Engineering, National Taiwan University

Tel: +886-2-2788-3799 ext. 1714 Fax: +886-2-2782-4814
Email: whm@iis.sinica.edu.tw
http://www.iis.sinica.edu.tw/pages/whm

• Research Fellow, Institute of Information Science, Academia Sinica (2010-present)
• Deputy Director, Institute of Information Science, Academia Sinica (2011-present)
• Deputy Director, Center for Digital Cultures, Academia Sinica (2013-present)
• Associate Research Fellow, Institute of Information Science, Academia Sinica (2002-2010)
• Assistant Research Fellow, Institute of Information Science, Academia Sinica (1996-2002)
• President, Association for Computational Linguistics and Chinese Language Processing (2013-2015)
• Editorial board member, IJCLCLP (2004-2016), JISE (2012-2016), APSIPA TSIP (2014-present), IEEE/ACM TASLP (2016-present)

Research Description

My research interests include spoken language processing, natural language processing, multimedia information retrieval, machine learning, and pattern recognition. The overall research goal is to develop methods for analyzing, extracting, recognizing, indexing, and retrieving information from audio data, with special emphasis on speech and music. In the field of speech, my research has been focused mainly on speaker recognition, spoken language recognition, voice conversion, and spoken document retrieval/summarization. Our recent achievements include locally linear embedding-based approaches for voice conversion and post-filtering, discriminative autoencoders for speech and speaker recognition, and novel paragraph embedding methods for spoken document retrieval/summarization. Our ongoing research includes audio-visual speaker recognition and speech enhancement, subspace neural networks for spoken language/dialect/accent recognition, many-to-one/non-parallel voice conversion, and neural network-based spoken document retrieval/summarization and question answering.

In the music field, research has been focused mainly on vocal melody extraction and automatic music video generation. Our recent achievements in this field include an acoustic-phonetic F0 modeling framework for vocal melody extraction and an emotion-oriented pseudo song prediction and matching framework for automatic music video generation. We have successfully implemented a complete automatic music video generation system that can edit a long user-generated video into a music-compliant, short, professional-like video. Our ongoing research includes continuous improvement of our technologies and systems, cover song identification, and automatic generation of set lists for concert videos, so as to facilitate management and retrieval in large music databases. Future research directions also include singing voice synthesis, speech-to-singing voice conversion, and music structure analysis/summarization.
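The locally linear embedding-based approach to voice conversion mentioned above reconstructs each source spectral frame as a constrained linear combination of its nearest source exemplars and transfers those weights onto paired target exemplars. The following is a minimal illustrative sketch of that idea, not the authors' implementation; the function name, dictionary format, and parameter values are my own assumptions.

```python
import numpy as np

def lle_convert_frame(x, src_dict, tgt_dict, k=5, reg=1e-3):
    """Convert one source spectral frame x (shape (D,)) using paired,
    frame-aligned exemplar dictionaries src_dict/tgt_dict of shape (N, D)."""
    # 1) find the k nearest source exemplars to the input frame
    idx = np.argsort(np.linalg.norm(src_dict - x, axis=1))[:k]
    neighbors = src_dict[idx]              # (k, D)
    # 2) solve for weights w that best reconstruct x from its neighbors
    #    under the sum-to-one constraint (the standard local LLE fit)
    diff = neighbors - x                   # (k, D)
    G = diff @ diff.T + reg * np.eye(k)    # regularized local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()
    # 3) apply the same weights to the paired target exemplars
    return w @ tgt_dict[idx]
```

An utterance would be converted by applying this mapping frame by frame to its sequence of spectral feature vectors; a full system would also handle feature extraction, alignment of the exemplar pairs, and waveform resynthesis.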

Publications

1. Yi-Ting Chen, Berlin Chen, and Hsin-Min Wang, “A probabilistic generative framework for extractive broadcast news speech summarization,” IEEE Trans. on Audio, Speech and Language Processing, 17(1), pp. 95-106, January 2009.

2. Shih-Sian Cheng, Hsin-Chia Fu, and Hsin-Min Wang, “Model-based clustering by probabilistic self-organizing maps,” IEEE Trans. on Neural Networks, 20(5), pp. 805-826, May 2009.

3. Shih-Sian Cheng, Hsin-Min Wang, and Hsin-Chia Fu, “BIC-based speaker segmentation using divide-and-conquer strategies with application to speaker diarization,” IEEE Trans. on Audio, Speech, and Language Processing, 18(1), pp. 141-157, January 2010.

4. Chih-Yi Chiu and Hsin-Min Wang, “Time-series linear search for video copies based on compact signature manipulation and containment relation modeling,” IEEE Trans. on Circuits and Systems for Video Technology, 20(11), pp. 1603-1613, November 2010.

5. Hung-Yi Lo, Ju-Chiang Wang, Hsin-Min Wang, and Shou-De Lin, “Cost-sensitive multi-label learning for audio tag annotation and retrieval,” IEEE Trans. on Multimedia, 13(3), pp. 518-529, June 2011.

6. Hung-Yi Lo, Shou-De Lin, and Hsin-Min Wang, “Generalized k-labelsets ensemble for multi-label and cost-sensitive classification,” IEEE Trans. on Knowledge and Data Engineering, 26(7), pp. 1679-1691, July 2014.

7. Ju-Chiang Wang, Yi-Hsuan Yang, Hsin-Min Wang, and Shyh-Kang Jeng, “Modeling the affective content of music with a Gaussian mixture model,” IEEE Trans. on Affective Computing, 6(1), pp. 56-68, March 2015.

8. Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang, Ea-Ee Jan, Wen-Lian Hsu, and Hsin-Hsi Chen, “Extractive broadcast news summarization leveraging recurrent neural network language modeling techniques,” IEEE/ACM Trans. on Audio, Speech, and Language Processing, 23(8), pp. 1322-1334, August 2015.

9. Yu-Ren Chien, Hsin-Min Wang, and Shyh-Kang Jeng, “An acoustic-phonetic model of F0 likelihood for vocal melody extraction,” IEEE/ACM Trans. on Audio, Speech, and Language Processing, 23(9), pp. 1457-1468, September 2015.

10. Yu-Ren Chien, Hsin-Min Wang, and Shyh-Kang Jeng, “Alignment of lyrics with accompanied singing audio based on acoustic-phonetic vowel likelihood modeling,” IEEE/ACM Trans. on Audio, Speech, and Language Processing, 24(11), pp. 1998-2008, November 2016.
