
王新民 Hsin-Min Wang

Research Fellow
Ph.D., Electrical Engineering, National Taiwan University
Tel: +886-2-2788-3799 ext. 1714        Fax: +886-2-2782-4814
Email: whm@iis.sinica.edu.tw
http://www.iis.sinica.edu.tw/pages/whm



● Deputy Director, IIS, Academia Sinica (2011 - )
● Research Fellow, IIS, Academia Sinica (2010 - )
● Associate Research Fellow, IIS, Academia Sinica (2002 - 2010)
● Assistant Research Fellow, IIS, Academia Sinica (1996 - 2002)
● Ph.D., EE, National Taiwan University (1995)
● B.S., EE, National Taiwan University (1989)
● President, Association for Computational Linguistics and Chinese Language Processing (2013 - )
● Editorial board member, IJCLCLP (2004 - ), JISE (2012 - ), APSIPA TSIP (2014 - )


Research Description

My research interests include speech processing, natural language processing, multimedia information retrieval, machine learning, and pattern recognition. Our research goal is to develop methods for analyzing, extracting, recognizing, indexing, and retrieving information from audio data, with special emphasis on speech and music. In the field of speech, research has focused mainly on speaker recognition, spoken language recognition, voice conversion, and spoken document retrieval/summarization. Our recent achievements include a new maximum mutual information-based framework for GMM-based voice conversion, subspace-based spoken language identification, and i-vector-based language modeling for spoken document retrieval. Our ongoing research includes language modeling for speech recognition/document classification/information retrieval, subspace-based speaker/spoken language recognition, discriminative training for GMM-based voice conversion, and expressive speech synthesis.

In the music field, research has focused mainly on vocal melody extraction, automatic music tagging, music emotion recognition, and music search. Our recent achievements in this field include a novel cost-sensitive multi-label (CSML) learning framework for music tagging, a novel query by multiple tags with multiple levels of preference (denoted as an MTML query) scenario with a corresponding tag cloud-based query interface for music search, and an acoustic emotion Gaussians model for emotion-based music annotation and retrieval. Our extended work on acoustic-visual emotion Gaussians modeling for automatic music video generation won the ACM Multimedia 2012 Grand Challenge First Prize. Our ongoing research includes continuous improvement of our own technologies and systems, audio feature analysis, semantic visualization of music tags, and vocal separation, so as to facilitate the management and retrieval of a large music database. Future research directions also include singing voice synthesis, context-aware music retrieval/recommendation, and music structure analysis/summarization.



                  Publications


1.  Wei-Ho Tsai and Hsin-Min Wang, “Automatic singer recognition of popular music recordings via estimation and modeling of singer vocal signal,” IEEE Trans. on Audio, Speech, and Language Processing, 14(1), pp. 330-341, January 2006.
2.  Wei-Ho Tsai, Shih-Sian Cheng, and Hsin-Min Wang, “Automatic speaker clustering using a voice characteristic reference space and maximum purity estimation,” IEEE Trans. on Audio, Speech, and Language Processing, 15(4), pp. 1461-1474, May 2007.
3.  Yi-Hsiang Chao, Wei-Ho Tsai, Hsin-Min Wang, and Ruei-Chuan Chang, “Using kernel discriminant analysis to improve the characterization of the alternative hypothesis for speaker verification,” IEEE Trans. on Audio, Speech, and Language Processing, 16(8), pp. 1675-1684, November 2008.
4.  Hung-Ming Yu, Wei-Ho Tsai, and Hsin-Min Wang, “A query-by-singing system for retrieving karaoke music,” IEEE Trans. on Multimedia, 10(8), pp. 1626-1637, December 2008.
5.  Yi-Ting Chen, Berlin Chen, and Hsin-Min Wang, “A probabilistic generative framework for extractive broadcast news speech summarization,” IEEE Trans. on Audio, Speech, and Language Processing, 17(1), pp. 95-106, January 2009.
6.  Shih-Sian Cheng, Hsin-Chia Fu, and Hsin-Min Wang, “Model-based clustering by probabilistic self-organizing maps,” IEEE Trans. on Neural Networks, 20(5), pp. 805-826, May 2009.
7.  Shih-Sian Cheng, Hsin-Min Wang, and Hsin-Chia Fu, “BIC-based speaker segmentation using divide-and-conquer strategies with application to speaker diarization,” IEEE Trans. on Audio, Speech, and Language Processing, 18(1), pp. 141-157, January 2010.
8.  Chih-Yi Chiu and Hsin-Min Wang, “Time-series linear search for video copies based on compact signature manipulation and containment relation modeling,” IEEE Trans. on Circuits and Systems for Video Technology, 20(11), pp. 1603-1613, November 2010.
9.  Hung-Yi Lo, Ju-Chiang Wang, Hsin-Min Wang, and Shou-De Lin, “Cost-sensitive multi-label learning for audio tag annotation and retrieval,” IEEE Trans. on Multimedia, 13(3), pp. 518-529, June 2011.
10. Hung-Yi Lo, Shou-De Lin, and Hsin-Min Wang, “Generalized k-labelsets ensemble for multi-label and cost-sensitive classification,” accepted to appear in IEEE Trans. on Knowledge and Data Engineering.



