Institute of Information Science, Academia Sinica
Topic: Some Experience with Formosa Speech in the Wild Corpus Development
Speaker: Prof. Yuan-Fu Liao (Department of Electronic Engineering, National Taipei University of Technology)
Date: 2019-04-23 (Tue) 10:00 – 12:00
Location: Auditorium 106 at IIS New Building
Host: Keh-Yih Su


It is well understood that a Taiwanese-specific automatic speech recognition (ASR) system is required for better speech-enabled human-computer interaction in the daily life of Taiwanese people. However, given practical investment considerations, especially in the shadow of giant enterprises such as iFlytek, Google, and SpeechOcean, it is difficult, if not impossible, to secure enough commitment to develop such a system.

If we are to develop a highly accurate Taiwanese-specific ASR system, a large-scale Taiwanese speech corpus is indispensable. In particular, Deep Neural Network (DNN)-based ASR systems require thousands of hours of speech data to train successfully. However, speech data collection is labor-intensive and time-consuming. Unfortunately, the currently available Taiwanese speech corpora are often small (except for commercial ones), limited to speech elicited in a single communicative context (e.g., news), tied to a particular speech style (e.g., read speech), produced by speakers of a specific age group (e.g., adults), or too costly to license for academic use (such as SpeechOcean's corpora). An immediate and critical problem in developing a Taiwanese-specific ASR system is therefore the language resource infrastructure.

To remedy this predicament, the Formosa Speech in the Wild (FSW) project was established in November 2017. Its initial goal is to collect a shared, large-scale, real-life, multi-genre, spontaneous Taiwanese speech corpus to advance the development of Taiwanese-specific ASR techniques. Its final goal is to publicly release about 3000 hours of Taiwanese Mandarin and 100 hours of Taiwanese Hokkien speech data, to meet the demand for large language resources in the era of artificial intelligence and machine learning.

In this talk, we will briefly describe the development of the FSW Corpus (both Taiwanese Mandarin and Taiwanese Hokkien) and our collection and annotation protocol. We will also introduce the task and design of the Formosa Speech Recognition (FSR) challenge and present its final evaluation results.


Yuan-Fu Liao received the B.S., M.S., and Ph.D. degrees from the Department of Communication Engineering, National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 1991, 1993, and 1998, respectively. From January 1999 to June 1999, he was a Postdoctoral Researcher with the Department of Communication Engineering, National Chiao Tung University. From September 1999 to February 2002, he was a Research Engineer with Philips Research East Asia, Taiwan. Since February 2002, he has been with the Department of Electronic Engineering, National Taipei University of Technology, Taipei, Taiwan, where he is currently an associate professor. His major research interests include speech signal processing (speech, speaker, and language recognition; speech synthesis), audio signal processing (speech enhancement, microphone arrays), natural language processing, and machine learning (deep learning, deep neural networks).