Institute of Information Science, Academia Sinica

Events


Incorporating Multiple Knowledge Sources in Feature Compensation and Acoustic Modeling Adaptation for Robust Speech Recognition


  • Speaker: Dr. Yu Tsao (National Institute of Information and Communications Technology (NICT), Kyoto, Japan)
    Host: Prof. Hsin-Min Wang (王新民)
  • Time: 2011-04-08 (Fri.) 10:30 ~ 12:00
  • Venue: Auditorium 106, 1F, New IIS Building
Abstract

The mismatch between training and testing conditions is an important issue for the practical applicability of automatic speech recognition (ASR). The mismatch may come from: 1) speaker effects, including the speaker's accent, dialect, and speaking rate; and 2) speaking-environment effects, including interfering noise and distortions from transducers and transmission channels. Many approaches have been proposed to reduce this mismatch; among them, feature compensation and acoustic model adaptation are two popular directions. In feature compensation, a transformation function is estimated to convert the testing speech features to match the training condition, and the converted features are then used for recognition. In acoustic model adaptation, on the other hand, a transformation function is estimated to adapt the acoustic models from the training condition to match the testing environment, and the adapted models are then used for recognition. For both feature compensation and model adaptation, the efficiency and accuracy of the transformation function estimation are crucial to the achievable performance.
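
To make the contrast concrete, the following minimal NumPy sketch (our illustration, not taken from the talk) assumes a single diagonal-covariance Gaussian acoustic model and a hypothetical affine transform A, b that has already been estimated, for example by an fMLLR-style procedure. Feature compensation maps the test features toward the training condition, whereas model adaptation moves the model parameters toward the test condition:

    import numpy as np

    # Illustrative only: a hypothetical affine transform (A, b) assumed to have
    # been estimated elsewhere; 13-dimensional cepstral-like features.
    rng = np.random.default_rng(0)
    test_feats = rng.normal(loc=5.0, scale=2.0, size=(100, 13))  # mismatched test features
    train_mean = np.zeros(13)                                     # model trained on clean data
    train_var = np.ones(13)

    A = 0.5 * np.eye(13)        # assumed scale of the compensation transform
    b = -2.5 * np.ones(13)      # assumed bias of the compensation transform

    # Feature compensation: transform the test features back toward the
    # training condition and score them with the unchanged acoustic model.
    compensated_feats = test_feats @ A.T + b

    # Acoustic model adaptation: leave the features alone and move the model
    # toward the test condition instead (inverse mapping applied to the model).
    A_inv = np.linalg.inv(A)
    adapted_mean = A_inv @ (train_mean - b)
    adapted_var = np.diag(A_inv @ np.diag(train_var) @ A_inv.T)

    # Both routes remove (approximately) the same mismatch:
    print(compensated_feats.mean(axis=0)[:3])  # close to the training mean (0)
    print(adapted_mean[:3], adapted_var[:3])   # close to the test statistics (5, 4)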

In this talk, we present our recent research that attempts to incorporate multiple knowledge sources to improve the efficiency and accuracy of the transformation function estimation for feature compensation and acoustic model adaptation. We studied and proposed four effective ways of incorporating multiple knowledge sources: (1) preparing multiple sets of prior information collected from different phases; (2) deriving new objective functions that consider multiple goals; (3) utilizing knowledge-based or data-driven constraints; and (4) characterizing the diverse acoustic information embedded in the available speech data. Our experimental results indicate that, by properly incorporating multiple knowledge sources, we can improve the efficiency and accuracy of transformation function estimation and thereby enhance the performance of feature compensation and acoustic model adaptation.
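
As one hedged illustration of direction (1), and not the speaker's specific formulation, prior information can be folded into a parameter estimate in a MAP-style fashion: the prior dominates when adaptation data are scarce, and the data take over as more frames become available.

    import numpy as np

    # MAP-style use of prior information (our sketch, not the method presented
    # in the talk). `tau` weights a prior mean, collected in an earlier phase,
    # against the sparse adaptation data observed in the current phase.
    def map_adapt_mean(prior_mean, frames, tau=10.0):
        """Interpolate a prior Gaussian mean with the sample mean of adaptation frames."""
        n = len(frames)
        if n == 0:
            return prior_mean
        sample_mean = frames.mean(axis=0)
        return (tau * prior_mean + n * sample_mean) / (tau + n)

    rng = np.random.default_rng(1)
    prior_mean = np.zeros(13)                                         # from a previous phase
    adaptation_frames = rng.normal(loc=1.0, scale=1.0, size=(5, 13))  # very little new data

    # With only 5 frames and tau = 10, the estimate stays close to the prior.
    print(map_adapt_mean(prior_mean, adaptation_frames)[:3])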