Institute of Information Technology
Incorporating Knowledge into Statistical Acoustic Models for Spoken Language Dialogue Systems
Author: Sakriani Sakti
Status: completed
Description:
The subject of this thesis is the study of acoustic modeling approaches for spoken language dialogue systems. One of the most important research challenges is how to build accurate acoustic models that truly reflect the spoken language to be recognized.
Current acoustic models usually use a statistical approach based on hidden Markov models (HMMs). This approach has achieved encouraging results in recent decades and outperforms the earlier knowledge-based approach, in which rules are created manually. However, although such models have proved to be an efficient choice, they are believed to be insufficient to handle all the sources of variability present in everyday conversational speech. A system that ignores linguistic knowledge completely and relies only on statistical models can achieve only limited success. Many researchers are aware of this problem, and various attempts to integrate knowledge-based and statistical approaches more explicitly exist. However, there is still no common framework flexible enough to allow additional knowledge to be integrated into current statistical acoustic models.
The study will aim to build upon previous work in acoustic modeling and will incorporate useful knowledge to improve the capability of current statistical acoustic models. Several acoustic modeling techniques will be investigated, including HMM, HMM/BN, and dynamic Bayesian networks (DBNs). Once experience has been gained with these individual techniques, the goal will be to generalize and formalize an efficient acoustic model structure that captures broad knowledge of speech variability at a feasible cost in training and recognition effort, while improving speech recognition accuracy.


