
Statistical Pronunciation Modeling for Non-Native Speech

Author: Rainer Gruhn

Status: completed

Description:

This thesis analyses non-native speech and focuses on a fully statistical approach to modeling non-native speakers' pronunciation. Second-language speakers pronounce words in many ways that differ from native pronunciation. This is a major problem for automatic speech recognition, e.g. when a German traveller uses an automatic tourist information system in English, or when a car driver tries to enter a destination city name in a foreign country. These pronunciation deviations, whether phoneme substitutions, deletions or insertions, need to be modeled automatically in a data-driven way in order to make the approach widely applicable.

We propose discrete Hidden Markov Models (HMMs) to represent all pronunciation variations. This allows greater flexibility than the standard rule-based approach, especially for pronunciations that are unseen in the training data or not anticipated by expert knowledge.

The HMMs are initialized on a standard pronunciation dictionary. One HMM is generated per word in the dictionary, with one state per phoneme in the baseform pronunciation. Non-native training data is segmented into word chunks, on which phoneme recognition is performed. The probability distributions of the HMMs are trained on these phoneme sequences.
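The initialization step can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `init_word_hmm`, the left-to-right topology with self-loops and skips, and the probability values `p_match`, `p_stay` and `p_skip` are assumptions chosen for illustration. Only the idea of one state per baseform phoneme, with a discrete distribution over phonemes in each state, comes from the description above.

```python
def init_word_hmm(baseform, phoneme_set, p_match=0.9, p_stay=0.1, p_skip=0.1):
    """Build one discrete word HMM with one state per baseform phoneme.

    All probability values are illustrative assumptions, not thesis values.
    Returns (emit, trans), where emit[s][phoneme] and trans[s][next_state]
    are probabilities. State index len(baseform) is the word-exit state.
    """
    n = len(baseform)
    # Assumed start: most mass on the canonical phoneme, the rest spread
    # uniformly so that unseen substitutions keep non-zero probability.
    off = (1.0 - p_match) / (len(phoneme_set) - 1)
    emit = [
        {ph: (p_match if ph == canonical else off) for ph in phoneme_set}
        for canonical in baseform
    ]
    trans = []
    for s in range(n):
        row = {s: p_stay}                     # self-loop: phoneme insertion
        if s + 2 <= n:
            row[s + 2] = p_skip               # skip one state: phoneme deletion
        row[s + 1] = 1.0 - sum(row.values())  # regular advance gets the rest
        trans.append(row)
    return emit, trans


# Hypothetical toy dictionary entry.
phonemes = ["k", "ae", "t", "g", "eh"]
emit, trans = init_word_hmm(["k", "ae", "t"], phonemes)
print(emit[1]["ae"], trans[0])
```

Training on the phoneme sequences recognized from non-native data would then re-estimate these distributions, for example with Baum-Welch; that step is omitted here.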

To apply the models, both an N-best word-level recognition and an utterance-level phoneme recognition of the test data are required. A pronunciation score is calculated by performing a Viterbi alignment with the HMM dictionary as model and the phoneme sequence as input data. This score is a measure of how well the phonemes match the pronunciation of the word sequence modeled by the HMM. The hypothesis with the highest score is selected as the recognition result.
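The rescoring step can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis code: the names `make_states`, `viterbi_score` and `rescore` are hypothetical, the transition probabilities are fixed rather than trained, word HMMs are simply concatenated, and every path is forced to start in the first state and end in the last. The toy words and phoneme symbols are invented for the example.

```python
import math

NEG_INF = float("-inf")


def make_states(baseform, phoneme_set, p_match=0.9):
    """Hypothetical helper: one discrete emission distribution per phoneme."""
    off = (1.0 - p_match) / (len(phoneme_set) - 1)
    return [{ph: (p_match if ph == c else off) for ph in phoneme_set}
            for c in baseform]


def log_emit(state, symbol):
    p = state.get(symbol, 0.0)
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi_score(states, observed, p_stay=0.1, p_next=0.8, p_skip=0.1):
    """Log probability of the best left-to-right path emitting `observed`."""
    if not states or not observed:
        return NEG_INF
    n = len(states)
    log_stay, log_next, log_skip = (math.log(p) for p in (p_stay, p_next, p_skip))
    # Assumption: the path starts in state 0 and must end in state n - 1.
    delta = [NEG_INF] * n
    delta[0] = log_emit(states[0], observed[0])
    for symbol in observed[1:]:
        new = [NEG_INF] * n
        for s in range(n):
            best = delta[s] + log_stay                       # insertion
            if s >= 1:
                best = max(best, delta[s - 1] + log_next)    # regular advance
            if s >= 2:
                best = max(best, delta[s - 2] + log_skip)    # deletion
            new[s] = best + log_emit(states[s], symbol)
        delta = new
    return delta[n - 1]


def rescore(nbest, word_models, observed):
    """Return (score, hypothesis) for the best-scoring N-best entry."""
    scored = []
    for hyp in nbest:
        states = [st for word in hyp for st in word_models[word]]
        scored.append((viterbi_score(states, observed), hyp))
    return max(scored, key=lambda item: item[0])


phonemes = ["s", "ih", "t", "iy"]
word_models = {
    "city": make_states(["s", "ih", "t", "iy"], phonemes),
    "sit": make_states(["s", "ih", "t"], phonemes),
    "sea": make_states(["s", "iy"], phonemes),
}
nbest = [["sit", "sea"], ["city"]]          # toy N-best list
observed = ["s", "ih", "t", "iy"]           # toy phoneme recognizer output
best_score, best_hyp = rescore(nbest, word_models, observed)
print(best_hyp, best_score)
```

Working in the log domain avoids numerical underflow on long utterances. Whether the pronunciation score is used alone or combined with the recognizer's own scores is not stated here, so the sketch ranks hypotheses by the pronunciation score only.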

Further Information

PhD_Description.pdf

Presentation

Link to Document