Institute of Information Technology
Speech-Emotion Recognition in Adaptive Dialogue Systems
Author: Johannes Pittermann
Status: completed
Description:
Computers are becoming ever more important and ubiquitous in everyday life, which makes fast, easy and consistent interaction with such devices and applications a major challenge. Because speech is the primary and most convenient means of human communication, spoken language dialogue systems (SLDSs) provide simple and consistent access to the technical possibilities of these systems.
To increase user-friendliness and to make the dialogue flow more natural, this thesis focuses on adaptive SLDSs, with particular emphasis on emotion recognition from speech signals and on emotion-sensitive dialogue management.
Accordingly, the work described in this thesis is divided into two parts: the enhancement of speech-emotion recognition, and the development of an extended dialogue model that integrates the recognized emotional cues into adaptive dialogue management.
For the recognition of emotions, we have implemented and evaluated three approaches: a plain emotion recognizer, a combined speech-emotion recognizer, and a linguistic analysis that detects emotions from the recognized text. Our first approach, the plain emotion recognizer, uses Hidden Markov Models (HMMs) to classify emotional states from prosodic and acoustic features extracted from the speech signal.
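The HMM-based classification idea can be illustrated with a minimal sketch: one HMM per emotion, with the utterance assigned to the emotion whose model gives the highest likelihood. This is not the recognizer from the thesis. It assumes the prosodic and acoustic features have already been quantized into discrete symbols (0 = low, 1 = medium, 2 = high), and all model parameters, the `MODELS` table and the function names are hypothetical values chosen for illustration.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a non-empty discrete observation sequence under
    one HMM, computed with the scaled forward algorithm."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states))
                * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def classify_emotion(obs, models):
    """Return the emotion whose HMM assigns the highest log-likelihood."""
    return max(
        models,
        key=lambda emotion: forward_log_likelihood(obs, *models[emotion]),
    )


# Hypothetical two-state models: (start, transition, emission) per emotion.
MODELS = {
    "neutral": (
        [0.5, 0.5],
        [[0.8, 0.2], [0.2, 0.8]],
        [[0.6, 0.3, 0.1], [0.3, 0.6, 0.1]],
    ),
    "anger": (
        [0.5, 0.5],
        [[0.7, 0.3], [0.3, 0.7]],
        [[0.1, 0.3, 0.6], [0.1, 0.2, 0.7]],
    ),
}

print(classify_emotion([2, 2, 1, 2], MODELS))
print(classify_emotion([0, 1, 0, 1], MODELS))
```

A real system would more likely model continuous feature vectors, for example with Gaussian mixture emissions. The discrete emissions here only keep the sketch short.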
Building on this straightforward recognizer, we have developed what we refer to as a speech-emotion recognizer, which combines speech and emotion recognition in a single process. This approach is closely related to a regular speech recognizer, except that phonemes and emotions coalesce into so-called "emophonemes", and words like "EVENING" become word-emotions like "EVENING-ANGER". As a result, the number of HMMs in the acoustic model is multiplied by the number of emotions, which greatly increases the model's complexity. Departing slightly from this all-in-one system, we have extended the approach to a two-step system that still uses a common feature extraction but places an additional speech recognizer ahead of the speech-emotion recognizer. Its preselection of words reduces the degrees of freedom in the speech-emotion recognizer. Our experiments have shown that knowledge of an utterance's textual content noticeably improves emotion recognition performance.
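The growth in model size and the effect of word preselection can be sketched in a few lines. The snippet below is an illustration under assumptions, not the thesis implementation: the phoneme symbols, the emotion set, the `phoneme-EMOTION` naming of emophonemes and all function names are hypothetical. Only the `EVENING-ANGER` word-emotion format is taken from the text.

```python
from itertools import product


def build_emophonemes(phonemes, emotions):
    """Cross every phoneme with every emotion: one acoustic unit per pair."""
    return [f"{p}-{e}" for p, e in product(phonemes, emotions)]


def build_word_emotions(words, emotions):
    """Cross every vocabulary word with every emotion, e.g. EVENING-ANGER."""
    return [f"{w}-{e}" for w, e in product(words, emotions)]


def preselect(word_emotions, recognized_words):
    """Two-step idea: keep only word-emotions whose word part was proposed
    by a preceding plain speech recognizer."""
    allowed = set(recognized_words)
    return [we for we in word_emotions if we.rsplit("-", 1)[0] in allowed]


emotions = ["NEUTRAL", "ANGER", "JOY"]          # hypothetical emotion set
phonemes = ["iy", "v", "n", "ih", "ng"]         # hypothetical phoneme subset
words = ["GOOD", "EVENING", "MORNING"]          # hypothetical vocabulary

emophonemes = build_emophonemes(phonemes, emotions)
word_emotions = build_word_emotions(words, emotions)

# The unit inventory grows by a factor equal to the number of emotions.
print(len(phonemes), "phonemes ->", len(emophonemes), "emophonemes")

# After the first pass recognized "EVENING", only its variants remain.
print(preselect(word_emotions, ["EVENING"]))
```

With the first-pass result fixed, the second recognizer only has to decide between the emotion variants of the preselected words, which is the reduction in degrees of freedom described above.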
For the speech-emotion recognizers, we have also derived a postprocessing algorithm (based on ROVER, proposed by J. Fiscus (1997)) that reduces the recognition error rate by combining the output of multiple recognizers. In addition, we have complemented our signal-based emotion recognizers with a linguistic analysis that infers the emotional state from the textual content of the utterances.
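The core voting idea behind combining several recognizer outputs can be sketched as follows. This is a strong simplification and not the postprocessing algorithm derived in the thesis: it assumes the hypotheses are already aligned slot by slot and have equal length, whereas ROVER first aligns the word sequences before voting. The function name `vote` and the example hypotheses are hypothetical.

```python
from collections import Counter


def vote(aligned_hypotheses):
    """Majority vote per slot over equally long, pre-aligned hypotheses.

    Each hypothesis is a list of word-emotion labels from one recognizer.
    """
    if not aligned_hypotheses:
        return []
    length = len(aligned_hypotheses[0])
    if any(len(h) != length for h in aligned_hypotheses):
        raise ValueError("hypotheses must be aligned to the same length")

    result = []
    for slot in zip(*aligned_hypotheses):
        counts = Counter(slot)
        # Highest count wins; ties go to the earliest-listed recognizer.
        best = max(counts, key=lambda c: (counts[c], -slot.index(c)))
        result.append(best)
    return result


hypotheses = [
    ["GOOD-NEUTRAL", "EVENING-ANGER"],    # recognizer 1
    ["GOOD-NEUTRAL", "EVENING-NEUTRAL"],  # recognizer 2
    ["GOOD-ANGER", "EVENING-ANGER"],      # recognizer 3
]
print(vote(hypotheses))
```

Each recognizer makes one error in this toy example, but the errors fall in different slots or differ from each other, so the voted result can be correct in both slots.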
To integrate emotions into adaptive dialogue management, we have first taken up the idea of a rule base whose rules and conditions specify, for example, how a prompt should be adapted to the user's emotional state, depending on the effect the system's message is supposed to have on the user (how can a negative message be communicated to an angry user?). Moreover, we have developed a semi-stochastic dialogue model. It consists of a predefined scaffolding of states, which represent dialogue fields with attached emotions, and of transitions between these states, which are determined by training on collected dialogue data. On the one hand, the proposed structure enables the developer to define prompts tailored to different emotional states; on the other hand, it leaves decisions such as when to apply which prompt to the dialogue model. Instead of, or in addition to, emotions, this dialogue model can also integrate other dialogue control parameters.
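The semi-stochastic structure can be illustrated with a small sketch: the developer fixes the states and their prompts, while the transition probabilities are estimated from data. The snippet is an assumption-laden illustration rather than the model from the thesis. States are modelled as hypothetical `(field, emotion)` pairs, transitions are estimated by simple relative frequencies, and the dialogue data, prompts and function names are invented for the example.

```python
from collections import Counter, defaultdict


def train_transitions(dialogues):
    """Estimate transition probabilities between states by relative
    frequency over observed state sequences."""
    counts = defaultdict(Counter)
    for dialogue in dialogues:
        for current, following in zip(dialogue, dialogue[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def next_state(transitions, state):
    """Pick the most probable successor, or None if none was observed."""
    options = transitions.get(state)
    if not options:
        return None
    return max(options, key=options.get)


# Developer-defined scaffolding: one prompt per (field, emotion) state.
PROMPTS = {
    ("date", "neutral"): "On which date would you like to travel?",
    ("date", "anger"): "Sorry for the trouble. Which date suits you?",
}

# Hypothetical collected dialogues, each a sequence of states.
DIALOGUES = [
    [("greeting", "neutral"), ("date", "neutral"), ("confirm", "neutral")],
    [("greeting", "neutral"), ("date", "anger"), ("confirm", "anger")],
    [("greeting", "neutral"), ("date", "neutral"), ("confirm", "neutral")],
]

TRANSITIONS = train_transitions(DIALOGUES)
state = next_state(TRANSITIONS, ("greeting", "neutral"))
print(state, "->", PROMPTS.get(state))
```

Because the state key is just a tuple, other dialogue control parameters could replace or extend the emotion component without changing the training code.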


