Dr.-Ing. Stefan Ultes

My official website has moved: www.ultes.eu.
Information on this website is no longer updated.

PhD Thesis

Title

User-centered Adaptive Spoken Dialogue Modelling (pdf)

Status

completed

Description

Most dialog systems make use solely of the spoken words and their semantics, although speech signals reveal much more about the speaker, e.g., their age, gender, emotional state, etc. Furthermore, additional interaction parameters derived from the dialogue modules, e.g., the number of reprompts or the confidence of the speech recognizer, may be used to derive the user's satisfaction level. Using this speaker state information - along with the semantics - can be a promising way of moving dialog systems towards better performance whilst making them more natural at the same time.

In order to do so, two issues have to be dealt with. First, we have to find reliable and reusable methods for modeling speaker state information using state-of-the-art machine learning techniques. Furthermore, this information must be provided in such a way that it can be used by the dialogue manager. This issue has been addressed in the paper "Towards Quality-Adaptive Dialogue Management" for interaction quality, a metric defined similarly to user satisfaction.
Second, ways of using this speaker state information to improve the dialogue have to be found. For this, we focus on both conventional rule-based dialogue management approaches and statistical systems. Partially Observable Markov Decision Processes (POMDPs), a state-of-the-art statistical modeling method, offer an easy and unified way of integrating speaker state information into dialog systems.

Resources

LEGO Spoken Dialog Corpus

Updated Parameterized & Annotated CMU Let's Go Database (LEGOv2)

Download:

Download the complete corpus containing audio and features / labels. (1.1 GB)

Description:

The LEGOv2 database is a parameterized and annotated version of the CMU Let's Go database from 2006 and 2007.

This spoken dialogue corpus contains interactions captured from the CMU Let's Go (LG) System by Carnegie Mellon University in 2006 and 2007. It is based on raw log-files from the LG system.

The corpus has been parameterized and annotated by the Dialogue Systems Group at Ulm University, Germany.

See license.txt for legal issues.

To cite the corpus, please use the following two publications:
[Schmitt2012]
A. Schmitt, S. Ultes and W. Minker
A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let's Go Bus Information System
International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, pp. 3369--3373, May 2012

[Ultes2015]
S. Ultes, A. Schmitt, M. J. Platero Sánchez and W. Minker
Analysis of an Extended Interaction Quality Corpus
International Workshop On Spoken Dialogue Systems (IWSDS), Busan, Korea, January 2015
accepted for publication

References:

[Eskenazi2008] Maxine Eskenazi, Alan W Black, Antoine Raux, and Brian Langner
Let’s Go Lab: a platform for evaluation of spoken dialog systems with real world users
in: Proceedings of Interspeech 2008 Conference, Brisbane, Australia

[Schmitt2011] Alexander Schmitt, Benjamin Schatz and Wolfgang Minker,
MODELING AND PREDICTING QUALITY IN SPOKEN HUMAN-COMPUTER INTERACTION,
in: Proceedings of the SIGDIAL 2011 Conference,
Association for Computational Linguistics, 2011

[Schmitt2009]
A. Schmitt, T. Heinroth and J. Liscombe
On NoMatchs, NoInputs and BargeIns: Do Non-Acoustic Features Support Anger Detection?
Proceedings of the SIGDIAL 2009 Conference, Association for Computational Linguistics, London, UK, pp. 128--131, 2009



------------------------------------------------------------------------------------------------------------------------
Files
------------------------------------------------------------------------------------------------------------------------

license.txt -> license file
readme.txt -> this file
interaction_parameters.pdf -> Description of interaction parameters
|
|---- audio -> wav files with user utterances and full recordings
|
|---- corpus
        |
        |---- csv   -> CSV-files with interactions.csv, acoustics.csv and calls.csv
        |---- mysql -> mysql dump
       
       

The corpus comes with both MySQL database dumps and CSV files.

The MySQL dump can be imported right away into any MySQL database.
The CSV files can be imported into e.g. Excel, Matlab, R, Weka, SPSS and SQL databases other than MySQL.
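
As an illustration of importing a CSV file into an SQL database other than MySQL, the following is a minimal sketch using Python's built-in sqlite3 module. The helper name load_csv_into_sqlite, the comma delimiter and the sample rows are assumptions made for this example and are not taken from the corpus; check the actual files for their delimiter and header before use.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_file, delimiter=","):
    """Create `table` from the CSV header and insert every row as text.

    Hypothetical helper: the delimiter is an assumption, not a corpus fact.
    """
    reader = csv.reader(csv_file, delimiter=delimiter)
    header = next(reader)
    # Quote identifiers because the interaction parameter names contain
    # characters such as '#', '%', '?' and parentheses.
    columns = ", ".join(
        '"{}" TEXT'.format(name.replace('"', '""')) for name in header
    )
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    conn.commit()
    return len(header)


# Made-up sample standing in for one of the corpus CSV files.
sample = io.StringIO("callid,#Exchanges,Barged-In?\n1,12,0\n2,7,1\n")
conn = sqlite3.connect(":memory:")
column_count = load_csv_into_sqlite(conn, "calls", sample)
row_count = conn.execute('SELECT COUNT(*) FROM "calls"').fetchone()[0]
```

For the real files, pass an open file handle (opened with newline="") instead of the in-memory sample. All values are stored as text here; cast them in SQL or convert them beforehand if numeric types are needed.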

** interactions.csv/interactions-Table **: each line contains a system-user exchange, parameterized with 53 interaction parameters

Prompt, Utterance, ASRRecognitionStatus, #ASRSuccess, (#)ASRSuccess,
%ASRSuccess, #TimeOutPrompts, (#)TimeOutPrompts, %TimeOutPrompts, #ASRRejections, (#)ASRRejections,
%ASRRejections, #TimeOuts_ASRRej, (#)TimeOuts_ASRRej, %TimeOuts_ASRRej, Barged-In?, #Barge-Ins,
(#)Barge-Ins, %Barge-Ins, ASRConfidence, MeanASRConfidence, (Mean)ASRConfidence, UTD,
ExMo, Modality, UnExMo?, #UnExMo, (#)UnExMo, %UnExMo, WPUT, WPST, SemanticParse, HelpRequest?,
#HelpRequests, (#)HelpRequest, %HelpRequest, Activity, ActivityType, DD, RoleIndex, RoleName, RePrompt?,
#RePrompts, (#)RePrompts, %RePrompts, LoopName, #Exchanges, #SystemTurns, #UserTurns, #SystemQuestions,
(#)SystemQuestions, SystemDialogueAct, UserDialogueAct

as described by [Schmitt et al., 2011]

Use callid to join with the calls-Table

Furthermore, the file contains *EmotionalState* annotations and *Interaction Quality* annotations, see below. For interaction quality please refer
to [Schmitt et al., 2011]

** calls.csv/calls-Table **: each line contains information affecting the entire call. Primary key: callid

The file contains *gender*, *age* and *dialogue outcome* annotations that can be used as target variables, e.g. to predict task completion.

** acoustics.csv/acoustics-Table **: each line contains basic acoustic and prosodic features extracted on the full utterance. Extraction has been
done with the Praat software; see [Schmitt2009] for details.
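
The join on callid described above could be sketched in plain Python as follows. Only the key name callid comes from this readme; the function name, the other column names (Prompt, gender, outcome) and the sample rows are hypothetical placeholders, so adapt them to the real CSV headers.

```python
import csv
import io


def join_exchanges_with_calls(interactions_file, calls_file, key="callid"):
    """Attach the call-level annotations to every exchange of that call.

    Exchanges whose call is missing from the calls table are skipped.
    """
    calls = {row[key]: row for row in csv.DictReader(calls_file)}
    joined = []
    for exchange in csv.DictReader(interactions_file):
        call = calls.get(exchange[key])
        if call is None:
            continue
        merged = dict(call)
        # Exchange-level values win if both tables share a column name.
        merged.update(exchange)
        joined.append(merged)
    return joined


# Made-up sample data; the column names other than callid are assumptions.
interactions = io.StringIO(
    "callid,Prompt\n1,Welcome\n1,Where to?\n2,Welcome\n3,Welcome\n"
)
calls = io.StringIO("callid,gender,outcome\n1,female,completed\n2,male,failed\n")
joined = join_exchanges_with_calls(interactions, calls)
```

Each resulting row carries both the exchange-level parameters and the call-level annotations, which is the shape needed when a call-level annotation serves as the target variable for exchange-level features.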







------------------------------------------------------------------------------------------------------------------------
Number of Calls: 548
Number of System-User Exchanges: 13,836
------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
Interaction Quality [Schmitt2011]:
------------------------------------------------------------------------------------------------------------------------
Number of Calls with IQ Annotations from all Raters:     401
Number of Exchanges with IQ Annotations from all Raters: 9,638

Annotation Scheme:
1: extremely unsatisfied
2: strongly unsatisfied
3: unsatisfied
4: slightly unsatisfied
5: satisfied
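
Since exchanges carry IQ annotations from several raters, a single label per exchange is often needed. This readme does not say how the ratings should be merged; the sketch below assumes the median as one possible choice, and the function name combine_ratings is hypothetical.

```python
from statistics import median_low


def combine_ratings(ratings):
    """Merge the raters' IQ scores for one exchange into a single label.

    Assumption: the median is used; median_low keeps the result on the
    integer 1..5 scale even for an even number of raters.
    """
    for score in ratings:
        if score not in (1, 2, 3, 4, 5):
            raise ValueError("IQ scores must be integers from 1 to 5")
    return median_low(ratings)


label_a = combine_ratings([5, 4, 4])
label_b = combine_ratings([3, 5, 1])
```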

------------------------------------------------------------------------------------------------------------------------
Rater guidelines for annotating Interaction Quality
------------------------------------------------------------------------------------------------------------------------
1.
The rater should try to mirror the user's point of view on the interaction as objectively as possible.
2.
An exchange consists of the system prompt and the user response. Due to system design, the latter is not always present.
3.
The IQ score is defined on a 5-point scale with “1=extremely unsatisfied”, “2=strongly unsatisfied”, “3=unsatisfied”, “4=slightly unsatisfied” and “5=satisfied”.
4.
The Interaction Quality is to be rated for each exchange in the dialogue. The dialogue’s specific history should be taken into account.
For example, a dialogue that has proceeded fairly poorly for a long time should require some time to recover.
5.
A dialogue always starts with an Interaction Quality score 5.
6.
The first user input should also be rated with 5, since until this moment, no rateable interaction has taken place.
7.
A request for help does not invariably cause a lower Interaction Quality, but can result in it.
8.
In general, the score from one exchange to the following exchange is increased or decreased by one point at the most.
9.
Exceptions, where the score can be decreased by two points, are e.g. hot anger or sudden frustration. The rater’s
perception is decisive here.
10.
Also, if the dialogue obviously collapses due to system or user behavior, the score can be set to 1 immediately. An
example is a sudden hang-up out of justified frustration.
11.
Anger does not need to influence the score, but can. You should try to figure out whether it might be caused by the
dialogue behavior or not.
12.
If a user realizes that he should adapt his dialogue strategy to obtain the desired result or information and
succeeds that way, the Interaction Quality score can be raised by up to two points per turn. In a manner of speaking, he
realizes that he caused the poor Interaction Quality himself.
13.
If the system does not reply with a bus schedule to a specific user query and prompts that the request is out of scope, this
can be considered as completed task and therefore does not need to affect the Interaction Quality.
14.
If a dialogue consists of several independent queries, each query’s quality is to be rated independently. The dialogue's
history shouldn't be taken into account when a new query begins. However, the score provided for the first exchange should be equal to
the last label of the previous query.
15.
If a dialogue proceeds fairly poorly for a long time, the rater should consider increasing the score more slowly if it's getting
better. Also, in general, he or she should observe the remaining dialogue more critically.
16.
If a constantly low-quality dialogue finishes with a reasonable result, the Interaction Quality should be increased.
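
Some of the guidelines above are mechanical enough to be checked automatically on a sequence of IQ labels. The following is a rough sketch, not part of the corpus tooling: it checks the score range (guideline 3), the start value (guideline 5) and the step size between exchanges (guideline 8), and lists the larger jumps permitted by guidelines 9, 10 and 12 for manual review. The function name check_iq_sequence and the result layout are hypothetical, and the restart behavior for multiple queries (guideline 14) is not modeled.

```python
def check_iq_sequence(scores):
    """Check one dialogue's IQ labels against the mechanical guidelines.

    Returns a dict with 'violations' and 'exceptions', each a list of
    (exchange_index, message) tuples. Exceptions are allowed jumps that a
    human should still look at.
    """
    result = {"violations": [], "exceptions": []}
    if not scores:
        return result
    for i, score in enumerate(scores):
        if score not in (1, 2, 3, 4, 5):
            result["violations"].append((i, "score outside the 1..5 scale"))
    if scores[0] != 5:
        result["violations"].append((0, "dialogue must start with score 5"))
    for i in range(1, len(scores)):
        delta = scores[i] - scores[i - 1]
        if abs(delta) <= 1:
            continue  # Guideline 8: normal step of at most one point.
        if delta == -2:
            result["exceptions"].append((i, "drop by two (guideline 9)"))
        elif delta == 2:
            result["exceptions"].append((i, "rise by two (guideline 12)"))
        elif scores[i] == 1:
            result["exceptions"].append((i, "collapse to 1 (guideline 10)"))
        else:
            result["violations"].append((i, "jump larger than allowed"))
    return result


ok_report = check_iq_sequence([5, 5, 4, 2, 1, 3, 5])
bad_report = check_iq_sequence([4, 5, 2])
```

Whether a flagged exception is justified (e.g. hot anger or a user adapting their strategy) cannot be decided from the labels alone; that remains the rater's judgment.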

------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
Emotional States:
------------------------------------------------------------------------------------------------------------------------
Number of Calls with Emotion Annotations:                 302
Number of Raters for Emotion Annotation:                 1

Annotation Scheme: friendly, neutral, slightly angry, angry, very angry

Parameterized & Annotated CMU Let's Go Database (LEGO)

Download:

Download the complete corpus containing audio and features / labels. (1 GB)

Description:

The LEGO database is a parameterized and annotated version of the CMU Let's Go database from 2006.

This spoken dialogue corpus contains interactions captured from the
CMU Let's Go (LG) System by Carnegie Mellon University in 2006. It is based on
raw log-files from the LG system.

The corpus has been parameterized and annotated by the Dialogue Systems Group at
Ulm University, Germany.

See license.txt for legal issues.

To cite the corpus, please use the following publication:
[Schmitt2012]
A. Schmitt, S. Ultes and W. Minker
A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let's Go Bus Information System
International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, pp. 3369--3373, May 2012

References:

[Eskenazi2008] Maxine Eskenazi, Alan W Black, Antoine Raux, and Brian Langner
Let’s Go Lab: a platform for evaluation of spoken dialog systems with real world users
in: Proceedings of Interspeech 2008 Conference, Brisbane, Australia

[Schmitt2011] Alexander Schmitt, Benjamin Schatz and Wolfgang Minker,
MODELING AND PREDICTING QUALITY IN SPOKEN HUMAN-COMPUTER INTERACTION,
in: Proceedings of the SIGDIAL 2011 Conference,
Association for Computational Linguistics, 2011

[Schmitt2009]
A. Schmitt, T. Heinroth and J. Liscombe
On NoMatchs, NoInputs and BargeIns: Do Non-Acoustic Features Support Anger Detection?
Proceedings of the SIGDIAL 2009 Conference, Association for Computational Linguistics, London, UK, pp. 128--131, 2009



------------------------------------------------------------------------------------------------------------------------
Files
------------------------------------------------------------------------------------------------------------------------

license.txt -> license file
readme.txt -> this file
interaction_parameters.pdf -> Description of interaction parameters
|
|---- audio -> wav files with user utterances and full recordings
|
|---- corpus
        |
        |---- csv   -> CSV-files with interactions.csv, acoustics.csv and calls.csv
        |---- mysql -> mysql dump
       
       

The corpus comes with both MySQL database dumps and CSV files.

The MySQL dump can be imported right away into any MySQL database.
The CSV files can be imported into e.g. Excel, Matlab, R, Weka, SPSS and SQL databases other than MySQL.

** interactions.csv/interactions-Table **: each line contains a system-user exchange, parameterized with 53 interaction parameters

Prompt, Utterance, ASRRecognitionStatus, #ASRSuccess, (#)ASRSuccess,
%ASRSuccess, #TimeOutPrompts, (#)TimeOutPrompts, %TimeOutPrompts, #ASRRejections, (#)ASRRejections,
%ASRRejections, #TimeOuts_ASRRej, (#)TimeOuts_ASRRej, %TimeOuts_ASRRej, Barged-In?, #Barge-Ins,
(#)Barge-Ins, %Barge-Ins, ASRConfidence, MeanASRConfidence, (Mean)ASRConfidence, UTD,
ExMo, Modality, UnExMo?, #UnExMo, (#)UnExMo, %UnExMo, WPUT, WPST, SemanticParse, HelpRequest?,
#HelpRequests, (#)HelpRequest, %HelpRequest, Activity, ActivityType, DD, RoleIndex, RoleName, RePrompt?,
#RePrompts, (#)RePrompts, %RePrompts, LoopName, #Exchanges, #SystemTurns, #UserTurns, #SystemQuestions,
(#)SystemQuestions, SystemDialogueAct, UserDialogueAct

as described by [Schmitt et al., 2011]

Use callid to join with the calls-Table

Furthermore, the file contains *EmotionalState* annotations and *Interaction Quality* annotations, see below. For interaction quality please refer
to [Schmitt et al., 2011]

** calls.csv/calls-Table **: each line contains information affecting the entire call. Primary key: callid

The file contains *gender*, *age* and *dialogue outcome* annotations that can be used as target variables, e.g. to predict task completion.

** acoustics.csv/acoustics-Table **: each line contains basic acoustic and prosodic features extracted on the full utterance. Extraction has been
done with the Praat software; see [Schmitt2009] for details.







------------------------------------------------------------------------------------------------------------------------
Number of Calls: 347
Number of System-User Exchanges: 9,083

------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
Interaction Quality [Schmitt2011]:
------------------------------------------------------------------------------------------------------------------------
Number of Calls with IQ Annotations from all Raters:     200
Number of Calls annotated with IQ by Rater 1:             200
Number of Calls annotated with IQ by Rater 2:             237
Number of Calls annotated with IQ by Rater 3:             237

Number of Exchanges with IQ Annotations from all Raters: 4,885
Number of Exchanges annotated with IQ by Rater 1:         4,885
Number of Exchanges annotated with IQ by Rater 2:         6,379
Number of Exchanges annotated with IQ by Rater 3:         6,340

Annotation Scheme:
1: extremely unsatisfied
2: strongly unsatisfied
3: unsatisfied
4: slightly unsatisfied
5: satisfied

------------------------------------------------------------------------------------------------------------------------
Rater guidelines for annotating Interaction Quality
------------------------------------------------------------------------------------------------------------------------
1.
The rater should try to mirror the user's point of view on the interaction as objectively as possible.
2.
An exchange consists of the system prompt and the user response. Due to system design, the latter is not always present.
3.
The IQ score is defined on a 5-point scale with “1=extremely unsatisfied”, “2=strongly unsatisfied”, “3=unsatisfied”, “4=slightly unsatisfied” and “5=satisfied”.
4.
The Interaction Quality is to be rated for each exchange in the dialogue. The dialogue’s specific history should be taken into account.
For example, a dialogue that has proceeded fairly poorly for a long time should require some time to recover.
5.
A dialogue always starts with an Interaction Quality score 5.
6.
The first user input should also be rated with 5, since until this moment, no rateable interaction has taken place.
7.
A request for help does not invariably cause a lower Interaction Quality, but can result in it.
8.
In general, the score from one exchange to the following exchange is increased or decreased by one point at the most.
9.
Exceptions, where the score can be decreased by two points, are e.g. hot anger or sudden frustration. The rater’s
perception is decisive here.
10.
Also, if the dialogue obviously collapses due to system or user behavior, the score can be set to 1 immediately. An
example is a sudden hang-up out of justified frustration.
11.
Anger does not need to influence the score, but can. You should try to figure out whether it might be caused by the
dialogue behavior or not.
12.
If a user realizes that he should adapt his dialogue strategy to obtain the desired result or information and
succeeds that way, the Interaction Quality score can be raised by up to two points per turn. In a manner of speaking, he
realizes that he caused the poor Interaction Quality himself.
13.
If the system does not reply with a bus schedule to a specific user query and prompts that the request is out of scope, this
can be considered as completed task and therefore does not need to affect the Interaction Quality.
14.
If a dialogue consists of several independent queries, each query’s quality is to be rated independently. The dialogue's
history shouldn't be taken into account when a new query begins. However, the score provided for the first exchange should be equal to
the last label of the previous query.
15.
If a dialogue proceeds fairly poorly for a long time, the rater should consider increasing the score more slowly if it's getting
better. Also, in general, he or she should observe the remaining dialogue more critically.
16.
If a constantly low-quality dialogue finishes with a reasonable result, the Interaction Quality should be increased.

------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
Emotional States:
------------------------------------------------------------------------------------------------------------------------
Number of Calls with Emotion Annotations:                 302
Number of Raters for Emotion Annotation:                 1

Annotation Scheme: friendly, neutral, slightly angry, angry, very angry

JaCHMM Conditioned Hidden Markov Model Java Library

Java Conditioned Hidden Markov Model library (JaCHMM)

Description:

The JaCHMM - the Java Conditioned Hidden Markov Model library - is a complete implementation of a Conditioned Hidden Markov Model in Java, ready to use either on the command line or as a module in Java projects. JaCHMM is licensed under the BSD licence. It provides implementations of the Viterbi, Forward-Backward, Baum-Welch and K-Means algorithms, all adapted for the CHMM.

JaCHMM is based on JaHMM and is likewise designed to achieve reasonable performance without making the code unreadable. Consequently, it offers a good way of applying the Conditioned Hidden Markov Model in various tasks, e.g., for scientific or teaching purposes.

Download:

Download the JaCHMM from sourceforge.net.

Supervised Theses

Completed
2015

Master Thesis
IQ-adaptive Statistical Dialogue Management using Gaussian Processes
October 2015
Link to Document

Master Thesis
User Evaluation of User-adaptive Dialogue
October 2015
Link to Document

Master Thesis
Multimodal Adaptive Dialogue Management in Owlspeak
September 2015

2014

Master Thesis
Analysing and Improving Interaction Quality Recognition
2014
Link to Document

Master Thesis
Analysis of an Extended Data Set for Interaction Quality Recognition
2014
Link to Document

Master Thesis
Anwendung von Emotionserkennung in IVR-Systemen auf Mensch-Mensch-Dialoge
2014
Link to Document

Bachelor Thesis
Creation of User-Adaptive Dialogue for User Simulator Evaluation
2014
Link to Document

Master Thesis
Improving Anger Recognition Using Formants
2014
Link to Document

Master Thesis
IQ-adaptive Statistical Spoken Dialogue Management
2014
Link to Document

Master Thesis
User Evaluation of User-adaptive Dialogue
2014
Link to Document

2013

Bachelor Thesis
Garbage Recognition for Anger Detection from Speech
2013
Link to Document

Master Thesis
Methods for Adapting the Dialogue Strategy to User Satisfaction
2013
Link to Document

2012

Master Thesis
Evaluation of Statistical Models for Classification of User Satisfaction
2012
Link to Document

Master Thesis
Statistical Approaches for Adaptive Spoken Dialogue Management
2012
Link to Document

Teaching

Lectures
2015

W. Minker, F. Nothdurft and S. Ultes
User-Adaptive and Intelligent Human-like Dialogue Interaction
Graduate Course within the ERASMUS+ Mobility Programme, Department of Computer Science, University of Granada (Spain), October 2015

S. Ultes and W. Minker
User-centred Adaptive Dialogue Modelling
Graduate Course within the ERASMUS+ Mobility Programme, Department of Computer Science, University of Granada (Spain), April 2015

2014

S. Ultes and W. Minker
Spoken Dialogue Systems: Trends in Modeling and Assessment
Graduate Course within the ERASMUS+ Mobility Programme, Department of Computer Science, University of Granada (Spain), October 2014

S. Ultes and W. Minker
User-adaptive Dialogue Management
Graduate Course within the ERASMUS/SOCRATES Mobility Programme, Department of Computer Science, University of Granada (Spain), May 2014

2013

S. Ultes and W. Minker
Basics and Perspectives of User Satisfaction Recognition in Human Machine Dialogue
Graduate Course within the ERASMUS/SOCRATES Mobility Programme, Department of Computer Science, University of Granada (Spain), November 2013

H. Hofmann, K. Jokinen, F. Nothdurft, T. Gasanova, M. Sidorov, R. Sergienko, S. Ultes and W. Minker
Selected Topics In Spoken Dialogue Research at Ulm University and Tartu University
Graduate Course within the ERASMUS/SOCRATES Mobility Programme, Institute of Communications Engineering, University of Ulm, November 2013

W. Minker, F. Nothdurft and S. Ultes
Trust and Quality in Spoken Dialogue Systems
Graduate Course within the ERASMUS/SOCRATES Mobility Programme, Department of Computer Science, University of Granada (Spain), March 2013

2012

W. Minker and S. Ultes
Quality-Adaptive Spoken Dialogue
Graduate Course within the ERASMUS/SOCRATES Mobility Programme, Department of Computer Science, University of Granada (Spain), November 2012

W. Minker, H. Lang, A. Schmitt, S. Ultes and F. Nothdurft
Assistive and Adaptive Dialogue Systems
Graduate Course within the ERASMUS/SOCRATES Mobility Programme, Department of Computer Science, University of Granada (Spain), June 2012

Affiliated Research Associate

Stefan Ultes
