Invited Talks

Quality of Experiencing Multimodal Dialogue Systems

Sebastian Möller, Benjamin Weiss, Ina Wechsung, Christine Kühnel
Quality and Usability Lab, Deutsche Telekom Laboratories, TU Berlin, Germany

Multimodal dialogue systems appear to offer a better interaction experience, as multimodality seems to have fundamental advantages over unimodal interaction. However, there are few convincing examples beyond the standard “put-that-there” scenario. Much more often, the state of the art seems to consist of simply providing alternative input or output modalities, which results in sequential multimodality. The question is what constitutes a “good” interaction, i.e. what aspects contribute to the user having a good or bad impression of the system she has been using. This is commonly captured by the term “Quality of Experience” (QoE).

In this talk, we will establish common ground on what Quality of Experience really means, and into what aspects it can be divided. We will then review these aspects as to whether they support multimodality, and whether quality may be positively or negatively influenced by including additional modalities in the system. We will review state-of-the-art techniques for multimodal system evaluation, including subjective evaluation principles as well as model-based evaluation approaches. We will conclude by identifying research questions to be answered in order to fully support the design and evaluation of multimodal dialogue applications.

Recent developments, challenges and opportunities in spoken dialogue systems technology

Michael McTear
Computer Science Research Institute, University of Ulster, Northern Ireland, UK

There have been considerable advances in spoken dialogue systems technology in recent years. On the one hand there has been increasing research activity in terms of funded projects, workshops, special conference sessions, special journal issues, and books; and on the other, in terms of the emergence of applications for areas such as the automation of call centre activities as well as voice search on mobile phones. There is also a healthy competition between those researchers who rely on predominantly hand-crafted, rule-based methods and those who favour data-driven approaches in support of machine learning. Furthermore, there has been a move away from contrived and restricted applications that aim to demonstrate a particular theory or methodology towards more realistic and more useful systems that can be deployed in everyday environments.

Notwithstanding these developments there remain many challenges and potential obstacles. One issue concerns the availability of resources. On the positive side it is encouraging that high levels of accuracy are being demonstrated in commercial voice search systems that can avail of massive datasets and resources, such as large, specially designed language models. On the downside, such resources are generally not available to individual researchers or small groups. This raises the question of whether the design and implementation of end-to-end systems is only realistically achievable by large, well-resourced groups. A second issue is that often it would appear that applications such as voice search are motivated by what is technically feasible in terms of speech recognition accuracy, search engine technology, and network connectivity, and not in terms of what might be more attractive and useful for users, such as the ability to engage in a spoken dialogue about a topic rather than simply receiving a ranked list of links from a search engine, or the use of a spoken dialogue application to support the activities of daily living. Some examples of such applications will be discussed in the light of current technologies and challenges, as well as future opportunities.