News of the Institute of Media Informatics

PhD exposition/Hartwig, S.: Learning from Human Annotated Data: Acquisition and Estimation of Human Judgments using Deep Learning

Ulm University

Introduction of a dissertation project | Wednesday, 29 May 2024, 10:00 am | O27/331


Sebastian Hartwig, a member of the research group Visual Computing, gives an introduction to his dissertation topic, titled Learning from Human Annotated Data: Acquisition and Estimation of Human Judgments using Deep Learning.

Abstract: Many computer vision tasks, such as object detection and semantic segmentation, have been tackled by deep learning approaches that mimic human experts after being trained with human supervision. Human perception is considered the gold standard for vision tasks, as humans possess strong abilities in object recognition, semantic reasoning, and abstract pattern matching, such as compositional reasoning. When training deep learning models based on human judgments, it is assumed that the model learns to reflect consensus and achieves decisions that gain broad acceptance within the majority of a group, also referred to as human preferences.

This thesis explores the process of constructing valuable human-annotated datasets, derived from extensive online crowdsourcing studies involving diverse human raters. We explore three scenarios in which human perception significantly enhances machine learning applications: optimal viewpoint selection, cluster separation in scattered data, and thermal comfort sensation. In optimal viewpoint selection, we investigate human preferences for the best viewpoints of common objects using a forced-choice paired comparison experiment. This study not only enriches our understanding of visual preference but also guides the development of algorithms that can predict optimal viewpoints based on human judgments. Our analysis of the human-preferred viewpoints shows high agreement between subjects. For cluster detection in scatterplots, we capture nuanced perceptions of cluster formations in visual data through direct engagement with human raters. This approach enables us to fine-tune machine learning models that mirror human visual clustering capabilities, bridging the gap between human intuition and algorithmic pattern recognition. The analysis of the human annotations reveals ambiguities in the perceived clusters, which we use to model a human agreement score. This score, estimated by our approach, indicates the level of consensus within a group of raters. Additionally, by conducting experiments on temperature sensation, we investigate how subjective thermal comfort can be quantitatively analyzed and modeled, opening new avenues for smart environment controls that adapt to individual comfort levels. The datasets generated from these studies have been carefully cleaned and analyzed, allowing for the development of specialized deep learning models that emulate human judgment. We present the results of this research through a comprehensive evaluation performed on unseen test data, showcasing the robustness and applicability of our models in real-world scenarios.

To summarize our findings, this work proposes a methodology for acquiring high-quality human-annotated data that serves as supervision for training deep learning models, and we show that the resulting estimations align well with human judgments.


We look forward to many participants and a fair discussion.