Data quality in user-generated content
Prof. Dr. Mathias Klier
Andreas Obermeier
Prof. Dr. Mathias Klier
+49 (0) 7 31 50-3 23 12
mathias.klier(at)uni-ulm.de
In an increasingly digital world, the amount of user-generated content (UGC), such as customer reviews on rating platforms, articles in wikis, or posts on social media, is growing rapidly. Because this data holds enormous economic potential, machine learning methods for analyzing large amounts of unstructured data have become highly relevant in research and practice in recent years. However, such analyses and their results can only be valid and add value if the quality of the underlying input data is assured.
Yet, unlike for structured data, no comparable approaches for automatically measuring and improving data quality exist to date for unstructured, textual UGC. Moreover, the machine learning methods currently used for analysis take only very limited account of the fact that textual UGC can be of poor quality. The planned project addresses this gap: it seeks approaches for measuring the data quality of UGC and for taking that quality into account in machine learning methods. The University of Ulm, in cooperation with the University of Regensburg, is pursuing the following research questions:
Cooperation partners: University of Regensburg
Funding body: German Research Foundation (DFG)
Project period: until 2024