C: Software Engineering for Data Science and Deep Learning
Software engineering for Data Science and Deep Learning is an emerging field. It is concerned with understanding and implementing algorithms, methods, and pipelines for a specific topic in Data Science. Of particular interest are methods in machine learning, including modern neural networks (Deep Learning), and their applications to the analysis, interlinkage, and enrichment of unstructured data such as multimedia and textual content, as well as the analysis and use of open data on the web.
Specific Topics for Next Term
In Summer 2025, we will offer the following topics:
- Project Economic Data Science on Very Large Data Sets together with Dr. Alexander Rieber, Economics, Uni Ulm
- Project Botanic Data Science on Very Large Data Sets together with Dr. Annika Schrumpt, Palmengarten, Stadt Frankfurt am Main
- Project Text Classification on Very Large Data Sets (depending on the number of interested students)
Background on Project Data Science on Very Large Data Sets
The project group covers different topics in Data Science. Examples of topics are the analysis, interlinkage, and enrichment of unstructured textual documents or the analysis and use of semi-structured graph data on the web. The students work in small groups on different innovative and applied problems. Besides a requirements analysis and conceptual specification of the problem, an important task is the implementation and scientific evaluation of the proposed solution.
Data Science deals with the data-driven, interdisciplinary analysis of digital objects such as semi-structured graph data on the web (i.e., Linked Open Data), documents, profiles, or communities, and with understanding the relationships between them. The module involves understanding and summarizing algorithms and methods on a specific topic in Data Science. Of particular interest are methods in machine learning, including modern neural networks (Deep Learning), and their applications to the analysis, interlinkage, and enrichment of unstructured data such as multimedia and textual content, as well as the analysis and use of open data on the web.
Students in the practical course are encouraged to organize and carry out a project independently for a real or fictitious partner in industry or research. An essential requirement of the practical course is a proper conceptual design, implementation, and scientific evaluation of the solution. In addition, the proposed solution must show a sufficient level of innovation, and an in-depth analysis of the problem and documentation of the results are required. This includes continuous evaluation and reporting of intermediate results, as well as active participation of the students in designing the solution. Students are therefore strongly encouraged to contribute their own views on the problem and to suggest improvements to the applied methods and results.
General information about the concept:
https://github.com/data-science-and-big-data-analytics/teaching-examples
Examples of previous projects
Previous SE projects include:
[1] Hierarchical Text Classification (HTC) vs. eXtreme Multilabel Classification (XML): Two Sides of the Same Medal, https://arxiv.org/abs/2411.13687
[2] Fine-Tuning Language Models for Scientific Writing Support, https://arxiv.org/abs/2306.10974
[3] Analysis of GraphSum's Attention Weights to Improve the Explainability of Multi-Document Summarization, https://arxiv.org/abs/2105.11908
Interested?
Contact me for an informal chat.