L: Software Engineering for Advanced Machine Learning and Deep Learning
Software engineering for Data Science and Deep Learning
Software engineering for Data Science and Deep Learning is an emerging field. It aims to understand and implement algorithms, methods, and pipelines for a specific topic in Data Science. Of particular interest are methods in machine learning, including modern neural networks (Deep Learning), and their applications to the analysis, interlinkage, and enrichment of unstructured data like multimedia content and textual content, as well as the analysis and use of open data on the web.
Specific Topics for Next Term
Topics we are going to offer are:
- Project on "Continuous Learning on Transformer-based Language Models" -- Special topic 2025
- Project on "Text Classification using eXtreme Multi Label (XML)" (depending on number of participants)
Background on Project Data Science on Very Large Data Sets
The project group covers different topics in Data Science. Examples of topics are the analysis, interlinkage, and enrichment of unstructured textual documents or the analysis and use of semi-structured graph data on the web. The students work in small groups on different innovative and applied problems. Besides a requirements analysis and conceptual specification of the problem, an important task is the implementation and scientific evaluation of the proposed solution.
Data Science deals with the data-driven, interdisciplinary analysis of digital objects such as semi-structured graph data on the web (i.e., Linked Open Data), documents, profiles, or communities, and understanding their relationships. The module involves understanding and summarizing algorithms and methods on a specific topic in Data Science. Of particular interest are methods in machine learning, including modern neural networks (Deep Learning) and their applications to the analysis, interlinkage, and enrichment of unstructured data like multimedia content and textual content, as well as the analysis and use of open data on the web.
The students of the practical course are encouraged to independently organize and work on a project for a real or fictitious partner in industry or research. An essential requirement of the practical course is a proper conceptual design, implementation, and scientific evaluation of the solution. In addition, the proposed solution must show a sufficient level of innovation, and an in-depth analysis of the problem and documentation of the results are required. This includes continuous evaluation and reporting of intermediate results, as well as active participation of the students in the design of the solution. Students are therefore strongly encouraged to share their own views on the problem and to suggest improvements to the applied methods and results.
General information about the concept: https://github.com/data-science-and-big-data-analytics/teaching-examples
Topics for BSc and MSc theses: https://tinyurl.com/dsbda-topics
Writing template: https://tinyurl.com/dsbda-template
Examples of previous projects
Examples of previous projects with SE include:
[1] Bag-of-Words vs. Graph vs. Sequence in Text Classification: Questioning the Necessity of Text-Graphs and the Surprising Strength of a Wide MLP, ACL 2022, URL: https://aclanthology.org/2022.acl-long.279/
[2] Hierarchical Text Classification (HTC) vs. eXtreme Multilabel Classification (XML): Two Sides of the Same Medal, URL: https://arxiv.org/abs/2411.13687
[3] Continual Learning for Encoder-only Language Models via a Discrete Key-Value Bottleneck, URL: https://arxiv.org/abs/2412.08528
Interested?
Contact me for an informal chat.