# Topics for bachelor and master theses as well as projects

Topics for bachelor and master theses as well as projects come from the areas of software engineering and programming languages. The concrete topics for theses are oriented towards our research areas. Basics are taught in the corresponding courses of the institute.

The following list offers a selection of topics and subject areas. More detailed information can be obtained from the respective contact persons. In addition, we are open to your own suggestions for topics.

*(Legend: B = Bachelor's thesis, M = Master's thesis, P = Project)*

## Domain-Specific Languages

## Analysis Methods for Complex Model Transformations

**Context**

Model-Driven Engineering (MDE) is one approach to handle the continuous growth in size and complexity of the software systems to be developed. In MDE, models are used to describe the software system in an abstract way. Transformations are applied to these models. If, for example, the name of a software component changes, the affected models do not need to be updated manually but can be updated automatically by a transformation. To describe the task of a transformation, so-called model transformation languages are used, which provide special language concepts for operating on models as conveniently as possible.

**Problem**

In general-purpose languages (GPLs) like Java, it is possible to define the same program using different language concepts. However, these language concepts can affect the performance of a program differently. In Java, for example, one can access an element of an ArrayList in constant time, whereas with a LinkedList one must first navigate through the list. As a result, developers must take care to use the most suitable language concepts. This problem occurs not only in GPLs, but also in model transformation languages. It is aggravated by the fact that there is hardly any tool support to detect expensive language constructs in transformations, let alone to replace them.
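
For illustration, a minimal Python sketch of the same effect (in Python rather than Java, and not tied to any particular transformation language): both functions retrieve the element at index `i`, but the underlying language concepts differ in cost.

```python
# Both functions retrieve the i-th element, yet the language concepts they
# rely on have different costs, analogous to ArrayList vs. LinkedList in Java.

def get_from_array(data, i):
    return data[i]                 # contiguous array: random access in O(1)

def get_from_linked(head, i):
    node = head                    # chain of (value, next) pairs
    for _ in range(i):             # i pointer hops before the element is reached
        node = node[1]
    return node[0]

values = list(range(10))
linked = None
for v in reversed(values):         # build the equivalent singly linked list
    linked = (v, linked)

assert get_from_array(values, 7) == get_from_linked(linked, 7) == 7
```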

**Task / Goal**

- Analyze which expensive language constructs occur in transformation definitions
- Develop refactorings to optimize transformation definitions by replacing expensive language constructs
- Develop different analyses that detect expensive language concepts based on statically and/or dynamically obtained information in model transformations in order to suggest suitable refactorings

**Related Work**

- The work focuses on the Atlas Transformation Language (ATL)
- Wimmer, M., Perez, S. M., Jouault, F., & Cabot, J. (2012). A Catalogue of Refactorings for Model-to-Model Transformations. J. Object Technol., 11(2), 2-1.

**Contact**

**Context**

Model-Driven Engineering (MDE) is one approach to handle the continuous growth in size and complexity of the software systems to be developed. In MDE, models are used to describe the software system in an abstract way. Transformations are applied to these models. If, for example, the name of a software component changes, the affected models do not need to be updated manually but can be updated automatically by a transformation. To describe the task of a transformation, so-called model transformation languages are used, which provide special language concepts for operating on models as conveniently as possible.

**Problem**

Performance issues can occur when using model transformation languages, just as with general-purpose languages (GPLs) such as Java. Unfortunately, the tool support for model transformation languages is much more limited than for GPLs. For example, there are few profilers for model transformation languages, and the information they provide is difficult to interpret for users without expert knowledge of language details. There is currently a lack of visualizations that present the necessary information to users in a comprehensible way.

**Task / Goal**

- Develop different visualizations that help to understand the execution of a transformation and to find possible causes for performance issues
- Conduct a study to evaluate the visualizations

**Related Work**

- The work focuses on the Atlas Transformation Language (ATL)
- ATL Profiler
- Isaacs, K. E., Giménez, A., Jusufi, I., Gamblin, T., Bhatele, A., Schulz, M., Hamann, B., & Bremer, P. T. (2014, June). State of the Art of Performance Visualization. In EuroVis (STARs).

**Contact**

**Context**

Model-driven software engineering (MDSE) is centered around the transformation of models through complex transformation chains that are constructed by composing basic transformation steps. These basic steps apply a transformation rule on an input model to create an output model. We focus on the Henshin framework for declarative model transformations.

**Problem**

Defects in either the transformation rule or the input model can cause a basic transformation step to not be applicable at all or produce wrong output. At a higher level, defects in the structure of a transformation chain might schedule the application of basic steps incorrectly, thus leading to applicability issues or unexpected behavior. Due to the declarative nature of transformation steps and nondeterminism in pattern matching or scheduling of steps, these bugs are often difficult to find and fix.

One possible approach to assist developers with understanding and debugging complex transformations is to present a visualization of the control flow through the transformation and offer features to edit this control flow and interact with it.

**Task / Goal**

- Development of a control flow visualization for Henshin transformations.
- Integration in the Henshin tool (based on Eclipse Rich Client Platform).
- Evaluation of the implementation with regard to
  - performance: what are the costs in time and memory space?
  - usability: how does the approach compare to existing features concerning the identification and fixing of defects in transformations?

**Contact**

## Enhancing Collaborative Modeling

**Context**

We are convinced that (graphical) modeling is a cooperative and explorative process done by domain experts. However, several studies have identified insufficient collaboration support in graphical modeling tools. Therefore, we are developing a new collaboration approach for graphical modeling tools that is based on the persistence of sequential edit operations (also called an operation-based approach). We aim to enable both synchronous and asynchronous collaboration and to improve the comprehensibility of evolving models.

**Problem**

At the present stage of our research, we persist every kind of edit operation a user has performed to alter the model and present the result in a simple list. We see two main challenges: 1) An unfiltered listing of all kinds of recorded operations, such as moving the same block twenty times or renaming a block, does not necessarily help a user to easily understand the way a model evolved. Therefore, it must be possible to filter, process, or rearrange the presented edit operations. 2) A listing does not always have to be the appropriate form of presentation. Do further suitable visualizations exist?
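
As an illustration of challenge 1), here is a minimal Python sketch of one conceivable filtering mechanism; the operation format (kind, element, payload) is a made-up stand-in for illustration, not the data model of our tooling.

```python
# Collapse consecutive "move" operations on the same element into the last
# one, so a block that was dragged twenty times appears as a single move.
from typing import NamedTuple

class EditOp(NamedTuple):
    kind: str       # e.g. "create", "move", "rename"
    element: str    # id of the affected model element
    payload: dict   # operation-specific data, e.g. the new position

def collapse_moves(history):
    filtered = []
    for op in history:
        if (filtered and op.kind == "move"
                and filtered[-1].kind == "move"
                and filtered[-1].element == op.element):
            filtered[-1] = op      # keep only the final position
        else:
            filtered.append(op)
    return filtered

history = [
    EditOp("create", "BlockA", {}),
    EditOp("move", "BlockA", {"x": 10, "y": 5}),
    EditOp("move", "BlockA", {"x": 42, "y": 7}),
    EditOp("rename", "BlockA", {"name": "Controller"}),
]
assert len(collapse_moves(history)) == 3
```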

**Tasks**

- Identify the user's needs regarding exploration of a model's history
- Develop filtering mechanisms for recorded model edit operations
- Develop multiple visualizations for recorded (and filtered) edit operations with the intention to support the comprehension of evolving models
- Evaluate your result with a user study

**Related Work**

- Enhancing Collaborative Modeling: *Summary of the vision of our new modeling approach.*
- An Introduction to Model Versioning: *Introduction to model versioning.*
- Event Sourcing: *Data persistence pattern that we use for versioning purposes.*
- A Domain-Specific Language and Interactive User Interface for Model-Driven Engineering of Technology Roadmaps: *Graphical roadmapping tool developed by our institute that we extend with our collaboration approach.*

**Contact**

**Context**

We are convinced that (graphical) modeling is a cooperative and explorative process done by domain experts. However, several studies have identified insufficient collaboration support in graphical modeling tools. Therefore, we are developing a new collaboration approach for graphical modeling tools that is based on the persistence of sequential edit operations (also called an operation-based approach). We aim to enable both synchronous and asynchronous collaboration and to improve the comprehensibility of evolving models.

**Problem**

In the context of synchronous collaboration, it can happen that two or more users make changes within the same area of a graphical model. This might lead to visually interfering edit operations, e.g., Alice edits the value of a property *Prop1* while Bob moves *BlockA*, which contains *Prop1*. Semantically conflicting edit operations, such as Bob deleting *BlockA* instead of moving it, can also have a negative impact on the user experience.

We assume that the negative impact of such interfering edit operations can be avoided by implicitly creating a temporary branch that delays the interfering edit operations of other users.

**Tasks**

- Systematically identify different types of interfering edit operations
- Develop an algorithm to detect interfering edit operations
- Implement a solution that uses your algorithm and is able to temporarily delay other users' edit operations
- Conduct a user study that evaluates the impact of your approach on the user experience

**Related Work**

- Enhancing Collaborative Modeling: *Summary of the vision of our new modeling approach.*
- An Introduction to Model Versioning: *Introduction to model versioning.*
- Event Sourcing: *Data persistence pattern that we use for versioning purposes.*
- A Domain-Specific Language and Interactive User Interface for Model-Driven Engineering of Technology Roadmaps: *Graphical roadmapping tool developed by our institute that we extend with our collaboration approach.*

**Contact**

## Quadcopter Lab

**Context**

rosslt is a C++ library plus an integration into ROS (Robot Operating System) that enables setting and changing values across component boundaries. In a simple robotics application, for example, waypoints could be created in one component, a trajectory through these waypoints could be planned in another component, and these trajectories could be visualized in a third component. Every computed value thereby depends on the original waypoints. The goal is to make these computed values modifiable by inverting the computations and tracing them back to the original data. To this end, the rosslt library tracks data flows at runtime and inverts them when needed.
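
To illustrate the core idea, here is a minimal Python sketch with an entirely hypothetical API (the real rosslt library is C++ and works on ROS messages): a derived value records the invertible operations that produced it, so assigning a new value to the result can be propagated back to the source.

```python
# Hypothetical sketch: a derived value remembers invertible operations, so a
# new value assigned to the result can be traced back to the original data.

class Tracked:
    def __init__(self, value, source=None, ops=None):
        self.value = value
        self.source = source       # the original value this one derives from
        self.ops = ops or []       # recorded operations, e.g. [("add", 1.0)]

    def add(self, c):
        return Tracked(self.value + c, self.source or self, self.ops + [("add", c)])

    def mul(self, c):
        return Tracked(self.value * c, self.source or self, self.ops + [("mul", c)])

    def set_value(self, new_value):
        v = new_value
        for op, c in reversed(self.ops):   # invert the recorded computation
            v = v - c if op == "add" else v / c
        (self.source or self).value = v
        self.value = new_value

waypoint = Tracked(2.0)                    # source value, e.g. a waypoint coordinate
trajectory = waypoint.add(1.0).mul(4.0)    # derived value: (2 + 1) * 4 = 12
trajectory.set_value(20.0)                 # edit the *derived* value ...
assert waypoint.value == 4.0               # ... and the source follows: 20 / 4 - 1
```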

**Problem**

The rosslt client library has so far only been implemented as a prototype in C++. An implementation in Python could potentially reduce the code changes necessary for tracking considerably. Python is a dynamically typed language that offers far better introspection capabilities than C++. Using aspect-oriented programming, it might even be possible to reduce the necessary changes to (almost) zero.

**Contact**

Thomas Witte

**Context**

rosslt is a C++ library plus an integration into ROS (Robot Operating System) that enables setting and changing values across component boundaries. In a simple robotics application, for example, waypoints could be created in one component, a trajectory through these waypoints could be planned in another component, and these trajectories could be visualized in a third component. Every computed value thereby depends on the original waypoints. The goal is to make these computed values modifiable by inverting the computations and tracing them back to the original data. To this end, the rosslt library tracks data flows at runtime and inverts them when needed.

**Problem**

The location tracking of ROS messages makes it possible to determine the origin (node, parameter) of individual messages. The goal of this thesis/project is to present this information graphically and visualize it interactively. Such a user interface should make it possible to subscribe to ROS topics at runtime, to display messages, and to modify them. The source location of a message should be used to apply changes in the originating node (e.g., setting parameters) in such a way that the change also affects subsequent messages accordingly. Additionally, the origin and history of every value in a message can be visualized in order to interactively debug data flows and computations.

**Contact**

Thomas Witte

**Context**

To avoid hazards and injuries, safety is of crucial importance when developing software for quadcopters. This is contrasted by a very heterogeneous software architecture that is composed of many different communicating components. These components may have been developed for a completely different application context, or safety-critical states may only emerge from the interaction of several components. By monitoring the system state and tracking data flows at runtime, a watchdog component shall be enabled to detect safety-critical situations, to explain their cause, and, if necessary, to automatically establish a safe state.

**Problem**

First, the existing quadcopter system shall be systematically analyzed and safety-critical states shall be identified. These could include, for example, value ranges, rate and latency of messages, internal states of software components (preconditions, postconditions, assertions), as well as the state of hardware components (e.g., battery level, processor load, network throughput).

In a second step, an extensible software component shall be developed that can monitor these safety-relevant states and react when the permissible range is exceeded. By analyzing the origin and history of invalid messages as well as the ROS node graph, debugging shall additionally be facilitated, explaining to the developer where the problem probably lies and how it might be solved.

**Contact**

Thomas Witte

**Context**

ROS (Robot Operating System) is a software framework for the development of robotics applications. The software consists of many independent components (often separate processes, which can even run distributed across several machines) that exchange messages with each other. This component-based architecture promotes the reuse of existing code and allows parts of the application to be started, stopped, or replaced at runtime.

**Problem**

As part of this work, it shall first be described and classified which reconfigurations are possible at runtime at all, ranging from parameter changes, which only permit predefined behavior changes foreseen by the developer, to the replacement of entire components, which enables arbitrary changes.

Subsequently, a software component shall be developed that makes reconfigurations of the application available via services of the application itself, so that the application can trigger changes to its own behavior. The execution of a reconfiguration shall be monitored to ensure that a consistent state exists afterwards or, if necessary, that a rollback restoring the old system state is possible. As part of the evaluation, the performance and correctness of such reconfigurations (e.g., time, jitter, lost messages) shall be analyzed.

**Literature**

Davide Brugali: Runtime reconfiguration of robot control systems: a ROS-based case study

**Contact**

Thomas Witte

## Relaxed Conformance Editing

**Context**

Probably everyone has had to create graphical models as part of their studies. Class diagrams, ER diagrams, state machines, and sequence diagrams are popular. There is also a variety of modeling tools for this purpose, although their usability usually "takes some getting used to". Among other things, this is because many modeling tools only allow the creation of correct models. This means, for example, that arrows must always be connected to a start and an end node. However, this type of modeling is often limiting or cumbersome to use.

In graphic drawing tools like Illustrator, Inkscape, or PowerPoint, this restriction does not exist. However, the drawn shapes cannot be recognized as model elements and processed further. One idea to counter this problem is so-called "relaxed conformance editing", which, as the name suggests, does not always require metamodel-conformant graphical representations.

**Problem**

As part of a master's thesis, an implementation of the concepts of relaxed conformance editing has been developed. This implementation, called CouchEdit, consists of a backend that implements the basic concepts and a prototypical frontend that serves to demonstrate the functions of the backend. The backend takes arbitrary graphical elements and attempts to abstract or parse the model from them, for example by considering spatial positions relative to each other. The gathered information is used to generate suggestions such as "Make this a component of the other element". Currently, these suggestions can only be accessed via the context menus of the elements. There are no visual hints that suggestions are available, nor is there a visualization of which elements a suggestion refers to. If there are multiple equal suggestions, the context menu shows the same entry multiple times, and there is no way to tell what effect each of them would have. In summary, proper visualizations for interacting with suggestions are lacking.

**Task / Goal**

- Develop visualization concepts for the interaction with relaxed conformance editing
- Implement the developed visualization concepts as a prototype or into the frontend of CouchEdit
- Evaluate the developed visualization concepts

**Related Work**

- Leander Nachreiner, Alexander Raschke, Michael Stegmaier, Matthias Tichy: *CouchEdit: A Relaxed Conformance Editing Approach*. MLE ’20: 2nd International Workshop on Modeling Language Engineering and Execution. DOI 10.1145/3417990.3421401

**Contact**

## Software Configuration

## Analyzing Product Lines with #SAT

**Context**

A #SAT solver computes the number of satisfying assignments of a propositional formula. This number is required for a variety of problems in different domains. For example, #SAT solvers enable a large number of analyses on feature models. However, #SAT is a computationally complex task and widely regarded as harder than SAT.
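
For illustration, a brute-force Python sketch of what a #SAT solver computes; real solvers avoid enumerating all 2^n assignments through techniques such as component caching.

```python
# Brute-force model counting over all 2**n assignments; clauses use the
# DIMACS convention (literal k = variable k, -k = its negation).
from itertools import product

def count_models(clauses, num_vars):
    count = 0
    for bits in product([False, True], repeat=num_vars):
        assignment = {v + 1: b for v, b in enumerate(bits)}
        if all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in clauses):
            count += 1
    return count

# (x1 or x2) and (not x1 or x2) has exactly the two models with x2 = True
assert count_models([[1, 2], [-1, 2]], num_vars=2) == 2
```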

**Problem**

Due to the hardness of #SAT, state-of-the-art solvers often do not scale to complex formulas. One possible solution are approximate #SAT solvers, which estimate the number of satisfying assignments. The runtime of such solvers heavily depends on the parameters used for the approximation. The goal is to identify effective parameters that achieve a strong trade-off between runtime and result quality.

**Tasks**

- Research approximate #SAT solvers
- Analyze parameters allowed by the solvers
- Find heuristic(s) for parametrization based on a feature model and the propositional formula representing it
- Evaluate heuristic(s)

**Related Work**

- A New Approach to Model Counting
- A Scalable Approximate Model Counter
- Evaluating #SAT Solvers on Industrial Feature Models
- Applications of #SAT Solvers on Feature Models

**Contact**

**Context**

A #SAT solver computes the number of satisfying assignments of a propositional formula. This number is required for a variety of problems in different domains. For example, #SAT solvers enable a large number of analyses on feature models. However, #SAT is a computationally complex task and widely regarded as harder than SAT.

**Problem**

Due to the hardness of #SAT, state-of-the-art solvers often do not scale to complex formulas. One possible solution are approximate #SAT solvers, which estimate the number of satisfying assignments. However, existing approximate #SAT solvers are optimized for other types of formulas. Developing approximations tailored to feature models may be beneficial.

**Tasks**

- Research approximate #SAT solvers
- Identify possible approximations for feature models
- Implement your own solver
- Evaluate its competitiveness against the state of the art

**Related Work**

- A New Approach to Model Counting
- A Scalable Approximate Model Counter
- Evaluating #SAT Solvers on Industrial Feature Models
- Applications of #SAT Solvers on Feature Models

**Contact**

**Context**

A #SAT solver computes the number of satisfying assignments of a propositional formula, which can be applied to feature models to compute the number of valid configurations. This enables a large variety of analyses (e.g., for supporting the development process). However, #SAT is a computationally expensive task.

**Problem**

There is generally no single SAT/#SAT solver that is superior to all others on all input formulas. For regular SAT, switching solvers depending on the input has proven to be very effective (see SATzilla or HordeSAT, for example). This has not been tried for #SAT yet. The goal of this thesis is to find effective metrics to select a solver and to develop a prototype of a meta #SAT solver.
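
A minimal Python sketch of the portfolio idea follows; the solver names, features, and thresholds are placeholders for illustration, where the thesis would use tuned or learned selection metrics.

```python
# Placeholder selector: cheap formula features decide which solver to run.
def formula_features(clauses, num_vars):
    return {
        "num_vars": num_vars,
        "clause_var_ratio": len(clauses) / max(num_vars, 1),
        "avg_clause_len": sum(map(len, clauses)) / max(len(clauses), 1),
    }

def select_solver(clauses, num_vars):
    f = formula_features(clauses, num_vars)
    if f["num_vars"] > 10_000:          # exact counting unlikely to scale
        return "approximate-counter"
    if f["clause_var_ratio"] < 2.0:     # hand-picked threshold, for illustration
        return "exact-counter-A"
    return "exact-counter-B"

assert select_solver([[1, 2], [-1, 3]], num_vars=3) == "exact-counter-A"
```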

**Tasks**

- Gather existing #SAT solvers
- Find effective parameters to select solvers
- Implement a prototype
- Empirically evaluate the implemented meta #SAT solver

**Related Work**

- sharpSAT - Counting Models with Advanced Component Caching and Implicit BCP
- SATzilla: Portfolio-based Algorithm Selection for SAT

**Contact**

## Knowledge Compilation

**Context**

Propositional formulas are prevalent in a variety of computer science domains. Such formulas are typically evaluated with SAT solvers, which commonly operate on conjunctive normal form (CNF) due to its beneficial properties.

**Problem**

A large variety of translation techniques to CNF has been considered in the literature; they typically result in semantically equivalent but different CNFs. It has been shown that the performance of SAT/#SAT solvers heavily depends on the resulting CNF. The goal of this work is to compare several CNF translation techniques. Furthermore, the runtime of SAT and #SAT solvers on the different CNFs shall be examined.
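
As an example of one standard translation technique, here is a minimal Python sketch of the Tseitin transformation, which introduces one auxiliary variable per subformula and stays linear in the formula size.

```python
# Tseitin transformation: each subformula gets a fresh variable x, and clauses
# encode x <-> subformula. The result is equisatisfiable, linear in size, and
# even preserves the model count, since every original model extends uniquely
# to the auxiliary variables.
from itertools import count

def tseitin(node, clauses, fresh):
    """node is ('var', i), ('and', l, r), or ('or', l, r); returns a literal."""
    if node[0] == "var":
        return node[1]
    a = tseitin(node[1], clauses, fresh)
    b = tseitin(node[2], clauses, fresh)
    x = next(fresh)
    if node[0] == "and":                  # x <-> (a and b)
        clauses += [[-x, a], [-x, b], [x, -a, -b]]
    else:                                 # x <-> (a or b)
        clauses += [[-x, a, b], [x, -a], [x, -b]]
    return x

formula = ("or", ("and", ("var", 1), ("var", 2)), ("var", 3))   # (x1 ∧ x2) ∨ x3
clauses, fresh = [], count(4)             # variables 1..3 belong to the formula
root = tseitin(formula, clauses, fresh)
clauses.append([root])                    # assert the formula itself
print(clauses)  # [[-4, 1], [-4, 2], [4, -1, -2], [-5, 4, 3], [5, -4], [5, -3], [5]]
```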

**Tasks**

- Research on CNF translation techniques
- Implement CNF translation techniques
- Implement comparison benchmark for CNFs

**Related Work**

**Contact**

**Context**

Propositional formulas are prevalent in a variety of computer science domains. Such formulas are typically evaluated with SAT/#SAT solvers.

**Problem**

Semantically equivalent formulas are not necessarily equally easy to solve. Thus, many preprocessing techniques that adapt a formula without changing its semantics have been proposed for SAT. These have not been widely employed for #SAT, and their effectiveness there is thus unknown.
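
For illustration, a Python sketch of one preprocessing that is safe for #SAT, unit propagation: a unit clause forces its literal in every model, so applying it preserves the model count (conflict handling is omitted in the sketch). Pure-literal elimination, in contrast, preserves satisfiability but not the count, which is exactly why SAT preprocessings cannot be reused blindly.

```python
# Unit propagation: repeatedly pick a unit clause, record the forced literal,
# drop satisfied clauses, and strip the falsified literal from the rest.
def unit_propagate(clauses):
    clauses = [list(c) for c in clauses]
    forced = set()
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            return clauses, forced
        forced.add(unit)
        simplified = []
        for clause in clauses:
            if unit in clause:
                continue                          # clause satisfied, drop it
            simplified.append([l for l in clause if l != -unit])
        clauses = simplified

remaining, forced = unit_propagate([[1], [-1, 2], [2, 3]])
assert forced == {1, 2} and remaining == []       # only x3 remains free
```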

**Tasks**

- Identify existing preprocessings that may be suitable for #SAT
- Optionally, create your own preprocessing
- Implement benchmark framework for preprocessings
- Evaluate effectiveness of preprocessings
- Identify metrics to select preprocessings based on input

**Related Work**

**Contact**

**Context**

A d-DNNF is a normal form for propositional formulas that allows faster queries for certain analyses. For example, a d-DNNF representing a feature model allows computing the number of valid configurations in polynomial time. However, compiling a formula, and thus a feature model, to d-DNNF is computationally expensive.
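
The following minimal Python sketch illustrates why counting is cheap on a (smooth) d-DNNF: decomposable AND nodes multiply the counts of their children, deterministic OR nodes add them, so a single bottom-up pass suffices.

```python
# Counting on a smooth d-DNNF: AND children range over disjoint variables
# (counts multiply), OR children have disjoint model sets (counts add).
def count(node):
    kind = node[0]
    if kind == "lit":                    # ('lit', l): one model of its variable
        return 1
    if kind == "and":
        result = 1
        for child in node[1]:
            result *= count(child)
        return result
    if kind == "or":
        return sum(count(child) for child in node[1])
    raise ValueError(kind)

# (x1 ∧ x2) ∨ (¬x1 ∧ (x2 ∨ ¬x2)), i.e. (x1 ∧ x2) ∨ ¬x1, in smoothed form
ddnnf = ("or", [
    ("and", [("lit", 1), ("lit", 2)]),
    ("and", [("lit", -1), ("or", [("lit", 2), ("lit", -2)])]),
])
assert count(ddnnf) == 3   # models over {x1, x2}: (T,T), (F,T), (F,F)
```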

**Problem**

Feature models typically evolve over time. After a change, it may be beneficial to adapt the existing d-DNNF instead of compiling the entire formula again. Your task is to incrementally adapt an existing d-DNNF according to changes in the feature model.

**Tasks**

- Handle changes performed on CNF incrementally
- Handle changes performed on feature model incrementally
- Implement prototype
- Empirically evaluate the benefits

**Related Work**

**Contact**

**Context**

Binary decision diagrams (BDDs) represent Boolean functions as directed acyclic graphs. The order of variables in these graphs is crucial to their size. Some orderings may result in a graph of linear size, while others can cause exponential size. For the practical use of BDDs, for instance for the analysis of feature models, one is interested in a near-optimal variable order. Boolean functions in conjunctive normal form (CNF) can be interpreted as (weighted) graphs by translating clauses or variables to nodes or edges.
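
For illustration, a small Python sketch of this graph interpretation together with a naive degree-based ordering heuristic, which stands in here for the graph algorithms to be investigated.

```python
# Variables become nodes; an edge (u, v) is weighted by the number of clauses
# in which u and v occur together. A naive heuristic then orders variables by
# weighted degree, placing strongly connected variables first.
from collections import defaultdict
from itertools import combinations

def interaction_graph(clauses):
    weight = defaultdict(int)
    for clause in clauses:
        for u, v in combinations(sorted({abs(l) for l in clause}), 2):
            weight[(u, v)] += 1
    return weight

def order_by_degree(clauses):
    degree = defaultdict(int)
    for (u, v), w in interaction_graph(clauses).items():
        degree[u] += w
        degree[v] += w
    return sorted(degree, key=degree.get, reverse=True)

cnf = [[1, -2], [2, 3], [1, 3], [3, -4]]
assert order_by_degree(cnf) == [3, 1, 2, 4]   # variable 3 interacts the most
```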

**Problem**

This thesis should research approaches applying algorithms from graph theory (e.g., chain decomposition) to graphs derived from Boolean functions in CNF in order to generate advantageous variable orders.

**Tasks**

- Investigate the feasibility of exploiting graph analyses to generate variable orderings
- Implement the algorithms into our BDD framework ddueruem*
- Evaluate the impact of the algorithms on BDD construction performance

**Related Work**

**Contact**

Tobias Heß, Thomas Thüm

Individual points of emphasis and own ideas are highly appreciated.

* ddueruem is written in Python, prior experience with Python is helpful but not required.

**Context**

Binary decision diagrams (BDDs) represent Boolean functions as directed acyclic graphs and enable a multitude of operations (e.g., negation) and analyses (e.g., SAT, #SAT, existential quantification, QBF) to be computed efficiently. In exchange, the construction of BDDs can be notoriously difficult. Therefore, it is desirable to reuse already computed BDDs as much as possible and to prefer updates to existing BDDs over the construction of new ones.

**Problem**

This thesis aims at researching the feasibility of constructing and updating BDDs incrementally. In particular, algorithms facilitating update operations need to be devised, verified, and evaluated.

**Tasks**

- Develop algorithms for adding and removing constraints from BDDs.
- Implement the algorithms into our BDD framework ddueruem* along with appropriate supporting features (e.g., caching strategies).
- Evaluate the performance of the incremental approach compared to classic construction techniques.

**Related Work**

**Contact**

Tobias Heß, Thomas Thüm

Individual points of emphasis and own ideas are highly appreciated.

* ddueruem is written in Python, prior experience with Python is helpful but not required.

**Context**

Binary decision diagrams (BDDs) represent Boolean functions as directed acyclic graphs and enable a multitude of operations (e.g., negation) and analyses (e.g., SAT, #SAT, existential quantification, QBF) to be computed efficiently. In exchange, the construction of BDDs can be notoriously difficult. Efficient, finely-synchronized data structures are needed to allow parallelized BDD construction to flourish.

**Problem**

This thesis aims at developing a thread-safe, finely-synchronized hash table for storing BDDs in *safe*\* Rust.

**Tasks**

- Explore different hashing concepts and implementations and define requirements for the data structure.
- Implement a proof of concept in our prototype OBDDimal.
- Make the implementation as safe and efficient as possible and document design decisions as well as unsafe parts for which there is no alternative.

**Related Work**

**Contact**

Tobias Heß, Thomas Thüm

Individual points of emphasis and own ideas are highly appreciated.

* as much as possible ;-)

ddueruem-web is a RESTful (web)service for ddueruem, which acts as an interface to (legacy) BDD libraries. The backend of ddueruem-web is written in Django (Python) and utilizes Docker, Singularity, and git under the hood.

The goal of this project is to develop a frontend for ddueruem-web using a framework of your choice (e.g., vue.js, WebAssembly, React, ...) and harnessing the RESTful backend.

Individual points of emphasis and own ideas are highly appreciated.

The project can be conducted as an 8, 12, or 16 LP project.

**Contact**

Tobias Heß

ddueruem-web is a RESTful (web)service for ddueruem, which acts as an interface to (legacy) BDD libraries. FeatureIDE is an IDE for model-driven software development that is principally developed by our working group.

The goal of this project is the integration of the two tools via the RESTful API of ddueruem-web. Work on this project may also be conducted as part of a Bachelor's thesis.

Individual points of emphasis and own ideas are highly appreciated.

The project can be conducted as an 8, 12, or 16 LP project.

**Contact**

Tobias Heß, Thomas Thüm

## Constraint Programming and Constraint Handling Rules

## Constraint Handling Rules

We are developing a rule-based implementation of a tool to analyse and generate graphs. It is used in the domain of mason's marks. For thousands of years, stonemasons have been inscribing these symbolic signs on dressed stone. Geometrically, mason's marks are line drawings. They consist of a pattern of straight lines, sometimes circles and arcs. We represent mason's marks by connected planar graphs. Our prototype tool for the analysis and generation of graphs is written in the rule-based declarative language Constraint Handling Rules. One or several of the following features could be improved in this proposed work:

**Goals/Tasks**

- improve the vertex-centric logical graph representation, i.e. adding arcs, colors, labels,
- encode existing mason marks either by hand, from images or import from existing databases,
- recognize (sub)graphs and patterns in a graph, in particular (self-)similarities and correlations between graphs,
- perform classical algorithms on graphs like shortest paths,
- derive properties and statistics from graphs,
- generate (randomly or exhaustively) and draw graphs from given constrained subgraphs based on properties and statistics.

**References**

**Prerequisites**

- Good knowledge of Prolog and CHR
- Lecture Rule-based Programming

**Contact**

When algorithms are written in CHR, constraints represent both data and operations. CHR is already incremental by nature, i.e. constraints can be added at runtime. Logical retraction adds decrementality. Hence any algorithm written in CHR with justifications will become fully dynamic. Operations can be undone and data can be removed at any point in the computation without compromising the correctness of the result.

A straightforward source-to-source transformation can introduce justifications for user-defined constraints into the CHR program. Then a scheme of two rules suffices to allow for logical retraction (deletion, removal) of constraints during computation. Without the need to recompute from scratch, these rules remove not only the constraint but also undo all consequences of the rule applications that involved the constraint.

**Goal**

Further work should investigate implementation, dynamic algorithms and application domains of CHR with justifications:

- research how logical as well as classical algorithms implemented in CHR behave when they become dynamic.
- improve the implementation, optimize and benchmark it.
- support detection and repair of inconsistencies (for error diagnosis),
- support nonmonotonic logical behaviors (e.g. default logic, abduction, defeasible reasoning).

**References**

- Thom Frühwirth: Justifications in Constraint Handling Rules for Logical Retraction in Dynamic Algorithms.
- CHR translator

**Prerequisites**

- Good knowledge of Prolog and CHR
- Lecture Rule-based Programming
- Interest to learn about formal analysis methods of rule-based languages

**Contact**

Extend the analysis techniques and/or the associated tool from the following two research papers:

A dynamic program analysis of the non-termination problem for recursion in the Constraint Handling Rules (CHR) language: A simple program transformation for recursive rules in CHR was introduced that produces one or more adversary rules. When the rules are executed together, a non-terminating computation may arise. It was shown that any non-terminating computation of the original rule contains this witness computation.

**References**

- Thom Fruehwirth: A Devil's Advocate against Termination of Direct Recursion, PPDP 2015.
- Transformation Tool available (use "Devil" options).

A static program analysis of the non-termination problem for recursion in the Constraint Handling Rules (CHR) language: Theorems with so-called misbehavior conditions for potential non-termination and failure (as well as definite termination) of linear direct recursive simplification rules are given. Logical relationships between the constraints in a recursive rule play a crucial role in this kind of program analysis.

**References**

**Prerequisites**

- Lecture "Rule-Based Programming"

**Contact**

In distributed computation, data and processes are distributed over a network of stores and processing units. In a constraint-based programming language paradigm this means that constraints have to be annotated with spatial information defining their whereabouts. Obvious topologies are a distinction between global and local stores as well as trees. Localized constraints can also be used for so-called reified (or meta-)constraints (e.g. https://sicstus.sics.se/sicstus/docs/4.0.8/html/sicstus/Reified-Constraints.html), to store justifications and for spatial reasoning.

In Constraint Handling Rules (CHR), there is a simple source-to-source program transformation that adds local annotations to constraints.

The scope of the work includes the implementation of such a transformation, its application, and/or static program analysis to derive distribution patterns, i.e., to localize constraint computation while minimizing communication overhead.

**References**

- Edmund S. L. Lam, Iliano Cervesato and Nabeeha Fatima: CoMingle: Distributed Logic Programming for Decentralized Mobile Ensembles. In proceedings of the International Conference on Distributed Computing Techniques (Coordination'15)
- A. Raffaeta and T. Frühwirth: Spatio-Temporal Annotated Constraint Logic Programming, Third International Symposium on Practical Aspects of Declarative Languages (PADL'01), Las Vegas, USA, March 2001.
- T. Frühwirth: Entailment Simplification and Constraint Constructors for User-Defined Constraints, Third Workshop on Constraint Logic Programming (WCLP 93), Marseille, France, March 1993.

**Prerequisites**

- Lecture "Rule-Based Programming"

**Contact**

Invariants (or assertions, properties, conditions) annotate program text and express static and dynamic properties of a program's execution. Invariants can be expressed as logical relations (predicates) over the program's variables. In the context of constraint-programming and Constraint Handling Rules (CHR), they amount to constraints. These can be readily added to the program to enforce the invariants. By comparing the program with and without invariants expressed as constraints using established program analysis techniques for CHR, namely confluence and program equivalence, we can check if the invariants hold in the program.

Furthermore, invariants can be strengthened and even generated by adapting the so-called completion method (which is normally used to generate additional rules to make a CHR program confluent).

**References**

Program slicing is a program analysis technique whereby one extracts properties and relationships of variables in a program by removing from the program all statements that do not affect the assignments of those variables. In the context of constraint programming and Constraint Handling Rules, which deal with logical relations (predicates), this amounts to the logical operation of variable projection. This means that we remove unwanted variables that are not of interest from the program by program transformation. This transformation can be accomplished by adapting the technique of completion, which is usually used to make a non-confluent program confluent.

**References**

**Prerequisites**

- Lecture "Rule-based Programming

**Contact**

Prolog (WAM) and then Java (JVM) popularized the concept of an abstract (or virtual) machine to implement programming languages in a systematic, portable yet efficient way. Such a machine shall be developed for CHR.

**Goals**

Define the abstract code instructions for CHR and implement them.

**Prerequisites**

- Good knowledge of Prolog and CHR
- Lecture Rule-based Programming
- Lecture Compiler Construction (Compilerbau) (optional but very helpful)

**Contact**

Repeated recursion unfolding is a new approach that repeatedly unfolds a recursion with itself and simplifies it while keeping all unfolded rules. Each unfolding doubles the number of recursive steps covered. This reduces the number of recursive rule applications to its logarithm at the expense of introducing a logarithmic number of unfolded rules to the program. Efficiency crucially depends on the amount of simplification inside the unfolded rules. A super-linear speedup is provably possible in the best case, i.e. speedup by more than a constant factor. The optimization can lower the time complexity class of a program.
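
The following Python sketch illustrates the scheme on a toy recursion (summing 1..n); the simplification step here is pure arithmetic, and applying the unfolded rules from largest to smallest yields O(log n) rule applications.

```python
# Toy recursion: sum(n) = n + sum(n-1). The rule unfolded k times and
# simplified covers s = 2**k steps at once: the body n + (n-1) + ... + (n-s+1)
# collapses to s*n - s*(s-1)//2. Greedily applying the largest applicable
# unfolded rule needs only O(log n) rule applications instead of O(n).

def sum_naive(n):
    return 0 if n == 0 else n + sum_naive(n - 1)

def sum_unfolded(n):
    total = 0
    k = max(n.bit_length() - 1, 0)     # index of the largest unfolded rule
    while n > 0:
        s = 1 << k                     # this rule covers s = 2**k recursive steps
        if s <= n:
            total += s * n - s * (s - 1) // 2   # simplified rule body
            n -= s
        else:
            k -= 1                     # try the next smaller unfolded rule
    return total

assert sum_unfolded(500) == sum_naive(500) == 125250
```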

**Goals**

The goal is to implement this optimization scheme as a program transformation in the programming language of choice. If necessary, the scheme should be transferred from recursion to iteration constructs such as loops.

**References**

**Prerequisite**

- Excellent knowledge and programming skills in the chosen programming language.

**Contact**

The declarative programming language Constraint Handling Rules (CHR) is designed as a language extension to other, not necessarily declarative programming languages. There are existing implementations for *Prolog*, *C*, *Java*, *JavaScript*, and others. Since there are still plenty of languages for which there is no implementation, or at least no maintained one, and since the problems that can be tackled using CHR are plentiful (think of any problem involving the simplification of constraints, or the derivation of new information from existing information), we want to work on spreading CHR to those languages as well, by designing and (prototypically) implementing runtime environments (RTEs) for CHR in languages still lacking a CHR implementation.
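
For a first intuition, here is a heavily simplified Python sketch of what executing CHR in a host language boils down to: a constraint store plus exhaustive rule application. The classic CHR gcd program is hard-coded here; a real RTE would provide generic matching, logical variables, and a propagation history.

```python
# The CHR gcd program, hard-coded:
#   gcd(0) <=> true.
#   gcd(N) \ gcd(M) <=> N > 0, M >= N | gcd(M - N).
def chr_gcd(store):
    """store is a multiset of gcd/1 constraints, e.g. [12, 8]."""
    changed = True
    while changed:
        changed = False
        if 0 in store:                     # rule 1 removes gcd(0)
            store.remove(0)
            changed = True
            continue
        for i, n in enumerate(store):      # rule 2 keeps gcd(N), rewrites gcd(M)
            for j, m in enumerate(store):
                if i != j and 0 < n <= m:
                    store[j] = m - n
                    changed = True
                    break
            if changed:
                break
    return store

assert chr_gcd([12, 8]) == [4]
```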

**Goals**

- Design a runtime environment for CHR, which is able to deal with the specific features of the language of your choice. This may include implementing logical variables if they are not already a feature of the language.
- Design translation schemes from CHR to your language of choice
- Implement a prototype of your RTE, and test it extensively with (manually) translated CHR programs, as well as unit tests.
- (**additionally for Master's theses/projects**) Implement an embedding of CHR into your language of choice. This may be a compiler, a macro system, or an embedded domain-specific language (EDSL); whatever your language offers in this regard.

**References**

- P. Van Weert: Efficient Lazy Evaluation of Rule-Based Programs
- P. Van Weert, P. Wuille, T. Schrijvers, B. Demoen: CHR for Imperative Host Languages
- F. Nogatz, T. Frühwirth, D. Seipel: CHR.js: A CHR Implementation in JavaScript
- Dragan Ivanović: Implementing Constraint Handling Rules as a Domain-Specific Language Embedded in Java
- CHR(Python) prototype

**Prerequisites**

- Interest in programming language design and (to some extent) compiler construction.
- Good programming skills and knowledge of the inner workings of CHR and the programming language of your choice.

**Contact**

Data-Oriented Programming (DOP) is a programming paradigm that evolved from Object-Oriented Programming (OOP) and Functional Programming (FP). The main principles of DOP are

- the separation of code and data (in contrast to OOP), i.e., functions are not rigidly linked to a user-defined data type
- data is always immutable
- the use of generic data structures instead of user-defined data types, e.g., instead of defining a data type (class) with different fields, dictionaries (heterogeneous maps) are used

Those principles make dynamically typed languages like *LISP* dialects or *JavaScript* derivates ideal for this paradigm, and in fact, a major representative of data-oriented languages is the fairly modern *LISP* dialect *Clojure*. On the other hand, it is almost impossible to stay true to all of these principles in most statically typed languages like *Java* or *Haskell* (mostly because of the use of generic data structures).

This comes at a price, though: since we use dynamic typing, we can never guarantee at compile time that a dictionary, for example, has the field a function needs to work with it. Languages like *TypeScript* or *PureScript* solve this by being able to give partial type annotations for JSON-like objects. Such an approach can be interpreted as a constraint-based type system, where a data object has to satisfy constraints of the kind *"has field X of type T"*, where *X* is a key of some sort and *T* a data type.
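
For illustration, a tiny Python sketch of this constraint view, checking data objects against "has field X of type T" constraints at runtime; the Prolog/CHR system envisioned in this topic would perform such checks symbolically at compile time.

```python
# A "type" is a set of "has field X of type T" constraints; a data object
# satisfies the type if every constraint holds. Extra fields are allowed.
Point = {"x": float, "y": float}

def satisfies(data, constraints):
    return all(field in data and isinstance(data[field], ftype)
               for field, ftype in constraints.items())

assert satisfies({"x": 1.0, "y": 2.0, "color": "red"}, Point)
assert not satisfies({"x": 1.0}, Point)        # field y is missing
```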

Since *Prolog* also offers dictionaries, is dynamically typed, and offers a powerful macro system, it is a perfect playground to experiment with such an approach. Thus, we want to implement an annotation system for *Prolog* predicates that gives the programmer the option to add type annotations to a predicate, which are checked at compile time and then, if successfully checked, removed.

**Goals**

- Design a constraint-based type system for DOP in *Prolog*
- Implement a type checker for said type system using *Prolog* and *Constraint Handling Rules*
- Design user-friendly type annotations for this type system

**Prerequisites**

- Lecture Rule-Based Programming (strongly recommended)

**References**

- The Joy of Clojure, Chapter 14: Data Oriented Programming
- SWI-Prolog Manual: Dicts: structures with named arguments

**Contact**

## Empirical Software Engineering

## Conducting and Reporting Empirical Studies

**Context**

Empirical Software Engineering (ESE) plays an increasingly important role in software engineering (SE) with regard to the evaluation of software artifacts, technologies, and processes. For reporting significant results, it is fundamentally important to select and correctly apply appropriate statistical methods, in addition to an appropriate research method (e.g., experiment, case study, etc.), for the particular research question to be answered.

**Problem**

It can be assumed that many researchers who use statistical methods in the context of ESE research are not trained statisticians. In order to investigate this and to improve the quality of ESE research with respect to the descriptive and inferential statistical methods used, it is first necessary to understand a) to what extent statistical methods are currently applied, b) in which context they are applied (e.g., in the context of which research methods), c) whether and how their selection is reported, and d) whether they are applied correctly. Findings on these questions should provide insight into possible sources of error. Based on this, suggestions could be developed on how to support researchers in the application of statistical methods specifically related to ESE research.

**Tasks**

- Conducting a systematic literature review (SLR), i.e., selection and analysis of existing work related to ESE research
- Development of a taxonomy of typical deficiencies regarding the use of statistical methods in ESE research papers
- In conjunction with this, building up a relevant literature collection and indexing it (supplementary material).
- Design and development of supportive guidance for the appropriate selection and application of statistical methods related to ESE research

**Related Work**

- Zhang, L., Tian, JH., Jiang, J. et al. Empirical Research in Software Engineering — A Literature Survey. J. Comput. Sci. Technol. 33, 876–899 (2018).
*https://doi.org/10.1007/s11390-018-1864-x*

**Contact**

## Software Evolution

## Development of Multi-Variant Software

**Context**

Feature traces identify the implementation of features in the source code of a software system. *Feature location* is the process of recovering lost information on feature traces. One possible method to recover feature traces is *variability mining* [1], where dependencies between types in the used programming language are inspected (semi-)automatically. For example, if we know that a function implements a feature *A*, we know that all calls to that function also have to trace to *A* or to a formula stronger than *A* (i.e., a function call has to be mapped to a feature formula *X* that satisfies the constraint *X ⇒ A*). Thus, we ensure that whenever we call that function in a software variant, that function is also present.
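
For illustration, a brute-force Python sketch of the consistency check *X ⇒ A*; formulas are represented here as Python predicates over an assignment of feature names, where real tooling would use a SAT solver.

```python
# implies(x, a) checks that X => A holds in every configuration, i.e., every
# variant that contains the call site also contains the called function.
from itertools import product

def implies(x, a, features):
    for bits in product([False, True], repeat=len(features)):
        env = dict(zip(features, bits))
        if x(env) and not a(env):
            return False       # a variant contains the call but not the callee
    return True

A = lambda env: env["A"]
X = lambda env: env["A"] and env["B"]     # stronger than A

assert implies(X, A, ["A", "B"])          # calls mapped to X may call A's code
assert not implies(A, X, ["A", "B"])      # the converse does not hold
```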

**Problem**

So far, type checking for variability mining has only been used for software product lines. In practice, developers often implement variability by copying and adapting source code, known as clone-and-own. To support teams developing variability with variants instead of an integrated software product line, we want to adapt the concepts of variability mining to variants. Our goal is to detect type errors not only in a single variant but also in other or future variants.

**Task / Goal**

- Integrate the variability mining software LEADT [1] into our VariantSync prototype IDE.
- Evaluate your results on existing software projects as in previous work on variability mining.

**Related Work**

[1] Variability Mining: Consistent Semi-Automatic Detection of Product-Line Features

**Recommended but Optional Prerequisites**

- Having attended the lecture on Software Product Lines

**Contact**

**Context**

Feature traces identify the implementation of certain features in the implementation of the software. Feature trace recording is a novel methodology to document feature traces while developers are programming. To this end, developers specify the feature they are working on as a propositional formula, referred to as the **feature context**. For instance, source code that communicates with the Linux operating system would belong to the feature interaction *Linux ∧ ¬Windows ∧ ¬MacOS*. As an example, consider a developer working on the *pop* method of a *stack* in Java: for each edit to the source code, the developer describes with the feature context which feature they are changing.

**Problem**

Today's standard for tracking source code evolution are version control systems such as Git and Subversion. Histories in version control systems are thus the most abundant datasets for evaluating and employing feature trace recording. However, although intended by version control, commits are rarely disciplined in the sense that each commit captures exactly one change to a feature. Sometimes, even huge refactorings or renamings affecting the entire code base occur in a single commit.

**Task / Goal**

- Literature Survey
- Find a way to detect if a commit is disciplined (i.e., if it touches exactly one feature or feature interaction).
- Investigate if and how undisciplined commits can be divided into smaller disciplined edits.
- Evaluate or prove that your division maintains consistency
- Evaluate the usefulness of your method: How often are commits undisciplined? How many undisciplined commits can be made disciplined?

**Related Work**

- Feature Trace Recording
- SmartCommit: A Graph-based Interactive Assistant for Activity-Oriented Commits

**Recommended but Optional Prerequisites**

- Having attended the lecture on Software Product Lines

**Contact**

## Differencing and Merging

**Context**

Many tasks need to assess the changes made to development artefacts. A common way to detect changes are diffing operators that take the old and the new state of an artefact as input and yield an **edit script** (i.e., a list of edits that transform the old state into the new state). For example, git diff computes a list of source code lines that were *inserted* and a list of lines that were *removed*.
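
For illustration, a minimal Python sketch that derives such a line-based edit script with the standard library's difflib; the code lines being compared are made-up examples, and tree differs yield analogous scripts over AST nodes.

```python
# difflib yields opcodes from which the inserted and removed lines of an
# edit script can be read off.
from difflib import SequenceMatcher

old = ["int pop() {", "  return top;", "}"]
new = ["int pop() {", "  size--;", "  return elems[size];", "}"]

script = []
for op, i1, i2, j1, j2 in SequenceMatcher(a=old, b=new).get_opcodes():
    if op in ("delete", "replace"):
        script += [("remove", line) for line in old[i1:i2]]
    if op in ("insert", "replace"):
        script += [("insert", line) for line in new[j1:j2]]

assert script == [
    ("remove", "  return top;"),
    ("insert", "  size--;"),
    ("insert", "  return elems[size];"),
]
```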

**Problem**

When computing such an edit script, the edits are indeed a valid transformation but might not reflect the actual changes made by the developers. In this sense, the edits might diverge from what developers actually did. This becomes more prominent when looking at diffs for abstract syntax trees (ASTs), where edit scripts consist of possibly very fine-grained tree operations.

To this end, we classified semantic edits as edit operations that describe developers' intents more accurately when editing ASTs.

**Questions**

- Can we lift edit scripts to semantic-edit scripts?
- Do semantic edits represent developers' actions more accurately?
- Do existing tree diffing algorithms yield edits impossible for programmers to execute on the concrete syntax?
- Are semantic edits sufficient to assess developers’ intents?
- Does semantic lifting increase diffing accuracy?
- Can we use state-based diff (based on set operations) to lift edit scripts?

**Goal**

Create an algorithm for semantic lifting of edit scripts obtained from off-the-shelf tree differs to semantic edits. Optionally, extend the notion of semantic edits for increased accuracy. Evaluate your results on existing diffs.

**Related Work**

- Fine-Grained and Accurate Source Code Differencing (also on GitHub)
- Change Detection in Hierarchically Structured Information
- A Rule-Based Approach to the Semantic Lifting of Model Differences in the Context of Model Versioning
- Master's Thesis of Paul Bittner

**Contact**