Analysing 2020 BPI Challenge Data Using Machine Learning: A Decision Tree Approach

Ulm University

[SAPS] Project final presentation, Sebastian Schäfer, Location: Online, Date: 30.03.2021, Time: 15:00

The BPI Challenge 2020 provides a data set together with a set of research questions to be answered by applying analytical algorithms to the data. This report describes an attempt to answer one of these questions, which concerns determining the properties of specific classes of data. It uses a machine learning approach based on decision trees. The decision trees are intended both to split the data according to the target classes and to reveal information about the properties of these classes. The reader is first introduced to the theoretical foundations of decision trees, in particular the CART algorithm used here. The report then describes the data set and prepares the input data for the decision tree learning algorithm. This pre-processed data is used to learn a decision tree, which is then evaluated, and an attempt is made to optimize the result. We describe how some properties of a specific class of data can be inferred from this optimized tree. In closing, ideas for improving the results are presented. Among them is reducing the decision tree learning problem to a simpler one that examines only one class of data at a time rather than all classes at once, and repeating the process multiple times.
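The following Python sketch makes the split criterion concrete. It shows how a CART-style learner could choose a binary split on numeric features by minimizing weighted Gini impurity. It also shows the one-class-at-a-time relabelling mentioned as a possible improvement. This is an illustrative sketch under stated assumptions, not the implementation used in this report. The helper names (`gini`, `best_split`, `one_vs_rest`) and the toy data are hypothetical. Categorical features, stopping rules and pruning are omitted.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(
    rows: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> Tuple[Optional[int], Optional[float], float]:
    """Return (feature index, threshold, weighted impurity) of the best binary
    split of the form x[feature] <= threshold (numeric features only).

    Hypothetical helper: a split is only accepted if it lowers the
    impurity of the parent node."""
    if not rows:
        return (None, None, 0.0)
    n = len(rows)
    best: Tuple[Optional[int], Optional[float], float] = (None, None, gini(labels))
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            # Impurity of the two children, weighted by their sizes.
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


def one_vs_rest(labels: Sequence[Hashable], target: Hashable) -> List[Hashable]:
    """Hypothetical helper: relabel a multi-class problem as target vs. 'other'."""
    return [y if y == target else "other" for y in labels]
```

For example, `best_split([[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0]], ["a", "a", "b", "b"])` returns `(0, 2.5, 0.0)`. Thresholding the first feature at 2.5 separates the two classes perfectly. Applying `one_vs_rest` before learning would turn each class into its own two-class problem, which could then be learned and inspected one class at a time.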