Information Extraction

Extraction of information from technical due diligence reports to build a structured knowledge base

For more and more companies, digital transformation is an integral part of their corporate strategy, as many see it as a key to future success. And rightly so: practice shows that a focus on digitisation has a positive impact on a company's development. The real estate sector also wants to use digital transformation to strengthen its market position in the German economy and hold its own against disruptive competitors. Profound changes such as online search portals and virtual tours already play a crucial role in the day-to-day work of these companies. In addition, e-archives, cloud computing and the Internet of Things (IoT) are becoming increasingly important.

These changes also bring challenges: they make day-to-day work more difficult, and employees are less and less able to concentrate on their regular tasks. For this reason, an international consulting firm for the construction and real estate sector, headquartered in Germany, set itself the goal of testing to what extent a new, intelligent system can analyse existing digital technical due diligence (TDD) reports and automatically extract selected technically relevant information (e.g., the cut-off date of the building inspection) from their textual content. The particular challenge lies in the inconsistent structure of the TDD reports, which can differ in structure, layout and content even within a single location of the consulting firm. Currently, employees have to capture this information manually from the text of the TDD reports in a laborious and time-consuming process. Given the large and ever-increasing number of TDD reports, it is becoming increasingly difficult for the consulting firm to extract all selected information consistently, carefully and in a reasonable amount of time. This is essential, however, especially for the timely development of a structured knowledge base that makes the information from the TDD reports available in machine-readable form for further analyses.
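To make the task concrete, the following minimal sketch shows what extracting one field, such as the inspection date, from free text into a machine-readable record might look like. It is a simple rule-based illustration only, not the method used in the project: the keyword "inspection", the DD.MM.YYYY date format, the function names and the record layout are all assumptions made for this example.

```python
import re
from datetime import date
from typing import Optional

# Assumed pattern (hypothetical): the keyword "inspection" followed, within
# the same sentence and at most 80 characters later, by a DD.MM.YYYY date.
INSPECTION_DATE = re.compile(
    r"inspection[^.\n]{0,80}?(\d{1,2})\.(\d{1,2})\.(\d{4})",
    re.IGNORECASE,
)


def extract_inspection_date(text: str) -> Optional[date]:
    """Return the first date that follows the keyword, or None if absent."""
    match = INSPECTION_DATE.search(text)
    if match is None:
        return None
    day, month, year = (int(group) for group in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a valid calendar date.
        return None


def to_record(report_id: str, text: str) -> dict:
    """Build one knowledge-base entry (hypothetical layout) for a report."""
    found = extract_inspection_date(text)
    return {
        "report_id": report_id,
        "inspection_date": found.isoformat() if found else None,
    }
```

A fixed pattern like this breaks as soon as the wording or layout of a report changes, which is exactly the difficulty described above and the reason more flexible extraction methods are needed.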

To carry out this project, a system was developed in cooperation with the Institute of Business Analytics that analyses the digital TDD reports automatically and extracts the desired technical information from their textual content. For this purpose, artificial intelligence methods for information extraction were designed and implemented. Finally, the quality of the finished system was evaluated to confirm that it works as intended. The system found more than 93% of the information sought (recall), and 98% of the information it extracted was correct (precision).
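The two quality measures named above can be sketched as follows. The function compares the set of items a system extracted with the set of items a human annotator expected. The function name and the toy data are invented for illustration and are not taken from the project's evaluation.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str], expected: Set[str]) -> Tuple[float, float]:
    """Compute precision and recall for one set of extracted items.

    precision = correct extractions / all extractions
    recall    = correct extractions / all expected items
    """
    correct = len(extracted & expected)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


# Toy data, invented for this example: four facts were expected,
# three were extracted, and two of those three are correct.
expected_items = {"inspection_date", "building_year", "floor_area", "roof_type"}
extracted_items = {"inspection_date", "floor_area", "owner_name"}

precision, recall = precision_recall(extracted_items, expected_items)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

High precision means that little of what the system returns is wrong, while high recall means that little of the sought information is missed. Both are needed for a knowledge base that is meant to be trusted without manual rechecking.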

Cooperation partner: consulting company for the construction and real estate sector.

Project period: June 2018 - August 2018