Information Extraction

Extraction of information from technical due diligence final reports to build a structured knowledge base

For more and more companies, digital transformation is an integral part of corporate strategy, as many see it as a key to future success. And rightly so: experience shows that a focus on digitization has a positive impact on how companies develop. The real estate industry, too, wants to use digital transformation to consolidate its market position in the German economy and hold its own against disruptive competitors. Innovations such as online search portals and virtual tours already play a crucial role in companies' day-to-day work, and e-archives, cloud computing and the Internet of Things (IoT) are becoming increasingly important.

At the same time, these developments place new demands on day-to-day work, leaving employees less and less time to concentrate on their core activities. For this reason, an international consulting firm for the construction and real estate sector headquartered in Germany set out to test how far its existing digital technical due diligence (TDD) final reports could be analyzed by a new, intelligent system that automatically extracts selected technically relevant information (e.g., the key date of the building inspection) from their textual content. The main challenge lies in the varying structure of the TDD final reports, which can differ in structure, layout and content even within a single office of the consulting firm. Currently, employees must capture this information manually from the text of the reports, a laborious and time-consuming step. Given the large and ever-growing number of TDD final reports, it is becoming increasingly difficult for the firm to extract all selected information consistently, carefully and within a reasonable time. Yet this is essential, above all for building a structured knowledge base in a timely manner that makes the information from the TDD final reports available for further analysis in machine-readable form.
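To make the task concrete, the following is a minimal rule-based sketch of extracting one field, the key date of the building inspection, from report text. It is not the method used in the project, which the source does not describe; the function name `extract_inspection_date`, the label patterns and the assumed day-month-year date format are all hypothetical. A fixed list of hand-written patterns like this one illustrates why varying report structures and wording make simple rules hard to maintain.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical label variants; real reports may phrase the key date differently.
LABEL_PATTERNS = [
    r"date of (?:the )?(?:building )?inspection",
    r"inspection date",
    r"site visit(?: on)?",
]

# Assumed date format: day, month, year separated by "." or "/", e.g. 12.06.2018.
DATE_PATTERN = r"(\d{1,2})[./](\d{1,2})[./](\d{4})"


def extract_inspection_date(text: str) -> Optional[date]:
    """Return the first valid date that follows one of the known labels, or None."""
    for label in LABEL_PATTERNS:
        # Allow up to 30 non-digit characters (colon, spaces, words) between label and date.
        match = re.search(label + r"\D{0,30}?" + DATE_PATTERN, text, flags=re.IGNORECASE)
        if match:
            day, month, year = (int(group) for group in match.groups())
            try:
                return date(year, month, day)
            except ValueError:
                continue  # e.g. 31.02.2018 is not a valid calendar date; try the next label
    return None


print(extract_inspection_date("Inspection date: 12.06.2018"))  # prints 2018-06-12
print(extract_inspection_date("The building was inspected during a site visit on 3/7/2018."))  # prints 2018-07-03
```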

To carry out this project, a system was developed in cooperation with the Institute for Business Analytics that automatically analyzes the digital TDD final reports and extracts the desired technical information from their textual content. For this purpose, artificial intelligence methods for information extraction were designed and implemented. Once design and implementation were complete, the system's performance was evaluated to confirm that it works as intended: it extracted more than 93% of the target information (recall), and 98% of the extracted information was correct (precision).
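Recall is the share of all target information items that the system extracted correctly, and precision is the share of extracted items that are correct. The sketch below shows that calculation under stated assumptions: both the manual annotations and the system's output are stored as dictionaries keyed by (report ID, field name). The data layout, the function name `precision_recall` and the field names are hypothetical; the source does not say how the evaluation was carried out.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (report_id, field_name)


def precision_recall(gold: Dict[Key, str], predicted: Dict[Key, str]) -> Tuple[float, float]:
    """Compare extracted values against manually annotated (gold) values."""
    # An extraction counts as correct only if the gold data has the same field with the same value.
    correct = sum(1 for key, value in predicted.items() if key in gold and gold[key] == value)
    precision = correct / len(predicted) if predicted else 0.0  # correct / all extracted
    recall = correct / len(gold) if gold else 0.0  # correct / all target items
    return precision, recall


# Hypothetical example: two reports with two annotated fields each.
gold = {
    ("r1", "inspection_date"): "2018-06-12",
    ("r1", "year_built"): "1995",
    ("r2", "inspection_date"): "2018-07-03",
    ("r2", "year_built"): "2001",
}
predicted = {
    ("r1", "inspection_date"): "2018-06-12",  # correct
    ("r1", "year_built"): "1995",  # correct
    ("r2", "inspection_date"): "2018-07-30",  # wrong value
}
print(precision_recall(gold, predicted))  # prints (0.6666666666666666, 0.5)
```

Here two of three extracted values are correct (precision 2/3), while two of four target values were found (recall 1/2); a missing field lowers recall but not precision.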

Cooperation partner: Consulting company for the construction and real estate sector

Project period: June 2018 - August 2018