Processing of Large Amounts of Data in Ontologies via Abstraction and Refinement
Ontology-based data access (OBDA) is an increasingly popular paradigm in the area of knowledge representation and information systems. An ontology in this context is a combination of a TBox, which captures background domain knowledge, and an ABox, which contains facts about elements of the application domain. The TBox is used to enrich and integrate large, incomplete, and possibly semi-structured data, which users can then access via queries. For example, a large part of Wikipedia is available in machine-processable form (called DBpedia), which, together with an ontological TBox, is an important information source for many applications. To handle large ABoxes efficiently, OBDA approaches assume that the data is stored in a database. Nevertheless, the assumption of complete data that is typically made in databases (the closed-world assumption) does not hold, and reasoning is required to answer queries. A standard reasoning approach is materialization, i.e., all entailed consequences are added to the ABox before the system accepts queries. For large ABoxes, however, the materialization can take several hours.
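To make the notion of materialization concrete, the following is a minimal sketch in Python. It assumes a toy TBox language with only two axiom shapes, A ⊑ B and ∃r.A ⊑ B, which is far less expressive than the languages used in practice. The function name `materialize`, the tuple encoding of assertions, and the example names (`ann`, `bob`, `advisedBy`, and so on) are illustrative assumptions and not part of any particular OBDA system.

```python
def materialize(concept_assertions, role_assertions, subsumptions, propagations):
    """Naive forward chaining until no new concept assertion is derived.

    concept_assertions: set of (concept, individual)
    role_assertions:    set of (role, subject, object)
    subsumptions:       set of (A, B), read as A ⊑ B
    propagations:       set of (r, A, B), read as ∃r.A ⊑ B
    """
    facts = set(concept_assertions)
    while True:
        # Apply A ⊑ B: every instance of A is also an instance of B.
        new = {(sup, x)
               for (c, x) in facts
               for (sub, sup) in subsumptions if sub == c}
        # Apply ∃r.A ⊑ B: if s has an r-successor in A, then s is in B.
        new |= {(sup, s)
                for (r, s, o) in role_assertions
                for (role, filler, sup) in propagations
                if role == r and (filler, o) in facts}
        if new <= facts:  # nothing new: the materialization is complete
            return facts
        facts |= new


# Hypothetical example data.
abox_concepts = {("Professor", "ann")}
abox_roles = {("advisedBy", "bob", "ann")}
tbox_subsumptions = {("Professor", "Faculty")}
tbox_propagations = {("advisedBy", "Faculty", "Student")}

materialized = materialize(abox_concepts, abox_roles,
                           tbox_subsumptions, tbox_propagations)
```

In this sketch every rule is re-applied to the whole ABox in each round, which illustrates why materialization becomes expensive when the ABox is large.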
The goal of this project is the development of a novel approach to materialization, where we do not compute the materialization directly on the (usually large) ABox, but work instead on a smaller "abstraction" of the data. For the abstraction, we define criteria under which individuals from the ABox are considered equivalent. Such indistinguishable individuals are then represented just once in the abstraction. For TBoxes that are small compared to the ABox, the abstraction is usually significantly smaller than the original ABox and, hence, the entailed consequences can be computed efficiently in main memory. Through the entailed consequences, individuals that were indistinguishable may become distinguishable. To account for that, the initial abstraction is iteratively refined until a fixed point is reached.
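The abstraction and refinement loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's actual algorithm: the equivalence criterion used here (same asserted concepts, same outgoing role names, same incoming role names) is one possible choice, the TBox is restricted to axioms of the forms A ⊑ B and ∃r.A ⊑ B, and `saturate` is a hypothetical stand-in for a real in-memory reasoner. All names are made up for the example.

```python
from collections import defaultdict


def saturate(concepts, roles, subsumptions, propagations):
    """Stand-in reasoner: forward chaining over A ⊑ B and ∃r.A ⊑ B."""
    facts = set(concepts)
    while True:
        new = {(sup, x) for (c, x) in facts
               for (sub, sup) in subsumptions if sub == c}
        new |= {(sup, s) for (r, s, o) in roles
                for (role, filler, sup) in propagations
                if role == r and (filler, o) in facts}
        if new <= facts:
            return facts
        facts |= new


def materialize_via_abstraction(concepts, roles, subsumptions, propagations):
    """Return (materialized concept assertions, number of rounds)."""
    concepts = set(concepts)
    individuals = {x for _, x in concepts} | {i for _, s, o in roles for i in (s, o)}
    succ, pred = defaultdict(set), defaultdict(set)
    out_roles, in_roles = defaultdict(set), defaultdict(set)
    for r, s, o in roles:
        succ[(r, s)].add(o)
        pred[(r, o)].add(s)
        out_roles[s].add(r)
        in_roles[o].add(r)
    rounds = 0
    while True:
        rounds += 1
        # 1. Group individuals that are indistinguishable by their type.
        asserted = defaultdict(set)
        for c, x in concepts:
            asserted[x].add(c)
        groups = defaultdict(list)
        for x in individuals:
            tp = (frozenset(asserted[x]), frozenset(out_roles[x]),
                  frozenset(in_roles[x]))
            groups[tp].append(x)
        # 2. Build the abstraction: one representative x per type, plus one
        #    fresh successor y / predecessor z per outgoing / incoming role.
        abs_concepts, abs_roles = set(), set()
        for tp in groups:
            cs, outs, ins = tp
            abs_concepts |= {(c, ("x", tp)) for c in cs}
            abs_roles |= {(r, ("x", tp), ("y", tp, r)) for r in outs}
            abs_roles |= {(r, ("z", tp, r), ("x", tp)) for r in ins}
        # 3. Reason on the (small) abstraction only.
        entailed = saturate(abs_concepts, abs_roles, subsumptions, propagations)
        # 4. Transfer the consequences back to the original individuals.
        derived = set()
        for c, node in entailed:
            kind, tp = node[0], node[1]
            for a in groups[tp]:
                if kind == "x":
                    derived.add((c, a))
                elif kind == "y":
                    derived |= {(c, b) for b in succ[(node[2], a)]}
                else:
                    derived |= {(c, b) for b in pred[(node[2], a)]}
        if derived <= concepts:  # fixed point: no refinement needed
            return concepts, rounds
        concepts |= derived  # types change, so the abstraction is refined


# Hypothetical example data.
abox_c = {("Professor", "ann")}
abox_r = {("advisedBy", "bob", "ann"), ("advisedBy", "carl", "ann"),
          ("tutoredBy", "dan", "bob")}
sub = {("Professor", "Faculty")}
prop = {("advisedBy", "Faculty", "Student"), ("tutoredBy", "Student", "Tutee")}

result, rounds = materialize_via_abstraction(abox_c, abox_r, sub, prop)
```

In this example, the first round cannot derive anything about `dan`, because `bob` is not yet known to be a `Student`. Once that assertion has been transferred back, the type of `bob` changes, the abstraction is refined, and the next round derives `Tutee` for `dan`. For this toy input the result coincides with running `saturate` directly on the full ABox; whether this holds in general depends on the TBox language, which the sketch does not address.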
In the project, we plan to analyze to which TBox languages the procedure can be extended and how the iterative refinement process has to be adapted to obtain soundness and completeness. We further aim at developing parallel refinement algorithms. Finally, we would like to analyze how the materialized abstractions can also be used for complex reasoning tasks such as the computation of instances of complex concepts.
The proposed project supports the efficient use of the ever-growing sources of structured data by combining well-established database technologies with in-memory reasoning techniques in a novel way.
Project period: April 2015 to November 2018
- Birte Glimm (principal investigator)
- Trung Kien Tran (scientific staff)