GraphScale Project Description

"Which studies tested substances from the class of neurotransmitters that activate the same receptors as epinephrine?" A prompt and correct answer to such questions is critical for companies that have to develop new or improved medications from a vast range of research data in a time-critical setting. Other agile companies also increasingly rely on knowledge-intensive processes to create added value. Semantic technologies play a core role in this area because they can associate data with semantics. This also makes it possible to retrieve hidden connections in the data, which are derived by means of automated reasoning.

The standards of the World Wide Web Consortium (W3C) for knowledge representation play an important role in this context. With these standards, knowledge is represented in the form of triples. For example, the triple (adrenaline sameAs epinephrine) states that adrenaline is a synonym for epinephrine, because the W3C assigned this meaning to the keyword sameAs. Additional modeling constructs are available: for example, subClassOf denotes subclass relationships between classes of objects, and the keyword type defines instances of a class of objects. Because of the triple form, it is convenient to represent the knowledge as a graph: adrenaline and epinephrine are nodes connected by an edge labeled sameAs. Because the semantics of the keywords is standardized, automated reasoning systems can make implicit knowledge explicit. Knowledge-intensive processes can therefore be realized better and faster.
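The idea of making implicit knowledge explicit can be illustrated with a minimal sketch. It is not the GraphScale calculus. It is a naive fixpoint computation over three simplified rules for sameAs, subClassOf, and type, and the function name saturate as well as the class names Catecholamine and Neurotransmitter are assumptions chosen for illustration only.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def saturate(triples: Set[Triple]) -> Set[Triple]:
    """Naively apply three illustrative rules until nothing new is derived.

    Simplifying assumption: sameAs substitution is applied in subject
    position only; a full reasoner would cover many more cases.
    """
    closure = set(triples)
    while True:
        new: Set[Triple] = set()
        for (s, p, o) in closure:
            if p == "sameAs":
                # sameAs is symmetric.
                new.add((o, "sameAs", s))
                # Whatever is stated about s also holds for o.
                for (s2, p2, o2) in closure:
                    if s2 == s:
                        new.add((o, p2, o2))
            elif p in ("subClassOf", "type"):
                # subClassOf is transitive, and instances of a class
                # are also instances of its superclasses.
                for (s2, p2, o2) in closure:
                    if p2 == "subClassOf" and s2 == o:
                        new.add((s, p, o2))
        if new <= closure:
            return closure
        closure |= new


# Illustrative data; only the sameAs triple is taken from the text above.
facts: Set[Triple] = {
    ("adrenaline", "sameAs", "epinephrine"),
    ("epinephrine", "type", "Catecholamine"),
    ("Catecholamine", "subClassOf", "Neurotransmitter"),
}

closure = saturate(facts)
# The derived triples include a fact that was never stated explicitly.
print(("adrenaline", "type", "Neurotransmitter") in closure)
```

The nested loops make this sketch quadratic in the number of triples per iteration, and every iteration starts from scratch. This is the kind of cost that parallel and incremental reasoning algorithms are meant to avoid on large and dynamic data sets.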

Practical experience shows, however, that current solutions require compromises for large and dynamic data sets. The GraphScale project aims to develop a technology that stores knowledge in graph structures while supporting parallel and incremental reasoning algorithms and revisioning mechanisms. In contrast to traditional approaches, the graph representation promises more efficient memory management and better parallelization, which allows for more efficient automated reasoning.

For this purpose, we plan to develop a novel deduction calculus that is optimized for graph databases. As optimizations, modularization and graph partitioning techniques are being studied and integrated into the reasoning system. Another aspect is the adaptation of the reasoner, which usually operates in main memory, to the use of secondary storage. For this, one has to identify the relevant parts of the overall data that are loaded into main memory on demand, while guaranteeing the correctness and completeness of the overall system. The developed algorithms will be implemented in a prototype system that takes into account the requirements of the associated partners of the project. The prototype will also be used to evaluate the outcome of the project in its later phases.

Project period: July 2012 to August 2014.

The project is a KMU innovativ project with the partner derivo GmbH, funded by the Federal Ministry of Education and Research.

Project Members

Birte Glimm (principal investigator)

Trung Kien Tran (scientific staff)