A convenient way to implement parallel computations is to describe them in the form of a flow graph, which is a graph that may contain branches and loops.
During the execution of the application, this flow graph can be unrolled into a directed acyclic graph, depnding on the execution flow of the application.
Traditionally, thse applications can only be executed on shared memory.
We are developing methods to execute flow graph based applications on distributed memory systems. This includes the distributed unrolling into a direct acyclic graph and scheduling the generated tasks. This schedling uses data locality as a schedling metric to prevent excessive and unnecessary data transfers among nodes.