Architecture

Platform Architecture

DARE Components

dispel4py

dispel4py is a free and open-source Python library for describing abstract stream-based workflows for distributed data-intensive applications. It lets users focus on their scientific methods, sparing them distracting details while retaining flexibility over the computing infrastructure they use. It provides mappings to diverse computing infrastructures, including cloud technologies, HPC architectures and specialised data-intensive machines, so that workflows can move seamlessly into production with large-scale data loads. The dispel4py system maps workflows dynamically onto multiple enactment systems, such as MPI, Storm and multiprocessing, without users having to modify their workflows.
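The separation between an abstract workflow and its enactment can be illustrated with a minimal, plain-Python sketch. It does not use the dispel4py API: the names `WORKFLOW`, `apply_steps`, `run_sequential` and `run_threaded` are hypothetical and chosen only for this illustration. The same workflow description is handed unchanged to two different "mappings".

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical names throughout: this is NOT the dispel4py API, only an
# illustration of keeping the workflow separate from how it is enacted.


def square(x):
    """Processing step: square the incoming item."""
    return x * x


def add_one(x):
    """Processing step: add one to the incoming item."""
    return x + 1


# The abstract workflow: an ordered list of steps, with no execution details.
WORKFLOW = [square, add_one]


def apply_steps(item, steps):
    """Push a single stream item through every step in order."""
    for step in steps:
        item = step(item)
    return item


def run_sequential(steps, stream):
    """Mapping 1: process the stream item by item in the current thread."""
    return [apply_steps(item, steps) for item in stream]


def run_threaded(steps, stream, workers=4):
    """Mapping 2: process stream items concurrently in a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the input stream.
        return list(pool.map(lambda item: apply_steps(item, steps), stream))
```

Calling `run_sequential(WORKFLOW, [1, 2, 3])` and `run_threaded(WORKFLOW, [1, 2, 3])` gives the same result from the same workflow definition; only the enactment differs.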

More information on dispel4py:

s-ProvFlow

s-ProvFlow implements the P4 aspects of the DARE platform. It is a provenance framework for storing and accessing data-intensive streaming lineage. It offers a web API and a range of dedicated visualisation tools based on the underlying provenance model, S-PROV, which uses and extends the PROV and ProvONE models.

S-PROV covers the mapping from the logical representation of a workflow to its concrete implementation, up to its enactment on a target computational resource. The model captures the distribution of the computation and runtime changes, and supports flexible metadata management and discovery for the data products generated by the execution of a data-intensive workflow.

Complete documentation for the component can be found in the relevant repository.

Dispel4py Workflow Registry

The dispel4py Registry is a RESTful Web service providing functionality for registering workflow entities, such as processing elements (PEs), functions and literals, while encouraging sharing and collaboration via groups and workspaces.

DARE users should register their workflows in the dispel4py Registry before calling the DARE Execution API to run them. Once registered, workflows can be retrieved by name. Each workflow is uniquely identified by its workspace, package and PE name. For more information on the API, see the documentation.
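The identification scheme described above (workspace, package and PE name) can be sketched as a composite key. This is an in-memory illustration only, not the Registry's REST API: `PEKey` and `InMemoryRegistry`, along with their methods, are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PEKey:
    """Composite identifier: workspace, package and PE name (hypothetical type)."""
    workspace: str
    package: str
    name: str


class InMemoryRegistry:
    """Toy stand-in for a registry; NOT the dispel4py Registry API."""

    def __init__(self):
        self._entries = {}  # PEKey -> registered source code

    def register(self, key, source):
        """Store an entry, refusing to overwrite an existing identifier."""
        if key in self._entries:
            raise ValueError(f"already registered: {key}")
        self._entries[key] = source

    def get(self, workspace, package, name):
        """Look an entry up by its three-part identifier."""
        return self._entries[PEKey(workspace, package, name)]
```

Because the key has three parts, the same PE name can be registered in two different workspaces without a clash, while registering the same full identifier twice is rejected.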

CWL Workflow Registry

As with dispel4py workflows, CWL workflows should be registered in the respective registry in order to be accessible for execution. This component also allows the registration of Docker execution environments, which are associated with CWL workflows.

Platform admins should create and build Docker environments and then register them in the CWL Workflow Registry. Research developers can then list the Docker environments available in the Registry and download their files, among other operations, to find a suitable environment for their application.

Once they have found an execution environment, they can register their application and associate it with a Docker environment. After registration, workflows can be retrieved by name and version. The DARE Execution API requests only this information (i.e. name and version) in order to retrieve a workflow and execute it. If the application used inside the CWL workflow supports MPI, users can request that their Docker environment run in multiple containers.
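The registration flow described above (environments first, then workflows associated with an environment and retrieved by name and version) can be sketched as follows. This is an in-memory illustration under stated assumptions, not the CWL Workflow Registry's actual API: the class `CWLRegistry`, its methods, and the choice of identifying an environment by a (name, tag) pair are all hypothetical.

```python
class CWLRegistry:
    """Toy in-memory stand-in for the CWL Workflow Registry (hypothetical)."""

    def __init__(self):
        self._dockers = {}    # (name, tag) -> list of file names
        self._workflows = {}  # (name, version) -> workflow record

    def register_docker(self, name, tag, files):
        """Admin step: register a built execution environment."""
        self._dockers[(name, tag)] = list(files)

    def list_dockers(self):
        """Developer step: list the available execution environments."""
        return sorted(self._dockers)

    def register_workflow(self, name, version, spec_file, docker):
        """Register a workflow and associate it with an existing environment."""
        if docker not in self._dockers:
            raise KeyError(f"unknown docker environment: {docker}")
        self._workflows[(name, version)] = {
            "spec_file": spec_file,
            "docker": docker,
        }

    def get_workflow(self, name, version):
        """Retrieve a workflow using only its name and version."""
        return self._workflows[(name, version)]
```

In this sketch a workflow cannot be registered against an environment that has not been registered first, which mirrors the order of steps in the text; whether the real Registry enforces this is an assumption.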

DARE Execution API

The DARE Execution API enables the distributed and scalable execution of three kinds of workload: numerical simulations (currently using the SPECFEM3D code); dispel4py workflows (e.g. used to describe the steps of RA except for simulations), which can be extended to other execution contexts; and CWL workflows (e.g. used in the Cyclone use case). The Execution API also offers services such as data upload, download and referencing, and process monitoring. More information can be found on the relevant repository page.

Data catalogue - Semantic Data Discovery

The Data catalogue is part of the DARE Knowledge Base and manages information related to the data elements processed via the platform. It exposes a RESTful API for registering new data sources and for retrieving information on previously registered data or on results produced by processes executed over DARE. For more information, visit the respective repository.

Testing environment - Playground

The DARE platform provides a “playground”, a testing environment for research developers. During the testing phase of workflow development, users can use the playground module of the DARE platform to simulate both a dispel4py workflow execution and a local dispel4py run. Additional information and an API description are available in the corresponding repository.