Dispel4py is a free and open-source Python library for describing abstract stream-based workflows for distributed data-intensive applications. It enables users to focus on their scientific methods, avoiding distracting details while retaining flexibility over the computing infrastructure they use. It delivers mappings to diverse computing infrastructures, including cloud technologies, HPC architectures and specialised data-intensive machines, so that workflows move seamlessly into production with large-scale data loads. The dispel4py system maps workflows dynamically onto multiple enactment systems, such as MPI, Apache Storm and Python multiprocessing, without users having to modify their workflows.
More information on dispel4py:
- dispel4py documentation on pythonhosted.org
- Filgueira, R., et al.: "dispel4py: A Python framework for data-intensive scientific computing", The International Journal of High Performance Computing Applications, 2017, Vol. 31(4), 316–334
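To illustrate the stream-based model conceptually, a workflow can be thought of as processing elements (PEs) connected into a pipeline, each consuming and emitting a stream of data items. The following is a plain-Python analogy using generators, not dispel4py's actual API:

```python
# Plain-Python analogy of a stream-based workflow: each "processing
# element" (PE) consumes an input stream and emits an output stream.
# This is an illustration of the concept only, not the dispel4py API.

def producer(n):
    """Source PE: emits a stream of integers."""
    for i in range(n):
        yield i

def square(stream):
    """Transformation PE: squares each item in the stream."""
    for item in stream:
        yield item * item

def consumer(stream):
    """Sink PE: collects the stream into a result list."""
    return list(stream)

# Compose the PEs into a pipeline: producer -> square -> consumer.
result = consumer(square(producer(5)))
print(result)  # [0, 1, 4, 9, 16]
```

In dispel4py the pipeline is instead expressed as a workflow graph of PEs, and the same abstract graph can then be enacted sequentially, with multiprocessing, or over MPI.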
s-ProvFlow implements the P4 aspects of the DARE platform. It is a provenance framework for the storage and access of data-intensive streaming lineage. It offers a web API and a range of dedicated visualisation tools based on the underlying provenance model, S-PROV, which utilises and extends the PROV and ProvONE models.
S-PROV addresses the mapping between the logical representation and the concrete implementation of a workflow, up to its enactment on a target computational resource. The model captures aspects associated with the distribution of the computation and runtime changes, and supports flexible metadata management and discovery for the data products generated by the execution of a data-intensive workflow.
Complete documentation for the component can be found in the relevant repository.
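As a rough illustration of what streaming lineage captures, a record for one data item might link the producing PE instance and run to the items it was derived from. The field names below are hypothetical, chosen for this sketch; the actual schema is defined by the S-PROV model:

```python
import json

# Hypothetical S-PROV-style lineage record for one streamed data item.
# All field names here are illustrative assumptions, not the actual
# S-PROV schema (which extends the PROV and ProvONE models).
lineage_record = {
    "workflow_run": "run-42",                  # a particular enactment of the workflow
    "pe_instance": "SquarePE-0",               # concrete PE instance (captures distribution)
    "data_id": "item-007",                     # identifier of the produced data item
    "derived_from": ["item-003", "item-004"],  # upstream data dependencies
    "metadata": {"units": "m/s"},              # flexible, user-defined metadata
}

print(json.dumps(lineage_record, indent=2))
```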
Dispel4py Workflow Registry
The dispel4py Registry is a RESTful Web service providing functionality for registering workflow entities, such as processing elements (PEs), functions and literals, while encouraging sharing and collaboration via groups and workspaces.
DARE users should register their workflows in the dispel4py Registry before accessing the DARE Execution API in order to run them. Once registered, workflows can be retrieved by name: each workflow is uniquely identified by its workspace, package and PE name. For more information on the API, check the documentation.
CWL Workflow Registry
Similarly to dispel4py workflows, CWL workflows should also be registered in the respective registry in order to be accessible for execution. This component also allows the registration of Docker execution environments, which are associated with CWL workflows.
Platform admins should create and build Docker environments and then register them in the CWL Workflow Registry. Afterwards, research developers can list the existing Docker environments in the Registry, download their files, etc., in order to find a suitable environment for their application.
Once they have found an execution environment, they can register their application and associate it with a Docker environment. After registration, workflows can be retrieved by their name and version; the DARE Execution API requests only this information (i.e. name and version) in order to retrieve the workflow and execute it. If the application used inside the CWL workflow supports MPI, users can request that their Docker environment run in multiple containers.
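The registration flow above can be sketched as follows. The field names are illustrative assumptions, not the actual CWL Workflow Registry schema; the point is the association between a workflow and a pre-built environment, and retrieval by name and version:

```python
# Hypothetical CWL Workflow Registry entries; field names are
# illustrative assumptions, not the actual schema.

# Registered beforehand by a platform admin:
docker_env = {"name": "cyclone-env", "tag": "1.2"}

# Registered by a research developer, associated with that environment:
workflow = {
    "name": "cyclone-tracking",
    "version": "0.3",
    "docker": docker_env["name"],  # association with the execution environment
    "mpi": True,                   # request multiple containers (app supports MPI)
}

# The Execution API later needs only name and version to retrieve it:
key = (workflow["name"], workflow["version"])
print(key)  # ('cyclone-tracking', '0.3')
```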
DARE Execution API
The DARE Execution API enables the distributed and scalable execution of numerical simulations (currently using the SPECFEM3D code), dispel4py workflows (e.g. used to describe the steps of the RA except for simulations) and CWL workflows (e.g. used in the Cyclone use case), and can be extended to other execution contexts. The Execution API also offers services such as uploading, downloading and referencing of data, as well as process monitoring. More information can be found on the relevant repository page.
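Tying this to the registries above, a run request might carry little more than the registered workflow's name and version plus its inputs. The structure below is a sketch under assumed field names, not the actual Execution API payload:

```python
import json

# Hypothetical run request to the DARE Execution API; the field
# names and the "context" values are illustrative assumptions.
run_request = {
    "workflow": {"name": "cyclone-tracking", "version": "0.3"},  # as registered
    "inputs": {"iterations": 10},                                # workflow parameters
    "context": "cwl",  # e.g. a CWL workflow, a dispel4py workflow or a simulation
}

body = json.dumps(run_request)
print(body)
```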
Data catalogue - Semantic Data Discovery
The Data catalogue is part of the DARE Knowledge Base and manages information related to the data elements processed via the platform. It exposes a RESTful API for registering new data sources and for retrieving information on previously registered data or on data results produced by processes executed over DARE. For more information, visit the respective repository.
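A data-source registration could look like the sketch below. The field names are assumptions made for illustration, not the catalogue's actual schema; the idea is that both external sources and results produced by DARE executions become discoverable entries:

```python
# Hypothetical Data catalogue entry; field names are illustrative
# assumptions, not the actual Data catalogue API schema.
data_source = {
    "name": "noise-cross-correlations",
    "url": "https://example.org/data/ncc",  # where the data can be accessed
    "produced_by": "run-42",                # link to the DARE execution that produced it
    "metadata": {"format": "NetCDF"},       # descriptive metadata for discovery
}

print(data_source["name"])
```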
Testing environment - Playground
The DARE platform provides a "playground" testing environment for research developers. During the testing phase of workflow development, users can simulate a dispel4py workflow execution, as well as simulate a local dispel4py run, using the playground module of the DARE platform. Additional information and an API description are available in the corresponding repository.