API Documentation

This page provides information on the DARE API. The DARE Platform follows a microservices architecture, so the DARE API is composed of multiple individual APIs. The following sections list the API endpoints of each DARE service. For a high-level description of the components, visit the features section of our site.

The main component APIs described here are:

  1. dare-login service which interacts with Keycloak in the backend
  2. d4p-registry, API to interact with the dispel4py workflow registry
  3. workflow-registry, API to interact with the CWL workflow registry
  4. exec-api, the DARE execution API
  5. s-prov, the provenance API
  6. playground, which enables a development environment for scientists to write their workflows
  7. semantic-data-discovery, API to retrieve data from the Data Catalogue

Finally, we provide documentation on the dispel4py library.

DARE Login API

The DARE login service mediates between DARE components and Keycloak. It exposes functionality for signing in to the platform, refreshing and validating tokens, etc. The main API calls are:

HTTP method Endpoint Description Content Type Parameters
POST /auth/ Authenticates a user by performing an HTTP call
to the Keycloak service.
After the user has been successfully authenticated,
the Dispel4py Registry,
CWL Registry and Execution API are
notified to check whether the
user already exists in their local DBs
application/json data (body), example:
{
"username": "string",
"password": "string",
"requested_issuer": "string"}
POST /validate-token/ Validates a token using the Keycloak Service application/json data (body),example:
{
"access_token": "string"
}
POST /delegation-token/ Issues a token for internal application use application/json data (body), example:
{
"access_token": "string"
}
POST /refresh-token/ Uses the refresh token to issue a new token for a user application/json data (body), example:
{
"refresh_token": "string",
"issuer": "string"
}
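
For illustration, the sketch below shows how a client might call the /auth/ and /refresh-token/ endpoints from Python. The base URL, the response fields and the token handling are assumptions, not part of the service contract documented above.

import requests

LOGIN_API = "https://dare.example.org/dare-login"  # hypothetical base URL

# Sign in: the body follows the /auth/ example above
resp = requests.post(LOGIN_API + "/auth/", json={
    "username": "my-user",
    "password": "my-password",
    "requested_issuer": "my-issuer"
})
resp.raise_for_status()
tokens = resp.json()  # assumed to contain access and refresh tokens

# Refresh the token later using /refresh-token/
refreshed = requests.post(LOGIN_API + "/refresh-token/", json={
    "refresh_token": tokens.get("refresh_token"),
    "issuer": "my-issuer"
})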

The technical documentation of the dare-login component can be found here.

D4p Information Registry

The Dispel4py Workflow Registry enables research developers to register, re-use and share their dispel4py workflows. The following table shows the available API endpoints of this component.

HTTP method Endpoint Description Content Type Parameters
GET /connections/ Retrieves all the available PE Connection resources.
A PE Connection resource allows the
addition and manipulation
of PE connections.
Connections are associated with PEs and
are not themselves workspace items
application/json No parameters
POST /connections/ Creates a new PE Connection resource,
which allows the addition
and manipulation of PE connections.
Connections are associated
with PEs and are not themselves
workspace items
application/json data (body) example: {
"comment" : "string" ,
"kind" : "string" ,
"modifiers" : "string" ,
"name" : "string" ,
"is_array" : true ,
"s_type" : "string" ,
"d_type" : "string" ,
"pesig" : "string" }
GET /connections/{id}/ Retrieves a specific PE Connection resource.
A PE Connection resource allows the addition
and manipulation of PE connections.
Connections are associated with PEs and
are not themselves workspace items.
application/json id (integer)
PUT /connections/{id}/ Updates an existing PE Connection resource.
A PE Connection resource allows the addition
and manipulation of PE connections.
Connections are associated with PEs and
are not themselves workspace items.
application/json -id (integer)
-data (body) example:
{
"comment" : "string" ,
"kind" : "string" ,
"modifiers" : "string" ,
"name" : "string" ,
"is_array" : true ,
"s_type" : "string" ,
"d_type" : "string" ,
"pesig" : "string"
}
DELETE /connections/{id}/ Deletes an existing PE Connection resource
from the DB
application/json id (integer)
GET /fnimpls/ Retrieves all the available Function
Implementation resources
application/json No parameters
POST /fnimpls/ Creates a new Function Implementation application/json data (body), example:
{
"code" : "string",
"parent_sig": "string",
"description" : "string",
"pckg" : "string",
"workspace": "string" ,
"clone_of": "string" ,
"name": "string"
}
GET /fnimpls/{id}/ Retrieves a specific Function implementation resource application/json id (integer)
PUT /fnimpls/{id}/ Updates an existing function implementation application/json -id (integer)
-data (body), example:
{
"code": "string",
"parent_sig": "string",
"description": "string",
"pckg": "string",
"workspace": "string",
"clone_of": "string",
"name": "string"
}
DELETE /fnimpls/{id}/ Deletes an existing Function Implementation application/json id (integer)
GET /fnparams/ Retrieves all the available Function Parameters application/json No parameters
POST /fnparams/ Creates a new Function Parameters entry application/json data (body), example:
{
"parent_function": "string",
"param_name": "string",
"param_type" : "string"
}
GET /fnparams/{id}/ Retrieves a specific Function Parameters entry application/json id (integer)
PUT /fnparams/{id}/ Updates an existing Function
Parameters entry
application/json -id (integer)
-data (body) example:
{
"parent_function": "string",
"param_name": "string",
"param_type": "string"
}
DELETE /fnparams/{id}/ Deletes an existing Function Parameters entry application/json id (integer)
GET /functions/ Retrieves all the Function resources
from the DB
application/json No parameters
POST /functions/ Creates a new Function Resource application/json data (body), example:
{
"description" : "string",
"parameters" : ["string"],
"fnimpls": ["string"],
"pckg": "string",
"workspace": "string",
"return_type": "string",
"clone_of": "string",
"name": "string"
}
GET /functions/{id}/ Retrieves an existing Function resource application/json id (integer)
PUT /functions/{id}/ Updates an existing function resource application/json -id (integer)
-data(body), example:
{
"description": "string",
"parameters": ["string"],
"fnimpls": ["string"],
"pckg": "string",
"workspace": "string",
"return_type": "string",
"clone_of": "string",
"name": "string"
}
DELETE /functions/{id}/ Deletes an existing function resource application/json id (integer)
GET /groups/ Retrieves all the available groups application/json No parameters
POST /groups/ Creates a new user group application/json data (body), example: { "name": "string" }
GET /groups/{id}/ Retrieves a user group application/json id (integer)
PUT /groups/{id}/ Updates an existing user group application/json -id (integer)
-data (body),
example: { "name": "string" }
DELETE /groups/{id}/ Removes a user group application/json id (integer)
GET /literals/ Retrieves all the literal entities application/json No parameters
POST /literals/ Creates a new Literal Entity application/json data (body), example:
{
"description": "string",
"value": "string",
"name": "string",
"pckg": "string",
"workspace": "string",
"clone_of" : "string"
}
GET /literals/{id}/ Retrieves a Literal Entity application/json id (integer)
PUT /literals/{id}/ Updates an existing Literal Entity application/json -id (integer)
-data (body), example:
{
"description": "string",
"value": "string",
"name": "string",
"pckg": "string",
"workspace": "string",
"clone_of": "string"
}
DELETE /literals/{id}/ Deletes a Literal Entity application/json id (integer)
GET /peimpls/ Retrieves all the available
PE Implementations
application/json No parameters
POST /peimpls/ Creates a new PE Implementation application/json data (body), example:
{
"code": "string",
"parent_sig": "string",
"description": "string",
"pckg": "string",
"workspace": "string",
"clone_of": "string",
"name": "string"
}
GET /peimpls/{id}/ Retrieves a specific
PE Implementation
application/json id (integer)
PUT /peimpls/{id}/ Updates an existing PE Implementation application/json -id (integer)
-data(body), example:
{
"code": "string",
"parent_sig": "string",
"description": "string",
"pckg": "string", "workspace": "string",
"clone_of" : "string",
"name": "string"
}
DELETE /peimpls/{id}/ Deletes an existing PE Implementation application/json id (integer)
GET /pes/ Retrieves all the available PE resources application/json No parameters
POST /pes/ Creates a new PE application/json data (body), example:
{
"description": "string",
"name": "string",
"connections": ["string"],
"pckg": "string",
"workspace": "string",
"clone_of": "string",
"peimpls": ["string"]
}
GET /pes/{id}/ Retrieves a specific PE application/json id (integer)
PUT /pes/{id}/ Updates an existing PE application/json -id(integer)
-data(body), example:
{
"description": "string",
"name": "string",
"connections": ["string"],
"pckg": "string",
"workspace": "string",
"clone_of": "string",
"peimpls": ["string"]
}
DELETE /pes/{id}/ Deletes an existing PE application/json id (integer)
GET /registryusergroups/ Retrieves all the available
registry user groups
application/json No parameters
POST /registryusergroups/ Creates a new Registry user group application/json data (body), example:
{
"description": "string",
"group_name": "string"
}
GET /registryusergroups/{id}/ Retrieves a specific Registry user group application/json id (integer)
PUT /registryusergroups/{id}/ Updates an existing Registry user group application/json -id(integer)
-data(body), example:
{
"owner": "string",
"description": "string",
"group_name": "string"
}
DELETE /registryusergroups/{id}/ Deletes an existing Registry user group application/json id (integer)
GET /users/ Retrieves all the existing users application/json No parameters
POST /users/ Creates a new user application/json data (body), example:
{
"username": "string",
"password": "string",
"first_name": "string",
"last_name": "string",
"email": "string"
}
GET /users/{id}/ Retrieves a specific user application/json id (integer)
PUT /users/{id}/ Updates a specific user application/json -id(integer)
-data(body), example:
{
"username": "string",
"password": "string",
"first_name": "string",
"last_name": "string",
"email": "string"
}
DELETE /users/{id}/ Deletes a specific user application/json id (integer)
GET /workspaces/ Retrieves all the available workspaces application/json parameters:

-name: name
-description: The name of the workspace we want to display
-paramType: query

-name: username
-description: The username the workspace is associated with
-paramType: query

-name: search
- description: performs a simple full-text search on
descriptions and names of workspaces.
-paramType: query
POST /workspaces/ Create or clone a new workspace application/json parameters:

- name: name
- description: the name of the workspace.

- name: description
- description: a textual description of the workspace.

- name: clone_of
- description: indicates that a cloning operation is requested.
- paramType: query
- type: long
GET /workspaces/{id}/ Retrieves a specific workspace application/json id (integer)
PUT /workspaces/{id}/ Updates an existing workspace application/json -id (integer)
-data (body), example:
{
"clone_of": "string",
"name": "string",
"description": "string"
}
DELETE /workspaces/{id}/ Deletes an existing workspace application/json id (integer)
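
As a hedged example, the following sketch creates a workspace and then registers a PE through the endpoints listed above. The base URL, the authentication header and the shape of the responses are assumptions for illustration only.

import requests

D4P_REGISTRY = "https://dare.example.org/d4p-registry"  # hypothetical base URL
headers = {"Authorization": "Bearer <access_token>"}    # assumed auth scheme

# Create a workspace (POST /workspaces/)
ws = requests.post(D4P_REGISTRY + "/workspaces/",
                   json={"name": "demo_workspace", "description": "Demo workspace"},
                   headers=headers)
ws.raise_for_status()

# Register a PE in that workspace (POST /pes/)
pe = requests.post(D4P_REGISTRY + "/pes/",
                   json={
                       "description": "A demo PE",
                       "name": "MyPE",
                       "connections": [],
                       "pckg": "demo_pckg",
                       "workspace": ws.json().get("url", ""),  # assumed resource reference
                       "clone_of": "",
                       "peimpls": []
                   },
                   headers=headers)
pe.raise_for_status()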

CWL Workflow Registry

This component is a Django web service exposing an API for registering CWL workflows and Docker environments. The technical documentation for the CWL workflow registry is available in the project’s micro site here. The following table shows all the available API calls in the CWL workflow registry.

HTTP method Endpoint Description Content Type Parameters
POST /docker/ Creates a new Docker Environment.
The environment consists of a Dockerfile and
can be associated with one or
multiple DockerScript entries
(which represent bash or python scripts)
application/json data (body), example:
{
"docker_name": "name",
"docker_tag": "tag",
"script_names": ["script1.sh", "script2.sh"]
"files":
{
"dockerfile": "string",
"script1.sh": "string.",
"script2.sh": "string"
},
"access_token": "token"
}
POST /docker/update_docker/ Updates an existing Docker Environment application/json data (body), example:
{
"docker_name": "name",
"docker_tag" "tag",
"update": {"tag": "v2.0"},
"files": {"dockerfile": "string"},
"access_token": "token"
}
POST /docker/provide_url/ Updates an existing Docker environment’s url field.
Once the docker image is built
and pushed to a public repository,
the relevant Docker entry should
be updated with the URL.
application/json data (body), example:
{
"docker_name": "name",
"docker_tag": "tag",
"docker_url": "url",
"access_token": "token"
}
DELETE /docker/delete_docker/ Deletes an existing docker environment. application/json data (body), example:
{
"docker_name": "name",
"docker_tag": "tag",
"access_token": "token"
}
GET /docker/bynametag/ Retrieves a Docker Environment using
its name and tag.
application/json -docker_name (string)
-docker_tag(string)
GET /docker/byuser/ Retrieves all the registered Docker
environments by user
application/json -requested_user(string) if exists,
otherwise it uses the user that
performed the request
GET /docker/download/ Downloads in a zip file the Dockerfile
and the relevant scripts of a
Docker Environment.
application/json -docker_name (string)
-docker_tag(string)
POST /scripts/add/ Adds a new script in an existing
Docker Environment
application/json data (body), example:
{
"docker_name": "name",
"docker_tag": "tag",
"script_name": "entrypoint.sh",
"files": {"entrypoint.sh": "string"},
"access_token": "token"
}
POST /scripts/edit/ Edits an existing script of a
Docker Environment
application/json data (body), example:
{
"docker_name": "name",
"docker_tag": "tag",
"script_name": "entrypoint.sh",
"files": {"entrypoint.sh": “string”},
"access_token": "token"
}
DELETE /scripts/delete/ Deletes an existing script from a docker environment application/json data (body), example:
{
"docker_name": "name",
"docker_tag": "tag",
"script_name": "entrypoint.sh",
"access_token": "token"
}
GET /scripts/download Downloads a specific script from
a Docker Environment
application/json -docker_name(string)
-docker_tag(string)
-script_name(string)
GET /scripts/byname Retrieves a specific script based on the name
& tag of the Docker Environment
and on the name of the script.
application/json -docker_name(string)
-docker_tag(string)
-script_name(string)
POST /workflows/ Creates a new CWL workflow of
class Workflow
application/json data (body), example:
{
"workflow_name": "demo_workflow.cwl",
"workflow_version": "v1.0",
"spec_file_name": "spec.yaml",
"docker_name": "name",
"docker_tag": "tag",
"workflow_part_data":
[{
"name":arguments.cwl”,
"version":"v1.0",
"spec_name": "arguments.yaml"
},
{
"name": "tar_param.cwl",
"version":"v1.0",
"spec_name": "tar_param.yaml"
}],
"files":
{
"demo_workflow.cwl":"string",
"spec.yaml": "string",
"arguments.cwl": "string",
"arguments.yaml": "string",
"tar_param.cwl": "string",
"tar_param.yaml": "string"
},
"access_token": "token"
}
POST /workflows/update_workflow/ Updates an existing CWL workflow of
class Workflow
application/json data (body), example:
{
"workflow_name":"demo_workflow.cwl",
"workflow_version": "v1.0",
"files": {"workflow_file": "string",
"spec_file": "string",},
"update": {"version":"v1.1"},
"access_token": "token"
}
POST /workflows/update_docker/ Associates a CWL workflow of class
Workflow with a different Docker
Environment.
application/json data (body), example:
{
"workflow_name":"demo_workflow.cwl",
"workflow_version": "v1.0",
"docker_name": "test",
"docker_tag": "v1.0",
"access_token": "token"
}
DELETE /workflows/delete_workflow/ Deletes an existing CWL workflow
(class Workflow) and all the
associated Workflow parts
(class CommandLineTool).
application/json data (body), example:
{
"workflow_name":"demo_workflow.cwl",
"workflow_version": "v1.0",
"access_token": "token"
}
GET /workflows/bynameversion/ Retrieves a CWL workflow of class Workflow
and its associated workflow parts
as well as the related docker
environment, based on the workflow
name and version.
application/json -workflow_name(string)
-workflow_version(string)
GET /workflows/download Downloads in a zip file all the CWL files
(Workflow and CommandLineTool) as well as
the relevant Dockerfile and scripts
(if the parameter dockerized is provided)
application/json -workflow_name(string)
-workflow_version(string)
-dockerized(boolean)
POST /workflow_parts/add/ Adds a new CommandLineTool CWL in an existing CWL workflow application/json data (body), example:
{
"workflow_name":"demo_workflow.cwl",
"workflow_version": "v1.0",
"workflow_part_name":"arguments.cwl",
"workflow_part_version": "v1.0",
"spec_name": "arguments.yaml",
"files": {"arguments.cwl": "string",
"arguments.yaml": "string"},
"access_token": "token"
}
POST /workflow_parts/edit/ Edits an existing CommandLineTool CWL workflow application/json data (body), example:
{
"workflow_name":"demo_workflow.cwl",
"workflow_version": "v1.0",
"workflow_part_name":"arguments.cwl",
"workflow_part_version": "v1.0",
"spec_name": "arguments.yaml",
"files":
{
"arguments.cwl":"string",
"arguments.yaml": "string"
},
"update":
{
"version":"v1.1”
},
"access_token": "token"
}
DELETE /workflow_parts/delete/ Deletes an existing CommandLineTool CWL workflow application/json data (body), example:
{
"workflow_name":"demo_workflow.cwl",
"workflow_version": "v1.0",
"workflow_part_name": "arguments.cwl",
"workflow_part_version": "v1.0",
"access_token": "token"
}
GET /workflow_parts/bynameversion Retrieves a specific CommandLineTool CWL based
on its parent (name & version) and
its own name and version.
application/json -workflow_name(string)
-workflow_version(string)
-workflow_part_name(string)
-workflow_part_version(string)
GET /workflow_parts/download/ Downloads a specific CWL of class CommandLineTool application/json -workflow_name(string)
-workflow_version(string)
-workflow_part_name(string)
-workflow_part_version(string)
POST /accounts/login/ Authenticates a user (login). Used by the
dare-login component described above
when a user calls its /auth/ endpoint.
If the user does not exist in the CWL workflow
registry’s local DB,
a new user is created.
application/json data (body), example:
{
"username": "string",
"password":"string",
"access_token":"string",
"email":"string",
"given_name":"string",
"family_name":"string"
}
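
For illustration, a minimal sketch of registering a Docker environment via POST /docker/ follows; the base URL and the file contents are placeholders, and only fields from the example payload above are used.

import requests

CWL_REGISTRY = "https://dare.example.org/workflow-registry"  # hypothetical base URL

payload = {
    "docker_name": "demo-env",
    "docker_tag": "v1.0",
    "script_names": ["entrypoint.sh"],
    "files": {
        "dockerfile": "FROM python:3.7\nCOPY entrypoint.sh /\n",  # placeholder Dockerfile
        "entrypoint.sh": "#!/bin/bash\necho hello\n"              # placeholder script
    },
    "access_token": "<access_token>"  # token obtained from dare-login
}
resp = requests.post(CWL_REGISTRY + "/docker/", json=payload)
resp.raise_for_status()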

Execution API

General

Execution API provides endpoints for multiple execution contexts:

  • Dispel4py: dynamically creates containers to execute Dispel4py workflows
  • CWL: execution environments spawned on-demand to execute CWL workflows.
  • Specfem: dynamically creates containers to execute the Specfem executable. This endpoint is deprecated; Specfem is now executed via the CWL endpoint.

API calls

HTTP method Endpoint Description Content Type Parameters
POST /create-folders/ Endpoint used by the /auth/ endpoint of dare-login.
Checks whether the user’s workspace in the DARE platform
is available; otherwise it creates the
necessary folder structure
application/json data (body), example:
{
"username": "string"
}
POST /d4p-mpi-spec/ Used internally by the dispel4py execution environment
in order to retrieve the respective
PE Implementation and spec.yaml
application/json data (body),example:
{
"pe_imple": "name",
"nodes": 3,
"input_data": {}
}
POST /run-d4p/ Creates a new dispel4py execution environment,
using the Kubernetes API.
Generates a new run directory,
stored under the user’s “runs” folder
(i.e. ///runs/).
All the execution results are
stored in the generated run directory.
application/json data (body), example:
{
"access_token": "string",
"workspace": "string",
"pckg": "string",
"pe_name":"string",
"target":"string",
"nodes":1
}
POST /run-specfem/ Deprecated endpoint. Use /run-cwl instead. application/json -
POST /run-cwl/ Endpoint to instantiate an execution environment
for CWL workflow execution.
The environment to be instantiated
is retrieved from the CWL using the
CWL Workflow Registry. Generates a new run
directory, stored under the user’s “runs” folder
(i.e. ///runs/). All the execution
results are stored in the
generated run directory.
application/json data (body), example:
{
"access_token": "string",
"nodes":12,
"workflow_name":"string",
"workflow_version": "string",
"input_data":
{
"example1":"string"
}
}
POST /upload/ Endpoint used to upload files to the DARE platform.
The files are stored under the user’s home directory.
The home directory is named after the username and
contains three folders: uploads, debug and runs.
All the uploaded files are stored under the user’s “uploads” directory
application/json data (body), example:
{
"dataset_name": "string",
"path": "string",
"access_token": "string",
"files": []
}
GET /my-files/ Lists all the user’s directories under the “uploads”,
“runs” and “debug” folders. If the parameter
num_run_dirs is present, the response
is limited to the most recent directories, based on
the number provided in that parameter
application/json -access_token(string)
-num_run_dirs(integer)
GET /list/ Lists all the files inside a specific directory.
This directory could be retrieved from the previous endpoint
application/json -access_token(string)
-path(string)
GET /download/ Downloads a specific file from the DARE platform.
To find the file’s full path use the two previous endpoints
application/json -access_token(string)
-path(string)
GET /send2drop/ Uploads files from the DARE platform to B2DROP application/json -access_token(string)
-path(string)
GET /cleanup/ Clears the user’s folders (uploads, runs, debug) application/json -access_token(string)
-runs(boolean)
-uploads(boolean)
-debug(boolean)
GET /my-pods Lists the running jobs of a user application/json data example: {"access_token": "string"}
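
The sketch below illustrates a possible client for /run-d4p/ and /my-files/, using the payloads from the table above; the base URL and the target mapping name are assumptions.

import requests

EXEC_API = "https://dare.example.org/exec-api"  # hypothetical base URL
token = "<access_token>"                        # token obtained from dare-login

# Launch a dispel4py execution (POST /run-d4p/)
run = requests.post(EXEC_API + "/run-d4p/", json={
    "access_token": token,
    "workspace": "demo_workspace",
    "pckg": "demo_pckg",
    "pe_name": "MyPE",
    "target": "simple",   # assumed mapping name
    "nodes": 1
})
run.raise_for_status()

# List the most recent run directories (GET /my-files/)
listing = requests.get(EXEC_API + "/my-files/",
                       params={"access_token": token, "num_run_dirs": 5})
print(listing.json())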

Technical documentation of the component is also available here

Provenance

Version: v1

/data


GET

Description: The data is selected by specifying a query string. Query parameters allow searching by attribution to a component or to an implementation, by generation by a workflow execution, and by combining multiple metadata and parameter terms with their min and max value ranges. The mode of the search can also be indicated (mode ::= (OR | AND)); it applies to the search over metadata and parameter value ranges

Parameters

Name Located in Description Required Schema
usernames query csv list of users the Workflow Executions are associated with No string
terms query csv list of metadata or parameter terms. These relate positionally to the maxvalues and the minvalues No string
functionNames query csv list of functions the Data was generated with No string
wasAttributedTo query csv list of Component or Component Instances involved in the generation of the Data No string
minvalues query csv list of metadata or parameters minvalues. These relate positionally to the terms and the maxvalues No string
rformat query unimplemented: format of the response payload (json,json-ld) No string
start query index of the starting item Yes integer
limit query max number of items expected Yes integer
maxvalues query csv list of metadata or parameters maxvalues. These relate positionally to the terms and the minvalues No string
formats query csv list of data formats (eg. mime-types) No string
types query csv list of data types No string
wasGeneratedBy query the id of the Invocation that generated the Data No string

Responses

Code Description
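
As a hedged illustration, the query below searches /data by username and by one metadata term bounded by min and max values; the base URL and the term name are placeholders.

import requests

SPROV_API = "https://dare.example.org/s-prov"  # hypothetical base URL

resp = requests.get(SPROV_API + "/data", params={
    "usernames": "auser",
    "terms": "magnitude",   # placeholder metadata term
    "minvalues": "0",
    "maxvalues": "5",
    "start": 0,
    "limit": 20
})
print(resp.json())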

/data/filterOnAncestor


POST

Description: Filter a list of data ids based on the existence of at least one ancestor in their data dependency graph, according to a list of metadata terms and their min and max value ranges. The maximum depth level and the mode of the search can also be indicated (mode ::= (OR | AND))

Parameters

Name Located in Description Required Schema
body body No object

Responses

Code Description

/data/{data_id}


GET

Description: Extract Data and their DataGranules by the Data id

Parameters

Name Located in Description Required Schema
data_id path Yes string

Responses

Code Description

/data/{data_id}/derivedData


GET

Description: Starting from a specific data entity of the data dependency graph, it is possible to navigate through the derived data or backwards across the element’s data dependencies. The number of traversal steps is provided as a parameter (level).

Parameters

Name Located in Description Required Schema
level query level of depth in the data derivation graph, starting from the current Data Yes string
data_id path Yes string

Responses

Code Description

/data/{data_id}/export


GET

Description: Export of provenance information in PROV-XML or RDF format. The S-PROV information returned covers the whole workflow execution or is restricted to a single data element. In the latter case, the graph is returned by following the derivations within and across runs. A level parameter allows indicating the depth of the resulting trace

Parameters

Name Located in Description Required Schema
format query export format of the PROV document returned No string
rdfout query export rdf format of the PROV document returned No string
creator query the name of the user requesting the export No string
level query level of depth in the data derivation graph, starting from the current Data Yes string
data_id path Yes string

Responses

Code Description

/data/{data_id}/wasDerivedFrom


GET

Description: Starting from a specific data entity of the data dependency graph, it is possible to navigate through the derived data or backwards across the element’s data dependencies. The number of traversal steps is provided as a parameter (level).

Parameters

Name Located in Description Required Schema
level query level of depth in the data derivation graph, starting from the current Data Yes string
data_id path Yes string

Responses

Code Description

/instances/{instid}


GET

Description: Extract details about a single instance or component by specifying its id. The returned document will indicate the changes that occurred, reporting the first invocation affected. It supports the specification of a list of runIds the instance was wasAssociateFor, considering that the same instance could be used across multiple runs

Parameters

Name Located in Description Required Schema
start query index of the starting item Yes integer
limit query max number of items expected Yes integer
wasAssociateFor query csv list of runIds the instance was wasAssociateFor (when more instances are reused in multiple workflow executions) No string
instid path Yes string

Responses

Code Description

/invocations/{invocid}


GET

Description: Extract details about a single invocation by specifying its id

Parameters

Name Located in Description Required Schema
invocid path Yes string

Responses

Code Description

/summaries/collaborative


GET

Description: Extract information about the reuse and exchange of data between workflow executions based on terms' value ranges and a group of users. The API method allows for inclusive or exclusive (mode ::= (OR | AND)) queries on the terms' values. As above, additional details, such as running infrastructure, type and name of the workflow, can be selectively extracted by assigning these properties to a groupBy parameter. This will support the generation of grouped views

Parameters

Name Located in Description Required Schema
wasAssociatedWith query csv list of Components involved in the Workflow Executions No string
usernames query csv list of users the Workflow Executions are associated with No string
terms query csv list of metadata or parameter terms. These relate positionally to the maxvalues and the minvalues No string
functionNames query csv list of functions that are executed by at least one workflow’s components No string
level query level of depth in the data derivation graph, starting from the current Data Yes string
minvalues query csv list of metadata or parameters minvalues. These relate positionally to the terms and the maxvalues No string
rformat query unimplemented: format of the response payload (json,json-ld) No string
maxvalues query csv list of metadata or parameters maxvalues. These relate positionally to the terms and the minvalues No string
formats query csv list of data formats (eg. mime-types) No string
clusters query csv list of clusters that describe and group one or more workflow’s component No string
groupby query express the grouping of the returned data No string
types query No string
mode query execution mode of the workflow, in case it supports different kinds of concrete mappings (e.g. mpi, simple, multiprocess, etc.) No string

Responses

Code Description

/summaries/workflowexecution


GET

Description: Produce a detailed overview of the distribution of the computation, reporting the size of data movements between the workflow components, their instances or invocations across worker nodes, depending on the specified granularity level. Additional information, such as process pid, worker, instance or component of the workflow (depending on the level of granularity) can be selectively extracted by assigning these properties to a groupBy parameter. This will support the generation of grouped views

Parameters

Name Located in Description Required Schema
level query level of depth in the data derivation graph, starting from the current Data Yes string
mintime query minimum start time of the Invocation No string
groupby query express the grouping of the returned data No string
runId query the id of the run to be analysed No string
maxtime query maximum start time of the Invocation No string
maxidx query maximum iteration index of an Invocation No integer
minidx query minimum iteration index of an Invocation No integer

Responses

Code Description

/terms


GET

Description: Return a list of discoverable metadata terms based on their appearance for a list of runIds, usernames, or for the whole provenance archive. Terms are returned indicating their type (when consistently used), their min and max values, and their number of occurrences within the scope of the search

Parameters

Name Located in Description Required Schema
aggregationLevel query set whether the terms need to be aggregated by runId, username or across the whole collection (all) No string
usernames query csv list of usernames No string
runIds query csv list of run ids No string

Responses

Code Description
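
For instance, a hedged sketch of querying /terms aggregated by username (base URL and username are placeholders):

import requests

SPROV_API = "https://dare.example.org/s-prov"  # hypothetical base URL

resp = requests.get(SPROV_API + "/terms",
                    params={"aggregationLevel": "username",
                            "usernames": "auser"})
print(resp.json())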

/workflowexecutions


GET

Description: Extract documents from the bundle collection according to a query string which may include usernames, type of the workflow, the components the run wasAssociatedWith and their implementations. Data results' metadata and parameters can also be queried by specifying the terms and their min and max value ranges and data formats. The mode of the search can also be indicated (mode ::= (OR | AND)); it applies to the search over metadata and parameter values of each run

Parameters

Name Located in Description Required Schema
wasAssociatedWith query csv list of Components involved in the Workflow Executions No string
usernames query csv list of users the Workflow Executions are associated with No string
terms query csv list of metadata or parameter terms. These relate positionally to the maxvalues and the minvalues No string
functionNames query csv list of functions that are executed by at least one workflow’s components No string
minvalues query csv list of metadata or parameters minvalues. These relate positionally to the terms and the maxvalues No string
rformat query unimplemented: format of the response payload (json,json-ld) No string
start query index of the starting item Yes integer
limit query max number of items expected Yes integer
maxvalues query csv list of metadata or parameters maxvalues. These relate positionally to the terms and the minvalues No string
formats query csv list of data formats (eg. mime-types) No string
clusters query csv list of clusters that describe and group one or more workflow’s component No string
types query No string
mode query execution mode of the workflow, in case it supports different kinds of concrete mappings (e.g. mpi, simple, multiprocess, etc.) No string

Responses

Code Description

/workflowexecutions/insert


POST

Description: Bulk insert of bundle or lineage documents in JSON format. These must be provided as an encoded string in a POST request

Parameters

Name Located in Description Required Schema
body body No object

Responses

Code Description

/workflowexecutions/import


POST

Description: Import of provenance output which is not yet mapped to the s-ProvFlowMongoDB format. The files provided in the archive will be mapped to s-ProvFlowMongoDB if they are in one of the supported formats.

Parameters

Name Located in Description Required Schema
archive form Zip archive of provenance output, which will be mapped to s-ProvFlowMongoDB and stored. Currently only files in the CWLProv format are supported Yes file
format form Format of the provenance output to be imported. Yes String

Responses

Code Description
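
A hedged sketch of posting a CWLProv archive as multipart form data follows; the base URL and the archive file name are placeholders.

import requests

SPROV_API = "https://dare.example.org/s-prov"  # hypothetical base URL

with open("cwlprov_run.zip", "rb") as archive:
    resp = requests.post(SPROV_API + "/workflowexecutions/import",
                         files={"archive": archive},   # zip archive of provenance output
                         data={"format": "CWLProv"})   # format of the imported provenance
resp.raise_for_status()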

/workflowexecutions/{run_id}/export


GET

Description: Export of provenance information in PROV-XML or RDF format. The S-PROV information returned covers the whole workflow execution or is restricted to a single data element. In the latter case, the graph is returned by following the derivations within and across runs. A level parameter allows indicating the depth of the resulting trace

Parameters

Name Located in Description Required Schema
rdfout query export rdf format of the PROV document returned No string
creator query the name of the user requesting the export No string
format query export format of the PROV document returned No string
run_id path Yes string

Responses

Code Description

/workflowexecutions/{runid}


DELETE

Description: Extract documents from the bundle collection by the runid of a WFExecution. The method will return input data and information about the components and the libraries used for the specific run

Parameters

Name Located in Description Required Schema
runid path Yes string

Responses

Code Description
GET

Description: Extract documents from the bundle collection by the runid of a WFExecution. The method will return input data and information about the components and the libraries used for the specific run

Parameters

Name Located in Description Required Schema
runid path Yes string

Responses

Code Description

/workflowexecutions/{runid}/delete


POST

Description: Delete a workflow execution trace, including its bundle and all its lineage documents

Parameters

Name Located in Description Required Schema
runid path Yes string

Responses

Code Description

/workflowexecutions/{runid}/edit


POST

Description: Update of the description of a workflow execution. Users can improve this information in free-text

Parameters

Name Located in Description Required Schema
body body No object
runid path Yes string

Responses

Code Description

/workflowexecutions/{runid}/showactivity


GET

Description: Extract detailed information about the activity related to a WFExecution (id). The result set can be grouped by invocations, instances or components (parameter level) and shows progress, anomalies (such as exceptions or system and user messages), occurrence of changes and the rapid availability of accessible data bearing intermediate results. This method can also be used for runtime monitoring

Parameters

Name Located in Description Required Schema
start query index of the starting item Yes integer
limit query max number of items expected Yes integer
level query level of aggregation of the monitoring information (component, instance, invocation, cluster) No string
runid path Yes string

Responses

Code Description

Testing environment

The purpose of this component is to provide a DARE environment for test and debugging purposes. The component exposes two endpoints:

  • The /playground endpoint: this simulates the dispel4py execution in DARE and prints the logs and output content directly to the user
  • The /run-command endpoint: accepts any bash command, executes it and returns the result to the user

Use in notebook

  • For the first endpoint, you need to execute the first steps as always: login, create workspace, register the workflow
  • For the second endpoint, you need to provide the endpoint, the token, the command and, optionally, the output file name

Update helper_functions

Add the two methods below to helper_functions:

  • For the first endpoint:
import requests
import json

def debug_d4p(hostname, impl_id, pckg, workspace_id, pe_name, token, reqs=None, output_filename="output.txt",
              **kw):
    # Prepare data for posting
    data = {
        "impl_id": impl_id,
        "pckg": pckg,
        "wrkspce_id": workspace_id,
        "n_nodes": 1,
        "name": pe_name,
        "access_token": token,
        "output_filename": output_filename,
        "reqs": reqs if not (reqs is None) else "None"
    }
    d4p_args = {}
    for k in kw:
        d4p_args[k] = kw.get(k)
    data['d4p_args'] = d4p_args
    r = requests.post(hostname + '/playground', data=json.dumps(data))
    if r.status_code == 200:
        response = json.loads(r.text)
        if response["logs"]:
            print("Logs:\n========================")
            for log in response["logs"]:
                print(log)
        if response["output"]:
            print("Output content:\n==============================")
            for output in response["output"]:
                print(output)
    else:
        print('Playground returned status code: ' + str(r.status_code))
        print(r.text)
  • For the second endpoint:
import requests
import json

def exec_command(hostname, token, command, run_dir="new", output_filename="output.txt"):
    data = {
        "access_token": token,
        "command": command,
        "run_dir": run_dir,
        "output_filename": output_filename
    }

    r = requests.post(hostname + '/run-command', data=json.dumps(data))
    if r.status_code == 200:
        response = json.loads(r.text)
        if response["logs"]:
            print("Logs:\n========================")
            for log in response["logs"]:
                print(log)
        if response["output"]:
            print("Output content:\n==============================")
            for output in response["output"]:
                print(output)
        if response["run_dir"]:
            print("Run directory is: ")
            print(response["run_dir"])
    else:
        print('Playground returned status code: ' + str(r.status_code))
        print(r.text)

Update the jupyter notebook

  • For the /playground endpoint:
F.debug_d4p(impl_id=impl_id, pckg="mysplitmerge_pckg", workspace_id=workspace_id, pe_name="mySplitMerge", 
            token=F.auth(), creds=creds, no_processes=6, iterations=1,
            reqs='https://gitlab.com/project-dare/dare-api/raw/master/examples/jupyter/requirements.txt')
  • For the /run-command endpoint:
    F.exec_command(PLAYGROUND_API_HOSTNAME, F.auth(), "pip install --user numpy")

Technical documentation of the component is also available here

Semantic Data Discovery

The API documentation of the Semantic Data Discovery component is available in our testbed environment

Dispel4py Documentation

dispel4py is a free and open-source Python library for describing abstract stream-based workflows for distributed data-intensive applications. It enables users to focus on their scientific methods, avoiding distracting details and retaining flexibility over the computing infrastructure they use. It delivers mappings to diverse computing infrastructures, including cloud technologies, HPC architectures and specialised data-intensive machines, to move seamlessly into production with large-scale data loads. The dispel4py system maps workflows dynamically onto multiple enactment systems, and supports parallel processing on distributed memory systems with MPI and shared memory systems with multiprocessing, without users having to modify their workflows.

Dependencies

dispel4py has been tested with Python 2.7.6, 2.7.5, 2.7.2, 2.6.6 and Python 3.4.3, 3.6, 3.7.

The following Python packages are required to run dispel4py:

If using the MPI mapping:

Installation

Clone this repository to your desktop. You can then install from the local copy to your python environment by calling:

python setup.py install

from the dispel4py root directory.

Docker

The Dockerfile in the dispel4py root directory builds a Debian Linux distribution and installs dispel4py and OpenMPI.

docker build . -t dare-dispel4py

Start a Docker container with the dispel4py image in interactive mode with a bash shell:

docker run -it dare-dispel4py /bin/bash

For the EPOS use cases obspy is included in a separate Dockerfile Dockerfile.seismo:

docker build . -f Dockerfile.seismo -t dare-dispel4py-seismo

Provenance in Dispel4py

clean_empty

.. code-block:: python

clean_empty(d)

Utility function that, given a dictionary as input, removes all the properties that are set to None. It works recursively through lists and nested documents
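
For illustration, a minimal recursive sketch of such a cleanup (not necessarily the library's exact implementation) could look like this:

.. code-block:: python

   def clean_empty(d):
       # Recursively drop None values from dicts and lists (illustrative sketch)
       if isinstance(d, dict):
           return {k: clean_empty(v) for k, v in d.items() if v is not None}
       if isinstance(d, list):
           return [clean_empty(v) for v in d if v is not None]
       return d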

total_size

.. code-block:: python

total_size(o, handlers={}, verbose=False)

Returns the approximate memory footprint of an object and all of its contents.

Automatically finds the contents of the following builtin containers and their subclasses: tuple, list, deque, dict, set and frozenset. To search other containers, add handlers to iterate over their contents:

handlers = {SomeContainerClass: iter, OtherContainerClass: OtherContainerClass.get_elements}

write

.. code-block:: python

write(self, name, data)

Redefines the native write function of the dispel4py SimpleFunctionPE to take into account the provenance payload when transferring data.

getDestination_prov

.. code-block:: python

getDestination_prov(self, data)

When provenance is activated it redefines the native dispel4py.new.process getDestination function to take into account provenance information when redirecting grouped operations.

commandChain

.. code-block:: python

commandChain(commands, envhpc, queue=None)

Utility function to execute a chain of system commands on the hosting operating system. The current environment variables can be passed via the env parameter. The queue parameter is used to store the stdoutdata and stderrdata of each process as a message

ProvenanceType

.. code-block:: python

ProvenanceType(self)

A workflow is a program that combines atomic and independent processing elements via a specification language and a library of components. More advanced systems adopt abstractions to facilitate re-use of workflows across users' contexts and application domains. While methods can be multi-disciplinary, provenance should be meaningful to the domain adopting them. Therefore, a portable specification of a workflow requires mechanisms allowing the contextualisation of the provenance produced. For instance, users may want to extract domain metadata from a component or groups of components adopting vocabularies that match their domain and current research, tuning the level of granularity. To allow this level of flexibility, we explore an approach that considers a workflow component described by a class, according to the Object-Oriented paradigm. The class defines the behaviour of its instances as their type, which specifies what an instance will do in terms of a set of methods. We introduce the concept of ProvenanceType, which augments the basic behaviour by extending the class native type, so that a subset of those methods perform the additional actions needed to deliver provenance data. Some of these are used by the preexisting methods and characterise the behaviour of the specific provenance type; others can be used by the developer to easily control precision and granularity. This approach tries to balance automation, transparency and explicit intervention of the developer of a data-intensive tool, who can tune provenance-awareness through easy-to-use extensions.

The type-based approach to provenance collection provides a generic ProvenanceType class that defines the properties of a provenance-aware workflow component. It provides a wrapper that meets the provenance requirements, while leaving the computational behaviour of the component unchanged. Types may be developed as Pattern Type and Contextual Type to represent respectively complex computational patterns and to capture specific metadata contextualisations associated to the produce output data.

The ProvenanceType presents the following class constants to indicate where the lineage information will be stored. Options include a remote repository, a local file system or a ProvenanceSensor (experimental).

  • SAVE_MODE_SERVICE = 'service'
  • SAVE_MODE_FILE = 'file'
  • SAVE_MODE_SENSOR = 'sensor'

The following variables will be used to configure some general provenance capturing properties

  • PROV_PATH: When SAVE_MODE_FILE is chosen, this variable should be populated with a string indicating a file system path where the lineage will be stored.
  • REPOS_URL: When SAVE_MODE_SERVICE is chosen, this variable should be populated with a string indicating the repository endpoint (S-ProvFlow) where the provenance will be sent.
  • PROV_EXPORT_URL: The service endpoint from where the provenance of a workflow execution, after being stored, can be extracted in PROV format.
  • BULK_SIZE: Number of lineage documents to be stored in a single file or in a single request to the remote service. Helps tuning the overhead brought by the latency of accessing storage resources.

getProvStateObjectId

.. code-block:: python

ProvenanceType.getProvStateObjectId(self, name)

Return the id of a named object stored in the provenance state

apply_derivation_rule

.. code-block:: python

ProvenanceType.apply_derivation_rule(self, event, voidInvocation, oport=None, iport=None, data=None, metadata=None)

In support of the implementation of a ProvenanceType realising a lineage Pattern type. This method is invoked by the ProvenanceType each iteration when a decision has to be made whether to ignore or discard the dependencies on the ingested stream and stateful entities, applying a specific provenance pattern, thereby creating input/output derivations. The framework invokes this method every time data is written on an output port (event: write) and every time an invocation (s-prov:Invocation) ends (event: end_invocation_event). The latter can be further described by the boolean parameter voidInvocation, indicating whether the invocation terminated with any data produced. The default implementation provides a stateless behaviour, where the output depends only on the input data received during the invocation.

getInputAt

.. code-block:: python

ProvenanceType.getInputAt(self, port='input', index=None)

Return input data currently available at a specific port. When reading input of a grouped operator, the gindex parameter allows to access exclusively the data related to the group index.

addNamespacePrefix

.. code-block:: python

ProvenanceType.addNamespacePrefix(self, prefix, url)

In support of the implementation of a ProvenanceType realising a lineage Contextualisation type. A Namespace prefix can be declared with its vocabulary url to map the metadata terms to external controlled vocabularies. They can be used to qualify the metadata terms extracted from the extractItemMetadata function, as well as for those terms injected selectively at runtime by the write method. The namespaces will be used consistently when exporting the lineage traces to semantic-web formats, such as RDF.
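
For example, a provenance-aware component might declare a domain vocabulary prefix as sketched below (the prefix and URL are placeholders):

.. code-block:: python

   # Illustrative: map the 'seis' prefix to a hypothetical seismology vocabulary,
   # called from within a component extended with a ProvenanceType
   self.addNamespacePrefix("seis", "http://example.org/seismology#")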

extractItemMetadata

.. code-block:: python

ProvenanceType.extractItemMetadata(self, data, port)

In support of the implementation of a ProvenanceType realising a lineage Contextualisation type. Extracts metadata from the domain-specific content of the data (s-prov:DataGranules) written on a component's output port, according to a particular vocabulary.

ignorePastFlow

.. code-block:: python

ProvenanceType.ignorePastFlow(self)

In support of the implementation of a ProvenanceType realising a lineage Pattern type.

It instructs the type to ignore all the inputs when the method apply_derivation_rule is invoked for a certain event.

ignoreState

.. code-block:: python

ProvenanceType.ignoreState(self)

In support of the implementation of a ProvenanceType realising a lineage Pattern type.

It instructs the type to ignore the content of the provenance state when the method apply_derivation_rule is invoked for a certain event.

discardState

.. code-block:: python

ProvenanceType.discardState(self)

In support of the implementation of a ProvenanceType realising a lineage Pattern type.

It instructs the type to reset the data dependencies in the provenance state when the method apply_derivation_rule is invoked for a certain event. These will not be available in the following invocations.

discardInFlow

.. code-block:: python

ProvenanceType.discardInFlow(self, wlength=None, discardState=False)

In support of the implementation of a ProvenanceType realising a lineage Pattern type.

It instructs the type to reset the data dependencies related to the component's inputs when the method apply_derivation_rule is invoked for a certain event. These will not be available in the following invocations.

update_prov_state

.. code-block:: python

ProvenanceType.update_prov_state(self, lookupterm, data, location='', format='', metadata={}, ignore_inputs=False, ignore_state=True, stateless=False, **kwargs)

In support of the implementation of a ProvenanceType realising a lineage Pattern type, or in those circumstances where developers need to explicitly manage the provenance information within the component's logic.

Updates the provenance state (s-prov:StateCollection) with a reference, identified by a lookupterm, to a new data entity or to the current input. The lookupterm will allow developers to refer to the entity when it is used to derive new data. Developers can specify additional metadata by passing a metadata dictionary. This will enrich the one generated by the extractItemMetadata method. Optionally they can also specify the format and location of the output when this is a concrete resource (file, db entry, online url), as well as instruct the provenance generation to ignore the input ('ignore_inputs') and state ('ignore_state') dependencies.

The kwargs parameter allows passing an argument dep, where developers can specify a list of data ids to explicitly declare dependencies on any data in the provenance state (s-prov:StateCollection).

write

.. code-block:: python

ProvenanceType.write(self, name, data, **kwargs)

This is the native write operation of dispel4py triggering the transfer of data between adjacent components of a workflow. It is extended by the ProvenanceType with explicit provenance controls through the kwargs parameter. We assume these to be ignored when provenance is deactivated. Also this method can use the lookup tags to establish dependencies of output data on entities in the provenance state.

The kwargs parameter allows passing the following arguments (a usage sketch follows this list):

  • dep: developers can specify a list of data ids to explicitly declare dependencies on any data in the provenance state (s-prov:StateCollection).
  • metadata: developers can specify additional metadata by passing a metadata dictionary.
  • ignore_inputs: instructs the provenance generation to ignore the dependencies on the current inputs.
  • format: the format of the output.
  • location: location of the output when this is a concrete resource (file, db entry, online url).
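
A minimal usage sketch, assuming a PE whose class has been extended with a ProvenanceType; term names, values, the port name and the helper function are placeholders:

.. code-block:: python

   # Illustrative sketch of a provenance-aware PE method calling write() with
   # provenance controls; transform() is a hypothetical domain computation.
   def _process(self, inputs):
       result = transform(inputs['input'])
       self.write('output', result,
                  metadata={'magnitude': 4.2},           # placeholder domain metadata
                  format='application/json',
                  location='file:///tmp/result.json',    # hypothetical concrete resource
                  ignore_inputs=False,
                  dep=['reference-catalogue'])           # lookup term stored via update_prov_state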

checkSelectiveRule

.. code-block:: python

ProvenanceType.checkSelectiveRule(self, streammeta)

In alignment with what was previously specified in configure_prov_run for the Processing Element, checks whether the data granule metadata properties' values fall within a selective provenance generation rule.

checkTransferRule

.. code-block:: python

ProvenanceType.checkTransferRule(self, streammeta)

In alignment with what was previously specified in configure_prov_run for the Processing Element, checks whether the data granule metadata properties' values fall within a selective data transfer rule.

extractDataSourceId

.. code-block:: python

ProvenanceType.extractDataSourceId(self, data, port)

In support of the implementation of a ProvenanceType realising a lineage Pattern type. Extracts the id from the incoming data, if applicable, to reuse it to identify the correspondent provenance entity. This functionality is handy especially when a workflow component ingests data represented by self-contained and structured file formats. For instance, the NetCDF attributes Convention includes in its internal metadata an id that can be reused to ensure the linkage and therefore the consistent continuation of provenance traces between workflow executions that generate and use the same data.

AccumulateFlow

.. code-block:: python

AccumulateFlow(self)

A Pattern type for a Processing Element (s-prov:Component) whose output depends on a sequence of input data; e.g. computation of periodic average.

Nby1Flow

.. code-block:: python

Nby1Flow(self)

A Pattern type for a Processing Element (s-prov:Component) whose output depends on the data received on all its input ports in lock-step; e.g. combined analysis of multiple variables.

SlideFlow

.. code-block:: python

SlideFlow(self)

A Pattern type for a Processing Element (s-prov:Component) whose output depends on computations over sliding windows; e.g. computation of rolling sums.

ASTGrouped

.. code-block:: python

ASTGrouped(self)

A Pattern type for a Processing Element (s-prov:Component) that manages a stateful operator with grouping rules; e.g. a component that produces a correlation matrix with the incoming coefficients associated with the same sampling-iteration index

SingleInvocationFlow

.. code-block:: python

SingleInvocationFlow(self)

A Pattern type for a Processing Element (s-prov:Component) that presents stateless input-output dependencies; e.g. the Processing Element of a simple I/O pipeline.

AccumulateStateTrace

.. code-block:: python

AccumulateStateTrace(self)

A Pattern type for a Processing Element (s-prov:Component) that keeps track of the updates on intermediate results written to the output after a sequence of inputs; e.g. traceable approximation of frequency counts or of periodic averages.

IntermediateStatefulOut

.. code-block:: python

IntermediateStatefulOut(self)

A Pattern type for a stateful Processing Element (s-prov:Component) which produces distinct but interdependent output; e.g. detection of events over periodic observations or any component that reuses the data just written to generate a new product

ForceStateless

.. code-block:: python

ForceStateless(self)

A Pattern type for a Processing Element (s-prov:Component). It considers the outputs of the component dependent only on the current input data, regardless of any explicit state update; e.g. the user wants to reduce the amount of lineage produced by a component that presents inline calls to update_prov_state, accepting less accuracy.

get_source

.. code-block:: python

get_source(object, spacing=10, collapse=1)

Print methods and doc strings. Takes module, class, list, dictionary, or string.

injectProv

.. code-block:: python

injectProv(object, provType, active=True, componentsType=None, workflow={}, **kwargs)

This function dynamically extends the type of each node of the graph or subgraph with the ProvenanceType type or one of its specialisations.

configure_prov_run

.. code-block:: python

configure_prov_run(graph, provRecorderClass=None, provImpClass=<class 'dispel4py.provenance.ProvenanceType'>, input=None, username=None, workflowId=None, description=None, system_id=None, workflowName=None, workflowType=None, w3c_prov=False, runId=None, componentsType=None, clustersRecorders={}, feedbackPEs=[], save_mode='file', sel_rules={}, transfer_rules={}, update=False, sprovConfig=None, sessionId=None, mapping='simple')

In order to enable the user of a data-intensive application to configure the lineage metadata extracted from the execution of their workflows, we adopt a provenance configuration profile. The configuration is used at the time of the initialisation of the workflow to prepare its provenance-aware execution. We consider that a chosen configuration may be influenced by personal and community preferences, as well as by rules introduced by institutional policies. For instance, a certain RI may require choosing among a set of contextualisation types, in order to adhere to the infrastructure's metadata portfolio. Thus, a provenance configuration profile plays in favour of more generality, encouraging the implementation and the re-use of fundamental methods across disciplines.

With this method, the users of the workflow provide general provenance information on the attribution of the run, such as username, runId (execution id), description, workflowName, and its semantic characterisation workflowType. It allows users to indicate which provenance types to apply to each component and the conceptual provenance cluster it belongs to. Moreover, users can also choose where to store the lineage (save_mode), locally in the file system or in a remote service or database. Lineage storage operations can be performed in bulk, with different impacts on the overall overhead and on the experienced rapidity of access to the lineage information.

  • Configuration JSON: We show here an example of the JSON document used to prepare a workflow for a provenance-aware execution. Some properties are described inline. These are defined by terms in the provone and s-prov namespaces.

.. code-block:: python

   {
           'provone:User': "aspinuso",
           's-prov:description' : "provdemo demokritos",
           's-prov:workflowName': "demo_epos",
           # Assign a generic characterisation or aim of the workflow
           's-prov:workflowType': "seis:preprocess",
           # Specify the unique id of the workflow
           's-prov:workflowId'  : "workflow process",
           # Specify whether the lineage is saved locally to the file system or remotely to an existing service (for location setup check the class properties or the command line instructions section.)
           's-prov:save-mode'   : 'service'         ,
           # Assign the Provenance Types and Provenance Clusters to the processing elements of the workflows. These are indicated by the name attributed to their class or function, eg. PE_taper. The 's-prov:type' property accepts a list of class names, corresponding to the types' implementation. The 's-prov:cluster' is used to group more processing elements to a common functional section of the workflow.
           's-prov:componentsType' :
                               {'PE_taper': {'s-prov:type': ["SeismoPE"],
                                            's-prov:prov-cluster':'seis:Processor'},
                               'PE_plot_stream':    {'s-prov:prov-cluster':'seis:Visualisation',
                                                  's-prov:type':["SeismoPE"]},
                               'StoreStream':    {'s-prov:prov-cluster':'seis:DataHandler',
                                                  's-prov:type':["SeismoPE,AccumulateFlow"]}
                               }}
  • Selectivity rules: By declaratively indicating a set of Selectivity rules for every component ('s-prov:sel_rules'), users can respectively activate the collection of the provenance for particular Data elements or trigger transfer operations of the data to external locations. The approach takes advantage of the contextualisation possibilities offered by the provenance Contextualisation types. The rules consist of comparison expressions formulated in JSON that indicate the boundary values for a specific metadata term. Such representation is inspired by the query language and selectors adopted by a popular document store, MongoDB. These can also be defined within the configuration JSON introduced above.

Example, a Processing Element CorrCoef that produces lineage information only when the rho value is greater than 0:

.. code-block:: python

   { "CorrCoef": {
       "rules": {
           "rho": {
               "$gt": 0
   }}}}
  • Command Line Activation: To enable provenance activation through the command line, dispel4py should be executed with specific command-line instructions. The following command will execute a local test for the provenance-aware execution of the MySplitAndMerge workflow.

.. code-block:: python

dispel4py --provenance-config=dispel4py/examples/prov_testing/prov-config-mysplitmerge.json --provenance-repository-url= multi dispel4py/examples/prov_testing/mySplitMerge_prov.py -n 10

  • The following command instead stores the provenance files to the local filesystem in a given directory. To activate this mode, the property s-prov:save_mode of the configuration file needs to be set to ‘file’.

.. code-block:: python

dispel4py --provenance-config=dispel4py/examples/prov_testing/prov-config-mysplitmerge.json --provenance-path=/path/to/prov multi dispel4py/examples/prov_testing/mySplitMerge_prov.py -n 10

ProvenanceSimpleFunctionPE

.. code-block:: python

ProvenanceSimpleFunctionPE(self, *args, **kwargs)

A Pattern type for the native SimpleFunctionPE of dispel4py

ProvenanceIterativePE

.. code-block:: python

ProvenanceIterativePE(self, *args, **kwargs)

A Pattern type for the native IterativePE Element of dispel4py