This page provides information on the DARE API. The DARE Platform follows a microservices architecture, so the DARE API is composed of multiple individual APIs. The following sections list the API endpoints of each DARE service. For a high-level description of the components, visit the features section of our site.
The main component APIs described here are:
- dare-login service which interacts with Keycloak in the backend
- d4p-registry, API to interact with the dispel4py workflow registry
- workflow-registry, API to interact with the CWL workflow registry
- exec-api, the DARE execution API
- s-prov, the provenance API
- playground, which enables a development environment for scientists to write their workflows
- semantic-data-discovery, API to retrieve data from the Data Catalogue
Finally, we provide documentation on the dispel4py library.
DARE Login API
The DARE login service mediates between the DARE components and Keycloak. It exposes functionality for signing in to the platform, refreshing and validating tokens, etc. The main API calls are:
HTTP method | Endpoint | Description | Content Type | Parameters |
---|---|---|---|---|
POST | /auth/ | Authenticates a user by performing an HTTP call to the Keycloak service. After the user has been successfully authenticated, the Dispel4py Registry, CWL Registry and Execution API are notified to check whether the user already exists in their local DBs | application/json | data (body), example: { "username": "string", "password": "string", "requested_issuer": "string" } |
POST | /validate-token/ | Validates a token using the Keycloak Service | application/json | data (body), example: { "access_token": "string" } |
POST | /delegation-token/ | Issues a token for internal application use | application/json | data (body), example: { "access_token": "string" } |
POST | /refresh-token/ | Uses the refresh token to issue a new token for a user | application/json | data (body), example: { "refresh_token": "string", "issuer": "string" } |
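Below is a minimal sketch of how a client could call these endpoints with Python requests. The base URL is a placeholder for your deployment, and the assumption that the /auth/ response contains an access_token field is ours, not documented behaviour.

import requests

# Placeholder base URL of the dare-login service; replace with your deployment's endpoint.
LOGIN_API = "https://<dare-host>/dare-login"

# Authenticate (field names follow the /auth/ example above).
resp = requests.post(LOGIN_API + "/auth/", json={
    "username": "demo_user",
    "password": "demo_password",
    "requested_issuer": "keycloak"  # hypothetical issuer name
})
resp.raise_for_status()
tokens = resp.json()  # assumed to contain an "access_token" field

# Validate the token before using it against the other DARE services.
requests.post(LOGIN_API + "/validate-token/",
              json={"access_token": tokens["access_token"]})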
The technical documentation of the dare-login component can be found here.
D4p Information Registry
The Dispel4py Workflow Registry enables research developers to register, re-use and share their dispel4py workflows. The following table shows the available API endpoints of this component.
HTTP method | Endpoint | Description | Content Type | Parameters |
---|---|---|---|---|
GET | /connections/ | Retrieves all the available PE Connection resources. A PE Connection resource allows the addition and manipulation of PE connections. Connections are associated with PEs and are not themselves workspace items | application/json | No parameters |
POST | /connections/ | Creates a new PE Connection resource, which allows the addition and manipulation of PE connections. Connections are associated with PEs and are not themselves workspace items | application/json | data (body), example: { "comment" : "string" , "kind" : "string" , "modifiers" : "string" , "name" : "string" , "is_array" : true , "s_type" : "string" , "d_type" : "string" , "pesig" : "string" } |
GET | /connections/{id}/ | Retrieves a specific PE Connection resource. A PE Connection resource allows the addition and manipulation of PE connections. Connections are associated with PEs and are not themselves workspace items. | application/json | id (integer) |
PUT | /connections/{id}/ | Updates an existing PE Connection resource. A PE Connection resource allows the addition and manipulation of PE connections. Connections are associated with PEs and are not themselves workspace items. | application/json | -id (integer) -data (body), example: { "comment" : "string" , "kind" : "string" , "modifiers" : "string" , "name" : "string" , "is_array" : true , "s_type" : "string" , "d_type" : "string" , "pesig" : "string" } |
DELETE | /connections/{id}/ | Deletes an existing PE Connection resource from the DB | application/json | id (integer) |
GET | /fnimpls/ | Retrieves all the available Function Implementation resources | application/json | No parameters |
POST | /fnimpls/ | Creates a new Function Implementation | application/json | data (body), example: { "code" : "string", "parent_sig": "string", "description" : "string", "pckg" : "string", "workspace": "string" , "clone_of": "string" , "name": "string" } |
GET | /fnimpls/{id}/ | Retrieves a specific Function implementation resource | application/json | id (integer) |
PUT | /fnimpls/{id}/ | Updates an existing function implementation | application/json | -id (integer) -data (body), example: { "code": "string", "parent_sig": "string", "description": "string", "pckg": "string", "workspace": "string", "clone_of": "string", "name": "string" } |
DELETE | /fnimpls/{id}/ | Deletes an existing Function Implementation | application/json | id (integer) |
GET | /fnparams/ | Retrieves all the available Function Parameters | application/json | No parameters |
POST | /fnparams/ | Creates a new Function Parameters entry | application/json | data (body), example: { "parent_function": "string", "param_name": "string", "param_type" : "string" } |
GET | /fnparams/{id}/ | Retrieves a specific Function Parameters entry | application/json | id (integer) |
PUT | /fnparams/{id}/ | Updates an existing Function Parameters entry | application/json | -id (integer) -data (body), example: { "parent_function": "string", "param_name": "string", "param_type": "string" } |
DELETE | /fnparams/{id}/ | Deletes an existing Function Parameters entry | application/json | id (integer) |
GET | /functions/ | Retrieves all the Function resources from the DB | application/json | No parameters |
POST | /functions/ | Creates a new Function Resource | application/json | data (body), example: { "description" : "string", "parameters" : ["string"], "fnimpls": ["string"], "pckg": "string", "workspace": "string", "return_type": "string", "clone_of": "string", "name": "string" } |
GET | /functions/{id}/ | Retrieves an existing Function resource | application/json | id (integer) |
PUT | /functions/{id}/ | Updates an existing Function resource | application/json | -id (integer) -data(body), example: { "description": "string", "parameters": ["string"], "fnimpls": ["string"], "pckg": "string", "workspace": "string", "return_type": "string", "clone_of": "string", "name": "string" } |
DELETE | /functions/{id}/ | Deletes an existing function resource | application/json | id (integer) |
GET | /groups/ | Retrieves all the available groups | application/json | No parameters |
POST | /groups/ | Creates a new user group | application/json | data (body), example: { "name": "string" } |
GET | /groups/{id}/ | Retrieves a user group | application/json | id (integer) |
PUT | /groups/{id}/ | Updates an existing user group | application/json | -id (integer) -data (body), example: { "name": "string" } |
DELETE | /groups/{id}/ | Removes a user group | application/json | id (integer) |
GET | /literals/ | Retrieves all the literal entities | application/json | No parameters |
POST | /literals/ | Creates a new Literal Entity | application/json | data (body), example: { "description": "string", "value": "string", "name": "string", "pckg": "string", "workspace": "string", "clone_of" : "string" } |
GET | /literals/{id}/ | Retrieves a Literal Entity | application/json | id (integer) |
PUT | /literals/{id}/ | Updates an existing Literal Entity | application/json | -id (integer) -data (body), example: { "description": "string", "value": "string", "name": "string", "pckg": "string", "workspace": "string", "clone_of": "string" } |
DELETE | /literals/{id}/ | Deletes a Literal Entity | application/json | id (integer) |
GET | /peimpls/ | Retrieves all the available PE Implementations | application/json | No parameters |
POST | /peimpls/ | Creates a new PE Implementation | application/json | data (body), example: { "code": "string", "parent_sig": "string", "description": "string", "pckg": "string", "workspace": "string", "clone_of": "string", "name": "string" } |
GET | /peimpls/{id}/ | Retrieves a specific PE Implementation | application/json | id (integer) |
PUT | /peimpls/{id}/ | Updates an existing PE Implementation | application/json | -id (integer) -data(body), example: { "code": "string", "parent_sig": "string", "description": "string", "pckg": "string", "workspace": "string", "clone_of" : "string", "name": "string" } |
DELETE | /peimpls/{id}/ | Deletes an existing PE Implementation | application/json | id (integer) |
GET | /pes/ | Retrieves all the available PE resources | application/json | No parameters |
POST | /pes/ | Creates a new PE | application/json | data (body), example: { "description": "string", "name": "string", "connections": ["string"], "pckg": "string", "workspace": "string", "clone_of": "string", "peimpls": ["string"] } |
GET | /pes/{id}/ | Retrieves a specific PE | application/json | id (integer) |
PUT | /pes/{id}/ | Updates an existing PE | application/json | -id(integer) -data(body), example: { "description": "string", "name": "string", "connections": ["string"], "pckg": "string", "workspace": "string", "clone_of": "string", "peimpls": ["string"] } |
DELETE | /pes/{id}/ | Deletes an existing PE | application/json | id (integer) |
GET | /registryusergroups/ | Retrieves all the available registry user groups | application/json | No parameters |
POST | /registryusergroups/ | Creates a new Registry user group | application/json | data (body), example: { "description": "string", "group_name": "string" } |
GET | /registryusergroups/{id}/ | Retrieves a specific Registry user group | application/json | id (integer) |
PUT | /registryusergroups/{id}/ | Updates an existing Registry user group | application/json | -id(integer) -data(body), example: { "owner": "string", "description": "string", "group_name": "string" } |
DELETE | /registryusergroups/{id}/ | Deletes an existing Registry user group | application/json | id (integer) |
GET | /users/ | Retrieves all the existing users | application/json | No parameters |
POST | /users/ | Creates a new user | application/json | data (body), example: { "username": "string", "password": "string", "first_name": "string", "last_name": "string", "email": "string" } |
GET | /users/{id}/ | Retrieves a specific user | application/json | id (integer) |
PUT | /users/{id}/ | Updates a specific user | application/json | -id(integer) -data(body), example: { "username": "string", "password": "string", "first_name": "string", "last_name": "string", "email": "string" } |
DELETE | /users/{id}/ | Deletes a specific user | application/json | id (integer) |
GET | /workspaces/ | Retrieves all the available workspaces | application/json | parameters: -name: name -description: The name of the workspace we want to display -paramType: query -name: username -description: The username the workspace is associated with -paramType: query -name: search - description: perform a simple full-text on descriptions and names of workspaces. -paramType: query |
POST | /workspaces/ | Create or clone a new workspace | application/json | parameters: - name: name - description: the name of the workspace. - name: description - description: a textual description of the workspace. - name: clone_of - description: indicates that a cloning operation is requested. - paramType: query - type: long |
GET | /workspaces/{id}/ | Retrieves a specific workspace | application/json | id (integer) |
PUT | /workspaces/{id}/ | Updates an existing workspace | application/json | -id (integer) -data (body), example: { "clone_of": "string", "name": "string", "description": "string" } |
DELETE | /workspaces/{id}/ | Deletes an existing workspace | application/json | id (integer) |
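As an illustration, the sketch below creates a workspace and then searches for it using Python requests. The base URL and the Authorization header scheme are assumptions about the deployment, not part of the documented API.

import requests

# Placeholder base URL of the d4p-registry service; adapt to your deployment.
D4P_REGISTRY = "https://<dare-host>/d4p-registry"
# Assumed token-based authentication header; your deployment may differ.
headers = {"Authorization": "Bearer <access_token>"}

# Create a new workspace (fields follow the POST /workspaces/ parameters above).
requests.post(D4P_REGISTRY + "/workspaces/",
              json={"name": "demo_workspace", "description": "scratch workspace"},
              headers=headers)

# Search workspaces by name.
workspaces = requests.get(D4P_REGISTRY + "/workspaces/",
                          params={"search": "demo_workspace"},
                          headers=headers).json()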
CWL Workflow Registry
This component is a Django web service exposing an API for registering CWL workflows and Docker environments. The technical documentation for the CWL workflow registry is available in the project’s micro site here. The following table shows all the available API calls in the CWL workflow registry.
HTTP method | Endpoint | Description | Content Type | Parameters |
---|---|---|---|---|
POST | /docker/ | Creates a new Docker Environment. The environment consists of a Dockerfile and can be associated with one or multiple DockerScript entries (which represent bash or python scripts) | application/json | data (body), example: { "docker_name": "name", "docker_tag": "tag", "script_names": ["script1.sh", "script2.sh"], "files": { "dockerfile": "string", "script1.sh": "string", "script2.sh": "string" }, "access_token": "token" } |
POST | /docker/update_docker/ | Updates an existing Docker Environment | application/json | data (body), example: { "docker_name": "name", "docker_tag": "tag", "update": {"tag": "v2.0"}, "files": {"dockerfile": "string"}, "access_token": "token" } |
POST | /docker/provide_url/ | Updates an existing Docker environment’s url field. Once the docker image is built and pushed to a public repository, the relevant Docker entry should be updated with the URL. | application/json | data (body), example: { "docker_name": "name", "docker_tag": "tag", "docker_url": "url", "access_token": "token" } |
DELETE | /docker/delete_docker/ | Deletes an existing docker environment. | application/json | data (body), example: { "docker_name": "name", "docker_tag": "tag", "access_token": "token" } |
GET | /docker/bynametag/ | Retrieves a Docker Environment using its name and tag. | application/json | -docker_name (string) -docker_tag(string) |
GET | /docker/byuser/ | Retrieves all the registered Docker environments by user | application/json | -requested_user(string) if exists, otherwise it uses the user that performed the request |
GET | /docker/download/ | Downloads in a zip file the Dockerfile and the relevant scripts of a Docker Environment. | application/json | -docker_name (string) -docker_tag(string) |
POST | /scripts/add/ | Adds a new script in an existing Docker Environment | application/json | data (body), example: { "docker_name": "name", "docker_tag": "tag", "script_name": "entrypoint.sh", "files": {"entrypoint.sh": "string"}, "access_token": "token" } |
POST | /scripts/edit/ | Edits an existing script of a Docker Environment | application/json | data (body), example: { "docker_name": "name", "docker_tag": "tag", "script_name": "entrypoint.sh", "files": {"entrypoint.sh": "string"}, "access_token": "token" } |
DELETE | /scripts/delete/ | Deletes an existing script from a docker environment | application/json | data (body), example: { "docker_name": "name", "docker_tag": "tag", "script_name": "entrypoint.sh", "access_token": "token" } |
GET | /scripts/download | Downloads a specific script from a Docker Environment | application/json | -docker_name(string) -docker_tag(string) -script_name(string) |
GET | /scripts/byname | Retrieves a specific script based on the name & tag of the Docker Environment and on the name of the script. | application/json | -docker_name(string) -docker_tag(string) -script_name(string) |
POST | /workflows/ | Creates a new CWL workflow of class Workflow | application/json | data (body), example: { "workflow_name": "demo_workflow.cwl", "workflow_version": "v1.0", "spec_file_name": "spec.yaml", "docker_name": "name", "docker_tag": "tag", "workflow_part_data": [{ "name": "arguments.cwl", "version": "v1.0", "spec_name": "arguments.yaml" }, { "name": "tar_param.cwl", "version": "v1.0", "spec_name": "tar_param.yaml" }], "files": { "demo_workflow.cwl": "string", "spec.yaml": "string", "arguments.cwl": "string", "arguments.yaml": "string", "tar_param.cwl": "string", "tar_param.yaml": "string" }, "access_token": "token" } |
POST | /workflows/update_workflow/ | Updates an existing CWL workflow of class Workflow | application/json | data (body), example: { "workflow_name": "demo_workflow.cwl", "workflow_version": "v1.0", "files": {"workflow_file": "string", "spec_file": "string"}, "update": {"version": "v1.1"}, "access_token": "token" } |
POST | /workflows/update_docker/ | Associates a CWL workflow of class Workflow with a different Docker Environment. | application/json | data (body), example: { "workflow_name": "demo_workflow.cwl", "workflow_version": "v1.0", "docker_name": "test", "docker_tag": "v1.0", "access_token": "token" } |
DELETE | /workflows/delete_workflow/ | Deletes an existing CWL workflow (class Workflow) and all the associated Workflow parts (class CommandLineTool). | application/json | data (body), example: { "workflow_name": "demo_workflow.cwl", "workflow_version": "v1.0", "access_token": "token" } |
GET | /workflows/bynameversion/ | Retrieves a CWL workflow of class Workflow and its associated workflow parts, as well as the related Docker environment, based on the workflow name and version. | application/json | -workflow_name(string) -workflow_version(string) |
GET | /workflows/download | Downloads in a zip file all the CWL files (Workflow and CommandLineTool) as well as the relevant Dockerfile and scripts (if the parameter dockerized is provided) | application/json | -workflow_name(string) -workflow_version(string) -dockerized(boolean) |
POST | /workflow_parts/add/ | Adds a new CommandLineTool CWL in an existing CWL workflow | application/json | data (body), example: { "workflow_name":"demo_workflow.cwl", "workflow_version": "v1.0", "workflow_part_name":"arguments.cwl", "workflow_part_version": "v1.0", "spec_name": "arguments.yaml", "files": {"arguments.cwl": "string", "arguments.yaml": "string"}, "access_token": "token" } |
POST | /workflow_parts/edit/ | Edits an existing CommandLineTool CWL workflow | application/json | data (body), example: { "workflow_name": "demo_workflow.cwl", "workflow_version": "v1.0", "workflow_part_name": "arguments.cwl", "workflow_part_version": "v1.0", "spec_name": "arguments.yaml", "files": { "arguments.cwl": "string", "arguments.yaml": "string" }, "update": { "version": "v1.1" }, "access_token": "token" } |
DELETE | /workflow_parts/delete/ | Deletes an existing CommandLineTool CWL workflow | application/json | data (body), example: { "workflow_name": "demo_workflow.cwl", "workflow_version": "v1.0", "workflow_part_name": "arguments.cwl", "workflow_part_version": "v1.0", "access_token": "token" } |
GET | /workflow_parts/bynameversion | Retrieves a specific CommandLineTool CWL based on its parent (name & version) and its own name and version. | application/json | -workflow_name(string) -workflow_version(string) -workflow_part_name(string) -workflow_part_version(string) |
GET | /workflow_parts/download/ | Downloads a specific CWL of class CommandLineTool | application/json | -workflow_name(string) -workflow_version(string) -workflow_part_name(string) -workflow_part_version(string) |
POST | /accounts/login/ | Authenticates a user (login). Used by the dare-login component described above when a user calls its /auth/ endpoint. If the user does not exist in the CWL workflow registry’s local DB, a new user is created. | application/json | data (body), example: { "username": "string", "password": "string", "access_token": "string", "email": "string", "given_name": "string", "family_name": "string" } |
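The sketch below registers a small Docker environment and later records its public image URL, following the request bodies shown above. The base URL, file contents and token are placeholders.

import requests

# Placeholder base URL of the CWL workflow registry; adapt to your deployment.
CWL_REGISTRY = "https://<dare-host>/workflow-registry"
TOKEN = "<access_token>"

# Register a Docker environment with one helper script (fields follow POST /docker/ above).
requests.post(CWL_REGISTRY + "/docker/", json={
    "docker_name": "demo-env",
    "docker_tag": "v1.0",
    "script_names": ["entrypoint.sh"],
    "files": {
        "dockerfile": "FROM python:3.7\nCOPY entrypoint.sh /entrypoint.sh\n",
        "entrypoint.sh": "#!/bin/bash\necho hello\n"
    },
    "access_token": TOKEN
})

# Once the image has been built and pushed, record its public URL.
requests.post(CWL_REGISTRY + "/docker/provide_url/", json={
    "docker_name": "demo-env",
    "docker_tag": "v1.0",
    "docker_url": "<registry>/demo-env:v1.0",
    "access_token": TOKEN
})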
Execution API
General
The Execution API provides endpoints for multiple execution contexts:
- Dispel4py: dynamically creates containers to execute Dispel4py workflows
- CWL: execution environments spawned on-demand to execute CWL workflows.
- Specfem: dynamically creates containers to execute the Specfem executable. This endpoint is deprecated; Specfem is now executed via the CWL endpoint.
API calls
HTTP method | Endpoint | Description | Content Type | Parameters |
---|---|---|---|---|
POST | /create-folders/ | Endpoint used by the /auth/ endpoint of dare-login. Checks whether the user’s workspace in the DARE platform is available and, if not, creates the necessary folder structure | application/json | data (body), example: { "username": "string" } |
POST | /d4p-mpi-spec/ | Used internally by the dispel4py execution environment in order to retrieve the respective PE Implementation and spec.yaml | application/json | data (body), example: { "pe_imple": "name", "nodes": 3, "input_data": {} } |
POST | /run-d4p/ | Creates a new dispel4py execution environment using the Kubernetes API. Generates a new run directory, stored under the user’s “runs” folder. All the execution results are stored in the generated run directory. | application/json | data (body), example: { "access_token": "string", "workspace": "string", "pckg": "string", "pe_name": "string", "target": "string", "nodes": 1 } |
POST | /run-specfem/ | Deprecated endpoint. Use /run-cwl instead. | application/json | - |
POST | /run-cwl/ | Endpoint to instantiate an execution environment for CWL workflow execution. The environment to be instantiated is retrieved via the CWL Workflow Registry. Generates a new run directory, stored under the user’s “runs” folder. All the execution results are stored in the generated run directory. | application/json | data (body), example: { "access_token": "string", "nodes": 12, "workflow_name": "string", "workflow_version": "string", "input_data": { "example1": "string" } } |
POST | /upload/ | Endpoint used to upload files to the DARE platform. The files are stored under the user’s home directory. The home directory is named after their username and contains 3 folders, i.e. uploads, debug and runs. All the uploaded files are stored under the user’s “uploads” directory | application/json | data (body), example: { "dataset_name": "string", "path": "string", "access_token": "string", "files": ["string"] } |
GET | /my-files/ | Lists all the user’s directories under the “uploads”, “runs” and “debug” folders. If the parameter num_run_dirs is present, the response is limited to the most recent directories based on the number provided in the aforementioned parameter | application/json | -access_token(string) -num_run_dirs(integer) |
GET | /list/ | Lists all the files inside a specific directory. This directory could be retrieved from the previous endpoint | application/json | -access_token(string) -path(string) |
GET | /download/ | Downloads a specific file from the DARE platform. To find the file’s full path use the two previous endpoints | application/json | -access_token(string) -path(string) |
GET | /send2drop/ | Uploads files from the DARE platform to B2DROP | application/json | -access_token(string) -path(string) |
GET | /cleanup/ | Clears the user’s folders (uploads, runs, debug) | application/json | -access_token(string) -runs(boolean) -uploads(boolean) -debug(boolean) |
GET | /my-pods | Lists the running jobs of a user | application/json | data example: {"access_token": "string"} |
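A hedged sketch of launching a dispel4py run and then listing the most recent run directory; the base URL, workspace and PE names are placeholders.

import requests

# Placeholder base URL of the Execution API; adapt to your deployment.
EXEC_API = "https://<dare-host>/exec-api"
TOKEN = "<access_token>"

# Launch a dispel4py run (fields follow POST /run-d4p/ above).
requests.post(EXEC_API + "/run-d4p/", json={
    "access_token": TOKEN,
    "workspace": "demo_workspace",
    "pckg": "demo_pckg",
    "pe_name": "MyPE",
    "target": "simple",
    "nodes": 1
})

# Inspect the run directories afterwards, limited to the most recent one.
listing = requests.get(EXEC_API + "/my-files/",
                       params={"access_token": TOKEN, "num_run_dirs": 1}).json()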
Technical documentation of the component is also available here
Provenance
Version: v1
/data
GET
Description: The data is selected by specifying a query string. Query parameters allow searching by attribution to a component or to an implementation, by generation by a workflow execution, and by combining more metadata and parameter terms with their min and max values-ranges. The mode of the search can also be indicated (mode ::= (OR | AND)). It will apply to the search upon metadata and parameter values-ranges.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
usernames | query | csv list of users the Workflow Executions are associated with | No | string |
terms | query | csv list of metadata or parameter terms. These relate positionally to the maxvalues and the minvalues | No | string |
functionNames | query | csv list of functions the Data was generated with | No | string |
wasAttributedTo | query | csv list of Component or Component Instances involved in the generation of the Data | No | string |
minvalues | query | csv list of metadata or parameters minvalues. These relate positionally to the terms and the maxvalues | No | string |
rformat | query | unimplemented: format of the response payload (json,json-ld) | No | string |
start | query | index of the starting item | Yes | integer |
limit | query | max number of items expected | Yes | integer |
maxvalues | query | csv list of metadata or parameters maxvalues. These relate positionally to the terms and the minvalues | No | string |
formats | query | csv list of data formats (eg. mime-types) | No | string |
types | query | csv list of data types | No | string |
wasGeneratedBy | query | the id of the Invocation that generated the Data | No | string |
Responses
Code | Description |
---|
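A hedged sketch of querying this endpoint with Python requests; the base URL, username and metadata term are placeholders.

import requests

# Placeholder base URL of the s-prov provenance API.
SPROV_API = "https://<dare-host>/sprov/api"

# Find data attributed to a user whose "magnitude" metadata value lies between 2.5 and 5.0.
# terms, minvalues and maxvalues are positional csv lists, as described above.
items = requests.get(SPROV_API + "/data", params={
    "usernames": "demo_user",
    "terms": "magnitude",
    "minvalues": "2.5",
    "maxvalues": "5.0",
    "mode": "AND",
    "start": 0,
    "limit": 20
}).json()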
/data/filterOnAncestor
POST
Description: Filter a list of data ids based on the existence of at least one ancestor in their data dependency graph, according to a list of metadata terms and their min and max values-ranges. The maximum depth level and the mode of the search can also be indicated (mode ::= (OR | AND)).
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
body | body | | No | object |
Responses
Code | Description |
---|
/data/{data_id}
GET
Description: Extract Data and their DataGranules by the Data id
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
data_id | path | | Yes | string |
Responses
Code | Description |
---|
/data/{data_id}/derivedData
GET
Description: Starting from a specific data entity, it is possible to navigate forwards through the derived data or backwards across the element’s data dependencies. The number of traversal steps is provided as a parameter (level).
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
level | query | level of depth in the data derivation graph, starting from the current Data | Yes | string |
data_id | path | | Yes | string |
Responses
Code | Description |
---|
/data/{data_id}/export
GET
Description: Export of provenance information in PROV-XML or RDF format. The S-PROV information returned covers the whole workflow execution or is restricted to a single data element. In the latter case, the graph is returned by following the derivations within and across runs. A level parameter allows indicating the depth of the resulting trace.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
format | query | export format of the PROV document returned | No | string |
rdfout | query | export rdf format of the PROV document returned | No | string |
creator | query | the name of the user requesting the export | No | string |
level | query | level of depth in the data derivation graph, starting from the current Data | Yes | string |
data_id | path | | Yes | string |
Responses
Code | Description |
---|
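A short sketch of retrieving the export and saving it to disk; the base URL, data id and output file name are placeholders.

import requests

# Placeholder base URL of the s-prov provenance API and a placeholder data id.
SPROV_API = "https://<dare-host>/sprov/api"
DATA_ID = "<data_id>"

# Export the provenance trace of a single data element, one derivation level deep.
resp = requests.get(SPROV_API + "/data/" + DATA_ID + "/export",
                    params={"level": "1", "creator": "demo_user"})
with open("prov_trace.xml", "wb") as f:
    f.write(resp.content)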
/data/{data_id}/wasDerivedFrom
GET
Description: Starting from a specific data entity, it is possible to navigate forwards through the derived data or backwards across the element’s data dependencies. The number of traversal steps is provided as a parameter (level).
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
level | query | level of depth in the data derivation graph, starting from the current Data | Yes | string |
data_id | path | | Yes | string |
Responses
Code | Description |
---|
/instances/{instid}
GET
Description: Extract details about a single instance or component by specifying its id. The returned document will indicate the changes that occurred, reporting the first invocation affected. It supports the specification of a list of runIds the instance was wasAssociateFor, considering that the same instance could be used across multiple runs.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
start | query | index of the starting item | Yes | integer |
limit | query | max number of items expected | Yes | integer |
wasAssociateFor | query | csv list of runIds the instance was wasAssociateFor (when more instances are reused in multiple workflow executions) | No | string |
instid | path | | Yes | string |
Responses
Code | Description |
---|
/invocations/{invocid}
GET
Description: Extract details about a single invocation by specifying its id
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
invocid | path | | Yes | string |
Responses
Code | Description |
---|
/summaries/collaborative
GET
Description: Extract information about the reuse and exchange of data between workflow executions based on terms' values-ranges and a group of users. The API method allows for inclusive or exclusive (mode ::= (OR | AND)) queries on the terms' values. As above, additional details, such as running infrastructure, type and name of the workflow, can be selectively extracted by assigning these properties to a groupBy parameter. This will support the generation of grouped views.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
wasAssociatedWith | query | csv list of Components involved in the Workflow Executions | No | string |
usernames | query | csv list of users the Workflow Executions are associated with | No | string |
terms | query | csv list of metadata or parameter terms. These relate positionally to the maxvalues and the minvalues | No | string |
functionNames | query | csv list of functions that are executed by at least one workflow’s components | No | string |
level | query | level of depth in the data derivation graph, starting from the current Data | Yes | string |
minvalues | query | csv list of metadata or parameters minvalues. These relate positionally to the terms and the maxvalues | No | string |
rformat | query | unimplemented: format of the response payload (json,json-ld) | No | string |
maxvalues | query | csv list of metadata or parameters maxvalues. These relate positionally to the terms and the minvalues | No | string |
formats | query | csv list of data formats (eg. mime-types) | No | string |
clusters | query | csv list of clusters that describe and group one or more of the workflow’s components | No | string |
groupby | query | express the grouping of the returned data | No | string |
types | query | | No | string |
mode | query | execution mode of the workflow in case it supports different kinds of concrete mappings (e.g. mpi, simple, multiprocess, etc.) | No | string |
Responses
Code | Description |
---|
/summaries/workflowexecution
GET
Description: Produce a detailed overview of the distribution of the computation, reporting the size of data movements between the workflow components, their instances or invocations across worker nodes, depending on the specified granularity level. Additional information, such as process pid, worker, instance or component of the workflow (depending on the level of granularity) can be selectively extracted by assigning these properties to a groupBy parameter. This will support the generation of grouped views
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
level | query | level of depth in the data derivation graph, starting from the current Data | Yes | string |
mintime | query | minimum start time of the Invocation | No | string |
groupby | query | express the grouping of the returned data | No | string |
runId | query | the id of the run to be analysed | No | string |
maxtime | query | maximum start time of the Invocation | No | string |
maxidx | query | maximum iteration index of an Invocation | No | integer |
minidx | query | minimum iteration index of an Invocation | No | integer |
Responses
Code | Description |
---|
/terms
GET
Description: Return a list of discoverable metadata terms based on their appearance for a list of runIds, usernames, or for the whole provenance archive. Terms are returned indicating their type (when consistently used), min and max values and the number of their occurrences within the scope of the search.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
aggregationLevel | query | set whether the terms need to be aggregated by runId, username or across the whole collection (all) | No | string |
usernames | query | csv list of usernames | No | string |
runIds | query | csv list of run ids | No | string |
Responses
Code | Description |
---|
/workflowexecutions
GET
Description: Extract documents from the bundle collection according to a query string which may include usernames, type of the workflow, the components the run wasAssociatedWith and their implementations. Data results' metadata and parameters can also be queried by specifying the terms and their min and max values-ranges and data formats. The mode of the search can also be indicated (mode ::= (OR | AND)). It will apply to the search upon metadata and parameter values of each run.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
wasAssociatedWith | query | csv list of Components involved in the Workflow Executions | No | string |
usernames | query | csv list of users the Workflow Executions are associated with | No | string |
terms | query | csv list of metadata or parameter terms. These relate positionally to the maxvalues and the minvalues | No | string |
functionNames | query | csv list of functions that are executed by at least one workflow’s components | No | string |
minvalues | query | csv list of metadata or parameters minvalues. These relate positionally to the terms and the maxvalues | No | string |
rformat | query | unimplemented: format of the response payload (json,json-ld) | No | string |
start | query | index of the starting item | Yes | integer |
limit | query | max number of items expected | Yes | integer |
maxvalues | query | csv list of metadata or parameters maxvalues. These relate positionally to the terms and the minvalues | No | string |
formats | query | csv list of data formats (eg. mime-types) | No | string |
clusters | query | csv list of clusters that describe and group one or more of the workflow’s components | No | string |
types | query | | No | string |
mode | query | execution mode of the workflow in case it supports different kinds of concrete mappings (e.g. mpi, simple, multiprocess, etc.) | No | string |
Responses
Code | Description |
---|
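For example, the runs of a given user can be listed page by page as in the sketch below; the base URL and filter values are placeholders.

import requests

# Placeholder base URL of the s-prov provenance API.
SPROV_API = "https://<dare-host>/sprov/api"

# List the first ten workflow executions associated with a user.
runs = requests.get(SPROV_API + "/workflowexecutions", params={
    "usernames": "demo_user",
    "start": 0,
    "limit": 10
}).json()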
/workflowexecutions/insert
POST
Description: Bulk insert of bundle or lineage documents in JSON format. These must be provided as an encoded string in a POST request.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
body | body | | No | object |
Responses
Code | Description |
---|
/workflowexecutions/import
POST
Description: Import of provenance output which is not yet mapped to the s-ProvFlowMongoDB format. The files provided in the archive will be mapped to s-ProvFlowMongoDB if they are in one of the supported formats.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
archive | form | Zip archive of provenance output, which will be mapped to s-ProvFlowMongoDB and stored. Currently only files in the CWLProv format are supported | Yes | file |
format | form | Format of the provenance output to be imported. | Yes | String |
Responses
Code | Description |
---|
/workflowexecutions/{run_id}/export
GET
Description: Export of provenance information in PROV-XML or RDF format. The S-PROV information returned covers the whole workflow execution or is restricted to a single data element. In the latter case, the graph is returned by following the derivations within and across runs. A level parameter allows indicating the depth of the resulting trace.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
rdfout | query | export rdf format of the PROV document returned | No | string |
creator | query | the name of the user requesting the export | No | string |
format | query | export format of the PROV document returned | No | string |
run_id | path | | Yes | string |
Responses
Code | Description |
---|
/workflowexecutions/{runid}
DELETE
Description: Extract documents from the bundle collection by the runid of a WFExecution. The method will return input data and information about the components and the libraries used for the specific run.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
runid | path | | Yes | string |
Responses
Code | Description |
---|
GET
Description: Extract documents from the bundle collection by the runid of a WFExecution. The method will return input data and information about the components and the libraries used for the specific run.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
runid | path | | Yes | string |
Responses
Code | Description |
---|
/workflowexecutions/{runid}/delete
POST
Description: Delete a workflow execution trace, including its bundle and all its lineage documents
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
runid | path | | Yes | string |
Responses
Code | Description |
---|
/workflowexecutions/{runid}/edit
POST
Description: Update of the description of a workflow execution. Users can improve this information in free-text.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
body | body | | No | object |
runid | path | | Yes | string |
Responses
Code | Description |
---|
/workflowexecutions/{runid}/showactivity
GET
Description: Extract detailed information about the activity of a WFExecution (id). The result-set can be grouped by invocations, instances or components (parameter level) and shows progress, anomalies (such as exceptions or system and user messages), occurrence of changes and the rapid availability of accessible data bearing intermediate results. This method can also be used for runtime monitoring.
Parameters
Name | Located in | Description | Required | Schema |
---|---|---|---|---|
start | query | index of the starting item | Yes | integer |
limit | query | max number of items expected | Yes | integer |
level | query | level of aggregation of the monitoring information (component, instance, invocation, cluster) | No | string |
runid | path | | Yes | string |
Responses
Code | Description |
---|
Testing environment
The purpose of this component is to provide a DARE environment for testing and debugging purposes. The component exposes two endpoints:
- The /playground endpoint: simulates the dispel4py execution in DARE and prints the logs and output content directly to the user
- The /run-command endpoint: accepts any bash command, executes it and returns the result to the user
Use in notebook
- For the first endpoint, you need to execute the first steps as always: login, create a workspace and register the workflow
- For the second endpoint, you need to provide the endpoint, the token, the command and, if applicable, the output file name
Update helper_functions
Add the two methods below to helper_functions:
- For the first endpoint
import requests
import json

def debug_d4p(hostname, impl_id, pckg, workspace_id, pe_name, token, reqs=None,
              output_filename="output.txt", **kw):
    # Prepare the payload expected by the /playground endpoint
    data = {
        "impl_id": impl_id,
        "pckg": pckg,
        "wrkspce_id": workspace_id,
        "n_nodes": 1,
        "name": pe_name,
        "access_token": token,
        "output_filename": output_filename,
        "reqs": reqs if reqs is not None else "None"
    }
    # Forward any extra keyword arguments as dispel4py arguments
    data['d4p_args'] = dict(kw)
    r = requests.post(hostname + '/playground', data=json.dumps(data))
    if r.status_code == 200:
        response = json.loads(r.text)
        if response["logs"]:
            print("Logs:\n========================")
            for log in response["logs"]:
                print(log)
        if response["output"]:
            print("Output content:\n==============================")
            for output in response["output"]:
                print(output)
    else:
        print('Playground returns status_code: ' + str(r.status_code))
        print(r.text)
- For the second endpoint:
import requests
import json

def exec_command(hostname, token, command, run_dir="new", output_filename="output.txt"):
    data = {
        "access_token": token,
        "command": command,
        "run_dir": run_dir,
        "output_filename": output_filename
    }
    r = requests.post(hostname + '/run-command', data=json.dumps(data))
    if r.status_code == 200:
        response = json.loads(r.text)
        if response["logs"]:
            print("Logs:\n========================")
            for log in response["logs"]:
                print(log)
        if response["output"]:
            print("Output content:\n==============================")
            for output in response["output"]:
                print(output)
        if response["run_dir"]:
            print("Run directory is: ")
            print(response["run_dir"])
    else:
        print('Playground returns status_code: ' + str(r.status_code))
        print(r.text)
Update the jupyter notebook
- For the /playground endpoint:
F.debug_d4p(impl_id=impl_id, pckg="mysplitmerge_pckg", workspace_id=workspace_id, pe_name="mySplitMerge",
token=F.auth(), creds=creds, no_processes=6, iterations=1,
reqs='https://gitlab.com/project-dare/dare-api/raw/master/examples/jupyter/requirements.txt')
- For the /run-command endpoint:
F.exec_command(PLAYGROUND_API_HOSTNAME, F.auth(), "pip install --user numpy")
Technical documentation of the component is also available here
Semantic Data Discovery
The API documentation of the Semantic Data Discovery component is available in our testbed environment.
Dispel4py Documentation
dispel4py is a free and open-source Python library for describing abstract stream-based workflows for distributed data-intensive applications. It enables users to focus on their scientific methods, avoiding distracting details and retaining flexibility over the computing infrastructure they use. It delivers mappings to diverse computing infrastructures, including cloud technologies, HPC architectures and specialised data-intensive machines, to move seamlessly into production with large-scale data loads. The dispel4py system maps workflows dynamically onto multiple enactment systems, and supports parallel processing on distributed memory systems with MPI and shared memory systems with multiprocessing, without users having to modify their workflows.
Dependencies
dispel4py has been tested with Python 2.7.6, 2.7.5, 2.7.2, 2.6.6 and Python 3.4.3, 3.6, 3.7.
The following Python packages are required to run dispel4py:
- networkx (https://networkx.github.io/)
If using the MPI mapping:
- mpi4py (http://mpi4py.scipy.org/)
Installation
Clone this repository to your desktop. You can then install it into your Python environment from the local copy by calling:
python setup.py install
from the dispel4py root directory.
Docker
The Dockerfile in the dispel4py root directory builds a Debian Linux distribution and installs dispel4py and OpenMPI.
docker build . -t dare-dispel4py
Start a Docker container with the dispel4py image in interactive mode with a bash shell:
docker run -it dare-dispel4py /bin/bash
For the EPOS use cases, obspy is included in a separate Dockerfile, Dockerfile.seismo:
docker build . -f Dockerfile.seismo -t dare-dispel4py-seismo
Provenance in Dispel4py
clean_empty
.. code-block:: python
clean_empty(d)
Utility function that, given an input dictionary, removes all the properties that are set to None. It works recursively through lists and nested documents.
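A minimal illustrative sketch of this recursive clean-up behaviour (not the library's exact implementation):

.. code-block:: python

def clean_empty(d):
    # Recursively drop None values from dicts and lists (illustrative sketch).
    if isinstance(d, dict):
        return {k: clean_empty(v) for k, v in d.items() if v is not None}
    if isinstance(d, list):
        return [clean_empty(v) for v in d if v is not None]
    return d

# clean_empty({"a": 1, "b": None, "c": {"d": None}})  ->  {"a": 1, "c": {}}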
total_size
.. code-block:: python
total_size(o, handlers={}, verbose=False)
Returns the approximate memory footprint of an object and all of its contents.
Automatically finds the contents of the following builtin containers and their subclasses: tuple, list, deque, dict, set and frozenset. To search other containers, add handlers to iterate over their contents:
handlers = {SomeContainerClass: iter, OtherContainerClass: OtherContainerClass.get_elements}
write
.. code-block:: python
write(self, name, data)
Redefines the native write function of the dispel4py SimpleFunctionPE to take into account the provenance payload when transferring data.
getDestination_prov
.. code-block:: python
getDestination_prov(self, data)
When provenance is activated it redefines the native dispel4py.new.process getDestination function to take into account provenance information when redirecting grouped operations.
commandChain
.. code-block:: python
commandChain(commands, envhpc, queue=None)
Utility function to execute a chain of system commands on the host operating system. The current environment variables can be passed as the parameter env. The queue parameter is used to store the stdoutdata and stderrdata of each process in a message.
ProvenanceType
.. code-block:: python
ProvenanceType(self)
A workflow is a program that combines atomic and independent processing elements via a specification language and a library of components. More advanced systems adopt abstractions to facilitate re-use of workflows across users' contexts and application domains. While methods can be multi-disciplinary, provenance should be meaningful to the domain adopting them. Therefore, a portable specification of a workflow requires mechanisms allowing the contextualisation of the provenance produced. For instance, users may want to extract domain-metadata from a component or groups of components adopting vocabularies that match their domain and current research, tuning the level of granularity. To allow this level of flexibility, we explore an approach that considers a workflow component described by a class, according to the Object-Oriented paradigm. The class defines the behaviour of its instances as their type, which specifies what an instance will do in terms of a set of methods. We introduce the concept of ProvenanceType, which augments the basic behaviour by extending the class native type, so that a subset of those methods perform the additional actions needed to deliver provenance data. Some of these are used by some of the preexisting methods and characterise the behaviour of the specific provenance type; others can be used by the developer to easily control precision and granularity. This approach tries to balance between automation, transparency and explicit intervention of the developer of a data-intensive tool, who can tune provenance-awareness through easy-to-use extensions.
The type-based approach to provenance collection provides a generic ProvenanceType class that defines the properties of a provenance-aware workflow component. It provides a wrapper that meets the provenance requirements, while leaving the computational behaviour of the component unchanged. Types may be developed as Pattern Types and Contextual Types to represent respectively complex computational patterns and to capture specific metadata contextualisations associated with the produced output data.
The ProvenanceType presents the following class constants to indicate where the lineage information will be stored. Options include a remote repository, a local file system or a ProvenanceSensor (experimental).
- SAVE_MODE_SERVICE = 'service'
- SAVE_MODE_FILE = 'file'
- SAVE_MODE_SENSOR = 'sensor'
The following variables will be used to configure some general provenance capturing properties
- PROV_PATH: when SAVE_MODE_FILE is chosen, this variable should be populated with a string indicating a file system path where the lineage will be stored.
- REPOS_URL: when SAVE_MODE_SERVICE is chosen, this variable should be populated with a string indicating the repository endpoint (S-ProvFlow) where the provenance will be sent.
- PROV_EXPORT_URL: the service endpoint from which the provenance of a workflow execution, after being stored, can be extracted in PROV format.
- BULK_SIZE: number of lineage documents to be stored in a single file or sent in a single request to the remote service. Helps tuning the overhead brought by the latency of accessing storage resources.
getProvStateObjectId
.. code-block:: python
ProvenanceType.getProvStateObjectId(self, name)
Return the id of a named object stored in the provenance state
apply_derivation_rule
.. code-block:: python
ProvenanceType.apply_derivation_rule(self, event, voidInvocation, oport=None, iport=None, data=None, metadata=None)
In support of the implementation of a ProvenanceType realising a lineage Pattern type. This method is invoked by the ProvenanceType each iteration when a decision has to be made whether to ignore or discard the dependencies on the ingested stream and stateful entities, applying a specific provenance pattern, thereby creating input/output derivations. The framework invokes this method every time the data is written on an output port (event: write) and every time an invocation (s-prov:Invocation) ends (event: end_invocation). The latter can be further described by the boolean parameter voidInvocation, indicating whether the invocation terminated with any data produced. The default implementation provides a stateless behaviour, where the output depends only on the input data received during the invocation.
getInputAt
.. code-block:: python
ProvenanceType.getInputAt(self, port='input', index=None)
Return input data currently available at a specific port. When reading input of a grouped operator, the gindex parameter allows to access exclusively the data related to the group index.
addNamespacePrefix
.. code-block:: python
ProvenanceType.addNamespacePrefix(self, prefix, url)
In support of the implementation of a ProvenanceType realising a lineage Contextualisation type. A Namespace prefix can be declared with its vocabulary url to map the metadata terms to external controlled vocabularies. They can be used to qualify the metadata terms extracted from the extractItemMetadata function, as well as for those terms injected selectively at runtime by the write method. The namespaces will be used consistently when exporting the lineage traces to semantic-web formats, such as RDF.
extractItemMetadata
.. code-block:: python
ProvenanceType.extractItemMetadata(self, data, port)
In support of the implementation of a ProvenanceType realising a lineage Contextualisation type. Extracts metadata from the domain-specific content of the data (s-prov:DataGranules) written on a component's output port, according to a particular vocabulary.
ignorePastFlow
.. code-block:: python
ProvenanceType.ignorePastFlow(self)
In support of the implementation of a ProvenanceType realising a lineage Pattern type.
It instructs the type to ignore all the inputs when the method apply_derivation_rule is invoked for a certain event.
ignoreState
.. code-block:: python
ProvenanceType.ignoreState(self)
In support of the implementation of a ProvenanceType realising a lineage Pattern type.
It instructs the type to ignore the content of the provenance state when the method apply_derivation_rule is invoked for a certain event.
discardState
.. code-block:: python
ProvenanceType.discardState(self)
In support of the implementation of a ProvenanceType realising a lineage Pattern type.
It instructs the type to reset the data dependencies in the provenance state when the method apply_derivation_rule is invoked for a certain event. These will not be available in the following invocations.
discardInFlow
.. code-block:: python
ProvenanceType.discardInFlow(self, wlength=None, discardState=False)
In support of the implementation of a ProvenanceType realising a lineage Pattern type.
It instructs the type to reset the data dependencies related to the component's inputs when the method apply_derivation_rule is invoked for a certain event. These will not be available in the following invocations.
update_prov_state
.. code-block:: python
ProvenanceType.update_prov_state(self, lookupterm, data, location='', format='', metadata={}, ignore_inputs=False, ignore_state=True, stateless=False, **kwargs)
In support of the implementation of a ProvenanceType realising a lineage Pattern type, or in those circumstances where developers need to explicitly manage the provenance information within the component's logic.
Updates the provenance state (s-prov:StateCollection) with a reference, identified by a lookupterm, to a new data entity or to the current input. The lookupterm will allow developers to refer to the entity when this is used to derive new data. Developers can specify additional metadata by passing a metadata dictionary. This will enrich the one generated by the extractItemMetadata method. Optionally, they can also specify the format and location of the output when this is a concrete resource (file, db entry, online url), as well as instructing the provenance generation to ignore the input (ignore_inputs) and state (ignore_state) dependencies.
The kwargs parameter allows passing an argument dep where developers can specify a list of data ids to explicitly declare dependencies with any data in the provenance state (s-prov:StateCollection).
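As a hedged illustration, the sketch below shows a PE that registers an intermediate result in the provenance state. The class, port and lookup-term names are invented, and the provenance methods only become available once configure_prov_run has extended the PE with a ProvenanceType.

.. code-block:: python

from dispel4py.core import GenericPE

class RunningMaxPE(GenericPE):
    # Illustrative PE that tracks the maximum value seen so far.
    def __init__(self):
        GenericPE.__init__(self)
        self._add_input('input')
        self._add_output('output')
        self.max_seen = None

    def _process(self, inputs):
        value = inputs['input']
        if self.max_seen is None or value > self.max_seen:
            self.max_seen = value
            # Register the new maximum under a lookup term, enriching the
            # provenance state record with extra metadata (s-prov:StateCollection).
            self.update_prov_state('running_max', value,
                                   metadata={'max_value': value})
        self.write('output', value)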
write
.. code-block:: python
ProvenanceType.write(self, name, data, **kwargs)
This is the native write operation of dispel4py, triggering the transfer of data between adjacent components of a workflow. It is extended by the ProvenanceType with explicit provenance controls through the kwargs parameter. We assume these to be ignored when provenance is deactivated. This method can also use the lookup tags to establish dependencies of output data on entities in the provenance state.
The kwargs parameter allows passing the following arguments (a usage sketch follows the list):
- dep: developers can specify a list of data ids to explicitly declare dependencies with any data in the provenance state (s-prov:StateCollection).
- metadata: developers can specify additional metadata by passing a metadata dictionary.
- ignore_inputs: instructs the provenance generation to ignore the dependencies on the current inputs.
- format: the format of the output.
- location: location of the output when this is a concrete resource (file, db entry, online url).
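A minimal, hedged example of the provenance-aware write; the metadata values and the 'running_max' dependency term refer back to the sketch above and are purely illustrative.

.. code-block:: python

from dispel4py.core import GenericPE

class WriterPE(GenericPE):
    # Illustrative PE emitting one result with explicit provenance controls.
    def __init__(self):
        GenericPE.__init__(self)
        self._add_input('input')
        self._add_output('output')

    def _process(self, inputs):
        result = inputs['input']
        self.write('output', result,
                   metadata={'rho': 0.83},             # extra domain metadata for the lineage record
                   dep=['running_max'],                # dependency on a provenance-state entry
                   format='application/octet-stream',  # format of the output
                   location='/data/results/out.bin')   # concrete location of the output, if any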
checkSelectiveRule
.. code-block:: python
ProvenanceType.checkSelectiveRule(self, streammeta)
In alignment with what was previously specified in configure_prov_run for the Processing Element, checks the data granule metadata to determine whether its properties' values fall within a selective provenance generation rule.
checkTransferRule
.. code-block:: python
ProvenanceType.checkTransferRule(self, streammeta)
In alignment with what was previously specified in configure_prov_run for the Processing Element, checks the data granule metadata to determine whether its properties' values fall within a selective data transfer rule.
extractDataSourceId
.. code-block:: python
ProvenanceType.extractDataSourceId(self, data, port)
In support of the implementation of a ProvenanceType realising a lineage Pattern type. Extracts the id from the incoming data, if applicable, to reuse it to identify the corresponding provenance entity. This functionality is handy especially when a workflow component ingests data represented by self-contained and structured file formats. For instance, the NetCDF attributes Convention includes in its internal metadata an id that can be reused to ensure the linkage and therefore the consistent continuation of provenance traces between workflow executions that generate and use the same data.
AccumulateFlow
.. code-block:: python
AccumulateFlow(self)
A Pattern type for a Processing Element (s-prov:Component) whose output depends on a sequence of input data; e.g. computation of periodic average.
Nby1Flow
.. code-block:: python
Nby1Flow(self)
A Pattern type for a Processing Element (s-prov:Component) whose output depends on the data received on all its input ports in lock-step; e.g. combined analysis of multiple variables.
SlideFlow
.. code-block:: python
SlideFlow(self)
A Pattern type for a Processing Element (s-prov:Component) whose output depends on computations over sliding windows; e.g. computation of rolling sums.
ASTGrouped
.. code-block:: python
ASTGrouped(self)
A Pattern type for a Processing Element (s-prov:Component) that manages a stateful operator with grouping rules; e.g. a component that produces a correlation matrix with the incoming coefficients associated with the same sampling-iteration index.
SingleInvocationFlow
.. code-block:: python
SingleInvocationFlow(self)
A Pattern type for a Processing Element (s-prov:Component) that presents stateless input/output dependencies; e.g. the Processing Element of a simple I/O pipeline.
AccumulateStateTrace
.. code-block:: python
AccumulateStateTrace(self)
A Pattern type for a Processing Element (s-prov:Component) that keeps track of the updates on intermediate results written to the output after a sequence of inputs; e.g. traceable approximation of frequency counts or of periodic averages.
IntermediateStatefulOut
.. code-block:: python
IntermediateStatefulOut(self)
A Pattern type for a Processing Element (s-prov:Component): a stateful component which produces distinct but interdependent outputs; e.g. detection of events over periodic observations, or any component that reuses the data just written to generate a new product.
ForceStateless
.. code-block:: python
ForceStateless(self)
A Pattern type for a Processing Element (s-prov:Component). It considers the outputs of the component dependent only on the current input data, regardless of any explicit state update; e.g. the user wants to reduce the amount of lineage produced by a component that presents inline calls to update_prov_state, accepting less accuracy.
get_source
.. code-block:: python
get_source(object, spacing=10, collapse=1)
Print methods and doc strings. Takes module, class, list, dictionary, or string.
injectProv
.. code-block:: python
injectProv(object, provType, active=True, componentsType=None, workflow={}, **kwargs)
This function dynamically extends the type of each node of the graph or subgraph with the ProvenanceType type or its specialisation.
configure_prov_run
.. code-block:: python
configure_prov_run(graph, provRecorderClass=None, provImpClass=<class 'dispel4py.provenance.ProvenanceType'>, input=None, username=None, workflowId=None, description=None, system_id=None, workflowName=None, workflowType=None, w3c_prov=False, runId=None, componentsType=None, clustersRecorders={}, feedbackPEs=[], save_mode='file', sel_rules={}, transfer_rules={}, update=False, sprovConfig=None, sessionId=None, mapping='simple')
In order to enable the user of a data-intensive application to configure the lineage metadata extracted from the execution of their workflows, we adopt a provenance configuration profile. The configuration is used at the time of the initialisation of the workflow to prepare its provenance-aware execution. We consider that a chosen configuration may be influenced by personal and community preferences, as well as by rules introduced by institutional policies. For instance, a certain RI would require to choose among a set of contextualisation types, in order to adhere to the infrastructure's metadata portfolio. Thus, a provenance configuration profile plays in favour of more generality, encouraging the implementation and the re-use of fundamental methods across disciplines.
With this method, the users of the workflow provide general provenance information on the attribution of the run, such as username, runId (execution id), description, workflowName, and its semantic characterisation workflowType. It allows users to indicate which provenance types to apply to each component and the conceptual provenance cluster it belongs to. Moreover, users can also choose where to store the lineage (save_mode), locally in the file system or in a remote service or database. Lineage storage operations can be performed in bulk, with different impacts on the overall overhead and on the experienced rapidity of access to the lineage information.
- Configuration JSON: We show here an example of the JSON document used to prepare a workflow for a provenance-aware execution. Some properties are described inline. These are defined by terms in the provone and s-prov namespaces.
.. code-block:: python
{
    'provone:User': "aspinuso",
    's-prov:description': "provdemo demokritos",
    's-prov:workflowName': "demo_epos",
    # Assign a generic characterisation or aim of the workflow
    's-prov:workflowType': "seis:preprocess",
    # Specify the unique id of the workflow
    's-prov:workflowId': "workflow process",
    # Specify whether the lineage is saved locally to the file system or remotely to an existing service
    # (for location setup check the class properties or the command line instructions section)
    's-prov:save-mode': 'service',
    # Assign the Provenance Types and Provenance Clusters to the processing elements of the workflow.
    # These are indicated by the name attributed to their class or function, e.g. PE_taper.
    # The 's-prov:type' property accepts a list of class names, corresponding to the types' implementation.
    # The 's-prov:prov-cluster' is used to group more processing elements into a common functional section of the workflow.
    's-prov:componentsType': {
        'PE_taper': {'s-prov:type': ["SeismoPE"],
                     's-prov:prov-cluster': 'seis:Processor'},
        'PE_plot_stream': {'s-prov:prov-cluster': 'seis:Visualisation',
                           's-prov:type': ["SeismoPE"]},
        'StoreStream': {'s-prov:prov-cluster': 'seis:DataHandler',
                        's-prov:type': ["SeismoPE", "AccumulateFlow"]}
    }
}
- Selectivity rules: By declaratively indicating a set of selectivity rules for every component ('s-prov:sel_rules'), users can respectively activate the collection of the provenance for particular Data elements or trigger transfer operations of the data to external locations. The approach takes advantage of the contextualisation possibilities offered by the provenance Contextualisation types. The rules consist of comparison expressions formulated in JSON that indicate the boundary values for a specific metadata term. Such representation is inspired by the query language and selectors adopted by a popular document store, MongoDB. These can also be defined within the configuration JSON introduced above.
Example: a Processing Element CorrCoef that produces lineage information only when the rho value is greater than 0:
.. code-block:: python
{ "CorrCoef": {
"rules": {
"rho": {
"$gt": 0
}}}}
- **Command Line Activation**: To enable provenance activation through the command line, dispel4py should be executed with specific command-line instructions. The following command will execute a local test for the provenance-aware execution of the MySplitAndMerge workflow.
.. code-block:: python
dispel4py --provenance-config=dispel4py/examples/prov_testing/prov-config-mysplitmerge.json --provenance-repository-url=
- The following command instead stores the provenance files in a given directory on the local filesystem. To activate this mode, the property s-prov:save-mode of the configuration file needs to be set to 'file'.
.. code-block:: python
dispel4py --provenance-config=dispel4py/examples/prov_testing/prov-config-mysplitmerge.json --provenance-path=/path/to/prov multi dispel4py/examples/prov_testing/mySplitMerge_prov.py -n 10
ProvenanceSimpleFunctionPE
.. code-block:: python
ProvenanceSimpleFunctionPE(self, *args, **kwargs)
A Pattern type for the native SimpleFunctionPE of dispel4py
ProvenanceIterativePE
.. code-block:: python
ProvenanceIterativePE(self, *args, **kwargs)
A Pattern type for the native IterativePE Element of dispel4py