
SMILA/Documentation/DataObjectTypesAndBuckets

Revision as of 07:59, 4 July 2011

Buckets

A bucket is a container for data objects that are processed by workflows. All data objects in a single bucket are located in a single DOS store and share the same naming conventions. The contents of the data objects in a single bucket have the same structure, which is defined by the data object type. For example, a data object can be a sequence of records ("record bulk") or an index partition. Different data object types are predefined by the software.

Buckets can be persistent or transient: objects in transient buckets are deleted automatically when the workflow run that created them has ended, whereas objects in persistent buckets usually stay forever (or until they are deleted explicitly by some action).

Currently, only persistent buckets of type recordBulk have to be defined explicitly by the user. Transient buckets are created automatically based on the workflow definition, and persistent index buckets (templates) are provided automatically. However, other "interesting" data object types may be added to the software in later versions.

The JobManager automatically creates stores only for transient buckets. Stores for persistent buckets have to be created by the user, either explicitly or by creating an index. The replication level for transient buckets (that is, for the stores created by the JobManager) is defined globally in services.ini.

Bucket parameters may be set if needed, e.g. to create a workflow that works on two different indices.

Monitor and modify buckets

All buckets

Use a GET request to retrieve monitoring information for all defined (persistent) buckets:

Supported operations:

  • GET: Returns the bucket information. If no buckets are defined, you will get an empty list. Only persistent buckets are returned here. Transient buckets that are generated dynamically for a workflow are not returned.
  • POST: Adds a new persistent bucket. The bucket definition must contain at least the name and the data object type of the bucket. Additionally, parameters may be set that the data object type uses to build the store and object names in this bucket. You can only create buckets with data object types that contain a "persistent" definition. See below for a description of the available data object types. If the name of an already existing bucket is used, that bucket is updated after successful validation. A job that is currently running is not affected. If an existing workflow uses a bucket of the same name, even if this bucket is optional and did not exist before, the bucket is updated for a new job run, too.
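
The minimum content of a bucket definition described above can be sketched in Python as follows. This is only an illustration: the JSON field names `name`, `type` and `parameters`, the bucket name `myBucket` and the parameter `store` are assumptions made for this example and are not confirmed by this page. Only the data object type `recordBulk` is taken from the text.

```python
import json


def make_bucket_definition(name, data_object_type, parameters=None):
    """Build a bucket definition as a JSON string.

    The field names "name", "type" and "parameters" are assumptions
    made for this sketch; check them against your SMILA version.
    """
    if not name or not data_object_type:
        # The text requires at least a name and a data object type.
        raise ValueError("name and data object type are both required")
    definition = {"name": name, "type": data_object_type}
    if parameters:
        # Optional parameters used to build store and object names.
        definition["parameters"] = dict(parameters)
    return json.dumps(definition)


# Hypothetical bucket name and parameter, for illustration only.
body = make_bucket_definition("myBucket", "recordBulk", {"store": "mystore"})
```

The resulting string would be sent as the body of the POST request.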

Usage:

  • URL: http://<hostname>:8080/smila/jobmanager/buckets.
  • Allowed methods:
    • GET
    • POST
  • Response status codes:
    • 200 OK: Upon successful execution (GET).
    • 201 CREATED: Upon successful execution (POST). The response body is a JSON object containing the name and URI of the created bucket.
    • 400 Bad Request: If the parameters in the bucket definition would result in incorrect store names (POST). The response body is an error in JSON format specifying which bucket and data object type are involved.
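
As a rough sketch of how a client might address this resource, the following Python code builds the GET and POST requests with the standard library and maps the status codes listed above to short descriptions. The host name `localhost` and the JSON field names in the sample definition are assumptions made for illustration. The requests are only constructed here, not sent.

```python
import json
import urllib.request


def buckets_url(hostname, port=8080):
    """URL of the bucket collection, following the pattern given above."""
    return f"http://{hostname}:{port}/smila/jobmanager/buckets"


def build_list_request(hostname):
    """GET request that would list all persistent buckets."""
    return urllib.request.Request(buckets_url(hostname), method="GET")


def build_create_request(hostname, definition):
    """POST request that would add or update a persistent bucket."""
    data = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        buckets_url(hostname),
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def explain_status(method, status):
    """Map the status codes listed above to a short description."""
    outcomes = {
        ("GET", 200): "bucket list returned",
        ("POST", 201): "bucket definition accepted",
        ("POST", 400): "bucket definition rejected",
    }
    return outcomes.get((method, status), "not described in this section")


# Hypothetical host and definition (field names are assumptions).
request = build_create_request("localhost", {"name": "myBucket", "type": "recordBulk"})
```

To actually send a request, pass it to `urllib.request.urlopen`. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx responses, so the 400 case has to be handled in an `except` block.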

Specific buckets

Use a GET request to retrieve information about a bucket with a given name, or a DELETE request to remove it:

Supported operations:

  • GET: Returns the information for the given bucket.
  • DELETE: Delete a bucket with the given bucket name. Buckets that are still used by an existing workflow cannot be deleted.

Usage:

  • URL: http://<hostname>:8080/smila/jobmanager/buckets/<bucket-name>.
  • Allowed methods:
    • GET
    • DELETE
  • Response status codes:
    • 200 OK: Upon successful execution (GET, DELETE). A DELETE request with a non-existent bucket name is ignored.
    • 404 Not Found: If no bucket with the given name exists (GET). The response body is an error in JSON format.
    • 400 Bad Request: If the bucket is still referenced by an existing workflow, or if it is predefined in the configuration and therefore cannot be removed (DELETE).
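
The per-bucket resource can be addressed in the same way. The sketch below builds a GET or DELETE request for a single bucket and maps the status codes listed above to short descriptions. The host name `localhost` and the bucket name `myBucket` are hypothetical, and percent-encoding the bucket name is a precaution added for this example, not something this page prescribes.

```python
import urllib.parse
import urllib.request


def bucket_url(hostname, bucket_name, port=8080):
    """URL of a single bucket, following the pattern given above."""
    # Percent-encode the name so special characters cannot break the path.
    safe_name = urllib.parse.quote(bucket_name, safe="")
    return f"http://{hostname}:{port}/smila/jobmanager/buckets/{safe_name}"


def build_bucket_request(hostname, bucket_name, method="GET"):
    """Build a GET or DELETE request for one bucket (not sent here)."""
    if method not in ("GET", "DELETE"):
        raise ValueError("only GET and DELETE are described for this resource")
    return urllib.request.Request(bucket_url(hostname, bucket_name), method=method)


def explain_status(method, status):
    """Map the status codes listed above to a short description."""
    outcomes = {
        ("GET", 200): "bucket information returned",
        ("GET", 404): "no bucket with this name",
        ("DELETE", 200): "bucket deleted, or name unknown and call ignored",
        ("DELETE", 400): "bucket still used by a workflow or predefined",
    }
    return outcomes.get((method, status), "not described in this section")


# Hypothetical host and bucket name, for illustration only.
delete_request = build_bucket_request("localhost", "myBucket", method="DELETE")
```

Because a DELETE for an unknown name is ignored, a 200 response to DELETE does not prove that the bucket existed beforehand. A preceding GET can be used if that distinction matters.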
