SMILA/Documentation/DataObjectTypesAndBuckets

Revision as of 12:07, 11 July 2011 by Nadine.auslaender.attensity.com (All data object types)

Buckets and Data Object Types

Buckets

A bucket is a data container comprising logically grouped data objects that are to be processed by asynchronous workflows in SMILA. All data objects in a bucket are physically located in the same store and therefore share the same naming convention. A data object can be, for example, a sequence of records ("record bulk") or an index. The contents of a bucket also share the same structure, which is determined by the bucket's data object type. The data object types from which you can select when creating a bucket are predefined by the software and cannot be changed at runtime.

An important aspect of buckets is that they can be persistent or transient. Objects in transient buckets are deleted automatically when the workflow run that created them has ended, while objects in persistent buckets survive until they are deleted explicitly or another workflow uses them. Persistent buckets have to be created explicitly via the respective REST/JSON API call (see below) before they can be used in a workflow. Transient buckets are generated automatically by the system based on the definition of the respective workflow; they need not, and cannot, be created explicitly via this API. Similarly, a store referenced by a transient bucket is created automatically by the Job Manager, but a store referenced by a persistent bucket must be created beforehand.

Persistent buckets can have parameters that are required for the referenced data object type or for the involved workers to operate when the bucket is referenced in a workflow.

Monitor, create and modify buckets

All buckets

Use a GET request to retrieve monitoring information for all defined (persistent) buckets. Use POST to add new buckets.

Supported operations:

  • GET: Returns the bucket information. If no buckets are defined, you get an empty list. Only persistent buckets are returned; transient buckets generated dynamically for a workflow are not.
  • POST: Adds a new persistent bucket. The bucket definition must contain at least the name and the data object type of the bucket. In addition, parameters may be set that the data object type uses to build the store and object names in this bucket. You can only create buckets with data object types that contain a "persistent" definition. See below for a description of the available data object types. If the name already exists, the bucket is updated after successful validation. A currently running job is not affected. If an existing workflow uses a bucket of the same name, even if this bucket is optional and did not exist before, the update also applies to new job runs of that workflow.

Usage:

  • URL: http://<hostname>:8080/smila/jobmanager/buckets
  • Allowed methods:
    • GET
    • POST
  • Response status codes:
    • 200 OK: Upon successful execution (GET).
    • 201 CREATED: Upon successful execution (POST). The response body is a JSON object containing the name and URI of the created bucket.
    • 400 Bad Request: If the parameters in the bucket definition would result in incorrect store names. The response body is an error in JSON format specifying which bucket and data object type are involved.
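
As a rough illustration of the POST operation described above, the following Python sketch builds (but does not send) a request that defines a persistent bucket. The host `localhost`, the bucket name `myBucket`, the store name `mystore`, and the JSON field names `name`, `type` and `parameters` are assumptions made for this example; check them against the bucket definition format of your SMILA version.

```python
import json
import urllib.request

# Assumed base URL; replace localhost with your <hostname>.
BUCKETS_URL = "http://localhost:8080/smila/jobmanager/buckets"


def build_create_bucket_request(name, data_object_type, parameters=None,
                                url=BUCKETS_URL):
    """Build (but do not send) a POST request defining a persistent bucket."""
    # The keys "name", "type" and "parameters" are assumed for illustration.
    definition = {"name": name, "type": data_object_type}
    if parameters:
        definition["parameters"] = parameters
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical bucket using the recordBulk type and its ${store} parameter.
request = build_create_bucket_request(
    "myBucket", "recordBulk", {"store": "mystore"}
)

# To send it against a running instance, something like this should work:
# with urllib.request.urlopen(request) as response:
#     print(response.status, json.load(response))
```

Building the request separately from sending it makes it easy to inspect the JSON body before it reaches the Job Manager.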

Specific buckets

Use a GET request to retrieve information about the bucket with a given name. Use DELETE to delete the bucket with a given name.

Supported operations:

  • GET: Returns the information for the given bucket.
  • DELETE: Delete a bucket with the given bucket name. Buckets that are still used by an existing workflow cannot be deleted.

Usage:

  • URL: http://<hostname>:8080/smila/jobmanager/buckets/<bucket-name>
  • Allowed methods:
    • GET
    • DELETE
  • Response status codes:
    • 200 OK: Upon successful execution (GET, DELETE). A DELETE for a nonexistent bucket name is ignored.
    • 404 Not Found: If no bucket with the given name exists (GET). The response body is an error in JSON format.
    • 400 Bad Request: If the bucket is still referenced by an existing workflow, or if it is predefined in the configuration and therefore cannot be removed (DELETE).
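
The operations on a single bucket could be prepared from Python as sketched below. The host `localhost` and the helper names `bucket_url`, `build_bucket_request` and `describe_status` are hypothetical and exist only for this example; the status explanations simply restate the list above.

```python
import urllib.parse
import urllib.request

# Assumed base URL; replace localhost with your <hostname>.
BASE_URL = "http://localhost:8080/smila/jobmanager/buckets"


def bucket_url(bucket_name, base_url=BASE_URL):
    """Return the URL of one bucket, percent-encoding the bucket name."""
    return base_url.rstrip("/") + "/" + urllib.parse.quote(bucket_name, safe="")


def build_bucket_request(bucket_name, method="GET"):
    """Build (but do not send) a GET or DELETE request for one bucket."""
    if method not in ("GET", "DELETE"):
        raise ValueError("only GET and DELETE are allowed for a specific bucket")
    return urllib.request.Request(bucket_url(bucket_name), method=method)


def describe_status(status):
    """Map the documented status codes to a short explanation."""
    return {
        200: "ok",
        400: "bucket is still referenced by a workflow or is predefined",
        404: "no bucket with this name",
    }.get(status, "unexpected status")


delete_request = build_bucket_request("myBucket", method="DELETE")

# To send it against a running instance, something like this should work:
# with urllib.request.urlopen(delete_request) as response:
#     print(describe_status(response.status))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx responses, so the 400 and 404 cases would be read from the exception's `code` attribute.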

Data Object Types

The data object type definitions are provided with the software and cannot be added at runtime. They define the default data object modes and default name patterns required by the software.

  • The data object type definitions should not be changed by the user. If a definition is wrong, the Job Manager will start anyway, but errors may occur later.
  • Data object types use parameter variables of the form ${...}. System parameter variables (names starting with _) are resolved automatically; values for other variables must be provided by higher-level definitions (buckets, workflows, jobs).

Data object types delivered with the software:

{
  "dataObjectTypes": [
    {
      "name": "recordBulk",
      "persistent": {
        "store": "${store}",
        "object": "${_bucketName}/${_uuid}"
      },
      "transient": {
        "store": "${tempStore}",
        "object": "${_bucketName}/${_uuid}"
      }
    }
  ]
}

Notes:

  • ${_uuid}: A new UUID is generated only when a new bulk is created. When an existing bulk is transformed, its UUID is reused.

The meaning of these definitions is:

  • recordBulk: a data object of this type contains a sequence of records. This type is the standard intermediate object type in workflows.
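
To make the ${...} name patterns more concrete, here is a minimal Python sketch of how such variables could be substituted. It is not SMILA's implementation; the function `resolve`, the bucket name `myBucket` and the store name `mystore` are hypothetical, and the system variables are filled in by hand here although the Job Manager would resolve them automatically.

```python
import re
import uuid

# Matches a parameter variable such as ${store} or ${_bucketName}.
VARIABLE = re.compile(r"\$\{([^}]+)\}")


def resolve(pattern, parameters):
    """Replace each ${name} in pattern with parameters[name]."""
    def substitute(match):
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"no value for variable ${{{name}}}")
        return str(parameters[name])
    return VARIABLE.sub(substitute, pattern)


# The "persistent" part of the recordBulk definition shown above.
record_bulk_persistent = {
    "store": "${store}",
    "object": "${_bucketName}/${_uuid}",
}

# "store" would come from the bucket parameters; the underscore-prefixed
# system variables are supplied manually for this illustration only.
values = {
    "store": "mystore",
    "_bucketName": "myBucket",
    "_uuid": str(uuid.uuid4()),
}

store_name = resolve(record_bulk_persistent["store"], values)
object_name = resolve(record_bulk_persistent["object"], values)
print(store_name, object_name)
```

Raising an error for a missing variable mirrors the idea that values for non-system variables must be provided by a higher-level definition.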

Monitor data object types

All data object types

Use a GET request to retrieve information about all data object types.

Supported operations:

  • GET: Returns the information for all data object types.

Usage:

  • URL: http://<hostname>:8080/smila/jobmanager/dataobjecttypes/
  • Allowed methods:
    • GET
  • Response status codes:
    • 200 OK: Upon successful execution.

Specific data object type

Use a GET request to retrieve information about a specific data object type.

Supported operations:

  • GET: Returns the information for a specific data object type.

Usage:

  • URL: http://<hostname>:8080/smila/jobmanager/dataobjecttypes/<dataobjecttype-name>
  • Allowed methods:
    • GET
  • Response status codes:
    • 200 OK: Upon successful execution. The response contains the data object type definition.
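
Because persistent buckets can only be created for data object types that contain a "persistent" definition, a client might want to check the listing before posting a bucket. The sketch below assumes that the GET response has the same shape as the definition shown under "Data Object Types"; the function name `persistent_capable_types` is hypothetical.

```python
import json

# Sample body, assumed to match what a GET on /dataobjecttypes/ returns.
SAMPLE_RESPONSE = """
{"dataObjectTypes": [
  {"name": "recordBulk",
   "persistent": {"store": "${store}", "object": "${_bucketName}/${_uuid}"},
   "transient": {"store": "${tempStore}", "object": "${_bucketName}/${_uuid}"}}
]}
"""


def persistent_capable_types(response_text):
    """Return the names of all types that define a "persistent" section."""
    listing = json.loads(response_text)
    return [
        data_object_type["name"]
        for data_object_type in listing.get("dataObjectTypes", [])
        if "persistent" in data_object_type
    ]


print(persistent_capable_types(SAMPLE_RESPONSE))
```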
