A bucket is a container for data objects that are processed by workflows. All data objects in a single bucket are located in a single DOS store and share the same naming conventions. The contents of the data objects in a single bucket have the same structure, which is defined by the data object type. For example, a data object can be a sequence of records ("record bulk") or an index partition. The available data object types are predefined by the software.
Buckets can be persistent or transient. Objects in transient buckets are deleted automatically when the workflow run that created them has ended, while objects in persistent buckets remain indefinitely unless they are deleted explicitly by some action.
Currently, only persistent buckets of type recordBulk have to be defined explicitly by the user. Transient buckets are created automatically based on the workflow definition, and persistent index buckets (templates) are provided automatically. However, other "interesting" data object types may be added to the software in later versions.
Only stores for transient buckets are created automatically by the JobManager; stores for persistent buckets have to be created by the user, either explicitly or by creating an index. The replication level for transient buckets (that is, for the stores created by the JobManager) is defined globally in services.ini.
Bucket parameters may be set if needed, e.g. to create a workflow that works on two different indexes.
Get information about all buckets
Use a GET request to retrieve monitoring information for all defined (persistent) buckets.
Supported operations:
- GET: Returns information about all buckets. If no buckets are defined, an empty list is returned. Only persistent buckets are returned; transient buckets generated dynamically for a workflow are not included.
- URL: http://<hostname>:8080/smila/buckets
- Allowed methods:
  - GET (with optional URL parameter returnObjects)
- Response status codes:
  - 200 OK: Upon successful execution (GET).
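
The request described above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference client: the helper names buckets_url and list_buckets are hypothetical, the values "true"/"false" for returnObjects are an assumption (the text above only names the parameter), and the response body is assumed to be JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def buckets_url(hostname, port=8080, return_objects=None):
    """Build the URL of the bucket listing resource (hypothetical helper)."""
    url = f"http://{hostname}:{port}/smila/buckets"
    if return_objects is not None:
        # Assumption: returnObjects is passed as the literal "true" or "false".
        url += "?" + urlencode({"returnObjects": str(bool(return_objects)).lower()})
    return url


def list_buckets(hostname, port=8080, return_objects=None, timeout=10):
    """Send the GET request and decode the response body.

    Assumption: the body is JSON. urlopen raises urllib.error.HTTPError
    for 4xx/5xx responses, so a normal return corresponds to a
    successful request such as the documented 200 OK.
    """
    url = buckets_url(hostname, port, return_objects)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running instance, a call such as list_buckets("localhost") would return the decoded bucket information; only persistent buckets are expected in the result, as noted above.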