

Note: Available since SMILA 0.9.0!


PipeletWorker (bundle org.eclipse.smila.processing.worker)

The PipeletWorker is a worker designed to execute a single pipelet directly, without pipeline overhead.

JavaDoc

This page gives only a rough overview of the service. Please refer to the JavaDoc for detailed information about the Java components.

Configuration

The PipeletWorker is configured via incoming task parameters. These parameters are typically set in a job definition.

Parameter   | Description                    | Default value
pipeletName | Name of the pipelet to execute | ---

Sample job definition that sets the parameters:

{
  "name":"myJob",
  "parameters":{
    "pipeletName": "MySamplePipelet",
    ...
  },
  "workflow":"myWorkflow"
}

PipeletWorker definition in workers.json

GET /smila/jobmanager/workers/pipeletWorker/

HTTP/1.x 200 OK

{
  "name" : "pipeletWorker",
  "readOnly" : true,
  "parameters" : [ {
    "name" : "pipeletName"
  }],
  "input" : [ {
    "name" : "input",
    "type" : "recordBulks"
  } ],
  "output" : [ {
    "name" : "output",
    "type" : "recordBulks",
    "modes" : [ "optional" ]
  } ]
}

The output bucket of the worker is optional, hence in an asynchronous workflow the worker does not need to have a successor. If no output bucket is defined, the result records of the pipelet processing are not persisted to a bulk but discarded. This makes sense if the pipelet stores the records somewhere itself, e.g. adds them to an index.
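For illustration, an asynchronous workflow in which the pipeletWorker has no successor could look like the following sketch. The bucket name, the bulkbuilder start action, and the overall layout are illustrative assumptions; please check the exact workflow syntax against the workflow documentation:

{
  "name":"myWorkflow",
  "startAction":{
    "worker":"bulkbuilder",
    "output":{
      "insertedRecords":"recordsToProcess"
    }
  },
  "actions":[
    {
      "worker":"pipeletWorker",
      "input":{
        "input":"recordsToProcess"
      }
    }
  ]
}

Because the pipeletWorker's output slot is declared "optional", it is simply left out here and the result records are discarded after processing.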

Access task parameters in pipelets

The worker adds all task parameters to a map in the attribute _parameters of each record before passing it to the pipelet, which can then access them. The helper class org.eclipse.smila.processing.parameters.ParameterAccessor supports this by looking for a requested parameter first in this _parameters map, then at the top level of the record, and finally in the pipelet configuration. Thus, if a pipelet uses the ParameterAccessor to access parameters in records and configuration, properties from the pipelet configuration can be overridden by setting them as task parameters. This is done, for example, by the Lucene indexing and Sesame pipelets.
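As an illustration, parameter resolution inside a pipelet could look roughly like this. This is a minimal sketch: the pipelet class, the parameter name "indexName", and the exact ParameterAccessor constructor and accessor signatures are assumptions and should be verified against the JavaDoc.

import org.eclipse.smila.blackboard.Blackboard;
import org.eclipse.smila.datamodel.AnyMap;
import org.eclipse.smila.processing.Pipelet;
import org.eclipse.smila.processing.ProcessingException;
import org.eclipse.smila.processing.parameters.ParameterAccessor;

public class MySamplePipelet implements Pipelet {

  private AnyMap _configuration;

  public void configure(final AnyMap configuration) {
    _configuration = configuration;
  }

  public String[] process(final Blackboard blackboard, final String[] recordIds)
    throws ProcessingException {
    // Resolves parameters from the _parameters map first, then from the
    // record top level, then from the pipelet configuration.
    // (Constructor and accessor signatures assumed; see JavaDoc.)
    final ParameterAccessor parameters = new ParameterAccessor(blackboard, _configuration);
    for (final String id : recordIds) {
      parameters.setCurrentRecord(id);
      // "indexName" is a made-up parameter name for this example.
      final String indexName = parameters.getRequiredParameter("indexName");
      // ... process the record using indexName ...
    }
    return recordIds;
  }
}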

If the internal parameter _failOnError has not been set before, the worker sets it to "false". This means that the called pipelets should continue processing the remaining records instead of stopping when defective records are encountered. The pipelets themselves must implement this behavior. How to achieve this is explained in How to write a Pipelet.
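Continuing the sketch above, a pipelet could implement this contract by catching per-record errors and only rethrowing them when _failOnError is set. Again an illustrative sketch: the getParameter call, the processRecord helper, and the _log field are assumptions, not confirmed API.

// Assumed accessor with a default value; see the ParameterAccessor JavaDoc.
final boolean failOnError =
  Boolean.parseBoolean(parameters.getParameter("_failOnError", "false"));
for (final String id : recordIds) {
  parameters.setCurrentRecord(id);
  try {
    processRecord(blackboard, id); // hypothetical helper doing the actual work
  } catch (final Exception ex) {
    if (failOnError) {
      // Stop processing: fail the whole task.
      throw new ProcessingException("Error processing record " + id, ex);
    }
    // Log the defective record and continue with the next one.
    _log.warn("Skipping defective record " + id, ex);
  }
}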

Error handling

The following errors may occur when a task for the PipeletWorker is processed:

  • Missing or invalid pipeletName parameter
    • If the pipeletName parameter is not set or invalid, the task will fail with a non-recoverable error.
  • ProcessingException while processing a batch of records in parallel.
    • Recoverable ProcessingException: The current task will fail with a recoverable error, so the whole task (with all records) will be repeated.
    • Non-recoverable ProcessingException: An error will be logged and the worker will continue with the next batch of records. The records of the current batch will be lost. (This is implemented this way so that a single defective record does not fail the whole task with all its input records.)
