SMILA/Documentation/Worker/ScriptProcessorWorker

ScriptProcessorWorker (bundle org.eclipse.smila.scripting)

Available since SMILA 1.3!


The ScriptProcessorWorker is a worker designed to process (synchronous) script calls inside an asynchronous workflow. In principle, the worker is independent of any particular script processing implementation; in SMILA, however, scripts are written in JavaScript.

The scripts that can be used for execution are those defined in the SMILA configuration: <SMILA>/configuration/org.eclipse.smila.scripting/js


Configuration

The ScriptProcessorWorker is configured via incoming task parameters. These parameters can be set, for example, in a job definition.

The worker supports the following parameters:

  • script
    • Description: Required parameter for the name of the script (= filename without .js extension). It can be overridden per input record: a record containing an attribute "_script" with a single string value selects a different script for processing this record. If this attribute does not specify an existing script (or the value is not a single string), the script given by the task parameters is used to process the record, and a warning is written to the log file.
    • Default value: none
  • function
    • Description: Optional parameter for the name of the script function to use for processing records. It can be overridden per input record: a record containing an attribute "_function" with a single string value selects a different function for processing this record. If this attribute value is not valid, the function given by the task parameters is used to process the record, and a warning is written to the log file.
    • Default value: processRecord
  • initializeFunction
    • Description: Optional parameter for the name of the script function that is called once to initialize the script. (This parameter cannot be overridden in the record.) Hint: The script must implement this function. You can leave it empty if you don't need it, but it must be defined!
    • Default value: prepare
  • writeAttachmentsToOutput
    • Description: Optional parameter. By default, attachments on incoming records are also added to the output records (if any are written). If this parameter is set to false, only record metadata is written to the output bulk. This can save a lot of IO if attachments are no longer needed in the workflow after this worker.
    • Default value: true
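
The per-record override rule for "_script" and "_function" can be modeled as a small fallback function. This is only an illustration of the documented behavior, not SMILA code: resolveName, scriptExists and the script names are hypothetical, and the real worker writes its warning to the log file rather than to an array.

```javascript
// Illustrative model of the per-record override rule; not SMILA code.
// resolveName and isValid are hypothetical helpers.
function resolveName(recordValue, taskValue, isValid, warnings) {
  if (recordValue === undefined || recordValue === null) {
    return taskValue; // no override on the record
  }
  if (typeof recordValue === "string" && isValid(recordValue)) {
    return recordValue; // valid per-record override
  }
  // Not a single string, or not a known name: fall back and warn
  warnings.push("Ignoring invalid override: " + JSON.stringify(recordValue));
  return taskValue;
}

var existingScripts = ["myScript", "otherScript"]; // hypothetical script names
var scriptExists = function (name) {
  return existingScripts.indexOf(name) >= 0;
};
var warnings = [];

resolveName("otherScript", "myScript", scriptExists, warnings); // "otherScript"
resolveName("missing", "myScript", scriptExists, warnings);     // "myScript", plus a warning
resolveName(["a", "b"], "myScript", scriptExists, warnings);    // "myScript", plus a warning
```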


Sample job definition that sets these parameters:

{
  "name":"myJob",
  "parameters":{
    "script": "myScript",
    "function": "myFunction",
    "initializeFunction": "myPrepare",
    ...
  },
  "workflow":"myWorkflow"
}
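
A script matching this job definition might look like the following minimal sketch. It assumes a file named myScript.js in the script directory mentioned above; the attribute processedBy is hypothetical, and the arguments passed to the initialize function are not described on this page, so the sketch ignores them.

```javascript
// myScript.js: minimal sketch for the job definition above.

// Referenced by "initializeFunction". It must be defined, even if empty.
function myPrepare() {
  // nothing to initialize in this sketch
}

// Referenced by "function". Called for each input record.
function myFunction(record) {
  record.processedBy = "myScript"; // hypothetical attribute, for illustration
  return record;                   // the returned record goes to the output bulk
}
```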

Worker Definition

The worker definition can be found:

  • in the configuration file: configuration/org.eclipse.smila.jobmanager/workers.json
  • via REST API: GET /smila/jobmanager/workers/scriptProcessor

Input / Output

Excerpt from the worker description:

... 
"input": [
   {
     "name":"input",
     "type":"recordBulks"
   }
 ],
"output": [
   {
     "name":"output",
     "type":"recordBulks",
     "modes":[
       "optional",
       "maybeEmpty"
     ] 
  } 
]

The output bucket of the worker is optional, hence in an asynchronous workflow the worker does not need to have a successor. If the output bucket is not defined, the result records of the script processing are not persisted to a bulk, but thrown away. This makes sense if the script stores the records somewhere itself, e.g. adds them to an index.

There are two ways to write records to the output bulk:

  • The script can return one record. If the script returns multiple records, they will be wrapped in one record, which is probably not what you want.
  • To actually return multiple records, the script can use the emit(record) function, which is registered in the script scope by the ScriptProcessorWorker (it is not available when scripts are called directly via the REST API or the Scripting Engine Service). The record passed to emit() is written to the output bulk immediately, so further changes to the record object no longer affect the record written to the bulk.

If the script neither returns nor emits a record, nothing will be written to the output bulk for the processed input record. Thus it is possible to drop records. It's also perfectly OK to drop all records this way, the task will still be finished as successful in this case.
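
The following sketch shows both patterns: emitting several records for one input record, and dropping a record by returning nothing. The attributes pages and text, the use of _recordid for the ID of new records, and the use of plain objects as records are assumptions made for illustration. The emit() stand-in at the top exists only so the sketch can be tried outside SMILA.

```javascript
// Stand-in for trying this sketch outside SMILA only. Remove it in a real
// script: the ScriptProcessorWorker registers emit() in the script scope.
var emitted = [];
function emit(record) {
  emitted.push(JSON.parse(JSON.stringify(record))); // snapshot at emit time
}

// Emits one record per entry of the hypothetical "pages" attribute and
// returns nothing, so only the emitted records reach the output bulk.
function splitPages(record) {
  var pages = record.pages || [];
  for (var i = 0; i < pages.length; i++) {
    emit({ _recordid: record._recordid + "#" + i, text: pages[i] });
  }
}

// Returns the record to keep it, or nothing to drop it.
function dropEmpty(record) {
  if (!record.text) {
    return; // neither returned nor emitted: the record is dropped
  }
  return record;
}
```

Which of the two functions is called is selected via the function task parameter or the "_function" record attribute.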

Accessing Task Parameters in Pipelets

The worker adds all task parameters to a map in the attribute _parameters of each record before passing it to the ScriptProcessor, so each pipelet invoked from the script can access them. The helper class org.eclipse.smila.processing.parameters.ParameterAccessor supports this by looking up requested parameters first in this _parameters map, then at the top level of the record, and finally in the pipelet configuration. It is therefore possible to override properties from the pipelet configuration by setting them as task parameters, provided the pipelet uses the ParameterAccessor to access parameters in records and configuration. The SetValuePipelet, for example, does this.
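
The lookup order can be modeled in a few lines of JavaScript. This is an illustration of the documented order only, not the actual ParameterAccessor implementation (which is a Java class); lookupParameter and the property outputAttribute are hypothetical names.

```javascript
// Illustrative model of the documented lookup order; not the actual
// ParameterAccessor implementation. lookupParameter is a hypothetical name.
function lookupParameter(name, record, pipeletConfig) {
  // 1. _parameters map, 2. top level of the record, 3. pipelet configuration
  var sources = [record._parameters || {}, record, pipeletConfig || {}];
  for (var i = 0; i < sources.length; i++) {
    if (Object.prototype.hasOwnProperty.call(sources[i], name)) {
      return sources[i][name];
    }
  }
  return undefined; // not set anywhere
}

var pipeletConfig = { outputAttribute: "fromConfig" }; // hypothetical property
var record = { _parameters: { outputAttribute: "fromTask" } };

lookupParameter("outputAttribute", record, pipeletConfig); // "fromTask"
lookupParameter("outputAttribute", {}, pipeletConfig);     // "fromConfig"
```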

In contrast to the PipelineProcessingWorker, the worker does not set the parameter _failOnError=false to prevent pipelets from throwing exceptions. That setting was mainly necessary when processing multiple input records in one pipeline call (pipelineRunBulkSize > 1). The ScriptProcessorWorker never does this; exceptions thrown by scripts and embedded pipelets are handled by the worker. However, if you need this behavior, you can easily set the parameter in your script, provided that the pipelets themselves implement it. How to achieve this is explained in How to write a Pipelet.
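
Setting the parameter in a script could look like the sketch below. It assumes that the _parameters map can be modified from JavaScript like an ordinary object and that a boolean value is accepted; neither is confirmed on this page. The pipelet calls themselves are omitted.

```javascript
// Sketch: put _failOnError into the record's _parameters map before
// invoking pipelets. The function name follows the default "processRecord".
function processRecord(record) {
  if (!record._parameters) {
    record._parameters = {}; // defensive; the worker normally adds this map
  }
  record._parameters._failOnError = false;
  // ... invoke pipelets here; the calls are omitted in this sketch
  return record;
}
```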

Error Handling

The following errors may occur when a task for the ScriptProcessorWorker is processed:

  • Parameter set to an invalid value
    • If a script or function parameter is set to an invalid value, the task will fail with a non-recoverable error.
  • Errors in prepare
    • If the prepare function fails with a recoverable error, the task will be retried.
    • The task will fail for other errors in the prepare function.
  • ScriptingEngineException while processing a record.
    • Recoverable ScriptingEngineException: The current task will fail with a recoverable error, so the whole task (with all records) will be repeated.
    • Non-recoverable ScriptingEngineException: An error will be logged and the worker will continue with the next record. The current record will be lost. (This is implemented in a way as to not fail the whole task with all its input records in case of a single record defect.)
    • However, if all records fail with a non-recoverable error, the task will be finished as failed, too.
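
The record-level rules can be summarized as a small decision function. This is an illustrative model, not the worker's code: the outcome labels and the returned strings are hypothetical, and treating a task without input records as successful is an assumption.

```javascript
// Illustrative decision model of the rules above; not the worker's code.
// Each entry of outcomes is "ok", "recoverable" or "nonRecoverable"
// (hypothetical labels for what happened to one input record).
function taskOutcome(outcomes) {
  var lost = 0;
  for (var i = 0; i < outcomes.length; i++) {
    if (outcomes[i] === "recoverable") {
      return "retry task"; // the whole task is repeated with all records
    }
    if (outcomes[i] === "nonRecoverable") {
      lost++; // error is logged, record is lost, processing continues
    }
  }
  if (outcomes.length > 0 && lost === outcomes.length) {
    return "task failed"; // every record failed with a non-recoverable error
  }
  return "task successful";
}

taskOutcome(["ok", "nonRecoverable", "ok"]);       // "task successful"
taskOutcome(["ok", "recoverable"]);                // "retry task"
taskOutcome(["nonRecoverable", "nonRecoverable"]); // "task failed"
```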

Special Logging

Scripts executed by the ScriptProcessorWorker can use a special log facility named "workerLog". The workerLog supports three log levels (info, warn, error) and for each log level provides three methods, e.g. for level "warn":

workerLog.warn(String message);
workerLog.warn(String message, AnyMap details);
workerLog.warn(String message, Throwable cause);

Hint: To log exceptions thrown by Java classes (pipelets, services, ...), catch the JavaScript Error and pass its property "javaException" to the log method:

try {
 // call pipelets or services
} catch (e) {
 workerLog.warn("Error processing record " + record.$id, e.javaException);
}

The messages will be written to the SMILA.log like normal log messages (the logger category is "org.eclipse.smila.taskworker.TaskLog"); the details object or the causing exception will be appended to the message. Additionally, the number of "warn" messages logged will be reported in the job run data counters for the scriptProcessor worker as "warnCount", so the job run data shows you that there might be something in the log file worth checking. For example:

{
  "endTime" : "2014-11-07T13:34:56.889+0100",
  "finishTime" : "2014-11-07T13:34:56.429+0100",
  "jobId" : "20141107-133456202375",
  "mode" : "STANDARD",
  "startTime" : "2014-11-07T13:34:56.255+0100",
  "state" : "SUCCEEDED",
  "workflowRuns" : {
    "activeWorkflowRunCount" : 0,
    ...
  },
  "tasks" : {
    "createdTaskCount" : 2,
    ...
  },
  "worker" : {
    "1_scriptProcessor" : {
      "warnCount" : 3,
      "duration" : 0.079656584,
      ...

Other Extras

  • The org.eclipse.smila.taskworker.TaskContext object of the current worker task can be accessed via the global property "taskContext". For example, this makes it possible to access the task properties (job name, job run ID, etc.):
var jobRunId = taskContext.getTask().getProperties().get("jobRunId");