Difference between revisions of "SMILA/Documentation/Importing/UpdatePusher"

Revision as of 11:59, 29 November 2011

Worker description

  • Name: updatePusher
  • Parameters:
    • jobToPushTo: The job to push the crawled records to.
  • Input Slots:
    • recordToPush: a bucket of type recordBulks containing the records produced by the crawl workflow.
  • Output Slots:
    • pushedRecords: (optional) the records that could be successfully submitted to the destination job. Usually not set, but may be used to trigger further actions on submitted records.

The UpdatePusher takes each record from the input and sends it to a bulkbuilder service. If an output bucket is connected, the record is also written to it. If the record contains a _deltaHash attribute value, the worker first checks with DeltaService whether the record has already been pushed, to prevent duplicates, and marks it as updated afterwards. If the _deltaHash attribute is empty, the record is always pushed and is not marked as updated in DeltaService.
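The per-record flow described above can be sketched roughly as follows. This is an illustration only, not SMILA code: SMILA itself is written in Java, and `InMemoryDeltaService`, `push_records`, the `submit` callback, and the use of `_recordid` as the lookup key are all hypothetical stand-ins. In particular, how DeltaService actually decides that a record "has not been pushed yet" is assumed here to be a comparison of the stored hash per record id.

```python
class InMemoryDeltaService:
    """Hypothetical stand-in for DeltaService: remembers the last hash per record id."""

    def __init__(self):
        self._hashes = {}

    def is_up_to_date(self, record_id, delta_hash):
        # Assumption: a record counts as already pushed if the same hash is stored.
        return self._hashes.get(record_id) == delta_hash

    def mark_updated(self, record_id, delta_hash):
        self._hashes[record_id] = delta_hash


def push_records(records, submit, delta_service, output=None):
    """Push each record via `submit` (stand-in for the bulkbuilder call).

    Returns the number of records that were actually pushed.
    """
    pushed = 0
    for record in records:
        delta_hash = record.get("_deltaHash")
        if delta_hash and delta_service.is_up_to_date(record["_recordid"], delta_hash):
            continue  # already pushed with this hash: skip to avoid a duplicate
        submit(record)
        if output is not None:
            output.append(record)  # only if an output bucket is connected
        if delta_hash:
            # Records without a _deltaHash are never registered in the delta service.
            delta_service.mark_updated(record["_recordid"], delta_hash)
        pushed += 1
    return pushed
```

Under these assumptions, pushing the same input twice sends a record with a `_deltaHash` only once, while a record without one is sent on every run.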

Exception handling of bulkbuilder errors:

  • If an InvalidRecordException is thrown by the Bulkbuilder, it is logged and the record is skipped (it is also not added to the output bulk, if one is set).
  • Other BulkbuilderExceptions are not caught. If they are marked as recoverable, they should lead to a retry of the task; otherwise the task fails with a fatal error.
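
The two error-handling rules above can be illustrated with a small sketch. It is not the SMILA implementation: the Python exception classes merely mirror the names used in the text, the `recoverable` flag, the `run_task` wrapper, and the result strings `"SUCCESSFUL"`, `"RETRY"` and `"FATAL_ERROR"` are hypothetical, and the surrounding task management is assumed to be what turns an uncaught exception into a retry or a fatal failure.

```python
import logging


class BulkbuilderException(Exception):
    """Stand-in for the bulkbuilder's exception type (names mirror the text)."""

    def __init__(self, message, recoverable=False):
        super().__init__(message)
        self.recoverable = recoverable


class InvalidRecordException(BulkbuilderException):
    """Raised when a single record is rejected as invalid."""


def push_with_error_handling(records, submit, output=None):
    """Push records; skip invalid ones, let all other errors propagate."""
    for record in records:
        try:
            submit(record)
        except InvalidRecordException as error:
            # Logged and skipped: the record does not reach the output bulk.
            logging.warning("Skipping invalid record %r: %s",
                            record.get("_recordid"), error)
            continue
        if output is not None:
            output.append(record)


def run_task(records, submit, output=None):
    """Hypothetical task wrapper showing how the uncaught errors are treated."""
    try:
        push_with_error_handling(records, submit, output)
        return "SUCCESSFUL"
    except BulkbuilderException as error:
        return "RETRY" if error.recoverable else "FATAL_ERROR"
```

The worker-level function deliberately catches only the invalid-record case, so the retry-or-fail decision stays with whatever component runs the task.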