SMILA/Project Concepts/Controlling Tasks Order Concept

Controlling Order of Tasks in the Workflow / Race Conditions

Our current workflow allows more than one record for the same data source entry to be processed by the framework at the same time.

This open structure gives rise to several problem cases. Can the current system run correctly with different records that belong to the same data source entry?

And if so: a newer record could be processed faster, because it passes through fewer GFPs (BPEL processes) or fewer queues.

Simple scenarios:
- An add record is sent to the framework and shortly afterwards a delete record arrives. Because the delete record is not processed by BPEL, it would be executed before the result of the add record is ready to be put into the index.
- Two add records are sent to the framework. We would do the processing twice without gaining anything from it; we could purge the first record.


The problem of “two records with the same ID but different data or initial operation”.

The current workflow process is designed for exclusive, sequential record processing. It is assumed that there is a “start of processing” (record crawled), that some business processes are then executed one after another for the record, and that there is a “process finish” (record stored into the index). Between processing steps the record is stored in the Blackboard cache (and finally also in XmlStorage and BinStorage). On the other hand, the execution of business processes is asynchronous (via a queue listener). The Blackboard-based workflow scheme cannot work correctly with asynchronous processes, because it is “assumed” that we process the same record until “process finish”.

Solutions

  1. Block a new record from processing until processing of the previously submitted record with the same ID has finished.
    1. This requires some special additional storage for delayed records.
    2. It is not clear when the previous record has “finished processing”.
  2. Avoid using the Blackboard and put the complete record into the queue.
  3. Stop/reject processing of a record if its timestamp is older than the latest one.
    1. This requires only minimal changes to the current workflow.
    2. This requires an additional but simple service for generating/validating timestamps.

The main advantage of the first solution is that every record modification will be processed. The main disadvantage is that it makes record processing synchronous. There is also the problem that if processing of some record fails, it may completely stop future processing of records with this ID.
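To make the trade-off of the first solution concrete, here is a minimal Java sketch of a per-ID gate with storage for delayed records. The class `PerIdGate` and its methods are hypothetical and not part of SMILA; records are represented as plain strings for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of solution 1; not part of the SMILA API. */
final class PerIdGate {
    // IDs currently being processed, each with its queue of delayed records.
    private final Map<String, Deque<String>> delayed = new HashMap<>();

    /** Returns true if the record may start now, false if it was stored for later. */
    synchronized boolean tryStart(String id, String record) {
        Deque<String> waiting = delayed.get(id);
        if (waiting == null) {
            // Nothing in flight for this ID: mark it as busy and let the record through.
            delayed.put(id, new ArrayDeque<>());
            return true;
        }
        waiting.addLast(record);
        return false;
    }

    /** Must be called at "process finish"; returns the next delayed record or null. */
    synchronized String finish(String id) {
        Deque<String> waiting = delayed.get(id);
        if (waiting == null) {
            return null;
        }
        String next = waiting.pollFirst();
        if (next == null) {
            // No delayed records left: the ID is free again.
            delayed.remove(id);
        }
        return next;
    }
}
```

The sketch also shows the weak point mentioned above: if `finish` is never called for an ID (for example because processing failed), every later record with that ID stays in the delayed storage.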

For the current functionality I prefer the last one (stop/reject record processing by timestamp), because it is more efficient (asynchronous) and safe. Unfortunately, some record changes may be lost. We do not need them now, but we can easily imagine some new pipelet that stores/tracks record changes.
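A minimal sketch of what such a timestamp check could look like in Java is given below. The class `TimestampService` and its methods are hypothetical names chosen for illustration, and the sketch assumes a logical counter instead of wall-clock time so that two records never receive the same timestamp.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical timestamp service; not part of the SMILA API. */
final class TimestampService {
    // Logical clock (assumption): strictly increasing, so stamps are unique.
    private final AtomicLong clock = new AtomicLong();
    // Newest timestamp issued so far for each record ID.
    private final ConcurrentMap<String, Long> latest = new ConcurrentHashMap<>();

    /** Called when a record enters the framework: issues a stamp and remembers it as newest. */
    long issue(String recordId) {
        long stamp = clock.incrementAndGet();
        latest.merge(recordId, stamp, Long::max);
        return stamp;
    }

    /** Called before a processing step: false means a newer record with this ID exists. */
    boolean isCurrent(String recordId, long stamp) {
        Long newest = latest.get(recordId);
        return newest != null && stamp >= newest;
    }
}
```

In the add/delete scenario, the add record would receive the older timestamp, so a step that calls `isCurrent` for it after the delete record has arrived would see `false` and could stop processing.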

It is suggested to add a "timestamp" field to the Id and to compare Ids with two operations, equals and equivalent.
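The two comparison operations could be sketched as follows. `StampedId` is a hypothetical, heavily simplified stand-in for the real Id class, and the split shown here (equivalent ignores the timestamp, equals includes it) is an assumption about how the two operations would be defined.

```java
import java.util.Objects;

/** Hypothetical simplified Id with a timestamp; not the real SMILA Id class. */
final class StampedId {
    private final String source;   // data source name
    private final String key;      // key of the entry within the data source
    private final long timestamp;  // version stamp of this record

    StampedId(String source, String key, long timestamp) {
        this.source = Objects.requireNonNull(source);
        this.key = Objects.requireNonNull(key);
        this.timestamp = timestamp;
    }

    /** Assumption: same data source entry, regardless of the timestamp. */
    boolean equivalent(StampedId other) {
        return other != null && source.equals(other.source) && key.equals(other.key);
    }

    /** Assumption: same data source entry and same timestamp. */
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof StampedId)) {
            return false;
        }
        StampedId other = (StampedId) o;
        return equivalent(other) && timestamp == other.timestamp;
    }

    @Override
    public int hashCode() {
        return Objects.hash(source, key, timestamp);
    }

    /** True if both Ids refer to the same entry and this one carries the later stamp. */
    boolean isNewerThan(StampedId other) {
        return equivalent(other) && timestamp > other.timestamp;
    }
}
```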


More complex solution

I can try to suggest a basis for a more complex solution. The main idea is to adapt the Blackboard for editing multiple record versions. The following list of requirements outlines the idea, but I'm not sure that it is required now.

  1. A “timestamp service” is used for generating/validating record timestamps.
  2. The Blackboard supports editing of records with multiple versions (separated by timestamp).
  3. Attachments are saved into BinStorage with a timestamp during processing.
  4. When some process wants to commit a record (from the Blackboard into XmlStorage), it will commit only if it is the latest one.
  5. An alternative behavior is to store all record versions with timestamps into XmlStorage.
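Requirements 2 and 4 could be sketched roughly as below. `VersionedBlackboard` is a hypothetical class, not the existing Blackboard API; record content is a plain string, and an in-memory map stands in for XmlStorage. The sketch assumes that superseded versions are simply dropped on commit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical multi-version blackboard; not the existing SMILA Blackboard API. */
final class VersionedBlackboard {
    // Working copies: record ID -> (timestamp -> content), sorted by timestamp.
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();
    // Timestamp of the last committed version per record ID.
    private final Map<String, Long> lastCommitted = new HashMap<>();
    // In-memory stand-in for XmlStorage (assumption for this sketch).
    private final Map<String, String> storage = new HashMap<>();

    /** Stores a working copy of one record version; content is assumed to be non-null. */
    synchronized void put(String id, long timestamp, String content) {
        versions.computeIfAbsent(id, k -> new TreeMap<>()).put(timestamp, content);
    }

    /** Commits the given version only if it is the latest one; returns whether it was stored. */
    synchronized boolean commit(String id, long timestamp) {
        TreeMap<Long, String> byStamp = versions.get(id);
        if (byStamp == null) {
            return false;
        }
        String content = byStamp.remove(timestamp);
        if (content == null) {
            return false; // this version is unknown or was already superseded
        }
        boolean newest = byStamp.higherKey(timestamp) == null
                && timestamp > lastCommitted.getOrDefault(id, Long.MIN_VALUE);
        if (newest) {
            storage.put(id, content);
            lastCommitted.put(id, timestamp);
            byStamp.headMap(timestamp).clear(); // drop superseded working copies
        }
        if (byStamp.isEmpty()) {
            versions.remove(id);
        }
        return newest;
    }

    /** Returns the committed content for the ID, or null if nothing was committed. */
    synchronized String stored(String id) {
        return storage.get(id);
    }
}
```

The alternative in item 5 would replace the single `storage` entry per ID with one entry per (ID, timestamp) pair and commit every version.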
