Revision as of 10:50, 2 October 2009

Status
this page is very much a WIP and discussion is still happening on the dev list.

as the concept matures during the discussion, this page will be updated from time to time.

this enhancement is tracked in bug 289995

for the development I opened a new branch at https://dev.eclipse.org/svnroot/rt/org.eclipse.smila/branches/2009-09-23_r608_resequencer

splitting of records
this is more complicated. see the code comment in the SRS class for how it is treated
overflow of the SN
a reset signal must be sent to SRS
what to do with records that miss needed config data?
handling depends on the process mode:
- register: error
- resequence: error for now, but the outcome could be made configurable
handling of unknown IDs

Idea 1 - Handle Resequencing in Connectivity

There was an idea to handle this case in the connectivity directly with the help of a buffer:

each incoming operation for a requested resource is buffered for a period of time X.
X must be at least as long as the longest processing path takes for any given record. initially this value will have to be chosen manually, but by evaluating the Performance Counters it should be possible to derive or adjust X automatically.
during the time in the buffer, additional processing requests are consolidated to reduce load
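A minimal sketch of such a buffer, assuming operations carry an arrival time supplied by the caller. The names `Operation`, `DelayBuffer`, `offer` and `release` are hypothetical and not part of SMILA. Consolidation is modelled here as "the newest operation per record replaces the buffered one", which is only one possible reading of the text above.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    record_id: str
    kind: str          # "ADD" or "DELETE"
    arrived_at: float  # seconds, supplied by the caller (assumption)


class DelayBuffer:
    """Holds one pending operation per record until it is at least `delay` (X) old."""

    def __init__(self, delay):
        self.delay = delay
        self._pending = {}  # record ID -> latest buffered Operation

    def offer(self, op):
        # Consolidation: a newer operation for the same record replaces the older one.
        self._pending[op.record_id] = op

    def release(self, now):
        # Hand out every operation that has waited at least X, oldest first.
        due = [op for op in self._pending.values() if now - op.arrived_at >= self.delay]
        due.sort(key=lambda op: op.arrived_at)
        for op in due:
            del self._pending[op.record_id]
        return due
```

Note that a record that keeps changing restarts its wait with every new operation in this sketch, so the lag for such a record can exceed X.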

CON

lag
every message is delayed by X before the index is updated. for mass crawling this might be acceptable, but in combination with agents it is not acceptable in most cases; it depends on the nature of the application SMILA is used for.
no guarantee
delaying processing reduces the chance of operations being applied out of order but does not rule it out, especially when the processing chain is more complex, such as in a cluster setup that spreads the processing load over several nodes. in such a scenario we also need to take into account that some nodes may be down temporarily while retaining the records that were assigned to them.

PRO

simple to implement and has no effect on the API or other logic

Idea 2 - Full Resequencer

Synopsis: The Resequencer will update the Lucene index in exactly the order in which the crawler or agent adds records to the queue.

The processing will be like so:

the router will feed Q1 with ADD and DELETE ops. For the resequencer to know the order, a new meta info needs to be added -- the sequence number. it must be generated in the agent or by the agent controller
the processing pipelines are as normal but
1. w/o the step of calling the lucene service
2. they add the result to a new queue, Q2
the Resequencer will listen on Q2 and pick up all messages
starting with the first record, feed consecutive chunks of IDs to the lucene service
when messages end up in the DLQ, a timeout keeps the resequencer from waiting for them forever
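The steps above can be sketched as follows. This is only an illustration under stated assumptions: sequence numbers are consecutive integers from a single counter, time is passed in by the caller, and the names `FullResequencer`, `on_message`, `poll` and `gap_timeout` are hypothetical. After the timeout the sketch simply skips the missing sequence numbers, which is one possible way to handle messages that ended up in the DLQ.

```python
class FullResequencer:
    """Releases Q2 messages strictly in sequence-number order."""

    def __init__(self, first_sn=0, gap_timeout=30.0):
        self.next_sn = first_sn         # SN that must be released next
        self.gap_timeout = gap_timeout  # how long to wait for a missing SN
        self._waiting = {}              # SN -> message parked until its turn
        self._gap_since = None          # time at which we started waiting for next_sn

    def on_message(self, sn, message, now):
        if sn >= self.next_sn:          # late arrivals that were already skipped are dropped
            self._waiting[sn] = message
        return self.poll(now)

    def poll(self, now):
        """Return the consecutive chunk of messages that may go to the index now."""
        ready = []
        while self._waiting:
            if self.next_sn in self._waiting:
                ready.append(self._waiting.pop(self.next_sn))
                self.next_sn += 1
                self._gap_since = None
            elif self._gap_since is None:
                self._gap_since = now   # a gap was found, start the timeout clock
                break
            elif now - self._gap_since >= self.gap_timeout:
                # Give up on the missing SNs (e.g. in the DLQ) and jump ahead.
                self.next_sn = min(self._waiting)
                self._gap_since = None
            else:
                break
        return ready
```

A caller would invoke `on_message` for every message taken from Q2 and call `poll` periodically so that the timeout can fire even when no new messages arrive.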

PRO

no processing target can ask for more; a correct result is always possible
it is possible to add a note into the index for records ending up in the DLQ, i.e. record was not indexed due to processing error.

CON

overkill, because there is no need to resequence all messages, only those with the same ID

Idea 3 - Smart Resequencer (SRS)

Synopsis: The Smart Resequencer resequences operations only on a per-record basis.
Rationale: in most cases records are independent of each other, so there is no need to order all records.

The processing is very similar to the Full Resequencer:

the router will feed Q1 with ADD and DELETE ops.
For the resequencer to know the order, a new meta info needs to be added -- the sequence number (SN). it must be generated in the agent or by the agent controller
the Resequencer will
1. get copies of the messages sent to Q1 via a second Send task on the router, so that it knows which operations are on their way, i.e. it maps IDs to SNs
  these copies just need to contain the ID and the annotation holding the SN.
2. enhancement:
  the SRS could remove obsolete older messages from Qs to stop them from being processed further.
  this needs the hash and SN to be present as a JMS property for the selector and doesn't cover the case of records currently being processed in a pipeline.
the processing pipelines stay as they are but
1. w/o the step of calling the lucene service
2. they add the result to a new queue, Q2
the Resequencer will listen on Q2 and pick up all messages. for each message:
1. IF its SN matches the one in the map THEN
  call the index service for the record
  ELSE
  ignore and skip operation
  FI
2. remove ID from map
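The matching logic above can be sketched as below. It is a rough illustration, not the SRS implementation: the class and method names are hypothetical, the index service is replaced by a plain dict, and SNs are assumed to be integers. The sketch removes the ID from the map only after a match, which is one reading of step 2, and it skips messages for IDs that are not in the map, which the page still lists as an open point.

```python
class SmartResequencer:
    """Per record, only the operation with the most recent SN reaches the index."""

    def __init__(self, index):
        self.index = index     # hypothetical stand-in for the index service (a dict)
        self.latest_sn = {}    # record ID -> SN taken from the router's copies

    def on_router_copy(self, record_id, sn):
        # Copy of a message sent to Q1: remember the highest SN seen for this ID.
        current = self.latest_sn.get(record_id)
        if current is None or sn > current:
            self.latest_sn[record_id] = sn

    def on_q2_message(self, record_id, sn, op, content=None):
        # Processed result from Q2: apply it only if it is the newest operation.
        if self.latest_sn.get(record_id) != sn:
            return False       # obsolete (or unknown) operation: ignore and skip
        if op == "ADD":
            self.index[record_id] = content
        elif op == "DELETE":
            self.index.pop(record_id, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        del self.latest_sn[record_id]
        return True


index = {}
srs = SmartResequencer(index)
srs.on_router_copy("A", 1)               # ADD A enters Q1
srs.on_router_copy("A", 2)               # DELETE A enters Q1
srs.on_q2_message("A", 2, "DELETE")      # DELETE overtakes the ADD and is applied
srs.on_q2_message("A", 1, "ADD", "doc")  # the older ADD arrives late and is skipped
```

After these calls the dict is empty, so the late ADD did not resurrect the deleted record.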

PRO

- smarter ;)

CON

- oscillating resources (that constantly change) will never make it into the index.

Appendix

Abbreviations

Abbrev	Meaning
SN	Sequence Number
SRS	Smart Resequencer Service

Ideas

replace the SN with a more general ComparableObject
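As a rough illustration of this idea, any value with a total order could take the place of the integer SN. The `Stamp` type below and its `epoch` field are purely hypothetical; pairing a reset epoch with a counter is just one example of a comparable object and is not taken from the SMILA code.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Stamp:
    """Hypothetical comparable replacement for a plain integer SN."""
    epoch: int    # assumption: bumped whenever the counter is reset
    counter: int  # position within the epoch


def newest(stamps):
    # Only ordering is required, so any comparable object works here.
    return max(stamps)
```

With `order=True` the dataclass compares field by field, so a stamp from a later epoch is newer than any stamp from an earlier one regardless of its counter.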


Operation N	Operation N+1	Expected index state after N+1
ADD A,t1	ADD A,t2	A,t2
ADD A,t1	DELETE A,t2	A doesn't exist
DELETE A,t1	ADD A,t2	A exists
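The table can be checked with a small reference function. This is a sketch only: `expected_state` is a hypothetical helper, operations are modelled as `(kind, record_id, t)` tuples, and the index is a dict that stores the timestamp of the last ADD.

```python
def expected_state(operations):
    """Return the index content after applying the operations in timestamp order."""
    index = {}
    # Sort by timestamp so the result does not depend on the arrival order.
    for kind, record_id, t in sorted(operations, key=lambda op: op[2]):
        if kind == "ADD":
            index[record_id] = t
        elif kind == "DELETE":
            index.pop(record_id, None)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return index
```

Whatever resequencing approach is chosen, its result for a given set of operations should equal what this function returns, even when the operations arrive in a different order.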

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.

Difference between revisions of "SMILA/Specifications/ProcessingMessageResequencer"

