Difference between revisions of "SMILA/Specifications/DeltaIndexingAndConnectivtyDiscussion09"

From Eclipsepedia

Revision as of 11:32, 25 August 2008


WARNING: This page is under construction by Daniel Stucky
 



Alternative Concept

Motivation

Why DeltaIndexing

TODO: describe it

The Problems

TODO: describe it

Draft

TODO: describe it

Interfaces

TODO: Define new Interfaces for DeltaIndexingManager and ConnectivityManager


New Feature: DeltaIndexing On/Off

Motivation

It should be possible to turn the usage of DeltaIndexing on and off, either to reduce complexity or to gain better performance.

Draft

A simple boolean logic (on/off) seems too simple, as I see four possible use cases (modes):

  • FULL: DeltaIndexing is fully activated. This means that
    • each Record is checked to see whether it needs to be updated
    • for each Record an entry is made/updated in the DeltaIndexingManager
    • Delta-Delete is executed at the end of the import
  • ADDITIVE: as FULL, but Delta-Delete is not executed (we allow records in the index that no longer exist in the source)
  • INITIAL: for an initial import into an empty index, or of a new source into an existing index, performance can be optimized by
    • NOT checking if a record needs to be updated (we know that all records are new)
    • adding an entry in the DeltaIndexingManager for each Record. This allows later imports to make use of DeltaIndexing
    • NOT executing Delta-Delete (we know that no records are to be deleted)
  • DISABLED: DeltaIndexing is fully deactivated. No checks are done, no entries are created/updated, no Delta-Delete is executed. Later runs cannot benefit from DeltaIndexing

As always, Delta-Delete MUST NOT be executed if any errors occur during import as we do not want to delete records erroneously!
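
The four modes can be read as three independent switches: check for updates, write DeltaIndexingManager entries, and run Delta-Delete. The following Java sketch is only an illustration of that reading. The enum constants mirror the mode names above, but the flag and method names (checksForUpdate, writesEntries, runsDeltaDelete) are assumptions made for this example and not part of any existing interface.

```java
/** Hypothetical mode enum; constant names follow the draft, everything else is illustrative. */
enum DeltaIndexingMode {
    //       check   entry   delete
    FULL    (true,  true,  true),
    ADDITIVE(true,  true,  false),
    INITIAL (false, true,  false),
    DISABLED(false, false, false);

    private final boolean checkForUpdate;
    private final boolean writeEntry;
    private final boolean deltaDelete;

    DeltaIndexingMode(boolean checkForUpdate, boolean writeEntry, boolean deltaDelete) {
        this.checkForUpdate = checkForUpdate;
        this.writeEntry = writeEntry;
        this.deltaDelete = deltaDelete;
    }

    /** True if each Record is checked against the DeltaIndexingManager before it is processed. */
    boolean checksForUpdate() {
        return checkForUpdate;
    }

    /** True if an entry is made/updated in the DeltaIndexingManager for each Record. */
    boolean writesEntries() {
        return writeEntry;
    }

    /** Delta-Delete runs only if the mode asks for it AND the import finished without errors. */
    boolean runsDeltaDelete(boolean importHadErrors) {
        return deltaDelete && !importHadErrors;
    }
}
```

With this layout, the rule that Delta-Delete must be skipped after a failed import lives in one place (runsDeltaDelete) instead of being repeated at every call site.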

Configuration

To configure the mode of DeltaIndexing execution, an additional parameter is needed in the IndexOrderConfiguration:

XML-Schema:

...
<xs:element name="DeltaIndexingMode" type="DeltaIndexingModeType"/>
 
  <xs:simpleType name="DeltaIndexingModeType">
    <xs:annotation>
      <xs:appinfo>
        <jxb:class ref="org.eclipse.eilf.connectivity.framework.indexorder.messages.DeltaIndexingModeType"/>
      </xs:appinfo>
    </xs:annotation>
    <xs:restriction base="xs:string">
      <xs:pattern value="FULL"/>
      <xs:pattern value="ADDITIVE"/>
      <xs:pattern value="INITIAL"/>
      <xs:pattern value="DISABLED"/>
    </xs:restriction>
  </xs:simpleType>
...

XML example:

...
<DeltaIndexingMode>FULL</DeltaIndexingMode>
...
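
As a rough sketch of how such a value could be read, the following Java example extracts the DeltaIndexingMode element with the standard JAXP DOM parser and maps it to an enum. The class name DeltaIndexingModeReader, the nested Mode enum, and the IndexOrderConfiguration root element used in the usage note are assumptions for illustration; the draft itself points to JAXB-generated classes, which would normally do this mapping.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper; not part of the connectivity framework. */
final class DeltaIndexingModeReader {

    /** Mirrors the four values allowed by DeltaIndexingModeType. */
    enum Mode { FULL, ADDITIVE, INITIAL, DISABLED }

    private DeltaIndexingModeReader() {
    }

    /** Returns the configured mode, or throws IllegalArgumentException if it is missing or invalid. */
    static Mode read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("DeltaIndexingMode");
            if (nodes.getLength() == 0) {
                throw new IllegalArgumentException("DeltaIndexingMode element is missing");
            }
            // Mode.valueOf rejects anything other than the four known values.
            return Mode.valueOf(nodes.item(0).getTextContent().trim());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Configuration is not readable XML", e);
        }
    }
}
```

Calling DeltaIndexingModeReader.read on a document whose (assumed) IndexOrderConfiguration root contains the element shown above yields Mode.FULL. An unknown value is rejected rather than silently mapped to a default, since the draft does not define one.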


Implementation

Current Concept

The execution logic has to be split between the CrawlerController (CrawlThread) and the ConnectivityManager. The mode therefore has to be added to the ConnectivityManager interface.


Alternative Concept

The execution logic has to be added to the CrawlerController (CrawlThread) only. It decides which actions to perform based on the given mode.
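
To make the alternative concept more concrete, here is a minimal Java sketch of a single import run in which one component makes all mode decisions. It is not the CrawlThread implementation: the class CrawlSketch, its Mode enum, and the two in-memory maps standing in for the DeltaIndexingManager and the index are assumptions made for this example, and records are reduced to an id plus a hash.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the crawl loop; all names are illustrative. */
final class CrawlSketch {

    enum Mode { FULL, ADDITIVE, INITIAL, DISABLED }

    /** Stand-in for the DeltaIndexingManager: record id -> last seen hash. */
    final Map<String, String> deltaEntries = new HashMap<>();

    /** Stand-in for the index: record id -> indexed hash. */
    final Map<String, String> index = new HashMap<>();

    /** Runs one import; 'crawled' maps record id -> current hash. */
    void runImport(Mode mode, Map<String, String> crawled) {
        boolean checkForUpdate = mode == Mode.FULL || mode == Mode.ADDITIVE;
        boolean writeEntries = mode != Mode.DISABLED;
        Set<String> visited = new HashSet<>();

        for (Map.Entry<String, String> record : crawled.entrySet()) {
            String id = record.getKey();
            String hash = record.getValue();
            visited.add(id);
            if (checkForUpdate && hash.equals(deltaEntries.get(id))) {
                continue; // unchanged since the last import, nothing to do
            }
            index.put(id, hash);
            if (writeEntries) {
                deltaEntries.put(id, hash);
            }
        }

        // Delta-Delete: FULL mode only. An exception thrown in the loop above
        // leaves this method early, so this step is skipped after an error.
        if (mode == Mode.FULL) {
            Set<String> stale = new HashSet<>(deltaEntries.keySet());
            stale.removeAll(visited);
            for (String id : stale) {
                index.remove(id);
                deltaEntries.remove(id);
            }
        }
    }
}
```

In this sketch an INITIAL run followed by ADDITIVE runs never removes anything from the index, while a later FULL run removes every record that the crawler did not visit.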