SMILA/Specifications/DeltaIndexingAndConnectivtyDiscussion09/Turning DeltaIndexing On or Off
New Feature: DeltaIndexing On/Off
This approach was realized. See https://bugs.eclipse.org/bugs/show_bug.cgi?id=279242
It should be possible to turn the usage of DeltaIndexing on and off, either to reduce complexity or to gain better performance.
A simple boolean switch (on/off) seems too simple, as I see four possible use cases (modes):
- FULL: DeltaIndexing is fully activated. This means that
  - each Record is checked to see whether it needs to be updated
  - for each Record an entry is created or updated in the DeltaIndexingManager
  - Delta-Delete is executed at the end of the import
- ADDITIVE: as FULL, but Delta-Delete is not executed (we allow records in the index that no longer exist in the data source)
- INITIAL: for an initial import into an empty index, or for a new source in an existing index, performance can be optimized by
  - NOT checking whether a record needs to be updated (we know that all records are new)
  - adding an entry in the DeltaIndexingManager for each Record, which allows later imports to make use of DeltaIndexing
  - NOT executing Delta-Delete (we know that no records are to be deleted)
- DISABLED: DeltaIndexing is fully deactivated. No checks are done, no entries are created or updated, and no Delta-Delete is executed. Later runs cannot benefit from DeltaIndexing.
As always, Delta-Delete MUST NOT be executed if any errors occur during import as we do not want to delete records erroneously!
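The four modes can be read as combinations of three independent switches (check for updates, write entries, run Delta-Delete). The following is a minimal Java sketch of that reading, not the actual SMILA implementation: the enum name `DeltaIndexingMode` and its method names are hypothetical and chosen only for illustration.

```java
/**
 * Hypothetical sketch (not SMILA code): each mode is a combination of
 * three switches, as described in the mode list above.
 */
enum DeltaIndexingMode {
    //       check   write   delete
    FULL(    true,   true,   true),
    ADDITIVE(true,   true,   false),
    INITIAL( false,  true,   false),
    DISABLED(false,  false,  false);

    private final boolean checkForUpdate;
    private final boolean writeEntries;
    private final boolean deltaDelete;

    DeltaIndexingMode(boolean checkForUpdate, boolean writeEntries, boolean deltaDelete) {
        this.checkForUpdate = checkForUpdate;
        this.writeEntries = writeEntries;
        this.deltaDelete = deltaDelete;
    }

    /** True if each Record must be checked for changes before indexing. */
    boolean checksForUpdate() {
        return checkForUpdate;
    }

    /** True if an entry is created or updated for each Record. */
    boolean writesEntries() {
        return writeEntries;
    }

    /** Delta-Delete runs only if the mode allows it AND the import had no errors. */
    boolean runsDeltaDelete(boolean importHadErrors) {
        return deltaDelete && !importHadErrors;
    }
}
```

With such an enum, the configured string value could be mapped with `DeltaIndexingMode.valueOf("INITIAL")`, and the error rule is enforced in one place instead of being repeated at every call site.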
To configure the mode of DeltaIndexing execution, an additional parameter is needed in the IndexOrderConfiguration:
...
<xs:element name="DeltaIndexingMode" type="DeltaIndexingModeType"/>
<xs:simpleType name="DeltaIndexingModeType">
  <xs:annotation>
    <xs:appinfo>
      <jxb:class ref="org.eclipse.eilf.connectivity.framework.indexorder.messages.DeltaIndexingModeType"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:restriction base="xs:string">
    <xs:pattern value="FULL"/>
    <xs:pattern value="ADDITIVE"/>
    <xs:pattern value="INITIAL"/>
    <xs:pattern value="DISABLED"/>
  </xs:restriction>
</xs:simpleType>
...
... <DeltaIndexingMode>FULL</DeltaIndexingMode> ...
The execution logic has to be added in parts to the CrawlerController (CrawlThread) and the ConnectivityManager. Therefore the mode has to be added to the ConnectivityManager interface. The problem is that initialize and finish still need to be called, and the mode then controls if and how DeltaIndexing is used. This makes the usage and implementation of the ConnectivityManager increasingly complex and obscure (too many special cases).
The execution logic has to be added either
- to the CrawlerController (CrawlThread) only, which decides what actions to perform based on the given mode.
- to the Crawlers themselves, if the more radical change is implemented
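To illustrate the first option, here is a rough Java sketch of how a single controller-side routine could decide per mode which DeltaIndexing actions to perform. Everything in it is an assumption made for illustration: `CrawlMode`, `InMemoryDeltaIndex`, `ImportResult` and `ModeAwareImport` are hypothetical names, records are reduced to an id and a hash, and the in-memory class is only a stand-in for the DeltaIndexingManager, whose real API is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical mode enum mirroring the four proposed values. */
enum CrawlMode { FULL, ADDITIVE, INITIAL, DISABLED }

/** Hypothetical in-memory stand-in for the delta indexing state (not the SMILA API). */
class InMemoryDeltaIndex {
    private final Map<String, String> hashes = new HashMap<>();
    private final Set<String> visited = new HashSet<>();

    /** Forgets which entries were visited in the previous run. */
    void startRun() {
        visited.clear();
    }

    /** A record needs an update if it is unknown or its hash has changed. */
    boolean needsUpdate(String id, String hash) {
        return !hash.equals(hashes.get(id));
    }

    /** Creates or updates the entry and marks it as seen in this run. */
    void visit(String id, String hash) {
        hashes.put(id, hash);
        visited.add(id);
    }

    /** Delta-Delete: removes all entries not visited in this run, returns their ids. */
    List<String> deleteUnvisited() {
        List<String> removed = new ArrayList<>();
        for (String id : new ArrayList<>(hashes.keySet())) {
            if (!visited.contains(id)) {
                hashes.remove(id);
                removed.add(id);
            }
        }
        return removed;
    }
}

/** Ids sent to the index and ids removed by Delta-Delete in one run. */
class ImportResult {
    final List<String> indexed = new ArrayList<>();
    final List<String> deleted = new ArrayList<>();
}

class ModeAwareImport {
    /** Runs one import; 'records' maps record id to record hash. */
    static ImportResult run(CrawlMode mode, Map<String, String> records,
                            InMemoryDeltaIndex index, boolean errorOccurred) {
        ImportResult result = new ImportResult();
        boolean check = mode == CrawlMode.FULL || mode == CrawlMode.ADDITIVE;
        boolean track = mode != CrawlMode.DISABLED;

        index.startRun();
        for (Map.Entry<String, String> record : records.entrySet()) {
            // Without a check, every record is treated as new or changed.
            boolean changed = !check || index.needsUpdate(record.getKey(), record.getValue());
            if (changed) {
                result.indexed.add(record.getKey());
            }
            // Unchanged records are still marked as visited so Delta-Delete keeps them.
            if (track) {
                index.visit(record.getKey(), record.getValue());
            }
        }
        // Delta-Delete only in FULL mode, and never after an erroneous import.
        if (mode == CrawlMode.FULL && !errorOccurred) {
            result.deleted.addAll(index.deleteUnvisited());
        }
        return result;
    }
}
```

In this sketch the mode is evaluated in one place only, so no mode parameter has to be threaded through the ConnectivityManager interface.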