SMILA/Specifications/DeltaIndexingAndConnectivtyDiscussion09
Revision as of 13:44, 15 October 2009
Motivation for this page and usage
The current implementation of the DeltaIndexingManager has several problems or shortcomings, which are listed in the section Ideas and Problems (under discussion). If an idea is rather large, it is usually better to give it its own page, created as a child of this page. The idea should still have its own section here, which must at least contain a link to that page.
The initiating authors should edit only their own sections and not those of others.
Each subsection or page should state:
- context, such as author, date, and the SVN revision it is based on
- motivation/problem
- a solution proposal
Ideas that have been implemented are moved to their own pages and referenced in Implemented Changes.
Ideas and Problems (under discussion)
DeltaIndexing reflects crawl state rather than index state
One problem at the moment is that, because SMILA processes incoming Records asynchronously, DeltaIndexing does NOT really reflect the state of a Record in the index: there is no guarantee that a Record has been indexed after it was successfully added to the Queue. Reflecting the index state could be achieved by implementing notifications that update the DeltaIndexing state once a Record has actually been indexed. If this is done, however, the computation of the DeltaIndexing delete step has to wait until all Queue entries have passed through the workflow. This is a complex process that seems error-prone. Is it really necessary to reflect the index state, or is it enough to reflect the state of the last crawl?
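To make the trade-off concrete, here is a minimal sketch of the notification idea. It is not SMILA code: the class `IndexStateTracker` and its methods are hypothetical. The crawler side marks a record as queued, an assumed notification from the end of the workflow marks it as indexed, and the delete computation refuses to run while any queued record is still pending.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker, not part of SMILA: separates "crawled" from "indexed".
class IndexStateTracker {
    // Record id -> hash of the version confirmed as indexed (kept across runs).
    private final Map<String, String> indexed = new HashMap<>();
    // Records added to the queue but not yet confirmed by a notification.
    private final Set<String> pending = new HashSet<>();
    // Records seen by the crawler during the current run.
    private final Set<String> visited = new HashSet<>();

    // Starts a new crawl run.
    synchronized void startRun() {
        visited.clear();
    }

    // Crawler side: the record was successfully added to the queue.
    synchronized void queued(String id) {
        visited.add(id);
        pending.add(id);
    }

    // Assumed notification from the end of the workflow: the record is now in the index.
    synchronized void indexedNotification(String id, String hash) {
        pending.remove(id);
        indexed.put(id, hash);
        notifyAll(); // wake up a delete computation waiting for the queue to drain
    }

    // Delete computation: only valid once every queued record has passed the workflow.
    synchronized Set<String> computeDeletes(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!pending.isEmpty()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                throw new IllegalStateException(pending.size() + " record(s) still in the workflow");
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for the queue", e);
            }
        }
        // Everything indexed earlier but not seen in this run is obsolete.
        Set<String> obsolete = new HashSet<>(indexed.keySet());
        obsolete.removeAll(visited);
        return obsolete;
    }
}
```

The timeout is where the error-proneness becomes visible: a record that fails inside the workflow never produces a notification, so the delete step either blocks or has to give up. Reflecting only the crawl state avoids this wait entirely.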
Extract Session Interface from DeltaIndexingManager
For a better separation of tasks and easier handling of locks on data sources during a delta indexing run, we could introduce the following interfaces. The implementations should only be proxies that use the same DeltaIndexingManager service implementation, so that a DeltaIndexingSession can internally switch to another service if the initial one becomes unavailable.
    import java.util.Iterator;

    // Id and DeltaIndexingException are SMILA types (imports omitted).
    interface DeltaIndexingManager {
        /**
         * Initializes a new DeltaIndexingSession if the data source is not locked.
         */
        DeltaIndexingSession init(String dataSourceID) throws DeltaIndexingException;

        /**
         * Clears all data sources that are not locked.
         */
        void clear() throws DeltaIndexingException;

        /**
         * Clears the data source if it is not locked.
         */
        void clear(String dataSourceID) throws DeltaIndexingException;

        /**
         * Unlocks all data sources by force.
         */
        void unlockDatasources() throws DeltaIndexingException;

        /**
         * Checks if a data source exists.
         */
        boolean exists(String dataSourceId);
    }

    interface DeltaIndexingSession {
        /**
         * Checks if the id needs to be updated.
         */
        boolean checkForUpdate(Id id, String hash) throws DeltaIndexingException;

        /**
         * Marks the id as visited.
         */
        void visit(Id id, String hash) throws DeltaIndexingException;

        /**
         * Returns an iterator over all unvisited ids of the data source.
         */
        Iterator<Id> obsoleteIdIterator(String dataSourceID) throws DeltaIndexingException;

        /**
         * Returns an iterator over all unvisited ids of a parent id (compound objects).
         */
        Iterator<Id> obsoleteIdIterator(Id id) throws DeltaIndexingException;

        /**
         * Deletes the id.
         */
        void delete(Id id) throws DeltaIndexingException;

        /**
         * Finishes the delta indexing run and unlocks the data source.
         */
        void finish(String dataSourceID) throws DeltaIndexingException;
    }
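The intended semantics of the two interfaces can be illustrated with a small in-memory sketch. It is hypothetical and simplified, not SMILA code: `Id` is replaced by `String`, `DeltaIndexingException` by `IllegalStateException`, and the compound-object iterator, `clear` and `unlockDatasources` are left out. Because a session already knows its data source, the sketch also drops the `dataSourceID` parameters of `obsoleteIdIterator` and `finish`. The session holds no state of its own beyond the run; it is a thin proxy that delegates to the manager, as the proposal suggests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified manager: ids are Strings instead of SMILA's Id type.
class InMemoryDeltaIndexingManager {
    // dataSourceId -> (id -> hash) as stored after previous runs.
    private final Map<String, Map<String, String>> hashes = new HashMap<>();
    private final Set<String> locked = new HashSet<>();

    // Locks the data source and returns a session, or fails if it is already locked.
    synchronized Session init(String dataSourceId) {
        if (!locked.add(dataSourceId)) {
            throw new IllegalStateException("data source locked: " + dataSourceId);
        }
        hashes.computeIfAbsent(dataSourceId, k -> new HashMap<>());
        return new Session(dataSourceId);
    }

    synchronized boolean exists(String dataSourceId) {
        return hashes.containsKey(dataSourceId);
    }

    // The session is only a proxy that delegates to the manager's state.
    final class Session {
        private final String dataSourceId;
        private final Set<String> visited = new HashSet<>();
        private boolean finished;

        private Session(String dataSourceId) {
            this.dataSourceId = dataSourceId;
        }

        // True if the id is new or its hash changed since the last run.
        boolean checkForUpdate(String id, String hash) {
            synchronized (InMemoryDeltaIndexingManager.this) {
                ensureOpen();
                return !hash.equals(hashes.get(dataSourceId).get(id));
            }
        }

        // Marks the id as visited and stores its current hash.
        void visit(String id, String hash) {
            synchronized (InMemoryDeltaIndexingManager.this) {
                ensureOpen();
                visited.add(id);
                hashes.get(dataSourceId).put(id, hash);
            }
        }

        // Iterates over all ids of the data source that were not visited in this run.
        Iterator<String> obsoleteIdIterator() {
            synchronized (InMemoryDeltaIndexingManager.this) {
                ensureOpen();
                List<String> obsolete = new ArrayList<>();
                for (String id : hashes.get(dataSourceId).keySet()) {
                    if (!visited.contains(id)) {
                        obsolete.add(id);
                    }
                }
                return obsolete.iterator();
            }
        }

        // Removes an obsolete id from the stored state.
        void delete(String id) {
            synchronized (InMemoryDeltaIndexingManager.this) {
                ensureOpen();
                hashes.get(dataSourceId).remove(id);
            }
        }

        // Ends the run and releases the lock on the data source.
        void finish() {
            synchronized (InMemoryDeltaIndexingManager.this) {
                ensureOpen();
                finished = true;
                locked.remove(dataSourceId);
            }
        }

        private void ensureOpen() {
            if (finished) {
                throw new IllegalStateException("session already finished");
            }
        }
    }
}
```

A typical run calls `init`, then `checkForUpdate` and `visit` for every crawled id, deletes everything returned by `obsoleteIdIterator`, and ends with `finish`, which is the only place the lock is released.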
This approach was not realized. However, a sessionId was introduced to distinguish between different sessions without relying on thread IDs. See https://bugs.eclipse.org/bugs/show_bug.cgi?id=279243
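The idea behind a sessionId can be pictured with the following sketch. It is hypothetical and does not reproduce the API introduced for bug 279243: the class `SessionLocks` and its methods are illustrative names. `init` returns a random session id, the lock table maps each data source to the session id that holds it, and later calls pass that id instead of relying on the calling thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical lock table keyed by data source; values are session ids.
class SessionLocks {
    private final Map<String, String> owner = new HashMap<>();

    // Starts a session for the data source and returns its id, or fails if locked.
    synchronized String init(String dataSourceId) {
        if (owner.containsKey(dataSourceId)) {
            throw new IllegalStateException("data source locked: " + dataSourceId);
        }
        String sessionId = UUID.randomUUID().toString();
        owner.put(dataSourceId, sessionId);
        return sessionId;
    }

    // Every operation checks the caller's session id instead of the thread id.
    synchronized void checkSession(String sessionId, String dataSourceId) {
        if (!sessionId.equals(owner.get(dataSourceId))) {
            throw new IllegalStateException("invalid session for " + dataSourceId);
        }
    }

    // Ends the session and unlocks the data source.
    synchronized void finish(String sessionId, String dataSourceId) {
        checkSession(sessionId, dataSourceId);
        owner.remove(dataSourceId);
    }
}
```

Because the check compares ids rather than threads, a run may be continued from any worker thread that holds the id, while a second, concurrent run on the same data source is still rejected.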
Implemented Changes
Page | Date | Bug | Author(s) |
New Feature: DeltaIndexing On/Off | 2009-06-10 | bug 279242 | DS |
Separate Interfaces for ConnectivityManager and DeltaIndexingManager | 2008-06 ? | ? | DS? |
Usage of DeltaIndexingManager by CrawlerControler alone | ? | ? | DS? |