
Difference between revisions of "SMILA/Specifications/DeltaIndexingAndConnectivtyDiscussion09/Usage of DeltaIndexingManager by CrawlerControler alone"

(Replacing page with 'Obsolete page. Can be deleted.')
 
Line 1: Line 1:
==== Usage of DeltaIndexingManager by CrawlerControler alone ====
+
Obsolete page. Can be deleted.
 
+
Here is another idea based on the changes introduced with [[SMILA/Specifications/DeltaIndexingAndConnectivtyDiscussion09/Separate_Interfaces_for_ConnectivityManager_and_DeltaIndexingManager]], but taking it further: it is not the CrawlerController that communicates with the DeltaIndexingManager, but each Crawler.
+
 
+
{{note|implemented|date: ??}}
+
 
+
 
+
# DeltaIndexing used by Crawlers: This is a radical change, as it also affects the Crawler interface. Crawlers could communicate directly with the DeltaIndexingManager and provide only those Records that pass DeltaIndexing (that is, Records that are new or need an update). CrawlerController and Crawler could implement a Producer/Consumer pattern, which should improve performance: there would be no more sending of arrays of DIInformation and retrieving the Record objects afterwards. DeltaIndexing delete information is computed in the Crawler and can be passed to the CrawlerController as regular Records (with only the ID set) plus a delete flag that notifies the CrawlerController that the Record is to be deleted. This should reduce communication overhead, as the DIInformation does not have to be passed between multiple components and the whole process can run multithreaded. Of course, this adds a lot more logic to the Crawler and demands more knowledge from a Crawler developer. It would also mean that ID and HASH are generated in the Crawler. The downside is that each Crawler has to implement the DeltaIndexing workflow itself. <br>We could even move all execution logic to the Crawler, which would make the CrawlerController obsolete. Crawlers would then handle everything themselves: communication with the DeltaIndexingManager, CompoundHandlers and ConnectivityManager. I think the best performance can be achieved this way, as the setup is very simple, with no unnecessary passing of data between components. However, a lot of logic would have to be re-implemented in every Crawler. I wonder if there is a chance to minimize this.
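The Producer/Consumer idea described above can be illustrated with a minimal sketch. It is not the SMILA API: the class <tt>DeltaIndexingCrawlerSketch</tt>, the <tt>Record</tt> type, the in-memory stand-in for the DeltaIndexingManager and all method names are hypothetical and chosen for illustration only. The sketch assumes that the Crawler computes ID and HASH itself, queues only new or changed Records, and reports deletions as Records that carry only the ID and a delete flag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: none of these types are the real SMILA interfaces.
class DeltaIndexingCrawlerSketch {

    /** Record handed to the consumer; delete records carry only the ID. */
    static final class Record {
        final String id;
        final String content; // null for delete records
        final boolean delete;

        Record(String id, String content, boolean delete) {
            this.id = id;
            this.content = content;
            this.delete = delete;
        }
    }

    /** Sentinel that tells the consumer the crawler has finished. */
    static final Record END = new Record(null, null, false);

    /** Assumed stand-in for the DeltaIndexingManager: last known hash per ID. */
    static final class InMemoryDeltaIndex {
        private final Map<String, String> hashes = new HashMap<>();
        private final Set<String> visited = new HashSet<>();

        /** Marks the ID as visited; returns true if it is new or its hash changed. */
        synchronized boolean checkAndUpdate(String id, String hash) {
            visited.add(id);
            String previous = hashes.put(id, hash);
            return !hash.equals(previous);
        }

        /** Removes and returns all IDs that were not visited in the current run. */
        synchronized List<String> removeUnvisited() {
            List<String> obsolete = new ArrayList<>();
            for (String id : new ArrayList<>(hashes.keySet())) {
                if (!visited.contains(id)) {
                    obsolete.add(id);
                    hashes.remove(id);
                }
            }
            visited.clear();
            return obsolete;
        }
    }

    /** Producer (Crawler): queues only records that pass the delta check. */
    static void crawl(Map<String, String> source, InMemoryDeltaIndex index,
                      BlockingQueue<Record> out) throws InterruptedException {
        for (Map.Entry<String, String> entry : source.entrySet()) {
            // ID and hash are generated in the crawler (simplistic hash for the sketch).
            String hash = Integer.toHexString(entry.getValue().hashCode());
            if (index.checkAndUpdate(entry.getKey(), hash)) {
                out.put(new Record(entry.getKey(), entry.getValue(), false));
            }
        }
        // Delete information is computed here and sent as ID-only records.
        for (String id : index.removeUnvisited()) {
            out.put(new Record(id, null, true));
        }
        out.put(END);
    }

    /** Consumer (CrawlerController role): drains the queue while the crawler runs. */
    static List<String> runOnce(Map<String, String> source, InMemoryDeltaIndex index) {
        BlockingQueue<Record> queue = new LinkedBlockingQueue<>(16);
        Thread producer = new Thread(() -> {
            try {
                crawl(source, index, queue);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        List<String> actions = new ArrayList<>();
        try {
            for (Record r = queue.take(); r != END; r = queue.take()) {
                actions.add((r.delete ? "delete:" : "add:") + r.id);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming records", e);
        }
        return actions;
    }
}
```

In this sketch the bounded queue replaces the exchange of DIInformation arrays: the consumer only ever sees Records that need processing. The delta logic in <tt>crawl</tt> is exactly the part that every Crawler would have to repeat, which is the downside noted above; a shared base class or helper could be one way to limit that duplication.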
+

Latest revision as of 03:38, 16 October 2009

Obsolete page. Can be deleted.