Separate Interfaces for ConnectivityManager and DeltaIndexingManager

IMPLEMENTED
In summer 2008


Motivation

The API of the ConnectivityManager includes parts of the API of the DeltaIndexingManager, which makes it more complex than necessary. It also implies that the ConnectivityManager has internal state, as DeltaIndexing for a DataSource has to be initialized and finalized. This interface forces its clients to use DeltaIndexing and to follow a strict workflow (initialize, add records, optionally call DeltaIndex-Delete and delete the returned IDs, finish). Even if this usage were configurable, the API is, simply put, ugly.

Proposal

I suggest separating the ConnectivityManager interface from the DeltaIndexingManager interface. This makes both APIs clearer and more focused. We should think of SMILA more as a "construction kit" than as a ready-made solution for every problem. For example, if someone wants to connect to SMILA without using Crawlers or Agents but still wants the benefits of DeltaIndexing, all the components they need are there: they can implement their own importer using the DeltaIndexingManager and ConnectivityManager interfaces. There is no need to provide the whole functionality "en bloc". At the moment I see no urgent need for a remote interface (SCA). It could become necessary in certain deployment scenarios where the same DataSource (e.g. a website) is crawled by several Crawler/CrawlerController combinations and therefore must be handled by the same DeltaIndexingManager. But this could also be achieved by the corresponding implementation itself (e.g. a DeltaIndexingManager that holds its state in a distributed database). If we decide that an SCA interface is needed, it can be added easily.

interface ConnectivityManager
{
  int add(Record[] records) throws ConnectivityException;
  int update(Record[] records) throws ConnectivityException; // optional
  int delete(Id[] ids) throws ConnectivityException;
}


interface DeltaIndexingManager
{
    void init(String dataSourceID) throws DeltaIndexingException;
    boolean checkForUpdate(Id id, String hash) throws DeltaIndexingException;
    void visit(Id id, String hash) throws DeltaIndexingException;
    Iterator<Id> obsoleteIdIterator(String dataSourceID) throws DeltaIndexingException;
    void finish(String dataSourceID) throws DeltaIndexingException;
    ...
    // same functionality for Compound objects, remember not to overload methods when using SCA
}
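
To illustrate the "construction kit" idea, here is a minimal sketch of a custom importer that combines the two interfaces. It is not the SMILA implementation. The `Id` and `Record` classes are simplified stand-ins, and `InMemoryDeltaIndexing`, `RecordingConnectivity` and `ImporterSketch.runImport` are hypothetical names introduced only for this example. The sketch also assumes that `checkForUpdate` returns true for new or changed records, that `visit` stores the hash and marks the id as seen, that `obsoleteIdIterator` returns the ids not seen in the current run, and that `finish` throws a `DeltaIndexingException`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-ins for the SMILA Id and Record types (assumption).
final class Id {
  final String key;
  Id(String key) { this.key = key; }
  @Override public boolean equals(Object o) { return o instanceof Id && ((Id) o).key.equals(key); }
  @Override public int hashCode() { return key.hashCode(); }
}

final class Record {
  final Id id;
  final String hash; // content hash, computed by the importer
  Record(String key, String hash) { this.id = new Id(key); this.hash = hash; }
}

class ConnectivityException extends Exception { }
class DeltaIndexingException extends Exception { }

// The proposed interfaces, reduced to the methods this sketch needs.
interface ConnectivityManager {
  int add(Record[] records) throws ConnectivityException;
  int delete(Id[] ids) throws ConnectivityException;
}

interface DeltaIndexingManager {
  void init(String dataSourceID) throws DeltaIndexingException;
  boolean checkForUpdate(Id id, String hash) throws DeltaIndexingException;
  void visit(Id id, String hash) throws DeltaIndexingException;
  Iterator<Id> obsoleteIdIterator(String dataSourceID) throws DeltaIndexingException;
  void finish(String dataSourceID) throws DeltaIndexingException;
}

// Hypothetical in-memory implementation for a single data source.
class InMemoryDeltaIndexing implements DeltaIndexingManager {
  private final Map<Id, String> hashes = new HashMap<>(); // survives across runs
  private final Set<Id> visited = new HashSet<>();        // ids seen in the current run
  @Override public void init(String dataSourceID) { visited.clear(); }
  @Override public boolean checkForUpdate(Id id, String hash) { return !hash.equals(hashes.get(id)); }
  @Override public void visit(Id id, String hash) { hashes.put(id, hash); visited.add(id); }
  @Override public Iterator<Id> obsoleteIdIterator(String dataSourceID) {
    List<Id> obsolete = new ArrayList<>(hashes.keySet());
    obsolete.removeAll(visited);
    return obsolete.iterator();
  }
  @Override public void finish(String dataSourceID) {
    hashes.keySet().retainAll(visited); // forget ids that were not seen
    visited.clear();
  }
}

// Hypothetical ConnectivityManager that only records what it was asked to do.
class RecordingConnectivity implements ConnectivityManager {
  final List<String> added = new ArrayList<>();
  final List<String> deleted = new ArrayList<>();
  @Override public int add(Record[] records) {
    for (Record r : records) { added.add(r.id.key); }
    return records.length;
  }
  @Override public int delete(Id[] ids) {
    for (Id id : ids) { deleted.add(id.key); }
    return ids.length;
  }
}

class ImporterSketch {
  // Runs one import and returns the number of records passed to add().
  static int runImport(String dataSourceID, List<Record> crawled,
      DeltaIndexingManager delta, ConnectivityManager connectivity)
      throws DeltaIndexingException, ConnectivityException {
    delta.init(dataSourceID);
    List<Record> changed = new ArrayList<>();
    for (Record r : crawled) {
      if (delta.checkForUpdate(r.id, r.hash)) {
        changed.add(r);            // new or modified: must be sent on
      } else {
        delta.visit(r.id, r.hash); // unchanged: only mark as seen
      }
    }
    int added = 0;
    if (!changed.isEmpty()) {
      added = connectivity.add(changed.toArray(new Record[0]));
      for (Record r : changed) { delta.visit(r.id, r.hash); } // store hashes only after add() succeeded
    }
    List<Id> obsolete = new ArrayList<>();
    delta.obsoleteIdIterator(dataSourceID).forEachRemaining(obsolete::add);
    if (!obsolete.isEmpty()) { connectivity.delete(obsolete.toArray(new Id[0])); }
    delta.finish(dataSourceID);
    return added;
  }
}
```

In this sketch the importer, not the ConnectivityManager, owns the DeltaIndexing workflow. A client that does not want DeltaIndexing would simply call `add` and `delete` directly and never touch the DeltaIndexingManager.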


Notes: If calls to the ConnectivityManager are NOT relevant for the DeltaIndexing state (e.g. if it is enough that the add/delete call itself succeeded and it is not required that the records were successfully added to the Queue), these methods could forgo the return value and the ConnectivityException. In the SCA interface they could then be annotated with @OneWay to improve performance. Information could still be sent back asynchronously via callbacks. But if feedback is required, a synchronous method call is much easier to use.
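
The trade-off in this note can be sketched in plain Java, without SCA. This is only an illustration under assumptions: `OneWayConnectivity`, `AddCallback` and `QueueingConnectivity` are hypothetical names, record ids are plain strings instead of `Record` objects, and a single-threaded executor stands in for whatever transport an SCA one-way call would use.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical callback through which results are reported asynchronously.
interface AddCallback {
  void added(int count);
  void failed(Exception cause);
}

// Hypothetical one-way variant: no return value and no checked exception.
// In an SCA interface, add() is the kind of method that could be marked one-way.
interface OneWayConnectivity {
  void add(String[] recordIds);
}

class QueueingConnectivity implements OneWayConnectivity {
  private final ExecutorService worker = Executors.newSingleThreadExecutor();
  private final AddCallback callback;

  QueueingConnectivity(AddCallback callback) { this.callback = callback; }

  @Override
  public void add(final String[] recordIds) {
    // Returns immediately; the work and the feedback happen on the worker thread.
    worker.execute(() -> {
      try {
        if (recordIds.length == 0) {
          throw new IllegalArgumentException("no records");
        }
        callback.added(recordIds.length);
      } catch (Exception e) {
        callback.failed(e);
      }
    });
  }

  // Waits for queued calls to finish; only needed because this sketch owns its thread.
  void shutdown() throws InterruptedException {
    worker.shutdown();
    worker.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

The caller of `add` learns nothing from the call itself and has to correlate callbacks with requests on its own, which is the extra effort the note refers to when it calls the synchronous variant easier to use.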