SMILA/Documentation/CompoundManagement

Revision as of 06:29, 27 May 2009 by Unnamed Poltroon

Overview

CompoundManagement in SMILA is an extensible set of components. The central component is the CompoundManager. It manages CompoundHandlers, each of which is capable of extracting the elements of certain types of files (such as zip or chm). Each CompoundHandler registers itself with the CompoundManager, providing a list of supported mime types. The CompoundManager provides functionality to check whether a given record contains a compound. It uses a MimeTypeIdentifier to identify the mime type of the given record and checks whether any registered CompoundHandler is capable of processing records of this mime type. If so, it delegates the processing to that CompoundHandler, which in turn creates a CompoundCrawler over the extracted elements of the compound record and passes the CompoundCrawler back. CompoundCrawlers are just like regular Crawlers; the difference is that they work only on the given compound record and not on an external data source.
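The registration and delegation flow described above can be sketched in plain Java. This is a minimal illustration, not the SMILA implementation: `SketchHandler` and `SketchCompoundManager` are hypothetical names, the mime type is assumed to have been identified already, and a list of element names stands in for the CompoundCrawler.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a CompoundHandler.
interface SketchHandler {
  Collection<String> getSupportedMimeTypes();

  // Returns the names of the extracted elements (stand-in for a CompoundCrawler).
  List<String> extract(byte[] compoundContent);
}

// Hypothetical, simplified stand-in for the CompoundManager.
class SketchCompoundManager {
  // Registry of handlers, keyed by each mime type they announce.
  private final Map<String, SketchHandler> handlersByMimeType = new HashMap<>();

  // Called when a handler registers itself with the manager.
  void addHandler(SketchHandler handler) {
    for (String mimeType : handler.getSupportedMimeTypes()) {
      handlersByMimeType.put(mimeType, handler);
    }
  }

  // A record counts as a compound if some handler supports its mime type.
  boolean isCompound(String mimeType) {
    return handlersByMimeType.containsKey(mimeType);
  }

  // Delegates extraction to the handler registered for the mime type.
  List<String> extract(String mimeType, byte[] compoundContent) {
    SketchHandler handler = handlersByMimeType.get(mimeType);
    if (handler == null) {
      throw new IllegalArgumentException("No handler for mime type " + mimeType);
    }
    return handler.extract(compoundContent);
  }
}
```

Keying the registry by mime type keeps both the compound check and the delegation to a single map lookup. If two handlers announce the same mime type, the one registered last wins in this sketch.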

The following chart shows all CompoundManagement components:

CompoundManagement.png

Note: DeltaIndexing does not yet support handling of compound elements. A second run on an unmodified data source containing compounds will lead to the deletion of all compound elements. Support for compound elements will be added in M3.


API

/**
 * The Interface CompoundManager.
 */
public interface CompoundManager {
 
  /**
   * Checks if a record is a compound object.
   * 
   * @param record
   *          the Record
   * @param config
   *          the DataSourceConnectionConfig
   * @return true if the record is a compound object and is extractable by this CompoundManager, false otherwise
   * @throws CompoundException
   *           if any error occurs
   */
  boolean isCompound(final Record record, final DataSourceConnectionConfig config) throws CompoundException;
 
  /**
   * Extracts the elements of the given record and returns a Crawler over the extracted elements.
   * 
   * @param record
   *          the Record
   * @param config
   *          the DataSourceConnectionConfig
   * @return a Crawler interface over the extracted elements
   * @throws CompoundException
   *           if any error occurs
   */
  Crawler extract(final Record record, final DataSourceConnectionConfig config) throws CompoundException;
 
  /**
   * Adopts the input record according to the given configuration. The record may be left unmodified, modified or even
   * set to null.
   * 
   * @param record
   *          the Record
   * @param config
   *          the DataSourceConnectionConfig
   * @return the adopted record
   * @throws CompoundException
   *           if any error occurs
   */
  Record adoptCompoundRecord(final Record record, final DataSourceConnectionConfig config) throws CompoundException;
}

/**
 * The Interface CompoundHandler.
 */
public interface CompoundHandler {
 
  /**
   * Gets the mime types the CompoundHandler is able to extract.
   * @return a Collection of mime types the CompoundHandler is able to extract.
   */
  Collection<String> getSupportedMimeTypes();
 
  /**
   * Extracts the elements of the given record and returns a Crawler over the extracted elements.
   * @param record
   *          the Record
   * @param config
   *          the DataSourceConnectionConfig
   * @return a Crawler interface over the extracted elements
   * @throws CompoundException
   *           if any error occurs
   */
  Crawler extract(final Record record, final DataSourceConnectionConfig config) throws CompoundException;
}

/**
 * The Interface CompoundCrawler.
 */
public interface CompoundCrawler extends Crawler {
 
  /**
   * Sets the compound record to extract data from.
   * 
   * @param record
   *          the compound Record
   * @throws CrawlerException
   *           if parameter record is null
   */
  void setCompoundRecord(final Record record) throws CrawlerException;
 
  /**
   * Gets the compound record.
   * 
   * @return the compound record.
   */
  Record getCompoundRecord();
}
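
A caller might combine isCompound() and extract() as sketched below. The empty placeholder types only mirror the names used in the interfaces above so that the example compiles on its own; the real SMILA types are richer. `CompoundRouting` and `crawlerFor` are hypothetical names, and returning null for non-compound records is an assumption of this sketch, not documented behavior.

```java
// Empty placeholders standing in for the SMILA types named in the interfaces above.
interface Record {}
interface DataSourceConnectionConfig {}
interface Crawler {}

class CompoundException extends Exception {
  CompoundException(String message) {
    super(message);
  }
}

// Reduced copy of the documented CompoundManager interface (adoptCompoundRecord omitted).
interface CompoundManager {
  boolean isCompound(Record record, DataSourceConnectionConfig config) throws CompoundException;

  Crawler extract(Record record, DataSourceConnectionConfig config) throws CompoundException;
}

// Hypothetical caller-side helper.
final class CompoundRouting {
  private CompoundRouting() {
  }

  // Returns a Crawler over the record's elements, or null if the record is not a compound
  // and should be processed as an ordinary record.
  static Crawler crawlerFor(CompoundManager manager, Record record, DataSourceConnectionConfig config)
    throws CompoundException {
    if (record == null) {
      throw new CompoundException("record must not be null");
    }
    return manager.isCompound(record, config) ? manager.extract(record, config) : null;
  }
}
```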


Implementations

It is possible to provide different implementations for all components. Most importantly, compound handling is easy to extend by providing your own CompoundHandler implementations.
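
As a rough illustration of what such an extension could look like, the sketch below implements a handler for a made-up compound format in which every non-empty line is one element. `SimpleCompoundHandler` is a hypothetical, simplified stand-in for the CompoundHandler interface (it takes raw bytes and returns a list instead of a Crawler), and the mime type `text/x-example-list` is invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for the CompoundHandler interface.
interface SimpleCompoundHandler {
  Collection<String> getSupportedMimeTypes();

  List<String> extract(byte[] compoundContent);
}

// Example handler: treats each non-empty line of the content as one element.
class LineListHandler implements SimpleCompoundHandler {
  // Invented mime type, used only for this illustration.
  static final String MIME_TYPE = "text/x-example-list";

  @Override
  public Collection<String> getSupportedMimeTypes() {
    return Collections.singletonList(MIME_TYPE);
  }

  @Override
  public List<String> extract(byte[] compoundContent) {
    List<String> elements = new ArrayList<>();
    String text = new String(compoundContent, StandardCharsets.UTF_8);
    for (String line : text.split("\\r?\\n")) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty()) {
        elements.add(trimmed); // skip blank lines, keep everything else
      }
    }
    return elements;
  }
}
```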

org.eclipse.smila.connectivity.framework.impl

This bundle contains the default implementation of the CompoundManager interface as well as some abstract base classes for CompoundHandlers and CompoundCrawlers.

The CrawlerController implements the general processing logic common to all types of Crawlers. Its interface is a pure management interface that can be accessed through its Java interface or its wrapping JMX interface. The CompoundManager implementation has references to the following OSGi services:

  • MimeTypeIdentifier (1..1)
  • CompoundHandler (0..n)

CompoundHandlers register themselves with the CompoundManager.

The method adoptCompoundRecord() is not yet implemented. It simply returns the unmodified input record.



org.eclipse.smila.connectivity.framework.compound.zip

This bundle contains an implementation to handle zip archives. It handles the following mime types:

  • application/zip
  • application/x-gzip
  • application/java-archive

It provides the OSGi Declarative Services ZipCompoundHandler and ZipCompoundCrawler. As with regular Crawlers, the ZipCompoundCrawler is a ComponentFactory. Each time the method extract(...) is called on the ZipCompoundHandler, a new instance of a ZipCompoundCrawler is created. Neither service has any dependencies on other services, except that ZipCompoundHandler references the ZipCompoundCrawler.

Note: The extract functionality is implemented using standard JDK zip file handling. Therefore archives must contain only filenames in UTF-8 encoding. Many zip tools do not use UTF-8 but the platform default encoding, which leads to errors for some characters (e.g. German umlauts).
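
The filename encoding issue can be reproduced with the JDK alone. The sketch below is not part of this bundle and makes no claim about how ZipCompoundCrawler reads archives. It relies on the java.util.zip constructors that accept a Charset, which exist only in Java 7 and later. `ZipNameEncodingSketch` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper showing how the entry name charset affects zip reading.
final class ZipNameEncodingSketch {
  private ZipNameEncodingSketch() {
  }

  // Builds an in-memory archive of empty entries whose names are encoded with the given charset.
  static byte[] zipWithNames(Charset charset, String... names) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(buffer, charset)) {
      for (String name : names) {
        zip.putNextEntry(new ZipEntry(name));
        zip.closeEntry();
      }
    }
    return buffer.toByteArray();
  }

  // Lists the entry names of an archive, decoding them with the given charset.
  static List<String> listNames(byte[] archive, Charset charset) throws IOException {
    List<String> names = new ArrayList<>();
    try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive), charset)) {
      for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
        names.add(entry.getName());
      }
    }
    return names;
  }
}
```

An archive written with, say, ISO-8859-1 names lists correctly only when read with the same charset. Reading it as UTF-8 may fail or produce garbled names for characters such as umlauts.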

Configuration

There are no configuration options available for this bundle.