SMILA/Documentation/ConnectivityFramework
== Overview ==
Latest revision as of 04:35, 24 January 2012
The Connectivity framework, as the name suggests, provides a framework for integrating data from external systems into SMILA. Two kinds of components are supported for accessing external data: Agents and Crawlers. To integrate a new data source type into SMILA, only a new Agent or Crawler has to be implemented.
Here is a short overview of the components of the ConnectivityFramework:
- AgentController: The AgentController implements the general processing logic common to all types of Agents. Its service interface is used by Agents to trigger add/update/delete actions. This component is not yet implemented!
- Agents: Agents monitor data sources for changes (add/update/delete) or are triggered by events (e.g. database triggers) and report those changes to the AgentController. Currently we do not provide any Agent implementation!
- CrawlerController: The CrawlerController implements the general processing logic common to all types of Crawlers. Its service interface is used by clients (e.g. a JMX console) to start/stop crawls.
- Crawlers: A Crawler crawls a data source (e.g. a filesystem or a website) and returns all found data objects.
- CompoundManagement: Provides extractors for certain MimeTypes (e.g. zip, chm) and handles the processing of compound objects.
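The difference between the two component kinds can be sketched as push versus pull: an Agent reports each change as it happens, while a Crawler is iterated and returns what it finds. The following is only a minimal illustration under that assumption. All type names (`ChangeType`, `ChangeSink`, `SketchAgent`, `SketchCrawler`, `RecordingSink`) are hypothetical and are not the SMILA interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical types for illustration only; these are NOT the SMILA interfaces.
enum ChangeType { ADD, UPDATE, DELETE }

/** Stand-in for the controller-side interface that receives reported changes. */
interface ChangeSink {
    void report(ChangeType type, String objectId);
}

/** Push model: an Agent notices a change in the data source and reports it. */
class SketchAgent {
    private final ChangeSink sink;

    SketchAgent(ChangeSink sink) {
        this.sink = sink;
    }

    /** Called when the monitored source signals a change (e.g. from a trigger). */
    void onSourceEvent(ChangeType type, String objectId) {
        sink.report(type, objectId);
    }
}

/** Pull model: a Crawler walks the data source and returns all found objects. */
class SketchCrawler implements Iterable<String> {
    private final List<String> objectIds;

    SketchCrawler(List<String> objectIds) {
        this.objectIds = new ArrayList<>(objectIds);
    }

    @Override
    public Iterator<String> iterator() {
        return objectIds.iterator();
    }
}

/** Simple sink that records every reported change as "TYPE:id". */
class RecordingSink implements ChangeSink {
    final List<String> log = new ArrayList<>();

    @Override
    public void report(ChangeType type, String objectId) {
        log.add(type + ":" + objectId);
    }
}

class AgentVsCrawlerDemo {
    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        // The agent pushes a single change on its own initiative.
        new SketchAgent(sink).onSourceEvent(ChangeType.UPDATE, "doc-1");
        // The caller (standing in for a controller) pulls objects from the crawler.
        for (String id : new SketchCrawler(Arrays.asList("a.txt", "b.txt"))) {
            sink.report(ChangeType.ADD, id);
        }
        System.out.println(sink.log);
    }
}
```

In this sketch the loop in `main` plays the role of a controller; the real division of work between Crawlers and the CrawlerController is described in the documentation of those components.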
In addition there are three components that are not part of the ConnectivityFramework, but that interact with it:
- ConnectivityManager: The ConnectivityManager is the single point of entry for data into SMILA. The AgentController and the CrawlerController push the data through this component into the Queue.
- DeltaIndexingManager: The DeltaIndexingManager provides functionality to decide whether a record needs to be updated and sent to the ConnectivityManager or not.
- Configuration Management: This component is not yet implemented. It is designed to manage configurations for all kinds of services, e.g. DataSources for crawlers. At the moment all configurations have to be provided locally in the SMILA configuration folder.
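The interaction described above (check with delta indexing first, then push through the single entry point into the queue) might look roughly like the following sketch. It assumes that change detection is done by comparing a per-record hash, which the text above does not state. All class and method names are hypothetical and do not reflect the actual SMILA API.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical stand-in for the delta indexing decision; NOT the SMILA API. */
class SketchDeltaIndex {
    // Assumption: a record counts as changed when its hash differs from the last one seen.
    private final Map<String, String> lastSeenHash = new HashMap<>();

    /** Returns true if the record is new or changed, and remembers the given hash. */
    boolean checkAndRemember(String id, String hash) {
        String previous = lastSeenHash.put(id, hash);
        return !hash.equals(previous);
    }
}

/** Hypothetical single entry point that forwards record ids into a queue. */
class SketchConnectivityManager {
    final Queue<String> queue = new ArrayDeque<>();

    void add(String id) {
        queue.add(id);
    }
}

/** Hypothetical controller logic: ask the delta index first, then push. */
class SketchCrawlerController {
    private final SketchDeltaIndex delta;
    private final SketchConnectivityManager manager;

    SketchCrawlerController(SketchDeltaIndex delta, SketchConnectivityManager manager) {
        this.delta = delta;
        this.manager = manager;
    }

    /** Processes one crawled object; returns true if it was forwarded to the queue. */
    boolean process(String id, String hash) {
        if (!delta.checkAndRemember(id, hash)) {
            return false; // unchanged since the last run, nothing to send
        }
        manager.add(id);
        return true;
    }
}
```

With this sketch, processing the same id and hash twice forwards the record only once, while a changed hash forwards it again.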
The components labeled in red are not yet implemented.
There is no overall configuration for the framework. Check out the documentation of each framework component for detailed information.
org.eclipse.smila.connectivity.framework.performancecounters.ConnectivityPerformanceAgent defines many common performance counters for crawlers and agents. Crawler and agent implementations can extend this class to provide additional specific counters, or just use this class if the common counters are sufficient.
The common counters are:
- startDate: date/time when importer was started
- endDate: date/time when importer has finished or was stopped
- jobName: name of the job to which records were submitted
- importRunId: ID of the importer run
- records: number of records created by importer
- deltaIndices: number of requests to delta indexing manager
- averageRecordsProcessingTime: time since start divided by "records" in milliseconds
- averageDeltaIndicesProcessingTime: time since start divided by "deltaIndices" in milliseconds
- attachmentBytesTransferred: complete size of attachments added to records
- attachmentsTransferRate: time since start divided by attachmentBytesTransferred
- exceptions: number of non-fatal errors during importing
- exceptionsCritical: number of fatal errors during importing
- errorBuffer: list of descriptions of the last 10 exceptions
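To make the derived counters concrete, here is a small sketch of how the averages and the bounded error buffer could be computed from the definitions in the list above. It is not the ConnectivityPerformanceAgent class. The class `SketchImportCounters`, its method names, and the choice to return 0 when a count is still zero are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical counter holder; NOT the SMILA ConnectivityPerformanceAgent. */
class SketchImportCounters {
    private static final int ERROR_BUFFER_SIZE = 10;

    private final long startMillis;
    private long records;
    private long deltaIndices;
    private long exceptions;
    private final Deque<String> errorBuffer = new ArrayDeque<>();

    SketchImportCounters(long startMillis) {
        this.startMillis = startMillis;
    }

    void recordCreated() {
        records++;
    }

    void deltaIndexRequested() {
        deltaIndices++;
    }

    /** Counts a non-fatal error and keeps only the last 10 descriptions. */
    void nonFatalError(String description) {
        exceptions++;
        if (errorBuffer.size() == ERROR_BUFFER_SIZE) {
            errorBuffer.removeFirst(); // drop the oldest entry
        }
        errorBuffer.addLast(description);
    }

    /** Time since start divided by "records", in milliseconds. */
    double averageRecordsProcessingTime(long nowMillis) {
        return ratio(nowMillis - startMillis, records);
    }

    /** Time since start divided by "deltaIndices", in milliseconds. */
    double averageDeltaIndicesProcessingTime(long nowMillis) {
        return ratio(nowMillis - startMillis, deltaIndices);
    }

    long exceptions() {
        return exceptions;
    }

    /** Returns a copy of the buffered descriptions, oldest first. */
    List<String> errorBuffer() {
        return new ArrayList<>(errorBuffer);
    }

    // Assumption: report 0 while nothing has been counted, to avoid dividing by zero.
    private static double ratio(long elapsedMillis, long count) {
        return count == 0 ? 0.0 : (double) elapsedMillis / count;
    }
}
```

Passing the current time in as a parameter, instead of reading the system clock inside the class, keeps the calculation easy to check by hand: four records after 2000 ms give an average of 500 ms per record.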