SMILA/Documentation/Agent

Overview

An Agent monitors a data source for changes and sends the content and metadata of interest for new or modified resources, as well as the Ids of deleted resources.

SMILA currently comes with two types of Agents, each for a different type of data source: the MockAgent (a sample implementation of an agent) and the FeedAgent, which monitors RSS and Atom feeds. Furthermore, the Connectivity Framework provides an API for developers to create their own Agents.

API

An Agent has to implement the interface Agent, which extends the interface Runnable. The easiest way to achieve this is to extend the abstract base class AbstractAgent located in bundle org.eclipse.smila.connectivity.framework. This class already contains handling for the Agent's Id, an OSGi service activate method, and default implementations of the start() and stop() methods, which create a new Thread for the Agent to run in. So the only method that has to be implemented is method run() of the Runnable interface, which contains the processing logic of the agent.

/**
 * The Interface Agent.
 */
public interface Agent extends Runnable {
 
 /**
   * Returns the ID of this Agent.
   * 
   * @return a String containing the ID of this Agent
   * 
   * @throws AgentException
   *           if any error occurs
   */
  String getAgentId() throws AgentException;
 
  /**
   * Starts the agent using the given configuration, creating a new internal thread.
   * 
   * @param controllerCallback
   *          reference to the interface AgentControllerCallback
   * @param agentState
   *          the AgentState
   * @param config
   *          the DataSourceConnectionConfig
   * @param sessionId
   *          the delta indexing session id
   * 
   * @throws AgentException
   *           if any error occurs
   */
  void start(final AgentControllerCallback controllerCallback, final AgentState agentState,
    final DataSourceConnectionConfig config, final String sessionId) throws AgentException;
 
  /**
   * Stops the agent.
   * 
   * @throws AgentException
   *           if any error occurs
   */
  void stop() throws AgentException;
}

Architecture

Agents are managed and instantiated by the AgentController. The AgentController communicates with the Agent via the interface Agent to start or stop it. While the agent is running, it communicates with the AgentController via the callback interface AgentControllerCallback to send add and delete events. The agent itself has no reference to the DeltaIndexingManager; only the AgentController, which initializes the delta indexing session, has one. To identify the session, the parameter sessionId is passed to method start(final AgentControllerCallback controllerCallback, final AgentState agentState, final DataSourceConnectionConfig config, final String sessionId), so that the Agent can send it back to the AgentController via interface AgentControllerCallback.

Agents extend the Runnable interface and must implement method run(). The abstract base class AbstractAgent already includes some functionality for thread handling. In the start() method a new Thread is created for the Agent and stored in a private member variable. AbstractAgent also contains a private boolean flag _stopThread. The run() method should check this flag via method isStopThread() to determine when processing should end. Here is some skeleton code showing how the implementation could look:

  /**
   * Skeleton code for the run() method.
   * @see java.lang.Runnable#run()
   */
  public void run() {
    try {
      while (!isStopThread()) {
        try {
 
            // here goes the agent business logic
 
        } catch (InterruptedException e) {
          if (_log.isTraceEnabled()) {
            _log.trace("agent thread was interrupted ", e);
          }
        }
      }
    } catch (Exception e) {
      throw new RuntimeException(e);
    } catch (Throwable t) {
      throw new RuntimeException(t);
    } finally {
      try {
        stop();
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }
  }
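The interplay described above (the controller starts the agent with a session id, the agent reports events through a callback, and run() polls a stop flag) can be illustrated with a minimal, self-contained sketch. None of the types below is the real SMILA API: Callback, SketchAgent, CountingAgent and runUntil are hypothetical stand-ins, and the callback signature is an assumption, since AgentControllerCallback itself is not shown on this page.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Simplified stand-ins for illustration only; none of these types is the real SMILA API.
final class AgentSketch {

  /** Hypothetical, reduced callback; the real AgentControllerCallback may look different. */
  interface Callback {
    void add(String sessionId, String recordId);
  }

  /** Mimics the thread handling described for AbstractAgent. */
  abstract static class SketchAgent implements Runnable {
    private volatile boolean stopThread; // written by the caller, read by the agent thread
    private Thread thread;
    protected Callback callback;
    protected String sessionId;

    void start(Callback callback, String sessionId) {
      this.callback = callback;
      this.sessionId = sessionId;
      this.stopThread = false;
      this.thread = new Thread(this, "agent-sketch");
      this.thread.start();
    }

    void stop() {
      stopThread = true;
      thread.interrupt(); // wake the agent if it is currently sleeping
    }

    boolean isStopThread() {
      return stopThread;
    }

    void awaitTermination() throws InterruptedException {
      thread.join();
    }
  }

  /** Reports one made-up record Id per iteration until asked to stop. */
  static final class CountingAgent extends SketchAgent {
    @Override
    public void run() {
      int counter = 0;
      while (!isStopThread()) {
        try {
          callback.add(sessionId, "record-" + counter++);
          Thread.sleep(10);
        } catch (InterruptedException e) {
          // stop() interrupted the sleep; the loop condition ends the run
        }
      }
    }
  }

  /** Plays the controller role: starts an agent, waits for minEvents events, then stops it. */
  static List<String> runUntil(String sessionId, int minEvents) {
    List<String> events = new CopyOnWriteArrayList<>();
    CountingAgent agent = new CountingAgent();
    agent.start((sid, id) -> events.add(sid + ":" + id), sessionId);
    try {
      while (events.size() < minEvents) {
        Thread.sleep(5);
      }
      agent.stop();
      agent.awaitTermination();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return events;
  }

  public static void main(String[] args) {
    List<String> events = runUntil("session-1", 3);
    System.out.println(events.get(0)); // session-1:record-0
  }
}
```

The flag is declared volatile in this sketch so that a stop() issued from the controller's thread is reliably seen by the agent thread; whether AbstractAgent does the same is not stated here.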

Package org.eclipse.smila.connectivity.framework.util provides some factory classes for Agents to create Ids, hashes and DataReference objects.

Configuration

An Agent is started with a specific, named configuration that defines what information is to be sent (e.g. content, kinds of metadata) and where to find that data (e.g. file system path, JDBC connection string). See each Agent's documentation for details on its configuration options.

Each Agent can define its own configuration, because Agents need different information to monitor different data sources. For example, a JDBC Agent needs to know which database and which table should be monitored and which columns should be returned.

Therefore the Agent developer defines a schema that contains all relevant information. This schema is based on a root schema that is shared between Agents and Crawlers. The root schema declares the generic frame that has to be used to send DataSourceConnectionConfigs to the SMILA framework. It can be found in: configuration\org.eclipse.smila.connectivity.framework.schema/schemas/RootDataSourceConnectionConfigSchema.xsd.

The root schema is structured as follows:

DataSourceID: A descriptive string that is used throughout the framework to separate and address information that applies to the same agent.

SchemaID: The full bundle name of the Agent (e.g. for the FeedAgent: org.eclipse.smila.connectivity.framework.agent.feed). The SMILA framework uses this information to locate the schema for validating the DataSourceConnectionConfig that is to be executed.

DataConnectionID: Specifies whether an Agent or a Crawler should be used. It contains one of the following tags:

  Agent
  Crawler

The name that is used in these tags is the service name of the Agent/Crawler.

CompoundHandling: Configuration options for CompoundHandling. See CompoundManagement for details.

Attributes: Placeholder for each Agent's attribute definitions. Here each Agent can define which attributes it can return. An attribute is a specific piece of information about an entry in the data source that is monitored by the Agent (e.g. in a file system an entry is a file, and attributes of a file are Size, Content, etc.).

Process: Placeholder for tags that the Agent developer can define. This tag carries all information an agent needs to start a monitoring process. This may include connection information for the data source to monitor or filters (e.g. queries, wildcards, includes/excludes).
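To make the role of the generic elements more concrete, the following sketch reads DataSourceID, SchemaID and DataConnectionID from a small sample document using the standard Java DOM parser. It is an illustration only: the root element name DataSourceConnectionConfig, the sample values "feeds" and "FeedAgent", and the class ConfigSketch are assumptions, and the real framework validates configurations against the XSD rather than reading them this way.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Illustration only: reads the generic root-schema elements from a made-up sample.
final class ConfigSketch {

  // Hypothetical sample; the root element name and the values are assumptions.
  static final String SAMPLE =
      "<DataSourceConnectionConfig>"
    + "<DataSourceID>feeds</DataSourceID>"
    + "<SchemaID>org.eclipse.smila.connectivity.framework.agent.feed</SchemaID>"
    + "<DataConnectionID><Agent>FeedAgent</Agent></DataConnectionID>"
    + "</DataSourceConnectionConfig>";

  /** Returns the trimmed text of the first element with the given tag, or null if absent. */
  static String text(Document doc, String tag) {
    NodeList nodes = doc.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
  }

  /** Returns { DataSourceID, SchemaID, "Agent" or "Crawler", service name }. */
  static String[] describe(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      String agent = text(doc, "Agent");
      String kind = agent != null ? "Agent" : "Crawler";
      String service = agent != null ? agent : text(doc, "Crawler");
      return new String[] {text(doc, "DataSourceID"), text(doc, "SchemaID"), kind, service};
    } catch (Exception e) {
      throw new IllegalArgumentException("not a readable configuration", e);
    }
  }

  public static void main(String[] args) {
    String[] d = describe(SAMPLE);
    System.out.println(d[0] + " -> " + d[2] + " " + d[3]); // feeds -> Agent FeedAgent
  }
}
```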

Further Information:

See each Agent's documentation for its Attributes and Process tags
How to implement an Agent

Agent lifecycle

The AgentController manages the life cycle of the agent (e.g. start, stop, abort) and may instantiate multiple agents concurrently, even of the same type. This is realised by using OSGi ComponentFactories. An agent does not automatically start as an OSGi service; it only registers an Agent ComponentFactory with the AgentController. Via the ComponentFactory the AgentController can instantiate agents on demand.

Here is a template for an agent OSGi component definition:

<component name="%AGENT_TYPE%" immediate="false" factory="AgentFactory">
    <implementation class="%AGENT_IMPLEMENTATION_CLASS%" />
    <service>
         <provide interface="org.eclipse.smila.connectivity.framework.agent"/>
    </service>    
</component>
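The effect of registering a factory instead of a ready-made service can be sketched in plain Java. In a real deployment the OSGi runtime supplies an org.osgi.service.component.ComponentFactory for the component declared above; in this sketch a java.util.function.Supplier stands in for it, and Controller, DemoAgent and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Plain-Java stand-in for factory-based instantiation; not the OSGi or SMILA API.
final class AgentFactorySketch {

  /** Hypothetical marker interface standing in for the Agent interface. */
  interface Agent { }

  /** Hypothetical agent implementation. */
  static final class DemoAgent implements Agent { }

  /** Stand-in for the AgentController: keeps factories, creates instances on demand. */
  static final class Controller {
    private final Map<String, Supplier<Agent>> factories = new HashMap<>();
    private final List<Agent> running = new ArrayList<>();

    /** An agent type only registers a factory; no instance exists yet. */
    void registerFactory(String agentType, Supplier<Agent> factory) {
      factories.put(agentType, factory);
    }

    /** Creates a fresh instance each time, so one type can run several times at once. */
    Agent startAgent(String agentType) {
      Supplier<Agent> factory = factories.get(agentType);
      if (factory == null) {
        throw new IllegalArgumentException("no factory registered for " + agentType);
      }
      Agent agent = factory.get();
      running.add(agent);
      return agent;
    }

    int runningCount() {
      return running.size();
    }
  }

  public static void main(String[] args) {
    Controller controller = new Controller();
    controller.registerFactory("DemoAgent", DemoAgent::new);
    Agent first = controller.startAgent("DemoAgent");
    Agent second = controller.startAgent("DemoAgent");
    System.out.println(first != second);           // true
    System.out.println(controller.runningCount()); // 2
  }
}
```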

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.
