


SMILA/Documentation/AgentController


Latest revision as of 05:42, 24 January 2012

This is deprecated for SMILA 1.0. The connectivity framework is still functional but is intended to be replaced by a scalable import based on SMILA's job management.


Overview

The AgentController is a component that manages and monitors Agents. Whenever a new agent task is triggered (via startAgent()), a new instance of the Agent is created, and the hash value of the agent object is used as an ID (called the import run ID) to identify the records created by this agent instance. This import run ID is set as the attribute _importRunId on all records and is also visible on the agent instance in the JMX console.
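The ID scheme can be illustrated with a minimal Python sketch. It assumes records are plain dictionaries. `DemoAgent`, `new_import_run` and `tag_record` are hypothetical names. Python's default `hash()` of an object stands in loosely for the Java object hash value.

```python
class DemoAgent:
    """Hypothetical stand-in for the Agent instance created by startAgent()."""


def new_import_run(agent_type=DemoAgent):
    # Each run gets a fresh agent instance; its object hash serves as the import run ID.
    agent = agent_type()
    return agent, hash(agent)


def tag_record(record, import_run_id):
    # Every record created by this agent instance carries the run ID as _importRunId.
    tagged = dict(record)
    tagged["_importRunId"] = import_run_id
    return tagged


agent, run_id = new_import_run()
record = tag_record({"url": "http://example.org/item1"}, run_id)
```

Two agents that are alive at the same time get different IDs here. This follows from CPython's identity-based default hash and is not a statement about the Java implementation.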

API

AgentController provides two interfaces. Management clients use one to start and stop agent instances. Agents use the other to execute callback methods on the AgentController itself, which executes the common processing logic.

Javadoc:
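The split into a management interface and a callback interface can be sketched with Python protocols. Only startAgent() is named on this page. All method names and signatures below are hypothetical, and `ToyAgentController` is an illustrative in-memory class, not the real implementation.

```python
from typing import List, Protocol, Set


class AgentControllerManagement(Protocol):
    """Hypothetical management side, used by clients (Java API or JMX)."""

    def start_agent(self, data_source_id: str, job_name: str) -> int: ...

    def stop_agent(self, data_source_id: str) -> None: ...


class AgentControllerCallback(Protocol):
    """Hypothetical callback side, called by a running Agent."""

    def add(self, record: dict) -> None: ...

    def delete(self, record_id: str) -> None: ...


class ToyAgentController:
    """Illustrative in-memory controller that satisfies both protocols."""

    def __init__(self) -> None:
        self.running: Set[str] = set()
        self.added: List[dict] = []
        self.deleted: List[str] = []
        self._next_run_id = 1

    def start_agent(self, data_source_id: str, job_name: str) -> int:
        self.running.add(data_source_id)
        run_id = self._next_run_id  # toy counter; the real controller uses the agent hash
        self._next_run_id += 1
        return run_id

    def stop_agent(self, data_source_id: str) -> None:
        self.running.discard(data_source_id)

    def add(self, record: dict) -> None:
        self.added.append(record)

    def delete(self, record_id: str) -> None:
        self.deleted.append(record_id)


# A management client and an agent each see only "their" interface.
controller = ToyAgentController()
management: AgentControllerManagement = controller
callback: AgentControllerCallback = controller
```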

Implementations

It is possible to provide different implementations for the AgentController interface. At the moment there is one implementation available.

org.eclipse.smila.connectivity.framework.impl

This bundle contains the default implementation of the AgentController interface.

The AgentController implements the general processing logic common for all types of Agents. Its interface is a pure management interface that can be accessed by its Java interface or its wrapping JMX interface. It has references to the following OSGi services:

  • ConnectivityManager
  • Agent ComponentFactory
  • ConfigurationManagement (t.b.d.)
  • CompoundManagement (t.b.d.)

Agent factories register themselves with the AgentController. Each time an agent is started for a datasource of a specific agent type, a new instance of that Agent type is created via the Agent ComponentFactory. This allows several datasources of the same type (e.g. several RSS feeds) to be watched in parallel. Note that it is not possible to start multiple agents on the same datasource concurrently!
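The factory rule can be sketched as follows: one new instance per start, several datasources of the same type in parallel, and no second agent on a busy datasource. `AgentRegistry` and `AgentAlreadyRunning` are hypothetical names. A plain callable stands in for the OSGi Agent ComponentFactory.

```python
from typing import Callable, Dict


class AgentAlreadyRunning(Exception):
    """Raised when an agent is started for a datasource that already has one."""


class AgentRegistry:
    """Hypothetical sketch: one factory per agent type, one running agent per datasource."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}
        self._running: Dict[str, object] = {}

    def register_factory(self, agent_type: str, factory: Callable[[], object]) -> None:
        # Agent factories register themselves with the controller.
        self._factories[agent_type] = factory

    def start(self, data_source_id: str, agent_type: str) -> object:
        if data_source_id in self._running:
            raise AgentAlreadyRunning(data_source_id)
        agent = self._factories[agent_type]()  # a fresh instance for every run
        self._running[data_source_id] = agent
        return agent

    def stop(self, data_source_id: str) -> None:
        self._running.pop(data_source_id, None)
```

Keying the running agents by datasource ID, rather than by agent type, is what allows two RSS feeds to run side by side while rejecting a second start on the same feed.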


The chart AgentControllerProcessingLogic.png shows the current AgentController processing logic for one agent run:

  • The Agent is started, initializes DeltaIndexing for the datasource by calling DeltaIndexingManager:init(...), and waits for events in a separate thread. One of the following events can occur:
    • ADD: a new or updated object was detected on the datasource. A record object is created, and DeltaIndexingManager:checkForUpdate(...) is called to check whether the record has to be updated:
      • YES: the record is added to the queue by calling ConnectivityManager:add(...) and is updated in the DeltaIndexingManager by calling DeltaIndexingManager:visit(...)
      • NO: no action is taken
    • DELETE: an object was deleted from the datasource. An Id object is created for the deleted object, and this Id is deleted from both the ConnectivityManager and the DeltaIndexingManager by calling ConnectivityManager:delete(...) and DeltaIndexingManager:delete(...).
    • STOP: the agent is stopped, either via an external command or because a fatal error occurred:
      • it finishes DeltaIndexing by calling DeltaIndexingManager:finish(...) and ends the thread
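The event handling described above can be sketched as a Python loop. This is a simplification, not the actual implementation. Events are assumed to be `(kind, record_id, content_hash)` tuples. The two manager classes are toy in-memory stand-ins whose snake_case methods mirror the DeltaIndexingManager and ConnectivityManager calls named in the list. The loop runs synchronously rather than in a separate thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class DeltaIndexingManager:
    """Toy stand-in: remembers the last seen content hash per record ID."""
    hashes: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def init(self, data_source_id: str) -> None:
        self.log.append("init " + data_source_id)

    def check_for_update(self, record_id: str, content_hash: str) -> bool:
        return self.hashes.get(record_id) != content_hash

    def visit(self, record_id: str, content_hash: str) -> None:
        self.hashes[record_id] = content_hash

    def delete(self, record_id: str) -> None:
        self.hashes.pop(record_id, None)

    def finish(self, data_source_id: str) -> None:
        self.log.append("finish " + data_source_id)


@dataclass
class ConnectivityManager:
    """Toy stand-in for the queue that records are pushed to."""
    queue: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, record_id: str) -> None:
        self.queue.append(("add", record_id))

    def delete(self, record_id: str) -> None:
        self.queue.append(("delete", record_id))


def run_agent(data_source_id: str, events: Iterable[Tuple[str, str, str]],
              dim: DeltaIndexingManager, cm: ConnectivityManager) -> None:
    dim.init(data_source_id)
    try:
        for kind, record_id, content_hash in events:
            if kind == "ADD":
                if dim.check_for_update(record_id, content_hash):  # YES branch
                    cm.add(record_id)
                    dim.visit(record_id, content_hash)
                # NO branch: unchanged record, nothing to do
            elif kind == "DELETE":
                cm.delete(record_id)
                dim.delete(record_id)
            elif kind == "STOP":
                break
    finally:
        # finish() runs on STOP and also if a fatal error escapes the loop
        dim.finish(data_source_id)
```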

The processing logic will be enhanced when CompoundManagement is integrated.

Note

The exact logic depends on the settings of DeltaIndexing in the data source configuration. Depending on the configured value, delta indexing logic is executed fully, partially or not at all.

Configuration

There are no configuration options available for this bundle.

JMX interface

Javadoc: org.eclipse.smila.connectivity.framework.AgentControllerAgent

The screenshot AgentControllerJMX.png shows the AgentController in the JMX console.

HTTP ReST JSON interface

Since version 0.9 the AgentController can also be controlled via the SMILA ReST API. It provides the following endpoints:

Endpoint                               Method             Description
/smila/agents                          GET                List the datasources available for agents and the current agent state.
/smila/agents/<datasource-id>          GET                Get statistics of the current or last agent run, if one exists.
/smila/agents/<datasource-id>          POST + JSON body   Start an agent.
/smila/agents/<datasource-id>/finish   POST               Stop an agent.
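The sketch below builds the endpoint URLs for a client. The host and port are taken from the examples on this page and are an assumption for any real installation. The trailing slashes follow the request examples below. URL-encoding the datasource ID is a defensive choice, not documented behavior.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8080"  # assumed host/port, as in the examples on this page


def agents_url(base: str = BASE_URL) -> str:
    """GET: list datasources and their agent states."""
    return base + "/smila/agents/"


def agent_url(data_source_id: str, base: str = BASE_URL) -> str:
    """GET: statistics; POST with JSON body: start the agent."""
    return agents_url(base) + quote(data_source_id, safe="") + "/"


def agent_finish_url(data_source_id: str, base: str = BASE_URL) -> str:
    """POST: stop the agent."""
    return agent_url(data_source_id, base) + "finish/"
```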

Agent Datasource Listing

The listing contains the datasources available for crawling together with their current agent state. The state "Undefined" means that no agent run has been started for the datasource yet. The other possible states are:

  • Initializing: The agent is starting
  • Running: An agent is currently working on this datasource.
  • Stopped: The agent was stopped by the user.
  • Aborted: A fatal error occurred while working on the datasource.

If the state has one of these four values, it is possible to read statistics for the datasource by using the given URL. Example:

GET /smila/agents/
-->
200 OK
{
    "agents": [
        {
            "name": "feeds",
            "state": "Running",
            "url": "http://localhost:8080/smila/agents/feeds/"
        },
        {
            "name": "jobfile",
            "state": "Undefined",
            "url": "http://localhost:8080/smila/agents/jobfile/"
        }
    ]
}
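A client could evaluate the listing as sketched below. It collects the statistics URLs for the four states listed above and the datasources that are not "Running" and can therefore be started. The function names are hypothetical. The sample string is the response shown above.

```python
import json

# States for which statistics can be read, as listed above.
STATES_WITH_STATISTICS = {"Initializing", "Running", "Stopped", "Aborted"}


def agents_with_statistics(listing_json: str) -> dict:
    """Map datasource name -> statistics URL for every datasource with an agent run."""
    listing = json.loads(listing_json)
    return {
        agent["name"]: agent["url"]
        for agent in listing["agents"]
        if agent["state"] in STATES_WITH_STATISTICS
    }


def startable(listing_json: str) -> list:
    """Datasources whose agent state is not "Running"."""
    return [a["name"] for a in json.loads(listing_json)["agents"] if a["state"] != "Running"]


sample = """{"agents": [
  {"name": "feeds", "state": "Running", "url": "http://localhost:8080/smila/agents/feeds/"},
  {"name": "jobfile", "state": "Undefined", "url": "http://localhost:8080/smila/agents/jobfile/"}
]}"""
```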

Start an Agent

If a datasource is not in agent state "Running", its agent can be started using the URL given in the datasource listing. The request must contain a JSON body naming the destination job to which records are submitted. On success, the response contains the internal import run ID.

POST /smila/agents/feeds/
{ 
  "jobName": "indexUpdateJob" 
}
-->
200 OK
{
    "importRunId": 1231907158
}

Other response codes:

  • 400 Bad Request: the datasource ID does not exist, the destination job is not given or not active, the datasource is not an agent source, or an agent is already running for the datasource.
  • 500 Internal Server Error: other errors.
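Starting an agent from Python could look like the sketch below, using only the standard library. The `Content-Type: application/json` header is an assumption; this page only says the body is JSON. The error texts are paraphrased from the response codes above, and the function names are hypothetical. `int()` is applied because the run ID appears as a number here but as a string in the statistics response.

```python
import json
import urllib.error
import urllib.request

# Paraphrased from the documented response codes.
ERRORS = {
    400: "bad request (unknown datasource, missing or inactive job, or agent already running)",
    500: "internal server error",
}


def build_start_request(agent_url: str, job_name: str) -> urllib.request.Request:
    """POST the destination job as a JSON body to the datasource URL."""
    body = json.dumps({"jobName": job_name}).encode("utf-8")
    return urllib.request.Request(
        agent_url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def start_agent(agent_url: str, job_name: str, timeout: float = 10.0) -> int:
    """Start the agent and return the import run ID, or raise RuntimeError."""
    request = build_start_request(agent_url, job_name)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return int(json.load(response)["importRunId"])
    except urllib.error.HTTPError as err:
        reason = ERRORS.get(err.code, "unexpected status")
        raise RuntimeError(f"start failed: {err.code} {reason}") from err
```

Example call (requires a running SMILA instance): `start_agent("http://localhost:8080/smila/agents/feeds/", "indexUpdateJob")`.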

Get Agent Statistics

If an agent has run or is currently running on a datasource, you can read its performance counters using the datasource URL:

GET /smila/agents/feeds/
-->
200 OK
{
    "jobName": "indexUpdateJob",
    "attachmentBytesTransfered": 0,
    "attachmentTransferRate": 0,
    "averageAttachmentTransferRate": 0,
    "averageDeltaIndicesProcessingTime": 0,
    "averageRecordsProcessingTime": 0,
    "deltaIndices": 0,
    "errorBuffer": "[]",
    "exceptions": 0,
    "exceptionsCritical": 0,
    "importRunId": "1231907158",
    "overallAverageDeltaIndicesProcessingTime": 1990.95,
    "overallAverageRecordsProcessingTime": 1990.95,
    "records": 460,
    "startDate": "2011-09-06",
    "dataSourceId": "feeds",
    "state": "Running"
}

Other responses are:

  • 400 Bad Request: Invalid datasource ID
  • 404 Not Found: No statistics available for given datasource
  • 500 Internal Server Error: Other error.
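The endpoint returns statistics of the current or last run. A client may therefore want to check that they belong to the run it started. The sketch compares the importRunId with the ID returned by the start request, normalizing the string value in this response to an integer. `stats_for_run` is a hypothetical helper, and the sample is a shortened copy of the response above.

```python
import json
from typing import Optional

# Shortened copy of the statistics response shown above.
SAMPLE_STATS = """{
    "jobName": "indexUpdateJob",
    "exceptions": 0,
    "exceptionsCritical": 0,
    "importRunId": "1231907158",
    "records": 460,
    "startDate": "2011-09-06",
    "dataSourceId": "feeds",
    "state": "Running"
}"""


def stats_for_run(stats_json: str, import_run_id: int) -> Optional[dict]:
    """Return selected counters if the statistics belong to the given run, else None."""
    stats = json.loads(stats_json)
    # importRunId is a string here but a number in the start response.
    if int(stats["importRunId"]) != import_run_id:
        return None  # statistics of a different run
    return {
        "state": stats["state"],
        "records": stats["records"],
        "exceptions": stats["exceptions"],
        "exceptionsCritical": stats["exceptionsCritical"],
    }
```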

Stop an Agent

To stop a running agent, use the following HTTP request. The response body is empty; only the response code "200 OK" is returned.

POST /smila/agents/feeds/finish/
-->
200 OK

Other responses are:

  • 400 Bad Request: No agent is running for this datasource.
  • 500 Internal Server Error: Other errors.
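A matching stop call is sketched below. It treats a 400 response as "no agent was running" and re-raises other HTTP errors. The function names are hypothetical. The empty `data=b""` payload is a choice for this sketch, since the finish endpoint is documented without a body.

```python
import urllib.error
import urllib.request


def build_stop_request(finish_url: str) -> urllib.request.Request:
    # The finish endpoint is documented without a body; send an empty POST.
    return urllib.request.Request(finish_url, data=b"", method="POST")


def stop_agent(finish_url: str, timeout: float = 10.0) -> bool:
    """Return True if the agent was stopped, False if none was running (400)."""
    try:
        with urllib.request.urlopen(build_stop_request(finish_url), timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 400:
            return False
        raise
```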
