Difference between revisions of "SMILA/Documentation/AgentController"

 
(13 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 +
{{note|This is deprecated as of SMILA 1.0. The connectivity framework is still functional but is intended to be replaced by scalable import based on SMILA's job management.}}
 +
 
== Overview ==
 
== Overview ==
  
The AgentController is a component that manages and monitors Agents.
+
The AgentController is a component that manages and monitors Agents. Whenever a new agent task is triggered (via <tt>startAgent()</tt>), a new instance of the Agent in use is created, and the hash value of the agent object is used as an id (called the ''import run id'') to identify records created by this agent instance. This import run id is set as the attribute ''_importRunId'' on all records and is also visible on the agent instance in the JMX console.
  
 
== API ==
 
== API ==
Line 7: Line 9:
 
AgentController provides two interfaces: one is used by management clients to start and stop agent instances; the other is used by Agents to execute callback methods on the AgentController itself, which runs the common processing logic.
 
AgentController provides two interfaces: one is used by management clients to start and stop agent instances; the other is used by Agents to execute callback methods on the AgentController itself, which runs the common processing logic.
  
<source lang="java">
+
Javadoc:
/**
+
* [http://build.eclipse.org/rt/smila/javadoc/current/org/eclipse/smila/connectivity/framework/AgentController.html org.eclipse.smila.connectivity.framework.AgentController]
* Management interface for the AgentController.
+
* [http://build.eclipse.org/rt/smila/javadoc/current/org/eclipse/smila/connectivity/framework/util/AgentControllerCallback.html org.eclipse.smila.connectivity.framework.util.AgentControllerCallback]
*/
+
public interface AgentController {
+
 
+
  /**
+
  * Starts an Agent using the given dataSourceId. This method creates a new Thread. If it is called for a dataSourceId
+
  * that is currently used by another agent a ConnectivityException is thrown. Returns the hashCode of the agent
+
  * instance used for performance counter.
+
  *
+
  * @param dataSourceId
+
  *          the ID of the data source
+
  * @return - the hashcode of the agent instance as int value
+
  * @throws ConnectivityException
+
  *          if any error occurs
+
  */
+
  int startAgent(String dataSourceId) throws ConnectivityException;
+
 
+
  /**
+
  * Stops an active agent using the given dataSourceId.
+
  *
+
  * @param dataSourceId
+
  *          the ID of the data source
+
  * @throws ConnectivityException
+
  *          if any error occurs
+
  */
+
  void stopAgent(String dataSourceId) throws ConnectivityException;
+
 
+
  /**
+
  * Checks if there are any active agents.
+
  *
+
  * @return true if there are active agents, false otherwise
+
  * @throws ConnectivityException
+
  *          if any error occurs
+
  */
+
  boolean hasActiveAgents() throws ConnectivityException;
+
 
+
  /**
+
  * Returns a Collection of Strings containing the dataSourceIds of the currently active agents.
+
  *
+
  * @return a Collection of Strings containing the dataSourceIds
+
  * @throws ConnectivityException
+
  *          if any error occurs
+
  */
+
  Collection<String> getActiveAgents() throws ConnectivityException;
+
 
+
  /**
+
  * returns the AgentController known Agents.
+
  *
+
  * @return Collection with Strings
+
  */
+
  Collection<String> getAvailableAgents();
+
}
+
</source>
+
 
+
<source lang="java">
+
/**
+
* Interface for callbacks on the AgentController. This interface is used by Agents to send add and delete requests and
+
* to unregister an agent if a critical error occurred.
+
*/
+
public interface AgentControllerCallback {
+
 
+
  /**
+
  * Add the given records.
+
  *
+
  * @param records
+
  *          the records to add
+
  */
+
  void add(final Record[] records);
+
 
+
  /**
+
  * Delete the given ids.
+
  *
+
  * @param ids
+
  *          the ids of the records to delete
+
  */
+
  void delete(final Id[] ids);
+
 
+
  /**
+
  * Removes the Agent using the given DataSourceId from the list of active Agents.
+
  *
+
  * @param dataSourceId
+
  *          the ID of the data source used by the agent
+
  */
+
  void unregister(String dataSourceId);
+
}
+
 
+
</source>
+
  
 
== Implementations ==
 
== Implementations ==
Line 117: Line 33:
 
[[Image:AgentControllerProcessingLogic.png]]
 
[[Image:AgentControllerProcessingLogic.png]]
  
* the Agent is started, initializes DeltaIndexing for the data source by calling <tt>DeltaIndexingManager:init(String)</tt> and waits for events in a separate thread. One of the following events can occur:
+
* the Agent is started, initializes DeltaIndexing for the data source by calling <tt>DeltaIndexingManager:init(...)</tt> and waits for events in a separate thread. One of the following events can occur:
** ADD: a new or updated object on the datasource was detected. A record object is created. It is checked if the record was updated by calling <tt>DeltaIndexingManager:checkForUpdate(Id, String)</tt>
+
** ADD: a new or updated object on the datasource was detected. A record object is created. It is checked if the record was updated by calling <tt>DeltaIndexingManager:checkForUpdate(...)</tt>
*** YES: the record is added to the Queue by calling <tt>ConnectivityManager:add(Record[])</tt> and updated in the DeltaIndexingManager by calling <tt>DeltaIndexingManager:visit(Id, String, boolean)</tt>
+
*** YES: the record is added to the Queue by calling <tt>ConnectivityManager:add(...)</tt> and updated in the DeltaIndexingManager by calling <tt>DeltaIndexingManager:visit(...)</tt>
 
*** NO: no actions are taken
 
*** NO: no actions are taken
** DELETE: an object on the datasource was deleted. An Id object is created for the deleted object. This Id is deleted from both ConnectivityManager and DeltaIndexingManager by calling <tt>ConnectivityManager:delete(Id[])</tt> and <tt>DeltaIndexingManager:delete(Id[])</tt>.
+
** DELETE: an object on the datasource was deleted. An Id object is created for the deleted object. This Id is deleted from both ConnectivityManager and DeltaIndexingManager by calling <tt>ConnectivityManager:delete(...)</tt> and <tt>DeltaIndexingManager:delete(...)</tt>.
 
** STOP: the agent is stopped either via an external command or because a fatal error occurred
 
** STOP: the agent is stopped either via an external command or because a fatal error occurred
*** it finishes DeltaIndexing by calling <tt>DeltaIndexingManager:finish(String)</tt> and ends the thread
+
*** it finishes DeltaIndexing by calling <tt>DeltaIndexingManager:finish(...)</tt> and ends the thread
  
 
The processing logic will be enhanced when CompoundManagement is integrated.
 
The processing logic will be enhanced when CompoundManagement is integrated.
 +
 +
;Note
 +
The exact logic depends on the settings of DeltaIndexing in the data source configuration. Depending on the configured value, delta indexing logic is executed fully, partially or not at all.
  
 
=== Configuration ===
 
=== Configuration ===
Line 132: Line 51:
 
=== JMX interface ===
 
=== JMX interface ===
  
<source lang="java">
+
Javadoc: [http://build.eclipse.org/rt/smila/javadoc/current/org/eclipse/smila/connectivity/framework/AgentControllerAgent.html org.eclipse.smila.connectivity.framework.AgentControllerAgent]
/**
+
* The Interface AgentControllerAgent.
+
*/
+
public interface AgentControllerAgent {
+
  
  /**
+
Here is a screenshot of the AgentController in the JMX Console:
  * Start agent.
+
  *
+
  * @param dataSourceId
+
  *          the data source id
+
  *
+
  * @return the string
+
  */
+
  String startAgent(final String dataSourceId);
+
  
  /**
+
[[Image:AgentControllerJMX.png]]
  * Stop agent.
+
  *
+
  * @param dataSourceId
+
  *          the data source id
+
  *
+
  * @return the string
+
  */
+
  String stopAgent(final String dataSourceId);
+
  
  /**
+
=== HTTP ReST JSON interface ===
  * Gets the active agents status.
+
  *
+
  * @return the active agents status
+
  */
+
  String getActiveAgentTaskStatus();
+
  
  /**
+
Since version 0.9 the AgentController can also be controlled via the SMILA ReST API. It provides the following endpoints:
  * Gets the active agents.
+
  *
+
  * @return the active agents
+
  */
+
  String[] getActiveAgentTasks();
+
  
  /**
+
{|{{Greytable}}
  * returns all Agents that have connected to the AgentController.
+
! endpoint !! method !! description
  *
+
|-
  * @return List with Strings of all available Agents
+
| /smila/agents || GET || list data sources available for agents and the current agent state
  */
+
|-
  String[] getAvailableAgents();
+
| /smila/agents/<datasource-id> || GET || get statistics of current or last agent run, if one exists.
 +
|-
 +
| /smila/agents/<datasource-id> || POST + JSON-Body || start agent
 +
|-
 +
| /smila/agents/<datasource-id>/finish || POST || stop agent
 +
|-
 +
|}
  
 +
==== Agent Datasource Listing ====
 +
 +
The listing contains the available data sources that can be used for crawling and the current agent state. State "Undefined" means that no agent run for the datasource has yet been started. Other states can be
 +
* Initializing: The agent is starting
 +
* Running: An agent is currently working on this datasource.
 +
* Stopped: The agent was stopped by the user.
 +
* Aborted: A fatal error occurred while working on the datasource.
 +
If the state has one of these four values, it is possible to read statistics for the datasource by using the given URL. Example:
 +
 +
<source lang="javascript">
 +
GET /smila/agents/
 +
-->
 +
200 OK
 +
{
 +
    "agents": [
 +
        {
 +
            "name": "feeds",
 +
            "state": "Running",
 +
            "url": "http://localhost:8080/smila/agents/feeds/"
 +
        },
 +
        {
 +
            "name": "jobfile",
 +
            "state": "Undefined",
 +
            "url": "http://localhost:8080/smila/agents/jobfile/"
 +
        }
 +
    ]
 
}
 
}
 
</source>
 
</source>
  
 +
==== Start an Agent ====
  
Here is a screenshot of the AgentController in the JMX Console:
+
If a datasource is not in agent state "Running" it can be started using the URL given in the datasource listing. The request must contain a JSON body describing the destination job to submit records to. In case of success the response contains the internal import run ID.
  
[[Image:AgentControllerJMX.png]]
+
<source lang="javascript">
 +
POST /smila/agents/feeds/
 +
{
 +
  "jobName": "indexUpdateJob"
 +
}
 +
-->
 +
200 OK
 +
{
 +
    "importRunId": 1231907158
 +
}
 +
</source>
 +
 
 +
Other response codes:
 +
* 400 Bad Request: the datasource ID does not exist, the destination job is not given or not active, the datasource is not an agent source, or an agent is already running for the datasource.
 +
* 500 Internal Server Error: Other errors.
 +
 
 +
==== Get Agent Statistics  ====
 +
 
 +
If an agent has run or is currently running on a datasource, you can read the performance counters using the datasource URL:
 +
 
 +
<source lang="javascript">
 +
GET /smila/agents/feeds/
 +
-->
 +
200 OK
 +
{
 +
    "jobName": "indexUpdateJob",
 +
    "attachmentBytesTransfered": 0,
 +
    "attachmentTransferRate": 0,
 +
    "averageAttachmentTransferRate": 0,
 +
    "averageDeltaIndicesProcessingTime": 0,
 +
    "averageRecordsProcessingTime": 0,
 +
    "deltaIndices": 0,
 +
    "errorBuffer": "[]",
 +
    "exceptions": 0,
 +
    "exceptionsCritical": 0,
 +
    "importRunId": "1231907158",
 +
    "overallAverageDeltaIndicesProcessingTime": 1990.95,
 +
    "overallAverageRecordsProcessingTime": 1990.95,
 +
    "records": 460,
 +
    "startDate": "2011-09-06",
 +
    "dataSourceId": "feeds",
 +
    "state": "Running"
 +
}
 +
</source>
 +
 
 +
Other responses are
 +
 
 +
*400 Bad Request: Invalid datasource ID
 +
*404 Not Found: No statistics available for given datasource
 +
*500 Internal Server Error: Other error.
 +
 
 +
==== Stop an Agent ====
 +
 
 +
To stop a running agent, use the following HTTP request. The response body is empty; only the response code "OK" is returned.
 +
 
 +
<source lang="javascript">
 +
POST /smila/agents/feeds/finish/
 +
-->
 +
200 OK
 +
</source>
 +
 
 +
Other responses are:
 +
* 400 Bad Request: No agent is running for this datasource.
 +
* 500 Internal Server Error: Other errors.

Latest revision as of 05:42, 24 January 2012

Note: This is deprecated as of SMILA 1.0. The connectivity framework is still functional but is intended to be replaced by scalable import based on SMILA's job management.


Overview

The AgentController is a component that manages and monitors Agents. Whenever a new agent task is triggered (via startAgent()), a new instance of the Agent in use is created, and the hash value of the agent object is used as an id (called the import run id) to identify records created by this agent instance. This import run id is set as the attribute _importRunId on all records and is also visible on the agent instance in the JMX console.
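
The way an import run id is derived and attached can be illustrated with a minimal sketch. It is not the SMILA implementation: records are modelled here as plain attribute maps, the class and method names are hypothetical, and storing the id as a string is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: records are plain attribute maps, not SMILA Record objects.
class ImportRunIdSketch {
  // Attribute name taken from the text above.
  static final String IMPORT_RUN_ID_ATTRIBUTE = "_importRunId";

  /** Derives the import run id from the hash value of the agent instance. */
  static String importRunId(Object agentInstance) {
    return String.valueOf(agentInstance.hashCode());
  }

  /** Returns a copy of the record with the import run id attribute set. */
  static Map<String, Object> stamp(Map<String, Object> record, Object agentInstance) {
    Map<String, Object> stamped = new HashMap<>(record);
    stamped.put(IMPORT_RUN_ID_ATTRIBUTE, importRunId(agentInstance));
    return stamped;
  }
}
```

Because the id comes from the object hash of one agent instance, all records produced during the same run carry the same value, and a later run with a new instance will normally carry a different one.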

API

AgentController provides two interfaces: one is used by management clients to start and stop agent instances; the other is used by Agents to execute callback methods on the AgentController itself, which runs the common processing logic.

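Javadoc:

  • org.eclipse.smila.connectivity.framework.AgentController
  • org.eclipse.smila.connectivity.framework.util.AgentControllerCallback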

Implementations

It is possible to provide different implementations for the AgentController interface. At the moment there is one implementation available.

org.eclipse.smila.connectivity.framework.impl

This bundle contains the default implementation of the AgentController interface.

The AgentController implements the general processing logic common for all types of Agents. Its interface is a pure management interface that can be accessed by its Java interface or its wrapping JMX interface. It has references to the following OSGi services:

  • ConnectivityManager
  • Agent ComponentFactory
  • ConfigurationManagement (t.b.d.)
  • CompoundManagement (t.b.d.)

Agent Factories register themselves at the AgentController. Each time an agent is started with a datasource for a specific type of agent, a new instance of that Agent type is created via the Agent ComponentFactory. This allows datasources of the same type (e.g. several RSS feeds) to be watched in parallel. Note that it is not possible to start multiple agents on the same data source concurrently!
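
The rule that several agents may run in parallel, but only one per data source, could be enforced roughly as in the following sketch. The class is hypothetical; a plain Object stands in for the Agent created by the ComponentFactory, and IllegalStateException stands in for the ConnectivityException that the real interface declares.

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a registry that allows at most one active agent per data source.
class ActiveAgentRegistrySketch {
  private final ConcurrentMap<String, Object> activeAgents = new ConcurrentHashMap<>();

  /** Registers a new agent instance and returns its hash code (the import run id). */
  int startAgent(String dataSourceId) {
    Object agentInstance = new Object(); // stand-in for an Agent instance
    // putIfAbsent is atomic, so two concurrent starts cannot both succeed.
    if (activeAgents.putIfAbsent(dataSourceId, agentInstance) != null) {
      throw new IllegalStateException("An agent is already active for " + dataSourceId);
    }
    return agentInstance.hashCode();
  }

  /** Removes the agent for the data source; fails if none is active. */
  void stopAgent(String dataSourceId) {
    if (activeAgents.remove(dataSourceId) == null) {
      throw new IllegalStateException("No agent is active for " + dataSourceId);
    }
  }

  boolean hasActiveAgents() {
    return !activeAgents.isEmpty();
  }

  Collection<String> getActiveAgents() {
    return activeAgents.keySet();
  }
}
```

Keying the map by data source id rather than by agent type is what permits several agents of the same type to run side by side.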


This chart shows the current AgentController processing logic for one agent run (figure: AgentControllerProcessingLogic.png):

  • the Agent is started, initializes DeltaIndexing for the data source by calling DeltaIndexingManager:init(...) and waits for events in a separate thread. One of the following events can occur:
    • ADD: a new or updated object on the datasource was detected. A record object is created. It is checked if the record was updated by calling DeltaIndexingManager:checkForUpdate(...)
      • YES: the record is added to the Queue by calling ConnectivityManager:add(...) and updated in the DeltaIndexingManager by calling DeltaIndexingManager:visit(...)
      • NO: no actions are taken
    • DELETE: an object on the datasource was deleted. An Id object is created for the deleted object. This Id is deleted from both ConnectivityManager and DeltaIndexingManager by calling ConnectivityManager:delete(...) and DeltaIndexingManager:delete(...).
    • STOP: the agent is stopped either via an external command or because a fatal error occurred
      • it finishes DeltaIndexing by calling DeltaIndexingManager:finish(...) and ends the thread

The processing logic will be enhanced when CompoundManagement is integrated.

Note

The exact logic depends on the settings of DeltaIndexing in the data source configuration. Depending on the configured value, delta indexing logic is executed fully, partially or not at all.
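
The event handling described above can be sketched as follows, assuming delta indexing is fully enabled. All types are hypothetical stand-ins: DeltaIndex only mimics the checkForUpdate, visit and delete calls of the DeltaIndexingManager, a string list replaces the ConnectivityManager queue, and the second argument of checkForUpdate is assumed to be a content hash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of one agent run with delta indexing fully enabled.
class AgentRunSketch {
  enum EventType { ADD, DELETE, STOP }

  // Stand-in for the DeltaIndexingManager: remembers the last hash seen per id.
  static class DeltaIndex {
    private final Map<String, String> hashes = new HashMap<>();

    boolean checkForUpdate(String id, String hash) {
      return !hash.equals(hashes.get(id)); // new id or changed hash
    }

    void visit(String id, String hash) {
      hashes.put(id, hash);
    }

    void delete(String id) {
      hashes.remove(id);
    }
  }

  final DeltaIndex deltaIndex = new DeltaIndex();
  final List<String> queue = new ArrayList<>(); // stand-in for the ConnectivityManager queue
  boolean running = true;

  void handle(EventType type, String id, String hash) {
    if (!running) {
      return; // events after STOP are ignored
    }
    switch (type) {
      case ADD:
        if (deltaIndex.checkForUpdate(id, hash)) { // YES branch
          queue.add("add:" + id);
          deltaIndex.visit(id, hash);
        } // NO branch: no action
        break;
      case DELETE:
        queue.add("delete:" + id);
        deltaIndex.delete(id);
        break;
      case STOP:
        running = false; // the real agent would also finish delta indexing here
        break;
    }
  }
}
```

With delta indexing disabled or only partially enabled, the checkForUpdate and visit steps would be skipped or reduced, which this sketch does not model.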

Configuration

There are no configuration options available for this bundle.

JMX interface

Javadoc: org.eclipse.smila.connectivity.framework.AgentControllerAgent

Here is a screenshot of the AgentController in the JMX Console:

AgentControllerJMX.png

HTTP ReST JSON interface

Since version 0.9 the AgentController can also be controlled via the SMILA ReST API. It provides the following endpoints:

endpoint                              method            description
/smila/agents                         GET               list data sources available for agents and the current agent state
/smila/agents/<datasource-id>         GET               get statistics of the current or last agent run, if one exists
/smila/agents/<datasource-id>         POST + JSON body  start agent
/smila/agents/<datasource-id>/finish  POST              stop agent
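
As a rough illustration, the four endpoints could be addressed from Java 11 or later with java.net.http as sketched below. The class name is hypothetical, the Content-Type header is an assumption that the source does not confirm, and data source ids and job names are assumed to need no URL or JSON escaping. The requests are only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds requests for the agent endpoints listed above.
class AgentRestRequests {
  private final String baseUrl; // e.g. "http://localhost:8080"

  AgentRestRequests(String baseUrl) {
    this.baseUrl = baseUrl;
  }

  /** GET /smila/agents/ : list data sources and agent states. */
  HttpRequest listAgents() {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/smila/agents/")).GET().build();
  }

  /** GET /smila/agents/<datasource-id>/ : statistics of the current or last run. */
  HttpRequest statistics(String dataSourceId) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/smila/agents/" + dataSourceId + "/"))
        .GET()
        .build();
  }

  /** POST /smila/agents/<datasource-id>/ with a JSON body naming the destination job. */
  HttpRequest start(String dataSourceId, String jobName) {
    String body = "{\"jobName\": \"" + jobName + "\"}";
    return HttpRequest.newBuilder(URI.create(baseUrl + "/smila/agents/" + dataSourceId + "/"))
        .header("Content-Type", "application/json") // assumption, not stated in the source
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }

  /** POST /smila/agents/<datasource-id>/finish/ : stop the agent. */
  HttpRequest stop(String dataSourceId) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/smila/agents/" + dataSourceId + "/finish/"))
        .POST(HttpRequest.BodyPublishers.noBody())
        .build();
  }
}
```

A request built this way would be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).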

Agent Datasource Listing

The listing contains the data sources that are available for agents, together with the current agent state of each. State "Undefined" means that no agent run has been started for the datasource yet. The other states are:

  • Initializing: The agent is starting
  • Running: An agent is currently working on this datasource.
  • Stopped: The agent was stopped by the user.
  • Aborted: A fatal error occurred while working on the datasource.

If the state has one of these four values, it is possible to read statistics for the datasource by using the given URL. Example:

GET /smila/agents/
-->
200 OK
{
    "agents": [
        {
            "name": "feeds",
            "state": "Running",
            "url": "http://localhost:8080/smila/agents/feeds/"
        },
        {
            "name": "jobfile",
            "state": "Undefined",
            "url": "http://localhost:8080/smila/agents/jobfile/"
        }
    ]
}
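
A client that reads this listing might model the states as an enum, as in the following sketch. The type and method names are hypothetical; only the five state labels and the rule about when statistics can be read come from the text above.

```java
// Hypothetical client-side model of the agent states shown in the listing.
enum AgentState {
  UNDEFINED("Undefined"),
  INITIALIZING("Initializing"),
  RUNNING("Running"),
  STOPPED("Stopped"),
  ABORTED("Aborted");

  private final String label;

  AgentState(String label) {
    this.label = label;
  }

  /** Maps the "state" string from the JSON listing to an enum constant. */
  static AgentState fromLabel(String label) {
    for (AgentState state : values()) {
      if (state.label.equals(label)) {
        return state;
      }
    }
    throw new IllegalArgumentException("Unknown agent state: " + label);
  }

  /** Statistics can be read for every state except "Undefined". */
  boolean statisticsAvailable() {
    return this != UNDEFINED;
  }
}
```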

Start an Agent

If a datasource is not in agent state "Running", it can be started using the URL given in the datasource listing. The request must contain a JSON body that names the destination job to which records are submitted. On success, the response contains the internal import run ID.

POST /smila/agents/feeds/
{ 
  "jobName": "indexUpdateJob" 
}
-->
200 OK
{
    "importRunId": 1231907158
}

Other response codes:

  • 400 Bad Request: the datasource ID does not exist, the destination job is not given or not active, the datasource is not an agent source, or an agent is already running for the datasource.
  • 500 Internal Server Error: Other errors.
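
A client may want to pull the import run ID out of a successful response. The sketch below uses a regular expression instead of a JSON library to stay self-contained; the class name is hypothetical. It accepts the value both as a bare number, as in the start response above, and as a quoted string, as in the statistics example further down, and it allows a leading minus sign because Java hash codes can be negative.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "importRunId" from a response body without a JSON library.
class ImportRunIdParser {
  private static final Pattern IMPORT_RUN_ID =
      Pattern.compile("\"importRunId\"\\s*:\\s*\"?(-?\\d+)\"?");

  /** Returns the import run id if the body contains one, otherwise an empty value. */
  static OptionalLong parse(String responseBody) {
    Matcher matcher = IMPORT_RUN_ID.matcher(responseBody);
    return matcher.find()
        ? OptionalLong.of(Long.parseLong(matcher.group(1)))
        : OptionalLong.empty();
  }
}
```

For anything beyond a quick check, a real JSON parser would be the more robust choice.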

Get Agent Statistics

If an agent has run or is currently running on a datasource, you can read the performance counters using the datasource URL:

GET /smila/agents/feeds/
-->
200 OK
{
    "jobName": "indexUpdateJob",
    "attachmentBytesTransfered": 0,
    "attachmentTransferRate": 0,
    "averageAttachmentTransferRate": 0,
    "averageDeltaIndicesProcessingTime": 0,
    "averageRecordsProcessingTime": 0,
    "deltaIndices": 0,
    "errorBuffer": "[]",
    "exceptions": 0,
    "exceptionsCritical": 0,
    "importRunId": "1231907158",
    "overallAverageDeltaIndicesProcessingTime": 1990.95,
    "overallAverageRecordsProcessingTime": 1990.95,
    "records": 460,
    "startDate": "2011-09-06",
    "dataSourceId": "feeds",
    "state": "Running"
}

Other responses are:

  • 400 Bad Request: Invalid datasource ID.
  • 404 Not Found: No statistics available for the given datasource.
  • 500 Internal Server Error: Other errors.

Stop an Agent

To stop a running agent, use the following HTTP request. The response body is empty; only the response code "OK" is returned.

POST /smila/agents/feeds/finish/
-->
200 OK

Other responses are:

  • 400 Bad Request: No agent is running for this datasource.
  • 500 Internal Server Error: Other errors.
