SMILA/Documentation/Mock Agent

Latest revision as of 04:43, 24 January 2012

This is deprecated as of SMILA 1.0. The connectivity framework is still functional, but it is intended to be replaced by scalable import based on SMILA's job management.


Overview

The Mock agent is a sample implementation of an agent used for testing. It creates new records at a configurable interval and sends add requests for them to the AgentController.

A record can contain the following attributes:

  • Identifier
  • MimeType
  • LastModifiedDate
  • Content
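The create-and-send loop described above can be sketched in Python. This is an illustration under stated assumptions, not the SMILA implementation: `send_add` is a hypothetical callback standing in for the add request to the AgentController, records are plain dicts rather than SMILA records, and the generated values are made up.

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical record representation (attribute name -> value); not SMILA's Record API.
Record = Dict[str, object]


def make_mock_record(counter: int) -> Record:
    """Create one record with the four attributes the Mock agent can provide.

    The values are illustrative assumptions, not what SMILA generates.
    """
    now_ms = int(time.time() * 1000)
    return {
        "Identifier": f"{now_ms}-{counter}",  # unique per record in this sketch
        "MimeType": "text/html",
        "LastModifiedDate": now_ms,
        "Content": f"mock content {counter}".encode("utf-8"),
    }


def run_mock_agent(send_add: Callable[[Record], None],
                   sleep_time_s: float,
                   max_records: Optional[int] = None) -> int:
    """Create records in a loop and pass each one to send_add.

    send_add stands in for the add request to the AgentController.
    The loop waits sleep_time_s seconds between two records and stops
    after max_records records (or runs until interrupted if None).
    Returns the number of records created.
    """
    created = 0
    while max_records is None or created < max_records:
        send_add(make_mock_record(created))
        created += 1
        if max_records is None or created < max_records:
            time.sleep(sleep_time_s)  # interval between record creations
    return created


if __name__ == "__main__":
    received = []
    count = run_mock_agent(received.append, sleep_time_s=0.0, max_records=3)
    print(count, sorted(received[0]))
```

Passing the sender as a callback keeps the sketch testable without a running AgentController; a real agent would run without `max_records` and use the configured sleep time.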

Agent configuration

The example configuration file is located at configuration/org.eclipse.smila.connectivity.framework/mockAgent.xml.

Defining schema: org.eclipse.smila.connectivity.framework.agent.mock/schemas/MockDataSourceConnectionConfigSchema.xsd.

Agent configuration explanation

The root element of the configuration is DataSourceConnectionConfig. It contains the following subelements:

  • DataSourceID – the identifier of the data source
  • SchemaID – specifies the schema for the data source
  • DataConnectionID – specifies which agent or crawler should be used
    • Crawler – service ID of a crawler
    • Agent – service ID of an agent
  • CompoundHandling – specifies whether packed data (such as a ZIP archive containing files) should be unpacked and the files within it processed (YES or NO).
  • Attributes – list all attributes provided by the data source
    • Attribute
      • Type (required) – the data type (String, Integer or Date).
      • Name (required) – the attribute's name.
      • HashAttribute – specifies whether a hash should be created (true or false).
      • KeyAttribute – marks the attribute as a key for this object, for example to build the record ID (true or false).
      • Attachment – specifies whether the attribute returns its data as an attachment of the record.
  • Process – contains parameters for the agent business logic.
    • SleepTime – the number of seconds to wait between the creation of two records.
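To make the element structure above concrete, here is a minimal Python sketch that reads these elements with the standard `xml.etree.ElementTree` module. The names `AttributeConfig`, `MockAgentConfig` and `parse_config` are hypothetical; SMILA reads the configuration through its own Java classes, and this sketch performs no schema validation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttributeConfig:
    name: str
    type: str
    mock_attribute: str
    key_attribute: bool = False
    hash_attribute: bool = False
    attachment: bool = False


@dataclass
class MockAgentConfig:
    data_source_id: str
    agent_id: str
    compound_handling: bool
    sleep_time_s: int
    attributes: List[AttributeConfig] = field(default_factory=list)


def _flag(elem: ET.Element, name: str) -> bool:
    """Read a true/false XML attribute; missing means false."""
    return elem.get(name, "false").strip().lower() == "true"


def parse_config(xml_text: str) -> MockAgentConfig:
    """Extract the documented elements from a DataSourceConnectionConfig document."""
    root = ET.fromstring(xml_text)
    attributes = [
        AttributeConfig(
            name=a.get("Name", ""),
            type=a.get("Type", ""),
            mock_attribute=a.findtext("MockAttributes", "").strip(),
            key_attribute=_flag(a, "KeyAttribute"),
            hash_attribute=_flag(a, "HashAttribute"),
            attachment=_flag(a, "Attachment"),
        )
        for a in root.iterfind("Attributes/Attribute")
    ]
    return MockAgentConfig(
        data_source_id=root.findtext("DataSourceID", "").strip(),
        agent_id=root.findtext("DataConnectionID/Agent", "").strip(),
        compound_handling=root.findtext("CompoundHandling", "NO").strip().upper() == "YES",
        sleep_time_s=int(root.findtext("Process/SleepTime", "0").strip()),
        attributes=attributes,
    )


# Shortened version of the example configuration.
SAMPLE = """<DataSourceConnectionConfig
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <DataSourceID>mockAgent</DataSourceID>
  <DataConnectionID><Agent>MockAgent</Agent></DataConnectionID>
  <CompoundHandling>Yes</CompoundHandling>
  <Attributes>
    <Attribute Type="String" Name="Path" KeyAttribute="true">
      <MockAttributes>Identifier</MockAttributes>
    </Attribute>
    <Attribute Type="String" Name="Content" Attachment="true">
      <MockAttributes>Content</MockAttributes>
    </Attribute>
  </Attributes>
  <Process><SleepTime>60</SleepTime></Process>
</DataSourceConnectionConfig>"""

if __name__ == "__main__":
    print(parse_config(SAMPLE))
```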

Mock agent configuration example

<DataSourceConnectionConfig
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="../org.eclipse.smila.connectivity.framework.agent.mock/schemas/MockDataSourceConnectionConfigSchema.xsd"
>
  <DataSourceID>mockAgent</DataSourceID>
  <SchemaID>org.eclipse.smila.connectivity.framework.agent.mock</SchemaID>
  <DataConnectionID>
    <Agent>MockAgent</Agent>
  </DataConnectionID>
  <CompoundHandling>Yes</CompoundHandling>
  <Attributes>
    <Attribute Type="Date" Name="LastModifiedDate" HashAttribute="true">
      <MockAttributes>LastModifiedDate</MockAttributes>
    </Attribute>
    <Attribute Type="String" Name="Path" KeyAttribute="true">
      <MockAttributes>Identifier</MockAttributes>
    </Attribute>
    <Attribute Type="String" Name="Content" Attachment="true" MimeTypeAttribute="MimeType">
      <MockAttributes>Content</MockAttributes>
    </Attribute>
    <Attribute Type="String" Name="MimeType">
      <MockAttributes>MimeType</MockAttributes>
    </Attribute>   
  </Attributes>
  <Process>
    <SleepTime>60</SleepTime>
  </Process>
</DataSourceConnectionConfig>
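The attribute mapping in this example can be read as a table from record attribute names to mock values. The Python sketch below applies that table using plain dicts instead of SMILA records (`map_mock_values` and the constants are hypothetical names): Path receives the generated Identifier, and Content, declared with Attachment="true", is routed to the attachments instead of the metadata, matching the split visible in the output example.

```python
from typing import Dict, Mapping, Set, Tuple

# Record attribute Name -> MockAttributes value, copied from the example above.
EXAMPLE_MAPPING: Dict[str, str] = {
    "LastModifiedDate": "LastModifiedDate",
    "Path": "Identifier",
    "Content": "Content",
    "MimeType": "MimeType",
}
# Attributes declared with Attachment="true" in the example.
EXAMPLE_ATTACHMENTS: Set[str] = {"Content"}


def map_mock_values(
    mock_values: Mapping[str, object],
    mapping: Mapping[str, str] = EXAMPLE_MAPPING,
    attachments: Set[str] = EXAMPLE_ATTACHMENTS,
) -> Tuple[Dict[str, object], Dict[str, object]]:
    """Split generated mock values into record metadata and attachments."""
    metadata: Dict[str, object] = {}
    binary: Dict[str, object] = {}
    for record_name, mock_name in mapping.items():
        if mock_name not in mock_values:
            continue  # the mock source did not provide this value
        target = binary if record_name in attachments else metadata
        target[record_name] = mock_values[mock_name]
    return metadata, binary


if __name__ == "__main__":
    meta, att = map_mock_values({
        "Identifier": "1241449855624",
        "MimeType": "text/html",
        "LastModifiedDate": "2009-05-04 16:44:46.541",
        "Content": b"<html>...</html>",
    })
    print(meta)
    print(sorted(att))
```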

Output example

A record created by the Mock agent will have the following structure:

<Record xmlns="http://www.eclipse.org/smila/record" version="2.0">
  <Val key="_recordid">mockAgent:&lt;Path=1241449855624&gt;</Val>
  <Val key="_source">mockAgent</Val>
  <Val key="LastModifiedDate" type="datetime">2009-05-04 16:44:46.541</Val>
  <Val key="Path">1241449855624</Val>
  <Val key="MimeType">text/html</Val>
  <Attachment>Content</Attachment>
</Record>
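The _recordid value combines the data source ID with the key attribute (here Path, marked KeyAttribute="true"); the &lt; and &gt; are simply the XML-escaped angle brackets. The Python sketch below reproduces that layout for a single key attribute only. `build_record_id` and `record_id_for_xml` are hypothetical helpers, and the layout for several key attributes is not covered because the sample does not show it.

```python
from xml.sax.saxutils import escape


def build_record_id(source: str, key_name: str, key_value: str) -> str:
    """Reproduce the sample's _recordid layout for one key attribute:
    "<DataSourceID>:<KeyName=value>" (inferred from the output example)."""
    return f"{source}:<{key_name}={key_value}>"


def record_id_for_xml(source: str, key_name: str, key_value: str) -> str:
    """Escape the record ID for XML text content; escape() replaces &, < and >."""
    return escape(build_record_id(source, key_name, key_value))


if __name__ == "__main__":
    print(build_record_id("mockAgent", "Path", "1241449855624"))
    print(record_id_for_xml("mockAgent", "Path", "1241449855624"))
```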

See also