Difference between revisions of "SMILA/Documentation/2011.Simplification/JobFile Agent"

m
(For SMILA 1.0: Simplification pages are obsolete, redirect to SMILA/Documentation/JobFile_Agent)
 
Line 1: Line 1:
== Overview ==
+
#REDIRECT [[SMILA/Documentation/JobFile_Agent]]
 
+
The Job File agent offers the functionality to execute <i>ADD</i> and <i>DELETE</i> jobs. A job file is an XML file using the SMILA datamodel XML representation of <tt>Records</tt> and <tt>Ids</tt> to describe the data and special <tt>ADD</tt> and <tt>DELETE</tt> tags to specify the action to take.
+
 
+
== Agent configuration ==
+
 
+
The example configuration file is located at <tt>configuration/org.eclipse.smila.connectivity.framework/jobfile.xml</tt>.
+
 
+
Defining Schema: <tt>org.eclipse.smila.connectivity.framework.agent.jobfile/schemas/JobFileDataSourceConnectionConfigSchema.xsd</tt>.
+
 
+
== Agent configuration explanation ==
+
 
+
The root element of the configuration is <tt>DataSourceConnectionConfig</tt> and contains the following sub elements:
+
 
+
* <tt>DataSourceID</tt> – the identification of a data source
+
* <tt>SchemaID</tt> – specifies the schema for the data source
+
* <tt>DataConnectionID</tt> – describes which agent or crawler should be used
+
** <tt>Crawler</tt> – service ID of a crawler
+
** <tt>Agent</tt> – service ID of an agent
+
* <tt>CompoundHandling</tt> – specifies whether packed data (like a ZIP containing files) should be unpacked and the files within processed (YES or NO).
+
* <tt>Attributes</tt> – list all attributes provided by the data source
+
** <tt>Attribute</tt>
+
*** <tt>Type</tt> (required) – the data type (String, Integer or Date).
+
*** <tt>Name</tt> (required) – the attribute's name.
+
*** <tt>HashAttribute</tt> – specifies whether a hash should be created (true or false).
+
*** <tt>KeyAttribute</tt> – creates a key for this object, for example for record id (true or false).
+
*** <tt>Attachment</tt> – specifies whether the attribute returns its data as an attachment of the record.
+
* <tt>Process</tt> – contains parameters for the agent business logic.
+
** <tt>UpdateInterval</tt> – the number of seconds to wait before reloading the job files specified by JobFileUrl.
+
** <tt>JobFileUrl</tt> – the URL of the job file to load. Protocols <tt>file://</tt> and <tt>http://</tt> are supported. You may specify multiple URLs.
+
** <tt>AttachmentSeparator</tt> – the separator used to separate attachment names from attachment URLs.
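
As a sketch of a <tt>Process</tt> section that loads more than one job file, the fragment below assumes that multiple URLs are given by repeating the <tt>JobFileUrl</tt> element; the schema remains the authority on this, and the second URL is a hypothetical placeholder.

<source lang="xml">
<Process>
  <UpdateInterval>300</UpdateInterval>
  <AttachmentSeparator>####</AttachmentSeparator>
  <!-- local job file, as in the sample configuration -->
  <JobFileUrl>file://samplejobfile.xml</JobFileUrl>
  <!-- hypothetical second job file served over HTTP (assumed: the element may be repeated) -->
  <JobFileUrl>http://example.org/jobs/jobfile.xml</JobFileUrl>
</Process>
</source>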
+
 
+
 
+
The Job File agent offers no attributes of its own; it just creates the attributes that are part of each record in the job file. However, you have to specify the names of the attributes that should be used for hash creation (the hash is not part of the record) and, optionally, for Id creation (it is also possible to provide an Id for each record directly in the job file).
+
 
+
 
+
== Configuration example ==
+
 
+
<source lang="xml">
+
<DataSourceConnectionConfig
+
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+
  xsi:noNamespaceSchemaLocation="../org.eclipse.smila.connectivity.framework.agent.jobfile/schemas/JobFileDataSourceConnectionConfigSchema.xsd"
+
>
+
  <DataSourceID>jobfile</DataSourceID>
+
  <SchemaID>org.eclipse.smila.connectivity.framework.agent.jobfile</SchemaID>
+
  <DataConnectionID>
+
    <Agent>JobFileAgent</Agent>
+
  </DataConnectionID>
+
  <DeltaIndexing>full</DeltaIndexing>
+
  <Attributes>
+
    <Attribute Type="Date" Name="LastModifiedDate" HashAttribute="true" />
+
    <Attribute Type="String" Name="Path" KeyAttribute="true" />
+
    <Attribute Type="String" Name="Url" KeyAttribute="true" />
+
  </Attributes>
+
  <Process>
+
    <UpdateInterval>300</UpdateInterval>
+
    <AttachmentSeparator>####</AttachmentSeparator>
+
    <JobFileUrl>file://samplejobfile.xml</JobFileUrl>
+
  </Process>
+
</DataSourceConnectionConfig>
+
 
+
</source>
+
 
+
== The format of job files ==
+
 
+
An example job file called <tt>samplejobfile.xml</tt> is located at <tt>configuration/org.eclipse.smila.connectivity.framework</tt>.
+
 
+
Defining Schema: <tt>org.eclipse.smila.connectivity.framework.agent.jobfile/schemas/jobfile.xsd</tt>.
+
 
+
A job file can contain an <tt>ADD</tt> section, a <tt>DELETE</tt> section, or both. An <tt>ADD</tt> section can contain one or more <tt>Record</tt> sections. A <tt>Record</tt> section need not contain an <tt>Id</tt>; if it does not, an Id object is created according to the Job File agent configuration. A <tt>DELETE</tt> section can contain one or more <tt>Id</tt> sections. In all respects the content of <tt>ADD</tt> and <tt>DELETE</tt> sections adheres to the datamodel XML schemas <tt>org.eclipse.smila.datamodel/xml/id.xsd</tt> and <tt>org.eclipse.smila.datamodel/xml/record.xsd</tt>.
+
 
+
Attachments are handled slightly differently:
+
Normally the XML datamodel contains only the name of an attachment. During an import, however, the attachment has to be filled with a value. Therefore the XML must include not only the attachment name but also a URL where the actual attachment value is located. The two pieces of information are separated by the <tt>AttachmentSeparator</tt> configured in the Job File agent configuration.
+
 
+
For example, suppose the attachment named <tt>Content</tt> should be filled with the document referenced by <tt>http://www.eclipse.org</tt>, and the string <tt>####</tt> is used as the <tt>AttachmentSeparator</tt>. The XML then looks like this:
+
<source lang="xml">
+
...
+
    <Attachment>Content####http://www.eclipse.org</Attachment>
+
...
+
</source>
+
 
+
<b>Note</b>: If you set the "_source" attribute for records in the job file, the value must match the <tt>DataSourceID</tt> in the Job File agent configuration! Otherwise the record is skipped.
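
A minimal sketch of a record that satisfies this rule, assuming the <tt>DataSourceID</tt> is <tt>jobfile</tt> as in the configuration example; the attribute values are illustrative only.

<source lang="xml">
<Record version="2.0">
  <!-- must equal the DataSourceID of the agent configuration, otherwise the record is skipped -->
  <Val key="_source">jobfile</Val>
  <Val key="Path">epl-v10.html</Val>
</Record>
</source>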
+
 
+
== Example of a job file ==
+
 
+
Here is an example of a job file with an <tt>ADD</tt> section. It shows the different options of
+
* creating Id objects from attribute values
+
* providing Ids within the XML
+
* loading data into attachments
+
* providing text or markup data in attributes
+
 
+
<source lang="xml">
+
<?xml version="1.0" encoding="UTF-8"?>
+
<JobFile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+
  xsi:noNamespaceSchemaLocation="../org.eclipse.smila.connectivity.framework.agent.jobfile/schemas/jobfile.xsd">
+
    <Add>
+
        <!-- sample record where id is created and content is loaded into attachment from file url //-->
+
        <Record version="2.0">
+
          <Val key="MimeType">text/html</Val>
+
          <Val key="Size" type="long">16536</Val>
+
          <Val key="Extension">html</Val>
+
          <Val key="LastModifiedDate" type="datetime">2009-03-13T10:42:00+0100</Val>
+
          <Val key="Filename">epl-v10.html</Val>
+
          <Val key="Path">epl-v10.html</Val>
+
          <Attachment>Content####epl-v10.html</Attachment>
+
        </Record>   
+
       
+
        <!-- sample record where id is created and content is loaded into attachment from http url //-->
+
        <Record version="2.0">
+
          <Val key="MimeType">text/html</Val>
+
          <Val key="Size" type="long">11765</Val>
+
          <Val key="Extension">html</Val>
+
          <Val key="LastModifiedDate" type="date">2009-07-09</Val>
+
          <Val key="Url">http://www.eclipse.org/smila/</Val>
+
          <Attachment>Content####http://www.eclipse.org/smila/</Attachment>
+
        </Record>   
+
 
+
        <!-- sample record where id is provided and txt content is provided in attribute //-->
+
        <Record version="2.0">
+
          <Val key="_recordid">jobfile:C:/sample folder/sample filename.txt</Val>
+
          <Val key="_source">jobfile</Val>
+
          <Val key="MimeType">text/plain</Val>
+
          <Val key="Size" type="long">16384</Val>
+
          <Val key="Extension">txt</Val>
+
          <Val key="LastModifiedDate" type="datetime">2009-07-09T14:53:16+0100</Val>
+
          <Val key="Filename">sample filename.txt</Val>
+
          <Val key="Path">C:/sample folder/sample filename.txt</Val>
+
          <Val key="Content">This is just some imaginary text content. Used to show how SMILA JobFileAgent works.</Val> 
+
        </Record> 
+
 
+
        <!-- sample record where id is provided and html content is provided in attribute //-->
+
        <Record version="2.0">
+
          <Val key="_recordid">jobfile:C:/sample folder/sample filename.html</Val>
+
          <Val key="_source">jobfile</Val>
+
          <Val key="MimeType">text/html</Val>
+
          <Val key="Size" type="long">16384</Val>
+
          <Val key="Extension">html</Val>
+
          <Val key="LastModifiedDate" type="datetime">2009-07-09T14:53:16+0100</Val>
+
          <Val key="Filename">sample filename.html</Val>
+
          <Val key="Path">C:/sample folder/sample filename.html</Val>
+
          <Val key="Content">
+
                <![CDATA[
+
                    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
+
                    <HTML>
+
                    <HEAD>
+
                      <TITLE> A sample test document </TITLE>
+
                      <META NAME="Author" CONTENT="Danieel Stucky">
+
                      <META NAME="Keywords" CONTENT="SMILA eclipse">
+
                      <META NAME="Description" CONTENT="sample test document">
+
                    </HEAD>
+
                    <BODY>
+
                      This is just some imaginary text content. Used to show how SMILA's Job File agent works. It even contains a <a href="http://www.eclipse.org">link</a>.
+
                    </BODY>
+
                    </HTML>
+
                ]]>             
+
          </Val>
+
        </Record>
+
    </Add>
+
 
+
</JobFile>
+
</source>
+
 
+
== See also ==
+
 
+
* [[SMILA/Documentation/Agent|Agent]]
+
* [[SMILA/Documentation/Mock Agent|Mock Agent]]
+
* [[SMILA/Documentation/Feed Agent|Feed Agent]]
+
 
+
__FORCETOC__
+
 
+
[[Category:SMILA]]
+

Latest revision as of 07:38, 19 January 2012
