Difference between revisions of "SMILA/Documentation/Feed Agent"


Revision as of 05:38, 20 May 2009

Overview

The FeedAgent offers the functionality to receive RSS and Atom feeds on a regular basis. The implementation uses ROME and ROME Fetcher to retrieve and parse the feeds. ROME supports the following feed formats:

  • RSS 0.90
  • RSS 0.91 Netscape
  • RSS 0.91 Userland
  • RSS 0.92
  • RSS 0.93
  • RSS 0.94
  • RSS 1.0
  • RSS 2.0
  • Atom 0.3
  • Atom 1.0

Agent configuration

The example configuration file called "feeds.xml" is located at configuration/org.eclipse.smila.connectivity.framework.

Defining Schema: org.eclipse.smila.connectivity.framework.agent.feed/schemas/FeedDataSourceConnectionConfigSchema.xsd.

Agent configuration explanation

The root element of FeedAgent configuration is DataSourceConnectionConfig and contains the following sub elements:

  • DataSourceID – the identification of a data source
  • SchemaID – specifies the schema for the data source
  • DataConnectionID – describes which agent or crawler should be used
    • Crawler – service id of a Crawler
    • Agent – service id of an Agent
  • CompoundHandling – specifies whether packed data (such as a zip file containing files) should be unpacked and the files within processed (YES or NO).
  • Attributes – list all attributes provided by the data source
    • Attribute
      • Type (required) – the data type (String, Integer or Date).
      • Name (required) – the attribute's name.
      • HashAttribute – specifies whether a hash should be created (true or false).
      • KeyAttribute – creates a key for this object, for example for the record id (true or false).
      • Attachment – specifies whether the attribute returns its data as an attachment of the record.
  • Process – contains parameters for the agent business logic.
    • UpdateInterval – the number of seconds to wait before reloading the feeds specified by FeedUrl.
    • FeedUrl – the URL of the news feed to load. You may specify multiple FeedUrls.
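
The element structure described above can be read with any XML parser. The following is a minimal Python sketch, not part of SMILA itself, that uses only the standard library to summarize such a configuration. The shortened configuration, the example.org URLs and the function name summarize_config are assumptions made for illustration, as is the choice to treat missing KeyAttribute and HashAttribute flags as false.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened configuration used only for this illustration.
CONFIG_XML = """
<DataSourceConnectionConfig>
  <DataSourceID>feeds</DataSourceID>
  <SchemaID>org.eclipse.smila.connectivity.framework.agent.feed</SchemaID>
  <DataConnectionID>
    <Agent>FeedAgent</Agent>
  </DataConnectionID>
  <Attributes>
    <Attribute Type="String" Name="Uri" KeyAttribute="true">
      <FeedAttributes>Uri</FeedAttributes>
    </Attribute>
    <Attribute Type="Date" Name="LastModifiedDate" HashAttribute="true">
      <FeedAttributes>UpdateDate</FeedAttributes>
    </Attribute>
  </Attributes>
  <Process>
    <UpdateInterval>300</UpdateInterval>
    <FeedUrl>http://example.org/a.rss</FeedUrl>
    <FeedUrl>http://example.org/b.atom</FeedUrl>
  </Process>
</DataSourceConnectionConfig>
"""


def summarize_config(xml_text):
    """Return the main settings of a feed configuration as a plain dict."""
    root = ET.fromstring(xml_text.strip())
    attributes = []
    for attr in root.findall("./Attributes/Attribute"):
        attributes.append({
            "name": attr.get("Name"),
            "type": attr.get("Type"),
            # Assumption of this sketch: absent flags are treated as false.
            "key": attr.get("KeyAttribute", "false") == "true",
            "hash": attr.get("HashAttribute", "false") == "true",
            # The feed attribute that supplies the value.
            "feed_attribute": attr.findtext("FeedAttributes"),
        })
    return {
        "data_source_id": root.findtext("DataSourceID"),
        "agent": root.findtext("./DataConnectionID/Agent"),
        # UpdateInterval is given in seconds.
        "update_interval": int(root.findtext("./Process/UpdateInterval")),
        # FeedUrl may occur several times.
        "feed_urls": [e.text for e in root.findall("./Process/FeedUrl")],
        "attributes": attributes,
    }


summary = summarize_config(CONFIG_XML)
print(summary["update_interval"], summary["feed_urls"])
```

A summary like this can help to check quickly which feed attribute feeds which record attribute before the configuration is deployed.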


Here is a description of the attributes the FeedAgent offers. It provides attributes about the feed itself (using the prefix Feed) and attributes for the entries of the feed. Some attributes do not return literals (String, Date) but nested objects such as Person or Link. These objects are all MObjects that contain attributes themselves. The nested MObjects and their attributes are described below; their attribute names are hard coded and cannot be configured. Note that not all feeds provide values for all attributes, and that some values are provided that are not apparently part of the feed.

These are the attributes of the feed:

Attribute         Type             Description
FeedAuthors       List<Person>     Returns the feed authors
FeedCategories    List<Category>   Returns the feed categories
FeedContributors  List<Person>     Returns the feed contributors
FeedCopyright     String           Returns the feed copyright information
FeedDescription   String           Returns the feed description
FeedEncoding      String           Returns the charset encoding of the feed
FeedType          String           Returns the feed type
FeedImage         Image            Returns the feed image
FeedLanguage      String           Returns the feed language
FeedLinks         List<Link>       Returns the feed links
FeedPublishDate   Date             Returns the feed published date
FeedTitle         String           Returns the feed title
FeedUri           String           Returns the feed uri


And here are the attributes of feed entries:

Attribute         Type             Description
Authors           List<Person>     Returns the authors of a feed entry
Categories        List<Category>   Returns the categories of a feed entry
Contents          List<Content>    Returns the contents of a feed entry
Contributors      List<Person>     Returns the contributors of a feed entry
Description       Content          Returns the description of a feed entry
Enclosures        List<Enclosure>  Returns the enclosures of a feed entry
Links             List<Link>       Returns the links of a feed entry
PublishDate       Date             Returns the publish date of a feed entry
Title             String           Returns the title of a feed entry
Uri               String           Returns the uri of a feed entry
UpdateDate        Date             Returns the update date of a feed entry


MObject Person:

Attribute    Type     Description
Email        String   Returns the email of the person
Name         String   Returns the name of the person
Uri          String   Returns the uri of the person


MObject Image:

Attribute    Type     Description
Link         String   Returns the link of the image
Title        String   Returns the title of the image
Url          String   Returns the url of the image
Description  String   Returns the description of the image


MObject Category:

Attribute    Type     Description
Name         String   Returns the name of the category
TaxanomyUri  String   Returns the taxonomy uri of the category


MObject Enclosure:

Attribute    Type     Description
Type         String   Returns the type of the enclosure
Url          String   Returns the url of the enclosure
Length       Integer  Returns the length of the enclosure


MObject Link:

Attribute    Type     Description
Href         String   Returns the href of the link
Hreflang     String   Returns the hreflang of the link
Rel          Integer  Returns the rel of the link
Title        String   Returns the title of the link
Type         String   Returns the type of the link
Length       Integer  Returns the length of the link


MObject Content:

Attribute    Type     Description
Mode         String   Returns the mode of the content
Value        String   Returns the value of the content
Type         String   Returns the type of the content
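
Because the attribute names of the nested MObjects are fixed, and because not every feed fills all of them, it can be useful to check which attributes of a nested object are actually present. The following Python sketch is only an illustration: the lookup table repeats the attribute names from the tables above, while the function name missing_attributes and the use of plain dicts for nested objects are assumptions of this example, not SMILA APIs.

```python
# Attribute names per MObject type, copied from the tables above.
MOBJECT_ATTRIBUTES = {
    "Person": ("Email", "Name", "Uri"),
    "Image": ("Link", "Title", "Url", "Description"),
    "Category": ("Name", "TaxanomyUri"),
    "Enclosure": ("Type", "Url", "Length"),
    "Link": ("Href", "Hreflang", "Rel", "Title", "Type", "Length"),
    "Content": ("Mode", "Value", "Type"),
}


def missing_attributes(kind, values):
    """Return the attribute names of `kind` that have no value in `values`."""
    return [name for name in MOBJECT_ATTRIBUTES[kind] if not values.get(name)]


# Hypothetical nested object, modelled as a plain dict for this sketch.
author = {"Name": "Florian Lautenbacher"}
print(missing_attributes("Person", author))  # ['Email', 'Uri']
```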

FeedAgent configuration example

<DataSourceConnectionConfig
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="../org.eclipse.smila.connectivity.framework.agent.feed/schemas/FeedDataSourceConnectionConfigSchema.xsd"
>
  <DataSourceID>feeds</DataSourceID>
  <SchemaID>org.eclipse.smila.connectivity.framework.agent.feed</SchemaID>
  <DataConnectionID>
    <Agent>FeedAgent</Agent>
  </DataConnectionID>
  <CompoundHandling>Yes</CompoundHandling>
  <Attributes>
    <Attribute Type="Date" Name="PublishDate" HashAttribute="true">
      <FeedAttributes>PublishDate</FeedAttributes>
    </Attribute>
    <Attribute Type="Date" Name="LastModifiedDate" HashAttribute="true">
      <FeedAttributes>UpdateDate</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="Uri" KeyAttribute="true">
      <FeedAttributes>Uri</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="Links">
      <FeedAttributes>Links</FeedAttributes>
    </Attribute>    
    <Attribute Type="MObject" Name="Contents">
      <FeedAttributes>Contents</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="Title">
      <FeedAttributes>Title</FeedAttributes>
    </Attribute>
    <Attribute Type="MObject" Name="Authors">
      <FeedAttributes>Authors</FeedAttributes>
    </Attribute>    
  </Attributes>
  <Process>
    <UpdateInterval>300</UpdateInterval>
    <FeedUrl>http://dev.eclipse.org/newslists/news.eclipse.rt.smila/maillist.rss</FeedUrl>
    <FeedUrl>http://search.twitter.com/search.atom?q=smila</FeedUrl>
  </Process>
</DataSourceConnectionConfig>

Output example

A record created by the FeedAgent using the default configuration may have the following or similar structure:

<Record xmlns="http://www.eclipse.org/smila/record" version="1.0">
  <Id xmlns="http://www.eclipse.org/smila/id" version="1.0">
    <!-- The element name must be Source, not _Source; the underscore only works around a syntax coloring problem in the wiki -->
    <_Source>feed</_Source>
    <Key name="Url">http://dev.eclipse.org/mhonarc/newsLists/news.eclipse.rt.smila/msg00022.html</Key>
  </Id>
  <A n="PublishDate">
    <L>
      <V t="datetime">2009-04-30 13:28:34.0</V>
    </L>
  </A>
  <A n="Url">
    <L>
      <V>http://dev.eclipse.org/mhonarc/newsLists/news.eclipse.rt.smila/msg00022.html</V>
    </L>
  </A>
  <A n="MimeType">
    <L>
      <V>text/html</V>
    </L>
  </A>
  <A n="Title">
    <L>
      <V>[news.eclipse.rt.smila] Re: Semantic Software Engineering</V>
    </L>
  </A>
  <A n="Contents">
    <O>
      <A n="Value">
        <L>
          <V>Hi J&#xFC;rgen, The idea is to support companies and projects that rely on semantic technologies (especially in RDF or OWL) with a set of plugins that they can reuse for their tooling. The first thing would be support for loading an ontology, searching for conc...</V>
        </L>
      </A>
      <A n="Type">
        <L>
          <V>text/html</V>
        </L>
      </A>
    </O>
  </A>
  <A n="Authors">
    <O>
      <A n="Name">
        <L>
          <V>lautenbacher@xxxxxxx (Florian Lautenbacher)</V>
        </L>
      </A>
    </O>
  </A>
 
  <A n="_HASH_TOKEN">
    <L>
      <V>
        c51f10f6a0cf825c54361a62c0ef44fe55f8ad59b26b559cb837ff39eea3adb9
      </V>
    </L>
  </A>
</Record>
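
The record structure shown above nests literal values in L and V elements and nested objects in O elements. As a rough sketch of how such a record could be read outside of SMILA, the following Python code uses only the standard library to turn the A elements into nested dicts. The shortened record, its placeholder text and the function name read_attributes are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the record elements, as declared on the Record element.
REC_NS = "{http://www.eclipse.org/smila/record}"

# Hypothetical, shortened record used only for this illustration.
RECORD_XML = """
<Record xmlns="http://www.eclipse.org/smila/record" version="1.0">
  <A n="Title">
    <L>
      <V>[news.eclipse.rt.smila] Re: Semantic Software Engineering</V>
    </L>
  </A>
  <A n="Contents">
    <O>
      <A n="Value">
        <L>
          <V>Placeholder entry text</V>
        </L>
      </A>
      <A n="Type">
        <L>
          <V>text/html</V>
        </L>
      </A>
    </O>
  </A>
</Record>
"""


def read_attributes(parent):
    """Convert the A children of `parent` into a dict keyed by attribute name.

    Literal values (L/V) become strings and nested objects (O) become dicts
    built by the same rule; each attribute maps to a list of such values.
    """
    result = {}
    for attr in parent.findall(REC_NS + "A"):
        literals = [
            (v.text or "").strip()
            for v in attr.findall(REC_NS + "L/" + REC_NS + "V")
        ]
        objects = [read_attributes(o) for o in attr.findall(REC_NS + "O")]
        result[attr.get("n")] = literals + objects
    return result


record = read_attributes(ET.fromstring(RECORD_XML.strip()))
print(record["Title"][0])
print(record["Contents"][0]["Type"][0])
```

Stripping the literal text also handles values such as the hash token, which is surrounded by whitespace in the output above.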

See also