Difference between revisions of "SMILA/Documentation/Feed Agent"

From Eclipsepedia

Revision as of 06:29, 18 May 2009

Overview

The FeedAgent retrieves RSS and Atom feeds at regular intervals. The implementation uses ROME and ROME Fetcher to retrieve and parse the feeds. ROME supports the following feed formats:

  • RSS 0.90
  • RSS 0.91 Netscape
  • RSS 0.91 Userland
  • RSS 0.92
  • RSS 0.93
  • RSS 0.94
  • RSS 1.0
  • RSS 2.0
  • Atom 0.3
  • Atom 1.0

Agent configuration

An example configuration file named "feedAgent.xml" is located at configuration/org.eclipse.smila.connectivity.framework.

Defining schema: org.eclipse.smila.connectivity.framework.agent.feed/schemas/FeedDataSourceConnectionConfigSchema.xsd.

Agent configuration explanation

The root element of the FeedAgent configuration is DataSourceConnectionConfig. It contains the following sub-elements:

  • DataSourceID – the identifier of the data source.
  • SchemaID – specifies the schema for the data source.
  • DataConnectionID – describes which agent or crawler should be used.
    • Crawler – the service ID of a Crawler.
    • Agent – the service ID of an Agent.
  • CompoundHandling – specifies whether packed data (such as a zip archive containing files) should be unpacked and the files within it processed (Yes or No).
  • Attributes – lists all attributes provided by the data source.
    • Attribute
      • Type (required) – the data type (String, Integer or Date).
      • Name (required) – the attribute's name.
      • HashAttribute – specifies whether a hash should be created (true or false).
      • KeyAttribute – creates a key for this object, for example for the record ID (true or false).
      • Attachment – specifies whether the attribute's data is returned as an attachment of the record.
  • Process – contains parameters for the agent's business logic.
    • UpdateInterval – the number of seconds to wait before reloading the feeds specified by FeedUrl.
    • FeedUrl – the URL of the news feed to load. You may specify multiple FeedUrls.
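
As an illustration of the Process element described above, the following fragment is a minimal sketch of an agent that polls two feeds. It assumes that multiple feeds are declared by repeating the FeedUrl element; the schema should be consulted to confirm this. The interval of 300 seconds and the second URL are hypothetical values chosen for illustration.

```xml
<Process>
  <!-- Wait 300 seconds before reloading the feeds (illustrative value) -->
  <UpdateInterval>300</UpdateInterval>
  <!-- Feed URL taken from the example configuration on this page -->
  <FeedUrl>http://dev.eclipse.org/newslists/news.eclipse.rt.smila/maillist.rss</FeedUrl>
  <!-- Hypothetical second feed (assumption: FeedUrl may simply be repeated) -->
  <FeedUrl>http://example.org/news/atom.xml</FeedUrl>
</Process>
```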


The FeedAgent provides the attributes described below. They fall into two groups: attributes of the feed itself (prefixed with Feed) and attributes of the individual feed entries. Note that not every feed provides values for all attributes.

These are the attributes of the feed:

Attribute | Type | Description
FeedAuthors | List<String> | Returns the feed authors
FeedCategories | List<String> | Returns the feed categories
FeedContributors | List<String> | Returns the feed contributors
FeedCopyright | String | Returns the feed copyright information
FeedDescription | String | Returns the feed description
FeedEncoding | String | Returns the charset encoding of the feed
FeedType | String | Returns the feed type
FeedImageDescription | String | Returns the feed image description
FeedImageLink | String | Returns the feed image link
FeedImageTitle | String | Returns the feed image title
FeedImageUrl | String | Returns the feed image URL
FeedLanguage | String | Returns the feed language
FeedLinks | List<String> | Returns the feed links
FeedPublishDate | Date | Returns the feed publish date
FeedTitle | String | Returns the feed title
FeedUri | String | Returns the feed URI
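
Feed-level attributes can presumably be mapped to record attributes in the same way the example configuration on this page maps FeedTitle. The fragment below is a sketch under that assumption. The record attribute names "Language" and "FeedPublished" are hypothetical, and whether every feed-level attribute can be mapped this way is not confirmed by this page.

```xml
<Attributes>
  <!-- Same mapping as in the example configuration on this page -->
  <Attribute Type="String" Name="FeedTitle">
    <FeedAttributes>FeedTitle</FeedAttributes>
  </Attribute>
  <!-- Hypothetical record attribute name "Language" -->
  <Attribute Type="String" Name="Language">
    <FeedAttributes>FeedLanguage</FeedAttributes>
  </Attribute>
  <!-- Hypothetical record attribute name "FeedPublished"; Type follows the table above -->
  <Attribute Type="Date" Name="FeedPublished">
    <FeedAttributes>FeedPublishDate</FeedAttributes>
  </Attribute>
</Attributes>
```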


And here are the attributes of feed entries:

Attribute | Type | Description
Authors | List<String> | Returns the feed entry's authors
Categories | List<String> | Returns the feed entry's categories
Contents | List<String> | Returns the feed entry's contents
Contributors | List<String> | Returns the feed entry's contributors
DescriptionMimeType | String | Returns the MIME type of the feed entry's description
DescriptionValue | String | Returns the feed entry's description
Enclosures | List<String> | Returns the feed entry's enclosures
Links | List<String> | Returns the feed entry's links
PublishDate | Date | Returns the feed entry's publish date
Title | String | Returns the feed entry's title
Uri | String | Returns the feed entry's URI
UpdateDate | Date | Returns the feed entry's update date
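
The configuration allows only String, Integer or Date as the Type of an Attribute, while several entry attributes are listed as List<String>. The fragment below is a sketch of how such attributes might be declared. It assumes that a list-valued feed attribute is declared with Type="String" and yields one value per list item. This page does not confirm that behavior, so it should be checked against the schema.

```xml
<!-- Assumption: List<String> attributes are declared with Type="String" -->
<Attribute Type="String" Name="Links">
  <FeedAttributes>Links</FeedAttributes>
</Attribute>
<Attribute Type="String" Name="Categories">
  <FeedAttributes>Categories</FeedAttributes>
</Attribute>
```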


FeedAgent configuration example

<DataSourceConnectionConfig
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="../org.eclipse.smila.connectivity.framework.agent.feed/schemas/FeedDataSourceConnectionConfigSchema.xsd"
>
  <DataSourceID>feed</DataSourceID>
  <SchemaID>org.eclipse.smila.connectivity.framework.agent.feed</SchemaID>
  <DataConnectionID>
    <Agent>FeedAgent</Agent>
  </DataConnectionID>
  <CompoundHandling>Yes</CompoundHandling>
  <Attributes>
    <Attribute Type="Date" Name="PublishDate" HashAttribute="true">
      <FeedAttributes>PublishDate</FeedAttributes>
    </Attribute>
    <Attribute Type="Date" Name="LastModifiedDate" HashAttribute="true">
      <FeedAttributes>UpdateDate</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="Url" KeyAttribute="true">
      <FeedAttributes>Uri</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="Content" Attachment="true" MimeTypeAttribute="MimeType">
      <FeedAttributes>DescriptionValue</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="MimeType">
      <FeedAttributes>DescriptionMimeType</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="Title">
      <FeedAttributes>Title</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="FeedTitle">
      <FeedAttributes>FeedTitle</FeedAttributes>
    </Attribute>            
  </Attributes>
  <Process>
    <UpdateInterval>60</UpdateInterval>
    <FeedUrl>http://dev.eclipse.org/newslists/news.eclipse.rt.smila/maillist.rss</FeedUrl>
  </Process>
</DataSourceConnectionConfig>

Output example

A record created by the FeedAgent using the default configuration will have the following structure:

<Record xmlns="http://www.eclipse.org/smila/record" version="1.0">
  <Id xmlns="http://www.eclipse.org/smila/id" version="1.0">
    <Source>feed</Source>
    <Key name="Url">http://dev.eclipse.org/mhonarc/newsLists/news.eclipse.rt.smila/msg00022.html</Key>
  </Id>
  <A n="PublishDate">
    <L>
      <V t="datetime">2009-04-30 13:28:34.0</V>
    </L>
  </A>
  <A n="Url">
    <L>
      <V>http://dev.eclipse.org/mhonarc/newsLists/news.eclipse.rt.smila/msg00022.html</V>
    </L>
  </A>
  <A n="MimeType">
    <L>
      <V>text/html</V>
    </L>
  </A>
  <A n="Title">
    <L>
      <V>[news.eclipse.rt.smila] Re: Semantic Software Engineering</V>
    </L>
  </A>
  <A n="FeedTitle">
    <L>
      <V>news.eclipse.rt.smila</V>
    </L>
  </A>
  <A n="_HASH_TOKEN">
    <L>
      <V>
        c51f10f6a0cf825c54361a62c0ef44fe55f8ad59b26b559cb837ff39eea3adb9
      </V>
    </L>
  </A>
  <Attachment>Content</Attachment>
</Record>

See also