SMILA/Documentation/Feed Agent
Overview
The FeedAgent offers the functionality to receive RSS and Atom feeds on a regular basis. The implementation uses ROME and ROME Fetcher to retrieve and parse the feeds. ROME supports the following feed formats:
- RSS 0.90
- RSS 0.91 Netscape
- RSS 0.91 Userland
- RSS 0.92
- RSS 0.93
- RSS 0.94
- RSS 1.0
- RSS 2.0
- Atom 0.3
- Atom 1.0
Agent configuration
The example configuration file, feed.xml, is located in configuration/org.eclipse.smila.connectivity.framework.
The defining schema is org.eclipse.smila.connectivity.framework.agent.feed/schemas/FeedDataSourceConnectionConfigSchema.xsd.
Agent configuration explanation
The root element of the FeedAgent configuration is DataSourceConnectionConfig. It contains the following sub-elements:
- DataSourceID – the identifier of the data source.
- SchemaID – specifies the schema for the data source.
- DataConnectionID – describes which agent or crawler should be used.
  - Crawler – service ID of a Crawler.
  - Agent – service ID of an Agent.
- CompoundHandling – specifies whether packed data (such as a zip archive containing files) should be unpacked and the contained files processed (YES or NO).
- Attributes – lists all attributes provided by the data source.
  - Attribute
    - Type (required) – the data type (String, Integer or Date).
    - Name (required) – the attribute's name.
    - HashAttribute – specifies whether a hash should be created (true or false).
    - KeyAttribute – creates a key for this object, for example for the record ID (true or false).
    - Attachment – specifies whether the attribute returns its data as an attachment of the record.
- Process – contains parameters for the agent business logic.
  - UpdateInterval – the number of seconds to wait before reloading the feeds specified by FeedUrl.
  - FeedUrl – the URL of the news feed to load. You may specify multiple FeedUrls.
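As an illustration of the Process element, the following fragment polls two feeds every five minutes. It is only a sketch: both URLs are placeholders, and it assumes that multiple feeds are configured by repeating the FeedUrl element inside Process, which should be checked against the schema.

```xml
<Process>
  <!-- Reload all configured feeds every 300 seconds -->
  <UpdateInterval>300</UpdateInterval>
  <!-- Placeholder URLs; one FeedUrl element per feed (assumed) -->
  <FeedUrl>http://example.org/news/rss.xml</FeedUrl>
  <FeedUrl>http://example.org/blog/atom.xml</FeedUrl>
</Process>
```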
The following tables describe the attributes the FeedAgent offers: attributes of the feed itself (prefixed with Feed) and attributes of the feed's entries. Note that not every feed provides values for all attributes.
These are the attributes of the feed:
Attribute | Type | Description |
---|---|---|
FeedAuthors | List<String> | Returns the feed authors |
FeedCategories | List<String> | Returns the feed categories |
FeedContributors | List<String> | Returns the feed contributors |
FeedCopyright | String | Returns the feed copyright information |
FeedDescription | String | Returns the feed description |
FeedEncoding | String | Returns the charset encoding of the feed |
FeedType | String | Returns the feed type |
FeedImageDescription | String | Returns the feed image description |
FeedImageLink | String | Returns the feed image link |
FeedImageTitle | String | Returns the feed image title |
FeedImageUrl | String | Returns the feed image URL |
FeedLanguage | String | Returns the feed language |
FeedLinks | List<String> | Returns the feed links |
FeedPublishDate | Date | Returns the feed published date |
FeedTitle | String | Returns the feed title |
FeedUri | String | Returns the feed URI |
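Feed-level attributes can presumably be mapped to record attributes in the same way the configuration example on this page maps FeedTitle. The fragment below is a sketch under that assumption; the record attribute names Language and FeedPublished are made up for illustration and are not part of the shipped feed.xml.

```xml
<!-- Both Attribute elements belong inside the Attributes element -->
<Attribute Type="String" Name="Language">
  <!-- FeedLanguage is a feed-level attribute of type String -->
  <FeedAttributes>FeedLanguage</FeedAttributes>
</Attribute>
<Attribute Type="Date" Name="FeedPublished">
  <!-- FeedPublishDate is a feed-level attribute of type Date -->
  <FeedAttributes>FeedPublishDate</FeedAttributes>
</Attribute>
```

Because these values describe the feed rather than a single entry, every record created from the same feed would be expected to carry the same values.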
These are the attributes of feed entries:
Attribute | Type | Description |
---|---|---|
Authors | List<String> | Returns the authors of a feed entry |
Categories | List<String> | Returns the categories of a feed entry |
Contents | List<String> | Returns the contents of a feed entry |
Contributors | List<String> | Returns the contributors of a feed entry |
DescriptionMimeType | String | Returns the MIME type of a feed entry's description |
DescriptionValue | String | Returns the description of a feed entry |
Enclosures | List<String> | Returns the enclosures of a feed entry |
Links | List<String> | Returns the links of a feed entry |
PublishDate | Date | Returns the publish date of a feed entry |
Title | String | Returns the title of a feed entry |
Uri | String | Returns the URI of a feed entry |
UpdateDate | Date | Returns the update date of a feed entry |
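List-valued entry attributes such as Authors or Links are not used in the configuration example on this page. A mapping for them might look like the sketch below. It assumes that a List<String> attribute is declared with Type="String" (the only string type the configuration offers) and that each list element becomes one value of the record attribute; neither assumption is confirmed here, so verify against the schema before relying on it.

```xml
<!-- Both Attribute elements belong inside the Attributes element -->
<Attribute Type="String" Name="Authors">
  <!-- Assumed: Type="String" describes the type of each list element -->
  <FeedAttributes>Authors</FeedAttributes>
</Attribute>
<Attribute Type="String" Name="Links">
  <FeedAttributes>Links</FeedAttributes>
</Attribute>
```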
FeedAgent configuration example
<DataSourceConnectionConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="../org.eclipse.smila.connectivity.framework.agent.feed/schemas/FeedDataSourceConnectionConfigSchema.xsd">
  <DataSourceID>feed</DataSourceID>
  <SchemaID>org.eclipse.smila.connectivity.framework.agent.feed</SchemaID>
  <DataConnectionID>
    <Agent>FeedAgent</Agent>
  </DataConnectionID>
  <CompoundHandling>Yes</CompoundHandling>
  <Attributes>
    <Attribute Type="Date" Name="PublishDate" HashAttribute="true">
      <FeedAttributes>PublishDate</FeedAttributes>
    </Attribute>
    <Attribute Type="Date" Name="LastModifiedDate" HashAttribute="true">
      <FeedAttributes>UpdateDate</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="Url" KeyAttribute="true">
      <FeedAttributes>Uri</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="Content" Attachment="true" MimeTypeAttribute="MimeType">
      <FeedAttributes>DescriptionValue</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="MimeType">
      <FeedAttributes>DescriptionMimeType</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="Title">
      <FeedAttributes>Title</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="FeedTitle">
      <FeedAttributes>FeedTitle</FeedAttributes>
    </Attribute>
  </Attributes>
  <Process>
    <UpdateInterval>60</UpdateInterval>
    <FeedUrl>http://dev.eclipse.org/newslists/news.eclipse.rt.smila/maillist.rss</FeedUrl>
  </Process>
</DataSourceConnectionConfig>
Output example
A record created by the FeedAgent using the default configuration will have the following structure:
<Record xmlns="http://www.eclipse.org/smila/record" version="1.0">
  <Id xmlns="http://www.eclipse.org/smila/id" version="1.0">
    <Source>feed</Source>
    <Key name="Url">http://dev.eclipse.org/mhonarc/newsLists/news.eclipse.rt.smila/msg00022.html</Key>
  </Id>
  <A n="PublishDate">
    <L>
      <V t="datetime">2009-04-30 13:28:34.0</V>
    </L>
  </A>
  <A n="Url">
    <L>
      <V>http://dev.eclipse.org/mhonarc/newsLists/news.eclipse.rt.smila/msg00022.html</V>
    </L>
  </A>
  <A n="MimeType">
    <L>
      <V>text/html</V>
    </L>
  </A>
  <A n="Title">
    <L>
      <V>[news.eclipse.rt.smila] Re: Semantic Software Engineering</V>
    </L>
  </A>
  <A n="FeedTitle">
    <L>
      <V>news.eclipse.rt.smila</V>
    </L>
  </A>
  <A n="_HASH_TOKEN">
    <L>
      <V>c51f10f6a0cf825c54361a62c0ef44fe55f8ad59b26b559cb837ff39eea3adb9</V>
    </L>
  </A>
  <Attachment>Content</Attachment>
</Record>