SMILA/Documentation/Feed Agent
Revision as of 10:43, 20 April 2011
Overview
The Feed agent offers the functionality to receive RSS and Atom feeds on a regular basis. The implementation uses ROME and ROME Fetcher to retrieve and parse the feeds. ROME supports the following feed formats:
- RSS 0.90
- RSS 0.91 Netscape
- RSS 0.91 Userland
- RSS 0.92
- RSS 0.93
- RSS 0.94
- RSS 1.0
- RSS 2.0
- Atom 0.3
- Atom 1.0
Agent configuration
The example configuration file is located at configuration/org.eclipse.smila.connectivity.framework/feeds.xml.
Defining schema: org.eclipse.smila.connectivity.framework.agent.feed/schemas/FeedDataSourceConnectionConfigSchema.xsd.
Agent configuration explanation
The root element of the configuration is DataSourceConnectionConfig. It contains the following sub-elements:
- DataSourceID – the identification of a data source
- SchemaID – specifies the schema for the data source
- DataConnectionID – describes which agent or crawler should be used
  - Crawler – service ID of a crawler
  - Agent – service ID of an agent
- CompoundHandling – specifies whether packed data (such as a ZIP archive containing files) should be unpacked and the files within processed (YES or NO).
- Attributes – lists all attributes provided by the data source
  - Attribute
    - Type (required) – the data type (String, Integer or Date).
    - Name (required) – the attribute's name.
    - HashAttribute – specifies whether a hash should be created (true or false).
    - KeyAttribute – creates a key for this object, for example for the record ID (true or false).
    - Attachment – specifies whether the attribute returns its data as an attachment of the record.
- Attribute
- Process – contains parameters for the agent business logic.
  - UpdateInterval – the number of seconds to wait before reloading the feeds specified by FeedUrl.
  - FeedUrl – the URL of the news feed to load. You may specify multiple FeedUrl elements.
The Feed agent offers two groups of attributes: attributes about the feed itself (prefixed with Feed) and attributes for the entries of the feed. Some attributes do not return literals (string, date) but nested objects such as Person or Link. These objects are all MObjects that contain attributes themselves. The nested MObjects and their attributes are described below; their attribute names are hard-coded and cannot be configured.
Note that not every feed provides values for all attributes, and that some values may be provided that are not apparently part of the feed.
These are the attributes of the feed:
Attribute | Type | Description |
---|---|---|
FeedAuthors | List<Person> | Returns the feed authors |
FeedCategories | List<Category> | Returns the feed categories |
FeedContributors | List<Person> | Returns the feed contributors |
FeedCopyright | String | Returns the feed copyright information |
FeedDescription | String | Returns the feed description |
FeedEncoding | String | Returns the charset encoding of the feed |
FeedType | String | Returns the feed type |
FeedImage | Image | Returns the feed image |
FeedLanguage | String | Returns the feed language |
FeedLinks | List<Link> | Returns the feed links |
FeedPublishDate | Date | Returns the feed published date |
FeedTitle | String | Returns the feed title |
FeedUri | String | Returns the feed uri |
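The configuration example on this page maps only entry attributes. As a hedged sketch, feed-level attributes from the table above could presumably be mapped in the same way. The attribute names FeedTitle and FeedAuthors are taken from the table; the Type values and the choice of Name are assumptions modeled on the entry mappings in the configuration example, not confirmed by the schema.

```xml
<!-- Sketch only: mirrors the mapping pattern of the configuration example.
     Type="MObject" for FeedAuthors is an assumption based on how Authors is mapped there. -->
<Attributes>
  <Attribute Type="String" Name="FeedTitle">
    <FeedAttributes>FeedTitle</FeedAttributes>
  </Attribute>
  <Attribute Type="MObject" Name="FeedAuthors">
    <FeedAttributes>FeedAuthors</FeedAttributes>
  </Attribute>
</Attributes>
```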
And here are the attributes of feed entries:
Attribute | Type | Description |
---|---|---|
Authors | List<Person> | Returns the authors of a feed entry |
Categories | List<Category> | Returns the categories of a feed entry |
Contents | List<Content> | Returns the contents of a feed entry |
Contributors | List<Person> | Returns the contributors of a feed entry |
Description | Content | Returns the description of a feed entry |
Enclosures | List<Enclosure> | Returns the enclosures of a feed entry |
Links | List<Link> | Returns the links of a feed entry |
PublishDate | Date | Returns the publish date of a feed entry |
Title | String | Returns the title of a feed entry |
Uri | String | Returns the uri of a feed entry |
UpdateDate | Date | Returns the update date of a feed entry |
MObject Person:
Attribute | Type | Description |
---|---|---|
 | String | Returns the email of the person |
Name | String | Returns the name of the person |
Uri | String | Returns the uri of the person |
MObject Image:
Attribute | Type | Description |
---|---|---|
Link | String | Returns the link of the image |
Title | String | Returns the title of the image |
Url | String | Returns the url of the image |
Description | String | Returns the description of the image |
MObject Category:
Attribute | Type | Description |
---|---|---|
Name | String | Returns the name of the category |
TaxanomyUri | String | Returns the taxonomy uri of the category |
MObject Enclosure:
Attribute | Type | Description |
---|---|---|
Type | String | Returns the type of the enclosure |
Url | String | Returns the url of the enclosure |
Length | Integer | Returns the length of the enclosure |
MObject Link:
Attribute | Type | Description |
---|---|---|
Href | String | Returns the href of the link |
Hreflang | String | Returns the hreflang of the link |
Rel | Integer | Returns the rel of the link |
Title | String | Returns the title of the link |
Type | String | Returns the type of the link |
Length | Integer | Returns the length of the link |
MObject Content:
Attribute | Type | Description |
---|---|---|
Mode | String | Returns the mode of the content |
Value | String | Returns the value of the content |
Type | String | Returns the type of the content |
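To make the nested MObjects more concrete, the following sketch shows how an Enclosures attribute might appear in a record, assuming it is serialized with the same Seq/Map/Val layout that the output example on this page uses for Contents and Authors. The keys Url, Type and Length come from the Enclosure table above; the values are invented for illustration, and the exact serialization is an assumption.

```xml
<!-- Illustrative only: values are made up and the layout is assumed
     to match the Contents and Authors sequences of the output example. -->
<Seq key="Enclosures">
  <Map>
    <Val key="Url">http://example.org/media/episode-1.mp3</Val>
    <Val key="Type">audio/mpeg</Val>
    <Val key="Length">1048576</Val>
  </Map>
</Seq>
```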
Configuration example
<DataSourceConnectionConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="../org.eclipse.smila.connectivity.framework.agent.feed/schemas/FeedDataSourceConnectionConfigSchema.xsd">
  <DataSourceID>feeds</DataSourceID>
  <SchemaID>org.eclipse.smila.connectivity.framework.agent.feed</SchemaID>
  <DataConnectionID>
    <Agent>FeedAgent</Agent>
  </DataConnectionID>
  <CompoundHandling>Yes</CompoundHandling>
  <Attributes>
    <Attribute Type="Date" Name="PublishDate" HashAttribute="true">
      <FeedAttributes>PublishDate</FeedAttributes>
    </Attribute>
    <Attribute Type="Date" Name="LastModifiedDate" HashAttribute="true">
      <FeedAttributes>UpdateDate</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="Uri" KeyAttribute="true">
      <FeedAttributes>Uri</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="Links">
      <FeedAttributes>Links</FeedAttributes>
    </Attribute>
    <Attribute Type="MObject" Name="Contents">
      <FeedAttributes>Contents</FeedAttributes>
    </Attribute>
    <Attribute Type="String" Name="Title">
      <FeedAttributes>Title</FeedAttributes>
    </Attribute>
    <Attribute Type="MObject" Name="Authors">
      <FeedAttributes>Authors</FeedAttributes>
    </Attribute>
  </Attributes>
  <Process>
    <UpdateInterval>300</UpdateInterval>
    <FeedUrl>http://dev.eclipse.org/newslists/news.eclipse.rt.smila/maillist.rss</FeedUrl>
    <FeedUrl>http://search.twitter.com/search.atom?q=smila</FeedUrl>
  </Process>
</DataSourceConnectionConfig>
Output example
A record created by the Feed agent using the default configuration may have the following or a similar structure:
<Record xmlns="http://www.eclipse.org/smila/record" version="1.0">
  <Val key="_recordid">feed:&lt;Url=http://dev.eclipse.org/mhonarc/newsLists/news.eclipse.rt.smila/msg00022.html</Val>
  <Val key="_source">feed</Val>
  <Val key="PublishDate" type="datetime">2009-04-30T13:28:34+0100</Val>
  <Val key="Url">http://dev.eclipse.org/mhonarc/newsLists/news.eclipse.rt.smila/msg00022.html</Val>
  <Val key="MimeType">text/html</Val>
  <Val key="Title">[news.eclipse.rt.smila] Re: Semantic Software Engineering</Val>
  <Seq key="Contents">
    <Map>
      <Val key="Value">Hi Jürgen, The idea is to support companies and projects that rely on semantic technologies (especially in RDF or OWL) with a set of plugins that they can reuse for their tooling. The first thing would be support for loading an ontology, searching for conc...</Val>
      <Val key="Type">text/html</Val>
    </Map>
  </Seq>
  <Seq key="Authors">
    <Map>
      <Val key="Name">lautenbacher@xxxxxxx (Florian Lautenbacher)</Val>
    </Map>
  </Seq>
</Record>