Difference between revisions of "SMILA/Documentation/Web Crawler"

Revision as of 06:11, 19 March 2009

What does Web Crawler do

A web crawler collects data from the internet. Starting from an initial URL, it recursively crawls all linked websites. Because web pages vary widely in structure and link heavily to other pages, the crawler's configuration lets you limit the downloaded data to what you need.
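
The crawl-and-limit idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SMILA implementation: it uses an iterative breadth-first traversal instead of literal recursion, the names `crawl`, `get_links`, `max_depth` and `max_pages` are hypothetical, and an in-memory link table stands in for real page downloads.

```python
from collections import deque

def crawl(start_url, get_links, max_depth=2, max_pages=100):
    """Breadth-first crawl from start_url, bounded by link depth and page count."""
    visited = []                      # pages in the order they were crawled
    seen = {start_url}                # URLs already queued, to avoid loops
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue                  # depth limit reached: do not follow links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

# Hypothetical link graph standing in for real web pages.
LINKS = {
    "http://example.org/": ["http://example.org/a", "http://example.org/b"],
    "http://example.org/a": ["http://example.org/c", "http://example.org/"],
    "http://example.org/b": [],
    "http://example.org/c": ["http://example.org/d"],
    "http://example.org/d": [],
}

def get_links(url):
    """Return the outgoing links of a page (empty list if unknown)."""
    return LINKS.get(url, [])
```

Calling `crawl("http://example.org/", get_links, max_depth=1)` visits only the start page and the pages it links to directly; raising `max_depth` or `max_pages` widens the crawl.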

Crawling configuration

Defining Schema: org.eclipse.smila.connectivity.framework.crawler.web/schemas/WebIndexOrder.xsd

Crawling configuration explanation

The root element of the crawling configuration is IndexOrderConfiguration. It contains the following sub-elements:

DataSourceID – the identification of a data source.
SchemaID – specifies the schema for a crawler job.
DataConnectionID – describes which Agent or Crawler should be used.
- Crawler – implementation class of a Crawler.
- Agent – implementation class of an Agent.
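
As a rough illustration of how these elements nest, the following Python sketch parses a minimal configuration and reads the elements listed above. The element names come from this page; the values (`web`, `WebCrawler`, the schema identifier) and the helper `read_index_order` are hypothetical placeholders, and a real configuration validated against WebIndexOrder.xsd may require further elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal configuration; all values are placeholders.
CONFIG = """\
<IndexOrderConfiguration>
  <DataSourceID>web</DataSourceID>
  <SchemaID>org.eclipse.smila.connectivity.framework.crawler.web</SchemaID>
  <DataConnectionID>
    <Crawler>WebCrawler</Crawler>
  </DataConnectionID>
</IndexOrderConfiguration>
"""

def read_index_order(xml_text):
    """Extract the top-level settings from an IndexOrderConfiguration document."""
    root = ET.fromstring(xml_text)
    if root.tag != "IndexOrderConfiguration":
        raise ValueError("unexpected root element: " + root.tag)
    connection = root.find("DataConnectionID")
    return {
        "DataSourceID": root.findtext("DataSourceID"),
        "SchemaID": root.findtext("SchemaID"),
        # DataConnectionID names either a Crawler or an Agent implementation.
        "Crawler": connection.findtext("Crawler") if connection is not None else None,
        "Agent": connection.findtext("Agent") if connection is not None else None,
    }

settings = read_index_order(CONFIG)
```

Here `settings["Crawler"]` holds the crawler name and `settings["Agent"]` is `None`, because the sample selects a Crawler and no Agent.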

@@ Line 10: / Line 10: @@
 The root element of the crawling configuration is IndexOrderConfiguration. It contains the following sub-elements:
+* <tt>DataSourceID</tt> – the identification of a data source.
+* <tt>SchemaID</tt> – specifies the schema for a crawler job.
+* <tt>DataConnectionID</tt> – describes which Agent or Crawler should be used.
+** <tt>Crawler</tt> – implementation class of a Crawler.
+** <tt>Agent</tt> – implementation class of an Agent.
 == See also ==
