


SMILA/Crawler

Revision as of 02:55, 19 March 2009 by Eliseyev.softaria.com


A Crawler gathers information about resources: both their content and metadata of interest, such as size or MIME type.

SMILA contains three types of crawlers, one per data source type: the WebCrawler, the JDBC DatabaseCrawler, and the FileSystemCrawler. They gather information from the internet, from databases, and from files on a hard disk, respectively.

A Crawler is started with a specific, named configuration that defines what information is to be crawled (e.g. content, kinds of metadata) and where to find that data (e.g. file system path, JDBC connection string).
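To make the idea of a named configuration concrete, here is a minimal sketch in Java. It is not SMILA's actual configuration schema: the `CrawlerConfig` record, its field names, and the sample values (including the path and the JDBC connection string) are assumptions chosen only for illustration.

```java
import java.util.List;

// Hypothetical model of a named crawler configuration (NOT SMILA's real schema).
// It captures the two things described above: what to crawl and where to find it.
record CrawlerConfig(String name, String location, boolean fetchContent, List<String> metadataFields) {
    CrawlerConfig {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("configuration needs a name");
        }
        if (location == null || location.isBlank()) {
            throw new IllegalArgumentException("configuration needs a location");
        }
        // Defensive copy so the configuration cannot change after creation.
        metadataFields = List.copyOf(metadataFields);
    }
}

class CrawlerConfigDemo {
    public static void main(String[] args) {
        // Illustrative values only: a file system path and a JDBC connection string.
        CrawlerConfig files = new CrawlerConfig("local-docs", "/data/docs", true, List.of("size", "mimeType"));
        CrawlerConfig db = new CrawlerConfig("customer-db", "jdbc:derby:memory:customers", false, List.of("size"));
        System.out.println(files);
        System.out.println(db);
    }
}
```

Keeping the configuration immutable and addressable by name means the same crawler type can be started several times against different sources without the runs interfering with each other.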

The CrawlerController manages the life cycle of each crawler (e.g. start, stop, abort) and may instantiate multiple Crawlers concurrently, even several of the same type.
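The life cycle described above can be sketched as a small state tracker. This is a simplified stand-in, not the real CrawlerController API: the class `SketchCrawlerController`, the `CrawlState` enum, and the run id format are all hypothetical, and the sketch only tracks state without performing any crawling.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical life-cycle states; SMILA's real state names may differ.
enum CrawlState { RUNNING, STOPPED, ABORTED }

// Simplified stand-in for a controller (NOT the real CrawlerController API).
class SketchCrawlerController {
    private final Map<String, CrawlState> runs = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    // Registers a new crawl run for the named configuration and returns a unique run id.
    // Calling this twice with the same name yields two independent runs.
    String start(String configName) {
        String runId = configName + "#" + counter.incrementAndGet();
        runs.put(runId, CrawlState.RUNNING);
        return runId;
    }

    void stop(String runId) {
        transition(runId, CrawlState.STOPPED);
    }

    void abort(String runId) {
        transition(runId, CrawlState.ABORTED);
    }

    CrawlState state(String runId) {
        return runs.get(runId);
    }

    private void transition(String runId, CrawlState target) {
        // Atomic compare-and-set: only a run that is currently RUNNING may change state.
        boolean changed = runs.replace(runId, CrawlState.RUNNING, target);
        if (!changed) {
            throw new IllegalStateException("not running: " + runId);
        }
    }
}
```

Giving every run its own id is one way to let two crawlers of the same type coexist: each can be stopped or aborted independently of the other.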

Furthermore, the Connectivity Framework provides an API that developers can use to create their own crawlers.
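As a rough illustration of what writing a custom crawler involves, the sketch below defines a tiny extension point and one implementation. The interface `SketchCrawler`, the `ResourceInfo` record, and the `InMemoryCrawler` class are hypothetical names invented for this example; the Connectivity Framework's real interfaces may look quite different.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical record of what a crawler reports per resource: an id plus metadata.
record ResourceInfo(String id, long size, String mimeType) {}

// Hypothetical extension point (NOT the Connectivity Framework's real interface).
interface SketchCrawler {
    List<ResourceInfo> crawl();
}

// A custom crawler over an in-memory map of resource id -> text content.
class InMemoryCrawler implements SketchCrawler {
    private final Map<String, String> documents;

    InMemoryCrawler(Map<String, String> documents) {
        this.documents = Map.copyOf(documents);
    }

    @Override
    public List<ResourceInfo> crawl() {
        List<ResourceInfo> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : documents.entrySet()) {
            // Size is the UTF-8 byte length; the MIME type is fixed for this toy source.
            long size = entry.getValue().getBytes(StandardCharsets.UTF_8).length;
            result.add(new ResourceInfo(entry.getKey(), size, "text/plain"));
        }
        return result;
    }
}
```

A new data source type would then only need its own implementation of the interface, while the code that consumes `ResourceInfo` entries stays unchanged.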
