Difference between revisions of "SMILA/Crawler"

Latest revision as of 06:54, 25 January 2012

@@ Line 1: / Line 1: @@
-{{note|Deprecated, please use [[SMILA/Documentation/Importing/Concept|Importing Framework]] instead.}}
+You are looking for either [[SMILA/Documentation/Importing/Concept]] or [[SMILA/Documentation/ConnectivityFramework]]. Choose wisely.
-A Crawler gathers information about resources: both their content and metadata of interest, such as size or MIME type.
-SMILA contains three types of crawlers, each for a different data source type, namely WebCrawler, JDBC DatabaseCrawler, and FileSystemCrawler, which gather information from the internet, from databases, and from files on a hard disk, respectively.
-A Crawler is started with a specific named configuration that defines what information is to be crawled (e.g. content, kinds of metadata) and where to find that data (e.g. file system path, JDBC connection string).
-The CrawlerController manages the life cycle of the crawler (e.g. start, stop, abort) and may instantiate multiple Crawlers concurrently, even of the same type.
-Furthermore, the Connectivity Framework provides an API that developers can use to create their own crawlers.
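The lifecycle described in the removed text (a crawler started with a named configuration, start/stop/abort control, and several crawlers running concurrently, even of the same type) can be illustrated with a minimal Java sketch. It is not SMILA's API: the <code>Crawler</code> interface and the <code>CrawlerController</code> methods <code>register</code>, <code>start</code>, <code>stop</code>, <code>abort</code>, <code>isRunning</code> and <code>awaitDone</code> are hypothetical names. It also assumes that "stop" means a cooperative request the crawler checks itself, while "abort" means interrupting the crawler thread.

```java
import java.util.Map;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical crawler contract (not SMILA's interface): crawl the data source
// described by the named configuration, checking stopRequested to end gracefully.
interface Crawler {
    void crawl(String configName, AtomicBoolean stopRequested) throws InterruptedException;
}

// Hypothetical controller managing crawler lifecycles; names are illustrative only.
class CrawlerController {
    // One factory per crawler type, so the same type can be instantiated many times.
    private final Map<String, Supplier<Crawler>> factories = new ConcurrentHashMap<>();
    private final Map<Integer, Future<?>> jobs = new ConcurrentHashMap<>();
    private final Map<Integer, AtomicBoolean> stopFlags = new ConcurrentHashMap<>();
    private final AtomicInteger nextJobId = new AtomicInteger();
    private final ExecutorService pool = Executors.newCachedThreadPool();

    void register(String type, Supplier<Crawler> factory) {
        factories.put(type, factory);
    }

    // Start: create a fresh crawler instance and run it on its own thread.
    int start(String type, String configName) {
        Supplier<Crawler> factory = factories.get(type);
        if (factory == null) {
            throw new IllegalArgumentException("unknown crawler type: " + type);
        }
        Crawler crawler = factory.get();
        AtomicBoolean stop = new AtomicBoolean(false);
        int id = nextJobId.incrementAndGet();
        stopFlags.put(id, stop);
        jobs.put(id, pool.submit(() -> {
            crawler.crawl(configName, stop);
            return null;
        }));
        return id;
    }

    // Stop: ask the crawler to finish at its next check of the flag.
    void stop(int id) {
        AtomicBoolean flag = stopFlags.get(id);
        if (flag != null) flag.set(true);
    }

    // Abort: cancel the job and interrupt the crawler thread.
    void abort(int id) {
        Future<?> job = jobs.get(id);
        if (job != null) job.cancel(true);
    }

    boolean isRunning(int id) {
        Future<?> job = jobs.get(id);
        return job != null && !job.isDone();
    }

    // Wait up to timeoutMillis for a job to end; returns true if it has ended.
    boolean awaitDone(int id, long timeoutMillis) {
        Future<?> job = jobs.get(id);
        if (job == null) return false;
        try {
            job.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (CancellationException | ExecutionException e) {
            // Aborted or failed: the job is over either way.
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

Creating a new instance from a per-type factory on every <code>start</code> call is what allows two crawlers of the same type (for example, two file system crawls with different configurations) to run side by side without sharing state.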