SMILA/Documentation/Filesystem Crawler
Revision as of 03:23, 19 March 2009 by Eliseyev.softaria.com
What does FileSystemCrawler do
The FileSystemCrawler recursively collects all files and folders, starting from a given directory. In addition to the content of files, it can gather any of the following file meta information:
- size
- full path
- file name only
- file size
- last modified date
- file content
- file extension
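The recursive collection described above can be illustrated with a minimal Java sketch. It is not the SMILA implementation: the class `FileWalkSketch`, the holder type `FileInfo`, and the helper `extensionOf` are hypothetical names introduced here, and the traversal relies only on the standard `java.nio.file` API.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

class FileWalkSketch {

    /** Hypothetical holder for the attributes listed above; not a SMILA type. */
    static final class FileInfo {
        final String fullPath;
        final String fileName;
        final String extension;
        final long size;
        final long lastModifiedMillis;

        FileInfo(String fullPath, String fileName, String extension,
                 long size, long lastModifiedMillis) {
            this.fullPath = fullPath;
            this.fileName = fileName;
            this.extension = extension;
            this.size = size;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /** Returns the text after the last dot, or "" if the name has no extension. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return (dot > 0 && dot < fileName.length() - 1)
                ? fileName.substring(dot + 1)
                : "";
    }

    /** Walks the tree below root recursively and describes every regular file. */
    static List<FileInfo> collect(Path root) throws IOException {
        final List<FileInfo> result = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    String name = file.getFileName().toString();
                    result.add(new FileInfo(
                            file.toAbsolutePath().toString(),
                            name,
                            extensionOf(name),
                            attrs.size(),
                            attrs.lastModifiedTime().toMillis()));
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return result;
    }
}
```

Calling `FileWalkSketch.collect(Paths.get("some/start/directory"))` would return one `FileInfo` per regular file found below that directory. The file content is left out of the sketch; it could be read separately, for example with `Files.readAllBytes`.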
Crawling configuration
The configuration file can be found at configuration/org.eclipse.smila.framework/file.
Crawling configuration explanation
The root element of the crawling configuration is CrawlJob. It contains the following sub-elements:
- DataSourceID – the identification of a data source
- SchemaID – specifies the schema for a crawler job
- DataConnectionID – describes which Agent or Crawler should be used
- Crawler – implementation class of a Crawler
- Agent – implementation class of an Agent
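To make the element list above more concrete, here is a small Java sketch that reads these elements from a configuration string using the standard JAXP DOM parser. Only the element names are taken from the list. The sample values, the nesting of Crawler inside DataConnectionID, and the names `CrawlJobSketch`, `SAMPLE_XML`, `parseRoot` and `firstText` are assumptions made for illustration, so the real configuration file may look different.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class CrawlJobSketch {

    // Hypothetical configuration: the element names come from the list above,
    // but the values and the nesting are illustrative assumptions.
    static final String SAMPLE_XML =
            "<CrawlJob>"
          + "<DataSourceID>example-source</DataSourceID>"
          + "<SchemaID>example.schema.id</SchemaID>"
          + "<DataConnectionID>"
          + "<Crawler>ExampleFileSystemCrawler</Crawler>"
          + "</DataConnectionID>"
          + "</CrawlJob>";

    /** Parses the XML text and returns its root element. */
    static Element parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    /**
     * Returns the trimmed text of the first descendant element with the
     * given tag name, or null if no such element exists.
     */
    static String firstText(Element parent, String tagName) {
        NodeList nodes = parent.getElementsByTagName(tagName);
        return nodes.getLength() == 0
                ? null
                : nodes.item(0).getTextContent().trim();
    }
}
```

With this sample, `firstText(parseRoot(SAMPLE_XML), "DataSourceID")` yields `example-source`, and looking up `Agent` yields `null` because the sample configures a Crawler only. Searching by tag name keeps the lookup independent of how deeply an element is nested.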