Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.
Difference between revisions of "SMILA/Documentation/Filesystem Crawler"
< SMILA | Documentation
m |
m |
||
Line 28: | Line 28: | ||
[[Category:SMILA]] | [[Category:SMILA]] | ||
+ | [[Category:Crawler]] |
Revision as of 03:23, 19 March 2009
Contents
What does FileSystemCrawler do
The FileSystemCrawler collects all files and folders recursively starting from a given directory. Next do the content of files it may gather any file meta information from the following list:
- size
- full path
- file name only
- file size
- last modified date
- file content
- file extension
Crawling configuration
The configuration file can be found at configuration/org.eclipse.smila.framework/file.
Crawling Configuration explanation
The root element of crawling configuration is CrawlJob and contains the following sub elements:
- DataSourceID – the identification of a data source
- SchemaID – specifies the schema for a crawler job
- DataConnectionID – describes which agent crawler should be used
- Crawler – implementation class of a Crawler
- Agent – implementation class of an Agent