Revision as of 03:21, 19 March 2009
What does FileSystemCrawler do
The FileSystemCrawler collects all files and folders recursively, starting from a given directory. In addition to the file content, it can gather file metadata from the following list:
- file size
- full path
- file name only
- last modified date
- file content
- file extension
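As an illustration only, a configuration might select some of these properties and map them to record attributes. The element and attribute names below are assumptions for the sketch, not taken verbatim from the SMILA configuration schema:

```xml
<!-- Hypothetical sketch: maps file properties to record attributes.
     Element names (Attributes, Attribute, FileAttributes) are
     illustrative assumptions, not the official SMILA schema. -->
<Attributes>
  <Attribute Type="String" Name="Filename">
    <FileAttributes>Name</FileAttributes>
  </Attribute>
  <Attribute Type="Date" Name="LastModifiedDate">
    <FileAttributes>LastModifiedDate</FileAttributes>
  </Attribute>
</Attributes>
```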
Crawling configuration
The configuration file can be found at configuration/org.eclipse.smila.framework/file.
Crawling Configuration explanation
The root element of the crawling configuration is CrawlJob; it contains the following sub-elements:
- DataSourceID – the identification of a data source
- SchemaID – specifies the schema for a crawler job
- DataConnectionID – describes which agent or crawler should be used
  - Crawler – implementation class of a Crawler
  - Agent – implementation class of an Agent
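Putting the elements above together, a minimal CrawlJob configuration could look like the following. This is a sketch only: the element nesting follows the list above, but the concrete values (the data source ID, schema ID, and crawler class name) are assumptions for illustration:

```xml
<!-- Hypothetical sketch of a CrawlJob configuration.
     DataSourceID, SchemaID, and the Crawler value are illustrative
     assumptions; consult the actual schema for valid values. -->
<CrawlJob>
  <DataSourceID>file</DataSourceID>
  <SchemaID>org.eclipse.smila.connectivity.framework.crawler.filesystem</SchemaID>
  <DataConnectionID>
    <!-- Exactly one of Crawler or Agent is given here -->
    <Crawler>FileSystemCrawler</Crawler>
  </DataConnectionID>
</CrawlJob>
```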