SMILA/Documentation/Filesystem Crawler
Revision as of 03:25, 19 March 2009
What does the FileSystemCrawler do
The FileSystemCrawler collects all files and folders recursively, starting from a given directory. In addition to the content of each file, it can gather the following file metadata:
- file size
- full path
- file name
- last modified date
- file content
- file extension
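To illustrate what "recursive collection plus metadata" means in practice, the following sketch walks a directory tree and extracts the same kinds of metadata using plain java.nio.file. It is illustrative only and does not use the SMILA crawler API; the class and method names are made up for this example.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

// Illustrative only: gathers the same kinds of file metadata the crawler
// collects (name, path, size, last modified date, extension), using plain
// java.nio.file rather than the SMILA API.
public class FileMetadataExample {

    public static String describe(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String extension = dot >= 0 ? name.substring(dot + 1) : "";
        return String.format("name=%s path=%s size=%d lastModified=%s extension=%s",
                name, file.toAbsolutePath(), attrs.size(), attrs.lastModifiedTime(), extension);
    }

    public static void main(String[] args) throws IOException {
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        // Visit all files recursively, as the crawler does from its start directory.
        try (var paths = Files.walk(start)) {
            paths.filter(Files::isRegularFile)
                 .forEach(p -> {
                     try {
                         System.out.println(describe(p));
                     } catch (IOException e) {
                         System.err.println("skipped " + p + ": " + e);
                     }
                 });
        }
    }
}
```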
Crawling configuration
The configuration file can be found at configuration/org.eclipse.smila.framework/file.
Crawling Configuration explanation
The root element of the crawling configuration is CrawlJob; it contains the following sub-elements:
- DataSourceID – the identification of a data source
- SchemaID – specifies the schema for a crawler job
- DataConnectionID – describes which data source connection (agent or crawler) should be used
  - Crawler – implementation class of a Crawler
  - Agent – implementation class of an Agent
- CompoundHandling – specifies whether packed data (e.g. a ZIP file containing other files) should be unpacked and the contained files crawled (YES or NO).
- Attributes – lists all attributes that describe a file (LastModifiedDate, Filename, Path, Content, Extension, Size).
  - Attribute
    - Type (required) – the data type (String, Integer or Date).
    - Name (required) – the attribute's name.
    - HashAttribute – specifies whether a hash should be created for this attribute (true or false).
    - KeyAttribute – specifies whether this attribute is used as a key for the object, e.g. for the record ID (true or false).
    - Attachment – specifies whether the attribute's data is returned as an attachment of the record (true or false).
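Putting the elements above together, a crawl job configuration might look roughly like the following sketch. This is an assumption-laden illustration, not a verified SMILA configuration: all values are placeholders, and whether Type, Name, HashAttribute, KeyAttribute, and Attachment appear as XML attributes (as shown here) or as nested elements should be checked against the configuration file shipped with SMILA.

```xml
<!-- Illustrative sketch only; all values are placeholders. -->
<CrawlJob>
  <DataSourceID>myFileSystemSource</DataSourceID>
  <SchemaID>myCrawlerSchema</SchemaID>
  <DataConnectionID>
    <!-- A crawler is used here; an Agent element could appear instead. -->
    <Crawler>MyFileSystemCrawler</Crawler>
  </DataConnectionID>
  <CompoundHandling>No</CompoundHandling>
  <Attributes>
    <Attribute Type="Date" Name="LastModifiedDate" HashAttribute="true"/>
    <Attribute Type="String" Name="Filename"/>
    <Attribute Type="String" Name="Path" KeyAttribute="true"/>
    <Attribute Type="String" Name="Content" Attachment="true"/>
    <Attribute Type="String" Name="Extension"/>
    <Attribute Type="Integer" Name="Size"/>
  </Attributes>
</CrawlJob>
```

Here Path is marked as the key attribute because a file's full path uniquely identifies it within one data source, and Content is delivered as an attachment rather than as an inline value.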