SMILA/Documentation/Filesystem Crawler

What does the FileSystemCrawler do?

The FileSystemCrawler collects all files and folders recursively, starting from a given directory. Besides the content of files, it can gather the following file metadata:

full path
file name only
file size
last modified date
file content
file extension
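The recursive walk described above can be illustrated with a short sketch. It is a minimal Python approximation, not SMILA's Java implementation: the function name `collect` and the dictionary keys (`path`, `name`, `size`, `last_modified`, `extension`, `content`, `is_dir`) are hypothetical and do not reflect SMILA's actual record attribute names.

```python
from datetime import datetime, timezone
from pathlib import Path


def collect(base_dir, read_content=True):
    """Recursively yield one metadata dict per file and folder under base_dir.

    Illustrative sketch only; the key names are hypothetical.
    """
    base = Path(base_dir)
    # rglob("*") visits every file and folder below base_dir, at any depth.
    for path in sorted(base.rglob("*")):
        stat = path.stat()
        info = {
            "path": str(path.resolve()),  # full path
            "name": path.name,            # file name only
            "is_dir": path.is_dir(),
            # last modified date, as a timezone-aware UTC datetime
            "last_modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        }
        if path.is_file():
            info["size"] = stat.st_size                     # file size in bytes
            info["extension"] = path.suffix.lstrip(".")     # e.g. "txt"
            if read_content:
                info["content"] = path.read_bytes()         # raw file content
        yield info
```

Folders are reported as well, but only regular files get size, extension and content entries in this sketch.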

Crawling configuration

The configuration file can be found at configuration/org.eclipse.smila.framework/file.

Crawling Configuration explanation

The root element of the crawling configuration is CrawlJob, which contains the following sub-elements:

DataSourceID – the identification of a data source.
SchemaID – specifies the schema for a crawler job.
DataConnectionID – describes which agent or crawler should be used.
- Crawler – implementation class of a Crawler.
- Agent – implementation class of an Agent.
CompoundHandling – specifies whether packed data (such as a zip archive containing files) should be unpacked and the files within it crawled (YES or NO).
Attributes – lists all attributes that describe a file (LastModifiedDate, Filename, Path, Content, Extension, Size).
- Attribute
  - Type (required) – the data type (String, Integer or Date).
  - Name (required) – the attribute's name.
  - HashAttribute – specifies whether a hash should be created (true or false).
  - KeyAttribute – specifies whether the attribute is used to create a key for this object, for example the record ID (true or false).
  - Attachment – specifies whether the attribute returns its data as an attachment of the record.
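How the HashAttribute, KeyAttribute and Attachment flags might shape a record can be sketched as follows. This is a hypothetical model: the `Attribute` class, `build_record`, the record layout, and the use of SHA-256 over the flagged values are all assumptions for illustration; the page does not say how SMILA actually computes keys or hashes.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Attribute:
    # Hypothetical model of one Attribute element from the crawl configuration.
    name: str
    type: str                   # "String", "Integer" or "Date"
    hash_attribute: bool = False
    key_attribute: bool = False
    attachment: bool = False


def _digest(values):
    # Assumption: SHA-256 over the string form of each value.
    h = hashlib.sha256()
    for v in values:
        h.update(str(v).encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()


def build_record(metadata, attributes):
    """Build a record dict from file metadata according to the attribute flags."""
    record = {"attributes": {}, "attachments": {}}
    for attr in attributes:
        # Attachment=true routes the value to the record's attachments.
        target = "attachments" if attr.attachment else "attributes"
        record[target][attr.name] = metadata[attr.name]
    # KeyAttribute values identify the record; HashAttribute values detect changes.
    record["id"] = _digest(metadata[a.name] for a in attributes if a.key_attribute)
    record["hash"] = _digest(metadata[a.name] for a in attributes if a.hash_attribute)
    return record
```

Keeping the key and the hash separate means a file keeps the same ID when its modification date changes, while the hash changes and can signal that the file needs re-indexing.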

Process – contains parameters for gathering data.
- BaseDir – the directory where the crawling process begins (if it is null, cannot be found or accessed, or is not a directory, a CrawlerCriticalException is thrown).
- Filter – selects the file types to crawl and the crawling mode.
  - Recursive – whether subdirectories are crawled as well (true or false).
  - CaseSensitive – whether filter matching is case-sensitive (true or false).
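The Process parameters can be sketched as a BaseDir check followed by a filtered walk. This Python sketch only mirrors the described behaviour: `CrawlerCriticalException` here is a local stand-in for the Java exception named above, and `check_base_dir`, `crawl` and the glob-style `patterns` list are hypothetical, since the page does not show the Filter's actual syntax.

```python
import fnmatch
import os
from pathlib import Path


class CrawlerCriticalException(Exception):
    """Local stand-in for the exception named in the configuration docs."""


def check_base_dir(base_dir):
    # Mirrors the documented failure cases: null, not found/accessible, not a directory.
    if base_dir is None:
        raise CrawlerCriticalException("BaseDir is not set")
    path = Path(base_dir)
    if not path.exists():
        raise CrawlerCriticalException(f"BaseDir {base_dir} cannot be found")
    if not path.is_dir():
        raise CrawlerCriticalException(f"BaseDir {base_dir} is not a directory")
    if not os.access(path, os.R_OK | os.X_OK):
        raise CrawlerCriticalException(f"BaseDir {base_dir} cannot be accessed")
    return path


def matches(name, patterns, case_sensitive):
    # fnmatchcase never folds case, so lower-case both sides when insensitive.
    if not case_sensitive:
        name = name.lower()
        patterns = [p.lower() for p in patterns]
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)


def crawl(base_dir, patterns, recursive=True, case_sensitive=False):
    """Return the files under base_dir whose names match one of the patterns."""
    base = check_base_dir(base_dir)
    # Recursive=false restricts the crawl to the direct children of BaseDir.
    candidates = base.rglob("*") if recursive else base.iterdir()
    return sorted(
        p for p in candidates
        if p.is_file() and matches(p.name, patterns, case_sensitive)
    )
```

For example, `crawl("/data", ["*.txt"], recursive=False)` would list only the `.txt` files directly inside `/data`, matching `A.TXT` as well because case sensitivity is off by default in this sketch.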

Notice: This Wiki is now read only and edits are no longer possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.
