
Difference between revisions of "SMILA/Documentation/Filesystem Crawler"


Revision as of 02:21, 19 March 2009

What does FileSystemCrawler do

The FileSystemCrawler recursively collects all files and folders, starting from a given directory. In addition to the content of the files, it can gather any of the following file meta information:

  • size
  • full path
  • file name only
  • file size
  • last modified date
  • file content
  • file extension
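
The recursive collection described above can be illustrated with a short sketch. This is not the FileSystemCrawler implementation; it is a minimal Python illustration of the same idea, assuming a hypothetical function name `crawl` and hypothetical dictionary keys. As a simplification, folders are traversed but only files are reported.

```python
import os
from datetime import datetime, timezone


def crawl(root_dir):
    """Yield one metadata dict per file found under root_dir (recursive).

    Illustrative only: the function name and the key names are assumptions,
    not identifiers taken from SMILA.
    """
    for folder, _subfolders, file_names in os.walk(root_dir):
        for name in sorted(file_names):
            full_path = os.path.join(folder, name)
            info = os.stat(full_path)
            # Read the raw bytes; a real crawler might stream large files instead.
            with open(full_path, "rb") as handle:
                content = handle.read()
            yield {
                "path": full_path,
                "file_name": name,
                "file_extension": os.path.splitext(name)[1].lstrip("."),
                "file_size": info.st_size,
                "last_modified": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ),
                "content": content,
            }
```

Calling `list(crawl("/some/start/directory"))` would return one record per file below that directory, each carrying the attributes listed above.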

Crawling configuration

The configuration file can be found at configuration/org.eclipse.smila.framework/file.

Crawling Configuration explanation

The root element of the crawling configuration is CrawlJob. It contains the following sub-elements:

  • DataSourceID – the identification of a data source
  • SchemaID – specifies the schema for a crawler job
  • DataConnectionID – specifies which Agent or Crawler should be used
    • Crawler – implementation class of a Crawler
    • Agent – implementation class of an Agent
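
The element structure listed above can be sketched by building a skeleton document with Python's standard `xml.etree.ElementTree` module. Only the element names and their nesting come from the list; the helper name `build_crawl_job` and all values are hypothetical placeholders, and any namespaces, attributes, or further sub-elements the real configuration file may require are not shown. The text does not say whether Crawler and Agent may appear together, so both are optional here.

```python
import xml.etree.ElementTree as ET


def build_crawl_job(data_source_id, schema_id, crawler_class=None, agent_class=None):
    """Build a CrawlJob element tree from the sub-elements named in the text.

    Hypothetical helper for illustration; it is not part of SMILA.
    """
    job = ET.Element("CrawlJob")
    ET.SubElement(job, "DataSourceID").text = data_source_id
    ET.SubElement(job, "SchemaID").text = schema_id
    connection = ET.SubElement(job, "DataConnectionID")
    # Crawler and Agent are nested below DataConnectionID, as in the list above.
    if crawler_class is not None:
        ET.SubElement(connection, "Crawler").text = crawler_class
    if agent_class is not None:
        ET.SubElement(connection, "Agent").text = agent_class
    return job


# All values below are placeholders, not real SMILA identifiers.
job = build_crawl_job(
    data_source_id="example-source",
    schema_id="example.schema",
    crawler_class="ExampleCrawlerClass",
)
xml_text = ET.tostring(job, encoding="unicode")
```

Printing `xml_text` shows the resulting skeleton, which can serve as a checklist when reading the actual configuration file.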