SMILA/Documentation/HowTo/How to implement a crawler

This page explains how to implement a crawler and add its functionality to SMILA.

Prepare bundle and manifest

  • Create a new bundle that will contain your crawler. Follow the instructions on How to create a bundle and name the project org.eclipse.smila.connectivity.framework.crawler.
  • Edit the manifest file and add the following packages to the Import-Package section.
    • org.eclipse.smila.connectivity.framework;version="0.5.0"
    • org.eclipse.smila.connectivity.framework.indexorder;version="0.5.0"
    • org.eclipse.smila.connectivity.framework.indexorder.messages.interfaces;version="0.5.0"
    • org.eclipse.smila.connectivity.framework.indexorder.schematools;version="0.5.0"
    • org.eclipse.smila.connectivity.framework.utils;version="0.5.0"
    • org.eclipse.smila.datamodel.record;version="0.5.0"
    • com.sun.xml.bind.v2;version="2.1.6"
    • javax.xml.bind;version="2.1.0"
    • javax.xml.bind.annotation;version="2.1.0"
    • javax.xml.bind.annotation.adapters;version="2.1.0"
    • javax.xml.stream;version="1.0.0"
    • org.apache.commons.logging;version="1.1.1"
  • The manifest should now look like this:
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Name: Filesystem Crawler Plug-in (Incubation)
Bundle-SymbolicName: org.eclipse.smila.connectivity.framework.crawler.filesystem;singleton:=true
Bundle-Version: 0.5.0
Bundle-Vendor: empolis GmbH and brox IT Solutions GmbH
Import-Package: com.sun.xml.bind.v2;version="2.1.6",
 javax.xml.bind;version="2.1.0",
 javax.xml.bind.annotation;version="2.1.0",
 javax.xml.bind.annotation.adapters;version="2.1.0",
 javax.xml.stream;version="1.0.1",
 org.apache.commons.io;version="1.4.0",
 org.apache.commons.logging;version="1.1.1",
 org.eclipse.smila.connectivity.framework;version="0.5.0",
 org.eclipse.smila.connectivity.framework.indexorder;version="0.5.0",
 org.eclipse.smila.connectivity.framework.indexorder.messages;version="0.5.0",
 org.eclipse.smila.connectivity.framework.indexorder.messages.interfaces;version="0.5.0",
 org.eclipse.smila.connectivity.framework.indexorder.schematools;version="0.5.0",
 org.eclipse.smila.connectivity.framework.utils;version="0.5.0",
 org.eclipse.smila.datamodel.record;version="0.5.0",
 org.eclipse.smila.utils.digest;version="0.5.0"
Export-Package: org.eclipse.smila.connectivity.framework.crawler.filesystem;version="0.5.0",
 org.eclipse.smila.connectivity.framework.crawler.filesystem.messages;version="0.5.0"
Service-Component: OSGI-INF/filesystemcrawler.xml
Eclipse-LazyStart: false
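
A quick way to confirm that the manifest contains the packages listed above is to parse it and list the Import-Package clauses. The following is only a minimal sketch using the JDK class java.util.jar.Manifest; the class name ManifestImportCheck and its method are hypothetical helpers, not part of SMILA.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Manifest;

/** Hypothetical helper (not part of SMILA): lists the packages of an Import-Package header. */
class ManifestImportCheck {

  /** Parses manifest text and returns the imported package names without their attributes. */
  static List<String> importedPackages(String manifestText) {
    String header;
    try {
      Manifest manifest = new Manifest(
          new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
      header = manifest.getMainAttributes().getValue("Import-Package");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    List<String> packages = new ArrayList<>();
    if (header == null) {
      return packages;
    }
    // Split on commas outside double quotes so version ranges like "[1.0,2.0)" stay intact.
    StringBuilder clause = new StringBuilder();
    boolean quoted = false;
    for (char c : header.toCharArray()) {
      if (c == '"') {
        quoted = !quoted;
      }
      if (c == ',' && !quoted) {
        packages.add(packageName(clause.toString()));
        clause.setLength(0);
      } else {
        clause.append(c);
      }
    }
    if (clause.length() > 0) {
      packages.add(packageName(clause.toString()));
    }
    return packages;
  }

  /** Strips attributes such as ;version="0.5.0" from a single import clause. */
  private static String packageName(String clause) {
    int semicolon = clause.indexOf(';');
    return (semicolon < 0 ? clause : clause.substring(0, semicolon)).trim();
  }

  public static void main(String[] args) {
    // The manifest text must end with a line break, otherwise parsing fails.
    String manifest = "Manifest-Version: 1.0\n"
        + "Import-Package: org.apache.commons.logging;version=\"1.1.1\",\n"
        + " org.eclipse.smila.datamodel.record;version=\"0.5.0\"\n";
    System.out.println(importedPackages(manifest));
  }
}
```

Comparing the returned list with the packages you intended to import makes a missing or misspelled entry easy to spot before the bundle fails to resolve at runtime.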

Prepare Indexorder schema and classes

NOTE: The information in this section is not completely up to date. For details about schema definition and compilation, look at the source code of the crawlers that come with SMILA (e.g. the filesystem crawler in bundle org.eclipse.smila.connectivity.framework.crawler.filesystem) and use it as a template.

  • Add the code/gen folder to the source folders (build.properties : source.. = code/src/,code/gen/):
    • Right-click your bundle and click New > Source Folder.
    • Enter "code/gen" as the folder name.
  • Copy the content of the folder template-crawler into your crawler bundle folder.
  • Compile schema into JAXB classes by running ant.
  • Implement XSD schema for the crawler configuration using the template schemas\TemplateIndexOrder.xsd.
    • The Index Order configuration is based on an XSD redefinition of the abstract "RootIndexOrderConfiguration" schema.
    • Redefine the "Process" and "Attribute" nodes to hold the crawler-specific information.
<xs:schema elementFormDefault="qualified" attributeFormDefault="unqualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="../../org.eclipse.smila.connectivity.framework.indexorder/schemas/RootIndexOrderConfiguration.xsd">
    <xs:complexType name="Process">
      <xs:annotation>
        <xs:documentation>Process Specification</xs:documentation>
      </xs:annotation>
      <xs:complexContent>
        <xs:extension base="Process">
          <!-- define process here -->
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
    <xs:complexType name="Attribute">
      <xs:complexContent>
        <xs:extension base="Attribute">
          <!-- define attribute here -->
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
</xs:schema>
  • Rename and edit JAXB mapping file used for generating configuration classes (TemplateIndexOrder.jxb).
    • Update package name
  <jxb:package name="org.eclipse.smila.connectivity.framework.crawler.MYCRAWLER.messages"/>
    • Update schema location
<jxb:bindings schemaLocation="TemplateIndexOrder.xsd"
  • Add a schema location reference in the plug-in implementation (return "schemas/TemplateIndexOrder.xsd").
    • Create a new class (IndexOrderSchemaImpl) which implements the interface IndexOrderSchema.
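    • Implement the method String getSchemaLocation() so that it returns the path of your schema file, e.g. "schemas/TemplateIndexOrder.xsd" (or the new file name if you renamed the schema).
    • Implement the method String getMessagesPackage() so that it returns the package name "org.eclipse.smila.connectivity.framework.crawler.MYCRAWLER.messages".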
package org.eclipse.smila.connectivity.framework.crawler.MYCRAWLER;
 
import org.eclipse.smila.connectivity.framework.indexorder.IndexOrderSchema;
 
/**
 * The Class IndexOrderSchemaImpl.
 */
public class IndexOrderSchemaImpl implements IndexOrderSchema {
 
  /**
   * {@inheritDoc}
   * 
   * @see org.eclipse.smila.connectivity.framework.indexorder.IndexOrderSchema#getSchemaLocation()
   */
  public String getSchemaLocation() {
    return "schemas/MYCRAWLER_IndexOrder.xsd";
  }
 
  /**
   * {@inheritDoc}
   * 
   * @see org.eclipse.smila.connectivity.framework.indexorder.IndexOrderSchema#getMessagesPackage()
   */
  public String getMessagesPackage() {
    return "org.eclipse.smila.connectivity.framework.crawler.MYCRAWLER.messages";
  }
 
 
}
  • Implement the extension for org.eclipse.smila.connectivity.framework.indexorder.schema with the bundle name used as ID and NAME.
  • Check the schema classes in the file plugin.xml and change if necessary.
<plugin>
   <extension
         id="org.eclipse.smila.connectivity.framework.crawler.filesystem"
         name="org.eclipse.smila.connectivity.framework.crawler.filesystem"
         point="org.eclipse.smila.connectivity.framework.indexorder.schema">
      <schema
            class="org.eclipse.smila.connectivity.framework.crawler.filesystem.IndexOrderSchemaImpl">
      </schema>
   </extension>
</plugin>

Note: If you rename the schema file name, make sure to update the following locations:

  • Plug-in implementation classes
  • TemplateIndexOrder.jxb (rename this file as well so that it has the same name as the schema)
  • build.xml

OSGi and Declarative Service requirements

  • It is not required to implement a BundleActivator.
  • Create the top level folder OSGI-INF.
  • Create a Component Description file in OSGI-INF. You can name the file as you like, but it is good practice to name it after the crawler. In this file, provide a unique component name, which should be the crawler's class name followed by DS (for Declarative Service), your implementation class, and the service interface class, which is always org.eclipse.smila.connectivity.framework.Crawler. Here is an example for the FileSystemCrawler:
<?xml version="1.0" encoding="UTF-8"?>
<component name="FileSystemCrawlerDS" immediate="true">
    <implementation class="org.eclipse.smila.connectivity.framework.crawler.filesystem.FileSystemCrawler" />
    <service>
         <provide interface="org.eclipse.smila.connectivity.framework.Crawler"/>
    </service>
    <property name="org.eclipse.smila.connectivity.framework.crawler.type" value="filesystemcrawler"/>
</component>
  • Add a Service-Component entry to your manifest file, e.g.:
Service-Component: OSGI-INF/filesystemcrawler.xml
  • Open build.properties and change the binary build: Add the folders OSGI-INF and schemas as well as the file plugin.xml.
source.. = code/src/,\
           code/gen/
output.. = code/bin/
bin.includes = META-INF/,\
               .,\
               plugin.xml,\
               schemas/,\
               OSGI-INF/
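
The requirements on the Component Description file can be checked with a few lines of code. The following is a minimal sketch that uses only the JDK's DOM parser; the class name ComponentDescriptionCheck is a hypothetical helper, and the check covers only the three items described above (component name, implementation class, provided Crawler interface).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of SMILA): sanity-checks a component description. */
class ComponentDescriptionCheck {

  static final String CRAWLER_INTERFACE = "org.eclipse.smila.connectivity.framework.Crawler";

  /** Returns true if the XML has a named component, one implementation class and the Crawler service. */
  static boolean providesCrawlerService(String componentXml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(componentXml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Component description cannot be parsed", e);
    }
    Element root = doc.getDocumentElement();
    if (!"component".equals(root.getTagName()) || root.getAttribute("name").isEmpty()) {
      return false;
    }
    NodeList implementations = root.getElementsByTagName("implementation");
    if (implementations.getLength() != 1
        || ((Element) implementations.item(0)).getAttribute("class").isEmpty()) {
      return false;
    }
    NodeList provides = root.getElementsByTagName("provide");
    for (int i = 0; i < provides.getLength(); i++) {
      if (CRAWLER_INTERFACE.equals(((Element) provides.item(i)).getAttribute("interface"))) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String xml = "<component name=\"FileSystemCrawlerDS\" immediate=\"true\">"
        + "<implementation class=\"org.eclipse.smila.connectivity.framework.crawler.filesystem.FileSystemCrawler\"/>"
        + "<service><provide interface=\"" + CRAWLER_INTERFACE + "\"/></service>"
        + "</component>";
    System.out.println(providesCrawlerService(xml));
  }
}
```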

Develop your crawler

  • Implement your crawler in a new class extending org.eclipse.smila.connectivity.framework.AbstractCrawler.
  • Create a JUnit test bundle for this crawler, e.g. org.eclipse.smila.connectivity.framework.crawler.filesystem.test.
  • Integrate your new crawler bundle into the build process: Refer to the page How to integrate new bundle into build process for further instructions.
  • Integrate your test bundle into the build process: Refer to the page How to integrate test bundle into build process for further instructions.
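
This page does not describe the methods of AbstractCrawler, so the following sketch deliberately does not reference them. It only illustrates the kind of traversal logic a filesystem-style crawler might wrap: collecting the files below a root folder and handing them out in batches. The class name FileBatchLister and the batchSize parameter are hypothetical; for the actual crawler contract, use the filesystem crawler's source code as a reference.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of SMILA): lists files below a folder in fixed-size batches. */
class FileBatchLister {

  /** Returns all regular files below root, sorted by path, in batches of at most batchSize. */
  static List<List<Path>> listInBatches(Path root, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<Path> files;
    // Files.walk returns a stream that must be closed, hence try-with-resources.
    try (Stream<Path> walk = Files.walk(root)) {
      files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    List<List<Path>> batches = new ArrayList<>();
    for (int start = 0; start < files.size(); start += batchSize) {
      int end = Math.min(start + batchSize, files.size());
      batches.add(new ArrayList<>(files.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    for (List<Path> batch : listInBatches(root, 10)) {
      System.out.println(batch.size() + " file(s), first: " + batch.get(0));
    }
  }
}
```

Logic of this kind is also a convenient target for the JUnit test bundle, because it can be tested against a temporary folder without starting SMILA.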

Run your crawler

Running SMILA in Eclipse

  • Open the Run dialog, switch to the configuration page of your bundle, set the parameter Default Start level to 4, and the parameter Default Auto-Start to true.
  • Launch SMILA.launch.

Running SMILA application

  • Add the entry org.eclipse.smila.connectivity.framework.crawler.filesystem@4:start, \ to the bundle list in the config.ini file.
  • Launch SMILA by calling either SMILA.exe or eclipse.exe -console.
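
To confirm that the crawler bundle is listed with the intended start level, the bundle list in config.ini can be parsed. The following is a minimal sketch, assuming the entries are stored under the Equinox property osgi.bundles (this page does not name the key, so treat that as an assumption). The class name BundleStartLevels is a hypothetical helper. java.util.Properties handles the trailing backslash as a line continuation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper (not part of SMILA): reads auto-started bundles from config.ini text. */
class BundleStartLevels {

  private static final String START_SUFFIX = ":start";

  /** Maps bundle names to start levels for entries of the form name@level:start. */
  static Map<String, Integer> autoStarted(String configIni) {
    Properties properties = new Properties();
    try {
      properties.load(new StringReader(configIni));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Map<String, Integer> result = new LinkedHashMap<>();
    // Assumption: the bundle list is stored under the key osgi.bundles.
    for (String entry : properties.getProperty("osgi.bundles", "").split(",")) {
      String trimmed = entry.trim();
      int at = trimmed.lastIndexOf('@');
      if (at < 0 || !trimmed.endsWith(START_SUFFIX)) {
        continue; // no explicit start level or not auto-started
      }
      String level = trimmed.substring(at + 1, trimmed.length() - START_SUFFIX.length());
      try {
        result.put(trimmed.substring(0, at), Integer.valueOf(level));
      } catch (NumberFormatException e) {
        // Entries without a numeric level are skipped.
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String configIni = "osgi.bundles=org.example.other@2:start, \\\n"
        + "  org.eclipse.smila.connectivity.framework.crawler.filesystem@4:start\n";
    System.out.println(autoStarted(configIni));
  }
}
```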
