Difference between revisions of "SMILA/Documentation/HowTo/How to implement a crawler"

(General review on content and style)

Revision as of 08:31, 6 October 2008

This page explains how to implement a crawler and add its functionality to SMILA.

Java implementations

  • Launch Eclipse, create a new project using the Plug-in Project wizard, and select "Equinox" as the OSGi framework.
  • Name the project using the prefix org.eclipse.smila.connectivity.framework.crawler.
  • Edit the manifest file and add the following packages to the Import-Package section:
    • org.eclipse.smila.connectivity.framework
    • org.eclipse.smila.connectivity.framework.utils
    • org.eclipse.smila.datamodel.record
    • org.osoa.sca.annotations
    • com.sun.xml.bind.v2;version="2.1.6"
    • javax.xml.bind;version="2.1.0"
    • javax.xml.bind.annotation;version="2.1.0"
    • javax.xml.bind.annotation.adapters;version="2.1.0"
    • javax.xml.stream;version="1.0.0"
    • org.apache.commons.logging;version="1.1.1"
  • Edit the manifest file and add the following bundle to the Require-Bundle section. Note that this bundle must be imported using Require-Bundle rather than Import-Package:
    • org.eclipse.smila.connectivity.framework.indexorder
  • The manifest should now look like this:
Require-Bundle: org.eclipse.smila.connectivity.framework.indexorder
Import-Package: com.sun.xml.bind.v2;version="2.1.6",
 javax.xml.bind;version="2.1.0",
 javax.xml.bind.annotation;version="2.1.0",
 javax.xml.bind.annotation.adapters;version="2.1.0",
 javax.xml.stream;version="1.0.0",
 org.apache.commons.io;version="1.4.0",
 org.apache.commons.logging;version="1.1.1",
 org.eclipse.smila.connectivity.framework,
 org.eclipse.smila.connectivity.framework.utils,
 org.eclipse.smila.datamodel.record,
 org.osoa.sca.annotations
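The manifest edits above are easy to get subtly wrong, so a quick check can help. The following is only a minimal sketch, assuming the manifest text is available as a string; the class name ManifestCheck and its helper methods are hypothetical and not part of SMILA. It uses the standard java.util.jar.Manifest parser and a naive comma split, which works for simple version attributes like the ones shown here but would not handle version ranges that contain commas.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.Manifest;

// Hypothetical helper, not part of SMILA: reports entries missing from a manifest header.
class ManifestCheck {

  /** Parses manifest text; continuation lines must start with a single space. */
  static Manifest parse(String text) {
    try {
      return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the names from 'required' that do not appear in the given header. */
  static List<String> missing(Manifest manifest, String header, List<String> required) {
    String value = manifest.getMainAttributes().getValue(header);
    List<String> present = new ArrayList<>();
    if (value != null) {
      // Naive split: assumes no commas inside attribute values (no version ranges).
      for (String clause : value.split(",")) {
        // Keep only the name, dropping attributes such as ;version="1.1.1".
        present.add(clause.split(";")[0].trim());
      }
    }
    List<String> result = new ArrayList<>(required);
    result.removeAll(present);
    return result;
  }

  public static void main(String[] args) {
    // Shortened manifest for illustration; a real one lists all packages named above.
    Manifest manifest = parse(
        "Manifest-Version: 1.0\n"
        + "Require-Bundle: org.eclipse.smila.connectivity.framework.indexorder\n"
        + "Import-Package: org.apache.commons.logging;version=\"1.1.1\",\n"
        + " org.eclipse.smila.connectivity.framework,\n"
        + " org.eclipse.smila.connectivity.framework.utils\n");

    System.out.println("Missing imports: " + missing(manifest, "Import-Package",
        Arrays.asList("org.eclipse.smila.connectivity.framework",
            "org.eclipse.smila.datamodel.record")));
    System.out.println("Missing required bundles: " + missing(manifest, "Require-Bundle",
        Arrays.asList("org.eclipse.smila.connectivity.framework.indexorder")));
  }
}
```

With the shortened manifest in main, the sketch reports org.eclipse.smila.datamodel.record as a missing import, because that package was deliberately left out of the sample text.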
  • Compile schema into JAXB classes (see main section on the top).
  • Add the code/gen folder to the source folders (in build.properties: source.. = code/src/,code/gen/):
    • Right-click your bundle and click New > Source Folder.
    • Enter "code/gen" as the folder name.
  • Copy the content of the folder schema-compile-runtime\schema-pattern\ into your crawler bundle folder.
  • Compile the schema into JAXB classes by running schema.cmd. The schema-compile-runtime folder must be located at the projects level because it is used for compilation.
    • Run schema.cmd from a command console to see the result or any error messages.
  • Implement the XSD schema for the crawler configuration using the template schemas\TemplateIndexOrder.xsd.
    • The index order configuration is based on an XSD redefinition of the abstract "RootIndexOrderConfiguration" schema.
    • Redefine the "Process" and "Attribute" nodes to hold the crawler-specific information.
<xs:schema elementFormDefault="qualified" attributeFormDefault="unqualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="RootIndexOrderConfiguration.xsd">
    <xs:complexType name="Process">
      <xs:annotation>
        <xs:documentation>Process Specification</xs:documentation>
      </xs:annotation>
      <xs:complexContent>
        <xs:extension base="Process">
          <!-- define process here -->
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
    <xs:complexType name="Attribute">
      <xs:complexContent>
        <xs:extension base="Attribute">
          <!-- define attribute here -->
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
</xs:schema>
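To confirm that a schema file really redefines both nodes, the names of the redefined complex types can be listed with the JDK's DOM parser. This is a rough sketch only: the class RedefinedTypes is hypothetical, the schema string in main is a shortened stand-in for the template above, and the code merely inspects the XML structure. It does not validate the schema or resolve RootIndexOrderConfiguration.xsd.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of SMILA: lists complex types redefined in an XSD.
class RedefinedTypes {

  static final String XS = "http://www.w3.org/2001/XMLSchema";

  /** Returns the names of complexType elements that are direct children of xs:redefine. */
  static List<String> redefinedTypeNames(String xsd) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // needed to match elements by namespace
      Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xsd)));
      List<String> names = new ArrayList<>();
      NodeList redefines = doc.getElementsByTagNameNS(XS, "redefine");
      for (int i = 0; i < redefines.getLength(); i++) {
        Element redefine = (Element) redefines.item(i);
        NodeList types = redefine.getElementsByTagNameNS(XS, "complexType");
        for (int j = 0; j < types.getLength(); j++) {
          Element type = (Element) types.item(j);
          // Skip nested anonymous types; only direct children are redefinitions.
          if (type.getParentNode() == redefine) {
            names.add(type.getAttribute("name"));
          }
        }
      }
      return names;
    } catch (Exception e) {
      throw new IllegalArgumentException("Schema could not be parsed", e);
    }
  }

  public static void main(String[] args) {
    String xsd =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:redefine schemaLocation=\"RootIndexOrderConfiguration.xsd\">"
        + "<xs:complexType name=\"Process\"><xs:complexContent>"
        + "<xs:extension base=\"Process\"/></xs:complexContent></xs:complexType>"
        + "<xs:complexType name=\"Attribute\"><xs:complexContent>"
        + "<xs:extension base=\"Attribute\"/></xs:complexContent></xs:complexType>"
        + "</xs:redefine></xs:schema>";
    System.out.println(redefinedTypeNames(xsd)); // prints [Process, Attribute]
  }
}
```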
  • Add the schema location reference to the plug-in implementation:
    • Create a new class (IndexOrderSchemaImpl) that implements the interface IndexOrderSchema.
    • Implement the method String getSchemaLocation() so that it returns the path of your schema file, which is "schemas/TemplateIndexOrder.xsd" unless you renamed it. The file system crawler example that follows uses a renamed schema file:
package org.eclipse.smila.connectivity.framework.crawler.filesystem;
 
import org.eclipse.smila.connectivity.framework.indexorder.IndexOrderSchema;
 
/**
 * The Class IndexOrderSchemaImpl.
 */
public class IndexOrderSchemaImpl implements IndexOrderSchema {
 
  /**
   * {@inheritDoc}
   * 
   * @see org.eclipse.smila.connectivity.framework.indexorder.IndexOrderSchema#getSchemaLocation()
   */
  public String getSchemaLocation() {
    return "schemas/filesystemIndexOrder.xsd";
  }
}
  • Implement an extension for "org.eclipse.smila.connectivity.framework.indexorder.schema", using the bundle name as both ID and name.
    • In plugin.xml, check the schema class and change it if necessary.
<plugin>
   <extension
         id="org.eclipse.smila.connectivity.framework.crawler.filesystem"
         name="org.eclipse.smila.connectivity.framework.crawler.filesystem"
         point="org.eclipse.smila.connectivity.framework.indexorder.schema">
      <schema
            class="org.eclipse.smila.connectivity.framework.crawler.filesystem.IndexOrderSchemaImpl">
      </schema>
   </extension>
</plugin>
  • Note: If you rename the schema file, make sure to update the following locations:
    • plug-in implementation
    • TemplateIndexOrder.jxb (this file should also be renamed to match the schema name)
    • schema.cmd

OSGi and Declarative Service requirements

  • You do not need to implement a BundleActivator. This may change if SCA nodes are used, in which case it may be necessary to register the crawler.
  • Create a top-level folder named OSGI-INF.
  • Create a Component Description file in OSGI-INF. You can name the file as you like, but it is good practice to name it after the crawler. In this file, provide a unique component name, which should be the crawler's name followed by DS (for Declarative Service). Then provide your implementation class and the service interface class, which is always org.eclipse.smila.connectivity.framework.Crawler.

filesystemcrawler.xml

<?xml version="1.0" encoding="UTF-8"?>
<component name="FileSystemCrawlerDS" immediate="true">
    <implementation class="org.eclipse.smila.connectivity.framework.crawler.filesystem.FileSystemCrawler" />
    <service>
         <provide interface="org.eclipse.smila.connectivity.framework.Crawler"/>
    </service>
    <property name="org.eclipse.smila.connectivity.framework.crawler.type" value="filesystemcrawler"/>
</component>
  • Add a Service-Component entry to your manifest file, for example:
Service-Component: OSGI-INF/filesystemcrawler.xml
  • Open build.properties and adjust the binary build: select the folders OSGI-INF and schemas and the file plugin.xml.
source.. = code/src/,\
           code/gen/
output.. = code/bin/
bin.includes = META-INF/,\
               .,\
               plugin.xml,\
               schemas/,\
               OSGI-INF/
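If a crawler is not found at runtime, one thing worth ruling out is a missing entry in bin.includes. The following sketch, built around the hypothetical class BuildPropertiesCheck, loads the build.properties content with java.util.Properties (which handles the backslash line continuations) and reports required entries that are absent. It assumes the file content is available as a string and is meant as an illustration rather than a build tool.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of SMILA: checks bin.includes in build.properties.
class BuildPropertiesCheck {

  /** Returns the entries from 'required' that are not listed under bin.includes. */
  static List<String> missingBinIncludes(String buildProperties, List<String> required) {
    Properties props = new Properties();
    try {
      // Properties.load joins lines ending in a backslash and strips the
      // leading whitespace of each continuation line.
      props.load(new StringReader(buildProperties));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    List<String> present = new ArrayList<>();
    for (String entry : props.getProperty("bin.includes", "").split(",")) {
      present.add(entry.trim());
    }
    List<String> missing = new ArrayList<>(required);
    missing.removeAll(present);
    return missing;
  }

  public static void main(String[] args) {
    String content =
        "source.. = code/src/,\\\n"
        + "           code/gen/\n"
        + "output.. = code/bin/\n"
        + "bin.includes = META-INF/,\\\n"
        + "               .,\\\n"
        + "               plugin.xml,\\\n"
        + "               schemas/,\\\n"
        + "               OSGI-INF/\n";
    // An empty list means all three required entries are part of the binary build.
    System.out.println(missingBinIncludes(content,
        Arrays.asList("OSGI-INF/", "schemas/", "plugin.xml")));
  }
}
```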

SCA requirements

Most requirements for SCA are already handled by the base class org.eclipse.smila.connectivity.framework.AbstractCrawler. You should annotate your implementation with @AllowsPassByReference to allow SCA to pass service parameters by reference when service interactions take place within the same address space. Here is an example:

@AllowsPassByReference
public class FileSystemCrawler extends AbstractCrawler {
  // implementation goes here ...
}

For SCA, make sure to set the attribute "immediate" of the tag "component" to false and the attribute "servicefactory" of the tag "service" to true in your XML file. This is required to let SCA create ServiceReferences dynamically. Finally, you have to provide a value for the property org.eclipse.smila.connectivity.framework.crawler.type. SCA uses this value to find the correct OSGi Declarative Service at runtime. The value has to be unique (it is used in SCA contribution files) and should be named after the crawler. Here is an example:

filesystemcrawler.xml

<?xml version="1.0" encoding="UTF-8"?>
<component name="FileSystemCrawlerDS" immediate="false">
    <implementation class="org.eclipse.smila.connectivity.framework.crawler.filesystem.FileSystemCrawler" />
    <service servicefactory="true">
         <provide interface="org.eclipse.smila.connectivity.framework.Crawler"/>
    </service>
    <property name="org.eclipse.smila.connectivity.framework.crawler.type" value="filesystemcrawler"/>
</component>
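
The three SCA settings described above can be checked mechanically. The sketch below is an illustration under stated assumptions: the class ComponentDescriptionCheck and its messages are hypothetical, and it only inspects the attributes named in this section using the JDK's DOM parser. It does not validate the file against the Declarative Services schema.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of SMILA: checks the SCA-related settings
// of a component description file.
class ComponentDescriptionCheck {

  static final String TYPE_PROPERTY = "org.eclipse.smila.connectivity.framework.crawler.type";

  /** Returns a list of problems; an empty list means all three settings are present. */
  static List<String> scaProblems(String xml) {
    Element component;
    try {
      component = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    } catch (Exception e) {
      throw new IllegalArgumentException("Not a readable component description", e);
    }
    List<String> problems = new ArrayList<>();
    if (!"false".equals(component.getAttribute("immediate"))) {
      problems.add("attribute immediate of component should be false");
    }
    NodeList services = component.getElementsByTagName("service");
    if (services.getLength() == 0
        || !"true".equals(((Element) services.item(0)).getAttribute("servicefactory"))) {
      problems.add("attribute servicefactory of service should be true");
    }
    boolean hasType = false;
    NodeList properties = component.getElementsByTagName("property");
    for (int i = 0; i < properties.getLength(); i++) {
      Element property = (Element) properties.item(i);
      if (TYPE_PROPERTY.equals(property.getAttribute("name"))
          && !property.getAttribute("value").isEmpty()) {
        hasType = true;
      }
    }
    if (!hasType) {
      problems.add("property " + TYPE_PROPERTY + " is missing or empty");
    }
    return problems;
  }

  public static void main(String[] args) {
    String xml =
        "<component name=\"FileSystemCrawlerDS\" immediate=\"false\">"
        + "<implementation class=\"org.eclipse.smila.connectivity.framework.crawler.filesystem.FileSystemCrawler\"/>"
        + "<service servicefactory=\"true\">"
        + "<provide interface=\"org.eclipse.smila.connectivity.framework.Crawler\"/>"
        + "</service>"
        + "<property name=\"" + TYPE_PROPERTY + "\" value=\"filesystemcrawler\"/>"
        + "</component>";
    System.out.println(scaProblems(xml)); // prints []
  }
}
```

Running the same check against the non-SCA variant of filesystemcrawler.xml (immediate="true", no servicefactory attribute) would report two problems, which makes the difference between the two files explicit.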

Develop your crawler

  • Implement your crawler in a new class extending org.eclipse.smila.connectivity.framework.AbstractCrawler.
  • Create a JUnit test bundle for this crawler e.g. org.eclipse.smila.connectivity.framework.crawler.filesystem.test.
  • Integrate your new crawler bundle into the build process: Refer to the page How to integrate new bundle into build process for further instructions.
  • Integrate your test bundle into the build process: Refer to the page How to integrate test bundle into build process for further instructions.
  • Start your crawler using one of the following launch options:
    • SMILA.exe: Insert org.eclipse.smila.connectivity.framework.crawler.filesystem@4:start, \ into the config.ini file.
    • SMILA.launch: Open the Run dialog, switch to the configuration page of your bundle, set the parameter Default Start level to 4, and the parameter Default Auto-Start to true.
  • Enjoy
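
The config.ini entry from the launch step above packs three pieces of information into one token: the bundle name, the start level, and the auto-start flag. The sketch below shows how such an entry breaks down. It assumes the bundle@level:start form shown in that step (plus the variants without a level or without the start flag), and the class StartEntry is a hypothetical illustration, not the parser Equinox itself uses.

```java
// Hypothetical helper, not part of SMILA or Equinox: splits a config.ini
// bundle entry of the form name@level:start into its parts.
class StartEntry {

  final String bundle;
  final int startLevel;    // -1 when the entry does not specify a level
  final boolean autoStart;

  StartEntry(String bundle, int startLevel, boolean autoStart) {
    this.bundle = bundle;
    this.startLevel = startLevel;
    this.autoStart = autoStart;
  }

  static StartEntry parse(String entry) {
    String s = entry.trim();
    // Drop the line continuation and the list separator, if present.
    if (s.endsWith("\\")) {
      s = s.substring(0, s.length() - 1).trim();
    }
    if (s.endsWith(",")) {
      s = s.substring(0, s.length() - 1).trim();
    }
    int at = s.lastIndexOf('@');
    if (at < 0) {
      return new StartEntry(s, -1, false); // plain bundle name
    }
    String name = s.substring(0, at);
    String options = s.substring(at + 1);
    boolean start = false;
    if (options.equals("start")) {
      start = true;
      options = "";
    } else if (options.endsWith(":start")) {
      start = true;
      options = options.substring(0, options.length() - ":start".length());
    }
    int level = options.isEmpty() ? -1 : Integer.parseInt(options);
    return new StartEntry(name, level, start);
  }

  public static void main(String[] args) {
    StartEntry e = parse("org.eclipse.smila.connectivity.framework.crawler.filesystem@4:start, \\");
    System.out.println(e.bundle + " level=" + e.startLevel + " autoStart=" + e.autoStart);
  }
}
```

For the entry used in this guide, the sketch yields start level 4 with auto-start enabled, which corresponds to the Default Start level and Default Auto-Start settings of the SMILA.launch option.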