
SMILA/Development Guidelines/How to implement a Crawler

 
{{note|This is deprecated for SMILA 1.0: the connectivity framework is still functional but is intended to be replaced by scalable import based on SMILA's job management.}}
#REDIRECT [[SMILA/Development Guidelines/How to implement a crawler]]
 
This page explains how to implement a [[SMILA/Glossary#C|Crawler]] and [[SMILA/Howto integrate a component in SMILA|add its functionality]] to SMILA v1.0.

== Prepare bundle and manifest  ==

*Create a new bundle that will contain your crawler. Follow the instructions on [[SMILA/Development Guidelines/Create a bundle (plug-in)|How to create a bundle]]. In this sample we use <tt>myplugin.crawler.mock</tt> as the project name.
*For the crawler's JAXB code generation you need to '''import the SMILA.builder''' plug-in project into your workspace. You can check it out via SVN from [http://dev.eclipse.org/svnroot/rt/org.eclipse.smila/tags/0.9/core/SMILA.builder].

*Edit the plug-in configuration and add '''at least''' the following packages to the ''Import-Package'' section of the ''Dependencies'' tab:
**<tt>org.eclipse.smila.connectivity;version="1.0.0"</tt>
**<tt>org.eclipse.smila.connectivity.framework;version="1.0.0"</tt>
**<tt>org.eclipse.smila.connectivity.framework.performancecounters;version="1.0.0"</tt>
**<tt>org.eclipse.smila.connectivity.framework.schema;version="1.0.0"</tt>
**<tt>org.eclipse.smila.connectivity.framework.schema.config;version="1.0.0"</tt>
**<tt>org.eclipse.smila.connectivity.framework.schema.config.interfaces;version="1.0.0"</tt>
**<tt>org.eclipse.smila.connectivity.framework.util;version="1.0.0"</tt>
**<tt>org.eclipse.smila.datamodel;version="1.0.0"</tt>

*You will have to add further packages to implement your crawler's business logic.

*Your <tt>MANIFEST.MF</tt> file should now look like this; note that each continuation line of a header must start with a single space:
<source lang="text">
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Name: Mock Crawler
Bundle-SymbolicName: myplugin.crawler.mock
Bundle-Version: 1.0.0
Bundle-RequiredExecutionEnvironment: JavaSE-1.6
Import-Package: org.eclipse.smila.connectivity;version="1.0.0",
 org.eclipse.smila.connectivity.framework;version="1.0.0",
 org.eclipse.smila.connectivity.framework.performancecounters;version="1.0.0",
 org.eclipse.smila.connectivity.framework.schema;version="1.0.0",
 org.eclipse.smila.connectivity.framework.schema.config;version="1.0.0",
 org.eclipse.smila.connectivity.framework.schema.config.interfaces;version="1.0.0",
 org.eclipse.smila.connectivity.framework.util;version="1.0.0",
 org.eclipse.smila.datamodel;version="1.0.0"
</source>
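
To check that the manifest is well-formed before launching, you can parse it with the JDK's <tt>java.util.jar.Manifest</tt> class. It fails on a continuation line that lacks the leading space, because it then tries to read that line as a new <tt>Name: value</tt> header. The following is a minimal sketch that is not part of SMILA; the class name <tt>ManifestCheck</tt> is hypothetical, and the splitting of <tt>Import-Package</tt> assumes simple clauses like the ones above.

<source lang="java">
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Manifest;

// Hypothetical helper (not part of SMILA): sanity-checks MANIFEST.MF text.
final class ManifestCheck {

  private ManifestCheck() {
  }

  /** Parses manifest text; the JDK parser fails on a line that is neither a header nor a continuation. */
  static Manifest parse(String text) {
    try {
      return new Manifest(new ByteArrayInputStream(text.getBytes("UTF-8")));
    } catch (IOException e) {
      throw new IllegalArgumentException("Malformed manifest: " + e.getMessage(), e);
    }
  }

  /** Returns the package names listed in Import-Package, without their attributes. */
  static List<String> importedPackages(Manifest manifest) {
    List<String> packages = new ArrayList<String>();
    String header = manifest.getMainAttributes().getValue("Import-Package");
    if (header == null) {
      return packages;
    }
    // Split on commas outside double quotes, so quoted version ranges stay intact.
    for (String clause : header.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")) {
      packages.add(clause.split(";")[0].trim());
    }
    return packages;
  }
}
</source>

When you feed it the file contents, make sure the text ends with a line break, since the JDK parser does not accept an unterminated last line.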
 
== Prepare DataSourceConnect schema and classes  ==

*Create an additional source folder <tt>code/gen</tt> to hold the generated schema sources:
**Right-click your bundle and click ''New &gt; Source Folder''.
**Enter "code/gen" as the folder name.
**Edit <tt>build.properties</tt> and add the folder <tt>code/gen</tt> to the source folders:

<source lang="text">
source.. = code/src/,\
           code/gen/
output.. = code/bin/
</source>

<br>

*Create the schema definition:
**Create a folder <tt>schemas</tt> in your bundle.
**Create the file <tt>schemas/MockCrawlerSchema.xsd</tt> to hold the XSD schema for the crawler configuration, based on the abstract XSD schema "RootDataSourceConnectionConfigSchema".
**Therein you have to provide definitions of the "Process" and "Attribute" nodes for crawler-specific information.
**The following code snippet can be used as a template:

<source lang="xml">
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema elementFormDefault="qualified" attributeFormDefault="unqualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="../../org.eclipse.smila.connectivity.framework.schema/schemas/RootDataSourceConnectionConfigSchema.xsd">
    <xs:complexType name="Process">
      <xs:annotation>
        <xs:documentation>Process Specification</xs:documentation>
      </xs:annotation>
      <xs:complexContent>
        <xs:extension base="Process">

          <!-- define crawler specific process here -->

        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
    <xs:complexType name="Attribute">
      <xs:complexContent>
        <xs:extension base="Attribute">

          <!-- define crawler specific attributes here -->

        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
</xs:schema>
</source>

*Create the JAXB mapping:
**Create the file <tt>schemas/MockCrawlerSchema.jxb</tt> to hold the JAXB bindings used for generating the configuration classes.
**Here is an example of the <tt>MockCrawler</tt> JXB file you can use as a template; just change the "schemaLocation" and the "package name":

<source lang="xml">
<jxb:bindings version="1.0"
  xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <jxb:bindings schemaLocation="MockCrawlerSchema.xsd" node="/xs:schema">
    <jxb:schemaBindings>
      <jxb:package name="mypackage.crawler.mock.messages"/>
    </jxb:schemaBindings>
    <jxb:globalBindings>
      <jxb:javaType name="java.util.Date" xmlType="xs:dateTime"
        printMethod="org.eclipse.smila.connectivity.framework.schema.tools.SimpleDateFormatter.print"
        parseMethod="org.eclipse.smila.connectivity.framework.schema.tools.SimpleDateFormatter.parse"/>
      <jxb:javaType name="org.eclipse.smila.connectivity.framework.schema.config.MimeTypeAttributeType"
        xmlType="MimeTypeAttributeType"
        parseMethod="org.eclipse.smila.connectivity.framework.schema.config.MimeTypeAttributeType.fromValue"
        printMethod="org.eclipse.smila.connectivity.framework.schema.config.MimeTypeAttributeType.toValue"/>
      <jxb:serializable uid="1"/>
    </jxb:globalBindings>
  </jxb:bindings>
</jxb:bindings>
</source>
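
The <tt>javaType</tt> bindings above delegate value conversion to static methods: xjc expects the <tt>parseMethod</tt> to take the lexical <tt>String</tt> and return the Java type, and the <tt>printMethod</tt> to take the Java type and return a <tt>String</tt>. If your crawler configuration needs its own conversion, a class of that shape could look like the following sketch. The class <tt>IsoDateFormatter</tt> and its pattern are hypothetical; SMILA's <tt>SimpleDateFormatter</tt> is not reproduced here and may use a different format.

<source lang="java">
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical parse/print pair in the shape expected by <jxb:javaType>;
// it is NOT SMILA's SimpleDateFormatter.
final class IsoDateFormatter {

  // UTC timestamps with milliseconds and a literal 'Z' (works on Java 6).
  private static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";

  private IsoDateFormatter() {
  }

  /** SimpleDateFormat is not thread-safe, so a new instance is created per call. */
  private static SimpleDateFormat newFormat() {
    SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    return format;
  }

  /** printMethod: Java value to lexical form. */
  public static String print(Date value) {
    return value == null ? null : newFormat().format(value);
  }

  /** parseMethod: lexical form to Java value; accepts only the pattern printed above. */
  public static Date parse(String lexical) {
    if (lexical == null) {
      return null;
    }
    try {
      return newFormat().parse(lexical);
    } catch (ParseException e) {
      throw new IllegalArgumentException("Unsupported date value: " + lexical, e);
    }
  }
}
</source>

Such a class would be referenced from the binding file by its fully qualified name, e.g. <tt>printMethod="mypackage.crawler.mock.IsoDateFormatter.print"</tt> (package name hypothetical).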

<br>

*Add a schema location reference in the plug-in implementation:
**Create a new class (<tt>DataSourceConnectionConfigPluginImpl</tt>) that implements the interface <tt>DataSourceConnectionConfigPlugin</tt>.
**Implement the method <tt>String getSchemaLocation()</tt> to return "schemas/MockCrawlerSchema.xsd".
**Implement the method <tt>String getMessagesPackage()</tt> to return the package name "mypackage.crawler.mock.messages".

Here is an example implementation for the <tt>MockCrawler</tt> that you can use as a template:
<source lang="java">
package mypackage.crawler.mock;

import org.eclipse.smila.connectivity.framework.schema.DataSourceConnectionConfigPlugin;

/**
 * Tells the connectivity framework where to find the crawler's configuration schema
 * and the package of the JAXB classes generated from it.
 */
public class DataSourceConnectionConfigPluginImpl implements DataSourceConnectionConfigPlugin {

  /**
   * {@inheritDoc}
   *
   * @see org.eclipse.smila.connectivity.framework.schema.DataSourceConnectionConfigPlugin#getSchemaLocation()
   */
  public String getSchemaLocation() {
    // bundle-relative path of the XSD file created above
    return "schemas/MockCrawlerSchema.xsd";
  }

  /**
   * {@inheritDoc}
   *
   * @see org.eclipse.smila.connectivity.framework.schema.DataSourceConnectionConfigPlugin#getMessagesPackage()
   */
  public String getMessagesPackage() {
    // must match the package declared in MockCrawlerSchema.jxb
    return "mypackage.crawler.mock.messages";
  }
}
</source>

*Create a new file <tt>plugin.xml</tt>:
**Define the extension for <tt>org.eclipse.smila.connectivity.framework.schema.extension</tt>, using the bundle name as ID and NAME.
**Set the schema class to your implementation of the interface <tt>DataSourceConnectionConfigPlugin</tt>.
**Here is an example of the <tt>MockCrawler</tt> <tt>plugin.xml</tt> file you can use as a template:

<source lang="xml">
<plugin>
  <extension
        id="myplugin.crawler.mock"
        name="myplugin.crawler.mock"
        point="org.eclipse.smila.connectivity.framework.schema.extension">
      <schema
            class="mypackage.crawler.mock.DataSourceConnectionConfigPluginImpl">
      </schema>
  </extension>
</plugin>
</source>

<br>

*Compile the schema into JAXB classes using <tt>ant</tt>:
**See [[SMILA/Development Guidelines/Setup for JAXB code generation]] for instructions on how to set up the JAXB generation tools. '''''It is advisable to keep the <tt>lib</tt> folder outside the workspace, for example in the parent folder of the workspace.'''''
**Create a new file <tt>build.xml</tt> to hold the JAXB build information. Use the following template as its content and adjust the property value accordingly:

<source lang="xml">
<project name="sub-build" default="compile-schema-and-decorate" basedir=".">

  <property name="schema.name" value="MockCrawlerSchema" />

  <import file="../SMILA.builder/xjc/build.xml" />

</project>
</source>
**Run <tt>ant -Dlib.dir=../../lib</tt> from a command console in your bundle folder to generate the Java files or to see any error messages. This assumes that the JAXB and Ant libraries are located in a <tt>lib</tt> folder in the parent directory of the workspace.

<br> '''Note:''' If you rename the schema file, make sure to update the following locations:
*Plug-in implementation classes
*<tt>MockCrawlerSchema.jxb</tt> (it should also be renamed to match the schema)
*<tt>build.xml</tt>
 
== OSGi and Declarative Service requirements  ==

*It is not required to implement a BundleActivator.
*Create the top-level folder <tt>OSGI-INF</tt>.
*Create a component description file in <tt>OSGI-INF</tt>. You can name the file as you like, but it is good practice to name it after the crawler. In it you have to provide a unique component name, which should be the same as the crawler's class name. You also have to provide your implementation class and the service interface class, which is always <tt>org.eclipse.smila.connectivity.framework.Crawler</tt>. Here is an example of the <tt>MockCrawler</tt> component description file you can use as a template:

<source lang="xml">
<component name="MockCrawler" immediate="false" factory="CrawlerFactory">
    <implementation class="mypackage.crawler.mock.MockCrawler" />
    <service>
        <provide interface="org.eclipse.smila.connectivity.framework.Crawler"/>
    </service>
</component>
</source>

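Because the component name, the implementation class and the service interface must agree, a small check can catch mistakes such as a misspelled package in the <tt>class</tt> attribute. The following sketch uses only the JDK's DOM parser. The class <tt>ComponentDescriptionCheck</tt> is hypothetical, and it assumes un-prefixed element names as in the template above; descriptions that use an <tt>scr:</tt> namespace prefix would need a namespace-aware parser.

<source lang="java">
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical checker (not part of SMILA) for the conventions described above.
final class ComponentDescriptionCheck {

  static final String CRAWLER_INTERFACE = "org.eclipse.smila.connectivity.framework.Crawler";

  private ComponentDescriptionCheck() {
  }

  /** Returns the problems found; an empty list means the description follows the conventions. */
  static List<String> check(String xml, String expectedPackage) {
    Element component;
    try {
      component = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes("UTF-8"))).getDocumentElement();
    } catch (Exception e) {
      throw new IllegalArgumentException("Cannot parse component description", e);
    }
    List<String> problems = new ArrayList<String>();
    Element impl = (Element) component.getElementsByTagName("implementation").item(0);
    String implClass = impl == null ? "" : impl.getAttribute("class");
    // The component name should equal the crawler's simple class name.
    String simpleName = implClass.substring(implClass.lastIndexOf('.') + 1);
    if (!component.getAttribute("name").equals(simpleName)) {
      problems.add("component name does not match class name " + simpleName);
    }
    if (!implClass.startsWith(expectedPackage + ".")) {
      problems.add("implementation class " + implClass + " is not in package " + expectedPackage);
    }
    // The provided service interface is always the SMILA Crawler interface.
    Element provide = (Element) component.getElementsByTagName("provide").item(0);
    if (provide == null || !CRAWLER_INTERFACE.equals(provide.getAttribute("interface"))) {
      problems.add("service must provide " + CRAWLER_INTERFACE);
    }
    return problems;
  }
}
</source>
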
*Add a ''Service-Component'' entry to your manifest file, e.g.:
<pre>Service-Component: OSGI-INF/mockcrawler.xml
</pre>
*Open <tt>build.properties</tt> and change the binary build: add the folders <tt>OSGI-INF</tt> and <tt>schemas</tt> as well as the file <tt>plugin.xml</tt>.

<source lang="text">
bin.includes = META-INF/,\
               .,\
               plugin.xml,\
               schemas/,\
               OSGI-INF/
</source>

<br>
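
<tt>build.properties</tt> uses the Java properties syntax, in which a trailing backslash continues a value on the next line. Loading it with <tt>java.util.Properties</tt> is therefore a simple way to confirm that <tt>bin.includes</tt> really contains every entry listed above. This is a minimal sketch; the class <tt>BuildPropertiesCheck</tt> is hypothetical and not part of SMILA or PDE.

<source lang="java">
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper (not part of SMILA or PDE): checks bin.includes in build.properties.
final class BuildPropertiesCheck {

  /** Entries this page asks for in the binary build. */
  static final List<String> REQUIRED =
      Arrays.asList("META-INF/", ".", "plugin.xml", "schemas/", "OSGI-INF/");

  private BuildPropertiesCheck() {
  }

  /** Loads the text as Java properties (backslash continues a line) and splits bin.includes. */
  static Set<String> binIncludes(String text) {
    Properties properties = new Properties();
    try {
      properties.load(new StringReader(text));
    } catch (IOException e) {
      throw new IllegalArgumentException("Cannot read build.properties", e);
    }
    Set<String> entries = new LinkedHashSet<String>();
    for (String entry : properties.getProperty("bin.includes", "").split(",")) {
      if (entry.trim().length() > 0) {
        entries.add(entry.trim());
      }
    }
    return entries;
  }

  /** Returns the required entries that are missing, in the order of REQUIRED. */
  static List<String> missing(String text) {
    List<String> result = new ArrayList<String>(REQUIRED);
    result.removeAll(binIncludes(text));
    return result;
  }
}
</source>
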
 
== Implement your crawler  ==

*Implement your crawler in a new class extending <tt>org.eclipse.smila.connectivity.framework.AbstractCrawler</tt>.

*Integrate your new crawler bundle into the build process: Refer to the page [[SMILA/Development Guidelines/How to integrate new bundle into build process|How to integrate new bundle into build process]] for further instructions.

*Follow the example of [[SMILA/Component Examples/FileSystemCrawler|FileSystemCrawler]].

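The actual contract of <tt>AbstractCrawler</tt> is best taken from the FileSystemCrawler sources and is not reproduced here. Purely to illustrate the idea of a mock crawler that hands out predefined data in batches until it is exhausted, the following sketch uses a '''hypothetical''' stand-in interface <tt>BatchSource</tt>; its method names are assumptions for illustration and do not describe the SMILA API.

<source lang="java">
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in contract; NOT the SMILA Crawler/AbstractCrawler API. */
interface BatchSource {
  void initialize(List<String> ids);

  /** Returns the next batch of ids, or null when the source is exhausted. */
  List<String> getNext();

  void close();
}

/** Mock crawler that hands out predefined ids in fixed-size batches. */
class MockCrawlerSketch implements BatchSource {
  private final int batchSize;
  private List<String> ids;
  private int position;

  MockCrawlerSketch(int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.batchSize = batchSize;
  }

  public void initialize(List<String> ids) {
    this.ids = new ArrayList<String>(ids); // defensive copy of the mock data
    this.position = 0;
  }

  public List<String> getNext() {
    if (ids == null) {
      throw new IllegalStateException("not initialized or already closed");
    }
    if (position >= ids.size()) {
      return null; // no more data
    }
    int end = Math.min(position + batchSize, ids.size());
    List<String> batch = new ArrayList<String>(ids.subList(position, end));
    position = end;
    return batch;
  }

  public void close() {
    ids = null; // release the data; further getNext() calls fail
  }
}
</source>
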
[optional]

*Create a JUnit test bundle for this crawler, e.g. <tt>myplugin.crawler.mock.test</tt>.
*Integrate your test bundle into the build process: Refer to the page [[SMILA/Development Guidelines/How to integrate test bundle into build process|How to integrate test bundle into build process]] for further instructions.

== Activate your crawler  ==

=== Activation when running SMILA in Eclipse  ===

*Open the ''Run'' dialog, switch to the ''Bundles'' configuration page, select your bundle and set the parameter ''Default Auto-Start'' to ''true''.
*Launch <tt>SMILA.launch</tt>.

=== Activation in the SMILA application  ===

*Add your bundle, e.g. <tt>myplugin.crawler.mock@4:start</tt>, to the <tt>config.ini</tt> file.
*Launch SMILA by calling either <tt>SMILA.exe</tt> or <tt>eclipse.exe -console</tt>.

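In an Equinox <tt>config.ini</tt>, the bundles to install are listed in the comma-separated <tt>osgi.bundles</tt> property, and a suffix such as <tt>@4:start</tt> sets the start level and marks the bundle for auto-start. If you script the installation, a helper along the following lines could add the entry without creating duplicates. It is only a sketch: the class <tt>ConfigIniBundles</tt> is hypothetical and works on the property value alone, leaving reading and writing the file to you (rewriting the file with <tt>Properties.store</tt> would drop its comments and layout).

<source lang="java">
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: edits the value of the osgi.bundles property from config.ini.
final class ConfigIniBundles {

  private ConfigIniBundles() {
  }

  /** Name part of an entry such as "myplugin.crawler.mock@4:start". */
  private static String symbolicName(String entry) {
    int at = entry.indexOf('@');
    return (at < 0 ? entry : entry.substring(0, at)).trim();
  }

  /**
   * Appends the entry unless a bundle with the same name is already listed.
   * Returns the original value unchanged if the bundle is present; otherwise
   * returns all entries joined by "," with surrounding whitespace removed.
   */
  static String addBundle(String osgiBundles, String entry) {
    List<String> entries = new ArrayList<String>();
    if (osgiBundles != null) {
      for (String part : osgiBundles.split(",")) {
        String trimmed = part.trim();
        if (trimmed.length() == 0) {
          continue;
        }
        if (symbolicName(trimmed).equals(symbolicName(entry))) {
          return osgiBundles; // already listed, leave the value untouched
        }
        entries.add(trimmed);
      }
    }
    entries.add(entry.trim());
    StringBuilder result = new StringBuilder();
    for (String e : entries) {
      if (result.length() > 0) {
        result.append(',');
      }
      result.append(e);
    }
    return result.toString();
  }
}
</source>
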
== Run your crawler  ==

Information on how to start and run a crawler can be found in the [[SMILA/Documentation/CrawlerController|CrawlerController]] documentation.

[[Category:SMILA]]

Latest revision as of 11:37, 19 January 2012