
SMILA/Development Guidelines/How to implement a Crawler

 
This page explains how to implement a [[SMILA/Glossary#C|Crawler]] and [[SMILA/Howto integrate a component in SMILA|add its functionality]] to SMILA v0.9.
 
 
== Prepare bundle and manifest  ==
 
 
*Create a new bundle that will contain your crawler. Follow the instructions on [[SMILA/Development Guidelines/Create a bundle (plug-in)|How to create a bundle]]. In this sample we use the prefix <tt>myplugin.crawler.mock</tt> for the project name.
 
*For the crawler's JAXB code generation you need to '''import the SMILA.builder''' plug-in project into your workspace. You can check it out via SVN from http://dev.eclipse.org/svnroot/rt/org.eclipse.smila/tags/0.9/core/SMILA.builder.
 
 
*Edit the plug-in configuration and add '''at least''' the following packages to the ''Import-Package'' section of the ''Dependencies'' tab:
 
**<tt>org.eclipse.smila.connectivity;version="0.8.0"</tt>
 
**<tt>org.eclipse.smila.connectivity.framework;version="0.9.0"</tt>
 
**<tt>org.eclipse.smila.connectivity.framework.performancecounters;version="0.9.0"</tt>
 
**<tt>org.eclipse.smila.connectivity.framework.schema;version="0.8.0"</tt>
 
**<tt>org.eclipse.smila.connectivity.framework.schema.config;version="0.8.0"</tt>
 
**<tt>org.eclipse.smila.connectivity.framework.schema.config.interfaces;version="0.8.0"</tt>
 
**<tt>org.eclipse.smila.connectivity.framework.util;version="0.9.0"</tt>
 
**<tt>org.eclipse.smila.datamodel;version="0.9.0"</tt>
 
 
*You will have to import additional packages to implement your crawler's business logic.
 
 
*Your <tt>MANIFEST.MF</tt> file should now look like this:
 
<source lang="text">
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Name: Mock Crawler
Bundle-SymbolicName: myplugin.crawler.mock
Bundle-Version: 0.5.0
Bundle-RequiredExecutionEnvironment: JavaSE-1.6
Import-Package: org.eclipse.smila.connectivity;version="0.8.0",
 org.eclipse.smila.connectivity.framework;version="0.9.0",
 org.eclipse.smila.connectivity.framework.performancecounters;version="0.9.0",
 org.eclipse.smila.connectivity.framework.schema;version="0.8.0",
 org.eclipse.smila.connectivity.framework.schema.config;version="0.8.0",
 org.eclipse.smila.connectivity.framework.schema.config.interfaces;version="0.8.0",
 org.eclipse.smila.connectivity.framework.util;version="0.9.0",
 org.eclipse.smila.datamodel;version="0.9.0"
</source>
 
 
== Prepare DataSourceConnect schema and classes  ==
 
 
*Create an additional source folder <tt>code/gen</tt> to contain the generated schema sources:
**Right-click your bundle and click ''New &gt; Source Folder''.
**Enter "code/gen" as the folder name.
**Edit <tt>build.properties</tt> and add the folder <tt>code/gen</tt> to the source folders.
 
 
<source lang="text">
 
source.. = code/src/,\
 
          code/gen/
 
output.. = code/bin/
 
</source>
 
 
<br>
 
 
*Create the schema definition:
**Create a folder <tt>schemas</tt> in your bundle.
**Create the file <tt>schemas\MockCrawlerSchema.xsd</tt> to contain the XSD schema for the crawler configuration, based on the abstract XSD schema "RootDataSourceConnectionConfigSchema".
**Therein you have to provide definitions of the "Process" and "Attribute" nodes for crawler-specific information.
**The following code snippet can be used as a template:
 
 
<source lang="xml">
 
<?xml version="1.0" encoding="UTF-8"?>
 
<xs:schema elementFormDefault="qualified" attributeFormDefault="unqualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
  <xs:redefine schemaLocation="../../org.eclipse.smila.connectivity.framework.schema/schemas/RootDataSourceConnectionConfigSchema.xsd">
 
    <xs:complexType name="Process">
 
      <xs:annotation>
 
        <xs:documentation>Process Specification</xs:documentation>
 
      </xs:annotation>
 
      <xs:complexContent>
 
        <xs:extension base="Process">
 
 
          <!-- define the crawler-specific process elements here -->
 
 
        </xs:extension>
 
      </xs:complexContent>
 
    </xs:complexType>
 
    <xs:complexType name="Attribute">
 
      <xs:complexContent>
 
        <xs:extension base="Attribute">
 
 
          <!-- define the crawler-specific attributes here -->
 
 
        </xs:extension>
 
      </xs:complexContent>
 
    </xs:complexType>
 
  </xs:redefine>
 
</xs:schema>
 
</source>
 
 
*Create the JAXB mapping:
**Create the file <tt>schemas\MockCrawlerSchema.jxb</tt> to contain the JAXB mappings used for generating the configuration classes.
**Here is an example of the <tt>MockCrawler</tt> JXB file that you can use as a template. Just adjust the "schemaLocation" and the package name:
 
 
<source lang="xml">
 
<jxb:bindings version="1.0"
  xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
 
  <jxb:bindings schemaLocation="MockCrawlerSchema.xsd" node="/xs:schema">
 
    <jxb:schemaBindings>
 
      <jxb:package name="mypackage.crawler.mock.messages"/>
 
    </jxb:schemaBindings>   
 
    <jxb:globalBindings>
 
      <jxb:javaType name="java.util.Date" xmlType="xs:dateTime" printMethod="org.eclipse.smila.connectivity.framework.schema.tools.SimpleDateFormatter.print" parseMethod="org.eclipse.smila.connectivity.framework.schema.tools.SimpleDateFormatter.parse"/>
 
      <jxb:javaType name="org.eclipse.smila.connectivity.framework.schema.config.MimeTypeAttributeType" xmlType="MimeTypeAttributeType" parseMethod="org.eclipse.smila.connectivity.framework.schema.config.MimeTypeAttributeType.fromValue" printMethod="org.eclipse.smila.connectivity.framework.schema.config.MimeTypeAttributeType.toValue"/>
 
      <jxb:serializable uid="1"/>
 
    </jxb:globalBindings>
 
  </jxb:bindings>
 
</jxb:bindings>
 
</source>
 
 
<br>
 
 
*Add a schema location reference in the plug-in implementation:
**Create a new class (<tt>DataSourceConnectionConfigPluginImpl</tt>) which implements the interface <tt>DataSourceConnectionConfigPlugin</tt>.
**Implement the method <tt>String getSchemaLocation()</tt> to return "schemas/MockCrawlerSchema.xsd".
**Implement the method <tt>String getMessagesPackage()</tt> to return the package name "mypackage.crawler.mock.messages".
 
 
Here is an example implementation for the <tt>MockCrawler</tt> you can use as a template: <source lang="java">
 
package mypackage.crawler.mock;
 
 
import org.eclipse.smila.connectivity.framework.schema.DataSourceConnectionConfigPlugin;
 
 
/**
 
* The Class DataSourceConnectionConfigPluginImpl.
 
*/
 
public class DataSourceConnectionConfigPluginImpl implements DataSourceConnectionConfigPlugin {
 
 
  /**
 
  * {@inheritDoc}
 
  *
 
  * @see org.eclipse.smila.connectivity.framework.schema.DataSourceConnectionConfigPlugin#getSchemaLocation()
 
  */
 
  public String getSchemaLocation() {
 
    return "schemas/MockCrawlerSchema.xsd";
 
  }
 
 
  /**
 
  * {@inheritDoc}
 
  *
 
  * @see org.eclipse.smila.connectivity.framework.schema.DataSourceConnectionConfigPlugin#getMessagesPackage()
 
  */
 
  public String getMessagesPackage() {
 
    return "mypackage.crawler.mock.messages";
 
  }
 
 
}
 
</source>
 
 
*Create a new file <tt>plugin.xml</tt>:
**Define the extension for <tt>org.eclipse.smila.connectivity.framework.schema.extension</tt>, using the bundle name as ID and NAME.
**Set the schema class to your implementation of the interface <tt>DataSourceConnectionConfigPlugin</tt>.
**Here is an example of the <tt>plugin.xml</tt> file for the <tt>MockCrawler</tt> that you can use as a template:
 
 
<source lang="xml">
 
<plugin>
 
  <extension
 
        id="myplugin.crawler.mock"
 
        name="myplugin.crawler.mock"
 
        point="org.eclipse.smila.connectivity.framework.schema.extension">
 
      <schema
 
            class="mypackage.crawler.mock.DataSourceConnectionConfigPluginImpl">
 
      </schema>
 
  </extension>
 
</plugin>
 
</source>
 
 
<br>
 
 
*Compile the schema into JAXB classes by using <tt>ant</tt>:
**See [[SMILA/Development Guidelines/Setup for JAXB code generation]] for instructions on how to set up the JAXB generation tools. '''''It is advisable to keep the <tt>lib</tt> folder outside the workspace, for example next to the workspace folder.'''''
**Create a new file <tt>build.xml</tt> to contain the JAXB build information. Use the following template as the content of <tt>build.xml</tt> and adjust the property value accordingly:
 
 
<source lang="xml">
 
<project name="sub-build" default="compile-schema-and-decorate" basedir=".">
 
 
  <property name="schema.name"  value="MockCrawlerSchema" />
 
 
 
  <import file="../SMILA.builder/xjc/build.xml" />
 
 
 
</project>
 
</source>
 
**Run <tt>ant -Dlib.dir=../../lib</tt> from a command console in the bundle folder to generate the Java files and to see any error messages.
 
Note that this path assumes that the JAXB and Ant libraries are located in a <tt>lib</tt> folder next to the workspace folder, that is, two levels above the bundle folder.
 
<br> '''Note:''' If you rename the schema file, make sure to update the following locations:
 
*Plug-in implementation classes
 
*<tt>MockCrawlerSchema.jxb</tt> (this file should also be renamed to match the schema file name)
 
*<tt>build.xml</tt>
 
 
== OSGi and Declarative Service requirements  ==
 
 
*It is not required to implement a BundleActivator.
 
*Create the top level folder <tt>OSGI-INF</tt>.
 
*Create a component description file in <tt>OSGI-INF</tt>. You can name the file as you like, but it is good practice to name it after the crawler. In this file you have to provide a unique component name, which should be the same as the crawler's class name. You also have to provide your implementation class and the service interface class, which is always <tt>org.eclipse.smila.connectivity.framework.Crawler</tt>. Here is an example of the <tt>MockCrawler</tt> component description file that you can use as a template:
 
 
<source lang="xml">
 
<component name="MockCrawler" immediate="false" factory="CrawlerFactory">
 
    <implementation class="mypackage.crawler.mock.MockCrawler" />
 
    <service>
 
        <provide interface="org.eclipse.smila.connectivity.framework.Crawler"/>
 
    </service>   
 
</component>
 
</source>
 
 
*Add a ''Service-Component'' entry to your manifest file, e.g.:
 
<pre>Service-Component: OSGI-INF/mockcrawler.xml
 
</pre>
 
*Open <tt>build.properties</tt> and extend the binary build: add the folders <tt>OSGI-INF</tt> and <tt>schemas</tt> as well as the file <tt>plugin.xml</tt>.
 
 
<source lang="text">
 
bin.includes = META-INF/,\
 
              .,\
 
              plugin.xml,\
 
              schemas/,\
 
              OSGI-INF/
 
</source>
 
 
<br>
 
 
== Implement your crawler  ==
 
 
*Implement your crawler in a new class extending <tt>org.eclipse.smila.connectivity.framework.AbstractCrawler</tt>.
 
 
*Integrate your new crawler bundle into the build process: Refer to the page [[SMILA/Development Guidelines/How to integrate new bundle into build process|How to integrate new bundle into build process]] for further instructions.
 
 
* Follow the example of [[SMILA/Component Examples/FileSystemCrawler|FileSystemCrawler]]
 
[optional]
 
 
*Create a JUnit test bundle for this crawler, e.g. <tt>myplugin.crawler.mock.test</tt>.
 
*Integrate your test bundle into the build process: Refer to the page [[SMILA/Development Guidelines/How to integrate test bundle into build process|How to integrate test bundle into build process]] for further instructions.
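
The general shape of a crawler, which hands out records in batches until the data source is exhausted, can be sketched without any SMILA dependency. The following is only an illustration under stated assumptions: <tt>BatchCrawler</tt>, its three methods, and the record IDs are hypothetical stand-ins, not the signatures of <tt>org.eclipse.smila.connectivity.framework.Crawler</tt> or <tt>AbstractCrawler</tt>. Check those classes and the FileSystemCrawler for the actual contract.

```java
import java.util.ArrayList;
import java.util.List;

/** Self-contained sketch; none of these types are SMILA classes. */
public class MockCrawlerSketch {

  /** Hypothetical stand-in for a crawler contract (assumption, not the SMILA interface). */
  interface BatchCrawler {
    void initialize(int totalRecords, int batchSize);

    /** Returns the next batch of record IDs, or null when nothing is left. */
    String[] getNext();

    void close();
  }

  /** Mock crawler that invents record IDs instead of reading a real data source. */
  static class MockCrawler implements BatchCrawler {
    private int total;
    private int batchSize;
    private int position;
    private boolean open;

    public void initialize(int totalRecords, int batchSize) {
      if (totalRecords < 0 || batchSize <= 0) {
        throw new IllegalArgumentException("totalRecords must be >= 0 and batchSize > 0");
      }
      this.total = totalRecords;
      this.batchSize = batchSize;
      this.position = 0;
      this.open = true;
    }

    public String[] getNext() {
      if (!open) {
        throw new IllegalStateException("crawler is not initialized or already closed");
      }
      if (position >= total) {
        return null; // data source exhausted
      }
      int end = Math.min(position + batchSize, total);
      String[] batch = new String[end - position];
      for (int i = position; i < end; i++) {
        batch[i - position] = "mock-record-" + i;
      }
      position = end;
      return batch;
    }

    public void close() {
      open = false;
    }
  }

  /** Drains a crawler batch by batch and always closes it afterwards. */
  static List<String> crawlAll(BatchCrawler crawler) {
    List<String> ids = new ArrayList<String>();
    try {
      String[] batch;
      while ((batch = crawler.getNext()) != null) {
        for (String id : batch) {
          ids.add(id);
        }
      }
    } finally {
      crawler.close();
    }
    return ids;
  }

  public static void main(String[] args) {
    MockCrawler crawler = new MockCrawler();
    crawler.initialize(5, 2);
    System.out.println(crawlAll(crawler));
  }
}
```

A mock of this kind is also a convenient starting point for the optional JUnit test bundle, because it needs no external data source.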
 
 
== Activate your crawler  ==
 
 
=== Activating SMILA in Eclipse  ===
 
 
*Open the ''Run'' dialog, switch to the configuration page of ''Bundles'', select your bundle and set the parameter ''Default Auto-Start'' to ''true''.
 
*Launch <tt>SMILA.launch</tt>.
 
 
=== Activating the SMILA application  ===
 
 
*Add your bundle, e.g. <tt>myplugin.crawler.mock@4:start</tt>, to the <tt>config.ini</tt> file.
 
*Launch SMILA by calling either <tt>SMILA.exe</tt> or <tt>eclipse.exe -console</tt>.
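
An entry such as <tt>myplugin.crawler.mock@4:start</tt> follows the Equinox bundle list syntax: the bundle name, optionally followed by <tt>@</tt>, a start level, and <tt>:start</tt> to start the bundle automatically. The following minimal sketch only illustrates how such an entry decomposes. <tt>BundleEntrySketch</tt> and its members are hypothetical names invented for this example and are not part of SMILA or Equinox.

```java
/** Illustrative parser for a single bundle entry of the form name[@[level][:start]]. */
public class BundleEntrySketch {

  /** Parsed parts of one entry; startLevel is null if the entry does not specify one. */
  static final class BundleEntry {
    final String name;
    final Integer startLevel;
    final boolean autoStart;

    BundleEntry(String name, Integer startLevel, boolean autoStart) {
      this.name = name;
      this.startLevel = startLevel;
      this.autoStart = autoStart;
    }
  }

  static BundleEntry parse(String entry) {
    String trimmed = entry.trim();
    int at = trimmed.indexOf('@');
    if (at < 0) {
      // No options: the bundle is installed but not started automatically.
      return new BundleEntry(trimmed, null, false);
    }
    String name = trimmed.substring(0, at);
    String options = trimmed.substring(at + 1);
    boolean autoStart = false;
    if (options.endsWith(":start")) {
      autoStart = true;
      options = options.substring(0, options.length() - ":start".length());
    } else if (options.equals("start")) {
      // Form "name@start": auto-start at the default start level.
      autoStart = true;
      options = "";
    }
    Integer level = options.isEmpty() ? null : Integer.valueOf(options);
    return new BundleEntry(name, level, autoStart);
  }

  public static void main(String[] args) {
    BundleEntry e = parse("myplugin.crawler.mock@4:start");
    System.out.println(e.name + " level=" + e.startLevel + " autoStart=" + e.autoStart);
  }
}
```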
 
 
== Run your crawler  ==
 
 
Information on how to start and run a crawler can be found in the [[SMILA/Documentation/CrawlerController|CrawlerController]] documentation.
 
 
[[Category:SMILA]]
 

Latest revision as of 11:37, 19 January 2012
