{{note|This page is deprecated as of SMILA 1.0. The connectivity framework is still functional, but it is intended to be replaced by a scalable import based on SMILA's job management.}}
This guide has moved to [[SMILA/Development Guidelines/How to implement a crawler]].

This page explains how to implement a [[SMILA/Glossary#C|Crawler]] and [[SMILA/Howto integrate a component in SMILA|add its functionality]] to SMILA v1.0.

== Prepare bundle and manifest ==

*Create a new bundle that will contain your crawler. Follow the instructions on [[SMILA/Development Guidelines/Create a bundle (plug-in)|How to create a bundle]]. In this sample we use <tt>myplugin.crawler.mock</tt> as the project name.
*For the crawler's JAXB code generation, you need to '''import the SMILA.builder''' plug-in project into your workspace. You can check it out from SVN at [http://dev.eclipse.org/svnroot/rt/org.eclipse.smila/tags/0.9/core/SMILA.builder http://dev.eclipse.org/svnroot/rt/org.eclipse.smila/tags/0.9/core/SMILA.builder].

*Edit the plug-in configuration and add '''at least''' the following packages to the ''Import-Package'' section of the ''Dependencies'' tab:
**<tt>org.eclipse.smila.connectivity;version="1.0.0"</tt>
**<tt>org.eclipse.smila.connectivity.framework;version="1.0.0"</tt>
**<tt>org.eclipse.smila.connectivity.framework.performancecounters;version="1.0.0"</tt>
**<tt>org.eclipse.smila.connectivity.framework.schema;version="1.0.0"</tt>
**<tt>org.eclipse.smila.connectivity.framework.schema.config;version="1.0.0"</tt>
**<tt>org.eclipse.smila.connectivity.framework.schema.config.interfaces;version="1.0.0"</tt>
**<tt>org.eclipse.smila.connectivity.framework.util;version="1.0.0"</tt>
**<tt>org.eclipse.smila.datamodel;version="1.0.0"</tt>

*You will have to add further packages to implement your crawler's business logic.

*Your <tt>MANIFEST.MF</tt> file should now look like this (each continuation line of the ''Import-Package'' header must start with a single space):
<source lang="text">
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Name: Mock Crawler
Bundle-SymbolicName: myplugin.crawler.mock
Bundle-Version: 1.0.0
Bundle-RequiredExecutionEnvironment: JavaSE-1.6
Import-Package: org.eclipse.smila.connectivity;version="1.0.0",
 org.eclipse.smila.connectivity.framework;version="1.0.0",
 org.eclipse.smila.connectivity.framework.performancecounters;version="1.0.0",
 org.eclipse.smila.connectivity.framework.schema;version="1.0.0",
 org.eclipse.smila.connectivity.framework.schema.config;version="1.0.0",
 org.eclipse.smila.connectivity.framework.schema.config.interfaces;version="1.0.0",
 org.eclipse.smila.connectivity.framework.util;version="1.0.0",
 org.eclipse.smila.datamodel;version="1.0.0"
</source>
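Because <tt>java.util.jar.Manifest</tt> is strict about header syntax and continuation lines, it can help to check the ''Import-Package'' header programmatically. The following is a minimal standalone sketch (plain Java 8+, not part of SMILA; the class name <tt>ManifestImportCheck</tt> is hypothetical) that reads a manifest and reports which of the required packages are missing. It assumes no version ranges such as <tt>[1.0,2.0)</tt>, whose inner comma would break the simple split:

<source lang="java">
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.Manifest;

/** Hypothetical standalone helper (not part of SMILA) that checks a bundle's Import-Package header. */
public class ManifestImportCheck {

  /** Returns the package names listed in Import-Package, without their attributes. */
  static List<String> importedPackages(String manifestText) {
    Manifest manifest;
    try {
      // Manifest joins continuation lines; each one must start with a single space.
      manifest = new Manifest(new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
    } catch (IOException e) {
      throw new IllegalArgumentException("Malformed manifest", e);
    }
    List<String> packages = new ArrayList<>();
    String header = manifest.getMainAttributes().getValue("Import-Package");
    if (header == null) {
      return packages;
    }
    // Assumption: no version ranges like "[1.0,2.0)" whose comma would split a clause.
    for (String clause : header.split(",")) {
      packages.add(clause.split(";")[0].trim());
    }
    return packages;
  }

  /** Returns the required packages that the manifest does not import. */
  static List<String> missingImports(String manifestText, List<String> required) {
    List<String> missing = new ArrayList<>(required);
    missing.removeAll(importedPackages(manifestText));
    return missing;
  }

  public static void main(String[] args) {
    String manifest = "Manifest-Version: 1.0\n"
        + "Bundle-SymbolicName: myplugin.crawler.mock\n"
        + "Import-Package: org.eclipse.smila.connectivity;version=\"1.0.0\",\n"
        + " org.eclipse.smila.datamodel;version=\"1.0.0\"\n";
    List<String> required = Arrays.asList(
        "org.eclipse.smila.connectivity",
        "org.eclipse.smila.connectivity.framework",
        "org.eclipse.smila.datamodel");
    // prints [org.eclipse.smila.connectivity.framework]
    System.out.println(missingImports(manifest, required));
  }
}
</source>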

== Prepare DataSourceConnect schema and classes ==

*Create an additional source folder <tt>code/gen</tt> to hold the generated schema sources:
**Right-click your bundle and click ''New > Source Folder''.
**Enter "code/gen" as the folder name.
**Edit <tt>build.properties</tt> and add the folder <tt>code/gen</tt> to the source folders:

<source lang="text">
source.. = code/src/,\
           code/gen/
output.. = code/bin/
</source>
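<tt>build.properties</tt> uses the Java properties file format, so a value is continued on the next line by a trailing backslash. As a minimal sketch (plain Java; the class <tt>BuildPropertiesSourceFolders</tt> is hypothetical), <tt>java.util.Properties</tt> can confirm that <tt>code/gen/</tt> really ends up in the <tt>source..</tt> entry:

<source lang="java">
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper that lists the source folders declared in a build.properties file. */
public class BuildPropertiesSourceFolders {

  static List<String> sourceFolders(String buildProperties) {
    Properties props = new Properties();
    try {
      // A trailing '\' joins the next line; that line's leading whitespace is skipped.
      props.load(new StringReader(buildProperties));
    } catch (IOException e) {
      throw new IllegalArgumentException(e);
    }
    List<String> folders = new ArrayList<>();
    for (String folder : props.getProperty("source..", "").split(",")) {
      if (!folder.trim().isEmpty()) {
        folders.add(folder.trim());
      }
    }
    return folders;
  }

  public static void main(String[] args) {
    String text = "source.. = code/src/,\\\n"
        + "           code/gen/\n"
        + "output.. = code/bin/\n";
    // prints [code/src/, code/gen/]
    System.out.println(sourceFolders(text));
  }
}
</source>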

<br>

*Create the schema definition:
**Create a folder <tt>schemas</tt> in your bundle.
**Create the file <tt>schemas/MockCrawlerSchema.xsd</tt> to contain the XSD schema for the crawler configuration, based on the abstract XSD schema "RootDataSourceConnectionConfigSchema".
**In it, you have to provide definitions of the "Process" and "Attribute" nodes for crawler-specific information.
**The following code snippet can be used as a template:

<source lang="xml">
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema elementFormDefault="qualified" attributeFormDefault="unqualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="../../org.eclipse.smila.connectivity.framework.schema/schemas/RootDataSourceConnectionConfigSchema.xsd">
    <xs:complexType name="Process">
      <xs:annotation>
        <xs:documentation>Process Specification</xs:documentation>
      </xs:annotation>
      <xs:complexContent>
        <xs:extension base="Process">

          <!-- define crawler-specific process here -->

        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
    <xs:complexType name="Attribute">
      <xs:complexContent>
        <xs:extension base="Attribute">

          <!-- define crawler-specific attributes here -->

        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
</xs:schema>
</source>

*Create the JAXB mapping:
**Create the file <tt>schemas/MockCrawlerSchema.jxb</tt> to contain the JAXB bindings used to generate the configuration classes.
**Here is an example <tt>.jxb</tt> file for the <tt>MockCrawler</tt> that you can use as a template; you only need to adapt the "schemaLocation" and the package name:

<source lang="xml">
<jxb:bindings version="1.0"
    xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <jxb:bindings schemaLocation="MockCrawlerSchema.xsd" node="/xs:schema">
    <jxb:schemaBindings>
      <jxb:package name="mypackage.crawler.mock.messages"/>
    </jxb:schemaBindings>
    <jxb:globalBindings>
      <jxb:javaType name="java.util.Date" xmlType="xs:dateTime"
          printMethod="org.eclipse.smila.connectivity.framework.schema.tools.SimpleDateFormatter.print"
          parseMethod="org.eclipse.smila.connectivity.framework.schema.tools.SimpleDateFormatter.parse"/>
      <jxb:javaType name="org.eclipse.smila.connectivity.framework.schema.config.MimeTypeAttributeType"
          xmlType="MimeTypeAttributeType"
          parseMethod="org.eclipse.smila.connectivity.framework.schema.config.MimeTypeAttributeType.fromValue"
          printMethod="org.eclipse.smila.connectivity.framework.schema.config.MimeTypeAttributeType.toValue"/>
      <jxb:serializable uid="1"/>
    </jxb:globalBindings>
  </jxb:bindings>
</jxb:bindings>
</source>
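The <tt>jxb:javaType</tt> bindings above tell the JAXB compiler to convert values through static <tt>parse</tt> (String to Java type) and <tt>print</tt> (Java type to String) methods, here those of SMILA's <tt>SimpleDateFormatter</tt>. If you bind a type to a converter of your own, it needs the same shape. The class below is a hypothetical sketch, not SMILA's <tt>SimpleDateFormatter</tt>; it delegates the <tt>xs:dateTime</tt> lexical rules to the JDK's <tt>javax.xml.datatype.DatatypeFactory</tt> and always prints in UTC:

<source lang="java">
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

/**
 * Hypothetical converter showing the static parse/print pair that a
 * jxb:javaType binding for xs:dateTime can point to.
 */
public final class DateTimeConverter {

  private static final DatatypeFactory FACTORY;

  static {
    try {
      FACTORY = DatatypeFactory.newInstance();
    } catch (DatatypeConfigurationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private DateTimeConverter() {
  }

  /** Parses an xs:dateTime lexical value such as "2009-06-01T12:30:00Z" (referenced by parseMethod). */
  public static Date parse(String value) {
    return FACTORY.newXMLGregorianCalendar(value.trim()).toGregorianCalendar().getTime();
  }

  /** Prints a Date as an xs:dateTime value in UTC (referenced by printMethod). */
  public static String print(Date date) {
    GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    calendar.setTime(date);
    return FACTORY.newXMLGregorianCalendar(calendar).toXMLFormat();
  }
}
</source>

In a binding file, such a class would be referenced as <tt>printMethod="yourpackage.DateTimeConverter.print"</tt> and <tt>parseMethod="yourpackage.DateTimeConverter.parse"</tt>, where <tt>yourpackage</tt> is a placeholder.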

<br>

*Add a schema location reference in the plug-in implementation:
**Create a new class (<tt>DataSourceConnectionConfigPluginImpl</tt>) that implements the interface <tt>DataSourceConnectionConfigPlugin</tt>.
**Implement the method <tt>String getSchemaLocation()</tt> to return "schemas/MockCrawlerSchema.xsd".
**Implement the method <tt>String getMessagesPackage()</tt> to return the package name "mypackage.crawler.mock.messages".

Here is an example implementation for the <tt>MockCrawler</tt> that you can use as a template:
<source lang="java">
package mypackage.crawler.mock;

import org.eclipse.smila.connectivity.framework.schema.DataSourceConnectionConfigPlugin;

/**
 * Tells the connectivity framework where the MockCrawler configuration schema is located
 * and which package contains the JAXB classes generated from it.
 */
public class DataSourceConnectionConfigPluginImpl implements DataSourceConnectionConfigPlugin {

  /**
   * {@inheritDoc}
   *
   * @see org.eclipse.smila.connectivity.framework.schema.DataSourceConnectionConfigPlugin#getSchemaLocation()
   */
  public String getSchemaLocation() {
    // Path of the XSD relative to the bundle root.
    return "schemas/MockCrawlerSchema.xsd";
  }

  /**
   * {@inheritDoc}
   *
   * @see org.eclipse.smila.connectivity.framework.schema.DataSourceConnectionConfigPlugin#getMessagesPackage()
   */
  public String getMessagesPackage() {
    // Must match the package name declared in MockCrawlerSchema.jxb.
    return "mypackage.crawler.mock.messages";
  }

}
</source>

*Create a new file <tt>plugin.xml</tt>:
**Define an extension for the extension point <tt>org.eclipse.smila.connectivity.framework.schema.extension</tt>, using the bundle name as both ID and name.
**Set the schema class to your implementation of the interface <tt>DataSourceConnectionConfigPlugin</tt>.
**Here is an example <tt>plugin.xml</tt> file for the <tt>MockCrawler</tt> that you can use as a template:

<source lang="xml">
<?xml version="1.0" encoding="UTF-8"?>
<plugin>
  <extension
      id="myplugin.crawler.mock"
      name="myplugin.crawler.mock"
      point="org.eclipse.smila.connectivity.framework.schema.extension">
    <schema
        class="mypackage.crawler.mock.DataSourceConnectionConfigPluginImpl">
    </schema>
  </extension>
</plugin>
</source>
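A typo in the <tt>class</tt> attribute only shows up when the framework tries to load the plug-in. As a hedged sketch, the following standalone check (hypothetical class <tt>PluginXmlSchemaClass</tt>, using only the JDK's DOM parser) extracts the schema class registered for <tt>org.eclipse.smila.connectivity.framework.schema.extension</tt>, so that it can be compared with your implementation class, for example in a unit test:

<source lang="java">
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper that reads the schema class declared in a plugin.xml. */
public class PluginXmlSchemaClass {

  static final String EXTENSION_POINT = "org.eclipse.smila.connectivity.framework.schema.extension";

  /** Returns the class of the first schema element for the SMILA schema extension point, or null. */
  static String schemaClass(String pluginXml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(pluginXml.getBytes(StandardCharsets.UTF_8)));
      NodeList extensions = doc.getElementsByTagName("extension");
      for (int i = 0; i < extensions.getLength(); i++) {
        Element extension = (Element) extensions.item(i);
        // Ignore extensions that contribute to other extension points.
        if (!EXTENSION_POINT.equals(extension.getAttribute("point"))) {
          continue;
        }
        NodeList schemas = extension.getElementsByTagName("schema");
        if (schemas.getLength() > 0) {
          return ((Element) schemas.item(0)).getAttribute("class");
        }
      }
      return null;
    } catch (Exception e) {
      throw new IllegalArgumentException("Cannot parse plugin.xml", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<plugin><extension id=\"myplugin.crawler.mock\" name=\"myplugin.crawler.mock\" "
        + "point=\"" + EXTENSION_POINT + "\">"
        + "<schema class=\"mypackage.crawler.mock.DataSourceConnectionConfigPluginImpl\"/>"
        + "</extension></plugin>";
    // prints mypackage.crawler.mock.DataSourceConnectionConfigPluginImpl
    System.out.println(schemaClass(xml));
  }
}
</source>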

<br>

*Compile the schema into JAXB classes using <tt>ant</tt>:
**See [[SMILA/Development Guidelines/Setup for JAXB code generation]] for instructions on how to set up the JAXB generation tools. '''''It is advisable to keep the <tt>lib</tt> folder outside the workspace, for example in the workspace's parent folder.'''''
**Create a new file <tt>build.xml</tt> to contain the JAXB build information. Use the following template as the content of <tt>build.xml</tt> and adjust the property value accordingly:

<source lang="xml">
<project name="sub-build" default="compile-schema-and-decorate" basedir=".">

  <property name="schema.name" value="MockCrawlerSchema" />

  <import file="../SMILA.builder/xjc/build.xml" />

</project>
</source>
**Run <tt>ant -Dlib.dir=../../lib</tt> from a command console in your bundle folder to generate the Java files; any error messages are reported there as well.
**This command assumes that the JAXB and Ant libraries are located in a <tt>lib</tt> folder in the parent folder of your workspace.
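Both relative paths in this step are interpreted from the bundle folder: the <tt>import</tt> in <tt>build.xml</tt> expects <tt>SMILA.builder</tt> next to your bundle in the workspace, and <tt>../../lib</tt> points to a <tt>lib</tt> folder beside the workspace folder. The following sketch, with a hypothetical workspace location and the assumption that both paths are resolved against the bundle folder (<tt>basedir="."</tt>), shows where <tt>java.nio.file.Path</tt> resolves them:

<source lang="java">
import java.nio.file.Path;
import java.nio.file.Paths;

/** Shows where the relative paths used by build.xml and ant -Dlib.dir end up. */
public class BuildPathLayout {

  /** Resolves a path relative to the bundle folder in which ant is started. */
  static Path resolveFromBundle(Path bundleDir, String relative) {
    return bundleDir.resolve(relative).normalize();
  }

  public static void main(String[] args) {
    // Hypothetical location of the bundle inside the workspace.
    Path bundle = Paths.get("/home/dev/workspace/myplugin.crawler.mock");
    // On Unix-like systems prints /home/dev/workspace/SMILA.builder/xjc/build.xml
    System.out.println(resolveFromBundle(bundle, "../SMILA.builder/xjc/build.xml"));
    // On Unix-like systems prints /home/dev/lib
    System.out.println(resolveFromBundle(bundle, "../../lib"));
  }
}
</source>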
<br> '''Note:''' If you rename the schema file, make sure to update the following locations:
*Plug-in implementation classes
*<tt>MockCrawlerSchema.jxb</tt> (it should also be renamed to match the schema)
*<tt>build.xml</tt>

== OSGi and Declarative Service requirements ==

*It is not necessary to implement a BundleActivator.
*Create the top-level folder <tt>OSGI-INF</tt>.
*Create a component description file in <tt>OSGI-INF</tt>. You can name the file as you like, but it is good practice to name it after the crawler. In it, you have to provide a unique component name, which should be the same as the crawler's class name. You also have to provide your implementation class and the service interface class, which is always <tt>org.eclipse.smila.connectivity.framework.Crawler</tt>. Here is an example component description file for the <tt>MockCrawler</tt> that you can use as a template:

<source lang="xml">
<?xml version="1.0" encoding="UTF-8"?>
<component name="MockCrawler" immediate="false" factory="CrawlerFactory">
  <implementation class="mypackage.crawler.mock.MockCrawler" />
  <service>
    <provide interface="org.eclipse.smila.connectivity.framework.Crawler"/>
  </service>
</component>
</source>
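The naming convention and the fixed service interface described above lend themselves to an automated check. This is a sketch under the assumption that the component file uses no namespace prefixes, as in the template; the class <tt>ComponentDescriptionCheck</tt> is hypothetical and uses only the JDK's DOM parser:

<source lang="java">
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical checker for a crawler's OSGi component description. */
public class ComponentDescriptionCheck {

  static final String CRAWLER_INTERFACE = "org.eclipse.smila.connectivity.framework.Crawler";

  /** Returns the problems found in the component XML; an empty list means it passes. */
  static List<String> check(String componentXml) {
    List<String> problems = new ArrayList<>();
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(componentXml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      problems.add("not well-formed: " + e.getMessage());
      return problems;
    }
    Element component = doc.getDocumentElement();
    Element implementation = (Element) doc.getElementsByTagName("implementation").item(0);
    String implClass = implementation == null ? "" : implementation.getAttribute("class");
    // Convention from this guide: the component name equals the crawler's simple class name.
    String simpleName = implClass.substring(implClass.lastIndexOf('.') + 1);
    if (!component.getAttribute("name").equals(simpleName)) {
      problems.add("component name should be " + simpleName);
    }
    // The provided service interface is always the connectivity Crawler interface.
    Element provide = (Element) doc.getElementsByTagName("provide").item(0);
    if (provide == null || !CRAWLER_INTERFACE.equals(provide.getAttribute("interface"))) {
      problems.add("service must provide " + CRAWLER_INTERFACE);
    }
    return problems;
  }
}
</source>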

*Add a ''Service-Component'' entry to your manifest file, e.g.:
<pre>Service-Component: OSGI-INF/mockcrawler.xml
</pre>
*Open <tt>build.properties</tt> and extend the binary build: add the folders <tt>OSGI-INF</tt> and <tt>schemas</tt> as well as the file <tt>plugin.xml</tt>.

<source lang="text">
bin.includes = META-INF/,\
               .,\
               plugin.xml,\
               schemas/,\
               OSGI-INF/
</source>

<br>

== Implement your crawler ==

*Implement your crawler in a new class extending <tt>org.eclipse.smila.connectivity.framework.AbstractCrawler</tt>.

*Integrate your new crawler bundle into the build process: refer to the page [[SMILA/Development Guidelines/How to integrate new bundle into build process|How to integrate new bundle into build process]] for further instructions.

*Follow the example of the [[SMILA/Component Examples/FileSystemCrawler|FileSystemCrawler]].
'''Optional:'''

*Create a JUnit test bundle for this crawler, e.g. <tt>myplugin.crawler.mock.test</tt>.
*Integrate your test bundle into the build process: refer to the page [[SMILA/Development Guidelines/How to integrate test bundle into build process|How to integrate test bundle into build process]] for further instructions.

== Activate your crawler ==

=== Activation when running SMILA in Eclipse ===

*Open the ''Run'' dialog, switch to the ''Bundles'' configuration page, select your bundle and set the parameter ''Default Auto-Start'' to ''true''.
*Launch <tt>SMILA.launch</tt>.

=== Activation in the SMILA application ===

*Add your bundle, e.g. <tt>myplugin.crawler.mock@4:start</tt>, to the <tt>osgi.bundles</tt> entry in the <tt>config.ini</tt> file.
*Launch SMILA by calling either <tt>SMILA.exe</tt> or <tt>eclipse.exe -console</tt>.
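In Equinox, the bundles to install and start are listed in the comma-separated <tt>osgi.bundles</tt> property of <tt>config.ini</tt>, each written as <tt>name@startlevel:start</tt>. If you script your deployment, a small helper like the hypothetical one below can add the crawler entry only when it is not yet listed. It is a sketch that works on the property value alone and does not preserve the file's original line layout; the existing bundle names in <tt>main</tt> are illustrative:

<source lang="java">
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that adds a bundle entry to an osgi.bundles property value. */
public class OsgiBundlesEntry {

  static String addBundle(String osgiBundles, String symbolicName, int startLevel) {
    List<String> entries = new ArrayList<>();
    for (String entry : osgiBundles.split(",")) {
      String trimmed = entry.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      // The bundle reference is the part before the optional "@startlevel:start" suffix.
      if (trimmed.split("@", 2)[0].equals(symbolicName)) {
        return osgiBundles; // already listed: leave the value unchanged
      }
      entries.add(trimmed);
    }
    entries.add(symbolicName + "@" + startLevel + ":start");
    return String.join(", ", entries);
  }

  public static void main(String[] args) {
    String current = "org.eclipse.equinox.ds@1:start, org.eclipse.smila.connectivity.framework@4:start";
    // prints org.eclipse.equinox.ds@1:start, org.eclipse.smila.connectivity.framework@4:start, myplugin.crawler.mock@4:start
    System.out.println(addBundle(current, "myplugin.crawler.mock", 4));
  }
}
</source>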

== Run your crawler ==

Information on how to start and run a crawler can be found in the [[SMILA/Documentation/CrawlerController|CrawlerController]] documentation.

[[Category:SMILA]]