SMILA/Documentation/BPEL Workflow Processor

This page describes how to configure the SMILA BPEL workflow processor and how to call SMILA pipelets from BPEL processes. No prior BPEL knowledge is assumed: this page should contain everything you need to create at least simple BPEL processes for use in SMILA.

Basic configuration

The BPEL WorkflowProcessor expects its configuration in configuration/org.eclipse.smila.processing.bpel. See the test bundle org.eclipse.smila.processing.bpel.test for an example. In this directory it expects a file named processor.properties that contains the main configuration. This file can contain the following SMILA-specific properties:

  • pipeline.dir (default="pipelines"): The name of a folder below configuration/org.eclipse.smila.processing.bpel that contains the BPEL process files (together with all required XSD and WSDL files) and the ODE-specific deploy.xml file. See below for details.
  • pipeline.timeout (default="300"): Maximum time in seconds allowed for processing a pipeline. If a pipeline invocation takes longer, it is aborted with an error. You may want to increase this value in case you expect longer processing times in your application (e.g. when analyzing very large documents).
  • record.filter (default = none): A record filter defining the attributes and annotations that should be contained in BPEL workflow objects. If none is set, the workflow objects contain only the IDs of the records to be processed. Add only those attributes and annotations to the filter that are actually used in some pipeline, because adding too many (or very large) elements to the workflow object may decrease performance and increase memory usage. As the WorkflowProcessor uses the Blackboard to filter objects, you must define the filters in org.eclipse.smila.blackboard/RecordFilters.xml.

As the BPEL WorkflowProcessor is based on the Apache ODE BPEL engine [1], you can also add any ODE-specific configuration property to this file by using the prefix "ode." for the property name. See the ODE documentation for details. You have to add at least the configuration for a database connection, which ODE needs for internal purposes (e.g. storing process definitions). For SMILA purposes an in-memory HSQLDB instance is usually sufficient. The HSQLDB library is included in bundle org.apache.ode. To use it, set the following properties:

ode.db.mode=internal
ode.db.int.driver=org.hsqldb.jdbcDriver
ode.db.int.jdbcurl=jdbc:hsqldb:mem:odedb
ode.db.int.username=sa
ode.db.int.password=

If you want to use a "real" database, you have to make the JDBC driver available to bundle org.apache.ode and check the ODE documentation on how to prepare the database schema for ODE.

Pipeline definition using BPEL

The minimal BPEL process for SMILA pipelines looks like this:

<?xml version="1.0" encoding="utf-8" ?>
<process name="$PIPELINENAME" targetNamespace="http://www.eclipse.org/smila/processor"
    xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:proc="http://www.eclipse.org/smila/processor" expressionLanguage="urn:oasis:names:tc:wsbpel:2.0:sublang:xpath2.0"
    queryLanguage="urn:oasis:names:tc:wsbpel:2.0:sublang:xpath2.0" xmlns:rec="http://www.eclipse.org/smila/record"
    xmlns:id="http://www.eclipse.org/smila/id">
 
    <import location="processor.wsdl" namespace="http://www.eclipse.org/smila/processor"
        importType="http://schemas.xmlsoap.org/wsdl/" />
 
    <partnerLinks>
        <partnerLink name="Pipeline" partnerLinkType="proc:ProcessorPartnerLinkType" myRole="service" />
    </partnerLinks>
 
    <extensions>
        <extension namespace="http://www.eclipse.org/smila/processor" mustUnderstand="no" />
    </extensions>
 
    <variables>
        <variable name="request" messageType="proc:ProcessorMessage" />
    </variables>
 
    <sequence>
        <receive name="start" partnerLink="Pipeline" portType="proc:ProcessorPortType" operation="process"
            variable="request" createInstance="yes" />
 
        <reply name="end" partnerLink="Pipeline" portType="proc:ProcessorPortType" operation="process" variable="request" />
        <exit />
    </sequence>
</process>

To create a new pipeline:

  1. Copy the above snippet to a new file with the suffix .bpel and save it to the folder configuration/org.eclipse.smila.processing.bpel/$pipeline.dir.
  2. Then replace $PIPELINENAME by the desired name of your pipeline.
  3. Next, copy the files id.xsd, record.xsd, and processor.wsdl from the xml directory in bundle org.eclipse.smila.processing.bpel to the same folder next to your .bpel file.
  4. Then, still in the same folder, create a file named deploy.xml containing the following content but replace $PIPELINENAME by the name of the new pipeline:
<deploy xmlns="http://www.apache.org/ode/schemas/dd/2007/03" xmlns:proc="http://www.eclipse.org/smila/processor">
    <process name="proc:$PIPELINENAME">
        <in-memory>true</in-memory>
        <provide partnerLink="Pipeline">
            <service name="proc:$PIPELINENAME" port="ProcessorPort" />
        </provide>
    </process>
</deploy>

You can now add pipelet invocations to your pipeline BPEL. To add another pipeline, you only have to add another BPEL file to the same folder and copy the <process> element in deploy.xml for the new pipeline.
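
As an illustration of a deploy.xml that registers more than one pipeline, the following sketch repeats the <process> element once per pipeline. The names FirstPipeline and SecondPipeline are hypothetical; each must match the name attribute of the <process> element in the corresponding .bpel file.

```xml
<deploy xmlns="http://www.apache.org/ode/schemas/dd/2007/03" xmlns:proc="http://www.eclipse.org/smila/processor">
    <!-- Entry for FirstPipeline.bpel (hypothetical pipeline name) -->
    <process name="proc:FirstPipeline">
        <in-memory>true</in-memory>
        <provide partnerLink="Pipeline">
            <service name="proc:FirstPipeline" port="ProcessorPort" />
        </provide>
    </process>
    <!-- Entry for SecondPipeline.bpel (hypothetical pipeline name), copied from the entry above -->
    <process name="proc:SecondPipeline">
        <in-memory>true</in-memory>
        <provide partnerLink="Pipeline">
            <service name="proc:SecondPipeline" port="ProcessorPort" />
        </provide>
    </process>
</deploy>
```

Only the two name attributes differ between the entries; the partner link name "Pipeline" and the port "ProcessorPort" stay the same as in the single-pipeline template.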

Pipelet invocations

Pipelets (aka Simple Pipelets) are classes that implement the interface org.eclipse.smila.processing.SimplePipelet (in bundle org.eclipse.smila.processing) and are listed in the SMILA-Pipelets manifest header of the bundles that contain them. They are configured by the WorkflowProcessor on pipeline initialization. A separate instance is created for each occurrence of a pipelet in any pipeline, so instances are not shared between multiple pipelines. Examples that come with the base SMILA distribution are

  • in bundle org.eclipse.smila.processing.pipelets:
    • org.eclipse.smila.processing.pipelets.SetAnnotationPipelet: Sets a configured annotation for each record in the input variable.
    • org.eclipse.smila.processing.pipelets.CommitRecordsPipelet: Commits each record in the input variable on the blackboard to the storages.
    • org.eclipse.smila.processing.pipelets.HtmlToTextPipelet: Extracts plain text and metadata from an HTML document in an attribute or attachment of each record and writes them to configurable attributes or attachments.
  • in bundle org.eclipse.smila.processing.pipelets.aperture:
    • org.eclipse.smila.processing.pipelets.aperture.AperturePipelet: Uses Aperture to convert many kinds of documents to plain text.
  • in bundle org.eclipse.smila.processing.pipelets.xmlprocessing:
    • A collection of pipelets for XML processing (XSLT, XPath selection, ...) of documents.

To use such a pipelet in your pipeline, use the SMILA-specific BPEL extension activity <invokePipelet> somewhere between <receive> and <reply> in your BPEL process:

<extensionActivity name="invokeSomePipelet">
  <proc:invokePipelet>
    <proc:pipelet class="org.eclipse.smila.pipelet.SomePipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Val key="single-parameter">value</rec:Val>
      <rec:Seq key="multi-parameter">
        <rec:Val>value1</rec:Val>
        <rec:Val>value2</rec:Val>
      </rec:Seq>
      <rec:Map key="complex-parameter">
        <rec:Val key="sub-parameter1">sub-value1</rec:Val>
        <rec:Val key="sub-parameter2">sub-value2</rec:Val>
      </rec:Map>
      <!-- more configuration parameters -->
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>

Replace the class name with the class name of the pipelet to use and add configuration parameters as needed. The available parameters should be documented by the pipelet provider. The configuration is a generic AnyMap object like the one used as record metadata, see SMILA/Documentation/2011.Simplification/Data Model and Serialization Formats for details. If the output variable is the same as the input variable (which is usually sufficient), you can omit the output attribute.
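
To show where the extension activity sits in the process, here is a sketch of the <sequence> of the minimal pipeline with one pipelet invocation added and the output attribute omitted. The pipelet class and the parameter name are the placeholders used above, not a real pipelet. The proc and rec prefixes are assumed to be declared on the enclosing <process> element, as in the minimal process.

```xml
<sequence>
    <receive name="start" partnerLink="Pipeline" portType="proc:ProcessorPortType" operation="process"
        variable="request" createInstance="yes" />

    <!-- Placeholder pipelet class and parameter: replace with a real pipelet and its documented parameters -->
    <extensionActivity name="invokeSomePipelet">
        <proc:invokePipelet>
            <proc:pipelet class="org.eclipse.smila.pipelet.SomePipelet" />
            <!-- output attribute omitted: the input variable "request" is also used as the output variable -->
            <proc:variables input="request" />
            <proc:configuration>
                <rec:Val key="single-parameter">value</rec:Val>
            </proc:configuration>
        </proc:invokePipelet>
    </extensionActivity>

    <reply name="end" partnerLink="Pipeline" portType="proc:ProcessorPortType" operation="process" variable="request" />
    <exit />
</sequence>
```

Further pipelets can be chained by adding more <extensionActivity> elements in the desired order before the <reply>.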

Pipeline invocations

You can also invoke one pipeline from another to group pipelet invocations that belong together. To do this, use the standard BPEL <invoke> activity to invoke a BPEL partner link for the sub pipeline:

  • define a partner link in the <partnerLinks> section of the BPEL file, replacing $SUBPIPELINENAME with the name of the pipeline to invoke as defined in its <process> element:
<partnerLinks>
  <partnerLink name="Pipeline" partnerLinkType="proc:ProcessorPartnerLinkType" myRole="service" />
  <partnerLink name="$SUBPIPELINENAME" partnerLinkType="proc:ProcessorPartnerLinkType" partnerRole="service" />
</partnerLinks>
  • add a BPEL <invoke> activity between <receive> and <reply>, replacing $SUBPIPELINENAME with the pipeline name, and adapt the inputVariable and outputVariable attributes if necessary (omitting outputVariable is not allowed here!):
<invoke name="invokeSubPipeline" operation="process" portType="proc:ProcessorPortType" 
  partnerLink="$SUBPIPELINENAME" inputVariable="request" outputVariable="request"
/>
  • add a declaration for the partner link in the deploy.xml entry of your pipeline:
<deploy xmlns="http://www.apache.org/ode/schemas/dd/2007/03" xmlns:proc="http://www.eclipse.org/smila/processor">
    <process name="proc:$PIPELINENAME">
        <in-memory>true</in-memory>
        <provide partnerLink="Pipeline">
            <service name="proc:$PIPELINENAME" port="ProcessorPort" />
        </provide>
        <invoke partnerLink="$SUBPIPELINENAME">
            <service name="proc:$SUBPIPELINENAME" port="ProcessorPort" />
        </invoke>
    </process>
</deploy>
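
Putting the BPEL-side steps together, a complete calling pipeline could look like the following sketch. It is the minimal process from above with the additional partner link and the <invoke> activity. The names MainPipeline and SubPipeline are hypothetical, and SubPipeline is assumed to be deployed as a separate pipeline in the same folder with its own entry in deploy.xml.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<process name="MainPipeline" targetNamespace="http://www.eclipse.org/smila/processor"
    xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:proc="http://www.eclipse.org/smila/processor" expressionLanguage="urn:oasis:names:tc:wsbpel:2.0:sublang:xpath2.0"
    queryLanguage="urn:oasis:names:tc:wsbpel:2.0:sublang:xpath2.0" xmlns:rec="http://www.eclipse.org/smila/record"
    xmlns:id="http://www.eclipse.org/smila/id">

    <import location="processor.wsdl" namespace="http://www.eclipse.org/smila/processor"
        importType="http://schemas.xmlsoap.org/wsdl/" />

    <partnerLinks>
        <!-- Link through which this pipeline itself is called -->
        <partnerLink name="Pipeline" partnerLinkType="proc:ProcessorPartnerLinkType" myRole="service" />
        <!-- Link to the sub pipeline (hypothetical name) -->
        <partnerLink name="SubPipeline" partnerLinkType="proc:ProcessorPartnerLinkType" partnerRole="service" />
    </partnerLinks>

    <extensions>
        <extension namespace="http://www.eclipse.org/smila/processor" mustUnderstand="no" />
    </extensions>

    <variables>
        <variable name="request" messageType="proc:ProcessorMessage" />
    </variables>

    <sequence>
        <receive name="start" partnerLink="Pipeline" portType="proc:ProcessorPortType" operation="process"
            variable="request" createInstance="yes" />

        <!-- Pass the records to the sub pipeline and store its result in the same variable -->
        <invoke name="invokeSubPipeline" operation="process" portType="proc:ProcessorPortType"
            partnerLink="SubPipeline" inputVariable="request" outputVariable="request" />

        <reply name="end" partnerLink="Pipeline" portType="proc:ProcessorPortType" operation="process" variable="request" />
        <exit />
    </sequence>
</process>
```

The matching deploy.xml entry for MainPipeline would then contain an <invoke partnerLink="SubPipeline"> declaration as described in the last step.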

Advanced process definition

You can of course also use all other BPEL elements to create your pipelines, such as conditions, iterations, parallel flows, and invocations of external Web Services. However, describing them is beyond the scope of this introduction and requires "real" knowledge of BPEL (and of WSDL, for invoking Web Services).