Difference between revisions of "SMILA/Documentation/HowTo/How to integrate the HelloWorld webservice as a Pipelet"

From Eclipsepedia

(Create Java classes from WSDL using Axis2)
m (Create new bundle)
 
(25 intermediate revisions by 6 users not shown)
Line 15: Line 15:
 
Plug-in Provider: your name or company
 
Plug-in Provider: your name or company
 
</pre>
 
</pre>
* Then integrate your new bundle into the SMILA build process. Refer to the instructions on [[SMILA/Development_Guidelines/How_to_integrate_new_bundle_into_build_process|How to integrate a new bundle into build process]] for details.
 
 
* Edit the file <tt>META-INF/MANIFEST.MF</tt> and add the following import-package dependencies as those are required to implement the basic functionalities of your pipelet:
 
* Edit the file <tt>META-INF/MANIFEST.MF</tt> and add the following import-package dependencies as those are required to implement the basic functionalities of your pipelet:
 
<pre>
 
<pre>
 
Import-Package: org.apache.commons.logging;version="1.1.1",
 
Import-Package: org.apache.commons.logging;version="1.1.1",
  org.eclipse.smila.blackboard;version="0.5.0",
+
  org.eclipse.smila.blackboard;version="0.8.0",
org.eclipse.smila.blackboard.path;version="0.5.0",
+
  org.eclipse.smila.datamodel;version="0.8.0",
  org.eclipse.smila.datamodel.id;version="0.5.0",
+
  org.eclipse.smila.processing;version="0.8.0"
org.eclipse.smila.datamodel.record;version="0.5.0",
+
  org.eclipse.smila.processing;version="0.5.0",
+
org.eclipse.smila.processing.configuration;version="0.5.0"
+
 
</pre>
 
</pre>
* To make sure that the <tt>PipeletTrackerService</tt> detects your new pipelet, add the following line to the file <tt>META-INF/MANIFEST.MF</tt>. This registers the class that will implement your SMILA pipelet:
+
* To make sure that the <tt>PipeletTrackerService</tt> detects your new pipelet, create a folder <tt>SMILA-INF</tt> in the bundle and add a file <tt>HelloWorldPipelet.json</tt> to this folder:
 
<pre>
 
<pre>
SMILA-Pipelets: org.eclipse.smila.sample.pipelet.HelloWorldPipelet
+
{
 +
  "class": "org.eclipse.smila.sample.pipelet.HelloWorldPipelet",
 +
  "parameters": [       
 +
    {
 +
      "name": "IN_ATT_NAME",
 +
      "type": "string"           
 +
    },
 +
    {
 +
      "name": "OUT_ATT_NAME",
 +
      "type": "string"           
 +
    }
 +
  ],
 +
  "description": "Hello World pipelet. Modifies the content of the attribute denoted by the parameter IN_ATT_NAME and writes it to the attribute denoted by the parameter OUT_ATT_NAME."
 +
}
 
</pre>
 
</pre>
 +
* Now add the folder <tt>SMILA-INF</tt> to the <tt>build.properties</tt> file (or just check it in the <tt>Build</tt> view of the <tt>MANIFEST.MF</tt> file in your IDE).
  
 
=== Create Java classes from WSDL using Axis2 ===
 
=== Create Java classes from WSDL using Axis2 ===
  
 
+
* Install Axis2 1.4.1: Download from http://ws.apache.org/axis2/download/1_4_1/download.cgi and unpack into any directory.
* install Axis2 1.4.1: Download from http://ws.apache.org/axis2/download/1_4_1/download.cgi and unpack into any directory.
+
* Open a shell in the Axis2 directory and execute <tt>wsdl2java</tt> similar to this example - replace the WSDL-URL with that of the Webservice you want to use after <tt>-uri</tt>, change the package name after <tt>-p</tt> and the output directory after <tt>-o</tt>:
* Open a shell in the Axis2 directory and execute <tt>wsdl2java</tt> similar to this example (replace the WSDL-URL with that of the Webservice you want to use, change the package name after <tt>-p</tt> and the output directory after <tt>-o</tt>):
+
 
<source lang="text">
 
<source lang="text">
bin\wsdl2java -uri http://localhost:8081/axis2/services/HelloWorld.HelloWorldImplPort?wsdl -d xmlbeans -p com.empolis.smila.sample.helloworld -s -o helloworld-ws
+
bin\wsdl2java -uri http://localhost:8081/axis2/services/HelloWorld.HelloWorldImplPort?wsdl  
 +
  -d xmlbeans -p org.eclipse.smila.sample.helloworld -s -o helloworld-ws
 
</source>
 
</source>
 
: This creates two folders inside <tt>helloworld-ws</tt>: <tt>src</tt> and <tt>resources</tt>.  
 
: This creates two folders inside <tt>helloworld-ws</tt>: <tt>src</tt> and <tt>resources</tt>.  
Line 65: Line 75:
 
* Create a source folder <tt>code/gen</tt> in your bundle and move the '''content''' of the generated <tt>src</tt> folder into it.
 
* Create a source folder <tt>code/gen</tt> in your bundle and move the '''content''' of the generated <tt>src</tt> folder into it.
 
* Create a folder <tt>lib</tt> in your bundle, create a zip file from the '''content''' of  the generated <tt>resources</tt> folder, change the suffix to <tt>jar</tt> and move it to <tt>lib</tt>. Refresh the bundle in your Eclipse workspace, and add this jar to the Bundle-Classpath of your bundle (Manifest editor, tab Runtime, Classpath setting).
 
* Create a folder <tt>lib</tt> in your bundle, create a zip file from the '''content''' of  the generated <tt>resources</tt> folder, change the suffix to <tt>jar</tt> and move it to <tt>lib</tt>. Refresh the bundle in your Eclipse workspace, and add this jar to the Bundle-Classpath of your bundle (Manifest editor, tab Runtime, Classpath setting).
* Use the generated webservice client code in your pipelet's <tt>process</tt> method, e.g:
 
  
<source lang="java">
 
try {
 
    HelloWorldStub ws = new HelloWorldStub("http://localhost:8081/axis2/services/HelloWorld.HelloWorldImplPort");
 
    SayHiDocument sayHiDoc = SayHiDocument.Factory.newInstance();
 
    SayHi sayHi = sayHiDoc.addNewSayHi();
 
    sayHi.setArg0("SMILA");
 
    SayHiResponseDocument respDoc = ws.sayHi(sayHiDoc);
 
    SayHiResponse response = respDoc.getSayHiResponse();
 
    _log.info("Webservice responded: " + response.getReturn());
 
    return ids;
 
} catch (AxisFault e) {
 
    throw new ProcessingException(e);
 
} catch (RemoteException e) {
 
    throw new ProcessingException(e);
 
} catch (Throwable e) {
 
    // useful for detecting missing import-package declarations. Should probably not be used in production code...
 
    throw new ProcessingException(e);
 
}
 
</source>
 
  
 
== Implementation ==
 
== Implementation ==
  
 
* Create the package <tt>org.eclipse.smila.sample.pipelet</tt> and the Java class <tt>HelloWorldPipelet</tt>.
 
* Create the package <tt>org.eclipse.smila.sample.pipelet</tt> and the Java class <tt>HelloWorldPipelet</tt>.
* Use the following code as a template for your new class. It contains empty method bodies and a reference to the logger. In the following we are going to gradually replace the comments in this file by the corresponding code snippets. For your convenience you may also download the complete zipped source file from [[Media:HelloWorldPipelet.zip|HelloWorldPipelet.zip]].
+
* Use the following code as a template for your new class. It contains empty method bodies and a reference to the logger. In the following we are going to gradually replace the comments in this file by the corresponding code snippets. For your convenience you may also download the complete zipped source file from [[Media:HelloWorldPipelet_0.9.zip|HelloWorldPipelet.zip]].
  
 
<source lang="Java">
 
<source lang="Java">
 
package org.eclipse.smila.sample.pipelet;
 
package org.eclipse.smila.sample.pipelet;
+
 
 +
import org.apache.axis2.transport.http.HTTPConstants;
 
import org.apache.commons.logging.Log;
 
import org.apache.commons.logging.Log;
 
import org.apache.commons.logging.LogFactory;
 
import org.apache.commons.logging.LogFactory;
import org.eclipse.smila.blackboard.BlackboardService;
+
import org.eclipse.smila.blackboard.Blackboard;
import org.eclipse.smila.blackboard.path.Path;
+
import org.eclipse.smila.datamodel.AnyMap;
import org.eclipse.smila.datamodel.id.Id;
+
import org.eclipse.smila.datamodel.Value;
import org.eclipse.smila.datamodel.record.Literal;
+
import org.eclipse.smila.processing.Pipelet;
import org.eclipse.smila.datamodel.record.RecordFactory;
+
 
import org.eclipse.smila.processing.ProcessingException;
 
import org.eclipse.smila.processing.ProcessingException;
import org.eclipse.smila.processing.SimplePipelet;
+
import org.eclipse.smila.processing.parameters.ParameterAccessor;
import org.eclipse.smila.processing.configuration.PipeletConfiguration;
+
import org.eclipse.smila.processing.util.ProcessingConstants;
 +
import org.eclipse.smila.processing.util.ResultCollector;
 +
 
 +
// package name as passed to wsdl2java via -p
import org.eclipse.smila.sample.helloworld.HelloWorldStub;
 +
 
 +
import demo.hw.server.SayHi;
 +
import demo.hw.server.SayHiDocument;
 +
import demo.hw.server.SayHiResponse;
 +
import demo.hw.server.SayHiResponseDocument;
 
   
 
   
public class HelloWorldPipelet implements SimplePipelet {
+
public class HelloWorldPipelet implements Pipelet {
 
   
 
   
   // additional member variables  
+
   // additional member variables or constants
   private final Log _log = LogFactory.getLog(HelloWorldPipelet.class);
+
   private final Log _log = LogFactory.getLog(getClass());
  
 
   public HelloWorldPipelet(){
 
   public HelloWorldPipelet(){
 
   }
 
   }
 
   
 
   
   public void configure(PipeletConfiguration configuration) throws ProcessingException {
+
   public void configure(final AnyMap configuration) throws ProcessingException {
 
     // read the configuration properties
 
     // read the configuration properties
 
   }
 
   }
 
   
 
   
   public Id[] process(BlackboardService blackboard, Id[] recordIds) throws ProcessingException {
+
   public String[] process(final Blackboard blackboard, final String[] recordIds) throws ProcessingException {
 
     // process the recordIds and create a result
 
     // process the recordIds and create a result
 
     return null;
 
     return null;
Line 126: Line 124:
  
 
=== Read PipeletConfiguration ===
 
=== Read PipeletConfiguration ===
* First let's create two member variables that store the names of the input and output attributes as well as string constants for the property names used in the configuration. Replace the comment "<tt>// additional member variables </tt>" with the following code snippet.
+
* First let's create two constants for the property names used in the configuration (or the parameters section of the records to be processed) to retrieve the names of the source and target attribute. Replace the comment "<tt>// additional member variables or constants</tt>" with the following code snippet.
 
<source lang="Java">
 
<source lang="Java">
private final String PROP_IN_ATT_NAME= "IN_ATT_NAME";
+
  private final String PROP_IN_ATT_NAME = "IN_ATT_NAME";
private final String PROP_OUT_ATT_NAME= "OUT_ATT_NAME";
+
  
private String _inAttName;
+
  private final String PROP_OUT_ATT_NAME = "OUT_ATT_NAME";
private String _outAttName;
+
 
 +
  private AnyMap _config;
 
</source>
 
</source>
  
* Then we are going to fill those members with the attribute names provided by the <tt>PipeletConfiguration</tt> in method <tt>configure(PipeletConfiguration configuration)</tt>. The method <tt>getPropertyFirstValueNotNull(String)</tt> will check that the value of the property is not null. If it is null a <tt>ProcessingException</tt> will be thrown. In addition we should ensure, that the provided string is not empty or consists of whitespaces only. Replace the comment "<tt>// read the configuration properties</tt>" with the following code snippet.
+
* Then we are going to store the <tt>PipeletConfiguration</tt> in method <tt>configure(final AnyMap configuration)</tt> for later evaluation in <tt>process(final Blackboard blackboard, final String[] recordIds)</tt>. This allows the user of this pipelet to configure the attributes either in the pipelet configuration or in the records themselves (e.g. the administrator could define the attributes in a job; these job properties can override default pipelet configuration properties when using the <tt>ParameterAccessor</tt> in the process method).
 +
 
 
<source lang="Java">
 
<source lang="Java">
_inAttName = (String) configuration.getPropertyFirstValueNotNull(PROP_IN_ATT_NAME);
+
@Override
if (_inAttName.trim().length() == 0) {
+
public void configure(final AnyMap configuration) throws ProcessingException {
    throw new ProcessingException("Property " + PROP_IN_ATT_NAME + " must not be an empty String");
+
  _config = configuration;
}
+
 
+
_outAttName = (String) configuration.getPropertyFirstValueNotNull(PROP_OUT_ATT_NAME);
+
if (_outAttName.trim().length() == 0) {
+
    throw new ProcessingException("Property " + PROP_OUT_ATT_NAME + " must not be an empty String");
+
 
}
 
}
 
</source>
 
</source>
'''Note''': Of course it is also possible to store the <tt>PipeletConfiguration</tt> in a member variable and access the properties as needed in the <tt>process(BlackboardService blackboard, Id[] recordIds)</tt> method.
+
 
 +
'''Note''': It would also be possible to set member variables directly in the configure method instead of using the ParameterAccessor to retrieve configuration parameters in the process method. This is appropriate for properties that do not change during operation or that stay the same for each record, no matter what the parameters of the record contain, and for lengthy initialization such as reading and parsing configuration files. In these cases you should use member variables that are initialized in the configure method using only the information from the PipeletConfiguration. You should clearly document which parameters can only be defined in the PipeletConfiguration and which can be overridden in the records.
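The override behaviour described above (record or job parameters taking precedence over the pipelet configuration) can be pictured with a small plain-Java stand-in. This is only a sketch of the lookup order as this tutorial describes it, not SMILA's <tt>ParameterAccessor</tt>: the class name <tt>ParameterLookupSketch</tt>, the method <tt>lookup</tt>, and the use of plain maps instead of <tt>AnyMap</tt> are assumptions made for illustration.

```java
import java.util.Map;

/** Plain-Java stand-in for the lookup order described above; NOT SMILA's ParameterAccessor. */
final class ParameterLookupSketch {
  /** Defaults, playing the role of the stored pipelet configuration (hypothetical representation). */
  private final Map<String, String> pipeletConfig;

  ParameterLookupSketch(Map<String, String> pipeletConfig) {
    this.pipeletConfig = pipeletConfig;
  }

  /**
   * Returns the value from the record parameters if the record defines it,
   * otherwise the default from the pipelet configuration, otherwise null.
   */
  String lookup(Map<String, String> recordParameters, String name) {
    if (recordParameters != null && recordParameters.containsKey(name)) {
      return recordParameters.get(name);
    }
    return pipeletConfig.get(name);
  }
}
```

With a configuration that sets <tt>IN_ATT_NAME</tt> to <tt>Title</tt>, a record that carries its own <tt>IN_ATT_NAME</tt> would win, while a record without it falls back to <tt>Title</tt>.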
  
 
=== Process IDs and implement exception handling ===
 
=== Process IDs and implement exception handling ===
The method <tt>process(BlackboardService blackboard, Id[] recordIds)</tt> has two parameters:
+
The method <tt>process(Blackboard blackboard, String[] recordIds)</tt> has two parameters:
 
* a reference to the [[SMILA/Glossary#B|blackboard service]] that allows access on [[SMILA/Glossary#R|records]] and
 
* a reference to the [[SMILA/Glossary#B|blackboard service]] that allows access on [[SMILA/Glossary#R|records]] and
 
* a list of record IDs to process.
 
* a list of record IDs to process.
The HelloWorld pipelet should therefore iterate over the IDs in the parameter <tt>recordIds</tt>, get the required data from the record identified by the ID, process this data, and store the result in the record. Let's place a <tt>try ... catch()</tt> block in the <tt>for</tt> loop to ensure that errors do only interrupt the processing of the current ID. The comments in the code serve as placeholders for the functionality described in the following sections. At the end we return the unmodified input parameter <tt>recordIds</tt> as the result of the pipelet. Replace the comment "<tt>// process the recordIds and create a result</tt>" with the following code snippet.
+
The HelloWorld pipelet should therefore iterate over the IDs in the parameter <tt>recordIds</tt>, get the required data from the record identified by the ID, process this data, and store the result in the record.
 +
 
 +
It is suggested that you use the <tt>org.eclipse.smila.processing.util.ResultCollector</tt> utility class to collect the result IDs; it also provides a configurable exception handling approach. When creating the ResultCollector, you have to decide whether records that cause an exception will be dropped from the result set or stay in it. We will use the system-wide default <tt>ProcessingConstants.DROP_ON_ERROR_DEFAULT</tt>, which is set to <tt>false</tt>. The ResultCollector will also check the ParameterAccessor for the parameter <tt>_failOnError</tt> (default: <tt>false</tt>).
 +
 
 +
Let's place a <tt>try ... catch()</tt> block in the <tt>for</tt> loop to ensure that errors only interrupt the processing of the current ID. The comments in the code serve as placeholders for the functionality described in the following sections. At the end we ask the ResultCollector for the set of record IDs to return as the result of the pipelet. Replace the comment "<tt>// process the recordIds and create a result</tt>" with the following code snippet.
 
<source lang="Java">
 
<source lang="Java">
for (Id id : recordIds) {
+
final ParameterAccessor paramAccessor = new ParameterAccessor(blackboard, _config);
 +
final ResultCollector resultCollector =
 +
      new ResultCollector(paramAccessor, _log, ProcessingConstants.DROP_ON_ERROR_DEFAULT);
 +
for (String id : recordIds) {
 
     try {
 
     try {
 +
        // read your configuration using the parameteraccessor
 +
        paramAccessor.setCurrentRecord(id);
 +
        // read configuration from the accessor
 +
 
         // Read Input Data
 
         // Read Input Data
  
Line 163: Line 169:
 
         // Write Output Data
 
         // Write Output Data
  
 +
        // add the id for a successful operation
 +
        resultCollector.addResult(id);
 
     } catch (final Exception ex) {
 
     } catch (final Exception ex) {
         if (_log.isErrorEnabled()) {
+
         // mark the id for a failed record and let the result collector handle the exception as configured
            _log.error("error during execution of HelloWorldPipelet with record " + id, ex);
+
        resultCollector.addFailedResult(id, ex);
        }
+
 
     }
 
     }
 
} // for
 
} // for
  
return recordIds;
+
// let the ResultCollector decide which ids to return:
 +
return resultCollector.getResultIds();
 +
</source>
 +
'''Note''': Most of the time the return value of a pipelet is the same set of record ids as was processed (<tt>recordIds</tt>). However, in some cases a pipelet may filter record IDs or even create new records. Then the record IDs of the records to be filtered out should not be added to the ResultCollector and new record IDs have to be added to the ResultCollector in order to get the correct set of IDs as the result of the process method.
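The per-record error isolation described above can be tried out without a SMILA installation. The following is a plain-Java sketch under stated assumptions: <tt>PerRecordErrorHandlingSketch</tt> is a hypothetical stand-in, not SMILA's <tt>ResultCollector</tt>, and the <tt>dropOnError</tt> flag only mirrors the behaviour as this tutorial describes it (failed records stay in the result when the flag is <tt>false</tt>).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Plain-Java stand-in for the loop-with-collector pattern; NOT SMILA's ResultCollector. */
final class PerRecordErrorHandlingSketch {
  /** IDs that will be returned as the result. */
  final List<String> resultIds = new ArrayList<>();
  /** IDs whose processing failed, with the exception that was caught. */
  final Map<String, Exception> failures = new LinkedHashMap<>();
  private final boolean dropOnError;

  PerRecordErrorHandlingSketch(boolean dropOnError) {
    this.dropOnError = dropOnError;
  }

  /** Processes each ID on its own so that one failure does not abort the remaining IDs. */
  String[] processAll(String[] recordIds, Consumer<String> work) {
    for (String id : recordIds) {
      try {
        work.accept(id);
        resultIds.add(id); // successful record
      } catch (Exception ex) {
        failures.put(id, ex); // remember the failure
        if (!dropOnError) {
          resultIds.add(id); // failed record stays in the result set
        }
      }
    }
    return resultIds.toArray(new String[0]);
  }
}
```

Calling <tt>processAll</tt> with a worker that throws for one ID shows that the other IDs are still processed; whether the failing ID is part of the returned array depends only on the flag.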
 +
 
 +
=== Evaluate configuration parameters ===
 +
Now we have to determine the source and target attribute names, which are provided by the configuration parameters <tt>PROP_IN_ATT_NAME</tt> and <tt>PROP_OUT_ATT_NAME</tt>. We read them using the parameter accessor (Note: if we didn't want to let job parameters change these attributes, we could have evaluated the pipelet configuration in the configure method and stored the result in member variables, but we want to be flexible in this example).
 +
Replace the comment <tt>// read configuration from the accessor</tt> with the following snippet:
 +
<source lang="Java">
 +
final String inAttName = paramAccessor.getRequiredParameter(PROP_IN_ATT_NAME);
 +
if (inAttName.trim().length() == 0) {
 +
  throw new ProcessingException("Property " + PROP_IN_ATT_NAME + " must not be an empty String");
 +
}
 +
final String outAttName = paramAccessor.getRequiredParameter(PROP_OUT_ATT_NAME);
 +
if (outAttName.trim().length() == 0) {
 +
  throw new ProcessingException("Property " + PROP_OUT_ATT_NAME + " must not be an empty String");
 +
}
 
</source>
 
</source>
'''Note''': Most of the time the return value of a pipelet is the unmodified input parameter <tt>recordIds</tt>. However, in some cases a pipelet may filter record IDs or even create new records. Then the return value has to be adopted appropriately.
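The two empty-string checks in the parameter snippet above differ only in the property name, so they could be factored into one helper. The following is a minimal sketch under assumptions: it uses a plain <tt>Map</tt> in place of the <tt>ParameterAccessor</tt> and throws <tt>IllegalArgumentException</tt> in place of <tt>ProcessingException</tt> so that it compiles without SMILA on the classpath; the names <tt>RequiredParameterSketch</tt> and <tt>requireNonBlank</tt> are hypothetical.

```java
import java.util.Map;

/** Stand-in helper for the "required and not blank" check; not part of SMILA. */
final class RequiredParameterSketch {
  private RequiredParameterSketch() {
  }

  /** Returns the value for the given name, or throws if it is missing, empty, or whitespace only. */
  static String requireNonBlank(Map<String, String> parameters, String name) {
    final String value = parameters.get(name);
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException("Property " + name + " must not be missing or an empty String");
    }
    return value;
  }
}
```

In the pipelet itself the same idea would wrap the accessor call and throw a <tt>ProcessingException</tt>, so that the surrounding <tt>try ... catch</tt> block marks only the current record as failed.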
 
  
 
=== Read input data ===
 
=== Read input data ===
Now we want to read the data of the attribute with the name stored in <tt>_inAttName</tt>. Therefore we first have to create a <tt>Path</tt> object with the attribute's name. Before accessing the literal value we check if the record contains an attribute with the given <tt>Path</tt>. In this tutorial we know that the value of the attribute is a string value, so we directly access the value by calling the method <tt>getStringValue()</tt>.
+
Now we want to read the data of the attribute we stored in <tt>inAttName</tt>.
 
Replace the comment "<tt>// Read Input Data</tt>" with the following code snippet.
 
Replace the comment "<tt>// Read Input Data</tt>" with the following code snippet.
 
<source lang="Java">
 
<source lang="Java">
 
String inputValue = "";
 
String inputValue = "";
final Path path = new Path(_inAttName);
+
if (blackboard.getMetadata(id).containsKey(inAttName)) {
if (blackboard.hasAttribute(id, path)) {
+
   inputValue = blackboard.getMetadata(id).getStringValue(inAttName);
   inputValue = blackboard.getLiteral(id, path).getStringValue();
+
 
}
 
}
 
</source>
 
</source>
'''Note''': Accessing attribute values can be achieved more generically. Therefore you have to check what data type a certain literal contains using the method <tt>getDataType()</tt>. Then you can use the appropriate getter method to access the raw data.
+
'''Note''': Accessing attribute values can be achieved more generically. Therefore you have to check what data type a certain attribute contains using the method <tt>getValueType()</tt> (or the checking methods <tt>isBoolean()</tt>... etc.). Then you can use the appropriate getter method to access the raw data.
  
 
=== Process input data ===
 
=== Process input data ===
At this point the HelloWorld web service should be called with the parameter <tt>inputValue</tt> and the result should be stored in the variable <tt>outputValue</tt>, using the classes generated from WSDL.<br/>
+
Now we will call the HelloWorld web service with the parameter <tt>inputValue</tt> and store the result in variable <tt>outputValue</tt>. Therefore we use the classes generated from WSDL by Axis2. The HelloWorld web service will return a String message in the format <tt>"Hello "</tt> + the content of variable <tt>inputValue</tt>. Replace the comment "<tt>// Process Input Data</tt>" with the following code snippet.
Until this tutorial description is complete, simply assign the content of the variable <tt>inputValue</tt> to variable <tt>outputValue</tt> and append a constant string value. Replace the comment "<tt>// Process Input Data</tt>" with the following code snippet.
+
 
<source lang="Java">
 
<source lang="Java">
    String outputValue = inputValue + " modified by HelloWorldPipelet";
+
HelloWorldStub ws = new HelloWorldStub("http://localhost:8081/axis2/services/HelloWorld.HelloWorldImplPort");
 +
ws._getServiceClient().getOptions().setProperty(HTTPConstants.CHUNKED, Boolean.FALSE);
 +
SayHiDocument sayHiDoc = SayHiDocument.Factory.newInstance();
 +
SayHi sayHi = sayHiDoc.addNewSayHi();
 +
sayHi.setArg0(inputValue);
 +
SayHiResponseDocument respDoc = ws.sayHi(sayHiDoc);
 +
SayHiResponse response = respDoc.getSayHiResponse();       
 +
String outputValue = response.getReturn();
 
</source>
 
</source>
  
 
=== Write output data ===
 
=== Write output data ===
Finally, we want to store the content of the variable <tt>outputValue</tt> in the record attribute with the name contained in variable <tt>_outAttName</tt>. Therefore we have to create a new <tt>Literal</tt> object and set its value. Then we only need to set this <tt>Literal</tt> for the current ID on the black board.
+
Finally, we want to store the content of the variable <tt>outputValue</tt> in the record attribute with the name contained in variable <tt>outAttName</tt>. Therefore we have to create a new <tt>Value</tt> object that holds the string. Then we only need to put this <tt>Value</tt> into the metadata of the current ID on the blackboard.
 
Replace the comment "<tt>// Write Output Data</tt>" with the following code snippet.
 
Replace the comment "<tt>// Write Output Data</tt>" with the following code snippet.
 
<source lang="Java">
 
<source lang="Java">
final Literal literal = RecordFactory.DEFAULT_INSTANCE.createLiteral();
+
final Value outLiteral = blackboard.getDataFactory().createStringValue(outputValue);
literal.setStringValue(outputValue);
+
blackboard.getMetadata(id).put(outAttName, outLiteral);
blackboard.setLiteral(id, new Path(_outAttName), literal);
+
 
</source>
 
</source>
 
'''Note''': The method <tt>commit(Id)</tt> of the blackboard service does not need to be called in each pipelet as it is automatically called at the end of the [[SMILA/Glossary/#p|pipeline]].
 
'''Note''': The method <tt>commit(Id)</tt> of the blackboard service does not need to be called in each pipelet as it is automatically called at the end of the [[SMILA/Glossary/#p|pipeline]].
  
 
== Configuration and invocation in BPEL ==
 
== Configuration and invocation in BPEL ==
In this tutorial we will integrate the HelloWorld pipelet in the SMILA indexing process just before the record is stored in the Lucene index. With this configuration the input for the HelloWorld pipelet will be read from attribute ''Title'' and the modified output will be stored in the same attribute, overwriting the previous value.
+
In this tutorial we will integrate the HelloWorld pipelet in the SMILA indexing process just before the record is stored in the Solr core. With this configuration the input for the HelloWorld pipelet will be read from attribute ''Title'' and the modified output will be stored in the same attribute, overwriting the previous value.
* Edit the file <tt>configuration/org.eclipse.smila.processing.bpel/pipelines/addpipeline.bpel</tt> and add the following right between the <tt><extensionActivity name="convertDocument"></tt> and the <tt><extensionActivity name="invokeLuceneService"></tt> section.
+
* Edit the file <tt>configuration/org.eclipse.smila.processing.bpel/pipelines/addpipeline.bpel</tt> and add the following right between the <tt><extensionActivity name="convertDocument"></tt> and the <tt><extensionActivity name="SolrIndexPipelet"></tt> section.
 
<source lang="XML">
 
<source lang="XML">
<extensionActivity name="invokeHelloWorldPipelet">
+
<extensionActivity>
     <proc:invokePipelet>
+
     <proc:invokePipelet name="invokeHelloWorldPipelet">
 
         <proc:pipelet class="org.eclipse.smila.sample.pipelet.HelloWorldPipelet" />
 
         <proc:pipelet class="org.eclipse.smila.sample.pipelet.HelloWorldPipelet" />
 
         <proc:variables input="request" output="request" />
 
         <proc:variables input="request" output="request" />
         <proc:PipeletConfiguration>
+
         <proc:configuration>
             <proc:Property name="IN_ATT_NAME">
+
             <rec:Value name="IN_ATT_NAME">Title</rec:Value>
                <proc:Value>Title</proc:Value>
+
            <rec:Value name="OUT_ATT_NAME">Title</rec:Value>
 
             </proc:Property>
 
             </proc:Property>
            <proc:Property name="OUT_ATT_NAME">
+
         </proc:configuration>       
                <proc:Value>Title</proc:Value>
+
            </proc:Property>
+
         </proc:PipeletConfiguration>       
+
 
     </proc:invokePipelet>
 
     </proc:invokePipelet>
 
</extensionActivity>
 
</extensionActivity>
 
</source>
 
</source>
  
== Test your pipelet ==
+
== Build and Test your pipelet ==
  
To test your pipelet, you have to include the bundle in the OSGi launch configuration:
+
Depending on the SMILA distribution you are using you have different options how to build and test your pipelet:
  
=== Run SMILA in eclipse ===
+
=== Source Code Distribution ===
 +
You can simply integrate your pipelet in the SMILA build process. Refer to the instructions on [[SMILA/Development_Guidelines/How_to_integrate_new_bundle_into_build_process|How to integrate a new bundle into build process]] for details. You can build your SMILA application and run it as usual.
 +
 
 +
In addition to building the SMILA application you can also run SMILA directly within your Eclipse IDE. To test your pipelet, you have to include the bundle in the OSGi launch configuration:
 
* Open ''Run > Open Run Dialog''.  
 
* Open ''Run > Open Run Dialog''.  
 
* In the left window select ''OSGi Framework > SMILA''.  
 
* In the left window select ''OSGi Framework > SMILA''.  
Line 235: Line 261:
 
* Launch SMILA by clicking the ''Run'' button.
 
* Launch SMILA by clicking the ''Run'' button.
  
=== Run SMILA as application ===
+
=== Binary Distribution ===
* Copy your bundle to the directory <tt>%SMILA_HOME%/plugins</tt>.
+
 
 +
To test your pipelet you have to add it as a plugin to your SMILA installation.
 +
 
 +
* Export your pipelet as a plugin using the Eclipse IDE wizards. Refer to the instructions on [[SMILA/Development_Guidelines/How_to_export_a_bundle|How to export a bundle]] for a step-by-step description.
 +
* Copy your plugin to the directory <tt>%SMILA_HOME%/plugins</tt>.
 
* Add the following XML snippet to the file <tt>%SMILA_HOME%/features/org.eclipse.smila.feature_1.0.0/feature.xml</tt>:
 
* Add the following XML snippet to the file <tt>%SMILA_HOME%/features/org.eclipse.smila.feature_1.0.0/feature.xml</tt>:
 
<code lang="XML">
 
<code lang="XML">
Line 246: Line 276:
 
     unpack="false"/>
 
     unpack="false"/>
 
</code>
 
</code>
* Launch SMILA by calling either <tt>eclipse.exe -console</tt> or <tt>launch.cmd</tt>.  
+
* Launch SMILA by starting <tt>SMILA.exe</tt>.  
  
 
If SMILA is running, you can start a crawling job as described in [[SMILA/Development_Guidelines#Run_and_manage_the_connectivity_framework|Run and manage the connectivity framework]] beginning at step 5.
 
If SMILA is running, you can start a crawling job as described in [[SMILA/Development_Guidelines#Run_and_manage_the_connectivity_framework|Run and manage the connectivity framework]] beginning at step 5.
While crawling your data source you can already search for indexed documents. Open your browser, navigate to [http://localhost:8080/AnyFinder/SearchForm  http://localhost:8080/AnyFinder/SearchForm] and execute a search. In the result table take a look at the attribute '''Title'''. Every '''Title''' should now have the suffix <tt>"modified by HelloWorldPipelet"</tt>, as this was added by the pipelet.
+
While crawling your data source you can already search for indexed documents. Open your browser, navigate to [http://localhost:8080/SMILA/search http://localhost:8080/SMILA/search] and execute a query. In the result table take a look at the attribute '''Title'''. Every '''Title''' should now start with <tt>"Hello "</tt>, as this prefix is added by the HelloWorld web service that the pipelet calls.
  
 
=== Troubleshooting ===
 
=== Troubleshooting ===

Latest revision as of 13:45, 24 October 2012

This page illustrates all steps that need to be performed in order to integrate the HelloWorld web service as a pipelet in SMILA. For general information on how to integrate components and add functionality to SMILA refer to How to integrate a component in SMILA.


Preparations

It may be helpful to first take a look at the SMILA Development guidelines as many topics that are beyond the scope of this tutorial are illustrated there.

Create new bundle

  • Create a new bundle that should contain your pipelet. Follow the instructions on How to create a bundle and use the following settings:
Project name: org.eclipse.smila.sample.pipelet
Plug-in ID: org.eclipse.smila.sample.pipelet
Plug-in Version: 1.0.0
Plug-in Name: Sample Pipelet Bundle
Plug-in Provider: your name or company
  • Edit the file META-INF/MANIFEST.MF and add the following import-package dependencies as those are required to implement the basic functionalities of your pipelet:
Import-Package: org.apache.commons.logging;version="1.1.1",
 org.eclipse.smila.blackboard;version="0.8.0",
 org.eclipse.smila.datamodel;version="0.8.0",
 org.eclipse.smila.processing;version="0.8.0"
  • To make sure that the PipeletTrackerService detects your new pipelet, create a folder SMILA-INF in the bundle and add a file HelloWorldPipelet.json to this folder:
{
  "class": "org.eclipse.smila.sample.pipelet.HelloWorldPipelet",
  "parameters": [        
    {
      "name": "IN_ATT_NAME",
      "type": "string"            
    },
    {
      "name": "OUT_ATT_NAME",
      "type": "string"            
    }
  ],
  "description": "Hello World pipelet. Modifies the content of the attribute denoted by the parameter IN_ATT_NAME and writes it to the attribute denoted by the parameter OUT_ATT_NAME."
}
  • Now add the folder SMILA-INF to the build.properties file (or just check it in the Build view of the MANIFEST.MF editor in your IDE).
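
As a rough illustration, the bin.includes entry in build.properties might then look like the fragment below. Only the SMILA-INF/ entry follows from the step above; the other entries are assumptions about a typical plug-in project and may differ in your bundle.

```
# Assumed typical entries; only SMILA-INF/ is required by this tutorial step
bin.includes = META-INF/,\
               .,\
               SMILA-INF/
```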

Create Java classes from WSDL using Axis2

  • Install Axis2 1.4.1: Download from http://ws.apache.org/axis2/download/1_4_1/download.cgi and unpack into any directory.
  • Open a shell in the Axis2 directory and run wsdl2java as in the following example. Replace the WSDL URL after -uri with that of the web service you want to use, and change the package name after -p and the output directory after -o:
bin\wsdl2java -uri http://localhost:8081/axis2/services/HelloWorld.HelloWorldImplPort?wsdl 
  -d xmlbeans -p org.eclipse.smila.sample.helloworld -s -o helloworld-ws
This creates two folders inside helloworld-ws: src and resources.
If you do not want to run the generator inside the Axis2 installation you must set an environment variable AXIS2_HOME to the Axis2 installation directory.
  • Add Import-Package declarations with minimum versions as available in your target platform (they will be set automatically if you use the Manifest editor's Dependencies tab to add them). To run this example at least these are needed (with valid versions at the time of writing):
javax.xml.stream;version="1.0.1",
org.apache.axiom.om;version="1.2.7",
org.apache.axiom.om.impl;version="1.2.7",
org.apache.axiom.om.impl.llom;version="1.2.7",
org.apache.axiom.soap;version="1.2.7",
org.apache.axis2;version="1.4.1",
org.apache.axis2.addressing;version="1.4.1",
org.apache.axis2.client;version="1.4.1",
org.apache.axis2.context;version="1.4.1",
org.apache.axis2.description;version="1.4.1",
org.apache.axis2.transport;version="1.4.1",
org.apache.axis2.transport.http;version="1.4.1",
org.apache.axis2.wsdl;version="1.4.1",
org.apache.xmlbeans;version="2.3.0",
org.apache.xmlbeans.impl.schema;version="2.3.0",
org.apache.xmlbeans.impl.values;version="2.3.0",
org.apache.xmlbeans.xml.stream;version="2.3.0"
You will not get compile errors if the import for org.apache.xmlbeans.impl.schema is missing, but it is needed during runtime.
For more complex webservices, additional imports may be required. Check the imported generated client code for compile errors.
  • Create a source folder code/gen in your bundle and move the content of the generated src folder into it.
  • Create a folder lib in your bundle, create a zip file from the content of the generated resources folder, change the suffix to jar and move it to lib. Refresh the bundle in your Eclipse workspace, and add this jar to the Bundle-Classpath of your bundle (Manifest editor, tab Runtime, Classpath setting).
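
For orientation, the resulting classpath entry in META-INF/MANIFEST.MF could look roughly like this. The jar name helloworld-resources.jar is a hypothetical example; use whatever name you gave the zipped resources folder.

```
Bundle-ClassPath: .,
 lib/helloworld-resources.jar
```

The leading "." keeps the bundle's own compiled classes on the classpath next to the added jar.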


Implementation

  • Create the package org.eclipse.smila.sample.pipelet and the Java class HelloWorldPipelet.
  • Use the following code as a template for your new class. It contains empty method bodies and a reference to the logger. In the following we are going to gradually replace the comments in this file by the corresponding code snippets. For your convenience you may also download the complete zipped source file from HelloWorldPipelet.zip.
package org.eclipse.smila.sample.pipelet;
 
import org.apache.axis2.transport.http.HTTPConstants;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.eclipse.smila.blackboard.Blackboard;
import org.eclipse.smila.datamodel.AnyMap;
import org.eclipse.smila.datamodel.Value;
import org.eclipse.smila.processing.Pipelet;
import org.eclipse.smila.processing.ProcessingException;
import org.eclipse.smila.processing.parameters.ParameterAccessor;
import org.eclipse.smila.processing.util.ProcessingConstants;
import org.eclipse.smila.processing.util.ResultCollector;
 
// generated stub; the package must match the one passed to wsdl2java with -p
import org.eclipse.smila.sample.helloworld.HelloWorldStub;
 
import demo.hw.server.SayHi;
import demo.hw.server.SayHiDocument;
import demo.hw.server.SayHiResponse;
import demo.hw.server.SayHiResponseDocument;
 
public class HelloWorldPipelet implements Pipelet {
 
  // additional member variables or constants
  private final Log _log = LogFactory.getLog(getClass());
 
  public HelloWorldPipelet(){
  }
 
  public void configure(final AnyMap configuration) throws ProcessingException {
    // read the configuration properties
  }
 
  public String[] process(final Blackboard blackboard, final String[] recordIds) throws ProcessingException {
    // process the recordIds and create a result
    return null;
  }
}

Read PipeletConfiguration

  • First let's create two constants for the property names used in the configuration (or the parameters section of the records to be processed) to retrieve the names of the source and target attribute. Replace the comment "// additional member variables or constants" with the following code snippet.
  private static final String PROP_IN_ATT_NAME = "IN_ATT_NAME";
 
  private static final String PROP_OUT_ATT_NAME = "OUT_ATT_NAME";
 
  private AnyMap _config;
  • Then we store the PipeletConfiguration in the method configure(final AnyMap configuration) for later evaluation in process(final Blackboard blackboard, final String[] recordIds). This allows the user of this pipelet to set the attribute names either in the pipelet configuration or in the records themselves. For example, an administrator could define the attributes in a job; when the ParameterAccessor is used in the process method, these job properties override the default pipelet configuration properties.
@Override
public void configure(final AnyMap configuration) throws ProcessingException {
  _config = configuration;
}

Note: It would also be possible to set member variables directly in the configure method instead of using the ParameterAccessor to retrieve configuration parameters in the process method. You can do so for properties that will not change during operation or that stay the same for each record, no matter what the parameters of the record contain. The same applies to lengthy initialization such as reading and parsing configuration files. In these cases you should use member variables that are initialized in the configure method using only the information from the PipeletConfiguration. However, you should clearly document which parameters can only be defined in the PipeletConfiguration and which can be overridden in the records.
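
The lookup order described above (a value in the record parameters wins over the pipelet configuration) can be sketched in plain Java. This is only a stand-in using maps, not the SMILA ParameterAccessor; the class ParameterLookupSketch and its resolve method are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Plain-Java stand-in for the lookup order described above. This is NOT the SMILA
 * ParameterAccessor; the class and method names are hypothetical.
 */
class ParameterLookupSketch {

  /** Returns the record-level value if present, otherwise the pipelet configuration value. */
  static String resolve(final String name, final Map<String, String> recordParameters,
    final Map<String, String> pipeletConfiguration) {
    if (recordParameters.containsKey(name)) {
      return recordParameters.get(name);
    }
    final String value = pipeletConfiguration.get(name);
    if (value == null) {
      throw new IllegalArgumentException("Missing required parameter " + name);
    }
    return value;
  }

  public static void main(final String[] args) {
    // defaults, as they would come from the pipelet configuration
    final Map<String, String> config = new HashMap<String, String>();
    config.put("IN_ATT_NAME", "Title");
    config.put("OUT_ATT_NAME", "Title");

    // overrides, as they might come from a job or record (hypothetical value)
    final Map<String, String> recordParams = new HashMap<String, String>();
    recordParams.put("OUT_ATT_NAME", "Greeting");

    System.out.println(resolve("IN_ATT_NAME", recordParams, config));
    System.out.println(resolve("OUT_ATT_NAME", recordParams, config));
  }
}
```

In this sketch IN_ATT_NAME falls back to the configured default, while OUT_ATT_NAME is taken from the record-level override.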

Process IDs and implement exception handling

The method process(Blackboard blackboard, String[] recordIds) has two parameters:

  • blackboard: the blackboard that gives access to the records to be processed
  • recordIds: the IDs of the records to be processed

The HelloWorld pipelet should therefore iterate over the IDs in the parameter recordIds, get the required data from the record identified by the ID, process this data, and store the result in the record.

We suggest using the org.eclipse.smila.processing.util.ResultCollector utility class to collect the result IDs; it also provides configurable exception handling. When creating the ResultCollector, you have to decide whether records that cause an exception are excluded from the result set or stay in it. We will use the system-wide default ProcessingConstants.DROP_ON_ERROR_DEFAULT, which is set to false. The ResultCollector also checks the ParameterAccessor for the parameter _failOnError (default: false).

Let's place a try ... catch() block in the for loop to ensure that an error only interrupts the processing of the current ID. The comments in the code serve as placeholders for the functionality described in the following sections. At the end we ask the ResultCollector for the set of record IDs to return as the result of the pipelet. Replace the comment "// process the recordIds and create a result" with the following code snippet.

final ParameterAccessor paramAccessor = new ParameterAccessor(blackboard, _config);
final ResultCollector resultCollector =
      new ResultCollector(paramAccessor, _log, ProcessingConstants.DROP_ON_ERROR_DEFAULT);
for (String id : recordIds) {
    try {
        // read your configuration using the parameteraccessor
        paramAccessor.setCurrentRecord(id);
        // read configuration from the accessor
 
        // Read Input Data
 
        // Process Input Data
 
        // Write Output Data
 
        // add the id for a successful operation
        resultCollector.addResult(id);
    } catch (final Exception ex) {
        // mark the id for a failed record and let the result collector handle the exception as configured
        resultCollector.addFailedResult(id, ex);
    }
} // for
 
// let the ResultCollector decide which ids to return:
return resultCollector.getResultIds();

Note: Most of the time the return value of a pipelet is the same set of record ids as was processed (recordIds). However, in some cases a pipelet may filter record IDs or even create new records. Then the record IDs of the records to be filtered out should not be added to the ResultCollector and new record IDs have to be added to the ResultCollector in order to get the correct set of IDs as the result of the process method.
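
To make the effect of the drop-on-error setting more concrete, here is a minimal plain-Java sketch of the same loop pattern. It is not the SMILA ResultCollector and leaves out the _failOnError handling; ResultCollectingSketch, RecordOperation, and processAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java stand-in for the per-record error handling described above. This is NOT the
 * SMILA ResultCollector; all names here are hypothetical.
 */
class ResultCollectingSketch {

  /** Hypothetical per-record operation that may fail. */
  interface RecordOperation {
    void apply(String id) throws Exception;
  }

  /**
   * Applies the operation to every ID. A failure only affects the current ID: it is
   * dropped from the result if dropOnError is true and kept otherwise.
   */
  static String[] processAll(final String[] recordIds, final RecordOperation operation,
    final boolean dropOnError) {
    final List<String> resultIds = new ArrayList<String>();
    for (final String id : recordIds) {
      try {
        operation.apply(id);
        resultIds.add(id);
      } catch (final Exception ex) {
        System.err.println("Processing failed for record " + id + ": " + ex.getMessage());
        if (!dropOnError) {
          resultIds.add(id);
        }
      }
    }
    return resultIds.toArray(new String[0]);
  }

  public static void main(final String[] args) {
    // operation that fails for exactly one of the records
    final RecordOperation failOnB = new RecordOperation() {
      @Override
      public void apply(final String id) throws Exception {
        if ("b".equals(id)) {
          throw new Exception("simulated failure");
        }
      }
    };
    final String[] ids = { "a", "b", "c" };
    System.out.println(Arrays.toString(processAll(ids, failOnB, false)));
    System.out.println(Arrays.toString(processAll(ids, failOnB, true)));
  }
}
```

With dropOnError set to false the failed ID stays in the returned array; with true it is left out, while the remaining records are processed in both cases.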

Evaluate configuration parameters

Now we have to determine the source and target attribute names, which have to be provided with the configuration parameters PROP_IN_ATT_NAME and PROP_OUT_ATT_NAME. We read them using the parameter accessor. (Note: If we did not want job parameters to change these attributes, we could have evaluated the pipelet configuration in the configure method and stored the result in member variables, but we want to be flexible in this example.) Replace the comment "// read configuration from the accessor" with the following snippet:

final String inAttName = paramAccessor.getRequiredParameter(PROP_IN_ATT_NAME);
if (inAttName.trim().length() == 0) {
  throw new ProcessingException("Property " + PROP_IN_ATT_NAME + " must not be an empty String");
}
final String outAttName = paramAccessor.getRequiredParameter(PROP_OUT_ATT_NAME);
if (outAttName.trim().length() == 0) {
  throw new ProcessingException("Property " + PROP_OUT_ATT_NAME + " must not be an empty String");
}

Read input data

Now we want to read the data of the attribute we stored in inAttName. Replace the comment "// Read Input Data" with the following code snippet.

String inputValue = "";
if (blackboard.getMetadata(id).containsKey(inAttName)) {
  inputValue = blackboard.getMetadata(id).getStringValue(inAttName);
}

Note: Accessing attribute values can be achieved more generically. Therefore you have to check what data type a certain attribute contains using the method getValueType() (or the checking methods isBoolean()... etc.). Then you can use the appropriate getter method to access the raw data.

Process input data

Now we will call the HelloWorld web service with the parameter inputValue and store the result in the variable outputValue. For this we use the classes generated from the WSDL by Axis2. The HelloWorld web service returns a String message in the format "Hello " + the content of the variable inputValue. Replace the comment "// Process Input Data" with the following code snippet.

HelloWorldStub ws = new HelloWorldStub("http://localhost:8081/axis2/services/HelloWorld.HelloWorldImplPort");
ws._getServiceClient().getOptions().setProperty(HTTPConstants.CHUNKED, Boolean.FALSE);
SayHiDocument sayHiDoc = SayHiDocument.Factory.newInstance();
SayHi sayHi = sayHiDoc.addNewSayHi();
sayHi.setArg0(inputValue);
SayHiResponseDocument respDoc = ws.sayHi(sayHiDoc);
SayHiResponse response = respDoc.getSayHiResponse();        
String outputValue = response.getReturn();

Write output data

Finally, we want to store the content of the variable outputValue in the record attribute whose name is contained in the variable outAttName. To do so, we create a new Value object holding outputValue and put it into the metadata of the current ID on the blackboard. Replace the comment "// Write Output Data" with the following code snippet.

final Value outLiteral = blackboard.getDataFactory().createStringValue(outputValue);
blackboard.getMetadata(id).put(outAttName, outLiteral);

Note: The method commit(Id) of the blackboard service does not need to be called in each pipelet as it is automatically called at the end of the pipeline.

Configuration and invocation in BPEL

In this tutorial we will integrate the HelloWorld pipelet in the SMILA indexing process just before the record is stored in the Solr core. With this configuration the input for the HelloWorld pipelet will be read from attribute Title and the modified output will be stored in the same attribute, overwriting the previous value.

  • Edit the file configuration/org.eclipse.smila.processing.bpel/pipelines/addpipeline.bpel and add the following right between the <extensionActivity name="convertDocument"> and the <extensionActivity name="SolrIndexPipelet"> section.
<extensionActivity>
    <proc:invokePipelet name="invokeHelloWorldPipelet">
        <proc:pipelet class="org.eclipse.smila.sample.pipelet.HelloWorldPipelet" />
        <proc:variables input="request" output="request" />
        <proc:configuration>
            <rec:Value name="IN_ATT_NAME">Title</rec:Value>
            <rec:Value name="OUT_ATT_NAME">Title</rec:Value>
        </proc:configuration>       
    </proc:invokePipelet>
</extensionActivity>

Build and Test your pipelet

Depending on the SMILA distribution you are using, you have different options for building and testing your pipelet:

Source Code Distribution

You can simply integrate your pipelet in the SMILA build process. Refer to the instructions on How to integrate a new bundle into build process for details. You can build your SMILA application and run it as usual.

In addition to building the SMILA application, you can also run SMILA directly within your Eclipse IDE. To test your pipelet, you have to include the bundle in the OSGi launch configuration:

  • Open Run > Open Run Dialog.
  • In the left window select OSGi Framework > SMILA.
  • In the right window expand Workspace and select org.eclipse.smila.sample.pipelet.
  • Set the Default Auto-Start option to true.
  • Click the Apply button.
  • Launch SMILA by clicking the Run button.

Binary Distribution

To test your pipelet you have to add it as a plugin to your SMILA installation.

  • Export your pipelet as a plugin using the Eclipse IDE wizards. Refer to the instructions on How to export a bundle for a step-by-step description.
  • Copy your plugin to the directory %SMILA_HOME%/plugins.
  • Add the following XML snippet to the file %SMILA_HOME%/features/org.eclipse.smila.feature_1.0.0/feature.xml:

   <plugin
   id="org.eclipse.smila.sample.pipelet"
   download-size="0"
   install-size="0"
   version="1.0.0"
   unpack="false"/>

  • Launch SMILA by starting SMILA.exe.

If SMILA is running, you can start a crawling job as described in Run and manage the connectivity framework, beginning at step 5. While your data source is being crawled, you can already search for indexed documents. Open your browser, navigate to http://localhost:8080/SMILA/search and execute a query. In the result table, take a look at the attribute Title. Every Title should now start with the prefix "Hello ", as the HelloWorld web service called by the pipelet returns "Hello " followed by the original value.

Troubleshooting

If there are any problems please take a look at the log files SMILA.log and /workspace/.metadata/.log and feel free to ask for support at the SMILA Newsgroup.