
Difference between revisions of "SMILA/Documentation/Pipelets"

(Combined Lifecycle)
 
(6 intermediate revisions by 2 users not shown)
This page describes Pipelets, ProcessingServices and their lifecycle.  
+
This page describes pipelines, pipelets and their lifecycle.  
  
== Definition ==
+
== Concepts ==
  
Pipelets and ProcessingServices are reusable Java Components in a BPEL workflow and can be orchestrated like any regular BPEL Service. Both are used to process the data contained in [[SMILA/Glossary#R|Records]].
+
A ''Pipeline'' is a synchronous workflow composed of components called ''pipelets'' that processes a list of [[SMILA/Glossary#R|records]] given as the input. Synchronous means that the invoker of the pipeline blocks until the execution has finished, and a set of result records is returned that represents the result of the processing (if successful).
  
=== Pipelets ===
+
Often the list of input records consists of a single record representing a user request and the workflow executes a strict sequence of pipelets to produce a single result record. But it's also possible (especially when a pipeline is used as part of an asynchronous workflow) to send more than one record through the pipeline in one call to reduce the overhead of the pipeline invocation. Finally, more sophisticated pipelines can contain conditions and loops or they can change the number of records going through the pipeline.
A Pipelet is a POJO that implements the interface <tt>org.eclipse.smila.processing.SimplePipelet</tt>. Its lifecycle and configuration are managed by the workflow engine. An instance of a Pipelet is not shared by multiple Pipelines (workflows); even multiple invocations of a Pipelet in the same Pipeline do not share the same instance. Each <tt><invokePipelet></tt> call in a Pipeline has its own instance. An instance may still be accessed by multiple threads, for example if the same Pipeline is executed in parallel. The configuration of each Pipelet instance is included in the <tt><invokePipelet></tt> call in the BPEL pipeline. Technical details on Pipelet development can be found in the tutorial [[SMILA/Development_Guidelines/How_to_write_a_Pipelet|How to write a Pipelet]].
+
  
 +
A pipelet is a POJO that implements the interface <tt>org.eclipse.smila.processing.Pipelet</tt>. Its lifecycle and configuration are managed by the workflow engine. An instance of a pipelet is not shared by multiple pipelines; even multiple occurrences of a pipelet in the same pipeline do not share the same instance. However, an instance may still be executed by multiple threads at the same time, for example if the same pipeline is executed in parallel, so a pipelet implementation must be thread-safe. The configuration of each pipelet instance is included in the pipeline description. Technical details on pipelet development can be found in [[SMILA/Development_Guidelines/How_to_write_a_Pipelet|How to write a Pipelet]].
  
=== ProcessingServices ===
+
The default [[SMILA/Documentation/BPEL_Workflow_Processor|SMILA workflow processing engine]] uses [http://en.wikipedia.org/wiki/Business_Process_Execution_Language BPEL] to describe pipelines. However, pipelets do not depend on being called from a BPEL context, so it would be straightforward to replace the BPEL engine with a pipeline engine that uses a different description language and continue to use the same pipelet implementations.
A ProcessingService is an OSGi service (preferably a Declarative Service) that implements the interface <tt>org.eclipse.smila.processing.ProcessingService</tt>. Its lifecycle and configuration are NOT managed by the workflow engine, but by the OSGi runtime. An instance of a ProcessingService can be shared between multiple Pipelines and so is frequently accessed by multiple threads. The configuration for a ProcessingService is not contained within the <tt><invokeService></tt> call. Each ProcessingService can have its own configuration file(s). Typically these are located in a folder named after the ProcessingService's bundle within the global configuration folder. For simple configuration options a ProcessingService can use the same XML format used to configure Pipelets. Ready-to-use classes for reading and parsing such configuration files are available. Technical details on ProcessingService development can be found in the tutorial [[SMILA/Development_Guidelines/How_to_write_a_ProcessingService|How to write a ProcessingService]].
+
  
 
== Lifecycle ==
 
  
=== Pipelet Lifecycle ===
+
The following diagram shows the lifecycle of pipelets.
The following diagram shows the lifecycle of Pipelets.<br/>
+
  
 
[[Image:Lifecycle_of_Pipelets.png]]
 
  
Pipelets live inside the workflow engine. When the engine starts, it reads the pipeline definitions (i.e. BPEL workflows) from the ConfigurationHelper. The pipelines are introspected for pipelet invocations (invokePipelet extension activities) and the pipelet configurations that are contained in the invocation XML elements. For each invocation it creates an instance of the specified pipelet class, parses the configuration from the BPEL document and injects it into the pipelet instance. The pipelet instance is stored in the workflow engine as long as the engine is not stopped (and as long as the bundle providing the pipelet is available, of course). So for each single pipelet invocation occurring in the pipelines a different pipelet instance exists with a single configuration.
+
A pipelet instance lives inside the workflow engine. The pipelet is declared in the providing bundle by putting a JSON file containing a pipelet description in the <tt>SMILA-INF</tt> directory of the bundle (see [[SMILA/Development_Guidelines/How_to_write_a_Pipelet|How to write a pipelet?]] for details). When the engine starts, it reads predefined pipeline definitions (i.e. BPEL workflows) from the configuration directory. The pipelines are introspected for pipelet invocations and the pipelet configurations that are contained in the pipeline. For each such occurrence, the engine creates an instance of the specified pipelet class and injects the configuration into the pipelet instance. The pipelet instance is stored in the workflow engine as long as the engine is not stopped (and as long as the bundle providing the pipelet is available, of course). So exactly one pipelet instance exists for each occurrence of a pipelet in a pipeline. The pipelet must be capable of parallel execution because every execution of a pipeline uses the same pipelet instances.
  
 +
== Runtime Parameters ==
  
=== ProcessingService Lifecycle ===
+
We have introduced a convention that records can have a map element in attribute <tt>_parameters</tt> and that the elements of this map should be interpreted by pipelets as overrides for their configuration properties. The class <tt>org.eclipse.smila.processing.parameters.ParameterAccessor</tt> provides helper methods for checking if the record has such "runtime parameters" set and getting them from there, or from the pipelet configuration, if not overridden. However, the accessor can easily be told to look in another attribute, if necessary, or to even use the top-level attributes of the records for parameters. The latter is used by  <tt>org.eclipse.smila.search.api.helper.QueryParameterAccessor</tt>, because in search processing the convention is that query parameters are at top-level of the request record.
The following diagram shows the lifecycle of ProcessingServices.<br/>
+
 
+
[[Image:Lifecycle_of_ProcessingServices.png]]
+
 
+
ProcessingServices live independently of the workflow engine. They are created, activated and registered by the OSGi Declarative Services runtime just like the engine itself (of course, they can also be started and registered using bundle activators or other code, if declaring them as a DS is not appropriate for some reason). They read their configuration by themselves, e.g. by using the ConfigurationHelper. Eventually, the DS runtime binds all correctly registered ProcessingServices to the workflow engine so that it can invoke them when it reaches an invokeService extension activity during execution of a pipeline.
+
 
+
=== Combined Lifecycle ===
+
The following diagram shows the lifecycle of both Pipelets and ProcessingServices.<br/>
+
 
+
[[Image:Lifecycle_of_Pipelets_and_ProcessingServices.png]]
+
 
+
Of course, pipelines can combine the use of Pipelets and ProcessingServices. The purpose of this figure is to emphasize the main difference between Pipelets and ProcessingServices: Pipelets live inside the workflow engine and their instances are not shared, even if they have the same class and configuration. ProcessingServices live independently of the engine, manage their configuration on their own, and their instances can be shared by multiple pipelines. This makes them the more appropriate integration model if the functionality to be integrated needs a complex internal model and uses a lot of memory at runtime, or if resources should not be duplicated when used in multiple pipelines.
+
 
+
== When to use a ProcessingService ==
+
 
+
Technically there are no limitations on Pipelets compared to ProcessingServices. The same functionality can be implemented using both approaches. There are no strict rules on when to use which technology. However, ProcessingServices offer some benefits you may want to make use of. Here are some rules of thumb on when to implement functionality as a ProcessingService instead of as a Pipelet:
+
* <b>Lower Memory Usage</b>: a shared ProcessingService uses less memory than multiple instances of Pipelets. If your functionality needs lots of memory you should implement it as a ProcessingService
+
* <b>Preserve Internal State</b>: if your component has an internal state that should be preserved independently from the lifecycle of the workflow engine then implement it as a ProcessingService
+
* <b>Reuse Functionality</b>: if your functionality should not only be used inside of a pipeline but also from other services, then you should implement it as a ProcessingService. Being an OSGi service it offers the possibility to provide not only the <tt>ProcessingService</tt> interface but any other interface you like. For example this feature is used in the [[SMILA/Documentation/SimpleMimeTypeIdentifier|SimpleMimeTypeIdentifier]].
+
* <b>Non-sharable Resources</b>: if your component accesses resources that cannot be shared among multiple clients, implement it as a ProcessingService, because then only one instance exists and can be reused in multiple pipelines
+
* <b>Flexible/Complex Configuration</b>: if your component needs configuration options that are not supported by the simple PipeletConfiguration options, implement it as a ProcessingService
+
* <b>Use of OSGi Services</b>: if your component wants to use one or more OSGi services, then it's easier to implement it as a ProcessingService. References to other OSGi services can be easily configured in the component description of the ProcessingService. Otherwise you have to make use of the ServiceTracker to find references to services.
+
 
+
  
 
[[Category:SMILA]]
 

Latest revision as of 05:51, 23 January 2012

This page describes pipelines, pipelets and their lifecycle.

Concepts

A Pipeline is a synchronous workflow composed of components called pipelets that processes a list of records given as the input. Synchronous means that the invoker of the pipeline blocks until the execution has finished, and a set of result records is returned that represents the result of the processing (if successful).

Often the list of input records consists of a single record representing a user request and the workflow executes a strict sequence of pipelets to produce a single result record. But it's also possible (especially when a pipeline is used as part of an asynchronous workflow) to send more than one record through the pipeline in one call to reduce the overhead of the pipeline invocation. Finally, more sophisticated pipelines can contain conditions and loops or they can change the number of records going through the pipeline.

A pipelet is a POJO that implements the interface org.eclipse.smila.processing.Pipelet. Its lifecycle and configuration are managed by the workflow engine. An instance of a pipelet is not shared by multiple pipelines; even multiple occurrences of a pipelet in the same pipeline do not share the same instance. However, an instance may still be executed by multiple threads at the same time, for example if the same pipeline is executed in parallel, so a pipelet implementation must be thread-safe. The configuration of each pipelet instance is included in the pipeline description. Technical details on pipelet development can be found in How to write a Pipelet.
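
The thread-safety requirement can be illustrated with a small, self-contained sketch. It does not use the real SMILA API: the interface <tt>SketchPipelet</tt>, the class <tt>UppercasePipelet</tt> and the use of plain maps as records are assumptions made only for this example. The point it tries to show is that a single configured instance keeps no per-call state in fields, so several threads can call it at the same time.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a pipelet contract; NOT the real SMILA interface.
interface SketchPipelet {
  void configure(Map<String, Object> configuration);

  String[] process(Map<String, Map<String, Object>> records, String[] recordIds);
}

// Upper-cases one attribute of every record it is given.
final class UppercasePipelet implements SketchPipelet {
  // Configuration is written once by configure() and only read afterwards.
  private volatile String attribute = "title";
  // Shared mutable state must be thread-safe; an atomic counter is enough here.
  private final AtomicLong processed = new AtomicLong();

  @Override
  public void configure(Map<String, Object> configuration) {
    Object value = configuration.get("attribute");
    if (value != null) {
      attribute = value.toString();
    }
  }

  @Override
  public String[] process(Map<String, Map<String, Object>> records, String[] recordIds) {
    final String name = attribute; // per-call state lives in local variables only
    for (String id : recordIds) {
      Map<String, Object> record = records.get(id);
      Object value = record == null ? null : record.get(name);
      if (value != null) {
        record.put(name, value.toString().toUpperCase(Locale.ROOT));
      }
      processed.incrementAndGet();
    }
    return recordIds;
  }

  long processedCount() {
    return processed.get();
  }
}

final class ThreadSafetyDemo {
  // Calls ONE pipelet instance from several threads, as parallel pipeline runs would.
  static long runInParallel(int threads, int callsPerThread) {
    final UppercasePipelet pipelet = new UppercasePipelet();
    pipelet.configure(Collections.<String, Object>singletonMap("attribute", "title"));
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      final int threadNo = t;
      pool.submit(() -> {
        for (int i = 0; i < callsPerThread; i++) {
          String id = "r" + threadNo + "-" + i;
          Map<String, Object> record = new HashMap<>();
          record.put("title", "hello");
          Map<String, Map<String, Object>> records = new HashMap<>();
          records.put(id, record);
          pipelet.process(records, new String[] { id });
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return pipelet.processedCount();
  }
}
```

With a plain <tt>long</tt> field instead of the <tt>AtomicLong</tt>, the counter could lose updates under parallel execution; that is the kind of bug the thread-safety requirement is meant to rule out.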

The default SMILA workflow processing engine uses BPEL to describe pipelines. However, pipelets do not depend on being called from a BPEL context, so it would be straightforward to replace the BPEL engine with a pipeline engine that uses a different description language and continue to use the same pipelet implementations.

Lifecycle

The following diagram shows the lifecycle of pipelets.

[Image: Lifecycle of Pipelets.png]

A pipelet instance lives inside the workflow engine. The pipelet is declared in the providing bundle by putting a JSON file containing a pipelet description in the SMILA-INF directory of the bundle (see How to write a pipelet? for details). When the engine starts, it reads predefined pipeline definitions (i.e. BPEL workflows) from the configuration directory. The pipelines are introspected for pipelet invocations and the pipelet configurations that are contained in the pipeline. For each such occurrence, the engine creates an instance of the specified pipelet class and injects the configuration into the pipelet instance. The pipelet instance is stored in the workflow engine as long as the engine is not stopped (and as long as the bundle providing the pipelet is available, of course). So exactly one pipelet instance exists for each occurrence of a pipelet in a pipeline. The pipelet must be capable of parallel execution because every execution of a pipeline uses the same pipelet instances.
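
The "one instance per occurrence" rule can be sketched as follows. This is a simplified model, not the SMILA engine: <tt>ConfigurablePipelet</tt>, <tt>EchoPipelet</tt>, <tt>Occurrence</tt> and <tt>SketchEngine</tt> are hypothetical names, and a <tt>Supplier</tt> registered per class name stands in for loading the pipelet class from its bundle.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a pipelet; not the real SMILA interface.
interface ConfigurablePipelet {
  void configure(Map<String, Object> configuration);
}

// Trivial pipelet that only remembers the configuration it was given.
final class EchoPipelet implements ConfigurablePipelet {
  private Map<String, Object> configuration = new HashMap<>();

  @Override
  public void configure(Map<String, Object> configuration) {
    this.configuration = new HashMap<>(configuration);
  }

  Map<String, Object> configuration() {
    return configuration;
  }
}

// One occurrence of a pipelet in a pipeline description (hypothetical model).
final class Occurrence {
  final String pipeline;
  final int position;
  final String pipeletClass;
  final Map<String, Object> configuration;

  Occurrence(String pipeline, int position, String pipeletClass,
      Map<String, Object> configuration) {
    this.pipeline = pipeline;
    this.position = position;
    this.pipeletClass = pipeletClass;
    this.configuration = configuration;
  }
}

final class SketchEngine {
  // Stand-in for pipelet classes declared by bundles.
  private final Map<String, Supplier<ConfigurablePipelet>> factories = new HashMap<>();
  // One stored instance per occurrence, keyed by pipeline name and position.
  private final Map<String, ConfigurablePipelet> instances = new HashMap<>();

  void registerPipeletClass(String className, Supplier<ConfigurablePipelet> factory) {
    factories.put(className, factory);
  }

  // On start: create and configure a separate instance for every occurrence.
  void start(List<Occurrence> occurrences) {
    for (Occurrence occurrence : occurrences) {
      Supplier<ConfigurablePipelet> factory = factories.get(occurrence.pipeletClass);
      if (factory == null) {
        throw new IllegalStateException("unknown pipelet class: " + occurrence.pipeletClass);
      }
      ConfigurablePipelet instance = factory.get();
      instance.configure(occurrence.configuration);
      instances.put(key(occurrence.pipeline, occurrence.position), instance);
    }
  }

  // Every execution of a pipeline gets the same stored instance back.
  ConfigurablePipelet instanceFor(String pipeline, int position) {
    return instances.get(key(pipeline, position));
  }

  private static String key(String pipeline, int position) {
    return pipeline + "#" + position;
  }
}
```

In this model, two occurrences of the same pipelet class in one pipeline yield two distinct objects, while repeated lookups for the same occurrence return the same object, which is why that object has to tolerate parallel calls.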

Runtime Parameters

We have introduced a convention that records can have a map element in attribute _parameters and that the elements of this map should be interpreted by pipelets as overrides for their configuration properties. The class org.eclipse.smila.processing.parameters.ParameterAccessor provides helper methods for checking if the record has such "runtime parameters" set and getting them from there, or from the pipelet configuration, if not overridden. However, the accessor can easily be told to look in another attribute, if necessary, or to even use the top-level attributes of the records for parameters. The latter is used by org.eclipse.smila.search.api.helper.QueryParameterAccessor, because in search processing the convention is that query parameters are at top-level of the request record.
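
The lookup order described above can be sketched with plain maps. The class below is a hypothetical illustration named <tt>SketchParameterAccessor</tt>; it is not the API of <tt>ParameterAccessor</tt> or <tt>QueryParameterAccessor</tt>, whose actual method names and signatures should be taken from the SMILA sources. Records and configurations are modelled as <tt>Map<String, Object></tt>, which is also an assumption.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of the lookup convention; not the real ParameterAccessor.
final class SketchParameterAccessor {
  static final String DEFAULT_PARAMETERS_ATTRIBUTE = "_parameters";

  private final Map<String, Object> pipeletConfiguration;
  // Name of the record attribute holding the parameter map;
  // null means "read parameters from the top-level attributes of the record".
  private final String parametersAttribute;

  SketchParameterAccessor(Map<String, Object> pipeletConfiguration) {
    this(pipeletConfiguration, DEFAULT_PARAMETERS_ATTRIBUTE);
  }

  SketchParameterAccessor(Map<String, Object> pipeletConfiguration,
      String parametersAttribute) {
    this.pipeletConfiguration = pipeletConfiguration;
    this.parametersAttribute = parametersAttribute;
  }

  // Variant for the search convention: parameters sit at the top level.
  static SketchParameterAccessor topLevel(Map<String, Object> pipeletConfiguration) {
    return new SketchParameterAccessor(pipeletConfiguration, null);
  }

  // True if the record itself carries a runtime value for this parameter.
  boolean hasRuntimeParameter(Map<String, Object> record, String name) {
    return parameterSource(record).containsKey(name);
  }

  // Record value first, then pipelet configuration, then the given default.
  Object getParameter(Map<String, Object> record, String name, Object defaultValue) {
    Map<?, ?> source = parameterSource(record);
    if (source.containsKey(name)) {
      return source.get(name);
    }
    if (pipeletConfiguration.containsKey(name)) {
      return pipeletConfiguration.get(name);
    }
    return defaultValue;
  }

  private Map<?, ?> parameterSource(Map<String, Object> record) {
    if (parametersAttribute == null) {
      return record;
    }
    Object nested = record.get(parametersAttribute);
    if (nested instanceof Map) {
      return (Map<?, ?>) nested;
    }
    return Collections.emptyMap();
  }
}
```

A pipelet written this way reads each setting per record at processing time instead of caching it at configuration time, so one pipeline call can override a property without affecting other calls that use the same pipelet instance.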