Difference between revisions of "SMILA/Documentation/2011.Simplification/Howto integrate a component in SMILA"

(Integrating BPEL service)
(For SMILA 1.0: Simplification pages are obsolete, redirect to SMILA/Howto_integrate_a_component_in_SMILA)
 
This page summarizes the different types and complexity levels for the integration of components in SMILA.
#REDIRECT [[SMILA/Howto_integrate_a_component_in_SMILA]]
 
== Introduction ==

Thanks to its architecture, SMILA makes it easy to integrate third-party components into its framework. Three integration scenarios are available; they are summarized and depicted in the following table.

{| class="wikitable" border="1" cellpadding="5"
|-
! width="33%" | [[#Integrating BPEL service|Integrating BPEL service]]
! width="33%" | [[#Integrating data sources|Integrating data sources]]
! width="33%" | [[#Integrating alternative implementations of SMILA core components|Integrating alternative implementations of SMILA core components]]
|-valign="top"
| This is probably the most frequently used integration scenario. It lets you integrate or exchange functionality (services, third-party software, etc.) used to process records in the workflow engine.
| Integrating your own [[SMILA/Glossary#C|crawler]] or [[SMILA/Glossary#A|agent]] implementations is another common scenario for adding functionality to SMILA. Doing so unlocks further data sources that provide additional input to SMILA.
| This scenario is intended primarily for experienced SMILA developers. It lets you replace existing implementations of the SMILA core components with your own.
|-valign="top"
| [[image:Integrate-Service_0.8.0.png]]
| [[image:Integrate-Crawler.png]]
| [[image:Provide-Alternative-To-Core-Component.png]]
|-valign="top"
| The figure shows how you can integrate the functionality of your service or piece of software into SMILA by adding it to the workflow engine.
| The figure above shows, as an example, how you can add your own crawler implementation to SMILA. An agent implementation can be added in the same way; it is omitted from the figure for simplicity.
| The figure above shows how two of the SMILA core components, ''connectivity'' and ''data store'', may be replaced with your own implementations. These components serve as examples only; you may also replace other core components such as the [[SMILA/Glossary#B|blackboard service]] or the [[SMILA/Glossary#|delta indexing manager]].
|-valign="top"
| colspan="3" | The figures above show, by way of example, the levels in the [http://www.eclipse.org/smila SMILA architecture] at which new components can be integrated. For simplicity, the figures are restricted to the index processing chain. The search processing chain offers the same integration options (except for the integration of agents and crawlers) but is not the focus of this page.
|}

== Conventions ==

=== Handling of Character Encoding ===

To simplify data processing in SMILA, any component that accesses an external data source (a crawler, an agent, or any other component) should do everything possible to use the correct encoding when converting external data to a string (e.g. an attribute value). For example, HTTP clients should use the encoding reported by the HTTP server. If the data source provides no information about the character encoding, you can use the class <tt>org.eclipse.smila.utils.file.EncodingHelper</tt>. It converts a <tt>byte[]</tt> to a string by trying to detect the correct encoding: it checks for BOMs, inspects XML and HTML content for encoding instructions, and finally falls back to UTF-8 or, if this fails, to the default platform encoding.

Conversely, if valid string data must be converted to a <tt>byte[]</tt> (e.g. because it is stored as an attachment after pipelet processing), the conversion must always use UTF-8 encoding.
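
The fallback order described above can be illustrated with a minimal sketch. It assumes a hypothetical class <tt>EncodingSketch</tt> that is not part of SMILA and does not reproduce the API of <tt>EncodingHelper</tt>; it uses only standard Java classes, handles only the UTF-8 and UTF-16 BOMs, and omits the inspection of XML and HTML content.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; this is not SMILA's EncodingHelper. */
class EncodingSketch {

  /** Decodes external bytes: BOM first, then strict UTF-8, then the platform default. */
  static String decode(byte[] data) {
    if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
      return new String(data, 3, data.length - 3, StandardCharsets.UTF_8);
    }
    if (startsWith(data, 0xFE, 0xFF)) {
      return new String(data, 2, data.length - 2, StandardCharsets.UTF_16BE);
    }
    if (startsWith(data, 0xFF, 0xFE)) {
      return new String(data, 2, data.length - 2, StandardCharsets.UTF_16LE);
    }
    try {
      // REPORT makes the decoder throw on invalid UTF-8 instead of replacing bytes silently.
      return StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(data))
          .toString();
    } catch (CharacterCodingException e) {
      // Last resort: the default platform encoding.
      return new String(data, Charset.defaultCharset());
    }
  }

  /** Encodes valid string data; always UTF-8, as the convention requires. */
  static byte[] encode(String text) {
    return text.getBytes(StandardCharsets.UTF_8);
  }

  /** Returns true if data begins with the given unsigned byte values. */
  private static boolean startsWith(byte[] data, int... prefix) {
    if (data.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if ((data[i] & 0xFF) != prefix[i]) {
        return false;
      }
    }
    return true;
  }
}
```

The strict decoder matters here: the plain <tt>String</tt> constructor never fails on malformed input, so without it the fallback to the platform encoding would never be reached.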

== Integrating BPEL service ==

As shown in the overview above, you can integrate your own service or piece of software into SMILA [[SMILA/Glossary#BPEL|BPEL]] workflows.
In SMILA, these workflows are simply called [[SMILA/Glossary#P|pipelines]]. A pipeline is the definition of a BPEL process (or workflow) that orchestrates [[SMILA/Glossary#P|pipelets]] and other BPEL services (e.g. web services).

There are several ways to achieve this:

* [[#Simple: Integrating web services|Simple]]: The easiest way to add functionality is to invoke a web service using standard BPEL functionality. The disadvantage is that not all data of SMILA [[SMILA/Glossary#R|records]] is accessible with this method of integration.
* [[#Default: Integrating local SMILA pipelets|Default]]: The recommended way to integrate additional functionality in SMILA is to provide a Java implementation of an interface that makes it easy to create the [[SMILA/Glossary#P|pipelets]] mentioned above.
* [[#Advanced: Integrating remote services|Advanced]]: (''idea, not realized yet'') This method extends the default mechanism with an alternative procedure for integrating OSGi services that run not in the same OSGi runtime as the BPEL workflow but in another OSGi runtime, possibly on a remote machine.

==== Simple: Integrating web services ====
The simplest way of integrating additional functionality in SMILA is to call a web service, which is standard BPEL workflow engine functionality independent of SMILA. However, there are limitations on the input and result data of web services: by default, the workflow object (a DOM object) that enters the BPEL workflow in SMILA contains only the record [[SMILA/Glossary#I|IDs]]. That means [[SMILA/Glossary#R|records]] and the data they contain ([[SMILA/Glossary#A|attributes]], [[SMILA/Glossary#A|annotations]], and [[SMILA/Glossary#A|attachments]]) are '''not''' accessible from a BPEL workflow, because the workflow can only access and use the values contained in the BPEL workflow object.

To overcome this restriction, you can add data to the workflow object by defining filters in the configuration file located at <tt>org.eclipse.smila.blackboard/RecordFilters.xml</tt>. These filter rules define which [[SMILA/Glossary#A|attributes]] and [[SMILA/Glossary#A|annotations]] are copied to the workflow object to make them accessible in the BPEL workflow. Remember to also include in <tt>RecordFilters.xml</tt> all attributes and annotations that you wish to write data to. Filters work on attributes and annotations only; attachments of records cannot be accessed this way because binary data cannot reasonably be represented in a DOM.
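
The effect of such a filter can be pictured with a small conceptual sketch. It assumes a hypothetical class <tt>RecordFilterSketch</tt> and models a record as a plain map of attribute names to values; it does not reproduce the format of <tt>RecordFilters.xml</tt> or any SMILA API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of the filter idea; not the SMILA filter implementation. */
class RecordFilterSketch {

  /** Copies only the allowed attributes of a record into the workflow object. */
  static Map<String, Object> toWorkflowObject(
      Map<String, Object> recordAttributes, Set<String> allowedNames) {
    Map<String, Object> workflowObject = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : recordAttributes.entrySet()) {
      // Anything not named by the filter stays invisible to the workflow.
      if (allowedNames.contains(entry.getKey())) {
        workflowObject.put(entry.getKey(), entry.getValue());
      }
    }
    return workflowObject;
  }
}
```

In this picture, an attribute that is missing from the allowed names can be neither read nor written by the workflow, which is why the filter must also list the attributes that receive results.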

===== Examples =====
A good example of this use case is the integration of the [http://www.languageweaver.com/home.asp Language Weaver] web service. The Language Weaver Translation Server provides a web service interface that translates a text into another language. This service could easily be used within SMILA to extend its functionality.

===== Further reading =====
Please consult the following how-to tutorials for a more detailed technical description:

* [[SMILA/Development_Guidelines/How to filter and access record data in BPEL|How to filter and access record data in BPEL]]
* [[SMILA/Development_Guidelines/How to integrate the HelloWorld webservice in BPEL|How to integrate the HelloWorld webservice in BPEL]]

==== Default: Integrating local SMILA pipelets ====
The default, and thus recommended, technique for integrating functionality or software in SMILA is to provide a [[SMILA/Glossary#P|pipelet]] that runs in the same OSGi runtime as the BPEL workflow engine. Pipelets are easy to implement as they require only standard Java knowledge. Pipelet instances are not shared between pipelines; even multiple invocations of a pipelet in the same pipeline do not share an instance. The lifecycle and configuration of pipelets are managed by the workflow engine, not by the OSGi runtime. For further information on pipelets refer to the [[SMILA/Documentation/Pipelets|Pipelets documentation]].

The restriction described above for web services integrated through standard BPEL engine functionality does not apply to pipelets. Pipelets have full access to SMILA [[SMILA/Glossary#R|records]] through the [[SMILA/Glossary#B|blackboard service]], which makes it easy to read, modify, and store [[SMILA/Glossary#R|records]].

In general, pipelets follow the same logical steps, some of which are optional depending on the business logic to be executed:
* Read the configuration (optional)
* Read input data from blackboard (optional)
* Execute the business logic
* Write result data to blackboard (optional)
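
The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SMILA pipelet interface: <tt>SketchBlackboard</tt> and <tt>UpperCasePipeletSketch</tt> are hypothetical stand-ins, the attribute names <tt>Text</tt> and <tt>UpperText</tt> are invented, and the method names only loosely mirror what a real pipelet might offer.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for the blackboard: record ID mapped to an attribute map. */
class SketchBlackboard {
  private final Map<String, Map<String, Object>> records = new HashMap<>();

  /** Returns the attribute map of a record, creating it on first access. */
  Map<String, Object> record(String id) {
    return records.computeIfAbsent(id, key -> new HashMap<>());
  }
}

/** Hypothetical pipelet that upper-cases one attribute; not the SMILA API. */
class UpperCasePipeletSketch {
  private String inputAttribute = "Text";
  private String outputAttribute = "UpperText";

  /** Step 1 (optional): read the configuration. */
  void configure(Map<String, String> config) {
    inputAttribute = config.getOrDefault("inputAttribute", inputAttribute);
    outputAttribute = config.getOrDefault("outputAttribute", outputAttribute);
  }

  /** Steps 2 to 4 for every record ID handed to the pipelet. */
  String[] process(SketchBlackboard blackboard, String[] recordIds) {
    for (String id : recordIds) {
      Map<String, Object> record = blackboard.record(id);
      // Step 2 (optional): read input data from the blackboard.
      Object value = record.get(inputAttribute);
      if (value != null) {
        // Step 3: execute the business logic.
        String result = value.toString().toUpperCase(Locale.ROOT);
        // Step 4 (optional): write result data to the blackboard.
        record.put(outputAttribute, result);
      }
    }
    return recordIds;
  }
}
```

Keeping the business logic in step 3 separate from the blackboard access in steps 2 and 4 makes it straightforward to swap in a POJO, a local service, or a remote call.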

You are free to use any technology in the pipelet that implements the business logic. Some of the possibilities include:
* Using POJOs (For examples refer to the [[SMILA/Documentation/Bundle_org.eclipse.smila.processing.pipelets.xmlprocessing|XML processing pipelets]])
* Using any locally available OSGi service (For an example refer to the [[SMILA/Documentation/LuceneSearchPipelet|LuceneSearchPipelet]] that uses a LuceneSearchService)
* Using other technologies such as JNI, RMI, or CORBA to integrate remote or non-Java components (As an example consider the integration of [http://www.oracle.com/technologies/embedded/outside-in.html Oracle Outside In Technology].)

===== Examples =====

* Typical examples for pipelets are the [[SMILA/Documentation/Bundle_org.eclipse.smila.processing.pipelets.xmlprocessing|XML processing pipelets]]. These lightweight pipelets are used for XML processing (e.g. XSL transformation). Each pipeline uses its own [[SMILA/Glossary#P|pipelet]] instance.

===== Further reading =====
Please consult the following how-to tutorials for a more detailed technical description:
* [[SMILA/Development_Guidelines/How to write a Pipelet|How to write a pipelet]]
* [[SMILA/Development_Guidelines/How to integrate the HelloWorld webservice as a Pipelet|How to integrate the HelloWorld web service as a pipelet]]

==== Advanced: Integrating remote services ====
tbd.

== Integrating data sources ==

The architecture of the SMILA connectivity framework makes it easy to include additional data sources by providing appropriate implementations of [[SMILA/Glossary#A|agents]] and/or [[SMILA/Glossary#C|crawlers]].

=== Examples ===
* A typical agent is a FilesystemWatcher. It monitors a folder (or folder structure) for changes (creation, modification, or deletion of files/folders) and reports those changes to SMILA.
* Typical crawlers are the FilesystemCrawler and the WebCrawler. The former iterates over a folder structure and sends all encountered files to SMILA. The latter traverses HTML pages, follows their links to other HTML pages, and sends these pages as well as other resources (images, PDF files, etc.) to SMILA.
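
The crawling idea behind a file system crawler can be sketched with standard Java alone. The class <tt>FolderCrawlSketch</tt> is a hypothetical illustration and does not implement the SMILA crawler interface; it only shows the iteration over a folder structure, not how the files would be sent to SMILA.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the crawling idea; not the SMILA FilesystemCrawler. */
class FolderCrawlSketch {

  /** Walks the folder structure below root and returns every regular file found. */
  static List<Path> crawl(Path root) {
    // Files.walk visits root and all nested folders; the stream must be closed.
    try (Stream<Path> paths = Files.walk(root)) {
      return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A crawler of this kind runs once over the whole structure, whereas an agent would stay active and report changes as they happen.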

=== Further reading ===
Please consult the following how-to tutorials for a more detailed technical description:
* [[SMILA/Development_Guidelines/How to implement an agent|How to implement an agent]]
* [[SMILA/Development_Guidelines/How to implement a crawler|How to implement a crawler]]

== Integrating alternative implementations of SMILA core components ==

The component-based architecture of SMILA even allows you to provide your own implementations of SMILA core components. More info coming soon...

=== Examples ===

A typical example is an alternative implementation of the <tt>DeltaIndexingManager</tt> that does not store its state in memory but in the file system or in a database.
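
The idea of keeping such state in the file system can be sketched as follows. The class <tt>FileBackedDeltaStateSketch</tt> and its method <tt>needsUpdate</tt> are hypothetical; the sketch does not implement the <tt>DeltaIndexingManager</tt> interface and assumes, for illustration only, that the state consists of one hash value per record ID.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical sketch; it does not implement SMILA's DeltaIndexingManager. */
class FileBackedDeltaStateSketch {
  private final Path stateFile;
  private final Properties hashes = new Properties();

  /** Loads previously stored state from the file, if the file exists. */
  FileBackedDeltaStateSketch(Path stateFile) {
    this.stateFile = stateFile;
    if (Files.exists(stateFile)) {
      try (InputStream in = Files.newInputStream(stateFile)) {
        hashes.load(in);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Returns true if the record is new or its hash changed, and persists the new hash. */
  synchronized boolean needsUpdate(String recordId, String hash) {
    if (hash.equals(hashes.getProperty(recordId))) {
      return false;
    }
    hashes.setProperty(recordId, hash);
    // Rewrite the whole file so the state survives a restart.
    try (OutputStream out = Files.newOutputStream(stateFile)) {
      hashes.store(out, "delta indexing state");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return true;
  }
}
```

Rewriting the whole file on every change keeps the sketch short; an implementation meant for large data sets would more likely use a database or an append-only store.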

=== Further reading ===
tbd.
[[Category:SMILA]]

Latest revision as of 06:25, 19 January 2012
