SMILA/Specifications/Management Of The Smila Components

Latest revision as of 06:06, 4 May 2009

Use case

Some Smila components have to be managed.

The Lucene index should be managed by means of a JMX agent. Operations for deleting, renaming and creating indexes should be accessible. For XMLStorage, operations for deleting and renaming partitions should be accessible.


This article is written on the basis of discussions with Dmitriy Hazin and Ivan Churkin.

Description

Let's discuss the problem using the management of an index as an example.

See Workflow Smila.

The core of the SMILA system, consisting of Router -> JMS queue -> Listener -> BPEL processor, works with Record objects. In other words, the record connects all components of the system.

The Router pushes the record into the JMS message queue (ActiveMQ). The records are collected in the queue, where they await further processing. The Listener processes the queue and invokes the respective pipeline for each record. Currently SMILA has two pipelines, the AddPipeline and the DeletePipeline, which invoke several services and one pipelet to process a record.
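
The flow described above can be sketched in plain Java. This is only an illustration under stated assumptions, not SMILA code: an in-memory queue stands in for the JMS queue (ActiveMQ), and the QueuedRecord class, the operation names ADD and DELETE, and the log list are hypothetical names introduced for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

class RecordFlowSketch {

    /** Hypothetical stand-in for a SMILA record: an id plus the requested operation. */
    static final class QueuedRecord {
        final String id;
        final String operation;

        QueuedRecord(String id, String operation) {
            this.id = id;
            this.operation = operation;
        }
    }

    /** In-memory stand-in for the JMS queue. */
    private final Queue<QueuedRecord> queue = new ArrayDeque<>();

    /** Pipelines keyed by an assumed operation name. */
    private final Map<String, Consumer<QueuedRecord>> pipelines = new HashMap<>();

    /** Notes which pipeline handled which record, to make the flow visible. */
    final List<String> log = new ArrayList<>();

    RecordFlowSketch() {
        pipelines.put("ADD", r -> log.add("AddPipeline:" + r.id));
        pipelines.put("DELETE", r -> log.add("DeletePipeline:" + r.id));
    }

    /** Router role: push the record into the queue. */
    void route(QueuedRecord record) {
        queue.add(record);
    }

    /** Listener role: take each waiting record and invoke its pipeline. */
    void listen() {
        QueuedRecord record;
        while ((record = queue.poll()) != null) {
            Consumer<QueuedRecord> pipeline = pipelines.get(record.operation);
            if (pipeline == null) {
                throw new IllegalArgumentException("No pipeline for " + record.operation);
            }
            pipeline.accept(record);
        }
    }

    public static void main(String[] args) {
        RecordFlowSketch flow = new RecordFlowSketch();
        flow.route(new QueuedRecord("doc-1", "ADD"));
        flow.route(new QueuedRecord("doc-2", "DELETE"));
        flow.listen();
        System.out.println(flow.log);
    }
}
```

The point of the sketch is that the record is the only thing passed between the components, so any new command has to travel as a record as well.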


Pipeline5.jpg

Figure 1. Add pipeline (see addpipeline.bpel). Delete pipeline (see deletepipeline.bpel).

As shown in Figure 1, the AddPipeline invokes the SimplemimeTypeIdentifier service and the pipelet HtmlTotextPipelet, which prepare the record for indexing, and then the LuceneIndexService, which accesses the index directly to add the record.

The DeletePipeline invokes the LuceneIndexService for deleting a record from the index.

LuceneIndexService accesses an index by means of two methods:

private void addRecord(final Blackboard blackboard, final Id id, String indexName)  ...
 
private void deleteRecord(final Id id, String indexName) ...
 
 
IndexConnection indexConnection = IndexManager.getInstance(indexName);

Thus the system has no direct reference to the Lucene index implementation as such. All indexing operations are carried out by calling methods of the LuceneIndexService.

On this basis, there are two ways to implement the index management:

  • Natural way - Management by pseudo records which contain the index command.
  • Surgical way - Direct Lucene API access by the LuceneIndexService.

Technical Proposal

Management by the pseudo records

A pseudo record contains no data apart from the index command, which is carried in its metadata. Its sole purpose is to tell the LuceneIndexService which operation is required: deleting, renaming or creating an index.


The possible realization

Create an additional pipeline, “IndexManagementPipeline”, to which the pseudo records are sent. For this pipeline, either

  • Create a set of pipelets, one for each operation

or

  • Invoke the LuceneIndexService directly from the “IndexManagementPipeline”. In this case the LuceneIndexService has to be extended with a new method for each of the required operations.
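
The first alternative, one pipelet per operation, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the IndexPipelet interface is not the real SMILA pipelet interface, the operation names CREATE_INDEX and RENAME_INDEX and the "newIndexName" key are invented, and a set of names stands in for the actual indexes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class IndexPipeletsSketch {

    /** Hypothetical pipelet contract; the real SMILA pipelet interface is not reproduced here. */
    interface IndexPipelet {
        void process(Set<String> indexes, Map<String, String> command);
    }

    /** One pipelet per operation; each lambda stands in for a separate pipelet class. */
    static Map<String, IndexPipelet> pipelets() {
        Map<String, IndexPipelet> pipelets = new HashMap<>();
        pipelets.put("CREATE_INDEX", (indexes, cmd) -> indexes.add(cmd.get("indexName")));
        pipelets.put("DELETE_INDEX", (indexes, cmd) -> indexes.remove(cmd.get("indexName")));
        pipelets.put("RENAME_INDEX", (indexes, cmd) -> {
            // Rename only if the source index exists; "newIndexName" is an assumed key.
            if (indexes.remove(cmd.get("indexName"))) {
                indexes.add(cmd.get("newIndexName"));
            }
        });
        return pipelets;
    }
}
```

With this layout a new operation means adding a new pipelet, while the LuceneIndexService itself stays unchanged.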


NewPipeline1.jpg

Figure 2. Possible realization of the IndexManagementPipeline.

The configuration file for the IndexManagementPipeline, indexmanagementpipeline.bpel, could contain:

<proc:invokeService>
    <proc:service name="LuceneIndexService" />
    <proc:variables input="request" output="request" />
    <proc:setAnnotations>
        <rec:An n="org.eclipse.smila.lucene.LuceneIndexService">
            <rec:V n="indexName">test_index</rec:V>
            <rec:V n="executionMode">DELETE_INDEX</rec:V>
        </rec:An>
    </proc:setAnnotations>
</proc:invokeService>
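
On the service side, the annotation values set in this configuration would have to be turned into an index operation. The following is a minimal sketch of such a dispatch, assuming the annotation arrives as a simple map of strings. Only indexName, executionMode and DELETE_INDEX come from the configuration above; CREATE_INDEX, RENAME_INDEX and the "newIndexName" value are assumed names, and an in-memory set replaces the real Lucene indexes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PseudoRecordSketch {

    /** DELETE_INDEX appears in the configuration; the other two names are assumed. */
    enum ExecutionMode { CREATE_INDEX, RENAME_INDEX, DELETE_INDEX }

    /** In-memory stand-in for the set of existing Lucene indexes. */
    final Set<String> indexes = new TreeSet<>();

    /** Executes the index command carried in the pseudo record's annotation values. */
    void process(Map<String, String> annotation) {
        String indexName = annotation.get("indexName");
        ExecutionMode mode = ExecutionMode.valueOf(annotation.get("executionMode"));
        switch (mode) {
            case CREATE_INDEX:
                indexes.add(indexName);
                break;
            case RENAME_INDEX:
                // "newIndexName" is a hypothetical annotation value.
                if (indexes.remove(indexName)) {
                    indexes.add(annotation.get("newIndexName"));
                }
                break;
            case DELETE_INDEX:
                indexes.remove(indexName);
                break;
        }
    }

    public static void main(String[] args) {
        PseudoRecordSketch service = new PseudoRecordSketch();
        service.indexes.add("test_index");

        Map<String, String> annotation = new HashMap<>();
        annotation.put("indexName", "test_index");
        annotation.put("executionMode", "DELETE_INDEX");
        service.process(annotation);

        System.out.println(service.indexes.isEmpty());
    }
}
```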

Advantages of the given approach

  • A new mechanism for executing the index commands is not necessary. (This is especially important for a distributed system.)
  • The history of commands is easily maintained.

Directly by the LuceneIndexService

This approach requires the LuceneIndexService to implement new methods: deleteIndex(), renameIndex() and createIndex().

Invoking the LuceneIndexService directly also leads to the desired result. However, such a solution requires surgical interference with the system (new methods have to be implemented) and cannot be considered correct. Every new piece of functionality would require changes to the API of the system. The direct approach also does not use the capabilities of the SMILA system and does not allow the index operations to be controlled in the standard way.

See also Command Pattern

Solution chosen

It was decided to use standard management agents, exactly as in the crawler controller management.
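
Since the use case asks for management by means of a JMX agent, such an agent could be exposed as a standard MBean. The sketch below uses only the javax.management API of the JDK. It is not the SMILA agent: the interface, the object name org.example.smila:type=IndexManagement and the in-memory set of index names are assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class IndexManagementJmxSketch {

    /** Management interface; standard MBeans need a public interface named <Class>MBean. */
    public interface IndexManagementMBean {
        void createIndex(String indexName);
        void renameIndex(String indexName, String newIndexName);
        void deleteIndex(String indexName);
        String[] getIndexNames();
    }

    /** Hypothetical agent that keeps index names in memory instead of touching Lucene. */
    public static class IndexManagement implements IndexManagementMBean {
        private final Set<String> indexes = new TreeSet<>();

        public synchronized void createIndex(String indexName) {
            indexes.add(indexName);
        }

        public synchronized void renameIndex(String indexName, String newIndexName) {
            if (indexes.remove(indexName)) {
                indexes.add(newIndexName);
            }
        }

        public synchronized void deleteIndex(String indexName) {
            indexes.remove(indexName);
        }

        public synchronized String[] getIndexNames() {
            return indexes.toArray(new String[0]);
        }
    }

    /** Registers the agent, drives it through the MBean server, and returns the remaining index names. */
    static String[] runDemo() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName name = new ObjectName("org.example.smila:type=IndexManagement"); // assumed name
            server.registerMBean(new IndexManagement(), name);
            try {
                String str = String.class.getName();
                server.invoke(name, "createIndex", new Object[] {"test_index"}, new String[] {str});
                server.invoke(name, "createIndex", new Object[] {"old_index"}, new String[] {str});
                server.invoke(name, "renameIndex", new Object[] {"old_index", "new_index"}, new String[] {str, str});
                server.invoke(name, "deleteIndex", new Object[] {"test_index"}, new String[] {str});
                return (String[]) server.getAttribute(name, "IndexNames");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", runDemo()));
    }
}
```

Once registered, the same operations are available to any JMX client, such as JConsole, without going through the record pipeline.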
