SMILA/Documentation/LuceneIndexPipelet

Revision as of 09:34, 20 April 2011

Bundle: org.eclipse.smila.lucene.LuceneIndexPipelet

Description

This pipelet is used to index SMILA records in a Lucene document index. It supports adding, updating, and deleting records.

Configuration

Pipelet Configuration

The LuceneIndexPipelet uses the _indexing parameter to decide how to handle a record. This parameter must be a map with the following values:

Name | Value | Description
indexname | String | The name of the index to work on.
executionMode | ADD or DELETE | ADD: add or update the record. DELETE: delete the record from the index.
allowDoublets | Boolean | true: allow doublets (duplicates) in the index, i.e. no check is performed for whether a document already exists. false: do not allow doublets, i.e. check whether the document already exists and, if so, delete it first. Default is false.

The _indexing parameters can be defined in the pipelet configuration in BPEL and can be overridden by the record.
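
As an illustration of the values listed above, a delete configuration might look like the following. This is only a sketch: it mirrors the addpipeline.bpel example further down this page and changes executionMode; the activity name invokeLuceneDelete is a hypothetical label, and the surrounding BPEL process with its proc: and rec: namespace declarations is assumed to exist.

```xml
<extensionActivity>
    <!-- "invokeLuceneDelete" is a hypothetical activity name chosen for this sketch -->
    <proc:invokePipelet name="invokeLuceneDelete">
        <proc:pipelet class="org.eclipse.smila.lucene.pipelets.LuceneIndexPipelet" />
        <proc:variables input="request" output="request" />
        <proc:configuration>
            <rec:Map key="_indexing">
                <!-- the index to delete from -->
                <rec:Val key="indexname">test_index</rec:Val>
                <!-- DELETE removes the record from the index -->
                <rec:Val key="executionMode">DELETE</rec:Val>
                <!-- allowDoublets is omitted here; its documented default is false -->
            </rec:Map>
        </proc:configuration>
    </proc:invokePipelet>
</extensionActivity>
```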

LuceneService Configuration Files

configuration/org.eclipse.smila.search.datadictionary/DataDictionary.xml

This file configures the Lucene index structure and the search template. It is possible to define more than one index here. The index to work on is selected in the pipeline by the indexname value of the _indexing parameter. The "FieldNo" values defined here are referenced in the file Mappings.xml. For more information about the configuration of DataDictionary.xml, see the Anyfinder documentation.

The file also provides the settings for new indices: when an index is needed, it is created automatically on demand, using the configuration loaded from this file. (Note: The framework also creates a Datadictionary.xml file in the workspace. That file contains the settings of already created indices only.) The framework creates an index by itself when a record is configured to be stored in it. Alternatively, an index can be created explicitly with the createIndex JMX command.
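
Defining more than one index might look roughly like the skeleton below. It is a sketch under assumptions: the name second_index is hypothetical, the child elements of each index are elided as comments, and the schema-location attributes of the root element are left out for brevity. The full structure of a single index is shown in the DataDictionary.xml example further down.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<AnyFinderDataDictionary xmlns="http://www.anyfinder.de/DataDictionary">
  <Index Name="test_index">
    <!-- Connection, IndexStructure, Result and Configuration as in the full example -->
  </Index>
  <!-- "second_index" is a hypothetical second index with its own field definitions -->
  <Index Name="second_index">
    <!-- Connection, IndexStructure, Result and Configuration for this index -->
  </Index>
</AnyFinderDataDictionary>
```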

Each index configuration supports optional attributes that modify the flushing behavior of the underlying Lucene index. If none of them is specified, the previous behavior (a flush after every add/delete) applies.

...
<Index Name="test_index" ForceFlush="false"
  RAMBufferSize="20" MaxBufferedDocs="100000"
  MaxBufferedDeleteTerms="100000">
...
Name | Value | Description
ForceFlush | Boolean | true: a flush is forced after each document is added/deleted. false: flushing is controlled by the parameters below. Default is true. If set to true, the parameters below have no effect.
RAMBufferSize | Integer | The amount of RAM in MB that may be used before the buffered in-memory documents are flushed. If no value is specified, the Lucene default of 16 MB is used. Refer to the Lucene documentation for details.
MaxBufferedDocs | Integer | The minimum number of buffered documents required before the in-memory documents are flushed. If no value is specified, Lucene flushes by RAMBufferSize by default. Refer to the Lucene documentation for details.
MaxBufferedDeleteTerms | Integer | The minimum number of buffered delete terms required before the in-memory delete terms are applied and flushed. If no value is specified, Lucene flushes by RAMBufferSize by default. Refer to the Lucene documentation for details.
Note

For best performance, use ForceFlush="false" and set RAMBufferSize to a value suited to your system resources. Be aware that with ForceFlush="false", added or deleted documents may not be visible in a search until a flush is triggered. A flush is always performed when the bundle org.eclipse.smila.lucene is stopped. It is also possible to trigger a flush manually via the JMX Console.
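
The recommendation in this note could be sketched as an index entry that disables forced flushing and sets only the RAM buffer. The value 64 is an arbitrary illustrative assumption, not a recommended setting, and the child elements of the index are elided.

```xml
<!-- ForceFlush="false" hands flushing over to the buffer settings; -->
<!-- MaxBufferedDocs and MaxBufferedDeleteTerms are left unset, so flushing is driven by RAMBufferSize -->
<Index Name="test_index" ForceFlush="false" RAMBufferSize="64">
  <!-- Connection, IndexStructure, Result and Configuration as in the full example -->
</Index>
```

With such a setting, newly indexed documents may only become searchable after the next flush, as described in the note.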

configuration/org.eclipse.smila.lucene/Mappings.xml

This file configures the mapping of attribute and attachment names to the Lucene "FieldNo" values defined in DataDictionary.xml. It is possible to define mappings for multiple indices in this file; the indexName of each mapping must match the name of an index defined in DataDictionary.xml.
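
A file with mappings for two indices might look like the sketch below. The second index name, second_index, and its field numbers are hypothetical; they would have to match a corresponding index definition in DataDictionary.xml. The root element is copied from the Mappings.xml example further down.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<Mappings xmlns="http://www.eclipse.org/smila/lucene"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="schemas/Mappings.xsd">
  <!-- mapping for the index used throughout this page (shortened) -->
  <Mapping indexName="test_index">
    <Attributes>
      <Attribute name="Filename" fieldNo="1" />
    </Attributes>
    <Attachments>
      <Attachment name="Content" fieldNo="0" />
    </Attachments>
  </Mapping>
  <!-- hypothetical second index; field numbers refer to its own definition in DataDictionary.xml -->
  <Mapping indexName="second_index">
    <Attributes>
      <Attribute name="Title" fieldNo="1" />
    </Attributes>
    <Attachments>
      <Attachment name="Content" fieldNo="0" />
    </Attachments>
  </Mapping>
</Mappings>
```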

Example

The following example is taken from the SMILA example application, where it indexes records delivered by the File System Crawler or the Web Crawler.

addpipeline.bpel

...
<extensionActivity>
    <proc:invokePipelet name="invokeLuceneService">
        <proc:pipelet class="org.eclipse.smila.lucene.pipelets.LuceneIndexPipelet" />
        <proc:variables input="request" output="request" />
        <proc:configuration>
           <rec:Map key="_indexing">
              <rec:Val key="indexname">test_index</rec:Val>
              <rec:Val key="executionMode">ADD</rec:Val>
              <rec:Val key="allowDoublets">false</rec:Val>
           </rec:Map>
        </proc:configuration>
    </proc:invokePipelet>
</extensionActivity>
...

DataDictionary.xml

<?xml version="1.0" encoding="UTF-8"?>
<AnyFinderDataDictionary
  xmlns="http://www.anyfinder.de/DataDictionary"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.anyfinder.de/DataDictionary ../xml/AnyFinderDataDictionary.xsd"
  >
  <Index Name="test_index">
    <Connection xmlns="http://www.anyfinder.de/DataDictionary/Connection" MaxConnections="5"/>
    <IndexStructure xmlns="http://www.anyfinder.de/IndexStructure" Name="test_index">
      <Analyzer ClassName="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
      <IndexField FieldNo="8" IndexValue="true" Name="MimeType"
               StoreText="true" Tokenize="true" Type="Text"/>
      <IndexField FieldNo="7" IndexValue="true" Name="Size"
               StoreText="true" Tokenize="true" Type="Text"/>
      <IndexField FieldNo="6" IndexValue="true" Name="Extension"
               StoreText="true" Tokenize="true" Type="Text"/>
      <IndexField FieldNo="5" IndexValue="true" Name="Title"
               StoreText="true" Tokenize="true" Type="Text"/>
      <IndexField FieldNo="4" IndexValue="true" Name="Url"
               StoreText="true" Tokenize="false" Type="Text">
        <Analyzer ClassName="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
      </IndexField>
      <IndexField FieldNo="3" IndexValue="true" Name="LastModifiedDate"
        StoreText="true" Tokenize="false" Type="Text"/>
      <IndexField FieldNo="2" IndexValue="true" Name="Path" StoreText="true"
               Tokenize="true" Type="Text"/>
      <IndexField FieldNo="1" IndexValue="true" Name="Filename"
               StoreText="true" Tokenize="true" Type="Text"/>
      <IndexField FieldNo="0" IndexValue="true" Name="Content"
               StoreText="true" Tokenize="true" Type="Text"/>
    </IndexStructure>
    <Result>
      <Field FieldNo="0" Name="ID"/>            
    </Result>
    <Configuration xmlns="http://www.anyfinder.de/DataDictionary/Configuration"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.anyfinder.de/DataDictionary/Configuration ../xml/DataDictionaryConfiguration.xsd"
         >
      <DefaultConfig>
        <Field FieldNo="8">
          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR"
               Tolerance="exact"/>
          </FieldConfig>
        </Field>
        <Field FieldNo="7">
          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR"
               Tolerance="exact"/>
          </FieldConfig>
        </Field>
        <Field FieldNo="6">
          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR"
               Tolerance="exact"/>
          </FieldConfig>
        </Field>        
        <Field FieldNo="5">
          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR"
               Tolerance="exact"/>
          </FieldConfig>
        </Field>
        <Field FieldNo="4">
          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR"
               Tolerance="exact"/>
          </FieldConfig>
        </Field>
        <Field FieldNo="3">
          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR"
               Tolerance="exact"/>
          </FieldConfig>
        </Field>
        <Field FieldNo="2">
          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR"
               Tolerance="exact"/>
          </FieldConfig>
        </Field>
        <Field FieldNo="1">
          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR"
              Tolerance="exact"/>
          </FieldConfig>
        </Field>
        <Field FieldNo="0">
          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
            <NodeTransformer
              xmlns="http://www.anyfinder.de/Search/ParameterObjects"
              Name="urn:ExtendedNodeTransformer">
              <ParameterSet xmlns="http://www.brox.de/ParameterSet"/>
            </NodeTransformer>
            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR"
               Tolerance="exact"/>
          </FieldConfig>
        </Field>
      </DefaultConfig>
    </Configuration>
  </Index>
</AnyFinderDataDictionary>

Mappings.xml

<?xml version="1.0" encoding="utf-8" ?>
<Mappings xmlns="http://www.eclipse.org/smila/lucene"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="schemas/Mappings.xsd"
>
  <Mapping indexName="test_index">
    <Attributes>
      <Attribute name="Filename" fieldNo="1" />
      <Attribute name="Path" fieldNo="2" />    
      <Attribute name="LastModifiedDate" fieldNo="3" />
      <Attribute name="Url" fieldNo="4" />
      <Attribute name="Title" fieldNo="5" />    
      <Attribute name="Extension" fieldNo="6" />
      <Attribute name="Size" fieldNo="7" />
      <Attribute name="MimeType" fieldNo="8" />           
    </Attributes>
    <Attachments>
      <Attachment name="Content" fieldNo="0" />      
    </Attachments>
  </Mapping>
</Mappings>
