Difference between revisions of "SMILA/Documentation/HighlightingPipelet"

Revision as of 09:28, 20 April 2011

Bundle: org.eclipse.smila.search.highlighting.HighlightingPipelet

Description

This pipelet highlights the results of a previously executed search (e.g. with Lucene). A search may return highlight annotations, which contain the original text as well as the positions and scores of hits. The HighlightingPipelet uses HighlightingTransformers to transform these annotations into marked-up text; in the most basic case, all hit terms could be marked bold. The original text is overwritten with this markup and the position annotations are removed. A HighlightingTransformer is selected and configured via annotations on the query.

Note on implementation: The HighlightingTransformers are OSGi services of their own. They are managed for the pipelet by a separate OSGi service named HighlightingService, which can reference several of these HighlightingTransformer services.

Configuration

Pipelet Configuration

The highlighting transformer must be configured separately for each attribute whose highlight should be transformed. This is done with a parameter named "highlightingTransformers", which can be defined in the pipelet configuration in BPEL and overridden in the search request. For example, in the configuration this looks like:

<extensionActivity>
  <proc:invokePipelet name="highlight results">
    <proc:pipelet class="org.eclipse.smila.search.highlighting.HighlightingPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Map key="highlightingTransformers">
        <rec:Map key="Content">
          <rec:Val key="name">Sentence</rec:Val>
          <rec:Val key="MaxLength">300</rec:Val>
          <rec:Val key="MarkupPrefix">&lt;b&gt;</rec:Val>
          <rec:Val key="MarkupSuffix">&lt;/b&gt;</rec:Val>
          <rec:Val key="MaxHLElements">999</rec:Val>
          <rec:Val key="MaxSucceedingCharacters">30</rec:Val>
          <rec:Val key="SucceedingCharacters">...</rec:Val>
          <rec:Val key="SortAlgorithm">Occurrence</rec:Val>
          <rec:Val key="TextHandling">ReturnSnipplet</rec:Val>
        </rec:Map>
      </rec:Map>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
  • "highlightingTransformers" is the base request parameter. It has a map value.
  • It contains a map for each attribute to be highlighted.
    • These maps must contain a value "name" that holds the name of the HighlightingTransformer implementation to use. This name has to be equal to the value specified for the property smila.highlighting.transformer.type in the HighlightingTransformer OSGi component description file.
    • They may contain further values as parameters for the HighlightingTransformer. The value's key is the name of the configuration property. See the description of each HighlightingTransformer for the parameters it supports.

The pipelet also reads the "highlight" search request parameter, which contains the names of the attributes to be highlighted in this query (default values can be defined in the pipelet configuration, too). It is therefore possible to configure transformers for many different attributes but to do the actual highlighting and transforming only for selected attributes.
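
The interplay of the two parameters can be sketched as follows. This is a minimal illustration in Python, not SMILA code: the function name select_transformers, the plain-dict representation of the parameters, and the "Title" attribute are assumptions made for the example. How the pipelet treats a requested attribute that has no configured transformer is not stated here, so the sketch simply skips it.

```python
def select_transformers(transformer_config, highlight_attributes):
    """Return the transformer settings for the attributes requested for highlighting.

    transformer_config: dict mapping attribute name -> dict of settings
        (mirrors the "highlightingTransformers" map).
    highlight_attributes: list of attribute names
        (mirrors the "highlight" request parameter).
    """
    selected = {}
    for attribute in highlight_attributes:
        settings = transformer_config.get(attribute)
        if settings is None:
            # Assumption: attributes without a configured transformer are skipped.
            continue
        if "name" not in settings:
            raise ValueError(f"no transformer name configured for {attribute!r}")
        selected[attribute] = settings
    return selected


config = {
    "Content": {"name": "Sentence", "MaxLength": 300},
    "Title": {"name": "MaxTextLength", "MaxLength": 80},  # hypothetical attribute
}
# Only "Content" is requested, so the "Title" configuration stays unused.
chosen = select_transformers(config, ["Content"])
```

Keeping the transformer configuration separate from the per-query attribute list lets a pipeline define its transformers once while each search client decides which attributes are actually highlighted.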

On the result records the following Annotation structure is expected on Attributes that should be highlighted (these annotations are for example created by the LuceneSearchService):

<Map key="_highlight">
  <Map key="Content">
    <Val key="text">The original text without any markup where hits should be highlighted.</Val>
    <Seq key="positions">
      <Map>
        <Val key="start" type="long">%START_POSITION_OF_HIT%</Val> 
        <Val key="end" type="long">%END_POSITION_OF_HIT%</Val>
        <Val key="quality" type="long">%QUALITY_OF_HIT%</Val>
      </Map>
      ...
    </Seq>
  </Map>
</Map>
  • "_highlight" is used as the base annotation. It's a map containing a further map for each highlighted attribute.
    • The map for an attribute contains a named value "text" that holds the original text that should be highlighted.
    • It may contain a sequence "positions" of maps for marking hits in the text. An element of the "positions" sequence contains the values:
      • "start" containing the start position of the hit in the text
      • "end" containing the end position of the hit in the text
      • "quality" containing a measure for the quality of the hit

The highlight annotation is processed by a HighlightingTransformer, which uses the hit positions to add highlighting markup to the text. All "positions" annotations are removed and the named value "text" is replaced by the text containing the markup. A result has the following annotation structure:

<Map key="_highlight">
  <Map key="Content">
     <Val key="text">The text containing some <b>markup</b> to highlight the hits.</Val> 
  </Map>
</Map>
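
The step from the first structure to the second can be sketched as follows. This is a simplified illustration in Python, not the implementation of any SMILA transformer: the function name apply_markup is hypothetical, and the sketch assumes that "start" is inclusive, "end" is exclusive, and hits do not overlap, none of which is specified here.

```python
def apply_markup(entry, prefix="<b>", suffix="</b>"):
    """Wrap every hit in prefix/suffix and drop the position data.

    Assumptions: "start" is inclusive, "end" is exclusive, hits do not overlap.
    """
    text = entry["text"]
    hits = sorted(entry.get("positions", []), key=lambda hit: hit["start"])
    parts = []
    cursor = 0
    for hit in hits:
        # Copy the unmarked text before the hit, then the marked-up hit itself.
        parts.append(text[cursor:hit["start"]])
        parts.append(prefix + text[hit["start"]:hit["end"]] + suffix)
        cursor = hit["end"]
    parts.append(text[cursor:])
    # The result keeps only "text"; the "positions" sequence is gone.
    return {"text": "".join(parts)}


before = {
    "text": "The text containing some markup to highlight the hits.",
    "positions": [{"start": 25, "end": 31, "quality": 100}],
}
after = apply_markup(before)
```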


Example

The following example shows how the HighlightingPipelet is used after a Lucene search. The "highlight" parameter with the list of attributes to highlight must be set by the search client in this example.

searchpipeline.bpel

...
<extensionActivity name="invokeLuceneSearchPipelet">
    <proc:invokePipelet name="search">
        <proc:pipelet class="org.eclipse.smila.lucene.pipelets.LuceneSearchPipelet" />
        <proc:variables input="request" output="request" />
        <proc:configuration>
          <rec:Val key="index">test_index</rec:Val>
        </proc:configuration>
    </proc:invokePipelet>
</extensionActivity>
 
<extensionActivity>
  <proc:invokePipelet name="highlight results">
    <proc:pipelet class="org.eclipse.smila.search.highlighting.HighlightingPipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Map key="highlightingTransformers">
        <rec:Map key="Content">
          <rec:Val key="name">Sentence</rec:Val>
          <rec:Val key="MaxLength">300</rec:Val>
          <rec:Val key="MarkupPrefix">&lt;b&gt;</rec:Val>
          <rec:Val key="MarkupSuffix">&lt;/b&gt;</rec:Val>
          <rec:Val key="MaxHLElements">999</rec:Val>
          <rec:Val key="MaxSucceedingCharacters">30</rec:Val>
          <rec:Val key="SucceedingCharacters">...</rec:Val>
          <rec:Val key="SortAlgorithm">Occurrence</rec:Val>
          <rec:Val key="TextHandling">ReturnSnipplet</rec:Val>
        </rec:Map>
      </rec:Map>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
...

HighlightingTransformer

Here is a list of the available HighlightingTransformers, each with a short description and an overview of its supported configuration parameters. Each HighlightingTransformer is an OSGi Declarative Service implementing the interface org.eclipse.smila.search.highlighting.transformer.HighlightingTransformer. Its service component description file must contain the property smila.highlighting.transformer.type, whose value matches the name of the HighlightingTransformer.


MaxTextLength

The MaxTextLength highlighting transformer is a very simple transformer that limits the highlighted text to a maximum number of characters.

Parameters

The following parameters are supported:

Name | Type | Constraint | Default | Description
MarkupPrefix | String | optional | <b> | The markup prefix used for highlighting
MarkupSuffix | String | optional | </b> | The markup suffix used for highlighting
MaxLength | Integer | optional | 300 | Max length of returned highlighted text
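
Because all three parameters are optional, a configuration only needs to list the values that differ from the defaults. The following Python sketch illustrates that merge; it is not SMILA code. The name resolve_parameters is hypothetical, and rejecting unknown keys is an assumption of the sketch, since the behaviour for unsupported parameters is not documented here.

```python
# Defaults as listed in the MaxTextLength parameter table.
MAX_TEXT_LENGTH_DEFAULTS = {
    "MarkupPrefix": "<b>",
    "MarkupSuffix": "</b>",
    "MaxLength": 300,
}


def resolve_parameters(settings, defaults=MAX_TEXT_LENGTH_DEFAULTS):
    """Merge configured values over the defaults, ignoring the "name" entry."""
    resolved = dict(defaults)
    for key, value in settings.items():
        if key == "name":
            continue  # selects the transformer, it is not a parameter
        if key not in defaults:
            # Assumption: unknown parameters are treated as configuration errors.
            raise KeyError(f"unsupported parameter: {key}")
        resolved[key] = value
    return resolved


params = resolve_parameters({"name": "MaxTextLength", "MaxLength": 120})
```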


Sentence

The Sentence highlighting transformer highlights text while taking sentence boundaries into account.

Parameters

The following parameters are supported:

Name | Type | Constraint | Default | Description
MarkupPrefix | String | optional | <b> | The markup prefix used for highlighting
MarkupSuffix | String | optional | </b> | The markup suffix used for highlighting
MaxLength | Integer | optional | 300 | Max length of returned highlighted text
MaxHLElements | Integer | optional | - | Max number of highlighted (hl) elements
MaxSucceedingCharacters | Integer | optional | 30 | Max number of succeeding characters after an hl element
SucceedingCharacters | String | optional | - | Succeeding characters after an hl element
SortAlgorithm | Enumeration: Score, Occurrence | optional | Occurrence | Sort algorithm for selecting hl elements
TextHandling | Enumeration: ReturnSnipplet, ReturnFullText, ReturnNoText | optional | ReturnFullText | Which text is returned: a snippet, the full text, or no text
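
How SortAlgorithm and MaxHLElements might interact can be sketched as follows. This Python sketch rests on assumed semantics, not on the transformer's source: "Occurrence" is taken to order hits by start position, "Score" by descending quality, and MaxHLElements to cap the number of hits kept. The function name select_hits and the sample hits are hypothetical.

```python
def select_hits(positions, sort_algorithm="Occurrence", max_hl_elements=None):
    """Order hits and keep at most max_hl_elements of them.

    Assumed semantics: "Occurrence" orders by start position,
    "Score" orders by descending quality (ties keep text order).
    """
    if sort_algorithm == "Occurrence":
        ordered = sorted(positions, key=lambda hit: hit["start"])
    elif sort_algorithm == "Score":
        ordered = sorted(positions, key=lambda hit: (-hit["quality"], hit["start"]))
    else:
        raise ValueError(f"unknown SortAlgorithm: {sort_algorithm}")
    if max_hl_elements is None:
        return ordered  # no limit configured
    return ordered[:max_hl_elements]


hits = [
    {"start": 40, "end": 46, "quality": 90},
    {"start": 5, "end": 9, "quality": 60},
    {"start": 70, "end": 75, "quality": 75},
]
by_occurrence = select_hits(hits, "Occurrence", max_hl_elements=2)
by_score = select_hits(hits, "Score", max_hl_elements=2)
```

Under these assumptions, "Occurrence" keeps the first hits in reading order, while "Score" keeps the best-rated hits regardless of where they appear.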

ComplexHLResultAggregation

This is the most complex available algorithm for highlighting.

Parameters

The following parameters are supported:

Name | Type | Constraint | Default | Description
MarkupPrefix | String | optional | <b> | The markup prefix used for highlighting
MarkupSuffix | String | optional | </b> | The markup suffix used for highlighting
MaxLength | Integer | optional | 300 | Max length of returned highlighted text
MaxHLElements | Integer | optional | - | Max number of highlighted (hl) elements
MaxSucceedingCharacters | Integer | optional | 30 | Max number of succeeding characters after an hl element
SucceedingCharacters | String | optional | - | Succeeding characters after an hl element
SortAlgorithm | Enumeration: Score, Occurrence | optional | Occurrence | Sort algorithm for selecting hl elements
TextHandling | Enumeration: ReturnSnipplet, ReturnFullText, ReturnNoText | optional | ReturnFullText | Which text is returned: a snippet, the full text, or no text
MaxPrecedingCharacters | Integer | optional | 30 | Max number of preceding characters in front of an hl element
PrecedingCharacters | String | optional | - | Preceding characters in front of an hl element
HLElementFilter | Boolean | optional | false | Filter for hl elements containing identical text
