SMILA/Documentation/HighlightingPipelet

== Bundle: <tt>org.eclipse.smila.search.highlighting.HighlightingService</tt> ==
{{Note|Deprecated:

The Lucene integration is outdated. Please take a look at the new Solr integration <tt>org.eclipse.smila.solr</tt> which is described here: [[SMILA/Documentation/Solr]].}}
=== Description ===
This SearchProcessingService highlights the results of a previously executed search (e.g. a Lucene search). A search may return highlight annotations on attributes. These annotations contain the original text as well as the positions and scores of the hits. The HighlightingService uses HighlightingTransformers to transform these annotations into markup text; in the most basic case, for example, all hit terms could be marked bold. The original text is overwritten with this markup and the position annotations are removed.

Multiple HighlightingTransformer implementations (OSGi Declarative Services) can register themselves with the HighlightingService. Annotations on the query select and configure the HighlightingTransformer to use.

=== Configuration ===

==== Annotations ====
The HighlightingService expects the following Annotation structure on every record Attribute that should be highlighted in a search (only Attributes with String values can be highlighted):
<source lang="xml">
<An n="highlight">
    <An n="HighlightingTransformer">
        <V n="name">%HIGHLIGHTING_TRANSFORMER_NAME%</V>
        <An n="%PARAMETER_NAME%">
            <V n="value">%PARAMETER_VALUE%</V>
        </An>
        ...
    </An>
</An>
</source>

* "highlight" is the base annotation. On query records it is used to pass parameters for highlighting. On result records it is used to store the highlighting positions and text.
* "HighlightingTransformer" is a sub-annotation of "highlight".
** It must contain a named value "name" that contains the name of the HighlightingTransformer implementation to use. This name must be equal to the value of the property <tt>smila.highlighting.transformer.type</tt> in the HighlightingTransformer's OSGi component description file.
** It may contain additional annotations as parameters for the HighlightingTransformer. Each annotation's name is the name of a configuration parameter, and its named value "value" contains the value of that parameter. See the description of each HighlightingTransformer for the supported parameters.
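For illustration, a query Attribute could carry the following annotation to select the Sentence transformer and override three of its parameters. This is a sketch under assumptions: the name <tt>Sentence</tt> is assumed to be the value of <tt>smila.highlighting.transformer.type</tt> for that transformer, and the parameter values are arbitrary. Markup characters inside a value have to be escaped as XML entities.

<source lang="xml">
<An n="highlight">
    <An n="HighlightingTransformer">
        <!-- assumed registered name of the Sentence transformer -->
        <V n="name">Sentence</V>
        <!-- each parameter is a sub-annotation with a named value "value" -->
        <An n="MaxLength">
            <V n="value">200</V>
        </An>
        <!-- markup must be escaped inside XML values -->
        <An n="MarkupPrefix">
            <V n="value">&lt;em&gt;</V>
        </An>
        <An n="MarkupSuffix">
            <V n="value">&lt;/em&gt;</V>
        </An>
    </An>
</An>
</source>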

On result records, the following Annotation structure is expected on Attributes that should be highlighted (such annotations are created, for example, by the LuceneSearchService):
<source lang="xml">
<An n="highlight">
    <V n="text">The original text without any markup where hits should be highlighted.</V>
    <An n="positions">
        <V n="start">%START_POSITION_OF_HIT%</V>
        <V n="end">%END_POSITION_OF_HIT%</V>
        <V n="quality">%QUALITY_OF_HIT%</V>
    </An>
    ...
</An>
</source>

* "highlight" is again used as the base annotation.
** It contains a named value "text" that contains the original text that should be highlighted.
** It may contain sub-annotations "positions" for marking hits in the text. A "positions" annotation contains the named values
*** "start" containing the start position of the hit in the text
*** "end" containing the end position of the hit in the text
*** "quality" containing a measure for the quality of the hit

The highlight annotation is processed by a HighlightingTransformer. Using its own algorithm, the transformer evaluates the hit positions and adds markup to the text to highlight the hits. All "positions" annotations are then removed and the named value "text" is replaced by the text containing the markup. The result has the following annotation structure:

<source lang="xml">
<An n="highlight">
    <V n="text">The text containing some <b>markup</b> to highlight the hits.</V>
</An>
</source>
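A hypothetical worked example: a result Attribute with the text <tt>SMILA highlights search hits.</tt> and a single hit on the word "search". The sketch assumes zero-based character offsets with an exclusive end position (so "search" spans 17 to 23), an arbitrary quality value, and a transformer that returns the full text with the default markup (for example MaxTextLength on a text shorter than MaxLength). The actual offset convention and quality scale depend on the search service that produced the annotation. The markup is written escaped here so that it is part of the string value rather than XML child elements.

<source lang="xml">
<!-- before highlighting: annotation as delivered by the search service -->
<An n="highlight">
    <V n="text">SMILA highlights search hits.</V>
    <An n="positions">
        <V n="start">17</V>
        <V n="end">23</V>
        <!-- hypothetical quality value -->
        <V n="quality">1.0</V>
    </An>
</An>

<!-- after highlighting: positions removed, text replaced by the marked-up text -->
<An n="highlight">
    <V n="text">SMILA highlights &lt;b&gt;search&lt;/b&gt; hits.</V>
</An>
</source>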

==== Configuration files ====

The HighlightingService itself has no configuration files. It works with Annotations only.

==== Example ====

The following example shows how the HighlightingService is used after a Lucene search.

'''searchpipeline.bpel'''
<source lang="xml">
...
<extensionActivity name="invokeLuceneSearchService">
    <proc:invokeService>
        <proc:service name="LuceneSearchService" />
        <proc:variables input="request" output="request" />
        <proc:setAnnotations>
            <rec:An n="org.eclipse.smila.lucene.LuceneSearchService">
                <rec:V n="indexName">test_index</rec:V>
            </rec:An>
        </proc:setAnnotations>
    </proc:invokeService>
</extensionActivity>

<extensionActivity name="invokeHighlightingService">
    <proc:invokeService>
        <proc:service name="HighlightingService" />
        <proc:variables input="request" output="request" />
    </proc:invokeService>
</extensionActivity>
...
</source>

== HighlightingTransformer ==

Here is a list of the available HighlightingTransformers, each with a short description and an overview of its supported configuration parameters.
Each HighlightingTransformer is an OSGi Declarative Service implementing the interface <tt>org.eclipse.smila.search.highlighting.transformer.HighlightingTransformer</tt>. Its service component description file must contain the property <tt>smila.highlighting.transformer.type</tt>, whose value is the name used to select the HighlightingTransformer in the query annotation.
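A minimal sketch of such a component description in the standard OSGi Declarative Services XML format. Only the interface name and the property name come from this page; the component name, the implementation class <tt>com.example.highlighting.UppercaseTransformer</tt> and the type value <tt>Uppercase</tt> are hypothetical. The elements follow the order of the DS schema (properties, service, implementation).

<source lang="xml">
<?xml version="1.0" encoding="UTF-8"?>
<!-- OSGi DS descriptor, referenced from the bundle's Service-Component manifest header -->
<scr:component xmlns:scr="http://www.osgi.org/xmlns/scr/v1.1.0"
    name="com.example.highlighting.UppercaseTransformer" immediate="true">
    <!-- must match the "name" value in the query's HighlightingTransformer annotation -->
    <property name="smila.highlighting.transformer.type" type="String" value="Uppercase"/>
    <!-- register under the HighlightingTransformer interface -->
    <service>
        <provide interface="org.eclipse.smila.search.highlighting.transformer.HighlightingTransformer"/>
    </service>
    <!-- hypothetical implementation class -->
    <implementation class="com.example.highlighting.UppercaseTransformer"/>
</scr:component>
</source>

A query would then select this transformer by setting the named value "name" of the "HighlightingTransformer" annotation to <tt>Uppercase</tt>.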

=== MaxTextLength ===

The MaxTextLength highlighting transformer is a very simple transformer that limits the highlighted text to a maximum number of characters.
 
==== Parameters ====

The following parameters are supported:

{| border = 1
!Name!!Type!!Constraint!!Default!!Description
|-
|MarkupPrefix||String||optional||&lt;b&gt;||The markup prefix used for highlighting
|-
|MarkupSuffix||String||optional||&lt;/b&gt;||The markup suffix used for highlighting
|-
|MaxLength||Integer||optional||300||Maximum length (in characters) of the returned highlighted text
|}
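For example, the following query annotation would select the MaxTextLength transformer and limit the highlighted text to 150 characters while keeping the default markup. It assumes that the registered transformer name is <tt>MaxTextLength</tt>; the value 150 is arbitrary.

<source lang="xml">
<An n="highlight">
    <An n="HighlightingTransformer">
        <!-- assumed registered name (smila.highlighting.transformer.type) -->
        <V n="name">MaxTextLength</V>
        <!-- overrides the default of 300 characters -->
        <An n="MaxLength">
            <V n="value">150</V>
        </An>
    </An>
</An>
</source>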

=== Sentence ===

The Sentence highlighting transformer highlights text while taking sentence boundaries into account.

==== Parameters ====

The following parameters are supported:

{| border = 1
!Name!!Type!!Constraint!!Default!!Description
|-
|MarkupPrefix||String||optional||&lt;b&gt;||The markup prefix used for highlighting
|-
|MarkupSuffix||String||optional||&lt;/b&gt;||The markup suffix used for highlighting
|-
|MaxLength||Integer||optional||300||Maximum length of the returned highlighted text
|-
|MaxHLElements||Integer||optional||-||Maximum number of highlight (hl) elements
|-
|MaxSucceedingCharacters||Integer||optional||30||Maximum number of characters following an hl element
|-
|SucceedingCharacters||String||optional||-||Succeeding characters after an hl element
|-
|SortAlgorithm||Enumeration: Score, Occurrence||optional||Occurrence||Sort algorithm for selecting hl elements
|-
|TextHandling||Enumeration: ReturnSnipplet, ReturnFullText, ReturnNoText||optional||ReturnFullText||Determines whether a snippet, the full text, or no text is returned
|}
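A sketch of a query annotation for the Sentence transformer that returns a snippet built from at most three hl elements, selected by score. The transformer name <tt>Sentence</tt> and the parameter values are illustrative assumptions; the enumeration values are spelled exactly as listed in the table.

<source lang="xml">
<An n="highlight">
    <An n="HighlightingTransformer">
        <!-- assumed registered name of the Sentence transformer -->
        <V n="name">Sentence</V>
        <!-- return a snippet instead of the full text (default: ReturnFullText) -->
        <An n="TextHandling">
            <V n="value">ReturnSnipplet</V>
        </An>
        <!-- select hl elements by score instead of occurrence -->
        <An n="SortAlgorithm">
            <V n="value">Score</V>
        </An>
        <!-- use at most three hl elements -->
        <An n="MaxHLElements">
            <V n="value">3</V>
        </An>
    </An>
</An>
</source>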

=== ComplexHLResultAggregation ===

This is the most complex highlighting algorithm available. It supports all parameters of the Sentence transformer plus additional parameters for the text preceding an hl element and for filtering hl elements with identical text.

==== Parameters ====

The following parameters are supported:

{| border = 1
!Name!!Type!!Constraint!!Default!!Description
|-
|MarkupPrefix||String||optional||&lt;b&gt;||The markup prefix used for highlighting
|-
|MarkupSuffix||String||optional||&lt;/b&gt;||The markup suffix used for highlighting
|-
|MaxLength||Integer||optional||300||Maximum length of the returned highlighted text
|-
|MaxHLElements||Integer||optional||-||Maximum number of highlight (hl) elements
|-
|MaxSucceedingCharacters||Integer||optional||30||Maximum number of characters following an hl element
|-
|SucceedingCharacters||String||optional||-||Succeeding characters after an hl element
|-
|SortAlgorithm||Enumeration: Score, Occurrence||optional||Occurrence||Sort algorithm for selecting hl elements
|-
|TextHandling||Enumeration: ReturnSnipplet, ReturnFullText, ReturnNoText||optional||ReturnFullText||Determines whether a snippet, the full text, or no text is returned
|-
|MaxPrecedingCharacters||Integer||optional||30||Maximum number of characters preceding an hl element
|-
|PrecedingCharacters||String||optional||-||Preceding characters in front of an hl element
|-
|HLElementFilter||Boolean||optional||false||Filter for hl elements containing identical text
|}
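A sketch of a ComplexHLResultAggregation configuration that keeps up to 20 characters of context before and after each hl element and enables the filter for hl elements with identical text. The transformer name, the values, and the string form <tt>true</tt> for the Boolean parameter are assumptions for illustration.

<source lang="xml">
<An n="highlight">
    <An n="HighlightingTransformer">
        <!-- assumed registered name of the transformer -->
        <V n="name">ComplexHLResultAggregation</V>
        <!-- context before and after each hl element (defaults: 30) -->
        <An n="MaxPrecedingCharacters">
            <V n="value">20</V>
        </An>
        <An n="MaxSucceedingCharacters">
            <V n="value">20</V>
        </An>
        <!-- assumed string form of the Boolean parameter -->
        <An n="HLElementFilter">
            <V n="value">true</V>
        </An>
    </An>
</An>
</source>

Parameters that are not set keep the defaults listed in the table.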

[[Category:SMILA]] [[Category:SMILA/Processing Service]]

Latest revision as of 05:51, 19 January 2012
