Difference between revisions of "SMILA/Documentation/2011.Simplification/HighlightingPipelet"

Revision as of 07:16, 28 February 2011

Bundle: `org.eclipse.smila.search.highlighting.HighlightingPipelet`

Description

This pipelet highlights the results of a previously executed search (e.g. with Lucene). A search may return highlight annotations on attributes. These annotations contain the original text as well as the positions and scores of hits. The HighlightingPipelet uses HighlightingTransformers to transform these annotations into marked-up text; in the most basic case, all hit terms could be marked bold. The original text is overwritten with this markup and the position annotations are removed. A HighlightingTransformer is selected and configured via annotations on the query.

Note on implementation: The HighlightingTransformers are separate OSGi services. They are managed for the pipelet by another OSGi service named HighlightingService, which can reference several of these HighlightingTransformer services.

Configuration

Annotations

The HighlightingPipelet expects the following Annotation structure on any record Attribute that should be highlighted in a search (only Attributes with String values can be highlighted):

<An n="highlight">
    <An n="HighlightingTransformer">
        <V n="name">%HIGHLIGHTING_TRANSFORMER_NAME%</V> 
        <An n="%PARAMETER_NAME%">
            <V n="value">%PARAMETER_VALUE%</V> 
        </An>
        ...
    </An>
</An>

"highlight" is the base annotation. On query records it is used to pass parameters for highlighting. On result records it is used to store the highlighting positions and text.
"HighlightingTransformer" is a sub-annotation of "highlight".
- It must contain a named value "name" that holds the name of the HighlightingTransformer implementation to use. This name has to be equal to the value specified for the property smila.highlighting.transformer.type in the OSGi component description file of the HighlightingTransformer.
- It may contain additional annotations as parameters for the HighlightingTransformer. The annotation's name is the name of the configuration parameter, and the named value "value" contains the value of that parameter. See the description of each HighlightingTransformer for the supported parameters.
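
As a minimal sketch, a query-side annotation that selects the MaxTextLength transformer described later on this page might look as follows. It assumes that the transformer is registered under the name "MaxTextLength", and all parameter values are illustrative; the markup prefix and suffix are XML-escaped here so that the fragment stays well-formed.

```xml
<!-- Sketch only: the transformer name and all parameter values are assumptions for illustration. -->
<An n="highlight">
    <An n="HighlightingTransformer">
        <V n="name">MaxTextLength</V>
        <An n="MarkupPrefix">
            <V n="value">&lt;em&gt;</V>
        </An>
        <An n="MarkupSuffix">
            <V n="value">&lt;/em&gt;</V>
        </An>
        <An n="MaxLength">
            <V n="value">150</V>
        </An>
    </An>
</An>
```

Since all parameters are optional, any parameter left out should fall back to the default listed for the transformer.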

On the result records the following Annotation structure is expected on Attributes that should be highlighted (these annotations are for example created by the LuceneSearchService):

<An n="highlight">
    <V n="text">The original text without any markup where hits should be highlighted.</V> 
    <An n="positions">
        <V n="start">%START_POSITION_OF_HIT%</V> 
        <V n="end">%END_POSITION_OF_HIT%</V> 
        <V n="quality">%QUALITY_OF_HIT%</V> 
    </An>
    ...
</An>

"highlight" is again used as the base annotation.
- It contains a named value "text" that holds the original text that should be highlighted.
- It may contain sub-annotations "positions" for marking hits in the text. A "positions" annotation contains the named values:
  - "start" containing the start position of the hit in the text
  - "end" containing the end position of the hit in the text
  - "quality" containing a measure for the quality of the hit

The highlight annotation is processed by a HighlightingTransformer. Using its own algorithm, it locates the hit positions in the text and adds markup for highlighting. All "positions" annotations are removed and the named value "text" is replaced by the text containing the markup. The result has the following annotation structure:

<An n="highlight">
    <V n="text">The text containing some <b>markup</b> to highlight the hits.</V> 
</An>
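
To make the transformation concrete, the following sketch shows one hypothetical result annotation before and after processing. It assumes zero-based character offsets with an exclusive end position and the default <b> markup. The offset convention and the scale of "quality" are not specified here, so these values are illustrative only.

```xml
<!-- Before: as it might be produced by the search (offsets and quality are assumed values). -->
<An n="highlight">
    <V n="text">SMILA highlights search hits.</V>
    <An n="positions">
        <V n="start">17</V>
        <V n="end">23</V>
        <V n="quality">100</V>
    </An>
</An>

<!-- After: "positions" removed, "text" replaced by the marked-up text (shown XML-escaped). -->
<An n="highlight">
    <V n="text">SMILA highlights &lt;b&gt;search&lt;/b&gt; hits.</V>
</An>
```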

Configuration files

The HighlightingPipelet and its service have no configuration files. They are configured via Annotations only.

Example

The following example shows how the HighlightingPipelet is used after a Lucene search.

searchpipeline.bpel

...
<extensionActivity name="invokeLuceneSearchPipelet">
    <proc:invokePipelet name="search">
        <proc:pipelet class="org.eclipse.smila.lucene.pipelets.LuceneSearchPipelet" />
        <proc:variables input="request" output="request" />
        <proc:setAnnotations>
            <rec:An n="org.eclipse.smila.lucene.LuceneSearchService">
                <rec:V n="indexName">test_index</rec:V>
            </rec:An>
        </proc:setAnnotations>
    </proc:invokePipelet>
</extensionActivity>
 
<extensionActivity>
    <proc:invokePipelet name="highlight results">
        <proc:pipelet class="org.eclipse.smila.search.highlighting.HighlightingPipelet" />
        <proc:variables input="request" output="request" />
    </proc:invokePipelet>
</extensionActivity>
...

HighlightingTransformer

The following is a list of the available HighlightingTransformers, each with a short description and an overview of the supported configuration parameters. Each HighlightingTransformer is an OSGi Declarative Service implementing the interface org.eclipse.smila.search.highlighting.transformer.HighlightingTransformer. Its service component description file must contain the property smila.highlighting.transformer.type, whose value matches the name of the HighlightingTransformer.

MaxTextLength

The MaxTextLength highlighting transformer is a very simple transformer that limits the highlighted text to a maximum number of characters.

Parameters

The following parameters are supported:

Name	Type	Constraint	Default	Description
MarkupPrefix	String	optional	<b>	The markup prefix used for highlighting
MarkupSuffix	String	optional	</b>	The markup suffix used for highlighting
MaxLength	Integer	optional	300	Max length of returned highlighted text

Sentence

The Sentence highlighting transformer highlights text while taking sentence boundaries into account.

Parameters

The following parameters are supported:

Name	Type	Constraint	Default	Description
MarkupPrefix	String	optional	<b>	The markup prefix used for highlighting
MarkupSuffix	String	optional	</b>	The markup suffix used for highlighting
MaxLength	Integer	optional	300	Max length of returned highlighted text
MaxHLElements	Integer	optional	-	Max number of hl elements
MaxSucceedingCharacters	Integer	optional	30	Max succeeding characters after an hl element
SucceedingCharacters	String	optional	-	Succeeding characters after an hl element
SortAlgorithm	Enumeration: Score, Occurrence	optional	Occurrence	Sort algorithm for selecting hl elements
TextHandling	Enumeration: ReturnSnipplet, ReturnFullText, ReturnNoText	optional	ReturnFullText	Determines which text is returned: a snippet, the full text, or no text
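
The parameters in the table above are passed as sub-annotations of "HighlightingTransformer" on the query. A hedged sketch for the Sentence transformer, assuming it is registered under the name "Sentence" and using illustrative values:

```xml
<!-- Sketch only: the transformer name and all values are assumptions for illustration. -->
<An n="highlight">
    <An n="HighlightingTransformer">
        <V n="name">Sentence</V>
        <An n="MaxHLElements">
            <V n="value">3</V>
        </An>
        <An n="SortAlgorithm">
            <V n="value">Score</V>
        </An>
        <An n="TextHandling">
            <V n="value">ReturnSnipplet</V>
        </An>
    </An>
</An>
```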

ComplexHLResultAggregation

This is the most complex available algorithm for highlighting.

Parameters

The following parameters are supported:

Name	Type	Constraint	Default	Description
MarkupPrefix	String	optional	<b>	The markup prefix used for highlighting
MarkupSuffix	String	optional	</b>	The markup suffix used for highlighting
MaxLength	Integer	optional	300	Max length of returned highlighted text
MaxHLElements	Integer	optional	-	Max number of hl elements
MaxSucceedingCharacters	Integer	optional	30	Max succeeding characters after an hl element
SucceedingCharacters	String	optional	-	Succeeding characters after an hl element
SortAlgorithm	Enumeration: Score, Occurrence	optional	Occurrence	Sort algorithm for selecting hl elements
TextHandling	Enumeration: ReturnSnipplet, ReturnFullText, ReturnNoText	optional	ReturnFullText	Determines which text is returned: a snippet, the full text, or no text
MaxPrecedingCharacters	Integer	optional	30	Max preceding characters in front of an hl element
PrecedingCharacters	String	optional	-	Preceding characters in front of an hl element
HLElementFilter	Boolean	optional	false	Filter for hl elements containing identical text
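
For illustration, a query annotation that sets the context-related parameters of this transformer could be written as below. This is a sketch under assumptions: the registered name is taken to be "ComplexHLResultAggregation", and the values are arbitrary examples rather than recommended settings.

```xml
<!-- Sketch only: the transformer name and all values are assumptions for illustration. -->
<An n="highlight">
    <An n="HighlightingTransformer">
        <V n="name">ComplexHLResultAggregation</V>
        <An n="MaxPrecedingCharacters">
            <V n="value">50</V>
        </An>
        <An n="MaxSucceedingCharacters">
            <V n="value">50</V>
        </An>
        <An n="HLElementFilter">
            <V n="value">true</V>
        </An>
    </An>
</An>
```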

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.
