Difference between revisions of "SMILA/Documentation/Bundle org.eclipse.smila.processing.pipelets.xmlprocessing"

(added Read Type)
(added Read Type to pipelets)

Revision as of 09:52, 14 September 2011

org.eclipse.smila.processing.pipelets.xmlprocessing.XslTransformationPipelet

Description

This pipelet performs an XSL transformation on an attribute or attachment value and stores the transformed document in an attribute or attachment.

Configuration

{| border = 1
!Property!!Data Type!!Read Type!!Description
|-
|''inputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML input is found in an attachment or in an attribute of the record.
|-
|''outputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
|-
|''inputName''||String||runtime||The name of the input attachment or the path to the input attribute (process literals of attribute).
|-
|''outputName''||String||runtime||The name of the output attachment or the path to the output attribute (store result as literals of attribute).
|-
|''xslFile''||String||init||The name (with relative or absolute path) of the XSL file to be used for transformation.
|}

Example

Pipelet configuration for XslTransformationPipelet

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="xslFile">./configuration/data/author.xsl</rec:Val>
</proc:configuration>

org.eclipse.smila.processing.pipelets.xmlprocessing.XPathExtractorPipelet

Description

This pipelet extracts elements selected by XPath, converts them to appropriate data types (Boolean, Double, String), and stores the transformed value in an attribute or attachment.

Configuration

{| border = 1
!Property!!Data Type!!Read Type!!Description
|-
|''inputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML input is found in an attachment or in an attribute of the record.
|-
|''outputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
|-
|''inputName''||String||runtime||The name of the input attachment or the path to the input attribute (process literals of attribute).
|-
|''outputName''||String||runtime||The name of the output attachment or the path to the output attribute (store result as literals of attribute).
|-
|''xpath''||String||runtime||The XPath expression to be evaluated.
|-
|''separator''||String||runtime||The optional separator.
|-
|''namespace''||String||runtime||The optional XML namespace.
|}

Example

Pipelet configuration for XPathExtractorPipelet

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="xpath">author/email</rec:Val>
    <rec:Val key="separator"></rec:Val>
    <rec:Val key="namespace"></rec:Val>
</proc:configuration>
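
The extraction step can be pictured with a minimal Python sketch. This is not SMILA's implementation: it uses the standard library's ElementTree, which supports only a subset of XPath, it skips the conversion to Boolean or Double, and it assumes that the separator joins multiple matches (the table above does not state how the separator is used). The function name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_by_xpath(xml_text, xpath, separator=""):
    """Return the text of all elements matched by xpath, joined by separator."""
    root = ET.fromstring(xml_text)
    # ElementTree evaluates paths relative to the root element, so
    # "author/email" matches <email> children of <author> children of the root.
    matches = root.findall(xpath)
    # Assumption: multiple matches are concatenated using the separator.
    return separator.join((m.text or "").strip() for m in matches)


xml_in = """
<authors>
  <author><name>Ann</name><email>ann@example.org</email></author>
  <author><name>Bob</name><email>bob@example.org</email></author>
</authors>
"""

result = extract_by_xpath(xml_in, "author/email", separator=";")
print(result)  # ann@example.org;bob@example.org
```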

org.eclipse.smila.processing.pipelets.xmlprocessing.XPathFilterPipelet

Description

This pipelet filters elements by XPath expressions (either using include or exclude mode) and stores the filtered elements as a new document in an attribute or attachment.

Configuration

{| border = 1
!Property!!Data Type!!Read Type!!Description
|-
|''inputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML input is found in an attachment or in an attribute of the record.
|-
|''outputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
|-
|''inputName''||String||runtime||The name of the input attachment or the path to the input attribute (process literals of attribute).
|-
|''outputName''||String||runtime||The name of the output attachment or the path to the output attribute (store result as literals of attribute).
|-
|''xpath''||String||runtime||The XPath expressions to be evaluated (multi-valued property).
|-
|''filterMode''||String : INCLUDE, EXCLUDE||runtime||The filter mode, defining whether to include or exclude the elements matched by the XPath expressions.
|-
|''separator''||String||runtime||The optional separator.
|-
|''namespace''||String||runtime||The optional XML namespace.
|}

Examples

Pipelet configuration for XPathFilterPipelet with multi-valued xpath

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Seq key="xpath">
        <rec:Val>author/email</rec:Val>
        <rec:Val>author/name</rec:Val>
    </rec:Seq>
    <rec:Val key="filterMode">EXCLUDE</rec:Val>
    <rec:Val key="separator"></rec:Val>
    <rec:Val key="namespace"></rec:Val>
</proc:configuration>

Pipelet configuration for XPathFilterPipelet with single-valued xpath

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="xpath">author/email</rec:Val>
    <rec:Val key="filterMode">EXCLUDE</rec:Val>
    <rec:Val key="separator"></rec:Val>
    <rec:Val key="namespace"></rec:Val>
</proc:configuration>
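
The difference between the two filter modes can be illustrated with a small Python sketch. It is a hedged approximation rather than the pipelet's actual code: it relies on ElementTree's limited XPath support, ignores the separator and namespace properties, and assumes that INCLUDE mode collects the matched elements under an element named like the original root. The function name and sample data are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def filter_by_xpath(xml_text, xpaths, filter_mode="INCLUDE"):
    """Return a new XML string with the matched elements kept or dropped."""
    root = ET.fromstring(xml_text)
    if filter_mode == "EXCLUDE":
        # ElementTree elements have no parent pointer, so build a lookup first.
        parent_of = {child: parent for parent in root.iter() for child in parent}
        for xpath in xpaths:
            for match in root.findall(xpath):
                parent = parent_of.get(match)
                if parent is not None:  # the root itself cannot be removed
                    parent.remove(match)
        result = root
    elif filter_mode == "INCLUDE":
        # Assumption: matched elements are collected under a new element
        # that reuses the tag of the original root.
        result = ET.Element(root.tag)
        for xpath in xpaths:
            for match in root.findall(xpath):
                kept = copy.deepcopy(match)
                kept.tail = None  # drop whitespace that followed the element
                result.append(kept)
    else:
        raise ValueError(f"unknown filterMode: {filter_mode}")
    return ET.tostring(result, encoding="unicode")


xml_in = (
    "<doc><author><name>Ann</name><email>ann@example.org</email></author>"
    "<title>T</title></doc>"
)
print(filter_by_xpath(xml_in, ["author/email", "author/name"], "EXCLUDE"))
print(filter_by_xpath(xml_in, ["author/email"], "INCLUDE"))
```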

org.eclipse.smila.processing.pipelets.xmlprocessing.RemoveElementFromXMLPipelet

Description

This pipelet removes a selected element from an XML document and stores the manipulated document in an attribute or attachment.

Configuration

{| border = 1
!Property!!Data Type!!Read Type!!Description
|-
|''inputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML input is found in an attachment or in an attribute of the record.
|-
|''outputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
|-
|''inputName''||String||runtime||The name of the input attachment or the path to the input attribute (process literals of attribute).
|-
|''outputName''||String||runtime||The name of the output attachment or the path to the output attribute (store result as literals of attribute).
|-
|''elementId''||String||runtime||The ID of the XML element to be removed.
|}

Example

Pipelet configuration for RemoveElementFromXMLPipelet

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="elementId">title</rec:Val>
</proc:configuration>
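
As a rough illustration of the removal step, the following Python sketch deletes every element whose `id` attribute equals the configured value. The page does not say how the pipelet resolves an element ID, so treating it as the `id` attribute is an assumption, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_element_by_id(xml_text, element_id):
    """Return the document without elements whose id attribute matches."""
    root = ET.fromstring(xml_text)
    # Snapshot the traversal so the tree can be modified safely while looping.
    for parent in list(root.iter()):
        for child in list(parent):
            # Assumption: "ID" refers to the XML attribute named "id".
            if child.get("id") == element_id:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


xml_in = '<doc><h1 id="title">Hello</h1><p id="body">Text</p></doc>'
print(remove_element_by_id(xml_in, "title"))
```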

org.eclipse.smila.processing.pipelets.xmlprocessing.TidyPipelet

Description

This pipelet performs a Tidy transformation on an attribute or attachment value and stores the result in an attribute or attachment.

Configuration

{| border = 1
!Property!!Data Type!!Read Type!!Description
|-
|''inputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML input is found in an attachment or in an attribute of the record.
|-
|''outputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
|-
|''inputName''||String||runtime||The name of the input attachment or the path to the input attribute (process literals of attribute).
|-
|''outputName''||String||runtime||The name of the output attachment or the path to the output attribute (store result as literals of attribute).
|-
|''tidyFile''||String||init||The name (with relative or absolute path) of the Tidy configuration file to be used by the transformation.
|}

Example

Pipelet configuration for TidyPipelet

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="tidyFile">./configuration/data/tidy_config.txt</rec:Val>
</proc:configuration>

org.eclipse.smila.processing.pipelets.xmlprocessing.XmlSplitterPipelet

Description

This pipelet splits an XML stream into multiple XML snippets. For each snippet, a new record is created in which the snippet is stored in either an attribute or an attachment. The created records are not returned in the PipeletResult (which contains just the incoming record IDs); instead, they are sent directly to the ConnectivityManager and routed to the queue once more.

On each created record, the attribute __isXmlSnippet is set to true. Incoming records that have this attribute set are not split again but are returned as the pipelet result. This makes it possible to add further processing steps for the split records to the same pipeline that does the splitting. See XmlSplitAndAddPipeline.bpel for an example.

Configuration

{| border = 1
!Property!!Data Type!!Read Type!!Description
|-
|''inputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML input is found in an attachment or in an attribute of the record. An input attribute is not interpreted as containing the XML content itself but as a file path or a URL to the XML document.
|-
|''outputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML snippet should be stored in an attachment or in an attribute of the newly created record.
|-
|''inputName''||String||runtime||The name of the input attachment or the path to the input attribute.
|-
|''outputName''||String||runtime||The name of the output attachment or the path to the output attribute (store result as literals of attribute).
|-
|''beginTagName''||String||runtime||The name of the tag to start the XML snippet with.
|-
|''isBeginClosingTag''||Boolean||runtime||A boolean flag defining whether ''beginTagName'' is a closing tag (true) or not (false).
|-
|''endTagName''||String||runtime||The name of the tag to end the XML snippet with.
|-
|''isEndClosingTag''||Boolean||runtime||A boolean flag defining whether ''endTagName'' is a closing tag (true) or not (false).
|-
|''keyTagName''||String||runtime||The name of the tag used to create a record ID.
|-
|''maxBufferSize''||Integer||runtime||The maximum size of the internal record buffer (optional, default is 20).
|-
|''idSeparator''||String||runtime||The separator used to create the record IDs of the split records (optional, default is "#").
|-
|''xmlSnippetJobName''||String||runtime||The JobManager job name to submit the split records to. It must be running when the pipelet is executed.
|}

The first four properties can be set only in the pipelet configuration. All other properties can be customized separately for each record by setting them as values in the _parameters map of the record.
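
The override rule can be sketched in Python as a simple merge of two maps. This is only an illustration: records and configurations are modeled as plain dictionaries, the function name is hypothetical, and silently ignoring per-record values for the first four properties is an assumption (the pipelet might handle them differently).

```python
# Properties that, according to the text above, are read only from the
# pipelet configuration.
CONFIG_ONLY = {"inputType", "outputType", "inputName", "outputName"}


def effective_parameters(pipelet_config, record):
    """Overlay the record's _parameters map on the pipelet configuration."""
    params = dict(pipelet_config)
    for key, value in record.get("_parameters", {}).items():
        # Assumption: per-record values for config-only properties are ignored.
        if key not in CONFIG_ONLY:
            params[key] = value
    return params


config = {"inputType": "ATTRIBUTE", "keyTagName": "docId", "idSeparator": "#"}
record = {
    "_recordid": "r1",
    "_parameters": {"idSeparator": "/", "inputType": "ATTACHMENT"},
}
print(effective_parameters(config, record))
# {'inputType': 'ATTRIBUTE', 'keyTagName': 'docId', 'idSeparator': '/'}
```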

Example

Pipelet configuration for XmlSplitterPipelet

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="beginTagName">document</rec:Val>
    <rec:Val key="isBeginClosingTag">false</rec:Val>
    <rec:Val key="endTagName">document</rec:Val>
    <rec:Val key="isEndClosingTag">true</rec:Val>
    <rec:Val key="keyTagName">docId</rec:Val>
    <rec:Val key="idSeparator">#</rec:Val>
    <rec:Val key="xmlSnippetJobName">indexUpdateXml</rec:Val>
</proc:configuration>


The above configuration would split this XML format:

<sampleCollection>
    ...
    <document>
        <docId>4711</docId>
        <title>Some title</title>
        ...
        <text>Some text</text>
    </document>
    <document>
        <docId>0815</docId>  
        ...
    </document>
    ...
</sampleCollection>

into XML snippets like this one:

<document>
    <docId>4711</docId>
    <title>Some title</title>
    ...
    <text>Some text</text>
</document>

For each snippet, a record like the following would be created and submitted to a run of the job "indexUpdateXml":

<Record xmlns="http://www.eclipse.org/smila/record" version="2.0">
  <Val key="_recordid">xmlsplitter:someBigXmlfile.xml#4711</Val>
  <Val key="_source">xmlsplitter</Val>
  <Val key="__isXmlSnippet">true</Val>
  <Val key="xmlOut">
    <document>
      <docId>4711</docId>
      <title>Some title</title>
      ...
      <text>Some text</text>
    </document>
  </Val>
</Record>
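
The way record IDs are derived from the key tag and the separator can be approximated with the Python sketch below. It is a simplification under stated assumptions: the whole document is parsed in memory instead of being streamed, the begin and end tags are assumed to be the same element (as in the example configuration), records are modeled as dictionaries, and the function name and the `source_id` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def split_into_records(xml_text, source_id, tag_name="document",
                       key_tag_name="docId", id_separator="#",
                       output_name="xmlOut"):
    """Create one record (as a dict) per element named tag_name."""
    records = []
    root = ET.fromstring(xml_text)
    for snippet in root.iter(tag_name):
        key = (snippet.findtext(key_tag_name) or "").strip()
        snippet.tail = None  # drop whitespace that followed the element
        records.append({
            "_recordid": f"{source_id}{id_separator}{key}",
            "__isXmlSnippet": True,
            output_name: ET.tostring(snippet, encoding="unicode"),
        })
    return records


xml_in = """
<sampleCollection>
  <document><docId>4711</docId><title>Some title</title></document>
  <document><docId>0815</docId></document>
</sampleCollection>
"""

for rec in split_into_records(xml_in, "xmlsplitter:someBigXmlfile.xml"):
    print(rec["_recordid"])
```

Downstream steps could then check the `__isXmlSnippet` flag to skip records that have already been split.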

To use the XmlSplitAndAddPipeline.bpel via the predefined asynchronous workflow indexUpdateXml you should create a job definition that sets the "xmlSnippetJobName" to the job's own name:

{
  "name": "indexUpdateXmlJob",
  "workflow": "indexUpdateXml",
  "parameters": {
    "tempStore": "xmlbulks",
    "xmlSnippetJobName": "indexUpdateXmlJob"
  }
}
