SMILA/Documentation/Bundle org.eclipse.smila.processing.pipelets.xmlprocessing
Revision as of 10:25, 20 April 2011
Contents
- 1 org.eclipse.smila.processing.pipelets.xmlprocessing.XslTransformationPipelet
- 2 org.eclipse.smila.processing.pipelets.xmlprocessing.XPathExtractorPipelet
- 3 org.eclipse.smila.processing.pipelets.xmlprocessing.XPathFilterPipelet
- 4 org.eclipse.smila.processing.pipelets.xmlprocessing.RemoveElementFromXMLPipelet
- 5 org.eclipse.smila.processing.pipelets.xmlprocessing.TidyPipelet
- 6 org.eclipse.smila.processing.pipelets.xmlprocessing.XmlSplitterPipelet
org.eclipse.smila.processing.pipelets.xmlprocessing.XslTransformationPipelet
Description
This pipelet performs an XSL transformation on an attribute or attachment value and stores the transformed document in an attribute or attachment.
Configuration
Property | Type | Description |
---|---|---|
inputType | String : ATTACHMENT, ATTRIBUTE | Defines whether the XML input is found in an attachment or in an attribute of the record. |
outputType | String : ATTACHMENT, ATTRIBUTE | Defines whether the transformed output should be stored in an attachment or in an attribute of the record. |
inputName | String | The name of the input attachment or the path to the input attribute (process literals of attribute). |
outputName | String | The name of the output attachment or the path to the output attribute (store result as literals of attribute). |
xslFile | String | The name (with relative or absolute path) of the XSL file to be used for transformation. |
Example
Pipelet configuration for XslTransformationPipelet
    <proc:configuration>
      <rec:Val key="inputType">ATTRIBUTE</rec:Val>
      <rec:Val key="outputType">ATTRIBUTE</rec:Val>
      <rec:Val key="inputName">xmlIn</rec:Val>
      <rec:Val key="outputName">xmlOut</rec:Val>
      <rec:Val key="xslFile">./configuration/data/author.xsl</rec:Val>
    </proc:configuration>
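To illustrate what an XSL transformation of an attribute value amounts to, here is a minimal sketch that uses only the JDK's `javax.xml.transform` API. It is not the pipelet's implementation; the class name `XslTransformSketch`, the sample XML, and the inline stylesheet are assumptions made for illustration (the actual `author.xsl` is not shown on this page).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of SMILA.
public class XslTransformSketch {

  /** Applies the stylesheet text to the XML text and returns the result as a string. */
  public static String transform(String xml, String xsl) throws TransformerException {
    TransformerFactory factory = TransformerFactory.newInstance();
    // Compile the stylesheet into a Transformer.
    Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(xsl)));
    StringWriter out = new StringWriter();
    transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws TransformerException {
    // Stand-in for the value of the input attribute (assumed sample data).
    String xml = "<author><name>Jane Doe</name><email>jane@example.org</email></author>";
    // Stand-in for the file referenced by xslFile (assumed stylesheet).
    String xsl =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/author\"><xsl:value-of select=\"name\"/></xsl:template>"
            + "</xsl:stylesheet>";
    System.out.println(transform(xml, xsl)); // prints "Jane Doe"
  }
}
```

In the pipelet, the input would come from the attribute or attachment named by inputName and the result would be written to outputName; the sketch only covers the transformation step.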
org.eclipse.smila.processing.pipelets.xmlprocessing.XPathExtractorPipelet
Description
This pipelet extracts elements selected by XPath, converts them to appropriate data types (Boolean, Double, String), and stores the transformed value in an attribute or attachment.
Configuration
Property | Type | Description |
---|---|---|
inputType | String : ATTACHMENT, ATTRIBUTE | Defines whether the XML input is found in an attachment or in an attribute of the record. |
outputType | String : ATTACHMENT, ATTRIBUTE | Defines whether the transformed output should be stored in an attachment or in an attribute of the record. |
inputName | String | The name of the input attachment or the path to the input attribute (process literals of attribute). |
outputName | String | The name of the output attachment or the path to the output attribute (store result as literals of attribute). |
xpath | String | The XPath expression to be evaluated. |
separator | String | The optional separator. |
namespace | String | The optional XML namespace. |
Example
Pipelet configuration for XPathExtractorPipelet
    <proc:configuration>
      <rec:Val key="inputType">ATTRIBUTE</rec:Val>
      <rec:Val key="outputType">ATTRIBUTE</rec:Val>
      <rec:Val key="inputName">xmlIn</rec:Val>
      <rec:Val key="outputName">xmlOut</rec:Val>
      <rec:Val key="xpath">author/email</rec:Val>
      <rec:Val key="separator"></rec:Val>
      <rec:Val key="namespace"></rec:Val>
    </proc:configuration>
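The conversion to Boolean, Double, or String mentioned in the description mirrors the return types of the standard `javax.xml.xpath` API. The following sketch shows how an expression such as `author/email` evaluates against a sample document under each return type. The class name `XPathExtractSketch` and the sample XML are assumptions; how the pipelet itself chooses the type and applies the separator is not specified here.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of SMILA.
public class XPathExtractSketch {

  /** Evaluates the expression against the XML text, converting to the requested XPath type. */
  public static Object evaluate(String xml, String xpath, QName returnType)
      throws XPathExpressionException {
    XPath xp = XPathFactory.newInstance().newXPath();
    // The context node is the document node, so "author/email" starts at the root element.
    return xp.evaluate(xpath, new InputSource(new StringReader(xml)), returnType);
  }

  public static void main(String[] args) throws XPathExpressionException {
    // Assumed sample data standing in for the input attribute value.
    String xml = "<author><email>jane@example.org</email><age>42</age></author>";
    // STRING yields the text of the first matching node.
    System.out.println(evaluate(xml, "author/email", XPathConstants.STRING)); // jane@example.org
    // NUMBER yields a java.lang.Double.
    System.out.println(evaluate(xml, "author/age", XPathConstants.NUMBER)); // 42.0
    // BOOLEAN yields true if the node set is non-empty.
    System.out.println(evaluate(xml, "author/email", XPathConstants.BOOLEAN)); // true
  }
}
```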
org.eclipse.smila.processing.pipelets.xmlprocessing.XPathFilterPipelet
Description
This pipelet filters elements by XPath expressions (either using include or exclude mode) and stores the filtered elements as a new document in an attribute or attachment.
Configuration
Property | Type | Description |
---|---|---|
inputType | String : ATTACHMENT, ATTRIBUTE | Defines whether the XML input is found in an attachment or in an attribute of the record. |
outputType | String : ATTACHMENT, ATTRIBUTE | Defines whether the transformed output should be stored in an attachment or in an attribute of the record. |
inputName | String | The name of the input attachment or the path to the input attribute (process literals of attribute). |
outputName | String | The name of the output attachment or the path to the output attribute (store result as literals of attribute). |
xpath | String | The XPath expressions to be evaluated (multi-valued property). |
filterMode | String : INCLUDE, EXCLUDE | The filter mode, defining whether to include or exclude the elements matched by the XPath expressions. |
separator | String | The optional separator. |
namespace | String | The optional XML namespace. |
Examples
Pipelet configuration for XPathFilterPipelet with multi-valued xpath
    <proc:configuration>
      <rec:Val key="inputType">ATTRIBUTE</rec:Val>
      <rec:Val key="outputType">ATTRIBUTE</rec:Val>
      <rec:Val key="inputName">xmlIn</rec:Val>
      <rec:Val key="outputName">xmlOut</rec:Val>
      <rec:Seq key="xpath">
        <rec:Val>author/email</rec:Val>
        <rec:Val>author/name</rec:Val>
      </rec:Seq>
      <rec:Val key="filterMode">EXCLUDE</rec:Val>
      <rec:Val key="separator"></rec:Val>
      <rec:Val key="namespace"></rec:Val>
    </proc:configuration>
Pipelet configuration for XPathFilterPipelet with single-valued xpath
    <proc:configuration>
      <rec:Val key="inputType">ATTRIBUTE</rec:Val>
      <rec:Val key="outputType">ATTRIBUTE</rec:Val>
      <rec:Val key="inputName">xmlIn</rec:Val>
      <rec:Val key="outputName">xmlOut</rec:Val>
      <rec:Val key="xpath">author/email</rec:Val>
      <rec:Val key="filterMode">EXCLUDE</rec:Val>
      <rec:Val key="separator"></rec:Val>
      <rec:Val key="namespace"></rec:Val>
    </proc:configuration>
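As a rough illustration of EXCLUDE mode, the sketch below removes every element matched by a list of XPath expressions from a DOM tree and serializes what remains. It relies only on JDK APIs; the class name `XPathExcludeSketch` and the sample XML are assumptions, and the pipelet's actual behavior (in particular for INCLUDE mode, the separator, and namespaces) may differ.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of SMILA.
public class XPathExcludeSketch {

  /** Removes all elements matched by the expressions and returns the remaining document. */
  public static String exclude(String xml, List<String> xpaths) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    XPath xp = XPathFactory.newInstance().newXPath();
    for (String expr : xpaths) {
      // The result is a snapshot, so removing nodes while iterating is safe.
      NodeList matches = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
      for (int i = 0; i < matches.getLength(); i++) {
        Node node = matches.item(i);
        if (node.getParentNode() != null) { // skips attributes and the document node
          node.getParentNode().removeChild(node);
        }
      }
    }
    // Serialize the filtered DOM back to a string without the XML declaration.
    Transformer serializer = TransformerFactory.newInstance().newTransformer();
    serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    serializer.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    // Assumed sample data; the two expressions match the multi-valued example above.
    String xml = "<author><name>Jane</name><email>j@example.org</email><title>Dr</title></author>";
    System.out.println(exclude(xml, List.of("author/email", "author/name")));
  }
}
```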
org.eclipse.smila.processing.pipelets.xmlprocessing.RemoveElementFromXMLPipelet
Description
This pipelet removes a selected element from an XML document and stores the manipulated document in an attribute or attachment.
Configuration
Property | Type | Description |
---|---|---|
inputType | String : ATTACHMENT, ATTRIBUTE | Defines whether the XML input is found in an attachment or in an attribute of the record. |
outputType | String : ATTACHMENT, ATTRIBUTE | Defines whether the transformed output should be stored in an attachment or in an attribute of the record. |
inputName | String | The name of the input attachment or the path to the input attribute (process literals of attribute). |
outputName | String | The name of the output attachment or the path to the output attribute (store result as literals of attribute). |
elementId | String | The ID of the XML element to be removed. |
Example
Pipelet configuration for RemoveElementFromXMLPipelet
    <proc:configuration>
      <rec:Val key="inputType">ATTRIBUTE</rec:Val>
      <rec:Val key="outputType">ATTRIBUTE</rec:Val>
      <rec:Val key="inputName">xmlIn</rec:Val>
      <rec:Val key="outputName">xmlOut</rec:Val>
      <rec:Val key="elementId">title</rec:Val>
    </proc:configuration>
org.eclipse.smila.processing.pipelets.xmlprocessing.TidyPipelet
Description
This pipelet performs a Tidy transformation on an attribute or attachment value and stores the result in an attribute or attachment.
Configuration
Property | Type | Description |
---|---|---|
inputType | String : ATTACHMENT, ATTRIBUTE | Defines whether the XML input is found in an attachment or in an attribute of the record. |
outputType | String : ATTACHMENT, ATTRIBUTE | Defines whether the transformed output should be stored in an attachment or in an attribute of the record. |
inputName | String | The name of the input attachment or the path to the input attribute (process literals of attribute). |
outputName | String | The name of the output attachment or the path to the output attribute (store result as literals of attribute). |
tidyFile | String | The name (with relative or absolute path) of the Tidy configuration file to be used by the transformation. |
Example
Pipelet configuration for TidyPipelet
    <proc:configuration>
      <rec:Val key="inputType">ATTRIBUTE</rec:Val>
      <rec:Val key="outputType">ATTRIBUTE</rec:Val>
      <rec:Val key="inputName">xmlIn</rec:Val>
      <rec:Val key="outputName">xmlOut</rec:Val>
      <rec:Val key="tidyFile">./configuration/data/tidy_config.txt</rec:Val>
    </proc:configuration>
org.eclipse.smila.processing.pipelets.xmlprocessing.XmlSplitterPipelet
Description
This pipelet splits an XML stream into multiple XML snippets. For each snippet a new record is created, in which the XML snippet is stored in either an attribute or an attachment. The created records are not returned in the PipeletResult (which just contains the incoming RecordIds); instead they are sent directly to the ConnectivityManager and routed to the queue once more.
On each created record the annotation MessageProperties is set with the key-value pair isXmlSnippet=true. This can be used in Listener rules to select particular XML snippets for processing.
Configuration
Property | Type | Description |
---|---|---|
inputType | String : ATTACHMENT, ATTRIBUTE | Defines whether the XML input is found in an attachment or in an attribute of the record. An input attribute is not interpreted as containing the XML content itself but as a file path or a URL to the XML document. |
outputType | String : ATTACHMENT, ATTRIBUTE | Defines whether the XML snippet should be stored in an attachment or in an attribute of the newly created record. |
inputName | String | The name of the input attachment or the path to the input attribute. |
outputName | String | The name of the output attachment or the path to the output attribute (store result as literals of attribute). |
beginTagName | String | The name of the tag to start the XML snippet with. |
isBeginClosingTag | Boolean | A boolean flag defining whether beginTagName is a closing tag (true) or not (false). |
endTagName | String | The name of the tag to end the XML snippet with. |
isEndClosingTag | Boolean | A boolean flag defining whether endTagName is a closing tag (true) or not (false). |
keyTagName | String | The name of the tag used to create a record ID. |
maxBufferSize | Integer | The maximum size of the internal record buffer (optional, default is 20). |
idSeparator | String | The separator used to create the record IDs of the split records (optional, default is "#"). |
Example
Pipelet configuration for XmlSplitterPipelet
    <proc:configuration>
      <rec:Val key="inputType">ATTRIBUTE</rec:Val>
      <rec:Val key="outputType">ATTRIBUTE</rec:Val>
      <rec:Val key="inputName">xmlIn</rec:Val>
      <rec:Val key="outputName">xmlOut</rec:Val>
      <rec:Val key="beginTagName">document</rec:Val>
      <rec:Val key="isBeginClosingTag">false</rec:Val>
      <rec:Val key="endTagName">document</rec:Val>
      <rec:Val key="isEndClosingTag">true</rec:Val>
      <rec:Val key="keyTagName">docId</rec:Val>
      <rec:Val key="idSeparator">#</rec:Val>
    </proc:configuration>
The above configuration would split this XML format:
    <sampleCollection>
      ...
      <document>
        <docId>4711</docId>
        <title>Some title</title>
        ...
        <text>Some text</text>
      </document>
      <document>
        <docId>0815</docId>
        ...
      </document>
      ...
    </sampleCollection>
into XML snippets like this one:
    <document>
      <docId>4711</docId>
      <title>Some title</title>
      ...
      <text>Some text</text>
    </document>
And for each snippet a record would be created:
    <Record xmlns="http://www.eclipse.org/smila/record" version="2.0">
      <Val key="_recordid">xmlsplitter:someBigXmlfile.xml#4711</Val>
      <Val key="_source">xmlsplitter</Val>
      <Map key="_messageProperties">
        <Val key="isXmlSnippet">true</Val>
      </Map>
      <Val key="xmlOut">
        <document>
          <docId>4711</docId>
          <title>Some title</title>
          ...
          <text>Some text</text>
        </document>
      </Val>
    </Record>
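The splitting and ID construction described above can be approximated with a short sketch. Note the assumptions: the real pipelet works on a stream with begin and end tags and an internal buffer, whereas this sketch loads the whole document into a DOM for brevity. The class name `XmlSplitSketch`, the method signature, and the idea of passing the base ID as a parameter are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of SMILA; uses DOM instead of streaming.
public class XmlSplitSketch {

  /** Returns a map from generated record ID to the serialized XML snippet. */
  public static Map<String, String> split(String xml, String snippetTag, String keyTag,
      String baseId, String idSeparator) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    Transformer serializer = TransformerFactory.newInstance().newTransformer();
    serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

    Map<String, String> records = new LinkedHashMap<>();
    NodeList snippets = doc.getElementsByTagName(snippetTag);
    for (int i = 0; i < snippets.getLength(); i++) {
      Element snippet = (Element) snippets.item(i);
      // The text of the key tag (e.g. docId) becomes the last part of the record ID.
      String key = snippet.getElementsByTagName(keyTag).item(0).getTextContent();
      StringWriter out = new StringWriter();
      serializer.transform(new DOMSource(snippet), new StreamResult(out));
      records.put(baseId + idSeparator + key, out.toString());
    }
    return records;
  }

  public static void main(String[] args) throws Exception {
    // Assumed sample data modeled on the sampleCollection example.
    String xml = "<sampleCollection>"
        + "<document><docId>4711</docId><title>Some title</title></document>"
        + "<document><docId>0815</docId></document>"
        + "</sampleCollection>";
    split(xml, "document", "docId", "xmlsplitter:someBigXmlfile.xml", "#")
        .forEach((id, snippet) -> System.out.println(id + " -> " + snippet));
  }
}
```

A streaming approach keeps memory usage bounded for large input files, which is presumably why the pipelet works with begin and end tags instead of a full parse.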
The Listener rules to split the XML files and to process the XML snippets could look like this:
    <Rule Name="Splitter Rule" WaitMessageTimeout="10" Threads="2" MaxMessageBlockSize="1">
      <Source BrokerId="broker1" Queue="SMILA.connectivity"/>
      <Condition>Operation='ADD' and DataSourceID LIKE '%xmlsplitting%' and NOT(isXmlSnippet='true')</Condition>
      <Task>
        <Process Workflow="SplitterPipeline"/>
      </Task>
    </Rule>
    <Rule Name="Snippet Rule" WaitMessageTimeout="10" Threads="4" MaxMessageBlockSize="20">
      <Source BrokerId="broker1" Queue="SMILA.connectivity"/>
      <Condition>Operation='ADD' and DataSourceID LIKE '%xmlsplitting%' and isXmlSnippet='true'</Condition>
      <Task>
        <Process Workflow="SnippetPipeline"/>
      </Task>
    </Rule>