Difference between revisions of "SMILA/Documentation/Bundle org.eclipse.smila.processing.pipelets.xmlprocessing"

Revision as of 05:14, 27 March 2012

General

All pipelets in this bundle support the configurable error handling described in SMILA/Development_Guidelines/How_to_write_a_Pipelet#Implementation. When used in JobManager workflows, records that cause errors are dropped.

Read Type

runtime: Parameters are read when processing records. The parameter value can be set per record.
init: Parameters are read once from the pipelet configuration when the pipelet is initialized. The parameter value cannot be overridden in a record.

`org.eclipse.smila.processing.pipelets.xmlprocessing.XslTransformationPipelet`

Description

This pipelet performs an XSL transformation on an attribute or attachment value and stores the transformed document in an attribute or attachment.

Configuration

Property	Data Type	Read Type	Description
inputType	String : ATTACHMENT, ATTRIBUTE	runtime	Defines whether the XML input is found in an attachment or in an attribute of the record.
outputType	String : ATTACHMENT, ATTRIBUTE	runtime	Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
inputName	String	runtime	The name of the input attachment or the path to the input attribute (process literals of attribute).
outputName	String	runtime	The name of the output attachment or the path to the output attribute (store result as literals of attribute).
xslFile	String	runtime	The name (with relative or absolute path) of the XSL file to be used for transformation.
parameters	Map or Boolean	runtime	Either a map of XSL parameters or a boolean indicating whether all record attributes are to be added as XSL parameters.

Example

Pipelet configuration for XslTransformationPipelet

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="xslFile">./configuration/data/author.xsl</rec:Val>
</proc:configuration>
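
The parameters property can be used to pass values into the stylesheet. The following is a minimal sketch, not a confirmed configuration: it assumes that a nested map is written as a rec:Map element (by analogy with the rec:Val and rec:Seq elements used on this page) and that each map key reaches the stylesheet as an XSL parameter of the same name. The parameter name defaultLanguage is hypothetical.

```xml
<!-- Pipelet configuration fragment passing one hypothetical XSL parameter -->
<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="xslFile">./configuration/data/author.xsl</rec:Val>
    <!-- Assumption: nested maps are expressed with rec:Map -->
    <rec:Map key="parameters">
        <rec:Val key="defaultLanguage">en</rec:Val>
    </rec:Map>
</proc:configuration>
```

A stylesheet that consumes such a parameter could look like this. It is an illustrative XSLT 1.0 stylesheet and not necessarily the content of the author.xsl file referenced above; the output element contact is made up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <!-- Receives the value supplied by the caller; 'en' is the fallback -->
    <xsl:param name="defaultLanguage" select="'en'"/>
    <!-- Turns an <author> document into a small <contact> element -->
    <xsl:template match="/author">
        <contact lang="{$defaultLanguage}">
            <xsl:value-of select="email"/>
        </contact>
    </xsl:template>
</xsl:stylesheet>
```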

`org.eclipse.smila.processing.pipelets.xmlprocessing.XPathExtractorPipelet`

Description

This pipelet extracts elements selected by XPath, converts them to appropriate data types (Boolean, Double, String), and stores the transformed value in an attribute or attachment.

Configuration

Property	Data Type	Read Type	Description
inputType	String : ATTACHMENT, ATTRIBUTE	runtime	Defines whether the XML input is found in an attachment or in an attribute of the record.
outputType	String : ATTACHMENT, ATTRIBUTE	runtime	Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
inputName	String	runtime	The name of the input attachment or the path to the input attribute (process literals of attribute).
outputName	String	runtime	The name of the output attachment or the path to the output attribute (store result as literals of attribute).
xpath	String	runtime	The XPath expression to be evaluated.
separator	String	runtime	The optional separator.
namespace	String	runtime	The optional XML namespace.

Example

Pipelet configuration for XPathExtractorPipelet

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="xpath">author/email</rec:Val>
    <rec:Val key="separator"></rec:Val>
    <rec:Val key="namespace"></rec:Val>
</proc:configuration>
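
To illustrate the configuration above, assume that the attribute xmlIn holds the following document (the sample data is made up). The expression author/email then selects the single email element, and its text would presumably be stored as a String in the attribute xmlOut.

```xml
<!-- Hypothetical content of the attribute "xmlIn" -->
<author>
    <name>Jane Doe</name>
    <!-- This element is selected by the XPath expression author/email -->
    <email>jane.doe@example.org</email>
</author>
```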

`org.eclipse.smila.processing.pipelets.xmlprocessing.XPathFilterPipelet`

Description

This pipelet filters elements by XPath expressions (either using include or exclude mode) and stores the filtered elements as a new document in an attribute or attachment.

Configuration

Property	Data Type	Read Type	Description
inputType	String : ATTACHMENT, ATTRIBUTE	runtime	Defines whether the XML input is found in an attachment or in an attribute of the record.
outputType	String : ATTACHMENT, ATTRIBUTE	runtime	Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
inputName	String	runtime	The name of the input attachment or the path to the input attribute (process literals of attribute).
outputName	String	runtime	The name of the output attachment or the path to the output attribute (store result as literals of attribute).
xpath	String	runtime	The XPath expressions to be evaluated (multi-valued property).
filterMode	String : INCLUDE, EXCLUDE	runtime	The filter mode, defining whether to include or exclude the elements matched by the XPath expressions.
namespace	String	runtime	The optional XML namespace.

Examples

Pipelet configuration for XPathFilterPipelet with multi-valued xpath

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Seq key="xpath">
        <rec:Val>author/email</rec:Val>
        <rec:Val>author/name</rec:Val>
    </rec:Seq>
    <rec:Val key="filterMode">EXCLUDE</rec:Val>
    <rec:Val key="seperator"></rec:Val>
    <rec:Val key="namespace"></rec:Val>
</proc:configuration>

Pipelet configuration for XPathFilterPipelet with single-valued xpath

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="xpath">author/email</rec:Val>
    <rec:Val key="filterMode">EXCLUDE</rec:Val>
    <rec:Val key="seperator"></rec:Val>
    <rec:Val key="namespace"></rec:Val>
</proc:configuration>
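
As a sketch of the intended effect of the single-valued configuration, assume the hypothetical input below in the attribute xmlIn. With filterMode set to EXCLUDE and the expression author/email, one would expect the matched element to be removed and the remaining document to be stored in xmlOut. The exact serialization (whitespace, XML declaration) is not described here and may differ.

```xml
<!-- Hypothetical content of the attribute "xmlIn" -->
<author>
    <name>Jane Doe</name>
    <email>jane.doe@example.org</email>
</author>
```

```xml
<!-- Expected content of "xmlOut" in EXCLUDE mode: the email element is gone -->
<author>
    <name>Jane Doe</name>
</author>
```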

`org.eclipse.smila.processing.pipelets.xmlprocessing.RemoveElementFromXMLPipelet`

Description

This pipelet removes a selected element from an XML document and stores the manipulated document in an attribute or attachment.

Configuration

Property	Data Type	Read Type	Description
inputType	String : ATTACHMENT, ATTRIBUTE	runtime	Defines whether the XML input is found in an attachment or in an attribute of the record.
outputType	String : ATTACHMENT, ATTRIBUTE	runtime	Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
inputName	String	runtime	The name of the input attachment or the path to the input attribute (process literals of attribute).
outputName	String	runtime	The name of the output attachment or the path to the output attribute (store result as literals of attribute).
elementId	String	runtime	The ID of the XML element to be removed.

Example

Pipelet configuration for RemoveElementFromXMLPipelet

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="elementId">title</rec:Val>
</proc:configuration>

`org.eclipse.smila.processing.pipelets.xmlprocessing.TidyPipelet`

Description

This pipelet performs a Tidy transformation on an attribute or attachment value and stores the result in an attribute or attachment.

Configuration

Property	Data Type	Read Type	Description
inputType	String : ATTACHMENT, ATTRIBUTE	runtime	Defines whether the XML input is found in an attachment or in an attribute of the record.
outputType	String : ATTACHMENT, ATTRIBUTE	runtime	Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
inputName	String	runtime	The name of the input attachment or the path to the input attribute (process literals of attribute).
outputName	String	runtime	The name of the output attachment or the path to the output attribute (store result as literals of attribute).
tidyFile	String	init	The name (with relative or absolute path) of the Tidy configuration file to be used by the transformation.

Example

Pipelet configuration for TidyPipelet

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="tidyFile">./configuration/data/tidy_config.txt</rec:Val>
</proc:configuration>

`org.eclipse.smila.processing.pipelets.xmlprocessing.XmlSplitterPipelet`

Description

This pipelet splits an XML stream into multiple XML snippets. For each snippet, a new record is created in which the XML snippet is stored in either an attribute or an attachment. The created records are not returned in the PipeletResult (which contains just the incoming record IDs); instead, they are sent directly to the ConnectivityManager and routed to the queue once more.

On each created record, the attribute __isXmlSnippet is set to true. Incoming records that have this attribute set are not split again but are returned as the pipelet result. This makes it possible to add further processing steps for the split records to the same pipeline that does the splitting. See XmlSplitAndAddPipeline.bpel for an example.

Namespaces visible in the scope of a snippet are added to that snippet, so each snippet is a valid XML document on its own.
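
The namespace handling can be pictured as follows. This is an illustrative sketch with made-up data: it assumes a source document that declares a namespace prefix on the root element and a splitter that cuts at the document tag.

```xml
<!-- Hypothetical source document: the dc prefix is declared on the root only -->
<sampleCollection xmlns:dc="http://purl.org/dc/elements/1.1/">
    <document>
        <docId>4711</docId>
        <dc:title>Some title</dc:title>
    </document>
</sampleCollection>
```

```xml
<!-- Resulting snippet: the in-scope declaration is copied onto the snippet root,
     so the dc prefix can still be resolved without the surrounding document -->
<document xmlns:dc="http://purl.org/dc/elements/1.1/">
    <docId>4711</docId>
    <dc:title>Some title</dc:title>
</document>
```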

Configuration

Property	Data Type	Read Type	Description
inputType	String : ATTACHMENT, ATTRIBUTE	runtime	Defines whether the XML input is found in an attachment or in an attribute of the record. An input attribute is not interpreted as XML content itself but as a file path or a URL to the XML document.
outputType	String : ATTACHMENT, ATTRIBUTE	runtime	Defines whether the XML snippet should be stored in an attachment or in an attribute of the newly created record.
inputName	String	runtime	The name of the input attachment or the path to the input attribute containing a path to an external data source, e.g. an XML file name.
outputName	String	runtime	The name of the output attachment or the path to the output attribute (store result as literals of attribute).
beginTagName	String	runtime	The name of the tag to start the XML snippet with.
beginTagNamespace	String	runtime	The namespace of the start tag. Namespaces are not checked, if not given (in that case any namespace matches).
endTagName	String	runtime	The name of the tag to end the XML snippet with; defaults to the value of beginTagName.
endTagNamespace	String	runtime	The namespace of the end tag, defaults to the value of beginTagNamespace.
keyTagName	String	runtime	The name of the tag used to create a record ID.
maxBufferSize	Integer	runtime	The maximum size of the internal record buffer (optional, default is 20).
idSeparator	String	runtime	The separator used to create the record IDs of the split records (optional, default is "#").
xmlSnippetJobName	String	runtime	The JobManager job name to submit the split records to. It must be running when the pipelet is executed.

The first four properties can be set only in the pipelet configuration. All other properties can be customized separately for each record by setting them as values of the _parameters map in the record.
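
A per-record override could look like the following sketch. It assumes that the record XML format represents maps with a Map element (the record example further down this page only shows Val elements). The record ID, the file path, and the tag names item and itemId are hypothetical.

```xml
<Record xmlns="http://www.eclipse.org/smila/record" version="2.0">
  <Val key="_recordid">file:someOtherXmlfile.xml</Val>
  <!-- The input attribute holds a path to the XML file, not the XML itself -->
  <Val key="xmlIn">/data/someOtherXmlfile.xml</Val>
  <!-- Assumption: values in this map override the configured ones for this record only -->
  <Map key="_parameters">
    <Val key="beginTagName">item</Val>
    <Val key="keyTagName">itemId</Val>
  </Map>
</Record>
```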

Example

Pipelet configuration for XmlSplitterPipelet

<proc:configuration>
    <rec:Val key="inputType">ATTRIBUTE</rec:Val>
    <rec:Val key="outputType">ATTRIBUTE</rec:Val>
    <rec:Val key="inputName">xmlIn</rec:Val>
    <rec:Val key="outputName">xmlOut</rec:Val>
    <rec:Val key="beginTagName">document</rec:Val>
    <rec:Val key="keyTagName">docId</rec:Val>
    <rec:Val key="idSeparator">#</rec:Val>
    <rec:Val key="xmlSnippetJobName">indexUpdateXml</rec:Val>
</proc:configuration>

The above configuration would split this XML format:

<sampleCollection>
    ...
    <document>
        <docId>4711</docId>
        <title>Some title</title>
        ...
        <text>Some text</text>
    </document>
    <document>
        <docId>0815</docId>  
        ...
    </document>
    ...
</sampleCollection>

into XML snippets like this one:

<document>
    <docId>4711</docId>
    <title>Some title</title>
    ...
    <text>Some text</text>
</document>

For each snippet, a record like the following would be created and submitted to a run of the job "indexUpdateXml":

<Record xmlns="http://www.eclipse.org/smila/record" version="2.0">
  <Val key="_recordid">xmlsplitter:someBigXmlfile.xml#4711</Val>
  <Val key="_source">xmlsplitter</Val>
  <Val key="__isXmlSnippet">true</Val>
  <Val key="xmlOut">
    <document>
      <docId>4711</docId>
      <title>Some title</title>
      ...
      <text>Some text</text>
    </document>
  </Val>
</Record>

To use XmlSplitAndAddPipeline.bpel via the predefined asynchronous workflow indexUpdateXml, create a job definition that sets "xmlSnippetJobName" to the job's own name:

{
  "name": "indexUpdateXmlJob",
  "workflow": "indexUpdateXml",
  "parameters": {
    "tempStore": "xmlbulks",
    "xmlSnippetJobName": "indexUpdateXmlJob"
  }
}

@@ Line 1: / Line 1: @@
+== General ==
+All pipelets in this bundle support the configurable error handling as described in [[SMILA/Development_Guidelines/How_to_write_a_Pipelet#Implementation]]. When used in jobmanager workflows, records causing errors are dropped.
+''' Read Type '''
+* ''runtime'': Parameters are read when processing records. Parameter value can be set per Record.
+* ''init'': Parameters are read once from Pipelet configuration when initializing the Pipelet. Parameter value can not be overwritten in Record.
 == <tt>org.eclipse.smila.processing.pipelets.xmlprocessing.XslTransformationPipelet</tt> ==
 === Description ===
-Pipelet that performs an XSL transformation on an attribute or attachment value and stores the transformed document in an attribute or attachment.
+This pipelet performs an XSL transformation on an attribute or attachment value and stores the transformed document in an attribute or attachment.
 === Configuration ===
 {| border = 1
-!Property!!Type!!Description
+!Property!!Data Type!!Read Type!!Description
 |-
-|inputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML input is found in an attachment or attribute of the record
+|''inputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML input is found in an attachment or in an attribute of the record.
 |-
-|outputType||String : ATTACHMENT, ATTRIBUTE||selects if the transformed output should be stored in an attachment or attribute of the record
+|''outputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
 |-
-|inputName||String||name of input attachment or path to input attribute (process literals of attribute)
+|''inputName''||String||runtime||The name of the input attachment or the path to the input attribute (process literals of attribute).
 |-
-|outputName||String|| name of output attachment or path to output attribute (store result as literals of attribute)
+|''outputName''||String||runtime||The name of the output attachment or the path to the output attribute (store result as literals of attribute).
 |-
-|xslFile||String||the name (with relative or absolute path) of the XSL file to use for transformation
+|''xslFile''||String||runtime||The name (with relative or absolute path) of the XSL file to be used for transformation.
+|-
+|''parameters''||Map or Boolean||runtime||Either a map of XSL parameters or a boolean that indicates to add all attributes as XSL parameters.
 |}
 ==== Example ====
-'''PipeletConfiguration for XslTransformationPipelet'''
+'''Pipelet configuration for XslTransformationPipelet'''
 <source lang="xml">
-<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
+<proc:configuration>
-    <Property name="xslFile" type="java.lang.String">
+     <rec:Val key="inputType">ATTRIBUTE<rec:Val>
-        <Value>./configuration/data/author.xsl</Value>
+     <rec:Val key="outputType">ATTRIBUTE<rec:Val>
-     </Property>
+     <rec:Val key="inputName">xmlIn<rec:Val>
-    <Property name="inputType" type="java.lang.String">
+     <rec:Val key="outputName">xmlOut<rec:Val>
-        <Value>ATTRIBUTE</Value>
+     <rec:Val key="xslFile">./configuration/data/author.xsl<rec:Val>
-     </Property>
+</proc:configuration>
-    <Property name="outputType" type="java.lang.String">
-        <Value>ATTRIBUTE</Value>
-     </Property>
-    <Property name="inputName" type="java.lang.String">
-        <Value>xmlIn</Value>
-     </Property>
-    <Property name="outputName" type="java.lang.String">
-        <Value>xmlOut</Value>
-     </Property>
-</PipeletConfiguration>
 </source>
 == <tt>org.eclipse.smila.processing.pipelets.xmlprocessing.XPathExtractorPipelet</tt> ==
@@ Line 49: / Line 49: @@
 === Description ===
-Pipelet that extracts elements selected by XPath, converts them in appropriate data types (Boolean, Double, String) and stores the transformed value in an attribute or attachment.
+This pipelet extracts elements selected by XPath, converts them to appropriate data types (Boolean, Double, String), and stores the transformed value in an attribute or attachment.
 === Configuration ===
 {| border = 1
-!Property!!Type!!Description
+!Property!!Data Type!!Read Type!!Description
 |-
-|inputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML input is found in an attachment or attribute of the record
+|''inputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML input is found in an attachment or in an attribute of the record.
 |-
-|outputType||String : ATTACHMENT, ATTRIBUTE||selects if the transformed output should be stored in an attachment or attribute of the record
+|''outputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
 |-
-|inputName||String||name of input attachment or path to input attribute (process literals of attribute)
+|''inputName''||String||runtime||The name of the input attachment or the path to the input attribute (process literals of attribute).
 |-
-|outputName||String|| name of output attachment or path to output attribute (store result as literals of attribute)
+|''outputName''||String||runtime||The name of the output attachment or the path to the output attribute (store result as literals of attribute).
 |-
-|xpath||String||the XPATH to evaluate
+|''xpath''||String||runtime||The XPath expression to be evaluated.
 |-
-|seperator||String||the seperator (optional)
+|''separator''||String||runtime||The optional separator.
 |-
-|namespace||String||the XML namespace (optional)
+|''namespace''||String||runtime||The optional XML namespace.
 |}
 ==== Example ====
-'''PipeletConfiguration for XPathExtractorPipelet'''
+'''Pipelet configuration for XPathExtractorPipelet'''
 <source lang="xml">
-<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
+<proc:configuration>
-	<Property name="xpath" type="java.lang.String">
+    <rec:Val key="inputType">ATTRIBUTE<rec:Val>
-		<Value>author/email</Value>
+    <rec:Val key="outputType">ATTRIBUTE<rec:Val>
-	</Property>
+    <rec:Val key="inputName">xmlIn<rec:Val>
-	<Property name="seperator" type="java.lang.String">
+    <rec:Val key="outputName">xmlOut<rec:Val>
-		<Value></Value>
+    <rec:Val key="xpath">author/email<rec:Val>
-	</Property>
+    <rec:Val key="separator"><rec:Val>
-	<Property name="namespace" type="java.lang.String">
+    <rec:Val key="namespace"><rec:Val>
-		<Value></Value>
+</proc:configuration>
-	</Property>
-	<Property name="inputType" type="java.lang.String">
-		<Value>ATTRIBUTE</Value>
-	</Property>
-	<Property name="outputType" type="java.lang.String">
-		<Value>ATTRIBUTE</Value>
-	</Property>
-	<Property name="inputName" type="java.lang.String">
-		<Value>xmlIn</Value>
-	</Property>
-	<Property name="outputName" type="java.lang.String">
-		<Value>xmlOut</Value>
-	</Property>
-</PipeletConfiguration>
 </source>
@@ Line 106: / Line 92: @@
 === Description ===
-Pipelet that filters elements by XPath (either include or exclude mode) and stores the filtered elements as a new document in an attribute or attachment.
+This pipelet filters elements by XPath expressions (either using include or exclude mode) and stores the filtered elements as a new document in an attribute or attachment.
 === Configuration ===
 {| border = 1
-!Property!!Type!!Description
+!Property!!Data Type!!Read Type!!Description
 |-
-|inputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML input is found in an attachment or attribute of the record
+|''inputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML input is found in an attachment or in an attribute of the record.
 |-
-|outputType||String : ATTACHMENT, ATTRIBUTE||selects if the transformed output should be stored in an attachment or attribute of the record
+|''outputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
 |-
-|inputName||String||name of input attachment or path to input attribute (process literals of attribute)
+|''inputName''||String||runtime||The name of the input attachment or the path to the input attribute (process literals of attribute).
 |-
-|outputName||String|| name of output attachment or path to output attribute (store result as literals of attribute)
+|''outputName''||String||runtime||The name of the output attachment or the path to the output attribute (store result as literals of attribute).
 |-
-|xpath||String||the XPATHs to evaluate (multivalue)
+|''xpath''||String||runtime||The XPath expressions to be evaluated (multi-valued property).
 |-
-|filterMode||String : INCLUDE, EXCLUDE||the filter mode, if to include or exclude the elements specified by xpath
+|''filterMode''||String : INCLUDE, EXCLUDE||runtime||The filter mode, defining whether to include or exclude the elements matched by the XPath expressions.
 |-
-|seperator||String||the seperator (optional)
+|''namespace''||String||runtime||The optional XML namespace.
-|-
-|namespace||String||the XML namespace (optional)
 |}
-==== Example ====
+==== Examples ====
-'''PipeletConfiguration for XPathFilterPipelet'''
+'''Pipelet configuration for XPathFilterPipelet with multi-valued xpath'''
 <source lang="xml">
-<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
+<proc:configuration>
-	<Property name="xpath" type="java.lang.String">
+    <rec:Val key="inputType">ATTRIBUTE<rec:Val>
-		<Value>author/name</Value>
+    <rec:Val key="outputType">ATTRIBUTE<rec:Val>
-	</Property>
+    <rec:Val key="inputName">xmlIn<rec:Val>
-	<Property name="xpath" type="java.lang.String">
+    <rec:Val key="outputName">xmlOut<rec:Val>
-		<Value>author/email</Value>
+    <rec:Seq key="xpath">
-	</Property>
+        <rec:Val>author/email<rec:Val>
-	<Property name="filterMode" type="java.lang.String">
+        <rec:Val>author/name<rec:Val>
-		<Value>EXCLUDE</Value>
+    </rec:Seq>
-	</Property>
+    <rec:Val key="filterMode">EXCLUDE<rec:Val>
-	<Property name="seperator" type="java.lang.String">
+    <rec:Val key="seperator"><rec:Val>
-		<Value></Value>
+    <rec:Val key="namespace"><rec:Val>
-	</Property>
+</proc:configuration>
-	<Property name="namespace" type="java.lang.String">
+</source>
-		<Value></Value>
-	</Property>
+'''Pipelet configuration for XPathFilterPipelet with single-valued xpath'''
-	<Property name="inputType" type="java.lang.String">
+<source lang="xml">
-		<Value>ATTRIBUTE</Value>
+<proc:configuration>
-	</Property>
+    <rec:Val key="inputType">ATTRIBUTE<rec:Val>
-	<Property name="outputType" type="java.lang.String">
+    <rec:Val key="outputType">ATTRIBUTE<rec:Val>
-		<Value>ATTRIBUTE</Value>
+    <rec:Val key="inputName">xmlIn<rec:Val>
-	</Property>
+    <rec:Val key="outputName">xmlOut<rec:Val>
-	<Property name="inputName" type="java.lang.String">
+    <rec:Val key="xpath">author/email<rec:Val>
-		<Value>xmlIn</Value>
+    <rec:Val key="filterMode">EXCLUDE<rec:Val>
-	</Property>
+    <rec:Val key="seperator"><rec:Val>
-	<Property name="outputName" type="java.lang.String">
+    <rec:Val key="namespace"><rec:Val>
-		<Value>xmlOut</Value>
+</proc:configuration>
-	</Property>
-</PipeletConfiguration>
 </source>
@@ Line 171: / Line 153: @@
 === Description ===
-Pipelet that removes a selected element from an XML document and stores the remaining document in an attribute or attachment.
+This pipelet removes a selected element from an XML document and stores the manipulated document in an attribute or attachment.
 === Configuration ===
 {| border = 1
-!Property!!Type!!Description
+!Property!!Data Type!!Read Type!!Description
 |-
-|inputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML input is found in an attachment or attribute of the record
+|''inputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML input is found in an attachment or in an attribute of the record.
 |-
-|outputType||String : ATTACHMENT, ATTRIBUTE||selects if the transformed output should be stored in an attachment or attribute of the record
+|''outputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
 |-
-|inputName||String||name of input attachment or path to input attribute (process literals of attribute)
+|''inputName''||String||runtime||The name of the input attachment or the path to the input attribute (process literals of attribute).
 |-
-|outputName||String|| name of output attachment or path to output attribute (store result as literals of attribute)
+|''outputName''||String||runtime||The name of the output attachment or the path to the output attribute (store result as literals of attribute).
 |-
-|elementId||String||the id of the XML element to remove
+|''elementId''||String||runtime||The ID of the XML element to be removed.
 |}
 ==== Example ====
-'''PipeletConfiguration for RemoveElementFromXMLPipelet'''
+'''Pipelet configuration for RemoveElementFromXMLPipelet'''
 <source lang="xml">
-<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
+<proc:configuration>
-	<Property name="elementId" type="java.lang.String">
+    <rec:Val key="inputType">ATTRIBUTE<rec:Val>
-		<Value>1</Value>
+    <rec:Val key="outputType">ATTRIBUTE<rec:Val>
-	</Property>
+    <rec:Val key="inputName">xmlIn<rec:Val>
-	<Property name="inputType" type="java.lang.String">
+    <rec:Val key="outputName">xmlOut<rec:Val>
-		<Value>ATTACHMENT</Value>
+    <rec:Val key="elementId">title<rec:Val>
-	</Property>
+</proc:configuration>
-	<Property name="outputType" type="java.lang.String">
-		<Value>ATTACHMENT</Value>
-	</Property>
-	<Property name="inputName" type="java.lang.String">
-		<Value>xmlIn</Value>
-	</Property>
-	<Property name="outputName" type="java.lang.String">
-		<Value>xmlOut</Value>
-	</Property>
-</PipeletConfiguration>
 </source>
@@ Line 218: / Line 190: @@
 === Description ===
-Pipelet that performs a Tidy transformation on an attribute or attachment value and stores the result in an attribute or attachment.
+This pipelet performs a Tidy transformation on an attribute or attachment value and stores the result in an attribute or attachment.
 === Configuration ===
 {| border = 1
-!Property!!Type!!Description
+!Property!!Data Type!!Read Type!!Description
 |-
-|inputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML input is found in an attachment or attribute of the record
+|''inputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML input is found in an attachment or in an attribute of the record.
 |-
-|outputType||String : ATTACHMENT, ATTRIBUTE||selects if the transformed output should be stored in an attachment or attribute of the record
+|''outputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the transformed output should be stored in an attachment or in an attribute of the record.
 |-
-|inputName||String||name of input attachment or path to input attribute (process literals of attribute)
+|''inputName''||String||runtime||The name of the input attachment or the path to the input attribute (process literals of attribute).
 |-
-|outputName||String|| name of output attachment or path to output attribute (store result as literals of attribute)
+|''outputName''||String||runtime||The name of the output attachment or the path to the output attribute (store result as literals of attribute).
 |-
-|tidyFile||String||the name (with relative or absolute path) of the tidy configuration file to use
+|''tidyFile''||String||init||The name (with relative or absolute path) of the Tidy configuration file to be used by the transformation.
 |}
 ==== Example ====
-'''PipeletConfiguration for TidyPipelet'''
+'''Pipelet configuration for TidyPipelet'''
 <source lang="xml">
-<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
+<proc:configuration>
-	<Property name="tidyFile" type="java.lang.String">
+    <rec:Val key="inputType">ATTRIBUTE<rec:Val>
-		<Value>./configuration/data/tidy_config.txt</Value>
+    <rec:Val key="outputType">ATTRIBUTE<rec:Val>
-	</Property>
+    <rec:Val key="inputName">xmlIn<rec:Val>
-	<Property name="inputType" type="java.lang.String">
+    <rec:Val key="outputName">xmlOut<rec:Val>
-		<Value>ATTRIBUTE</Value>
+    <rec:Val key="tidyFile">./configuration/data/tidy_config.txt<rec:Val>
-	</Property>
+</proc:configuration>
-	<Property name="outputType" type="java.lang.String">
-		<Value>ATTRIBUTE</Value>
-	</Property>
-	<Property name="inputName" type="java.lang.String">
-		<Value>xmlIn</Value>
-	</Property>
-	<Property name="outputName" type="java.lang.String">
-		<Value>xmlOut</Value>
-	</Property>
-</PipeletConfiguration>
 </source>
 == <tt>org.eclipse.smila.processing.pipelets.xmlprocessing.XmlSplitterPipelet</tt> ==
@@ Line 266: / Line 225: @@
 === Description ===
-Pipelet that splits a XML stream into multiple xml snippets. For each snippet a new Record is created where the XML snippet is stored in either an attribute or attachment. The created records are not returned as a PipeletResult (this is just the same as the incoming RecordIds) but are directly send to the ConnectivityManager and are routed once more to the Queue.
+This pipelet splits an XML stream into multiple XML snippets. For each snippet a new record is created where the XML snippet is stored in either an attribute or attachment. The created records are not returned as a PipeletResult (this is just the same as the incoming RecordIds) but are directly sent to the ConnectivityManager and are routed once more to the queue.
-On each created record the Annotation <tt>MessageProperties</tt> is set with the key value pair <tt>isXmlSnippet</tt>=<tt>true</tt>. This can  be used in Listener rules to select for XML snippets to process.
+On each created record the attribute <tt>__isXmlSnippet</tt> is set to <tt>true</tt>. Incoming records that already have this attribute set are not split again but are returned as the pipelet result. This makes it possible to add further processing steps for the split records to the same pipeline that does the splitting. See [https://dev.eclipse.org/svnroot/rt/org.eclipse.smila/trunk/core/SMILA.application/configuration/org.eclipse.smila.processing.bpel/pipelines/XmlSplitAndAddPipeline.bpel XmlSplitAndAddPipeline.bpel] for an example.
+Namespaces that are in scope for a snippet are added to it, so each snippet is a valid XML document on its own.
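The splitting described above can be illustrated outside of SMILA. The following Python sketch is not the pipelet's implementation; it only shows the idea of cutting a stream into snippets at a begin tag and deriving an ID from a key tag. The function name <tt>split_xml</tt> and the <tt>base_id</tt> argument are assumptions made for this illustration, and namespace handling is left out.

```python
import io
import xml.etree.ElementTree as ET


def split_xml(stream, begin_tag, key_tag, base_id, id_separator="#"):
    """Yield (record_id, snippet) for every <begin_tag> element in the stream.

    Hypothetical helper for illustration only, not part of SMILA.
    Namespaces are ignored in this sketch.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != begin_tag:
            continue
        # The text of the key tag becomes the last part of the record ID.
        key = elem.findtext(key_tag, default="")
        # Drop trailing text so only the element itself is serialized.
        elem.tail = None
        snippet = ET.tostring(elem, encoding="unicode")
        yield base_id + id_separator + key, snippet
        # Release the element's children once the snippet has been emitted.
        elem.clear()


SAMPLE = (
    b"<sampleCollection>"
    b"<document><docId>4711</docId><title>Some title</title></document>"
    b"<document><docId>4712</docId><title>Other title</title></document>"
    b"</sampleCollection>"
)

for record_id, snippet in split_xml(
    io.BytesIO(SAMPLE), "document", "docId", "xmlsplitter:someBigXmlfile.xml"
):
    print(record_id, snippet)
```

Reading the input incrementally with <tt>iterparse</tt> and clearing each element after use keeps memory usage low for large files, which is the usual reason to split a stream instead of parsing the whole document at once.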
 === Configuration ===
@@ Line 275: / Line 236: @@
 !Property!!Type!!Description
 |-
-|inputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML input is found in an attachment or attribute of the record. An input Attribute is not interpreted as content but as a file path or an URL to the XML document.
+|''inputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML input is found in an attachment or in an attribute of the record. An input attribute is not interpreted as containing the XML content itself but as a file path or a URL to the XML document.
+|-
+|''outputType''||String : ATTACHMENT, ATTRIBUTE||runtime||Defines whether the XML snippet should be stored in an attachment or in an attribute of the newly created record.
+|-
+|''inputName''||String||runtime||The name of the input attachment or the path to the input attribute that contains a path to an external data source, e.g. an XML file name.
 |-
-|outputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML snippet should be stored in an attachment or attribute of the newly created record
+|''outputName''||String||runtime||The name of the output attachment or the path to the output attribute (store result as literals of attribute).
 |-
-|inputName||String||name of input attachment or path to input attribute.
+|''beginTagName''||String||runtime||The name of the tag to start the XML snippet with.
 |-
-|outputName||String|| name of output attachment or path to output attribute (store result as literals of attribute)
+|''beginTagNamespace''||String||runtime||The namespace of the start tag. If no namespace is given, namespaces are not checked (in that case any namespace matches).
 |-
-|beginTagName||String|| the name of the tag to start the xml snippet
+|''endTagName''||String||runtime||The name of the tag to end the XML snippet with. Defaults to the value of ''beginTagName''.
 |-
-|isBeginClosingTag||Boolean|| flag if the beginTagName is a closing tag (true) or not (false)
+|''endTagNamespace''||String||runtime||The namespace of the end tag, defaults to the value of ''beginTagNamespace''.
 |-
-|endTagName||String|| the name of the tag to end the xml snippet
+|''keyTagName''||String||runtime||The name of the tag used to create a record ID.
 |-
-isEndClosingTag||Boolean|| flag if the endTagName is a closing tag (true) or not (false)
+|''maxBufferSize''||Integer||runtime||The maximum size of the internal record buffer (optional, default is 20).
 |-
-|keyTagName||String|| the name of the tag used to create a record id
+|''idSeparator''||String||runtime||The separator used to create the record IDs of the split records (optional, default is "#").
 |-
-|maxBufferSize||Integer|| the maximum size of the internal record buffer (optional, default is 20)
+|''xmlSnippetJobName''||String||runtime||The JobManager job name to submit the split records to. It must be running when the pipelet is executed.
 |}
+The first four properties can be set only in the pipelet configuration. All other properties can be customized separately for each record by setting them as values of the <tt>_parameters</tt> map in the record.
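The override rule from the paragraph above can be sketched in a few lines of Python. This is an illustration of the stated rule only, not SMILA code: the function <tt>effective_parameters</tt> and the representation of configuration and record as plain dictionaries are assumptions made for this example.

```python
# Properties that, per the rule above, are taken from the pipelet
# configuration only and cannot be overridden by a record.
CONFIG_ONLY = ("inputType", "outputType", "inputName", "outputName")


def effective_parameters(config, record):
    """Merge the pipelet configuration with a record's _parameters map.

    Hypothetical helper: both arguments are plain dicts in this sketch.
    """
    params = dict(config)
    for key, value in record.get("_parameters", {}).items():
        if key not in CONFIG_ONLY:
            params[key] = value
    return params


config = {"inputType": "ATTRIBUTE", "beginTagName": "document", "idSeparator": "#"}
record = {"_parameters": {"beginTagName": "item", "inputType": "ATTACHMENT"}}

# beginTagName is overridden by the record, inputType is not.
print(effective_parameters(config, record))
```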
 ==== Example ====
-'''PipeletConfiguration for XmlSplitterPipelet'''
+'''Pipelet configuration for XmlSplitterPipelet'''
 <source lang="xml">
-<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
+<proc:configuration>
-     <Property name="inputType" type="java.lang.String">
+     <rec:Val key="inputType">ATTRIBUTE</rec:Val>
-        <Value>ATTRIBUTE</Value>
+     <rec:Val key="outputType">ATTRIBUTE</rec:Val>
-     </Property>
+     <rec:Val key="inputName">xmlIn</rec:Val>
-    <Property name="outputType" type="java.lang.String">
+     <rec:Val key="outputName">xmlOut</rec:Val>
-        <Value>ATTRIBUTE</Value>
+     <rec:Val key="beginTagName">document</rec:Val>
-     </Property>
+     <rec:Val key="keyTagName">docId</rec:Val>
-    <Property name="inputName" type="java.lang.String">
+     <rec:Val key="idSeparator">#</rec:Val>
-        <Value>Path</Value>
+     <rec:Val key="xmlSnippetJobName">indexUpdateXml</rec:Val>
-     </Property>
+</proc:configuration>
-    <Property name="outputName" type="java.lang.String">
-        <Value>Content</Value>
-     </Property>
-    <Property name="beginTagName" type="java.lang.String">
-        <Value>doc</Value>
-     </Property>
-    <Property name="isBeginClosingTag" type="java.lang.Boolean">
-        <Value>false</Value>
-     </Property>
-    <Property name="endTagName" type="java.lang.String">
-        <Value>doc</Value>
-     </Property>
-    <Property name="isEndClosingTag" type="java.lang.Boolean">
-        <Value>true</Value>
-    </Property>
-    <Property name="keyTagName" type="java.lang.String">
-        <Value>docId</Value>
-    </Property>
-</PipeletConfiguration>
 </source>
-The former configuration would split the following XML format
+The above configuration would split this XML format:
 <source lang="xml">
 <sampleCollection>
@@ Line 350: / Line 298: @@
 </source>
-into XML snippets like this one
+into XML snippets like this one:
 <source lang="xml">
 <document>
@@ Line 360: / Line 308: @@
 </source>
-And each for each snippet a record would be created:
+For each snippet, a record like the following would be created and submitted to a job run of the job "indexUpdateXml":
 <source lang="xml">
-<Record version="1.0">
+<Record xmlns="http://www.eclipse.org/smila/record" version="2.0">
-  <Id version="1.0" xmlns="http://www.eclipse.org/smila/id">
+  <Val key="_recordid">xmlsplitter:someBigXmlfile.xml#4711</Val>
-    < Source>xmlsplitter</ Source>
+  <Val key="_source">xmlsplitter</Val>
-    <Key name="Path">someeBigXmlfile.xml</Key>
+  <Val key="__isXmlSnippet">true</Val>
-    <id:Fragment>4711</id:Fragment>
+   <Val key="xmlOut">
-  </Id>
+         <document>
-   <A n="MessageProperties">
-      <V n="isXmlSnippet">true</V>
-   </A>
-  <A n="Content">
-    <L>
-      <V>
-        <document>
              <docId>4711</docId>
              <title>Some title</title>
@@ Line 381: / Line 322: @@
              <text>Some text</text>
          </document>
-      </V>
+   </Val>
-    </L>
-   </A>
 </Record>
 </source>
+To use the [https://dev.eclipse.org/svnroot/rt/org.eclipse.smila/trunk/core/SMILA.application/configuration/org.eclipse.smila.processing.bpel/pipelines/XmlSplitAndAddPipeline.bpel XmlSplitAndAddPipeline.bpel] via the predefined asynchronous workflow <tt>indexUpdateXml</tt> you should create a job definition that sets the "xmlSnippetJobName" to the job's own name:
-The Listener rules to split the XML files and to process the XML snippets could look like this:
+<source lang="javascript">
-<source lang="xml">
+{
- <Rule Name="Splitter Rule" WaitMessageTimeout="10" Threads="2" MaxMessageBlockSize="1">
+  "name": "indexUpdateXmlJob",
-     <Source BrokerId="broker1" Queue="SMILA.connectivity"/>
+  "workflow": "indexUpdateXml",
-     <Condition>Operation='ADD' and DataSourceID LIKE '%xmlsplitting%' and NOT(isXmlSnippet='true')</Condition>
+  "parameters": {
-    <Task>
+     "tempStore": "xmlbulks",
-      <Process Workflow="SplitterPipeline"/>
+     "xmlSnippetJobName": "indexUpdateXmlJob"
-    </Task>
+  }
-  </Rule>
+}
+</source>
- <Rule Name="Snippet Rule" WaitMessageTimeout="10" Threads="4" MaxMessageBlockSize="20">
-    <Source BrokerId="broker1" Queue="SMILA.connectivity"/>
-    <Condition>Operation='ADD' and DataSourceID LIKE '%xmlsplitting%' and isXmlSnippet='true'</Condition>
-    <Task>
-      <Process Workflow="Snippetipeline"/>
-    </Task>
-  </Rule>
-</source>
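Because the job name has to appear twice in the job definition above, it is easy to let the two values drift apart. A small Python sketch, assuming a hypothetical helper <tt>make_job_definition</tt> that is not part of SMILA, builds the same JSON from a single name:

```python
import json


def make_job_definition(job_name, workflow="indexUpdateXml", temp_store="xmlbulks"):
    """Build a job definition whose xmlSnippetJobName equals its own name.

    Hypothetical helper; the default values are copied from the example above.
    """
    return {
        "name": job_name,
        "workflow": workflow,
        "parameters": {
            "tempStore": temp_store,
            "xmlSnippetJobName": job_name,
        },
    }


print(json.dumps(make_job_definition("indexUpdateXmlJob"), indent=2))
```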
 [[Category:SMILA]]  [[Category:SMILA/Pipelet]]
