Difference between revisions of "SMILA/Documentation/Bundle org.eclipse.smila.processing.pipelets.xmlprocessing"


Revision as of 11:49, 14 July 2009

org.eclipse.smila.processing.pipelets.xmlprocessing.XslTransformationPipelet

Description

Pipelet that performs an XSL transformation on an attribute or attachment value and stores the transformed document in an attribute or attachment.

Configuration

{| border = 1
!Property!!Type!!Description
|-
|inputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML input is found in an attachment or attribute of the record
|-
|outputType||String : ATTACHMENT, ATTRIBUTE||selects if the transformed output should be stored in an attachment or attribute of the record
|-
|inputName||String||name of input attachment or path to input attribute (process literals of attribute)
|-
|outputName||String||name of output attachment or path to output attribute (store result as literals of attribute)
|-
|xslFile||String||the name (with relative or absolute path) of the XSL file to use for transformation
|}

Example

PipeletConfiguration for XslTransformationPipelet

<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
    <Property name="xslFile" type="java.lang.String">
        <Value>./configuration/data/author.xsl</Value>
    </Property>
    <Property name="inputType" type="java.lang.String">
        <Value>ATTRIBUTE</Value>
    </Property>	
    <Property name="outputType" type="java.lang.String">
        <Value>ATTRIBUTE</Value>
    </Property>	
    <Property name="inputName" type="java.lang.String">
        <Value>xmlIn</Value>
    </Property>	
    <Property name="outputName" type="java.lang.String">
        <Value>xmlOut</Value>
    </Property>	
</PipeletConfiguration>
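
All configurations on this page share the same Property/Value layout, so it may help to see how such a file can be read generically. The following is a minimal Python sketch, not SMILA's own configuration loader: the function name read_properties and the CONVERTERS table are hypothetical, and only the namespace and the type names are taken from the examples on this page.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the configuration examples on this page.
NS = {"p": "http://www.eclipse.org/smila/processor"}

# Hypothetical converters keyed by the Java type names used in the "type" attribute.
CONVERTERS = {
    "java.lang.String": str,
    "java.lang.Boolean": lambda text: text.strip().lower() == "true",
    "java.lang.Integer": int,
}


def read_properties(config_xml):
    """Return {property name: [values]}; repeated property names accumulate."""
    root = ET.fromstring(config_xml)
    properties = {}
    for prop in root.findall("p:Property", NS):
        convert = CONVERTERS.get(prop.get("type"), str)
        # An empty <Value></Value> has text None; treat it as an empty string.
        values = [convert(value.text or "") for value in prop.findall("p:Value", NS)]
        properties.setdefault(prop.get("name"), []).extend(values)
    return properties


CONFIG = """
<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
    <Property name="xslFile" type="java.lang.String">
        <Value>./configuration/data/author.xsl</Value>
    </Property>
    <Property name="inputType" type="java.lang.String">
        <Value>ATTRIBUTE</Value>
    </Property>
</PipeletConfiguration>
"""

if __name__ == "__main__":
    print(read_properties(CONFIG))
```

Values are collected in lists because a property name may occur more than once, as with the multivalue xpath property of the XPathFilterPipelet.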


org.eclipse.smila.processing.pipelets.xmlprocessing.XPathExtractorPipelet

Description

Pipelet that extracts elements selected by XPath, converts them into appropriate data types (Boolean, Double, String) and stores the transformed value in an attribute or attachment.

Configuration

{| border = 1
!Property!!Type!!Description
|-
|inputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML input is found in an attachment or attribute of the record
|-
|outputType||String : ATTACHMENT, ATTRIBUTE||selects if the transformed output should be stored in an attachment or attribute of the record
|-
|inputName||String||name of input attachment or path to input attribute (process literals of attribute)
|-
|outputName||String||name of output attachment or path to output attribute (store result as literals of attribute)
|-
|xpath||String||the XPath expression to evaluate
|-
|seperator||String||the separator (optional)
|-
|namespace||String||the XML namespace (optional)
|}

Example

PipeletConfiguration for XPathExtractorPipelet

<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
	<Property name="xpath" type="java.lang.String">
		<Value>author/email</Value>
	</Property>
	<Property name="seperator" type="java.lang.String">
		<Value></Value>
	</Property>
	<Property name="namespace" type="java.lang.String">
		<Value></Value>
	</Property>		
	<Property name="inputType" type="java.lang.String">
		<Value>ATTRIBUTE</Value>
	</Property>	
	<Property name="outputType" type="java.lang.String">
		<Value>ATTRIBUTE</Value>
	</Property>	
	<Property name="inputName" type="java.lang.String">
		<Value>xmlIn</Value>
	</Property>	
	<Property name="outputName" type="java.lang.String">
		<Value>xmlOut</Value>
	</Property>	
</PipeletConfiguration>
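
To make the extraction and type conversion described above more concrete, here is a small Python sketch of the general technique. It is not the pipelet's implementation: the functions extract and convert, the sample document, and the way a separator joins several matches are all assumptions made for illustration, and the standard library used here only supports a subset of XPath.

```python
import xml.etree.ElementTree as ET


def convert(text):
    """Map text to bool or float when it parses as one, else keep the string."""
    stripped = text.strip()
    if stripped.lower() in ("true", "false"):
        return stripped.lower() == "true"
    try:
        return float(stripped)
    except ValueError:
        return stripped


def extract(xml_text, xpath, separator=None):
    """Evaluate xpath relative to the root element and return the matched values."""
    root = ET.fromstring(xml_text)
    texts = [element.text or "" for element in root.findall(xpath)]
    if separator:
        # Assumption: a non-empty separator joins all matches into one string.
        return separator.join(texts)
    return [convert(text) for text in texts]


# Hypothetical input document; the element names are invented for this sketch.
SAMPLE = """
<authors>
    <author><email>jane@example.org</email><active>true</active><rating>4.5</rating></author>
    <author><email>john@example.org</email><active>false</active><rating>3</rating></author>
</authors>
"""

if __name__ == "__main__":
    print(extract(SAMPLE, "author/email"))
    print(extract(SAMPLE, "author/rating"))
```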

org.eclipse.smila.processing.pipelets.xmlprocessing.XPathFilterPipelet

Description

Pipelet that filters elements by XPath (either include or exclude mode) and stores the filtered elements as a new document in an attribute or attachment.

Configuration

{| border = 1
!Property!!Type!!Description
|-
|inputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML input is found in an attachment or attribute of the record
|-
|outputType||String : ATTACHMENT, ATTRIBUTE||selects if the transformed output should be stored in an attachment or attribute of the record
|-
|inputName||String||name of input attachment or path to input attribute (process literals of attribute)
|-
|outputName||String||name of output attachment or path to output attribute (store result as literals of attribute)
|-
|xpath||String||the XPath expressions to evaluate (multivalue)
|-
|filterMode||String : INCLUDE, EXCLUDE||the filter mode: whether to include or exclude the elements selected by xpath
|-
|seperator||String||the separator (optional)
|-
|namespace||String||the XML namespace (optional)
|}

Example

PipeletConfiguration for XPathFilterPipelet

<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
	<Property name="xpath" type="java.lang.String">
		<Value>author/name</Value>
	</Property>		
	<Property name="xpath" type="java.lang.String">
		<Value>author/email</Value>
	</Property>
	<Property name="filterMode" type="java.lang.String">
		<Value>EXCLUDE</Value>
	</Property>		
	<Property name="seperator" type="java.lang.String">
		<Value></Value>
	</Property>
	<Property name="namespace" type="java.lang.String">
		<Value></Value>
	</Property>		
	<Property name="inputType" type="java.lang.String">
		<Value>ATTRIBUTE</Value>
	</Property>	
	<Property name="outputType" type="java.lang.String">
		<Value>ATTRIBUTE</Value>
	</Property>	
	<Property name="inputName" type="java.lang.String">
		<Value>xmlIn</Value>
	</Property>	
	<Property name="outputName" type="java.lang.String">
		<Value>xmlOut</Value>
	</Property>	
</PipeletConfiguration>
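
The difference between the two filter modes can be sketched in a few lines of Python. This is an illustration of the idea only, not the pipelet's code: the function filter_xml is hypothetical, and wrapping the included elements in a copy of the original root element is an assumption about what the "new document" looks like.

```python
import copy
import xml.etree.ElementTree as ET


def filter_xml(xml_text, xpaths, mode):
    """Keep (INCLUDE) or drop (EXCLUDE) the elements matched by any of the xpaths."""
    root = ET.fromstring(xml_text)
    matched = [element for xpath in xpaths for element in root.findall(xpath)]
    if mode == "EXCLUDE":
        # ElementTree has no parent pointers, so build a child -> parent map first.
        parent_of = {child: parent for parent in root.iter() for child in parent}
        for element in matched:
            parent = parent_of.get(element)
            # The membership check skips elements matched by more than one xpath.
            if parent is not None and element in list(parent):
                parent.remove(element)
        result = root
    elif mode == "INCLUDE":
        # Assumption: the matches are collected under a fresh element named like the root.
        result = ET.Element(root.tag)
        result.extend([copy.deepcopy(element) for element in matched])
    else:
        raise ValueError("mode must be INCLUDE or EXCLUDE")
    return ET.tostring(result, encoding="unicode")


# Hypothetical input document for this sketch.
SAMPLE = (
    "<authors><author><name>Jane</name>"
    "<email>jane@example.org</email><bio>Writes</bio></author></authors>"
)

if __name__ == "__main__":
    print(filter_xml(SAMPLE, ["author/name", "author/email"], "EXCLUDE"))
    print(filter_xml(SAMPLE, ["author/name", "author/email"], "INCLUDE"))
```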

org.eclipse.smila.processing.pipelets.xmlprocessing.RemoveElementFromXMLPipelet

Description

Pipelet that removes a selected element from an XML document and stores the remaining document in an attribute or attachment.

Configuration

{| border = 1
!Property!!Type!!Description
|-
|inputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML input is found in an attachment or attribute of the record
|-
|outputType||String : ATTACHMENT, ATTRIBUTE||selects if the transformed output should be stored in an attachment or attribute of the record
|-
|inputName||String||name of input attachment or path to input attribute (process literals of attribute)
|-
|outputName||String||name of output attachment or path to output attribute (store result as literals of attribute)
|-
|elementId||String||the id of the XML element to remove
|}

Example

PipeletConfiguration for RemoveElementFromXMLPipelet

<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
	<Property name="elementId" type="java.lang.String">
		<Value>1</Value>
	</Property>
	<Property name="inputType" type="java.lang.String">
		<Value>ATTACHMENT</Value>
	</Property>	
	<Property name="outputType" type="java.lang.String">
		<Value>ATTACHMENT</Value>
	</Property>	
	<Property name="inputName" type="java.lang.String">
		<Value>xmlIn</Value>
	</Property>	
	<Property name="outputName" type="java.lang.String">
		<Value>xmlOut</Value>
	</Property>	
</PipeletConfiguration>
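
As a rough illustration of removing an element by its id, consider the following Python sketch. It assumes that elementId refers to the value of an id attribute, which this page does not state explicitly; the function remove_element_by_id and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_element_by_id(xml_text, element_id):
    """Remove every non-root element whose id attribute equals element_id."""
    root = ET.fromstring(xml_text)
    # Take a snapshot of all elements first so the tree can be modified safely.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get("id") == element_id:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical input document for this sketch.
SAMPLE = '<html><body><div id="1">ad</div><p id="2">text</p></body></html>'

if __name__ == "__main__":
    print(remove_element_by_id(SAMPLE, "1"))
```

The root element itself is never removed in this sketch, since the result must remain a document with a root.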

org.eclipse.smila.processing.pipelets.xmlprocessing.TidyPipelet

Description

Pipelet that performs a Tidy transformation on an attribute or attachment value and stores the result in an attribute or attachment.

Configuration

{| border = 1
!Property!!Type!!Description
|-
|inputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML input is found in an attachment or attribute of the record
|-
|outputType||String : ATTACHMENT, ATTRIBUTE||selects if the transformed output should be stored in an attachment or attribute of the record
|-
|inputName||String||name of input attachment or path to input attribute (process literals of attribute)
|-
|outputName||String||name of output attachment or path to output attribute (store result as literals of attribute)
|-
|tidyFile||String||the name (with relative or absolute path) of the tidy configuration file to use
|}

Example

PipeletConfiguration for TidyPipelet

<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
	<Property name="tidyFile" type="java.lang.String">
		<Value>./configuration/data/tidy_config.txt</Value>
	</Property>
	<Property name="inputType" type="java.lang.String">
		<Value>ATTRIBUTE</Value>
	</Property>	
	<Property name="outputType" type="java.lang.String">
		<Value>ATTRIBUTE</Value>
	</Property>	
	<Property name="inputName" type="java.lang.String">
		<Value>xmlIn</Value>
	</Property>	
	<Property name="outputName" type="java.lang.String">
		<Value>xmlOut</Value>
	</Property>	
</PipeletConfiguration>



org.eclipse.smila.processing.pipelets.xmlprocessing.XmlSplitterPipelet

Description

Pipelet that splits an XML stream into multiple XML snippets. For each snippet a new Record is created in which the XML snippet is stored in either an attribute or an attachment. The created records are not returned in the PipeletResult (which contains just the incoming RecordIds); instead they are sent directly to the ConnectivityManager and routed once more to the Queue.

On each created record the Annotation MessageProperties is set with the key-value pair isXmlSnippet=true. This can be used in Listener rules to select XML snippets for processing.

Configuration

{| border = 1
!Property!!Type!!Description
|-
|inputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML input is found in an attachment or attribute of the record. An input attribute is not interpreted as content but as a file path or a URL to the XML document.
|-
|outputType||String : ATTACHMENT, ATTRIBUTE||selects if the XML snippet should be stored in an attachment or attribute of the newly created record
|-
|inputName||String||name of input attachment or path to input attribute
|-
|outputName||String||name of output attachment or path to output attribute (store result as literals of attribute)
|-
|beginTagName||String||the name of the tag that starts the XML snippet
|-
|isBeginClosingTag||Boolean||flag if the beginTagName is a closing tag (true) or not (false)
|-
|endTagName||String||the name of the tag that ends the XML snippet
|-
|isEndClosingTag||Boolean||flag if the endTagName is a closing tag (true) or not (false)
|-
|keyTagName||String||the name of the tag used to create a record id
|-
|maxBufferSize||Integer||the maximum size of the internal record buffer (optional, default is 20)
|}

Example

PipeletConfiguration for XmlSplitterPipelet

<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
    <Property name="inputType" type="java.lang.String">
        <Value>ATTRIBUTE</Value>
    </Property>				       
    <Property name="outputType" type="java.lang.String">
        <Value>ATTRIBUTE</Value>
    </Property>
    <Property name="inputName" type="java.lang.String">
        <Value>Path</Value>
    </Property>
    <Property name="outputName" type="java.lang.String">
        <Value>Content</Value>
    </Property>
    <Property name="beginTagName" type="java.lang.String">
        <Value>document</Value>
    </Property>
    <Property name="isBeginClosingTag" type="java.lang.Boolean">
        <Value>false</Value>
    </Property>
    <Property name="endTagName" type="java.lang.String">
        <Value>document</Value>
    </Property>
    <Property name="isEndClosingTag" type="java.lang.Boolean">
        <Value>true</Value>
    </Property>
    <Property name="keyTagName" type="java.lang.String">
        <Value>docId</Value>
    </Property>
</PipeletConfiguration>


The configuration above would split the following XML format

<sampleCollection>
    ...
    <document>
        <docId>4711</docId>
        <title>Some title</title>
        ...
        <text>Some text</text>
    </document>
    <document>
        <docId>0815</docId>  
        ...
    </document>
    ...
</sampleCollection>

into XML snippets like this one

<document>
    <docId>4711</docId>
    <title>Some title</title>
    ...
    <text>Some text</text>
</document>

For each snippet, a record would be created:

<Record version="1.0">
  <Id version="1.0" xmlns="http://www.eclipse.org/smila/id">
    <Source>xmlsplitter</Source>
    <Key name="Path">someBigXmlfile.xml</Key>
    <Fragment>4711</Fragment>
  </Id>  
  <A n="MessageProperties">
      <V n="isXmlSnippet">true</V>
  </A>
  <A n="Content">
    <L>
      <V>
        <document>
            <docId>4711</docId>
            <title>Some title</title>
            ...
            <text>Some text</text>
        </document>   
      </V>
    </L>
  </A>   
</Record>
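
The splitting step can be pictured with a short streaming sketch in Python. It simplifies the pipelet considerably and is not its implementation: a single element name stands in for the begin/end tag properties, the function split_xml is hypothetical, and the batches are simply yielded to the caller where the pipelet would hand records to the ConnectivityManager. The max_buffer_size parameter only mirrors the idea of the maxBufferSize property.

```python
import io
import xml.etree.ElementTree as ET


def split_xml(stream, tag_name, key_tag_name, max_buffer_size=20):
    """Yield lists of (key, snippet) pairs with at most max_buffer_size items each."""
    buffer = []
    # iterparse reads the stream incrementally instead of loading it at once.
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag != tag_name:
            continue
        key = element.findtext(key_tag_name)
        element.tail = None  # drop whitespace that follows the closing tag
        buffer.append((key, ET.tostring(element, encoding="unicode")))
        element.clear()  # release the subtree once it has been serialized
        if len(buffer) >= max_buffer_size:
            yield buffer
            buffer = []
    if buffer:
        yield buffer


# Compact version of the sample collection shown above.
SAMPLE = (
    b"<sampleCollection>"
    b"<document><docId>4711</docId><title>Some title</title></document>"
    b"<document><docId>0815</docId></document>"
    b"</sampleCollection>"
)

if __name__ == "__main__":
    for batch in split_xml(io.BytesIO(SAMPLE), "document", "docId"):
        for key, snippet in batch:
            print(key, snippet)
```

Flushing the buffer in batches keeps memory use bounded even when the input file contains a very large number of snippets.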


The Listener rules to split the XML files and to process the XML snippets could look like this:

<Rule Name="Splitter Rule" WaitMessageTimeout="10" Threads="2" MaxMessageBlockSize="1">
    <Source BrokerId="broker1" Queue="SMILA.connectivity"/>
    <Condition>Operation='ADD' and DataSourceID LIKE '%xmlsplitting%' and NOT(isXmlSnippet='true')</Condition>
    <Task>
        <Process Workflow="SplitterPipeline"/>
    </Task>
</Rule>

<Rule Name="Snippet Rule" WaitMessageTimeout="10" Threads="4" MaxMessageBlockSize="20">
    <Source BrokerId="broker1" Queue="SMILA.connectivity"/>
    <Condition>Operation='ADD' and DataSourceID LIKE '%xmlsplitting%' and isXmlSnippet='true'</Condition>
    <Task>
        <Process Workflow="SnippetPipeline"/>
    </Task>
</Rule>
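
The two conditions are mutually exclusive, which appears to be what keeps an already split snippet from being routed to the splitter again. The decision they encode can be sketched in Python as follows. This is only a restatement of the conditions above, not how the Listener evaluates its selectors; the function select_rule and the dictionary representation of message properties are hypothetical.

```python
def select_rule(properties):
    """Return the name of the rule whose condition matches, or None."""
    if properties.get("Operation") != "ADD":
        return None
    # LIKE '%xmlsplitting%' corresponds to a substring match.
    if "xmlsplitting" not in properties.get("DataSourceID", ""):
        return None
    if properties.get("isXmlSnippet") == "true":
        return "Snippet Rule"
    # Messages without the flag are original XML files that still need splitting.
    return "Splitter Rule"


if __name__ == "__main__":
    original = {"Operation": "ADD", "DataSourceID": "my_xmlsplitting_source"}
    snippet = dict(original, isXmlSnippet="true")
    print(select_rule(original))
    print(select_rule(snippet))
```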
