
Difference between revisions of "SMILA/Documentation/Bundle org.eclipse.smila.processing.pipelets"

(Example)
(Description)
(44 intermediate revisions by 8 users not shown)
Line 1: Line 1:
 
This page describes the SMILA pipelets provided by bundle <tt>org.eclipse.smila.processing.pipelets</tt>.
 
This page describes the SMILA pipelets provided by bundle <tt>org.eclipse.smila.processing.pipelets</tt>.
 +
 +
== General ==
 +
 +
All pipelets in this bundle support the configurable error handling as described in [[SMILA/Development_Guidelines/How_to_write_a_Pipelet#Implementation]]. When used in jobmanager workflows, records causing errors are dropped.
 +
 +
''' Read Type '''
 +
* ''runtime'': Parameters are read when processing records. Parameter value can be set per Record.
 +
* ''init'': Parameters are read once from the Pipelet configuration when initializing the Pipelet. The parameter value cannot be overwritten in the Record.
  
 
== org.eclipse.smila.processing.pipelets.CommitRecordsPipelet ==
 
== org.eclipse.smila.processing.pipelets.CommitRecordsPipelet ==
Line 20: Line 28:
 
!Property
 
!Property
 
!Type
 
!Type
 +
!Read Type
 
!Description
 
!Description
 
|-
 
|-
 
|''outputAttribute''
 
|''outputAttribute''
|A string value
+
|string
 +
|runtime
 
|The name of the attribute to add values to
 
|The name of the attribute to add values to
 
|-
 
|-
 
|''valuesToAdd''
 
|''valuesToAdd''
 
|Anything, usually a value or a sequence of values
 
|Anything, usually a value or a sequence of values
 +
|runtime
 
|The values to add
 
|The values to add
 
|}
 
|}
Line 49: Line 60:
 
</source>
 
</source>
  
== RemoveAttributePipelet ==
+
== org.eclipse.smila.processing.pipelets.SetValuePipelet ==
  
Removes a single attribute completely from each record.
+
Sets a value for an attribute in every processed record. If the attribute already exists, it is not changed by default. This is useful for initializing required attributes.
  
 
=== Configuration ===
 
=== Configuration ===
Line 58: Line 69:
 
!Property
 
!Property
 
!Type
 
!Type
 +
!Read Type
 +
!Description
 +
|-
 +
|''outputAttribute''
 +
|string
 +
|runtime
 +
|The name of the attribute to set the value for
 +
|-
 +
|''value''
 +
|anything
 +
|runtime
 +
|The constant value to set for the attribute (a map or sequence is possible, too)
 +
|-
 +
|''overwrite''
 +
|boolean
 +
|runtime
 +
|If true, any value that the attribute already contains is overwritten (optional, defaults to false)
 +
|}
 +
 +
=== Example ===
 +
 +
This sets a map containing two values into attribute1, even if there is already a value in that attribute.
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="setMapForExistingAttribute">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.SetValuePipelet" />
 +
    <proc:variables input="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="outputAttribute">attribute1</rec:Val>
 +
      <rec:Val key="overwrite" type="boolean">true</rec:Val>
 +
      <rec:Map key="value">
 +
        <rec:Val key="key1">value1</rec:Val>
 +
        <rec:Val key="key2">value2</rec:Val>
 +
      </rec:Map>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
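
Because ''overwrite'' is optional, a plain default value can be configured without it. The following is a minimal sketch, assuming a hypothetical attribute named <tt>Language</tt> that should receive the value <tt>en</tt> only when a record does not already carry one. Both names are illustrative and not part of the bundle:

<source lang="xml">
<extensionActivity>
  <proc:invokePipelet name="setDefaultLanguage">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.SetValuePipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <!-- "Language" and "en" are illustrative names chosen for this sketch -->
      <rec:Val key="outputAttribute">Language</rec:Val>
      <!-- overwrite is omitted and therefore defaults to false: existing values are kept -->
      <rec:Val key="value">en</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
</source>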
 +
 +
== org.eclipse.smila.processing.pipelets.RemoveAttributePipelet ==
 +
 +
Removes an attribute from each record.
 +
 +
=== Configuration ===
 +
 +
The configuration property is either read from the <tt>_parameters</tt> attribute of a record or from the pipelet configuration. If not set at all, the record remains unchanged.
 +
 +
{| border="1"
 +
!Property
 +
!Type
 +
!Read Type
 
!Description
 
!Description
 
|-
 
|-
 
|''removeAttribute''
 
|''removeAttribute''
 
|A string value
 
|A string value
 +
|runtime
 
|The name of the attribute to remove
 
|The name of the attribute to remove
 
|}
 
|}
Line 77: Line 141:
 
     <rec:Val key="removeAttribute">_parameters</rec:Val>
 
     <rec:Val key="removeAttribute">_parameters</rec:Val>
 
     </proc:configuration>         
 
     </proc:configuration>         
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
 +
 +
== org.eclipse.smila.processing.pipelets.FilterPipelet ==
 +
 +
Copies only those record IDs to the result which match a configurable regular expression in a configurable single-valued attribute. This is useful for conditional processing while still pushing multiple records through the pipeline in a single request: instead of using BPEL conditions, use a FilterPipelet to select only the matching records into a new variable and use this variable as the input variable for the next pipelets. You can still use the original BPEL variable in the BPEL <tt><reply></tt> activity at the end of the pipeline to return all records as the final result.
 +
 +
=== Configuration ===
 +
The configuration properties are read either from the <tt>_parameters</tt> attribute of each record or from the pipelet configuration.
 +
 +
{| border="1"
 +
!Property
 +
!Type
 +
!Read Type
 +
!Description
 +
|-
 +
|''filterAttribute''
 +
|A string value
 +
|runtime
 +
|The name of the attribute to match
 +
|-
 +
|''filterExpression''
 +
|A string value
 +
|runtime
 +
|The regular expression to match the attribute value against
 +
|}
 +
 +
=== Example ===
 +
 +
To collect only those records that have a MimeType starting with <tt>text</tt> in the <tt>textRecords</tt> BPEL variable, a configuration like this could be used:
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="invokeFilterPipelet">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.FilterPipelet" />
 +
    <proc:variables input="request" output="textRecords" />
 +
    <proc:configuration>
 +
      <rec:Val key="filterAttribute">MimeType</rec:Val>
 +
      <rec:Val key="filterExpression">text/.+</rec:Val>
 +
    </proc:configuration>
 
   </proc:invokePipelet>
 
   </proc:invokePipelet>
 
</extensionActivity>
 
</extensionActivity>
Line 85: Line 190:
 
=== Description ===
 
=== Description ===
  
Extract plain text and metadata from an HTML document in an attribute or attachment of each record and writes it to configurable attributes or attachments.
+
Extracts plain text and metadata from an HTML document stored in an attribute or attachment of each record and writes the results to configurable attributes or attachments.
  
 
The pipelet uses the CyberNeko HTML parser [http://nekohtml.sourceforge.net/ NekoHTML] to parse HTML documents.
 
The pipelet uses the CyberNeko HTML parser [http://nekohtml.sourceforge.net/ NekoHTML] to parse HTML documents.
Line 94: Line 199:
 
!Property
 
!Property
 
!Type
 
!Type
 +
!Read Type
 
!Description
 
!Description
 
|-
 
|-
 
|''inputType''
 
|''inputType''
 
|String : ''ATTACHMENT, ATTRIBUTE''
 
|String : ''ATTACHMENT, ATTRIBUTE''
 +
|runtime
 
|Defines whether the HTML input is found in an attachment or in an attribute of the record
 
|Defines whether the HTML input is found in an attachment or in an attribute of the record
 
|-
 
|-
 
|''outputType''
 
|''outputType''
 
|String : ''ATTACHMENT, ATTRIBUTE''
 
|String : ''ATTACHMENT, ATTRIBUTE''
 +
|runtime
 
|Defines whether the plain text should be stored in an attachment or in an attribute of the record
 
|Defines whether the plain text should be stored in an attachment or in an attribute of the record
 
|-
 
|-
 
|''inputName''
 
|''inputName''
 
|String
 
|String
 +
|runtime
 
|Name of input attachment or path to input attribute (process literals of attribute)
 
|Name of input attachment or path to input attribute (process literals of attribute)
 
|-
 
|-
 
|''outputName''
 
|''outputName''
 
|String
 
|String
 +
|runtime
 
|Name of output attachment or path to output attribute for plain text (store result as literals of attribute)
 
|Name of output attachment or path to output attribute for plain text (store result as literals of attribute)
 
|-
 
|-
 
|''defaultEncoding''
 
|''defaultEncoding''
 
|String
 
|String
 +
|runtime
 
|Optional, default encoding to apply to documents when not specified in the documents themselves
 
|Optional, default encoding to apply to documents when not specified in the documents themselves
 
|-
 
|-
 
|''removeContentTags''
 
|''removeContentTags''
 
|String
 
|String
 +
|runtime
 
|Comma-separated list of HTML tags (case-insensitive) for which the complete content should be removed from the resulting plain text. If not set, it defaults to ''"applet,frame,object,script,style"''. If the value is set, you must add the default tags explicitly to have their contents removed, too.
 
|Comma-separated list of HTML tags (case-insensitive) for which the complete content should be removed from the resulting plain text. If not set, it defaults to ''"applet,frame,object,script,style"''. If the value is set, you must add the default tags explicitly to have their contents removed, too.
 
|-
 
|-
 
|''meta:<name>''
 
|''meta:<name>''
 
|String: attribute path
 
|String: attribute path
 +
|init
 
|Store the content of the <tt><META></tt> tag with ''name="<name>"'' (case insensitive) to the attribute named as the value of the property. E.g. a property named ''"meta:author"'' with value "authors" causes the content attributes of <tt><META name="author" content="..."></tt> tags to be stored in the attribute ''authors'' of the respective record.
 
|Store the content of the <tt><META></tt> tag with ''name="<name>"'' (case insensitive) to the attribute named as the value of the property. E.g. a property named ''"meta:author"'' with value "authors" causes the content attributes of <tt><META name="author" content="..."></tt> tags to be stored in the attribute ''authors'' of the respective record.
 
|-
 
|-
 
|''tag:title''
 
|''tag:title''
 
|String: attribute path
 
|String: attribute path
 +
|init
 
|Store the content of the <tt><TITLE></tt> tag to the attribute named as the value of the property.
 
|Store the content of the <tt><TITLE></tt> tag to the attribute named as the value of the property.
 
|}
 
|}
  
==== Example ====
+
=== Example ===
  
This configuration extracts plain text from the HTML document in attachment ''"html"'' and stores it in the attribute ''"text"''. It removes the complete content of heading tags <tt><nowiki><h1>, ..., <h4></nowiki></tt>. Additionally it looks for <tt><meta></tt> tags with names ''"author"'' and ''"keywords"'' and stores their contents in attributes ''"authors"'' and ''"keywords"'', respectively:
+
This configuration extracts plain text from the HTML document in attachment ''"html"'' and stores the results to the attribute ''"text"''. It removes the complete content of heading tags <tt><nowiki><h1>, ..., <h4></nowiki></tt>. In addition to that, it looks for <tt><meta></tt> tags with names ''"author"'' and ''"keywords"'' and stores their contents in attributes ''"authors"'' and ''"keywords"'', respectively:
  
 
<source lang="xml">
 
<source lang="xml">
Line 157: Line 271:
 
=== Description ===
 
=== Description ===
  
This pipelet can be used to copy a String value between attributes and/or attachments. It suppoprts two execution modes:
+
This pipelet can be used to copy or move attribute values to other attributes, or to copy or move a string value between attributes and/or attachments. It supports two execution modes:
* COPY: copy the value from the input attribute/attachment to thee output attribute/attachment  
+
* COPY: copy the value from the input attribute/attachment to the output attribute/attachment  
 
* MOVE: same as COPY, but after that delete the value from the input attribute/attachment
 
* MOVE: same as COPY, but after that delete the value from the input attribute/attachment
 +
When an attribute is copied to another attribute, the type remains the same. When copying an attachment to an attribute, a string value is created by assuming that the attachment is text in UTF-8 encoding. When copying an attribute value to an attachment, the attribute must be a single value, which is interpreted as a string value and converted to a byte array using UTF-8 encoding.
  
 
=== Configuration ===
 
=== Configuration ===
Line 166: Line 281:
 
!Property
 
!Property
 
!Type
 
!Type
 +
!Read Type
 
!Description
 
!Description
 
|-
 
|-
 
|''inputType''
 
|''inputType''
 
|String : ''ATTACHMENT, ATTRIBUTE''
 
|String : ''ATTACHMENT, ATTRIBUTE''
 +
|runtime
 
|selects if the input is found in an attachment or attribute of the record
 
|selects if the input is found in an attachment or attribute of the record
 
|-
 
|-
 
|''outputType''
 
|''outputType''
 
|String : ''ATTACHMENT, ATTRIBUTE''
 
|String : ''ATTACHMENT, ATTRIBUTE''
 +
|runtime
 
|selects if output should be stored in an attachment or attribute of the record
 
|selects if output should be stored in an attachment or attribute of the record
 
|-
 
|-
 
|''inputName''
 
|''inputName''
 
|String
 
|String
|name of input attachment or path to input attribute (process a String literal of attribute)
+
|runtime
 +
|name of input attachment or input attribute
 
|-
 
|-
 
|''outputName''
 
|''outputName''
 
|String
 
|String
| name of output attachment or path to output attribute for plain text (store result as String literal of attribute)
+
|runtime
 +
| name of output attachment or output attribute
 
|-
 
|-
 
|''mode''
 
|''mode''
 
|String : ''COPY, MOVE''
 
|String : ''COPY, MOVE''
 +
|runtime
 
| execution mode. Copy the value or move (copy and delete) the value. Default is COPY.
 
| execution mode. Copy the value or move (copy and delete) the value. Default is COPY.
 
|-
 
|-
 
|}
 
|}
  
==== Example ====
+
=== Example ===
  
 
This configuration shows how to copy the value of attachment 'Content' into the attribute 'TextContent':
 
This configuration shows how to copy the value of attachment 'Content' into the attribute 'TextContent':
Line 215: Line 336:
 
=== Description ===
 
=== Description ===
  
Extracts Literal values from an attribute that has a nested maps. The attributes in the nested map can have nested maps themselves. To address a attribute in the nested structure a path needs to be specified. The pipelet supports different execution modes:  
+
Extracts literal values from an attribute that has a nested map. The attributes in the nested map can have nested maps themselves. To address an attribute in the nested structure, a path needs to be specified. The pipelet supports different execution modes: 
 
*FIRST: selects only the first literal of the specified attribute
 
*FIRST: selects only the first literal of the specified attribute
 
*LAST: selects only the last literal of the specified attribute
 
*LAST: selects only the last literal of the specified attribute
Line 231: Line 352:
 
!Property
 
!Property
 
!Type
 
!Type
 +
!Read Type
 
!Description
 
!Description
 
|-
 
|-
 
|''inputPath''
 
|''inputPath''
 
|String
 
|String
 +
|runtime
 
|the path to the input attribute with Literals
 
|the path to the input attribute with Literals
 
|-
 
|-
 
|''outputPath''
 
|''outputPath''
 
|String
 
|String
 +
|runtime
 
|the name of the attribute to store the extracted value(s) as Literals in (not a path, only a top-level attribute, currently)
 
|the name of the attribute to store the extracted value(s) as Literals in (not a path, only a top-level attribute, currently)
 
|-
 
|-
 
|''mode''
 
|''mode''
 
|String : ''FIRST, LAST, ALL_AS_LIST, ALL_AS_ONE''
 
|String : ''FIRST, LAST, ALL_AS_LIST, ALL_AS_ONE''
 +
|runtime
 
| execution mode. See above for details.
 
| execution mode. See above for details.
 
|-
 
|-
 
|''separator''
 
|''separator''
 
|String
 
|String
 +
|runtime
 
| the separation string used for mode ALL_AS_ONE. Default is a blank
 
| the separation string used for mode ALL_AS_ONE. Default is a blank
 
|-
 
|-
 
|}
 
|}
  
==== Example ====
+
=== Example ===
  
 
This configuration can be applied to records provided by the FeedAgent. It shows how to access the subattribute 'Value' of attribute 'Contents', concatenating all values to one:
 
This configuration can be applied to records provided by the FeedAgent. It shows how to access the subattribute 'Value' of attribute 'Contents', concatenating all values to one:
Line 270: Line 396:
 
</source>
 
</source>
  
== Bundle: org.eclipse.smila.processing.pipelets.MimeTypeIdentifyPipelet ==
+
== org.eclipse.smila.processing.pipelets.ReplacePipelet ==
  
 
=== Description ===
 
=== Description ===
This pipelet is used to identify the MIME type of a document.
 
It uses an <tt>org.eclipse.smila.processing.pipelets.mimetype.MimeTypeIdentifier</tt> service to perform the actual identification of the MIME type. Depending on the specified properties, the MIME type is detected from the file content, from the file extension, or from both. If the identification does not return a MIME type - and if configured accordingly - the service will search the metadata for this information. The identified MIME type is then stored to an attribute in the record.
 
  
 +
Searches for one or more patterns in the literal value of an attribute and substitutes the found occurrences by the configured replacements.
 +
 +
You can choose from different matching types:
 +
 +
* ''entity'': Every pattern is matched against the whole attribute value (with respect to the ''ignoreCase'' property) and the first matching pattern defines the new value of the attribute. If no pattern matches, the result is the current value of the attribute.
 +
* ''substring'': All patterns that are part of the attribute value are replaced.
 +
* ''regexp'': All patterns are interpreted as [http://en.wikipedia.org/wiki/Regular_expression regular expressions], see [http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#replaceAll(java.lang.String) Matcher#replaceAll(String)]
 +
 +
This pipelet works only on attributes, not on attachments!
 +
 +
=== Configuration ===
 +
 +
{| border="1"
 +
!Property
 +
!Type
 +
!Read Type
 +
!Description
 +
|-
 +
|''inputAttribute''
 +
|String
 +
|runtime
 +
|the name of the attribute that contains the literal to search in
 +
|-
 +
|''outputAttribute''
 +
|String
 +
|runtime
 +
|the name of the attribute to store the result value as string, defaults to the input attribute
 +
|-
 +
|''type''
 +
|String : ''entity'', ''substring'', ''regexp''
 +
|init
 +
|Identifies the type of the pattern, see above for details. Defaults to ''substring''.
 +
|-
 +
|''ignoreCase''
 +
|Boolean
 +
|init
 +
|indicates that the case is ignored when matching patterns, defaults to ''false''.
 +
|-
 +
|''mapping''
 +
|Map
 +
|init
 +
|A mapping of multiple patterns and replacements. Each key is a pattern and its value the replacement.
 +
|-
 +
|''pattern''
 +
|String
 +
|init
 +
|the pattern to apply to the literal value (see above for a description of possible types), required if no mapping is given
 +
|-
 +
|''replacement''
 +
|String
 +
|init
 +
|the substitution string used to replace all occurrences of the pattern, defaults to the empty string
 +
|-
 +
|}
 +
 +
=== Examples ===
 +
 +
This configuration can be used to map language ids to their label:
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="set language label">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ReplacePipelet" />
 +
    <proc:variables input="request" output="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="inputAttribute">Language</rec:Val>
 +
      <rec:Val key="outputAttribute">LanguageLabel</rec:Val>
 +
      <rec:Val key="type">entity</rec:Val>
 +
      <rec:Val key="ignoreCase" type="boolean">true</rec:Val>
 +
      <rec:Map key="mapping">
 +
        <rec:Val key="de">German</rec:Val>
 +
        <rec:Val key="en">English</rec:Val>
 +
        <rec:Val key="es">Spanish</rec:Val>
 +
        <rec:Val key="fr">French</rec:Val>
 +
        ...
 +
      </rec:Map>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
 +
 +
This configuration can be used to cut the time information from a timestamp:
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="cut time">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ReplacePipelet" />
 +
    <proc:variables input="request" output="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="inputAttribute">ModificationTime</rec:Val>
 +
      <rec:Val key="outputAttribute">ModificationDate</rec:Val>
 +
      <rec:Val key="type">regexp</rec:Val>
 +
      <rec:Val key="pattern">[T ].*</rec:Val>
 +
      <rec:Val key="replacement"></rec:Val>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
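
The ''substring'' type can be combined with a ''mapping'' in the same way. The following is a minimal sketch, assuming a hypothetical attribute named <tt>Title</tt> in which German umlauts should be transliterated. The attribute name and the mapping entries are illustrative only:

<source lang="xml">
<extensionActivity>
  <proc:invokePipelet name="transliterate umlauts">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ReplacePipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <!-- "Title" is an illustrative attribute name; outputAttribute is omitted,
           so the result is written back to the input attribute -->
      <rec:Val key="inputAttribute">Title</rec:Val>
      <!-- substring is the documented default type and is only spelled out for clarity -->
      <rec:Val key="type">substring</rec:Val>
      <rec:Map key="mapping">
        <rec:Val key="ä">ae</rec:Val>
        <rec:Val key="ö">oe</rec:Val>
        <rec:Val key="ü">ue</rec:Val>
        <rec:Val key="ß">ss</rec:Val>
      </rec:Map>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
</source>

Since ''ignoreCase'' defaults to ''false'', upper-case umlauts would need their own mapping entries in this sketch.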
 +
 +
== org.eclipse.smila.processing.pipelets.ScriptPipelet ==
 +
 +
=== Description ===
 +
 +
Executes a script for each record.
 +
 +
Execution is handled by the [http://en.wikipedia.org/wiki/Scripting_for_the_Java_Platform Java Scripting API (JSR 223)], so any compatible scripting engine can be used. JavaScript is available "out of the box" and is the default script language.
 +
 +
The context of the script will contain five variables:
 +
* ''blackboard'': a reference to the [http://build.eclipse.org/rt/smila/javadoc/current/org/eclipse/smila/blackboard/Blackboard.html blackboard]
 +
* ''id'': the ID of the current record
 +
* ''record'': the [http://build.eclipse.org/rt/smila/javadoc/current/org/eclipse/smila/datamodel/AnyMap.html metadata] of the current record
 +
* ''results'': a slightly modified version of a [http://build.eclipse.org/rt/smila/javadoc/current/org/eclipse/smila/processing/util/ResultCollector.html result collector] that provides methods to add a new record id to the list of result ids (''results.addResult('...id...')'') and to drop the current record from the same list (''results.excludeCurrentRecord()'')
 +
* ''parameterAccessor'': the [http://build.eclipse.org/rt/smila/javadoc/current/org/eclipse/smila/processing/parameters/ParameterAccessor.html ParameterAccessor] instance for access to the configuration (e.g. ''parameterAccessor.getParameterAny("configMap").asMap().getLongValue("longValue")'').
 +
 +
Please be aware that the intention of this pipelet is to write pipelines fast, but not to write fast pipelines - the script is parsed for every record. Don't use it for production environments where performance matters, but use it to develop an algorithm that you can put into [[SMILA/Development_Guidelines/How_to_write_a_Pipelet|your own pipelet]].
 +
 +
=== Configuration ===
 +
 +
{| border="1"
 +
!Property
 +
!Type
 +
!Read Type
 +
!Description
 +
|-
 +
|''type''
 +
|String
 +
|init
 +
|the mime type of the scripting language, defaults to "text/javascript"
 +
|-
 +
|''scriptFile''
 +
|String
 +
|runtime
 +
|the path of the file that contains the script - modifications of this file are observed on every execution of the pipelet
 +
|-
 +
|''script''
 +
|String
 +
|init
 +
|The "inline" script, required unless ''scriptFile'' is specified (ignored in that case)
 +
|-
 +
|''resultAttribute''
 +
|String
 +
|runtime
 +
|The name of an attribute that will receive the result of the script (usually the result of the last expression)
 +
|-
 +
|}
 +
 +
=== Examples ===
 +
 +
This configuration can be used to concatenate the values of two attributes and save the result into a third one:
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="create full name">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ScriptPipelet" />
 +
    <proc:variables input="request" output="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="script">record.get("firstName") + " " + record.get("lastName")</rec:Val>
 +
      <rec:Val key="resultAttribute">fullName</rec:Val>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
 +
 +
This configuration can be used to execute a JavaScript file from $SMILA_PATH$/configuration/example.js:
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="execute script">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ScriptPipelet" />
 +
    <proc:variables input="request" output="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="scriptFile">configuration/example.js</rec:Val>   
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
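
The ''results'' variable described above can also be used to filter records from within a script. The following is a minimal sketch, assuming a hypothetical attribute named <tt>Title</tt> and assuming that the record metadata can be queried like a <tt>java.util.Map</tt> from the default JavaScript engine. Only ''results.excludeCurrentRecord()'' is taken from the description above:

<source lang="xml">
<extensionActivity>
  <proc:invokePipelet name="drop records without title">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ScriptPipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <!-- "Title" is an illustrative attribute name; containsKey is assumed to be
           available because the record metadata is exposed as a map -->
      <rec:Val key="script">if (!record.containsKey("Title")) { results.excludeCurrentRecord(); }</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
</source>

No ''resultAttribute'' is configured here because the script is only used for its side effect on the result list.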
 +
 +
== org.eclipse.smila.processing.pipelets.ExecPipelet ==
 +
 +
=== Description ===
 +
 +
Executes an external program for each record.
 +
 +
This pipelet may be used to integrate native programs into the pipeline.
 +
 +
'''Attention''': This pipelet may lead to security issues! Although the executed command cannot be changed at runtime (this parameter is only evaluated at initialization time), the arguments and input of the command can be changed using values in the processed record. Every "pipeline developer" should ensure that only arguments in the expected value range are processed (especially if the program accepts files from the file system as arguments).
 +
 +
=== Configuration ===
 +
 +
{| border="1"
 +
!Property
 +
!Type
 +
!Read Type
 +
!Description
 +
|-
 +
|''command''
 +
|String
 +
|init
 +
|The program to execute (including its path in the file system).
 +
|-
 +
|''directory''
 +
|String
 +
|runtime
 +
|The (optional) working directory for the command. The SMILA directory is used if not given.
 +
|-
 +
|''parameters''
 +
|Sequence of strings
 +
|runtime
 +
|The optional parameters passed to the program (ignored if the attribute named by ''parametersAttribute'' exists in the record).
 +
|-
 +
|''parametersAttribute''
 +
|String
 +
|runtime
 +
|The optional name of the attribute that contains the sequence of parameters given to the program.
 +
|-
 +
|''inputAttachment''
 +
|String
 +
|runtime
 +
|The optional name of the attachment that contains the bytes to send as input for the program.
 +
|-
 +
|''outputAttachment''
 +
|String
 +
|runtime
 +
|The optional name of the attachment that is filled with the standard output of the program.
 +
|-
 +
|''exitCodeAttribute''
 +
|String
 +
|runtime
 +
|The name of the attribute that is filled with the exit code of the program.
 +
|-
 +
|''errorAttachment''
 +
|String
 +
|runtime
 +
|The optional name of the attachment that is filled with the error output of the program.
 +
|-
 +
|''failOnError''
 +
|Either a boolean or a sequence of strings
 +
|runtime
 +
|Indicates whether a record is marked as failed if the program returns an error code. The value is either a sequence of exit code ranges or a boolean, where "true" means that every exit code except 0 is treated as an error. Defaults to false.
 +
|-
 +
|}
 +
 +
=== Examples ===
 +
 +
This configuration can be used to execute FFMPEG for transformation of an MP3 input file into a WAV output file:
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="ConvertMP3">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ExecPipelet" />
 +
    <proc:variables input="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="command">.../ffmpeg</rec:Val>
 +
      <rec:Seq key="parameters">
 +
          <rec:Val>-i</rec:Val>
 +
          <rec:Val>.../example.mp3</rec:Val>
 +
          <rec:Val>-ar</rec:Val>
 +
          <rec:Val>16000</rec:Val>
 +
          <rec:Val>.../example.wav</rec:Val>
 +
      </rec:Seq>
 +
      <rec:Val key="failOnError" type="boolean">true</rec:Val>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
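
The attachment-related properties allow a program to be used as a filter on record content. The following is a minimal sketch, assuming a hypothetical program at <tt>/usr/bin/tr</tt> and hypothetical attachment and attribute names. It pipes the bytes of one attachment through the program and stores the standard output, error output, and exit code in the record:

<source lang="xml">
<extensionActivity>
  <proc:invokePipelet name="UppercaseContent">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ExecPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <!-- the command path and all attachment/attribute names below are illustrative -->
      <rec:Val key="command">/usr/bin/tr</rec:Val>
      <rec:Seq key="parameters">
          <rec:Val>a-z</rec:Val>
          <rec:Val>A-Z</rec:Val>
      </rec:Seq>
      <!-- bytes of this attachment are sent as input to the program -->
      <rec:Val key="inputAttachment">Content</rec:Val>
      <!-- standard output and error output are stored in separate attachments -->
      <rec:Val key="outputAttachment">UpperContent</rec:Val>
      <rec:Val key="errorAttachment">UpperContentErrors</rec:Val>
      <!-- the exit code is written to this attribute -->
      <rec:Val key="exitCodeAttribute">UpperContentExitCode</rec:Val>
      <rec:Val key="failOnError" type="boolean">true</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
</source>

The command and its parameters are fixed in the configuration here, so record values only influence the input data, which keeps the security concerns noted above manageable.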
 +
 +
== org.eclipse.smila.processing.pipelets.MimeTypeIdentifyPipelet ==
 +
 +
=== Description ===
 +
This pipelet is used to identify the MIME type of a document.
 +
It uses an <tt>[[SMILA/Documentation/MimeTypeIdentifier| org.eclipse.smila.processing.pipelets.mimetype.MimeTypeIdentifier]]</tt> service to perform the actual identification of the MIME type. Depending on the specified properties, the MIME type is detected from the file content, from the file extension, or from both. If the identification does not return a MIME type - and if configured accordingly - the service will search the metadata for this information. The identified MIME type is then stored to an attribute in the record.
  
 
=== Configuration ===
 
=== Configuration ===
Line 282: Line 677:
  
 
{| border = 1
 
{| border = 1
!Property!!Type!!Usage!!Description
+
!Property!!Type!!Read Type!!Usage!!Description
 
|-
 
|-
|''FileExtensionAttribute''||String||Optional||Name of the attribute containing the file extension
+
|''FileExtensionAttribute''||String||init||Optional||Name of the attribute containing the file extension
 
|-
 
|-
|''ContentAttachment''||String||Optional||Name of the attachment containing the file content
+
|''ContentAttachment''||String||init||Optional||Name of the attachment containing the file content
 
|-
 
|-
|''MetaDataAttribute''||String||Optional||Name of the attribute containing metadata information, e.g. a Web Crawler returns a response header containing applicable MIME type information
+
|''MetaDataAttribute''||String||init||Optional||Name of the attribute containing metadata information, e.g. a Web Crawler returns a response header containing applicable MIME type information
 
|-
 
|-
|''MimeTypeAttribute''||String||Required||Name of the attribute to store the identified MIME type to
+
|''MimeTypeAttribute''||String||init||Required||Name of the attribute to store the identified MIME type to
 
|}
 
|}
 
Note that at least one of the properties ''FileExtensionAttribute'', ''ContentAttachment'', and ''MetaDataAttribute'' must be specified!
 
Note that at least one of the properties ''FileExtensionAttribute'', ''ContentAttachment'', and ''MetaDataAttribute'' must be specified!
  
==== Example ====
+
=== Example ===
  
 
The following example is used in the SMILA example application to identify the MIME types of documents that are delivered by the File System Crawler or Web Crawler.
 
The following example is used in the SMILA example application to identify the MIME types of documents that are delivered by the File System Crawler or Web Crawler.
Line 312: Line 707:
 
</extensionActivity>
 
</extensionActivity>
 
</source>
 
</source>
 +
 +
 +
== org.eclipse.smila.processing.pipelets.LanguageIdentifyPipelet ==
 +
 +
=== Description ===
 +
This pipelet identifies the language of textual input and stores the returned ISO 639 language code to some target attribute. It uses an <tt>org.eclipse.smila.common.language.LanguageIdentifier</tt> service to perform the actual identification. If the identification does not return a language, the specified <tt>DefaultLanguage</tt> (or <tt>DefaultAlternativeName</tt>) is returned. If no defaults are specified, no value is set.
 +
 +
The pipelet returns the detected language as an ISO 639 code. Where you need special language tags in your application, the pipelet is able to produce
 +
an alternative language code according to a configurable mapping. To define such a mapping, create the file <tt>SMILA/configuration/org.eclipse.smila.tika/languageMapping.properties</tt>. The following shows an exemplary mapping:
 +
 +
<source lang="text">
 +
de=german
 +
en=english
 +
es=spanish
 +
fi=finnish
 +
fr=french
 +
</source>
 +
 +
The pipelet uses [http://tika.apache.org/ Apache Tika] technology for the actual language detection.
 +
 +
=== Configuration ===
 +
 +
The pipelet is configured using the <tt><configuration></tt> section inside the <tt><invokePipelet></tt> activity of the corresponding BPEL file. It provides the following properties:
 +
 +
{| border = 1
 +
!Property!!Type!!Read Type!!Usage!!Description
 +
|-
 +
|''ContentAttribute''||String||runtime||Required||Name of the attribute containing the text whose language should be identified
 +
|-
 +
|''LanguageAttribute''||String||runtime||Optional||Name of the attribute to store the code of the identified language to
 +
|-
 +
|''DefaultLanguage''||String||runtime||Optional||Language code to set if no language could be detected. If not set and no language could be identified, the <tt>LanguageAttribute</tt> attribute remains empty.
 +
|-
 +
|''AlternativeNameAttribute''||String||runtime||Optional||Name of the attribute to store the alternative language code of the identified language to. The mapping defining this alternative code must be located in <tt>SMILA/configuration/org.eclipse.smila.tika/languageMapping.properties</tt> (see above).
 +
|-
 +
|''DefaultAlternativeName''||String||runtime||Optional||Alternative language code to set if no language could be detected. If not set and no language could be identified, the <tt>AlternativeNameAttribute</tt> attribute remains empty.
 +
|-
 +
|''UseCertainLanguagesOnly''||Boolean||runtime||Optional||Boolean flag indicating whether to apply only those languages that were identified with a reasonable certainty (true) or all (false). Default is false.
 +
|}
 +
 +
 +
=== Example ===
 +
 +
The following example could be used to identify the language of documents that are delivered by the File System Crawler or Web Crawler.
 +
 +
'''addpipeline.bpel'''
 +
<source lang="xml">
 +
<extensionActivity>
 +
    <proc:invokePipelet name="detect Language">
 +
        <proc:pipelet class="org.eclipse.smila.processing.pipelets.LanguageIdentifyPipelet" />
 +
        <proc:variables input="request" output="request" />
 +
        <proc:configuration>
 +
          <rec:Val key="ContentAttribute">Content</rec:Val>
 +
          <rec:Val key="LanguageAttribute">Language</rec:Val>
 +
          <rec:Val key="DefaultLanguage">de</rec:Val>
 +
          <rec:Val key="AlternativeNameAttribute">AltLanguage</rec:Val>
 +
          <rec:Val key="DefaultAlternativeName">german</rec:Val>
 +
          <rec:Val key="UseCertainLanguagesOnly">false</rec:Val>
 +
        </proc:configuration>
 +
    </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
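
As a variant of the example above, the following sketch accepts only languages that were identified with reasonable certainty and omits both defaults, so the target attribute stays unset when no language is detected. It uses only the properties listed in the configuration table; the attribute names <tt>Content</tt> and <tt>Language</tt> are carried over from the previous example and may differ in your setup.

<source lang="xml">
<!-- sketch: no DefaultLanguage is given, so "Language" remains empty if nothing is detected -->
<extensionActivity>
    <proc:invokePipelet name="detect certain Language">
        <proc:pipelet class="org.eclipse.smila.processing.pipelets.LanguageIdentifyPipelet" />
        <proc:variables input="request" output="request" />
        <proc:configuration>
          <rec:Val key="ContentAttribute">Content</rec:Val>
          <rec:Val key="LanguageAttribute">Language</rec:Val>
          <rec:Val key="UseCertainLanguagesOnly">true</rec:Val>
        </proc:configuration>
    </proc:invokePipelet>
</extensionActivity>
</source>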
 +
 +
== org.eclipse.smila.processing.pipelets.FileReaderPipelet ==
 +
 +
=== Description ===
 +
 +
This pipelet can be used to read content from a file and add it as an attachment.
 +
 +
=== Configuration ===
 +
 +
{| border="1"
 +
!Property
 +
!Type
 +
!Read Type
 +
!Description
 +
|-
 +
|''pathAttribute''
 +
|String
 +
|runtime
 +
|The name of the attribute with the path of the file to read from
 +
|-
 +
|''contentAttachment''
 +
|String
 +
|runtime
 +
|The name of the attachment to store the content
 +
|-
 +
|}
 +
 +
=== Example ===
 +
 +
<source lang="xml">
 +
<!-- read from file and add attachment -->
 +
<extensionActivity>
 +
  <proc:invokePipelet name="invokeReadFile">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.FileReaderPipelet" />
 +
    <proc:variables input="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="pathAttribute">path</rec:Val>
 +
      <rec:Val key="contentAttachment">content</rec:Val>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
 +
 +
== org.eclipse.smila.processing.pipelets.FileWriterPipelet ==
 +
 +
=== Description ===
 +
 +
This pipelet can be used to write the content of an attachment to a file.
 +
 +
If the attachment does not exist, a warning is logged, but the record is not dropped.
 +
 +
=== Configuration ===
 +
 +
{| border="1"
 +
!Property
 +
!Type
 +
!Read Type
 +
!Description
 +
|-
 +
|''pathAttribute''
 +
|String
 +
|runtime
 +
|The name of the attribute with the path of the target file
 +
|-
 +
|''contentAttachment''
 +
|String
 +
|runtime
 +
|The name of the attachment to write to the file
 +
|-
 +
|''append''
 +
|Boolean
 +
|runtime
 +
|Whether to append the attachment content to the file if it already exists. Defaults to false.
 +
|-
 +
|}
 +
 +
=== Example ===
 +
 +
This example saves all bytes of the attachment "content" to the file path that is contained in the attribute "path".
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="writeFile">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.FileWriterPipelet" />
 +
    <proc:variables input="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="pathAttribute">path</rec:Val>
 +
      <rec:Val key="contentAttachment">content</rec:Val>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
 +
 +
== org.eclipse.smila.processing.pipelets.PushRecordsPipelet ==
 +
 +
=== Description ===
 +
 +
Sends all current records to another (asynchronous) job.
 +
 +
The records are not removed from the pipeline - thus a following pipelet in the current pipeline will process the records as well.
 +
 +
=== Configuration ===
 +
 +
{| border="1"
 +
!Property
 +
!Type
 +
!Read Type
 +
!Description
 +
|-
 +
|''job''
 +
|String
 +
|init
 +
|The name of the target job.
 +
|-
 +
|}
 +
 +
=== Example ===
 +
 +
This example sends all current records to the job "TheOtherJob".
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="callJob">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.PushRecordsPipelet" />
 +
    <proc:variables input="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="job">TheOtherJob</rec:Val>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
 +
 +
== org.eclipse.smila.processing.pipelets.JSONReaderPipelet ==
 +
 +
=== Description ===
 +
 +
Fills attributes of the record from a JSON string.
 +
 +
It is not possible to overwrite the record id of the record, even if a key "_recordid" exists in the JSON string.
 +
 +
=== Configuration ===
 +
 +
{| border="1"
 +
!Property
 +
!Type
 +
!Read Type
 +
!Description
 +
|-
 +
|''inputType''
 +
|String : ''ATTACHMENT, ATTRIBUTE''
 +
|init
 +
|selects if the JSON string is found in an attachment or attribute of the record
 +
|-
 +
|''inputName''
 +
|String
 +
|init
 +
|name of the input attachment or input attribute that contains the JSON string
 +
|-
 +
|''outputAttribute''
 +
|String
 +
|init
 +
|the optional name of the attribute in the record where the generated object is put into. If no attribute is specified and the object is a map, all contained attributes are written to the current record.
 +
|-
 +
|}
 +
 +
=== Examples ===
 +
 +
The following examples use this input object:
 +
<source lang="javascript">
 +
{ "jsonString": "{\"attribute1\": \"value1\"}" }
 +
</source>
 +
 +
 +
This example unwraps the contents of the attribute "jsonString" into the attribute "jsonObject":
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="readJSON">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.JSONReaderPipelet" />
 +
    <proc:variables input="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="inputType">ATTRIBUTE</rec:Val>
 +
      <rec:Val key="inputName">jsonString</rec:Val>
 +
      <rec:Val key="outputAttribute">jsonObject</rec:Val>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
 +
 +
The result would be:
 +
<source lang="javascript">
 +
{
 +
  "jsonString": "{\"attribute1\": \"value1\"}",
 +
  "jsonObject": {
 +
    "attribute1": "value1"
 +
  }
 +
}
 +
</source>
 +
 +
 +
This example unwraps the contents of the attribute "jsonString" into the object itself:
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="readJSON">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.JSONReaderPipelet" />
 +
    <proc:variables input="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="inputType">ATTRIBUTE</rec:Val>
 +
      <rec:Val key="inputName">jsonString</rec:Val>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
 +
 +
The result would be:
 +
<source lang="javascript">
 +
{
 +
  "jsonString": "{\"attribute1\": \"value1\"}",
 +
  "attribute1": "value1"
 +
}
 +
</source>
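
The JSON string can also be read from an attachment instead of an attribute. The following is a minimal sketch using only the properties from the configuration table; the attachment name "jsonContent" and the target attribute "parsed" are hypothetical names chosen for illustration.

<source lang="xml">
<!-- sketch: "jsonContent" (attachment) and "parsed" (attribute) are hypothetical names -->
<extensionActivity>
  <proc:invokePipelet name="readJSONAttachment">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.JSONReaderPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="inputType">ATTACHMENT</rec:Val>
      <rec:Val key="inputName">jsonContent</rec:Val>
      <rec:Val key="outputAttribute">parsed</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
</source>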
 +
 +
== org.eclipse.smila.processing.pipelets.JSONWriterPipelet ==
 +
 +
=== Description ===
 +
 +
Writes some or all attributes of the record into a JSON string.
 +
 +
=== Configuration ===
 +
 +
{| border="1"
 +
!Property
 +
!Type
 +
!Read Type
 +
!Description
 +
|-
 +
|''inputAttributes''
 +
|String/Sequence of String
 +
|init
 +
|the names of the attributes in the record that contain the objects to write into JSON. If nothing is given, the whole record is used. If only a string is given, the content of that attribute is used.
 +
|-
 +
|''outputType''
 +
|String : ''ATTACHMENT, ATTRIBUTE''
 +
|init
 +
|selects if the JSON string is written to an attachment or attribute of the record
 +
|-
 +
|''outputName''
 +
|String
 +
|init
 +
|name of the target attachment or attribute
 +
|-
 +
|''printPretty''
 +
|Boolean
 +
|init
 +
|Whether to format the output for better readability. Defaults to true.
 +
|-
 +
|}
 +
 +
=== Examples ===
 +
 +
This example writes the content of attribute "a1" into the attribute "value" without any whitespace:
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="writeJSON">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.JSONWriterPipelet" />
 +
    <proc:variables input="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="inputAttributes">a1</rec:Val>
 +
      <rec:Val key="outputType">ATTRIBUTE</rec:Val>
 +
      <rec:Val key="outputName">value</rec:Val>
 +
      <rec:Val key="printPretty" type="boolean">false</rec:Val>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
 +
 +
<source lang="javascript">
 +
input : { "a1": [ 1 ], "a2": 2 }
 +
result : { "a1": [ 1 ], "a2": 2, "value": "[1]" }
 +
</source>
 +
 +
This example appends the whole object to the file "records.log":
 +
 +
<source lang="xml">
 +
<extensionActivity>
 +
  <proc:invokePipelet name="createJSONLogEntry">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.JSONWriterPipelet" />
 +
    <proc:variables input="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="outputType">ATTACHMENT</rec:Val>
 +
      <rec:Val key="outputName">jsonLog</rec:Val>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
<extensionActivity>
 +
  <proc:invokePipelet name="createJSONFileName">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.SetValuePipelet" />
 +
    <proc:variables input="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="outputAttribute">jsonFile</rec:Val>
 +
      <rec:Val key="value">records.log</rec:Val>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
<extensionActivity>
 +
  <proc:invokePipelet name="appendToJSONLog">
 +
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.FileWriterPipelet" />
 +
    <proc:variables input="request" />
 +
    <proc:configuration>
 +
      <rec:Val key="pathAttribute">jsonFile</rec:Val>
 +
      <rec:Val key="contentAttachment">jsonLog</rec:Val>
 +
      <rec:Val key="append" type="boolean">true</rec:Val>
 +
    </proc:configuration>
 +
  </proc:invokePipelet>
 +
</extensionActivity>
 +
</source>
 +
  
 
[[Category:SMILA]] [[Category:SMILA/Pipelet]]
 
[[Category:SMILA]] [[Category:SMILA/Pipelet]]

Revision as of 03:15, 20 March 2013

This page describes the SMILA pipelets provided by bundle org.eclipse.smila.processing.pipelets.

General

All pipelets in this bundle support the configurable error handling as described in SMILA/Development_Guidelines/How_to_write_a_Pipelet#Implementation. When used in jobmanager workflows, records causing errors are dropped.

Read Type

  • runtime: Parameters are read while records are processed. The parameter value can be set per record.
  • init: Parameters are read once from the pipelet configuration when the pipelet is initialized. The parameter value cannot be overridden in a record.

org.eclipse.smila.processing.pipelets.CommitRecordsPipelet

Description

Commits each record in the input variable on the blackboard to the storages. Can be used to save the records immediately during the workflow instead of only when the workflow has finished.

Configuration

none.

org.eclipse.smila.processing.pipelets.AddValuesPipelet

Adds values to an attribute in the processed records. If the attribute does not already contain a sequence, the current value is wrapped in one before the new values are added.

Configuration

Property | Type | Read Type | Description
outputAttribute | string | runtime | The name of the attribute to add values to
valuesToAdd | Anything, usually a value or a sequence of values | runtime | The values to add

Example

From a test pipeline: This adds two string values to whatever already exists in attribute "out" of the processed records.

<proc:invokePipelet name="addValuesToNonExistingAttribute">
  <proc:pipelet class="org.eclipse.smila.processing.pipelets.AddValuesPipelet" />
  <proc:variables input="request" />
  <proc:configuration>
    <rec:Val key="outputAttribute">out</rec:Val>
    <rec:Seq key="valuesToAdd">
      <rec:Val>value1</rec:Val>
      <rec:Val>value2</rec:Val>
    </rec:Seq>
  </proc:configuration>
</proc:invokePipelet>

org.eclipse.smila.processing.pipelets.SetValuePipelet

Sets a value for an attribute in every processed record. If the attribute already exists, it is not changed by default. Useful for initializing required attributes.

Configuration

Property | Type | Read Type | Description
outputAttribute | string | runtime | The name of the attribute to set the value for
value | anything | runtime | The constant value to set for the attribute (a map or sequence is possible, too)
overwrite | boolean | runtime | Whether to overwrite any value that the attribute already contains (optional, defaults to false)

Example

This sets a map containing two values into attribute1, even if there is already a value in that attribute.

<extensionActivity>
  <proc:invokePipelet name="setMapForExistingAttribute">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.SetValuePipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="outputAttribute">attribute1</rec:Val>
      <rec:Val key="overwrite" type="boolean">true</rec:Val>
      <rec:Map key="value">
        <rec:Val key="key1">value1</rec:Val>
        <rec:Val key="key2">value2</rec:Val>
      </rec:Map>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>

org.eclipse.smila.processing.pipelets.RemoveAttributePipelet

Removes an attribute from each record.

Configuration

The configuration property is either read from the _parameters attribute of a record or from the pipelet configuration. If not set at all, the record remains unchanged.

Property | Type | Read Type | Description
removeAttribute | A string value | runtime | The name of the attribute to remove

Example

To remove the complete structure in attribute _parameters, use:

<extensionActivity>
  <proc:invokePipelet name="removeParameters">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.RemoveAttributePipelet" />
    <proc:variables input="result" output="result" />
    <proc:configuration>
      <rec:Val key="removeAttribute">_parameters</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>

org.eclipse.smila.processing.pipelets.FilterPipelet

Copies only those record IDs to the result which match a configurable regular expression in a configurable single-valued attribute. This is useful for conditional processing when multiple records are pushed through the pipeline in a single request: instead of using BPEL conditions, use a FilterPipelet to select only the matching records into a new variable, and use this variable as the input variable for the subsequent pipelets. You can still use the original BPEL variable in the BPEL <reply> activity at the end of the pipeline to return all records as the final result.

Configuration

The configuration properties are read either from the _parameters attribute of each record or from the pipelet configuration.

Property | Type | Read Type | Description
filterAttribute | A string value | runtime | The name of the attribute to match
filterExpression | A string value | runtime | The regular expression to match the attribute value against

Example

To collect in the textRecords BPEL variable only those records whose MimeType starts with text/, something like this could be used:

<extensionActivity>
  <proc:invokePipelet name="invokeFilterPipelet">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.FilterPipelet" />
    <proc:variables input="request" output="textRecords" />
    <proc:configuration>
      <rec:Val key="filterAttribute">MimeType</rec:Val>
      <rec:Val key="filterExpression">text/.+</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
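
The filtered variable can then serve as the input of a subsequent pipelet. As a sketch, the following activity tags only the records selected above, using the SetValuePipelet described on this page; the attribute name "isText" and its value "yes" are hypothetical:

<!-- sketch: textRecords is the output variable of the FilterPipelet above; "isText" is a hypothetical attribute -->
<extensionActivity>
  <proc:invokePipelet name="markTextRecords">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.SetValuePipelet" />
    <proc:variables input="textRecords" />
    <proc:configuration>
      <rec:Val key="outputAttribute">isText</rec:Val>
      <rec:Val key="value">yes</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>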

org.eclipse.smila.processing.pipelets.HtmlToTextPipelet

Description

Extracts plain text and metadata from an HTML document stored in an attribute or attachment of each record and writes the results to configurable attributes or attachments.

The pipelet uses the CyberNeko HTML parser NekoHTML to parse HTML documents.

Configuration

Property | Type | Read Type | Description
inputType | String : ATTACHMENT, ATTRIBUTE | runtime | Defines whether the HTML input is found in an attachment or in an attribute of the record
outputType | String : ATTACHMENT, ATTRIBUTE | runtime | Defines whether the plain text should be stored in an attachment or in an attribute of the record
inputName | String | runtime | Name of input attachment or path to input attribute (process literals of attribute)
outputName | String | runtime | Name of output attachment or path to output attribute for plain text (store result as literals of attribute)
defaultEncoding | String | runtime | Optional, default encoding to apply to documents when not specified in the documents themselves
removeContentTags | String | runtime | Comma-separated list of HTML tags (case-insensitive) for which the complete content should be removed from the resulting plain text. If not set, it defaults to "applet,frame,object,script,style". If the value is set, you must add the default tags explicitly to have their contents removed, too.
meta:<name> | String: attribute path | init | Store the content of the <META> tag with name="<name>" (case-insensitive) to the attribute named as the value of the property. E.g. a property named "meta:author" with value "authors" causes the content attributes of <META name="author" content="..."> tags to be stored in the attribute authors of the respective record.
tag:title | String: attribute path | init | Store the content of the <TITLE> tag to the attribute named as the value of the property.

Example

This configuration extracts plain text from the HTML document in attachment "html" and stores the result in the attribute "text". It removes the complete content of heading tags <h1>, ..., <h4>. In addition to that, it looks for <meta> tags with names "author", "keywords", and "title" and stores their contents in the attributes "author", "keywords", and "title", respectively:

<extensionActivity>
  <proc:invokePipelet name="invokeHtml2Txt">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.HtmlToTextPipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Val key="inputType">ATTACHMENT</rec:Val>
      <rec:Val key="outputType">ATTRIBUTE</rec:Val>
      <rec:Val key="inputName">html</rec:Val>
      <rec:Val key="outputName">text</rec:Val>
      <rec:Val key="defaultEncoding">UTF-8</rec:Val>
      <rec:Val key="meta:author">author</rec:Val>
      <rec:Val key="meta:keywords">keywords</rec:Val>
      <rec:Val key="meta:title">title</rec:Val>
      <rec:Val key="removeContentTags">h1,h2,h3,h4</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>

org.eclipse.smila.processing.pipelets.CopyPipelet

Description

This pipelet can be used to copy or move attribute values to other attributes or to copy or move a string value between attributes and/or attachments. It supports two execution modes:

  • COPY: copy the value from the input attribute/attachment to the output attribute/attachment
  • MOVE: same as COPY, but after that delete the value from the input attribute/attachment

When an attribute is copied to another attribute, the type remains the same. When copying an attachment to an attribute, a string value is created by assuming that the attachment is text in UTF-8 encoding. When copying an attribute value to an attachment, the attribute must be a single value, which is interpreted as a string value and converted to a byte array using UTF-8 encoding.

Configuration

Property | Type | Read Type | Description
inputType | String : ATTACHMENT, ATTRIBUTE | runtime | selects if the input is found in an attachment or attribute of the record
outputType | String : ATTACHMENT, ATTRIBUTE | runtime | selects if output should be stored in an attachment or attribute of the record
inputName | String | runtime | name of input attachment or input attribute
outputName | String | runtime | name of output attachment or output attribute
mode | String : COPY, MOVE | runtime | execution mode. Copy the value or move (copy and delete) the value. Default is COPY.

Example

This configuration shows how to copy the value of attachment 'Content' into the attribute 'TextContent':

<!-- copy txt from attachment to attribute -->
<extensionActivity>
  <proc:invokePipelet name="invokeCopyContent">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.CopyPipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Val key="inputType">ATTACHMENT</rec:Val>
      <rec:Val key="outputType">ATTRIBUTE</rec:Val>
      <rec:Val key="inputName">Content</rec:Val>
      <rec:Val key="outputName">TextContent</rec:Val>
      <rec:Val key="mode">COPY</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
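
For comparison, a MOVE between two attributes could be sketched as follows. Only the properties from the table above are used; the attribute names 'RawTitle' and 'Title' are hypothetical. After execution, the value would be expected in 'Title' and no longer in 'RawTitle':

<!-- sketch: move a value from one attribute to another; attribute names are hypothetical -->
<extensionActivity>
  <proc:invokePipelet name="invokeMoveTitle">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.CopyPipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Val key="inputType">ATTRIBUTE</rec:Val>
      <rec:Val key="outputType">ATTRIBUTE</rec:Val>
      <rec:Val key="inputName">RawTitle</rec:Val>
      <rec:Val key="outputName">Title</rec:Val>
      <rec:Val key="mode">MOVE</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>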

org.eclipse.smila.processing.pipelets.SubAttributeExtractorPipelet

Description

Extracts literal values from an attribute that has a nested map. The attributes in the nested map can have nested maps themselves. To address an attribute in the nested structure, a path needs to be specified. The pipelet supports different execution modes:

  • FIRST: selects only the first literal of the specified attribute
  • LAST: selects only the last literal of the specified attribute
  • ALL_AS_LIST: selects all literal values of the specified attribute and returns a list
  • ALL_AS_ONE: selects all literal values of the specified attribute and concatenates them to a single string, using a separator (default is blank)

This pipelet works only on attributes, not on attachments!

Note: If the maps on the path are nested in sequences, the pipelet uses the first element of such a sequence.

Configuration

Property | Type | Read Type | Description
inputPath | String | runtime | the path to the input attribute with Literals
outputPath | String | runtime | the name of the attribute to store the extracted value(s) as Literals in (not a path, only a top-level attribute, currently)
mode | String : FIRST, LAST, ALL_AS_LIST, ALL_AS_ONE | runtime | execution mode. See above for details.
separator | String | runtime | the separation string used for mode ALL_AS_ONE. Default is a blank

Example

This configuration can be applied to records provided by the FeedAgent. It shows how to access the subattribute 'Value' of attribute 'Contents', concatenating all values to one:

<!-- extract content -->
<extensionActivity>
  <proc:invokePipelet name="extract content">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.SubAttributeExtractorPipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Val key="inputPath">Contents/Value</rec:Val>
      <rec:Val key="outputPath">Content</rec:Val>
      <rec:Val key="mode">ALL_AS_ONE</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
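
The other modes are configured in the same way. As a sketch, the following selects only the first literal of a nested attribute; the path 'Authors/Name' and the output attribute 'FirstAuthor' are hypothetical names chosen for illustration:

<!-- sketch: pick only the first literal; path and attribute names are hypothetical -->
<extensionActivity>
  <proc:invokePipelet name="extract first author">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.SubAttributeExtractorPipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Val key="inputPath">Authors/Name</rec:Val>
      <rec:Val key="outputPath">FirstAuthor</rec:Val>
      <rec:Val key="mode">FIRST</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>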

org.eclipse.smila.processing.pipelets.ReplacePipelet

Description

Searches for one or more patterns in the literal value of an attribute and replaces the occurrences found with the configured replacements.

You can choose from different matching types:

  • entity: Every pattern is matched against the whole attribute value (with respect to the ignoreCase property) and the first matching pattern defines the new value of the attribute. If no pattern matches, the result is the current value of the attribute.
  • substring: All patterns that are part of the attribute value are replaced.
  • regexp: All patterns are interpreted as regular expressions, see Matcher#replaceAll(String)

This pipelet works only on attributes, not on attachments!

Configuration

Property | Type | Read Type | Description
inputAttribute | String | runtime | the name of the attribute that contains the literal to search in
outputAttribute | String | runtime | the name of the attribute to store the result value as string, defaults to the input attribute
type | String : entity, substring, regexp | init | Identifies the type of the pattern, see above for details. Defaults to substring.
ignoreCase | Boolean | init | indicates that the case is ignored when matching patterns, defaults to false.
mapping | Map | init | A mapping of multiple patterns and replacements. Each key is a pattern and its value the replacement.
pattern | String | init | the pattern to apply to the literal value (see above for a description of possible types), required if no mapping is given
replacement | String | init | the substitution string used to replace all occurrences of the pattern, defaults to the empty string

Examples

This configuration can be used to map language ids to their label:

<extensionActivity>
  <proc:invokePipelet name="set language label">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ReplacePipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Val key="inputAttribute">Language</rec:Val>
      <rec:Val key="outputAttribute">LanguageLabel</rec:Val>
      <rec:Val key="type">entity</rec:Val>
      <rec:Val key="ignoreCase" type="boolean">true</rec:Val>
      <rec:Map key="mapping">
        <rec:Val key="de">German</rec:Val>
        <rec:Val key="en">English</rec:Val>
        <rec:Val key="es">Spanish</rec:Val>
        <rec:Val key="fr">French</rec:Val>
        ...
      </rec:Map>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>

This configuration can be used to cut the time information from a timestamp:

<extensionActivity>
  <proc:invokePipelet name="cut time">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ReplacePipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Val key="inputAttribute">ModificationTime</rec:Val>
      <rec:Val key="outputAttribute">ModificationDate</rec:Val>
      <rec:Val key="type">regexp</rec:Val>
      <rec:Val key="pattern">[T ].*</rec:Val>
      <rec:Val key="replacement"></rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
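
The substring type works with a single pattern and replacement as well. A minimal sketch, assuming a hypothetical attribute 'Company' in which every occurrence of "Inc." should be spelled out; since no outputAttribute is given, the result is stored in the input attribute:

<!-- sketch: plain substring replacement; the attribute name and the strings are hypothetical -->
<extensionActivity>
  <proc:invokePipelet name="expand abbreviation">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ReplacePipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Val key="inputAttribute">Company</rec:Val>
      <rec:Val key="type">substring</rec:Val>
      <rec:Val key="pattern">Inc.</rec:Val>
      <rec:Val key="replacement">Incorporated</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>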

org.eclipse.smila.processing.pipelets.ScriptPipelet

Description

Executes a script for each record.

Script execution relies on the Java Scripting API (JSR 223), so any compatible scripting engine can be used. JavaScript is available "out of the box" and is the default script language.

The context of the script will contain the following variables:

  • blackboard: a reference to the blackboard
  • id: the ID of the current record
  • record: the metadata of the current record
  • results: a slightly modified version of a result collector that provides methods to add a new record id to the list of result ids (results.addResult('...id...')) and to drop the current record from the same list (results.excludeCurrentRecord())
  • parameterAccessor: the ParameterAccessor instance for access to the configuration (e.g. parameterAccessor.getParameterAny("configMap").asMap().getLongValue("longValue")).

Please be aware that the intention of this pipelet is to write pipelines fast, but not to write fast pipelines - the script is parsed for every record. Don't use it for production environments where performance matters, but use it to develop an algorithm that you can put into your own pipelet.

Configuration

Property | Type | Read Type | Description
type | String | init | the mime type of the scripting language, defaults to "text/javascript"
scriptFile | String | runtime | the path of the file that contains the script - modifications of this file are observed on every execution of the pipelet
script | String | init | The "inline" script, required unless scriptFile is specified (ignored in that case)
resultAttribute | String | runtime | The name of an attribute that will receive the result of the script (usually the result of the last expression)

Examples

This configuration can be used to concatenate the values of two attributes and save the result into a third one:

<extensionActivity>
  <proc:invokePipelet name="create full name">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ScriptPipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Val key="script">record.get("firstName") + " " + record.get("lastName")</rec:Val>
      <rec:Val key="resultAttribute">fullName</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>

This configuration can be used to execute a JavaScript file from $SMILA_PATH$/configuration/example.js:

<extensionActivity>
  <proc:invokePipelet name="execute script">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ScriptPipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Val key="scriptFile">configuration/example.js</rec:Val>    
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
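
The results variable can be used to drop records from the result list. The following is only a sketch: it assumes that the record variable behaves like a Java map, so that record.get(...) yields null for a missing attribute, and it uses a hypothetical marker attribute "skip":

<!-- sketch: exclude records that carry the hypothetical attribute "skip" -->
<extensionActivity>
  <proc:invokePipelet name="drop skipped records">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ScriptPipelet" />
    <proc:variables input="request" output="request" />
    <proc:configuration>
      <rec:Val key="script">if (record.get("skip") != null) { results.excludeCurrentRecord(); }</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>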

org.eclipse.smila.processing.pipelets.ExecPipelet

Description

Executes an external program for each record.

This pipelet may be used to integrate native programs into the pipeline.

Attention: This pipelet may lead to security issues! Although the executed command cannot be changed at runtime (this parameter is evaluated only at initialization time), the arguments and the input of the command can be changed using values in the processed record. Every "pipeline developer" should ensure that only arguments in the expected value range are processed (especially if the program accepts files from the file system as arguments).

Configuration

Property | Type | Read Type | Description
command | String | init | The program to execute (including its path in the file system).
directory | String | runtime | The (optional) working directory for the command. The SMILA directory is used if not given.
parameters | Sequence of strings | runtime | The optional parameters given to the program (ignored if the attribute named by parametersAttribute exists).
parametersAttribute | String | runtime | The optional name of the attribute that contains the sequence of parameters given to the program.
inputAttachment | String | runtime | The optional name of the attachment that contains the bytes to send as input for the program.
outputAttachment | String | runtime | The optional name of the attachment that is filled with the standard output of the program.
exitCodeAttribute | String | runtime | The name of the attribute that is filled with the exit code of the program.
errorAttachment | String | runtime | The optional name of the attachment that is filled with the error output of the program.
failOnError | Either a boolean or a sequence of strings | runtime | Whether to mark a record as failed if the program returns an error code. Given either as a sequence of exit code ranges or as a boolean, where "true" means that every exit code except 0 is an error code. Defaults to false.

Examples

This configuration can be used to execute FFMPEG for transformation of an MP3 input file into a WAV output file:

<extensionActivity>
  <proc:invokePipelet name="ConvertMP3">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ExecPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="command">.../ffmpeg</rec:Val>
      <rec:Seq key="parameters">
          <rec:Val>-i</rec:Val>
          <rec:Val>.../example.mp3</rec:Val>
          <rec:Val>-ar</rec:Val>
          <rec:Val>16000</rec:Val>
          <rec:Val>.../example.wav</rec:Val>
      </rec:Seq>
      <rec:Val key="failOnError" type="boolean">true</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
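
The attachment-related properties can be combined to pipe record data through a program. The following is only a sketch: the command /usr/bin/wc, the working directory /tmp, and the attachment and attribute names (Content, WordCount, WordCountErrors, WordCountExitCode) are assumptions chosen for illustration and are not defined by SMILA.

```xml
<extensionActivity>
  <proc:invokePipelet name="CountWords">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.ExecPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <!-- hypothetical command and working directory -->
      <rec:Val key="command">/usr/bin/wc</rec:Val>
      <rec:Val key="directory">/tmp</rec:Val>
      <rec:Seq key="parameters">
          <rec:Val>-w</rec:Val>
      </rec:Seq>
      <!-- hypothetical attachment and attribute names -->
      <rec:Val key="inputAttachment">Content</rec:Val>
      <rec:Val key="outputAttachment">WordCount</rec:Val>
      <rec:Val key="errorAttachment">WordCountErrors</rec:Val>
      <rec:Val key="exitCodeAttribute">WordCountExitCode</rec:Val>
      <rec:Val key="failOnError" type="boolean">true</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
```

With such a configuration, the bytes of the Content attachment would be sent as input to the program, and its standard output, error output, and exit code would be stored in the record under the configured names.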

org.eclipse.smila.processing.pipelets.MimeTypeIdentifyPipelet

Description

This pipelet is used to identify the MIME type of a document. It uses an org.eclipse.smila.processing.pipelets.mimetype.MimeTypeIdentifier service to perform the actual identification of the MIME type. Depending on the specified properties, the MIME type is detected from the file content, from the file extension, or from both. If the identification does not return a MIME type - and if configured accordingly - the service will search the metadata for this information. The identified MIME type is then stored to an attribute in the record.

Configuration

The pipelet is configured using the <configuration> section inside the <invokePipelet> activity of the corresponding BPEL file. It provides the following properties:

Property Type Read Type Usage Description
FileExtensionAttribute String init Optional Name of the attribute containing the file extension
ContentAttachment String init Optional Name of the attachment containing the file content
MetaDataAttribute String init Optional Name of the attribute containing metadata information, e.g. a Web Crawler returns a response header containing applicable MIME type information
MimeTypeAttribute String init Required Name of the attribute to store the identified MIME type to

Note that at least one of the properties FileExtensionAttribute, ContentAttachment, and MetaDataAttribute must be specified!

Example

The following example is used in the SMILA example application to identify the MIME types of documents that are delivered by the File System Crawler or Web Crawler.

addpipeline.bpel

<extensionActivity>
    <proc:invokePipelet name="detect MimeType">
        <proc:pipelet class="org.eclipse.smila.processing.pipelets.MimeTypeIdentifyPipelet" />
        <proc:variables input="request" output="request" />
        <proc:configuration>
          <rec:Val key="FileExtensionAttribute">Extension</rec:Val>
          <rec:Val key="MetaDataAttribute">MetaData</rec:Val>
          <rec:Val key="MimeTypeAttribute">MimeType</rec:Val>
        </proc:configuration>
    </proc:invokePipelet>
</extensionActivity>
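
To also detect the MIME type from the file content, the ContentAttachment property can be added. The following sketch assumes that the file content is stored in an attachment named Content; this name is hypothetical and depends on how the records are produced.

```xml
<extensionActivity>
    <proc:invokePipelet name="detect MimeType from content">
        <proc:pipelet class="org.eclipse.smila.processing.pipelets.MimeTypeIdentifyPipelet" />
        <proc:variables input="request" output="request" />
        <proc:configuration>
          <rec:Val key="FileExtensionAttribute">Extension</rec:Val>
          <!-- hypothetical attachment name -->
          <rec:Val key="ContentAttachment">Content</rec:Val>
          <rec:Val key="MimeTypeAttribute">MimeType</rec:Val>
        </proc:configuration>
    </proc:invokePipelet>
</extensionActivity>
```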


org.eclipse.smila.processing.pipelets.LanguageIdentifyPipelet

Description

This pipelet identifies the language of textual input and stores the returned ISO 639 language code to some target attribute. It uses an org.eclipse.smila.common.language.LanguageIdentifier service to perform the actual identification. If the identification does not return a language, the specified DefaultLanguage (or DefaultAlternativeName) is returned. If no defaults are specified, no value is set.

The pipelet returns the detected language as an ISO 639 code. If you need special language tags in your application, the pipelet can produce an alternative language code according to a configurable mapping. To define such a mapping, create the file SMILA/configuration/org.eclipse.smila.tika/languageMapping.properties. The following shows an example mapping:

de=german
en=english
es=spanish
fi=finnish
fr=french

The pipelet uses Apache Tika technology for the actual language detection.

Configuration

The pipelet is configured using the <configuration> section inside the <invokePipelet> activity of the corresponding BPEL file. It provides the following properties:

Property Type Read Type Usage Description
ContentAttribute String runtime Required Name of the attribute containing the text whose language should be identified
LanguageAttribute String runtime Optional Name of the attribute to store the code of the identified language to
DefaultLanguage String runtime Optional Language code to set if no language could be detected. If not set and no language could be identified, the LanguageAttribute attribute remains empty.
AlternativeNameAttribute String runtime Optional Name of the attribute to store the alternative language code of the identified language to. The mapping defining this alternative code must be located in SMILA/configuration/org.eclipse.smila.tika/languageMapping.properties (see above).
DefaultAlternativeName String runtime Optional Alternative language code to set if no language could be detected. If not set and no language could be identified, the AlternativeNameAttribute attribute remains empty.
UseCertainLanguagesOnly Boolean runtime Optional Boolean flag indicating whether to apply only those languages that were identified with a reasonable certainty (true) or all (false). Default is false.


Example

The following example could be used to identify the language of documents that are delivered by the File System Crawler or Web Crawler.

addpipeline.bpel

<extensionActivity>
    <proc:invokePipelet name="detect Language">
        <proc:pipelet class="org.eclipse.smila.processing.pipelets.LanguageIdentifyPipelet" />
        <proc:variables input="request" output="request" />
        <proc:configuration>
          <rec:Val key="ContentAttribute">Content</rec:Val>
          <rec:Val key="LanguageAttribute">Language</rec:Val>
          <rec:Val key="DefaultLanguage">de</rec:Val>
          <rec:Val key="AlternativeNameAttribute">AltLanguage</rec:Val>
          <rec:Val key="DefaultAlternativeName">german</rec:Val>
          <rec:Val key="UseCertainLanguagesOnly">false</rec:Val>
        </proc:configuration>
    </proc:invokePipelet>
</extensionActivity>
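
A reduced configuration is also possible, because only ContentAttribute is required. The following sketch sets no defaults and accepts only languages identified with reasonable certainty, so according to the description above no value is set when detection fails. The attribute names and the type="boolean" notation are taken from the other examples on this page.

```xml
<extensionActivity>
    <proc:invokePipelet name="detect Language strictly">
        <proc:pipelet class="org.eclipse.smila.processing.pipelets.LanguageIdentifyPipelet" />
        <proc:variables input="request" output="request" />
        <proc:configuration>
          <rec:Val key="ContentAttribute">Content</rec:Val>
          <rec:Val key="LanguageAttribute">Language</rec:Val>
          <!-- no DefaultLanguage: Language stays empty if nothing is detected -->
          <rec:Val key="UseCertainLanguagesOnly" type="boolean">true</rec:Val>
        </proc:configuration>
    </proc:invokePipelet>
</extensionActivity>
```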

org.eclipse.smila.processing.pipelets.FileReaderPipelet

Description

This pipelet can be used to read content from a file and add it as an attachment.

Configuration

Property Type Read Type Description
pathAttribute String runtime The name of the attribute with the path of the file to read from
contentAttachment String runtime The name of the attachment to store the content

Example

<!-- read from file and add attachment -->
<extensionActivity>
  <proc:invokePipelet name="invokeReadFile">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.FileReaderPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="pathAttribute">path</rec:Val>
      <rec:Val key="contentAttachment">content</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>

org.eclipse.smila.processing.pipelets.FileWriterPipelet

Description

This pipelet can be used to write the content of an attachment to a file.

If the attachment does not exist, a warning is logged, but the record is not dropped.

Configuration

Property Type Read Type Description
pathAttribute String runtime The name of the attribute with the path of the target file
contentAttachment String runtime The name of the attachment to write to the file
append Boolean runtime Whether to append the attachment to the file if it already exists. Defaults to false.

Example

This example saves all bytes of the attachment "content" to the file path that is contained in the attribute "path".

<extensionActivity>
  <proc:invokePipelet name="writeFile">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.FileWriterPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="pathAttribute">path</rec:Val>
      <rec:Val key="contentAttachment">content</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>

org.eclipse.smila.processing.pipelets.PushRecordsPipelet

Description

Sends all current records to another (asynchronous) job.

The records are not removed from the pipeline, so any subsequent pipelet in the current pipeline processes them as well.

Configuration

Property Type Read Type Description
job String init The name of the target job.

Example

This example sends all current records to the job "TheOtherJob".

<extensionActivity>
  <proc:invokePipelet name="callJob">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.PushRecordsPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="job">TheOtherJob</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>

org.eclipse.smila.processing.pipelets.JSONReaderPipelet

Description

Fills attributes of the record from a JSON string.

It is not possible to overwrite the record id of the record, even if a key "_recordid" exists in the JSON string.

Configuration

Property Type Read Type Description
inputType String: ATTACHMENT, ATTRIBUTE init Selects whether the JSON string is read from an attachment or an attribute of the record.
inputName String init Name of the input attachment or input attribute that contains the JSON string.
outputAttribute String init The optional name of the attribute in the record in which the parsed object is stored. If no attribute is specified and the object is a map, all contained attributes are written to the current record.

Examples

The following examples use this input object:

{ "jsonString": "{\"attribute1\": \"value1\"}" }


This example unwraps the contents of the attribute "jsonString" into the attribute "jsonObject":

<extensionActivity>
  <proc:invokePipelet name="readJSON">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.JSONReaderPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="inputType">ATTRIBUTE</rec:Val>
      <rec:Val key="inputName">jsonString</rec:Val>
      <rec:Val key="outputAttribute">jsonObject</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>

The result would be:

{ 
  "jsonString": "{\"attribute1\": \"value1\"}",
  "jsonObject": { 
     "attribute1": "value1"
  }
}


This example unwraps the contents of the attribute "jsonString" into the object itself:

<extensionActivity>
  <proc:invokePipelet name="readJSON">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.JSONReaderPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="inputType">ATTRIBUTE</rec:Val>
      <rec:Val key="inputName">jsonString</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>

The result would be:

{ 
  "jsonString": "{\"attribute1\": \"value1\"}",
  "attribute1": "value1"
}
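
The JSON string can also come from an attachment, for example one filled by the FileReaderPipelet. The following sketch chains both pipelets; the attribute name path and the attachment name jsonContent are hypothetical.

```xml
<!-- read the file named in attribute "path" into attachment "jsonContent" -->
<extensionActivity>
  <proc:invokePipelet name="readJSONFile">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.FileReaderPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="pathAttribute">path</rec:Val>
      <rec:Val key="contentAttachment">jsonContent</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
<!-- parse the attachment; without outputAttribute a map is merged into the record -->
<extensionActivity>
  <proc:invokePipelet name="parseJSONFile">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.JSONReaderPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="inputType">ATTACHMENT</rec:Val>
      <rec:Val key="inputName">jsonContent</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
```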

org.eclipse.smila.processing.pipelets.JSONWriterPipelet

Description

Writes some or all attributes of the record into a JSON string.

Configuration

Property Type Read Type Description
inputAttributes String or sequence of strings init The names of the attributes in the record that contain the objects to write as JSON. If nothing is given, the whole record is used. If a single string is given, the content of that attribute is used.
outputType String: ATTACHMENT, ATTRIBUTE init Selects whether the JSON string is written to an attachment or an attribute of the record.
outputName String init Name of the target attachment or attribute.
printPretty Boolean init Whether to format the output for better readability. Defaults to true.

Examples

This example writes the content of the attribute "a1" into the attribute "value" without any whitespace:

<extensionActivity>
  <proc:invokePipelet name="writeJSON">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.JSONWriterPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="inputAttributes">a1</rec:Val>
      <rec:Val key="outputType">ATTRIBUTE</rec:Val>
      <rec:Val key="outputName">value</rec:Val>
      <rec:Val key="printPretty" type="boolean">false</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
Input: { "a1": [ 1 ], "a2": 2 }
Result: { "a1": [ 1 ], "a2": 2, "value": "[1]" }
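
Since inputAttributes also accepts a sequence of strings, several attributes can be selected at once. The following sketch assumes the sequence is written with rec:Seq, as in the ExecPipelet parameters example; the attribute names a1 and a2 and the attachment name selectedJson are hypothetical, and the exact shape of the resulting JSON is not shown here because it is not specified above.

```xml
<extensionActivity>
  <proc:invokePipelet name="writeSelectedJSON">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.JSONWriterPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <!-- hypothetical attribute names -->
      <rec:Seq key="inputAttributes">
          <rec:Val>a1</rec:Val>
          <rec:Val>a2</rec:Val>
      </rec:Seq>
      <rec:Val key="outputType">ATTACHMENT</rec:Val>
      <rec:Val key="outputName">selectedJson</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
```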

This example appends the whole record as JSON to the file "records.log" by combining the JSONWriterPipelet, the SetValuePipelet, and the FileWriterPipelet:

<extensionActivity>
  <proc:invokePipelet name="createJSONLogEntry">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.JSONWriterPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="outputType">ATTACHMENT</rec:Val>
      <rec:Val key="outputName">jsonLog</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
<extensionActivity>
  <proc:invokePipelet name="createJSONFileName">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.SetValuePipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="outputAttribute">jsonFile</rec:Val>
      <rec:Val key="value">records.log</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>
<extensionActivity>
  <proc:invokePipelet name="appendToJSONLog">
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.FileWriterPipelet" />
    <proc:variables input="request" />
    <proc:configuration>
      <rec:Val key="pathAttribute">jsonFile</rec:Val>
      <rec:Val key="contentAttachment">jsonLog</rec:Val>
      <rec:Val key="append" type="boolean">true</rec:Val>
    </proc:configuration>
  </proc:invokePipelet>
</extensionActivity>