Difference between revisions of "SMILA/Documentation/2011.Simplification/org.eclipse.smila.processing.pipelets"

Revision as of 11:22, 2 March 2011

org.eclipse.smila.processing.pipelets.MimeTypeIdentifyPipelet

Description

This pipelet identifies the MIME type of a document. It uses an org.eclipse.smila.processing.pipelets.mimetype.MimeTypeIdentifier service to perform the actual identification. Depending on the specified properties, the MIME type is detected from the file content, from the file extension, or from both. If the identification does not return a MIME type and the pipelet is configured accordingly, the service searches the metadata for this information. The identified MIME type is then stored in an attribute of the record.


Configuration

The pipelet is configured using the <PipeletConfiguration> section inside the <invokePipelet> activity of the corresponding BPEL file. It provides the following properties:

Property | Type | Usage | Description
FileExtensionAttribute | String | Optional | Name of the attribute containing the file extension
ContentAttachment | String | Optional | Name of the attachment containing the file content
MetaDataAttribute | String | Optional | Name of the attribute containing metadata information, e.g. the response header returned by a Web Crawler, which may contain MIME type information
MimeTypeAttribute | String | Required | Name of the attribute in which the identified MIME type is stored

Note that at least one of the properties FileExtensionAttribute, ContentAttachment, and MetaDataAttribute must be specified!

Example

The following example is used in the SMILA example application to identify the MIME types of documents that are delivered by the File System Crawler or Web Crawler.

addpipeline.bpel

<extensionActivity>
    <proc:invokePipelet name="detect MimeType">
        <proc:pipelet class="org.eclipse.smila.processing.pipelets.MimeTypeIdentifyPipelet" />
        <proc:variables input="request" output="request" />
        <proc:PipeletConfiguration>
            <!-- Attribute that contains the file extension -->
            <proc:Property name="FileExtensionAttribute">
                <proc:Value>Extension</proc:Value>
            </proc:Property>
            <!-- Attribute that contains metadata, searched if detection returns no MIME type -->
            <proc:Property name="MetaDataAttribute">
                <proc:Value>MetaData</proc:Value>
            </proc:Property>
            <!-- Attribute in which the identified MIME type is stored (required) -->
            <proc:Property name="MimeTypeAttribute">
                <proc:Value>MimeType</proc:Value>
            </proc:Property>
        </proc:PipeletConfiguration>
    </proc:invokePipelet>
</extensionActivity>
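
The configuration above relies on the file extension and falls back to the metadata. As a minimal sketch of a content-based variant, the following fragment additionally sets the ContentAttachment property, so that the MIME type can be detected from the file content as well as from the file extension. The attachment name Content and the activity name are assumptions made for illustration only and are not taken from the SMILA example application; replace them with the names actually used in your records. Like the example above, the fragment must be placed inside a BPEL process that declares the proc namespace prefix.

```xml
<extensionActivity>
    <proc:invokePipelet name="detect MimeType from content">
        <proc:pipelet class="org.eclipse.smila.processing.pipelets.MimeTypeIdentifyPipelet" />
        <proc:variables input="request" output="request" />
        <proc:PipeletConfiguration>
            <!-- Hypothetical attachment name: use the attachment that holds the file content -->
            <proc:Property name="ContentAttachment">
                <proc:Value>Content</proc:Value>
            </proc:Property>
            <!-- Attribute that contains the file extension, as in the example above -->
            <proc:Property name="FileExtensionAttribute">
                <proc:Value>Extension</proc:Value>
            </proc:Property>
            <!-- Required: attribute in which the identified MIME type is stored -->
            <proc:Property name="MimeTypeAttribute">
                <proc:Value>MimeType</proc:Value>
            </proc:Property>
        </proc:PipeletConfiguration>
    </proc:invokePipelet>
</extensionActivity>
```

This variant omits MetaDataAttribute, so no metadata lookup is configured. It still satisfies the rule that at least one of FileExtensionAttribute, ContentAttachment, and MetaDataAttribute must be specified.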
