SMILA/Documentation/AperturePipelet

Revision as of 07:01, 12 August 2008

Bundle: org.eclipse.eilf.processing.pipelets.aperture.AperturePipelet

Description

This Pipelet converts documents in various formats (such as PDF and XLS) to plain text using Aperture technology (see Glossary#Aperture). It reads the document's content from the attachment named by AttachmentContent and stores the plain-text result in the attachment named by AttachmentText. If the attribute named by AttachmentMimeType holds a MimeType for the content, that MimeType is used for the conversion. If no MimeType is provided, the Pipelet identifies it itself using a MimeTypeIdentifier service.
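The MimeType fallback described above can be illustrated with a minimal sketch. It is not the Pipelet's actual code: `resolve_mime_type` and `toy_identifier` are hypothetical names, and the toy identifier only recognises the PDF magic bytes as a stand-in for the real MimeTypeIdentifier service.

```python
from typing import Callable, Optional


def toy_identifier(content: bytes) -> str:
    """Hypothetical stand-in for a MimeTypeIdentifier service.

    Assumption: it only recognises the PDF magic bytes; a real
    identifier would cover many more formats.
    """
    if content.startswith(b"%PDF-"):
        return "application/pdf"
    return "application/octet-stream"


def resolve_mime_type(
    declared: Optional[str],
    content: bytes,
    identify: Callable[[bytes], str],
) -> str:
    """Prefer the MimeType supplied with the record, else identify it."""
    if declared:
        # A MimeType was provided, so it is used as-is.
        return declared
    # No MimeType provided: fall back to identification from the content.
    return identify(content)
```

With this sketch, a record that carries a MimeType keeps it, while a record without one gets whatever the identifier reports for its content.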

Configuration

Configuration File: configuration/org.eclipse.eilf.processing.pipelets.aperture/ConverterConfig.xml

Property            Type    Description
AttachmentContent   String  name of the attachment containing the original document content
AttachmentText      String  name of the attachment to store the converted text in
AttachmentMimeType  String  name of the attribute containing the MimeType of the original document content

Note that all three properties are required. Only the MimeType value stored in the attribute is optional, not the AttachmentMimeType property itself.

Example

The following example was used in the EILF example application to convert documents delivered by the Filesystem and Web crawlers to plain text.

ConverterConfig.xml

<PipeletConfiguration xmlns="http://www.eclipse.org/eilf/processor">
  <Property name="AttachmentContent">
    <Value>Content</Value>
  </Property>
  <Property name="AttachmentText">
    <Value>Text</Value>
  </Property>
  <Property name="AttachmentMimeType">
    <Value>MimeType</Value>
  </Property>  
</PipeletConfiguration>
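Because all three properties are required, it can help to check a configuration file before deploying it. The following is a minimal sketch of such a check, written in Python with the standard `xml.etree.ElementTree` module. The function name `read_properties` and the constants are hypothetical, and the check is an illustration only, not part of the Pipelet.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example configuration above.
NS = {"p": "http://www.eclipse.org/eilf/processor"}
REQUIRED = ("AttachmentContent", "AttachmentText", "AttachmentMimeType")

EXAMPLE = """<PipeletConfiguration xmlns="http://www.eclipse.org/eilf/processor">
  <Property name="AttachmentContent">
    <Value>Content</Value>
  </Property>
  <Property name="AttachmentText">
    <Value>Text</Value>
  </Property>
  <Property name="AttachmentMimeType">
    <Value>MimeType</Value>
  </Property>
</PipeletConfiguration>"""


def read_properties(xml_text: str) -> dict:
    """Return {property name: value}; raise if a required one is missing."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("p:Property", NS):
        value = prop.find("p:Value", NS)
        # Treat an absent or empty <Value> as an empty string.
        text = (value.text or "").strip() if value is not None else ""
        props[prop.get("name")] = text
    missing = [name for name in REQUIRED if not props.get(name)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    return props


if __name__ == "__main__":
    print(read_properties(EXAMPLE))
```

Running the script against the example configuration prints the three property names with their configured values. A file that omits any of them raises a `ValueError` naming the missing properties.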