SMILA/Documentation/AperturePipelet
Bundle: org.eclipse.eilf.processing.pipelets.aperture.AperturePipelet
Description
This Pipelet converts various document formats (such as PDF, XLS, etc.) to plain text using Aperture technology (see Glossary#Aperture). It reads the original document content from the attachment named by AttachmentContent and stores the resulting plain text in the attachment named by AttachmentText. If the attribute named by AttachmentMimeType contains a MimeType, that MimeType is used for the conversion. The MimeType value is optional: if none is provided, the Pipelet identifies the MimeType itself using a MimeTypeIdentifier service.
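The MimeType fallback described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the Pipelet's actual code: the class `MimeTypeFallbackSketch`, the method `resolveMimeType`, and the toy identifier `toyIdentify` are hypothetical names, and the real MimeTypeIdentifier service has its own interface that is not shown here.

```java
import java.util.function.Function;

// Hypothetical sketch of the "use the provided MimeType, else identify it" rule.
final class MimeTypeFallbackSketch {

    // Returns the declared MimeType if one is present; otherwise asks the identifier.
    // The identifier parameter stands in for a MimeTypeIdentifier service (assumption).
    static String resolveMimeType(String declared, byte[] content,
                                  Function<byte[], String> identifier) {
        if (declared != null && !declared.trim().isEmpty()) {
            return declared.trim();
        }
        return identifier.apply(content);
    }

    // Toy identifier for demonstration only: recognizes the "%PDF" magic bytes
    // and reports everything else as generic binary data.
    static String toyIdentify(byte[] content) {
        if (content != null && content.length >= 4
                && content[0] == '%' && content[1] == 'P'
                && content[2] == 'D' && content[3] == 'F') {
            return "application/pdf";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] pdfBytes = {'%', 'P', 'D', 'F', '-', '1', '.', '4'};
        // A declared MimeType is used as-is.
        System.out.println(resolveMimeType("application/vnd.ms-excel", pdfBytes,
                MimeTypeFallbackSketch::toyIdentify));
        // No declared MimeType: the identifier is consulted.
        System.out.println(resolveMimeType(null, pdfBytes,
                MimeTypeFallbackSketch::toyIdentify));
    }
}
```

Passing the identifier as a function keeps the sketch independent of any particular service API; in a real deployment the identification would be delegated to the configured MimeTypeIdentifier service.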
Configuration
Configuration File: configuration/org.eclipse.eilf.processing.pipelets.aperture/ConverterConfig.xml
Property | Type | Description |
---|---|---|
AttachmentContent | String | name of the attachment containing the original document content |
AttachmentText | String | name of the attachment to store the converted text in |
AttachmentMimeType | String | name of the attribute containing the MimeType of the original document content |
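Note that all three properties are required in the configuration file. Only the MimeType value stored in the attribute named by AttachmentMimeType is optional.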
Example
The following example is used in the SMILA example application to convert documents delivered by the Filesystem and Web crawlers to plain text.
ConverterConfig.xml
<PipeletConfiguration xmlns="http://www.eclipse.org/eilf/processor">
  <Property name="AttachmentContent">
    <Value>Content</Value>
  </Property>
  <Property name="AttachmentText">
    <Value>Text</Value>
  </Property>
  <Property name="AttachmentMimeType">
    <Value>MimeType</Value>
  </Property>
</PipeletConfiguration>
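Since all three properties must be present, it can be handy to check a ConverterConfig.xml before deploying it. The following is a minimal sketch using only the standard JDK XML parser. It is not how SMILA itself loads the configuration, and the class name `ConverterConfigCheck` and method `readProperties` are hypothetical; only the namespace, element names, and property names are taken from the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: parses a PipeletConfiguration and verifies required properties.
final class ConverterConfigCheck {
    static final String NS = "http://www.eclipse.org/eilf/processor";
    static final List<String> REQUIRED =
            List.of("AttachmentContent", "AttachmentText", "AttachmentMimeType");

    // Returns property name -> value; throws IllegalArgumentException if one is missing.
    static Map<String, String> readProperties(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the configuration uses a default namespace
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Configuration could not be parsed", e);
        }

        // Collect every <Property name="..."><Value>...</Value></Property> pair.
        Map<String, String> props = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagNameNS(NS, "Property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            NodeList values = property.getElementsByTagNameNS(NS, "Value");
            if (values.getLength() > 0) {
                props.put(property.getAttribute("name"),
                        values.item(0).getTextContent().trim());
            }
        }

        // Every required property must be present and non-empty.
        for (String name : REQUIRED) {
            String value = props.get(name);
            if (value == null || value.isEmpty()) {
                throw new IllegalArgumentException("Missing required property: " + name);
            }
        }
        return props;
    }
}
```

Calling `ConverterConfigCheck.readProperties(...)` with the contents of the example file returns the three name/value pairs; removing any `Property` element makes the call fail with an `IllegalArgumentException` naming the missing property.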