SMILA/Documentation/AperturePipelet
Revision as of 07:00, 12 August 2008 by Juergen.schumacher.empolis.com
Bundle: org.eclipse.eilf.processing.pipelets.aperture.AperturePipelet
Description
This Pipelet converts documents in various formats (such as PDF and XLS) to plain text using Aperture technology (see the Glossary entry for Aperture). It reads the original document content from the attachment named by AttachmentContent and stores the resulting plain text in the attachment named by AttachmentText. The MimeType of the content is optional: if the attribute named by AttachmentMimeType holds a value, that MimeType is used for the conversion. If no MimeType is provided, the Pipelet identifies it internally using a MimeTypeIdentifier service.
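The flow described above can be illustrated with a minimal Python sketch. It is not the Pipelet's actual implementation (which is a Java bundle). The record layout, `identify_mime_type`, `CONVERTERS`, and `convert` are all hypothetical names introduced only for illustration, and the PDF "converter" is a placeholder rather than a call into Aperture.

```python
from typing import Callable, Dict

# Hypothetical stand-ins: none of these names come from SMILA/EILF or Aperture.
Converter = Callable[[bytes], str]


def identify_mime_type(content: bytes) -> str:
    """Toy stand-in for a MimeTypeIdentifier service: sniffs magic bytes."""
    if content.startswith(b"%PDF"):
        return "application/pdf"
    return "text/plain"


CONVERTERS: Dict[str, Converter] = {
    "text/plain": lambda data: data.decode("utf-8", errors="replace"),
    # Placeholder only: a real converter would delegate to an extractor here.
    "application/pdf": lambda data: "<text extracted from PDF>",
}


def convert(record: dict, config: Dict[str, str]) -> None:
    """Read the content attachment, convert it, and store the plain text."""
    content = record["attachments"][config["AttachmentContent"]]
    # The MimeType attribute is optional; fall back to identification.
    mime_type = record["attributes"].get(config["AttachmentMimeType"])
    if not mime_type:
        mime_type = identify_mime_type(content)
    record["attachments"][config["AttachmentText"]] = CONVERTERS[mime_type](content)
```

In this sketch a MimeType supplied with the record always takes precedence, and identification only runs when the attribute is missing or empty, which mirrors the behaviour described in the paragraph above.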
Configuration
- configuration/org.eclipse.eilf.processing.pipelets.aperture/ConverterConfig.xml
Property | Type | Description |
---|---|---|
AttachmentContent | String | name of the attachment containing the original document content |
AttachmentText | String | name of the attachment to store the converted text in |
AttachmentMimeType | String | name of the attribute containing the MimeType of the original document content |
Example
The following configuration was used in the EILF example application to convert documents delivered by the Filesystem and Web crawlers to plain text.
ConverterConfig.xml
    <PipeletConfiguration xmlns="http://www.eclipse.org/eilf/processor">
      <Property name="AttachmentContent">
        <Value>Content</Value>
      </Property>
      <Property name="AttachmentText">
        <Value>Text</Value>
      </Property>
      <Property name="AttachmentMimeType">
        <Value>MimeType</Value>
      </Property>
    </PipeletConfiguration>
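To check which names a given ConverterConfig.xml assigns to the three properties, the file can be read with any namespace-aware XML parser. The following is a minimal sketch using Python's standard library. `read_properties` is a hypothetical helper written for this page, not part of EILF, and the XML is inlined as a string instead of being loaded from the configuration folder.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example configuration above.
NS = {"p": "http://www.eclipse.org/eilf/processor"}

CONFIG_XML = """\
<PipeletConfiguration xmlns="http://www.eclipse.org/eilf/processor">
  <Property name="AttachmentContent"><Value>Content</Value></Property>
  <Property name="AttachmentText"><Value>Text</Value></Property>
  <Property name="AttachmentMimeType"><Value>MimeType</Value></Property>
</PipeletConfiguration>
"""


def read_properties(xml_text: str) -> dict:
    """Hypothetical helper: map each Property name to its Value text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("p:Property", NS):
        value = prop.find("p:Value", NS)
        has_text = value is not None and value.text
        props[prop.get("name")] = value.text.strip() if has_text else None
    return props


if __name__ == "__main__":
    for name, value in read_properties(CONFIG_XML).items():
        print(f"{name} = {value}")
```

Because the configuration declares a default namespace, the element lookups must be namespace-qualified. An unqualified `findall("Property")` would return nothing.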