SMILA/Documentation/AperturePipelet
Revision as of 05:29, 13 November 2008
Bundle: org.eclipse.eilf.processing.pipelets.aperture.AperturePipelet
Description
This Pipelet converts various document formats (such as PDF and XLS) to plain text using Aperture technology. It reads the document's content from the attachment named by AttachmentContent and stores the plain-text result in the attachment named by AttachmentText. The MimeType of the content is optional: if present, it is read from the attribute named by AttachmentMimeType and used for the conversion. If no MimeType is provided, the Pipelet identifies it internally using a MimeTypeIdentifier service.
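The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class ConversionFlowSketch, the map-based record, and the two function parameters standing in for the MimeTypeIdentifier service and the Aperture converter are all hypothetical and are not SMILA or Aperture APIs. Skipping records without content is likewise an assumption, since the page does not say how that case is handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch of the conversion flow; none of these types are SMILA or Aperture APIs.
class ConversionFlowSketch {

    // Attachment and attribute names as configured in the example ConverterConfig.xml.
    static final String ATTACHMENT_CONTENT = "Content";
    static final String ATTACHMENT_TEXT = "Text";
    static final String ATTRIBUTE_MIME_TYPE = "MimeType";

    static void convert(Map<String, byte[]> attachments,
                        Map<String, String> attributes,
                        Function<byte[], String> mimeTypeIdentifier,         // stand-in for the MimeTypeIdentifier service
                        BiFunction<byte[], String, String> textConverter) {  // stand-in for the Aperture conversion
        byte[] content = attachments.get(ATTACHMENT_CONTENT);
        if (content == null) {
            return; // assumption: records without content are skipped
        }
        // Use the MimeType stored on the record if there is one ...
        String mimeType = attributes.get(ATTRIBUTE_MIME_TYPE);
        if (mimeType == null || mimeType.isEmpty()) {
            // ... otherwise fall back to identifying it from the content.
            mimeType = mimeTypeIdentifier.apply(content);
        }
        String text = textConverter.apply(content, mimeType);
        attachments.put(ATTACHMENT_TEXT, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Map<String, byte[]> attachments = new HashMap<>();
        Map<String, String> attributes = new HashMap<>(); // no MimeType set on this record
        attachments.put(ATTACHMENT_CONTENT, "some document bytes".getBytes(StandardCharsets.UTF_8));

        // Trivial stubs: always identify as text/plain, "convert" by decoding as UTF-8.
        convert(attachments, attributes,
                bytes -> "text/plain",
                (bytes, mime) -> new String(bytes, StandardCharsets.UTF_8));

        System.out.println(new String(attachments.get(ATTACHMENT_TEXT), StandardCharsets.UTF_8));
    }
}
```

The identifier is only consulted when the record carries no MimeType, which mirrors the fallback described in the paragraph above.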
Configuration
Configuration File: configuration/org.eclipse.eilf.processing.pipelets.aperture/ConverterConfig.xml
Property | Type | Description |
---|---|---|
AttachmentContent | String | name of the attachment containing the original document content |
AttachmentText | String | name of the attachment to store the converted text in |
AttachmentMimeType | String | name of the attribute containing the MimeType of the original document content |
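Note that all three properties are required. The MimeType value itself may be missing from a document (see Description), but the property naming its attribute must still be configured.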
Example
The following example was used in the SMILA example application to convert documents delivered by the Filesystem and Web crawlers to plain text.
ConverterConfig.xml
<PipeletConfiguration xmlns="http://www.eclipse.org/eilf/processor">
  <Property name="AttachmentContent">
    <Value>Content</Value>
  </Property>
  <Property name="AttachmentText">
    <Value>Text</Value>
  </Property>
  <Property name="AttachmentMimeType">
    <Value>MimeType</Value>
  </Property>
</PipeletConfiguration>
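To make the structure of this file concrete, the following sketch reads the Property/Value pairs with the standard Java DOM parser and checks that all three required properties are present. It is not how SMILA itself loads the configuration. The class ConverterConfigReader and its method are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of SMILA.
class ConverterConfigReader {

    static final String[] REQUIRED = {"AttachmentContent", "AttachmentText", "AttachmentMimeType"};

    // Parses a PipeletConfiguration document into a name -> value map.
    static Map<String, String> readProperties(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList properties = doc.getElementsByTagName("Property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                NodeList values = property.getElementsByTagName("Value");
                if (values.getLength() > 0) {
                    result.put(property.getAttribute("name"),
                               values.item(0).getTextContent().trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration", e);
        }
        // All three properties are required according to the documentation above.
        for (String name : REQUIRED) {
            if (!result.containsKey(name)) {
                throw new IllegalArgumentException("Missing required property: " + name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<PipeletConfiguration xmlns=\"http://www.eclipse.org/eilf/processor\">"
                + "<Property name=\"AttachmentContent\"><Value>Content</Value></Property>"
                + "<Property name=\"AttachmentText\"><Value>Text</Value></Property>"
                + "<Property name=\"AttachmentMimeType\"><Value>MimeType</Value></Property>"
                + "</PipeletConfiguration>";
        System.out.println(readProperties(xml));
    }
}
```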