Difference between revisions of "SMILA/Documentation/ApertureMimeTypeIdentifier"

 
=== Description ===
 
This ProcessingService identifies the MIME type of a document. The service uses the document's content (a byte[]), a file extension, or both, so the record does not need to contain values for both properties ''ContentAttachment'' and ''FileExtensionAttribute''. The identified MIME type is stored in an attribute of the record.

It is strongly recommended that you provide both (content and extension) to identify the MIME type of the data, since the Aperture MIME type identification mainly focuses on the magic numbers in the file and so often fails to determine, for example, the MIME types of office documents when no content is given.

For further information on the Aperture MIME type identification, please consult the appropriate [http://aperture.sourceforge.net/ Aperture] documentation pages (e.g. [http://sourceforge.net/apps/trac/aperture/wiki/MIMETypeIdentification MIMETypeIdentification]).

The javadoc for the implemented interface can be found [http://build.eclipse.org/rt/smila/javadoc/current/index.html?org/eclipse/smila/common/mimetype/package-summary.html here].
  
 
==== Useful Information ====
 
 
=== Configuration ===
 
  
* <tt>configuration/org.eclipse.smila.processing.pipelets.aperture/MimeTypeConfig.xml</tt>

For information on how to configure the MIME type identification pipelet, which accesses the MimeTypeIdentifier service, please refer to [[SMILA/Documentation/Bundle_org.eclipse.smila.processing.pipelets#Bundle:_org.eclipse.smila.processing.pipelets.MimeTypeIdentifyPipelet|MimeTypeIdentifyPipelet]].

{| border = 1
!Property!!Type!!Description
|-
|ContentAttachment||String||name of the attachment containing the document content
|-
|FileExtensionAttribute||String||name of the attribute containing the file extension
|-
|MimeTypeAttribute||String||name of the attribute in which the identified MIME type is stored
|}

Note that all properties are required and must be provided.

==== Example ====

The following example was used in the SMILA example application to identify the MIME types of documents delivered by the Filesystem and Web crawlers.

'''MimeTypeConfig.xml'''
<source lang="xml">
<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
  <Property name="ContentAttachment">
    <Value>Content</Value>
  </Property>
  <Property name="FileExtensionAttribute">
    <Value>FileExtension</Value>
  </Property>
  <Property name="MimeTypeAttribute">
    <Value>MimeType</Value>
  </Property>
</PipeletConfiguration>
</source>
 
[[Category:SMILA]] [[Category:SMILA/Processing Service]]
 

Revision as of 06:01, 16 September 2011

This component is not yet available in our repository. As soon as the new Aperture release is available we will submit appropriate CQs and hopefully get permission to use it in our project.

Bundle: org.eclipse.smila.processing.pipelets.aperture.ApertureMimeTypeIdentifier

Description

This ProcessingService identifies the MIME type of a document. The service uses the document's content (a byte[]), a file extension, or both, so the record does not need to contain values for both properties ContentAttachment and FileExtensionAttribute. The identified MIME type is stored in an attribute of the record.

It is strongly recommended that you provide both (content and extension) to identify the MIME type of the data, since the Aperture MIME type identification mainly focuses on the magic numbers in the file and so often fails to determine, for example, the MIME types of office documents when no content is given.
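
To illustrate why supplying both inputs helps, here is a minimal Java sketch of magic-number detection combined with an extension lookup. It is not Aperture's or SMILA's implementation: the class MimeTypeSketch, its methods, the small set of supported types, and the generic fallback type are all assumptions made for this example. It relies on one well-known fact, namely that legacy .doc, .xls and .ppt files share the same OLE2 container signature, so the content alone identifies only the container, while the extension alone is just a hint.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Illustrative sketch only; all names here are hypothetical, not Aperture or SMILA API. */
class MimeTypeSketch {

    // OLE2 compound file signature shared by legacy .doc, .xls and .ppt files.
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0};
    // "%PDF", the signature at the start of PDF files.
    private static final byte[] PDF = {0x25, 0x50, 0x44, 0x46};

    /** Returns true if the content begins with the given signature bytes. */
    static boolean startsWith(byte[] content, byte[] magic) {
        if (content == null || content.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (content[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Maps a legacy office extension to its MIME type, or returns null if unknown. */
    static String officeTypeFromExtension(String extension) {
        if (extension == null) {
            return null;
        }
        switch (extension.toLowerCase(Locale.ROOT)) {
            case "doc": return "application/msword";
            case "xls": return "application/vnd.ms-excel";
            case "ppt": return "application/vnd.ms-powerpoint";
            default: return null;
        }
    }

    /** Either argument may be null; returns null if neither input is conclusive. */
    static String identify(byte[] content, String extension) {
        if (startsWith(content, PDF)) {
            return "application/pdf"; // the signature alone is conclusive
        }
        String officeType = officeTypeFromExtension(extension);
        if (startsWith(content, OLE2)) {
            // The signature proves an OLE2 container; the extension picks the concrete type.
            // The generic fallback type below is a placeholder chosen for this sketch.
            return officeType != null ? officeType : "application/x-ole-storage";
        }
        if (extension != null && extension.equalsIgnoreCase("pdf")) {
            return "application/pdf"; // extension-only guess, unverified by content
        }
        return officeType; // extension-only guess, or null
    }

    public static void main(String[] args) {
        byte[] ole = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, 0, 0};
        byte[] pdf = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(identify(ole, "doc")); // content and extension
        System.out.println(identify(ole, null));  // content only
        System.out.println(identify(pdf, null));  // content only
        System.out.println(identify(null, "xls")); // extension only
    }
}
```

In this sketch, only the call that passes both the OLE2 content and the extension yields a concrete office type that is also backed by the file's signature; the content-only call stops at the container type, and the extension-only call returns an unverified guess.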

For further information on the Aperture MIME type identification, please consult the appropriate Aperture documentation pages (e.g. MIMETypeIdentification).

The javadoc for the implemented interface can be found here.

Useful Information

Note that this ProcessingService is also a DeclarativeService that implements the interface org.eclipse.smila.processing.pipelets.aperture.MimeTypeIdentifier, so it can be used outside the workflow as well.

Configuration

For information on how to configure the MIME type identification pipelet, which accesses the MimeTypeIdentifier service, please refer to MimeTypeIdentifyPipelet.
