
Difference between revisions of "SMILA/Documentation/ApertureMimeTypeIdentifier"

 
(9 intermediate revisions by 4 users not shown)
Line 1: Line 1:
<span style="color:#ff0000"> '''THIS BUNDLE IS NOT AVAILABLE, YET !!!'''</span>
+
<span style="color:#ff0000">'''This component is not available as we have switched from Aperture to Tika.'''</span>
  
== Bundle: <tt>org.eclipse.smila.processing.pipelets.aperture.ApertureMimeTypeIdentifier</tt> ==
+
== Class: <tt>org.eclipse.smila.aperture.ApertureMimeTypeIdentifier</tt> ==  
  
=== Description ===
+
Located in bundle: <tt>org.eclipse.smila.aperture</tt>
This ProcessingService identifies the MIME type of a document. The service uses the document's content (a byte[]), a file extension, or both, so the record does not need to contain a value for both properties ''ContentAttachment'' and ''FileExtensionAttribute''. The identified MimeType is stored in an attribute of the record.
+
  
==== Useful Information ====
+
=== Description ===
  
Note that this ProcessingService is also a DeclarativeService that implements the interface <tt>org.eclipse.smila.processing.pipelets.aperture.MimeTypeIdentifier</tt> and can be used outside the workflow as well.
+
This service implements the [http://build.eclipse.org/rt/smila/javadoc/current/index.html?org/eclipse/smila/common/mimetype/package-summary.html MimeTypeIdentifier] interface using the "magic" identification of MIME types in [http://aperture.sourceforge.net/index.html Aperture]. The service uses the document's content (a byte[]), a file extension, or both. For best results, provide both the content and the extension: Aperture's MIME type identification relies mainly on the magic numbers in the file, so it often fails to determine the MIME type of, for example, office documents when no content is given.
  
=== Configuration ===
+
For further information on the MIME type extraction in Aperture please consult the respective documentation pages (e.g. [http://sourceforge.net/apps/trac/aperture/wiki/MIMETypeIdentification MIMETypeIdentification]).
  
* <tt>configuration/org.eclipse.smila.processing.pipelets.aperture/MimeTypeConfig.xml</tt>
+
The JavaDoc of the implemented interface can be found [http://build.eclipse.org/rt/smila/javadoc/current/index.html?org/eclipse/smila/common/mimetype/package-summary.html here].
  
{| border = 1
+
To enable the service, start the bundle <tt>org.eclipse.smila.aperture</tt> and get an OSGi service reference for the interface <tt>org.eclipse.smila.common.mimetype.MimeTypeIdentifier</tt>. Take care not to start the <tt>org.eclipse.smila.common.mimetype.impl</tt> bundle, so that the Aperture-based implementation is used and not the simplistic one that SMILA provides as a fallback. We have set the service rankings of those services such that the Aperture implementation should be preferred if both are running, but it's always better to be sure what happens in your system ;-)
!Property!!Type!!Description
+
|-
+
|ContentAttachment||String||name of the attachment containing the document content
+
|-
+
|FileExtensionAttribute||String||name of the attribute containing the file extension
+
|-
+
|MimeTypeAttribute||String||name of the attribute to store the identified MimeType in
+
Note that all properties are required and must be provided.
+
|}
+
  
==== Example ====
+
==== Interaction with the MimeTypeIdentifyPipelet ====
  
The following example was used in the SMILA example application to identify MimeTypes of documents delivered by Filesystem- and WebCrawler.
+
When the Aperture-based <tt>MimeTypeIdentifier</tt> is started, the <tt>org.eclipse.smila.processing.pipelets.MimeTypeIdentifyPipelet</tt> uses it automatically (unless another MimeTypeIdentifier service with an even higher service ranking is active, of course).
  
'''MimeTypeConfig.xml'''
+
For information on how to configure the MIME type identification pipelet, which uses the <tt>MimeTypeIdentifier</tt> service to recognize the MIME types of attachments, please refer to [[SMILA/Documentation/Bundle_org.eclipse.smila.processing.pipelets#Bundle:_org.eclipse.smila.processing.pipelets.MimeTypeIdentifyPipelet|MimeTypeIdentifyPipelet]].
<source lang="xml">
+
<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
+
  <Property name="ContentAttachment">
+
    <Value>Content</Value>
+
  </Property>
+
  <Property name="FileExtensionAttribute">
+
    <Value>FileExtension</Value>
+
  </Property> 
+
  <Property name="MimeTypeAttribute">
+
    <Value>MimeType</Value>
+
  </Property>   
+
</PipeletConfiguration>
+
</source>
+
  
 
[[Category:SMILA]] [[Category:SMILA/Processing Service]]

Latest revision as of 06:57, 11 January 2013

This component is not available as we have switched from Aperture to Tika.

Class: org.eclipse.smila.aperture.ApertureMimeTypeIdentifier

Located in bundle: org.eclipse.smila.aperture

Description

This service implements the MimeTypeIdentifier interface using the "magic" identification of MIME types in Aperture. The service uses the document's content (a byte[]), a file extension, or both. For best results, provide both the content and the extension: Aperture's MIME type identification relies mainly on the magic numbers in the file, so it often fails to determine the MIME type of, for example, office documents when no content is given.
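
To illustrate why combining content and extension helps, here is a minimal Java sketch of magic-number identification with an extension fallback. It is not Aperture's or SMILA's code: the class MimeTypeSketch, its methods, and the tiny signature and extension tables are all hypothetical and only cover a handful of formats.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the Aperture or SMILA implementation.
final class MimeTypeSketch {

  // Small illustrative extension table (an assumption, far from complete).
  private static final Map<String, String> BY_EXTENSION = Map.of(
      "pdf", "application/pdf",
      "png", "image/png",
      "txt", "text/plain",
      "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");

  private MimeTypeSketch() {}

  // True if data is non-null and begins with the given magic bytes.
  private static boolean startsWith(byte[] data, byte[] magic) {
    if (data == null || data.length < magic.length) {
      return false;
    }
    for (int i = 0; i < magic.length; i++) {
      if (data[i] != magic[i]) {
        return false;
      }
    }
    return true;
  }

  // Content-only identification via leading magic numbers; null if unknown.
  static String byContent(byte[] data) {
    if (startsWith(data, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
      return "application/pdf";
    }
    if (startsWith(data, new byte[] {(byte) 0x89, 'P', 'N', 'G'})) {
      return "image/png";
    }
    if (startsWith(data, new byte[] {'P', 'K', 3, 4})) {
      return "application/zip";
    }
    return null;
  }

  // Extension-only identification (case-insensitive); null if unknown.
  static String byExtension(String extension) {
    if (extension == null) {
      return null;
    }
    return BY_EXTENSION.get(extension.toLowerCase(Locale.ROOT));
  }

  // Combined identification using whichever inputs are available.
  static String identify(byte[] data, String extension) {
    String fromContent = byContent(data);
    String fromExtension = byExtension(extension);
    // A ZIP signature only identifies the container, so a known
    // extension takes precedence in that case.
    if ("application/zip".equals(fromContent) && fromExtension != null) {
      return fromExtension;
    }
    return fromContent != null ? fromContent : fromExtension;
  }
}
```

In this sketch, a .docx file is recognized only as a generic ZIP container from its content alone, and only through the table lookup from its extension alone; passing both lets the two signals complement each other.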

For further information on the MIME type extraction in Aperture please consult the respective documentation pages (e.g. MIMETypeIdentification).

The JavaDoc of the implemented interface can be found here.

To enable the service, start the bundle org.eclipse.smila.aperture and get an OSGi service reference for the interface org.eclipse.smila.common.mimetype.MimeTypeIdentifier. Take care not to start the org.eclipse.smila.common.mimetype.impl bundle, so that the Aperture-based implementation is used and not the simplistic one that SMILA provides as a fallback. We have set the service rankings of those services such that the Aperture implementation should be preferred if both are running, but it's always better to be sure what happens in your system ;-)
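
The effect of service rankings can be sketched in plain Java. In OSGi, when several services are registered for the same interface, the default lookup prefers the highest service.ranking and breaks ties by the lowest service.id. The sketch below models only that selection rule; RankingSketch, Candidate, and any ranking values passed to it are hypothetical and are not SMILA's actual settings.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of OSGi's default service selection rule.
final class RankingSketch {

  // Stand-in for a registered service: name, service.ranking, service.id.
  static final class Candidate {
    final String name;
    final int ranking;
    final long serviceId;

    Candidate(String name, int ranking, long serviceId) {
      this.name = name;
      this.ranking = ranking;
      this.serviceId = serviceId;
    }
  }

  private RankingSketch() {}

  // Highest ranking wins; among equal rankings the lowest service id wins.
  static Optional<Candidate> preferred(List<Candidate> candidates) {
    Comparator<Candidate> byRanking =
        Comparator.comparingInt((Candidate c) -> c.ranking);
    Comparator<Candidate> lowestIdFirst =
        Comparator.comparingLong((Candidate c) -> c.serviceId).reversed();
    return candidates.stream().max(byRanking.thenComparing(lowestIdFirst));
  }
}
```

With a fallback candidate at ranking 0 and an Aperture-based candidate at a higher ranking, preferred returns the Aperture-based one regardless of start order, which is the behavior the paragraph above relies on when both bundles are running.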

Interaction with the MimeTypeIdentifyPipelet

When the Aperture-based MimeTypeIdentifier is started, the org.eclipse.smila.processing.pipelets.MimeTypeIdentifyPipelet uses it automatically (unless another MimeTypeIdentifier service with an even higher service ranking is active, of course).

For information on how to configure the MIME type identification pipelet, which uses the MimeTypeIdentifier service to recognize the MIME types of attachments, please refer to MimeTypeIdentifyPipelet.