Difference between revisions of "SMILA/Documentation/Importing/CompoundExtractorService"

Revision as of 08:17, 20 February 2012

CompoundExtractor Service

Interface: org.eclipse.smila.importing.CompoundExtractor

A CompoundExtractor service provides two kinds of methods:

check if an object's filename, URL or mimetype identifies it as a compound object that can be extracted by the service.
extract the compound: given an InputStream with the compound content, produce records for the elements.

The element records can contain the following attributes:

fileName: the complete name of the entry in the compound object, usually something like a filesystem path
isCompound: true, if the element is a supported compound object itself.
size: uncompressed size of the element
time: last modification timestamp, as a datetime value.
compounds: a sequence of the compound files to look into to reach this element. For example, if the compound /data/compound.zip contains a file archived/subcompound.zip which contains a file x.html, the compounds list for x.html would be:
```
[/data/compound.zip, archived/subcompound.zip]
```
compressedSize: compressed size of the element
comment: a comment for the element in the compound (if supported by the compound type)
isRootCompound: set to true if the record describes the processed compound object itself.
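
To illustrate how the compounds attribute grows with each nesting level, here is a minimal sketch. It uses plain Java maps and lists rather than real SMILA record classes. The class name ElementRecordSketch and both helper methods are hypothetical and not part of the SMILA API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain maps stand in for SMILA records.
public class ElementRecordSketch {

  /** Compounds list for elements inside a compound: the compound's own list plus its own name. */
  public static List<String> compoundsForChildren(List<String> compoundsOfParent, String parentFileName) {
    List<String> result = new ArrayList<>(compoundsOfParent);
    result.add(parentFileName);
    return result;
  }

  /** Builds a map holding a few of the attributes described above. */
  public static Map<String, Object> elementRecord(String fileName, boolean isCompound, List<String> compounds) {
    Map<String, Object> record = new LinkedHashMap<>();
    record.put("fileName", fileName);
    record.put("isCompound", isCompound);
    record.put("compounds", compounds);
    return record;
  }

  public static void main(String[] args) {
    // Elements directly inside /data/compound.zip
    List<String> insideRoot = compoundsForChildren(new ArrayList<>(), "/data/compound.zip");
    // Elements inside the nested archived/subcompound.zip
    List<String> insideSub = compoundsForChildren(insideRoot, "archived/subcompound.zip");

    System.out.println(elementRecord("archived/subcompound.zip", true, insideRoot));
    System.out.println(elementRecord("x.html", false, insideSub));
  }
}
```

In this sketch the record for x.html carries both compound names in order, outermost first, which matches the example list given for the compounds attribute.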

SimpleCompoundExtractorService

Bundle: org.eclipse.smila.importing.compounds.simple

This extractor service uses the classes provided by the JDK's java.util.zip package to extract compound objects. This means that it can currently support ZIP files and GZ files (not TAR.GZ, though).

Supported Mimetypes:

application/zip
application/x-gunzip
application/x-gzip

If the caller provides no mimetype at all, or only application/octet-stream, the service uses the current MimeType Identifier service to recognize the real mimetype from the filename extension.
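
The check described above could look roughly like the following sketch. It is not the actual SMILA implementation: the class MimeTypeCheckSketch, its method names, and the small extension table that stands in for the MimeType Identifier service are all assumptions made for illustration. It requires Java 9 or later for Set.of and Map.of.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: not the SMILA service implementation.
public class MimeTypeCheckSketch {

  // The three mimetypes listed as supported above.
  private static final Set<String> SUPPORTED =
      Set.of("application/zip", "application/x-gunzip", "application/x-gzip");

  // Assumption: a tiny lookup table standing in for the MimeType Identifier service.
  private static final Map<String, String> BY_EXTENSION =
      Map.of("zip", "application/zip", "gz", "application/x-gzip");

  /** Returns the given mimetype, or one guessed from the extension if it is missing or generic. */
  public static String effectiveMimeType(String fileName, String mimeType) {
    boolean usable = mimeType != null && !mimeType.isEmpty()
        && !"application/octet-stream".equals(mimeType);
    if (usable) {
      return mimeType;
    }
    if (fileName == null) {
      return null;
    }
    int dot = fileName.lastIndexOf('.');
    if (dot < 0) {
      return null;
    }
    // Note: this stand-in does nothing special for names ending in .tar.gz.
    return BY_EXTENSION.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
  }

  /** True if the effective mimetype is one of the supported compound types. */
  public static boolean canExtract(String fileName, String mimeType) {
    String effective = effectiveMimeType(fileName, mimeType);
    return effective != null && SUPPORTED.contains(effective);
  }

  public static void main(String[] args) {
    System.out.println(canExtract("archive.zip", null));
    System.out.println(canExtract("archive.zip", "application/octet-stream"));
    System.out.println(canExtract("page.html", "text/html"));
  }
}
```

A specific mimetype passed by the caller takes precedence in this sketch. The filename extension is consulted only in the two fallback cases named in the text.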

For ZIP files, it creates one record for the ZIP file itself and one record for each contained element.
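
As a rough sketch of this behavior, the following code walks a ZIP stream with java.util.zip.ZipInputStream and builds one map for the ZIP file itself plus one per contained file. The class ZipRecordsSketch and its methods are hypothetical. Plain maps stand in for SMILA records, and skipping directory entries is an assumption, not documented service behavior.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only: plain maps stand in for SMILA records.
public class ZipRecordsSketch {

  /** Returns one record for the ZIP itself, followed by one record per contained file. */
  public static List<Map<String, Object>> extract(String zipName, InputStream content) {
    List<Map<String, Object>> records = new ArrayList<>();
    Map<String, Object> root = new LinkedHashMap<>();
    root.put("fileName", zipName);
    root.put("isRootCompound", true);
    records.add(root);

    try (ZipInputStream zip = new ZipInputStream(content)) {
      byte[] buffer = new byte[8192];
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        if (entry.isDirectory()) {
          continue; // assumption: directories get no record of their own
        }
        // Count the uncompressed bytes by reading the entry to its end.
        long size = 0;
        int n;
        while ((n = zip.read(buffer)) != -1) {
          size += n;
        }
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("fileName", entry.getName());
        record.put("size", size);
        record.put("time", entry.getTime()); // milliseconds since the epoch, or -1 if unknown
        record.put("compounds", List.of(zipName));
        records.add(record);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return records;
  }

  /** Builds a small in-memory ZIP with two files, for demonstration. */
  public static byte[] sampleZip() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream out = new ZipOutputStream(bytes)) {
      out.putNextEntry(new ZipEntry("docs/a.txt"));
      out.write("hello".getBytes(StandardCharsets.UTF_8));
      out.closeEntry();
      out.putNextEntry(new ZipEntry("b.txt"));
      out.write("hi".getBytes(StandardCharsets.UTF_8));
      out.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    for (Map<String, Object> r : extract("/data/compound.zip", new ByteArrayInputStream(sampleZip()))) {
      System.out.println(r);
    }
  }
}
```

The sketch counts bytes itself because a ZipEntry read from a stream may not report its size until the entry data has been consumed.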

For GZ files, it creates one record that keeps the original filename of the GZ file but contains the uncompressed content.
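
The GZ case can be sketched in the same way with java.util.zip.GZIPInputStream. Again, this is only an illustration under assumptions: the class GzipRecordSketch, its methods, and the "content" key are hypothetical, and a plain map stands in for a SMILA record. It requires Java 9 or later for InputStream.readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only: a plain map stands in for a SMILA record.
public class GzipRecordSketch {

  /** Returns a single record: the GZ file's own name, but the uncompressed bytes. */
  public static Map<String, Object> extract(String gzName, InputStream content) {
    try (GZIPInputStream gz = new GZIPInputStream(content)) {
      byte[] data = gz.readAllBytes();
      Map<String, Object> record = new LinkedHashMap<>();
      record.put("fileName", gzName);          // the filename stays that of the GZ file
      record.put("size", (long) data.length);  // uncompressed size
      record.put("content", data);             // assumption: key name chosen for this sketch
      return record;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Compresses the given bytes with GZIP, for demonstration. */
  public static byte[] gzip(byte[] plain) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(plain);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    byte[] compressed = gzip("hello world".getBytes(StandardCharsets.UTF_8));
    Map<String, Object> record = extract("notes.txt.gz", new ByteArrayInputStream(compressed));
    System.out.println(record.get("fileName") + " -> " + record.get("size") + " bytes");
  }
}
```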

@@ Line 12: / Line 12: @@
 * <tt>size</tt>: uncompressed size of the element
 * <tt>time</tt>: last modification timestamp, as a datetime value.
-* <tt>compounds</tt>: a sequence of the compound files to look into to reach this element. For example, if the compound <tt>/data/compound.zip</tt> contains a file <tt>archived/subcompound.zip</tt> which contain a file <tt>x.html</tt>, the <tt>compounds</tt> list for <tt>x.html</tt> would be <pre>[/data/compound.zip, archived/subcompound.zip]</pre>.
+* <tt>compounds</tt>: a sequence of the compound files to look into to reach this element. For example, if the compound <tt>/data/compound.zip</tt> contains a file <tt>archived/subcompound.zip</tt> which contain a file <tt>x.html</tt>, the <tt>compounds</tt> list for <tt>x.html</tt> would be: <pre>[/data/compound.zip, archived/subcompound.zip]</pre>
 * <tt>compressedSize</tt>: compressed size of the element
 * <tt>comment</tt>: a comment for the element in the compound (if supported by the compound type)
