A CompoundExtractor service provides two kinds of methods:
- check whether an object's filename, URL or mimetype identifies it as a compound object that can be extracted by the service.
- extract the compound: given an InputStream with the compound content, produce records for the elements.
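The two kinds of methods can be pictured with a minimal sketch. It is an illustration under assumptions, not the service's actual API: the interface SimpleCompoundExtractor, its method names, the ZipOnlyExtractor class and the sampleZip helper are all hypothetical, only ZIP is handled, the URL check is left out, and plain entry names stand in for full element records.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class CompoundExtractorSketch {

    /** Hypothetical interface: all names here are assumptions made for illustration. */
    interface SimpleCompoundExtractor {
        // First kind of method: can this object be extracted?
        boolean canExtractByFileName(String fileName);
        boolean canExtractByMimeType(String mimeType);
        // Second kind of method: read the compound and describe its elements.
        List<String> extractEntryNames(InputStream compoundContent);
    }

    /** Hypothetical implementation that only understands ZIP archives. */
    static class ZipOnlyExtractor implements SimpleCompoundExtractor {
        @Override
        public boolean canExtractByFileName(String fileName) {
            return fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".zip");
        }

        @Override
        public boolean canExtractByMimeType(String mimeType) {
            return "application/zip".equalsIgnoreCase(mimeType);
        }

        @Override
        public List<String> extractEntryNames(InputStream compoundContent) {
            List<String> names = new ArrayList<>();
            try (ZipInputStream zip = new ZipInputStream(compoundContent)) {
                ZipEntry entry;
                while ((entry = zip.getNextEntry()) != null) {
                    if (!entry.isDirectory()) {
                        names.add(entry.getName()); // complete name of the entry in the archive
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return names;
        }
    }

    /** Builds a small in-memory ZIP so the sketch runs without any files on disk. */
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"archived/readme.txt", "x.html"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("content".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        SimpleCompoundExtractor extractor = new ZipOnlyExtractor();
        if (extractor.canExtractByFileName("/data/compound.zip")) {
            System.out.println(extractor.extractEntryNames(new ByteArrayInputStream(sampleZip())));
        }
    }
}
```

A caller would normally run one of the checks first and only hand the stream to the extraction method when the check succeeds, as main does here.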
The element records can contain the following attributes:
- fileName: the complete name of the entry in the compound object, usually something like a filesystem path
- isCompound: true, if the element is a supported compound object itself.
- size: uncompressed size of the element
- time: last modification timestamp, as a datetime value.
- compounds: a sequence of the compound files to look into to reach this element. For example, if the compound /data/compound.zip contains a file archived/subcompound.zip which contains a file x.html, the compounds list for x.html would be