...
For some content types, deep detection is performed: nested files are scanned as well (for example, files inside archives or office documents). The container types that are scanned this way are listed at https://tika.apache.org/1.22/formats.html.
A simple way to find out the MIME type of a file is to send it through the SOFiE application and check which MIME type was detected for it.
Alternatively, look it up in the list of supported MIME types mentioned above, where, for example, the *.avi file type has the following definition:
    <mime-type type="video/x-msvideo">
      <_comment>Audio Video Interleave File</_comment>
      <alias type="video/avi"/>
      <alias type="video/msvideo"/>
      <magic priority="50">
        <match value="RIFF....AVI " type="string" offset="0"
               mask="0xFFFFFFFF00000000FFFFFFFF"/>
        <match offset="8" type="string" value="\x41\x56\x49\x20"/>
      </magic>
      <glob pattern="*.avi"/>
    </mime-type>

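The MIME type of such a file is therefore video/x-msvideo, taken from the type attribute of the mime-type element. The alias entries (video/avi, video/msvideo) are alternative names for the same type.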
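To make the definition easier to read, here is a minimal Python sketch of how its parts could be interpreted: the glob pattern matches on the file name, and the first magic rule matches on the leading bytes, with the mask marking bytes 4 to 7 (the RIFF chunk size) as ignored. This is an illustration only, not how SOFiE or Tika implements detection. The helper names `parse_definition`, `matches_masked_magic` and `guess_mime_type` are hypothetical. The second match rule and the priority attribute are left out for brevity.

```python
import fnmatch
import xml.etree.ElementTree as ET

# The *.avi definition quoted above, copied from the list of supported MIME types.
AVI_DEFINITION = r"""
<mime-type type="video/x-msvideo">
  <_comment>Audio Video Interleave File</_comment>
  <alias type="video/avi"/>
  <alias type="video/msvideo"/>
  <magic priority="50">
    <match value="RIFF....AVI " type="string" offset="0"
           mask="0xFFFFFFFF00000000FFFFFFFF"/>
    <match offset="8" type="string" value="\x41\x56\x49\x20"/>
  </magic>
  <glob pattern="*.avi"/>
</mime-type>
"""


def parse_definition(xml_text):
    """Extract the canonical type, its aliases and its glob patterns."""
    root = ET.fromstring(xml_text)
    return {
        "type": root.get("type"),
        "aliases": [alias.get("type") for alias in root.findall("alias")],
        "globs": [glob.get("pattern") for glob in root.findall("glob")],
    }


def matches_masked_magic(data, value, mask, offset=0):
    """Compare data with value byte by byte, looking only at bits set in mask."""
    window = data[offset:offset + len(value)]
    if len(window) < len(value):
        return False  # not enough bytes to decide
    return all((d & m) == (v & m) for d, v, m in zip(window, value, mask))


definition = parse_definition(AVI_DEFINITION)

# First <match> rule: "RIFF", four ignored bytes (mask 00), then "AVI ".
MAGIC_VALUE = b"RIFF....AVI "
MAGIC_MASK = bytes.fromhex("FFFFFFFF00000000FFFFFFFF")


def guess_mime_type(file_name, head):
    """Return the canonical type if the name or the leading bytes match, else None."""
    by_name = any(fnmatch.fnmatch(file_name.lower(), g) for g in definition["globs"])
    by_magic = matches_masked_magic(head, MAGIC_VALUE, MAGIC_MASK)
    return definition["type"] if (by_name or by_magic) else None
```

For example, `guess_mime_type("clip.bin", b"RIFF\x10\x00\x00\x00AVI LIST")` matches on the magic bytes even though the extension is unknown, which is why content-based detection can identify renamed files. A real detector evaluates many definitions and their priorities, so the SOFiE application remains the authoritative way to check what is detected for a given file.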
Configuration example
...