
MIME types detection settings

The MIME settings dialog sets the parameters of the MIME detection engine. Besides blacklisting or whitelisting selected MIME types, the engine also refines the MIME type of every uploaded file, because the original MIME type set by the browser may not be reliable. If maximum accuracy of file MIME types is desired, the engine should therefore be enabled even when no blocking is configured.

The following parameters can be set:

Maximum file size limit

Specifies the maximum size of a file that the engine checks. The check is done in memory, and for archives and other containers the whole contents are extracted into memory. We therefore recommend not setting this value higher than approximately 10% of the server’s memory.
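As a rough illustration of the 10% guideline, the following Python sketch derives an upper bound for the limit from the server's total memory. The function name and the percent parameter are hypothetical and not part of SOFiE; the result is only a starting point for choosing the value.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def recommended_max_file_size(total_memory_bytes: int, percent: int = 10) -> int:
    """Return an upper bound for the file size limit, in bytes.

    Hypothetical helper: applies the "approximately 10% of server memory"
    guideline using integer arithmetic.
    """
    if total_memory_bytes <= 0:
        raise ValueError("total_memory_bytes must be positive")
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    return total_memory_bytes * percent // 100


# Example: a server with 16 GiB of memory.
limit = recommended_max_file_size(16 * GIB)
print(f"Suggested upper bound: {limit / GIB:.1f} GiB")
```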

Treat oversize as clean

If this is enabled and the check is skipped because the file is too big, the file is considered “clean” (as if the check had been performed and found no problems). If this is disabled, such a file is considered “unclean” (as if the check had been performed and found a problem) and the whole package is quarantined.
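The interaction between the size limit and this setting can be sketched in Python as follows. This is only an illustration of the rule described above, not SOFiE's implementation; the function and parameter names are hypothetical, and treating a file exactly at the limit as still checkable is an assumption.

```python
from typing import Callable


def verdict_for_file(
    size_bytes: int,
    max_size_bytes: int,
    treat_oversize_as_clean: bool,
    run_check: Callable[[], bool],
) -> str:
    """Return "clean" or "unclean" for one file.

    run_check is a hypothetical callback that performs the MIME check and
    returns True when no problem was found.
    """
    if size_bytes > max_size_bytes:
        # The check is skipped; the setting alone decides the verdict.
        return "clean" if treat_oversize_as_clean else "unclean"
    return "clean" if run_check() else "unclean"


# An oversize file with the option disabled is treated as unclean,
# which causes the whole package to be quarantined.
print(verdict_for_file(200, 100, False, lambda: True))
```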

Maximum number of stored records of detected objects

Some containers (typically archives such as zip) may contain a very large number of embedded files or objects, so the list of all found MIME types can be very long. This setting caps the number of records of found MIME types stored for one file. It was introduced to prevent problems with displaying all found MIME types and to avoid overfilling the database. The limit does not affect the detection of selected MIME types: all detected MIME types are searched, not just those that fit within the limit and are saved.
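The point that the limit affects only storage, not detection, can be illustrated with a short Python sketch. The names are hypothetical, and full-string regular expression matching is an assumption about how selected MIME types are compared.

```python
import re
from typing import List, Sequence, Tuple


def scan_container(
    detected_types: Sequence[str],
    selected_patterns: Sequence[str],
    max_stored_records: int,
) -> Tuple[bool, List[str]]:
    """Return (matched, stored_records) for one container file."""
    compiled = [re.compile(p) for p in selected_patterns]
    # Matching looks at every detected type, regardless of the limit.
    matched = any(p.fullmatch(t) for t in detected_types for p in compiled)
    # Only the stored list is truncated.
    stored = list(detected_types[:max_stored_records])
    return matched, stored


detected = ["text/plain"] * 5 + ["video/x-msvideo"]
matched, stored = scan_container(detected, ["video/.*"], max_stored_records=3)
print(matched, len(stored))
```

Here the video entry lies beyond the three stored records, yet it is still found by the match.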

Detection mode

If set to blacklist mode, packages containing files that match the selected MIME types are quarantined. If set to whitelist mode, all packages containing files that do not match any of the selected MIME types are quarantined.

Selected file types

Sets the MIME types that files are checked against in whitelist or blacklist mode, one MIME type per line. The list may be empty; in blacklist mode this means that no packages are quarantined, but detection is still performed and the refined MIME type is set for each checked file.

MIME types may be entered as regular expressions. For example, “video/.*” is a valid specification of a group of MIME types and should cover all known video formats.
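The combined effect of the detection mode and the selected file types can be sketched in Python as below. This is a minimal illustration under stated assumptions, not SOFiE's actual code: the function name is hypothetical, and it assumes each pattern must match the whole MIME type string.

```python
import re
from typing import Iterable


def is_quarantined(file_types: Iterable[str], selected: Iterable[str], mode: str) -> bool:
    """Decide whether a package is quarantined.

    file_types: detected MIME types of the files in the package.
    selected:   selected MIME types, one regular expression per entry.
    mode:       "blacklist" or "whitelist".
    """
    patterns = [re.compile(p) for p in selected]

    def is_selected(mime_type: str) -> bool:
        return any(p.fullmatch(mime_type) for p in patterns)

    if mode == "blacklist":
        # Quarantine when any file matches a selected type.
        return any(is_selected(t) for t in file_types)
    if mode == "whitelist":
        # Quarantine when any file matches none of the selected types.
        return any(not is_selected(t) for t in file_types)
    raise ValueError(f"unknown mode: {mode!r}")


print(is_quarantined(["video/x-msvideo"], ["video/.*"], "blacklist"))
print(is_quarantined(["text/plain"], [], "blacklist"))
```

With an empty list in blacklist mode nothing matches, so no package is quarantined, which is the behavior described above.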

Supported MIME types are listed here: https://raw.githubusercontent.com/apache/tika/master/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

For some types of content, deep detection is performed, scanning even nested files (for example files in archives, office documents, etc.). The types of these deeply scanned containers are listed here: https://tika.apache.org/1.22/formats.html.

A simple way to discover the MIME type of a file is to send it through the SOFiE application and look at the MIME type detected for it.

Alternatively, it can be looked up in the list of supported MIME types mentioned above, where for example the *.avi file type has the following specification:

<mime-type type="video/x-msvideo">
  <_comment>Audio Video Interleave File</_comment>
  <alias type="video/avi"/>
  <alias type="video/msvideo"/>
  <magic priority="50">
    <match value="RIFF....AVI " type="string" offset="0" mask="0xFFFFFFFF00000000FFFFFFFF"/>
    <match offset="8" type="string" value="\x41\x56\x49\x20"/>
  </magic>
  <glob pattern="*.avi"/>
</mime-type>

The MIME type of such a file is therefore video/x-msvideo.
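The same lookup can be done programmatically. The following Python sketch searches an abbreviated copy of the entry above (the magic element is omitted, and the mime-info wrapper element is added for the example) for a glob pattern that matches a file name. The function name is hypothetical, and a lookup by file name alone is a simplification: it ignores the content-based magic rules.

```python
import fnmatch
import xml.etree.ElementTree as ET
from typing import Optional

# Abbreviated copy of the *.avi entry, wrapped in a root element.
MIME_XML = """<mime-info>
  <mime-type type="video/x-msvideo">
    <_comment>Audio Video Interleave File</_comment>
    <alias type="video/avi"/>
    <alias type="video/msvideo"/>
    <glob pattern="*.avi"/>
  </mime-type>
</mime-info>"""


def mime_type_for(filename: str, xml_text: str) -> Optional[str]:
    """Return the type of the first entry whose glob matches filename."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("mime-type"):
        for glob in entry.findall("glob"):
            pattern = glob.get("pattern", "")
            # Compare case-insensitively on every platform.
            if fnmatch.fnmatchcase(filename.lower(), pattern.lower()):
                return entry.get("type")
    return None


print(mime_type_for("holiday.avi", MIME_XML))
```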

Configuration example