Mime

Name

Mime -- defines an external parser for a given mime type

indexer.conf

Synopsis

Mime {from_mime} {to_mime} {command line} [content]

Description

This command adds support for parsing documents with mime types other than text/plain, text/html or text/xml. This can be done either with an external parser (which must produce plain text, HTML or XML output) or simply by substituting the mime type with one that indexer understands.

from_mime and to_mime are standard mime types. to_mime must be text/plain, text/html or text/xml.

to_mime can include a charset part. If the charset part is omitted, the parser output is assumed to be in LocalCharset.

By default, when executing a parser, indexer sends the data to the parser's stdin and reads the results from its stdout.
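The stdin/stdout convention means that any filter-style program can serve as a parser. The following is a minimal sketch of such a filter, not a parser shipped with indexer: the function name toy_parser is hypothetical, and it only strips carriage returns so that its output is plain text with LF line endings.

```shell
#!/bin/sh
# Hypothetical toy parser: reads the document from stdin and
# writes plain text to stdout, as the default convention requires.
toy_parser() {
    # Drop carriage returns so CRLF line endings become LF.
    tr -d '\r'
}

# Manual check with a two-line CRLF sample.
printf 'line one\r\nline two\r\n' | toy_parser
```

Saved as an executable script (with the function body as the script itself), a filter like this could be named in the "command line" parameter of a Mime command with text/plain as to_mime.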

Some parsers cannot operate on stdin and need a file. The command line may contain a $1 parameter, which stands for a temporary file name. If $1 is specified, indexer creates a temporary file before executing the parser and substitutes its name in place of $1.
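To try a file-based command line by hand before putting it into indexer.conf, the $1 convention can be emulated in a shell. This is only a sketch and not indexer's actual implementation: the helper name run_with_tempfile is hypothetical, and it assumes that the document content is what gets written to the temporary file.

```shell
#!/bin/sh
# Hand-run emulation of the $1 convention; not indexer's own code.
# Usage: some-producer | run_with_tempfile 'command with $1'
run_with_tempfile() {
    # The function's first argument is the command line template.
    template=$1
    tmp=$(mktemp) || return 1
    # Save the document read from stdin into the temporary file.
    cat > "$tmp"
    # Replace every literal "$1" in the template with the file name.
    cmd=$(printf '%s\n' "$template" | sed "s|\\\$1|$tmp|g")
    sh -c "$cmd"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# "cat $1" stands in for a parser that needs a file argument.
printf 'hello from a temporary file\n' | run_with_tempfile 'cat $1'
```

The template is passed in single quotes so that the calling shell does not expand $1 itself.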

The command line can also use variables, for example ${URL} or ${Content-Type}. For the list of all available variables, see the output of "indexer -v6", in the lines beginning with "Response.".

The fourth parameter, "content", is optional. It specifies what data is sent to the parser. By default, indexer sends the raw document content. With the fourth parameter you can mix the document content with other data, for example the URL or HTTP headers, using the same notation as in the "command line" parameter. The raw content is available as "${HTTP.Content}".

Note: To make ${HTTP.Content} available, use the "Section HTTP.Content 0 0" command.
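A parser that receives mixed content has to separate the fields again. The sketch below assumes the input layout produced by the template "${URL} # ${Content-Type} # ${HTTP.Content}" from the Examples section; the function name split_fields and the output layout are hypothetical. It emits plain text, so it would pair with text/plain as to_mime.

```shell
#!/bin/sh
# Hypothetical sketch: split "URL # content-type # content" read from stdin.
# Note: $(cat) drops trailing newlines and cannot hold NUL bytes,
# so this approach suits textual content only.
split_fields() {
    input=$(cat)
    # Everything before the first " # " is the URL.
    url=${input%% # *}
    rest=${input#* # }
    # Everything before the next " # " is the content type.
    ctype=${rest%% # *}
    # The remainder is the document content; later " # " runs stay intact.
    body=${rest#* # }
    printf 'URL: %s\n' "$url"
    printf 'Content-Type: %s\n' "$ctype"
    printf '%s\n' "$body"
}

# Manual check with a made-up input line.
printf '%s' 'http://example.com/doc # application/mytype2 # document text' | split_fields
```

Splitting on the first two separators only keeps any " # " sequence inside the document content untouched.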

Examples


Mime application/msword      "text/plain; charset=cp1251"  "catdoc $1"
Mime application/x-troff-man  text/plain                    "deroff"
Mime text/x-postscript        text/plain                    "ps2ascii"
Mime application/pdf          text/plain                    "pdftotext $1 -"
Mime application/vnd.ms-excel text/plain                    "xls2csv $1"
Mime "text/rtf*"              text/html                     "rthc --use-stdout $1 2>/dev/null"

# A parser example with variables in its command line
Mime application/mytype       text/html    "myparser -u ${URL} -t ${Content-Type} $1"

# Mixing content with URL and HTTP headers
Section HTTP.Content 0 0
Mime application/mytype2      text/html    "myparser2"   "${URL} # ${Content-Type} # ${HTTP.Content}"

See also

AddType, DefaultContentType, UseRemoteContentType.