Chapter 7. File Filters

1. File filters dialog
2. Filter options
3. Edit filter dialog
3.1. Source file type, filename pattern
3.2. Source and Target file encoding
3.3. Target filename

OmegaT features highly customizable file filters. File filters are pieces of code capable of reading source documents in a specific file format, writing the translated documents in that format, and naming the translated files.

To see which file formats OmegaT can handle, open Options → File Filters... from the main menu.

Most users will find the default file filter options sufficient. If this is not the case, open the main dialog by selecting Options → File Filters... from the main menu. You can also enable project-specific file filters, which will only be used on the current project, by selecting the File Filters... option in Project Properties.

To enable project-specific filters, select Project → Properties..., click the File Filters... button, and activate the check box Enable project specific filters. In this case, a copy of the filter configuration is stored with the project. If you later change filters, only the project filters are updated; the user filters stay unchanged.

Warning! Should you change filter options whilst a project is open, you must reload the project in order for the changes to take effect.

1. File filters dialog

This dialog lists the available file filters; the filters used by the current project are displayed in bold. If you do not wish to use OmegaT to translate files of a certain type, you can turn off the corresponding filter by deactivating the check box beside its name. OmegaT will then omit those files while loading projects, and will copy them unmodified when creating target documents. When you wish to use the filter again, just tick the check box. Click Defaults to reset the file filters to the default settings. To change which files a filter processes and in which encodings, select the filter from the list and click Edit.

The dialog allows you to enable or disable the following options:

  • Remove leading and trailing tags: uncheck this option to display all the tags including the leading and trailing ones. Warning: in Microsoft Open XML formats (docx, xlsx, etc.), if all tags are displayed, DO NOT write text before the first tag (it is a technical tag that must always begin the segment).

  • Remove leading and trailing whitespace in non-segmented projects: by default, OmegaT removes leading and trailing whitespace. In non-segmented projects, it is possible to keep it by unchecking this option.

  • Preserve spaces for all tags: check this option if the source documents contain significant spaces (for layout matters) that must not be ignored.

  • Ignore file context when identifying segments with alternate translations: by default, OmegaT uses the source file name as part of the identification of an alternate translation. If the option is checked, the source file name will not be used, and alternate translations will take effect in any file as long as the other context (previous/next segments, or some sort of ID depending on the file format) matches.

2. Filter options

Several filters (Text files, PO files, XHTML files, Microsoft Office Open XML files, HTML and XHTML files, and Open Document Format files) have one or more specific options. To modify the options, select the filter from the list and click Options. The available options are:

Text files

  • Paragraph segmentation on line breaks, empty lines or never:

    If sentence segmentation rules are active, the text will be further segmented according to the option selected here.
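
The three paragraph choices can be pictured with a small Python sketch. It is illustrative only and is not OmegaT's code: the function name and the mode names ("line_breaks", "empty_lines", "never") are assumptions made for this example, and the text is assumed to use "\n" line endings.

```python
import re


def split_paragraphs(text: str, mode: str) -> list[str]:
    """Split text into paragraphs according to a (hypothetical) mode name."""
    if mode == "line_breaks":
        # Every line break ends a paragraph.
        parts = text.split("\n")
    elif mode == "empty_lines":
        # Only a blank line (possibly containing spaces) ends a paragraph.
        parts = re.split(r"\n\s*\n", text)
    elif mode == "never":
        # The whole file is treated as one paragraph.
        parts = [text]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Drop pieces that contain nothing but whitespace.
    return [part for part in parts if part.strip()]


sample = "First line.\nSecond line.\n\nThird line."
for mode in ("line_breaks", "empty_lines", "never"):
    print(mode, len(split_paragraphs(sample, mode)))
```

Sentence segmentation rules, when active, would then be applied inside each paragraph returned here.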

PO files

  • Allow blank translations in the target file:

    If on, when a PO segment (which may be a whole paragraph) is not translated, the translation will be empty in the target file. Technically speaking, the msgstr segment in the PO target file, if created, will be left empty. As this is the standard behavior for PO files, it is on by default. If the option is off, the source text will be copied to the target segment.

  • Skip PO header

    If this option is checked, the PO header will be skipped and left unchanged.

  • Auto replace 'nplurals=INTEGER; plural=EXPRESSION;' in header

    The option allows OmegaT to override the specification in the PO file header and use the default for the selected target language.
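
The effect of Allow blank translations in the target file on an untranslated segment can be sketched as follows. This is a simplified illustration in Python, not OmegaT's implementation: the function names and the allow_blank parameter are assumptions, and only single-line msgid/msgstr entries are handled.

```python
def po_escape(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_po_entry(msgid: str, translation: "str | None", allow_blank: bool = True) -> str:
    """Render one PO entry; translation is None when the segment is untranslated."""
    if translation is None:
        # Option on: leave msgstr empty. Option off: copy the source text.
        msgstr = "" if allow_blank else msgid
    else:
        msgstr = translation
    return f'msgid "{po_escape(msgid)}"\nmsgstr "{po_escape(msgstr)}"\n'


print(render_po_entry("Open file", None, allow_blank=True))
print(render_po_entry("Open file", None, allow_blank=False))
```

With the option on, the first entry has an empty msgstr; with the option off, the second entry repeats the source text in msgstr.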

XHTML Files

  • Translate the following attributes: the selected attributes will appear as segments in the Editor window.

  • Start a new paragraph on: the <br> HTML tag will start a new paragraph for segmentation purposes.

  • Skip text matching regular expression: text matching the regular expression is skipped. It is shown in red in the tag validator, and matching text in the source segment is shown in italics.

  • Do not translate the content attribute of meta-tags ... : The following meta-tags will not be translated.

  • Do not translate the content of tags with the following attribute key-value pairs (separate with commas): a match in the list of key-value pairs will cause the content of tags to be ignored.

    It is sometimes useful to be able to make some tags untranslatable based on the value of their attributes, for example <div class="hide"> or <span translate="no">. You can define key-value pairs for tags to be left untranslated. For the example above, the field would contain: class=hide, translate=no
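
As a rough illustration of how such a comma-separated field could be interpreted, the Python sketch below parses the key-value pairs and checks a tag's attributes against them. The function names are hypothetical, and exact, case-sensitive comparison is an assumption; the actual filter may compare differently.

```python
def parse_pairs(field: str) -> set[tuple[str, str]]:
    """Parse a field such as 'class=hide, translate=no' into (key, value) pairs."""
    pairs = set()
    for item in field.split(","):
        key, separator, value = item.strip().partition("=")
        if separator:  # ignore entries without an '=' sign
            pairs.add((key.strip(), value.strip()))
    return pairs


def is_untranslatable(attributes: dict[str, str], pairs: set[tuple[str, str]]) -> bool:
    """Return True if any attribute of the tag matches a configured pair."""
    return any((key, value) in pairs for key, value in attributes.items())


pairs = parse_pairs("class=hide, translate=no")
print(is_untranslatable({"class": "hide"}, pairs))    # <div class="hide">
print(is_untranslatable({"translate": "no"}, pairs))  # <span translate="no">
print(is_untranslatable({"class": "intro"}, pairs))   # an ordinary tag
```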

Microsoft Office Open XML files

You can select which elements are to be translated. They will appear as separate segments in the translation.

  • Word: non-visible instruction text, comments, footnotes, endnotes, footers

  • Excel: comments, sheet names

  • PowerPoint: slide comments, slide masters, slide layouts

  • Global: charts, diagrams, drawings, WordArt

  • Other Options:

    • Aggregate tags: if checked, tags without translatable text between them will be aggregated into single tags.

    • Preserve spaces for all tags: if checked, white space (i.e., spaces and newlines) will be preserved, even if this is not technically set in the document.

HTML and XHTML files

  • Add or rewrite encoding declaration in HTML and XHTML files: frequently the target files must use a character encoding different from the one in the source file (whether it is explicitly defined or implied). With this option, the translator can specify whether the target files are to include the encoding declaration. For instance, if the file filter specifies UTF-8 as the encoding for the target files, selecting Always will ensure that this information is included in the translated files.

  • Translate the following attributes: the selected attributes will appear as segments in the Editor window.

  • Start a new paragraph on: the <br> HTML tag will start a new paragraph for segmentation purposes.

  • Skip text matching regular expression: text matching the regular expression is skipped. It is shown in red in the tag validator, and matching text in the source segment is shown in italics.

  • Do not translate the content attribute of meta-tags ... : The following meta-tags will not be translated.

  • Do not translate the content of tags with the following attribute key-value pairs (separate with commas): a match in the list of key-value pairs will cause the content of tags to be ignored.

    It is sometimes useful to be able to make some tags untranslatable based on the value of their attributes, for example <div class="hide"> or <span translate="no">. You can define key-value pairs for tags to be left untranslated. For the example above, the field would contain: class=hide, translate=no

  • Compress whitespace in translated document: runs of consecutive whitespace characters will be converted into a single whitespace character in the translated document.

  • Remove HTML comments in translated document: all commented parts (between <!-- and -->) won't be copied in the translated document.
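
The last two options can be approximated with two regular-expression substitutions. The Python sketch below only illustrates the idea under simplifying assumptions: it treats the document as a plain string, replaces each whitespace run with a single space, and does not special-case elements such as <pre>, which a real filter may handle differently.

```python
import re


def compress_whitespace(text: str) -> str:
    """Replace every run of whitespace characters with a single space."""
    return re.sub(r"\s+", " ", text)


def strip_html_comments(text: str) -> str:
    """Remove everything between <!-- and -->, including multi-line comments."""
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)


html = "<p>Hello,   \n\t world</p><!-- internal\nnote --><p>Bye</p>"
print(compress_whitespace(strip_html_comments(html)))
```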

Open Document Format (ODF) files

  • You can select which of the following items are to be translated:

    index entries, bookmarks, bookmark references, notes, comments, presentation notes, links (URL), sheet names

3. Edit filter dialog

This dialog enables you to set up the source filename patterns of files to be processed by the filter, customize the filenames of translated files, and select which encodings should be used for loading the file and saving its translated counterpart. To modify a file filter pattern, either modify the fields directly or click Edit. To add a new file filter pattern, click Add. The same dialog is used to add a pattern or to edit a particular pattern. The dialog is useful because it includes a special target filename pattern editor with which you can customize the names of output files.

3.1. Source file type, filename pattern

When OmegaT encounters a file in its source folder, it attempts to select the filter based upon the file's extension. More precisely, OmegaT attempts to match each filter's source filename patterns against the filename. For example, the pattern *.xhtml matches any file with the .xhtml extension. If an appropriate filter is found, the file is assigned to it for processing. For example, by default, XHTML filters are used for processing files with the .xhtml extension. You can change or add filename patterns for files to be handled by each file filter. Source filename patterns use wildcard characters similar to those used in Searches. The '*' character matches zero or more characters. The '?' character matches exactly one character. All other characters represent themselves. For example, if you wish the text filter to handle readme files (readme, read.me, and readme.txt), you should use the pattern read*.
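
The wildcard rules above ('*' for zero or more characters, '?' for exactly one, everything else literal) can be tried out with a short Python sketch. It is not OmegaT's matcher: the function names are made up for this example, and case-sensitive matching is an assumption.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a filename pattern into an equivalent regular expression."""
    pieces = []
    for char in pattern:
        if char == "*":
            pieces.append(".*")  # zero or more characters
        elif char == "?":
            pieces.append(".")   # exactly one character
        else:
            pieces.append(re.escape(char))  # any other character is literal
    return re.compile("".join(pieces), re.DOTALL)


def matches(pattern: str, filename: str) -> bool:
    """Return True if the whole filename matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(filename) is not None


for name in ("readme", "read.me", "readme.txt", "index.xhtml"):
    print(name, matches("read*", name), matches("*.xhtml", name))
```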

3.2. Source and Target file encoding

Only a limited number of file formats specify a mandatory encoding. File formats that do not specify their encoding will use the encoding you set up for the extension that matches their name. For example, by default .txt files will be loaded using the default encoding of your operating system. You may change the source encoding for each different source filename pattern. Such files may also be written out in any encoding. By default, the translated file encoding is the same as the source file encoding. Source and target encoding fields use combo boxes with all supported encodings included. <auto> leaves the encoding choice to OmegaT. This is how it works:

  • OmegaT identifies the source file encoding by using its encoding declaration, if present (HTML files, XML-based files).

  • OmegaT is instructed to use a mandatory encoding for certain file formats (Java properties, etc.).

  • OmegaT uses the default encoding of the operating system for text files.
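
One possible reading of these rules is sketched below in Python. It is a simplification, not OmegaT's logic: the function and parameter names are hypothetical, and the precedence shown (explicit setting, then a format's mandatory encoding, then a declaration found in the file, then the operating system default) is an assumption made for illustration.

```python
import locale


def choose_source_encoding(configured: str,
                           declared: "str | None" = None,
                           mandatory: "str | None" = None) -> str:
    """Pick the encoding used to read a source file.

    configured: value of the source encoding field, "<auto>" or an encoding name
    declared:   encoding declared inside the file, if any (HTML, XML-based files)
    mandatory:  encoding required by the file format, if any
    """
    if configured != "<auto>":
        return configured  # the user chose an encoding explicitly
    if mandatory is not None:
        return mandatory   # the format dictates its encoding
    if declared is not None:
        return declared    # trust the file's own declaration
    # Plain text files: fall back to the operating system default.
    return locale.getpreferredencoding(False)


print(choose_source_encoding("<auto>", declared="ISO-8859-1"))
print(choose_source_encoding("UTF-8", declared="ISO-8859-1"))
```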

3.3. Target filename

Sometimes you may wish to rename the files you translate automatically, for example by adding a language code after the file name. The target filename pattern uses a special syntax, so the safest way to edit this field is to click Edit... and use the Edit Pattern Dialog. You may also modify the name directly in the target filename pattern field of the file filters dialog. If you wish to revert to the default configuration of the filter, click Defaults. The Edit Pattern Dialog offers, among others, the following options:

  • ${filename} - full filename of the source file with extension (the default): in this case the name of the translated file is the same as that of the source file.

  • ${nameOnly} - the name of the source file without the extension

  • ${extension} - the original file extension

  • ${targetLocale} - target locale code (in the form "xx_YY")

  • ${targetLanguage} - the target language and country code together (in the form "XX-YY")

  • ${targetLanguageCode} - the target language only ("XX")

  • ${targetCountryCode} - the target country only ("YY")

  • ${timestamp-????} - system date and time at generation time, in various patterns

    See Oracle documentation for examples of the "SimpleDateFormat" patterns

  • ${system-os-name} - operating system of the computer used

  • ${system-user-name} - system user name

  • ${system-host-name} - system host name

  • ${file-source-encoding} - source file encoding

  • ${file-target-encoding} - target file encoding

  • ${targetLocaleLCID} - Microsoft target locale
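
To show how such a pattern turns into a file name, here is a minimal Python sketch covering a few of the variables listed above. It is an illustration, not OmegaT's implementation: the function name and its parameters are assumptions, the letter case of the codes follows the forms quoted in the list, and variables the sketch does not know (such as the timestamp and system ones) are left untouched.

```python
import re


def expand_target_name(pattern: str, source_name: str, language: str, country: str) -> str:
    """Expand ${...} variables in a target filename pattern."""
    name_only, dot, extension = source_name.rpartition(".")
    if not dot:  # no extension at all
        name_only, extension = source_name, ""
    values = {
        "filename": source_name,
        "nameOnly": name_only,
        "extension": extension,
        "targetLocale": f"{language.lower()}_{country.upper()}",
        "targetLanguage": f"{language.upper()}-{country.upper()}",
        "targetLanguageCode": language.upper(),
        "targetCountryCode": country.upper(),
    }

    def substitute(match: "re.Match[str]") -> str:
        # Unknown variables are returned unchanged.
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\$\{([^}]+)\}", substitute, pattern)


print(expand_target_name("${filename}", "manual.docx", "fr", "CA"))
print(expand_target_name("${nameOnly}_${targetLocale}.${extension}", "manual.docx", "fr", "CA"))
```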

Additional variants are available for the variables ${nameOnly} and ${extension}. If the file name is ambiguous (for example, because it contains more than one period), you can apply variables of the form ${nameOnly-extension number} and ${extension-extension number}. If, for example, the original file is named Document.xx.docx, the variables give the following results:

  • ${nameOnly-0} Document

  • ${nameOnly-1} Document.xx

  • ${nameOnly-2} Document.xx.docx

  • ${extension-0} docx

  • ${extension-1} xx.docx

  • ${extension-2} Document.xx.docx
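
The results listed above follow a simple rule that can be reproduced in a few lines of Python. This is a sketch consistent with the Document.xx.docx example, with hypothetical function names, not code taken from OmegaT: the filename is split on periods, ${nameOnly-N} keeps the first N+1 parts and ${extension-N} keeps the last N+1 parts.

```python
def name_only(filename: str, number: int) -> str:
    """Value of ${nameOnly-N}: the first N+1 dot-separated parts."""
    parts = filename.split(".")
    return ".".join(parts[: number + 1])


def extension(filename: str, number: int) -> str:
    """Value of ${extension-N}: the last N+1 dot-separated parts."""
    parts = filename.split(".")
    return ".".join(parts[-(number + 1):])


for number in range(3):
    print(number, name_only("Document.xx.docx", number), extension("Document.xx.docx", number))
```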