OmegaT's file filters are highly customizable, so you can adjust how each file format is handled. File filters are pieces of code capable of:
Reading the document in some specific file format. For instance, plain text files.
Extracting the translatable content out of the file.
Automating modifications of the translated document by replacing the translatable content with its translation.
To see which file formats OmegaT can handle, open Options → File Filters... from the main menu.
Most users will find the default file filter options sufficient. If this is not the case, open the File Filters dialog by selecting Options → File Filters... from the main menu.
You can also enable project-specific file filters, which are used only for the current project. To do so, open Project → Properties..., click the File Filters... button and activate the check box that enables project-specific filters. In this case, a copy of the filters configuration is stored with the project. If you later change the filters, only the project filters are updated, while the user filters stay unchanged.
Warning! Should you change filter options whilst a project is open, you must reload the project in order for the changes to take effect.
This dialog lists the available file filters; the filters used by the current project are displayed in bold. If you do not want to use OmegaT to translate files of a certain type, you can turn off the corresponding filter by clearing the check box beside its name. OmegaT will then omit the matching files while loading projects, and will copy them unmodified when creating target documents. When you wish to use the filter again, just tick the check box. Click Defaults to reset the file filters to the default settings. To change which files the filter processes and which encodings it uses, select the filter from the list and click Edit.
The dialog lets you enable or disable the following options:
Remove leading and trailing tags: uncheck this option to display all the tags including the leading and trailing ones. Warning: in Microsoft Open XML formats (docx, xlsx, etc.), if all tags are displayed, DO NOT write text before the first tag (it is a technical tag that must always begin the segment).
Remove leading and trailing whitespace in non-segmented projects: by default, OmegaT removes leading and trailing whitespace. In non-segmented projects, it is possible to keep it by unchecking this option.
Preserve spaces for all tags: check this option if the source documents contain significant spaces (for layout matters) that must not be ignored.
Ignore file context when identifying segments with alternate translations: by default, OmegaT uses the source file name as part of the identification of an alternative translation. If the option is checked, the source file name is not used, and alternative translations take effect in any file as long as the rest of the context (previous/next segments, or some sort of ID depending on the file format) matches.
Several filters (Text files, PO files, XHTML files, HTML and XHTML files, Microsoft Open XML files and OpenDocument files) have one or more specific options. To modify the options, select the filter from the list and click Options. The available options are:
Text files
Paragraph segmentation on line breaks, empty lines or never: the text is split into paragraphs according to the option selected here. If sentence segmentation rules are active, each paragraph is then further segmented according to those rules.
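The three paragraph-segmentation choices can be illustrated with a minimal Python sketch. It is not OmegaT's code: the function name `split_paragraphs` and the mode names are hypothetical, and treating whitespace-only lines as empty is an assumption.

```python
import re
from typing import List


def split_paragraphs(text: str, mode: str) -> List[str]:
    """Split plain text into paragraphs (hypothetical mode names).

    mode: "line_breaks", "empty_lines" or "never". Sentence segmentation,
    if active, would then be applied to each returned paragraph.
    """
    if mode == "line_breaks":
        # Every line break ends a paragraph.
        parts = text.split("\n")
    elif mode == "empty_lines":
        # Only a blank line (one or more, possibly with spaces) ends a paragraph.
        parts = re.split(r"\n\s*\n", text)
    elif mode == "never":
        # The whole file is a single paragraph.
        parts = [text]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Discard parts that contain only whitespace.
    return [p for p in parts if p.strip()]


sample = "One.\nTwo.\n\nThree."
print(split_paragraphs(sample, "line_breaks"))  # ['One.', 'Two.', 'Three.']
print(split_paragraphs(sample, "empty_lines"))  # ['One.\nTwo.', 'Three.']
```

With "never", the sample stays one paragraph, and only sentence segmentation (if active) would split it.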
PO files
Allow blank translations in the target file: if checked, a PO segment (which may be a whole paragraph) that is not translated gets an empty translation in the target file. Technically speaking, the msgstr segment in the PO target file, if created, is left empty. As this is the standard behavior for PO files, the option is on by default. If the option is off, the source text is copied to the target segment.
Skip PO header: if checked, the PO header is skipped and left unchanged.
Auto replace 'nplurals=INTEGER; plural=EXPRESSION;' in header: if checked, OmegaT overrides the specification in the PO file header and uses the default for the selected target language.
XHTML files
Translate the following attributes: the selected attributes will appear as segments in the Editor window.
Start a new paragraph on: a <br> HTML tag will start a new paragraph for segmentation purposes.
Skip text matching regular expression: text matching the regular expression is skipped. It is shown in red in the tag validator, and matching text in the source segment is shown in italics.
Do not translate the content attribute of meta-tags ... : the content attribute of the meta tags listed here will not be translated.
Do not translate the content of tags with the following attribute key-value pairs (separate with commas): the content of any tag whose attributes match one of the listed key-value pairs is ignored.
It is sometimes useful to make some tags untranslatable based on the value of their attributes, for example <div class="hide"> or <span translate="no">. You can define key-value pairs for tags to be left untranslated. For the example above, the field would contain: class=hide, translate=no
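How such a key-value field could be interpreted is shown in the minimal Python sketch below. It is an illustration, not OmegaT's implementation: the helper names `parse_pairs` and `is_untranslatable` are hypothetical, and requiring an exact match of the whole attribute value (so class="hide big" would not match class=hide) is an assumption.

```python
from typing import Dict, List, Tuple


def parse_pairs(field: str) -> List[Tuple[str, str]]:
    """Parse a comma-separated field such as "class=hide, translate=no"."""
    pairs = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        pairs.append((key.strip(), value.strip()))
    return pairs


def is_untranslatable(attributes: Dict[str, str], pairs: List[Tuple[str, str]]) -> bool:
    """True if any attribute of the tag equals one of the configured pairs."""
    return any(attributes.get(key) == value for key, value in pairs)


pairs = parse_pairs("class=hide, translate=no")
print(is_untranslatable({"class": "hide"}, pairs))     # True
print(is_untranslatable({"translate": "yes"}, pairs))  # False
```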
Microsoft Office Open XML files
You can select which elements are to be translated. They will appear as separate segments in the translation.
Word: non-visible instruction text, comments, footnotes, endnotes, footers
Excel: comments, sheet names
PowerPoint: slide comments, slide masters, slide layouts
Global: charts, diagrams, drawings, WordArt
Other Options:
Aggregate tags: if checked, tags without translatable text between them will be aggregated into single tags.
Preserve spaces for all tags: if checked, "white space" (i.e., spaces and newlines) will be preserved, even if the document does not technically specify it.
HTML and XHTML files
Add or rewrite encoding declaration in HTML and XHTML files: the target files frequently need a character encoding different from the one in the source file (whether it is explicitly defined or implied). With this option, the translator can specify whether the target files should include the encoding declaration. For instance, if the file filter specifies UTF-8 as the encoding for the target files, selecting Always ensures that this information is included in the translated files.
Translate the following attributes: the selected attributes will appear as segments in the Editor window.
Start a new paragraph on: a <br> HTML tag will start a new paragraph for segmentation purposes.
Skip text matching regular expression: text matching the regular expression is skipped. It is shown in red in the tag validator, and matching text in the source segment is shown in italics.
Do not translate the content attribute of meta-tags ... : the content attribute of the meta tags listed here will not be translated.
Do not translate the content of tags with the following attribute key-value pairs (separate with commas): the content of any tag whose attributes match one of the listed key-value pairs is ignored.
It is sometimes useful to make some tags untranslatable based on the value of their attributes, for example <div class="hide"> or <span translate="no">. You can define key-value pairs for tags to be left untranslated. For the example above, the field would contain: class=hide, translate=no
Compress whitespace in translated document: each run of consecutive whitespace characters is collapsed into a single whitespace character in the translated document.
Remove HTML comments in translated document: all commented parts (between <!-- and -->) are not copied to the translated document.
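The effect of the last two options can be approximated with two regular expressions, as in the minimal Python sketch below. It is not OmegaT's code. Two assumptions: the collapsed character is a plain space, and content that must keep its layout (such as <pre> blocks) is not given special treatment, which a real filter would need to handle.

```python
import re


def compress_whitespace(html: str) -> str:
    """Collapse each run of whitespace characters into a single space."""
    return re.sub(r"\s+", " ", html)


def remove_html_comments(html: str) -> str:
    """Drop everything between <!-- and -->, including the markers."""
    # Non-greedy ".*?" removes each comment separately and keeps the text
    # between two comments; DOTALL lets a comment span several lines.
    return re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)


page = "<p>Hello,\n    world</p><!-- draft\nnote --><p>Bye</p>"
print(compress_whitespace(remove_html_comments(page)))
# <p>Hello, world</p><p>Bye</p>
```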
Open Document Format (ODF) files
You can select which of the following items are to be translated:
index entries, bookmarks, bookmark references, notes, comments, presentation notes, links (URL), sheet names
This dialog enables you to set up the source filename patterns of files to be processed by the filter, customize the filenames of translated files, and select which encodings should be used for loading the file and saving its translated counterpart. To modify a file filter pattern, either modify the fields directly or click Edit. To add a new file filter pattern, click Add. The same dialog is used to add a pattern or to edit a particular pattern. The dialog is useful because it includes a special target filename pattern editor with which you can customize the names of output files.
When OmegaT encounters a file in its source folder, it attempts to select the filter based upon the file's extension. More precisely, OmegaT attempts to match each filter's source filename patterns against the filename. For example, the pattern *.xhtml matches any file with the .xhtml extension.
If the appropriate filter is found, the file is assigned to it for processing. For example, by default, XHTML filters are used for processing files with the .xhtml extension. You can change or add filename patterns for files to be handled by each file filter. Source filename patterns use wild card characters similar to those used in Searches. The '*' character matches zero or more characters. The '?' character matches exactly one character. All other characters represent themselves. For example, if you wish the text filter to handle readme files (readme, read.me, and readme.txt), you should use the pattern read*.
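The wild card rules above ('*' for zero or more characters, '?' for exactly one, everything else literal) can be expressed by translating a pattern into a regular expression. The minimal Python sketch below does that; the function names are hypothetical, and the sketch is case-sensitive, which may differ from OmegaT's behavior on some systems.

```python
import re


def pattern_to_regex(pattern: str):
    """Translate a source filename pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # zero or more characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern: str, filename: str) -> bool:
    """True if the whole filename matches the pattern (case-sensitive here)."""
    return pattern_to_regex(pattern).fullmatch(filename) is not None


for name in ["readme", "read.me", "readme.txt", "notes.txt"]:
    print(name, matches("read*", name))
```

The loop reports True for the three readme variants and False for notes.txt.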
Only a limited number of file formats specify a mandatory encoding. File formats that do not specify their encoding will use the encoding you set up for the extension that matches their name. For example, by default .txt files will be loaded using the default encoding of your operating system. You may change the source encoding for each different source filename pattern. Such files may also be written out in any encoding. By default, the translated file encoding is the same as the source file encoding. The source and target encoding fields use combo boxes that include all supported encodings. <auto> leaves the encoding choice to OmegaT. This is how it works:
OmegaT identifies the source file encoding by using its encoding declaration, if present (HTML files, XML-based files).
OmegaT is instructed to use a mandatory encoding for certain file formats (Java properties, etc.).
OmegaT uses the default encoding of the operating system for text files.
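The <auto> choice described in the three cases can be sketched in Python as follows. This is an illustration under stated assumptions, not OmegaT's logic. The regular expression recognizes only the common XML declaration and HTML charset forms. The `mandatory` parameter is hypothetical: the caller would pass the encoding a format imposes. Checking the declaration first follows the order of the list.

```python
import locale
import re
from typing import Optional

# Illustrative only: finds <?xml ... encoding="..."?> or an HTML charset=...
_DECLARATION = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._:-]+)["']"""
    rb"""|charset\s*=\s*["']?([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def resolve_auto_encoding(head: bytes, mandatory: Optional[str] = None) -> str:
    """Pick an encoding for <auto>.

    head: the first bytes of the file; mandatory: the encoding a file format
    imposes, if any (a hypothetical parameter supplied by the caller).
    """
    match = _DECLARATION.search(head)
    if match:
        # 1. An explicit declaration in the file wins.
        return (match.group(1) or match.group(2)).decode("ascii")
    if mandatory is not None:
        # 2. The file format fixes the encoding.
        return mandatory
    # 3. Fall back to the operating system's default encoding.
    return locale.getpreferredencoding(False)


print(resolve_auto_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
# ISO-8859-1
```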
Sometimes you may wish to rename the files you translate automatically, for example by adding a language code after the file name. The target filename pattern uses a special syntax, so the easiest way to edit this field is to click Edit... and use the Edit Pattern dialog. You may also modify the name directly in the target filename pattern field of the file filters dialog. If you wish to revert to the default configuration of the filter, click Defaults. The Edit Pattern dialog offers, among others, the following variables:
${filename}: full filename of the source file, with extension. This is the default; in this case the name of the translated file is the same as that of the source file.
${nameOnly}: the name of the source file without the extension.
${extension}: the original file extension.
${targetLocale}: the target locale code (of the form "xx_YY").
${targetLanguage}: the target language and country code together (of the form "XX-YY").
${targetLanguageCode}: the target language only ("XX").
${targetCountryCode}: the target country only ("YY").
${timestamp-????}: the system date and time at generation time, in various patterns. See the Oracle documentation of the Java SimpleDateFormat class for examples of the patterns.
${system-os-name}: the operating system of the computer used.
${system-user-name}: the system user name.
${system-host-name}: the system host name.
${file-source-encoding}: the source file encoding.
${file-target-encoding}: the target file encoding.
${targetLocaleLCID}: the Microsoft locale ID (LCID) of the target locale.
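To show how such a pattern turns into a file name, here is a minimal Python sketch that expands a few of the variables. It is not OmegaT's implementation and covers only a subset. Three assumptions apply: ${nameOnly} and ${extension} split the name at its last dot, the target language is passed as a hypothetical "language-COUNTRY" tag such as "fr-FR" whose case is kept as given, and unknown variables are left untouched.

```python
import os
import re
from typing import Dict


def expand_target_pattern(pattern: str, source_name: str, target_tag: str) -> str:
    """Expand a subset of target filename variables (illustrative only)."""
    # Assumption: split at the last dot, e.g. "manual.odt" -> "manual", ".odt".
    name_only, dot_ext = os.path.splitext(source_name)
    language, _, country = target_tag.partition("-")
    values: Dict[str, str] = {
        "filename": source_name,
        "nameOnly": name_only,
        "extension": dot_ext.lstrip("."),
        "targetLocale": f"{language}_{country}" if country else language,
        "targetLanguage": target_tag,
        "targetLanguageCode": language,
        "targetCountryCode": country,
    }
    # Replace each ${variable}; variables not in the table stay as written.
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: values.get(m.group(1), m.group(0)),
        pattern,
    )


print(expand_target_pattern("${nameOnly}_${targetLocale}.${extension}", "manual.odt", "fr-FR"))
# manual_fr_FR.odt
```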
Additional variants are available for the variables ${nameOnly} and ${extension}. If the file name is ambiguous because it contains several dots, you can use variables of the form ${nameOnly-extension number} and ${extension-extension number}.
If, for example, the original file is named Document.xx.docx, the following variables give these results:
${nameOnly-0}: Document
${nameOnly-1}: Document.xx
${nameOnly-2}: Document.xx.docx
${extension-0}: docx
${extension-1}: xx.docx
${extension-2}: Document.xx.docx
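The results above follow a simple rule: ${nameOnly-n} keeps the first n+1 dot-separated parts of the name, and ${extension-n} keeps the last n+1. The minimal Python sketch below reproduces these results. The function name is hypothetical, and returning the whole name when n exceeds the number of parts is an assumption, not documented OmegaT behavior.

```python
from typing import Tuple


def name_variants(filename: str, n: int) -> Tuple[str, str]:
    """Return (${nameOnly-n}, ${extension-n}) for a dot-separated filename."""
    if n < 0:
        raise ValueError("n must be non-negative")
    parts = filename.split(".")
    name_only = ".".join(parts[: n + 1])    # first n+1 dot-separated parts
    extension = ".".join(parts[-(n + 1):])  # last n+1 dot-separated parts
    return name_only, extension


for n in range(3):
    print(n, *name_variants("Document.xx.docx", n))
```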