OmegaT projects can have translation memory files, i.e. files with the extension tmx, in five different places:
The omegat folder contains the file project_save.tmx and possibly a number of backup TMX files. The project_save.tmx file contains all the segments that have been recorded in memory since you started the project. This file always exists in the project. Its contents are always sorted alphabetically by the source segment.
The main project folder contains three tmx files, project_name-omegat.tmx, project_name-level1.tmx and project_name-level2.tmx (project_name being the name of your project).
The level1 file contains only textual information.
The level2 file encapsulates OmegaT-specific tags in correct tmx tags, so that the file can be used with its formatting information in a translation tool that supports tmx level 2 memories, or in OmegaT itself.
The OmegaT file includes OmegaT-specific formatting tags, so that the file can be used in other OmegaT projects.
These files are copies of the file project_save.tmx, i.e. of the project's main translation memory, excluding the so-called orphan segments. They carry appropriately changed names, so that their contents remain identifiable when used elsewhere, for instance in the tm subfolder of some other project (see below).
The /tm/ folder can contain any number of ancillary translation memories, i.e. tmx files. Such files can be created in any of the three varieties indicated above. Note that other CAT tools can export (and import as well) tmx files, usually in all three forms. The best option, of course, is to use OmegaT-specific TMX files (see above), so that the in-line formatting within the segment is retained.
The contents of translation memories in the tm subfolder serve to generate suggestions for the text(s) to be translated. Any text already translated and stored in those files will appear among the fuzzy matches if it is sufficiently similar to the text currently being translated.
If the source segment in one of the ancillary TMs is identical to the text being translated, OmegaT acts as defined in the Options → Editing Behavior... dialog window. For instance (if the default is accepted), the translation from the ancillary TM is accepted and prefixed with [fuzzy], so that the translator can review the translations at a later stage and check whether the segments tagged this way have been translated correctly (see the Editing behavior chapter).
It may happen that translation memories in the tm subfolder contain segments with identical source text but differing targets. TMX files are read sorted by their names, and segments within a given TMX file line by line. The last segment with the identical source text will thus prevail. (Of course, it makes more sense to avoid this happening in the first place.)
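To make the "last one wins" rule concrete, here is a minimal Python sketch that models it; it is not OmegaT's actual loader. It assumes the TMX files have already been parsed into lists of (source, target) pairs, the file names are hypothetical, and Python's plain string sort stands in for whatever name ordering OmegaT really uses.

```python
def resolve_duplicates(tm_files):
    """Model of the 'last segment wins' rule for identical source text.

    tm_files maps a (hypothetical) TMX file name to its (source, target)
    pairs, in the order in which they appear in that file.
    """
    resolved = {}
    # Files are visited in name order, segments in file order.
    for name in sorted(tm_files):
        for source, target in tm_files[name]:
            # A later segment with the same source text overwrites an earlier one.
            resolved[source] = target
    return resolved


memories = {
    "b.tmx": [("Hello", "Hallo (from b)")],
    "a.tmx": [("Hello", "Hallo (from a)"), ("Goodbye", "Auf Wiedersehen")],
}
print(resolve_duplicates(memories))
# {'Hello': 'Hallo (from b)', 'Goodbye': 'Auf Wiedersehen'}
```

Because b.tmx sorts after a.tmx, its translation of "Hello" is the one that survives.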
Note that the TMX files in the tm folder can be compressed with gzip.
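If you want to compress a TMX file for the tm folder from a script, Python's standard gzip module is enough. This is only a sketch with a hypothetical function name: the .tmx.gz naming is an assumption, so check which file names your OmegaT version picks up, and remove or move the uncompressed original so that the same memory is not read twice.

```python
import gzip
import shutil
from pathlib import Path


def gzip_tmx(tmx_path):
    """Compress a TMX file with gzip, writing <name>.tmx.gz next to it."""
    tmx_path = Path(tmx_path)
    gz_path = tmx_path.with_name(tmx_path.name + ".gz")
    with open(tmx_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        # Stream the bytes instead of loading the whole memory into RAM.
        shutil.copyfileobj(src, dst)
    return gz_path
```

For example, gzip_tmx("tm/old-customer.tmx") would create tm/old-customer.tmx.gz and leave the original untouched.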
If it is clear from the very start that translations in a given TM (or TMs) are all correct, one can put them into the tm/auto folder and avoid confirming a lot of [fuzzy] cases.
1. Put the TMX in /tm/auto.
2. Open the project. The changes are displayed.
3. Make a slight change anywhere in the project. This modifies project_save.tmx (by adding proper Translation Units from the "auto" TMX).
Note: if the TMX is removed from /tm/auto before step 3, no extra Translation Unit is added.
If you have no doubt that a TMX is more accurate than OmegaT's project_save.tmx, put this TMX in /tm/enforce to overwrite existing default translations unconditionally.
1. Put the TMX in /tm/enforce.
2. Open the project. The changes are displayed.
3. Make a slight change anywhere in the project. This modifies project_save.tmx.
4. Decide whether the enforced segments should remain immune:
   - If they don't need to stay immune from further changes, remove the TMX from /tm/enforce.
   - If they need to stay immune from further changes, keep the TMX in /tm/enforce.
Note: if the TMX is removed from /tm/enforce before step 3, the enforcements are not kept at all.
In the editor pane, when a match is inserted from a TMX contained in a folder named mt, the background of the active segment is changed to red. The background is restored to normal when the segment is left.
Sometimes it is useful to distinguish between high-quality translation memories and those that are less reliable because of the subject matter, client, revision status, etc. For translation memories in folders named "penalty-xxx" (with xxx between 0 and 100), matches are degraded according to the name of the folder: a 100% match in any of the TMs residing in a folder called Penalty-30, for instance, is lowered to a 70% match. The penalty applies to all three match percentages: matches of 75, 80 and 90 are in this case lowered to 45, 50 and 60.
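The penalty arithmetic can be sketched as follows. This is an illustrative model, not OmegaT's code: the helper names are hypothetical, matching the folder name case-insensitively is an assumption (the text writes both "penalty-xxx" and "Penalty-30"), and so is clamping the lowered scores at 0.

```python
import re

# Assumption: the folder name is matched case-insensitively.
PENALTY_FOLDER = re.compile(r"penalty-(\d{1,3})", re.IGNORECASE)


def folder_penalty(folder_name):
    """Return xxx for a folder named penalty-xxx (0 to 100), otherwise 0."""
    m = PENALTY_FOLDER.fullmatch(folder_name)
    if m is None:
        return 0
    value = int(m.group(1))
    return value if value <= 100 else 0


def apply_penalty(scores, penalty):
    """Lower every match percentage by the penalty (clamped at 0, an assumption)."""
    return tuple(max(0, score - penalty) for score in scores)


print(apply_penalty((75, 80, 90), folder_penalty("penalty-30")))  # (45, 50, 60)
```

Subtracting the same number of points from each of the three percentages reproduces the 75/80/90 to 45/50/60 example given above.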
Optionally, you can let OmegaT have an additional tmx file (OmegaT-style) anywhere you specify, containing all translatable segments of the project. See pseudo-translated memory below.
Note that all the translation memories are loaded into memory when
the project is opened. Back-ups of the project translation memory are
produced regularly (see next chapter), and
project_save.tmx
is also saved/updated when the
project is closed or loaded again. This means for instance that you do
not need to exit a project you are currently working on if you decide to
add another ancillary TM to it: you simply reload the project, and the
changes you have made will be included.
The locations of the various translation memories for a given project are user-defined (see the Project dialog window in Project properties).
Depending on the situation, different strategies are thus possible, for instance:
several projects on the same subject: keep the project structure, and change the source and target folders (Source = source/order1, target = target/order1, etc.). Note that segments from order1 that are not present in order2 and subsequent jobs will be tagged as orphan segments; however, they will still be useful for getting fuzzy matches.
several translators working on the same project: split the source files into source/Alice, source/Bob... and allocate them to team members (Alice, Bob...). They can then create their own projects and deliver their own project_save.tmx when finished or when a given milestone has been reached. The project_save.tmx files are then collected, and possible conflicts, for instance regarding terminology, get resolved. A new version of the master TM is then created, either to be put in team members' tm/auto subfolders or to replace their project_save.tmx files. The team can also use the same subfolder structure for the target files. This allows them, for instance, to check at any moment whether the target version for the complete project is still OK.
As you translate your files, OmegaT continually stores your work in project_save.tmx in the project's /omegat subfolder.
OmegaT also backs up the translation memory to project_save.tmx.YEARMMDDHHNN.bak in the same subfolder whenever a project is opened or reloaded. YEAR is the 4-digit year, MM the month, DD the day of the month, and HH and NN the hours and minutes when the previous translation memory was saved.
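Because the timestamp has a fixed width, backups also sort chronologically by name. The Python sketch below, with hypothetical helper names, decodes the timestamp and picks the newest backup, which is a reasonable first candidate when recovering data.

```python
import re
from datetime import datetime

# project_save.tmx.YEARMMDDHHNN.bak: 12 digits between the last two dots
BACKUP_NAME = re.compile(r"project_save\.tmx\.(\d{12})\.bak")


def backup_time(filename):
    """Return the save time encoded in a backup file name, or None."""
    m = BACKUP_NAME.fullmatch(filename)
    if m is None:
        return None
    # YEAR (4 digits), MM (month), DD (day), HH (hours), NN (minutes)
    return datetime.strptime(m.group(1), "%Y%m%d%H%M")


def newest_backup(filenames):
    """Pick the backup file whose encoded time is the most recent."""
    dated = []
    for name in filenames:
        when = backup_time(name)
        if when is not None:
            dated.append((when, name))
    return max(dated)[1] if dated else None
```

For example, passing the file names listed in your project's omegat subfolder (e.g. from os.listdir) to newest_backup returns the backup to try first; files that do not follow the naming pattern are ignored.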
If you believe you have lost translation data, follow this procedure:
1. Close the project.
2. Rename the current project_save.tmx file (e.g. to project_save.tmx.temporary).
3. Select the backup translation memory that is most likely to contain the data you are looking for (e.g. the most recent one, or the last version from the day before).
4. Copy it to project_save.tmx.
5. Open the project.
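The renaming and copying in this procedure can also be scripted. Below is a minimal Python sketch, assuming the project has already been closed in OmegaT and that you pass the name of the chosen .bak file; the function name is hypothetical. It copies rather than moves the backup, so the .bak file stays available for another attempt, and it refuses to overwrite an existing project_save.tmx.temporary.

```python
import shutil
from pathlib import Path


def restore_from_backup(omegat_dir, backup_name):
    """Rename the current project_save.tmx and copy a backup in its place."""
    omegat_dir = Path(omegat_dir)
    current = omegat_dir / "project_save.tmx"
    backup = omegat_dir / backup_name
    temporary = omegat_dir / "project_save.tmx.temporary"
    if not backup.is_file():
        raise FileNotFoundError(backup)
    if temporary.exists():
        # Do not silently overwrite an earlier set-aside copy.
        raise FileExistsError(temporary)
    if current.exists():
        # Keep the current memory under a temporary name instead of deleting it.
        current.rename(temporary)
    # Copy (not move) the backup so the .bak file itself is preserved.
    shutil.copy2(backup, current)
    return current
```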
TMX files contain translation units, each made of a number of equivalent segments in several languages. A translation unit comprises at least two translation unit variants (TUV). Either can be used as the source or the target.
The settings in your project indicate which is the source and which the target language. OmegaT thus takes the TUV segments corresponding to the project's source and target language codes and uses them as the source and target segments respectively. OmegaT recognizes the language codes using the following two standard conventions:
2 letters (e.g. JA for Japanese), or
2- or 3-letter language code followed by the 2-letter country code (e.g. EN-US - See Appendix A, Languages - ISO 639 code list for a partial list of language and country codes).
If the project language codes and the tmx language codes fully match, the segments are loaded in memory. If the languages match but the countries do not, the segments still get loaded. If neither the language code nor the country code matches, the segments are ignored.
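These matching rules can be modeled as below. This is a sketch, not OmegaT's implementation, and the function names are hypothetical. It assumes codes are separated by "-" or "_" and compared case-insensitively, and it reads the last rule as "the language code does not match", so any language mismatch causes the segment to be ignored.

```python
def split_code(code):
    """'EN-US' or 'en_us' -> ('en', 'us'); 'JA' -> ('ja', None)."""
    parts = code.replace("_", "-").lower().split("-")
    return parts[0], (parts[1] if len(parts) > 1 else None)


def match_level(project_code, tmx_code):
    """Classify a TMX language code against a project language code."""
    p_lang, p_country = split_code(project_code)
    t_lang, t_country = split_code(tmx_code)
    if p_lang != t_lang:
        return None        # language differs: segment ignored
    if p_country == t_country:
        return "full"      # language and country match: loaded
    return "language"      # only the language matches: still loaded
```

For a project in EN-US, match_level("EN-US", "EN-GB") returns "language" (loaded), while match_level("EN-US", "DE-DE") returns None (ignored).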
TMX files can generally contain translation units with several candidate languages. If for a given source segment there is no entry for the selected target language, all other target segments are loaded, regardless of language. For instance, if the language pair of the project is DE-FR, it can still be of some help to see hits in the DE-EN translation if there are none in the DE-FR pair.
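To see how source and target segments might be picked out of a TMX file, here is a simplified Python sketch using the standard xml.etree.ElementTree module. It compares only the language part of each code (ignoring the country rule), applies the fallback per translation unit, and flattens each segment into plain text; all of these are simplifying assumptions, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tuv_language(tuv):
    """Language part of a TUV's code; TMX 1.4 uses xml:lang, older versions lang."""
    code = tuv.get(XML_LANG) or tuv.get("lang") or ""
    return code.replace("_", "-").split("-")[0].lower()


def load_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target, target language) triples from a TMX document."""
    source_lang, target_lang = source_lang.lower(), target_lang.lower()
    pairs = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        variants = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() joins all text inside <seg>, including text in inline elements.
                variants.setdefault(tuv_language(tuv), "".join(seg.itertext()))
        source = variants.get(source_lang)
        if source is None:
            continue  # this unit has no variant in the project's source language
        if target_lang in variants:
            pairs.append((source, variants[target_lang], target_lang))
        else:
            # No variant in the target language: keep every other variant instead.
            for lang, text in variants.items():
                if lang != source_lang:
                    pairs.append((source, text, lang))
    return pairs
```

With a DE-FR project, a unit that has only a DE and an EN variant yields a DE-EN pair, mirroring the fallback described above.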
The file project_save.tmx
contains all the
segments that have been translated since you started the project. If you
modify the project segmentation or delete files from the source, some
matches may appear as orphan strings in
the Match Viewer: such matches refer to segments that do not exist any
more in the source documents, as they correspond to segments translated
and recorded before the modifications took place.
Initially, that is when the project is created, the main TM of the project, project_save.tmx, is empty. This TM gradually becomes filled during the translation. To speed up this process, existing translations can be reused. If a given sentence has already been translated once, and translated correctly, there is no need for it to be retranslated. Translation memories may also contain reference translations: multinational legislation, such as that of the European Community, is a typical example.
When you create the target documents in an OmegaT project, the translation memory of the project is output in the form of three files in the root folder of your OmegaT project (see the description above). You can regard these three tmx files (-omegat.tmx, -level1.tmx and -level2.tmx) as an "export translation memory", i.e. as an export of your current project's content in bilingual form.
Should you wish to reuse a translation memory from a previous project (for example because the new project is similar to the previous project, or uses terminology which might have been used before), you can use these translation memories as "input translation memories", i.e. for import into your new project. In this case, place the translation memories you wish to use in the /tm or /tm/auto folder of your new project: in the former case you will get hits from these translation memories in the fuzzy matches viewer, and in the latter case these TMs will be used to pre-translate your source text.
By default, the /tm folder is below the project's root folder (e.g. .../MyProject/tm), but you can choose a different folder in the project properties dialog if you wish. This is useful if you frequently use translation memories produced in the past, for example because they are on the same subject or for the same customer. In this case, a useful procedure would be:
1. Create a folder (a "repository folder") in a convenient location on your hard drive for the translation memories for a particular customer or subject.
2. Whenever you finish a project, copy one of the three "export" translation memory files from the root folder of the project to the repository folder.
3. When you begin a new project on the same subject or for the same customer, navigate to the repository folder in the project properties dialog and select it as the translation memory folder.
Note that all the tmx files in the /tm repository are parsed when the project is opened, so putting all the different TMs you may have on hand into this folder may unnecessarily slow OmegaT down. You may even consider removing those that are no longer required, once you have used their contents to fill up the project_save.tmx file.
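Copying an export memory into the repository folder can be scripted. Below is a minimal Python sketch, assuming the project name equals the name of the project folder (so the export file is named <folder name>-omegat.tmx) and that the target documents have already been created; the helper name is hypothetical.

```python
import shutil
from pathlib import Path


def archive_export_tm(project_dir, repository_dir, variant="omegat"):
    """Copy <project_name>-<variant>.tmx from the project root to a repository folder."""
    if variant not in ("omegat", "level1", "level2"):
        raise ValueError(f"unknown variant: {variant}")
    project_dir = Path(project_dir)
    # Assumption: project name == project folder name.
    export = project_dir / f"{project_dir.name}-{variant}.tmx"
    repository = Path(repository_dir)
    repository.mkdir(parents=True, exist_ok=True)
    # Raises FileNotFoundError if the export file has not been created yet.
    return Path(shutil.copy2(export, repository / export.name))
```

Choosing the "omegat" variant keeps OmegaT's formatting tags, which suits reuse in later OmegaT projects; "level1" would keep text only.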
OmegaT can import TMX versions 1.1-1.4b (both level 1 and level 2). This enables translation memories produced by other tools to be read by OmegaT. However, OmegaT does not fully support imported level 2 tmx files (these store not only the translation but also the formatting). Level 2 tmx files will still be imported and their textual content can be seen in OmegaT, but the quality of fuzzy matches will be somewhat lower.
OmegaT follows very strict procedures when loading translation memory (tmx) files. If an error is found in such a file, OmegaT will indicate the position within the defective file at which the error is located.
Some tools are known to produce invalid tmx files under certain conditions. If you wish to use such files as reference translations in OmegaT, they must be repaired, or OmegaT will report an error and fail to load them. Fixes are trivial operations and OmegaT assists troubleshooting with the related error message. You can ask the user group for advice if you have problems.
OmegaT exports version 1.4 TMX files (both level 1 and level 2). The level 2 export is not fully compliant with the level 2 standard, but is sufficiently close and will generate correct matches in other translation memory tools supporting TMX Level 2. If you only need textual information (and not formatting information), use the level 1 file that OmegaT has created.
In case translators need to share their TMX bases while excluding some of their parts, or including just the translations of certain files, sharing the complete ProjectName-omegat.tmx is out of the question. The following recipe is just one of the possibilities, but it is simple enough to follow and poses no danger to the assets.
1. Create a project, separate from other projects, in the desired language pair, with an appropriate name. Note that the TMXs created will include this name.
2. Copy the documents you need the translation memory for into the source folder of the project.
3. Copy the translation memories containing the translations of the documents above into the tm/auto subfolder of the new project.
4. Start the project. Check for possible tag errors with Ctrl+T and for untranslated segments with Ctrl+U. To check that everything is as expected, you may press Ctrl+D to create the target documents and check their contents.
5. When you exit the project, the TMX files in the main project folder (see above) contain the translations in the selected language pair for the files you have copied into the source folder. Copy them to a safe place for future reference.
6. To avoid reusing the project and thus possibly polluting future cases, delete the project folder or archive it away from your workplace.
In cases where a team of translators is involved, translators will prefer to share common translation memories rather than distribute their local versions.
OmegaT interfaces with SVN and Git, two common software versioning and revision control systems (RCS) for teams, available under an open source license. In the case of OmegaT, complete project folders (in other words the translation memories involved as well as source folders, project settings, etc.) are managed by the selected RCS. See more in Chapter
There may be cases where you have done a project with, e.g., Dutch sources and a translation in, say, English. Then you need a translation in, e.g., Chinese, but your translator does not understand Dutch; she does, however, understand English perfectly. In this case, the NL-EN translation memory can serve as a go-between to help generate the NL to ZH translation.
The solution in our example is to copy the existing translation memory into the tm/tmx2source/ subfolder and rename it to ZH_CN.tmx to indicate the target language of the tmx. The translator will be shown English translations for source segments in Dutch and use them to create the Chinese translation.
Important: the supporting TMX must be renamed XX_YY.tmx, where XX_YY is the target language of the tmx, for instance to ZH_CN.tmx in the example above. The project and TMX source languages should of course be identical - NL in our example. Note that only one TMX for a given language pair is possible, so if several translation memories should be involved, you will need to merge them all into the XX_YY.tmx.
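When several memories have to go into one XX_YY.tmx file, a naive merge can be done with Python's xml.etree.ElementTree. This sketch keeps the header of the first file, appends every translation unit from the others, and writes the result to tm/tmx2source/<code>.tmx. It assumes TMX files without an XML namespace, does not check that all files share the project's source language, does not remove duplicate units, and drops any DOCTYPE declaration; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_tmx2source(tmx_paths, project_dir, target_code):
    """Merge several TMX files into tm/tmx2source/<target_code>.tmx."""
    trees = [ET.parse(p) for p in tmx_paths]
    merged = trees[0]                       # keep the first file's <header>
    body = merged.getroot().find("body")
    for tree in trees[1:]:
        for tu in tree.getroot().find("body").findall("tu"):
            body.append(tu)                 # append translation units in file order
    out_dir = Path(project_dir) / "tm" / "tmx2source"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{target_code}.tmx"
    merged.write(out_path, encoding="UTF-8", xml_declaration=True)
    return out_path
```

In the Dutch-English-Chinese example, you would pass the NL-EN memories and "ZH_CN" as target_code.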
Some types of source files (for instance PO, TTX, etc.) are bilingual, i.e. they serve both as a source and as a translation memory. In such cases, an existing translation found in the file is included in the project_save.tmx. It is treated as a default translation if no match has been found, or as an alternative translation in case the same source segment exists, but with a target text. The result will thus depend on the order in which the source segments have been loaded.
All translations from source documents are also displayed in the Comment pane, in addition to the Match pane. In the case of PO files, a 20% penalty is applied to the alternative translation (i.e., a 100% match becomes an 80% match). The word [Fuzzy] is displayed on the source segment.
When you load a segmented TTX file, segments with source = target will be included, if "Allow translation to be equal to source" in Options → Editing Behavior... has been checked. This may be confusing, so you may consider unchecking this option in this case.
Of interest for advanced users only!
Before segments get translated, you may wish to pre-process them or address them in some other way than is possible with OmegaT. For example, if you wish to create a pseudo-translation for testing purposes, OmegaT enables you to create an additional tmx file that contains all segments of the project. The translation in this tmx can be either
translation equals source (default)
translation segment is empty
The tmx file can be given any name you specify. A pseudo-translated memory can be generated with the following command line parameters:
java -jar omegat.jar --pseudotranslatetmx=<filename> [--pseudotranslatetype=[equal|empty]]
Replace <filename> with the name of the file you wish to create, either absolute or relative to the working folder (the folder you start OmegaT from). The second argument, --pseudotranslatetype, is optional. Its value is either equal (default value, for source=target) or empty (target segment is empty). You can process the generated tmx with any tool you want. To reuse it in OmegaT, rename it to project_save.tmx and place it in the omegat folder of your project.
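If you generate pseudo-translated memories from a script, it can help to build the argument list in one place. This sketch only assembles the command line described above (including the -- prefix on --pseudotranslatetype); it does not call OmegaT itself, the function name is hypothetical, and newer OmegaT versions may expect additional arguments, so check the command line documentation of your version.

```python
def pseudo_tmx_command(filename, kind="equal", jar="omegat.jar"):
    """Build the argument list for generating a pseudo-translated memory."""
    if kind not in ("equal", "empty"):
        raise ValueError("kind must be 'equal' or 'empty'")
    cmd = ["java", "-jar", jar, f"--pseudotranslatetmx={filename}"]
    if kind != "equal":
        # "equal" is the default, so the option is only added for "empty".
        cmd.append(f"--pseudotranslatetype={kind}")
    return cmd
```

You could then run it with subprocess.run(pseudo_tmx_command("pseudo.tmx", "empty"), check=True) from the folder containing omegat.jar.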
Very early versions of OmegaT were capable of segmenting source files into paragraphs only and were inconsistent when numbering formatting tags in HTML and Open Document files. OmegaT can detect and upgrade such tmx files on the fly to increase fuzzy matching quality and leverage your existing translation better, saving you the work of doing this manually.
A project's tmx will be upgraded only once, and will be written in upgraded form into the project_save.tmx; legacy tmx files will be upgraded on the fly each time the project is loaded. Note that in some rare cases, changes in file filters in OmegaT may lead to totally different segmentation; as a result, you will have to upgrade your translation manually in such cases.