Plain text files (in most cases, files with a .txt extension) contain only textual information and offer no clearly defined way to tell the computer which language, or which character encoding, they use. The most OmegaT can do in such a case is to assume that the text is written in the same language the computer itself uses. This is not a problem for files encoded in Unicode using a 16-bit character encoding. If the text is encoded in 8 bits, however, you can face an awkward situation: instead of displaying Japanese characters, for instance, the system displays the text as Cyrillic characters. This happens when the computer running OmegaT has Russian as its default language, so it interprets the characters as Cyrillic letters rather than Kanji.
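To illustrate the underlying problem, here is a minimal Python sketch (Python is used only for illustration; OmegaT is not involved). It encodes Japanese text into bytes and decodes those bytes twice: once with the encoding the text was written in and once with a Russian default. The choice of Shift_JIS for the Japanese file and Windows-1251 (`cp1251`) for the Russian default are assumptions made for this example.

```python
# The same bytes mean different characters under different 8-bit encodings.
text = "日本語"  # "Japanese language", written in Kanji

# Store the text as a Japanese-encoded file would (assumption: Shift_JIS).
raw_bytes = text.encode("shift_jis")

# Decoding with the encoding the file was written in restores the text.
correct = raw_bytes.decode("shift_jis")

# Decoding with a Russian default (assumption: Windows-1251) yields Cyrillic
# letters and punctuation instead; errors="replace" substitutes U+FFFD for
# any byte that Windows-1251 leaves undefined.
garbled = raw_bytes.decode("cp1251", errors="replace")

print(correct == text)  # True
print(garbled == text)  # False
```

The file itself is not damaged. Only the assumed encoding is wrong, which is why every remedy below comes down to telling OmegaT the correct encoding.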
There are basically three ways to address this problem in OmegaT. They all involve the file filters, which you find in the Options menu:
1. Convert the file to Unicode: open your source file in a text editor that correctly interprets its encoding and save the file in "UTF-8" encoding. Change the file extension from .txt to .utf8. OmegaT will automatically interpret the file as a UTF-8 file. This is the most common-sense alternative, and it spares you problems in the long run.

2. Specify the encoding of your plain text files (i.e. files with a .txt extension): in the Text files section of the file filters dialog, change the Source File Encoding from <auto> to the encoding that corresponds to your source .txt file, for instance a Japanese encoding such as Shift_JIS for the above example. This setting then applies to every .txt file in the project.

3. Change the file extension of your source file, for instance from .txt to .jp for Japanese plain texts: in the Text files section of the file filters dialog, add a new Source Filename Pattern (*.jp for this example) and select the appropriate source and target encodings.
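If you have many files to convert to UTF-8, as in the first method, the conversion can be scripted instead of done by hand in a text editor. The following is a minimal Python sketch, not an OmegaT feature. The function name `convert_to_utf8` is hypothetical, and the sketch assumes you already know the encoding each source file was written in.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(source: Path, source_encoding: str) -> Path:
    """Re-save a plain text file as UTF-8 with a .utf8 extension.

    source_encoding must be the encoding the file was actually written in
    (for example "shift_jis" or "iso-8859-2"); a wrong guess here produces
    garbled text in the converted file. Text mode may also normalize line
    endings to the platform default.
    """
    text = source.read_text(encoding=source_encoding)
    target = source.with_suffix(".utf8")  # e.g. sample.txt -> sample.utf8
    target.write_text(text, encoding="utf-8")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "sample.txt"
        original.write_bytes("日本語のテキスト".encode("shift_jis"))
        converted = convert_to_utf8(original, "shift_jis")
        print(converted.name)  # sample.utf8
        print(converted.read_bytes() == "日本語のテキスト".encode("utf-8"))  # True
```

The original .txt file is left in place, so you can compare both versions before adding the .utf8 file to your project.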
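By default, OmegaT provides the following short list of patterns to make it easier for you to deal with some plain text files:
- .txt files are automatically (<auto>) interpreted as being encoded in the computer's default encoding.
- .txt1 files are interpreted as being encoded in ISO-8859-1, which covers most Western European languages.
- .txt2 files are interpreted as being encoded in ISO-8859-2, which covers most Central and Eastern European languages.
- .utf8 files are interpreted as being encoded in UTF-8.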
You can check this yourself by selecting File Filters in the Options menu. For example, if you have a Czech text file (most likely encoded in ISO-8859-2), you only need to change its extension from .txt to .txt2, and OmegaT will interpret its contents correctly.
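The way a filename pattern selects an encoding can be pictured as a lookup table. The Python sketch below is a hypothetical model, not OmegaT's code: `DEFAULT_PATTERNS` covers only the patterns mentioned in this section (*.txt as <auto>, *.txt2 as ISO-8859-2, *.utf8 as UTF-8), `encoding_for` is an illustrative name, and <auto> is modeled as falling back to the system's preferred encoding.

```python
import locale
from fnmatch import fnmatch
from typing import List, Optional, Tuple

# (filename pattern, encoding); None stands for <auto>.
Pattern = Tuple[str, Optional[str]]

# Hypothetical model of the patterns discussed in this section.
DEFAULT_PATTERNS: List[Pattern] = [
    ("*.txt", None),          # <auto>: the computer's default encoding
    ("*.txt2", "iso-8859-2"),
    ("*.utf8", "utf-8"),
]


def encoding_for(filename: str,
                 patterns: List[Pattern] = DEFAULT_PATTERNS,
                 system_default: Optional[str] = None) -> Optional[str]:
    """Return the encoding of the first pattern that matches filename.

    Returns None when no pattern matches, i.e. the file would not be
    handled by this (modeled) text filter at all.
    """
    for pattern, encoding in patterns:
        if fnmatch(filename, pattern):
            if encoding is None:
                # <auto>: fall back to the computer's default encoding.
                return system_default or locale.getpreferredencoding(False)
            return encoding
    return None


print(encoding_for("czech.txt2"))                           # iso-8859-2
print(encoding_for("notes.txt", system_default="cp1251"))   # cp1251
print(encoding_for("japanese.jp"))                          # None
# Adding a custom pattern, as in the third method above:
print(encoding_for("japanese.jp", DEFAULT_PATTERNS + [("*.jp", "shift_jis")]))  # shift_jis
```

Because "*.txt" only matches names that end in ".txt", renaming a file to .txt2 moves it to a different entry, which is exactly why changing the extension changes how the file is read.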
And of course, if you want to be on the safe side, consider converting this kind of file to Unicode, i.e. saving it in UTF-8 with the .utf8 extension.