Translate Toolkit
The Translate Toolkit (http://toolkit.translatehouse.org/) is a set of libraries and command-line tools that help with translation work. It is distributed under the GPLv2 license and can be used freely by anyone. It is developed by Translate House and implemented in Python. It supports the Python 2 series, and from the upcoming 1.11.0 release, versions earlier than Python 2.5 will no longer be supported. Translate House also develops the translation platform Pootle, which itself uses the Translate Toolkit. (The LibreOffice translation work introduced on Day 1 (/naruoga/20131201/1385838237) and Day 3 (http://d.hatena.ne.jp/naruoga/20131203/1386063346) of the Doc-ja Advent Calendar 2013 is also done on Pootle.)
In this article, I'll use the Translate Toolkit library to build a simple translation tool, specifically one that automates the editing of PO files. This is useful when each edit is simple but the number of entries to touch is huge. Existing Unix tools such as sed are good at mechanical editing, but PO is not a line-oriented format, so it does not mix well with them and the job is not straightforward. This is where the Translate Toolkit comes in.
This article assumes GNU gettext PO as the message catalog format, but the Translate Toolkit is not limited to PO; it supports various formats such as XLIFF. PO itself is explained in the Day 2 article and is not covered here.
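To make the line-orientation problem concrete, here is a minimal sketch in plain Python, with no Translate Toolkit involved. The sample entry and the helper names `line_by_line_replace` and `join_po_string` are made up for illustration. The helper only handles simple quoted continuation lines without escape sequences, so treat it as a demonstration rather than a PO parser.

```python
# A msgstr whose trailing "..." is wrapped across two physical lines.
ENTRY = 'msgid "Save As…"\nmsgstr "save as."\n".."\n'


def line_by_line_replace(text, old, new):
    """Mimic a line-oriented tool such as sed: edit each line on its own."""
    return "\n".join(line.replace(old, new) for line in text.split("\n"))


def join_po_string(entry, keyword):
    """Hypothetical helper: join the quoted pieces that follow `keyword`.

    Simplified on purpose: no escape handling, no plural forms.
    """
    pieces = []
    collecting = False
    for line in entry.splitlines():
        if line.startswith(keyword + " "):
            collecting = True
            line = line[len(keyword) + 1:]
        elif not line.startswith('"'):
            collecting = False
        if collecting:
            pieces.append(line.strip()[1:-1])  # drop the surrounding quotes
    return "".join(pieces)


# The line-oriented edit changes nothing: no single line contains "...".
print(line_by_line_replace(ENTRY, "...", "…") == ENTRY)
# The logical string, once its pieces are joined, does end with three periods.
print(join_po_string(ENTRY, "msgstr"))
```

The point of the sketch is only that the unit of editing in PO is the logical message, not the physical line, which is what a PO-aware library takes care of.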
The toolkit can be installed with pip.
$ pip install translate-toolkit
Alternatively, if your OS provides a translate-toolkit package, you may want to use that instead. The source code is maintained on GitHub.
Consider the following scenario: "The three periods used as a suffix in source messages (common in GUI labels and the like) were replaced with the ellipsis character (…), which left a large number of entries marked fuzzy. Each fix is simple, but the number of affected entries is huge, so I want to process them mechanically."
The following is an excerpt of the PO to be edited. Imagine so many entries like these that you would not want to edit them by hand. Some contrived entries are included to make the effect easier to see.
# Update target
#, fuzzy
#| msgid "Open..."
msgid "Open…"
msgstr "open..."

# Update target
# The periods on the translation side are split by a line break...
#, fuzzy
#| msgid "Save As..."
msgid "Save As…"
msgstr "save as."
".."

# For some reason, this one has used the ellipsis character all along. No update required
msgid "Print…"
msgstr "printing…"

# For some reason, I want to leave the translation without the suffix. Drop only fuzzy
#, fuzzy
#| msgid "Print Preview..."
msgid "Print Preview…"
msgstr "Print preview"

# For some reason the source text still uses periods (for the time being, the translation follows it)
msgid "Preference..."
msgstr "Setting..."
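The translations should also have their three periods replaced with the ellipsis character to match the updated source text. A message is a target when it meets the following conditions: it is marked fuzzy; its previous msgid ends with three periods; its new msgid ends with the ellipsis character; and the previous and new msgid are identical apart from that suffix.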
This script reads PO from standard input and writes the edits to standard output.
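# -*- coding: utf-8 -*-
import sys

from translate.storage.po import pofile

PREV_SUFFIX = '...'
NEW_SUFFIX = u'…'

for unit in pofile.parsefile(sys.stdin).units:
    prev_source = unit.prev_source
    new_source = unit.source
    # Compare the stems by slicing: rstrip() strips a set of characters, not a suffix.
    if unit.isfuzzy() and \
            prev_source.endswith(PREV_SUFFIX) and \
            new_source.endswith(NEW_SUFFIX) and \
            prev_source[:-len(PREV_SUFFIX)] == new_source[:-len(NEW_SUFFIX)]:
        if unit.target.endswith(PREV_SUFFIX):
            # Replace only the trailing suffix, not every '...' in the translation.
            updated_target = unit.target[:-len(PREV_SUFFIX)] + NEW_SUFFIX
            unit.settarget(updated_target)
        # Clear the fuzzy flag and the previous msgid even if the target had no suffix.
        unit.markfuzzy(False)
        unit.prev_source = None
    # Every unit is written out, whether or not it was edited.
    print(unit)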
The result of processing the PO above is shown below. Note that the fuzzy flags are cleared exactly where they should be, and that the translation wrapped in the middle of the periods is handled correctly.
# Update target
msgid "Open…"
msgstr "open…"

# Update target
# The periods on the translation side are split by a line break...
msgid "Save As…"
msgstr "save as…"

# For some reason, this one has used the ellipsis character all along. No update required
msgid "Print…"
msgstr "printing…"

# For some reason, I want to leave the translation without the suffix. Drop only fuzzy
msgid "Print Preview…"
msgstr "Print preview"

# For some reason the source text still uses periods (for the time being, the translation follows it)
msgid "Preference..."
msgstr "Setting..."
The words source and target appear throughout; these are common terms in the translation community. Source text and target text mean the original and the translated text, respectively, and correspond to msgid and msgstr in PO.
The PO-related APIs become available by importing the translate.storage.po module. For details, see the API Reference (http://docs.translatehouse.org/projects/translate-toolkit/en/1.11.0-rc1/api/index.html).
When you parse a PO with the parsefile method of the pofile class, the messages end up in a list called units. Each element of the list is an object of the pounit class. pounit provides an API for message manipulation, giving access not only to msgid and msgstr but also to the previous msgid, the various comments, plural forms, and other attributes defined by gettext (/manual/gettext.html#PO-Files). Through the pounit API, you can manipulate messages without being tied to the more tedious physical details of the PO file layout.
The four lines starting at the first if correspond to the four conditions above. A message that matches them is then edited.
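The same checks can be written as small pure functions, which makes them easy to try without a PO file. This is only a sketch: `strip_suffix`, `should_update`, and `update_target` are hypothetical names that are not part of the Translate Toolkit, and plain strings stand in for the attributes of a pounit. Slicing is used for the suffix because str.rstrip treats its argument as a set of characters rather than as a suffix.

```python
PREV_SUFFIX = "..."
NEW_SUFFIX = "…"


def strip_suffix(text, suffix):
    """Return text without its trailing suffix, or None if the suffix is absent."""
    if text.endswith(suffix):
        return text[: len(text) - len(suffix)]
    return None


def should_update(is_fuzzy, prev_source, new_source):
    """True when the entry is fuzzy and only the suffix changed in the source."""
    prev_stem = strip_suffix(prev_source, PREV_SUFFIX)
    new_stem = strip_suffix(new_source, NEW_SUFFIX)
    return (
        is_fuzzy
        and prev_stem is not None
        and new_stem is not None
        and prev_stem == new_stem
    )


def update_target(target):
    """Swap a trailing '...' for the ellipsis; leave other targets untouched."""
    stem = strip_suffix(target, PREV_SUFFIX)
    return target if stem is None else stem + NEW_SUFFIX


print(should_update(True, "Open...", "Open…"))   # fuzzy, suffix-only change
print(should_update(False, "", "Print…"))        # not fuzzy, no previous msgid
print(update_target("save as..."))
print(update_target("Print preview"))
```

Keeping the decision (`should_update`) separate from the edit (`update_target`) mirrors the nested if in the script: an entry can qualify for having its fuzzy flag dropped even when its translation has no suffix to replace.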
At the end of the for loop, print writes each pounit to standard output. pounit defines the __str__ method, so simply printing it produces a validly formatted PO message.
The same approach applies to many similar tasks, such as mechanically checking for typos or making batch corrections after a change in translation style. Other common tools, such as one that computes the update difference between two POs or one that merges such differences into a single PO, are also easy to write.
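As a rough idea of what a difference tool computes, the sketch below compares two catalogs that have already been reduced to plain {msgid: msgstr} dictionaries. That reduction, the function names `diff_catalogs` and `merge_catalogs`, and the sample data are all assumptions made for illustration; a real tool would read the entries through the pofile API and would also need to consider comments, flags, and plural forms.

```python
def diff_catalogs(old, new):
    """Compare two {msgid: msgstr} mappings.

    Returns (added, removed, changed), where `changed` maps each msgid to
    its (old_msgstr, new_msgstr) pair.
    """
    added = {k: new[k] for k in new if k not in old}
    removed = {k: old[k] for k in old if k not in new}
    changed = {
        k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]
    }
    return added, removed, changed


def merge_catalogs(old, new):
    """Start from `old` and apply every addition and change found in `new`."""
    merged = dict(old)
    merged.update(new)
    return merged


old = {"Open…": "open...", "Print…": "printing…"}
new = {"Open…": "open…", "Save As…": "save as…"}

added, removed, changed = diff_catalogs(old, new)
print(added)
print(removed)
print(changed)
print(merge_catalogs(old, new))
```

Whether entries missing from the newer catalog should be kept, as `merge_catalogs` does here, or dropped is a policy decision that depends on the workflow.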
In fact, when I helped the GNOME Project Japanese translation team (https://l10n.gnome.org/teams/ja/) change its style for long vowels (for example, in the spelling of the word for "user"), I built an editing tool with the Translate Toolkit and processed everything in one go. The GNOME project has, at a rough estimate, more than 100,000 PO messages counted by msgid. That is not something that can be edited by hand, and the Translate Toolkit really helped.
This article has introduced only a small part of the Translate Toolkit library. There are many other useful features, and in addition to the library there are ready-to-use command-line tools. For more information, see the API Reference and other documentation (http://docs.translatehouse.org/projects/translate-toolkit/en/1.11.0-rc1/). Give the Translate Toolkit a try in your day-to-day translation work. Happy translating!
Tomorrow's post is by knok.