Translating XML files painlessly

By: Marco Cevoli - Read Time: 5 minutes

As OmegaT lecturers working with the Universitat Autònoma de Barcelona (UAB), specifically with its Master's in Tradumàtica, we spent two years supervising students' thesis projects. The students had to translate real software from English into Spanish and Catalan: some of the systems developed as part of the Public Knowledge Project.

This localization project is a perfect example of how to prepare and translate XML files with localization and translation tools, which is the subject of this article.

The Public Knowledge Project (PKP, https://pkp.sfu.ca/) is a joint initiative of several universities aimed at developing free software to improve the quality of academic publishing. The project has produced some of the most widely used applications in academia:

for the management of academic publications (Open Journal Systems),
of monographs (Open Monograph Press),
of conferences (Open Conference Systems) and
for indexing the relevant metadata (Open Harvester Systems).

Since these are free, open-source, nonprofit systems, UAB has also adopted some of them for internal use. It was therefore only natural for the Publications Office and the Master's in Tradumàtica (Translation Technology) to work together so that students could localize the still untranslated parts of the programs into Spanish and Catalan.

Note: at the time of our collaboration, localizing this software meant translating numerous XML files. Only more recently has the PKP project adopted a localization platform (Weblate) to simplify the workflow and make life easier for translators and project managers.

The localization projects we coordinated were OMP (Open Monograph Press) and OCS (Open Conference Systems), and in practice they were no different from any commercial project.

In fact, they had all the usual problems of translating out-of-context strings extracted from hundreds of different files, with reference material that was not entirely consistent and, out of necessity, little-known tools that still had room for improvement.

In short, this was an excellent test run not only for the master's students, who will face these problems every day if they want to work in website or software localization, but also for those of us supervising and coordinating half a dozen groups of 3-4 people on rather tight deadlines.

For our purposes, it is interesting to see how any XML file can be prepared for straightforward translation in any translation tool by using the all-purpose Okapi Framework suite.

You may wonder why you should convert an XML file to XLIFF when almost all computer-assisted translation (CAT) tools can translate XML files directly.

First of all, not all CAT tools can easily tell which elements of an XML file are translatable and which must be protected as tags.
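To make "protected as tags" concrete: when a translatable element contains inline markup, a good filter turns that markup into placeholders, so the translator sees only text plus numbered codes and cannot break the markup. The Python sketch below is a simplified illustration of the idea, not how Okapi implements it; the `<message key="...">` sample and the `{n}` placeholder notation are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def protect_inline(elem):
    """Flatten an element's mixed content into (segment, codes).

    Inline child markup becomes numbered {n} placeholders; codes[n-1] holds
    the markup that each placeholder stands for.
    """
    codes, out = [], []

    def walk(e):
        out.append(e.text or "")
        for child in e:
            attrs = "".join(f" {k}={quoteattr(v)}" for k, v in child.attrib.items())
            codes.append(f"<{child.tag}{attrs}>")   # opening tag -> placeholder
            out.append("{%d}" % len(codes))
            walk(child)                              # text inside the inline element stays translatable
            codes.append(f"</{child.tag}>")          # closing tag -> placeholder
            out.append("{%d}" % len(codes))
            out.append(child.tail or "")             # text following the inline element

    walk(elem)
    return "".join(out), codes


def restore(segment, codes):
    """Rebuild markup from a (translated) segment: escape the text, then swap placeholders back."""
    result = escape(segment)
    for i, code in enumerate(codes, start=1):
        result = result.replace("{%d}" % i, code)
    return result


msg = ET.fromstring('<message key="common.help">Read the <a href="help.html">guide</a> first.</message>')
segment, codes = protect_inline(msg)
print(segment)                                      # Read the {1}guide{2} first.
print(restore("Lee primero la {1}guía{2}.", codes))  # Lee primero la <a href="help.html">guía</a>.
```

The translator only ever touches the text between the codes; the link markup travels through translation unchanged.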

Using a sophisticated external tool like Okapi lets you prepare any kind of XML file. It also has the added advantage of making you independent of CAT tools, so you can assign the project to anyone, whichever CAT tool they use.

Once the XLIFF file has been created, it can be sent to the translators, who can translate it in any tool that reads XLIFF, without the hassle of reconfiguring their CAT tools.
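For readers who have never opened one, here is roughly what such an XLIFF file contains: each translatable string becomes a `<trans-unit>` with a `<source>`, inside a `<file>` that records the source and target languages. This is a deliberately minimal Python sketch, not Rainbow's output; it assumes PKP-style locale files of the form `<locale><message key="...">text</message></locale>`, that every message has a `key`, and it keeps only plain text (the function name `locale_to_xliff` is hypothetical).

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def locale_to_xliff(xml_text, original, src="en", tgt="es"):
    """Wrap the text of each <message key="..."> in an XLIFF 1.2 <trans-unit>.

    Simplified on purpose: only the text before any inline child is kept,
    and the key attribute is reused as the trans-unit id.
    """
    root = ET.fromstring(xml_text)
    # Setting xmlns as a plain attribute keeps the serialized tags unprefixed.
    xliff = ET.Element("xliff", {"version": "1.2", "xmlns": XLIFF_NS})
    file_el = ET.SubElement(xliff, "file", {
        "original": original,
        "source-language": src,
        "target-language": tgt,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, "body")
    for msg in root.iter("message"):
        unit = ET.SubElement(body, "trans-unit", {"id": msg.get("key")})
        ET.SubElement(unit, "source").text = msg.text or ""
    return ET.tostring(xliff, encoding="unicode")


sample = """<locale name="en_US">
  <message key="common.save">Save</message>
  <message key="common.cancel">Cancel</message>
</locale>"""
print(locale_to_xliff(sample, "locale.xml"))
```

A real filter also handles inline markup, CDATA, notes and skeleton files for reconversion, which is exactly why we rely on Okapi rather than home-made scripts.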

The two videos featured here, currently available only in Spanish with Italian subtitles, were originally recorded for the master's students (so they are a bit informal). Using the OMP package as an example, they explain both the structure of the translatable files and how to create a suitable filter for Rainbow/Okapi.

The same procedure can be applied to any program with translatable text in XML form. It’s no coincidence that this approach is the same as that described in our article "How to translate a Moodle course", which we recommend for further details.

The procedure can be summed up like this:

Analyse the XML files to identify translatable elements and attributes
Create the appropriate configuration file for the Rainbow/Okapi XML Stream filter
Convert the XML files to XLIFF using Rainbow
Check that all translatable content is actually exposed and that untranslatable content is properly tagged
Translate and revise
Convert the XLIFF files back to XML
Test the result linguistically and functionally within the program.
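For the first step, a quick inventory of which elements carry text and which attributes occur, across the whole package, gives you a starting point before writing the filter rules. This Python sketch only counts; deciding which element holds translatable text and which attribute (such as a `key`) must stay untouched remains a human judgment, and the function name `inventory` is our own, not part of Okapi.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def inventory(sources):
    """Tally, across XML files, which elements carry text and which attributes occur.

    `sources` can be file paths or open file objects (anything ET.parse accepts).
    """
    text_elements, attributes = Counter(), Counter()
    for source in sources:
        for elem in ET.parse(source).iter():
            if elem.text and elem.text.strip():   # element holds visible text
                text_elements[elem.tag] += 1
            for name in elem.attrib:               # record element@attribute pairs
                attributes[f"{elem.tag}@{name}"] += 1
    return text_elements, attributes
```

Assuming the English files sit in a folder called `package` (a hypothetical name), you could run `inventory(Path("package").rglob("*.xml"))` after `from pathlib import Path`, then look at `most_common()` on each counter to see which elements and attributes dominate.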

In the two projects featured here, the procedure was a little more complex, because file preparation was combined with terminology extraction and the project was divided among several work groups. Schematically, the steps were:

Extract the files that made up the package to be translated, i.e. the English-language files, into a folder
Analyse the DTD files in order to identify the translatable elements and attributes of the relevant XML files
Create the filter with Okapi Rainbow, following the instructions in the Okapi Wiki, specifically the pages on the XML Stream Filter and the HTML Filter
Copy the filter into the main folder of the package to be translated
Drag the files into Rainbow
Configure the languages and the UTF-8 encoding correctly
Set up the path to the configuration file
Select all the files and replace the default filter configuration with the custom one
Convert them to XLIFF
Resolve any errors caused by incorrect syntax in the original files (in our case, one error was caused by excessive CDATA in one of the files, which we fixed by editing the file manually)
After conversion, extract the most frequent terms with Rainbow's statistical term extraction function
Create a project for each target language in OmegaT
Add the translation memories and the corresponding glossaries, setting a penalty for memories from unreliable sources
Visually inspect the files for segmentation errors
Adjust the segmentation rules in OmegaT as needed
Run a word count on the files
Divide the files among the work groups.
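The last two steps, counting and dividing, can be approximated with a simple greedy rule: take the files from largest to smallest and give each one to the group with the lightest load so far. This is our own Python sketch, not a feature of OmegaT or Rainbow; it assumes you already have per-file word counts, and the whitespace-based `count_words` helper is only a crude stand-in for a CAT tool's word count.

```python
import heapq


def count_words(segments):
    """Crude word count: whitespace-separated tokens across all source segments."""
    return sum(len(s.split()) for s in segments)


def split_by_words(word_counts, groups):
    """Greedily balance files across work groups by word count.

    Files are taken largest first; each goes to the group with the fewest
    words so far. Returns a list of (total_words, group_index, files).
    """
    heap = [(0, g, []) for g in range(groups)]  # (load, group id, files); the id breaks ties
    heapq.heapify(heap)
    for name, words in sorted(word_counts.items(), key=lambda kv: kv[1], reverse=True):
        load, g, files = heapq.heappop(heap)
        files.append(name)
        heapq.heappush(heap, (load + words, g, files))
    return sorted(heap, key=lambda item: item[1])


# Hypothetical file names and counts, for illustration only.
counts = {"locale.xml": 5200, "admin.xml": 3100, "manager.xml": 2900,
          "editor.xml": 1800, "reader.xml": 900}
for load, g, files in split_by_words(counts, groups=2):
    print(g, load, files)
```

Balancing by words is only a first cut: in practice you may also want to keep related files (the same module, the same terminology) within one group.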

If you have any questions, leave them in the comments below.

If you have XML files to translate, we’re here to help!

Marco Cevoli

Technical translator, project manager, entrepreneur. Languages graduate with an MA in Design and Multimedia Production. He founded Qabiria in 2008.