Translating XML files painlessly

An impossible mission?


As part of our collaboration with the Universitat Autònoma de Barcelona (UAB) as OmegaT lecturers on the Master’s in Tradumàtica, we spent two years tutoring and coordinating students’ thesis work. The students had to translate real computer programs from English into Spanish and Catalan: some of the systems developed as part of the Public Knowledge Project.

This localization project is a perfect example of how to prepare and translate XML files with localization and translation tools, which is the subject of this article.

The Public Knowledge Project (PKP, https://pkp.sfu.ca/) is a joint initiative of several universities, aimed at developing free software to improve the quality of academic publishing. It has produced some of the most widely used applications in academia:

  • managing academic journals (Open Journal Systems),
  • managing monographs (Open Monograph Press),
  • managing conferences (Open Conference Systems), and
  • indexing the relevant metadata (Open Harvester Systems).

Since these are free, open-source, non-profit systems, UAB has also adopted some of them for internal use. It was therefore only natural that the Publications Office and the Master’s in Tradumàtica (Translation Technology) should work together so that students could localise the parts of the programs not yet translated into Spanish and Catalan.

Note: at the time of our collaboration, localizing this software meant translating numerous XML files. Only more recently has the PKP project adopted a localization platform (Weblate) to simplify the workflow and make life easier for translators and project managers.

The localization projects we coordinated were those of OMP (Open Monograph Press) and OCS (Open Conference Systems), and in practice they were no different from any commercial project.

In fact, they suffered from all the usual problems of translating out-of-context strings extracted from hundreds of different files, with reference material that was not entirely consistent and, out of necessity, little-known tools that still had room for improvement.

In short, this was an excellent test run not only for the master’s students, who will face these problems daily if they want to work in website or software localisation, but also for those of us supervising and coordinating the work of half a dozen groups of three or four people on rather tight deadlines.

For our purposes, the interesting point is how any XML file can be prepared for straightforward translation in whatever translation tool you choose, using the general-purpose Okapi Framework suite.

You may ask why you should convert an XML file to XLIFF when almost all computer-assisted translation (CAT) tools can translate XML files directly.

First of all, not all CAT tools offer a simple way to specify which elements of an XML file are translatable and which must be protected as tags.
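To make the distinction concrete, here is a minimal Python sketch of what "translatable" versus "protected" means in practice. It assumes a locale file shaped like the PKP ones, with `<message key="...">` elements. The sample content and the exact file layout are assumptions made for illustration. The element text is exposed for translation, while the `key` attribute is only an identifier. Smarty-style variables such as `{$name}` and embedded HTML tags are replaced with numbered placeholders that the translator must not alter. This only illustrates the idea and is not how Okapi implements its filters.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample, loosely modelled on a PKP-style locale file (assumption).
SAMPLE = """<locale name="en_US">
  <message key="user.welcome">Welcome, {$name}! See the &lt;a href="{$url}"&gt;help pages&lt;/a&gt;.</message>
  <message key="common.save">Save</message>
</locale>"""

# Inline codes to protect: Smarty-style variables such as {$name}, and HTML tags.
PROTECTED = re.compile(r"\{\$[A-Za-z_][A-Za-z0-9_]*\}|<[^>]+>")


def mask(text):
    """Replace each protected code with a numbered placeholder {1}, {2}, ..."""
    codes = []

    def repl(match):
        codes.append(match.group(0))
        return "{%d}" % len(codes)

    return PROTECTED.sub(repl, text), codes


def unmask(text, codes):
    """Restore the original codes in place of the numbered placeholders."""
    return re.sub(r"\{(\d+)\}", lambda m: codes[int(m.group(1)) - 1], text)


def extract_units(xml_text):
    """Return (key, masked_text, codes) for each <message> element.

    Only the element text is exposed for translation; the key attribute
    is kept as an identifier and never shown to the translator.
    """
    root = ET.fromstring(xml_text)
    return [(m.get("key"), *mask(m.text or "")) for m in root.iter("message")]


if __name__ == "__main__":
    for key, masked, codes in extract_units(SAMPLE):
        print(key, "|", masked, "|", codes)
```

The translator sees `Welcome, {1}! See the {2}help pages{3}.`, and `unmask` puts the variable and the link back after translation. A CAT tool that cannot be told which parts are code will show all of this markup as ordinary, editable text.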

Using a sophisticated external tool such as Okapi lets you prepare any kind of XML file, with the added advantage of not depending on a particular CAT tool: you can assign the project to anyone, whatever tool they use.

Once the XLIFF file has been created, it can be sent to the translators, who can work on it in any tool that reads XLIFF without the hassle of reconfiguring their CAT tools.
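An XLIFF file is itself just XML. It pairs each source segment with a target that starts out empty, which is why any XLIFF-aware tool can open it. The minimal sketch below uses only the Python standard library to write a bare XLIFF 1.2 document from (id, source) pairs, so you can see that structure. The function names and sample values are hypothetical. A file produced by Rainbow carries considerably more metadata and is not generated this way.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Serialize the XLIFF namespace as the default one (xmlns="...").
ET.register_namespace("", XLIFF_NS)


def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def build_xliff(units, source_lang, target_lang, original):
    """Return a minimal XLIFF 1.2 document for a list of (id, source) pairs."""
    root = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(root, q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, q("body"))
    for unit_id, source in units:
        tu = ET.SubElement(body, q("trans-unit"), id=unit_id)
        ET.SubElement(tu, q("source")).text = source
        # The target starts empty; any XLIFF-capable tool can fill it in.
        ET.SubElement(tu, q("target"))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_xliff([("common.save", "Save")], "en-US", "es-ES", "locale.xml"))
```

Because the translatable text sits in `<source>` and the translation goes into `<target>`, the translator's tool does not need to know anything about the original XML structure.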

The two videos featured here, currently available only in Spanish with Italian subtitles, were originally recorded for the master’s students (so they are a bit informal). Using the OMP package as an example, they explain both the structure of the translatable files and how to create a suitable filter for Rainbow/Okapi.

The same procedure can be applied to any program with translatable text in XML form. It’s no coincidence that this approach is the same as that described in our article "How to translate a Moodle course", which we recommend for further details.

The procedure can be summed up like this:

  1. Analyse the XML files to identify translatable elements and attributes
  2. Create the appropriate configuration file for the Rainbow/Okapi XML Stream filter
  3. Convert the XML files to XLIFF using Rainbow
  4. Check that all translatable content is actually exposed and that untranslatable content is properly tagged
  5. Translate and revise
  6. Convert the XLIFF files back to XML
  7. Test the result linguistically and functionally within the program.
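Part of the checking in steps 4 and 7 can be automated. The following minimal sketch assumes PKP-style `<message key="...">` locale files; that structure and the sample strings are assumptions. It compares an English file with its translation and reports three things: missing keys, unexpected keys, and messages whose Smarty-style `{$...}` variables differ. Such differences would typically show up as broken messages in the running program. The check complements testing inside the program but does not replace it.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Smarty-style variables such as {$name} (assumed placeholder syntax).
VARIABLE = re.compile(r"\{\$[A-Za-z_][A-Za-z0-9_]*\}")


def load_messages(xml_text):
    """Map each <message> key to its text content."""
    root = ET.fromstring(xml_text)
    return {m.get("key"): "".join(m.itertext()) for m in root.iter("message")}


def check_translation(source_xml, target_xml):
    """List missing or extra keys and messages whose variables differ."""
    source = load_messages(source_xml)
    target = load_messages(target_xml)
    problems = []
    for key, text in source.items():
        if key not in target:
            problems.append(f"missing key: {key}")
        # Compare variables as multisets: translators may legitimately reorder them.
        elif Counter(VARIABLE.findall(text)) != Counter(VARIABLE.findall(target[key])):
            problems.append(f"variable mismatch: {key}")
    for key in sorted(target.keys() - source.keys()):
        problems.append(f"unexpected key: {key}")
    return problems


# Hypothetical source and translated files.
EN = """<locale name="en_US">
  <message key="user.welcome">Welcome, {$name}!</message>
  <message key="common.save">Save</message>
  <message key="common.cancel">Cancel</message>
</locale>"""

ES = """<locale name="es_ES">
  <message key="user.welcome">¡Bienvenido, {$nombre}!</message>
  <message key="common.save">Guardar</message>
</locale>"""

if __name__ == "__main__":
    for problem in check_translation(EN, ES):
        print(problem)
```

In this sample the translator has translated the variable name itself (`{$nombre}`), which the program would no longer recognise. One message has also been left out of the translated file entirely.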

In the two projects featured here the procedure was a little more complex, because file preparation was combined with terminology extraction and the work was divided among several groups. Schematically, the steps were:

  1. Extract the files that make up the package to be translated (i.e. the English-language files) into a folder
  2. Analyse the DTD files to identify the translatable elements and attributes of the relevant XML files
  3. Create the filter with Okapi Rainbow, following the instructions in the Okapi Wiki, in particular the pages on the XML Stream Filter and the HTML Filter
  4. Copy the filter into the main folder of the package to be translated
  5. Drag the files into Rainbow
  6. Configure the languages and the UTF-8 encoding correctly
  7. Set the path to the configuration file
  8. Select all the files and replace the predefined filter with the custom one
  9. Convert them to XLIFF
  10. Resolve any errors caused by incorrect syntax in the originals. In our case, one error was caused by too much CDATA in one of the files, which we corrected by editing the file manually
  11. After conversion, extract the most frequent terms with Rainbow’s statistical term extraction function
  12. Create a project for each target language in OmegaT
  13. Add the translation memories and the corresponding glossaries, and set a penalty for memories from unreliable sources
  14. Inspect the files visually for segmentation errors
  15. Adjust the segmentation rules in OmegaT as needed
  16. Run a word count on the files
  17. Divide the files among the work groups.
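Steps 16 and 17 can also be scripted if you want a quick estimate before loading the files into OmegaT. The minimal sketch below does two things. First, it counts the words in the `<message>` texts of a file. This is a rough count that simply splits on whitespace, so it will not match OmegaT's own statistics exactly. Second, it assigns files to groups with a simple greedy heuristic: largest file first, always to the group with the least work so far. The PKP-style file structure, the file names, the word counts and the number of groups are all hypothetical.

```python
import heapq
import xml.etree.ElementTree as ET


def count_words(xml_text):
    """Rough word count: whitespace-separated tokens in all <message> texts."""
    root = ET.fromstring(xml_text)
    return sum(len("".join(m.itertext()).split()) for m in root.iter("message"))


def assign_files(word_counts, n_groups):
    """Greedy balancing: largest file first, always to the least-loaded group.

    Returns one (total_words, [file names]) pair per group.
    """
    heap = [(0, g) for g in range(n_groups)]  # (current load, group index)
    groups = [[] for _ in range(n_groups)]
    loads = [0] * n_groups
    # Sort by descending word count, then by name for a deterministic result.
    for name, words in sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, g = heapq.heappop(heap)
        groups[g].append(name)
        loads[g] = load + words
        heapq.heappush(heap, (loads[g], g))
    return list(zip(loads, groups))


if __name__ == "__main__":
    # Hypothetical word counts per file.
    counts = {"locale.xml": 5200, "admin.xml": 3100, "manager.xml": 2900,
              "submission.xml": 2500, "reader.xml": 800}
    for total, files in assign_files(counts, 3):
        print(total, files)
```

The greedy rule does not guarantee a perfect balance, but with only a few groups it is usually close enough. In practice you may still prefer to keep related files in the same group for terminological consistency.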

If you have any questions, leave them in the comments below and we’ll be happy to answer.

If you have XML files to translate, we’re here to help!

Technical translator, project manager, entrepreneur. Languages graduate with an MA in Design and Multimedia Production. He founded Qabiria in 2008.
