
Introduction to TeX/LaTeX translation
One of the hallmarks of LaTeX is its maturity and stability. For decades, many authors and large organizations have used it as their preferred drafting environment. Today it is a powerful and efficient writing tool, widely used in academia for publishing and communicating scientific papers in fields such as statistics, mathematics, engineering, computer science, physics, chemistry, quantitative psychology, economics, political science, and philosophy.
LaTeX also plays a significant role in the preparation and publication of articles and books that include complex multilingual material, and it is increasingly used in fields it was not originally designed for, such as philology or linguistics.
Translation plays a key role in making all these scientific discoveries, old and new, accessible to as many people as possible. It allows wider access to knowledge, but also collaboration between communities that do not share a common language. However, doing so in documents written in LaTeX adds some spice to the usual challenges presented by translation.
Understanding the challenges of TeX/LaTeX translation
As mentioned in the introductory article, TeX/LaTeX projects use plain text files to separate content from design.
This is achieved through formatting commands in the files, which act as instructions to the TeX/LaTeX engine. These commands must not be translated. If one of them is altered during translation, the file will probably fail to compile (and if by chance it does compile, the formatting is very likely to be incorrect).
Internally, Microsoft Word documents and similar alternatives also use tags to mark formatting. Since these software solutions focus on the WYSIWYG approach, they do not show these tags to the user (but directly the result of their application). Because of this, the set of tags and formatting you can apply to the text you are working on is limited, as the only options available are within the graphical interface provided by the software. In brief: there is a limited number of possible tags included in those documents.
LaTeX also has a set of standard tags for marking formatting or references, but users can additionally define new tags, or even macros, on top of the standard ones.
This is where problems arise, because custom definitions are very common in TeX/LaTeX files.
Popular text formats (such as .docx or .odt) are widely and easily recognized by almost all modern computer-aided translation tools. But what happens when, in addition to the standard markup, the author defines (as is very likely) new tags and structures that are used within the document but must not be translated? Given the chance, authors define their own abbreviations for frequently repeated portions of text. This is especially common in mathematics, where equations are explained and solved step by step, with few changes from one equation to the next. If the text and files are not handled properly before translation, this can cause problems.
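To give an idea of what author-defined tags look like in practice, here is a minimal Python sketch that lists the macros defined in a file. The function name `find_custom_macros` and the sample preamble are our own illustrations, and the pattern is only a heuristic covering the most common definition commands, not a real TeX parser.

```python
import re

# Heuristic pattern for the most common ways of defining a macro.
# Definitions created through other packages or mechanisms will be missed.
MACRO_DEF = re.compile(
    r"\\(?:newcommand|renewcommand|providecommand|DeclareMathOperator)\*?"
    r"\s*\{?\s*(\\[A-Za-z@]+)\s*\}?"
    r"|\\def\s*(\\[A-Za-z@]+)"
)


def find_custom_macros(tex_source: str) -> list[str]:
    """Return the names of macros defined in tex_source, in order of appearance."""
    names = []
    for match in MACRO_DEF.finditer(tex_source):
        name = match.group(1) or match.group(2)
        if name not in names:
            names.append(name)
    return names


# Hypothetical preamble, typical of a mathematics paper.
sample = r"""
\newcommand{\R}{\mathbb{R}}
\newcommand*\eps{\varepsilon}
\def\half{\frac{1}{2}}
\DeclareMathOperator{\argmin}{arg\,min}
"""

print(find_custom_macros(sample))
# ['\\R', '\\eps', '\\half', '\\argmin']
```

Every name in the resulting list is a candidate for the "do not translate" list of the translation tool.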
Considerations before translation
The most critical aspect of TeX/LaTeX is correctly identifying what is translatable and what is not. TeX/LaTeX wraps translatable content in untranslatable tags. If you get this distinction wrong, you risk breaking the code and producing a file that does not compile, or one whose formatting does not match the original.
A brief but mandatory checklist for translation preparation might be as follows:
- Check compatibility with the translation tool: most of them do not natively support TeX/LaTeX.
- Identify and solve encoding problems: a mismatched character encoding is often the main reason software misidentifies characters such as accented letters or special symbols.
- Manage packages or custom macros: TeX/LaTeX allows the use of additional packages, which add more commands to the standard collection. Check in the header which packages are used in the files to make sure that all additional commands are taken into account.
- There is also the possibility for authors to define their own commands and macros. These are usually defined at the beginning of the document. Be sure to mark them as untranslatable.
- Don't forget the non-textual elements: scientific papers tend to include non-textual elements such as figures, tables and equations. Usually these can be safely marked as DNT ("Do Not Translate"), but there is also the possibility that they include translatable text within them. In such a case, the client should be asked to send the figures separately in an editable format so that they can be translated. Keeping this in mind before starting translation can save a lot of time and tug-of-war situations between clients and translators.
- Find the most appropriate professional: the translator working with the files should have at least a basic knowledge of how LaTeX works to avoid breaking the code and to be able to point out any errors in identifying the translatable text. Changing the order of some special characters is common (due to grammar), but some CAT software may report it as an error or warning.
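Two of the checks above, encoding and packages, can be partly automated. The following Python sketch assumes the files are either UTF-8 or Latin-1, which is a simplification. The helper names `decode_tex` and `list_packages` and the sample bytes are hypothetical.

```python
import re

# Matches the usepackage command with optional [options] and captures the names.
USEPACKAGE = re.compile(r"\\usepackage\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}")


def decode_tex(raw: bytes) -> tuple[str, str]:
    """Decode raw file bytes, trying UTF-8 first and Latin-1 as a fallback.

    Returns the decoded text and the name of the encoding that was used.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 never fails, so accented characters should still be checked by eye.
        return raw.decode("latin-1"), "latin-1"


def list_packages(tex_source: str) -> list[str]:
    """Return the package names loaded in tex_source, ignoring commented-out code."""
    packages = []
    for line in tex_source.splitlines():
        # Drop everything after an unescaped % (a TeX comment).
        code = re.split(r"(?<!\\)%", line, maxsplit=1)[0]
        for match in USEPACKAGE.finditer(code):
            packages.extend(p.strip() for p in match.group(1).split(",") if p.strip())
    return packages


# Hypothetical file content, saved by the author in Latin-1.
raw = (
    "\\usepackage[utf8]{inputenc}\n"
    "\\usepackage{amsmath, siunitx}\n"
    "% \\usepackage{old}\n"
    "Café"
).encode("latin-1")

text, encoding = decode_tex(raw)
print(encoding)             # latin-1
print(list_packages(text))  # ['inputenc', 'amsmath', 'siunitx']
```

In this made-up example the file declares `utf8` but is actually stored as Latin-1, which is exactly the kind of mismatch worth catching before the files reach the translator.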
Best practices for TeX/LaTeX translation
The general guidelines for TeX/LaTeX translation will be nothing new to those accustomed to working in the translation industry. The use of a CAT tool will help ensure consistency. A translation memory well populated with previous works will be of great help.
It is also very likely that industry-specific and/or scientific jargon will be used in these files, which is why it is recommended that the most relevant words/phrases be extracted beforehand to build a glossary to help throughout the process. Accuracy is also critical in these translations, as a mistranslated nuance could imply an incorrect assumption or contradiction within the text.
When working with scientific papers or publications, very often new theorems and hypotheses are proven right or wrong. It is critical that the correct meaning be maintained in the translation. For this reason, it is highly advisable to work with a translator whose field of specialization is the one covered in the translation, or, at least, who has a good knowledge of a related field, enabling a better understanding of the ideas presented. Our team of professionals will ensure that the texts you submit to us are accurately translated from/into the desired language.
Regular visual inspection of both the original and translated files is also necessary to ensure that the formatting remains unchanged. Because TeX/LaTeX is a markup language and tags may change position during translation, it is important to inspect the compiled document periodically to make sure that the formatting is consistent.
Another common situation in this type of translation is cross-references between different articles and/or quotations from other authors. If possible, ask for the referenced files to ensure proper citation. Clients also sometimes submit multiple files with internal references, so it is necessary to make sure that these references still resolve correctly in the final files.
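For internal references spread over several files, a quick consistency check could look like the Python sketch below. It only compares `\label` keys with the keys used by a few common reference commands. Bibliographic citations, comma-separated key lists, and labels generated by packages are out of scope, and the function name `unresolved_refs` and the file contents are illustrative.

```python
import re

LABEL = re.compile(r"\\label\{([^}]+)\}")
REF = re.compile(r"\\(?:ref|eqref|pageref|autoref)\{([^}]+)\}")


def unresolved_refs(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each file name to the reference keys it uses that no file defines."""
    defined = set()
    for source in files.values():
        defined.update(LABEL.findall(source))

    missing = {}
    for name, source in files.items():
        keys = [key for key in REF.findall(source) if key not in defined]
        if keys:
            missing[name] = keys
    return missing


# Hypothetical two-file project.
files = {
    "intro.tex": r"See Section~\ref{sec:method} and Equation~\eqref{eq:energy}.",
    "method.tex": r"\section{Method}\label{sec:method} The setup is shown in Figure~\ref{fig:setup}.",
}

print(unresolved_refs(files))
# {'intro.tex': ['eq:energy'], 'method.tex': ['fig:setup']}
```

Running the same check on the source and on the translated files should give the same result. Any new entry in the translated set suggests that a label or reference key was altered.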
Choosing the right translation tool
Nowadays the translation process is much simpler if the right tool is used. Since TeX uses plain text files, you can always open the files that make up a document in an advanced text editor (such as Notepad++) and translate them one at a time. However, to ensure greater consistency, to avoid translating repeated text several times (technical manuals are full of repetitions) and to translate all the individual files at once, we recommend using a CAT tool: not a feline specimen, but a computer-assisted translation tool.
In order to do this properly, the CAT tool must first find a way to distinguish the parts to be translated from the rest of the text, while maintaining the structural integrity of the folder in which they are located and the format of each file.
There are 3 approaches:
- Use Tortoise Tagger: this set of macros for Microsoft Word, released some years ago, lets you hide the untranslatable parts of the text and expose only the translatable text. After passing TeX/LaTeX files through this tagger, you can use any CAT tool that can process tagged Word or RTF files, such as Trados Studio, Wordfast or memoQ.
- Use a filter based on regular expressions: some translation tools, such as Trados or memoQ, support regular expressions (regex) for blocking out all untranslatable parts. If you have the time and patience to create a set of regexes suited to the project you are working on, this approach avoids preprocessing the files with a tagger.
- Use OmegaT: this open-source CAT tool is the option we recommend, as it can read TeX files directly. It removes the hassle of tagging files before starting the translation (and then exporting them back to .tex) and of fine-tuning a complex regex that will potentially only work for the current project.
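As a rough illustration of the regex-based approach, the Python sketch below replaces untranslatable spans with numbered placeholders and puts them back after translation. It is not the filter used by any of the tools mentioned. The pattern, the placeholder format `<1>` and the function names `protect` and `restore` are assumptions made for this example, and nested or multi-line constructs would need more work.

```python
import re

# Alternatives are tried in order, so the more specific patterns come first.
PROTECTED = re.compile(
    r"\$[^$]*\$"                                            # inline math such as $x^2$
    r"|\\(?:label|ref|eqref|cite)\*?(?:\[[^\]]*\])?\{[^}]*\}"  # commands whose argument is a key
    r"|\\[A-Za-z]+\*?(?:\[[^\]]*\])?"                       # any other command, with [options]
    r"|\\."                                                 # escaped characters such as \% or \&
    r"|[{}~]"                                               # braces and non-breaking spaces
)


def protect(line: str) -> tuple[str, list[str]]:
    """Replace untranslatable spans with numbered placeholders like <1>."""
    tags = []

    def swap(match):
        tags.append(match.group(0))
        return f"<{len(tags)}>"

    return PROTECTED.sub(swap, line), tags


def restore(translated: str, tags: list[str]) -> str:
    """Put the original spans back in place of their placeholders."""
    # A function is used as replacement so that backslashes are inserted literally.
    return re.sub(r"<(\d+)>", lambda m: tags[int(m.group(1)) - 1], translated)


line = r"The \textbf{energy} $E = mc^2$ is shown in Figure~\ref{fig:e}."
masked, tags = protect(line)
print(masked)
# The <1><2>energy<3> <4> is shown in Figure<5><6>.

translated = "La <1><2>energía<3> <4> se muestra en la Figura<5><6>."
print(restore(translated, tags))
# La \textbf{energía} $E = mc^2$ se muestra en la Figura~\ref{fig:e}.
```

The translator only sees plain text and placeholders, and can move the placeholders when the grammar of the target language requires it. Text that already contains something like `<3>` would confuse this simple scheme, which is one reason why a dedicated filter is usually the safer choice.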
Ensure quality after translation
At the end of the translation process, a validation process is needed to ensure that all files have been handled correctly, that there is no text left to translate, that all files are in the right place so that the engine can compile the project, and that the overall output is the same as the original files.
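Part of this validation can be scripted. The sketch below, in Python, counts the markup tokens in a source segment and in its translation and reports any difference. It is a simple heuristic under our own assumptions (the name `markup_diff` and the sample sentences are made up) and it does not replace compiling the project.

```python
import re
from collections import Counter

# Command names, escaped characters, and a few structural characters.
COMMAND = re.compile(r"\\[A-Za-z@]+\*?|\\.|[{}$&]")


def markup_diff(source: str, target: str) -> dict[str, tuple[int, int]]:
    """Return markup tokens whose counts differ, as {token: (in_source, in_target)}."""
    src = Counter(COMMAND.findall(source))
    tgt = Counter(COMMAND.findall(target))
    return {
        token: (src[token], tgt[token])
        for token in sorted(src.keys() | tgt.keys())
        if src[token] != tgt[token]
    }


source = r"The \emph{mean} of $x$ is \SI{3}{\metre}."
target = r"La \emph{media} de $x$ es \SI{3}{\metro}."  # a unit command was translated by mistake

print(markup_diff(source, target))
# {'\\metre': (1, 0), '\\metro': (0, 1)}
```

An empty result means that the same markup appears the same number of times on both sides. It says nothing about the order of the tags, which may legitimately change with the grammar of the target language.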
LaTeX has a very good engine that usually generates a very consistent result, even across languages. This does not hold for every page, but most of the time you can compare the two outputs page by page. They will probably look the same, with the only difference being the language used.
Of course, it is highly recommended that a second specialist review the translator's work. This person does not strictly need to be familiar with the inner workings of LaTeX, since the "final" version of the document may already have been generated, and it is likely that only minor changes will be needed (made by the same translator who worked with the original files).
At Qabiria we offer both translation and proofreading of TeX/LaTeX documents. Learn more about our LaTeX translation services.