Translation and LaTeX typesetting from French, German and Italian into English

As part of a research project, a major Japanese university approached Qabiria to translate several scientific articles on theoretical physics from French, German and Italian into English and then typeset them in LaTeX.

Page of a scientific article formatted in LaTeX

Professor F. checked both the translation and the layout and was extremely satisfied.

N.T. Research Unit Administrator

9 project weeks

350 pages of scientific papers

85 thousand words

About Japanese University

This major research center offering doctoral programs in Science and Engineering was founded by the local government to promote international science and technology research, attract leading researchers from around the world, and foster the development of a cutting-edge research hub.

Case study summary

Industry: Engineering
Services: Layout, Specialized translation
Languages: Italian, French, German, English

The challenge

Towards the end of 2016, we received a rather unusual request: an employee of a Japanese university was looking for translators skilled in LaTeX to translate, and then typeset, a series of scientific articles on physics for one of the centre’s researchers.

The task was to translate a dozen articles from French, German and Italian into English, typeset them in LaTeX and export them as PDFs. For those unfamiliar with it, LaTeX is a powerful and versatile typesetting system for creating complex documents, used mostly by the academic and scientific community, but also by many engineering firms and technical writing agencies.

The source articles, supplied as PDFs, made the project particularly complex, both technically and linguistically.

  • First of all, the articles were not editable: the PDFs were scans of photocopies of the paper originals, sometimes at such a low resolution that certain symbols (in the numerous equations, but also in the text) were hard to read. This could lead to misreadings, which are potentially very serious in texts of this kind.
  • The originals were in turn extracts from books published at the end of the 19th century, written in rather archaic language that was often difficult to interpret.
  • The quality of the scans was uneven and in some cases far from optimal: some articles had been digitised without any clean-up, so some pages had a heavy grey background and were hard to read in places.
  • The most critical point was the presence of hundreds of equations, some of them written in the notation of the time, which differs from the modern notation accepted by the international scientific community.
  • Finally, one of the PDFs also contained a dozen graphs, fortunately not too complex, which still had to be redrawn in a vector drawing programme before they could be inserted in the final documents.

The solution

The first delicate step was the word count, which we needed in order to prepare an estimate. The client had already given us a rough word count for each PDF. After trying a few OCR solutions with poor results, due both to the poor quality of the originals and to the complexity of the documents, we decided to extrapolate the total from a sample of pages from each document, and arrived at figures similar to the client’s.
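The sampling approach can be illustrated with a minimal sketch. It assumes a simple extrapolation: count the words by hand on a few representative pages, take the average, and multiply by the number of pages. The function name and all figures below are hypothetical and are not taken from the actual project.

```python
def estimate_total_words(sample_word_counts, total_pages):
    """Extrapolate a document's word count from hand-counted sample pages."""
    if not sample_word_counts:
        raise ValueError("need at least one sampled page")
    # Average words per page across the sampled pages
    mean_words_per_page = sum(sample_word_counts) / len(sample_word_counts)
    # Scale the average up to the whole document
    return round(mean_words_per_page * total_pages)


# Hypothetical example: four pages counted by hand in a 30-page article
sampled = [250, 230, 270, 240]
estimate = estimate_total_words(sampled, total_pages=30)
print(estimate)
```

The more pages are sampled, and the more representative they are (for example, mixing text-heavy and equation-heavy pages), the closer the estimate is likely to be to the real count.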

We then looked at which of two options was more cost-effective:

  1. typeset the original documents in LaTeX and give the translators the TEX files to translate, or
  2. give the translators the original PDFs and then typeset the translations.

Broadly speaking, the first option has a couple of advantages.

  1. The first is that the files can be translated with a computer-aided translation (CAT) tool, which speeds up the work and ensures consistent terminology and style. OmegaT, the CAT tool we usually use, has a filter for this type of file. Unfortunately, the filter is not perfect: it does not recognise all the commands that can appear in a document, so some parts that should not be translated are exposed for translation, and vice versa.
  2. The second advantage is that, once the document is translated, the files are already in the delivery format (TEX). Even so, translators are quite likely to alter the file structure unknowingly during translation and revision, so the LaTeX expert still has to check the document structure.

However, typesetting the documents in advance also has some disadvantages.

  1. First, it forces translators to work with OmegaT, which adds a further constraint to the already difficult task of selecting translators, as this programme is not among the most widely used in the industry.
  2. Second, in planning terms, bringing the layout phase forward means delaying the handover of the documents to the translators.

The second option, having the translators work directly from the PDF and type the translation into a Word file, also has some immediate advantages.

  1. The first is to have translators work with a word processor with which they are familiar and which practically everyone has.
  2. The other advantage is that you can begin the translation phase immediately, without waiting for the source files to be prepared.

Of course, typesetting afterwards has a major disadvantage:

  • the translators do not have a digital text to work on, so they have to type the entire translation into Word.

To make this task easier, we decided that the translators would not transcribe the equations or the symbols (mainly Greek letters) contained in the text. That work would have been essentially wasted, because we would have had to rewrite every equation in LaTeX syntax anyway. In place of each equation, and of each mathematical symbol within a sentence, we agreed on an easy-to-type sequence of characters (e.g. @@@) that could be located easily during the subsequent layout phase.
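As an illustration of why a fixed marker helps, here is a minimal sketch of how such placeholders could be counted and filled in during layout. It is not the procedure actually used on the project, where the typesetter may well have worked by hand; the function name, the sample sentence and the LaTeX snippets are all hypothetical.

```python
import re

PLACEHOLDER = "@@@"  # marker typed by the translators in place of maths


def fill_placeholders(text, latex_snippets):
    """Replace each placeholder, in order, with the matching LaTeX snippet."""
    found = text.count(PLACEHOLDER)
    if found != len(latex_snippets):
        # A mismatch suggests a marker was forgotten or an equation skipped
        raise ValueError(f"{found} placeholders but {len(latex_snippets)} snippets")
    snippets = iter(latex_snippets)
    # A function replacement keeps re.sub from interpreting LaTeX backslashes
    return re.sub(re.escape(PLACEHOLDER), lambda _match: next(snippets), text)


# Hypothetical translated sentence with two symbols left out
translated = "The energy @@@ depends on the frequency @@@."
result = fill_placeholders(translated, [r"$E$", r"$\nu$"])
print(result)
```

Checking that the number of markers matches the number of equations and symbols in the original is a cheap way to catch omissions before the PDF is built.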

If the client had requested translation into several languages, we would probably have chosen the first option: preparing TEX files from the PDF originals and giving these to the translators. Since the files were to be translated into English only, we chose the second option and agreed an extra fee with the translators for the transcription work.

Finally, we scheduled the documents for translation according to their page count, so as to guarantee a constant workflow for the LaTeX expert and to meet the client’s deadlines.

The team

Once the workflow issues were resolved, we selected a team of translators with advanced studies in mathematics or physics, which was essential for understanding the original texts thoroughly and rendering them properly. The chosen professionals then carried out a short test translation of an extract from the articles. The client evaluated and accepted the test (the professor in charge of the research was a native English speaker) before the translation phase began.

Given the complexity of the layout work, we also selected a LaTeX expert to typeset the translated documents and choose the additional packages the project required (LaTeX is a modular system: packages can be added to the base installation according to the functions needed). This person, who also has a scientific background, worked with a maths student to resolve various queries about writing the equations.

We were also very lucky: the LaTeX expert we found was based very close to us, so we were able to meet in person periodically throughout the project to analyse (and solve) the critical issues.

The result

The project lasted about 9 weeks in total, during which our team translated almost 350 pages of scientific documents, equivalent to over 85,000 words, and typeset them in LaTeX. We delivered both the TEX files and the corresponding PDF files to the client. One of the documents included graphics, which we recreated in CorelDRAW and attached to the project as EPS/PDF.

The TEX files meet industry standards for layout (following the "article" class) and equation format. In addition, all files were duly analysed at code level, so that the client can modify them without having to come back to us.

Lessons learned

For Qabiria, this was a truly exciting project, because it allowed us to deepen our knowledge of formatting with LaTeX, while also giving us the opportunity to contribute - albeit indirectly - to a research project and, therefore, to spread knowledge, in line with our business principles.

We also had the pleasure of working with highly qualified professionals, whose contribution was fundamental to delivering a job with which the client was extremely satisfied. Not least, the project also demonstrates the immense potential that online marketing holds for those with the ability - or luck - to be found by search engines.

Update 2024

After a few years, the same client contacted us again with another set of articles (again to be translated into English from Italian, French and German). Unlike in the previous project, this time the articles were more recent and the quality of the PDFs was better. Otherwise, the project was essentially the same as the first one, so we followed the same procedure. The LaTeX typesetter was the same person, while the translator-reviewers changed.

Get the same results as this Japanese university!

Let us know what you need by sending an email to hola@qabiria.com or by filling in the contact form. We guarantee a response within 24 hours, but usually we’re much faster.

Contact us