Leveraging Large Language Models for Translation: The Future of Multilingual Communication

August 28, 2024
AppTek

Large Language Models (LLMs) are the latest milestone in Natural Language Processing (NLP) and have captivated headlines and the attention of the scientific community worldwide since the launch of ChatGPT approximately a year and a half ago. Despite their tendency to hallucinate, LLMs have demonstrated impressive capabilities in a wide array of scenarios and tasks, including translation, making our imaginations run wild about the future of multilingual communication.

The Evolution of Translation Technology

Production-grade translation technology has come a long way from simple bilingual dictionaries, the systematic management of concept-oriented words and phrases in term bases, and the painstaking collection of parallel translation data (phrases or sentences) in databases known as translation memories.

Machine translation as a discipline is not even a century old, yet it has already evolved through three significant milestones. Early systems relied on rule-based approaches in which computational linguists manually encoded linguistic rules. Such methods struggled with the complexity and variability of the world’s natural languages and took a long time to develop; they never reached an acceptable level of fluency or offered much more than literal word-for-word translations, which limited their effectiveness to very narrow domains with controlled, restricted vocabularies.

The introduction of Statistical Machine Translation (SMT) marked a significant improvement by leveraging bilingual translation corpora to generate translations based on probabilistic models. Google Translate was launched in 2006 and quickly became a household name. The generation of people that grew up with it was the first to think of translation as a commodity produced not by humans but by machines. Yet SMT struggled with languages for which no significant amounts of translation data were available, as well as with languages of rich linguistic structure. It also proved to have significant limitations regarding figurative language, context and ambiguity, which restricted its use to non-creative, information-laden texts.

The advent of Neural Machine Translation (NMT) in 2015 represented a major leap forward. Artificial neural networks made it possible to model the machine translation process in an end-to-end manner. This significantly improved translation quality, especially for languages with high linguistic complexity, and resulted in texts of previously unseen fluency, even though accuracy was sometimes compromised. However, NMT required substantial amounts of parallel data as well as computational resources for training, specifically the more expensive but efficient Graphics Processing Units (GPUs). One type of neural network architecture – the Transformer (Vaswani et al., 2017) – proved most effective for machine translation and many other NLP tasks. A few years later, it also became the basis for larger models such as (Chat)GPT – the Generative Pre-trained Transformer.

Large Language Models: The New Paradigm

With GPT, the new era of Large Language Models has begun. Equipped with parameter counts several orders of magnitude higher than what was previously used, and trained on datasets comprising billions of words instead of the millions typical of NMT systems, LLMs are able to generate human-like text across a wide range of languages, based on a long and freely formulated context provided to them. As a result, they bring significant advantages to the translation task:

  • Contextual understanding: LLMs excel at grasping context, a critical aspect of producing accurate and fluent translations. They ‘understand’ the nuances of a sentence, including idiomatic expressions, slang and cultural references, which traditional MT systems often misinterpret. As a result, some of the first domains in which people started to experiment with the use of LLMs in translation are creative ones like marketing.
  • Zero-shot and few-shot learning: LLMs are proficient in zero-shot and few-shot learning, meaning they can perform translation tasks with little to no task-specific training data. This is particularly valuable for narrow domains and for low-resource languages that lack extensive bilingual corpora, although the evaluation at last year’s WMT (Kocmi et al., 2023) showed that for such language pairs, LLM translation quality was still behind that of state-of-the-art dedicated NMT models. A minimal few-shot prompting sketch follows this list.
  • Dynamic adaptation: LLMs can dynamically adapt to new information and context. This allows them to provide more accurate translations for specialized domains such as legal, medical, or technical fields, where precise terminology is crucial.
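
To make the few-shot idea more tangible, the minimal sketch below builds a prompt from a handful of example sentence pairs before asking the model to translate a new sentence. It assumes an OpenAI-style chat-completions client and an illustrative model name; any instruction-tuned LLM endpoint could be substituted.

    # A minimal sketch of few-shot prompting for translation.
    # Assumes an OpenAI-style chat-completions client; the model name is illustrative.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # A handful of in-context example pairs (German -> English).
    examples = [
        ("Das Meeting wurde auf nächste Woche verschoben.",
         "The meeting has been postponed to next week."),
        ("Bitte senden Sie uns die Unterlagen bis Freitag.",
         "Please send us the documents by Friday."),
    ]

    def translate(source_sentence):
        demos = "\n".join(f"German: {src}\nEnglish: {tgt}" for src, tgt in examples)
        prompt = (
            "Translate the following sentences from German into English.\n\n"
            f"{demos}\n"
            f"German: {source_sentence}\nEnglish:"
        )
        response = client.chat.completions.create(
            model="gpt-4o",             # illustrative model name
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,            # deterministic output for translation
        )
        return response.choices[0].message.content.strip()

    print(translate("Der Zug hat zwanzig Minuten Verspätung."))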

Notwithstanding the above benefits, the use of LLMs does not come without challenges and ethical concerns, especially in sensitive contexts. As with any MT system, LLMs can inadvertently perpetuate biases present in their training data, which is harder to curate due to its sheer size. At the same time, the reliability and accuracy of their translations remain a big question mark due to the LLMs’ tendency to hallucinate, which makes them unsuitable for high-stakes applications without human oversight.

Aside from the above, questions have been raised regarding privacy and data security when using LLMs, concerns that must be addressed if user trust is to be earned. An additional drawback of using LLMs to translate large amounts of text is that they are resource-hungry and thus expensive to deploy: they need large GPU machines, yet still generate translations rather slowly. In contrast, standard Transformer NMT models can be deployed efficiently even on CPU-only machines and still deliver translations at a comfortable speed of several sentences per second.

Integrating LLMs into the Translation Pipeline

Though LLMs have not yet replaced the specialized, enterprise-grade NMT models traditionally used in the translation workflows of language service providers, there are specific tasks for which they are particularly suitable, and they are steadily claiming their place in machine translation pipelines. At AppTek, we leverage LLMs primarily to improve the training of our state-of-the-art Transformer NMT models and MT quality estimation models. This can be done in a variety of ways:

  • Synthetic data generation: LLMs can be used to generate synthetic data to support metadata-aware MT approaches (Matusov et al., 2020) by reformulating existing translations from the original parallel data according to alternative signals such as speaker gender, text formality level, genre, and translation length (e.g. retranslating a sentence in an informal style, or as if spoken by a different speaker). After careful filtering to ensure there is no meaning distortion, the synthetic data can be used to alleviate bias stemming from the uneven distribution of the modelled features in the original parallel data. This enhances NMT training for better coverage of such linguistic features (a prompt sketch follows this list).
  • Rule generation for data classification: LLMs can also be used to suggest regular expressions or rules for matching linguistic phenomena in training data, helping to define different classes and improve metadata-based customization. One example is automatically generating rules to detect different formality levels in languages that have many of them, such as Japanese and Korean.
  • Back-translation into English: Since LLMs typically deliver higher translation quality when translating into English than into other languages, they are used to translate foreign-language sentences into English, creating high-quality synthetic data for training NMT systems that translate from English into those languages.
  • Text normalization: Expanding on the use case of parallel synthetic data generation, LLMs are used for text normalization purposes, i.e. to convert numbers, dates, and other tokens containing digits into their spoken form, as well as to introduce speech-like disfluencies. Such synthetic sentences are then used as source sentences that resemble spontaneous spoken data, which helps improve system robustness against speech recognition errors and other speech artefacts. This allows AppTek to successfully use our NMT systems to translate speech transcribed by our automatic speech recognition (ASR) system, with proper punctuation, casing, and number formatting in the MT output.
  • MT Quality Estimation: LLMs have been shown to excel at evaluating MT output quality (Kocmi and Federmann, 2023), an approach we also follow by prompting LLMs with the source sentence, its MT output and, where available, a human reference translation (a scoring prompt sketch follows this list). We combine such automatic predictions with human judgments to train classifiers for word-level and sentence-level MT confidence values. The latter are used in a lightweight quality estimation component that also uses the NMT model score, so that there is no significant overhead in computing time when running NMT plus confidence prediction.
  • Automatic post-editing: LLMs have also been shown to work well at correcting terminology translations and other aspects of NMT output (Moslem et al., 2023), improving NMT performance when computational resources are available. In many cases, the result tends to be better than asking the LLM to translate the same sentence from scratch (see the post-editing sketch after this list).
  • Adaptation/Fine-tuning: Finally, LLMs can be adapted to specific tasks, such as translating domain-specific content, through continued pre-training and instruction fine-tuning, a solution we have implemented for the climate domain (Thulke et al., 2024). We are applying a similar approach to a concrete translation task – translation into English from multiple languages – taking longer context into account and reacting to additional world knowledge formulated in prompts, e.g. about the required formality level, idiomatic expressions, or even labels representing speaker emotions.
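
As an illustration of the synthetic data generation idea above, the following sketch reformulates an existing English translation into an informal register and applies a crude length-based filter as a stand-in for the careful meaning-preservation filtering described; the prompt wording, model name and threshold are assumptions for illustration, not AppTek's production setup.

    # Sketch: reformulate an existing translation into an informal register to create
    # metadata-tagged synthetic training data. Prompt wording, model name and the
    # simple length filter are illustrative assumptions, not AppTek's actual setup.
    from openai import OpenAI

    client = OpenAI()

    def reformulate_informal(source, translation):
        prompt = (
            "Rewrite the English translation of the German sentence below in an "
            "informal, conversational style. Keep the meaning unchanged.\n\n"
            f"German: {source}\n"
            f"Formal English: {translation}\n"
            "Informal English:"
        )
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
        )
        return response.choices[0].message.content.strip()

    def keep(original, rewritten):
        # Crude stand-in for meaning-preservation filtering: discard rewrites whose
        # length deviates too much from the original translation.
        ratio = len(rewritten.split()) / max(len(original.split()), 1)
        return 0.6 <= ratio <= 1.6

    src = "Könnten Sie mir bitte das Dokument zuschicken?"
    tgt = "Could you please send me the document?"
    candidate = reformulate_informal(src, tgt)
    if keep(tgt, candidate):
        print(f"{src}\t{candidate}\tformality=informal")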

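The quality estimation approach can likewise be sketched as a scoring prompt in the spirit of Kocmi and Federmann (2023): the LLM is given the source, the MT output and optionally a human reference, and asked for a score between 0 and 100. Again, the prompt wording and model name are illustrative assumptions.

    # Sketch: LLM-based MT quality estimation via a scoring prompt, in the spirit of
    # Kocmi and Federmann (2023). Prompt wording and model name are illustrative.
    import re

    from openai import OpenAI

    client = OpenAI()

    def estimate_quality(source, hypothesis, reference=""):
        prompt = (
            "Score the following translation from German to English on a scale from "
            "0 (no meaning preserved) to 100 (perfect translation). "
            "Answer with the score only.\n\n"
            f"German source: {source}\n"
            f"English translation: {hypothesis}\n"
        )
        if reference:
            prompt += f"English human reference: {reference}\n"
        prompt += "Score:"
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,
        )
        match = re.search(r"\d+", response.choices[0].message.content)
        return int(match.group()) if match else -1  # -1 signals an unparsable answer

    print(estimate_quality(
        "Der Vertrag tritt am ersten Januar in Kraft.",
        "The contract comes into effect on January first.",
    ))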

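Finally, here is a minimal sketch of terminology-aware automatic post-editing along the lines of Moslem et al. (2023), in which the LLM corrects an NMT hypothesis so that it respects a small glossary; the prompt and model name are, once more, illustrative assumptions.

    # Sketch: terminology-constrained automatic post-editing of NMT output, along the
    # lines of Moslem et al. (2023). Prompt wording and model name are illustrative.
    from openai import OpenAI

    client = OpenAI()

    def post_edit(source, hypothesis, glossary):
        terms = "\n".join(f"- '{s}' must be translated as '{t}'" for s, t in glossary.items())
        prompt = (
            "You are post-editing a machine translation from German into English.\n"
            "Correct the translation only where necessary and enforce this terminology:\n"
            f"{terms}\n\n"
            f"German source: {source}\n"
            f"Machine translation: {hypothesis}\n"
            "Post-edited translation:"
        )
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,
        )
        return response.choices[0].message.content.strip()

    print(post_edit(
        "Die Kündigungsfrist beträgt drei Monate.",
        "The cancellation period is three months.",
        {"Kündigungsfrist": "notice period"},
    ))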
The Future of Translation with LLMs

The future of translation is undeniably intertwined with the continued advancement of LLMs. As these models become more sophisticated, their ability to represent and generate human language will only improve, leading to more accurate, context-aware translations. Moreover, ongoing research and development in areas like multilingual pre-training, cross-lingual transfer learning, and low-resource language translation will further enhance their capabilities.

As we begin to integrate LLMs into translation workflows, the advantages are already becoming apparent in specific, well-defined tasks. In doing so, it is crucial to implement LLMs with guidance from a knowledgeable scientific team that can navigate the complexities while fully leveraging their potential to enhance task performance. At AppTek, we prioritize domain adaptation in LLM deployment, an approach that not only improves business efficiency but can also be used to support initiatives like environmental sustainability.


References

(Matusov et al., 2020) Evgeny Matusov, Patrick Wilken, and Christian Herold. 2020. Flexible Customization of a Single Neural Machine Translation System with Multi-dimensional Metadata Inputs. In Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track), pages 204–216, Virtual. Association for Machine Translation in the Americas.

(Kocmi et al., 2023) Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondřej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Philipp Koehn, Benjamin Marie, Christof Monz, Makoto Morishita, Kenton Murray, Masaaki Nagata, Toshiaki Nakazawa, Martin Popel, et al. 2023. Findings of the 2023 Conference on Machine Translation (WMT23): LLMs Are Here but Not Quite There Yet. In Proceedings of the Eighth Conference on Machine Translation, pages 1–42, Singapore. Association for Computational Linguistics.

(Kocmi and Federmann, 2023) Tom Kocmi and Christian Federmann. 2023. Large Language Models Are State-of-the-Art Evaluators of Translation Quality. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 193–203, Tampere, Finland. European Association for Machine Translation.

(Moslem et al., 2023) Yasmin Moslem, Gianfranco Romani, Mahdi Molaei, John D. Kelleher, Rejwanul Haque, and Andy Way. 2023. Domain Terminology Integration into Machine Translation: Leveraging Large Language Models. In Proceedings of the Eighth Conference on Machine Translation, pages 902–911, Singapore. Association for Computational Linguistics.

(Thulke et al., 2024) D. Thulke, Y. Gao, P. Pelser, R. Brune, R. Jalota, F. Fok, et al. 2024. ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate Change. arXiv preprint arXiv:2401.09646.


