Adapting the Generic English-Croatian NMT Model to a Religious Domain

Authors

Marija Brkić Bakarić
Faculty of Informatics and Digital Technologies, University of Rijeka, Croatia
https://orcid.org/0000-0003-4079-4012
Lucia Načinović Prskalo
Faculty of Informatics and Digital Technologies, University of Rijeka, Croatia
https://orcid.org/0000-0002-8832-2527
Košuta Estera Lerga
Faculty of Humanities and Social Sciences, University of Rijeka, Croatia

Synopsis

Recent advances in artificial intelligence have significantly affected many professions, including translation, and have changed the way translators work. The study presented in this paper shows that today any translator, even one without advanced IT skills, can build a customized Neural Machine Translation (NMT) system from their own texts that outperforms the generic one. The paper evaluates Google's AutoML Translation service, which enables users to train custom translation models on their own text data. Specifically, AutoML Translation adds a layer on top of the generic Translation API model that tailors it to a specific domain. Training requires a user-supplied dataset of aligned sentence pairs in the source and target languages. In this study, AutoML Translation was used to adapt the base English-Croatian Google NMT model to the religious domain. Following a brief introduction to machine translation, the paper outlines the key aspects of the training and evaluation processes and presents the two corpora used in training. The results show that the customized model outperforms the base model in terms of BLEU score.
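Because the comparison between the base and the customized model rests on BLEU, the sketch below shows how a corpus-level BLEU score is computed: clipped n-gram precisions for n = 1 to 4 are combined by a geometric mean and multiplied by a brevity penalty. It is a minimal illustration, assuming plain whitespace tokenization, a single reference translation per sentence and no smoothing. The scorer built into AutoML Translation may tokenize and smooth differently, so its numbers will not necessarily match this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per segment.

    hypotheses, references: lists of strings aligned by index.
    Assumption: whitespace tokenization, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams in the hypotheses, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)

    # Without smoothing, any zero precision makes the geometric mean zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n

    # Brevity penalty: output shorter than the references is penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

To compare two systems in this way, both would be scored against the same reference test set, e.g. `corpus_bleu(base_outputs, refs)` versus `corpus_bleu(custom_outputs, refs)`, where the three lists are hypothetical names. Counts are summed over the whole corpus before dividing, which is how BLEU is defined, rather than averaging sentence-level scores. For reported results, a standard implementation such as sacreBLEU is preferable because it fixes tokenization.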


Published

January 9, 2025

How to Cite

Brkić Bakarić, M., Načinović Prskalo, L., & Lerga, K. E. (2025). Adapting the Generic English-Croatian NMT Model to a Religious Domain. In L. Grčić & M. Brkić Bakarić (Eds.), Corpora in Language Learning, Translation and Research: Proceedings of the International Conference Corpora in Language Learning, Translation and Research held at the University of Zadar (August 23–24, 2023) (pp. 107–116). Morepress Books. https://doi.org/10.15291/9789533315355.08