Applying Large Language Models in Legal Translation: The State-of-the-Art
Abstract
While there is no denying that new AI technologies and tools are making a significant impact on translation, specialized translation remains problematic for automation, especially with regard to terminology. Precise and consistent use of terminology requires particular attention in legal translation, a sensitive area that affects society as a whole. Most recently, tools based on large language models (LLMs) have been gaining attention as potential game changers in legal translation. How do such tools cope with the complexity of legal terminology and legal texts? Can they be trusted to perform the task of legal translation? To investigate these questions, a two-stage research methodology is applied: first, a quantitative analysis of papers published on large language models and legal translation, based on bibliographic and bibliometric indicators, and second, a qualitative content analysis of the retrieved papers. The results show that, despite the unparalleled interest in the application of generative AI in all spheres of life, research on its application to legal translation has so far been scarce. This study therefore provides detailed insight into the state of the art in research on this novel topic, tracing current research trajectories and proposing future ones. To examine the potential and liabilities of using LLMs in legal translation, it is essential to conduct further empirical studies from interdisciplinary perspectives, covering diverse legal texts and both low- and high-resource languages.
Keywords
legal translation, terminology, generative AI, NMT, bibliometric study