The Quality of Google Translate and ChatGPT English to Arabic Translation: The Case of Scientific Text Translation

Authors

  • Elham Alzain

    King Faisal University, Alahsa, Saudi Arabia

  • Khalil A. Nagi

    University of Saba Region, Marib, Yemen

  • Faiz AlGobaei

    Northern Border University, Rafha, Saudi Arabia

DOI:

https://doi.org/10.30564/fls.v6i3.6799
Received: 26 June 2024 | Revised: 22 July 2024 | Accepted: 29 July 2024 | Published Online: 27 August 2024

Abstract

This study investigates the translation quality of neural machine translation (NMT) and large language models (LLMs) by using Google Translate and ChatGPT to translate a selection of scientific texts from English into Arabic. The outputs are evaluated by professional annotators through both error analysis and human evaluation. The error analysis follows the error typology of the Multidimensional Quality Metrics (MQM) framework, while the human evaluation uses a 7-point Likert scale applied at the document level. Both the ratings and the error counts show that Google Translate outperforms ChatGPT. The results also indicate that both systems still require substantial further training and that annotated corpora for this language pair need to be constructed. The study offers valuable insights into the strengths and weaknesses of the systems under study, which will benefit translators, researchers, and developers of machine translation.
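As a rough illustration of the evaluation workflow described above, the sketch below tallies MQM-style error annotations and document-level Likert ratings for two systems and compares them. The error categories, severity weights, and sample annotations are hypothetical placeholders rather than the study's data; MQM implementations differ in the severity weights they assign.

```python
# Minimal sketch of aggregating MQM error annotations and 7-point Likert ratings
# to compare two translation systems. All categories, weights, and sample data
# below are illustrative assumptions, not the study's actual annotations.
from collections import Counter
from statistics import mean

# Illustrative severity weights (one common MQM convention: minor=1, major=5, critical=10).
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def mqm_penalty(errors):
    """Sum severity-weighted penalties over a list of (category, severity) annotations."""
    return sum(SEVERITY_WEIGHTS[severity] for _category, severity in errors)

def summarize(system_name, errors, likert_ratings):
    """Report error counts per category, total MQM penalty, and mean Likert score."""
    by_category = Counter(category for category, _severity in errors)
    print(f"{system_name}: errors by category = {dict(by_category)}, "
          f"MQM penalty = {mqm_penalty(errors)}, "
          f"mean Likert (1-7) = {mean(likert_ratings):.2f}")

# Hypothetical document-level annotations and ratings for the two systems.
google_errors = [("terminology", "minor"), ("mistranslation", "major")]
chatgpt_errors = [("terminology", "major"), ("omission", "major"), ("grammar", "minor")]

summarize("Google Translate", google_errors, likert_ratings=[6, 5, 6])
summarize("ChatGPT", chatgpt_errors, likert_ratings=[5, 4, 5])
```

A lower penalty and a higher mean rating indicate better output; a real comparison would aggregate over all annotated documents and annotators.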

Keywords:

Google Translate; ChatGPT; translation quality; error analysis; English-Arabic

References

Aghai, M. 2024. ChatGPT vs. Google Translate: Comparative analysis of translation quality. Iranian Journal of Translation Studies, 22(85). https://journal.translationstudies.ir/ts/article/view/1156

Ahmadnia, B., Dorr, B. J., 2020. Low-resource multi-domain machine translation for Spanish-Farsi: Neural or statistical? Procedia Computer Science. 177, 575–580. DOI: https://doi.org/10.1016/j.procs.2020.10.081

Almekhlafi, H. A., & Nagi, K. A. 2024. Fine-grained evaluation of English to Arabic neural machine translation: A case study of education research abstract. Al-Andalus Journal for Humanities & Social Sciences, 95(11). DOI: https://doi.org/10.35781/1637-000-095-007

Barrault, L., Bojar, O., Costa-jussà, M. R., et al., 2019. Findings of the 2019 Conference on Machine Translation (WMT19). Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1). DOI: https://doi.org/10.18653/v1/w19-530

Bentivogli, L., Bisazza, A., Cettolo, M., & Federico, M. 2016. Neural versus phrase-based machine translation quality: A case study. arXiv preprint arXiv:1608.04631. DOI: https://doi.org/10.48550/arXiv.1608.04631

Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., ... & Zhang, Y. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712. DOI: https://doi.org/10.48550/arXiv.2303.12712

Buscemi, A., & Proverbio, D. 2024. ChatGPT vs Gemini vs Llama on multilingual sentiment analysis. arXiv preprint arXiv:2402.01715. DOI: https://doi.org/10.48550/arXiv.2402.01715

Castilho, S. 2020. On the same page? Comparing inter-annotator agreement in sentence and document level human machine translation evaluation. In Proceedings of the Fifth Conference on Machine Translation (pp. 1150-1159). https://aclanthology.org/2020.wmt-1.137

Castilho, S., Doherty, S., Gaspari, F., & Moorkens, J. 2018. Approaches to human and machine translation quality assessment. In: J. Moorkens, S. Castilho, F. Gaspari, & S. Doherty (Eds.), Machine Translation: Technologies and Applications. Springer International Publishing. DOI: https://doi.org/10.1007/978-3-319-91241-7

Chan, C., Jiayang, C., Wang, W., Jiang, Y., Fang, T., Liu, X., & Song, Y. 2024. Exploring the potential of ChatGPT on sentence level relations: A focus on temporal, causal, and discourse relations. In: Findings of the Association for Computational Linguistics: EACL 2024 (pp. 684-721). https://aclanthology.org/2024.findings-eacl.47

Daems, J., Macken, L., & Vandepitte, S. 2014. On the origin of errors: a fine-grained analysis of MT and PE errors and their relationship. In 9th International Conference on Language Resources and Evaluation (LREC) (pp. 62-66). European Language Resources Association (ELRA). http://hdl.handle.net/1854/LU-4418636

Escartín, C. P., Goulet, M.-J., 2020. When the post-editor is not a translator. Can machine translation be post-edited by academics to prepare their publications in English? In: Translation Revision and Post-Editing, 89–106. Routledge. DOI: https://doi.org/10.4324/9781003096962-8

Farghal, M. 2017. Textual issues relating to cohesion and coherence in Arabic/English translation. Jordan Journal of Modern Languages and Literature, 9(1), 29-50.

Freitag, M., Foster, G., Grangier, D., Ratnakar, V., Tan, Q., & Macherey, W. 2021. Experts, errors, and context: A large-scale study of human evaluation for machine translation. Transactions of the Association for Computational Linguistics, 9, 1460-1474. DOI: https://doi.org/10.1162/tacl_a_00437

Graham, Y., Haddow, B., & Koehn, P. 2019. Translationese in machine translation evaluation. arXiv e-prints, arXiv-1906. DOI: https://doi.org/10.48550/arXiv.1906.09833

Graham, Y., Haddow, B., Koehn, P., 2020. Statistical power and translationese in machine translation evaluation. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). DOI: https://doi.org/10.18653/v1/2020.emnlp-main.6

Hassan, H., Aue, A., Chen, C., Chowdhary, V., Clark, J., Federmann, C., ... & Zhou, M. 2018. Achieving human parity on automatic Chinese to English news translation. arXiv preprint arXiv:1803.05567. DOI: https://doi.org/10.48550/arXiv.1803.05567

Hendy, A., Abdelrehim, M., Sharaf, A., Raunak, V., Gabr, M., Matsushita, H., ... & Awadalla, H. H. 2023. How good are GPT models at machine translation? a comprehensive evaluation. arXiv preprint arXiv:2302.09210. DOI: https://doi.org/10.48550/arXiv.2302.09210

Hicks, M. T., Humphries, J., & Slater, J. 2024. ChatGPT is bullshit. Ethics and Information Technology, 26(2), 38. DOI: https://doi.org/10.1007/s10676-024-09775-5

Isabelle, P., Cherry, C., Foster, G., 2017. A challenge set approach to evaluating machine translation. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. DOI: https://doi.org/10.18653/v1/d17-1263

Jiao, W., Wang, W., Huang, J. T., Wang, X., & Tu, Z. 2023. Is ChatGPT a good translator? A preliminary study. arXiv preprint arXiv:2301.08745, 1(10). DOI: https://doi.org/10.48550/arXiv.2301.08745

Khoshafah, F., 2023. ChatGPT for Arabic-English translation: Evaluating the accuracy. DOI: https://doi.org/10.21203/rs.3.rs-2814154/v1

Kocmi, T., & Federmann, C. 2023. Large language models are state-of-the-art evaluators of translation quality. In: Proceedings of the 24th Annual Conference of the European Association for Machine Translation (pp. 193-203). https://aclanthology.org/2023.eamt-1.19

Kocmi, T., Bawden, R., Bojar, O., Dvorkovich, A., Federmann, C., Fishel, M., ... & Popović, M. 2022. Findings of the 2022 Conference on Machine Translation (WMT22). In: Proceedings of the Seventh Conference on Machine Translation (WMT) (pp. 1-45). https://aclanthology.org/2022.wmt-1.1

Koehn, P., Knowles, R., 2017. Six challenges for neural machine translation. In: Proceedings of the First Workshop on Neural Machine Translation. DOI: https://doi.org/10.18653/v1/w17-3204

Läubli, S., Castilho, S., Neubig, G., et al., 2020. A Set of Recommendations for Assessing Human–Machine Parity in Language Translation. Journal of Artificial Intelligence Research, 67. DOI: https://doi.org/10.1613/jair.1.11371

Läubli, S., Sennrich, R., Volk, M., 2018. Has machine translation achieved human parity? A case for document-level evaluation. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. DOI: https://doi.org/10.18653/v1/d18-1512

Levin, P., Dhanuka, N., & Khalilov, M. 2017. Machine translation at booking.com: Journey and lessons learned. arXiv preprint arXiv:1707.07911. DOI: https://doi.org/10.48550/arXiv.1707.07911

Liu, J., Liu, C., Zhou, P., Lv, R., Zhou, K., & Zhang, Y. 2023. Is ChatGPT a good recommender? A preliminary study. arXiv preprint arXiv:2304.10149. DOI: https://doi.org/10.48550/arXiv.2304.10149

Lommel, A., Uszkoreit, H., Burchardt, A., 2014. Multidimensional quality metrics (MQM): A framework for declaring and describing translation quality metrics. Tradumàtica Tecnologies de La Traducció. 12, 455–463. DOI: https://doi.org/10.5565/rev/tradumatica.77

Nagi, K. A., 2023. Arabic and English relative clauses and machine translation challenges. Journal of Social Studies. 29(3), 145–165. DOI: https://doi.org/10.20428/jss.v29i3.2180

Nagi, K. A., Alzain, E., & Naji, E. 2024. Informed prompts and improving ChatGPT English to Arabic translation. Al-Andalus Journal for Humanities & Social Sciences, 98(11). https://www.researchgate.net/publication/382295323_Informed_Prompts_and_Improving_ChatGPT_English_to_Arabic_Translation

OpenAI. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774. DOI: https://doi.org/10.48550/arXiv.2303.08774

Ortega-Martín, M., García-Sierra, Ó., Ardoiz, A., Álvarez, J., Armenteros, J. C., & Alonso, A. 2023. Linguistic ambiguity analysis in ChatGPT. arXiv preprint arXiv:2302.06426. DOI: https://doi.org/10.48550/arXiv.2302.06426

Poibeau, T. 2022. On "human parity" and "super human performance" in machine translation evaluation. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 6018-6023). https://hal.science/hal-03738720/

Popel, M., Tomkova, M., Tomek, J., et al., 2020. Transforming machine translation: a deep learning system reaches news translation quality comparable to human professionals. Nature Communications, 11(1). DOI: https://doi.org/10.1038/s41467-020-18073-9

Popović, M. 2021. On nature and causes of observed MT errors. In Proceedings of the 18th Biennial Machine Translation Summit (Volume 1: Research Track) (pp. 163-175). https://aclanthology.org/2021.mtsummit-research.14

Reeder, F. 2004. Investigation of intelligibility judgments. In: Conference of the Association for Machine Translation in the Americas (pp. 227-235). Berlin, Heidelberg: Springer Berlin Heidelberg. DOI: https://doi.org/10.1007/978-3-540-30194-3_25

Rivera-Trigueros, I., 2021. Machine translation systems and quality assessment: a systematic review. Language Resources and Evaluation. 56(2), 593–619. DOI: https://doi.org/10.1007/s10579-021-09537-5

Saunders, D., 2022. Domain adaptation and multi-domain adaptation for neural machine translation: A survey. Journal of Artificial Intelligence Research. 75, 351–424. DOI: https://doi.org/10.1613/jair.1.13566

Sennrich, R., Zhang, B., 2019. Revisiting low-resource neural machine translation: A case study. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/p19-1021

Siu, S. C., 2023. ChatGPT and GPT-4 for professional translators: Exploring the potential of large language models in translation. SSRN Electronic Journal. DOI: https://doi.org/10.2139/ssrn.4448091

Tehseen, I., Tahir, G. R., Shakeel, K., & Ali, M. 2018. Corpus based machine translation for scientific text. In: L. Iliadis, I. Maglogiannis, & V. Plagianakos (Eds.), Artificial Intelligence Applications and Innovations, pp. 196–206. Springer International Publishing. DOI: https://doi.org/10.1007/978-3-319-92007-8

Toral, A. 2020. Reassessing claims of human parity and super-human performance in machine translation at WMT 2019. arXiv preprint arXiv:2005.05738. DOI: https://doi.org/10.48550/arXiv.2005.05738

Toral, A., Sánchez-Cartagena, V. M., 2017. A multifaceted evaluation of neural versus phrase-based machine translation for 9 language directions. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. DOI: https://doi.org/10.18653/v1/e17-1100

Toral, A., Castilho, S., Hu, K., et al., 2018. Attaining the unattainable? Reassessing claims of human parity in neural machine translation. Proceedings of the Third Conference on Machine Translation: Research Papers. DOI: https://doi.org/10.18653/v1/w18-6312

Ulitkin, I., Filippova, I., Ivanova, N., et al., 2021. Automatic evaluation of the quality of machine translation of a scientific text: The results of a five-year-long experiment. E3S Web of Conferences. 284, 08001. DOI: https://doi.org/10.1051/e3sconf/202128408001

Zhu, W., Liu, H., Dong, Q., Xu, J., Kong, L., Chen, J., ... & Huang, S. 2023. Multilingual machine translation with large language models: Empirical results and analysis. arXiv e-prints, arXiv-2304. DOI: https://doi.org/10.48550/arXiv.2304.04675

Zulfiqar, S., Wahab, M. F., Sarwar, M. I., et al., 2018. Is machine translation a reliable tool for reading German scientific databases and research articles? Journal of Chemical Information and Modeling. 58(11), 2214–2223. DOI: https://doi.org/10.1021/acs.jcim.8b00534

How to Cite

Alzain, E., Nagi, K. A., & AlGobaei, F. (2024). The Quality of Google Translate and ChatGPT English to Arabic Translation: The Case of Scientific Text Translation. Forum for Linguistic Studies, 6(3), 837–849. https://doi.org/10.30564/fls.v6i3.6799

Article Type

Article