The Quality of Google Translate and ChatGPT English to Arabic Translation: The Case of Scientific Text Translation
DOI: https://doi.org/10.30564/fls.v6i3.6799
Abstract
This study investigates the translation quality of neural machine translation (NMT) and of large language models (LLMs). The research team uses Google Translate and ChatGPT to translate a selection of scientific texts from English into Arabic and then evaluates the outputs. Professional annotators provide both an error analysis and a human evaluation. The error analysis is based on the error typology of the Multidimensional Quality Metrics (MQM) framework, and the human evaluation uses a 7-point Likert scale applied at the document level. Both the ratings and the error counts show that Google Translate outperforms ChatGPT. However, the results indicate that both systems still require substantial further training. The study also suggests that annotated corpora need to be constructed. It offers valuable insights into the strengths and weaknesses of the two systems that should benefit translators, researchers, and developers of machine translation systems.
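The evaluation described above yields two measurements per system: counts of errors labelled with MQM categories, and document-level ratings on a 7-point Likert scale. The Python sketch below shows one way such annotations could be tallied and summarized. It is illustrative only: the system labels ("System A", "System B"), the category names, every count and rating, and the assumption that 7 is the best score are hypothetical placeholders, not the study's data or its actual aggregation procedure.

```python
from collections import Counter
from statistics import mean

# Hypothetical annotations: one (system, MQM category) pair per error found.
# "System A"/"System B" and all counts are placeholders, not the study's data;
# the category names are illustrative MQM top-level dimensions.
ERRORS = [
    ("System A", "Accuracy"),
    ("System A", "Fluency"),
    ("System B", "Accuracy"),
    ("System B", "Accuracy"),
    ("System B", "Terminology"),
]

# Hypothetical document-level ratings on a 7-point Likert scale
# (assumed 1 = worst, 7 = best), one rating per annotator.
RATINGS = {
    "System A": [6, 5, 6],
    "System B": [5, 4, 5],
}


def error_profile(errors):
    """Return error totals per system and per (system, category) pair."""
    per_system = Counter(system for system, _ in errors)
    per_category = Counter(errors)
    return per_system, per_category


def mean_ratings(ratings, low=1, high=7):
    """Average each system's Likert ratings, rejecting out-of-range values."""
    for system, scores in ratings.items():
        if not all(low <= s <= high for s in scores):
            raise ValueError(f"{system}: rating outside {low}-{high}")
    return {system: mean(scores) for system, scores in ratings.items()}


if __name__ == "__main__":
    per_system, per_category = error_profile(ERRORS)
    print(dict(per_system))    # total errors per system
    print(dict(per_category))  # errors broken down by MQM category
    print(mean_ratings(RATINGS))
```

Keeping the raw error counts separate from the mean ratings mirrors the abstract's distinction between the error analysis and the human evaluation, so the two measures can be compared rather than merged into a single score.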
Keywords: Google Translate; ChatGPT; translation quality; error analysis; English-Arabic
License
Copyright © 2024 Elham Alzain, Khalil A. Nagi, Faiz AlGobaei
This is an open access article under the Creative Commons Attribution 4.0 International License.