Semantic Errors in Machine Translation: A Study of Colloquial Arabic-English Captions on TikTok

Authors

  • Shatha Abdullah Alshaye

    Department of English, College of Language Sciences, King Saud University, Riyadh 11652, Saudi Arabia

DOI:

https://doi.org/10.30564/fls.v7i9.10711
Received: 25 June 2025 | Revised: 7 July 2025 | Accepted: 15 July 2025 | Published Online: 1 September 2025

Abstract

This study investigates the semantic accuracy of TikTok's machine translation (MT) system, focusing on auto-generated English captions translated from colloquial Arabic content posted by the verified MBC1 and Shahid TikTok accounts. The primary objective is to identify and categorize semantic errors using Sayogie's (2014) framework, which classifies meaning into three interrelated dimensions: grammatical, contextual, and referential. Employing a descriptive qualitative approach, the research analyses a sample of colloquial Arabic captions to evaluate the extent to which TikTok's translation system preserves the intended meaning. Findings reveal a consistent presence of all three error types, underscoring the system's limitations in processing idiomatic, context-dependent expressions typical of colloquial Arabic. These inaccuracies primarily result from the system's reliance on literal translation strategies, which fail to account for figurative language, cultural references, and emotional nuance. While TikTok's MT feature enhances accessibility for multilingual users, it remains inadequate in preserving semantic integrity, particularly in informal, culturally embedded content. To address these challenges, the study recommends developing more context-aware, dialect-sensitive models capable of handling the pragmatic and cultural complexity of colloquial speech. The findings contribute to current debates in MT evaluation by highlighting the need to prioritize semantic adequacy, especially for low-resource and dialect-rich languages such as Arabic.

Keywords:

Colloquial Language; Machine Translation; Semantic Errors; TikTok; Translation Errors

References

[1] Utami, N.M.V., Irwandika, G., 2021. Semantic errors in the translation of the Hindu’s Instagram account. English Language Teaching, Literature, and Translation (ELTLT) 2021. 10(1), 84–88.

[2] Bowker, L., 2020. Chinese speakers’ use of machine translation as an aid for scholarly writing in English: A review of the literature and a report on a pilot workshop on machine translation literacy. Asia Pacific Translation and Intercultural Studies. 7(3), 288–298. DOI: https://doi.org/10.1080/23306343.2020.1805843

[3] Kumar, M.A., Premjith, B., Singh, S., et al., 2019. An overview of the shared task on machine translation in Indian languages (MTIL)-2017. Journal of Intelligent Systems. 28(3), 455–464. DOI: https://doi.org/10.1515/jisys-2018-0024

[4] Kunchukuttan, A., Bhattacharyya, P., 2022. Machine translation and transliteration involving related and low-resource languages. CRC Press: Boca Raton, FL, USA.

[5] Larson, M.L., 1998. Meaning-based translation: A guide to cross-language equivalence. University Press of America: Lanham, MD, USA.

[6] Sari, D.M., 2019. An error analysis on students' translation text. Eralingua: Jurnal Pendidikan Bahasa Asing dan Sastra. 3(2), 65–74. DOI: https://doi.org/10.26858/eralingua.v3i2.8658

[7] Lee, S.M., 2020. The impact of using machine translation on EFL students’ writing. Computer Assisted Language Learning. 33(3), 157–175. DOI: https://doi.org/10.1080/09588221.2018.1553186

[8] Khoong, E.C., Rodriguez, J.A., 2022. A research agenda for using machine translation in clinical medicine. Journal of General Internal Medicine. 37(5), 1275–1277. DOI: https://doi.org/10.1007/s11606-021-07164-y

[9] Gupta, V., Thakral, K.S., 2019. Divergence issues in machine translation for English-Punjabi language. Proceedings of Recent Advances in Interdisciplinary Trends in Engineering & Applications (RAITEA); Indore, India; 14–16 February 2019.

[10] Liu, S., Sun, Y., Wang, L., 2021. Recent advances in dialogue machine translation. Information. 12(11), 1–2. DOI: https://doi.org/10.3390/info12110481

[11] Hoi, H.T., 2020. Machine translation and its impact in our modern society. International Journal of Scientific & Technology Research. 9(4), 1918–1921.

[12] Ying, C., Shuyu, Y., Jing, L., et al., 2021. Errors of machine translation of terminology in the patent text from English into Chinese. ASP Transactions on Computers. 1(1), 12–17. DOI: https://doi.org/10.52810/TC.2021.100022

[13] Kostadinova, V., Yáñez-Bouza, N., Dreschler, G., et al., 2019. I English Language. The Year's Work in English Studies. 98(1), 1–166.

[14] Popowich, F., 1996. A chart generator for Shake and Bake Machine Translation. In: Advances in Artificial Intelligence: 11th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, AI'96 Toronto, Ontario, Canada, May 21–24, 1996 Proceedings, 11, 97–108. Springer: Berlin/Heidelberg, Germany.

[15] Rozovskaya, A., Sproat, R., Benmamoun, E., 2006. Challenges in processing colloquial Arabic. Proceedings of the International Conference on the Challenge of Arabic for NLP/MT; London, UK; 23 October 2006. pp. 4–14.

[16] Mirhashemi, Z., Gholami, M., Bahri, H., 2024. A comparative study on translation of Persian colloquialism into English by ChatGPT and other translation platforms. Iranian Journal of Translation Studies. 22(87). Available from: https://journal.translationstudies.ir/ts/article/view/1203 (cited 25 May 2025).

[17] Sun, Z., Zemel, R., Xu, Y., 2022. Semantically informed slang interpretation. arXiv preprint. arXiv:2205.00616.

[18] Fattah, B.O., Salih, S.M., 2023. Drawing a demarcation line between two overlapping colloquial elements: The case of idioms and clichés. Koya University Journal of Humanities and Social Sciences. 6(1), 65–76.

[19] Nasution, D.K., 2022. Machine translation in website localization: Assessing its translation quality for language learning. AL-ISHLAH: Jurnal Pendidikan. 14(2), 1879–1886. DOI: https://doi.org/10.35445/alishlah.v14i2.1308

[20] Läubli, S., Sennrich, R., Volk, M., 2018. Has machine translation achieved human parity? A case for document-level evaluation. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing; Brussels, Belgium; 31 October–4 November 2018. Association for Computational Linguistics: Brussels, Belgium. pp. 4791–4796. DOI: https://doi.org/10.18653/v1/D18-1512

[21] Läubli, S., Castilho, S., Neubig, G., et al., 2020. A set of recommendations for assessing human–machine parity in language translation. Journal of Artificial Intelligence Research. 67, 653–672. DOI: https://doi.org/10.1613/jair.1.11371

[22] Putri, A.T., Setiajid, H.H., 2021. Instagram translate and human translation in the English captions of Jokowi’s account: An analysis of Koponen’s error category. English Language and Literature International Conference (ELLiC) Proceedings. 4, 432–436.

[23] Utami, N.M.V., Jayantini, I.G.A.S.R., Pratiwi, Y., 2021. Lexical analysis of semantic errors found in the translation of Joko Widodo’s Instagram account. English Language and Literature International Conference (ELLiC) Proceedings. 4, 291–297.

[24] Susanti, E., 2018. Lexical errors produced by Instagram machine translation [Ph.D. Thesis]. Universitas Islam Negeri Maulana Malik Ibrahim: Malang, Indonesia. Available from: http://etheses.uin-malang.ac.id/id/eprint/14231 (cited 20 May 2025).

[25] Tan, X., Chen, J., He, D., et al., 2019. Multilingual neural machine translation with language clustering. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Hong Kong, China; 3–7 November 2019. DOI: https://doi.org/10.18653/v1/D19-1089

[26] Han, L., Jones, G.J., Smeaton, A.F., 2021. Translation quality assessment: A brief survey on manual and automatic methods. Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age; Saarbrücken, Germany; 31 May 2021. pp. 15–33.

[27] Fan, A., Bhosale, S., Schwenk, H., et al., 2021. Beyond English-centric multilingual machine translation. The Journal of Machine Learning Research. 22(1), 4839–4886.

[28] Vieira, L.N., O’Hagan, M., O’Sullivan, C., 2021. Understanding the societal impacts of machine translation: A critical review of the literature on medical and legal use cases. Information, Communication & Society. 24(11), 1515–1532. DOI: https://doi.org/10.1080/1369118X.2020.1776370

[29] Sayogie, F., 2014. Teori dan praktikum penerjemahan. Transpustaka: Tangerang Selatan, Indonesia.

[30] Cao, J., Li, M., Li, Y., et al., 2022. SemMT: a semantic-based testing approach for machine translation systems. ACM Transactions on Software Engineering and Methodology (TOSEM). 31(2), 1–36.

[31] Utami, N.P.L.D., Utami, N.M.V., 2023. Unveiling semantic errors found in lexical translations of Tasya Farasya’s TikTok account. Lingua Cultura. 17(2), 219–225.

[32] Lambert, V.A., Lambert, C.E., 2012. Qualitative descriptive research: An acceptable design. Pacific Rim International Journal of Nursing Research. 16(4), 255.

[33] MBC1 [@MBC1], 2025. TikTok. Available from: https://www.tiktok.com/@mbc (cited 10 May 2025).

[34] Shahid [@shahid], 2025. TikTok. Available from: https://www.tiktok.com/@shahid (cited 10 June 2025).

How to Cite

Abdullah Alshaye, S. (2025). Semantic Errors in Machine Translation: A Study of Colloquial Arabic-English Captions on TikTok. Forum for Linguistic Studies, 7(9), 159–171. https://doi.org/10.30564/fls.v7i9.10711