AI and Dialect Recognition: Challenges and Opportunities in Linguistic Diversity

Authors

  • Shaliza Alwi

    Arshad Ayub Graduate Business School, UiTM, Shah Alam, Malaysia

  • Masrina Nadia Mohd Saleh

    Fakulti Ekonomi dan Pengurusan, UKM Bangi, Malaysia

  • Mohamad Naqiuddin Md Mansor

    Faculty of Business and Management, UiTM, Puncak Alam, Malaysia

DOI:

https://doi.org/10.30564/fls.v7i2.7894
Received: 26 November 2024 | Revised: 20 January 2025 | Accepted: 22 January 2025 | Published Online: 25 February 2025

Abstract

This article explores the intersection of artificial intelligence (AI) and dialect recognition, highlighting both the challenges and the opportunities presented by linguistic diversity. As globalisation increases, effective communication across dialects becomes increasingly important. Recent advances in machine learning, particularly deep learning, have transformed the ability of AI to understand and process human language. The article examines the current state of dialect recognition technologies, the linguistic complexities involved, and the implications for social inclusion and technological advancement. By analysing recent studies (2013–2024), it aims to provide insights into the future of AI-driven dialect recognition systems and to identify key areas for further research and development. The study highlights significant breakthroughs in AI-driven terminology recognition tools. The main problems it identifies are the inability of current systems to generalise across a wide range of dialects, the under-representation of low-resource dialects in training datasets, and biases that reflect cultural preconceptions. These difficulties raise concerns about inclusivity and fairness, because marginalised communities frequently experience recognition failures. To enhance cultural sensitivity and trust, the study underscores the significance of ethical frameworks that prioritise diversity, interdisciplinary collaboration with linguists and sociologists, and community engagement. Transfer learning is a promising way to address the challenges of low-resource dialects and to help preserve linguistic diversity. The article also emphasises the importance of ongoing monitoring to accommodate evolving sociocultural and linguistic environments, as well as the ethical implications of deploying such technologies in sensitive situations.

Keywords:

Artificial Intelligence; Dialect; Diversity; Linguistic; AI-Driven Dialect Recognition Systems

How to Cite

Alwi, S., Mohd Saleh, M. N., & Md Mansor, M. N. (2025). AI and Dialect Recognition: Challenges and Opportunities in Linguistic Diversity. Forum for Linguistic Studies, 7(2), 1052–1062. https://doi.org/10.30564/fls.v7i2.7894

Article Type

Article