A Context-Aware Embedding Approach to Meaning Conflation Deficiency in Sesotho sa Leboa: Addressing Semantic Ambiguity

Authors

  • Mosima A. Masethe

    Department of Computer Science and Information Technology, Sefako Makgatho Health Sciences University, Ga-Rankuwa 0208, South Africa

    Department of Information Technology, Durban University of Technology, Durban 4001, South Africa

  • Sunday O. Ojo

    Department of Information Technology, Durban University of Technology, Durban 4001, South Africa

  • Hlaudi D. Masethe

    Department of Computer Science, Tshwane University of Technology, Soshanguve 0152, South Africa

DOI:

https://doi.org/10.30564/fls.v7i8.9831
Received: 1 May 2025 | Revised: 17 June 2025 | Accepted: 30 June 2025 | Published Online: 14 August 2025

Abstract

Meaning Conflation Deficiency (MCD) is a major problem in Natural Language Processing (NLP), especially for low-resource, morphologically rich languages such as Sesotho sa Leboa. Traditional word embeddings often perform poorly in downstream tasks such as Word Sense Disambiguation (WSD) because they cannot distinguish between a word's multiple senses. This study evaluates how well context-aware and multi-prototype word embedding models, including ELMo, GPT-2, BERT, Universal Sentence Encoder, and hybrid versions of Doc2Vec and SBERT, resolve MCD. The models were trained and tested on a sense-annotated Sesotho sa Leboa corpus and assessed with standard classification measures (precision, recall, F1-score, and accuracy), clustering-based metrics, and visualisation approaches. The results show that deep contextual models, in particular ELMo and GPT-2, perform noticeably better in accuracy and sense separation than static and unsupervised models. ELMo achieved the highest F1-score of any model (93%) and, with well-separated confusion matrices, showed excellent interpretability. These findings indicate that context-aware architectures provide reliable solutions to MCD as well as a scalable framework for improving WSD in low-resource language applications. The work offers new benchmarks and perspectives for future studies on semantic disambiguation in under-represented languages.
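As a rough illustration of the classification measures named in the abstract, the sketch below computes accuracy and macro-averaged precision, recall, and F1-score from gold and predicted sense labels. It is not the authors' evaluation code: the function name `sense_metrics`, the toy labels `sense_a` and `sense_b`, and the choice of macro averaging are all assumptions made for this example, since the abstract does not say how the scores were averaged.

```python
from collections import Counter


def sense_metrics(gold, pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct sense predicted
        else:
            fp[p] += 1      # predicted sense was wrong
            fn[g] += 1      # gold sense was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data: four occurrences of one ambiguous word with two hypothetical senses.
gold = ["sense_a", "sense_a", "sense_b", "sense_b"]
pred = ["sense_a", "sense_b", "sense_b", "sense_b"]
metrics = sense_metrics(gold, pred)
print(metrics)
```

In this toy case one of the two `sense_a` occurrences is mislabelled, so recall for `sense_a` drops while precision for `sense_b` drops. Macro averaging weights both senses equally regardless of how often each occurs, which matters when sense distributions are skewed.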

Keywords:

Meaning Conflation Deficiency; Contextual Word Embeddings; Word Sense Disambiguation; Low-Resourced Languages; Morphologically Rich Languages; Semantic Ambiguity; Transformer-Based Models; Multilingual BERT

How to Cite

Masethe, M. A., Ojo, S. O., & Masethe, H. D. (2025). A Context-Aware Embedding Approach to Meaning Conflation Deficiency in Sesotho sa Leboa: Addressing Semantic Ambiguity. Forum for Linguistic Studies, 7(8), 845–867. https://doi.org/10.30564/fls.v7i8.9831