Using Deep Learning Models for Multimodal Sentence Level Sentiment Analysis of Sign Language
DOI: https://doi.org/10.30564/fls.v7i4.9041

Abstract
Deaf and hard-of-hearing individuals communicate through signs such as hand signals, gestures, facial expressions, and body movements. This medium of communication, called sign language, is a non-verbal, visual means of communication. However, many hearing individuals do not understand sign language, while those who do use it to communicate with deaf and hard-of-hearing individuals. Because some active users of social media are deaf or hard of hearing, it is necessary to develop technological tools that guarantee effective communication between deaf and hard-of-hearing individuals and hearing individuals, especially across social media platforms. Sentiment analysis of sign language is one such tool: it communicates the polarity expressed in sign language. This study focuses on a multimodal approach to sentiment analysis of sign language, using a multimodal sign language dataset, consisting of video clips of sentence-level sign language and their textual equivalents, to train two Deep Learning models. For the visual modality, the dataset trains a deep convolutional neural network, VGG16; for the textual modality, it trains Bidirectional Encoder Representations from Transformers (BERT). The performance metrics show that the multimodal approach outperforms the single-modality, text-based approach.
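The abstract's description (VGG16 for the visual modality, BERT for the textual modality, with the two combined) can be illustrated with a minimal late-fusion sketch. This is an illustration under stated assumptions, not the authors' implementation: the concatenation fusion, per-clip frame averaging, the `bert-base-uncased` checkpoint, and the three polarity classes are all choices made here for clarity.

```python
# Minimal late-fusion sketch (assumptions: averaged frame features,
# concatenation fusion, three polarity classes) of VGG16 + BERT
# multimodal sentence-level sentiment classification.
import torch
import torch.nn as nn
from torchvision.models import vgg16
from transformers import BertModel, BertTokenizer

class MultimodalSentimentClassifier(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Visual branch: pretrained VGG16 with the final classification
        # layer removed, leaving 4096-dim features per frame.
        backbone = vgg16(weights="IMAGENET1K_V1")
        backbone.classifier = nn.Sequential(*list(backbone.classifier)[:-1])
        self.visual = backbone
        # Textual branch: pretrained BERT; the pooled [CLS] output is 768-dim.
        self.text = BertModel.from_pretrained("bert-base-uncased")
        # Fusion head over the concatenated visual and textual features.
        self.head = nn.Sequential(
            nn.Linear(4096 + 768, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, frames, input_ids, attention_mask):
        # frames: (batch, num_frames, 3, 224, 224) sampled from a sign clip
        b, t, c, h, w = frames.shape
        v = self.visual(frames.view(b * t, c, h, w))  # (b*t, 4096)
        v = v.view(b, t, -1).mean(dim=1)              # average over frames
        s = self.text(input_ids=input_ids,
                      attention_mask=attention_mask).pooler_output  # (b, 768)
        return self.head(torch.cat([v, s], dim=1))    # (b, num_classes)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = MultimodalSentimentClassifier()
enc = tokenizer(["I am very happy today"], return_tensors="pt",
                padding=True, truncation=True)
logits = model(torch.randn(1, 8, 3, 224, 224),  # dummy 8-frame clip
               enc["input_ids"], enc["attention_mask"])
print(logits.shape)  # torch.Size([1, 3])
```

Averaging frame features discards temporal order; a recurrent or attention-based aggregator over the per-frame VGG16 features would be a natural refinement for sentence-level clips.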
Keywords: Sign Language; Sentiment Analysis; Visual Signal; Textual Modality; Multimodality; Sign Language Sentiment Recognition
License
Copyright © 2025 Osondu Oguike, Mpho Primus

This is an open access article under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.