A comparative analysis of Indian sign language recognition using deep learning models

Authors

  • Bunny Saini

    Symbiosis International (Deemed University)

  • Divya Venkatesh

    Symbiosis International (Deemed University)

  • Nikita Chaudhari

    Symbiosis International (Deemed University)

  • Tanaya Shelake

    Symbiosis International (Deemed University)

  • Shilpa Gite

    Symbiosis International (Deemed University)

  • Biswajeet Pradhan

    University of Technology Sydney

DOI:

https://doi.org/10.59400/fls.v5i1.1617

Abstract

Sign language is a form of communication in which people use bodily gestures, particularly those of the hands and arms. It is used when spoken communication is unattainable or disfavored. Very few people can translate sign language and understand it readily, so a platform on which sign language could be translated easily would be a great convenience for the hearing-impaired. Hence, through this study, using artificial neural networks, we compare how various widely implemented deep learning architectures perform at accurately translating Indian Sign Language (ISL) for the native audience. This research would streamline the development of software tools that can accurately predict or translate ISL. To illustrate the training process and to explore baseline performance without any optimizations, a plain Convolutional Neural Network architecture was implemented first. Over the course of the research, several pre-trained transfer learning models were then implemented and yielded promising results. The research contrasts how various convolutional neural networks perform when translating Indian sign actions on a custom dataset that factors in illumination, angles, and different backgrounds to provide a balanced and distinctive set of images. The goal of this study is to make clear comparisons between the various deep learning frameworks; hence, a fresh Indian sign language dataset is introduced. Since every dataset in the field of deep learning has special properties that may be exploited to improve existing models, the creation of a new dataset can itself be viewed as a contribution to the field. The optimal model for our task, the classification of these gestures, is found to be ResNet-50 (accuracy = 98.25%, F1-score = 99.34%), and the least favorable is InceptionNet V3 (accuracy = 66.75%, F1-score = 70.89%).
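The models in the study are ranked by accuracy and F1-score. As a minimal illustration of how those two metrics are computed for a multi-class gesture classifier (a sketch with invented toy labels, not the paper's own evaluation code), consider:

```python
def accuracy(y_true, y_pred):
    """Fraction of gesture predictions that match the ground truth."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Per-class F1 (harmonic mean of precision and recall), macro-averaged
    over all gesture classes."""
    classes = set(y_true) | set(y_pred)
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy predictions over three hypothetical ISL gesture classes.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "B", "C", "C", "C"]
print(f"accuracy = {accuracy(y_true, y_pred):.2%}")  # 83.33%
print(f"macro F1 = {macro_f1(y_true, y_pred):.2%}")  # 82.22%
```

Because the custom dataset is balanced across gesture classes, accuracy and macro-averaged F1 tell a broadly similar story; F1 additionally penalizes models whose errors concentrate in particular classes.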

Keywords:

Indian sign language, deep learning, transfer learning, convolutional neural network, ResNet-50

