Ensemble Model of Attention Mechanism-Based DCGAN and Autoencoder for Noised OCR Classification

Authors

  • Shuguang Xiong

    Baidu Inc., Beijing 100085, China

  • Huitao Zhang

    Bybit Global Digital Solutions FZE, Dubai, United Arab Emirates

  • Meng Wang

    Newmark Group, New York City, NY 10017, US

DOI:

https://doi.org/10.30564/jeis.v4i1.6725

Abstract

Optical Character Recognition (OCR) is a technology that converts images of text into machine-readable formats, essential for digitizing printed texts and enabling digital searches. Traditional OCR methods often struggle with variations in font styles and noise. This paper proposes an innovative approach to enhance OCR classification under challenging conditions by leveraging an ensemble model that combines an Attention Mechanism-Based Generative Adversarial Network (GAN) and an Autoencoder. The GAN generates synthetic data to mitigate the limitations of small datasets, while the autoencoder extracts robust features from noisy images. The model undergoes a two-phase training process, initially learning from the augmented dataset and then fine-tuning on a smaller, labeled dataset. Grad-CAM is used to demonstrate interpretability, highlighting the attention regions during predictions. Experimental results show significant improvements in OCR accuracy and robustness, validating the effectiveness of the proposed method in handling noise and limited training data.
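
The two-phase schedule described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: a tiny logistic regression stands in for the ensemble model, randomly pixel-flipped copies of two hypothetical 3x3 glyphs stand in for both the GAN-generated samples and the noisy OCR images, and every name, learning rate, and dataset size below is an assumption chosen for illustration.

```python
import math
import random

# Hypothetical 3x3 binary glyphs (assumptions, not data from the paper).
GLYPHS = {
    0: [0, 1, 0, 0, 1, 0, 0, 1, 0],  # vertical bar
    1: [0, 0, 0, 1, 1, 1, 0, 0, 0],  # horizontal bar
}


def add_noise(pixels, flip_prob, rng):
    """Flip each binary pixel with probability flip_prob (stand-in for OCR noise)."""
    return [1 - p if rng.random() < flip_prob else p for p in pixels]


def make_dataset(n, flip_prob, rng):
    """Draw n noisy labeled glyphs; used here in place of GAN-generated samples."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((add_noise(GLYPHS[label], flip_prob, rng), label))
    return data


def predict(weights, bias, pixels):
    """Logistic-regression probability that the glyph belongs to class 1."""
    z = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-z))


def train(weights, bias, data, lr, epochs):
    """Plain SGD on binary cross-entropy; returns updated copies of the parameters."""
    weights = list(weights)
    for _ in range(epochs):
        for pixels, label in data:
            err = predict(weights, bias, pixels) - label
            weights = [w - lr * err * p for w, p in zip(weights, pixels)]
            bias -= lr * err
    return weights, bias


def accuracy(weights, bias, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict(weights, bias, x) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)


def run_two_phase(seed=0):
    """Phase 1: large augmented set. Phase 2: small labeled set, lower learning rate."""
    rng = random.Random(seed)
    augmented = make_dataset(200, 0.1, rng)    # stand-in for real + synthetic data
    small_labeled = make_dataset(20, 0.1, rng)  # stand-in for the small labeled set
    held_out = make_dataset(200, 0.1, rng)

    weights, bias = [0.0] * 9, 0.0
    weights, bias = train(weights, bias, augmented, lr=0.1, epochs=5)
    acc_phase1 = accuracy(weights, bias, held_out)

    # Fine-tuning reuses the phase-1 parameters instead of starting from scratch.
    weights, bias = train(weights, bias, small_labeled, lr=0.02, epochs=5)
    acc_final = accuracy(weights, bias, held_out)
    return acc_phase1, acc_final


if __name__ == "__main__":
    print(run_two_phase(seed=0))
```

The only point of the sketch is the hand-off between phases: fine-tuning continues from the parameters learned on the augmented set and uses a smaller learning rate, so the small labeled set adjusts the model rather than overwriting it.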

Keywords:

Component; Optical character classification; Deep learning; Autoencoder; GAN

How to Cite

Xiong, S., Zhang, H., & Wang, M. (2022). Ensemble Model of Attention Mechanism-Based DCGAN and Autoencoder for Noised OCR Classification. Journal of Electronic & Information Systems, 4(1), 33–41. https://doi.org/10.30564/jeis.v4i1.6725

Article Type

Article