Assessing Four Neural Networks on Handwritten Digit Recognition Dataset (MNIST)
DOI:
https://doi.org/10.30564/jcsr.v6i3.6804
Abstract
Although image recognition has been a research topic for many years, it still attracts keen interest from many researchers. Some papers, however, compare models on only one or two datasets, either because of time constraints or because the model is tailored to a specific task. As a result, it is hard to tell how well a given model generalizes across the image recognition field. In this paper, we compare four neural networks on the MNIST dataset under different data splits. Three of them are a Convolutional Neural Network (CNN), a Deep Residual Network (ResNet), and a Dense Convolutional Network (DenseNet); the fourth is our improvement on the CNN baseline, which introduces the Capsule Network (CapsNet) to the image recognition area. We show that although the previous models already perform well in this area, our modification achieves better performance. CapsNet reaches an accuracy of 99.75%, which is the best result published so far. Another encouraging result is that CapsNet needs only a small amount of training data to reach this performance. In future work, we plan to examine how well CapsNet generalizes to other image recognition tasks.
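The comparison "under different data splits" can be pictured with a small harness. The sketch below is only an illustration under stated assumptions, not the authors' experimental code: the helper names (`stratified_subset`, `accuracy`, `compare_on_splits`) and the `evaluate` callback are hypothetical, and the abstract does not say which split sizes were used. It draws class-balanced training subsets at several fractions and records one accuracy per model and fraction.

```python
import random
from collections import defaultdict


def stratified_subset(labels, fraction, seed=0):
    """Return sorted indices of a subset holding `fraction` of each class.

    Hypothetical helper: the source does not say how its splits were drawn.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label in sorted(by_class):
        idxs = by_class[label]
        k = max(1, round(len(idxs) * fraction))  # keep at least one per class
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)


def accuracy(predictions, targets):
    """Fraction of predictions that match their targets."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


def compare_on_splits(model_names, train_labels, fractions, evaluate):
    """Run `evaluate(name, train_indices)` for every model and split size.

    `evaluate` is an assumed callback that trains the named model on the
    given indices and returns its test accuracy as a float.
    """
    results = {}
    for name in model_names:
        for frac in fractions:
            train_idx = stratified_subset(train_labels, frac)
            results[(name, frac)] = evaluate(name, train_idx)
    return results
```

In practice, `evaluate` would wrap the training loop for each of the four networks, and the resulting dictionary could be tabulated to show how accuracy changes as the training fraction shrinks.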
Keywords:
Neural network; CNN; CapsNet; DenseNet; ResNet; MNIST
License
Copyright © 2024 Author(s)
This is an open access article under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.