Domain Adaptation-Based Deep Learning Framework for Android Malware Detection Across Diverse Distributions

Authors

  • Shuguang Xiong

    1. Microsoft Inc., Beijing 100102, China
    2. Baidu Inc., Beijing 100085, China

  • Xiaoyang Chen

    The Ohio State University, Columbus, OH 43210, United States

  • Huitao Zhang

    Northern Arizona University, San Francisco St, AZ 86011, United States

  • Meng Wang

    Newmark Group, United States

DOI:

https://doi.org/10.30564/aia.v6i1.6718
Received: 25 March 2024 | Accepted: 26 April 2024 | Published Online: 29 June 2024

Abstract

This study addresses Android malware detection, a critical problem given the pervasive threats facing mobile devices. As Android malware evolves, conventional detection methods struggle with novel or polymorphic samples that bypass traditional defenses. To overcome these limitations, this research applies machine learning (ML) and deep learning (DL) techniques together with domain adaptation strategies that improve model generalization across different data distributions. The approach divides a dataset into distinct distributions and applies domain adaptation so that the model remains robust and accurate under distribution shift. Preliminary results indicate that domain adaptation significantly improves detection accuracy in target domains not represented in the training data. The findings suggest that integrating ML and DL strategies with domain adaptation can substantially improve the efficacy of malware detection systems in dynamic environments.
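The abstract does not name the partitioning or alignment method, so the following is only a minimal sketch under stated assumptions: it assumes k-means clustering to divide a dataset into two distributions and a CORAL-style covariance alignment as the domain adaptation step. The function names (`kmeans_split`, `coral_align`, `sym_matrix_power`) and the synthetic feature data are hypothetical and stand in for real traffic features.

```python
import numpy as np


def kmeans_split(features, k=2, iters=20, seed=0):
    """Assign each sample to one of k clusters with plain Lloyd iterations (assumed split method)."""
    rng = np.random.default_rng(seed)
    # Fancy indexing returns a copy, so updating centers does not modify features.
    centers = features[rng.choice(len(features), size=k, replace=False)]
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels


def sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * vals ** power) @ vecs.T


def coral_align(source, target, eps=1e-6):
    """Whiten source features, then recolor them with the target covariance (CORAL-style)."""
    d = source.shape[1]
    # eps * I keeps both covariance matrices invertible.
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(d)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(d)
    centered = source - source.mean(axis=0)
    whitened = centered @ sym_matrix_power(cov_s, -0.5)
    # Shifting to the target mean is an extra choice made for this sketch.
    return whitened @ sym_matrix_power(cov_t, 0.5) + target.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic blobs stand in for two distributions of app features.
    blob_a = rng.normal(size=(300, 4)) * np.array([1.0, 2.0, 0.5, 1.0])
    blob_b = rng.normal(size=(300, 4)) * np.array([0.5, 1.0, 2.0, 1.5]) + 8.0
    data = np.vstack([blob_a, blob_b])

    labels = kmeans_split(data, k=2)
    source, target = data[labels == 0], data[labels == 1]
    aligned = coral_align(source, target)

    gap_before = np.abs(np.cov(source, rowvar=False) - np.cov(target, rowvar=False)).max()
    gap_after = np.abs(np.cov(aligned, rowvar=False) - np.cov(target, rowvar=False)).max()
    print("max covariance gap before alignment:", gap_before)
    print("max covariance gap after alignment:", gap_after)
```

In a setup like this, a classifier would be trained on the aligned source features and evaluated on the untouched target cluster, which mirrors the abstract's notion of a target domain not represented in the training data.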

Keywords:

Android malware detection; Deep learning; Domain adaptation

How to Cite

Xiong, S., Chen, X., Zhang, H., & Wang, M. (2024). Domain Adaptation-Based Deep Learning Framework for Android Malware Detection Across Diverse Distributions. Artificial Intelligence Advances, 6(1), 13–24. https://doi.org/10.30564/aia.v6i1.6718

Article Type

Articles