Comparative Evaluation of Ensemble and Tree-Based Machine Learning Algorithms for Network Intrusion Detection
DOI:
https://doi.org/10.30564/jeis.v7i2.12299
Abstract
The increasing sophistication and scale of malicious network activities demand a fundamental shift from traditional signature-based intrusion detection systems toward adaptive, data-driven security architectures. Machine learning (ML) provides an effective paradigm for addressing this challenge by identifying intricate, non-linear patterns associated with cyber threats within complex, high-dimensional network data. This study presents a comparative analysis of four widely used ensemble and tree-based ML algorithms, namely Random Forest (RF), Decision Tree (DT), XGBoost, and LightGBM, applied to the multi-class classification of contemporary network intrusions. Using the benchmark CIC-IDS-2017 dataset, a preprocessing framework was implemented to ensure data integrity, reproducibility, and methodological rigor. Model performance was evaluated through standard classification metrics, with the macro-averaged F1-score prioritized to provide an equitable assessment across highly imbalanced class distributions. Experimental findings reveal substantial differences in performance among the examined algorithms. Although RF, DT, and LightGBM achieved overall accuracy levels exceeding 99.8%, XGBoost consistently demonstrated superior capability in identifying minority attack categories, achieving the highest overall accuracy of 99.89% and a macro-averaged F1-score of 0.8903. These results highlight XGBoost’s enhanced generalization capacity and resilience to class imbalance, supporting its suitability for deployment in real-time cybersecurity environments. In conclusion, this research establishes a consistent methodological benchmark for evaluating ensemble-based intrusion detection algorithms and underscores the critical importance of balanced model assessment in the context of skewed network traffic distributions. The findings suggest that XGBoost offers the most reliable and balanced performance profile for practical implementation within modern Security Operations Centers (SOCs), providing a strong foundation for adaptive and intelligent intrusion detection frameworks.
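The role of the macro-averaged F1-score under class imbalance can be illustrated with a minimal sketch. The toy labels, class names, and helper functions below are hypothetical and are not taken from the study or from CIC-IDS-2017; they only show how a classifier can reach high accuracy while missing most minority-class samples.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy traffic: 990 benign flows and 10 attack flows.
y_true = ["BENIGN"] * 990 + ["ATTACK"] * 10
# Hypothetical classifier: detects only 2 of the 10 attacks, no false alarms.
y_pred = ["BENIGN"] * 990 + ["ATTACK"] * 2 + ["BENIGN"] * 8

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.4f}, macro F1 = {f1:.4f}")
```

In this toy case accuracy is 0.992 because the benign majority dominates the count, while the macro-averaged F1 falls to roughly 0.66 because the attack class, with only 2 of 10 samples detected, carries the same weight as the benign class.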
Keywords:
Cyber Security; Machine Learning; Intrusion Detection; Classification; Random Forest; CIC-IDS-2017; XGBoost; LightGBM
License
Copyright © 2025 Atakan Özçelebi, Vedat Marttin

This is an open access article under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.




Atakan Özçelebi