Efficient Feature Selection and ML Algorithm for Accurate Diagnostics

Authors

  • Vincent Omollo Nyangaresi, Faculty of Biological & Physical Sciences, Tom Mboya University College, Homabay, Kenya
  • Nidhal Kamel Taha El-Omari, The World Islamic Science and Education University, Amman, Jordan
  • Judith Nyakanga Nyakina, School of Nursing, Uzima University, Kisumu, Kenya

DOI:

https://doi.org/10.30564/jcsr.v4i1.3852

Abstract

Machine learning algorithms have been deployed in numerous optimization, prediction and classification problems, which has made them attractive for fields such as computer networks and medical diagnosis. Although these algorithms achieve convincing results in these fields, they face numerous challenges when deployed on imbalanced datasets: they are often biased towards the majority class and therefore fail to generalize the learning process. In addition, they are unable to deal effectively with high-dimensional datasets. Moreover, conventional feature selection techniques that rank attributes by significance are ineffective for the majority of diagnosis applications. In this paper, feature selection is executed using Neighbourhood Components Analysis (NCA), which is presented as a more effective alternative. During the classification process, an ensemble classifier comprising K-Nearest Neighbours (KNN), Naive Bayes (NB), Decision Tree (DT) and Support Vector Machine (SVM) is built, trained and tested. Finally, cross validation is carried out to evaluate the developed ensemble model. The results show that the proposed classifier has the best performance in terms of precision, recall, F-measure and classification accuracy.
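
As a rough illustration of how an ensemble of four base classifiers can be combined and scored on the four reported metrics, the sketch below applies hard majority voting to per-classifier predictions and then computes precision, recall, F-measure and accuracy. The abstract does not state the combination rule, so majority voting is an assumption here. The labels and predictions are made-up toy values, not the paper's data or results, and the function names `majority_vote` and `binary_metrics` are hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common label among one sample's base-classifier votes.

    Ties resolve to the label that appears first in `votes`.
    """
    return Counter(votes).most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F-measure and accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Toy ground truth and base-classifier outputs (assumed values, 1 = positive).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
base_predictions = {
    "KNN": [1, 1, 0, 0, 0, 1, 0, 1],
    "NB":  [1, 0, 1, 0, 0, 1, 0, 1],
    "DT":  [1, 1, 1, 0, 1, 0, 0, 0],
    "SVM": [1, 1, 1, 0, 0, 1, 0, 1],
}

# Combine the four votes for each sample into one ensemble prediction.
ensemble_pred = [majority_vote(votes) for votes in zip(*base_predictions.values())]
metrics = binary_metrics(y_true, ensemble_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With an even number of base classifiers, ties are possible. This sketch breaks them by vote order, whereas a real system might weight the classifiers or use predicted probabilities instead.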

Keywords:

Accuracy; Classifier; Ensemble; F-measure; Machine learning; Precision; Recall

How to Cite

Nyangaresi, V. O., El-Omari, N. K. T., & Nyakina, J. N. (2022). Efficient Feature Selection and ML Algorithm for Accurate Diagnostics. Journal of Computer Science Research, 4(1), 10–19. https://doi.org/10.30564/jcsr.v4i1.3852

Article Type

Article