Soft voting ensemble model to improve Parkinson’s disease prediction with SMOTE

(1) * Jumanto Unjung (Department of Computer Science, Universitas Negeri Semarang, Indonesia)
(2) Rofik Rofik (Department of Computer Science, Universitas Negeri Semarang, Indonesia)
(3) Endang Sugiharti (Department of Computer Science, Universitas Negeri Semarang, Indonesia)
(4) Alamsyah Alamsyah (Department of Computer Science, Universitas Negeri Semarang, Indonesia)
(5) Riza Arifudin (Department of Computer Science, Universitas Negeri Semarang, Indonesia)
(6) Budi Prasetiyo (Department of Computer Science, Universitas Negeri Semarang, Indonesia)
(7) Much Aziz Muslim (Department of Computer Science, Universitas Negeri Semarang, Indonesia; and Faculty of Technology Management, Universiti Tun Hussein Onn Malaysia, Johor, Malaysia)
*corresponding author

Abstract


Parkinson's disease is one of the major neurodegenerative diseases affecting the central nervous system, often leading to motor and cognitive impairments. Precise diagnosis is currently unreliable, and there is no specific test, such as electroencephalography or a blood test, that can diagnose the disease. Several studies have focused on voice-based classification of Parkinson's disease and have attempted to improve the accuracy of classification models. However, two major issues in this kind of predictive analysis are imbalanced class distributions and the low performance of classification algorithms. This research aims to improve the accuracy of speech-based Parkinson's disease prediction by addressing class imbalance in the data and building an appropriate model. The proposed model balances the classes using SMOTE and then builds a soft-voting ensemble. The research process is structured into four phases: data preprocessing, sampling, model development using a voting ensemble, and performance evaluation. The model was tested on voice recordings from 31 people, obtained from OpenML, and evaluated using stratified cross-validation. It achieved an accuracy of 97.44%, a precision of 97.95%, a recall of 97.44%, and an F1-score of 97.56%. These results show that combining SMOTE with a soft-voting ensemble can improve the model's predictive accuracy.
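The abstract names two techniques, SMOTE oversampling and soft voting, but does not specify the base classifiers or parameters. The following is a minimal, dependency-free Python sketch of both ideas under stated assumptions; it is not the authors' implementation. The function names `smote` and `soft_vote`, the neighbour count `k`, the random seed, and the toy data are all hypothetical. In practice one would more likely use `imblearn.over_sampling.SMOTE` and scikit-learn's `VotingClassifier(voting="soft")`.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority-class samples (minimal SMOTE sketch).

    Each synthetic point is base + gap * (neighbour - base), where base is a
    randomly chosen minority sample, neighbour is one of its k nearest
    minority-class neighbours (Euclidean distance), and gap ~ U[0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()
        # Interpolate on the line segment between base and neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def soft_vote(probas: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> Tuple[int, Vector]:
    """Soft voting: weighted mean of per-class probabilities, then argmax.

    probas[m][c] is model m's predicted probability for class c.
    Returns (predicted class index, averaged probability vector).
    """
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [sum(w * p[c] for w, p in zip(weights, probas)) / total
           for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: avg[c])
    return label, avg


if __name__ == "__main__":
    # Hypothetical toy data: three minority-class feature vectors.
    minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
    print(len(smote(minority, n_new=4, k=2)))  # 4

    # Hypothetical class-probability outputs of three base models
    # for [class 0, class 1] on a single recording.
    label, avg = soft_vote([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]])
    print(label)  # 0
```

With equal weights, soft voting returns class 0 in this toy case even though two of the three models favour class 1; this is exactly where it differs from hard (majority) voting. When SMOTE is combined with stratified cross-validation, synthetic samples are generally generated from the training folds only, so that each held-out fold contains only real recordings; the abstract does not state how this was handled in the study.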

Keywords


Soft-voting ensemble; SMOTE; Parkinson's disease; prediction; cross-validation

   

DOI

https://doi.org/10.26555/ijain.v11i1.1627
      





Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
 andri.pranolo.id@ieee.org (publication issues)
