Predict customer churn in the banking sector: a machine learning approach with imbalanced data handling techniques

(1) Jong-Hwa Lee (Dong-Eui University, Busan, Republic of Korea)
(2) Van-Ho Nguyen (University of Economics and Law, Vietnam National University, Ho Chi Minh City, Viet Nam)
(3) * Hoanh-Su Le (University of Economics and Law, Vietnam National University, Ho Chi Minh City, Viet Nam)
*corresponding author

Abstract


Customer value analysis is a critical component in formulating effective marketing and customer relationship management (CRM) strategies, especially in sectors marked by frequent client movement and strong competition. A key element of this process lies in enhancing customer retention, as retaining existing clients is typically more cost-effective than acquiring new ones and directly contributes to overall profitability. In today's banking environment, where customers can choose from a broad range of financial services, customer churn has become a critical challenge. Predicting and understanding attrition enables financial institutions to implement proactive, targeted interventions to protect market share and strengthen customer loyalty. This study analyzes a real-world dataset comprising 10,127 customer records from a commercial bank, of which only 1,627 entries correspond to churned customers, presenting a notable class imbalance problem. To address this, several data balancing techniques were applied, including class-weight adjustment, SMOTE, SMOTE-Tomek Links, and SMOTE-ENN. Multiple machine learning models (Support Vector Machine, Random Forest, Decision Tree, Logistic Regression, and AdaBoost) were evaluated to identify the most effective approach for churn prediction. The Random Forest model achieved an 86% F1-score after applying SMOTE-Tomek Links, demonstrating strong predictive capability. The key contribution of this study lies in integrating advanced resampling techniques with ensemble learning and customer behavioral insights to improve churn prediction performance and support data-driven retention strategies in the banking sector.
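The pipeline the abstract describes (handle class imbalance, train a classifier, evaluate with the F1-score) can be sketched as follows. This is an illustrative sketch, not the authors' code: it uses synthetic data generated to mirror the reported churn ratio (1,627 of 10,127 records, roughly 16%), and demonstrates only the class-weight adjustment technique named in the abstract; all parameter values are assumptions for illustration.

```python
# Illustrative sketch (not the authors' implementation): class-weight
# adjustment with a Random Forest, evaluated with the minority-class F1-score.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank dataset: ~16% minority (churn) class,
# mirroring the 1,627 / 10,127 churn ratio reported in the abstract.
X, y = make_classification(
    n_samples=10127, n_features=20, weights=[0.84, 0.16], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# class_weight="balanced" reweights each class inversely to its frequency,
# so errors on the rare churn class are penalized more heavily during training.
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
clf.fit(X_train, y_train)

f1 = f1_score(y_test, clf.predict(X_test))
print(f"F1-score on the churn (minority) class: {f1:.3f}")
```

The SMOTE-based variants the study compares (SMOTE, SMOTE-Tomek Links, SMOTE-ENN) would instead resample the training split, for example via the `imbalanced-learn` library's `SMOTETomek`, before fitting an unweighted classifier; the evaluation step is unchanged.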

Keywords


Customer churn; Churn prediction; Imbalanced data handling; Machine learning; Ensemble learning


DOI

https://doi.org/10.26555/ijain.v12i1.2262




Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
 andri.pranolo.id@ieee.org (publication issues)

