Detection of code smells using machine learning techniques combined with data-balancing methods

(1) * Nasraldeen Alnor Adam Khleel (University of Miskolc, Hungary)
(2) Károly Nehéz (University of Miskolc, Hungary)
*corresponding author

Abstract


Code smells are common software design problems that arise when implementation or design principles are violated; they appear as symptoms or anomalies in the source code. Identifying them early plays a crucial role in improving software quality and easing maintenance. Previous studies have shown that code smells can be detected with machine learning (ML) methods. Despite the growing popularity of these methods, however, research suggests that they are not always suitable because of imbalanced data, which can degrade the effectiveness of ML models. This study proposes a novel method for detecting code smells with five ML algorithms: decision tree (DT), k-nearest neighbors (K-NN), support vector machine (SVM), XGBoost (XGB), and multi-layer perceptron (MLP). To address imbalanced data, the method incorporates the random oversampling technique. Experiments were conducted on four code smell datasets: god class, data class, long method, and feature envy. The results were evaluated and compared using several performance metrics. On the original datasets, the XGB model achieved the highest accuracy, 100%, for detecting data class and long method. On the balanced datasets, the DT, SVM, and XGB models each reached 100% accuracy for data class and long method. These empirical findings suggest that ML techniques hold significant promise for accurately predicting code smells.
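
As a rough illustration of the random oversampling step mentioned above, the following minimal Python sketch duplicates randomly chosen minority-class rows until every class is as large as the majority class. It is not the authors' implementation: the function name `random_oversample`, the toy metric values, and the seed are assumptions made only for this example.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # Existing rows of this class form the pool to draw from.
        pool = [x for x, y in zip(features, labels) if y == cls]
        # Draw with replacement; the majority class needs zero extra rows.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


# Toy data (made up): two metric values per row, label 1 = smelly.
X = [[10, 1], [12, 2], [11, 1], [9, 3], [13, 2], [10, 2], [95, 40], [88, 35]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y, seed=42)
print(Counter(y), "->", Counter(y_bal))
```

In practice, oversampling is usually applied to the training split only, so that duplicated rows cannot leak into the test data and inflate the measured accuracy.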

Keywords


Code smells; Software metrics; Machine learning techniques; Class imbalance; Data balancing methods

   

DOI

https://doi.org/10.26555/ijain.v9i3.981
      






Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)
