Improved point center algorithm for K-Means clustering to increase software defect prediction

Riski Annisa; Didi Rosiyadi; Dwiza Riana

doi:10.26555/ijain.v6i3.484


⁽¹⁾* Riski Annisa

(Universitas Bina Sarana Informatika, Indonesia)
⁽²⁾ Didi Rosiyadi

(Research Center for Informatics, Indonesian Institute of Sciences (LIPI), Bandung, Indonesia; and Master Program of Computer Science STMIK Nusa Mandiri, Jakarta, Indonesia)
⁽³⁾ Dwiza Riana

(STMIK Nusa Mandiri, Indonesia)
*corresponding author

Abstract

K-means is a widely used clustering algorithm that is easy to apply. However, its results depend on randomly chosen initial centroids, so it may fail to produce optimal clusters. This research aimed to improve the performance of the k-means algorithm by applying a proposed algorithm called point center. The proposed algorithm replaces the random initial centroid values in k-means and was then applied to predicting errors in software defect modules. The point center algorithm determines the initial centroid values for the k-means optimization; the selection of X and Y variables then determines the cluster center members. Ten datasets were used for testing, nine of which were used for predicting software defects. The proposed point center algorithm showed the lowest errors. Compared with the simple k-means algorithm using randomly obtained centroid values, it improved k-means performance by an average of 12.82% in cluster errors on the software datasets. The findings contribute to developing clustering models for handling data, for example to predict defective software modules more accurately.
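
The abstract does not spell out the point center steps, so the sketch below only illustrates the general idea it builds on: k-means run from deterministic initial centroids instead of random ones. The seeding function `farthest_first_init` is a stand-in (farthest-first traversal), not the authors' algorithm, and the sum of squared errors computed by `sse` is an assumed quality measure rather than the paper's cluster-error metric. All function names and the toy data are hypothetical.

```python
from math import dist


def farthest_first_init(points, k):
    """Deterministic seeding by farthest-first traversal.

    Stand-in for illustration only; this is NOT the paper's point center algorithm.
    """
    mean = tuple(sum(axis) / len(points) for axis in zip(*points))
    # First seed: the point farthest from the overall mean.
    seeds = [max(points, key=lambda p: dist(p, mean))]
    while len(seeds) < k:
        # Next seed: the point whose nearest existing seed is farthest away.
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's k-means on coordinate tuples, starting from the given centroids."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Toy data (made up for this sketch): two well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
seeds = farthest_first_init(points, 2)
centroids, labels = kmeans(points, seeds)
error = sse(points, centroids, labels)
```

To see the sensitivity the abstract refers to, the same `kmeans` function can be started from `random.Random(seed).sample(points, k)` for several seeds and the resulting `sse` values compared with the deterministic run; on harder data than this toy set, the random starts typically vary from run to run while the deterministic start does not.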

Keywords

Algorithm, K-Means, Cluster, Centroid, Software defect

DOI

https://doi.org/10.26555/ijain.v6i3.484

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)
