(2) Aji Prasetya Wibawa (Universitas Negeri Malang, Indonesia)
(3) Harits Ar Rasyid (Universitas Negeri Malang, Indonesia)
(4) Andrew Nafalski (University of South Australia, Australia)
(5) Ummi Rabaah Hasyim (Universiti Teknikal Malaysia Melaka, Malaysia)
*corresponding author
Abstract
In recent years, data processing has become an important issue across all disciplines, because good data processing can yield sound decision-making recommendations. Data processing is covered in academic publications across many fields, including computer science, and the topic has grown over the past three years, showing that research in this area is expanding, diversifying, and attracting considerable interest. Journals are grouped into quartiles that indicate their influence relative to similar publications; SCImago provides this ranking. There are four quartiles, with Q1 the highest and Q4 the lowest. However, the same journal can hold different quartile values in different subject areas, and a classification method is proposed here to address this inconsistency. Classification is a machine-learning technique that groups data according to a supplied class label. This study used ensemble Boosting and Bagging with Decision Tree (DT) and Gaussian Naïve Bayes (GNB) base learners. The ensembles' depth and estimator settings were varied to examine how increasing these values affects the resulting performance: for DT both parameters were varied, whereas for GNB only the number of estimators was changed, since GNB has no depth parameter. Based on average accuracy, the best algorithm for the computer science dataset is GNB Bagging, with accuracy, precision, and recall of 68.96%, 70.99%, and 69.05%, respectively. XGBoost DT ranks second with 67.75% accuracy, 67.69% precision, and 67.83% recall. DT Bagging places third with 67.30% accuracy, 68.13% precision, and 67.31% recall. Fourth is XGBoost GNB, with 67.07% accuracy, 68.85% precision, and 67.18% recall. AdaBoost DT ranks fifth with 63.65% accuracy, 64.21% precision, and 63.63% recall. AdaBoost GNB is the least effective algorithm on this dataset, achieving only 43.19% accuracy, 48.14% precision, and 43.20% recall. These results remain far from ideal, so the proposed methods cannot yet be recommended for resolving journal quartile inconsistencies.
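To make the compared configurations concrete, the following is a minimal Python sketch of the Bagging, AdaBoost, and XGBoost variants named in the abstract, assuming scikit-learn and the xgboost package. The random placeholder data, feature count, split ratio, and parameter values (max_depth=5, n_estimators=50) are illustrative assumptions, not the authors' exact pipeline; the abstract's "XGBoost GNB" variant has no off-the-shelf equivalent and is omitted.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score
from xgboost import XGBClassifier

# Hypothetical placeholder data: rows stand in for journals, and the
# label is the quartile encoded 0..3 (Q1..Q4). The real study used a
# SCImago computer-science dataset not reproduced here.
rng = np.random.default_rng(0)
X = rng.random((500, 8))
y = rng.integers(0, 4, 500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    # Bagging and AdaBoost wrap the two base learners; n_estimators and
    # the tree's max_depth are the two settings varied in the study.
    "Bagging DT": BaggingClassifier(DecisionTreeClassifier(max_depth=5), n_estimators=50),
    "Bagging GNB": BaggingClassifier(GaussianNB(), n_estimators=50),
    "AdaBoost DT": AdaBoostClassifier(DecisionTreeClassifier(max_depth=5), n_estimators=50),
    "AdaBoost GNB": AdaBoostClassifier(GaussianNB(), n_estimators=50),
    # XGBoost builds its own boosted trees, standing in for "XGBoost DT".
    "XGBoost DT": XGBClassifier(max_depth=5, n_estimators=50),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # Macro averaging treats the four quartile classes equally, matching
    # a multi-class precision/recall report.
    print(f"{name}: acc={accuracy_score(y_te, pred):.4f} "
          f"prec={precision_score(y_te, pred, average='macro', zero_division=0):.4f} "
          f"rec={recall_score(y_te, pred, average='macro'):.4f}")

In practice one would sweep max_depth and n_estimators over a grid and average the metrics across folds, which is how the abstract's per-method accuracy, precision, and recall figures would be obtained.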
Keywords: Ensemble Learning, Boosting, Bagging, Decision Tree, Gaussian Naïve Bayes, SCImago Journal Rank
DOI: https://doi.org/10.26555/ijain.v9i1.985
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
E: andri.pranolo.id@ieee.org (publication issues)