(2) Opim Salim Sitompul (Universitas Sumatera Utara, Indonesia)
(3) Tulus Tulus (Universitas Sumatera Utara, Indonesia)
(4) Erna Budhiarti Nababan (Universitas Sumatera Utara, Indonesia)
*corresponding author
Abstract
Class imbalance occurs when one class contains far more instances than the others. This major machine learning problem can degrade predictive accuracy. Support Vector Machine (SVM) is a robust and precise method for handling the class imbalance problem, but it is weak when the data distribution is biased, so Biased Support Vector Machine (BSVM) has become a popular choice for solving the problem. BSVM provides better control of sensitivity, yet it lacks accuracy compared with the general SVM. This study proposes integrating BSVM and SMOTEBoost to handle the class imbalance problem. Non Support Vector (NSV) sets from the negative samples and Support Vector (SV) sets from the positive samples undergo a Weighted-SMOTE process. The results indicate that the implementation of Biased Support Vector Machine and Weighted-SMOTE achieves better accuracy and sensitivity.
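The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of SMOTE-style interpolation, in which each minority (positive) sample receives its own number of synthetic neighbours. This is not the authors' implementation: the function names, the toy data, and the per-sample `counts` are all hypothetical, and how the weights would be derived from the SV and NSV sets is not specified here.

```python
import math
import random


def nearest_neighbors(index, points, k):
    """Return the k points closest (Euclidean) to points[index], excluding itself."""
    others = [p for j, p in enumerate(points) if j != index]
    return sorted(others, key=lambda p: math.dist(points[index], p))[:k]


def smote_oversample(minority, counts, k=3, seed=0):
    """Create counts[i] synthetic points for minority[i].

    Each synthetic point lies on the segment between minority[i] and one of
    its k nearest minority neighbours (the basic SMOTE interpolation).
    The per-sample counts stand in for sample weights (an assumption).
    """
    rng = random.Random(seed)
    synthetic = []
    for i, n_new in enumerate(counts):
        neighbors = nearest_neighbors(i, minority, k)
        for _ in range(n_new):
            neighbor = rng.choice(neighbors)
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic.append(
                tuple(a + gap * (b - a) for a, b in zip(minority[i], neighbor))
            )
    return synthetic


# Hypothetical toy data: four positive samples, the last one weighted most heavily.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (3.0, 3.0)]
counts = [1, 1, 1, 3]
new_points = smote_oversample(minority, counts, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class; giving larger counts to selected samples concentrates the new points around them.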
Keywords
Class Imbalance; Biased Support Vector Machine; Borderline-SMOTE; Positive Samples; Negative Samples
DOI
https://doi.org/10.26555/ijain.v4i1.146
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)