Biased support vector machine and weighted-smote in handling class imbalance problem

(1) * Hartono Hartono (Universitas Sumatera Utara, Indonesia)
(2) Opim Salim Sitompul (Universitas Sumatera Utara, Indonesia)
(3) Tulus Tulus (Universitas Sumatera Utara, Indonesia)
(4) Erna Budhiarti Nababan (Universitas Sumatera Utara, Indonesia)
*corresponding author

Abstract


Class imbalance occurs when one class contains far more instances than the other classes. This major machine learning problem can degrade prediction accuracy. Support Vector Machine (SVM) is a robust and precise method for handling the class imbalance problem, but it is weak when the data distribution is biased; Biased Support Vector Machine (BSVM) has therefore become a popular choice for addressing this weakness. BSVM provides better control over sensitivity, yet it is less accurate than the general SVM. This study proposes the integration of BSVM and SMOTEBoost to handle the class imbalance problem. Non-Support Vector (NSV) sets from negative samples and Support Vector (SV) sets from positive samples undergo a Weighted-SMOTE process. The results indicate that the implementation of Biased Support Vector Machine and Weighted-SMOTE achieves better accuracy and sensitivity.
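
The Weighted-SMOTE step named in the abstract can be pictured with a minimal sketch. The abstract does not state how the weights are derived or how many neighbours are used, so everything below is an assumption for illustration rather than the authors' implementation: the function names `allocate_counts` and `weighted_smote`, the caller-supplied `weights`, and the neighbour count `k` are all hypothetical. The seed points stand in for a selected sample set (for example, the support vectors drawn from the positive samples), and each seed receives a share of the synthetic samples proportional to its weight.

```python
import math
import random


def allocate_counts(weights, n_new):
    """Split n_new synthetic samples across seeds in proportion to weights."""
    total = sum(weights)
    raw = [w / total * n_new for w in weights]
    counts = [int(r) for r in raw]
    # Give the leftover samples to the seeds with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[: n_new - sum(counts)]:
        counts[i] += 1
    return counts


def weighted_smote(seeds, weights, n_new, k=3, rng=None):
    """SMOTE-style interpolation where heavier seeds spawn more samples.

    Assumes at least two seed points; weights are supplied by the caller
    (how the paper computes them is not stated in the abstract).
    """
    rng = rng or random.Random(0)
    counts = allocate_counts(weights, n_new)
    synthetic = []
    for i, (x, count) in enumerate(zip(seeds, counts)):
        # k nearest other seeds by Euclidean distance.
        others = [s for j, s in enumerate(seeds) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(x, s))[:k]
        for _ in range(count):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical usage: four minority seeds, the last one weighted most heavily.
seeds = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = weighted_smote(seeds, weights=[1, 1, 1, 3], n_new=6)
```

Because every synthetic point is a convex combination of two seeds, the new samples stay inside the region spanned by the chosen set; only the allocation of samples per seed distinguishes this sketch from plain SMOTE-style interpolation.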

Keywords


Class Imbalance; Biased Support Vector Machine; Borderline-SMOTE; Positive Samples; Negative Samples

   

DOI

https://doi.org/10.26555/ijain.v4i1.146
      


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by Informatics Department - Universitas Ahmad Dahlan, and UTM Big Data Centre - Universiti Teknologi Malaysia
Published by Universitas Ahmad Dahlan
W : http://ijain.org
E : info@ijain.org, andri.pranolo@tif.uad.ac.id (paper handling issues)
     ijain@uad.ac.id, andri.pranolo.id@ieee.org (publication issues)
