Hybrid approach redefinition with cluster-based instance selection in handling class imbalance problem

(1) * Hartono Hartono (Universitas Potensi Utama, Indonesia)
(2) Erianto Ongko (Akademi Teknologi Industri Immanuel, Indonesia)
(3) Dahlan Abdullah (Universitas Malikussaleh, Indonesia)
*corresponding author

Abstract


Class imbalance often arises in classification and is characterized by one class having far more instances than the others. It tends to lower accuracy on the minority classes, which have fewer instances, and causes important information about those classes to be missed. Various methods have been applied to overcome the class imbalance problem. One of them is the Hybrid Approach Redefinition method, which belongs to the family of hybrid ensemble methods. Attention to classifier performance has led to an understanding of the importance of selecting the instances on which a classifier is built. In the classic Hybrid Approach Redefinition method, this selection is done randomly using Random Under-Sampling, so it is of interest to study the performance obtained when sampling is instead cluster-based, selecting from the existing instances. The purpose of this study is to apply the Hybrid Approach Redefinition method with a Cluster-Based Instance Selection (CBIS) approach in order to obtain better classifier performance. The results show that Hybrid Approach Redefinition with cluster-based instance selection gives better results than the classic Hybrid Approach Redefinition in terms of the number of classifiers, data diversity, and classifier performance.
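
To make the contrast between random and cluster-based selection concrete, the following Python sketch under-samples a majority class in both ways. It is an illustration only and not the authors' implementation: the use of k-means, the choice of the instance nearest each centroid as the cluster representative, and all function names are assumptions introduced for this example.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def _assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: _dist2(p, centroids[i]))
        clusters[idx].append(p)
    return clusters


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (assumed clustering step for this sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids are real instances
    for _ in range(iters):
        clusters = _assign(points, centroids)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            _mean(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, _assign(points, centroids)


def cluster_based_undersample(majority, n_keep, seed=0):
    """Keep one real majority instance per cluster: the one nearest the centroid.

    Hypothetical stand-in for cluster-based instance selection.
    """
    centroids, clusters = kmeans(majority, n_keep, seed=seed)
    return [
        min(members, key=lambda p: _dist2(p, c))
        for c, members in zip(centroids, clusters)
        if members
    ]


def random_undersample(majority, n_keep, seed=0):
    """Baseline: keep n_keep majority instances chosen uniformly at random."""
    return random.Random(seed).sample(majority, n_keep)
```

For a majority class made of two well-separated groups and `n_keep=2`, the cluster-based variant returns one instance from each group, whereas random selection may draw both from the same group.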

Keywords


Class Imbalance; Hybrid Approach Redefinition; Hybrid Ensembles; Classifier; Data Diversity

   

DOI

https://doi.org/10.26555/ijain.v7i3.515
      


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by Informatics Department - Universitas Ahmad Dahlan, and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: ijain@uad.ac.id (paper handling issues)
    info@ijain.org, andri.pranolo.id@ieee.org (publication issues)
