Tree-based mining contrast subspace

Florence Sia; Rayner Alfred

doi:10.26555/ijain.v5i2.359


(1)* Florence Sia
(Knowledge Technology Research Unit, Universiti Malaysia Sabah, Malaysia)

(2) Rayner Alfred
(Knowledge Technology Research Unit, Universiti Malaysia Sabah, Malaysia)

* corresponding author

Abstract

All existing contrast subspace mining methods employ a density-based likelihood contrast scoring function to measure the likelihood that a query object belongs to a target class rather than to the other class in a subspace. However, density tends to decrease as the dimensionality of a subspace increases, which causes the density-based bounds to identify inaccurate contrast subspaces for the given query object. This paper proposes a novel contrast subspace mining method that employs a tree-based likelihood contrast scoring function, which is not affected by the dimensionality of subspaces. The tree-based scoring measure recursively applies binary partitions to the subspace so that objects belonging to the target class are grouped together and separated from objects belonging to the other class. In a contrast subspace, the query object should fall in a group that contains more objects of the target class than of the other class. The method incorporates a feature selection approach to find a subset of one-dimensional subspaces with high likelihood contrast scores with respect to the query object. The contrast subspaces are then searched for within the selected subset of one-dimensional subspaces. An experiment is conducted to evaluate the effectiveness of the tree-based method in terms of classification accuracy. The experimental results show that the proposed method achieves higher classification accuracy and outperforms the existing method on several real-world data sets.
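The abstract does not state the scoring formula, so the following is only a minimal Python sketch of the general idea: recursively split a subspace with binary partitions, follow the branch that contains the query object, and look at the class mix of the group it ends up in. The Gini impurity split criterion, the use of the target-class fraction as the score, the stopping parameters `min_size` and `max_depth`, and all function names are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(points, labels, dims):
    """Return the (dimension, threshold) with the lowest weighted Gini, or None."""
    best, best_impurity = None, gini(labels)
    n = len(points)
    for d in dims:
        values = sorted({p[d] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for p, y in zip(points, labels) if p[d] <= t]
            right = [y for p, y in zip(points, labels) if p[d] > t]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if impurity < best_impurity - 1e-12:  # require a real improvement
                best, best_impurity = (d, t), impurity
    return best

def leaf_counts(points, labels, dims, query, target,
                min_size=2, max_depth=2, depth=0):
    """Partition the subspace `dims` recursively, following only the branch
    that contains `query`; return (n_target, n_other) for its final group."""
    split = None
    if len(points) >= min_size and depth < max_depth:
        split = best_split(points, labels, dims)
    if split is None:
        n_target = sum(1 for y in labels if y == target)
        return n_target, len(labels) - n_target
    d, t = split
    same_side = [(p, y) for p, y in zip(points, labels)
                 if (p[d] <= t) == (query[d] <= t)]
    return leaf_counts([p for p, _ in same_side], [y for _, y in same_side],
                       dims, query, target, min_size, max_depth, depth + 1)

def contrast_score(points, labels, dims, query, target, max_depth=2):
    """Illustrative score: fraction of target-class objects in the query's group."""
    n_target, n_other = leaf_counts(points, labels, dims, query, target,
                                    max_depth=max_depth)
    total = n_target + n_other
    return n_target / total if total else 0.0

# Toy data (hypothetical): class "A" is the target class, "B" the other class.
points = [(1.0, 5.0), (2.0, 1.0), (1.5, 9.0), (8.0, 5.2), (9.0, 1.1), (8.5, 8.8)]
labels = ["A", "A", "A", "B", "B", "B"]
query = (1.2, 5.05)

# Rank the one-dimensional subspaces by their score for the query object.
ranked = sorted(range(2),
                key=lambda d: contrast_score(points, labels, (d,), query, "A"),
                reverse=True)
print("one-dimensional subspaces, best first:", ranked)
```

In this toy data the classes separate cleanly along dimension 0 and are interleaved along dimension 1, so dimension 0 is ranked first. A search over higher-dimensional subspaces, as the abstract describes, would then start from the highest-ranked one-dimensional subspaces. The depth limit matters in this sketch: without a stopping rule the partitioning can isolate the query with a single object and give every subspace a misleadingly high score.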

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)
