Tree-based mining contrast subspace

(1) * Florence Sia (Knowledge Technology Research Unit, Universiti Malaysia Sabah, Malaysia)
(2) Rayner Alfred (Knowledge Technology Research Unit, Universiti Malaysia Sabah, Malaysia)
*corresponding author


All existing contrast subspace mining methods employ a density-based likelihood contrast scoring function to measure the likelihood that a query object belongs to a target class rather than the other class in a subspace. However, density tends to decrease as the dimensionality of the subspace increases, which causes the bounds of the scoring function to identify inaccurate contrast subspaces for the given query object. This paper proposes a novel contrast subspace mining method that employs a tree-based likelihood contrast scoring function, which is not affected by the dimensionality of subspaces. The tree-based scoring measure recursively applies binary partitioning to the subspace so that objects belonging to the target class are grouped together and separated from objects belonging to the other class. In a contrast subspace, the query object should fall in a group that contains more objects of the target class than of the other class. The method incorporates a feature selection approach to find a subset of one-dimensional subspaces with a high likelihood contrast score with respect to the query object. The contrast subspaces are then searched for among the subspaces formed from this selected subset. An experiment is conducted to evaluate the effectiveness of the tree-based method in terms of classification accuracy. The experimental results show that the proposed method achieves higher classification accuracy and outperforms the existing method on several real-world data sets.
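
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact scoring formula, so the Gini-based split criterion, the add-one smoothed ratio used as the score, the stopping parameters (min_size, max_depth), the top-k selection and all function names are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, dims):
    """Return the (dimension, threshold) with the lowest weighted Gini,
    or None when no split improves on the unsplit node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for d in dims:
        values = sorted({r[d] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[d] <= t]
            right = [y for r, y in zip(rows, labels) if r[d] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (d, t), score
    return best

def query_group(rows, labels, query, dims, min_size=2, depth=0, max_depth=5):
    """Recursively binary-partition the subspace `dims`, following only the
    side that contains the query, and return the labels of its final group."""
    if depth >= max_depth or len(rows) < 2 * min_size:
        return labels
    split = best_split(rows, labels, dims)
    if split is None:
        return labels
    d, t = split
    go_left = query[d] <= t
    keep = [(r, y) for r, y in zip(rows, labels) if (r[d] <= t) == go_left]
    if len(keep) < min_size:  # group too small to be meaningful: stop here
        return labels
    return query_group([r for r, _ in keep], [y for _, y in keep],
                       query, dims, min_size, depth + 1, max_depth)

def contrast_score(rows, labels, query, dims, target):
    """Assumed score: smoothed ratio of target-class to other-class objects
    in the group that contains the query."""
    group = query_group(rows, labels, query, dims)
    n_target = sum(1 for y in group if y == target)
    n_other = len(group) - n_target
    return (n_target + 1) / (n_other + 1)

def mine_contrast_subspaces(rows, labels, query, target, k=2):
    """Keep the k best one-dimensional subspaces, then score every subspace
    that can be formed from them; returns (score, subspace) pairs, best first."""
    ranked = sorted(range(len(query)), reverse=True,
                    key=lambda d: contrast_score(rows, labels, query, (d,), target))
    selected = sorted(ranked[:k])
    candidates = [c for size in range(1, len(selected) + 1)
                  for c in combinations(selected, size)]
    scored = [(contrast_score(rows, labels, query, c, target), c)
              for c in candidates]
    return sorted(scored, reverse=True)

# Toy data (made up): dimension 0 separates the classes, dimension 1 does not.
rows = [(1.0, 5.0), (1.2, 1.0), (0.9, 3.0), (1.1, 4.0),
        (3.0, 5.1), (3.2, 1.1), (2.9, 3.1), (3.1, 4.1)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
query = (1.05, 3.3)

results = mine_contrast_subspaces(rows, labels, query, target="A", k=2)
```

On this toy data the subspaces that include dimension 0 receive a higher score than dimension 1 alone, because the query's group in those subspaces contains only objects of class "A". Because the score is computed from class counts inside a group rather than from an estimated density, it does not shrink merely because more dimensions are added, which is the property the abstract emphasizes.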




This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan