Medoid-based shadow value validation and visualization

Weksi Budiaji

doi:10.26555/ijain.v5i2.326


Affiliations: Sultan Ageng Tirtayasa University, Indonesia; University of Natural Resources and Life Sciences, Austria

Corresponding author: Weksi Budiaji

Abstract

The silhouette index is a well-known internal validation criterion for the results of clustering algorithms, and it is a medoid-based index. A centroid-based counterpart, the centroid-based shadow value (CSV), has also been developed. Although the two are similar, the CSV has an additional property: its results can be displayed as a 2-dimensional neighborhood graph. This article proposes a new internal validation index that is medoid-based and, like the CSV, can visualize the results in a 2-dimensional plot. The proposed index behaves similarly to the silhouette index and produces a network visualization comparable to the neighborhood graph of the CSV. The network visualization has a multiplicative parameter (c) that adjusts the visibility of its edges. In addition, because the index is medoid-based, and medoids (unlike centroids) require only pairwise distances between objects, it is a more appropriate visualization technique for any type of data than the neighborhood graph of the CSV.
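The abstract compares per-object quantities that are all computed from distances to cluster representatives. The Python sketch below, using only the standard library, illustrates them on a distance matrix: Rousseeuw's silhouette width, a shadow value in Leisch's form 2 d(i, m1) / (d(i, m1) + d(i, m2)) with medoids substituted for centroids, and neighborhood-graph style edge weights multiplied by c. This is an assumption for illustration, not the article's definition: the abstract does not give the exact formula of the proposed index (it may, for example, be rescaled so that larger values are better, as with the silhouette) or the precise role of c, and all function names here are hypothetical.

```python
from statistics import mean

# Illustrative sketch only; function names are hypothetical, not from the article.


def two_nearest_medoids(dist, medoids):
    """For each object i, return (position of nearest medoid, position of
    second-nearest medoid, d(i, m1), d(i, m2)). Needs at least two medoids."""
    out = []
    for row in dist:
        order = sorted(range(len(medoids)), key=lambda p: row[medoids[p]])
        p1, p2 = order[0], order[1]
        out.append((p1, p2, row[medoids[p1]], row[medoids[p2]]))
    return out


def silhouette(dist, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) from a distance matrix;
    objects in singleton clusters get 0. Needs at least two clusters."""
    n = len(dist)
    s = [0.0] * n
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = mean(dist[i][j] for j in own)  # mean distance to the rest of own cluster
        b = min(mean(dist[i][j] for j in range(n) if labels[j] == c)
                for c in set(labels) if c != labels[i])  # closest other cluster
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s


def medoid_shadow(dist, medoids):
    """Assumed form: Leisch's shadow value with medoids replacing centroids,
    s(i) = 2 d(i, m1) / (d(i, m1) + d(i, m2)). 0 means i is its own medoid;
    1 means i is equidistant from its two nearest medoids."""
    shadow, first, second = [], [], []
    for p1, p2, d1, d2 in two_nearest_medoids(dist, medoids):
        shadow.append(0.0 if d1 + d2 == 0 else 2 * d1 / (d1 + d2))
        first.append(p1)
        second.append(p2)
    return shadow, first, second


def edge_weights(shadow, first, second, k, c=1.0):
    """Neighborhood-graph style edge weights between medoids i and j: sum the
    shadow values of objects with nearest medoid i and second-nearest j,
    divide by the size of cluster i, average with the reverse direction for
    an undirected plot, and multiply by c (assumed role of the parameter)."""
    w = [[0.0] * k for _ in range(k)]
    size = [first.count(i) for i in range(k)]
    for s, i, j in zip(shadow, first, second):
        w[i][j] += s / size[i]
    return [[c * (w[i][j] + w[j][i]) / 2 for j in range(k)] for i in range(k)]


# Toy example: six points on a line, medoids at x = 1 and x = 11.
x = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in x] for p in x]
shadow, first, second = medoid_shadow(dist, medoids=[1, 4])
sil = silhouette(dist, first)
w = edge_weights(shadow, first, second, k=2, c=2.0)
print([round(v, 3) for v in shadow])  # [0.167, 0.0, 0.2, 0.2, 0.0, 0.167]
print([round(v, 3) for v in sil])     # [0.864, 0.9, 0.833, 0.833, 0.9, 0.864]
```

In this toy case the objects sitting on a medoid have shadow value 0 and silhouette width 0.9, while objects farther from their medoid get larger shadow values and smaller silhouette widths. Because everything is computed from `dist` alone, the same code works for any dissimilarity, which is the property the abstract attributes to medoid-based validation.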

Keywords

Cluster validation; Cluster visualization; Internal criteria; Medoid; Shadow value




This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)
