Assessing and quantifying clusteredness: The OPTICS Cordillera



Authors: Thomas Rusch, Kurt Hornik, Patrick Mair
Journal/Conference name: Journal of Computational and Graphical Statistics
Abstract: This article provides a framework for assessing and quantifying "clusteredness" of a data representation. Clusteredness is a global univariate property, defined as a layout diverging from equidistance of points to the closest neighboring point set. The OPTICS algorithm encodes the global clusteredness as a pair of clusteredness-representative distances and an algorithmic ordering. We use this to construct an index for quantifying clusteredness, coined the OPTICS Cordillera, as the norm of subsequent differences over the pair. We provide lower and upper bounds and a normalization for the index. We show that the index simultaneously captures important aspects of clusteredness such as cluster compactness, cluster separation, and number of clusters. The index can be used as a goodness-of-clusteredness statistic, as a function over a grid, or to compare different representations. For illustration, we apply our suggestion to dimensionality-reduced 2D representations of Californian counties with respect t...
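The abstract describes the raw index as a norm of subsequent differences over the pair of OPTICS reachability distances and ordering. The following is a minimal Python sketch of that idea, not the authors' R implementation. It assumes the reachability distances are already listed in OPTICS order, that undefined (infinite) reachabilities and values above a cutoff `d_max` are clipped to `d_max`, and that the norm is the ordinary q-norm. The function name `raw_optics_cordillera` and the default cutoff (largest finite reachability) are hypothetical choices for illustration; the paper's normalization by its upper bound is not reproduced here.

```python
import math
from typing import Optional, Sequence


def raw_optics_cordillera(
    reachability: Sequence[float],
    q: float = 2.0,
    d_max: Optional[float] = None,
) -> float:
    """Sketch of an unnormalized OPTICS Cordillera.

    `reachability` must already be in OPTICS order. Values are clipped at
    `d_max` (assumption: by default the largest finite reachability), and the
    q-norm of successive differences along the ordering is returned.
    """
    if q < 1:
        raise ValueError("q must be >= 1 for a proper norm")
    if len(reachability) < 2:
        return 0.0  # no successive differences to measure

    if d_max is None:
        finite = [r for r in reachability if math.isfinite(r)]
        if not finite:
            raise ValueError("need at least one finite reachability or an explicit d_max")
        d_max = max(finite)

    # Clip infinite or very large reachabilities so each jump is bounded by d_max.
    clipped = [min(r, d_max) for r in reachability]

    # Sum |r_i - r_{i-1}|^q over consecutive points in the ordering.
    total = sum(abs(b - a) ** q for a, b in zip(clipped, clipped[1:]))
    return total ** (1.0 / q)


if __name__ == "__main__":
    # Equidistant layout: flat reachability plot, so no "mountains".
    print(raw_optics_cordillera([math.inf, 1.0, 1.0, 1.0, 1.0]))
    # Two tight groups separated by a large jump: peaks raise the index.
    print(raw_optics_cordillera([math.inf, 1.0, 1.0, 5.0, 1.0, 1.0], q=1))
```

Under these assumptions, a flat reachability plot (equidistant points) yields 0, matching the abstract's notion of clusteredness as divergence from equidistance, while each valley-to-peak transition between groups adds to the index.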
Date of publication: 2018
Code programming language: R
