“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.”

- John Tukey, The Future of Data Analysis, p. 13

Introduction

This vignette serves as a code repository for clustering algorithms that take distances or similarities as input. To be clear, it is not a referendum on which clustering algorithm is best: there is no such thing as a best distance, a best clustering algorithm, or a best validation method. Every clustering problem is domain-specific and needs patience, iteration, and domain expertise to yield usable results.

With that out of the way, please feel free to recommend clustering algorithms we may have missed by lodging an issue at https://github.com/bmuchmore/PreciseDist/issues

Data set-up

The data and set-up come from the Cell Cycle Vignette (Experiment 5: Minkowski 100x). See that vignette for more details.

Now that we have the graph, we extract the distance matrix and call precise_transform() to ensure that it is, in fact, in distance format. Note, however, that some functions require similarities. For those functions, we convert the distance into a similarity using proxy::pr_dist2simil().
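
As a language-neutral illustration (a Python sketch, not the R code this vignette runs), the snippet below checks the properties usually expected of a distance matrix (square, symmetric, non-negative, zero diagonal) and then maps distances to similarities with 1 / (1 + d). To our understanding this is the transform proxy::pr_dist2simil() applies, but check the proxy documentation before relying on it. We do not know exactly which checks precise_transform() performs, and the helper names is_distance_matrix() and dist_to_simil() are hypothetical.

```python
# Hypothetical helpers (not part of PreciseDist or proxy).

def is_distance_matrix(d, tol=1e-12):
    """True if d is square, symmetric, non-negative, with a zero diagonal."""
    n = len(d)
    if any(len(row) != n for row in d):
        return False
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False
        for j in range(n):
            if d[i][j] < 0 or abs(d[i][j] - d[j][i]) > tol:
                return False
    return True

def dist_to_simil(d):
    """Map each distance x to 1 / (1 + x): 0 becomes 1, large x tends to 0."""
    return [[1.0 / (1.0 + x) for x in row] for row in d]

D = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 2.0],
     [3.0, 2.0, 0.0]]
assert is_distance_matrix(D)
S = dist_to_simil(D)
print(S[0])  # [1.0, 0.5, 0.25]
```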

Hierarchical Clustering
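
A minimal Python sketch of agglomerative hierarchical clustering on a precomputed distance matrix, assuming average linkage (UPGMA) and a fixed number of clusters k. R's hclust() supports several linkage methods; this naive loop is for illustration only, and the function name and toy points are hypothetical.

```python
def average_linkage(d, k):
    """Agglomerative clustering with average linkage (UPGMA) on a
    precomputed distance matrix d (list of lists), merging until k
    clusters remain. Labels are numbered by each cluster's smallest member."""
    clusters = [[i] for i in range(len(d))]

    def avg_dist(a, b):
        # Mean of all pairwise distances between the two clusters.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                dxy = avg_dist(clusters[x], clusters[y])
                if best is None or dxy < best[0]:
                    best = (dxy, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]                           # y > x, so x stays valid

    labels = [0] * len(d)
    for label, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical toy data: 1-D points turned into a distance matrix.
pts = [0, 1, 2, 10, 11, 20]
D = [[abs(a - b) for b in pts] for a in pts]
print(average_linkage(D, 3))  # [0, 0, 0, 1, 1, 2]
```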

K-Means Clustering
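
K-means needs coordinates rather than distances. One common workaround, assumed here rather than taken from this vignette, is to treat each row of the distance matrix as a feature vector; embedding the distances first (for example with classical multidimensional scaling) is another option. The Python sketch below runs Lloyd's algorithm with a deterministic farthest-point start. R's kmeans() instead uses random starting centres and, by default, the Hartigan-Wong algorithm. The names are hypothetical.

```python
def kmeans(rows, k, iters=100):
    """Lloyd's k-means on a list of equal-length feature vectors.
    Uses a deterministic farthest-point start (an assumption made so the
    sketch is reproducible). Returns one label in 0..k-1 per row."""
    def sqdist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    centers = [list(rows[0])]
    while len(centers) < k:
        # Next centre: the row farthest from its nearest existing centre.
        far = max(rows, key=lambda r: min(sqdist(r, c) for c in centers))
        centers.append(list(far))

    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sqdist(r, centers[c])) for r in rows]
        if new == labels:
            break  # assignments stopped changing
        labels = new
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy data: rows of the distance matrix used as features.
pts = [0, 1, 2, 10, 11, 20]
D = [[abs(a - b) for b in pts] for a in pts]
labels = kmeans(D, 3)
print(labels)
```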

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
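
A minimal Python sketch of DBSCAN on a precomputed distance matrix. Here min_pts counts the point itself (implementations differ on this convention), points with fewer than min_pts neighbours within eps are not core points, and noise is labelled -1. In R, dbscan::dbscan() is the usual implementation; the names below are hypothetical.

```python
NOISE = -1

def dbscan(d, eps, min_pts):
    """DBSCAN on a precomputed distance matrix d (list of lists).
    A point is a core point if at least min_pts points (itself included)
    lie within distance eps. Returns labels; -1 marks noise."""
    n = len(d)
    neighbours = [[j for j in range(n) if d[i][j] <= eps] for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbours[j])
        cluster += 1
    return labels

pts = [0, 1, 2, 10, 11, 20]
D = [[abs(a - b) for b in pts] for a in pts]
print(dbscan(D, eps=1.5, min_pts=2))  # [0, 0, 0, 1, 1, -1]
```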

Hierarchical DBSCAN
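
HDBSCAN is considerably more involved than DBSCAN, so the Python sketch below covers only its first stage: core distances, the mutual reachability distance max(core(a), core(b), d(a, b)), and a minimum spanning tree over it, whose sorted edges encode the single-linkage hierarchy. Cutting that tree at a single level gives DBSCAN*-style clusters. The condensed tree and the stability-based selection that HDBSCAN uses to choose clusters across levels are omitted. In R, dbscan::hdbscan() provides the full algorithm. The function names here are hypothetical.

```python
def core_distances(d, min_pts):
    """Distance from each point to its min_pts-th nearest neighbour,
    counting the point itself (conventions differ between implementations)."""
    return [sorted(row)[min_pts - 1] for row in d]

def mutual_reachability(d, min_pts):
    core = core_distances(d, min_pts)
    n = len(d)
    return [[0.0 if i == j else max(core[i], core[j], d[i][j])
             for j in range(n)] for i in range(n)]

def mst_edges(w):
    """Prim's algorithm; returns (weight, u, v) edges sorted by weight."""
    n = len(w)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v]:
                best[v], parent[v] = w[u][v], u
    return sorted(edges)

def cut(edges, n, level):
    """Component labels after keeping only MST edges with weight <= level."""
    root = list(range(n))
    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x
    for wt, u, v in edges:
        if wt <= level:
            root[find(u)] = find(v)
    return [find(i) for i in range(n)]

pts = [0, 1, 2, 10, 11, 20]
D = [[abs(a - b) for b in pts] for a in pts]
edges = mst_edges(mutual_reachability(D, 2))
print([wt for wt, _, _ in edges])  # [1, 1, 1, 8, 9]
labels = cut(edges, len(D), 1)
```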

DIvisive ANAlysis Clustering
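
A minimal Python sketch of the DIANA idea behind cluster::diana(): starting from a single cluster, repeatedly split the cluster with the largest diameter. Each split seeds a splinter group with the object that has the largest mean dissimilarity to the others, then moves over every object that is, on average, closer to the splinter group than to the rest. Stopping at k clusters instead of building the full hierarchy is a simplification for illustration, and the names are hypothetical.

```python
def diana(d, k):
    """Divisive analysis on a precomputed distance matrix d: split the
    widest cluster until k clusters remain. Labels follow smallest member."""
    clusters = [list(range(len(d)))]

    def diameter(c):
        return max(d[i][j] for i in c for j in c)

    def mean_dist(i, group):
        others = [j for j in group if j != i]
        return sum(d[i][j] for j in others) / len(others) if others else 0.0

    while len(clusters) < k:
        target = max(clusters, key=diameter)
        if len(target) < 2:
            break  # nothing left to split
        clusters.remove(target)
        rest = list(target)
        # Seed the splinter group with the most "dissatisfied" object.
        seed = max(rest, key=lambda i: mean_dist(i, rest))
        splinter = [seed]
        rest.remove(seed)
        while len(rest) > 1:
            # Gain of moving i: mean distance to the remaining objects
            # minus mean distance to the splinter group.
            gain, i = max((mean_dist(i, rest) - mean_dist(i, splinter), i)
                          for i in rest)
            if gain <= 0:
                break
            splinter.append(i)
            rest.remove(i)
        clusters.extend([rest, splinter])

    labels = [0] * len(d)
    for label, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = label
    return labels

pts = [0, 1, 2, 10, 11, 20]
D = [[abs(a - b) for b in pts] for a in pts]
print(diana(D, 3))  # [0, 0, 0, 1, 1, 2]
```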

Partitioning Around Medoids
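
A minimal Python sketch of Partitioning Around Medoids on a precomputed distance matrix: a greedy BUILD phase followed by SWAP steps that apply the best cost-reducing (medoid, non-medoid) exchange until none remains. This naive version recomputes the full cost for every candidate and is far slower than cluster::pam(); the names are hypothetical.

```python
def pam(d, k):
    """PAM on a precomputed distance matrix d: greedy BUILD, then SWAP until
    no swap lowers the total cost. Returns (medoids, labels), where
    labels[i] is an index into medoids."""
    n = len(d)

    def cost(medoids):
        # Total distance from every object to its nearest medoid.
        return sum(min(d[i][m] for m in medoids) for i in range(n))

    # BUILD: add, one at a time, the object that most reduces the cost.
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: cost(medoids + [c])))

    # SWAP: apply the best improving exchange until none is left.
    current = cost(medoids)
    while True:
        best = (current, None)
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:pos] + [h] + medoids[pos + 1:]
                c = cost(trial)
                if c < best[0]:
                    best = (c, trial)
        if best[1] is None:
            break
        current, medoids = best

    labels = [min(range(k), key=lambda m: d[i][medoids[m]]) for i in range(n)]
    return medoids, labels

pts = [0, 1, 2, 10, 11, 20]
D = [[abs(a - b) for b in pts] for a in pts]
medoids, labels = pam(D, 3)
print(medoids, labels)  # [1, 4, 5] [0, 0, 0, 1, 1, 2]
```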

Affinity Propagation
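
A minimal Python sketch of affinity propagation's message passing (damped responsibility and availability updates, following Frey and Dueck's rules). It takes similarities rather than distances. Here we use negative squared distances, as in Frey and Dueck's examples, though the 1 / (1 + d) similarities mentioned above could be passed instead. The preference (the diagonal of the similarity matrix) controls how many exemplars emerge. It defaults to the median off-diagonal similarity, and the example sets it by hand. R users would typically call apcluster::apcluster(); this is not that implementation, and the names are hypothetical.

```python
import statistics

def affinity_propagation(s, preference=None, damping=0.5,
                         max_iter=200, conv_iter=15):
    """Affinity propagation on a similarity matrix s (list of lists).
    The diagonal of a copy of s is set to `preference`. Returns
    (exemplars, labels), where labels[i] indexes into exemplars."""
    n = len(s)
    s = [list(row) for row in s]
    if preference is None:
        preference = statistics.median(
            s[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        s[i][i] = preference
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    exemplars, stable = [], 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            v = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                rival = max(v[j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - rival)
        # a(k,k) = sum of positive r(i',k) over i' != k
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) over i' not in {i,k})
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
        current = [k for k in range(n) if a[k][k] + r[k][k] > 0]
        stable = stable + 1 if current == exemplars else 0
        exemplars = current
        if exemplars and stable >= conv_iter:
            break  # exemplar set unchanged for conv_iter iterations
    if not exemplars:
        return [], [-1] * n
    labels = [exemplars.index(i) if i in exemplars else
              max(range(len(exemplars)), key=lambda e: s[i][exemplars[e]])
              for i in range(n)]
    return exemplars, labels

pts = [0, 1, 2, 50, 51, 52]
S = [[-float((a - b) ** 2) for b in pts] for a in pts]  # negative squared distance
exemplars, labels = affinity_propagation(S, preference=-10.0)
print(exemplars, labels)
```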

Affinity Propagation for a Predefined Number of Clusters
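
To our understanding, apcluster::apclusterK() reaches a requested number of clusters by searching over the preference. The Python sketch below applies that idea with plain bisection, wrapping a compact copy of an affinity_propagation() function so the block runs on its own. The number of exemplars is not guaranteed to be monotone in the preference, so the search can fail; in that case the function returns the attempt closest to k. The default search bracket is a heuristic assumption, and all names are hypothetical.

```python
import statistics

def affinity_propagation(s, preference=None, damping=0.5,
                         max_iter=200, conv_iter=15):
    """Damped affinity propagation; returns (exemplars, labels)."""
    n = len(s)
    s = [list(row) for row in s]
    if preference is None:
        preference = statistics.median(
            s[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        s[i][i] = preference
    r = [[0.0] * n for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    exemplars, stable = [], 0
    for _ in range(max_iter):
        for i in range(n):  # responsibilities
            v = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                rival = max(v[j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - rival)
        for k in range(n):  # availabilities
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
        current = [k for k in range(n) if a[k][k] + r[k][k] > 0]
        stable = stable + 1 if current == exemplars else 0
        exemplars = current
        if exemplars and stable >= conv_iter:
            break
    if not exemplars:
        return [], [-1] * n
    labels = [exemplars.index(i) if i in exemplars else
              max(range(len(exemplars)), key=lambda e: s[i][exemplars[e]])
              for i in range(n)]
    return exemplars, labels

def ap_with_k(s, k, lo=None, hi=None, max_steps=40):
    """Bisect the preference until AP returns exactly k exemplars."""
    n = len(s)
    off = [s[i][j] for i in range(n) for j in range(n) if i != j]
    spread = max(off) - min(off)
    lo = min(off) - n * spread if lo is None else lo  # heuristic low bound
    hi = max(off) if hi is None else hi
    best = None
    for _ in range(max_steps):
        mid = (lo + hi) / 2.0
        exemplars, labels = affinity_propagation(s, preference=mid)
        if best is None or abs(len(exemplars) - k) < abs(len(best[0]) - k):
            best = (exemplars, labels)
        if len(exemplars) == k:
            break
        if len(exemplars) < k:
            lo = mid  # too few clusters: raise the preference
        else:
            hi = mid  # too many clusters: lower the preference
    return best

pts = [0, 1, 2, 50, 51, 52]
S = [[-float((a - b) ** 2) for b in pts] for a in pts]
# A hand-chosen bracket for this toy data.
exemplars, labels = ap_with_k(S, 2, lo=-19.0, hi=-1.0)
print(exemplars, labels)
```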

Spectral Clustering
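
A minimal Python sketch of normalised spectral clustering in the style of Ng, Jordan and Weiss: build D^(-1/2) W D^(-1/2) from a similarity matrix W, take its k leading eigenvectors, normalise each row to unit length, and run k-means on the rows. It assumes NumPy is available and that every row of W has a positive sum. The Gaussian-kernel similarities (sigma = 1) in the example are also an assumption; the 1 / (1 + d) similarities mentioned earlier could be passed instead. The names are hypothetical.

```python
import math
import numpy as np

def spectral_clustering(w, k, iters=100):
    """Normalised spectral clustering on a symmetric, non-negative
    similarity matrix w (n x n). Returns an array of n labels in 0..k-1."""
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)                    # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))   # assumes every degree > 0
    m = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(m)                 # eigenvalues in ascending order
    x = vecs[:, -k:]                            # k leading eigenvectors
    x = x / np.linalg.norm(x, axis=1, keepdims=True)   # unit-length rows

    # k-means on the embedded rows, with farthest-point initialisation.
    centers = [x[0]]
    while len(centers) < k:
        gap = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(gap)])
    centers = np.array(centers)
    labels = None
    for _ in range(iters):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new, labels):
            break
        labels = new
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels

pts = [0, 1, 2, 50, 51, 52]
W = [[math.exp(-((a - b) ** 2) / 2.0) for b in pts] for a in pts]
labels = spectral_clustering(W, 2)
print(labels)
```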