This is the set-up function for precise_viz. It creates and clusters a sparsified t-SNE graph through the following general steps. See Details for more specific information.

  1. Run several iterations of t-SNE on the input data.

  2. Compute the random forest similarity for each t-SNE iteration output.

  3. Fuse the random forest similarities using distatis.

  4. Sparsify the fused matrix using Sparsify.matrix(k = perplexity).

  5. Cluster the graph using cluster_louvain.
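
Step 4 can be pictured with a small base-R sketch of k-nearest-neighbour sparsification. This is an illustration only: `knn_sparsify` is a hypothetical helper written for this page, not the package's Sparsify.matrix, whose exact behaviour may differ. The toy similarity matrix is likewise an assumption and stands in for the fused random forest similarities.

```r
# Hypothetical helper (not part of PreciseDist): keep the k largest
# similarities in each row, zero out the rest, then symmetrise so the
# result can be read as an undirected weighted graph.
knn_sparsify <- function(sim, k) {
  stopifnot(is.matrix(sim), nrow(sim) == ncol(sim), k >= 1, k < nrow(sim))
  diag(sim) <- 0                         # ignore self-similarity
  sparse <- matrix(0, nrow(sim), ncol(sim), dimnames = dimnames(sim))
  for (i in seq_len(nrow(sim))) {
    keep <- order(sim[i, ], decreasing = TRUE)[seq_len(k)]
    sparse[i, keep] <- sim[i, keep]
  }
  pmax(sparse, t(sparse))                # keep an edge if either endpoint chose it
}

set.seed(1)
pts <- matrix(rnorm(40), nrow = 20)      # 20 random 2-D points
sim <- 1 / (1 + as.matrix(dist(pts)))    # toy similarity in (0, 1]
adj <- knn_sparsify(sim, k = 5)          # k plays the role of perplexity
```

Each row of `adj` keeps at least k non-zero entries, so a larger perplexity produces a denser graph for the clustering step.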

precise_cluster(data, cluster_alg = "louvain", parallel = FALSE,
  verbose = FALSE)

Arguments

data

A square, numeric dataframe, matrix or tibble.

verbose

TRUE or FALSE. Should the function tell you what is happening internally?

perplexity

A positive integer that loosely equates to the number of nearest neighbors. See details.

theta

A numeric value from 0 to 1 that toggles between exact t-SNE (theta = 0) and Barnes-Hut t-SNE (theta > 0). See details.

max_iter

A positive integer for the number of iterations. See details.

cores

An integer value equal to 1 or greater for the number of computer cores to use.

Value

A list with four objects: tsne_dist = a matrix of the distatis fusion of the random forest similarities computed from the t-SNE results, tsne_d2 = a tibble with the 2D t-SNE results, tsne_d3 = a tibble with the 3D t-SNE results, precise_clusters = a tibble of the louvain clustering.

Details

This function has a lot going on under the hood, and some parameters that may look familiar are used in ways one might not expect:

  • perplexity sets both the perplexity parameter for Rtsne and k for Sparsify.matrix.

  • theta toggles between two versions of t-SNE. If theta = 0.0, a modified version of tsne is run in which, instead of using the perplexity parameter, Sparsify.matrix(k = perplexity) is run on the input matrix. If theta > 0.0, Rtsne is run.

  • max_iter sets both the number of iterations for t-SNE and the number of trees for randomForest.

  • cores has a maximum useful setting of 4.
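
The parameter reuse described above can be summarised in a short sketch. `describe_settings` is a hypothetical helper written for this page and is not the package's source code; in particular, capping cores at 4 is only one reading of "maximum useful setting" and is an assumption.

```r
# Hypothetical illustration of how each argument is reused internally,
# based only on the bullet points above.
describe_settings <- function(perplexity, theta, max_iter, cores) {
  stopifnot(perplexity >= 1, theta >= 0, theta <= 1, max_iter >= 1, cores >= 1)
  list(
    tsne_backend    = if (theta == 0) "tsne (modified)" else "Rtsne",
    tsne_iterations = max_iter,        # iterations for t-SNE
    rf_trees        = max_iter,        # the same value sets the randomForest trees
    sparsify_k      = perplexity,      # k passed to Sparsify.matrix
    useful_cores    = min(cores, 4)    # assumption: nothing gained beyond 4
  )
}

settings <- describe_settings(perplexity = 5, theta = 0.5, max_iter = 1000, cores = 8)
str(settings)
```

Because max_iter drives both t-SNE and the random forest, raising it increases the cost of both stages at once.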

References

Muchmore, B., Muchmore P. and Alarcón-Riquelme ME. (2018). Optimal Distance Matrix Construction with PreciseDist and PreciseGraph.

Jesse H. Krijthe (2015). Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation, URL: https://github.com/jkrijthe/Rtsne

Justin Donaldson (2016). tsne: T-Distributed Stochastic Neighbor Embedding for R (t-SNE). R package version 0.1-3. https://CRAN.R-project.org/package=tsne

Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. http://igraph.org

Examples

# NOT RUN {
library(PreciseDist)

test_matrix <- replicate(10, rnorm(50))

test_distances <- test_matrix %>%
  precise_dist(dists = c("euclidean", "manhattan"))

test_fusion <- test_distances %>%
  precise_fusion(fusion = "distatis", verbose = TRUE)

test_graph <- test_fusion %>%
  precise_graph(perplexity = 5, theta = 0.5, max_iter = 1000, cores = 1, verbose = TRUE)
# }