Create a sparse, clustered graph from a distance matrix

This function is the set-up function for precise_viz. It creates and clusters a sparsified t-SNE graph through the following general steps. See Details for more specific information.

Run several iterations of t-SNE on the input data.
Compute the random forest similarity for each t-SNE iteration output.
Fuse the random forest similarities using distatis.
Sparsify the fused matrix using Sparsify.matrix(k = perplexity).
Cluster the graph using cluster_louvain.
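The last two steps can be illustrated with a minimal sketch. It assumes that sparsification keeps each row's k largest similarities and then symmetrizes the result; the package's own Sparsify.matrix may behave differently. The helper knn_sparsify and the toy similarity matrix are hypothetical, and the clustering step only runs if igraph is installed.

```r
# Hypothetical helper (not part of the package): keep the k largest
# off-diagonal similarities in each row, then symmetrize.
knn_sparsify <- function(sim, k) {
  stopifnot(is.matrix(sim), nrow(sim) == ncol(sim), k >= 1, k < nrow(sim))
  sparse <- matrix(0, nrow(sim), ncol(sim), dimnames = dimnames(sim))
  for (i in seq_len(nrow(sim))) {
    row_i <- sim[i, ]
    row_i[i] <- -Inf                       # never select the diagonal
    keep <- order(row_i, decreasing = TRUE)[seq_len(k)]
    sparse[i, keep] <- sim[i, keep]
  }
  pmax(sparse, t(sparse))                  # keep an edge if either endpoint chose it
}

set.seed(1)
# Toy data: two well-separated groups of 10 points each.
pts <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 10), ncol = 2))
sim <- 1 / (1 + as.matrix(dist(pts)))      # larger value = more similar
sparse_sim <- knn_sparsify(sim, k = 3)

# Louvain clustering on the sparsified graph (skipped if igraph is missing).
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(sparse_sim, mode = "undirected",
                                           weighted = TRUE, diag = FALSE)
  clusters <- igraph::membership(igraph::cluster_louvain(g))
  print(table(clusters))
}
```

A small k yields a sparser graph, so the clustering depends more heavily on each point's closest neighbors.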

Usage

precise_graph(data, method = 1, distance = TRUE, n_neighbors = 15,
  spread = 1, min_dist = 0.01, bandwidth = 1, parallel = FALSE,
  verbose = FALSE)

Arguments

data	A square, numeric data frame, matrix, or tibble.
verbose	TRUE or FALSE. Should the function tell you what is happening internally?
perplexity	A positive integer that loosely equates to the number of nearest neighbors. See details.
theta	A numeric value from 0 to 1 that toggles between exact t-SNE (theta = 0) and Barnes-Hut t-SNE (theta > 0). See details.
max_iter	A positive integer for the number of iterations. See details.
cores	An integer value equal to 1 or greater for the number of computer cores to use.

Value

A list with four objects: tsne_dist, a matrix holding the distatis fusion of the random forest similarities computed from the t-SNE results; tsne_d2, a tibble with the 2D t-SNE results; tsne_d3, a tibble with the 3D t-SNE results; and precise_clusters, a tibble of the Louvain clustering.

Details

This function has a lot going on under the hood, and some parameters that may look familiar are used in ways one might not expect:

perplexity sets both the perplexity parameter for Rtsne and k for Sparsify.matrix.
theta toggles between two versions of t-SNE. If theta = 0.0, a modified version of tsne is run in which, instead of using the perplexity parameter, Sparsify.matrix(k = perplexity) is run on the input matrix. If theta > 0.0, Rtsne is run.
max_iter sets both the number of iterations for t-SNE and the number of trees for randomForest.
cores has a maximum useful setting of 4.
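The parameter reuse described above can be spelled out with a small sketch. The helper resolve_settings and the names of its list elements are hypothetical and exist only for illustration; the function does not call the package, and capping cores at 4 is one reading of "maximum useful setting" rather than documented behavior.

```r
# Hypothetical helper: shows which downstream settings each argument drives,
# following the Details section. It performs no computation on data.
resolve_settings <- function(perplexity, theta, max_iter, cores) {
  stopifnot(perplexity >= 1, theta >= 0, theta <= 1, max_iter >= 1, cores >= 1)
  list(
    tsne_backend    = if (theta == 0) "tsne (modified)" else "Rtsne",
    # The modified tsne sparsifies the input instead of using perplexity.
    tsne_perplexity = if (theta == 0) NA else perplexity,
    sparsify_k      = perplexity,    # k passed to Sparsify.matrix
    tsne_max_iter   = max_iter,      # number of t-SNE iterations
    rf_ntree        = max_iter,      # number of randomForest trees
    cores_used      = min(cores, 4)  # assumption: no benefit beyond 4 cores
  )
}

str(resolve_settings(perplexity = 5, theta = 0.5, max_iter = 1000, cores = 8))
```

Seen this way, raising max_iter lengthens both the t-SNE runs and the random forest fits, so run time can grow faster than the name of the argument suggests.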

References

Muchmore, B., Muchmore, P. and Alarcón-Riquelme, M.E. (2018). Optimal Distance Matrix Construction with PreciseDist and PreciseGraph.

Jesse H. Krijthe (2015). Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation, URL: https://github.com/jkrijthe/Rtsne

Justin Donaldson (2016). tsne: T-Distributed Stochastic Neighbor Embedding for R (t-SNE). R package version 0.1-3. https://CRAN.R-project.org/package=tsne

Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. http://igraph.org

Examples

# NOT RUN {
library(PreciseDist)
library(magrittr)  # provides the %>% pipe used below

test_matrix <- replicate(10, rnorm(50))

test_distances <- test_matrix %>%
  precise_dist(dists = c("euclidean", "manhattan"))

test_fusion <- test_distances %>%
  precise_fusion(fusion = "distatis", verbose = TRUE)

test_graph <- test_fusion %>%
  precise_graph(perplexity = 5, theta = 0.5, max_iter = 1000, cores = 1, verbose = TRUE)
# }