When running precise_fusion on the results of precise_dist for the first time, this function can generally be skipped. It can be useful, however, to experiment with different transformations of the data, which can dramatically alter the final results (for better or worse). Every transformation requested is applied to every element of the input data.

precise_transform(data, remove_dups = FALSE, filter_string = NULL,
  keep_string = NULL, add_prefix = NULL, add_suffix = NULL,
  enforce_dist = FALSE, enforce_sim = FALSE, transform = "none",
  diagonal = NULL, binary_matrix = NULL, min_k = NULL, fixed_k = NULL,
  affinity_matrix = NULL, add_noise = NULL, remove_errors = FALSE,
  return_list = FALSE, return_df = FALSE, parallel = FALSE)

Arguments

data

Either a list of distance matrices or the native output of precise_dist.

filter_string

NULL or a string value that greedily filters distances found in the input.

keep_string

NULL or a string value that greedily keeps distances found in the input.
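The matching behind filter_string and keep_string can be pictured with a small base R sketch. It assumes that "greedy" means partial (substring) matching against the distance names; the names in dist_names and the helper matches_any are hypothetical and are not part of the package.

```r
# Hypothetical distance names, standing in for the names in a precise_dist result.
dist_names <- c("manhattan", "euclidean", "rbf_1", "cosine")

# Assumed matching rule: a name matches if it contains any of the patterns.
matches_any <- function(names, patterns) {
  Reduce(`|`, lapply(patterns, function(p) grepl(p, names, fixed = TRUE)))
}

patterns <- c("manha", "euc")
hit <- matches_any(dist_names, patterns)

kept     <- dist_names[hit]   # what keep_string-style matching would retain
filtered <- dist_names[!hit]  # what remains after filter_string-style matching
```

Under this assumption, the two arguments are complements of each other for the same set of patterns.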

enforce_dist

TRUE or FALSE. Should all non-distances be coerced into distance matrices?

enforce_sim

TRUE or FALSE. Should all non-similarities be coerced into similarity matrices?

transform

NULL or a string value. Possible choices include "none", "range01", "center", "scale", "center_scale", "chua", "laplacian", "prob" and "binorm".
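The exact definitions behind these transform names are not given here. As a rough illustration only, the following base R sketch shows what "range01" and "center_scale" could plausibly mean when applied over all entries of a matrix. Both function bodies are assumptions and may differ from what the package does.

```r
set.seed(1)
# A small Euclidean distance matrix built from random data.
m <- as.matrix(dist(matrix(rnorm(20), nrow = 5)))

# Assumed meaning of "range01": rescale all entries to the interval [0, 1].
range01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Assumed meaning of "center_scale": subtract the overall mean and
# divide by the overall standard deviation.
center_scale <- function(x) (x - mean(x)) / sd(x)

m01 <- range01(m)
mcs <- center_scale(m)
```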

binary_matrix

NULL, a numeric value or a string value. If numeric, a binary matrix is obtained by thresholding using raw values. If string, a binary matrix is obtained by thresholding using quantiles.
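The difference between the numeric and the string form can be sketched in base R. The helper binarize is a hypothetical stand-in for the thresholding step and assumes that entries strictly above the threshold become 1 and all others 0; only quantile from the stats package is a real call here.

```r
set.seed(42)
m <- matrix(runif(16), nrow = 4)

# Hypothetical stand-in for thresholding: entries above `thresh` become 1, others 0.
binarize <- function(x, thresh) (x > thresh) * 1

# Numeric input: threshold on the raw values.
b_raw <- binarize(m, 0.5)

# String input: coerce to numeric and use it as the `probs` argument of quantile(),
# then threshold on the resulting quantile.
thresh_q <- quantile(m, probs = as.numeric("0.75"))
b_q <- binarize(m, thresh_q)
```

A quantile threshold adapts to the scale of each matrix, which may be convenient when the input mixes distances with very different ranges.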

min_k

NULL or an integer value guaranteeing a minimum of k edges for each observation (row).

fixed_k

NULL or an integer value guaranteeing exactly k edges for each observation (row). The resulting matrices may not be symmetric.
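Why a fixed number of edges per row can break symmetry is easiest to see on a toy matrix. The sketch below is a generic "keep the k largest weights per row" routine written in base R; keep_top_k is a hypothetical helper and is not the NetPreProc implementation.

```r
# Small symmetric similarity matrix with a zero diagonal (illustrative values).
s <- matrix(c(0,   0.9, 0.8, 0.1,
              0.9, 0,   0.7, 0.2,
              0.8, 0.7, 0,   0.3,
              0.1, 0.2, 0.3, 0), nrow = 4, byrow = TRUE)

# Keep the k largest weights in each row and set everything else to zero.
keep_top_k <- function(x, k) {
  out <- matrix(0, nrow(x), ncol(x))
  for (i in seq_len(nrow(x))) {
    top <- order(x[i, ], decreasing = TRUE)[seq_len(k)]
    out[i, top] <- x[i, top]
  }
  out
}

s_k <- keep_top_k(s, k = 1)
edges_per_row <- rowSums(s_k != 0)  # exactly k edges in every row
symmetric <- isSymmetric(s_k)       # row 3 points at row 1, but not the reverse
```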

affinity_matrix

NULL or an integer representing the number of nearest neighbors used to compute an affinity matrix from a distance matrix.

return_list

TRUE or FALSE. Should the precise_dist output format be coerced into a named list?

return_df

TRUE or FALSE. Should a named list be coerced into the precise_dist output format?

cores

An integer value equal to 1 or greater for the number of computer cores to use.

Value

A data object (list or precise_dist dataframe) with applied transformations.

Details

Operations are run in the order in which the arguments are listed, so filter_string runs before enforce_dist, which runs before min_k. To run the operations in a different order, call precise_transform multiple times (see examples). The following are notes concerning the different parameter choices:

  • cores Setting cores > 1 only has an effect when transform = "chua", because most of the other transformations are computationally inexpensive.

  • filter_string This is a greedy string search, so c("manha", "euc") would filter out "manhattan" and "euclidean".

  • keep_string This is a greedy string search, so c("manha", "euc") would keep "manhattan" and "euclidean".

  • enforce_dist This argument checks to see if data[1, 1] < data[2, 1] and applies 1 - abs(data) if TRUE. Note this could fail if data[1, 1] = data[2, 1].

  • enforce_sim This argument checks to see if data[1, 1] > data[2, 1] and applies 1/(1 + x) if TRUE. Note this could fail if data[1, 1] = data[2, 1].

  • transform The following are the available transformation functions:

  • binary_matrix = Binary.matrix.by.thresh from the NetPreProc package. When the input value is a string, it is coerced to a numeric and passed as the probs argument to quantile from the stats package; the resulting quantile is then passed as the thresh argument to Binary.matrix.by.thresh.

  • min_k = Sparsify.matrix from the NetPreProc package.

  • fixed_k = Sparsify.matrix.fixed.neighbours from the NetPreProc package.

  • affinity_matrix = affinityMatrix from the SNFtool package. Sigma is set automatically as the median value of sigest from the kernlab package.

  • return_list Note that the Time_Taken_Seconds column information returned by precise_dist will be dropped.

  • return_df Note that the Time_Taken_Seconds column information returned by precise_dist will all be NA.
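The enforce_dist and enforce_sim notes above describe a heuristic that compares the first diagonal entry with the entry below it. The base R sketch below illustrates that idea. The notes do not make fully clear which comparison triggers which conversion, so the sketch assumes that a matrix is treated as a similarity when data[1, 1] > data[2, 1] and as a distance when data[1, 1] < data[2, 1]. All helper names are hypothetical.

```r
# Guess the matrix type from two entries (assumed direction of the checks).
looks_like_similarity <- function(x) x[1, 1] > x[2, 1]
looks_like_distance   <- function(x) x[1, 1] < x[2, 1]

# A similarity matrix (for example a correlation matrix): the diagonal is largest.
sim <- matrix(c(1, 0.4, 0.4, 1), nrow = 2)
# A distance matrix: the diagonal is zero.
dst <- matrix(c(0, 2, 2, 0), nrow = 2)

# The two conversions named in the notes above.
sim_to_dist <- function(x) 1 - abs(x)
dist_to_sim <- function(x) 1 / (1 + x)

as_dist <- if (looks_like_similarity(sim)) sim_to_dist(sim) else sim
as_sim  <- if (looks_like_distance(dst)) dist_to_sim(dst) else dst

# Tie case: both entries are equal, so neither check fires and the type
# cannot be inferred from these two entries alone.
tie <- matrix(0.5, nrow = 2, ncol = 2)
tie_undetermined <- !looks_like_similarity(tie) && !looks_like_distance(tie)
```

The tie case is the failure mode mentioned in the notes: when data[1, 1] equals data[2, 1], a two-entry check has nothing to go on.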

References

Muchmore, B., Muchmore, P. and Alarcón-Riquelme, M.E. (2018). Optimal Distance Matrix Construction with PreciseDist and PreciseGraph.

Giorgio Valentini and Matteo Re -- Universita' degli Studi di Milano (2015). NetPreProc: Network Pre-Processing and Normalization. R package version 1.1. https://CRAN.R-project.org/package=NetPreProc

Examples

test_matrix <- replicate(10, rnorm(100))
test_dists <- test_matrix %>% precise_dist(c("manhattan", "rbf_1"))
#> [1] "Starting dists calculations at 2018-11-29 15:20:27"
#> [1] "Finished dists calculations at 2018-11-29 15:20:28"
#> [1] "Calculations took: 0.55 seconds"
#> [1] "Starting dist_funcs calculations at 2018-11-29 15:20:28"
#> [1] "Finished dist_funcs calculations at 2018-11-29 15:20:28"
#> [1] "Calculations took: 0 seconds"
test_dists_transformed <- test_dists %>% precise_transform(enforce_dist = TRUE)
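As noted in Details, operations within a single call follow the argument order, so transform runs before binary_matrix. The sketch below illustrates why the order can matter and how chaining calls changes it. The commented pattern uses only arguments documented above but is not run here; the executable part is a base R stand-in in which range01 and binarize are hypothetical helpers, not the package's own functions.

```r
# Pattern for reversing the default order (not run; needs the objects above):
# test_dists %>%
#   precise_transform(binary_matrix = 0.5) %>%
#   precise_transform(transform = "range01")

# Base R stand-ins for the two steps (assumed definitions).
range01  <- function(x) (x - min(x)) / (max(x) - min(x))
binarize <- function(x, thresh) (x > thresh) * 1

m <- matrix(c(0,  2, 10,
              2,  0,  6,
              10, 6,  0), nrow = 3, byrow = TRUE)

# Rescale first, then threshold: only entries above half the maximum survive.
scaled_then_binary <- binarize(range01(m), 0.5)

# Threshold first, then rescale: every entry above 0.5 on the raw scale survives.
binary_then_scaled <- range01(binarize(m, 0.5))

n_edges_a <- sum(scaled_then_binary)
n_edges_b <- sum(binary_then_scaled)
```

The two orders keep a different number of edges on the same input, which is why splitting the work across several calls can change the final result.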