Generally, one should skip this function before running precise_fusion on the results of precise_dist for the first time.
It can be useful, however, to experiment with different transformations to the data, which can dramatically alter the final results (for better or worse).
Each transformation is applied to every element of the input data.
precise_transform(data, remove_dups = FALSE, filter_string = NULL, keep_string = NULL, add_prefix = NULL, add_suffix = NULL, enforce_dist = FALSE, enforce_sim = FALSE, transform = "none", diagonal = NULL, binary_matrix = NULL, min_k = NULL, fixed_k = NULL, affinity_matrix = NULL, add_noise = NULL, remove_errors = FALSE, return_list = FALSE, return_df = FALSE, parallel = FALSE)
Argument | Description |
---|---|
data | Either a list of distance matrices or the native output of precise_dist. |
filter_string | NULL or a string value that greedily filters distances found in the input. |
keep_string | NULL or a string value that greedily keeps distances found in the input. |
enforce_dist | TRUE or FALSE. Should all non-distances be coerced into a distance matrix? |
enforce_sim | TRUE or FALSE. Should all non-similarities be coerced into a similarity matrix? |
transform | NULL or a string value. Possible choices include "none", "range01", "center", "scale", "center_scale", "chua", "laplacian", "prob" and "binorm". |
binary_matrix | NULL, a numeric value or a string value. If numeric, a binary matrix is obtained by thresholding using raw values. If string, a binary matrix is obtained by thresholding using quantiles. |
min_k | NULL or an integer value guaranteeing a minimum of k edges for each observation (row). |
fixed_k | NULL or an integer value guaranteeing exactly k edges for each observation (row). The resulting matrices may not be symmetric. |
affinity_matrix | NULL or an integer representing the number of nearest neighbors used to compute an affinity matrix from a distance matrix. |
return_list | TRUE or FALSE. Should the precise_dist data frame be coerced into a named list? |
return_df | TRUE or FALSE. Should a named list be coerced into the precise_dist data frame format? |
cores | An integer value equal to 1 or greater for the number of computer cores to use. |
A data object (list or precise_dist data frame) with the transformations applied.
Operations run in the same order as the argument list, so filter_string runs before enforce_dist, which runs before min_k. If you need to run the operations in a different order, simply call precise_transform multiple times (see examples). The following are notes concerning the different parameter choices:
cores Note that cores > 1 only takes effect when transform = "chua", because most of the other transformations are computationally inexpensive.
filter_string This is a greedy (partial-match) string search, so c("manha", "euc") would filter out "manhattan" and "euclidean".
keep_string This is a greedy (partial-match) string search, so c("manha", "euc") would keep "manhattan" and "euclidean".
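The partial matching described for filter_string and keep_string can be illustrated in base R. This is a minimal sketch of the idea, not the package's implementation: the vector `dist_names` is made up for the example ("cosine" in particular is a hypothetical entry), and using `grepl` with an alternation pattern is an assumption about how such a search could be written.

```r
# Hypothetical names of distances present in the input
dist_names <- c("manhattan", "euclidean", "rbf_1", "cosine")

# Partial patterns, as in the notes above
patterns <- c("manha", "euc")

# TRUE wherever any pattern occurs anywhere in the name
hits <- grepl(paste(patterns, collapse = "|"), dist_names)

dist_names[hits]    # names matched by the patterns
dist_names[!hits]   # names not matched by the patterns
```

Whether the matched names are then dropped or retained depends on which of the two arguments received the patterns.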
enforce_dist This argument checks to see if data[1, 1] < data[2, 1] and applies 1 - abs(data) if TRUE. Note this could fail if data[1, 1] = data[2, 1].
enforce_sim This argument checks to see if data[1, 1] > data[2, 1] and applies 1/(1 + x) if TRUE. Note this could fail if data[1, 1] = data[2, 1].
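The diagonal comparison and the two conversion formulas named above can be tried out in base R. The sketch below only illustrates the formulas on made-up random data; it does not reproduce the package's internal logic, and all object names are chosen for the example.

```r
set.seed(1)
x <- matrix(rnorm(20), nrow = 5)

# A distance matrix has a zero diagonal, so d[1, 1] < d[2, 1]
d <- as.matrix(dist(x))
d[1, 1] < d[2, 1]

# 1 / (1 + d) maps distances to similarities; the diagonal becomes 1
s <- 1 / (1 + d)
s[1, 1] > s[2, 1]

# A correlation matrix is a similarity; 1 - abs() gives it a zero diagonal
r <- cor(t(x))
d_from_r <- 1 - abs(r)
d_from_r[1, 1] < d_from_r[2, 1]

# Tie case: two identical leading rows make the comparison uninformative
x_tie <- rbind(x[1, ], x)
d_tie <- as.matrix(dist(x_tie))
d_tie[1, 1] == d_tie[2, 1]
```

The last case shows why the check can fail: when the first two observations coincide, the first diagonal and first off-diagonal entries are equal, so the comparison cannot tell a distance from a similarity.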
transform The following are the available transformation functions:
"range01" = Max.Min.norm from the NetPreProc package.
"center" = scale(data, center = TRUE, scale = FALSE)
"center_scale" = scale(data, center = TRUE, scale = TRUE)
"chua" = Chua.norm from the NetPreProc package.
"laplacian" = Laplacian.norm from the NetPreProc package.
"prob" = Prob.norm from the NetPreProc package.
"binorm" = Magnify.binary.features.norm from the NetPreProc package.
"random_walk" = random.walk from the diffusr package.
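As a quick illustration of the two scale()-based choices listed above, the following base R sketch applies them to a small made-up matrix. Note that scale() works column by column; the matrix `m` and the rounding are only for the example.

```r
set.seed(42)
m <- matrix(rnorm(12, mean = 5, sd = 2), nrow = 4)

# "center": subtract each column's mean
centered <- scale(m, center = TRUE, scale = FALSE)
round(colMeans(centered), 10)           # column means are numerically zero

# "center_scale": subtract the column mean, then divide by the column sd
standardized <- scale(m, center = TRUE, scale = TRUE)
round(apply(standardized, 2, sd), 10)   # column standard deviations are one
```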
binary_matrix = Binary.matrix.by.thresh from the NetPreProc package. When the input value is a string, it is coerced to a numeric and passed as the probs argument to quantile from the stats package; the resulting quantile is then passed as the thresh argument to Binary.matrix.by.thresh.
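The two thresholding modes can be sketched in base R without NetPreProc. This is a stand-in under stated assumptions: entries strictly greater than the threshold are set to 1 and the rest to 0, which may differ in detail from Binary.matrix.by.thresh, and the matrix `m` is made up for the example.

```r
set.seed(7)
m <- matrix(runif(16), nrow = 4)

# Numeric input: threshold on the raw values
# (assumption: entries strictly above the threshold become 1)
binary_raw <- (m > 0.5) * 1

# String input: coerce to numeric, then use it as a quantile probability
prob <- as.numeric("0.75")
thresh <- quantile(m, probs = prob)
binary_quantile <- (m > thresh) * 1

sum(binary_quantile)   # number of entries above the 75% quantile
```

The quantile form makes the threshold relative to the data, so the same setting keeps a comparable share of entries across matrices on different scales.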
min_k = Sparsify.matrix from the NetPreProc package.
fixed_k = Sparsify.matrix.fixed.neighbours from the NetPreProc package.
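To see why keeping exactly k edges per row can produce a non-symmetric matrix, consider the base R sketch below. The helper `keep_top_k` is hypothetical and written only for this illustration; it assumes larger weights mean stronger edges and is not the NetPreProc implementation.

```r
# Hypothetical helper: keep the k largest off-diagonal weights in each row
keep_top_k <- function(w, k) {
  out <- matrix(0, nrow(w), ncol(w), dimnames = dimnames(w))
  for (i in seq_len(nrow(w))) {
    row <- w[i, ]
    row[i] <- -Inf                                   # never select the self-edge
    top <- order(row, decreasing = TRUE)[seq_len(k)]
    out[i, top] <- w[i, top]
  }
  out
}

# Small symmetric weight matrix, made up for the example
w <- matrix(c(0, 5, 1, 1,
              5, 0, 4, 1,
              1, 4, 0, 3,
              1, 1, 3, 0), nrow = 4, byrow = TRUE)

sparse <- keep_top_k(w, k = 1)
rowSums(sparse != 0)   # each row keeps exactly one edge
isSymmetric(sparse)    # FALSE: row 3 keeps its edge to 2, but row 2 keeps its edge to 1
```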
affinity_matrix = affinityMatrix from the SNFtool package. Sigma is set automatically as the median value of sigest from the kernlab package.
return_list Note that the Time_Taken_Seconds column information returned by precise_dist will be dropped.
return_df Note that the Time_Taken_Seconds column information returned by precise_dist will all be NA.
Muchmore, B., Muchmore, P. and Alarcón-Riquelme, M. E. (2018). Optimal Distance Matrix Construction with PreciseDist and PreciseGraph.
Giorgio Valentini and Matteo Re -- Universita' degli Studi di Milano (2015). NetPreProc: Network Pre-Processing and Normalization. R package version 1.1. https://CRAN.R-project.org/package=NetPreProc
test_matrix <- replicate(10, rnorm(100))
test_dists <- test_matrix %>% precise_dist(c("manhattan", "rbf_1"))
#> [1] "Starting dists calculations at 2018-11-29 15:20:27"
#> [1] "Finished dists calculations at 2018-11-29 15:20:28"
#> [1] "Calculations took: 0.55 seconds"
#> [1] "Starting dist_funcs calculations at 2018-11-29 15:20:28"
#> [1] "Finished dist_funcs calculations at 2018-11-29 15:20:28"
#> [1] "Calculations took: 0 seconds"
test_dists_transformed <- test_dists %>% precise_transform(enforce_dist = TRUE)