precise_func_fact is a function factory (hence the name): it automatically creates distance functions that can be passed to the dist_funcs parameter of precise_dist.

precise_func_fact(func = "rbf", params)

Arguments

func

A string naming the function to use. Choices are "rbf", "laplace", "minkowski", "random_forest", "kodama_knn", "kodama_pls" and "tsne".

params

A numeric vector of values passed to the algorithm selected by the func parameter.

Value

A list of functions for input into the dist_funcs argument of precise_dist.

Details

Most distance functions have no second argument, but some do. Varying that second argument gives multiple views of the same dataset. The following describes what the params argument refers to for each possible func choice:

  • For both RBF and Laplace the params argument refers to the sigma parameter of the rbfdot/laplacedot functions.

  • For minkowski, the params argument refers to its power.

  • For random_forest, the params argument refers to mtry.

  • For kodama_knn, the params argument refers to f.par when FUN = "KNN".

  • For kodama_pls, the params argument refers to f.par when FUN = "PLS-DA".

  • For tsne, the params argument refers to perplexity.

  • See examples below for some reasonable defaults for the params argument for different func parameter choices.
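
The function-factory idea can be sketched in base R. This is only an illustration under stated assumptions, not the PreciseDist implementation: minkowski_factory is a hypothetical stand-in, and it assumes each generated function takes the data as its only argument, with the power p fixed inside the closure. It relies on stats::dist, which accepts method = "minkowski" and a power p.

```r
# Hypothetical stand-in for illustration; not the PreciseDist implementation.
# Returns one distance function per value in params, each with its own fixed p.
minkowski_factory <- function(params) {
  lapply(params, function(p) {
    force(p)  # fix the value of p inside each closure
    function(x) stats::dist(x, method = "minkowski", p = p)
  })
}

set.seed(1)
toy_data <- replicate(4, rnorm(6))  # 6 rows, 4 columns

# Two params values give two single-argument distance functions.
toy_funcs <- minkowski_factory(c(1, 2))

d_p1 <- toy_funcs[[1]](toy_data)  # p = 1, the manhattan distance
d_p2 <- toy_funcs[[2]](toy_data)  # p = 2, the euclidean distance
```

Each element of toy_funcs gives a different view of the same data, which is the role the params argument plays for the func choices listed above.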

References

Muchmore, B., Muchmore, P. and Alarcón-Riquelme, M.E. (2018). Optimal Distance Matrix Construction with PreciseDist and PreciseGraph.

Examples

test_data <- replicate(10, rnorm(100))

# Use kernlab's sigest function to estimate the sigma parameter for "rbf" and "laplace".
sigma_est <- kernlab::sigest(test_data, frac = 1, na.action = na.omit, scaled = FALSE)

# Build a numeric vector of 10 values between the 0.1 and 0.9 quantiles
# returned by kernlab::sigest.
sigma_params <- seq(sigma_est[[1]], sigma_est[[3]], length.out = 10)

# sigma_params is now ready to be passed as the params argument to
# precise_func_fact when func = "rbf" or func = "laplace".
rbf_funcs <- precise_func_fact(
  func = "rbf",
  params = sigma_params
)

# Build a vector of 10 whole-number values between 2 and ncol(test_data) * 0.5
# as input for random_forest's mtry parameter. round() is vectorised, so no
# purrr::map_dbl call is needed. With a narrow range (here 2 to 5) some values repeat.
rf_params <- round(seq(2, round(ncol(test_data) * 0.5), length.out = 10))

# rf_params is now ready to be passed as the params argument to
# precise_func_fact when func = "random_forest".
rf_funcs <- precise_func_fact(
  func = "random_forest",
  params = rf_params
)

# Build a numeric vector of 10 values between 1 and 2 as input for
# minkowski's p (power) parameter.
# Note: p = 1 is equivalent to the manhattan distance, while p = 2 is
# equivalent to the euclidean distance.
minkow_params <- seq(1, 2, length.out = 10)

# minkow_params is now ready to be passed as the params argument to
# precise_func_fact when func = "minkowski".
minkow_funcs <- precise_func_fact(
  func = "minkowski",
  params = minkow_params
)

# Build a numeric vector of 10 values between 5 and 50 as input for
# tsne's perplexity parameter.
tsne_params <- seq(5, 50, length.out = 10)

# tsne_params is now ready to be passed as the params argument to
# precise_func_fact when func = "tsne".
tsne_funcs <- precise_func_fact(
  func = "tsne",
  params = tsne_params
)

# rbf_funcs, rf_funcs, minkow_funcs and tsne_funcs are now ready to be passed
# to the dist_funcs argument of precise_dist.
# To combine the *_funcs lists into one large input for the dist_funcs
# argument of precise_dist, run the following.
precise_dist_input_funcs <- rbf_funcs %>%
  append(rf_funcs) %>%
  append(minkow_funcs) %>%
  append(tsne_funcs)
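
Before passing a combined list to precise_dist, it can help to check that it is a flat list of functions of the expected length. The following is a minimal base R sketch; funcs_a and funcs_b are hypothetical stand-ins for lists returned by precise_func_fact, and their bodies are placeholders rather than real distance functions.

```r
# Hypothetical stand-ins for two lists returned by a function factory.
funcs_a <- lapply(1:3, function(k) { force(k); function(x) x * k })
funcs_b <- lapply(1:2, function(k) { force(k); function(x) x + k })

# append() concatenates the two lists into one flat list.
combined <- append(funcs_a, funcs_b)

# Sanity checks: the length is the sum of the parts and every element is a function.
stopifnot(length(combined) == length(funcs_a) + length(funcs_b))
stopifnot(all(vapply(combined, is.function, logical(1))))
```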