A Parallel Future.Rmd
“A happy man is too satisfied with the present to dwell too much on the future.”
PreciseDist relies on the future package to process functions in parallel. Briefly, the future package attempts to unify the various parallel backends into a consistent framework for most use cases, and its philosophy is to put the power in the user’s hands rather than myopically hardcoding the parallel backend into the package code. The following code snippets should get you started; for much more information, please see the future package overview. Also, PreciseDist has never been tested on a cluster, so please file an issue at the PreciseDist issues page if you are trying this and running into difficulties you believe can be traced back to this package. We would be more than happy to try to help.
PreciseDist currently utilizes the foreach package for all of its parallelism. We chose foreach over something like the furrr package because it has some very nice options, such as built-in error handling through its .errorhandling argument, which is very useful when running data against 100+ distances, some of which are bound to fail.
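To make the error-handling point concrete, here is a minimal sketch of foreach's .errorhandling argument. The toy_distance() function and its inputs are hypothetical stand-ins for a distance computation that fails on some inputs; which setting PreciseDist uses internally is not stated here, so treat this only as an illustration of the foreach feature.

```r
library(foreach)

# Hypothetical stand-in for a distance computation that fails on some inputs.
toy_distance <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x)
}

inputs <- c(4, -1, 9)

# .errorhandling = "remove" drops the iterations that threw an error.
kept <- foreach(x = inputs, .errorhandling = "remove") %do% toy_distance(x)

# .errorhandling = "pass" keeps the error object in place of the result.
passed <- foreach(x = inputs, .errorhandling = "pass") %do% toy_distance(x)

length(kept)                    # number of iterations that succeeded
inherits(passed[[2]], "error")  # the failed iteration is returned as a condition
```

With the default, .errorhandling = "stop", the first failure would abort the whole loop, which is rarely what you want when a few of many distances are expected to fail.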
Here is the code to initialize a parallel backend for any PreciseDist function that has a parallel parameter:
library(future)
library(doFuture)
## First, register the future adapter for the foreach package.
registerDoFuture()
## Optional: remove the size limit on the global variables that future identifies and exports to the workers.
options(future.globals.maxSize = +Inf)
## Then register the plan type and the number of cores (workers = cores).
## Setting workers = future::availableCores() will utilize all available cores on your system.
## Leaving the plan as multisession should work for all personal computers regardless of the operating system.
## (The older multiprocess plan is deprecated in recent versions of future.)
plan(multisession, workers = future::availableCores())
## Now run the PreciseDist function of interest with parallel = TRUE
your_data %>%
  precise_dist(dists = "all_dists", parallel = TRUE)
## That is it.
Now the PreciseDist function will run in parallel as long as you also set parallel = TRUE as one of the function arguments. If this is still unclear, examples of using PreciseDist’s parallel framework are scattered throughout the vignettes.
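Before starting a long run, it can be worth confirming that the backend is actually in place. The following is a minimal sketch, assuming the doFuture and future packages are installed; the choice of two workers is arbitrary and only for illustration.

```r
library(future)
library(doFuture)

registerDoFuture()
# Two workers is an arbitrary choice for this illustration.
plan(multisession, workers = 2)

# Is a foreach backend registered, and which one?
backend_registered <- foreach::getDoParRegistered()
backend_name <- foreach::getDoParName()

# How many workers will future use under the current plan?
n_workers <- nbrOfWorkers()

print(backend_registered)
print(backend_name)
print(n_workers)

# Return to sequential processing when finished.
plan(sequential)
```

If the reported worker count is 1, the plan was probably not set in the current R session, and parallel = TRUE will not speed anything up.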
Each of the following snippets will achieve the same results as the above code, but will use a different parallel backend behind the scenes. If future is giving you errors, try switching the backend to one of the following before filing an issue or giving up. See the doFuture GitHub page for more information.
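## Option 1: a local socket cluster with 4 workers.
library(future)
library(doFuture)
library(parallel) # provides makeCluster()
registerDoFuture()
cl <- makeCluster(4)
plan(cluster, workers = cl)

## Option 2: an MPI cluster with 4 workers (requires the snow and Rmpi packages).
library(future)
library(doFuture)
library(parallel) # provides makeCluster()
registerDoFuture()
cl <- makeCluster(4, type = "MPI")
plan(cluster, workers = cl)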
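When you create a cluster by hand like this, the worker processes stay alive until you stop them. Here is a minimal sketch of the full lifecycle, assuming a local socket cluster; the choice of two workers is arbitrary, and the commented line marks where your own PreciseDist calls would go.

```r
library(future)
library(parallel)

# Two workers is an arbitrary choice for this illustration.
cl <- makeCluster(2)
plan(cluster, workers = cl)
workers_in_use <- nbrOfWorkers()

# ... run your parallel PreciseDist calls here ...

# Switch future back to sequential processing, then shut the workers down.
plan(sequential)
stopCluster(cl)
workers_after <- nbrOfWorkers()
```

Resetting the plan before calling stopCluster() avoids leaving future pointed at workers that no longer exist.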
The future package has a host of options that can be set, but the most useful one in my experience is:
## To see a list of all options run: help("future.options", package = "future")
options(future.globals.maxSize = +Inf)
Running this when you start your R session prevents a potential “Error in getGlobalsAndPackages” by removing the size limit on the global variables future identifies and exports to the workers. By default that limit is 500 MiB, which large datasets can easily exceed.
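If removing the limit entirely feels too permissive, a finite cap is another option. The sketch below assumes a budget of 2 GiB, which is an arbitrary figure chosen for illustration; the option takes a size in bytes, so pick a value that fits the memory on your machine.

```r
# Assumed budget of 2 GiB for exported globals (hypothetical value).
max_globals_bytes <- 2 * 1024^3

# future reads this option when it collects the globals for each worker.
options(future.globals.maxSize = max_globals_bytes)

# Confirm the value that is now in effect (in bytes).
getOption("future.globals.maxSize")
```

A finite cap still guards against accidentally shipping a very large object to every worker, which an infinite limit would allow silently.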