A Note about the Vignettes
“A short story is confined to one mood, to which everything in the story pertains. Characters, setting, time, events, are all subject to the mood. And you can try more ephemeral, more fleeting things in a story - you can work more by suggestion - than in a novel. Less is resolved, more is suggested, perhaps.”
The following is a brief introduction to some good things to know, followed by a one- or two-sentence summary of each vignette and the functions it exemplifies.
The vignettes contained within PreciseDist use biological data, but if that is meaningless to you, understand that PreciseDist will work with many kinds of data because a matrix is a matrix is a matrix. And, while one of PreciseDist’s main goals is to help you create a matrix that is optimal for the problem you are trying to solve, the functions we provide try to be data-source agnostic. Also, while we typically use the word distance throughout the vignettes, PreciseDist produces and works with similarities and correlations as well. What PreciseDist typically tries not to do, however, is make decisions on whether the input(s) is a similarity, correlation or distance, so try to stay aware of how the relationships in your data are being numerically defined before running various PreciseDist functions.
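Since PreciseDist leaves it to you to know which kind of matrix you are holding, a quick check of the diagonal and the direction of the values can help. The sketch below uses only base R, not PreciseDist. The toy data is made up, and `1 - r` is just one common convention for turning a correlation into a dissimilarity, assumed here for illustration.

```r
# Toy data: 5 samples (rows) by 4 features (columns); values are made up.
set.seed(1)
x <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("s", 1:5), paste0("f", 1:4)))

# A distance: 0 on the diagonal, larger values mean "less alike".
d <- as.matrix(stats::dist(x, method = "euclidean"))

# A correlation between samples: 1 on the diagonal, larger means "more alike".
r <- stats::cor(t(x))

# One common (assumed) conversion from correlation to dissimilarity.
d_from_r <- 1 - r

# Inspect the diagonals to confirm which way each matrix points.
diag(d)         # zeros
diag(r)         # ones, up to floating-point error
diag(d_from_r)  # zeros, up to floating-point error
```

A matrix with ones on the diagonal fed to a method that expects zeros there will usually run without complaint and quietly give nonsense, which is why the check is worth the three lines.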
In addition, PreciseDist will always be a work in progress, so if anything is unclear while using the package, please do not hesitate to ask a question, suggest an improvement or point out a reproducible bug/flaw/inconsistency at the GitHub issues page. Lastly, if a scientific paper has led you to PreciseDist, please note that while the data used for examples in the vignettes may at times be ostensibly the same as the data in the paper, the exact code and methodology are not. If you wish to find a verbatim copy of the code used for the results in the paper, please click here.
This vignette explains how to use the parallel resources of your computer or cluster to run PreciseDist functions in parallel. Although it contains no PreciseDist functions, it is a simple yet crucial vignette to understand unless you are working with very limited amounts of data.
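As a rough picture of what running work in parallel looks like in R, here is a minimal sketch that uses only the `parallel` package shipped with base R. It is not the setup that vignette describes and it does not call PreciseDist. The toy data, the list of metrics and the cap of two workers are all assumptions made for illustration.

```r
library(parallel)  # ships with base R

# How many cores does this machine report? (May be NA on some platforms.)
n_cores <- detectCores()

# Assumed policy for this sketch: leave one core free, use at most two workers.
n_workers <- if (is.na(n_cores)) 1L else max(1L, min(2L, n_cores - 1L))

# Toy data: 20 samples by 10 features; values are made up.
set.seed(1)
x <- matrix(rnorm(200), nrow = 20)

# Hypothetical workload: one distance matrix per metric.
metrics <- c("euclidean", "manhattan", "maximum")

# Start the workers, hand each one a metric, then shut the workers down.
cl <- makeCluster(n_workers)
dists <- parLapply(
  cl, metrics,
  function(m, data) as.matrix(stats::dist(data, method = m)),
  data = x
)
stopCluster(cl)

names(dists) <- metrics
```

Each metric is independent of the others, so the work splits cleanly across workers; that is the general shape of problem where parallel resources pay off.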
In many ways, this is the main vignette of the package, and it introduces a methodology for using the PreciseDist framework to tackle the same problem in a variety of different ways. Or, as they say, if all roads lead to Rome there is more than one way to skin a cat.
data("data_cell_cycle")
trellis_plots()
trellis_heatmap()
This is a very meta vignette that shows you how to make and then cluster a graph from a distance matrix of distances. This can be very helpful in deciding which distances one should combine to get a holistic view of either a single dataset or multiple datasets.
trellis_descriptors()
precise_transform()
and its keep_string and filter_string parameters
A PreciseDist function may at times give you an answer that seems too good to be true, so this vignette shows you a few ways of trying to mitigate the false and hollow hope that overfitting can endow.
precise_dist()
and its partitions parameter
precise_transform()
and its add_noise parameter
This vignette shows you several different ways you can cluster your results within the PreciseDist framework, and urges you to trust only the results that make sense and are useful, because we believe clustering is more about the journey than the destination.
trellis_descriptors()
In this vignette, we demonstrate a few different ways of determining if your clusters make sense and are useful.
trellis_descriptors()
and both its diagnostics and rank parameters
trellis_pivot()
PreciseDist provides a number of output visualization options for input graphs or matrices, so in this vignette we show them all at once.
Although PreciseDist provides many different types of visualizations, Gephi is a wonderful way to take the visualizations PreciseDist produces for further analysis. Also, as an aside, we show you here how to embed downloadable static images into your R Markdown page when trying to include them through local paths is making you confused and crazy.
precise_viz()
and its graphml parameter
While PreciseDist offers a clustering solution, the framework focuses considerably more on the before (the distance) and the after (the usefulness) of clusters than the clustering itself. So, this vignette is a code repository for other methods that can also take distances (or similarities or correlations) as input.
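To give a flavor of what such a method looks like, here is a minimal sketch that hands a distance matrix to hierarchical clustering from base R's `stats` package. It is not taken from the vignette and does not use PreciseDist. The toy data, the "average" linkage and the choice of two clusters are assumptions made for illustration.

```r
# Toy data: two loose groups of 10 samples each; values are made up.
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))
rownames(x) <- paste0("s", seq_len(nrow(x)))

# Stand-in for any square, symmetric distance matrix you already have.
d_mat <- as.matrix(stats::dist(x))

# hclust() expects a "dist" object, so convert the square matrix first.
hc <- stats::hclust(stats::as.dist(d_mat), method = "average")

# Cut the tree into an assumed number of clusters.
clusters <- stats::cutree(hc, k = 2)
table(clusters)

# For a similarity or correlation matrix s, convert to a dissimilarity
# first, for example stats::as.dist(1 - s), before calling hclust().
```

The only thing the method needs from you is the matrix, which is the point: once the distance is one you trust, swapping the clustering step is cheap.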