For the last couple of years, we’ve been using the statistical programming language R when we do statistical analysis or data visualizations at work. We typically deal with small data — most of the time, our data sets are high-tens or low-hundreds of rows of data.
A lot of the time, we create R Notebooks with our analysis and visualizations. This works well for us: the R Notebook contains the code used to do the analysis, the results of the analysis and the visualizations, all in one place. This eliminates questions like: “did you remove outliers before making the graph?” Or, “did you check that the data are distributed normally before you did that test?” A reviewer of the R Notebook can see exactly what was done.
By default, the R Notebook produces an HTML file that you can open in your browser. You can email this HTML file to a colleague, and they can see your results and graphs, as well as exactly how you obtained them. If you made a logical mistake, or an inappropriate assumption, your colleague has the opportunity to find it.
The exported HTML file also has a button that says “Download Rmd.” This allows your colleague to open the notebook in RStudio and run your code, provided you also sent your data.
The one problem with just emailing R Notebooks to a colleague is that the R Notebook does not include the data. This might be okay if the data source is a file on a network, or a database that you both have access to, but in a lot of cases — at least in my work — the data is a CSV or Excel file. Now, if I want to send an R Notebook to a colleague to review, I need to remember to send the data file along with it.
Enter rde.
I wrote the package rde (which stands for Reproducible Data Embedding) to tackle this problem. This package allows you to embed data right in your R Notebook (or any other R code). It does so by compressing the data and then base-64 encoding it into an ASCII string. This string can be pasted into the R Notebook and converted back into the original data when someone re-runs the Notebook.
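To make the compress-then-encode idea concrete, here is a minimal sketch of the same kind of round trip using base R plus the jsonlite package. This is not rde's actual implementation or API; the example data frame, the variable names and the choice of gzip compression are all assumptions made for illustration.

```r
# Illustrative sketch only: NOT rde's implementation, just the general idea.
# Assumes the jsonlite package is installed (for base64_enc / base64_dec).

# A small made-up data frame standing in for a CSV or Excel import
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Encode: serialize to raw bytes, compress them, then base-64 encode as text
raw_bytes  <- serialize(df, connection = NULL)
compressed <- memCompress(raw_bytes, type = "gzip")
encoded    <- jsonlite::base64_enc(compressed)

# Decode: reverse each step to recover the original object
decoded_raw <- memDecompress(jsonlite::base64_dec(encoded), type = "gzip")
restored    <- unserialize(decoded_raw)

# Check that the round trip reproduced the data exactly
identical(df, restored)
```

The value of `encoded` is an ordinary character string, which is what makes it possible to paste the data into a notebook alongside the code that uses it.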
I won’t go into all the details of how to use the package. If you’d like to learn more, you can read the package vignette.
This isn’t the first R package that I’ve written, but it is the first one that I’ve submitted to CRAN. When you install an R package using install.packages(), you’re installing it from CRAN. I think that CRAN is one of the best parts of the R ecosystem since it does continuous integration for all of the packages hosted there. This helps ensure that all the packages continue to work as R is updated and as other packages are updated. I’ll likely talk about this more in a future blog post.
If you’re an R user and you think that the package rde would help you in your workflow, check it out. You can install it by typing install.packages("rde") in R. If you find a bug, please file an issue on GitHub. And, if you would like to add functionality or improve it in some way, feel free to send me a pull request.