Investigating Docker and R
15 Dec 2016
Docker and R: How are they used together? That is the question we have asked ourselves in recent weeks. In this post, we share our insights with you.
The most prominent effort in this area is the Rocker project. It was initiated by Dirk Eddelbuettel and Carl Boettiger. For an introduction, you may read their blog post here or follow this tutorial from rOpenSci.
With a wide choice of pre-built Docker images, Rocker is a convenient starting point for anyone who wants to run R in Docker containers. Explore it on GitHub or Docker Hub, and you will soon find that a single command is enough to run an instance of base R, R-devel or RStudio Server. Moreover, you can run previous versions of R or use one of the many bundles with commonly used R packages and other software (e.g. bundles going back to Hadley Wickham and rOpenSci).
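As a rough sketch of what such one-liners look like, the script below prints a few `docker run` invocations for Rocker images. The `run` helper, the `RUN` switch, the `3.3.2` tag and the `changeme` password are assumptions made for illustration; image names, tags and required options may differ in the current Rocker releases.

```shell
#!/bin/sh
# Dry run by default: print each docker command instead of executing it.
# Set RUN=1 in the environment to actually run the commands (requires Docker).
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Interactive session with the current base R release
run docker run --rm -ti rocker/r-base

# Interactive session with R-devel (started as RD inside this image)
run docker run --rm -ti rocker/r-devel RD

# A specific R release via a version tag (tag chosen for illustration)
run docker run --rm -ti rocker/r-ver:3.3.2

# RStudio Server in the background, reachable at http://localhost:8787
# PASSWORD is a placeholder value; newer images expect it to be set
run docker run -d -p 8787:8787 -e PASSWORD=changeme rocker/rstudio
```

The dry-run wrapper is only there so the script can be inspected safely; in day-to-day use you would type the `docker run ...` part directly.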
Dockerizing Research and Development Environments
So why, apart from the incredibly easy and fast installation of R bundles, would you really want to combine R with Docker?
Ben Marwick, Associate Professor at the University of Washington, explains in this presentation that Docker helps you manage dependencies. It provides a computational environment that is isolated from the host and at the same time transparent, portable, extendable and reusable. Marwick uses Docker and R for reproducible research and bundles his work into a kind of research compendium; an instance is available here, and a template here.
Carl Boettiger, Assistant Professor at UC Berkeley, has also written in detail about using Docker for reproducibility in his ACM SIGOPS paper ‘An introduction to Docker for reproducible research, with examples from the R environment’. Both Ben and Carl have written about their case studies using Docker for research compendia in an essay for rOpenSci.
The R package dockertest makes use of the isolated environment that Docker provides: R programmers can set up test environments for their R packages and R projects, in which they can rapidly test their work in Docker containers that contain only R and the relevant dependencies. All of this happens without cluttering the development environment.
Some projects are dedicated to dockerizing R-based documents. The liftr package for R lets users enhance Rmd files with YAML metadata, which enables rendering R Markdown documents in Docker containers. Liftr also supports Rabix, a Docker-based toolkit for portable bioinformatics workflows. That means users can have Rabix workflows run inside the container and have the results integrated directly into the final document.
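The underlying idea can be sketched with plain Docker, without dockertest itself. The helper below only prints a command that would build and check the package in the current directory inside a throwaway container. The function name `check_cmd`, the `/pkg` mount point and the default image are assumptions for illustration, this is not dockertest's interface, and a real package would also need its dependencies installed in the image.

```shell
#!/bin/sh
# Compose (but do not run) a docker command that builds and checks the
# R package in the current directory inside a disposable container.
# $1: image to use (defaults to rocker/r-base); /pkg is an arbitrary mount point.
check_cmd() {
  image="${1:-rocker/r-base}"
  printf '%s\n' "docker run --rm -v \"\$PWD\":/pkg -w /pkg $image sh -c 'R CMD build . && R CMD check --no-manual *.tar.gz'"
}

check_cmd                 # print the command for the default image
# eval "$(check_cmd)"     # uncomment to run it (requires Docker)
```

Because the source directory is mounted into the container, the build artifacts end up on the host; copying the sources into the container instead would keep the host directory untouched.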
Control Docker Containers from R
Rather than running R inside Docker containers, it can be beneficial to call Docker containers from inside R. This is what the following packages do.
Selenium provides tools for browser automation, which are also available as Docker images. They can be used, among other things, for testing web applications or controlling a headless web browser from your favorite programming language. In this tutorial, you can see how and why you should use RSelenium to interact with your Selenium containers.
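To make the container side concrete, here is a minimal sketch that prints the command for starting a standalone Selenium server from the public `selenium/standalone-*` images. The function name `selenium_cmd` and the host port 4445 are assumptions chosen for illustration.

```shell
#!/bin/sh
# Print the docker command that starts a standalone Selenium server.
# $1: browser flavour (firefox or chrome); $2: host port (4445 is arbitrary).
selenium_cmd() {
  browser="${1:-firefox}"
  port="${2:-4445}"
  # Selenium listens on port 4444 inside the container
  printf 'docker run -d -p %s:4444 selenium/standalone-%s\n' "$port" "$browser"
}

selenium_cmd firefox 4445
# eval "$(selenium_cmd firefox 4445)"   # uncomment to start it (requires Docker)
```

Once such a container is running, an RSelenium `remoteDriver` pointed at `localhost` and the chosen host port should be able to connect to it.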
googleComputeEngineR provides an R interface to the Google Cloud Compute Engine API. It includes a function called docker_run that starts a Docker container in a Google Cloud VM and executes R code in it. Read this article for details and examples. There are similar ambitions to implement Docker capabilities in the analogsea package, which interfaces with the DigitalOcean API.
analogsea uses functions from the harbor package for R. It is worth a look, because it may help you control your own Docker containers, whether they run locally or remotely.
R and Docker for Complex Web Applications
Docker, in general, may help you to build complex and scalable web applications with R.
Mark McCahill presented at an event at Duke University in North Carolina (USA) how he provided 300+ students each with a private RStudio Server instance. In his presentation (PDF / MOV (398 MB)), he explains his RStudio farm in detail.
If you want to use RStudio with cloud services, you may enjoy these articles from the SAS and R blog: "RStudio in the cloud with Amazon Lightsail and docker"; "Set up RStudio in the cloud to work with GitHub"; "RStudio in the cloud for dummies, 2014/2015 edition".
The R-hub platform helps R developers solve package issues before they submit their packages to CRAN. In particular, it provides services that build packages on all CRAN-supported platforms and check them against the latest R release. The services use backends that perform regular R builds inside Docker containers. Read the project proposal for details.