Tim Stuart

Create a computational lab notebook with bookdown

Every data analysis I do now is kept in an R Markdown document. These are great for mixing code with explanatory text, and you can run code in many languages, not just R. Whenever I finished working on something, I would compile the R Markdown document into a self-contained html report and save it somewhere, usually with a descriptive filename like “coverage_genes” or “col_vs_cvi”.

This is where the problems begin. These reports, while great individually, can quickly pile up and you start to realize that the name “coverage_genes” doesn’t really help you find a bit of code you wrote 3 weeks ago. What you really want is everything in one place. I would routinely open up 5-10 html files and hit CMD+f on each one to try to find something I knew I wrote a few weeks ago. There had to be a better way.

Bookdown

Bookdown is an R package that collects a group of separate R Markdown documents and merges them into a single document – a book. There are some great examples of books written using bookdown (R for data science by Hadley Wickham is one). A couple of weeks ago I started using bookdown to create a computational lab notebook to store all my data analysis documents in one place, and since bookdown just collects different R Markdown documents, I didn’t really need to change anything I was doing. So far it’s been working really well.

However, bookdown was created with slightly different goals in mind from mine. Normally, you would need to re-run all the R code each time you build the book in order to get all the entries to display correctly. For me this was a problem: by the end of the year I’d be re-running every analysis I’d done that year every time I wanted to add something new.

Instead, I wrote a shell script that cheats a little bit by copying the intermediate markdown files generated by bookdown to a new folder and building the book from that instead. That way, each new analysis is run once to generate a markdown file, then that markdown file is copied over to a separate book and the book is re-built. Re-building the book in this case is very fast because it’s not running any analysis code.
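The copy step can be sketched roughly like this. It is a minimal illustration, not the actual script: the function names and directory layout are made up for the example, and how the second book is configured to pick up the copied files is left out.

```shell
#!/usr/bin/env bash
# Sketch only: function names and directory arguments are hypothetical.

# Copy every .md file from directory $1 into directory $2,
# skipping files that already have an up-to-date copy there.
collect_markdown() {
    local src_dir="$1" dest_dir="$2" f target
    mkdir -p "$dest_dir"
    for f in "$src_dir"/*.md; do
        [ -e "$f" ] || continue                 # the glob matched nothing
        target="$dest_dir/$(basename "$f")"
        if [ ! -e "$target" ] || [ "$f" -nt "$target" ]; then
            cp -p "$f" "$target"                # -p keeps the modification time
            echo "copied $(basename "$f")"
        fi
    done
}

# Re-build the book that lives in directory $1 (needs R and bookdown).
rebuild_book() {
    (cd "$1" && Rscript -e 'bookdown::render_book("index.Rmd")')
}
```

For example, `collect_markdown notebook notebook_render && rebuild_book notebook_render` would copy any new markdown files and then re-build the second book, assuming those two directory names.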

Getting interactive plots and tables

Now, the problem with this approach is that if some outputs in your R Markdown documents need special JavaScript libraries in order to run (e.g. plotly, pagedtable, htmlwidgets), that information is lost. This can be fixed by adding an extra html file to the book’s directory that sources the libraries, like this:

<link href="libs/pagedtable-1.1/css/pagedtable.css" rel="stylesheet" />
<script src="libs/pagedtable-1.1/js/pagedtable.js"></script>
<script src="libs/htmlwidgets-0.8/htmlwidgets.js"></script>
<link href="libs/plotlyjs-1.16.3/plotly-htmlwidgets.css" rel="stylesheet" />
<script src="libs/plotlyjs-1.16.3/plotly-latest.min.js"></script>
<script src="libs/plotly-binding-4.5.6/pl

I saved this in a file called libs.html. Then, in the _output.yml file (used for bookdown configuration), add:

  includes:
    in_header: libs.html
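Because libs.html hard-codes library versions in its paths, it can silently go stale when a package is updated. One way to catch that is a small check along these lines. It is only a sketch: the function name is made up, and it assumes the paths in libs.html are relative to the directory holding the rendered book.

```shell
#!/usr/bin/env bash
# Sketch only: missing_libs is a hypothetical helper, not part of bookdown.

# Print every src/href path found in the includes file $1
# that does not exist under the rendered book directory $2.
missing_libs() {
    local includes="$1" book_dir="$2" path
    grep -oE '(src|href)="[^"]+"' "$includes" \
        | sed -E 's/^(src|href)="//; s/"$//' \
        | while IFS= read -r path; do
            [ -e "$book_dir/$path" ] || echo "$path"
        done
}
```

Running `missing_libs libs.html _book` prints nothing when every referenced file is present, and otherwise lists the paths that need updating.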

Now, that’s basically everything you need to get a computational lab notebook working using bookdown. My notebook looks something like this:

[Image: screenshot of the notebook]

Hosting the book

One drawback, however, is that this type of notebook can make it hard to share results with people. Previously, when I used a single self-contained html document to store everything, it was really easy to send that to whoever needed to see it. Now, using a book, there are many interconnected files and I can’t easily send someone a document to read through. One solution would be to host the book on GitHub Pages, but then anyone would be able to read what we’ve been working on, which isn’t always ideal.

I decided to host my notebook using Amazon Web Services (AWS) S3. It was pretty easy to put the files into a bucket and have it render a static site. The advantage of this is that while the site is viewable by anyone, the address is very cryptic and no-one would come across it by accident or by viewing my GitHub page.

To make the AWS S3 bucket, I just went through the “host a static website” guide on Amazon and it took ~5 mins. However, I had to drag-and-drop files from my computer to the site, which would be annoying to do each time I update the site.

The AWS command line tools have a sync command that will let you upload files to an S3 bucket. All I had to do then was:

aws s3 sync notebook_render/_book s3://[bucket name]

Make sure you only upload the _book directory.

I added this to my shell script with a command line option, so that each time I build the book I can also choose if I want to upload it to amazon.
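The option handling might look something like the following. This is a sketch under assumptions rather than the actual script: the function and variable names are made up, the bucket name is a placeholder, and the `-a` flag is taken from the build_book.sh usage described further down.

```shell
#!/usr/bin/env bash
# Sketch only: names below are hypothetical placeholders.

BUCKET="my-bucket-name"            # placeholder; replace with your bucket name
BOOK_DIR="notebook_render/_book"   # same path as in the sync command above

# Set UPLOAD=1 when -a is given; reject any other option.
parse_args() {
    local opt
    UPLOAD=0
    OPTIND=1                       # reset so the function can be called again
    while getopts "a" opt; do
        case "$opt" in
            a) UPLOAD=1 ;;
            *) echo "usage: build_book.sh [-a]" >&2; return 1 ;;
        esac
    done
}

# Upload only the rendered _book directory to the bucket.
upload_book() {
    aws s3 sync "$BOOK_DIR" "s3://$BUCKET"
}

# In the script body this would be used roughly as:
#   parse_args "$@" || exit 1
#   ...build the book...
#   if [ "$UPLOAD" -eq 1 ]; then upload_book; fi
```

Keeping the upload behind a flag means a normal build never touches the bucket, which is handy while an entry is still a work in progress.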

Creating your own lab notebook

You can start a lab book by cloning or forking the template I made at https://github.com/timoast/notebook-template.

This has all the files and directory structure needed, so you can then just start adding your own R Markdown files. Running the build_book.sh script should update your book with your latest document. Running build_book.sh -a will update the book and upload it to Amazon, as long as you have edited the script to include the name of your S3 bucket.

I also have a template that I use for each document in the book. It just makes a space for the date and a title, and puts in a code chunk running devtools::session_info() at the end. If you also want to use that template, install my R package from github:

devtools::install_github("timoast/stuart")