Tools for a reproducible manuscript

It is all too easy to make a mistake putting a manuscript together: mistranscribe a number, forget to rerun some analyses after the data are revised, or simply bodge the rounding of some numbers. Fortunately it is possible to avoid these problems with some technology that inserts numbers and graphs into the manuscript, keeps them all up to date, and lets you see the calculations behind everything. The technology does not prevent code errors, but by making the analysis transparent, it hopefully makes them easier to find.

There are numerous guides to writing dynamic documents with Rmarkdown. What I want to do here is show some of the things I needed to write a manuscript with Rmarkdown. There may be better solutions to some of the problems I faced – tell me about them. If you want to see any of this in action, take a look at the GitHub repo for my recently accepted manuscript.

To get started with Rmarkdown, in RStudio, go to File >> New File >> R markdown and you get a sample file to play with. Remember, no spaces in the filename.

Too much code

Rmarkdown lets you have chunks of R code embedded in the text, so that when the document is rendered, the code is evaluated following the chunk’s options and the results printed. This is great for teaching and simple documents, but when you have several hundred lines of code to clean data, it becomes unwieldy to keep everything in one document, and it prevents reuse if you want to make a manuscript and a presentation with the same code.

The first solution I found for this is the knitr::read_chunk function, which lets you put the code in a separate file. I later found the drake package, which lets you put most of the code in the drake_plan and then readd or loadd the generated objects (targets) into Rmarkdown. When you make the drake_plan, only the targets that are out of date (because of code or data changes) get updated. This nicely solves the problem of how to cache the results of slow analyses (there is a cache built into Rmarkdown, but it is a bit fragile).
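A minimal sketch of the knitr::read_chunk approach, assuming an external script whose sections are marked with `## ---- label` comments. The script contents and the labels clean-data and fit-model are invented for illustration, and the built-in airquality data stands in for real data.

```r
library(knitr)

# Hypothetical external script. In a real project this would be a file kept
# alongside the manuscript rather than a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c(
  "## ---- clean-data",
  "clean <- na.omit(airquality)",
  "",
  "## ---- fit-model",
  "model <- lm(Ozone ~ Temp, data = clean)"
), script)

# Register the labelled sections of the script with knitr.
read_chunk(script)

# The labels knitr now knows about.
labels <- names(knit_code$get())
print(labels)
```

In the Rmarkdown file, an empty chunk labelled clean-data then runs the code registered under that label, so the same script can feed both a manuscript and a presentation.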

Article format

If you want to submit your manuscript to a journal as a PDF, things will go a lot smoother if you use the journal’s LaTeX template. I have no idea how to do that either, but the rticles package takes away most of the pain. You make a draft manuscript with

rmarkdown::draft("MyArticle.Rmd", template = "acm_article", package = "rticles")

This generates the required YAML, the metadata at the top of the Rmarkdown file between two lines of ---. This works well provided you don’t do anything daft such as submit to Wiley, one of the few major publishers without a template in rticles.

Citations

Obviously, a manuscript is going to need citations, and formatting citations is a pain. rticles will put a couple of lines in the YAML to say which bibtex file to use and which citation style.

bibliography: chironomid2.bib
csl: elsevier-harvard_rjt.csl

Any decent citation manager will be able to export a bibtex file. I use JabRef.

CSL files for many different journals are available. I had to edit the Elsevier style slightly so that it showed R packages correctly.

In the text, write [@Smith1990] to cite the reference with the key Smith1990 with name and year, or [-@Smith1990] to get just the year. The formatted reference list gets added to the end of the file. The citr addin for RStudio lets you search through the bibtex file and select references to add to the manuscript.

Line numbers

I hate reviewing manuscripts without line numbers. So I wanted line numbers on my manuscript. This turned out to be easy – Rmarkdown documents are rendered into PDF via LaTeX, so LaTeX commands can be used.

In the YAML, add

header-includes:
  - \usepackage{lineno}
  - \linenumbers

And that’s it.

Cross-referencing figures

One of my favourite parts of writing a manuscript is when I decide to add another figure and need to renumber all the other figures and make sure that the text refers to the correct figure.

Rmarkdown can cross-reference figures (also tables etc). This feature was added relatively recently, and needs bookdown rather than rmarkdown, which can be set in the YAML (NB spacing is important in YAML)

output:
  bookdown::pdf_book:
    base_format: rticles::elsevier_article
    number_sections: true

In the text, to refer to the figure from the chunk with the label weather-climate-plot, use (Fig. \@ref(fig:weather-climate-plot)). Importantly, this does not work if you have underscores in the chunk name. This took me an entire evening to work out.
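To save someone else that evening, here is a rough base-R sketch that scans the lines of an Rmarkdown file for chunk labels that may break cross-references. It assumes labels should contain only letters, digits, dashes and slashes, which is my reading of the bookdown documentation; the only failure confirmed above is the underscore. The function names and the example chunks are made up for illustration.

```r
# Three backticks, built with strrep() so the example chunk headers can be
# written inside this code block.
fence <- strrep("`", 3)

# Stand-in for readLines("manuscript.Rmd"); both chunks are invented.
rmd <- c(
  paste0(fence, "{r weather-climate-plot, fig.cap = 'Weather and climate'}"),
  "plot(1:10)",
  fence,
  paste0(fence, "{r zab_dist_analogue, fig.cap = 'Distances'}"),
  "plot(10:1)",
  fence
)

# Extract the label (first item after "{r") from each R chunk header.
chunk_labels <- function(lines) {
  header_start <- paste0("^", fence, "\\{r[ ,]+")
  headers <- grep(header_start, lines, value = TRUE)
  labels <- sub(header_start, "", headers)
  labels <- trimws(sub("[,}].*$", "", labels))
  # An item containing "=" is a chunk option, not a label.
  labels[nzchar(labels) & !grepl("=", labels)]
}

# Labels containing anything other than letters, digits, "/" or "-".
unsafe_labels <- function(lines) {
  labels <- chunk_labels(lines)
  labels[grepl("[^A-Za-z0-9/-]", labels)]
}

print(unsafe_labels(rmd))
```

For a real manuscript, pass readLines() of the Rmarkdown file to unsafe_labels() and rename anything it reports.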

You can also cross-reference figures in the supplementary material. In the header-includes: section of the YAML, add

  - \usepackage{xr}
  - \externaldocument[sm-]{supplementary_data}

Here supplementary_data is the name of the rmarkdown file you want to reference, without the extension. In the text you can then use Fig. \@ref(sm-fig:zab-dist-analogue). The sm- prefix prevents any clash between the labels in the manuscript and those in the supplementary material.

To prefix the figure numbers in the supplementary material with an “S”, add this to the YAML of the supplementary file:

header-includes:
  - \renewcommand{\thefigure}{S\arabic{figure}}

This magic works on the *.tex files made as an intermediate stage of rendering the PDF. You need to keep these by running rmarkdown::render with clean = FALSE; then you can run tinytex::pdflatex on each of them:

tinytex::pdflatex("manuscript.tex", clean=TRUE)
tinytex::pdflatex("supplementary_data.tex", clean=TRUE)

Package management

One of the great things about R is that the packages are often improved. This is also a problem as the code that worked fine may fail after you update a package. One solution to this is to use packrat to keep track of which packages your project uses. packrat will automatically download all the packages you need to reproduce my manuscript.

I found packrat to be a bit of a pain, and am looking forward to trying renv.

Rmarkdown and git

As rmarkdown files are text files, they work well with version control using git and, for example, GitHub. Since git tracks changes in each line of text, it would show a long paragraph as having changed if you add one comma. A trick I learnt somewhere is to break the paragraph up, one sentence per line, so the version control now works at the sentence level. Rmarkdown ignores single line returns in a document; you need two to get a new paragraph.
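If you already have long paragraphs, a small helper can do the reflowing. This is only a naive sketch: it assumes a sentence ends at a full stop, exclamation mark or question mark followed by whitespace and a capital letter, so abbreviations such as "Fig. S1" will be split wrongly. The function name and the example paragraph are made up.

```r
# Reflow a paragraph so that each sentence sits on its own line.
one_sentence_per_line <- function(paragraph) {
  # Collapse existing line breaks and repeated spaces into single spaces.
  flat <- gsub("\\s+", " ", trimws(paste(paragraph, collapse = " ")))
  # Split at whitespace that follows . ! or ? and precedes a capital letter.
  strsplit(flat, "(?<=[.!?])\\s+(?=[A-Z])", perl = TRUE)[[1]]
}

paragraph <- "Rmarkdown files are text files. Git tracks changes line by line. Short lines give small diffs."
sentences <- one_sentence_per_line(paragraph)
writeLines(sentences)
```

Check the result by eye before committing it, since the splitting rule is deliberately simple.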

It is not difficult

If you can write code, and you can write text, you can write a manuscript with Rmarkdown. Your coauthors can probably cope too.

About Richard Telford

Ecologist with interests in quantitative methods and palaeoenvironments

3 Responses to Tools for a reproducible manuscript

  1. I used Rmarkdown before to generate a Word file, where it is quite hard to get the format you want and many layout options are not implemented. So I was curious how you were able to generate a manuscript that is in the right format for a journal. A clear plus for LaTeX.

    • With the help of the `rticles` package, it just works in PDF. Never bothered with rendering to Word – looks like it might be useful for students who have supervisors that cannot cope with anything new. If you have a look at the GitHub repo, you can see exactly what I did.

  2. vincepi says:

    Thanks very much Richard! Exactly what I was looking for. Vince
