A demo targets plan for reproducible pipelines for Neotoma data

Any analysis involves a series of steps: importing data, cleaning it, analysing it, creating figures, and then putting it all together in a manuscript. Very often, some of these steps are time-consuming, and you don’t want to rerun everything every time you modify the code. But keeping track of which parts of the code need to be rerun is challenging.

This is where a pipeline toolkit can help. It keeps track of which of your objects are up-to-date, and which need updating.

For some years, I have been using the drake package. But drake has been superseded: targets is the new pipeline toolkit for R. I’ve been planning to migrate from drake to targets for a while.

Pipeline toolkits work very well with the Neotoma database because downloading data is slow, so it pays not to repeat a download that is already up-to-date.

I have written a demonstration project that uses targets to

  • Download data from Neotoma
  • Process and (minimally) clean the data
  • Run a simple analysis
  • Make some figures
  • Render a manuscript containing the results and the figures using R markdown
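
To give a flavour of how these steps map onto targets, here is a minimal sketch of a `_targets.R` file. It is not the code in the template. The helper functions `download_sites()` and `clean_counts()` are hypothetical stand-ins that return toy data so the sketch runs without touching Neotoma, the site IDs are made up, and the file `manuscript.Rmd` is assumed to exist in the project directory.

```r
# _targets.R (illustrative sketch only, not the template's actual code)
library(targets)
library(tarchetypes) # supplies tar_render() for R Markdown documents

# Packages attached before each target is built
tar_option_set(packages = "ggplot2")

# Hypothetical stand-in helpers. In a real project these would call a
# Neotoma client package; here they return toy data so the sketch runs.
download_sites <- function(site_ids) {
  data.frame(
    site_id = rep(site_ids, each = 3),
    depth = rep(c(10, 20, 30), times = length(site_ids)),
    count = seq_len(3 * length(site_ids))
  )
}

clean_counts <- function(raw) {
  # Drop rows with missing or non-positive counts
  raw[!is.na(raw$count) & raw$count > 0, ]
}

# The pipeline: each target names an object and the command that builds it.
list(
  tar_target(site_ids, c(1, 2, 3)),               # made-up site IDs
  tar_target(raw_data, download_sites(site_ids)), # the slow step in real use
  tar_target(clean_data, clean_counts(raw_data)),
  tar_target(model, lm(count ~ depth, data = clean_data)),
  tar_target(
    depth_plot,
    ggplot(clean_data, aes(x = depth, y = count)) + geom_point()
  ),
  # Assumes manuscript.Rmd exists and reads results with tar_read() or
  # tar_load(), which is how its dependencies are detected.
  tar_render(manuscript, "manuscript.Rmd")
)
```

With a file like this in place, `targets::tar_make()` builds whatever is out of date and `targets::tar_outdated()` lists what would be rebuilt. Editing `clean_counts()`, for example, should invalidate `clean_data` and everything downstream of it, while `raw_data` is left alone.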

If the site inclusion rules are changed, only new sites will be downloaded, making the download process much faster after the first run through.
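
One way to get this behaviour in targets is dynamic branching, where each site becomes its own branch of the download target. The sketch below assumes that approach; the template may do it differently. `download_one_site()` is a hypothetical stand-in returning toy data, and the site IDs are made up (in practice they would come from a query that applies the inclusion rules).

```r
# _targets.R (illustrative sketch of per-site downloads)
library(targets)

# Hypothetical stand-in for downloading a single site from Neotoma
download_one_site <- function(site_id) {
  data.frame(site_id = site_id, depth = c(10, 20, 30))
}

list(
  # Made-up site IDs standing in for the result of the inclusion rules
  tar_target(site_ids, c(101, 102, 103)),
  # pattern = map(site_ids) creates one branch per element of site_ids;
  # the per-site data frames are combined when site_data is read
  tar_target(
    site_data,
    download_one_site(site_ids),
    pattern = map(site_ids)
  )
)
```

If a fourth ID is later added to `site_ids`, `tar_make()` should build only the branch for the new site and skip the three that are already stored.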

The project is available on GitHub as a template. Make a copy, give it a try, and give me some feedback.

About richard telford

Ecologist with interests in quantitative methods and palaeoenvironments
This entry was posted in reproducible research.
