In any analysis, there is a series of steps: data importing, cleaning, analysis, creating figures, and then putting it all together in a manuscript. Very often, some of these steps are time-consuming, and you don't want to rerun everything every time you modify the code; but keeping track of which parts of the code need to be re-run is challenging.
This is where a pipeline toolkit can help. It keeps track of which of your objects are up-to-date, and which need updating.
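To make that concrete, here is a minimal sketch of what a pipeline looks like in targets. The file name `data/sites.csv`, the column names, and the model are placeholder assumptions; the `tar_target()` and `tar_option_set()` calls are the real targets API.

```r
# _targets.R -- a minimal pipeline sketch (illustrative file and column names).
# targets re-runs a step only when its code or upstream dependencies change.
library(targets)

tar_option_set(packages = c("readr", "dplyr"))

list(
  tar_target(raw_data, readr::read_csv("data/sites.csv")),          # import
  tar_target(clean_data, dplyr::filter(raw_data, !is.na(value))),   # clean
  tar_target(model, lm(value ~ year, data = clean_data))            # analyse
)
```

Running `tar_make()` builds the whole pipeline; run it again after editing only the model step, and the import and cleaning targets are skipped because they are still up-to-date.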
For some years, I have been using the drake package. But drake has been superseded: targets is the new pipeline toolkit for R, and I've been planning to migrate from drake to targets for a while.
Pipeline toolkits work very well in conjunction with the Neotoma database, as downloading data is slow.
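As a rough illustration of why the download step dominates, here is a hedged sketch using the neotoma2 package. The function names (`get_sites()`, `get_datasets()`, `get_downloads()`, `samples()`) are from the neotoma2 API, but the bounding box and dataset type here are illustrative assumptions, not the demonstration project's actual query.

```r
# Sketch only: an example Neotoma query (coordinates and dataset type are
# illustrative). Every call below hits the Neotoma API over the network.
library(neotoma2)

sites    <- get_sites(loc = c(-140, 45, -110, 65))        # lon/lat bounding box
datasets <- get_datasets(sites, datasettype = "pollen")   # pollen datasets only
records  <- get_downloads(datasets)                       # the slow, expensive step
counts   <- samples(records)                              # long-format count table
```

Wrapping `get_downloads()` in a target means its results are cached, so you only pay the network cost when the query itself changes.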
I have written a demonstration project that uses targets to:
- Download data from Neotoma
- Process and (minimally) clean the data
- Run a simple analysis
- Make some figures
- Render a manuscript containing the results and the figures using R Markdown
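The steps above can be sketched as one target each. This is a hedged outline, not the project's actual `_targets.R`: the helper functions (`download_neotoma()`, `clean_data()`, `run_analysis()`, `make_figure()`) and the file `manuscript.Rmd` are placeholder names; `tar_render()` from the tarchetypes package is the real way to make an R Markdown document a target.

```r
# _targets.R -- one target per step of the workflow.
# Function and file names are placeholders for the real project.
library(targets)
library(tarchetypes)

tar_option_set(packages = c("neotoma2", "dplyr", "ggplot2"))
tar_source()  # load helper functions defined in R/

list(
  tar_target(raw_records,   download_neotoma()),          # 1. download from Neotoma
  tar_target(clean_records, clean_data(raw_records)),     # 2. process and clean
  tar_target(fit,           run_analysis(clean_records)), # 3. simple analysis
  tar_target(figures,       make_figure(fit)),            # 4. figures
  tar_render(manuscript, "manuscript.Rmd")                # 5. render the manuscript
)
```

Because the manuscript target depends on the analysis and figure targets, editing only the prose re-renders the document without re-downloading or re-fitting anything.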
If the site inclusion rules are changed, only new sites will be downloaded, making the download process much faster after the first run through.
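One way to get that behaviour in targets is dynamic branching, where each site becomes its own branch of a target. This is a sketch under assumptions: `select_sites()` is a hypothetical helper that applies the inclusion rules, while `pattern = map()` is the real targets branching mechanism.

```r
# Hedged sketch: one download branch per site id. When the inclusion rules
# admit new sites, only the new branches run; existing downloads stay cached.
library(targets)

tar_option_set(packages = "neotoma2")

list(
  tar_target(site_ids, select_sites()),       # hypothetical: apply inclusion rules
  tar_target(record,
             get_downloads(site_ids),
             pattern = map(site_ids))         # one branch per site id
)
```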
The project is available on GitHub as a template. Make a copy, give it a try, and send me some feedback.