(9) Automation, Part I

— LAST YEAR’S CONTENT BELOW —

26.37 Why Automate?

Because you’re going to have to rerun your analysis.

This can be a pain if you have multiple files and scripts!

According to Shaun Jackman and Jenny Bryan:

Automate a pipeline

… to reproduce previous results.
… to recreate results deleted by fat fingers.
… to rerun the pipeline with updated software.
… to run the same pipeline on a new data set.

26.38 Non-interactive programming

Run an R script from top to bottom:

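  • source() / the “Source” button in RStudio
  • rmarkdown::render() / the “Knit” button in RStudio
  • Rscript on a script file, or Rscript -e on a single expression, from the command line.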
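
As a rough illustration of the command-line route, the sketch below writes a tiny throwaway script and runs it from top to bottom with Rscript. The file name hello_script.R and its contents are made up for this example, and the check for Rscript on the PATH is only there so the snippet does not fail on a machine without R.

```shell
# Write a tiny throwaway R script (hypothetical name and contents).
cat > hello_script.R <<'EOF'
x <- 1:10
cat("mean of x:", mean(x), "\n")
EOF

if command -v Rscript >/dev/null 2>&1; then
  # Run the whole script file non-interactively, top to bottom.
  Rscript hello_script.R
  # Run a single expression without needing a script file.
  Rscript -e 'cat(sum(1:10), "\n")'
else
  echo "Rscript not found on PATH"
fi
```

The same -e form can wrap a call such as rmarkdown::render() on an R Markdown file, which is the command-line counterpart of the “knit” button.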

26.39 Pipelines

Let’s take a look at Shaun Jackman’s slides (scroll down).

26.40 Test Drive Make

Complete the “Test drive make” activity to see if you have make installed.

Windows machines: Some options for installation.
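
Separately from the linked activity, one quick way to check from a shell is to ask make for its version. This is a minimal sketch; the variable name make_status is just a label chosen for this example.

```shell
# Check whether a make executable is on the PATH.
if command -v make >/dev/null 2>&1; then
  # Print the first line of the version banner.
  make --version | head -n 1
  make_status="installed"
else
  echo "make not found on PATH"
  make_status="missing"
fi
echo "make is $make_status"
```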

26.41 Makefile Structure

Each block of code in a Makefile is called a rule. It looks something like this:

file_to_create: files.it depends.on like_this.R
    code to be run in the command line
    that can have multiple lines of code
    Rscript like_this.R
  • file_to_create is a target, a file to be created, or built.
  • files.it, depends.on, and like_this.R are dependencies, files which are needed to build or update the target. Targets can have zero or more dependencies.
  • : separates targets from dependencies.
  • code to be run in the command line, …, Rscript like_this.R are actions, commands to run to build or update the target using the dependencies. Targets can have zero or more actions. Actions are indented using the TAB character, not spaces.
  • Together, the target, dependencies, and actions form a rule.

(Thanks to contributions from Tiffany Timbers here!)
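
To make the anatomy above concrete, here is a small sketch that writes a one-rule Makefile and runs it. Everything in it is hypothetical (the make_rule_demo directory, input.txt, output.txt, and the tr command used as the action). The Makefile is written with printf so that the action line is guaranteed to start with a real TAB character.

```shell
# Scratch directory and file names are hypothetical.
mkdir -p make_rule_demo
printf 'raw text\n' > make_rule_demo/input.txt

# One rule: target "output.txt", dependency "input.txt",
# and one action (indented with \t, a real TAB) that upper-cases the input.
printf 'output.txt: input.txt\n\ttr a-z A-Z < input.txt > output.txt\n' \
  > make_rule_demo/Makefile

if command -v make >/dev/null 2>&1; then
  # First call: output.txt does not exist yet, so the action runs.
  make -C make_rule_demo output.txt
  # Second call: the target is not older than its dependency,
  # so make should report that there is nothing to do.
  make -C make_rule_demo output.txt
else
  echo "make not found on PATH"
fi
```

Replacing the \t with spaces is a quick way to see the error make gives when an action is not TAB-indented.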

26.42 LOTR Pipeline Examples

We’ll look at 3 pipelines that do the same thing in different ways: get data -> clean data -> extract relevant data.
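
The get -> clean -> extract shape can be sketched as three chained rules, where each target is the dependency of the next. This is not the contents of the course examples: the lotr_sketch directory, the file names, the toy CSV rows, and the cp/grep actions are all stand-ins invented for this illustration (the real pipelines would more likely call Rscript).

```shell
# Toy stand-in for the data source; names and rows are made up.
mkdir -p lotr_sketch
printf 'name,race\nFrodo,hobbit\n,\nLegolas,elf\n' > lotr_sketch/source.csv

# Three chained rules plus an "all" target listed first,
# so that a bare `make` builds the final file.
{
  printf 'all: subset.csv\n\n'
  # get data: copy the source into place
  printf 'raw.csv: source.csv\n\tcp source.csv raw.csv\n\n'
  # clean data: drop rows that start with an empty field
  printf 'clean.csv: raw.csv\n\tgrep -v "^," raw.csv > clean.csv\n\n'
  # extract relevant data: keep the header and the hobbit rows
  printf 'subset.csv: clean.csv\n\tgrep -e "^name" -e "hobbit" clean.csv > subset.csv\n'
} > lotr_sketch/Makefile

if command -v make >/dev/null 2>&1; then
  # make works backwards from subset.csv and runs the rules in dependency order.
  make -C lotr_sketch
else
  echo "make not found on PATH"
fi
```

Because each step declares what it depends on, changing source.csv and rerunning make rebuilds everything downstream of it, which is the main payoff of writing the pipeline as rules rather than as a single script.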

26.42.1 Download

Download the cm109-automation_examples.zip file to your participation folder for today, and unzip it.

26.42.2 Test out the automation

We’ll test out the functionality of each pipeline, guided by the suggestions in each activity’s README.

Overall goal: run the pipeline for each activity.