Class Meeting 3 Authoring

Communication of a data analysis is just as important as the analysis itself. Today, we’ll be looking at tools for writing about your analysis.

Announcements:

  • The add/drop deadline for Stat 545A is on Wednesday Sep. 11
  • Hang tight – the Canvas slot for Assignment 1 is coming shortly.

3.1 Learning Objectives

By the end of today’s class, students are expected to be able to:

  • Write documents in markdown on GitHub and RStudio, and render these documents to html and pdf with RStudio.
  • Choose whether html or pdf is an appropriate output
  • Style an Rmd document by editing the YAML header
  • Demonstrate at least two Rmd code chunk options
  • Make presentation slides using one of the R Markdown presentation formats.

3.2 Resources

Cheat sheets for “quick reference”:

Further reading:

Other explorations of this content:

3.3 Topic 1: Output Formats (5 min)

Two non-proprietary file types are most commonly used to display manuscripts:

  1. pdf: This is useful if you intend to print your work onto a physical sheet of paper, or for presentation slides. If neither is the primary purpose, avoid pdf, because formatting content so that it fits the page is far more effort than it’s worth.
  2. html: This is what you see when you visit a webpage. Content does not need to be partitioned to pages.

We won’t be using proprietary file types, like MS Word. Among many reasons, they are poorly suited to integrating reproducible code into a document and to supporting a dynamic analysis.

Others that we won’t be covering:

  • Jupyter notebooks (actually a JSON file)
  • LaTeX

We’ll be treating pdf and html files as output that should not be edited. In fact, pdf documents are not even easy to edit, and even if you do pay for the Adobe add-on to edit the files, this is not a reproducible workflow.

What’s the source, then? (R) Markdown! We’ll be discussing this next.

3.4 Topic 2: Markdown

(3 min)

Markdown is plain text with an easy, readable way of marking up your text. Let’s see GitHub’s cheat sheet. Various tools convert markdown to either pdf or html.

File extension: .md
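
As a rough illustration of what marked-up text looks like, here is a small sample using a few common features (headings, emphasis, lists, a link, and inline code). The content is made up for this example; GitHub’s cheat sheet remains the fuller reference.

```markdown
# Navigating GitHub

Some notes from the *first* class, with **bold** for emphasis.

## Things I can do on GitHub

- Create a repository
- Edit a file in the browser
- Commit changes with a short message

1. First step
2. Second step

A [link to GitHub](https://github.com) and some `inline code`.
```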

3.4.1 Activity: Modify navigating_github.md (5 min)

Together:

  1. Open your navigating_github.md file that we made in the first class.
  2. Mark up the text with some markdown features.
  3. Commit your changes.

Notice that GitHub automatically displays markdown files nicely, but not HTML files.

3.4.2 Activity: Render navigating_github.md (5 min)

N.B.: this exercise employs an ineffective local workflow (downloading a zip file, then dragging and dropping files), which we will address next class.

Together:

  1. Download the contents of your GitHub participation repository as a zip file.
  2. In RStudio, open the file navigating_github.md.
    • Yes! RStudio also acts as a plain text editor!
  3. Convert the .md file to both pdf and html by clicking the appropriate button under the “Preview” tab.
  4. Push the two new files to GitHub (by dragging and dropping the files onto your participation repo).

3.5 Topic 3: R Markdown

(2 min)

R Markdown (Rmd) is a “beefed up” version of markdown – it has many more features built into it, two important ones being:

  • We can specify more features in a YAML header.
    • This contains metadata about the document to guide how the Rmd document is rendered.
  • We can integrate code into a document.

Here’s RStudio’s cheat sheet on Rmd. You can see that it certainly has more features than “regular” markdown!

3.5.1 Activity: getting set up with R packages (5 min)

(Includes what we missed from last class)

To get started with using R Markdown, you’ll need to install the rmarkdown R package. The activity we have also depends on the gapminder, tibble, and DT packages.

Together:

  1. To install these packages, in any R console, run the following:
install.packages('rmarkdown')
install.packages('gapminder')
install.packages('tibble')
install.packages('DT')

“Official” R packages are stored on and retrieved from CRAN.

  2. Check out vignettes for the tibble package by running browseVignettes(package = "tibble").
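
If you are unsure which of these packages are already on your machine, something like the following sketch may help. The helper name missing_packages() is made up for this example; it relies only on base R’s requireNamespace(), and the install line is left commented out so that nothing is downloaded unless you choose to.

```r
# Packages used in today's activities (taken from the install commands above).
pkgs <- c("rmarkdown", "gapminder", "tibble", "DT")

# Hypothetical helper: return the packages in `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install only what is missing (downloads from CRAN):
# if (length(to_install) > 0) install.packages(to_install)
```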

3.5.2 Activity: exploring code chunks (15 min)

Last class, we explored data frames. This time, we’ll explore tibbles, but within code chunks in an R Markdown document.

Together:

  1. Open RStudio’s Rmd boilerplate by going to “File” -> “New File” -> “R Markdown” in RStudio. Explore!
  2. Scrap everything below the YAML header.
  3. Add a code chunk below the YAML header via “Insert” -> “R”. Or, by:
    • Mac: Cmd + Option + I
    • Windows: Ctrl + Alt + I
  4. Load the gapminder, tibble, and DT packages using the library() function, by adding the following code to your code chunk:
library(gapminder)
library(tibble)
library(DT)
  5. Print out the gapminder data frame to explore the output. Then, in a new code chunk, convert the mtcars data frame to a tibble using the tibble::as_tibble() function. Try out the DT::datatable() function on a data frame!
  6. Add some markdown commentary to this comparative analysis.
  7. Add an in-line code chunk specifying the number of rows of the mtcars dataset.
  8. “Knit” to html and pdf.
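
For reference, the chunk contents for the steps above might look roughly like this. It is only a sketch: the column name "model" and the object names are our own choices, and the DT step is skipped if that package is not installed.

```r
library(tibble)

# mtcars ships with base R as a plain data frame with row names.
class(mtcars)

# as_tibble() drops row names by default; `rownames = "model"` keeps them
# as a regular column (the column name "model" is our choice).
mtcars_tbl <- as_tibble(mtcars, rownames = "model")
mtcars_tbl  # prints only the first rows, plus column types

# The value an in-line chunk such as `r nrow(mtcars)` would insert in the text.
nrow(mtcars)

# DT::datatable() builds an interactive table (an htmlwidget) for html output.
if (requireNamespace("DT", quietly = TRUE)) {
  widget <- DT::datatable(mtcars_tbl)
}
```

Printing mtcars and mtcars_tbl side by side is a quick way to see the difference the activity is after: the data frame prints every row, while the tibble prints a short preview.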

Note: knitr integrates the code into the document. The actual conversion here is Rmd -> md -> pdf/html.
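
The same conversion can be run from the R console with rmarkdown::render(), which is what the “Knit” button calls. The sketch below writes a tiny Rmd file to a temporary folder and renders it to html; the file name and contents are made up for illustration, and it assumes pandoc is available (RStudio bundles it). Rendering to pdf would additionally need a LaTeX installation.

```r
library(rmarkdown)

# Three backticks, built this way only to keep this listing readable.
fence <- strrep("`", 3)

# A minimal Rmd document: YAML header plus one code chunk.
rmd_lines <- c(
  "---",
  "title: \"Render demo\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "nrow(mtcars)",
  fence
)
rmd_path <- file.path(tempdir(), "render_demo.Rmd")
writeLines(rmd_lines, rmd_path)

# knitr runs the chunks (Rmd -> md), then pandoc converts the result (md -> html).
html_path <- render(rmd_path, output_format = "html_document", quiet = TRUE)
file.exists(html_path)
```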

3.5.3 Activity: exploring the YAML header (10 min)

(Note: If you’ve “fallen off the bus” from the last exercise, here’s a “bus stop” for you to get back on – just start a new Rmd file and use the boilerplate content while we work through this exercise.)

Now, we’ll modify the metadata via the YAML header. Check out a bunch of YAML options from the R Markdown book.

Together, in an Rmd file (ideally the one from the previous exercise):

  1. Change the output to html_document. We’ll be specifying settings for the html document, so this needs to go on a new line after the output: field:
output:
  html_document:
    SETTINGS
    GO
    HERE
  2. Add the following settings:
    • Keep the md intermediate file with keep_md: true
    • Add a theme. My favourite is cerulean: theme: cerulean
    • Add a table of contents with toc: true
    • Make the toc float: toc_float: true.
  3. Knit the results (you may have to delete the pdf, because it is no longer up to date!)
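
Put together, the header from this exercise might end up looking like the following. The title is a placeholder of our own; the four settings are the ones listed above. Indentation matters in YAML, so each setting sits two spaces deeper than html_document.

```yaml
---
title: "Tibble exploration"
output:
  html_document:
    keep_md: true
    theme: cerulean
    toc: true
    toc_float: true
---
```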

3.5.4 Activity: exploring chunk options (5 min)

(Bus stop! Couldn’t get previous exercises to work? No problem, just start a fresh R Markdown document with File -> New File -> R Markdown)

Just like YAML is metadata for the Rmd document, code chunk options are metadata for the code chunk. Specify them within the {r} at the top of a code chunk, separated by commas.

Together, in an Rmd file (ideally the same one we’ve been working on):

  1. Hide the code from the output with echo = FALSE.
  2. Prevent warnings from the chunk that loads packages with warning = FALSE.
  3. Knit the results.
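
The steps above set options on one chunk at a time, in its header, for example {r, echo = FALSE}. A related approach, not required for this activity, is to set the same options once as defaults for all chunks with knitr::opts_chunk$set(), as RStudio’s boilerplate does in its setup chunk. A minimal sketch:

```r
library(knitr)

# Set defaults for every chunk that follows in the document.
# An option written in an individual chunk header, e.g. {r, echo = TRUE},
# still overrides these defaults for that chunk.
opts_chunk$set(echo = FALSE, warning = FALSE)

# Inspect the current defaults.
opts_chunk$get("echo")
opts_chunk$get("warning")
```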

3.6 Topic 4: Rmd Presentations

(3 min)

You can also make presentation slides using Rmd. A great resource is Yihui’s Rmd book, “Presentations” section.

Some types of formats:

3.6.1 Activity: exploring ioslides (10 min)

Let’s turn the file we’ve been working on into slides.

Together:

  1. In RStudio, go to “File” -> “New File” -> “R Markdown” -> “Presentation” -> “ioslides”. Explore!
  2. Clear everything below the YAML header.
  3. Copy and paste the tibble exploration we’ve been working on (without the YAML header), and turn it into slides.
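
As a rough guide to the slide structure, here is a skeleton of an ioslides file. The title and slide text are placeholders. In this format a level-one heading (#) makes a section slide and each level-two heading (##) starts a new slide, so pasted content mostly needs ## headings added where slides should break.

```markdown
---
title: "Tibble exploration"
output: ioslides_presentation
---

# A section divider

## First slide

- A bullet point
- Another bullet point

## Second slide

Text for the second slide goes here.
```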

3.7 Wrap-up (3 min)

Push the following files to your GitHub repo:

  1. navigating_github.md and its output formats.
  2. The Rmd exploration and its output formats.
  3. The Rmd presentation slides exploration and its output formats.