Class Meeting 8 Intro to plotting with ggplot2, Part II

8.1 Orientation

8.1.1 Worksheet

You can find a worksheet template for today here.

8.1.2 Announcements

From Assignment 3 onwards, whenever you produce an HTML file, you must link to a rendered version of the file. We’ll cover this today.

8.1.3 Today

  • GitHub Pages (15 min)
  • Continue with ggplot2: a tour of some important geoms (20 min)
  • ggplot2 exercises from the worksheet (40 min)

8.2 Participation repository and GitHub Pages (15 min)

8.2.1 GitHub Pages

You can turn your GitHub repository into a website by enabling GitHub Pages on that repo. This is useful for things as small as displaying HTML files without needing a local copy of the repository, and for things as big as a full-fledged website like stat545.stat.ubc.ca.

  • If you make a repo called yourusername.github.io, and enable GitHub Pages on that repo, then the URL of the website will be https://yourusername.github.io/.
  • If you enable GitHub Pages on any other repo, the URL of that repo's website will be https://yourusername.github.io/name_of_other_repo.

Learn more with GitHub’s GitHub Pages tutorial.

8.2.2 Practice with HTML file linking

We’ll practice linking to an HTML file for today’s exercise by following the instructions in the (new!) “Viewing and Linking to HTML Files” section of the assignments home page.

8.3 A tour of some important geoms (20 min)

Here, we’ll explore some common plot types, and how to produce them with ggplot2.

8.3.1 Histograms: geom_histogram()

Useful for depicting the distribution of a continuous random variable. Partitions the number line into bins of certain width, counts the number of observations falling into each bin, and erects a bar of that height for each bin.

Required aesthetics:

  • x: A numeric vector.

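By default, a histogram plots the count on the y-axis. stat_bin() does not compute a prop variable, so to show the proportion of observations in each bin, map y = after_stat(count / sum(count)) (requires ggplot2 3.3.0 or later). To show density instead, use y = after_stat(density), written y = ..density.. in older code.

You can change the smoothness of the plot via either of two arguments:

  • bins: the number of bins/bars shown in the plot.
  • binwidth: the width of the bins shown in the plot.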

Example:
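A minimal sketch, assuming the built-in faithful dataset (not necessarily the data used in class). It draws a count histogram with a fixed number of bins, then a second version with proportions on the y-axis. The object names p_hist and p_hist_prop are only for illustration.

```r
library(ggplot2)

# `faithful` ships with R: waiting times between Old Faithful eruptions.
# Count histogram, with the number of bins chosen by hand.
p_hist <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(bins = 20)
p_hist

# Same data, bins specified by width instead, and the proportion of
# observations in each bin on the y-axis (after_stat() needs ggplot2 >= 3.3.0).
p_hist_prop <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth = 5)
p_hist_prop
```

Try a few values of bins or binwidth: too few bins hides structure, too many makes the plot noisy.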

8.3.2 Density: geom_density()

Essentially, a “smooth” version of a histogram. Uses kernels to produce the curve.

Required aesthetics:

  • x: A numeric vector.

Good to know:

  • The bw argument sets the kernel bandwidth, which controls the smoothness: smaller values give a rougher curve.

Example:
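A minimal sketch, again assuming the built-in faithful dataset as a stand-in. The two plots differ only in bw, so you can see the effect of the bandwidth; the values 1 and 5 are arbitrary choices for illustration.

```r
library(ggplot2)

# Small bandwidth: a rough, wiggly density estimate.
p_dens_rough <- ggplot(faithful, aes(x = waiting)) +
  geom_density(bw = 1)
p_dens_rough

# Larger bandwidth: a smoother estimate of the same data.
p_dens_smooth <- ggplot(faithful, aes(x = waiting)) +
  geom_density(bw = 5)
p_dens_smooth
```

If you leave bw out, ggplot2 picks a default bandwidth for you.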

8.3.3 Jitter plots: geom_jitter()

A scatterplot, but with a small random perturbation added to each point. Useful for scatterplots where points overlap, or when one variable is categorical.

Required aesthetics:

  • x: any vector
  • y: any vector

Example:
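A minimal sketch, assuming the built-in iris dataset as a stand-in: a categorical variable on the x-axis and a numeric one on the y-axis. The width, height, and alpha values are arbitrary choices for illustration.

```r
library(ggplot2)

# Jitter horizontally only (height = 0), so the numeric y values stay exact.
# alpha adds transparency, which helps where points still overlap.
p_jitter <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.5)
p_jitter
```

Because the perturbation is random, the plot looks slightly different each time you draw it.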

8.3.4 Box plots: geom_boxplot()

This geom makes a boxplot of a numeric variable within each category. Useful for comparing distributions across different categories.

Required aesthetics:

  • x: A factor (categorical variable)
  • y: A numeric variable

Example:
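A minimal sketch, assuming the built-in iris dataset as a stand-in: one box per species, summarizing sepal length.

```r
library(ggplot2)

# x is the categorical variable, y is the numeric variable.
p_box <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()
p_box
```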

8.3.5 Ridge plots: ggridges::geom_density_ridges()

A (superior?) alternative to the boxplot, the ridge plot (also known as the joy plot) draws a kernel density estimate for each group in place of the box.

You’ll need to install the ggridges package. You can do lots more with ridges; check out the ggridges intro vignette.

Required aesthetics (reversed from boxplots!):

  • x: A numeric variable
  • y: A factor (categorical variable)

Example:

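A minimal sketch, assuming the ggridges package is installed and using the built-in iris dataset as a stand-in. Note that the aesthetics are swapped relative to the boxplot. When the plot is drawn, ggridges prints a "Picking joint bandwidth of ..." message; the value depends on the data.

```r
library(ggplot2)
library(ggridges)

# x is the numeric variable, y is the categorical variable:
# one density ridge per species.
p_ridge <- ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges()
p_ridge
```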

8.3.6 Bar plots: geom_bar() or geom_col()

These geoms erect a bar over each category.

geom_bar() automatically determines the height of the bar according to the count of each category.

geom_col() requires a manual specification of the bar heights.

Required aesthetics:

  • x: A categorical variable
  • y: A numeric variable (only required for geom_col()!)
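    • To show proportions in geom_bar() instead of counts, set y = ..prop.. (after_stat(prop) in ggplot2 3.3.0 and later) together with group = 1. Without group = 1, each bar is its own group and every proportion is 1.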

Example: number of 4-, 6-, and 8- cylinder cars in the mtcars dataset:
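A minimal sketch of both geoms on mtcars. geom_bar() counts the rows itself, while geom_col() is given a table of counts computed beforehand. The helper table cyl_counts and the plot names are only for illustration.

```r
library(ggplot2)

# geom_bar(): heights are the number of rows in each category.
# cyl is stored as a number, so convert it to a factor first.
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()
p_bar

# geom_col(): heights are supplied in the data, here as the Freq column.
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))
p_col <- ggplot(cyl_counts, aes(x = cyl, y = Freq)) +
  geom_col()
p_col

# Proportions instead of counts: group = 1 makes the proportions sum to 1
# across all bars (after_stat() needs ggplot2 >= 3.3.0).
p_prop <- ggplot(mtcars, aes(x = factor(cyl), y = after_stat(prop), group = 1)) +
  geom_bar()
p_prop
```

The first two plots should look the same apart from the axis labels.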

8.3.7 Line charts: geom_line()

A line plot connects points with straight lines, from left-to-right. Especially useful if time is on the x-axis.

Required aesthetics:

  • x: a variable having some ordering to it.
  • y: a numeric variable.

Although not required, the group aesthetic will come in handy here. This aesthetic draws a separate line for each group and overlays the results on the same plot.
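A minimal sketch of the group aesthetic, assuming the built-in Orange dataset as a stand-in: the circumference of five orange trees measured at several ages. Without group = Tree, all measurements would be joined into a single zig-zag line.

```r
library(ggplot2)

# One growth curve per tree, all drawn on the same axes.
p_line <- ggplot(Orange, aes(x = age, y = circumference, group = Tree)) +
  geom_line()
p_line
```

Mapping colour = Tree instead of group = Tree also splits the data into one line per tree, and adds a legend.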

8.3.8 Path plots: geom_path()

Like geom_line(), except that it connects points in the order in which they appear in the dataset, rather than from left to right.
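A minimal sketch of the difference, using a made-up spiral dataset (the data frame spiral is hypothetical, built just for this illustration). The rows are ordered along the spiral, so geom_path() traces it, while geom_line() reorders the points by x and produces a scribble.

```r
library(ggplot2)

# Points along a spiral, stored in the order they are visited.
theta <- seq(0, 4 * pi, length.out = 200)
spiral <- data.frame(x = theta * cos(theta), y = theta * sin(theta))

# geom_path() connects rows in data order: draws the spiral.
p_path <- ggplot(spiral, aes(x, y)) +
  geom_path()
p_path

# geom_line() connects points from left to right: not a spiral.
p_line_spiral <- ggplot(spiral, aes(x, y)) +
  geom_line()
p_line_spiral
```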

8.4 Activity: Fix the Plots (40 min)

Fill out the worksheet together.

8.5 Time remaining?

If so, let’s make tibbles with tibble(), and make a list column while we’re at it. Maybe even nest() and unnest().
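A minimal sketch of these ideas, assuming the tibble and tidyr packages are installed. The data (students and their marks) is made up for illustration.

```r
library(tibble)
library(tidyr)

# A tibble with a list column: each student has a vector of marks,
# and the vectors can have different lengths.
scores <- tibble(
  student = c("a", "b"),
  marks   = list(c(80, 90), c(70, 75, 85))
)
scores

# unnest() expands the list column: one row per mark.
long <- unnest(scores, cols = marks)
long

# nest() goes the other way: the marks column is packed into a list
# column called `data`, holding one small tibble per student.
nested <- nest(long, data = marks)
nested
```

Note that nest() does not give back the original scores tibble exactly: the list column now holds tibbles rather than plain vectors.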