WHEW! Okay, time for some plots, a much more rewarding endeavor. Plot making! Make sure that you have installed the ggplot2 and gapminder packages.
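If you are not sure whether both packages are installed, a small sketch like the one below installs only the ones that are missing. It uses base R's requireNamespace() and install.packages(); the variable names are just illustrative.

```r
# Packages used in this section
pkgs <- c("ggplot2", "gapminder")

# requireNamespace() quietly returns FALSE for packages that are not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Install only the missing ones
if (any(!installed)) {
  install.packages(pkgs[!installed])
}
```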

Basics - intro to the grammar of graphics

Here is a brief description of the basic building blocks of creating a ggplot.

Component          Description
data               the data, as a data.frame (long format!)
aesthetic (aes)    mapping of variables to visual properties: position, colour, line type, size
geom               the actual visualisation of the data
scale              maps data values to aesthetics such as colour, size and shape (shows up as legends and axes)
stat               statistical transformations and summaries of the data (e.g., line fits)
facet              splitting the data across panels based on different subsets of the data
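To see how these pieces fit together, here is one sketch that uses every component from the table in a single call. The particular choices (a log-scaled x axis, a linear fit, faceting by continent) are just illustrative and are not needed for the rest of the section.

```r
library(ggplot2)
library(gapminder)

p_all <- ggplot(data = gapminder,                      # data
                aes(x = gdpPercap, y = lifeExp,        # aesthetics
                    colour = continent)) +
  geom_point(alpha = 0.3) +                            # geom
  stat_smooth(method = "lm", formula = y ~ x,          # stat: linear fit per continent
              se = FALSE) +
  scale_x_log10() +                                    # scale: log10 x axis
  facet_wrap(~ continent)                              # facet: one panel per continent

p_all
```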

Let’s start with a basic scatterplot of life expectancy over time. You’ll notice that we tell ggplot that we will be using the gapminder data (a data.frame!) and then that we want year on the x-axis and life expectancy on the y-axis. After that, we use + to add another layer: in this case, points.

# Load ggplot2
library(ggplot2)

# Load gapminder
library(gapminder)
# Basic scatterplot
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
  geom_point()
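Because the first two arguments of ggplot() are data and mapping, and the first two arguments of aes() are x and y, you will often see the same plot written more compactly. This sketch should draw the same scatterplot as above:

```r
library(ggplot2)
library(gapminder)

# Positional arguments: ggplot(data, mapping) and aes(x, y)
p_short <- ggplot(gapminder, aes(year, lifeExp)) +
  geom_point()

p_short
```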

Now let’s add some colour. Yes, 'colour' or 'color' can be used in the ggplot functions.

# We're going to colour the discrete variable continent
ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
  geom_point()
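To convince yourself that the two spellings are interchangeable, you can compare the mappings that aes() creates. ggplot2 standardises color to colour internally, so both calls below should produce the same aesthetic names.

```r
library(ggplot2)

m_uk <- aes(x = year, y = lifeExp, colour = continent)
m_us <- aes(x = year, y = lifeExp, color = continent)

# Both mappings end up using the aesthetic name "colour"
names(m_uk)
names(m_us)
```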

Just as you can assign vectors, data.frames, and other R objects to a variable, you can also assign ggplots to variables.

p <-
ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
  geom_point()

And as we’ve seen before, no plot has been produced because it has been stored as the variable p. To view our plot, we can just call that variable.

p
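A stored plot can also be extended later: adding a layer or setting to p with + returns a new ggplot object and leaves p itself unchanged. The axis and legend titles in this sketch are only examples.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
  geom_point()

# Build a new plot from p with nicer labels; p itself is not modified
p_labelled <- p + labs(x = "Year", y = "Life expectancy (years)", colour = "Continent")
p_labelled
```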


Layering

Here’s where we’re going to demonstrate the way that you add layers to build up a plot.

Notice that when you call ggplot() on its own, without any geoms, no data gets plotted! You also need to tell it what to add.

ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, colour = continent))
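One way to see why is to count the layers stored in the plot object: ggplot() by itself only records the data and the default aesthetics, and each geom added with + appends one layer. This sketch simply peeks at the layers element of the plot object.

```r
library(ggplot2)
library(gapminder)

base <- ggplot(data = gapminder,
               aes(x = year, y = lifeExp, group = country, colour = continent))
length(base$layers)        # 0: no layers yet, so no data is drawn

with_line <- base + geom_line()
length(with_line$layers)   # 1: the line layer
```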

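In this case, let’s add a line! Mapping group = country tells geom_line() to draw one line per country, instead of joining up every point in a continent into a single zigzag.

ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, colour = continent)) +
  geom_line()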

What if we want to add more than just a line? No problem: let ggplot know that you are going to add something else using the +. Let’s add some points.

ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, colour = continent)) +
  geom_line() +
  geom_point()

Okay, that’s great, but the points don’t really stand out against the colour of the lines. We can be more specific with our layering and aesthetics. Notice how I moved the colour aesthetic into the geom_line() function. You can think of the aesthetics listed in the ggplot() function as 'global' settings: they set the defaults for every geom that comes later in the plot, while aesthetics given inside a geom apply only to that layer.

ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(aes(colour = continent)) +
  geom_point()
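A related distinction: an aesthetic inside aes() is mapped to a variable, while the same argument given outside aes() sets one fixed value for the whole layer. Here is a sketch that keeps the continent-coloured lines but draws every point in a single dark grey (the colour and size values are arbitrary).

```r
library(ggplot2)
library(gapminder)

p_set <- ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(aes(colour = continent)) +        # mapped: colour depends on continent
  geom_point(colour = "grey20", size = 0.8)   # set: same colour and size for every point

p_set
```

Writing geom_point(aes(colour = "grey20")) instead would treat "grey20" as a data value and add a legend entry for it, which is a common surprise.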

Exercise 1

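Can you play with the order of the layers to create a plot where the coloured lines are drawn on top of the (black) points? (Solution at the end of this section.)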

Scales

Remember, scales are what ultimately produce your axes and the legends for any variables mapped to aesthetics.

What difference do you notice between these two graphs?

ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = log(gdpPercap))) +
  geom_point()

ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
  geom_point()

If you haven’t noticed yet, let’s look back at our data.

str(gapminder)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : num  1952 1957 1962 1967 1972 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

Cool! The default colour scale works differently for discrete and continuous variables: continent (a factor) gets a set of distinct hues, while log(gdpPercap) (numeric) gets a continuous gradient.
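You can confirm which kind of variable each one is with class(); this is what ggplot2 looks at when it picks a default scale.

```r
library(gapminder)

class(gapminder$continent)        # "factor": discrete, so a discrete colour scale
class(log(gapminder$gdpPercap))   # "numeric": continuous, so a colour gradient
```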

Ok, now that we’ve figured that out, we can change the colour scales for our variables. With your neighbour, see if you can change the colour scales that are being used. You’ll likely need to use the scales section of the ggplot2 cheatsheet and a little bit of googling. Let me know if you guys need a hint, but I want you to try first.

Exercise 2

(Solution at the end of this section.)

Exercise 3

Try using ggsave() to save one of your plots to a file! (Solution at the end of this section.)


Stats

Stats summarise or transform data. Under the hood, new data frames are created that contain these summaries/transformed data. For the most part, you don’t need to worry about this work happening under the hood.

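Every geom has a default stat. Some, like geom_point(), use the identity stat, which applies no transformation and plots the raw data; others summarise the data first. Some examples of stats include density (stat_density()), boxplot (stat_boxplot()) and smoothing (stat_smooth()).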
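If you are curious which stat a geom uses, you can peek at the layer object that the geom function returns. This sketch relies on ggplot2's internal layer objects, so treat it as a curiosity rather than a stable API.

```r
library(ggplot2)

# Each layer stores the stat it will use; the first class name identifies it
class(geom_point()$stat)[1]       # "StatIdentity": raw data, no transformation
class(geom_histogram()$stat)[1]   # "StatBin": counts observations in bins
class(geom_boxplot()$stat)[1]     # "StatBoxplot": computes quartiles and whiskers
```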

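# Default smoother on the raw (unlogged) data
ggplot(data = gapminder, aes(x = lifeExp, y = gdpPercap)) +
    geom_point() +
    stat_smooth()

# Swap the axes and put gdpPercap on a log10 scale; the scale transformation
# is applied before the stat, so the smoother is fitted to log10(gdpPercap)
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
    geom_point() +
    scale_x_log10() +
    stat_smooth()

# Same plot, but fit a straight line (linear model) instead of the default smoother
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
    geom_point() +
    scale_x_log10() +
    stat_smooth(method = 'lm')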

Advanced notes on stats in ggplot2
More coming on stats, but for now the plots above are all you need to know. The notes I will add here (and have started below) are a more advanced description of what’s going on with stats. For the most part, you can just plug and play.

We can start with a histogram of life expectancies across all of the continents. If you are really interested, you can read more in the ggplot appendices; I have yet to find a good description elsewhere.

ggplot(data = gapminder, aes(x = lifeExp)) + 
    geom_histogram(binwidth = 1) 

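Now, let’s add a density curve on top of this layer.

Stats calculate new variables from the data, such as count and density, that you can map to aesthetics. Here, y is mapped to the computed count for both layers. Because the bins are 1 year wide, the histogram counts and the density curve's count (density multiplied by the number of observations) end up on the same scale.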

ggplot(data = gapminder, aes(x = lifeExp, y = after_stat(count))) +
    geom_histogram(binwidth = 1, alpha = 0.7) + 
    stat_density(geom = "line", colour = 'red')
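The same idea works the other way round: instead of scaling the density up to counts, you can scale the histogram down to a density with after_stat(density), so the bars line up with geom_density(). A sketch (after_stat() needs ggplot2 3.3.0 or later):

```r
library(ggplot2)
library(gapminder)

p_dens <- ggplot(data = gapminder, aes(x = lifeExp)) +
  # Map y to the density computed by stat_bin instead of the raw count
  geom_histogram(aes(y = after_stat(density)), binwidth = 1, alpha = 0.7) +
  # geom_density() uses stat_density and plots its density variable by default
  geom_density(colour = "red")

p_dens
```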


Facetting

Facets allow you to visualise different subsets or groupings of your data on different panels within a single plot. For example, I can facet my data by continent (colouring to make each continent stand out better).

ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) + 
  geom_point() + 
  facet_wrap(~ continent)
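facet_wrap() also takes arguments that control the layout. For example, ncol sets the number of columns of panels and scales = "free_y" lets each panel choose its own y-axis range; the values below are just examples.

```r
library(ggplot2)
library(gapminder)

p_facets <- ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = continent)) +
  geom_point() +
  # Two columns of panels, each with its own y-axis limits
  facet_wrap(~ continent, ncol = 2, scales = "free_y")

p_facets
```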

I can also facet by two groupings: facet_grid(continent ~ year) puts continents in rows and years in columns. (Note that I subset the data to make this more manageable, and rotate the x-axis text to make it more readable.)

ggplot(data = subset(gapminder, year >= 1997), 
       aes(x = gdpPercap, y = lifeExp, colour = continent)) + 
  geom_point() + 
  facet_grid(continent ~ year) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Other tips

1) Visualising overlapping data

Let’s return to our boring basic scatter plot.

ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
  geom_point()

Sometimes, when you have many overlapping data points, you want a better idea of how much data sits at a given point. I find this often comes up when you are plotting discrete variables against continuous ones. There are two ways we can get a better look at the data: we can jitter the points or adjust their transparency.

# Using geom_jitter
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
    geom_jitter()

We could also use geom_point(position = position_jitter()) instead.

I would prefer if this plot didn’t have the points jittered so much. I can do this by changing the width and height of the jitter.

# Using geom_jitter
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
    geom_jitter(width = 0.5, height = 0.5)

Another option is to use transparency.

# Adjusting the transparency of points

ggplot(data = gapminder, aes(x = year, y = lifeExp)) + 
    geom_point(alpha = 0.5)
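The two approaches also combine well. This sketch jitters only horizontally (height = 0, so life expectancy values are not distorted), adds transparency, and passes a seed to position_jitter() so the jitter comes out the same every time the plot is drawn. The width, alpha and seed values are arbitrary.

```r
library(ggplot2)
library(gapminder)

p_overlap <- ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
  geom_point(
    alpha = 0.3,                                                   # semi-transparent points
    position = position_jitter(width = 1, height = 0, seed = 42)   # horizontal, reproducible jitter
  )

p_overlap
```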


2) Themes

The default ggplots are fine, but sometimes you want a more minimalist plot, or to tweak the size of text, the thickness or appearance of lines, margins, legends, etc. This is where themes come in. For the full list of theme settings, check out the ggplot2 theme docs.

One that I commonly use is the black and white theme.

ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) + 
  geom_point() + 
  theme_bw()
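If you want the same theme on every plot in a session, theme_set() changes the default instead of adding theme_bw() to each plot. It returns the previous theme, which this sketch keeps so that the default can be restored afterwards.

```r
library(ggplot2)
library(gapminder)

old_theme <- theme_set(theme_bw())   # make theme_bw() the default, remember the old one

p_default <- ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point()                       # drawn with theme_bw() without adding it explicitly
p_default

theme_set(old_theme)                 # restore the previous default theme
```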

I often make it without the grid lines.

ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) + 
  geom_point() + 
  theme_bw() + 
  theme(panel.grid = element_blank())
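Themes are also where the text size and legend placement mentioned above are controlled. Two common settings, sketched here with arbitrary values: the base_size argument of the complete themes scales all the text, and theme(legend.position = ...) moves the legend.

```r
library(ggplot2)
library(gapminder)

p_big <- ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point() +
  theme_bw(base_size = 16) +            # larger base font size for all text
  theme(panel.grid = element_blank(),   # no grid lines, as above
        legend.position = "bottom")     # legend below the panel

p_big
```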

Or, for my presentations, I use this custom-made white-on-black theme.

# Custom-made white-on-black theme
theme_wb <- function() {
  theme(panel.background = element_rect(fill = 'black'),
        plot.background = element_rect(fill = 'black'),
        axis.line = element_line(colour = 'white'),
        axis.text = element_text(colour = 'white'),
        axis.title = element_text(colour = 'white'),
        panel.grid = element_blank())
}

# My theme right now doesn't deal with legends, so for this demo I've
# removed the legend
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point() +
  theme_wb() +
  guides(colour = "none")
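One way to extend the theme so that the legend can stay is to give the legend elements the same black background and white text. This is a sketch: theme_wb_legend is a hypothetical name, and it repeats the settings of theme_wb() so that it works on its own.

```r
library(ggplot2)
library(gapminder)

# Hypothetical extension of the white-on-black theme that also styles the legend
theme_wb_legend <- function() {
  theme(panel.background = element_rect(fill = 'black'),
        plot.background = element_rect(fill = 'black'),
        axis.line = element_line(colour = 'white'),
        axis.text = element_text(colour = 'white'),
        axis.title = element_text(colour = 'white'),
        panel.grid = element_blank(),
        legend.background = element_rect(fill = 'black'),   # legend box
        legend.key = element_rect(fill = 'black'),          # background behind each key
        legend.text = element_text(colour = 'white'),       # continent names
        legend.title = element_text(colour = 'white'))      # legend title
}

p_wb <- ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point() +
  theme_wb_legend()

p_wb
```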


Exercise 1 Solution
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
    geom_point() +
    geom_line(aes(colour = continent))

Exercise 2 Solution
ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = log(gdpPercap))) +
  geom_point() + 
  scale_colour_gradientn(colours = topo.colors(10))

Exercise 3 Solution
p <- 
ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = log(gdpPercap))) +
  geom_point() + 
  scale_colour_gradientn(colours = topo.colors(10))

ggsave(filename = 'my_first_plot.png', plot = p)
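If width and height are not supplied, ggsave() uses the size of the current graphics device, so the saved file can change depending on your window. Setting width, height and units explicitly makes the output reproducible. The file name and dimensions in this sketch are just examples; the .pdf extension makes ggsave() write a vector PDF.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, colour = log(gdpPercap))) +
  geom_point() +
  scale_colour_gradientn(colours = topo.colors(10))

# Fixed size in inches; the file type is taken from the extension
ggsave(filename = "my_first_plot.pdf", plot = p, width = 8, height = 5, units = "in")
```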