You are on page 1of 50

Visualisation with R

and ggplot2
Hadley Wickham
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University

April 2011
Friday, April 15, 2011
1. Important info
2. Diving in: scatterplots & aesthetics
3. Facetting
4. Geoms
5. Histograms and barcharts

Friday, April 15, 2011


HE LLO
my name is

Hadley
Friday, April 15, 2011
http://had.co.nz/
courses/11-thatcamp

Friday, April 15, 2011


Diving in

Friday, April 15, 2011


Learning a new
language is hard!
Friday, April 15, 2011
Scatterplot basics
install.packages("ggplot2")
library(ggplot2)

?mpg
head(mpg)
str(mpg)
summary(mpg)

qplot(displ, hwy, data = mpg)

Friday, April 15, 2011


Scatterplot basics
install.packages("ggplot2")
library(ggplot2)

?mpg
head(mpg)
str(mpg)
Always explicitly
summary(mpg) specify the data

qplot(displ, hwy, data = mpg)

Friday, April 15, 2011


40


35 ●

● ●
● ●
● ●●

30 ● ● ●
● ● ● ● ●● ●
hwy

● ● ● ● ●
● ● ●● ● ● ● ●
● ● ● ●● ● ●● ●● ● ● ● ●

25 ● ● ● ●● ● ● ● ●
●● ●● ● ● ● ●
● ● ● ● ● ● ●
● ● ● ● ●
● ●

20 ● ● ●●
● ● ● ●● ●
● ● ● ● ● ●●
●● ●● ● ●● ● ● ● ● ● ●
●● ● ●

15 ● ●● ●●● ● ●
● ●

2 3 4 5 6 7
qplot(displ, hwy, data = mpg)
displ
Friday, April 15, 2011
Additional variables

Can display additional variables with


aesthetics (like shape, colour, size) or
facetting (small multiples displaying
different subsets)

Friday, April 15, 2011


40


35 ●

● ● class
● ●
● ●●
● 2seater
30 ● ● ● ● compact
● ● ● ● ●● ●
● midsize
hwy

● ● ● ● ●
● ● ●● ● ● ● ● ● minivan
● ● ● ●● ● ●● ●● ● ● ● ●

25 ● ● ● ●● ● ● ● ● ● pickup
●● ●● ● ● ● ●
● ● ● ● ● ● ●
● subcompact
● ● ● ● ● ● suv
● ●

20 ● ● ●●
● ● ● ●● ●
● ● ● ● ● ●●
●● ●● ● ●● ● ● ● ● ● ●
●● ● ●

15 ● ●● ●●● ● ●
● ●

2 3 4 5 6 7
qplot(displ, hwy, colour
displ = class, data = mpg)
Friday, April 15, 2011

40


35 ●

● ● class
● ●
● ●●
● 2seater
30 ● ● ● ● compact
● ● ● ● ●● ●
● midsize
hwy

● ● ● ● ●
● ● ●● ● ● ● ● ● minivan
● ● ● ●● ● ●● ●● ● ● ● ●

25 ● ● ● ●● ● ● ● ● ● pickup
●● ●● ● ● ● ●
● ● ● ● ● ● ●
● subcompact
● ● ● ● ● ● suv
● ●

20 ● ● ●●
● ● ● ●● ●
● ● ● ● ● ●● Legend chosen and
displayed automatically.
●● ●● ● ●● ● ● ● ● ● ●
●● ● ●

15 ● ●● ●●● ● ●
● ●

2 3 4 5 6 7
qplot(displ, hwy, colour
displ = class, data = mpg)
Friday, April 15, 2011
Your turn
Experiment with colour, size, and shape
aesthetics.
What’s the difference between discrete or
continuous variables?
What happens when you combine
multiple aesthetics?

Friday, April 15, 2011


Discrete Continuous

Rainbow of Gradient from


Colour
colours red to blue

Linear mapping
Discrete size
Size between radius
steps
and value

Different shape
Shape Doesn’t work
for each

Friday, April 15, 2011


Facetting

Friday, April 15, 2011


Faceting

Small multiples displaying different


subsets of the data.
Useful for exploring conditional
relationships. Useful for large data.

Friday, April 15, 2011


Your turn
qplot(displ, hwy, data = mpg) +
facet_grid(. ~ cyl)
qplot(displ, hwy, data = mpg) +
facet_grid(drv ~ .)
qplot(displ, hwy, data = mpg) +
facet_grid(drv ~ cyl)
qplot(displ, hwy, data = mpg) +
facet_wrap(~ class)

Friday, April 15, 2011


Summary

facet_grid(): 2d grid, rows ~ cols, . for


no split
facet_wrap(): 1d ribbon wrapped into 2d

Friday, April 15, 2011


Aside: workflow

Keep a copy of the slides open so that


you can copy and paste the code.

Friday, April 15, 2011


Geoms

Friday, April 15, 2011


What’s the ● ●

40 problem with ●

this plot? ● ●

35 ●

● ●
● ● ●
● ● ● ●

30 ● ● ●
● ● ● ● ● ●
hwy

● ● ●
● ● ● ● ●
● ● ● ● ● ●

25 ● ● ● ● ● ●
● ● ● ●
● ● ●
● ● ●

20 ● ● ●
● ● ●
● ● ● ●
● ● ● ● ●
● ● ●

15 ●

10 15 20 25 30 35
qplot(cty, hwy, data = mpg)
cty
Friday, April 15, 2011
● ●

40




●●
35

● ●
● ● ●

● ●●
● ● ●


30 ●


●●
● ●●●●
● ●●

● ● ●●●● ● ●
● ●● ●
hwy

●● ●
● ● ●● ● ●


● ●●● ● ●
● ●● ● ● ●
●●●● ● ●● ●
● ●
●● ●
●●●



● ●

● ●● ●●●
● ●
25 ● ●● ● ● ● ●
● ●● ●●
●●
● ●
● ●
● ●
●●● ●
● ●

● ● ●●

● ●


●● ●

20 ● ●● ● ●●

● ●●●●●●● ●● ●
● ●


●●● ●● ● ●
● ● ●● ●
●●● ●●● ●
● ●● ●●●
●● ●
● ●

●● ●●●
● ● ●●● ●


15 ●

●●●
● ●


●●
●●

10 15 20 25 30 35
qplot(cty, hwy, data = mpg,
cty geom = "jitter")
Friday, April 15, 2011
● ●

40




●●
35

● ●
● ● ●

● ●●
● ● ●


30 ●


●●
● ●●●●
● ●●

● ● ●●●● ● ●
● ●● ●
hwy

●● ●
● ● ●● ● ●


● ●●● ● ●
● ●● ● ● ●
●●●● ● ●● ●
● ●
●● ●
●●●



● ●

● ●● ●●●
● ●
25 ● ●● ● ● ● ●
● ●● ●●
●●
● ●
● ●
● ●
●●● ●
● ●

● ● ●●

● ●


●● ●

20 ● ●● ● ●●

● ●●●●●●● ●● ●
● ●


●●● ●● ● ●
● ● ●● ●
●●● ●●● ●
● ●● ●●●
●● ●
● ●

●● ●●●
● ● ●●● ●


15 ●

●●●
● ●


●●
geom controls
●●
“type” of plot
10 15 20 25 30 35
qplot(cty, hwy, data = mpg,
cty geom = "jitter")
Friday, April 15, 2011
● ●

40


35 ●

● ●
● ●
● ●

30 ● ●
● ● ●
hwy

● ● ●
● ● ● ●
● ● ● ● ●

25 ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ●
● ●

20 ● ● ●
● ●
● ●
● ● ●
● ●

15 ● ●

● ●

2seater compact midsize minivan pickup subcompact suv


qplot(class, hwy, data = mpg)
class
Friday, April 15, 2011
How could ● ●

we improve

40

this plot? ●

35 ●

Brainstorm ●





30
for 1 minute.
● ●
● ● ●
hwy

● ● ●
● ● ● ●
● ● ● ● ●

25 ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ●
● ●

20 ● ● ●
● ●
● ●
● ● ●
● ●

15 ● ●

● ●

2seater compact midsize minivan pickup subcompact suv


qplot(class, hwy, data = mpg)
class
Friday, April 15, 2011
● ●

40


35 ●

● ●
● ●
● ●

30 ● ●
● ● ●
hwy

● ● ●
● ● ● ●
● ● ● ● ●

25 ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ●
● ●

20 ● ● ●
● ●
● ●
● ● ●
● ●

15 ● ●

● ●

pickup suv minivan 2seater midsize subcompact compact


reorder(class, hwy)
Friday, April 15, 2011
● ●

40


35 ●

● ●
● ●
● ●

30 ● ●
● ● ●
hwy

● ● ●
● ● ● ●
● ● ● ● ●

25 ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ●
● ●

20 ● ● ●
● ●
● ●
● ● ●
● ●

15 ● ●

Incredibly useful

technique!
● ●

pickup suv minivan 2seater midsize subcompact compact


qplot(reorder(class, hwy), hwy,
reorder(class, hwy) data = mpg)
Friday, April 15, 2011
● ●

40



● ●
35



● ● ●
● ●
● ● ●
● ● ●

30 ● ● ●

● ● ● ●● ● ●● ● ●
●● ● ●
● ●
● ● ●● ● ●
● ●
● ● ●● ●
hwy

● ● ●● ●●
● ● ● ●● ● ●
● ● ●●● ● ●
● ●● ●● ● ●
● ●
● ● ●● ● ●
● ●●

● ● ● ● ●●
● ● ●
25 ● ● ● ● ●●
●●

●●
● ● ● ● ● ● ●
● ●
●● ● ●
● ●
● ● ● ●

● ● ●
● ●● ●


● ●
20 ●
● ●
● ● ●● ● ●
● ●● ●
●● ● ● ●●

● ●
● ●
● ●● ● ●●●
● ● ●●●●●
● ●●
● ● ● ●● ● ● ●
●● ● ● ●● ● ● ● ●
●● ● ● ● ●

● ● ●
15 ● ● ● ●● ● ●

● ● ● ● ●

pickup suv minivan 2seater midsize subcompact compact


qplot(reorder(class, hwy),reorder(class,
hwy, data hwy) = mpg, geom = "jitter")
Friday, April 15, 2011
● ●

40

35 ●

30
hwy


25 ●


20

15

● ●

pickup suv minivan 2seater midsize subcompact compact


qplot(reorder(class, hwy), hwy, data hwy)
reorder(class, = mpg, geom = "boxplot")
Friday, April 15, 2011

● ●

● ●

40

● ●
● ●

35 ● ● ●



● ● ● ●
● ● ●
● ●● ●
30 ● ● ●●●
●●
●●
● ● ●●● ● ●●
● ● ● ● ●● ● ●●
● ● ● ●
hwy


● ●
● ● ● ●● ●●
● ● ● ●
● ● ● ●
● ● ●●
● ● ● ●●● ● ●●
● ● ●● ●
●● ● ● ●

● ● ●
● ● ● ●● ●

25 ● ● ● ● ●● ● ●● ●
● ● ● ●
● ● ● ● ●
● ●●
● ● ● ● ●
● ●
● ● ● ● ●
● ●
● ●
● ● ●
● ● ●


20 ● ● ● ●● ●● ●
● ● ●

● ●● ● ● ●
●●●● ● ●

● ● ● ● ●

● ●
● ● ● ●
● ●●● ● ●● ●●● ● ●
●● ●●● ●● ●
● ●● ●
● ● ● ● ●
● ●
● ●● ●
● ● ●
15 ● ● ● ● ●● ●

● ●
● ● ●

qplot(reorder(class,
pickup suv
hwy), 2seater
minivan
hwy, data midsize
= subcompact
mpg, compact
geom = c("jitter", "boxplot"))
reorder(class, hwy)
Friday, April 15, 2011
Your turn

Read the help for reorder. Redraw the


previous plots with class ordered by
median hwy.
How would you put the jittered points on
top of the boxplots?

Friday, April 15, 2011


Aside: coding strategy

At the end of each interactive session, you


want a summary of everything you did. Two
options:
1. Save everything you did with savehistory()
then remove the unimportant bits.
2. Build up the important bits as you go.
(this is how I work)

Friday, April 15, 2011


Diamonds

Friday, April 15, 2011


Diamonds data
~54,000 round diamonds from
http://www.diamondse.info/
Carat, colour, clarity, cut
Total depth, table, depth,
width, height
Price

Friday, April 15, 2011


x
table width

depth = z / diameter
table = table width / x * 100

Friday, April 15, 2011


Histogram &
bar charts

Friday, April 15, 2011


Histograms and
barcharts

Used to display the distribution of a


variable
Categorical variable → bar chart
Continuous variable → histogram

Friday, April 15, 2011


Always
experiment with
the bin width!
Friday, April 15, 2011
Examples
# With only one variable, qplot guesses that
# you want a bar chart or histogram
qplot(cut, data = diamonds)

qplot(carat, data = diamonds)


qplot(carat, data = diamonds, binwidth = 1)
qplot(carat, data = diamonds, binwidth = 0.1)
qplot(carat, data = diamonds, binwidth = 0.01)
resolution(diamonds$carat)

last_plot() + xlim(0, 3)

Friday, April 15, 2011


Examples
# With only one variable, qplot guesses that
# you want a bar chart or histogram
qplot(cut, data = diamonds)

qplot(carat, data = diamonds)


qplot(carat, data = diamonds, binwidth = 1)
Common
qplot(carat, ggplot2
data = diamonds, binwidth = 0.1)
technique:
qplot(carat, adding
data = diamonds, binwidth = 0.01)
together plot
resolution(diamonds$carat)
components

last_plot() + xlim(0, 3)

Friday, April 15, 2011


qplot(table, data = diamonds, binwidth = 1)

# To zoom in on a plot region use xlim() and ylim()


qplot(table, data = diamonds, binwidth = 1) +
xlim(50, 70)
qplot(table, data = diamonds, binwidth = 0.1) +
xlim(50, 70)
qplot(table, data = diamonds, binwidth = 0.1) +
xlim(50, 70) + ylim(0, 50)

# Note that this type of zooming discards data


outside of the plot regions
# See coord_cartesian() for an alternative

Friday, April 15, 2011


Additional variables

As with scatterplots can use aesthetics


or faceting. Using aesthetics creates
pretty, but ineffective, plots.
The following examples show the
difference, when investigation the
relationship between cut and depth.

Friday, April 15, 2011


4000

3000
count

2000

1000

56 58 60 62 64 66 68 70
qplot(depth, data = diamonds,
depth binwidth = 0.2)
Friday, April 15, 2011
4000

3000

cut
Fair
Good
count

2000 Very Good


Premium
Ideal

1000

qplot(depth,
56 data
58 =60 diamonds,
62 64 binwidth
66 68 = 0.2,
70
fill = cut) + xlim(55,depth
70)
Friday, April 15, 2011
4000

3000

cut
Fair
Good
count

2000 Very Good


Premium
Ideal

1000

Fill is the aesthetic


0
for fill colour
qplot(depth,
56 data
58 =60 diamonds,
62 64 binwidth
66 68 = 0.2,
70
fill = cut) + xlim(55,depth
70)
Friday, April 15, 2011
Fair Good Very Good

2500

2000

1500

1000

500

0
count

Premium Ideal

2500

2000

1500

1000

500

0
qplot(depth, data
56 58 60 62 64 66= 68diamonds,
70 56 58 60 binwidth
62 64 66 68 =
70 0.2)
56 58 +
60 62 64 66 68 70

xlim(55, 70) + facet_wrap(~depth cut)


Friday, April 15, 2011
Your turn

Explore the distribution of price.


How does it vary with colour, or cut, and
clarity?
Practice zooming in on regions of interest.

Friday, April 15, 2011


Where
next?
Friday, April 15, 2011
Friday, April 15, 2011
This work is licensed under the Creative
Commons Attribution-Noncommercial 3.0 United
States License. To view a copy of this license,
visit http://creativecommons.org/licenses/by-nc/
3.0/us/ or send a letter to Creative Commons,
171 Second Street, Suite 300, San Francisco,
California, 94105, USA.

Friday, April 15, 2011

You might also like