Professional Documents
Culture Documents
and ggplot2
Hadley Wickham
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
April 2011
Friday, April 15, 2011
1. Important info
2. Diving in: scatterplots & aesthetics
3. Facetting
4. Geoms
5. Histograms and barcharts
Hadley
Friday, April 15, 2011
http://had.co.nz/
courses/11-thatcamp
?mpg
head(mpg)
str(mpg)
summary(mpg)
?mpg
head(mpg)
str(mpg)
Always explicitly
summary(mpg) specify the data
40
●
●
35 ●
●
● ●
● ●
● ●●
30 ● ● ●
● ● ● ● ●● ●
hwy
● ● ● ● ●
● ● ●● ● ● ● ●
● ● ● ●● ● ●● ●● ● ● ● ●
25 ● ● ● ●● ● ● ● ●
●● ●● ● ● ● ●
● ● ● ● ● ● ●
● ● ● ● ●
● ●
20 ● ● ●●
● ● ● ●● ●
● ● ● ● ● ●●
●● ●● ● ●● ● ● ● ● ● ●
●● ● ●
15 ● ●● ●●● ● ●
● ●
2 3 4 5 6 7
qplot(displ, hwy, data = mpg)
displ
Friday, April 15, 2011
Additional variables
40
●
●
35 ●
●
● ● class
● ●
● ●●
● 2seater
30 ● ● ● ● compact
● ● ● ● ●● ●
● midsize
hwy
● ● ● ● ●
● ● ●● ● ● ● ● ● minivan
● ● ● ●● ● ●● ●● ● ● ● ●
25 ● ● ● ●● ● ● ● ● ● pickup
●● ●● ● ● ● ●
● ● ● ● ● ● ●
● subcompact
● ● ● ● ● ● suv
● ●
20 ● ● ●●
● ● ● ●● ●
● ● ● ● ● ●●
●● ●● ● ●● ● ● ● ● ● ●
●● ● ●
15 ● ●● ●●● ● ●
● ●
2 3 4 5 6 7
qplot(displ, hwy, colour
displ = class, data = mpg)
Friday, April 15, 2011
●
40
●
●
35 ●
●
● ● class
● ●
● ●●
● 2seater
30 ● ● ● ● compact
● ● ● ● ●● ●
● midsize
hwy
● ● ● ● ●
● ● ●● ● ● ● ● ● minivan
● ● ● ●● ● ●● ●● ● ● ● ●
25 ● ● ● ●● ● ● ● ● ● pickup
●● ●● ● ● ● ●
● ● ● ● ● ● ●
● subcompact
● ● ● ● ● ● suv
● ●
20 ● ● ●●
● ● ● ●● ●
● ● ● ● ● ●● Legend chosen and
displayed automatically.
●● ●● ● ●● ● ● ● ● ● ●
●● ● ●
15 ● ●● ●●● ● ●
● ●
2 3 4 5 6 7
qplot(displ, hwy, colour
displ = class, data = mpg)
Friday, April 15, 2011
Your turn
Experiment with colour, size, and shape
aesthetics.
What’s the difference between discrete or
continuous variables?
What happens when you combine
multiple aesthetics?
Linear mapping
Discrete size
Size between radius
steps
and value
Different shape
Shape Doesn’t work
for each
40 problem with ●
this plot? ● ●
●
35 ●
●
● ●
● ● ●
● ● ● ●
30 ● ● ●
● ● ● ● ● ●
hwy
● ● ●
● ● ● ● ●
● ● ● ● ● ●
25 ● ● ● ● ● ●
● ● ● ●
● ● ●
● ● ●
●
20 ● ● ●
● ● ●
● ● ● ●
● ● ● ● ●
● ● ●
15 ●
●
10 15 20 25 30 35
qplot(cty, hwy, data = mpg)
cty
Friday, April 15, 2011
● ●
40
●
●
●
●●
35
●
● ●
● ● ●
●
● ●●
● ● ●
●
●
30 ●
●
●
●●
● ●●●●
● ●●
●
● ● ●●●● ● ●
● ●● ●
hwy
●● ●
● ● ●● ● ●
●
●
● ●●● ● ●
● ●● ● ● ●
●●●● ● ●● ●
● ●
●● ●
●●●
●
●
●
● ●
●
● ●● ●●●
● ●
25 ● ●● ● ● ● ●
● ●● ●●
●●
● ●
● ●
● ●
●●● ●
● ●
●
● ● ●●
●
● ●
●
●
●● ●
●
20 ● ●● ● ●●
●
● ●●●●●●● ●● ●
● ●
●
●
●●● ●● ● ●
● ● ●● ●
●●● ●●● ●
● ●● ●●●
●● ●
● ●
●
●● ●●●
● ● ●●● ●
●
●
15 ●
●
●●●
● ●
●
●
●
●●
●●
10 15 20 25 30 35
qplot(cty, hwy, data = mpg,
cty geom = "jitter")
Friday, April 15, 2011
● ●
40
●
●
●
●●
35
●
● ●
● ● ●
●
● ●●
● ● ●
●
●
30 ●
●
●
●●
● ●●●●
● ●●
●
● ● ●●●● ● ●
● ●● ●
hwy
●● ●
● ● ●● ● ●
●
●
● ●●● ● ●
● ●● ● ● ●
●●●● ● ●● ●
● ●
●● ●
●●●
●
●
●
● ●
●
● ●● ●●●
● ●
25 ● ●● ● ● ● ●
● ●● ●●
●●
● ●
● ●
● ●
●●● ●
● ●
●
● ● ●●
●
● ●
●
●
●● ●
●
20 ● ●● ● ●●
●
● ●●●●●●● ●● ●
● ●
●
●
●●● ●● ● ●
● ● ●● ●
●●● ●●● ●
● ●● ●●●
●● ●
● ●
●
●● ●●●
● ● ●●● ●
●
●
15 ●
●
●●●
● ●
●
●
●
●●
geom controls
●●
“type” of plot
10 15 20 25 30 35
qplot(cty, hwy, data = mpg,
cty geom = "jitter")
Friday, April 15, 2011
● ●
40
●
●
35 ●
●
● ●
● ●
● ●
30 ● ●
● ● ●
hwy
● ● ●
● ● ● ●
● ● ● ● ●
25 ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ●
● ●
20 ● ● ●
● ●
● ●
● ● ●
● ●
15 ● ●
●
● ●
we improve
●
40
this plot? ●
●
35 ●
●
Brainstorm ●
●
●
●
●
●
30
for 1 minute.
● ●
● ● ●
hwy
● ● ●
● ● ● ●
● ● ● ● ●
25 ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ●
● ●
20 ● ● ●
● ●
● ●
● ● ●
● ●
15 ● ●
●
● ●
40
●
●
35 ●
●
● ●
● ●
● ●
30 ● ●
● ● ●
hwy
● ● ●
● ● ● ●
● ● ● ● ●
25 ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ●
● ●
20 ● ● ●
● ●
● ●
● ● ●
● ●
15 ● ●
●
● ●
40
●
●
35 ●
●
● ●
● ●
● ●
30 ● ●
● ● ●
hwy
● ● ●
● ● ● ●
● ● ● ● ●
25 ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ●
● ●
20 ● ● ●
● ●
● ●
● ● ●
● ●
15 ● ●
Incredibly useful
●
technique!
● ●
40
●
●
● ●
35
●
●
●
● ● ●
● ●
● ● ●
● ● ●
●
30 ● ● ●
● ● ● ●● ● ●● ● ●
●● ● ●
● ●
● ● ●● ● ●
● ●
● ● ●● ●
hwy
● ● ●● ●●
● ● ● ●● ● ●
● ● ●●● ● ●
● ●● ●● ● ●
● ●
● ● ●● ● ●
● ●●
●
● ● ● ● ●●
● ● ●
25 ● ● ● ● ●●
●●
●
●●
● ● ● ● ● ● ●
● ●
●● ● ●
● ●
● ● ● ●
●
● ● ●
● ●● ●
●
●
● ●
20 ●
● ●
● ● ●● ● ●
● ●● ●
●● ● ● ●●
●
● ●
● ●
● ●● ● ●●●
● ● ●●●●●
● ●●
● ● ● ●● ● ● ●
●● ● ● ●● ● ● ● ●
●● ● ● ● ●
●
● ● ●
15 ● ● ● ●● ● ●
●
●
● ● ● ● ●
40
35 ●
30
hwy
●
●
25 ●
●
●
●
20
15
● ●
● ●
40
● ●
● ●
35 ● ● ●
●
●
●
● ● ● ●
● ● ●
● ●● ●
30 ● ● ●●●
●●
●●
● ● ●●● ● ●●
● ● ● ● ●● ● ●●
● ● ● ●
hwy
●
● ●
● ● ● ●● ●●
● ● ● ●
● ● ● ●
● ● ●●
● ● ● ●●● ● ●●
● ● ●● ●
●● ● ● ●
●
● ● ●
● ● ● ●● ●
●
25 ● ● ● ● ●● ● ●● ●
● ● ● ●
● ● ● ● ●
● ●●
● ● ● ● ●
● ●
● ● ● ● ●
● ●
● ●
● ● ●
● ● ●
●
●
20 ● ● ● ●● ●● ●
● ● ●
●
● ●● ● ● ●
●●●● ● ●
●
● ● ● ● ●
●
● ●
● ● ● ●
● ●●● ● ●● ●●● ● ●
●● ●●● ●● ●
● ●● ●
● ● ● ● ●
● ●
● ●● ●
● ● ●
15 ● ● ● ● ●● ●
●
●
● ●
● ● ●
●
●
qplot(reorder(class,
pickup suv
hwy), 2seater
minivan
hwy, data midsize
= subcompact
mpg, compact
geom = c("jitter", "boxplot"))
reorder(class, hwy)
Friday, April 15, 2011
Your turn
depth = z / diameter
table = table width / x * 100
last_plot() + xlim(0, 3)
last_plot() + xlim(0, 3)
3000
count
2000
1000
56 58 60 62 64 66 68 70
qplot(depth, data = diamonds,
depth binwidth = 0.2)
Friday, April 15, 2011
4000
3000
cut
Fair
Good
count
1000
qplot(depth,
56 data
58 =60 diamonds,
62 64 binwidth
66 68 = 0.2,
70
fill = cut) + xlim(55,depth
70)
Friday, April 15, 2011
4000
3000
cut
Fair
Good
count
1000
2500
2000
1500
1000
500
0
count
Premium Ideal
2500
2000
1500
1000
500
0
qplot(depth, data
56 58 60 62 64 66= 68diamonds,
70 56 58 60 binwidth
62 64 66 68 =
70 0.2)
56 58 +
60 62 64 66 68 70