Hukum Chandra
ICAR-National Fellow & Principal Scientist
Email: hchandra@iasri.res.in
Simple Graphics
Graphics: generating proper graphics is one of the most important aspects
of presenting and analysing data.
High-Level Plot Functions
Low-Level Plot Functions
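As a rough sketch of the distinction, assuming the usual base R meaning of the terms: a high-level function such as plot() starts a new figure, while low-level functions such as lines(), points(), text() and legend() add to the figure that is already open. The data below are made up for illustration.

```r
# Made-up illustration data (not from the slides)
x <- 1:10
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2)

# High-level call: opens a new figure with axes, labels and title
plot(x, y, main = "High-level plot()", xlab = "x", ylab = "y")

# Low-level calls: each one adds to the existing figure
lines(x, 2 * x, col = "blue", lty = 2)               # reference line y = 2x
points(x[y > 15], y[y > 15], pch = 19, col = "red")  # highlight large values
text(3, 18, "points with y > 15 in red")
legend("bottomright", legend = c("data", "y = 2x"),
       pch = c(1, NA), lty = c(NA, 2), col = c("black", "blue"))
```

Calling a low-level function before any high-level call normally fails, because there is no figure to add to.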
Scatterplot and Line Graphs
Scatter plots are useful for studying dependencies between variables.
The plot() function produces scatterplots and line graphs.
See ?plot
plot(x,y); grid()
plot(x,y, type="b", col="blue", lwd=1, lty=4, pch=5, main="My plot",
xlab="x axis", ylab="y axis")
grid(col="red")
Common arguments for plot()
type   1-character string denoting the plot type
xlim   x limits, c(x1, x2)
ylim   y limits, c(y1, y2)
main   Main title for the plot
sub    Sub title for the plot
xlab   x-axis label
ylab   y-axis label
col    Color for lines and points, either a character string or a number that indexes the palette()
pch    Number referencing a plotting symbol or a character string
cex    A number giving the character expansion of the plot symbols
lty    Number referencing a line type
lwd    Line width
plot(x,y,type="b",col="blue",lwd=1,lty=4,pch=5, main="My plot", xlab="x axis", ylab="y axis")
grid(col="red")
text(8,2,"this is my example plot")
abline(h=1,v=4, col=c("darkred","green"), lty=c(1,4), lwd=c(4,6))
reg.lm=lm(y~x) # regress y on x, matching plot(x,y)
abline(reg.lm, col="red",lwd=6) # add the regression line
There is a wealth of plotting parameters you can set
plot(x,y)
plot(x,y, pch=16) # use a filled circle as the plotting symbol
x1<- seq(1,5,0.1)
lines(x1,.5*x1) # lines() adds a line through the given (x,y) values
## EXAMPLES with Yield Data #########################
data2=read.csv("yielddata.csv",header=TRUE)
plot(data2$Fert,data2$Yield)
grid()
plot(data2$Fert,data2$Yield,type="p",col="green",lwd=1,lty=4,pch=9, main="My plot", xlab="x axis", ylab="y axis")
text(250000,30000,"this is my example plot")
abline(h=20000,v=200000, col=c("darkred","green"), lty=c(1,4), lwd=c(1,2))
reg.lm=lm(Yield~Fert, data=data2) # regress Yield on Fert, matching the plot axes
abline(reg.lm, col="red",lwd=6) # add the regression line
dx<- rnorm(20,5,5) ## generate 20 random numbers from a normal distribution with mean 5, sd 5
dy<- rchisq(20,5) ## generate 20 random numbers from a chi-square distribution with 5 df (mean 5)
plot(dx,dy,pch=1)
fit<-lm(dy~dx) ## regress dy on dx, matching plot(dx,dy)
abline(fit,col="red",lwd=4)
text(10,4,"Fitted line")
See ?plot
See ?points
x <- rnorm(50) ;y <- rnorm(50)
group <- rbinom(50, size=1, prob=.5)
# Basic Scatterplot
plot(x, y)
plot(x, y, xlab="X values", ylab="Y values", main="Simple Y vs X", pch=15, col="red")
[Figure: basic scatterplot of y against x]
# Distinguish between two separate groups
plot(x, y, xlab="X values", ylab="Y values", main="Grouped data Y vs X",
pch=ifelse(group==1, 5, 19), col=ifelse(group==1, "red", "blue"))
plot(x, y, type="n")
lines(sort(x), sort(y), type="b")
lines(cbind(sort(x),sort(y)), type="l", lty=1, col="blue")
plot(sort(x), type="n")
lines(sort(x), type="b", pch=8, col="red")
lines(sort(y), type="l", lty=6, col="blue")
Histogram and Density Plot
Histograms are used to study the distribution of continuous data.
hist: function to plot a histogram
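The examples that follow use a vector u that is not defined in the text. A minimal sketch, assuming u is simply a sample of 100 standard normal values (the seed and sample size are illustrative choices): hist() draws the histogram and also returns the bin breaks and counts.

```r
set.seed(1)                 # illustrative seed, for reproducibility only
u <- rnorm(100)             # assumed example data

h <- hist(u, main = "Histogram of u", xlab = "u")  # draw and keep the result
h$breaks                    # bin boundaries chosen by hist()
h$counts                    # number of observations in each bin
sum(h$counts) == length(u)  # every observation falls in exactly one bin
```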
hist(u, density=20) #with shading
The sequence of commands below plots two histograms in one window
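The commands for this slide are missing from the text. One common way to place two histograms in one window, shown here as a sketch rather than the original code, is to split the graphics window with par(mfrow = ...). The vectors u and v are assumed example data.

```r
set.seed(2)                        # illustrative seed
u <- rnorm(100)                    # assumed example data
v <- rchisq(100, df = 5)           # assumed second variable

old_par <- par(mfrow = c(1, 2))    # 1 row, 2 columns of plots
hist(u, main = "Normal sample")
hist(v, main = "Chi-square sample")
par(old_par)                       # restore the previous layout
```

Saving the value returned by par() and restoring it afterwards keeps later plots from being squeezed into the same two-panel layout.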
hist(u,freq=FALSE,ylim = c(0,0.8))
curve(dnorm(x), col = 2, lty = 2, lwd = 2, add = TRUE)
# Overlay a colored normal curve, with an x-axis label and y-axis limits
# Uses the observed mean and standard deviation for plotting the normal curve
m<-mean(u) ;std<-sd(u)
hist(u, density=20, breaks=20, prob=TRUE, xlab="x-variable", col="red",
ylim=c(0, 0.7), main="normal curve over histogram")
curve(dnorm(x, mean=m, sd=std), col="darkblue", lwd=2, add=TRUE)
hist(u, density=10, breaks=20, col="red", prob=TRUE, xlab="x-variable", ylim=c(0,0.8),
main="Density curve over histogram")
lines(density(u),col = "blue")
Boxplots
Boxplots are also a useful tool for studying data. A boxplot shows the
median, the quartiles and possible outliers.
The R command is boxplot, which we use on the same variable as the
histogram:
# Basic boxplot
boxplot(u, xlab="my variable", boxwex=.4)
boxplot(u, xlab="my variable", boxwex=.6, col="blue", border="red", lty=2,
lwd=2)
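The quantities drawn by a boxplot can also be inspected numerically. A small sketch using boxplot.stats() from base R, on made-up data with one deliberate outlier:

```r
vals <- c(2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 25)  # made-up data; 25 is an outlier

s <- boxplot.stats(vals)
s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # values drawn individually as possible outliers

boxplot(vals, horizontal = TRUE, main = "Boxplot with one outlier")
```

By default, points more than 1.5 times the box length beyond the hinges are flagged as possible outliers.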
## we create data: three variables
u1<- rnorm(100) ## generate 100 random numbers from the standard normal distribution
u2<- rchisq(100,5) ## generate 100 random numbers from a chi-square distribution with 5 df (mean 5)
u3<- rnorm(100,5,1) ## generate 100 random numbers from a normal distribution with mean 5, sd 1
boxplot(u1,u2,u3, boxwex=.4)
boxplot(u1,u2,u3, boxwex=c(.2,.4,.6),col=c("red","blue","green"))
variablename<-c("low","medium", "high")
boxplot(u1,u2,u3,names=variablename,boxwex=c(.2,.4,.6), col=c("red","blue","green"),
ylim=c(-5, 20), xlab="variable status")
boxplot(u1,u2,u3,names=variablename, boxwex=c(.2,.4,.6),col=c("red","blue","green"),
ylim=c(-5, 20),xlab="variable status", notch = TRUE)
## try
boxplot(u, xlab="my variable", pars = list(boxwex = 0.5, staplewex = .5, outwex = 0.5),plot = FALSE)
boxplot(u, xlab="my variable", pars = list(boxwex = 0.5, staplewex = .5, outwex = 0.5),plot = TRUE)
?boxplot
Barchart (or barplot)
The R command is barplot.
Suppose the data in MPCE are the average MPCE of some states:
MPCE <- c(400, 300,600,550,425)
To label each value with the name of its state, assign names to the vector.
The double quotation marks mean that the names are character strings, not
numbers:
names(MPCE)<-c("UP","MP","Punjab","TN","WB")
barplot(MPCE, names=names(MPCE),ylab="MPCE (Rs)", col = c("blue","red","gray","orange","black"))
[Figure: bar chart of MPCE (Rs) for UP, MP, Punjab, TN and WB]
barplot(MPCE, space=2,names=names(MPCE),xlab="States", ylab="MPCE (Rs)", col =
c("blue","red","gray","orange","black"))
?barplot
You can plot more than one curve on a single plot, and label them via a
legend:
range <- seq(-10,10, by = 0.001)
norm1 <- dnorm(range, mean=0, sd=1)
norm2 <- dnorm(range, mean=1, sd=2)
plot(range,norm1, type="l", lty=1, col="red", main="Two Normal Distributions",
xlab="Range", ylab="Probability Density")
points(range, norm2, type="l", lty=2,col="blue")
legend(x=-10,y=0.4,legend= c("N(0,1)", "N(1,2)"), lty=c(1,2),col=c("red","blue"))
curve()
The function curve() draws the curve of a given function.
An expression written inside curve() must be a function of x.
To plot a function with several arguments, use x for the argument
you wish to plot over.
# Plot the gamma density
curve(dgamma(x, shape=2, scale=1), from=0, to=7, lwd=2, col="red")
# Plot multiple curves, notice that the first curve determines the x-axis
curve(dnorm, from=-3, to=5, lwd=2, col="red")
curve(dnorm(x, mean=2), lwd=2, col="blue", add=TRUE)
Clean out the workspace
rm(list=ls())
Saving Graphs
Graphs can be saved in several formats, such as PDF, JPEG and BMP, by
using pdf(), jpeg() and bmp(), respectively.
Graphics devices are also available for PNG and TIFF bitmap files, via
png() and tiff().
png(file="My Histogram.png",width=400,height=350) # Start graphics device
par(mar=c(5,4,2,2)+0.1) #margin size c(bottom, left, top, right)
m<-mean(u) ;std<-sqrt(var(u))
hist(u, density=20, breaks=20, prob=TRUE, xlab="x-variable", col="red", ylim=c(0, 0.7))
curve(dnorm(x, mean=m, sd=std), col="darkblue", lwd=2, add=TRUE)
dev.off() # Stop graphics device
# Other devices are opened in the same way, for example:
#bmp(filename = "plot.bmp")
#jpeg(filename = "plot.jpg")
#pdf("C:/SavingExample.pdf", width=7, height=5)
# Create multiple pdfs of figures, with one pdf per figure
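The code for this slide did not survive in the text. A possible sketch, assuming the goal is one PDF file per figure: pdf() accepts onefile = FALSE together with a file name pattern containing %03d, so that each new page goes to its own file. The output folder and file names are illustrative.

```r
out_dir <- tempdir()                              # illustrative output folder
pattern <- file.path(out_dir, "figure%03d.pdf")   # figure001.pdf, figure002.pdf, ...

pdf(pattern, onefile = FALSE, width = 7, height = 5)
hist(rnorm(100), main = "Figure 1")       # first page goes to figure001.pdf
boxplot(rnorm(100), main = "Figure 2")    # second page goes to figure002.pdf
dev.off()                                 # close the device to finish the files

list.files(out_dir, pattern = "^figure[0-9]+\\.pdf$")
```

An alternative is to call pdf() and dev.off() once per figure with explicit file names, which gives full control over the names at the cost of more typing.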
Packages
The R distribution comes with base packages such as stats and grid, and
with recommended add-on packages such as boot, nlme, foreign, MASS and
spatial.
Adding Packages
Choose Install Packages from the Packages menu
Select a CRAN Mirror
Select a package (e.g. car)
Then use the library(package) function to load it for use (e.g.
library(car))
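The menu steps above have a command-line equivalent. A sketch, assuming an internet connection whenever an installation is actually needed; ensure_package() is a hypothetical helper written for this example, and MASS is used only because it ships with R. Replace it with "car" to follow the slide.

```r
# Hypothetical helper: install a package only if it is missing, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)          # needs a CRAN mirror and network access
  }
  library(pkg, character.only = TRUE)
  invisible(pkg %in% loadedNamespaces())
}

ensure_package("MASS")   # use ensure_package("car") for the slide's example
```

Installation is needed once per R installation, while library() must be called in every new session that uses the package.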
Load R PACKAGES
Alternative way
Download the package file from the site first, then install it from the
local drive
Package car (Companion to Applied Regression)
library(car)
Before starting with the use of any package it is advisable to go through its
documentation.
http://cran.r-project.org/web/packages/car/index.html
http://cran.r-project.org/web/packages/car/car.pdf
Creating Your Own Package
We may want to share our code with other people, or simply make it easier
to use ourselves. There are two popular ways of starting a new package:
Load all functions and data sets you want in the package into a clean
R session, and run package.skeleton(). The objects are sorted into
data and functions, skeleton help files are created for them using
prompt() and a DESCRIPTION file is created. The function then prints
out a list of things for you to do next
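A minimal sketch of the package.skeleton() route. Instead of a clean global session, the objects are placed in a separate environment, which package.skeleton() accepts through its environment argument. The package name demoPkg, the objects and the temporary output folder are all illustrative.

```r
# Put the objects for the package into their own environment
pkg_env <- new.env()
pkg_env$square  <- function(x) x^2                              # example function
pkg_env$heights <- data.frame(id = 1:3, cm = c(150, 160, 170))  # example data set

# Create the skeleton in a temporary folder (names are illustrative)
package.skeleton(name = "demoPkg", environment = pkg_env,
                 path = tempdir(), force = TRUE)

pkg_dir <- file.path(tempdir(), "demoPkg")
list.files(pkg_dir)   # files and folders created by package.skeleton()
```

The listing shows the DESCRIPTION file together with the folders that hold the code, the data and the skeleton help files.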
Structure of a package
The extracted sources of an R package are simply a directory
somewhere on your hard drive. The directory has the same name as the
package and the following contents:
Simple Scatterplot
?mtcars
mtcars
attach(mtcars)
plot(wt, mpg, main="Scatterplot Example", xlab="Car Weight",
ylab="Miles Per Gallon", pch=19)
# Add fit lines
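The code under this comment is missing from the text. A sketch of one way to add fit lines with base R, assuming the scatterplot of mpg against wt from the mtcars example: abline() draws the least-squares line and lines(lowess()) adds a smooth curve.

```r
plot(mtcars$wt, mtcars$mpg, main = "Scatterplot Example",
     xlab = "Car Weight", ylab = "Miles Per Gallon", pch = 19)

fit <- lm(mpg ~ wt, data = mtcars)                           # response on the y-axis
abline(fit, col = "red", lwd = 2)                            # linear regression line
lines(lowess(mtcars$wt, mtcars$mpg), col = "blue", lwd = 2)  # lowess smooth
coef(fit)                                                    # intercept and slope
```

The model formula puts the y-axis variable on the left (mpg ~ wt), so that the fitted line matches the axes of the plot.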
The scatterplot( ) function in the car package offers many enhanced
features, including fit lines, marginal box plots, conditioning on a factor,
and interactive point identification
Scatterplot Matrices
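The code for this slide is not in the text. A basic scatterplot matrix can be sketched with pairs() from base R; the choice of the four mtcars variables here is illustrative.

```r
# Base R scatterplot matrix of four mtcars variables
vars <- c("mpg", "disp", "drat", "wt")
pairs(mtcars[, vars], main = "Simple Scatterplot Matrix", pch = 19)

round(cor(mtcars[, vars]), 2)   # the correlations behind the panels
```

Each panel plots one pair of variables, so the matrix gives a quick overview of all pairwise relationships.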
The car package can condition the scatterplot matrix on a factor, and optionally
include lowess and linear best fit lines, and boxplot, densities, or histograms in
the principal diagonal, as well as rug plots in the margins of the cells.
The gclus package provides options to rearrange the variables so that
those with higher correlations are closer to the principal diagonal. It can
also color code the cells to reflect the size of the correlations.
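# Scatterplot Matrices from the gclus Package
library(gclus)
dta <- mtcars[c(1,3,5,6)] # get data
dta.r <- abs(cor(dta)) # get correlations
dta.col <- dmat.color(dta.r) # get colors
# reorder variables so that those with the highest correlations
# are closest to the diagonal
dta.o <- order.single(dta.r)
cpairs(dta, dta.o, panel.colors=dta.col, gap=.5)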
High Density Scatterplots
When there are many data points and significant overlap, scatterplots
become less useful.
There are several approaches that can be used when this occurs.
The hexbin(x,y) function in the hexbin package provides bivariate
binning into hexagonal cells.
Hexagonal Binning
[Figure: hexagonal binning plot of y against x, with a legend of counts per cell]
library(hexbin) # hexbin() comes from the hexbin package
bin<-hexbin(x, y, xbins=50)
plot(bin, main="Hexagonal Binning")
Another option for a scatterplot with significant point overlap is the
sunflowerplot.
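A short sketch of sunflowerplot() from base R, on made-up data rounded so that many points coincide. Each petal of a "sunflower" stands for one observation at that location.

```r
set.seed(3)                 # illustrative seed
# Rounded values so that many points fall on the same location
sx <- round(rnorm(500))
sy <- round(rnorm(500))

sf <- sunflowerplot(sx, sy, main = "Sunflower plot of overlapping points")
sum(sf$number) == length(sx)   # the counts add up to the sample size
```

The returned list holds the distinct locations and the number of observations at each, which is useful for checking how severe the overlap is.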
3D Scatterplots
# 3D Scatterplot
# Load package scatterplot3d
library(scatterplot3d)
attach(mtcars)
scatterplot3d(wt,disp,mpg, main="3D Scatterplot")
# 3D Scatterplot with Coloring and Vertical Drop Lines
library(scatterplot3d)
attach(mtcars)
scatterplot3d(wt,disp,mpg, pch=16, highlight.3d=TRUE, type="h",col.axis="blue",
main="3D Scatterplot")
#3D Scatterplot with Coloring and Vertical Lines and Regression Plane
library(scatterplot3d)
attach(mtcars)
s3d <-scatterplot3d(wt,disp,mpg, pch=16, highlight.3d=TRUE,
type="h", main="3D Scatterplot")
fit <- lm(mpg ~ wt+disp)
s3d$plane3d(fit)
Spinning 3D Scatterplots
A similar plot can be produced with the scatter3d(x, y, z) function in the
Rcmdr package.