Hukum Chandra
ICAR-National Fellow & Principal Scientist
Email: hchandra@iasri.res.in
Background
R is available as Free Software and maintained by volunteers
RStudio: a free Integrated Development Environment (IDE)
for R (recommended)
If you install both R and RStudio, you only need to launch RStudio
To download R software
In any web browser (e.g. Microsoft Internet Explorer), go to the R
webpage: http://www.r-project.org
Downloads: CRAN - on the left hand side menu on the screen, click on
CRAN which is under the Download item
Set your mirror: pick a country site from which to download (for
example, IIT Madras India, but you can pick any; all this affects
is download speed).
On your right hand side you will see Download R for Windows
This brings you to a page where you select the part of R you need. While
you may later want to download the set of user contributed functions, for
now just click on base, which gets you the basic R program
To run R program code
The main active window within the R environment is the R Console
We can also run blocks of code. Use the R supplied editor or the
Windows-supplied editor Notepad to display and edit our R program
code.
To open the editor
We are using the R-supplied editor to display and edit our R program
code, although any general-purpose editor will suffice. Open R-Editor by
going to the File button and clicking on:
Use the code editor to enter R commands
Outputs are shown in Console
Introduction to RStudio
Before you start
The preferred assignment operator is <-
The usual equals sign (=) can be used instead
Command getwd()
Gets the working directory
Command setwd()
Sets the working directory, e.g. setwd("E:\\R Course")
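A minimal sketch of checking and changing the working directory. It uses tempdir() as a stand-in target (an assumption made so the code runs on any machine, since a folder such as E:\\R Course may not exist on yours) and restores the original directory at the end.

```r
# Remember the current working directory so it can be restored afterwards
old_dir <- getwd()

# tempdir() is used only because it is a directory that is sure to exist
setwd(tempdir())
new_dir <- getwd()
print(new_dir)

# Go back to where we started
setwd(old_dir)
```

Keeping the old path in a variable makes it easy to undo a setwd() call made by mistake.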
Getting started with R
R can be used in many ways
?rnorm
?log
help(rnorm)
> 4+5
[1] 9
> 4/(3+5)
[1] 0.5
> sqrt(9)+5^2
[1] 28
> sin(pi/2)-log(exp(1))
[1] 0
> exp(2)
[1] 7.389056
> rnorm(10)
[1] 1.78896720 -1.13840718 -0.14144555 -0.06581805 -0.36301621 -0.47357570
[7] 1.17758935 0.33800009 -0.03361512 1.43694640
Here [7] indicates that 1.17758935 is the seventh element in the vector
Help and documentation
Roughly, three different forms of documentation for the R system for
statistical computing may be distinguished:
Online help that comes with the base distribution or package
Electronic manuals and
Published work in the form of books etc
help function: help on a specific command is obtained by typing a
question mark before the command, for instance:
> ?log
Alternatively, the help function can be used; in this case, help(log) or
help(mean)
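When the exact function name is not known, a couple of other base R helpers can narrow the search. The sketch below is only illustrative; the search pattern "^mean" is an arbitrary choice.

```r
# List objects on the search path whose names start with "mean"
matches <- apropos("^mean")
print(matches)

# Show the arguments of a function without opening its help page
print(args(log))

# Full-text search of the installed documentation (opens a results page
# in an interactive session), equivalent to typing ??"t test"
# help.search("t test")
```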
Entering and Manipulating Data in R
Assignments - to store intermediate results
To assign the value 5 to the variable a, enter
a <- 5
a
[1] 5
b <- 9
b
[1] 9
a+b
[1] 14
a-b
[1] -4
a+b-a^2+(1/b)+(a^-b)
[1] -10.88889
OBJECTS
R has five basic classes of objects:
character
numeric (real numbers)
integer
complex
logical (TRUE/FALSE)
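The class of any object can be inspected with class(). A small sketch with one made-up value per basic class:

```r
# One illustrative value for each of the five basic classes
values <- list("R course", 3.14, 5L, 2 + 3i, TRUE)

# class() reports the class of each value
classes <- sapply(values, class)
print(classes)
# [1] "character" "numeric"   "integer"   "complex"   "logical"

# A number typed without the L suffix is stored as numeric, not integer
print(class(5))
# [1] "numeric"
```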
ATTRIBUTES
Data Types in R
Scalars (numeric, character etc)
Vectors
Matrices
Data frames
Vectors and Matrices
Vectors and matrices are of great importance in many numerical
problems, since one cannot do much statistics on single numbers
Creating Vectors
Working with Vectors
To create a vector named tempdata and assign the values 5, -3, 8 to
it, we write as follows:
tempdata <- c(5,-3,8)
The construct c() is used to define a vector. We can do calculations
with vectors just like ordinary numbers as long as they are of the same
length (a shorter vector is recycled to match the longer one)
Vectors can be manipulated, for instance by adding a constant to all
elements
tempdata <- c(5,-3,8)
myconst=50
myconst+tempdata
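The sketch below shows what these element-wise calculations return. The vector other is a hypothetical second vector added only for illustration.

```r
tempdata <- c(5, -3, 8)
myconst <- 50

# The constant is added to every element of the vector
shifted <- myconst + tempdata
print(shifted)
# [1] 55 47 58

# Two vectors of the same length are combined element by element
other <- c(1, 2, 3)   # hypothetical second vector
total <- tempdata + other
print(total)
# [1]  6 -1 11
```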
Component extraction
x<- c(2,3,1,5,4,6,5,7,6,8)
y <- c(10, 12, 14, 13, 34, 23, 12, 34, 25, 43)
x[1:4] # elements 1 to 4
[1] 2 3 1 5
x[x > 4]
[1] 5 6 5 7 6 8
u <- x > 4
u
[1] FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
x[u]
[1] 5 6 5 7 6 8
Functions on vectors:
length(x) # number of elements in x
[1] 10
sum(x^2)
[1] 265
mean(y)
[1] 22
sum((x-mean(x))^2)
[1] 44.1
summary(x^2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 10.75 25.00 26.50 36.00 64.00
Some calculations
sum(weight)
mean(weight) = sum(weight) / length(weight)
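The identity above can be checked directly. The weight values below are hypothetical (in kg) and chosen only so the sketch runs on its own.

```r
# Hypothetical body weights in kg, used only for illustration
weight <- c(60, 72, 57, 90, 95, 72)

total <- sum(weight)                  # sum of all elements
n <- length(weight)                   # number of elements
manual_mean <- total / n              # mean computed by hand

print(manual_mean)
print(mean(weight))                   # built-in mean gives the same value
print(all.equal(manual_mean, mean(weight)))
```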
rep function
Repeats a number or vector a desired number of times
r1 <- rep(5,2)
r1
[1] 5 5
r2=rep(c("Imphal","Delhi"),3)
r2
[1] "Imphal" "Delhi" "Imphal" "Delhi" "Imphal" "Delhi"
r3=rep(c("Imphal","Delhi"),each=3)
r3
[1] "Imphal" "Imphal" "Imphal" "Delhi" "Delhi" "Delhi"
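Two further arguments of rep() are sometimes handy: a vector passed to times repeats each element a different number of times, and length.out cuts the result at a fixed length. A short sketch:

```r
# Repeat the first element twice and the second element once
r4 <- rep(c("Imphal", "Delhi"), times = c(2, 1))
print(r4)
# [1] "Imphal" "Imphal" "Delhi"

# Keep repeating the vector until the result has five elements
r5 <- rep(c(1, 2, 3), length.out = 5)
print(r5)
# [1] 1 2 3 1 2
```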
Slightly more complicated example
The rule of thumb is that the BMI for a normal-weight individual
should be between 20 and 25, and we want to know if our data
deviate systematically from that.
We can use a one-sample t test to assess whether the six persons' BMI
can be assumed to have mean 22.5, given that they come from a
normal distribution.
You may not know the t test yet; the example is only meant to
give some indication of what real statistical output looks like
t test (see ?t.test)
data: bmi
t = -0.22855, df = 5, p-value = 0.8283
The p value is not small, indicating that it is not at all unlikely to get data
like those observed if the mean were in fact 22.5
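A sketch of how such a test could be run. The weight and height values are hypothetical stand-ins, not the data behind the output shown above, so the numbers printed will differ.

```r
# Hypothetical measurements for six persons (not the original data)
weight <- c(60, 72, 57, 90, 95, 72)               # kg
height <- c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91)   # m

# Body mass index: weight divided by squared height
bmi <- weight / height^2

# One-sample t test of the hypothesis that the mean BMI is 22.5
result <- t.test(bmi, mu = 22.5)
print(result)
```

With six observations the test has 5 degrees of freedom, matching the df shown in the output above.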
Packages in R
Several packages are available to enhance R's data analysis capabilities
(last count 4955)
The base distribution already comes with some high priority add on
packages, e.g., boot, nlme, stats, grid, foreign, MASS, spatial etc
library(stats)
# Welch t-test comparing the population means of x and y when the
# variances are not assumed equal (the default).
t.test(x,y)
# Usual (pooled) t-test when the variances are assumed equal. With
# var.equal=FALSE it is the same as t.test(x, y)
t.test(x,y,var.equal=TRUE)
?t.test
library(stats)
x <- c(2,3,1,5,4,6,5,7,6,8)
y <- c(10, 12, 14, 13, 34, 23, 12, 34, 25, 43)
mean(x)
mean(y)
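var(x,y)   # covariance between x and y
cor(x,y)   # correlation between x and y
t.test(x)  # one-sample t-test of the hypothesis mean = 0
t.test(x,y)
t.test(x,y, var.equal=TRUE)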
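The object returned by t.test() can be stored and its parts extracted by name. A minimal sketch using the same x and y; the variable names welch and pooled are chosen here for illustration.

```r
x <- c(2, 3, 1, 5, 4, 6, 5, 7, 6, 8)
y <- c(10, 12, 14, 13, 34, 23, 12, 34, 25, 43)

welch  <- t.test(x, y)                    # variances not assumed equal
pooled <- t.test(x, y, var.equal = TRUE)  # variances assumed equal

# Individual components of the result
print(welch$statistic)    # t statistic
print(welch$parameter)    # degrees of freedom (generally fractional for Welch)
print(welch$p.value)      # p-value

# The pooled test has n1 + n2 - 2 = 18 degrees of freedom
print(pooled$parameter)
```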
F Test to Compare Two Variances
Performs an F test to compare the variances of two samples from normal
populations.
var.test(x, ...)
x1 <- rnorm(100, mean = 0, sd = 2)
y1 <- rnorm(60, mean = 1, sd = 1)
var.test(x1, y1) # Do x1 and y1 have the same variance?
Shapiro-Wilk normality test: a small p-value means the test is significant
and the hypothesis that the sample data come from a normal distribution is rejected
shapiro.test(bmi)
yy=rnorm(100,5,1)
shapiro.test(yy)
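A sketch of how the normality test can be made reproducible and how its p-value is read off. The seed value and the skewed sample drawn with rexp() are assumptions added for illustration.

```r
set.seed(1)   # arbitrary seed, only so the random numbers can be reproduced

# Sample drawn from a normal distribution with mean 5 and sd 1
yy <- rnorm(100, mean = 5, sd = 1)
sw_normal <- shapiro.test(yy)
print(sw_normal$p.value)

# Hypothetical skewed sample from an exponential distribution, for contrast
skewed <- rexp(100)
sw_skewed <- shapiro.test(skewed)
print(sw_skewed$p.value)
```

Comparing the two p-values shows how the test reacts to data that are clearly not normal.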
Nonparametric Tests of Group Differences
Matrices
Two-dimensional structure whose elements are all of the same type
The commands rbind and cbind combine vectors as the rows or columns
of a matrix
x <- c(1,2,3)
y <- c(4,5,6)
A = cbind(x,y)
B = rbind(x,y)
C = t(B)
# The last command gives the matrix transpose of B.
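Checking the dimensions makes the difference between the two commands visible. A short sketch using the same vectors:

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)

A <- cbind(x, y)   # x and y become columns: 3 rows, 2 columns
B <- rbind(x, y)   # x and y become rows:    2 rows, 3 columns
C <- t(B)          # transpose of B

print(dim(A))      # [1] 3 2
print(dim(B))      # [1] 2 3
print(all(A == C)) # TRUE: transposing B gives the same entries as A
```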
Create matrices: matrix(c(1,2,3,4,5,6,7,8,9),nrow=3,ncol=3,byrow=T)
z<- matrix(c(1,2,3,4,5,6,7,8,9), nrow=3, byrow=T)
z
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
Component extraction
A[r,] - rth row of object A
A[,c] - cth column of object A
A[r,c] - entry in row r and column c of object A
A[A<10] - extract all elements of A that are smaller than 10
z[2,3]
[1] 6
z[,1]
[1] 1 4 7
z[1,]
[1] 1 2 3
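The condition-based form A[A<10] and sub-matrix selection can be tried on the same matrix z. Note that a logical condition returns the matching elements column by column.

```r
z <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)

# Elements greater than 5, returned in column-by-column order
big <- z[z > 5]
print(big)
# [1] 7 8 6 9

# Rows 1 to 2 and columns 2 to 3 as a smaller matrix
block <- z[1:2, 2:3]
print(block)
```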
Arrays
Arrays are similar to matrices, but can be of more than 2 dimensions
Useful in programming new statistical methods
Natural Extension of Matrices
Must be of single type
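A minimal sketch of a three-dimensional array built with array(). The dimensions 2 x 3 x 4 are an arbitrary choice for illustration.

```r
# 24 values arranged as 2 rows, 3 columns and 4 layers
arr <- array(1:24, dim = c(2, 3, 4))
print(dim(arr))
# [1] 2 3 4

# Element in row 1, column 2 of layer 3
print(arr[1, 2, 3])
# [1] 15

# Each layer is an ordinary 2 x 3 matrix
first_layer <- arr[, , 1]
print(first_layer)
```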
Data Frame
Data frames - similar to tables (databases), dataset (SAS/SPSS) etc.
Consists of columns of different types
More general than a matrix
Columns are variables; rows are observations
Convenient to hold all the data required for a data analysis
Data Frame - Creation
Can be created in many ways
The data.frame function
General syntax is data.frame(col1, col2, ...) where col1, col2, etc. are
columns of the same or different data types (numeric/character/logical)
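A small sketch of data.frame() with one column of each type. The column names and values are made up for illustration.

```r
# Three columns of different types, each with three observations
survey <- data.frame(
  city     = c("Imphal", "Delhi", "Imphal"),   # character
  rainfall = c(120.5, 60.2, 98.0),             # numeric
  surveyed = c(TRUE, FALSE, TRUE)              # logical
)

print(survey)
str(survey)            # compact overview of column types
print(nrow(survey))    # number of observations
print(ncol(survey))    # number of variables
print(survey$rainfall) # a single column is extracted with $
```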
Factors
R functions treat nominal and ordinal variables differently from
continuous variables
In R, nominal and ordinal variables are called factors
Use the factor() function to turn a variable into a factor
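A sketch of creating an ordinal factor. The category labels are hypothetical; levels fixes the order of the categories and ordered = TRUE marks the variable as ordinal.

```r
# Hypothetical ordinal variable recorded as text
size <- c("low", "high", "medium", "low")

# Convert to an ordered factor with an explicit level order
f <- factor(size, levels = c("low", "medium", "high"), ordered = TRUE)

print(levels(f))       # "low" "medium" "high"
print(table(f))        # count of observations in each category
print(as.integer(f))   # internal codes: 1 3 2 1
print(f[2] > f[3])     # TRUE, because "high" ranks above "medium"
```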
Handling Data
DATA FRAMES
Data frames are used to store tabular data
A data frame is a special type of list in which every element has the same length
Each element of the list can be thought of as a column, and the length
of each element of the list is the number of rows
Data frames are usually created by calling read.table() or read.csv()
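A self-contained sketch of reading a CSV file. Since no data file accompanies these slides, the code first writes a small made-up table to a temporary file and then reads it back; the column names are hypothetical.

```r
# Small made-up table to serve as the file contents
scores <- data.frame(id = 1:3, score = c(4.5, 3.2, 5.0))

# Write it to a temporary CSV file (row names are left out)
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)

# Read the file back into a data frame
back <- read.csv(path)
str(back)

# Remove the temporary file
unlink(path)
```

With a real data file, only the read.csv() line is needed, with path replaced by the file's location.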