Lecture 1 R Introduction

R Software: An Overview
Hukum Chandra
ICAR-National Fellow & Principal Scientist
Email: hchandra@iasri.res.in
ICAR-Indian Agricultural Statistics Research Institute

Library Avenue, PUSA, New Delhi, India
www.iasri.res.in
Workshop Objective
The objective of this workshop is to provide an overview of the basic R environment and its applications
This workshop is intended as a starting point for any future development with R
By the end of the workshop, you will be familiar with the basic R language, importing/exporting data and manipulating data using R
You will do exploratory data analysis, perform basic statistics and build plots using R / R Studio
Background
R is available as Free Software and maintained by volunteers
R is a language and environment for statistical computing and graphics
It provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques
The development of the R system for statistical computing is heavily influenced by the open source idea
R is extensible: it can be expanded by installing packages
The base distribution of R is maintained by a small group of statisticians, the R Development Core Team
Initial developers: Ross Ihaka & Robert Gentleman
Maintained by leading experts and continuously improved
Available on all platforms (Linux, Mac, Windows)
Download the software from the internet: http://www.r-project.org/
(or search Google for "Download R")
Free to install, no catches
R Studio: a free Integrated Development Environment (IDE) for R (recommended!!)
Download from http://www.rstudio.com/
R Studio provides easy, straightforward access to all of the information and objects you need
If you install both R and R Studio, you only need to run R Studio (it uses the installed R in the background)
To download R software
In any web browser (e.g. Microsoft Internet Explorer), go to the R webpage: http://www.r-project.org
Downloads: CRAN - in the menu on the left-hand side of the screen, click on CRAN, which is under the Download item
Set your Mirror - pick a country site from which to download (for example, IIT Madras, India; you can pick any, since all this affects is download speed).
On the right-hand side you will see Download R for Windows
This brings you to a page where you select the part of R you need. While you may later want to download the set of user-contributed functions, for now just click on base, which gets you the basic R program
Click on Download R 3.2.3 for Windows (62 megabytes, 32/64 bit) to get R-3.2.3-win.exe and save it to your hard disk. At this point you should be asked (via a prompt box) where you want to save the file. Pick a place on your PC to save it (e.g., c:\).
This was the latest available version when these notes were prepared; the download page will show a newer version number
It is an .exe file, which you can save on your hard disk
Double-click this file and R is installed automatically. All you need to do is follow the installation process
To open R software
The installation process automatically creates a shortcut for R
Double-click this icon to open the R environment
Or Start > All Programs > R
R will open up with the appearance of a standard Windows application
To run R program code
The main active window within the R environment is the R Console
R processes commands on a line-by-line basis
R works fundamentally on a question-and-answer model
Consequently, it is necessary to press ENTER after typing in (or pasting) a line of R code in order to get R to execute it
Here at the command prompt (the symbol >), we can enter R commands, which run as soon as the Enter (carriage return) key is pressed
We can also run blocks of code. Use the R-supplied editor or the Windows-supplied editor Notepad to display and edit our R program code.
To open the editor
We are using the R-supplied editor to display and edit our R program code, although any general-purpose editor will suffice. Open the R Editor from the File menu by clicking on:
File > New Script
Use the code editor to enter R commands
Use the Run option (or Ctrl+R) to execute the commands
Output is shown in the Console
Introduction to RStudio
Before we start
The preferred assignment operator is <-
The usual equals sign (=) can be used instead
Path separator: a forward slash (/) or two backslashes (\\)
E:/R Course/inputdata.txt
E:\\R Course\\inputdata.txt
Set Working Directory
The folder where R looks for input files and writes output files
Command getwd()
Gets the working directory
Command setwd()
Sets the working directory, e.g. setwd("E:\\R Course")
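A minimal sketch of the working-directory commands. It uses tempdir() only as a folder that is guaranteed to exist (an assumption for illustration; in the workshop you would use your own folder such as "E:/R Course"), and it restores the original directory at the end.

```r
# Sketch: inspect, change and restore the working directory.
# tempdir() stands in for your own folder (e.g. "E:/R Course"); it always exists.
old_wd <- getwd()      # remember the starting folder
print(old_wd)

setwd(tempdir())       # relative file names are now resolved here
print(getwd())

# file.path() joins folder and file names with "/", which R accepts on Windows too
file.path("E:", "R Course", "inputdata.txt")   # "E:/R Course/inputdata.txt"

setwd(old_wd)          # go back to where we started
```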
Getting started with R
R can be used in many ways
Simple calculations, vectors and graphics
To begin with, we'll use R as a calculator. Enter an arithmetic expression and receive the result (the second line is the answer line). Try the following
4+5
4/(3+5)
sqrt(9)+5^2
sin(pi/2)-log(exp(1))
exp(2)
rnorm(10)
rnorm(25,5,10)
sqrt(9)-5^2+2 # To raise a number to a power, use the caret symbol (^)
## Try this and see what you get (the = is a deliberate mistake, so R reports an error)
sqrt(9=)-5^2+2
?rnorm
?log
help(rnorm)
> 4+5
[1] 9
> 4/(3+5)
[1] 0.5
> sqrt(9)+5^2
[1] 28
> sin(pi/2)-log(exp(1))
[1] 0
> exp(2)
[1] 7.389056
> rnorm(10)
[1] 1.78896720 -1.13840718 -0.14144555 -0.06581805 -0.36301621 -0.47357570
[7] 1.17758935 0.33800009 -0.03361512 1.43694640
Here [7] indicates that 1.17758935 is the seventh element of the vector. Since rnorm() generates random numbers, the values you get will differ.
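A small sketch, assuming you want repeatable random numbers: set.seed() fixes the random-number stream (the seed 123 is arbitrary), and the comparison shows that rnorm(25,5,10) matches its arguments by position to n, mean and sd.

```r
# rnorm(n, mean = 0, sd = 1): n random draws from a normal distribution
set.seed(123)                          # arbitrary seed, makes the draws repeatable
a <- rnorm(25, 5, 10)                  # positional: n = 25, mean = 5, sd = 10

set.seed(123)                          # same seed, so the same stream of numbers
b <- rnorm(n = 25, mean = 5, sd = 10)  # the same call with named arguments

identical(a, b)   # TRUE
length(a)         # 25
```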
Help and documentation
Roughly, three different forms of documentation for the R system for statistical computing may be distinguished:
Online help that comes with the base distribution or a package
Electronic manuals and
Published work in the form of books etc.
Help function: help about a specific command can be obtained by writing a question mark before the command, for instance:
> ?log
Alternatively, the help() function can be used, e.g. help(log) or help(mean)
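Besides ? and help(), base R has other lookup helpers. This sketch shows args() and apropos(); ??keyword (short for help.search()) searches all installed help pages and is not run here because it opens a results window.

```r
# Base R helpers that complement ? and help()
args(rnorm)               # argument list: function (n, mean = 0, sd = 1)
names(formals(rnorm))     # just the argument names: "n" "mean" "sd"
apropos("^mean")          # objects on the search path whose names start with "mean"
```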
Entering and Manipulating Data in R
Assignments - to store intermediate results
To assign the value 5 to the variable a, enter
a <- 5
a
[1] 5
b <- 9
b
[1] 9
a+b
[1] 14
a-b
[1] -4
a+b-a^2+(1/b)+(a^-b)
[1] -10.88889
msg <- "hello"
msg
[1] "hello"
The symbol <- (or alternatively =) should be read as "assigns".
The two characters <- should be read as a single symbol: an arrow pointing to the variable to which the value is assigned
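A short sketch of the assignment forms mentioned above; the names a, b, s and msg are only illustrative. The rightward arrow -> also exists in R but is rarely used.

```r
a <- 5            # preferred assignment operator
b = 9             # '=' also assigns at the top level
a + b -> s        # '->' assigns to the right: same as s <- a + b
s                 # 14

msg <- "hello"    # text must be enclosed in quotes
msg               # "hello"
```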
A couple of other useful things
R is case-sensitive: for example, data, Data and DATA are three different names in R
A comment in R code begins with a hash symbol (#)
- Everything after # on a line is a comment and is not executed
Comment your code so you remember what it does
R scripts are simply text files with a .R extension
Use Ctrl + R to submit code from the R editor
Use the up and down arrows to cycle through previous commands in the console
Don't be afraid of errors; you won't break R
If you get stuck, Google is your friend
Spacing around operators is generally ignored by R
However, adding a space in the middle of <- changes its meaning to "less than" followed by "minus"
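The case-sensitivity and spacing rules can be checked directly. A minimal sketch; the object names are illustrative.

```r
data <- 10
Data <- 20
data == Data   # FALSE: 'data' and 'Data' are two different objects

a <- 5
a < -2         # with a space: the comparison "a less than minus 2", FALSE
a              # still 5
a <-2          # no space inside '<-': this is an assignment
a              # now 2
```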
OBJECTS
R has five basic classes of objects:
character
numeric (real numbers)
integer
complex
logical (TRUE/FALSE)
The most basic object is a vector
A vector can only contain objects of the same class
BUT: the one exception is a list, which is represented as a vector but can contain objects of different classes (indeed, that's usually why we use them)
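A quick sketch using class() to show the five basic classes, the coercion that happens when classes are mixed inside c(), and how a list keeps each element's own class.

```r
class("a")        # "character"
class(3.5)        # "numeric"
class(7L)         # "integer" (the L suffix makes an integer)
class(1 + 2i)     # "complex"
class(TRUE)       # "logical"

# Mixing classes in c() forces one common class, here character
v <- c(1, "a", TRUE)
v                 # "1" "a" "TRUE"
class(v)          # "character"

# A list keeps each element's own class
l <- list(1, "a", TRUE)
sapply(l, class)  # "numeric" "character" "logical"
```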
ATTRIBUTES
R objects can have attributes

names, dimnames
dimensions (e.g. matrices, arrays)
class
length
other user-defined attributes/metadata
Attributes of an object can be accessed using the attributes() function.
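A sketch of attributes on a matrix and on a named vector. The attribute name "units" is a hypothetical user-defined example. Note that the length of an object is reported by length() rather than by attributes().

```r
m <- matrix(1:6, nrow = 2)
attributes(m)            # $dim: 2 3

x <- c(a = 1, b = 2)     # a named numeric vector
attributes(x)            # $names: "a" "b"

attr(x, "units") <- "kg" # hypothetical user-defined attribute
attributes(x)            # now $names and $units

attributes(1:3)          # NULL: a plain vector carries no attributes
length(1:3)              # 3
```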
Data Types in R
Scalars (numeric, character etc)
Vectors
Matrices
Data frames
Vectors and Matrices
Vectors and matrices are of great importance in many numerical problems, since one cannot do much statistics on single numbers
Creating Vectors
The c() function can be used to create vectors of objects.

x <- c(0.5, 0.6) ## numeric
x <- c(TRUE, FALSE) ## logical
x <- c(T, F) ## logical
x <- c("a", "b", "c") ## character
x <- 9:29 ## integer
x <- c(1+0i, 2+4i) ## complex
Using the vector() function

x <- vector("numeric", length = 10)
x
[1] 0 0 0 0 0 0 0 0 0 0
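vector() works the same way for the other classes. A sketch with a few modes, plus numeric(), a base shortcut for vector("numeric", n).

```r
vector("logical", 3)     # FALSE FALSE FALSE
vector("character", 2)   # "" ""
vector("list", 2)        # a list with two NULL elements
numeric(4)               # same as vector("numeric", 4): 0 0 0 0
class(9:29)              # "integer"
```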
Working with Vectors
To create a vector named tempdata and assign the values 5, -3, 8 to it, we write:
tempdata <- c(5,-3,8)
The construct c() is used to define a vector. We can do calculations with vectors just like with ordinary numbers, as long as they are of the same length (a single number is reused for every element)
Vectors can be manipulated, for instance by adding a constant to all elements
tempdata <- c(5,-3,8)
myconst <- 50
myconst+tempdata
weight <- c(60, 72, 57, 90, 55, 80)
height <- c(1.75, 1.80, 1.65, 1.90, 1.55, 1.85)
bmi <- weight/height^2
Here we note that the operation is carried out element-wise
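The element-wise arithmetic above can be checked with a few lines. This sketch assumes weight is in kg and height in m (the usual BMI units); the last line shows recycling, where the shorter vector is reused.

```r
tempdata <- c(5, -3, 8)
myconst <- 50
myconst + tempdata       # 55 47 58: the constant is added to every element

weight <- c(60, 72, 57, 90, 55, 80)              # assumed kg
height <- c(1.75, 1.80, 1.65, 1.90, 1.55, 1.85)  # assumed m
bmi <- weight / height^2 # element by element: weight[i] / height[i]^2
round(bmi, 2)            # 19.59 22.22 20.94 24.93 22.89 23.37

# Recycling: the shorter vector is repeated to match the longer one
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24
```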

Sequences
A vector X1 consisting of the integers between 1 and 10 can be created by writing
X1 <- c(1:10) # 1:10 is short for 1, 2, 3, ..., 10
X2 <- c(1:5, 10:15)
X1
X2
Sequence function
X3 <- seq(1,10)
X3
1:n produces 1, 2, ..., n
The function seq(from, to, by = ) produces the desired sequences
Vectors with sequences of numbers with particular increments can be created with the seq command:
mydata1 <- seq(0,10,2) # even numbers from 0 to 10 (increment 2)
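A few more seq() forms, as a sketch: by = sets the step and length.out = sets how many values to produce.

```r
seq(0, 10, 2)               # 0 2 4 6 8 10 (from = 0, to = 10, by = 2)
seq(0, 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00
seq(1, 2, length.out = 5)   # five equally spaced values: 1.00 1.25 1.50 1.75 2.00
seq(10, 1)                  # counts down: 10 9 8 ... 1

X2 <- c(1:5, 10:15)
length(X2)                  # 11
```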
Component extraction
x<- c(2,3,1,5,4,6,5,7,6,8)
y <- c(10, 12, 14, 13, 34, 23, 12, 34, 25, 43)
Elements of a vector can be accessed as
x[1] # The first element of the vector x
[1] 2
x[2] # The 2nd element of the vector x
[1] 3
x[1:4] # Elements 1 to 4
[1] 2 3 1 5
x[x > 4] # Elements greater than 4
[1] 5 6 5 7 6 8
u <- x > 4
u
[1] FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
x[u]
[1] 5 6 5 7 6 8
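Logical vectors such as u are useful beyond indexing. A sketch with two base functions, which() and sum(), plus negation and negative indices:

```r
x <- c(2, 3, 1, 5, 4, 6, 5, 7, 6, 8)
u <- x > 4
which(u)    # positions of the TRUE values: 4 6 7 8 9 10
sum(u)      # TRUE counts as 1, so this counts elements above 4: 6
x[!u]       # '!' negates the condition: 2 3 1 4
x[-1]       # a negative index drops that element: 3 1 5 4 6 5 7 6 8
```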
Functions on vectors:
length(x) # To compute the length of x
[1] 10
sum(x) # To compute the sum of the data in x
[1] 47
sum(x^2)
[1] 265
mean(x) # To compute the mean of the data in x
[1] 4.7
mean(y)
[1] 22
var(x) # To compute the variance of x
[1] 4.9
sqrt(var(x)) # To compute the standard deviation of x (same as sd(x))
[1] 2.213594
sum((x-mean(x))^2)
[1] 44.1
sqrt(var(x))/mean(x)*100 # To compute the coefficient of variation (in %)
# To compute summary features of the data in x
summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 3.25 5.00 4.70 6.00 8.00
summary(x^2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 10.75 25.00 26.50 36.00 64.00
Some calculations
sum(weight)
mean(weight) equals sum(weight) / length(weight)
If we denote mean(weight) by xbar, then
sd(weight) equals sqrt(sum((weight - xbar)^2) / (length(weight) - 1))
cor(x,y) # To compute the correlation coefficient between x and y
var(x,y) # To compute the covariance between x and y (same as cov(x,y))
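These formulas can be verified numerically. A minimal sketch using all.equal(), which compares numbers up to floating-point tolerance; note that sd() divides by n - 1, not n.

```r
weight <- c(60, 72, 57, 90, 55, 80)
x <- c(2, 3, 1, 5, 4, 6, 5, 7, 6, 8)
y <- c(10, 12, 14, 13, 34, 23, 12, 34, 25, 43)

xbar <- sum(weight) / length(weight)
all.equal(mean(weight), xbar)                       # TRUE

sd_by_hand <- sqrt(sum((weight - xbar)^2) / (length(weight) - 1))
all.equal(sd(weight), sd_by_hand)                   # TRUE

all.equal(var(x, y), cov(x, y))                     # TRUE: both give the covariance
all.equal(cor(x, y), cov(x, y) / (sd(x) * sd(y)))   # TRUE
```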
Rep function
The rep() function repeats a number or vector a desired number of times
r1 <- rep(5,2)
r1
[1] 5 5
r2=rep(c("Imphal","Delhi"),3)
r2
[1] "Imphal" "Delhi" "Imphal" "Delhi" "Imphal" "Delhi"
r3=rep(c("Imphal","Delhi"),each=3)
r3
[1] "Imphal" "Imphal" "Imphal" "Delhi" "Delhi" "Delhi"
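rep() has a few more options. A sketch of times =, each = and length.out =:

```r
rep(1:3, times = 2)                          # 1 2 3 1 2 3
rep(1:3, each = 2)                           # 1 1 2 2 3 3
rep(c("Imphal", "Delhi"), times = c(2, 1))   # "Imphal" "Imphal" "Delhi"
rep(1:3, length.out = 7)                     # 1 2 3 1 2 3 1
```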
Slightly more complicated example
The rule of thumb is that the BMI for a normal-weight individual should be between 20 and 25, and we want to know whether our data deviate systematically from that.
We can use a one-sample t test to assess whether the six persons' BMI can be assumed to have mean 22.5, given that they come from a normal distribution.
We can use the function t.test
You may not yet know the t test; the example is just to give some indication of what real statistical output looks like
t test (see ?t.test)
t.test(bmi, mu=22.5)
One Sample t-test
data: bmi
t = -0.22855, df = 5, p-value = 0.8283
alternative hypothesis: true mean is not equal to 22.5
95 percent confidence interval:
20.35465 24.29501
sample estimates:
mean of x
22.32483
If mu is not given, t.test uses the default mu = 0
The p-value is not small, indicating that it is not at all unlikely to get data like those observed if the mean were in fact 22.5
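t.test() returns an object whose parts can be extracted by name. In this sketch, res and res90 are illustrative names. The block rebuilds bmi from the weight and height data used earlier so it runs on its own.

```r
weight <- c(60, 72, 57, 90, 55, 80)
height <- c(1.75, 1.80, 1.65, 1.90, 1.55, 1.85)
bmi <- weight / height^2

res <- t.test(bmi, mu = 22.5)
res$statistic   # t, about -0.22855
res$parameter   # df = 5
res$p.value     # about 0.8283
res$conf.int    # 95 percent confidence interval
res$estimate    # mean of x

# conf.level changes the interval; a 90% interval is narrower
res90 <- t.test(bmi, mu = 22.5, conf.level = 0.90)
res90$conf.int
```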
Packages in R
Several packages are available to enhance R's data analysis capabilities (4955 at the time these notes were written; the number on CRAN has grown considerably since)
For a complete list see http://cran.r-project.org/
Required package(s) need to be downloaded and installed
Use Packages > Install package(s) in the R menu, or install.packages() at the prompt
The base distribution already comes with some high-priority add-on packages, e.g., boot, nlme, stats, grid, foreign, MASS, spatial etc.
The packages included by default in the base distribution implement standard statistical functionality, for example, linear models, classical tests, a huge collection of high-level plotting functions etc.
Packages not included in the base distribution can be installed directly from the R prompt
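A sketch of attaching and checking packages. MASS is used only as an example package name, and the install.packages() line is commented out because it needs internet access.

```r
library(stats)                  # attach an installed package (stats is attached by default)
"package:stats" %in% search()   # TRUE: stats is on the search path

# Install from CRAN once per machine (needs internet access):
# install.packages("MASS")

# Check whether a package is installed without attaching it
requireNamespace("MASS", quietly = TRUE)   # TRUE if MASS is installed, FALSE otherwise
```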
Classical Tests
To load the library of classical test statistics available with R, use (stats is already attached by default when R starts)
library(stats)
# t-test comparing the population means of x and y when the variances are not assumed equal (Welch)
t.test(x,y)
# Usual t-test when the variances are equal. With var.equal=FALSE (the default) it is the same as t.test(x, y)
t.test(x,y,var.equal=TRUE)
?t.test
library(stats)
x <- c(2,3,1,5,4,6,5,7,6,8)
y <- c(10, 12, 14, 13, 34, 23, 12, 34, 25, 43)
mean(x)
mean(y)
var(x,y)
cor(x,y)
t.test(x)
t.test(x,y)
t.test(x,y, var.equal=T)
var.test(x,y) #To compare variances of x and y
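A sketch comparing the Welch (default) and pooled-variance versions on the same x and y, and confirming that the F statistic of var.test() is the ratio of the two sample variances. The object names welch, pooled and vt are illustrative.

```r
x <- c(2, 3, 1, 5, 4, 6, 5, 7, 6, 8)
y <- c(10, 12, 14, 13, 34, 23, 12, 34, 25, 43)

welch  <- t.test(x, y)                     # default: var.equal = FALSE
pooled <- t.test(x, y, var.equal = TRUE)   # classical two-sample t test

welch$parameter    # Welch-adjusted df, below 18 here because the variances differ
pooled$parameter   # n1 + n2 - 2 = 18
c(welch = welch$p.value, pooled = pooled$p.value)

vt <- var.test(x, y)
vt$statistic       # F = var(x) / var(y)
vt$p.value
```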
F Test to Compare Two Variances
Performs an F test to compare the variances of two samples from normal
populations.
var.test(x, ...)
x1 <- rnorm(100, mean = 0, sd = 2)
y1 <- rnorm(60, mean = 1, sd = 1)
var.test(x1, y1) # Do x1 and y1 have the same variance?
Shapiro-Wilk test of normality
The Shapiro-Wilk test assesses whether the data could plausibly have come from a normal distribution.
shapiro.test()
A low p-value means the test is significant, and the hypothesis that the sample data come from a normal distribution is rejected
shapiro.test(bmi)
yy=rnorm(100,5,1)
shapiro.test(yy)
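A sketch showing how to read the Shapiro-Wilk result programmatically. The seed is arbitrary and the exponential sample is an assumed example of clearly non-normal data; shapiro.test() needs between 3 and 5000 non-missing values.

```r
set.seed(1)                    # arbitrary seed so the run is reproducible
yy <- rnorm(100, 5, 1)         # data that really are normal
sw <- shapiro.test(yy)
sw$statistic                   # W, close to 1 for normal-looking data
sw$p.value                     # compare with a chosen level such as 0.05

skewed <- rexp(100)            # assumed example of non-normal (exponential) data
shapiro.test(skewed)$p.value   # usually very small for such skewed data
```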
Nonparametric Tests of Group Differences
R provides functions for carrying out Mann-Whitney U, Wilcoxon signed rank, Kruskal-Wallis and Friedman tests
# Independent 2-group Mann-Whitney U test
wilcox.test(y~A) # where y is numeric and A is a binary factor
wilcox.test(y,x) # where y and x are numeric
# Dependent 2-group Wilcoxon signed rank test
wilcox.test(y1,y2,paired=TRUE) # where y1 and y2 are numeric
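The names y, A, y1 and y2 above are placeholders. A runnable sketch with the x and y data used earlier, arranged as one numeric column plus a grouping factor (an assumed layout), which also shows kruskal.test(). exact = FALSE requests the normal approximation because the data contain ties. The paired test is shown only for its syntax; it is meaningful only when x[i] and y[i] are matched observations.

```r
x <- c(2, 3, 1, 5, 4, 6, 5, 7, 6, 8)
y <- c(10, 12, 14, 13, 34, 23, 12, 34, 25, 43)

# Long format: one numeric column plus a grouping factor
values <- c(x, y)
group  <- factor(rep(c("x", "y"), each = 10))

mw <- wilcox.test(values ~ group, exact = FALSE)        # Mann-Whitney U
mw$statistic                                            # W = 0: every x is below every y

sr <- wilcox.test(y, x, paired = TRUE, exact = FALSE)   # signed rank (syntax only)

kw <- kruskal.test(values ~ group)                      # Kruskal-Wallis, any number of groups
c(mann_whitney = mw$p.value, signed_rank = sr$p.value, kruskal = kw$p.value)
```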
Matrices
Two-dimensional structure of a single type
The commands rbind and cbind can be used to combine vectors into a matrix, row by row or column by column
x <- c(1,2,3)
y <- c(4,5,6)
A = cbind(x,y)
B = rbind(x,y)
C = t(B)
# The last command gives the matrix transpose of B.
Create matrices: matrix(c(1,2,3,4,5,6,7,8,9),nrow=3,ncol=3,byrow=T)
z<- matrix(c(1,2,3,4,5,6,7,8,9), nrow=3, byrow=T)
z
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
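A sketch checking the shapes produced by cbind(), rbind() and t(), and contrasting the default column-by-column fill of matrix() with byrow = TRUE:

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)
A <- cbind(x, y)   # 3 x 2, columns named x and y
B <- rbind(x, y)   # 2 x 3, rows named x and y
dim(A)             # 3 2
dim(B)             # 2 3
all(t(B) == A)     # TRUE: transposing B gives the same values as A

matrix(1:9, nrow = 3)                 # default byrow = FALSE: fills column by column
matrix(1:9, nrow = 3, byrow = TRUE)   # fills row by row
```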
Component extraction
A[r,] - rth row of object A
A[,c] - cth column of object A
A[r,c] - entry in row r and column c of object A
A[A<10] - extract all elements of A that are smaller than 10
z[2,3]
[1] 6
z[,1]
[1] 1 4 7
z[1,]
[1] 1 2 3
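A few more indexing forms on the same z, as a sketch. Logical indexing such as z[z > 5] returns the values in column-major order.

```r
z <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
z[2, 3]                # 6
z[, 1]                 # 1 4 7
z[z > 5]               # 7 8 6 9: read column by column
z[2, , drop = FALSE]   # keeps the row as a 1 x 3 matrix instead of a vector
z[-1, ]                # drops the first row, leaving a 2 x 3 matrix
```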
Arrays
Arrays are similar to matrices, but can have more than 2 dimensions
Useful in programming new statistical methods
Natural extension of matrices
Must be of a single type
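A minimal array sketch: a 2 x 3 x 4 array filled with 1 to 24 (arbitrary values), indexed with one subscript per dimension.

```r
a <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers
dim(a)        # 2 3 4
a[1, 1, 2]    # 7: the first cell of the second 2 x 3 layer
a[2, 3, 4]    # 24: the last cell
a[, , 1]      # the first layer is an ordinary 2 x 3 matrix
```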
Data Frame
Data frames are similar to tables (databases), datasets (SAS/SPSS) etc.
Consist of columns of different types
More general than a matrix
Columns are variables; rows are observations
Convenient for holding all the data required for a data analysis
Data Frame - Creation
Can be created in several ways, e.g. with the data.frame() function
General syntax is data.frame(col1, col2, ...) where col1, col2, etc. are columns of the same or different data types (numeric/character/logical)
Factors
R functions treat nominal and ordinal variables differently from continuous variables
In R, nominal and ordinal variables are called factors
Use the factor() function to convert a variable into a factor
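A sketch of nominal and ordinal factors; the variables sex and size are hypothetical examples.

```r
sex <- factor(c("male", "female", "female", "male", "female"))
levels(sex)   # "female" "male": levels are sorted alphabetically by default
table(sex)    # counts per level: female 3, male 2

# An ordinal variable: give the levels in their natural order
size <- factor(c("low", "high", "medium", "low"),
               levels = c("low", "medium", "high"), ordered = TRUE)
size[1] < size[2]   # TRUE: ordered factors support comparisons
as.integer(size)    # underlying codes: 1 3 2 1
```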
Handling Data
DATA FRAMES
Data frames are used to store tabular data
They are represented as a special type of list where every element of the list has to have the same length
Each element of the list can be thought of as a column, and the length of each element of the list is the number of rows
Unlike matrices, data frames can store different classes of objects in each column (just like lists); matrices must have every element be of the same class
Data frames also have a special attribute called row.names
Data frames are usually created by calling read.table() or read.csv()
They can be converted to a matrix by calling data.matrix()
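A sketch of exporting and re-importing a data frame with write.csv() and read.csv(), then converting it with data.matrix(). A temporary file is used so the example does not depend on any particular folder; with real data you would give your own path.

```r
df <- data.frame(name = c("Anil", "Ankit", "Sunil"),
                 weight = c(75, 82, 88))
f <- tempfile(fileext = ".csv")       # temporary file, for illustration only
write.csv(df, f, row.names = FALSE)   # export to CSV
df2 <- read.csv(f)                    # import it back
str(df2)                              # 3 obs. of 2 variables

m <- data.matrix(df2)                 # numeric matrix; text columns become integer codes
dim(m)                                # 3 2
unlink(f)                             # remove the temporary file
```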

Creating data frames
Data frame: represents the data in the traditional table-oriented way
The command data.frame can be used to organize data of different kinds and to extract subsets of said data. Assume that we have data about three persons and that we store it as follows:
length <- c(180,175,190)
weight <- c(75,82,88)
name <- c("Anil","Ankit","Sunil")
Here name is a character vector (a vector of text strings). It does not matter whether you use single or double quote symbols, as long as the left quote is the same as the right quote
friends <- data.frame(name,length,weight)
friends is now a data frame containing the data for the three persons
A data frame corresponds to what other statistical packages call a data matrix or a data set. It is a list of vectors and/or factors of the same length
Data can easily be extracted:
my.names <- friends$name
length1 <- friends$length[1]
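A sketch of a few more ways to work with friends, rebuilding it so the block runs on its own. stringsAsFactors = FALSE keeps name as text in every R version (it is already the default from R 4.0.0).

```r
length <- c(180, 175, 190)
weight <- c(75, 82, 88)
name <- c("Anil", "Ankit", "Sunil")
friends <- data.frame(name, length, weight, stringsAsFactors = FALSE)

is.list(friends)                      # TRUE: a data frame is a list of columns
sapply(friends, class)                # each column keeps its own class
str(friends)                          # 3 obs. of 3 variables
friends[2, ]                          # second row, all columns
friends[friends$weight > 80, "name"]  # "Ankit" "Sunil"
friends[["weight"]]                   # same as friends$weight
mean(friends$length)                  # 181.6667
```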
