
Correlation Analysis

• Correlation is a statistical tool that studies the
relationship between two variables. Correlation
Analysis comprises the methods and techniques
used for studying and measuring the extent of that
relationship.

• Correlation Analysis is a statistical procedure by
which we can determine the degree of association or
relationship between two or more variables.

Prof. Kuldeep Sharma, IIBS Bengaluru

Statistical Relationship

Examples: height & weight; price & demand;
age & height; radius & area of a circle.

Two variables are said to be correlated if a change
in the value of one variable is accompanied by a
change in the value of another variable.

Such a relationship is called Statistical Relationship.

When both the variables in the bivariate data are
quantitative, we use the term Correlation Analysis for
the methods that determine whether a relationship
exists between them.
Croxton and Cowden defined correlation as
“When the relationship is of a quantitative nature, the appropriate
statistical tool for discovering and measuring the relationship and
expressing it in a brief formula is known as correlation.”

According to the Statistician A. M. Tuttle

“Correlation is an analysis of the covariation between two or
more variables.”

Sample Data for House Price Model

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

Graphical Presentation
• House price model: scatter plot
[Figure: scatter plot of House Price ($1000s), axis from 0 to 450, against Square Feet, axis from 0 to 3000]
Types of Relationships
[Figure: scatter plots of Y against X in two columns of panels, titled "Linear relationships" and "Curvilinear relationships"]
Types of Relationships (continued)
[Figure: scatter plots of Y against X in two columns of panels, titled "Strong relationships" and "Weak relationships"]
Types of Relationships (continued)
[Figure: scatter plot panel titled "No relationship", plotted against X]
UNIVARIATE & BIVARIATE DISTRIBUTION

• In a bivariate population we are interested in knowing
whether there exists some sort of functional
relationship between the two variables involved.

• Does a change in one variable accompany a change in the
other variable?
• If yes, what is the nature of this relationship?

COVARIANCE
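• Covariance is an absolute measure of the joint variation of two
variables X & Y, denoted by Cov(X, Y) and defined
as
$$\mathrm{Cov}(X, Y) = \frac{1}{n} \sum (x - \mu_x)(y - \mu_y)$$

• Equivalently, in terms of raw sums,
$$\mathrm{Cov}(X, Y) = \frac{1}{n} \left\{ \sum xy - \frac{1}{n} \left( \sum x \right) \left( \sum y \right) \right\}$$

• The sign of the covariance shows the direction of the linear
relationship between two variables; its magnitude depends on the
units of X and Y, so it is not a unit-free measure of strength.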
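
As a quick check of the two covariance formulas, the sketch below applies both to the house price sample given earlier in these slides. The function names are illustrative, not from the slides, and the divisor is n (population covariance), as in the definition above.

```python
# House price sample from the slides: square feet (x), price in $1000s (y).
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def cov_definition(x, y):
    """Population covariance: mean of (x - mean_x) * (y - mean_y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def cov_raw_sums(x, y):
    """Same quantity from raw sums: (1/n) * (sum xy - (1/n) * sum x * sum y)."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return (sum_xy - sum(x) * sum(y) / n) / n


print(cov_definition(x, y))  # 17250.0
print(cov_raw_sums(x, y))    # 17250.0
```

If the prices were recorded in dollars instead of $1000s, the covariance would be 1000 times larger even though the relationship is unchanged, which is why covariance is called an absolute measure.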

SCATTER DIAGRAM OR DOT DIAGRAM METHOD

• A scatter diagram is a graphical method of showing the
correlation between the two variables x & y.

• The scatter diagram may indicate both the degree and
the type of correlation.
• From a scatter diagram, we can form a fairly good,
though rough, idea about the relationship between the
two variables.

Scatter Plot
A scatter plot (or scatter diagram) can be used to show the
relationship between two variables

Volume per day    Cost per day
23                125
26                140
29                146
33                160
38                167
42                170
50                188
55                195
60                200

[Figure: scatter plot "Cost per Day vs. Production Volume", with Volume per Day (0 to 70) on the horizontal axis and Cost per Day (0 to 250) on the vertical axis]
Advantages & Disadvantages of Scatter Diagram

• Readily comprehensible and enables us to form a rough idea of
the nature of the relationship between the two variables
• Not influenced by extreme observations

• Not a suitable method if the number of observations is very
large
• Provides only a rough measure of correlation, whose reading can
differ from person to person

Co-efficient of Correlation r

It gives the degree of association or relationship
between two variables.

Correlation is a relationship between two variables in
which a change in one variable is accompanied by a
positive or negative change in the other, and a
greater change in one variable is accompanied by a
correspondingly greater or smaller change in the other.

Coefficient of Correlation

Measures the strength of the linear relationship
between two quantitative variables

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
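
The definition of r can be applied directly to the house price sample from the earlier slide. This is a minimal sketch using only the standard library; the function name `pearson_r` is an illustrative choice, not something defined in the slides.

```python
import math


def pearson_r(x, y):
    """Pearson's r: sum of cross-deviations over the root of the
    product of the two sums of squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# House price sample from the slides.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

r = pearson_r(square_feet, price)
print(round(r, 4))  # 0.7621
```

A value of about 0.76 indicates a fairly strong positive linear relationship between house size and price in this sample.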

Application of Correlation analysis

• Correlation analysis is used to measure the
strength of the association (linear
relationship) between two variables
– Correlation is only concerned with the strength
of the relationship
– No causal effect is implied by correlation

Properties of Co-efficient of correlation
1. It is a measure of the closeness of fit in a relative
sense
2. r lies between -1 & +1
3. The correlation is perfect negative when r = -1
4. The correlation is perfect positive when r = +1
5. If r = 0 there is no linear correlation; the variables are
uncorrelated, though not necessarily independent
6. r is a pure number and is not affected by a change of
origin & scale (only its sign reverses if exactly one variable is
multiplied by a negative factor)
7. It is a relative measure of association between two
variables
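
Two of these properties can be checked numerically. The sketch below uses the volume and cost data from the scatter plot slide for the change of origin and scale, and a small made-up series (y = x squared on a symmetric range, an assumption for illustration) to show what r = 0 does and does not tell us.

```python
import math


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Volume per day and cost per day, from the scatter plot slide.
volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]

r_original = pearson_r(volume, cost)

# Change of origin and scale: u = 2x + 5, v = 0.1y - 3.
u = [2 * v_ + 5 for v_ in volume]
v = [0.1 * c - 3 for c in cost]
r_transformed = pearson_r(u, v)
print(abs(r_original - r_transformed) < 1e-9)  # True

# Hypothetical data: y is completely determined by x, yet r is zero.
x_sym = [-2, -1, 0, 1, 2]
y_sq = [xi ** 2 for xi in x_sym]
print(pearson_r(x_sym, y_sq))  # 0.0
```

The second result shows that r = 0 rules out only a linear relationship: here y depends entirely on x, but the dependence is curvilinear.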

Scatter Plots of Data with Various Correlation Coefficients
[Figure: five scatter plots of Y against X, for r = -1, r = -0.6, r = 0, r = 0.6 and r = 1]
Karl Pearson’s Coefficient of Correlation
• Karl Pearson (1857-1936), a great statistician, provided a formula for measuring
the magnitude of the linear correlation between two variables.

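• $$\rho(X, Y) = r_{xy} = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(X) \, \mathrm{Var}(Y)}}$$

• $$\rho(X, Y) = r_{xy} = \frac{\sum (x - \mu_x)(y - \mu_y)}{\sqrt{\sum (x - \mu_x)^2 \, \sum (y - \mu_y)^2}}$$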

Karl Pearson’s Coefficient of Correlation (contd.)
• $$\rho(X, Y) = r_{xy} = \frac{n \sum xy - \sum x \sum y}{\sqrt{\{n \sum x^2 - (\sum x)^2\} \{n \sum y^2 - (\sum y)^2\}}}$$
• The above formula saves a lot of computational labour.
• It also reduces the error due to computation & rounding off.
• Other forms can also be used:
• $$\rho(X, Y) = r_{xy} = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \, \sum d_y^2}}$$ where $d_x = x - \mu_x$ and $d_y = y - \mu_y$
• $$\rho(X, Y) = r_{xy} = \frac{\sum d_x d_y}{n \, \sigma_x \, \sigma_y}$$

Another Formula called short cut method

• $$\rho(X, Y) = r_{xy} = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{\{n \sum d_x^2 - (\sum d_x)^2\} \{n \sum d_y^2 - (\sum d_y)^2\}}}$$

• where $d_x = x - a$, and a is the assumed mean for X
• where $d_x^2 = (x - a)^2$
• where $d_y = y - b$, and b is the assumed mean for Y
• where $d_y^2 = (y - b)^2$
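
The short-cut method is the raw-sum formula applied to deviations from the assumed means, so both must give the same r. The sketch below checks this on the house price sample from the earlier slide; the assumed means a = 1700 and b = 300 are arbitrary round numbers chosen for illustration, and the function names are hypothetical.

```python
import math


def pearson_r_raw_sums(x, y):
    """r from raw sums: n*Sxy - Sx*Sy over the root of the two bracketed terms."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def pearson_r_shortcut(x, y, a, b):
    """Short-cut method: same formula on dx = x - a and dy = y - b."""
    dx = [xi - a for xi in x]
    dy = [yi - b for yi in y]
    return pearson_r_raw_sums(dx, dy)


# House price sample from the slides.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

r_raw = pearson_r_raw_sums(square_feet, price)
r_short = pearson_r_shortcut(square_feet, price, a=1700, b=300)
print(round(r_raw, 4))    # 0.7621
print(round(r_short, 4))  # 0.7621
```

Working with deviations from a convenient assumed mean keeps the intermediate numbers small, which is the main appeal of the method for hand calculation.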

Nature of Relationship
• Positive correlation means that low values of one variable are
associated with low values of the other, and high values of one
variable are associated with high values of the other.

• Negative correlation means that low values of one variable are
associated with high values of the other, and high values of one
variable are associated with low values of the other.

• The degree of correlation between two variables is measured by the
Pearsonian (product moment) correlation coefficient (r).
• The nearer r is to +1 or –1, the stronger the relationship.

Spearman’s Rank Correlation Coefficient R
• It is applied to problems in which the data cannot be measured
quantitatively but a qualitative assessment is possible, such as
beauty, honesty etc.

• In this case the best individual is given rank number 1, the next 2, and
so on.

• $$R = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
where $D^2$ is the square of the difference of corresponding ranks
and n is the number of pairs of observations.
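
The rank formula can be sketched in a few lines. The ranks below are hypothetical: two judges each rank the same five contestants, with no ties. The function name is illustrative.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's R = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to five contestants.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

# D = -1, 1, -1, 1, 0, so sum(D^2) = 4 and R = 1 - 24/120.
print(spearman_from_ranks(judge_a, judge_b))  # 0.8
```

Identical rankings give R = 1 and exactly reversed rankings give R = -1, matching the range of Pearson's r.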

Spearman’s Rank Correlation Coefficient
When Ranks are Tied (Repeated Ranks)

• $$R = 1 - \frac{6 \left[ \sum D^2 + \frac{p^3 - p}{12} + \frac{q^3 - q}{12} + \dots \right]}{n(n^2 - 1)}$$
where p, q, … are the number of times a value is
repeated
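
A minimal sketch of the tied-ranks formula follows. It assumes the usual convention that tied values share the average of the ranks they would otherwise occupy, and that one correction term (m³ − m)/12 is added for every group of m tied values in either series. The data and function names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 goes to the largest value; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m repeated values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_tied(x, y):
    """Spearman's R with the correction terms for repeated ranks."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    correction = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Hypothetical scores; the value 20 is repeated twice in x.
x = [10, 20, 20, 30]
y = [1, 2, 3, 4]

print(average_ranks(x))     # [4.0, 2.5, 2.5, 1.0]
print(spearman_tied(x, y))  # 0.9
```

Here the sum of D² is 0.5 and the single tie adds (2³ − 2)/12 = 0.5, so R = 1 − 6(1.0)/60 = 0.9.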

Spearman’s Rank Correlation Coefficient
• It is simpler to understand and easier to calculate than
Karl Pearson’s method.
• It is useful for qualitative data such as beauty, honesty,
efficiency etc.
• It is a useful method when the actual data are not given but only
ranks are given.

• Limitations
• It can’t be used for a grouped frequency distribution.
• It is not as accurate as Pearson’s coefficient.
• It can’t be used in continuous series.
• When the number of items is more than 30 and ranks are not given, ranking takes more time and
the method therefore can’t be used conveniently.

Quiz
• State the nature of the following correlations
• (positive, negative or no correlation)

• 1. The amount of rainfall & the yield of crops
• 2. The colour of a saree and the intelligence of the girl wearing it
• 3. Age of the person insured & the premium of life insurance
• 4. Demand for goods and their prices in normal times
• 5. Production of pig iron and soot content in Durgapur
• 6. Unemployment index and the purchasing power of the common
man
