
CORRELATION AND REGRESSION

C 5606 / 5/ 1

UNIT 5
CORRELATION AND REGRESSION

OBJECTIVES

General Objective
To understand and apply the concept of correlation and regression.

Specific Objectives
At the end of the unit, you should be able to:
- Draw a scatterplot for a set of ordered pairs
- Compute the correlation coefficient
- Compute the equation of the regression line


INPUT

5.0 CORRELATION

So far we have considered the statistics of one variable. Of course, we sometimes get data involving two variables. For example, look at the marks obtained on two Mathematics papers by a group of students below.

Student   Paper 1   Paper 2
A         42        31
B         84        83
C         50        42
D         42        60
E         33        28
F         50        63
G         69        59
H         81        92
I         50        73
J         35        40

So what can we find out from the data? Students B and H have done very well on both papers, E has done very badly on both papers, and student I has done much better on Paper 2 than on Paper 1. A graph might help us to make more sense of the data, as would the average (mean) mark for Papers 1 and 2. The most useful type of graph is a scatter diagram.


5.1 CORRELATION: SCATTER DIAGRAM

If we plot the data as points, with marks for Paper 1 on the x-axis and for Paper 2 on the y-axis, we obtain a graph like the one shown here. Note that we do not need to start the scales at zero.

We see that the points go roughly from bottom left to top right (this is made clearer by enclosing the points as shown below).


From the data, the mean value for Paper 1 is x̄ = 53.6 and the mean value for Paper 2 is ȳ = 57.1.

We now plot the lines x = 53.6 and y = 57.1 on the scatter diagram.

The lines divide the graph into four quadrants:

Top right: all points have both x values and y values greater than their respective means, i.e. (x - x̄) > 0 and (y - ȳ) > 0. The product (x - x̄)(y - ȳ) is positive.
Bottom left: all points have both x values and y values less than their respective means, i.e. (x - x̄) < 0 and (y - ȳ) < 0. The product is positive.
Top left: x values less than x̄, y values greater than ȳ. The product is negative.
Bottom right: x values greater than x̄, y values less than ȳ. The product is negative.

Look at the scattergrams (scatter diagrams) below. The patterns seem to be very different.
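The quadrant idea can be checked numerically. The sketch below is only an illustration, not part of the original unit: it uses the marks as read from the table in section 5.0 (student B is taken as 84 and 83, which agrees with the stated means of 53.6 and 57.1), and the function name `quadrant_products` is an assumed helper.

```python
# Marks for students A to J, as read from the table in section 5.0.
# Assumption: student B scored 84 and 83 (consistent with the stated means).
paper1 = [42, 84, 50, 42, 33, 50, 69, 81, 50, 35]
paper2 = [31, 83, 42, 60, 28, 63, 59, 92, 73, 40]


def quadrant_products(xs, ys):
    """Count the points in each quadrant around (mean x, mean y) and
    return the counts together with the sum of the products
    (x - mean_x) * (y - mean_y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    counts = {"top right": 0, "bottom left": 0, "top left": 0, "bottom right": 0}
    total = 0.0
    for x, y in zip(xs, ys):
        dx = x - mean_x
        dy = y - mean_y
        total += dx * dy
        if dx > 0 and dy > 0:
            counts["top right"] += 1
        elif dx < 0 and dy < 0:
            counts["bottom left"] += 1
        elif dx < 0 and dy > 0:
            counts["top left"] += 1
        elif dx > 0 and dy < 0:
            counts["bottom right"] += 1
        # A point lying exactly on a mean line has a zero product
        # and is not counted in any quadrant.
    return counts, total


counts, total = quadrant_products(paper1, paper2)
print(counts)
print(total)
```

Most points fall in the top right and bottom left quadrants, so the positive products dominate and the sum is positive, which matches the bottom-left to top-right pattern described above.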

Roughly speaking:
Positive correlation: "the higher the value of x, the higher the value of y."
Negative correlation: "the higher the value of x, the lower the value of y."
Zero correlation: "no fixed relationship between x and y."

Again this is made clearer by drawing the lines y = ȳ and x = x̄.

You have met scatter diagrams in your work, in which you may have drawn a "line of best fit" on the graph in order to estimate a value of y given a value of x. The line was drawn "by eye", but you would know that the line passes through the mean point (x̄, ȳ), as shown below.


The lines on the first two diagrams are relatively easy to draw, but where do we draw a line on the third and, having drawn it, would it be of any practical use? Notice that we have been looking for a special type of relationship between the x and y values: a straight line or linear relationship. The fact that we can't find such a relationship does not mean that there is no relationship at all.

The product-moment formula for determining the linear correlation coefficient

The convention for dealing with data:
Horizontal (x) axis: the independent variable
Vertical (y) axis: the dependent variable

Let us look at some data on the height of students and the distance they can throw a cricket ball.

Height (x) cm    122  124  133  138  144  156  158  161  164  168
Distance (y) m    41   38   52   56   29   54   59   61   63   67

Just looking at the data, a general response might be "the taller a person, the further they can throw a cricket ball" (apart from the odd person!).

Does a scatter diagram support that hypothesis?

The example below shows one drawback: SCALE


One of the measures of the degree of linear correlation between two variables is called the coefficient of correlation, denoted by the symbol 'r'. The coefficient of correlation for two variables, say X and Y, is given by:

\[ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\left[\sum (X - \bar{X})^2\right]\left[\sum (Y - \bar{Y})^2\right]}} \]

or simply

\[ r = \frac{\sum xy}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}} \]

where x = X - X̄ and y = Y - Ȳ.

The value of the correlation coefficient ranges from +1 for a perfect positive correlation to -1 for a perfect negative correlation.
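As a minimal sketch of the formula above (the function name `correlation_coefficient` is an assumption for illustration, not something defined in this unit), the deviations x = X - X̄ and y = Y - Ȳ can be computed directly and combined:

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Product-moment correlation coefficient r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    where x and y are deviations from the respective means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # x = X - mean of X
    dy = [y - mean_y for y in ys]  # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    if sum_x2 == 0 or sum_y2 == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sum_xy / sqrt(sum_x2 * sum_y2)


# Data of part (a) of the example that follows: X = 4, 5, 6, 9 and Y = 12, 10, 8, 6.
r = correlation_coefficient([4, 5, 6, 9], [12, 10, 8, 6])
print(round(r, 4))  # -0.9562
```

The negative sign shows that Y falls as X rises; the magnitude close to 1 shows that the points lie close to a straight line.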

Example 5.1
a) Determine the coefficient of correlation between X and Y based on the data below.

X    4    5    6    9
Y   12   10    8    6

b) The data given below gives the experimental values obtained for the torque output from an electric motor, X, against the current taken from the supply, Y. Determine the value, degree and nature of the coefficient of linear correlation between the variables X and Y (if there is one).

X   0   1   2   3   4    5    6    7    8    9
Y   4   6   6   6   8   10   10   10   14   12

Solution to Example 5.1
a) Construct a table from the given data, with x = X - X̄ and y = Y - Ȳ.

X     Y     x     y     xy    x²    y²
4    12    -2     3     -6     4     9
5    10    -1     1     -1     1     1
6     8     0    -1      0     0     1
9     6     3    -3     -9     9     9

ΣX = 24, ΣY = 36, Σxy = -16, Σx² = 14, Σy² = 20

X̄ = 24/4 = 6
Ȳ = 36/4 = 9

\[ r = \frac{\sum xy}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}} = \frac{-16}{\sqrt{(14)(20)}} = \frac{-16}{\sqrt{280}} = -0.9562 \]

b)
X     Y      x      y      xy      x²      y²
0     4    -4.5   -4.6    20.7   20.25   21.16
1     6    -3.5   -2.6     9.1   12.25    6.76
2     6    -2.5   -2.6     6.5    6.25    6.76
3     6    -1.5   -2.6     3.9    2.25    6.76
4     8    -0.5   -0.6     0.3    0.25    0.36
5    10     0.5    1.4     0.7    0.25    1.96
6    10     1.5    1.4     2.1    2.25    1.96
7    10     2.5    1.4     3.5    6.25    1.96
8    14     3.5    5.4    18.9   12.25   29.16
9    12     4.5    3.4    15.3   20.25   11.56

ΣX = 45, ΣY = 86, Σxy = 81, Σx² = 82.5, Σy² = 88.4

X̄ = 45/10 = 4.5
Ȳ = 86/10 = 8.6

\[ r = \frac{\sum xy}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}} = \frac{81}{\sqrt{(82.5)(88.4)}} = 0.95 \]

A good direct correlation exists between the values of X and Y.

ACTIVITY 5A

TEST YOUR UNDERSTANDING BEFORE PROCEEDING TO THE NEXT INPUT...!

1. Determine the coefficient of correlation, correct to 4 decimal places, between X and Y based on the data below.

X   122  124  133  138  144  156  158  161  164  168
Y    41   38   52   56   29   54   59   61   63   67

2. The coordinates given below refer to an experiment to verify Newton's law of cooling over a limited range of values. Determine the value, degree and nature of the coefficient of correlation.

Time (min)           4    8   10   12   16   22
Temperature (°C)    46   34   30   26   24   20

3. The following results were obtained experimentally when verifying Hooke's law:

Load (N)          2    5    8    11    15
Extension (mm)    2   23   62   119   223

Determine the value, degree and nature of the coefficient of correlation.

4. The thickness of case-hardening achieved varies with temperature, and some coordinates obtained by experiment are as shown. Temperature (°C) -11 -.1 Thickness (Km) 2.5 2.201 2.5 2.1 2./ -11 2.3 -/1 2.2 --1 2.251 2.5

Determine the coefficient of correlation based on these values.

FEEDBACK TO ACTIVITY 5A

1. r = 0.7289
2. r = -0.92, good, inverse
3. 0.97, good, direct
4. 0.93


INPUT

5.2 LEAST SQUARES REGRESSION LINE

Scatter Diagrams: Line of Best Fit

We have already referred to the drawing of a line of best fit by eye.

The only calculation involved was determining x̄ and ȳ, since the line of best fit passes through the point (x̄, ȳ). From the line you might be expected to estimate a y-value given an x-value. Of course, "by eye" line fitting is a subjective matter, trying to minimise the distances between the points and the line. A mathematical computation method is available to produce two lines, known as 'y on x' (to estimate values of y) and 'x on y' (to estimate values of x). These are known as (Linear) Regression Lines or Least-Squares Regression Lines.


Scatter Diagrams: The 'y on x' Regression Line

Since the line must pass through (x̄, ȳ), the parameters that can vary are the gradient of the line and the point where the line cuts the y-axis. The equation of the 'y on x' line will be of the form y = a + bx (some syllabuses use the Greek letters α and β instead of a and b).

The y on x line minimises the sum of the squares of the vertical distances from the points to the regression line (the square of the distance is used to ensure a positive result). As with correlation, there is a formula derived from a proof and a corresponding 'computational' method. The proof is not required at A/AS Level.

For y = a + bx:

\[ b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} \]

\[ a = \bar{y} - b\bar{x} \]

where ȳ and x̄ are the mean values of y and x.

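The computational formulas for b and a translate directly into code. The following is a minimal sketch under stated assumptions: the function name `regression_y_on_x` and the small data set (chosen to lie exactly on y = 1 + 2x) are made up for illustration and do not come from the unit.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x, using
    b = (sum(xy) - sum(x)*sum(y)/n) / (sum(x^2) - sum(x)^2/n) and
    a = mean(y) - b*mean(x)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least two values")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = sum_x2 - sum_x ** 2 / n
    if denominator == 0:
        raise ValueError("all x values are equal, so the gradient is undefined")
    b = (sum_xy - sum_x * sum_y / n) / denominator
    a = sum_y / n - b * sum_x / n
    return a, b


# Illustrative data only: these points lie exactly on y = 1 + 2x.
a, b = regression_y_on_x([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Because a is computed as ȳ - b x̄, the fitted line always passes through the mean point (x̄, ȳ), as stated above.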

Example 5.2
a) y on x Regression Line (Least Squares Regression Line)

x   2.5   8     5   7   9.5   4   8.5   12.5   12.5   14.5     x̄ = 8.4
y   3.5   6.5   7   8   11    3   9     10.5   13     13       ȳ = 8.45

Σx = 84, Σy = 84.5, Σxy = 827, Σx² = 845.5, n = 10

Calculate the regression line y on x.

b) Based on the data already calculated, find the regression line y on x and estimate the value of y when x = 160.

Σx = 1468, Σy = 520, Σxy = 77689, Σx² = 218070, n = 10

Solution to Example 5.2
a) To calculate the regression line y on x:

\[ b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} = \frac{827 - \dfrac{84 \times 84.5}{10}}{845.5 - \dfrac{84^2}{10}} = 0.8377 \]

\[ a = \bar{y} - b\bar{x} = 8.45 - (0.8377 \times 8.4) = 1.4133 \]

So the least squares regression line y on x is y = 1.4133 + 0.8377x.

Least Squares Regression Line: y on x

From the previous page, the least squares regression line y on x is:


y = 1.4133 + 0.8377x

We can now use this equation to calculate (estimate) a value of y for a given value of x. For example, find a value for y given x = 10. Substituting: y = 1.4133 + (0.8377 × 10) = 9.7903

Finding a value from within the range of x is called interpolation.

Warning: estimating a value from outside the data range (say x = 20) is called extrapolation and should be avoided (at all costs), since you do not know that the relationship between x and y will hold for larger and smaller values than those recorded.

b) For the regression line y on x,

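\[ b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} = \frac{77689 - \dfrac{1468 \times 520}{10}}{218070 - \dfrac{1468^2}{10}} = 0.5270 \]

\[ a = \bar{y} - b\bar{x} = 52 - (0.5270 \times 146.8) = -25.3636 \]

So the regression line is y = -25.3636 + 0.5270x.

When x = 160, y = -25.3636 + (0.5270 × 160) = 58.96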
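The calculation in part (b) starts from summary totals rather than raw data, and the warning about extrapolation can be built into the estimate. The sketch below is illustrative only: the helper names `line_from_sums` and `estimate` are assumptions, and the data range of 122 cm to 168 cm is taken from the height data earlier in the unit. Because the code carries b at full precision, a comes out slightly different from the hand value obtained with b rounded to 0.5270, although the estimate at x = 160 still rounds to 58.96.

```python
def line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for y = a + b*x from the summary totals."""
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


def estimate(a, b, x, x_min, x_max):
    """Return (y, is_interpolation). The flag is False when x lies outside
    the recorded range, i.e. when the estimate is an extrapolation."""
    y = a + b * x
    return y, x_min <= x <= x_max


# Totals from Example 5.2(b): height (x) and throwing distance (y), n = 10.
a, b = line_from_sums(10, 1468, 520, 77689, 218070)

# Assumed data range: the recorded heights run from 122 cm to 168 cm.
y_160, inside = estimate(a, b, 160, 122, 168)
print(round(b, 4), round(y_160, 2), inside)

y_200, inside_200 = estimate(a, b, 200, 122, 168)
print(round(y_200, 2), inside_200)  # flagged as an extrapolation
```

Returning the flag alongside the estimate leaves the decision to the caller: an extrapolated value is still computed, but it is clearly marked as unreliable.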


ACTIVITY 5B

TEST YOUR UNDERSTANDING BEFORE PROCEEDING TO THE NEXT INPUT...!

a. The table shows the results for a number of athletes. x represents long jump (metres).

x       y       x²       y²       xy
1.8     6.7     3.24     44.89    12.06
2.1     7.6     4.41     57.76    15.96
1.9     6.3     3.61     39.69    11.97
2.0     6.8     4.00     46.24    13.60
1.8     5.9     3.24     34.81    10.62
1.8     7.9     3.24     62.41    14.22
1.6     5.5     2.56     30.25     8.80
1.8     5.6     3.24     31.36    10.08
1.9     6.5     3.61     42.25    12.35
2.3     7.2     5.29     51.84    16.56
Total
19      66      36.44    441.50   126.22

Σx = 19, Σy = 66, Σxy = 126.22, Σx² = 36.44, n = 10

Calculate the value of b for the regression line y = a + bx.

b. The length y metres of a cable subjected to a load of x kilograms is given by y = α + βx. In an experiment to estimate α and β for a particular cable, the value of y was measured for each of 15 values of x. The following quantities were calculated from the 15 pairs of values.

Σx = 225, Σy = 238.5, Σxy = 3581, Σx² = 3625

Calculate the least squares estimates of α and β.


c. A set of bivariate data can be summarised as follows:

Σx = 21, Σy = 43, Σxy = 171, Σx² = 91, Σy² = 335, n = 6

i) Calculate the equation of the regression line of y on x. Give your answer in the form y = a + bx, where the values of a and b should be stated to 3 significant figures.
ii) It is required to estimate the value of y for a given value of x. State circumstances under which the regression line of x on y should be used, rather than the regression line of y on x.


FEEDBACK TO ACTIVITY 5B

a. b = 2.4118
b. y = α + βx, with α = 15.69 and β = 0.014, so y = 15.69 + 0.014x
c. i) a = 3.0667 and b = 1.1714, so the regression line is y = 3.07 + 1.17x (3 significant figures)
   ii) Use the regression line of x on y to estimate the value of x when y is the independent variable.


SELF ASSESSMENT 5

You are approaching success. Try all the questions in this self-assessment section and check your answers given on the next page. If you encounter any problems, consult your instructor. Good luck.

1. The data given below refers to the relationship between man-hours worked and production achieved in a factory. Determine the coefficient of correlation.

Index of production, man-hour basis   100   97   100   101   93   103   91   89   110   86
Index of production, actual basis      94   91   100   105   84   112   83   80   123   78

2. The number of man-days lost per week due to sickness in two similar departments of a factory is shown for a 12-week period.

Department A   20   18   19   21   17   18   12   16   14   17   13   15
Department B   18   21   18   20   17   19   16   15   15   18   16   18

Determine the coefficient of correlation and comment on its degree and nature.


3. The masses and heights of ten people were measured and the results are as shown.

Mass (kg)      38    38    38    44    44    51    32    51    77    32
Height (cm)   135   140   137   141   147   145   132   149   164   130

Calculate the coefficient of correlation for this data.

4. The relationship between the pressure and volume of a gas was measured and the following results were obtained:

Pressure (kPa)   58     62     67     73     81     81     86     92     104
Volume (m³)      0.36   0.97   0.43   0.52   0.48   0.29   0.31   0.75   0.27

Determine the coefficient of correlation and comment on the result obtained.

5. The caloric intake of rats varies with body mass as shown below.

Body mass (g)              1.5   2.0   3.6   4.6   5.0   6.0   7.0   8.0   8.5   9.0   10.0
Caloric intake (cal h⁻¹)   2.1   3.1   3.2   3.6   3.6   3.9   4.1   4.2   4.5   4.6    5.9

Is there a linear correlation between these results?


6. Determine the coefficient of correlation for the data given below and test the null hypothesis that ρ = 0 at a level of significance of 0.1. The data given relates the number of hours of sunshine per week to the hours lost due to sickness.

Hours of sunshine/week       10   13   15   17   18   20   22   23   24
Hours lost due to sickness   90   75   75   65   55   45   55   45   35

7. The length y metres of a cable subjected to a load of x kilograms is given by y = α + βx. In an experiment to estimate α and β for a particular cable, the value of y was measured for each of 15 values of x. The following quantities were calculated from the pairs of values.

Σx = 225, Σy = 238.5, Σxy = 3581, Σx² = 3625

a) Calculate the least squares estimates of α and β.

8. A set of bivariate data can be summarised as follows:

Σx = 21, Σy = 43, Σxy = 171, Σx² = 91, Σy² = 335, n = 6

i) Calculate the equation of the regression line of y on x. Give your answer in the form y = a + bx, where the values of a and b should be stated to 3 significant figures.
ii) It is required to estimate the value of y for a given value of x. State circumstances under which the regression line of x on y should be used, rather than the regression line of y on x.

9. The data given below shows the relationship between the heights and masses of ten people.

Height, X cm   175   180   193   165   187   171   198   168   184   177
Mass, Y kg      82    78    86    72    91    80    95    72    89    74

Determine the equation of the regression line of mass on height, expressing the regression coefficients correct to two decimal places.


10. The power needed to drive a lathe increases as the cutting angle of the tool increases when cutting at a constant speed and depth of cut. The relationship for mild steel is:

Cutting angle (degrees), X   50    55    60    65    70    75    80    85     90
Power (kW), Y                6.2   6.8   7.6   8.2   8.1   8.8   9.7   10.0   10.4

Determine a) the equation of the regression line of power on cutting angle and b) the equation of the regression line of cutting angle on power, expressing the regression coefficients correct to three significant figures in each case.


FEEDBACK TO SELF ASSESSMENT 5

Have you tried all the questions? If "YES", check your answers now.

1. 0.97
2. 0.70, fair, direct
3. 0.97
4. -0.31. It is probable that the measurements were made at different temperatures.
5. r = 0.94, hence there is a good, direct correlation.
6. r = -0.95, critical value of t = 1.42, |t| = 8.05; the hypothesis is rejected.
7. α = 15.69, β = 0.014, so y = 15.69 + 0.014x
8. i) y = 3.07 + 1.17x
   ii) Use the regression line of x on y to estimate the value of x when y is the independent variable.
9. y = -36.83 + 0.66x
10. a) Y = 1.14 + 0.104X
    b) X = -9.27 + 9.41Y
