
Linear Regression #1: Scatter Diagram: Relationship Between 2 Variables?

Linear Regression #2: Scatter Plot with Trendline & X and Y Mean Lines

Linear Regression #1: Scatter Diagram: Relationship Between 2 Variables?

Plotting two variables: don't use a Line chart; use a Scatter chart.
Plotting a point on a chart that graphs the relationship between two variables: move along the x axis by the x value, then along the y axis by the y value.
Independent (predictor) variable = x
Dependent (predicted) variable = y
Create a scatter diagram with proper x and y axis labels to see whether there is a relationship between the two variables:
Direct, positive relationship: as x increases, y increases.
Indirect, negative relationship: as x increases, y decreases.
No relationship: no pattern can be seen.
Add a trendline with the linear equation and the coefficient of determination (goodness of fit: of the total variation in y, how much can the model explain?).

Example 1
x = Time Studying (hours): independent (predictor) variable
y = Score on Test: dependent (predicted) variable

No.   x    y
1     3    49
2     11   87
3     2    50
4     13   89
5     8    84
6     12   79
7     13   100
8     4    57
9     7    64
10    14   98
11    7    81
12    7    68
13    14   88
14    4    45
15    4    52
16    5    15
17    12   72
18    16   97
19    12   89
20    14   87
21    2    48
22    12   92
23    11   89
24    6    52
25    11   84
26    14   94
27    10   79
28    6    59
29    10   66
30    11   97

Example 2
x = Temperature (F): independent (predictor) variable
y = Sales Chicken Soup: dependent (predicted) variable

No.   x    y
1     86   $3,300
2     40   $8,200
3     41   $8,900
4     78   $3,100
5     71   $4,020
6     91   $1,950
7     70   $2,500
8     37   $6,500
9     65   $6,210
10    42   $5,250
11    53   $7,200
12    83   $2,750
13    63   $7,150
14    36   $7,900
15    43   $6,210

Example 3
x = Temperature (F): independent (predictor) variable
y = Sales Ice Cream: dependent (predicted) variable

No.   x    y
1     91   $7,113
2     45   $2,044
3     46   $1,108
4     83   $7,093
5     76   $3,902
6     96   $6,676
7     75   $5,403
8     42   $886
9     70   $4,740
10    47   $2,637
11    58   $3,150

Example 4
x = Years Using Excel: independent (predictor) variable
y = Expert Level (Rating 1 - 10): dependent (predicted) variable

No.   x    y
1     3    5
2     8    1
3     6    9
4     11   5
5     20   3
6     7    4
7     9    10
8     3    6
9     19   10
10    2    1
11    16   2
12    12   7
13    1    6
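As a cross-check on a chart's linear trendline, the least-squares equation and R^2 can be computed directly from the Example 1 data. This is a minimal Python sketch that assumes plain lists of numbers. `linear_trendline` is a hypothetical helper name, not an Excel function.

```python
def linear_trendline(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line through (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must be equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of multiplied deviations and sums of squared deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)  # share of the total variation in y that the line explains
    return slope, intercept, r_squared


# Example 1: x = Time Studying (hours), y = Score on Test
hours = [3, 11, 2, 13, 8, 12, 13, 4, 7, 14, 7, 7, 14, 4, 4,
         5, 12, 16, 12, 14, 2, 12, 11, 6, 11, 14, 10, 6, 10, 11]
scores = [49, 87, 50, 89, 84, 79, 100, 57, 64, 98, 81, 68, 88, 45, 52,
          15, 72, 97, 89, 87, 48, 92, 89, 52, 84, 94, 79, 59, 66, 97]

slope, intercept, r2 = linear_trendline(hours, scores)
print(f"y = {slope:.4f}x + {intercept:.3f}   R^2 = {r2:.4f}")
# prints: y = 4.2914x + 34.362   R^2 = 0.7266
```

The printed equation and R^2 should agree with the trendline label on the Example 1 chart. The positive slope corresponds to a direct, positive relationship.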

Scatter charts with linear trendlines (one per example):
Example 1: Score on Test vs. Time Studying (hours); trendline y = 4.2914x + 34.362, R^2 = 0.7266 (direct, positive relationship)
Example 2: Sales Chicken Soup vs. Temperature (F); trendline y = -100.56x + 11436, R^2 = 0.7193 (indirect, negative relationship)
Example 3: Sales Ice Cream vs. Temperature (F); trendline y = 112x - 3354.1, R^2 = 0.9056 (direct, positive relationship)
Example 4: Expert Level (Rating 1 - 10) vs. Years Using Excel; trendline y = 0.0436x + 4.9156, R^2 = 0.0078 (no relationship)


Linear Regression #2: Scatter Plot with Trendline & X and Y Mean Lines
1. Create a scatter plot with a trendline and X and Y mean lines that divide the chart into four quadrants, to further define the pattern and relationship between the two variables.

Example 2: x = Temperature (F), y = Sales Chicken Soup (15 sample points; data as in Linear Regression #1)
Mean: Xbar = 59.93333333, Ybar = $5,409
Mean-line chart series: Xbar line at x = 59.93333 from y = 0 to y = 10,000; Ybar line at y = $5,409 from x = 0 to x = 100
Chart: Sales Chicken Soup vs. Temperature (F) with Xbar and Ybar lines; trendline y = -100.56x + 11436, R^2 = 0.7193

Example 3: x = Temperature (F), y = Sales Ice Cream (11 sample points; data as in Linear Regression #1)
Mean: Xbar = 66.27272727, Ybar = $4,068
Mean-line chart series: Xbar line at x = 66.27273 from y = 0 to y = 8,000; Ybar line at y = $4,068 from x = 0 to x = 120
Chart: Sales Ice Cream vs. Temperature (F) with Xbar and Ybar lines; trendline y = 112x - 3354.1, R^2 = 0.9056
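The four quadrants formed by the mean lines can also be counted directly. This is a minimal Python sketch for Example 2. The quadrant numbering uses the usual I-IV convention (I = upper right, then counter-clockwise), which is an assumption because the worksheet does not number them. `quadrant_counts` is a hypothetical helper name.

```python
# Four-quadrant view of Example 2: the mean lines x = Xbar and y = Ybar split the
# scatter plot into quadrants. Points in I and III have a positive product
# (x - Xbar)*(y - Ybar); points in II and IV have a negative product.

temps = [86, 40, 41, 78, 71, 91, 70, 37, 65, 42, 53, 83, 63, 36, 43]
soup_sales = [3300, 8200, 8900, 3100, 4020, 1950, 2500, 6500,
              6210, 5250, 7200, 2750, 7150, 7900, 6210]


def quadrant_counts(xs, ys):
    """Count points in quadrants I-IV around (mean of xs, mean of ys)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    counts = {"I": 0, "II": 0, "III": 0, "IV": 0, "on a mean line": 0}
    for x, y in zip(xs, ys):
        dx, dy = x - x_bar, y - y_bar
        if dx == 0 or dy == 0:
            counts["on a mean line"] += 1
        elif dx > 0 and dy > 0:
            counts["I"] += 1    # upper right: product positive
        elif dx < 0 and dy > 0:
            counts["II"] += 1   # upper left: product negative
        elif dx < 0 and dy < 0:
            counts["III"] += 1  # lower left: product positive
        else:
            counts["IV"] += 1   # lower right: product negative
    return counts


counts = quadrant_counts(temps, soup_sales)
print(counts)
# prints: {'I': 2, 'II': 6, 'III': 1, 'IV': 6, 'on a mean line': 0}
```

In Example 2, 12 of the 15 points fall in quadrants II and IV, where the deviation products are negative. Their sum is therefore negative, which matches the downward-sloping trendline. The sample covariance builds on this idea.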


Linear Regression #3: Coefficient of Correlation: Strength & Direction of Relationship
Calculate the sample covariance long hand to get a measure of the strength of the linear relationship.
Use the scatter plot with trendline and X and Y mean lines to see why covariance makes sense.
Calculate the sample covariance using the Excel function COVARIANCE.S.
Measure the strength and direction of the relationship with the coefficient of correlation:
Calculate the coefficient of correlation long hand to get a measure of the strength and direction of the linear relationship. This number varies from -1 through 0 to +1. It indicates a perfect indirect (negative) relationship at -1, no linear relationship at 0, and a perfect direct (positive) relationship at +1.
A reasonably large positive number = direct, positive relationship: as x increases, y increases.
A reasonably large negative number = indirect, negative relationship: as x increases, y decreases.
A number close to zero = no linear relationship: no pattern can be seen.
See the three charts (Examples 2, 3 and 4), which help visualize the three correlation situations.
Calculate the coefficient of correlation with the Excel functions CORREL and PEARSON.
Calculate the sample standard deviation long hand to see how it relates to the coefficient of correlation and other linear regression calculations.
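The long-hand steps above can be sketched in Python for the three chart situations (Examples 2, 3 and 4). The sketch follows the same n - 1 formulas as the worksheet. `sample_stats` and `describe` are hypothetical helper names. The 0.3 cutoff for "close to zero" is an illustrative assumption, not a value from the worksheet.

```python
import math


def sample_stats(xs, ys):
    """Long-hand sample SDs, sample covariance and coefficient of correlation (n - 1 divisor)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]  # (x - Xbar)
    dy = [y - y_bar for y in ys]  # (y - Ybar)
    s_x = math.sqrt(sum(d * d for d in dx) / (n - 1))
    s_y = math.sqrt(sum(d * d for d in dy) / (n - 1))
    s_xy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    r = s_xy / (s_x * s_y)  # unit-free, always between -1 and +1
    return s_x, s_y, s_xy, r


CUTOFF = 0.3  # hypothetical threshold for "close to zero"; not from the worksheet


def describe(r):
    """Label the direction of a linear relationship from r (illustrative cutoff)."""
    if abs(r) < CUTOFF:
        return "no clear linear relationship"
    return "direct (positive)" if r > 0 else "indirect (negative)"


examples = {
    "Example 2": ([86, 40, 41, 78, 71, 91, 70, 37, 65, 42, 53, 83, 63, 36, 43],
                  [3300, 8200, 8900, 3100, 4020, 1950, 2500, 6500,
                   6210, 5250, 7200, 2750, 7150, 7900, 6210]),
    "Example 3": ([91, 45, 46, 83, 76, 96, 75, 42, 70, 47, 58],
                  [7113, 2044, 1108, 7093, 3902, 6676, 5403, 886, 4740, 2637, 3150]),
    "Example 4": ([3, 8, 6, 11, 20, 7, 9, 3, 19, 2, 16, 12, 1],
                  [5, 1, 9, 5, 3, 4, 10, 6, 10, 1, 2, 7, 6]),
}

results = {name: sample_stats(xs, ys) for name, (xs, ys) in examples.items()}
for name, (s_x, s_y, s_xy, r) in results.items():
    print(f"{name}: s_xy = {s_xy:.4f}, r = {r:.6f} -> {describe(r)}")
```

The computed r values should match the worksheet: about -0.848 (Example 2), 0.952 (Example 3) and 0.089 (Example 4). In Excel, CORREL and PEARSON return the same coefficient. The covariance values, by contrast, differ wildly in size because they carry the units of x times y.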
Example 2: x = Temperature (F), y = Sales Chicken Soup
Chart: Sales Chicken Soup vs. Temperature (F) with Xbar and Ybar mean lines; trendline y = -100.56x + 11436, R^2 = 0.7193
Mean: Xbar = 59.93333, Ybar = $5,409; Count n = 15; n - 1 = 14

No.  x   y       (x - Xbar)    (y - Ybar)     (x - Xbar)^2   (y - Ybar)^2   (x - Xbar)*(y - Ybar)
1    86  $3,300   26.0666667   -2109.33333    679.4711111    4449287.111    -54983.28889
2    40  $8,200  -19.9333333    2790.666667   397.3377778    7787820.444    -55627.28889
3    41  $8,900  -18.9333333    3490.666667   358.4711111    12184753.78    -66089.95556
4    78  $3,100   18.0666667   -2309.33333    326.4044444    5333020.444    -41721.95556
5    71  $4,020   11.0666667   -1389.33333    122.4711111    1930247.111    -15375.28889
6    91  $1,950   31.0666667   -3459.33333    965.1377778    11966987.11    -107469.9556
7    70  $2,500   10.0666667   -2909.33333    101.3377778    8464220.444    -29287.28889
8    37  $6,500  -22.9333333    1090.666667   525.9377778    1189553.778    -25012.62222
9    65  $6,210    5.06666667    800.6666667   25.67111111   641067.1111      4056.711111
10   42  $5,250  -17.9333333    -159.333333   321.6044444    25387.11111      2857.377778
11   53  $7,200   -6.93333333   1790.666667    48.07111111   3206487.111    -12415.28889
12   83  $2,750   23.0666667   -2659.33333    532.0711111    7072053.778    -61341.95556
13   63  $7,150    3.06666667   1740.666667     9.404444444  3029920.444      5338.044444
14   36  $7,900  -23.9333333    2490.666667   572.8044444    6203420.444    -59609.95556
15   43  $6,210  -16.9333333     800.6666667  286.7377778    641067.1111    -13557.95556
Sum                0.00           0.00        5272.933333    74125293.33    -530240.6667

Result                        Long hand       Check
Sample SD x                   19.40716608     19.40716608
Sample SD y                   2301.013648     2301.013648
Sample Covariance             -37874.3333     -37874.33333
Coefficient of Correlation    -0.84813245     -0.84813245 (CORREL), -0.84813245 (PEARSON)

Example 3: x = Temperature (F), y = Sales Ice Cream (data as in Linear Regression #1)
Chart: Sales Ice Cream vs. Temperature (F) with Xbar and Ybar mean lines; trendline y = 112x - 3354.1, R^2 = 0.9056
Mean: Xbar = 66.27273, Ybar = $4,068
Sample Covariance = 43143.69
Coefficient of Correlation = 0.951608 (strength and direction of the relationship)
Coefficient of Determination = R^2 = r^2 = 0.905558201 ("goodness of fit for our line")

Example 4: x = Years Using Excel, y = Expert Level (Rating 1 - 10) (data as in Linear Regression #1)
Chart: Expert Level (Rating 1 - 10) vs. Years Using Excel with Xbar and Ybar mean lines; trendline y = 0.0436x + 4.9156, R^2 = 0.0078
Mean: Xbar = 9, Ybar = 5.307692308
Coefficient of Correlation = 0.088518
r^2 = 0.007835

Sample Standard Deviation = spread in the data: how fairly does the mean represent the data points?
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

Sample Covariance = measures the strength of the linear relationship between 2 variables, but its size depends on the units of x and y, so it is hard to interpret on its own. Note: see the four-quadrant example for why this measure makes sense.
$$ s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} $$

Coefficient of Correlation = measures the strength and direction of the linear relationship. It has no problem with units (it is unit-free). Range from -1 to 0 to +1: -1 = perfect indirect (negative) relationship (as x increases, y decreases); 0 = no linear relationship; +1 = perfect direct (positive) relationship (as x increases, y increases). Used for linear relationships only.
$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y} = \frac{s_{xy}}{s_x s_y} $$

Correlation is not causation.
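Plugging the Example 2 sums from the deviation table (n - 1 = 14) into these formulas reproduces the worksheet results. The values below are rounded.

$$ s_x = \sqrt{\frac{5272.933333}{14}} \approx 19.4072, \qquad s_y = \sqrt{\frac{74125293.33}{14}} \approx 2301.0136 $$

$$ s_{xy} = \frac{-530240.6667}{14} \approx -37874.33, \qquad r_{xy} = \frac{-37874.33}{19.4072 \times 2301.0136} \approx -0.8481 $$

Squaring gives r^2 = (-0.8481)^2 ≈ 0.7193, which is the R^2 displayed on the Example 2 chart.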


Linear Regression #4: Calculate Slope & Y-Intercept, Create Estimated Equation and Use It To Make Predictions
The formula for the slope is derived from the expression min SUM(y observed value - y predicted value)^2 using differential calculus. See text page 667.
Calculate the slope and y-intercept for the regression line long hand.
Calculate the slope using the SLOPE function.
Calculate the y-intercept using the INTERCEPT function.
Slope = rise over run = for every one unit of x, how far does y move?
Y-intercept = y value where x = zero = point at which the line crosses the y axis.
Use the slope and y-intercept to create the estimated simple linear regression equation (line or model).
From sample data, the slope and y-intercept are point estimates for the population parameters for slope and y-intercept.
Use the estimated simple linear regression line to make predictions.
Be careful when making predictions with the estimated simple linear regression equation (line or model) when the x values are outside the range of the sample data. Why? Because the data may show a linear relationship over the range of the sample data, but may show some other relationship outside that sampled range.
See how to use the FORECAST function to make predictions.
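The long-hand slope, intercept and prediction steps can be sketched in Python for Example 2. This is a minimal sketch, not the worksheet's own method. `fit_line` and `predict` are hypothetical helper names. Warning on out-of-range x values is an illustrative design choice that reflects the caution about predicting outside the sampled range.

```python
import warnings


def fit_line(xs, ys):
    """Long-hand least-squares slope b1 and y-intercept b0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # sum of multiplied deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)                         # sum of squared x deviations
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x_new, xs):
    """Return y-hat = b0 + b1 * x_new, warning if x_new is outside the sampled x range."""
    lo, hi = min(xs), max(xs)
    if not lo <= x_new <= hi:
        warnings.warn(f"x = {x_new} is outside the sample range [{lo}, {hi}]; "
                      "the linear pattern may not hold there")
    return b0 + b1 * x_new


# Example 2: x = Temperature (F), y = Sales Chicken Soup
temps = [86, 40, 41, 78, 71, 91, 70, 37, 65, 42, 53, 83, 63, 36, 43]
soup_sales = [3300, 8200, 8900, 3100, 4020, 1950, 2500, 6500,
              6210, 5250, 7200, 2750, 7150, 7900, 6210]

b0, b1 = fit_line(temps, soup_sales)
y_hat_71 = predict(b0, b1, 71, temps)
print(f"y Predicted = {b0:.2f} + {b1:.2f}*x")
print(f"Predicted sales at 71 F: ${y_hat_71:,.2f}")
# prints:
# y Predicted = 11436.17 + -100.56*x
# Predicted sales at 71 F: $4,296.48
```

Calling `predict(b0, b1, 120, temps)` still returns a number, but it also emits a warning because 120 F is above the largest sampled temperature (91 F).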
Example 2: x = Temperature (F), y = Sales Chicken Soup (deviation columns as in the Example 2 table of Linear Regression #3)
Chart: Sales Chicken Soup vs. Temperature (F) with Xbar and Ybar mean lines; trendline y = -100.56x + 11436, R^2 = 0.7193
Mean: Xbar = 59.93333333, Ybar = $5,409; Count n = 15; n - 1 = 14
Sum of deviations: 0.00 (x), 0.00 (y)
Sum of squared deviations: 5272.933333 (x), 74125293.33 (y)
Sum of multiplied deviations: -530240.6667

Result                        Long hand       Check
Sample SD x                   19.40716608     19.40716608
Sample SD y                   2301.013648     2301.013648
Sample Covariance             -37874.3333     -37874.33333
Coefficient of Correlation    -0.84813245     -0.84813245
Slope                         -100.558955     -100.5589552
Y-Intercept                   $11,436.17      11436.16671
Prediction at x = 71          $4,296.48       4296.480896
Equation to predict: y Predicted = $11,436.17 - $100.56*x

Estimated Simple Linear Regression Equation
$$ \hat{y}_i = b_0 + b_1 x_i $$
The model is based on the proof that minimizes the Least Squares Criterion:
$$ \min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \quad \text{or} \quad \min \sum_{i=1}^{n} \big(y_i - (b_0 + b_1 x_i)\big)^2 $$
This gives the formulas for b0 and b1:
Slope of line (for every 1 unit of x, how much does y move?)
$$ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} $$
Example 2: b1 = -530240.6667 / 5272.933333 = -100.559
Y-intercept (at what point does the line cross the y axis?)
$$ b_0 = \bar{y} - b_1 \bar{x} $$
Correlation is not causation.

Example 3: x = Temperature (F), y = Sales Ice Cream (data as in Linear Regression #1)
Chart: Sales Ice Cream vs. Temperature (F) with Xbar and Ybar mean lines; trendline y = 112x - 3354.1, R^2 = 0.9056
Mean: Xbar = 66.27273, Ybar = $4,068
Sample Covariance = 43143.69
Coefficient of Correlation = 0.951608 (strength and direction of the relationship, -1 to 0 to +1)
r^2 = 0.905558 (coefficient of determination = "goodness of fit for our line", a number between 0 and 1)
Slope = 111.9981 (for every one unit of x, how far does y move?)
Y-Intercept = -3354.05 (point at which the estimated regression line crosses the y axis)
x = 85: predicted y = 6165.782 (check: 6165.782)
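The calculus step behind the slope and intercept formulas can be sketched as the standard least-squares derivation, using the same b0, b1, Xbar and Ybar as above. Set both partial derivatives of the squared-error sum to zero:

$$ \text{SSE}(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 $$

$$ \frac{\partial\,\text{SSE}}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0 \;\Rightarrow\; b_0 = \bar{y} - b_1 \bar{x} $$

$$ \frac{\partial\,\text{SSE}}{\partial b_1} = -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0 $$

Substituting $b_0 = \bar{y} - b_1 \bar{x}$ gives $y_i - b_0 - b_1 x_i = (y_i - \bar{y}) - b_1 (x_i - \bar{x})$, so:

$$ \sum_{i=1}^{n} x_i (y_i - \bar{y}) = b_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}) $$

Because $\sum (y_i - \bar{y}) = 0$ and $\sum (x_i - \bar{x}) = 0$, the leading $x_i$ on each side can be replaced by $(x_i - \bar{x})$:

$$ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} $$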

Linear Regression #5: Coefficient of Determination: Goodness of Fit = SSR/SST
Calculate the Total Sum of Squares (total y deviations squared) = SST = how well the observations cluster around the Ybar (y mean) plotted line = total squared deviations between y observed and the mean of y (Ybar).
Calculate the Sum of Squares Due to Error = SSE = how well the observations cluster around the estimated simple linear regression equation = sum of squares of deviations between y observed and y predicted = measure of the variation that is not explained by the estimated simple linear regression equation (line or model).
Calculate the Sum of Squares Due to Regression = SSR = SST - SSE = sum of squares of deviations between y predicted and the mean of y (Ybar).
The relationship between SST, SSR and SSE is: SST = SSR + SSE. When there is no error, the predicted values and the observed values would all lie on the regression line, and therefore SSE would equal zero. In this case SST = SSR + 0 and SSR/SST = 1, which means perfect "goodness of fit". Consequently, the coefficient of determination is always a number between 0 and 1: 0 = "no goodness of fit" and 1 = "perfect goodness of fit".
SSR/SST = Coefficient of Determination = R Squared = r^2.
Use the RSQ function to calculate the coefficient of determination.
Alternatively, square the coefficient of correlation to calculate the coefficient of determination (in simple linear regression the two agree).
The coefficient of determination can be used for linear and non-linear relationships, whereas the coefficient of correlation can only be used for linear relationships.

Chart: Sales Chicken Soup vs. Temperature (F) with Xbar and Ybar mean lines; trendline y = -100.56x + 11436, R^2 = 0.7193. For observation 3, the chart marks the total variation (y3 - Ybar), split into the part explained by the model (y3 Predicted - Ybar) and the residual not explained by the model (y3 Observed - y3 Predicted).

Example 2: x = Temperature (F), y = Sales Chicken Soup (data as in Linear Regression #1)
Mean: Xbar = 59.93333, Ybar = $5,409; Slope = -100.559; Intercept = 11436.17
Worksheet columns for each sample point: Predicted y; Residual = y Observed - y Predicted (not explained by model); Residual^2 = (y Observed - y Predicted)^2; (Predicted y - Ybar)^2; (y Deviation)^2 = (y Observed - Ybar)^2
Column sums: SSE = sum of Residual^2; SSR = sum of (Predicted y - Ybar)^2; SST = sum of (y Observed - Ybar)^2. Check: SSR + SSE = SST.
Coefficient of Determination = r^2 = SSR/SST = measure of goodness of fit. Check: Coefficient of Correlation squared = SSR/SST.

Coefficient of Determination:
Proportion of the variability in the dependent variable y that is explained by the estimated regression equation.
How well does the estimated regression line fit the data? It measures the goodness of fit for the estimated regression line.
A number between 0 and +1.
Can be used for nonlinear relationships as well as linear.
How closely are the observations grouped about the least squares line? 1 = perfectly; 0 = not at all.

Example 3: x = Temperature (F), y = Sales Ice Cream (data as in Linear Regression #1)
Chart: Sales Ice Cream vs. Temperature (F) with Xbar and Ybar mean lines; trendline y = 112x - 3354.1, R^2 = 0.9056
Mean: Xbar = 66.27273, Ybar = 4068.363636
Slope = 111.9981 (for every one unit of x, how far does y move?)
Y-Intercept = -3354.05 (point at which the estimated regression line crosses the y axis)
x = 75: predicted y = 5045.801
Coefficient of Correlation = 0.951608 (strength and direction of the relationship, -1 to 0 to +1)

ith residual = observed value - predicted value:
$$ e_i = y_i - \hat{y}_i $$
Sum of Squares Due to Error (in model) = how well the observations cluster around the estimated line = SSE = not-explained part of SST:
$$ \text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
Total Sum of Squares (deviation from mean) = how well the observations cluster around the Ybar line = SST = total variation:
$$ \text{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 $$
Sum of Squares Due to Regression (predicted y minus Ybar) = part of the total variation explained by the model:
$$ \text{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 $$
Relationship between the three:
$$ \text{SST} = \text{SSR} + \text{SSE} $$
If there is no deviation between the observed and predicted values (SSE = 0), then SSR = SST, thus SSR/SST = 1 = perfect prediction.
Coefficient of Determination = how well does the estimated line fit the data? = measure of the goodness of fit for the estimated line, a number between 0 and +1. Can be used for nonlinear relationships as well as linear.
$$ r_{xy}^2 = r^2 = \frac{\text{SSR}}{\text{SST}} $$
r^2 = goodness of fit of the model to the observed values (number between 0 and 1); r = strength and direction (number between -1 and 1).

estimated regression equation

? 1 = perfectly. 0 = Not at all.

= The percentage of total sum of square that can be e estimated regression equation = Proportion of the va variable y that is explained by the estimated regressio are more closely grouped about the least squares line Using r^2 only, we can draw no conclusion about whe between x and y is statistically significant. Such concl considerations that involve sample size and propertie sampling distributions of the least squares estimators

x 0 120

Ybar $4,068 $4,068

Ybar

Linear (Sales Ice Cream)

80 Temperature (F)

100

120

140

umber between 0 and 1) ned by the estimated regression equation

y 1

x 0

Yabr $5,409

Observation 3 Total Variation (y3 - Ybar) 41 $8,900

Residual (y3 - Y Observed) y 42 $8,900

10000

100

$5,409

41

$5,409

42

7313.249551

predicted value = represents error in using i to

in model) = How well observations cluster around Explained Part of SST

on from Mean) = How well the observations

ssion (Predicted y minus Y bar) = Explained Part of

bserved values and the model values SSE = 0 and

ect Prediction.

How well does the estimated regression line fit odness of fit for the estimated regression line = A n be used your nonlinear relationships as well as

of square that can be explained by using the = Proportion of the variability in the dependent the estimated regression equation. Observations ut the least squares line. o conclusion about whether the relationship significant. Such conclusions must be based on mple size and properties of the appropriate east squares estimators.

Explained Part of Total Variation (Y Predicted - Ybar) 42

y 7313.25

42

$5,409

Sales Chicken Soup

Linear Regression #5: Coefficient of Determination: Goodness of Fit = SSR/SST

Calculate Total Sum of Squares (total y deviations squared) = SST = how well the observations cluster around Ybar (the plotted y-mean line) = sum of the squared deviations between y observed and the mean of y (Ybar).

Calculate Sum of Squares Due to Error = SSE = how well the observations cluster around the estimated simple linear regression equation = sum of the squared deviations between y observed and y predicted = the variation that is not explained by the estimated simple linear regression equation (line or model).

Calculate Sum of Squares Due to Regression = SSR = SST - SSE = sum of the squared deviations between y predicted and the mean of y (Ybar).

The relationship between SST, SSR, and SSE is SST = SSR + SSE. When there is no error, the predicted values and the observed values all lie on the regression line, so SSE equals zero. In that case SST = SSR + 0 and SSR/SST = 1, which means perfect goodness of fit. Because SSR and SSE are sums of squares (never negative) that add up to SST, the coefficient of determination is always a number between 0 and 1: 0 = no goodness of fit, 1 = perfect goodness of fit.

SSR/SST = Coefficient of Determination = R Squared = r^2
Use the RSQ function to calculate the coefficient of determination.
Alternatively, square the coefficient of correlation to calculate the coefficient of determination (this shortcut applies to simple linear regression).
The coefficient of determination can be used for linear and non-linear relationships, whereas the coefficient of correlation can only be used for linear relationships.

[Chart: Sales Chicken Soup vs. Temperature (F), with Xbar and Ybar lines, the linear trendline, and, for observation 3, the Total Variation (y3 - Ybar) split into the Explained Part of Total Variation (y Predicted - Ybar) and the Residual (y3 - y Predicted), the part not explained by the model.]

Example 2 (Sales Chicken Soup):
Mean: Xbar = 59.93333, Ybar = $5,409
Slope = -100.559
Intercept = 11436.17
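The slope and intercept above come from the least squares formulas. A minimal sketch in plain Python (not part of the original workbook; the function name `least_squares` is our own) that recomputes them from the Example 2 data:

```python
# Example 2 data: x = Temperature (F), y = Sales Chicken Soup ($)
x = [86, 40, 41, 78, 71, 91, 70, 37, 65, 42, 53, 83, 63, 36, 43]
y = [3300, 8200, 8900, 3100, 4020, 1950, 2500, 6500, 6210, 5250, 7200, 2750, 7150, 7900, 6210]


def least_squares(x, y):
    """Return (xbar, ybar, slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # slope = sum of (x - Xbar)(y - Ybar) divided by sum of (x - Xbar)^2
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    # the least squares line always passes through the point (Xbar, Ybar)
    intercept = ybar - slope * xbar
    return xbar, ybar, slope, intercept


xbar, ybar, slope, intercept = least_squares(x, y)
print(f"Xbar = {xbar:.5f}, Ybar = {ybar:.2f}, slope = {slope:.3f}, intercept = {intercept:.2f}")
```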

Example 2 residuals (Residual = y Observed - y Predicted):

Sample Point No. | x Temperature (F) | y Sales Chicken Soup | Predicted y | Residual (y Observed - y Predicted) | Residual^2 (y Observed - y Predicted)^2
1 | 86 | $3,300 | 2788.096569 | 511.9034314 | 262045.123
2 | 40 | $8,200 | 7413.808506 | 786.1914937 | 618097.0647
3 | 41 | $8,900 | 7313.249551 | 1586.750449 | 2517776.987
4 | 78 | $3,100 | 3592.56821 | -492.56821 | 242623.4415
5 | 71 | $4,020 | 4296.480896 | -276.4808961 | 76441.68594
6 | 91 | $1,950 | 2285.301793 | -335.3017928 | 112427.2923
7 | 70 | $2,500 | 4397.039851 | -1897.039851 | 3598760.197
8 | 37 | $6,500 | 7715.485372 | -1215.485372 | 1477404.689
9 | 65 | $6,210 | 4899.834627 | 1310.165373 | 1716533.304
10 | 42 | $5,250 | 7212.690596 | -1962.690596 | 3852154.376
11 | 53 | $7,200 | 6106.542089 | 1093.457911 | 1195650.203
12 | 83 | $2,750 | 3089.773434 | -339.7734341 | 115445.9865
13 | 63 | $7,150 | 5100.952537 | 2049.047463 | 4198595.504
14 | 36 | $7,900 | 7816.044327 | 83.955673 | 7048.555028
15 | 43 | $6,210 | 7112.131641 | -902.1316408 | 813841.4974

SSE = sum of Residual^2 = 20804845.91
SSR + SSE = SST

Coefficient of Determination = r^2 = measure of goodness of fit: r^2 = SSR/SST. (Check: squaring the coefficient of correlation gives the same r^2 = SSR/SST.)
It is the proportion of the variability in the dependent variable y that is explained by the estimated regression equation. How well does the estimated regression line fit the data? It is a measure of the goodness of fit for the estimated regression line, a number between 0 and +1, and it can be used for nonlinear relationships as well as linear ones. How closely are the observations grouped about the least squares line? 1 = perfectly, 0 = not at all.
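Because SST = SSR + SSE, the same ratio can also be written in terms of SSE. A short worked check using the Example 2 totals reported on this sheet:

```latex
r^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}

% Example 2 (Sales Chicken Soup) totals:
r^2 = \frac{53320447.43}{74125293.33} = 1 - \frac{20804845.91}{74125293.33} \approx 0.7193
```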

Example 3 (Sales Ice Cream):

Sample Point No. | x Temperature (F) | y Sales Ice Cream
1 | 91 | $7,113
2 | 45 | $2,044
3 | 46 | $1,108
4 | 83 | $7,093
5 | 76 | $3,902
6 | 96 | $6,676
7 | 75 | $5,403
8 | 42 | $886
9 | 70 | $4,740
10 | 47 | $2,637
11 | 58 | $3,150
Mean | 66.27273 | 4068.363636

[Chart: Sales Ice Cream vs. Temperature (F), with the Xbar line at 66.27272727, the Ybar line, and the linear trendline y = 112x - 3354.1, R² = 0.9056]

Slope = 111.9981 (for every one unit of x, how far does y move?)
Y Intercept = -3354.05 (the point at which the estimated regression line crosses the y-axis)
For x = 75, Predicted y = 5045.801
Coefficient of Correlation r = 0.951608: strength and direction of the relationship (-1 to 0 to +1)
r^2 = 0.905558: Coefficient of Determination = R^2 = "goodness of fit for our line" (a number between 0 and 1) = the proportion of the variability in the dependent variable y that is explained by the estimated regression equation. How well does the estimated regression line fit the data? It can be used for nonlinear relationships as well as linear ones.

Example 2 trendline (Linear (Sales Chicken Soup)): y = -100.56x + 11436, R² = 0.7193

The ith residual = observed value - predicted value = $y_i - \hat{y}_i$, which represents the error in using $\hat{y}_i$ to predict $y_i$.

Sum of Squares Due to Error (in model) = how well the observations cluster around the estimated line = SSE = the part of SST not explained by the model:
$SSE = \sum_i (y_i - \hat{y}_i)^2$
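The sheet states SST = SSR + SSE without showing why. A standard derivation (not taken from the workbook), writing $b_1$ for the slope of the least squares line and $e_i = y_i - \hat{y}_i$ for the ith residual:

```latex
% Split each observation's total deviation into an explained part and a residual
y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)

% Square and sum over all observations
\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i (y_i - \hat{y}_i)^2
  + 2 \sum_i (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)

% For the least squares line, \sum_i e_i = 0 and \sum_i x_i e_i = 0,
% and \hat{y}_i - \bar{y} = b_1 (x_i - \bar{x}), so the cross term vanishes:
\sum_i (\hat{y}_i - \bar{y}) e_i = b_1 \Big( \sum_i x_i e_i - \bar{x} \sum_i e_i \Big) = 0

% Therefore
SST = SSR + SSE
```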
Part of Total Variation Explained by Model, (Predicted y - Ybar)^2, for Example 2:

Sample Point No. | (Predicted y - Ybar)^2
1 | 6870882.177
2 | 4017920.719
3 | 3624896.965
4 | 3300635.513
5 | 1238440.547
6 | 9759573.066
7 | 1024738.094
8 | 5318337.225
9 | 259588.9316
10 | 3252097.417
11 | 486100.0492
12 | 5380358.126
13 | 95098.71525
14 | 5792257.807
15 | 2899522.076

SSR = 53320447.43
SSR + SSE = SST = 74125293.33

Total Sum of Squares (deviation from mean) = how well the observations cluster around the Ybar line = SST = total variation:
$SST = \sum_i (y_i - \bar{y})^2$

(y Deviations)^2 = (y Observed - Ybar)^2 for Example 2:

Sample Point No. | (y Observed - Ybar)^2
1 | 4449287.111
2 | 7787820.444
3 | 12184753.78
4 | 5333020.444
5 | 1930247.111
6 | 11966987.11
7 | 8464220.444
8 | 1189553.778
9 | 641067.1111
10 | 25387.11111
11 | 3206487.111
12 | 7072053.778
13 | 3029920.444
14 | 6203420.444
15 | 641067.1111

SST = 74125293.33

Sum of Squares Due to Regression (Predicted y minus Ybar) = explained part of SST = SSR:
$SSR = \sum_i (\hat{y}_i - \bar{y})^2$

Relationship between the three: SST = SSR + SSE.
If there is no deviation between the observed values and the model values, SSE = 0 and SST = SSR, thus SSR/SST = 1 = perfect prediction.

Coefficient of Determination = how well does the estimated regression line fit the data? = measure of the goodness of fit for the estimated regression line = a number between 0 and +1. It can be used for nonlinear relationships as well as linear ones.
$r_{xy}^2 = r^2 = SSR/SST$

Example 2 results:
r^2 = 0.719328653: goodness of fit of the model to the observed values (a number between 0 and 1). The value is the same whether computed as $r_{xy}^2$ or as SSR/SST.
r_xy = -0.84813245: strength and direction (a number between -1 and 1).

r^2 = the percentage of the total sum of squares that can be explained by using the estimated regression equation = the proportion of the variability in the dependent variable y that is explained by the estimated regression equation. How closely are the observations grouped about the least squares line? 1 = perfectly, 0 = not at all.
Using r^2 only, we can draw no conclusion about whether the relationship between x and y is statistically significant. Such conclusions must be based on considerations that involve sample size and the properties of the appropriate sampling distributions of the least squares estimators.

Observation 3 (Example 2), the point highlighted on the chart to show how the total variation splits:
x = 41, y Observed = $8,900, y Predicted = 7313.249551, Ybar = $5,409
Total Variation (y3 - Ybar) = 8900 - 5409.33 = 3490.67
Residual, not explained by the model (y3 - y Predicted) = 8900 - 7313.25 = 1586.75
Explained Part of Total Variation (y Predicted - Ybar) = 7313.25 - 5409.33 = 1903.92
