Carlos Carvalho
The University of Texas McCombs School of Business
mccombs.utexas.edu/faculty/carlos.carvalho/teaching
Today's Plan
sampling distributions
confidence intervals
hypothesis testing
Linear Prediction

$$\hat{Y}_i = b_0 + b_1 X_i$$

$$b_1 = \mathrm{corr}(X, Y) \times \frac{s_Y}{s_X} \qquad b_0 = \bar{Y} - b_1 \bar{X}$$

where

$$s_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}} \quad \text{and} \quad s_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$

The direction of the relationship is given by the sign of the correlation.
$$\mathrm{Cov}(Y, X) = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{n-1}$$

$$\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} = \frac{\mathrm{cov}(X, Y)}{\mathrm{sd}(X)\,\mathrm{sd}(Y)}$$
Correlation
[Scatterplots of four samples illustrating corr = 0.5, corr = 1, corr = -0.8, and corr = 0.8.]
Correlation
Only measures linear relationships:
corr(X, Y) = 0 does not mean the variables are not related!
[Two scatterplots: a nonlinear pattern with corr = 0.72 and another with corr = 0.01.]
1. Intercept:
$$b_0 = \bar{Y} - b_1 \bar{X}$$
Least squares finds the point of means and rotates the line through that point until getting the right slope.
2. Slope:
$$b_1 = \mathrm{corr}(X, Y) \times \frac{s_Y}{s_X}$$
From now on, the terms fitted values ($\hat{Y}_i$) and residuals ($e_i$) refer to those obtained from the least squares line.
The fitted values and residuals have some special properties. Let's look at the housing data analysis to figure out what these properties are...
[Plot of fitted values $\hat{Y}_i$ against X for the housing data: corr(y.hat, x) = 1.]
[Plot of residuals $e_i$ against X: mean(e) = 0 and corr(e, x) = 0.]
Why?
What is the intuition for the relationship between $\hat{Y}$ and $e$ and X?
[Plot of a deliberately bad "crazy line," $10 + 50X$, drawn through the housing data.]
[Plot of the crazy line's residuals against X: mean(e) = 1.8 and corr(e, x) = -0.7.]
$e$ is unrelated to X: corr(X, e) = 0.
The intercept:
$$\frac{1}{n} \sum_{i=1}^{n} e_i = \frac{1}{n} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0$$
$$\Rightarrow \bar{Y} - b_0 - b_1 \bar{X} = 0 \Rightarrow b_0 = \bar{Y} - b_1 \bar{X}$$
$$\mathrm{corr}(e, X) = 0 \iff \sum_{i=1}^{n} e_i (X_i - \bar{X}) = 0$$
$$\sum_{i=1}^{n} e_i (X_i - \bar{X}) = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)(X_i - \bar{X}) = \sum_{i=1}^{n} \big(Y_i - \bar{Y} - b_1 (X_i - \bar{X})\big)(X_i - \bar{X})$$
which equals zero when
$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = r_{xy} \frac{s_y}{s_x}$$
This leads to
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2$$
Decomposing the Variance: The ANOVA Table
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad 0 \le R^2 \le 1.$$
For simple linear regression:
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = \frac{\sum_{i=1}^{n} (b_0 + b_1 X_i - b_0 - b_1 \bar{X})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = \frac{b_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = \frac{b_1^2 s_x^2}{s_y^2} = r_{xy}^2$$
Back to the House Data
[Housing-data regression output and plots.]
$$E[\varepsilon] = 0 \quad \Rightarrow \quad E[Y \mid X] = \beta_0 + \beta_1 X$$
($E[Y \mid X]$ is the conditional expectation of Y given X).
Conditional Distributions
The conditional distribution for Y given X is Normal:
$$Y \mid X \sim N(\beta_0 + \beta_1 X, \sigma^2).$$
$\sigma$ controls dispersion.
The mean is $E[Y \mid X] = E[\beta_0 + \beta_1 X + \varepsilon] = \beta_0 + \beta_1 X$.
With $\beta_0 = 40$ and $\beta_1 = 45$ (prices in \$1,000s), a 1,500 sq. ft. house has
$$Y = 40 + 45(1.5) + \varepsilon = 107.5 + \varepsilon$$
The model says that the mean value of a 1,500 sq. ft. house is \$107,500 and that the deviation from that mean is within \$20,000:
we are 95% sure that $-20 < \varepsilon < 20$.
$\beta_0$, $\beta_1$ control the linear pattern: the mean of Y is linear in X.
Break
Back in 15 minutes...
The least squares coefficients serve as our estimates of the model parameters:
$$\hat{\beta}_1 = b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \qquad \hat{\beta}_0 = b_0 = \bar{Y} - b_1 \bar{X}$$
A first guess at estimating $\sigma^2$:
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} e_i^2$$
$$s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2 = \frac{SSE}{n-2}$$
We use $s = \sqrt{SSE/(n-2)}$, which is in the same units as Y.
Degrees of Freedom
Degrees of freedom is the number of times you get to observe useful information about the variance you're trying to estimate.
For example, consider $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.
Remember: whenever you see "standard error," read it as estimated standard deviation, i.e., the standard deviation of an estimator with the unknown $\sigma$ replaced by its estimate.
$$E(\bar{X}) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \mu$$
$$\mathrm{var}(\bar{X}) = \mathrm{var}\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{var}(X_i) = \frac{\sigma^2}{n}$$
If X is normal, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
If X is not normal, we have the central limit theorem (more in a minute)!
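A Monte Carlo sketch of these two facts; the sample size, $\sigma$, and replication count below are arbitrary choices:

```python
# Monte Carlo check: E(Xbar) = mu and var(Xbar) = sigma^2 / n.
import random
import statistics as st

random.seed(1)
mu, sigma, n = 0.0, 2.0, 25
means = [st.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20000)]

print(abs(st.mean(means) - mu) < 0.05)                  # True
print(abs(st.variance(means) - sigma ** 2 / n) < 0.01)  # True: ~ 4/25
```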
Oracle vs SAP
[Oracle vs SAP example slides.]
The simple CLT states that for iid random variables $X_i$ with mean $\mu$ and variance $\sigma^2$, the distribution of the sample mean becomes normal as the number of observations, n, gets large:
$$\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right)$$
[Density f(X) of a distribution with $E[X] = 1$ and $\mathrm{var}(X) = 1$.]
[Histograms of sample means for increasing sample sizes n: the sampling distribution of $\bar{X}$ becomes bell-shaped and concentrates around 1.]
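The histograms can be reproduced in a few lines. Here is a sketch using exponential draws, which match the $E[X] = 1$ and $\mathrm{var}(X) = 1$ setup above:

```python
# CLT sketch: averages of n exponential(1) draws (mean 1, variance 1)
# are approximately N(1, 1/n); about 95% fall within 2/sqrt(n) of 1.
import math
import random
import statistics as st

random.seed(7)
n, reps = 100, 10000
means = [st.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

half = 2.0 / math.sqrt(n)  # roughly two standard errors
share = sum(1 - half < m < 1 + half for m in means) / reps
print(0.92 < share < 0.98)  # True: close to the Normal 95%
```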
Sampling Distribution of $b_1$
The sampling distribution of $b_1$ describes how the estimator $b_1 = \hat{\beta}_1$ varies over different samples with the X values fixed.
It turns out that $b_1$ is normally distributed: $b_1 \sim N(\beta_1, \sigma_{b_1}^2)$.
$b_1$ is unbiased: $E[b_1] = \beta_1$.
Sampling Distribution of $b_1$
Can we intuit what should be in the formula for $\sigma_{b_1}^2$? What about n? Anything else?
$$\mathrm{var}(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} = \frac{\sigma^2}{(n-1) s_x^2}$$
Three factors: sample size (n), error variance ($\sigma^2 = \sigma_\varepsilon^2$), and X-spread ($s_x$).
Sampling Distribution of $b_0$
$$\sigma_{b_0}^2 = \mathrm{var}(b_0) = \sigma^2 \left(\frac{1}{n} + \frac{\bar{X}^2}{(n-1) s_x^2}\right)$$
Estimated Variance
Recall that $s$ estimates $\sigma$. Plugging $s^2$ into the variance formulas gives
$$s_{b_1} = \sqrt{\frac{s^2}{(n-1) s_x^2}} \qquad s_{b_0} = \sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{X}^2}{(n-1) s_x^2}\right)}$$
Hence, $s_{b_1} \approx \sigma_{b_1}$ and $s_{b_0} \approx \sigma_{b_0}$ are the estimated coefficient standard deviations.
A high level of info/precision/accuracy means small $s_b$ values.
Confidence Intervals
$$1 - \alpha = P\left(-t_{n-p,\alpha/2} < \frac{b_j - \beta_j}{s_{b_j}} < t_{n-p,\alpha/2}\right)$$
so $b_j \pm t_{n-p,\alpha/2}\, s_{b_j}$ is a $(1-\alpha)100\%$ confidence interval for $\beta_j$.
Testing
Similarly, suppose that assuming $b_j \sim t_{n-p}(\beta_j, s_{b_j})$ for our sample $b_j$ leads to (recall $Z_{n-p} \sim t_{n-p}(0, 1)$)
$$P\left(Z_{n-p} < -\frac{|b_j - \beta_j|}{s_{b_j}}\right) + P\left(Z_{n-p} > \frac{|b_j - \beta_j|}{s_{b_j}}\right) = \alpha.$$
Then the p-value is $\alpha = 2P(Z_{n-p} > |b_j - \beta_j| / s_{b_j})$.
You do this calculation for $\beta_j = \beta_j^0$, an assumed null/safe value, and only reject $\beta_j^0$ if $\alpha$ is too small (e.g., $\alpha < 1/20$).
In regression, $\beta_j^0 = 0$ almost always.
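The recipe can be sketched with the standard library; note that the Normal distribution stands in for $t_{n-p}$ here (the stdlib has no t CDF), a close approximation for moderate degrees of freedom:

```python
# Two-sided p-value for H0: beta_j = bj0, using a Normal approximation
# to t_{n-p} (stdlib only; close for moderate degrees of freedom).
import math

def p_value(bj, sbj, bj0=0.0):
    """alpha = 2 * P(Z > |bj - bj0| / sbj) under a standard Normal Z."""
    z = abs(bj - bj0) / sbj
    return math.erfc(z / math.sqrt(2))  # = 2 * P(Z > z)

print(p_value(2.0, 1.0) < 1 / 20)  # True: p ~ 0.0455, reject bj0 = 0
print(p_value(1.0, 1.0) < 1 / 20)  # False: p ~ 0.317, do not reject
```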
The center of the interval tells you what your estimate is.
The length of the interval tells you how sure you are about
your estimate.
Hypothesis Testing
$$z_{b_j} = \frac{b_j - \beta_j^0}{s_{b_j}} = \frac{b_j}{s_{b_j}} \quad \text{for } \beta_j^0 = 0.$$
If $H_0$ is true, this should be distributed $z_{b_j} \sim t_{n-p}(0, 1)$.
Hypothesis Testing
We assess the size of $z_{b_j}$ with the p-value $\alpha$:
$$\alpha = P(|Z_{n-p}| > |z_{b_j}|) = 2P(Z_{n-p} > |z_{b_j}|)$$
(once again, $Z_{n-p} \sim t_{n-p}(0, 1)$).
[Density of Z with both tails beyond $\pm z$ shaded.]
Hypothesis Testing
The formal 2-step approach to hypothesis testing.
Recall the Windsor regression.
Example: Hypothesis Testing
Testing $H_0: \beta_1 = 1$:
$$z_{b_1} = \frac{b_1 - 1}{s_{b_1}} = \frac{0.0643}{0.0291} \approx 2.21$$
Forecasting
If we use $\hat{Y}_f$, our prediction error is
$$e_f = Y_f - \hat{Y}_f = Y_f - b_0 - b_1 X_f$$
Forecasting
This can get quite complicated! A simple strategy is to build the following $(1-\alpha)100\%$ prediction interval:
$$b_0 + b_1 X_f \pm t_{n-2,\alpha/2}\, s$$
A large predictive error variance (high uncertainty) comes from a large difference between $X_f$ and $\bar{X}$.
Forecasting
For $X_f$ far from our $\bar{X}$, the space between the lines is magnified...
$$s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2$$
$$s_{b_1} = \sqrt{\frac{s^2}{(n-1) s_x^2}} \qquad s_{b_0} = s \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{(n-1) s_x^2}}$$
$t_{n-p,\alpha/2}$ is the value such that, for $Z_{n-p} \sim t_{n-p}(0, 1)$,
$$P(Z_{n-p} > t_{n-p,\alpha/2}) = P(Z_{n-p} < -t_{n-p,\alpha/2}) = \alpha/2.$$
$$z_{b_j} = \frac{b_j - \beta_j^0}{s_{b_j}}$$