Problems
Choosing the best model
Aside from OLS, few recognized standards
Few ways to judge if adding an explanatory
variable is justified by the additional explained
variance
Judging Models
Explanatory Framework
Need to find the best or most likely
model, given the data
Two aspects
Which variables should comprise the model?
Which form should the model take?
Predictive Framework
Of the potential variables and model
forms, which best predicts the
outcome?
Bayesian Approach
Bayes Law
Joint Distribution:
(A, B) or (A ∩ B)
A = Low Education
B = High Income
$$p(B \mid A) = \frac{p(A, B)}{p(A)} \quad\Rightarrow\quad p(A, B) = p(B \mid A)\, p(A)$$
$$p(A \mid B) = \frac{p(A, B)}{p(B)} \quad\Rightarrow\quad p(A, B) = p(A \mid B)\, p(B)$$
$$p(A \mid B) = \frac{p(A)\, p(B \mid A)}{p(B)}$$
$$p(A \mid B) = \frac{p(A)\, p(B \mid A)}{\text{Total Probability}}$$
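A small numeric sketch of Bayes' law for the education/income example. All three input probabilities are hypothetical values chosen only for illustration; they are not from the slides or from any data set.

```python
# Hypothetical inputs (illustration only, not real data).
p_a = 0.30              # p(A): low education
p_b_given_a = 0.10      # p(B | A): high income given low education
p_b_given_not_a = 0.40  # p(B | not A): high income given not low education

# Total probability of B, summing over A and not-A.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' law: p(A | B) = p(A) p(B | A) / p(B).
p_a_given_b = p_a * p_b_given_a / p_b

# Both factorizations give the same joint probability p(A, B).
joint_from_a = p_b_given_a * p_a
joint_from_b = p_a_given_b * p_b

print(f"p(B) = {p_b:.4f}, p(A | B) = {p_a_given_b:.4f}")
```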
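Bayes Factor (Posterior Odds)
$$B_{21} = \frac{p(\text{Data} \mid \text{Model}_2)}{p(\text{Data} \mid \text{Model}_1)}$$
$$B_{21} = \frac{\int p(\text{Data} \mid \theta_2, \text{Model}_2)\, p(\theta_2 \mid \text{Model}_2)\, d\theta_2}{\int p(\text{Data} \mid \theta_1, \text{Model}_1)\, p(\theta_1 \mid \text{Model}_1)\, d\theta_1}$$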
Jeffreys (1935)
Allows comparison of any two models
Nesting not required
Explanatory framework
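A minimal sketch of a Bayes factor as a ratio of marginal likelihoods, using a toy binomial problem that is not from the slides. Model1 fixes the success probability at 0.5. Model2 gives it a Uniform(0, 1) prior, so its marginal likelihood requires integrating over the parameter. The data (5 successes in 10 trials) and the function names are hypothetical.

```python
from math import comb

def marginal_likelihood_fixed(k, n, theta=0.5):
    """p(Data | Model1): theta is fixed, so no integration is needed."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def marginal_likelihood_uniform(k, n, grid=100_000):
    """p(Data | Model2): average the binomial likelihood over a Uniform(0, 1)
    prior using a midpoint rule. The exact value is 1 / (n + 1)."""
    total = 0.0
    for i in range(grid):
        theta = (i + 0.5) / grid
        total += comb(n, k) * theta**k * (1 - theta) ** (n - k)
    return total / grid

k, n = 5, 10  # hypothetical data: 5 successes in 10 trials
b21 = marginal_likelihood_uniform(k, n) / marginal_likelihood_fixed(k, n)
print(f"B21 = {b21:.4f}")  # values below 1 favor Model1
```

Even in this one-parameter case the numerator needs an integral; with many regression coefficients the integrals become high-dimensional, which is the complexity problem that motivates an approximation.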
Problem
Complexity
Challenging to solve
An Approximation: BIC
Bayesian Information Criterion (BIC)
Assumptions
large N
Nested Models
Prior expectation of model parameters is multivariate
normal
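For reference, one common way to write the BIC and its link to the Bayes factor. The notation is introduced here and is not from the slides: \hat{L}_k is the maximized likelihood of Model k, p_k its number of parameters, and n the sample size. The last line additionally assumes equal prior probabilities for all models.

```latex
\begin{align}
\mathrm{BIC}_k &= -2 \ln \hat{L}_k + p_k \ln n \\
2 \ln B_{21} &\approx \mathrm{BIC}_1 - \mathrm{BIC}_2 \\
p(\text{Model}_k \mid \text{Data}) &\approx
  \frac{\exp(-\mathrm{BIC}_k / 2)}{\sum_j \exp(-\mathrm{BIC}_j / 2)}
\end{align}
```

For a linear regression with normal errors, the first line reduces (up to a constant shared by all models) to n ln(RSS_k / n) + p_k ln n, where RSS_k is the residual sum of squares.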
BIC-based Significance
Raftery (1995)
Examines all possible models with the
given variables (2^k models)
For each model calculates a BIC-based
probability
$$p(\text{IV}) = \frac{\sum_{\text{Models with IV}} \text{probabilities}}{\sum \text{probabilities}}$$
Computationally intensive
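The all-models calculation can be sketched as follows. This is an illustrative implementation under stated assumptions, not the author's code: it assumes OLS with normal errors, equal prior probability for every model, and the regression form of the BIC, n ln(RSS / n) + p ln n. The function names and the simulated data are hypothetical.

```python
import itertools
import numpy as np

def ols_bic(y, X):
    """BIC of an OLS model with intercept, up to a constant shared by all models."""
    n = len(y)
    design = np.hstack([np.ones((n, 1)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)

def inclusion_probabilities(y, X):
    """For each IV, sum the BIC-based probabilities of the models containing it."""
    k = X.shape[1]
    # All 2^k subsets of the k candidate variables, including the empty model.
    subsets = [s for r in range(k + 1)
               for s in itertools.combinations(range(k), r)]
    bics = np.array([ols_bic(y, X[:, np.array(s, dtype=int)]) for s in subsets])
    # Convert BICs to model probabilities; subtracting the minimum avoids overflow.
    weights = np.exp(-(bics - bics.min()) / 2)
    weights /= weights.sum()
    probs = np.zeros(k)
    for s, w in zip(subsets, weights):
        for j in s:
            probs[j] += w
    return probs

# Simulated illustration: only the first of three IVs affects the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=2000)
print(inclusion_probabilities(y, X))
```

The number of fits doubles with each added variable (1,024 models for k = 10), which is why the approach is computationally intensive.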
A Further Approximation
Compare the model with all variables to
the model without a specific variable
Only requires a model for each IV (k)
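A sketch of the drop-one comparison, under the same assumptions as the regression form of the BIC (OLS, normal errors, equal prior probability for the two models being compared). It fits the full model once and then one reduced model per IV. The function names and simulated data are hypothetical and are not taken from the bicdrop1 command itself. The returned value is the probability of the model without the variable, so it behaves like a p-value: high values mean little support for keeping the variable. The bicdrop1 P column in the experiment table appears consistent with this quantity.

```python
import numpy as np

def ols_bic(y, X):
    """BIC of an OLS model with intercept, up to a constant shared by all models."""
    n = len(y)
    design = np.hstack([np.ones((n, 1)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)

def drop_one_probabilities(y, X):
    """For each IV, the BIC-based probability of the model WITHOUT that IV,
    compared only against the full model."""
    k = X.shape[1]
    bic_full = ols_bic(y, X)
    p_without = np.empty(k)
    for j in range(k):
        bic_drop = ols_bic(y, np.delete(X, j, axis=1))
        half_diff = (bic_full - bic_drop) / 2
        # 1 / (1 + exp(-half_diff)), computed stably via logaddexp.
        p_without[j] = np.exp(-np.logaddexp(0.0, -half_diff))
    return p_without

# Simulated illustration: only the first of three IVs affects the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=2000)
print(drop_one_probabilities(y, X))
```

For a variable with a t statistic near zero, the BIC difference is close to ln n, so the probability approaches sqrt(n) / (1 + sqrt(n)); with n = 100,000 that is about 0.997.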
Experiment: k=10, n=100,000
Variable    Coef.     P>t      bicdrop1 P   bic P
Riv1         0.0025   0.436*   0.996        0.960
Riv2         0.0011   0.731*   0.997        0.968
Riv3        -0.0044   0.167*   0.992        0.924
Riv4         0.0017   0.597*   0.996        0.965
Riv5         0.0021   0.507*   0.996        0.962
Riv6         0.0070   0.026*   0.963        0.651
Riv7        -0.0025   0.428*   0.996        0.959
Riv8        -0.0006   0.843*   0.997        0.970
Riv9        -0.0013   0.684*   0.997        0.968
Riv10        0.0071   0.024*   0.961        0.631
bicdrop1
Used when bic takes too long or
when comparisons to the AIC are
desired
Further Development
-pre- wise regression
Find the combination of IVs and model
specification that best
predict the outcome variable
Variable significance ignored