Millar BostonBIC

Using the Bayesian Information Criterion to Judge Models and Statistical Significance
Paul Millar
University of Calgary
Problems
Choosing the best model
Aside from OLS, few recognized standards
Few ways to judge if adding an explanatory
variable is justified by the additional explained
variance
Conventional p-values are problematic

Large, small N
Potential unrecognized relationships between
explanatory variables
Random associations not always detected
Judging Models
Explanatory Framework
Need to find the best or most likely
model, given the data
Two aspects
Which variables should comprise the model?
Which form should the model take?
Predictive Framework
Of the potential variables and model
forms, which best predicts the
outcome?
Bayesian Approach
Origins (Bayes 1763)

Bayes Factors (Jeffreys 1935)
BIC (Schwarz 1978)
Variable Significance (Raftery 1995)
Judging Variables and Models
Stata Commands
Bayes Law
Joint Distribution: $p(A, B)$ or $p(A \cap B)$
A = Low Education
B = High Income
$$p(B \mid A) = \frac{p(A, B)}{p(A)} \quad\Rightarrow\quad p(A, B) = p(B \mid A)\, p(A)$$
$$p(A \mid B) = \frac{p(A, B)}{p(B)} \quad\Rightarrow\quad p(A, B) = p(A \mid B)\, p(B)$$
$$p(A \mid B) = \frac{p(A)\, p(B \mid A)}{p(B)}$$
$$p(A \mid B) = \frac{p(A)\, p(B \mid A)}{\text{Total Probability}}$$
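A small numeric illustration of Bayes Law may help here. The function name and all of the probabilities below are hypothetical, chosen only to make the arithmetic visible; they are not figures from the talk.

```python
def bayes_posterior(p_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return p(A|B) given p(A), p(B|A) and p(B|not A)."""
    # Total probability of B: B can occur together with A or without A.
    p_b = p_a * p_b_given_a + (1.0 - p_a) * p_b_given_not_a
    # Bayes Law: p(A|B) = p(A) p(B|A) / p(B)
    return p_a * p_b_given_a / p_b


# Hypothetical inputs (assumptions, not data from the talk):
# A = low education, B = high income.
p_low_edu = 0.30            # p(A)
p_high_inc_if_low = 0.10    # p(B|A)
p_high_inc_if_not = 0.40    # p(B|not A)

print(bayes_posterior(p_low_edu, p_high_inc_if_low, p_high_inc_if_not))
```

With these made-up inputs the total probability of high income is 0.31, so the posterior is 0.03 / 0.31.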
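Bayes Law and Model Probability
$$p(\text{Model} \mid \text{Data}) = \frac{p(\text{Model})\, p(\text{Data} \mid \text{Model})}{\text{Total Probability}}$$
Assume: Two Models
$$\frac{p(\text{Model}_2 \mid \text{Data})}{p(\text{Model}_1 \mid \text{Data})} = \frac{p(\text{Model}_2)\, p(\text{Data} \mid \text{Model}_2)}{p(\text{Model}_1)\, p(\text{Data} \mid \text{Model}_1)}$$
Assume: Equal Priors
$$\text{Posterior Odds} = \text{Bayes Factor} = \frac{p(\text{Data} \mid \text{Model}_2)}{p(\text{Data} \mid \text{Model}_1)}$$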
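The step from Bayes factor to posterior odds can be sketched in a few lines. This is a minimal illustration assuming exactly two candidate models; the function names and the example Bayes factor of 3 are hypothetical.

```python
def posterior_odds(bayes_factor_21: float, prior_1: float = 0.5, prior_2: float = 0.5) -> float:
    """Posterior odds of Model 2 over Model 1.

    With equal priors the prior ratio is 1, so the posterior odds
    equal the Bayes factor.
    """
    return bayes_factor_21 * (prior_2 / prior_1)


def posterior_prob_model2(bayes_factor_21: float, prior_1: float = 0.5, prior_2: float = 0.5) -> float:
    """Posterior probability of Model 2, assuming only two models are considered."""
    odds = posterior_odds(bayes_factor_21, prior_1, prior_2)
    return odds / (1.0 + odds)


# Hypothetical Bayes factor: the data are 3 times as likely under Model 2.
print(posterior_odds(3.0))          # equal priors
print(posterior_prob_model2(3.0))   # equal priors
```

Unequal priors simply rescale the odds, which is why the equal-priors assumption lets the Bayes factor be read directly as the posterior odds.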
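Bayes Law and Model Probability

$$\text{Posterior Odds} = \text{Bayes Factor} = B_{21} = \frac{\int p(\text{Data} \mid \theta_2, \text{Model}_2)\, p(\theta_2 \mid \text{Model}_2)\, d\theta_2}{\int p(\text{Data} \mid \theta_1, \text{Model}_1)\, p(\theta_1 \mid \text{Model}_1)\, d\theta_1}$$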
Jeffreys (1935)
Allows comparison of any two models
Nesting not required
Explanatory framework
Problem
Complexity
Challenging to solve
An Approximation: BIC
Bayesian Information Criterion (BIC)
Function of N, df, deviance or $\chi^2$ from the LRT

Readily obtainable from most model output
Allows approximation of the Bayes Factor
Two versions
relative to saturated model (BIC) or null model (BIC′)
Assumptions
large N
Nested Models
Prior expectation of model parameters is multivariate
normal
Attributed to Schwarz (1978)
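As a rough sketch of how BIC can be computed from ordinary model output, the functions below follow the two forms given in Raftery (1995): deviance minus residual df times ln(N) relative to the saturated model, and minus the LR chi-square plus the number of parameters times ln(N) relative to the null model. The function names and the example numbers are hypothetical, and the Bayes factor relation is the usual large-sample approximation rather than an exact result.

```python
import math


def bic_saturated(deviance: float, resid_df: int, n_obs: int) -> float:
    """BIC relative to the saturated model: deviance - df * ln(N)."""
    return deviance - resid_df * math.log(n_obs)


def bic_null(lr_chi2: float, n_params: int, n_obs: int) -> float:
    """BIC' relative to the null model: -chi2 + p * ln(N)."""
    return -lr_chi2 + n_params * math.log(n_obs)


def approx_bayes_factor_21(bic_1: float, bic_2: float) -> float:
    """Approximate B21 from two BIC values (lower BIC is better).

    Uses 2 * ln(B21) ~ BIC_1 - BIC_2.
    """
    return math.exp((bic_1 - bic_2) / 2.0)


# Hypothetical output from two nested models fitted to N = 1000 cases.
n = 1000
bic_m1 = bic_null(lr_chi2=50.0, n_params=2, n_obs=n)
bic_m2 = bic_null(lr_chi2=58.0, n_params=3, n_obs=n)
print(bic_m1, bic_m2, approx_bayes_factor_21(bic_m1, bic_m2))
```

In this made-up case the extra variable raises the chi-square by 8, which only slightly exceeds the ln(N) penalty of about 6.9, so the approximate Bayes factor favours Model 2 only weakly.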
An Alternative to the t-test

The t-test produces over-confident results for
large datasets
Random relationships sometimes
pass the test
Widely varying results possible when
combined with stepwise regression
Only other significance testing
method (re-sampling) provides no
guidance on form or content of model
BIC-based Significance
Raftery (1995)
Examines all possible models with the
given variables ($2^k$ models)
For each model calculates a BIC-based probability
$$p(\text{IV}) = \frac{\sum_{\text{Models with IV}} \text{probabilities}}{\sum_{\text{All Possible Models}} \text{probabilities}}$$
Computationally intensive
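The all-possible-models calculation can be sketched as follows. This is an illustration under stated assumptions, not the code behind any Stata command: it assumes a BIC value is already available for every subset of variables, that all models have equal prior probability, and that each model's probability is proportional to exp(-BIC/2). The variable names and BIC values are made up.

```python
import math
from itertools import combinations


def all_subsets(variables):
    """Yield every subset of the candidate variables (2^k subsets in total)."""
    for size in range(len(variables) + 1):
        for combo in combinations(variables, size):
            yield frozenset(combo)


def variable_probabilities(bic_by_model):
    """Map each variable to the summed probability of the models containing it.

    bic_by_model: dict mapping frozenset of variable names -> BIC of that model.
    """
    best = min(bic_by_model.values())
    # Subtract the best BIC before exponentiating for numerical stability;
    # the shift cancels in the ratio.
    weights = {m: math.exp(-(b - best) / 2.0) for m, b in bic_by_model.items()}
    total = sum(weights.values())
    variables = set().union(*bic_by_model)
    return {
        v: sum(w for m, w in weights.items() if v in m) / total
        for v in sorted(variables)
    }


# Hypothetical BIC values for the 2^2 = 4 models built from x1 and x2.
bics = {
    frozenset(): 0.0,
    frozenset({"x1"}): -10.0,
    frozenset({"x2"}): 2.0,
    frozenset({"x1", "x2"}): -6.0,
}
print(len(list(all_subsets(["x1", "x2"]))))
print(variable_probabilities(bics))
```

Because the number of models doubles with every added variable, fitting all of them is what makes the full procedure computationally intensive.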
A Further Approximation
Compare the model with all variables to
the model without a specific variable
Only requires a model for each IV (k)
Experiment: k=10, n=100,000

Variable   Coef.     P>t      bicdrop1 P   bic P
Riv1        0.0025   0.436*   0.996        0.960
Riv2        0.0011   0.731*   0.997        0.968
Riv3       -0.0044   0.167*   0.992        0.924
Riv4        0.0017   0.597*   0.996        0.965
Riv5        0.0021   0.507*   0.996        0.962
Riv6        0.0070   0.026*   0.963        0.651
Riv7       -0.0025   0.428*   0.996        0.959
Riv8       -0.0006   0.843*   0.997        0.970
Riv9       -0.0013   0.684*   0.997        0.968
Riv10       0.0071   0.024*   0.961        0.631
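The drop-one approximation described above can be sketched by comparing the full model with the model that omits a single variable. This is a guess at the kind of calculation involved, not the implementation of any Stata command. It assumes equal prior odds for the two models, a BIC difference of ln(N) minus the one-df chi-square for the dropped variable, a chi-square recovered from the two-sided p-value using the normal approximation (reasonable at large N), and that the reported P is the probability that the variable does not belong in the model.

```python
import math
from statistics import NormalDist


def prob_variable_not_needed(p_value: float, n_obs: int) -> float:
    """Approximate probability that the model WITHOUT the variable is the better one.

    Assumes 0 < p_value < 1 (two-sided) and a single-parameter variable.
    """
    # Recover the one-df chi-square (z squared) from the two-sided p-value.
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    chi2 = z * z
    # BIC(full) - BIC(without): the penalty for one extra parameter minus the fit gain.
    bic_diff = math.log(n_obs) - chi2
    # Approximate Bayes factor of the full model against the reduced model.
    bf_full_vs_reduced = math.exp(-bic_diff / 2.0)
    return 1.0 / (1.0 + bf_full_vs_reduced)


# p-values for Riv1 and Riv6 from the experiment above, with n = 100,000.
print(round(prob_variable_not_needed(0.436, 100_000), 3))
print(round(prob_variable_not_needed(0.026, 100_000), 3))
```

Under these assumptions the results come out close to the bicdrop1 P column, which illustrates the point of the experiment: with n = 100,000 the ln(N) penalty is about 11.5, so a random variable with a conventional p-value of 0.026 is still judged very unlikely to belong.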
-pre-
Prediction only
The reduction in errors for categorical variables
logistic, probit, mlogit, cloglog
Allows calculation of best cutoff
The reduction in squared errors for continuous variables
regress, etc.
Allows comparison of prediction capability across model forms
Ex. mlogit vs. ologit vs. nbreg vs. poisson
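The idea of a proportional reduction in error can be illustrated generically. The sketch below is not the -pre- command: it assumes a binary outcome, a baseline that always predicts the modal category (or the mean, for a continuous outcome), and a simple search over observed fitted probabilities for the cutoff. All names and data are hypothetical.

```python
def pre_categorical(y, p_hat, cutoff=0.5):
    """Proportional reduction in classification errors for a 0/1 outcome."""
    n_pos = sum(y)
    # Baseline: always predict the modal category.
    # (Zero when the outcome has a single class; not handled in this sketch.)
    baseline_errors = min(n_pos, len(y) - n_pos)
    model_errors = sum((p >= cutoff) != bool(obs) for obs, p in zip(y, p_hat))
    return (baseline_errors - model_errors) / baseline_errors


def best_cutoff(y, p_hat):
    """Cutoff, among the observed fitted probabilities, with the largest reduction."""
    return max(sorted(set(p_hat)), key=lambda c: pre_categorical(y, p_hat, c))


def pre_continuous(y, y_hat):
    """Proportional reduction in squared errors relative to predicting the mean."""
    mean_y = sum(y) / len(y)
    sse_baseline = sum((obs - mean_y) ** 2 for obs in y)
    sse_model = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    return (sse_baseline - sse_model) / sse_baseline


# Hypothetical outcomes and fitted probabilities.
y = [0, 0, 0, 1, 1, 1, 1, 0]
p_hat = [0.1, 0.2, 0.6, 0.7, 0.8, 0.4, 0.9, 0.3]
print(pre_categorical(y, p_hat))             # default cutoff of 0.5
cut = best_cutoff(y, p_hat)
print(cut, pre_categorical(y, p_hat, cut))   # cutoff chosen by the search
```

Because both measures are expressed relative to a naive baseline, they can be compared across model forms that do not share a likelihood.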
-bicdrop1-
Used when -bic- takes too long or when comparisons to the AIC are desired
-bic-
Reports probability for each variable using Raftery's procedure
Also reports pseudo-R², pre, bicdrop1 results
Reports most likely models, given the theory and data (hence a form of stepwise)
Further Development
-pre- wise regression
Find the combination of IVs and model
specification that best
predict the outcome variable
Variable significance ignored
Bayesian cross-model comparisons

Safer than stepwise
Bayes Factors
Requires development of reasonable
empirical solutions to integrals
