
Bagging

LING 572
Fei Xia
1/24/06
Ensemble methods

- So far, we have covered several learning methods: FSA, HMM, DT, DL, TBL.
- Question: how to improve results?
- One solution: generating and combining multiple predictors
  - Bagging: bootstrap aggregating
  - Boosting

Outline

- An introduction to the bootstrap
- Bagging: basic concepts (Breiman, 1996)
- Case study: bagging a treebank parser (Henderson and Brill, ANLP 2000)

Introduction to the bootstrap

Motivation

- What is the average house price? Let F be the (unknown) population distribution.
- From F, get a sample x = (x1, x2, ..., xn), and calculate the average u.
- Question: how reliable is u? What is the standard error of u? What is the confidence interval?

Solutions

- One possibility: get several samples from F.
  - Problem: it is impossible (or too expensive) to get multiple samples.
- Solution: the bootstrap

The general bootstrap algorithm

- Let the original sample be L = (x1, x2, ..., xn).
- Repeat B times:
  - Generate a sample Lk of size n from L by sampling with replacement.
  - Compute the statistic of interest θ*_k on Lk.
- Now we end up with the bootstrap values θ* = (θ*_1, ..., θ*_B).
- Use these values for calculating all the quantities of interest (e.g., standard deviation, confidence intervals).

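A minimal sketch of this procedure in Python (the sample values, the choice of B, and the helper name bootstrap_statistic are illustrative assumptions, not part of the original slides):

```python
import numpy as np

def bootstrap_statistic(sample, statistic, B=1000, rng=None):
    """Return B bootstrap replicates of `statistic`, each computed on a
    resample of `sample` drawn with replacement (the L_k above)."""
    rng = np.random.default_rng() if rng is None else rng
    sample = np.asarray(sample)
    n = len(sample)
    replicates = np.empty(B)
    for k in range(B):
        resample = rng.choice(sample, size=n, replace=True)  # bootstrap sample L_k
        replicates[k] = statistic(resample)                  # theta*_k
    return replicates

# Example: standard error and a rough 95% interval for the sample mean.
x = np.array([3.12, 0.0, 1.57, 19.67, 0.22, 2.20])   # original sample L
theta_star = bootstrap_statistic(x, np.mean, B=1000)
print("bootstrap SE of the mean:", theta_star.std(ddof=1))
print("95% percentile interval: ", np.percentile(theta_star, [2.5, 97.5]))
```
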
An example

Original sample: X = (3.12, 0, 1.57, 19.67, 0.22, 2.20), Mean = 4.46

Bootstrap samples:
- X1 = (1.57, 0.22, 19.67, 0, 0, 2.2, 3.12), Mean = 4.13
- X2 = (0, 2.20, 2.20, 2.20, 19.67, 1.57), Mean = 4.64
- X3 = (0.22, 3.12, 1.57, 3.12, 2.20, 0.22), Mean = 1.74

A quick view of bootstrapping

- Introduced by Bradley Efron in 1979.
- Named after the phrase "to pull oneself up by one's bootstraps", which is widely believed to come from the Adventures of Baron Munchausen.
- Popularized in the 1980s due to the introduction of computers into statistical practice.
- It has a strong mathematical background.
- It is well known as a method for estimating standard errors and bias, and for constructing confidence intervals for parameters.

Bootstrap distribution

- The bootstrap does not replace or add to the original data.
- We use the bootstrap distribution as a way to estimate the variation in a statistic based on the original data.

Sampling distribution vs. bootstrap distribution

- The population: certain unknown quantities of interest (e.g., the mean)
- Multiple samples from the population → sampling distribution
- Bootstrapping:
  - One original sample → B bootstrap samples
  - B bootstrap samples → bootstrap distribution

- Bootstrap distributions usually approximate the shape, spread, and bias of the actual sampling distribution.
- Bootstrap distributions are centered at the value of the statistic computed from the original sample, plus any bias.
- The sampling distribution is centered at the value of the parameter in the population, plus any bias.

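The centering claims above can be checked with a small simulation. The sketch below is an illustration under an assumed exponential population (not from the original slides), comparing where the two distributions are centered:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 5.0                                         # population parameter (known only in simulation)

sample = rng.exponential(scale=true_mean, size=50)      # the one original sample we would actually have

# Sampling distribution of the mean: many fresh samples from the population (normally unavailable).
sampling_means = [rng.exponential(scale=true_mean, size=50).mean() for _ in range(2000)]

# Bootstrap distribution of the mean: resample the single original sample with replacement.
boot_means = [rng.choice(sample, size=50, replace=True).mean() for _ in range(2000)]

print("population mean:               ", true_mean)
print("sampling distribution center:  ", np.mean(sampling_means))  # close to the population mean
print("original sample mean:          ", sample.mean())
print("bootstrap distribution center: ", np.mean(boot_means))      # close to the original sample mean
print("spread (sampling vs bootstrap):", np.std(sampling_means), np.std(boot_means))
```
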
Cases where the bootstrap does not apply

- Small data sets: the original sample is not a good approximation of the population.
- Dirty data: outliers add variability to our estimates.
- Dependence structures (e.g., time series, spatial problems): the bootstrap is based on the assumption of independence.

How many bootstrap samples are needed?

Choice of B depends on:
- Computer availability
- Type of the problem: standard errors, confidence intervals, ...
- Complexity of the problem


Resampling methods

- Bootstrap
- Permutation tests
- Jackknife: leave out one observation at a time

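For contrast with the bootstrap, here is a minimal jackknife sketch (the sample values and the standard-error formula for the mean are assumptions added for illustration, not material from the slides):

```python
import numpy as np

def jackknife_values(sample, statistic):
    """Recompute `statistic` with each observation left out in turn."""
    sample = np.asarray(sample)
    return np.array([statistic(np.delete(sample, i)) for i in range(len(sample))])

x = np.array([3.12, 0.0, 1.57, 19.67, 0.22, 2.20])
theta_i = jackknife_values(x, np.mean)

# Standard jackknife estimate of the standard error of the mean.
n = len(x)
se_jack = np.sqrt((n - 1) / n * np.sum((theta_i - theta_i.mean()) ** 2))
print("jackknife SE of the mean:", se_jack)
```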

Bagging: basic concepts

Bagging

- Introduced by Breiman (1996)
- Bagging stands for "bootstrap aggregating".
- It is an ensemble method: a method of combining multiple predictors.

Predictors

- Let L be a training set {(xi, yi) | xi in X, yi in Y}, drawn from the set Λ of possible training sets.
- A predictor φ: X → Y is a function that, for any given x, produces y = φ(x).
- A learning algorithm Ψ: Λ → Φ is a function that, given any L in Λ, produces a predictor φ = Ψ(L) in Φ.
- Types of predictors:
  - Classifiers: DTs, DLs, TBLs, ...
  - Estimators: regression trees
  - Others: parsers

Bagging algorithm

- Let the original training data be L.
- Repeat B times:
  - Get a bootstrap sample Lk from L.
  - Train a predictor using Lk.
- Combine the B predictors by:
  - Voting (for classification problems)
  - Averaging (for estimation problems)

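A minimal sketch of this loop in Python, assuming numpy arrays, scikit-learn decision trees as the base predictor, and integer class labels 0..K-1; the function names are illustrative, and Breiman's formulation also covers other base learners and averaging for estimation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predictors(X, y, B=50, rng=None):
    """Train B predictors, each on a bootstrap sample of the training data."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    predictors = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)                   # bootstrap sample L_k (with replacement)
        predictors.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return predictors

def vote(predictors, X):
    """Combine the B predictors by majority vote (classification)."""
    preds = np.stack([p.predict(X) for p in predictors])   # shape (B, n_test)
    # Majority class per test point; assumes integer labels 0..K-1.
    return np.array([np.bincount(col.astype(int)).argmax() for col in preds.T])

# Usage sketch: y_hat = vote(bagged_predictors(X_train, y_train, B=50), X_test)
```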

Bagging decision trees

1. Split the data set into a training set T1 and a test set T2.
2. Run bagging using 50 bootstrap samples.
3. Repeat steps 1-2 100 times, and calculate the average test-set misclassification rate.

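A sketch of that evaluation protocol using scikit-learn's BaggingClassifier, whose default base estimator is a decision tree; the synthetic data set is an assumption standing in for Breiman's benchmark data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

error_rates = []
for trial in range(100):                                     # step 3: repeat 100 times
    X_tr, X_te, y_tr, y_te = train_test_split(               # step 1: split into T1 and T2
        X, y, test_size=0.1, random_state=trial)
    bag = BaggingClassifier(n_estimators=50,                 # step 2: 50 bootstrap samples
                            random_state=trial).fit(X_tr, y_tr)
    error_rates.append(1.0 - bag.score(X_te, y_te))

print("average test-set misclassification rate:", np.mean(error_rates))
```
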
Bagging regression trees

- Bagging with 25 bootstrap samples.
- Repeat 100 times.

How many bootstrap samples are needed?

Bagging decision trees for the waveform task:
- The unbagged error rate is 29.0%.
- We get most of the improvement using only 10 bootstrap samples.

Bagging k-nearest-neighbor classifiers

- 100 bootstrap samples; 100 iterations.
- Bagging does not help.

Experiment results

- Bagging works well for unstable learning algorithms.
- Bagging can slightly degrade the performance of stable learning algorithms.

Learning algorithms

- Unstable learning algorithms: small changes in the training set result in large changes in predictions.
  - Neural network
  - Decision tree
  - Regression tree
  - Subset selection in linear regression
- Stable learning algorithms:
  - K-nearest neighbors

Case study

Experiment settings

- Henderson and Brill, ANLP-2000 paper
- Parser: Collins's Model 2 (1997)
- Data:
  - Training data: sections 01-21
  - Test data: section 23
- Bagging: different ways of combining parsing results

Techniques for combining parsers
(Henderson and Brill, EMNLP-1999)

- Parse hybridization: combining the substructures of the input parses
  - Constituent voting
  - Naïve Bayes
- Parser switching: selecting one of the input parses
  - Similarity switching
  - Naïve Bayes

Experiment results

- Baseline (no bagging): 88.63
- Initial (one bag): 88.38
- Final (15 bags): 89.17

Training corpus size effects

Summary

- The bootstrap is a resampling method.
- Bagging is directly related to the bootstrap:
  - It uses bootstrap samples to train multiple predictors.
  - The outputs of the predictors are combined by voting or other methods.
- Experiment results:
  - Bagging is effective for unstable learning methods.
  - It does not help stable learning methods.

Uncovered issues

- How to determine whether a learning method is stable or unstable?
- Why does bagging work for unstable algorithms?
