LING 572
Fei Xia
1/24/06
Ensemble methods
So far, we have covered several learning
methods: FSA, HMM, DT, DL, TBL.
Solution: bootstrap
The general bootstrap algorithm
Let the original sample be L = (x1, x2, …, xn).
Repeat B times:
Generate a sample Lk of size n from L by sampling with replacement.
Compute the statistic of interest, θ*, on Lk.
Example (the statistic is the sample mean):
X  = (3.12, 0, 1.57, 19.67, 0.22, 2.20)     Mean = 4.46
X1 = (1.57, 0.22, 19.67, 0, 0.22, 3.12)     Mean = 4.13
X2 = (0, 2.20, 2.20, 2.20, 19.67, 1.57)     Mean = 4.64
X3 = (0.22, 3.12, 1.57, 3.12, 2.20, 0.22)   Mean = 1.74
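A minimal sketch of this resampling loop in Python (the sample values follow the example above; the function name and the choice of 1000 resamples are illustrative, not from the slides):

```python
import random

def bootstrap_means(sample, num_resamples):
    """Draw bootstrap samples of the same size as `sample` (with replacement)
    and compute the statistic of interest (here, the mean) on each one."""
    n = len(sample)
    means = []
    for _ in range(num_resamples):
        resample = [random.choice(sample) for _ in range(n)]  # sampling with replacement
        means.append(sum(resample) / n)                       # statistic on the resample
    return means

# The original sample from the example above (mean is about 4.46).
X = [3.12, 0, 1.57, 19.67, 0.22, 2.20]
boot_means = bootstrap_means(X, num_resamples=1000)
print(min(boot_means), max(boot_means))  # spread of the bootstrap distribution
```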
A quick view of bootstrapping
Introduced by Bradley Efron in 1979
Bootstrapping:
One original sample → B bootstrap samples
B bootstrap samples → a bootstrap distribution
Bootstrap distributions usually approximate the
shape, spread, and bias of the actual sampling
distribution.
How many bootstrap samples
are needed?
Choice of B depends on:
Available computing resources
Types of predictors:
Classifiers: DTs, DLs, TBLs, …
Estimators: Regression trees
Others: parsers
Bagging algorithm
Let the original training data be L
Repeat B times:
Get a bootstrap sample Lk from L.
Train a predictor using Lk.
Combine the B predictors by
Voting (for classification problems)
Averaging (for estimation problems)
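A minimal sketch of this loop and of the two combination rules, assuming a generic train(data) routine that returns a predictor callable on a single instance (both names are placeholders, not from the slides):

```python
import random
from collections import Counter

def bag_predictors(train, labeled_data, B):
    """Train B predictors, each on its own bootstrap sample of the training data."""
    n = len(labeled_data)
    predictors = []
    for _ in range(B):
        boot = [random.choice(labeled_data) for _ in range(n)]  # bootstrap sample Lk
        predictors.append(train(boot))                          # predictor trained on Lk
    return predictors

def predict_by_voting(predictors, x):
    """Classification: return the majority label over the B predictors."""
    return Counter(p(x) for p in predictors).most_common(1)[0][0]

def predict_by_averaging(predictors, x):
    """Estimation: return the average of the B predicted values."""
    outputs = [p(x) for p in predictors]
    return sum(outputs) / len(outputs)
```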
Bagging decision trees
1. Split the data set into a training set T1 and a test set T2.
2. Run bagging with 50 bootstrap samples.
3. Repeat Steps 1-2 100 times, and calculate the average
test-set misclassification rate.
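One way to run this protocol with off-the-shelf decision trees; the 50 bootstrap samples and 100 repetitions follow the steps above, while the dataset and the scikit-learn classes are illustrative choices, not what the original experiments used:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
error_rates = []
for trial in range(100):  # Step 3: repeat the random split 100 times
    X1, X2, y1, y2 = train_test_split(X, y, test_size=0.33, random_state=trial)  # Step 1
    bagger = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)        # Step 2
    bagger.fit(X1, y1)
    error_rates.append(1.0 - bagger.score(X2, y2))  # test-set misclassification rate
print(sum(error_rates) / len(error_rates))          # average over the 100 trials
```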
Bagging regression trees
Different ways of combining parsing results
Techniques for combining parsers
(Henderson and Brill, EMNLP-1999)
Parse hybridization: combining the
substructures of the input parses
Constituent voting (see the sketch after this list)
Naïve Bayes
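A toy sketch of constituent voting, treating each input parse as a set of labeled spans and keeping every span proposed by a majority of the parsers; assembling the surviving constituents back into a tree, as Henderson and Brill's system does, is omitted here:

```python
from collections import Counter

def constituent_voting(parses):
    """Keep each labeled span (label, start, end) proposed by more than
    half of the input parses."""
    counts = Counter(span for parse in parses for span in parse)
    return {span for span, c in counts.items() if c > len(parses) / 2.0}

# Each parse is a set of (label, start, end) constituents over the same sentence.
parse1 = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)}
parse2 = {("S", 0, 5), ("NP", 0, 2), ("VP", 3, 5)}
parse3 = {("S", 0, 5), ("NP", 0, 1), ("VP", 2, 5)}
print(constituent_voting([parse1, parse2, parse3]))
# -> {('S', 0, 5), ('NP', 0, 2), ('VP', 2, 5)} (set order may vary)
```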
Experiment results:
Bagging is effective for unstable learning methods.
It does not help stable learning methods.
Uncovered issues
How to determine whether a learning
method is stable or unstable?