
Homework 2

Quinn Ngo
MATH 499
February 10, 2017

1. We are conducting a study to predict a Boolean variable $y$ from two features $x_1$ and $x_2$ that take only
integer values, $0 \le x_1 \le 4$ and $0 \le x_2 \le 3$.
As our hypothesis class $\mathcal{H}$ we take the class of all closed rectangles with integer vertices contained in
$[0,4] \times [0,3]$ (more precisely, the class of all characteristic functions of such rectangles).
Given the training set

$x_1$  $x_2$  $y$
  0      1    1
  1      1    0
  2      1    1
(a) What is the empirical risk of the rectangle $[0,2] \times [0,2]$?
We denote the characteristic function of this rectangle by $h_S(x_1, x_2)$.
$h_S(0,1) = 1 = f(0,1)$
$h_S(1,1) = 1 \ne f(1,1)$
$h_S(2,1) = 1 = f(2,1)$
Our empirical risk is $1/3$, as there is exactly one point in our sample for which $h_S(x_1, x_2) \ne f(x_1, x_2)$.
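Written out with the definition of empirical risk (here $m = 3$):

$L_S(h_S) = \dfrac{\left|\{\, i \in [m] : h_S(x_i) \ne y_i \,\}\right|}{m} = \dfrac{1}{3}.$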
(b) Find some Empirical Risk Minimizer.
The function in part (a) suffices as an empirical risk minimizer: it misclassifies only one sample point, and
no rectangle in $\mathcal{H}$ can classify all three points correctly, since any rectangle containing both $(0,1)$
and $(2,1)$ must also contain $(1,1)$.
(c) What is the true risk of the ERM found in part (b) if the actual distribution of the features is
uniform over $[0,4] \times [0,3]$ (more precisely, over the points with integer coordinates within it), and
the actual labels are $f(x_1, x_2) = 1$ if $x_1$ is even and $f(x_1, x_2) = 0$ if $x_1$ is odd?
The rectangle predicts 1 exactly on the nine points with $0 \le x_1 \le 2$ and $0 \le x_2 \le 2$. Of the 20 points
in the domain, 9 are predicted incorrectly by our ERM: the three points with $x_1 = 1$ and $x_2 \in \{0,1,2\}$
(predicted 1 but labeled 0), and the six points with even $x_1$ lying outside the rectangle, namely $(0,3)$, $(2,3)$,
and $(4, x_2)$ for $x_2 \in \{0,1,2,3\}$ (predicted 0 but labeled 1). Hence the true risk of the
hypothesis is $9/20 = 0.45$.
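As a quick sanity check on this count, a brute-force enumeration over the 20 domain points (a minimal sketch in Python, simply re-encoding the rectangle and the parity labels above):

    # Count the points of {0,...,4} x {0,...,3} misclassified by the rectangle [0,2] x [0,2].
    errors = 0
    for x1 in range(5):            # x1 = 0, ..., 4
        for x2 in range(4):        # x2 = 0, ..., 3
            h = 1 if (x1 <= 2 and x2 <= 2) else 0   # characteristic function of [0,2] x [0,2]
            f = 1 if x1 % 2 == 0 else 0             # true label: 1 iff x1 is even
            errors += int(h != f)
    print(errors, "out of 20")     # prints: 9 out of 20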

(d) Estimate the size of the training set needed to guarantee, with 95% confidence, that the true risk
does not exceed its minimum value by more than 0.05.
I think that a sample size of 9 would be sufficient.
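For reference, one way such an estimate could be justified (a sketch, assuming the standard Hoeffding-based bound for a finite hypothesis class in the agnostic case applies here) is

$m_{\mathcal{H}}(\epsilon, \delta) \le \left\lceil \dfrac{2\ln(2|\mathcal{H}|/\delta)}{\epsilon^2} \right\rceil$ with $\epsilon = \delta = 0.05$,

where $|\mathcal{H}|$ is the number of rectangles with integer vertices contained in $[0,4] \times [0,3]$.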
2. How does the complexity $m_{\mathcal{H}}$ of a learnable class depend on the parameters $\epsilon$ and $\delta$? Classify the following as
true or false and justify.
(a) $m_{\mathcal{H}}$ increases (in the non-strict sense) as $\epsilon$ decreases with $\delta$ fixed.
This is true. The smaller $\epsilon$ is, the closer to optimal the returned hypothesis must be. In order for
this to be guaranteed, we need a more representative sample, so $m_{\mathcal{H}}$ must grow (weakly) as $\epsilon$ decreases.
(b) $m_{\mathcal{H}}$ increases (in the non-strict sense) as $\delta$ increases with $\epsilon$ fixed.
This is false. In the most extreme case, when $\delta = 1$, no confidence guarantee is required at all, so the sample complexity of any learnable class is 0.
(c) It seems reasonable that
$\lim_{\epsilon \to 0} m_{\mathcal{H}}(\epsilon, \delta) = +\infty.$
True. As the accuracy parameter $\epsilon$ approaches 0, the sample must become ever more representative,
so the complexity $m_{\mathcal{H}}$ should increase without bound.
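For comparison, these answers are consistent with the standard sample-complexity bound for a finite hypothesis class in the realizable case (a sketch, assuming that bound applies):

$m_{\mathcal{H}}(\epsilon, \delta) \le \left\lceil \dfrac{\ln(|\mathcal{H}|/\delta)}{\epsilon} \right\rceil,$

which grows as $\epsilon$ decreases, does not increase as $\delta$ increases, and tends to $+\infty$ as $\epsilon \to 0$ for fixed $\delta$.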
3. Show that given a training set $S = \{(x_i, f(x_i))\}_{i=1}^{m} \subseteq (\mathbb{R}^d \times \{0,1\})^m$, there exists a polynomial $p_S$ such
that $h_S(x) = 1$ if and only if $p_S(x) \ge 0$, where

$h_S(x) = \begin{cases} y_i & \text{if } \exists\, i \in [m] \text{ such that } x_i = x \\ 0 & \text{otherwise.} \end{cases}$

Define the set

$A = \{\, a : \exists\, i \in [m] \text{ such that } a = x_i \text{ and } h_S(a) = 1 \,\}.$

We construct a polynomial that evaluates to 0 at the desired values of $x$ and is negative everywhere
else:

$p_S(x) = -\prod_{a \in A} (x - a) \cdot (x - a) = -\prod_{a \in A} \|x - a\|^2.$

If $x \in A$, one of the factors vanishes and $p_S(x) = 0 \ge 0$; if $x \notin A$, every factor $\|x - a\|^2$ is strictly positive, so $p_S(x) < 0$. Hence $p_S(x) \ge 0$ if and only if $x \in A$, which holds if and only if $h_S(x) = 1$.
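As a small numeric illustration (a sketch in Python, taking for $A$ the positively labeled points $(0,1)$ and $(2,1)$ from the training set of Problem 1):

    import numpy as np

    # Positively labeled sample points (the set A): (0,1) and (2,1).
    A = [np.array([0.0, 1.0]), np.array([2.0, 1.0])]

    def p_S(x):
        # p_S(x) = -prod_{a in A} ||x - a||^2
        prod = 1.0
        for a in A:
            prod *= float(np.dot(x - a, x - a))
        return -prod

    print(p_S(np.array([0.0, 1.0])) >= 0)   # True:  p_S = 0 at a point of A, so h_S = 1
    print(p_S(np.array([1.0, 1.0])) >= 0)   # False: p_S < 0 elsewhere, so h_S = 0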

4. Let $\mathcal{H}$ be a class of binary classifiers over a domain $\mathcal{X}$. Let $\mathcal{D}$ be an unknown distribution over $\mathcal{X}$, and
let $f$ be the target hypothesis in $\mathcal{H}$. Fix some $h \in \mathcal{H}$. Show that the expected value of $L_S(h)$ over the
choice of $S|_x$ equals $L_{(\mathcal{D},f)}(h)$, namely,

$\mathbb{E}_{S|_x \sim \mathcal{D}^m}[L_S(h)] = L_{(\mathcal{D},f)}(h).$

 
$\mathbb{E}_{S|_x \sim \mathcal{D}^m}[L_S(h)] = \mathbb{E}\!\left[\dfrac{\left|\{\, i \in [m] : h(x_i) \ne f(x_i) \,\}\right|}{m}\right]$
$= \dfrac{1}{m}\,\mathbb{E}\!\left[\left|\{\, i \in [m] : h(x_i) \ne f(x_i) \,\}\right|\right]$
$= \dfrac{1}{m}\sum_{i=1}^{m} \mathbb{P}\,(h(x_i) \ne f(x_i))$
$= \mathbb{P}_{x \sim \mathcal{D}}\,(h(x) \ne f(x))$
$= L_{(\mathcal{D},f)}(h),$
where the third equality uses linearity of expectation (the count is a sum of indicator variables) and the fourth uses the fact that each $x_i$ is distributed according to $\mathcal{D}$.
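As a quick numeric illustration of this identity (a sketch in Python; the domain, distribution, labeling function, and classifier below are made up purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup: X = {0,...,9}, D uniform, f(x) = x mod 2,
    # and a fixed classifier h that disagrees with f exactly on x in {0, 1, 2}.
    def f(x):
        return x % 2

    def h(x):
        return np.where(x <= 2, 1 - (x % 2), x % 2)

    true_risk = 3 / 10                                # L_(D,f)(h) = P(h(x) != f(x)) = 0.3

    m, trials = 5, 200_000
    samples = rng.integers(0, 10, size=(trials, m))   # each row is S|_x ~ D^m
    L_S = (h(samples) != f(samples)).mean(axis=1)     # empirical risk of h on each sample
    print(L_S.mean(), "vs", true_risk)                # the average of L_S(h) is close to 0.3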

5. Prove that if $\{\mathcal{H}, \mathcal{D}, f\}$ satisfy the realizability assumption then

$\mathbb{P}_{S \sim \mathcal{D}^m}\left[\{\, S : L_S(h_S) = 0 \,\}\right] = 1.$

Suppose, for the sake of contradiction, that $\mathbb{P}_{S \sim \mathcal{D}^m}\left[\{\, S : L_S(h_S) = 0 \,\}\right] < 1$. Then there exists an $S$ for
which $L_S(h_S) > 0$. Since $h_S$ is an empirical risk minimizer, $h_S \in \operatorname{arg\,min}_{h \in \mathcal{H}} L_S(h)$.
But since $\{\mathcal{H}, \mathcal{D}, f\}$ satisfy the realizability assumption, we know that $f \in \mathcal{H}$; every sample point is labeled by $f$, so $L_S(f) = 0$, and hence $L_S(h) = 0$ for every $h \in \operatorname{arg\,min}_{h' \in \mathcal{H}} L_S(h')$, a contradiction.
