
VI. Logistic Regression - Feedback


You achieved a score of 3.50 out of 5.00.
Your answers, as well as our explanations, are shown below.

To review the material and deepen your understanding of the course content, please
answer the review questions below, and hit submit at the bottom of the page when you're
done.
You are allowed to take/re-take these review quizzes multiple times, and each time you
will see a slightly different set of questions or answers. We will use only your highest
score, and strongly encourage you to continue re-taking each quiz until you get a 100%
score at least once. (Even after that, you can re-take it to review the content further, with
no risk of your final score being reduced.) To prevent rapid-fire guessing, the system
enforces a minimum of 10 minutes between each attempt.
Question 1
Suppose that you have trained a logistic regression classifier, and it outputs on a new
example x a prediction hθ(x) = 0.2. This means (check all that apply):

Our estimate for P(y=0|x;θ) is 0.8.
Our estimate for P(y=1|x;θ) is 0.8.
Our estimate for P(y=1|x;θ) is 0.2.
Our estimate for P(y=0|x;θ) is 0.2.

Your answer: Our estimate for P(y=0|x;θ) is 0.8.
Score: 0.25
Choice explanation: Since we must have P(y=0|x;θ) = 1 − P(y=1|x;θ), the former is 1 − 0.2 = 0.8.

Your answer: Our estimate for P(y=1|x;θ) is 0.8.
Score: 0.25
Choice explanation: hθ(x) gives P(y=1|x;θ), not 1 − P(y=1|x;θ).

Your answer: Our estimate for P(y=1|x;θ) is 0.2.
Score: 0.25
Choice explanation: hθ(x) is precisely P(y=1|x;θ), so each is 0.2.

Your answer: Our estimate for P(y=0|x;θ) is 0.2.
Score: 0.25
Choice explanation: hθ(x) is P(y=1|x;θ), not P(y=0|x;θ).

Total: 1.00 / 1.00
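As a quick numerical check of Question 1 (not part of the original quiz), the short sketch below evaluates the sigmoid at the log-odds of 0.2 and confirms the interpretation that hθ(x) = 0.2 means P(y=1|x;θ) = 0.2 and P(y=0|x;θ) = 0.8. The variable names are illustrative only.

```python
import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + np.exp(-z))

# Pick z so that g(z) = 0.2, mirroring h_theta(x) = 0.2 from the question.
z = np.log(0.2 / 0.8)           # log-odds of 0.2 (inverse of the sigmoid)
h = sigmoid(z)                  # h_theta(x) = P(y=1 | x; theta)

print(f"P(y=1|x;theta) = {h:.2f}")      # 0.20
print(f"P(y=0|x;theta) = {1 - h:.2f}")  # 0.80, since the two probabilities sum to 1
```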

Question 2
Suppose you train a logistic classifier hθ(x) = g(θ0 + θ1x1 + θ2x2).
Suppose θ0 = −6, θ1 = 0, θ2 = 1. Which of the following figures represents the decision
boundary found by your classifier?

Your answer: [figure whose decision boundary sits at x1 = 6]
Score: 0.00
Choice explanation: In this figure, we transition from negative to positive when x1 goes from below 6 to above 6, but for the given values of θ, the transition occurs when x2 goes from below 6 to above 6.

Total: 0.00 / 1.00
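To make Question 2's boundary concrete, here is a small sketch (not from the quiz) that evaluates hθ(x) on either side of x2 = 6, using the reconstructed parameter values θ0 = −6, θ1 = 0, θ2 = 1 implied by the explanation above. Only crossing x2 = 6 flips the prediction, and x1 has no effect because θ1 = 0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Reconstructed parameters from Question 2: theta_0 = -6, theta_1 = 0, theta_2 = 1.
theta = np.array([-6.0, 0.0, 1.0])

def predict(x1, x2):
    # h_theta(x) = g(theta_0 + theta_1*x1 + theta_2*x2); predict y = 1 when h >= 0.5,
    # i.e. when theta^T x >= 0, which here reduces to x2 >= 6.
    h = sigmoid(theta @ np.array([1.0, x1, x2]))
    return int(h >= 0.5)

print(predict(x1=0.0, x2=5.9))    # 0: below the boundary x2 = 6
print(predict(x1=0.0, x2=6.1))    # 1: above the boundary
print(predict(x1=100.0, x2=5.9))  # still 0: x1 is irrelevant since theta_1 = 0
```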

Question 3

Suppose you have the following training set, and fit a logistic regression
classifier hθ(x) = g(θ0 + θ1x1 + θ2x2).

[Figure: scatter plot of the training set on the x1 and x2 axes]
Which of the following are true? Check all that apply.


The positive and negative examples cannot be separated using a straight line. So,
gradient descent will fail to converge.

At the optimal value of θ (e.g., found by fminunc), we will have J(θ) ≥ 0.

Adding polynomial features (e.g., instead
using hθ(x) = g(θ0 + θ1x1 + θ2x2 + θ3x1² + θ4x1x2 + θ5x2²)) would
increase J(θ) because we are now summing over more terms.

J(θ) will be a convex function, so gradient descent should converge to the global
minimum.

Your answer: The positive and negative examples cannot be separated using a straight line. So, gradient descent will fail to converge.
Score: 0.25
Choice explanation: While it is true they cannot be separated, gradient descent will still converge to the optimal fit. Some examples will remain misclassified at the optimum.

Your answer: At the optimal value of θ (e.g., found by fminunc), we will have J(θ) ≥ 0.
Score: 0.25
Choice explanation: The cost function J(θ) is always non-negative for logistic regression.

Your answer: Adding polynomial features (e.g., instead using hθ(x) = g(θ0 + θ1x1 + θ2x2 + θ3x1² + θ4x1x2 + θ5x2²)) would increase J(θ) because we are now summing over more terms.
Score: 0.25
Choice explanation: The summation in J(θ) is over examples, not features. Furthermore, the hypothesis will now be more accurate (or at least just as accurate) with the new features, so the cost function will decrease.

Your answer: J(θ) will be a convex function, so gradient descent should converge to the global minimum.
Score: 0.25
Choice explanation: The cost function J(θ) is guaranteed to be convex for logistic regression.

Total: 1.00 / 1.00
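The sketch below illustrates the points graded in Question 3 on a hypothetical one-feature dataset (it is not the training set from the quiz): even though the classes cannot be separated, plain gradient descent still converges because J(θ) is convex, and the cost at the optimum stays above zero because some examples remain misclassified.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # Logistic regression cost J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h)).
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical non-separable data: one positive example sits among the negatives.
X = np.array([[1, 0.0], [1, 1.0], [1, 2.0], [1, 3.0], [1, 4.0], [1, 5.0]])
y = np.array([0, 0, 1, 0, 1, 1])

# Plain gradient descent; convexity of J(theta) means it heads to the global minimum.
theta = np.zeros(2)
alpha = 0.1
for _ in range(20000):
    grad = X.T @ (sigmoid(X @ theta) - y) / len(y)
    theta -= alpha * grad

print("theta at convergence:", theta)
print("J(theta) at the optimum:", cost(theta, X, y))  # > 0: some points stay misclassified
```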
Question 4

For logistic regression, the gradient is given by ∂J(θ)/∂θ_j = (1/m) Σ_{i=1}^m (hθ(x^(i)) − y^(i)) x_j^(i). Which of these is a correct gradient descent update for logistic regression with a learning rate of α? Check all that apply.

θ_j := θ_j − α (1/m) Σ_{i=1}^m (θᵀx − y^(i)) x_j^(i) (simultaneously update for all j).

θ := θ − α (1/m) Σ_{i=1}^m (hθ(x^(i)) − y^(i)) x^(i).

θ_j := θ_j − α (1/m) Σ_{i=1}^m (1/(1 + e^(−θᵀx^(i))) − y^(i)) x_j^(i) (simultaneously update for all j).

θ := θ − α (1/m) Σ_{i=1}^m (θᵀx − y^(i)) x^(i).
Your answer: θ_j := θ_j − α (1/m) Σ_{i=1}^m (θᵀx − y^(i)) x_j^(i) (simultaneously update for all j).
Score: 0.25
Choice explanation: This uses the linear regression hypothesis θᵀx instead of that for logistic regression.

Your answer: θ := θ − α (1/m) Σ_{i=1}^m (hθ(x^(i)) − y^(i)) x^(i).
Score: 0.25
Choice explanation: This is a vectorized version of the direct substitution of ∂J(θ)/∂θ_j into the gradient descent update.

Your answer: θ_j := θ_j − α (1/m) Σ_{i=1}^m (1/(1 + e^(−θᵀx^(i))) − y^(i)) x_j^(i) (simultaneously update for all j).
Score: 0.25
Choice explanation: This substitutes the exact form of hθ(x^(i)) used by logistic regression into the gradient descent update.

Your answer: θ := θ − α (1/m) Σ_{i=1}^m (θᵀx − y^(i)) x^(i).
Score: 0.25
Choice explanation: This vectorized version uses the linear regression hypothesis θᵀx instead of that for logistic regression.

Total: 1.00 / 1.00
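To connect the two correct choices in Question 4, the following sketch (using made-up toy data) computes one gradient descent step both element-wise, θ_j := θ_j − α (1/m) Σ_i (hθ(x^(i)) − y^(i)) x_j^(i), and in vectorized form, and shows that the two produce the same updated θ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up toy data: m = 4 examples, with an intercept column of ones in X.
X = np.array([[1, 0.5], [1, 1.5], [1, 2.5], [1, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.array([0.1, -0.2])
alpha = 0.5
m = len(y)

h = sigmoid(X @ theta)  # h_theta(x^(i)) for every example at once

# Element-wise update: theta_j := theta_j - alpha*(1/m)*sum_i (h(x^(i)) - y^(i)) * x_j^(i)
theta_elementwise = np.array(
    [theta[j] - alpha / m * np.sum((h - y) * X[:, j]) for j in range(len(theta))]
)

# Vectorized update: theta := theta - alpha*(1/m) * X^T (h - y)
theta_vectorized = theta - alpha / m * (X.T @ (h - y))

print(theta_elementwise)
print(theta_vectorized)  # identical to the element-wise result
```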

Question 5

Which of the following statements are true? Check all that apply.
The cost function J(θ) for logistic regression trained with m ≥ 1 examples is
always greater than or equal to zero.

The sigmoid function g(z) = 1/(1 + e^(−z)) is never greater than one (>1).

For logistic regression, sometimes gradient descent will converge to a local
minimum (and fail to find the global minimum). This is the reason we prefer more
advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc.).

Linear regression always works well for classification if you classify by using a
threshold on the prediction made by linear regression.

Your answer: The cost function J(θ) for logistic regression trained with m ≥ 1 examples is always greater than or equal to zero.
Score: 0.00
Choice explanation: The cost for any example x^(i) is always ≥ 0, since it is the negative log of a quantity less than one. The cost function J(θ) is a summation over the cost for each example, so the cost function itself must be greater than or equal to zero.

Your answer: The sigmoid function g(z) = 1/(1 + e^(−z)) is never greater than one (>1).
Score: 0.25
Choice explanation: The denominator ranges from ∞ to 1 as z grows, so the result is always in (0, 1).

Your answer: For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc.).
Score: 0.00
Choice explanation: The cost function for logistic regression is convex, so gradient descent will always converge to the global minimum. We still might use a more advanced optimization algorithm since they can be faster and don't require you to select a learning rate.

Your answer: Linear regression always works well for classification if you classify by using a threshold on the prediction made by linear regression.
Score: 0.25
Choice explanation: As demonstrated in the lecture, linear regression often classifies poorly since its training procedure focuses on predicting real-valued outputs, not classification.

Total: 0.50 / 1.00
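Finally, a short sketch (illustrative values only) for the two correct statements in Question 5: g(z) stays strictly inside (0, 1) even for large |z|, and the per-example logistic cost, being a negative log of a number in (0, 1), is never negative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# g(z) stays strictly between 0 and 1, even for large |z|.
z = np.linspace(-30, 30, 7)
g = sigmoid(z)
print(g)
print(bool(np.all((g > 0) & (g < 1))))       # True

# The per-example cost, -log(h) if y = 1 or -log(1 - h) if y = 0, is never negative,
# because 0 < h < 1 makes both logarithms non-positive.
h = sigmoid(2.0)                              # an arbitrary illustrative prediction
print(-np.log(h) >= 0, -np.log(1 - h) >= 0)   # True True
```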
