CS4780 Mathematical Exam Concepts (Spring 2017)

CS4780 Mathematical Background Exam (Spring 2017)
Note: This quiz is for self-examination. It is far more valuable to you if you refrain from searching the web for answers; using Wikipedia to look up important concepts (the clickable pointers in red) is fine. You do not have to finish all questions. The necessary mathematical topics will be discussed, albeit briefly, in lectures. The order and point values of the questions do not necessarily reflect their difficulty.
Solutions are due in the second class.
Convexity
The following questions test your basic skills in computing the derivatives of univariate
functions, as well as applying the concept of convexity to determine the properties of the
functions.
1. (1 pt) Show that $f(x) = x^2$ and $g(x) = \ln x$ are convex and concave, respectively.
2. (2 pts) Show that $h(x) = \frac{1}{1 + e^{-x}}$ is neither convex nor concave (feel free to plot the function to get some intuition for why this is the case).

3. (2 pts) Show that $-\ln h(x)$ is convex.
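As a numerical sanity check on the convexity arguments above, the sketch below encodes hand-derived second derivatives and compares them against a central finite difference. The formulas in the docstrings are this sketch's own working and are not part of the question. A sign check on a grid can suggest convexity or concavity but never proves it. The helper names, grid range, and step size are arbitrary choices.

```python
import math

def sigmoid(x):
    """h(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def f_second(x):
    """f(x) = x^2  ->  f''(x) = 2 > 0 everywhere (convex)."""
    return 2.0

def g_second(x):
    """g(x) = ln x  ->  g''(x) = -1 / x^2 < 0 for x > 0 (concave)."""
    return -1.0 / x ** 2

def h_second(x):
    """h'(x) = h(1 - h), so h''(x) = h(1 - h)(1 - 2h): positive for x < 0, negative for x > 0."""
    s = sigmoid(x)
    return s * (1 - s) * (1 - 2 * s)

def neg_log_h_second(x):
    """-ln h(x) = ln(1 + e^{-x}); first derivative h(x) - 1, second derivative h(1 - h) > 0."""
    s = sigmoid(x)
    return s * (1 - s)

def finite_diff_second(fn, x, eps=1e-4):
    """Central-difference estimate of fn''(x), used to cross-check the formulas above."""
    return (fn(x + eps) - 2 * fn(x) + fn(x - eps)) / eps ** 2

grid = [k / 10 for k in range(-50, 51)]  # x in [-5, 5]
print(all(g_second(x) < 0 for x in grid if x > 0))
print(min(h_second(x) for x in grid) < 0 < max(h_second(x) for x in grid))  # sign change
print(all(neg_log_h_second(x) > 0 for x in grid))
print(abs(h_second(0.7) - finite_diff_second(sigmoid, 0.7)))  # should be tiny
```

The sign change of $h''$ at $x = 0$ is what rules out both convexity and concavity for $h$.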
Basic concepts from information theory

1. (2 pts) Prove that $x - 1 \ge \ln x$ for all $x > 0$. (Hint: Show that the minimum of $x - 1 - \ln x$ is 0 by studying the function's monotonicity.)
2. (3 pts) Use the above property to show that, for any probability vectors $p = (p_1, \dots, p_K)$ and $q = (q_1, \dots, q_K)$,
$$KL(p \| q) = \sum_{k=1}^{K} p_k \ln \frac{p_k}{q_k} \ge 0,$$
where $KL(p \| q)$ is called the Kullback-Leibler divergence between $p$ and $q$. Note that for probability vectors, $p_k \ge 0$ and $q_k \ge 0$, and additionally $\sum_k p_k = 1$ and $\sum_k q_k = 1$.
3. (3 pts) Use the above property to show that, for any probability vector $p = (p_1, \dots, p_K)$,
$$H(p) = -\sum_{k=1}^{K} p_k \ln p_k \le \ln K,$$
where $H(p)$ is called the entropy of $p$.
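The three inequalities can be spot-checked numerically. This Python sketch uses natural logarithms throughout, matching the $\ln$ in the questions, and generates random probability vectors. It tests $x - 1 \ge \ln x$ on a grid, $KL(p \| q) \ge 0$, and $H(p) \le \ln K$. The helper names are hypothetical, and the random checks illustrate the claims rather than prove them.

```python
import math
import random

def kl_divergence(p, q):
    """KL(p || q) = sum_k p_k ln(p_k / q_k); terms with p_k = 0 contribute 0 by convention.
    Assumes q_k > 0 wherever p_k > 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def entropy(p):
    """H(p) = -sum_k p_k ln p_k (natural log, so the bound is ln K); zero entries are skipped."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def random_prob_vector(k, rng):
    """Normalise k positive random weights; an arbitrary way to generate test vectors."""
    w = [rng.random() + 1e-12 for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(0)

# Question 1: x - 1 >= ln x on a grid of positive x (equality only at x = 1).
print(all(x - 1 >= math.log(x) for x in (i / 100 for i in range(1, 1000))))

# Questions 2 and 3: KL(p || q) >= 0 and H(p) <= ln K on many random vectors.
violations = 0
for _ in range(1000):
    K = rng.randint(2, 10)
    p, q = random_prob_vector(K, rng), random_prob_vector(K, rng)
    if kl_divergence(p, q) < -1e-12 or entropy(p) > math.log(K) + 1e-12:
        violations += 1
print(violations)
```

The small tolerances absorb floating-point round-off. Equality $H(p) = \ln K$ is reached by the uniform vector, which is why the bound cannot be tightened.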
Optimization
The following questions test your familiarity with the method of Lagrange multipliers.
1. (2 pts) Consider the following two functions:
$$f_1(x, y) = (x - 1)^2 + (y - 1)^2$$
$$f_2(x, y) = (x - 3)^2 + (y - 2)^2$$
What are their minima if we constrain $x^2 + y^2 \le 1$? Please show the derivation steps.
2. (5 pts) In $\mathbb{R}^D$ (i.e., $D$-dimensional Euclidean space), what is the shortest distance from a point $x_0 \in \mathbb{R}^D$ to a hyperplane $H: w^\top x + b = 0$? Express this distance in terms of $w$, $b$ and $x_0$. (Hint: Please show the detailed derivation steps. Formulate this problem as a constrained optimization and then use the method of Lagrange multipliers.)
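To check answers to the two optimization questions, the sketch below takes a geometric shortcut rather than the Lagrange-multiplier derivation the questions ask for. Each $f_i$ is a squared distance to a centre point, so its constrained minimizer is the projection of that centre onto the unit disk. A brute-force polar grid then confirms the value. For the hyperplane, the sketch evaluates the standard closed form $|w^\top x_0 + b| / \|w\|$ and the foot of the perpendicular, which is the point the Lagrangian's stationarity condition yields. All function names are illustrative, $w \neq 0$ is assumed, and the test hyperplane and point are arbitrary.

```python
import math

def project_onto_unit_disk(cx, cy):
    """Closest point of {x^2 + y^2 <= 1} to (cx, cy): itself if inside, else scaled onto the circle."""
    r = math.hypot(cx, cy)
    return (cx, cy) if r <= 1 else (cx / r, cy / r)

def constrained_min(cx, cy):
    """Minimum of (x - cx)^2 + (y - cy)^2 subject to x^2 + y^2 <= 1, via the projection above."""
    x, y = project_onto_unit_disk(cx, cy)
    return (x - cx) ** 2 + (y - cy) ** 2

def grid_min(cx, cy, n=200):
    """Brute-force cross-check: evaluate the objective on an (n+1) x n polar grid over the disk."""
    best = math.inf
    for i in range(n + 1):
        r = i / n
        for j in range(n):
            theta = 2 * math.pi * j / n
            x, y = r * math.cos(theta), r * math.sin(theta)
            best = min(best, (x - cx) ** 2 + (y - cy) ** 2)
    return best

def distance_to_hyperplane(w, b, x0):
    """|w^T x0 + b| / ||w||, assuming w is not the zero vector."""
    dot = sum(wi * xi for wi, xi in zip(w, x0))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))

def foot_of_perpendicular(w, b, x0):
    """x* = x0 - (w^T x0 + b) / ||w||^2 * w: the closest point on the hyperplane to x0."""
    dot = sum(wi * xi for wi, xi in zip(w, x0))
    scale = (dot + b) / sum(wi * wi for wi in w)
    return [xi - scale * wi for wi, xi in zip(w, x0)]

# f1 is centred at (1, 1), f2 at (3, 2); both centres lie outside the unit disk.
for centre in [(1.0, 1.0), (3.0, 2.0)]:
    print(centre, project_onto_unit_disk(*centre), constrained_min(*centre), grid_min(*centre))

w, b, x0 = [1.0, 2.0, 2.0], -3.0, [4.0, 0.0, 1.0]  # arbitrary test hyperplane and point
x_star = foot_of_perpendicular(w, b, x0)
print(distance_to_hyperplane(w, b, x0), math.dist(x0, x_star))
```

The two printed distances should agree, since $x_0 - x^*$ is parallel to $w$ and $x^*$ satisfies $w^\top x^* + b = 0$.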
Probability and statistics

1. (3 pts) Consider a random variable $X$ that follows the uniform distribution between 0 and $a$. Please calculate the mean, variance and entropy of $X$. Please show the steps of the calculation (you can look up the results on the web to verify your calculation).
2. (7 pts) Suppose we have observed $N$ independent random samples of $X$, $(x_1, x_2, \dots, x_N)$. What is the maximum likelihood estimate of $a$? Please show the derivation steps. Is this estimate unbiased? Please justify your answer.
3. (6 pts) Given two independent Gaussian random variables $U \sim \mathcal{N}(-1, 1)$ and $V \sim \mathcal{N}(1, 1)$, are the following random variables also Gaussian? If so, what are their means and (co)variances? Note that $T$ is a vector.
$$Y = U + V, \quad Z = U \times V, \quad T = \begin{pmatrix} U + V \\ U - 2V \end{pmatrix}, \quad W = \begin{cases} U & \text{with 50\% chance} \\ V & \text{with 50\% chance} \end{cases}$$
4. (5 pts) We have two coins: one is a fair coin, which shows heads or tails with 50% probability each. The other is a fake coin with heads on both sides. Suppose we randomly pick one and toss it (the two coins are otherwise identical, so each one is picked with 50% probability). (Hint: Use the rules of probability and Bayes' rule.)
What is the probability that we observe heads?
If we do observe heads, what is the probability that we picked the fake coin?
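Here is a minimal Python sketch for sanity-checking the probability answers. It assumes $a = 2$ and $N = 5$ as arbitrary test values and measures entropy in nats; the fixed random seed only makes the Monte Carlo output reproducible. The sketch does four things:

- compares the standard uniform-distribution formulas with simulation;
- estimates by simulation the bias of the maximum-likelihood estimator $\max_i x_i$;
- computes the exact mean and covariance of the linear combinations in question 3 ($Y$ and $T$ only; the nonlinear $Z$ and the mixture $W$ need a separate argument);
- applies Bayes' rule exactly with `fractions.Fraction` for the coin question.

All function names are hypothetical.

```python
import math
import random
from fractions import Fraction

rng = random.Random(42)

# Q1: X ~ Uniform(0, a). Standard closed forms, compared with a Monte Carlo estimate.
def uniform_closed_form(a):
    """Mean a/2, variance a^2/12, differential entropy ln a (in nats)."""
    return a / 2, a * a / 12, math.log(a)

def uniform_monte_carlo(a, n=200_000):
    """Sample mean and (biased) sample variance of n draws from Uniform(0, a)."""
    xs = [rng.uniform(0, a) for _ in range(n)]
    mean = sum(xs) / n
    return mean, sum((x - mean) ** 2 for x in xs) / n

# Q2: the likelihood a^{-N} (for a >= max x_i, zero otherwise) decreases in a, so the MLE is max x_i.
def average_mle(a, N, trials=20_000):
    """Monte Carlo average of max(x_1..x_N); theory gives E[max] = N a / (N + 1) < a."""
    return sum(max(rng.uniform(0, a) for _ in range(N)) for _ in range(trials)) / trials

# Q3 (linear parts only): M @ [U, V] for independent U, V has mean M mu and covariance M diag(var) M^T.
def linear_gaussian(M, mu, var):
    mean = [sum(m * u for m, u in zip(row, mu)) for row in M]
    cov = [[sum(r1[k] * r2[k] * var[k] for k in range(len(mu))) for r2 in M] for r1 in M]
    return mean, cov

# Q4: exact Bayes' rule with fractions.
def coin_probabilities():
    p_fake = Fraction(1, 2)
    p_head_fake, p_head_fair = Fraction(1), Fraction(1, 2)
    p_head = p_fake * p_head_fake + (1 - p_fake) * p_head_fair  # law of total probability
    return p_head, p_fake * p_head_fake / p_head                # P(head), P(fake | head)

print(uniform_closed_form(2.0), uniform_monte_carlo(2.0))
print(average_mle(2.0, 5), 5 * 2.0 / 6)
print(linear_gaussian([[1, 1], [1, -2]], [-1, 1], [1, 1]))  # T = (U + V, U - 2V)
print(coin_probabilities())
```

Using exact fractions for the coin question avoids any floating-point doubt about the answer, while the simulations are only approximate checks of the analytic results.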
Linear algebra
1. (4 pts) Consider the covariance matrix $\Sigma$ of a random vector $X$, which is $\Sigma = \mathbb{E}[(X - \mathbb{E}X)(X - \mathbb{E}X)^\top]$, where $\mathbb{E}X$ is the expectation of $X$. Prove that $\Sigma$ is positive semidefinite.
2. (5 pts) Let $A$ and $B$ be two symmetric matrices in $\mathbb{R}^{D \times D}$. Suppose $A$ and $B$ have exactly the same set of eigenvectors $u_1, u_2, \dots, u_D$, with corresponding eigenvalues $\alpha_1, \alpha_2, \dots, \alpha_D$ for $A$ and $\beta_1, \beta_2, \dots, \beta_D$ for $B$. Please write down the eigenvectors and their corresponding eigenvalues for the following matrices:
$$C = A + B, \quad D = A - B, \quad E = AB, \quad F = A^{-1}B \quad (\text{assume } A \text{ is invertible}),$$
$$G = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} \in \mathbb{R}^{2D \times 2D},$$
where $0 \in \mathbb{R}^{D \times D}$ is the all-zero matrix.
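The linear-algebra claims can be spot-checked on a small example. This pure-Python sketch avoids NumPy to stay dependency-free. It builds symmetric $A$ and $B$ from a shared set of orthonormal eigenvectors: the columns of a 2-D rotation matrix, with arbitrarily chosen angle and eigenvalues. It then tests candidate eigenpairs for $C$, $D$, $E$, $F$ and $G$. The matrix $D$ is named `D_mat` in the code because $D$ is also the dimension, and $A^{-1}$ is computed by the 2×2 adjugate formula rather than from the eigenvectors, so the $F$ check is not circular. The sketch also checks that $v^\top \Sigma v \ge 0$ for a sample covariance matrix built from random data. One example is evidence, not a proof.

```python
import math
import random

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(X, v):
    return [sum(x * y for x, y in zip(row, v)) for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def diag(vals):
    return [[v if i == j else 0.0 for j in range(len(vals))] for i, v in enumerate(vals)]

def add(X, Y, sign=1.0):
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inv2(X):
    """Inverse of a 2x2 matrix by the adjugate formula (not via its eigenvectors)."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def is_eigenpair(M, u, lam, tol=1e-9):
    """True if M u = lam u up to tol in every component."""
    return all(abs(mu - lam * ui) < tol for mu, ui in zip(matvec(M, u), u))

def sample_covariance(xs):
    """(1/n) sum (x - m)(x - m)^T, the empirical version of E[(X - EX)(X - EX)^T]."""
    n, d = len(xs), len(xs[0])
    m = [sum(x[j] for x in xs) / n for j in range(d)]
    return [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in xs) / n for j in range(d)]
            for i in range(d)]

# Shared orthonormal eigenvectors: columns of a 2-D rotation matrix (an arbitrary choice).
t = 0.3
U = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
alpha, beta = [2.0, 5.0], [-1.0, 3.0]
A = matmul(matmul(U, diag(alpha)), transpose(U))
B = matmul(matmul(U, diag(beta)), transpose(U))
u = [[U[0][i], U[1][i]] for i in range(2)]  # u[i] is the i-th column of U

C, D_mat, E, F = add(A, B), add(A, B, -1.0), matmul(A, B), matmul(inv2(A), B)
G = [A[0] + [0.0, 0.0], A[1] + [0.0, 0.0], [0.0, 0.0] + B[0], [0.0, 0.0] + B[1]]  # block diagonal

for i in range(2):
    al, be = alpha[i], beta[i]
    print(is_eigenpair(C, u[i], al + be), is_eigenpair(D_mat, u[i], al - be),
          is_eigenpair(E, u[i], al * be), is_eigenpair(F, u[i], be / al),
          is_eigenpair(G, u[i] + [0.0, 0.0], al), is_eigenpair(G, [0.0, 0.0] + u[i], be))

# PSD check: v^T Sigma v equals the mean of (v^T (x - m))^2, so it cannot be negative.
rng = random.Random(1)
data = [[rng.gauss(0, 1), rng.gauss(0, 2), rng.gauss(1, 1)] for _ in range(500)]
Sigma = sample_covariance(data)
quad_forms = [sum(vi * si for vi, si in zip(v, matvec(Sigma, v)))
              for v in ([rng.gauss(0, 1) for _ in range(3)] for _ in range(1000))]
print(min(quad_forms) > -1e-12)
```

The eigenvector lists for $G$ pad each $u_i$ with zeros in the other block, mirroring its block-diagonal structure.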
