
CS4780 Mathematical Background

(Spring 2017)

Note: This quiz is for self-examination. It is far more valuable to you if you refrain from
searching the web for answers; using Wikipedia to look up important concepts (clickable pointers
in red) is fine. You do not have to finish all questions. Necessary mathematical topics would be
discussed, albeit briefly, in lectures. The orders/points of questions do not always imply
Solutions are due in the second class.

The following questions test your basic skills in computing the derivatives of univariate
functions, as well as applying the concept of convexity to determine the properties of the
functions.
1. (1 pt) Show that $f(x) = x^2$ and $g(x) = \ln x$ are convex and concave, respectively.
2. (2 pts) Show that $h(x) = \frac{1}{1 + e^{-x}}$ is neither convex nor concave (feel free to
plot the function to get some intuition why this is the case).

3. (2 pts) Show that − ln h(x) is convex.
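
Beyond plotting, the sign of numerical second differences can hint at whether a function is convex, concave, or neither on an interval. The sketch below is a sanity check only, not a proof. The helper name `second_difference_signs`, the grid, and the step size are illustrative choices, and h is assumed here to be the logistic function 1/(1 + e^(-x)).

```python
import math

def second_difference_signs(f, lo, hi, n=200, step=1e-3):
    """Collect the signs (+1 / -1) of f(x-step) - 2 f(x) + f(x+step) on a grid."""
    signs = set()
    for i in range(n + 1):
        x = lo + (hi - lo) * i / n
        d2 = f(x - step) - 2.0 * f(x) + f(x + step)
        if abs(d2) > 1e-12:  # skip values indistinguishable from zero
            signs.add(1 if d2 > 0 else -1)
    return signs

def logistic(x):
    # Assumption: h(x) is the logistic function.
    return 1.0 / (1.0 + math.exp(-x))

def neg_log_logistic(x):
    return -math.log(logistic(x))

print("ln x:      ", second_difference_signs(math.log, 0.1, 10.0))
print("h(x):      ", second_difference_signs(logistic, -5.0, 5.0))
print("-ln h(x):  ", second_difference_signs(neg_log_logistic, -5.0, 5.0))
```

A result containing only +1 is consistent with convexity on the sampled interval, only -1 with concavity, and both signs with neither. The written proof still has to come from the second derivative.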

Basic concepts from information theory

1. (2 pts) Prove that x − 1 ≥ ln x for all x > 0. (Hint: Show that the minimum of x − 1 − ln x
is 0 by studying the function's monotonicity.)
2. (3 pts) Use the above property to show that, for any probability vectors p = (p1, ⋯, pK)
and q = (q1, ⋯, qK),

$$\mathrm{KL}(p \,\|\, q) = \sum_{k=1}^{K} p_k \ln \frac{p_k}{q_k} \ge 0,$$

where KL(p∥q) is called the Kullback-Leibler divergence between p and q. Note that for
probability vectors, pk ≥ 0 and qk ≥ 0 and, additionally, ∑k pk = 1 and ∑k qk = 1.
3. (3 pts) Use the above property to show that, for any probability vector p = (p1, ⋯, pK),

$$H(p) = -\sum_{k=1}^{K} p_k \ln p_k \le \ln K,$$

where H(p) is called the entropy of p.
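
The two inequalities in this section can be spot-checked numerically before attempting the proofs. The following is a minimal sketch, assuming natural logarithms and the usual convention 0 ln 0 = 0. The helper `random_probability_vector` and the choice K = 5 are illustrative and not part of the quiz.

```python
import math
import random

def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_k = 0 contribute 0 by convention."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def entropy(p):
    """Shannon entropy in nats; terms with p_k = 0 contribute 0 by convention."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def random_probability_vector(k, rng):
    """Hypothetical helper: normalize k positive draws so they sum to 1."""
    raw = [rng.random() + 1e-9 for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

rng = random.Random(0)
K = 5
for _ in range(1000):
    p = random_probability_vector(K, rng)
    q = random_probability_vector(K, rng)
    assert kl_divergence(p, q) >= -1e-12           # KL(p || q) >= 0
    assert entropy(p) <= math.log(K) + 1e-12       # H(p) <= ln K

uniform = [1.0 / K] * K
print("H(uniform) =", entropy(uniform), " ln K =", math.log(K))
```

Random trials cannot replace the proof, but a failing assertion would signal a mistake in how the quantities were written down.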

The following questions test your familiarity with the method of Lagrange multipliers.
1. (2 pts) Consider the following two functions

$$f_1(x, y) = (x - 1)^2 + (y - 1)^2$$

$$f_2(x, y) = (x - 3)^2 + (y - 2)^2$$

What are their minima if we constrain $x^2 + y^2 \le 1$? Please show the derivation steps.
2. (5 pts) In $\mathbb{R}^D$ (i.e., D-dimensional Euclidean space), what is the shortest
distance from a point $x_0 \in \mathbb{R}^D$ to a hyperplane $H: w^T x + b = 0$? You need to
express this distance in terms of w, b and x0. (Hint: Please show the detailed derivation steps.
Formulate this problem as a constrained optimization and then use the method of Lagrange
multipliers.)
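
A brute-force grid search is one way to check a hand-derived constrained minimum for the first question. The sketch below is not the Lagrange-multiplier derivation the question asks for. It assumes the constraint is the unit disk x^2 + y^2 ≤ 1, and the helper `grid_constrained_min` with its grid bounds and resolution is a hypothetical choice for illustration.

```python
def grid_constrained_min(f, feasible, lo=-1.5, hi=1.5, n=300):
    """Return (value, x, y) with the smallest f over feasible grid points."""
    best = None
    for i in range(n + 1):
        x = lo + (hi - lo) * i / n
        for j in range(n + 1):
            y = lo + (hi - lo) * j / n
            if feasible(x, y):
                value = f(x, y)
                if best is None or value < best[0]:
                    best = (value, x, y)
    return best

def f1(x, y):
    return (x - 1) ** 2 + (y - 1) ** 2

def f2(x, y):
    return (x - 3) ** 2 + (y - 2) ** 2

def unit_disk(x, y):
    # Assumption: the constraint is x^2 + y^2 <= 1.
    return x * x + y * y <= 1.0

for name, f in (("f1", f1), ("f2", f2)):
    value, x, y = grid_constrained_min(f, unit_disk)
    print(f"{name}: min ~ {value:.3f} at ({x:.2f}, {y:.2f})")
```

The grid answer is only accurate to roughly the grid spacing, so it should agree with the analytical result to about two decimal places. Swapping in a different `feasible` function tests other constraints.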

Probability and statistics

1. (3 pts) Consider a random variable X that follows the uniform distribution between 0 and a.
Please calculate the mean, variance and entropy of X. Please show the steps of the
calculation (you can look up results on the web to verify your calculation.)
2. (7 pts) Suppose we have observed N independent random samples of X, (x1, x2, ⋯, xN).
What is the maximum likelihood estimate of a? Please show the derivation steps. Is this
estimate unbiased? Please justify your answer.
3. (6 pts) Given two independent Gaussian random variables U ∼ N(−1, 1) and
V ∼ N(1, 1), are the following random variables also Gaussian? If so, what are their
means and (co)variances? Note that T is a vector.

$$Y = U + V, \quad Z = U \times V, \quad T = \begin{pmatrix} U + V \\ U - 2V \end{pmatrix}, \quad W = \begin{cases} U & \text{with 50\% chance} \\ V & \text{with 50\% chance} \end{cases}$$

4. (5 pts) We have two coins: one is a fair coin, which shows either heads or tails with 50%
probability. The other is a fake coin, which has heads on both sides. Suppose we randomly pick
one and toss it (the two coins are otherwise identical, so either one is picked with 50%
probability). (Hint: Use the rules of probability and Bayes' rule.)
What is the probability that we observe heads?
If we indeed observe heads, what is the probability that we picked the fake coin?
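
A short Monte Carlo simulation can serve as a check on the two-coin calculation. This is a rough sketch: the function name `simulate_two_coins`, the number of trials, and the seed are arbitrary choices, and the estimates carry sampling noise.

```python
import random

def simulate_two_coins(trials=200_000, seed=0):
    """Estimate P(head) and P(fake | head) by repeated simulated tosses."""
    rng = random.Random(seed)
    heads = 0
    heads_and_fake = 0
    for _ in range(trials):
        picked_fake = rng.random() < 0.5  # each coin is picked with 50% probability
        # The fake coin always shows heads; the fair coin shows heads half the time.
        shows_head = True if picked_fake else rng.random() < 0.5
        if shows_head:
            heads += 1
            if picked_fake:
                heads_and_fake += 1
    return heads / trials, heads_and_fake / heads

p_head, p_fake_given_head = simulate_two_coins()
print(f"P(head)        ~ {p_head:.3f}")
print(f"P(fake | head) ~ {p_fake_given_head:.3f}")
```

The second estimate conditions on the event "head" by counting only the trials in which a head was observed. Bayes' rule formalizes the same operation.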

Linear algebra
1. (4 pts) Consider the covariance matrix Σ of a random vector X, which is
$\Sigma = E[(X - EX)(X - EX)^T]$, where EX is the expectation of X. Prove that Σ is
positive semidefinite.

2. (5 pts) Let A and B be two symmetric matrices in $\mathbb{R}^{D \times D}$. Suppose A and B
have the exact same set of eigenvectors u1, u2, ⋯, uD, with the corresponding eigenvalues
α1, α2, ⋯, αD for A and β1, β2, ⋯, βD for B. Please write down the eigenvectors and their
corresponding eigenvalues for the following matrices:
$$C = A + B, \quad D = A - B, \quad E = AB, \quad F = A^{-1} \ \text{(assume } A \text{ is invertible)}$$

$$G = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} \in \mathbb{R}^{2D \times 2D},$$

where $0 \in \mathbb{R}^{D \times D}$ is the all-zero matrix.
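
Candidate answers to the eigenvalue question can be tested numerically on a small instance. The sketch below builds two 2×2 symmetric matrices that share eigenvectors and checks a proposed eigenpair against the definition M v = λ v. The helpers `matvec`, `is_eigenpair`, and `from_eigen`, the rotation angle, and the eigenvalues are all arbitrary illustrative choices.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def is_eigenpair(M, v, lam, tol=1e-9):
    """True if v is nonzero and M v equals lam * v up to tol."""
    Mv = matvec(M, v)
    nonzero = any(abs(x) > tol for x in v)
    return nonzero and all(abs(a - lam * x) <= tol for a, x in zip(Mv, v))

def from_eigen(u1, u2, lam1, lam2):
    """Build lam1 * u1 u1^T + lam2 * u2 u2^T for orthonormal u1, u2 (2x2 case)."""
    return [[lam1 * u1[i] * u1[j] + lam2 * u2[i] * u2[j] for j in range(2)]
            for i in range(2)]

theta = 0.3                                   # arbitrary rotation angle
u1 = [math.cos(theta), math.sin(theta)]
u2 = [-math.sin(theta), math.cos(theta)]
alpha = (2.0, 5.0)                            # arbitrary eigenvalues of A
beta = (-1.0, 3.0)                            # arbitrary eigenvalues of B
A = from_eigen(u1, u2, *alpha)
B = from_eigen(u1, u2, *beta)

# Sanity check on the given data: (u1, alpha_1) for A and (u2, beta_2) for B.
print(is_eigenpair(A, u1, alpha[0]), is_eigenpair(B, u2, beta[1]))
```

To test a guess for C = A + B, form C entry by entry and call `is_eigenpair(C, u1, guess)`. The same pattern applies to the other matrices in the question.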