
National University of Singapore

EE5904/ME5404 Neural Networks



Mathematical Review

1) Mathematical notation

- $\exists$ : there exists
- $\forall$ : for all
- $\Leftrightarrow$ : if and only if

2) Euclidean Space, $\mathbb{R}^n$

Euclidean space, $\mathbb{R}^n$: the set of all column vectors $x = (x_1, \ldots, x_n)^T$.

Addition of two vectors: $x + y = (x_1 + y_1, \ldots, x_n + y_n)^T$

Multiplication of a vector $x$ by a scalar $\alpha$ is defined as: $\alpha x = (\alpha x_1, \ldots, \alpha x_n)^T$


The inner product (or scalar product) of two vectors $x = (x_1, x_2, \ldots, x_n)^T$ and $y = (y_1, y_2, \ldots, y_n)^T$ is denoted by $x^T y$:

$$x^T y = \sum_{i=1}^{n} x_i y_i$$


A norm $\|\cdot\|$ on $\mathbb{R}^n$ is a mapping that assigns a scalar $\|x\|$ to every $x \in \mathbb{R}^n$ and that has the following properties:

(a) $\|x\| \ge 0$, $\forall x \in \mathbb{R}^n$
(b) $\|cx\| = |c|\,\|x\|$ for every $c \in \mathbb{R}$ and every $x \in \mathbb{R}^n$
(c) $\|x\| = 0$ if and only if $x = 0$
(d) $\|x + y\| \le \|x\| + \|y\|$, $\forall x, y \in \mathbb{R}^n$ (triangle inequality)

Euclidean norm of $x$: $\|x\| = (x^T x)^{1/2} = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}$


Schwarz inequality (applies to the Euclidean norm):
$|x^T y| \le \|x\| \, \|y\|$,
with equality holding if and only if $x = \alpha y$ for some scalar $\alpha$.
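As a quick numerical check of these definitions, the sketch below (using NumPy, with example vectors chosen arbitrarily) computes the inner product and Euclidean norm and verifies the Schwarz inequality:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # example vectors in R^3
y = np.array([4.0, -1.0, 2.0])

inner = x @ y                   # x^T y = sum_i x_i * y_i
norm_x = np.sqrt(x @ x)         # ||x|| = (x^T x)^(1/2)

print(inner)                             # 8.0
print(norm_x, np.linalg.norm(x))         # both give sqrt(14)
print(abs(inner) <= np.linalg.norm(x) * np.linalg.norm(y))  # True (Schwarz)
```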


3) Matrices

An $m \times n$ matrix:

$$A = [a_{ij}] = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

Product of a matrix $A$ and a scalar $\alpha$: $\alpha A$ or $A\alpha = [\alpha a_{ij}]$

Product of two matrices $A$ and $B$:
Let $A$, $B$, $C$ be $m \times n$, $n \times p$, and $m \times p$ matrices, respectively. Then $AB = C = [c_{ij}]$, where

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$
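The entrywise formula can be checked against NumPy's built-in matrix product; the matrices below are arbitrary examples:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 1.0]])             # 3 x 2

C = A @ B                              # 2 x 2 product

# Recompute c_ij = sum_k a_ik * b_kj by hand and compare
i, j = 1, 0
c_ij = sum(A[i, k] * B[k, j] for k in range(A.shape[1]))
print(np.isclose(c_ij, C[i, j]))       # True
```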


The transpose of an $m \times n$ matrix $A$ is the $n \times m$ matrix $A^T$ whose entries are $[A^T]_{ij} = a_{ji}$.

A square matrix $A$ is symmetric if $A^T = A$.

Non-singular matrix
A square ($n \times n$) matrix $A$ is called non-singular or invertible if there is an $n \times n$ matrix, called the inverse of $A$ (denoted by $A^{-1}$), such that $A^{-1}A = I = AA^{-1}$, where $I$ is the $n \times n$ identity matrix.
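A minimal NumPy check of the inverse relation (the matrix is an arbitrary invertible example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

A_inv = np.linalg.inv(A)               # inverse of A (A must be non-singular)

print(np.allclose(A_inv @ A, np.eye(2)))   # True: A^{-1} A = I
print(np.allclose(A @ A_inv, np.eye(2)))   # True: A A^{-1} = I
print(np.allclose(A.T, A))                 # True: this A happens to be symmetric
```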

Positive Definite and Semidefinite Matrices

A square $n \times n$ matrix $A$ is said to be positive semidefinite if $x^T A x \ge 0$, $\forall x \in \mathbb{R}^n$.

A square $n \times n$ matrix $A$ is said to be positive definite if $x^T A x > 0$, $\forall x \in \mathbb{R}^n$, $x \ne 0$.

The matrix $A$ is said to be negative semidefinite (definite) if $-A$ is positive semidefinite (definite).
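For a symmetric matrix, definiteness is commonly checked through eigenvalues: all positive implies positive definite, all non-negative implies positive semidefinite. A sketch, with an arbitrary symmetric example:

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])            # symmetric example

eigvals = np.linalg.eigvalsh(A)        # eigenvalues of a symmetric matrix
print(eigvals)                         # [1. 3.] -> all > 0, so A is positive definite

# Direct check of the definition on a few random nonzero vectors
rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.standard_normal(2)
    assert x @ A @ x > 0
```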


4) Derivatives

Let $f : \mathbb{R}^n \to \mathbb{R}$ be some function.

Partial Derivative:
For a fixed $x \in \mathbb{R}^n$, the first partial derivative of $f$ at the point $x$ in the $i$th coordinate is defined by

$$\frac{\partial f(x)}{\partial x_i} = \lim_{\alpha \to 0} \frac{f(x + \alpha e_i) - f(x)}{\alpha}$$

where $e_i$ is the $i$th unit vector.
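The limit can be approximated numerically with a small step size; the function and evaluation point below are arbitrary examples:

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1]   # example f: R^2 -> R

def partial(f, x, i, eps=1e-6):
    """Forward-difference approximation of df/dx_i at x."""
    e_i = np.zeros_like(x)
    e_i[i] = 1.0
    return (f(x + eps * e_i) - f(x)) / eps

x = np.array([1.0, 2.0])
print(partial(f, x, 0))   # ~ 8.0, since df/dx_0 = 2*x_0 + 3*x_1
```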


Gradient:
If the partial derivatives with respect to all coordinates exist, $f$ is called differentiable at $x$ and its gradient at $x$ is defined as:

$$\nabla f(x) = \begin{pmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{pmatrix}$$

Note: $\nabla f$ is orthogonal to the corresponding level surface $f(x_1, \ldots, x_n) = C$.
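Stacking such finite-difference partials yields a numerical gradient, checked here against the analytic gradient of the same example function:

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1]

def grad(f, x, eps=1e-6):
    """Numerical gradient: stack forward-difference partials."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e_i = np.zeros_like(x)
        e_i[i] = 1.0
        g[i] = (f(x + eps * e_i) - f(x)) / eps
    return g

x = np.array([1.0, 2.0])
analytic = np.array([2*x[0] + 3*x[1], 3*x[0]])       # [8, 3]
print(np.allclose(grad(f, x), analytic, atol=1e-4))  # True
```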


Various Definitions
The function $f$ is called differentiable if it is differentiable at every $x \in \mathbb{R}^n$.

The function $f$ is said to be continuously differentiable if $\nabla f(x)$ exists for every $x$ and is a continuous function of $x$.

The function $f$ is said to be twice differentiable if the gradient $\nabla f(x)$ is itself a differentiable function.
Hessian Matrix
For this case, we can define the Hessian matrix of $f$ at $x$:

$$\nabla^2 f(x) = \left[ \frac{\partial^2 f(x)}{\partial x_i \partial x_j} \right]$$

i.e., the elements are the second partial derivatives of $f$ at $x$.

The function $f$ is twice continuously differentiable if $\nabla^2 f(x)$ exists and is continuous.
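The Hessian can likewise be approximated by second differences; since the example $f$ is quadratic, the result is the constant matrix predicted by hand:

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1]

def hessian(f, x, eps=1e-4):
    """Second-difference approximation of the Hessian of f at x."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + f(x)) / eps**2
    return H

x = np.array([1.0, 2.0])
print(hessian(f, x))          # ~ [[2. 3.], [3. 0.]]
```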


Vector-valued function
A vector-valued function $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable (continuously differentiable) if each component $f_i$ of $f$ is differentiable (continuously differentiable).

Gradient matrix of the vector-valued function
The gradient matrix of $f$, $\nabla f(x)$, is the $n \times m$ matrix whose $i$th column is $\nabla f_i(x)$:

$$\nabla f(x) = [\nabla f_1(x) \; \cdots \; \nabla f_m(x)]$$

Jacobian of the vector-valued function
Jacobian of $f$ = $(\nabla f(x))^T$
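A sketch that builds the gradient matrix column by column for an example $f : \mathbb{R}^2 \to \mathbb{R}^2$; under this convention the Jacobian is its transpose:

```python
import numpy as np

def f(x):
    return np.array([x[0]**2, x[0] * x[1]])   # example f: R^2 -> R^2

def grad_matrix(f, x, eps=1e-6):
    """n x m matrix whose ith column is the gradient of f_i at x."""
    n, m = x.size, f(x).size
    G = np.zeros((n, m))
    for j in range(n):
        e_j = np.zeros(n); e_j[j] = 1.0
        G[j, :] = (f(x + eps * e_j) - f(x)) / eps   # row j: df_i/dx_j for all i
    return G

x = np.array([1.0, 2.0])
print(grad_matrix(f, x))      # ~ [[2. 2.], [0. 1.]]
print(grad_matrix(f, x).T)    # Jacobian of f at x
```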


Chain rule
Let $f : \mathbb{R}^k \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^n$ be continuously differentiable functions, and let $h(x) = g(f(x))$.

The chain rule for differentiation states that

$$\nabla h(x) = \nabla f(x) \, \nabla g(f(x)), \quad \forall x \in \mathbb{R}^k$$
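The chain rule can be verified numerically with small example maps (gradient matrices approximated by forward differences, as in the earlier sketches):

```python
import numpy as np

def f(x):                       # f: R^2 -> R^2
    return np.array([x[0] * x[1], x[0] + x[1]])

def g(y):                       # g: R^2 -> R
    return np.array([y[0]**2 + y[1]])

def grad_matrix(fn, x, eps=1e-6):
    m = fn(x).size
    G = np.zeros((x.size, m))
    for j in range(x.size):
        e = np.zeros(x.size); e[j] = 1.0
        G[j, :] = (fn(x + eps * e) - fn(x)) / eps
    return G

x = np.array([1.0, 2.0])
h = lambda x: g(f(x))
lhs = grad_matrix(h, x)                          # grad h(x)
rhs = grad_matrix(f, x) @ grad_matrix(g, f(x))   # grad f(x) grad g(f(x))
print(np.allclose(lhs, rhs, atol=1e-3))          # True
```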

Special examples
If $f : \mathbb{R}^n \to \mathbb{R}^m$ is of the form $f(x) = Ax$, where $A$ is an $m \times n$ matrix, then

$$\nabla f(x) = A^T$$

If $f : \mathbb{R}^n \to \mathbb{R}$ is of the form $f(x) = x^T A x / 2$, where $A$ is a symmetric $n \times n$ matrix, then

$$\nabla f(x) = Ax, \qquad \nabla^2 f(x) = A$$
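Both special cases are easy to confirm numerically; here the quadratic-form gradient is checked for an arbitrary symmetric $A$:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])               # symmetric

def f(x):
    return 0.5 * x @ A @ x               # f(x) = x^T A x / 2

def grad(f, x, eps=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = 1.0
        g[i] = (f(x + eps * e) - f(x)) / eps
    return g

x = np.array([1.0, -1.0])
print(np.allclose(grad(f, x), A @ x, atol=1e-4))   # True: grad f(x) = A x
```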

5) Unconstrained Optimization Techniques

Local and global minima
$x^*$: local min if $\exists \varepsilon > 0$ s.t. $f(x^*) \le f(x)$, $\forall x$ with $\|x - x^*\| < \varepsilon$
$x^*$: strict local min if $\exists \varepsilon > 0$ s.t. $f(x^*) < f(x)$, $\forall x \ne x^*$ with $\|x - x^*\| < \varepsilon$

$x^*$: global min if $f(x^*) \le f(x)$, $\forall x \in \mathbb{R}^n$




[Figure: plot of $f(x)$ versus $x$ marking a strict local minimum, local minima, and the strict global minimum.]
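To make the distinction concrete, the sketch below runs plain gradient descent (one unconstrained optimization technique; the function is an arbitrary example with two minima) and shows that the minimum found depends on the starting point:

```python
def f(x):
    return x**4 - 3 * x**2 + x        # two local minima; the left one is global

def df(x):
    return 4 * x**3 - 6 * x + 1       # derivative of f

def gradient_descent(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * df(x)               # step against the gradient
    return x

print(gradient_descent(-2.0))   # ~ -1.30: strict global minimum
print(gradient_descent(+2.0))   # ~ +1.13: strict local (not global) minimum
```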
