
National University of Singapore

EE5904/ME5404 Neural Networks



Mathematical Review

1) Mathematical notation

- $\exists$ : there exists
- $\forall$ : for all
- $\Leftrightarrow$ : if and only if

2) Euclidean Space, $\mathbb{R}^n$

Euclidean space, $\mathbb{R}^n$: the set of all column vectors $x = (x_1, \ldots, x_n)^T$.

Addition of two vectors: $x + y = (x_1 + y_1, \ldots, x_n + y_n)^T$

Multiplication of a vector $x$ by a scalar $\alpha$ is defined as: $\alpha x = (\alpha x_1, \ldots, \alpha x_n)^T$


The inner product (or scalar product) of two vectors $x = (x_1, x_2, \ldots, x_n)^T$ and $y = (y_1, y_2, \ldots, y_n)^T$ is denoted by $x^T y$:

$$x^T y = \sum_{i=1}^{n} x_i y_i$$


A norm $\|\cdot\|$ on $\mathbb{R}^n$ is a mapping that assigns a scalar $\|x\|$ to every $x \in \mathbb{R}^n$ and that has the following properties:

(a) $\|x\| \ge 0$, $\forall x \in \mathbb{R}^n$
(b) $\|cx\| = |c|\,\|x\|$ for every $c \in \mathbb{R}$ and every $x \in \mathbb{R}^n$
(c) $\|x\| = 0$ if and only if $x = 0$
(d) $\|x + y\| \le \|x\| + \|y\|$, $\forall x, y \in \mathbb{R}^n$ (triangle inequality)

Euclidean norm of $x$: $\|x\| = (x^T x)^{1/2} = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}$


Schwarz inequality (applies to the Euclidean norm):
$|x^T y| \le \|x\| \, \|y\|$,
with equality holding if and only if $x = \alpha y$ for some scalar $\alpha$.
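As a quick numerical check of these definitions, the sketch below (using NumPy, with example vectors chosen arbitrarily) computes the inner product and Euclidean norm and verifies the Schwarz inequality:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # example vectors in R^3
y = np.array([4.0, -1.0, 2.0])

inner = x @ y                   # x^T y = sum_i x_i * y_i
norm_x = np.sqrt(x @ x)         # ||x|| = (x^T x)^(1/2)

print(inner)                             # 8.0
print(norm_x, np.linalg.norm(x))         # both give sqrt(14)
print(abs(inner) <= np.linalg.norm(x) * np.linalg.norm(y))  # True (Schwarz)
```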


3) Matrices

An $m \times n$ matrix:

$$A = [a_{ij}] = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

Product of a matrix $A$ and a scalar $\alpha$: $\alpha A$ or $A\alpha = [\alpha a_{ij}]$

Product of two matrices $A$ and $B$:
Let $A$, $B$, $C$ be $m \times n$, $n \times p$, and $m \times p$ matrices, respectively. Then $AB = C = [c_{ij}]$, where

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$
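The entrywise formula can be checked against NumPy's built-in matrix product; the matrices below are arbitrary examples:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 1.0]])             # 3 x 2

C = A @ B                              # 2 x 2 product

# Recompute c_ij = sum_k a_ik * b_kj by hand and compare
i, j = 1, 0
c_ij = sum(A[i, k] * B[k, j] for k in range(A.shape[1]))
print(np.isclose(c_ij, C[i, j]))       # True
```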


The transpose of an $m \times n$ matrix $A$ is the $n \times m$ matrix $A^T$ whose entries are $[A^T]_{ij} = a_{ji}$.

A square matrix $A$ is symmetric if $A^T = A$.

Non-singular matrix
A square ($n \times n$) matrix $A$ is called non-singular or invertible if there is an $n \times n$ matrix, called the inverse of $A$ (denoted by $A^{-1}$), such that $A^{-1}A = I = AA^{-1}$, where $I$ is the $n \times n$ identity matrix.
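A minimal NumPy check of the inverse relation (the matrix is an arbitrary invertible example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

A_inv = np.linalg.inv(A)               # inverse of A (A must be non-singular)

print(np.allclose(A_inv @ A, np.eye(2)))   # True: A^{-1} A = I
print(np.allclose(A @ A_inv, np.eye(2)))   # True: A A^{-1} = I
print(np.allclose(A.T, A))                 # True: this A happens to be symmetric
```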

Positive Definite and Semidefinite Matrices

A square $n \times n$ matrix $A$ is said to be positive semidefinite if $x^T A x \ge 0$, $\forall x \in \mathbb{R}^n$.

A square $n \times n$ matrix $A$ is said to be positive definite if $x^T A x > 0$, $\forall x \in \mathbb{R}^n$, $x \ne 0$.

The matrix $A$ is said to be negative semidefinite (definite) if $-A$ is positive semidefinite (definite).
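For a symmetric matrix, definiteness is commonly checked through eigenvalues: all positive implies positive definite, all non-negative implies positive semidefinite. A sketch, with an arbitrary symmetric example:

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])            # symmetric example

eigvals = np.linalg.eigvalsh(A)        # eigenvalues of a symmetric matrix
print(eigvals)                         # [1. 3.] -> all > 0, so A is positive definite

# Direct check of the definition on a few random nonzero vectors
rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.standard_normal(2)
    assert x @ A @ x > 0
```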


4) Derivatives

Let $f : \mathbb{R}^n \to \mathbb{R}$ be some function.

Partial Derivative:
For a fixed $x \in \mathbb{R}^n$, the first partial derivative of $f$ at the point $x$ in the $i$th coordinate is defined by

$$\frac{\partial f(x)}{\partial x_i} = \lim_{\alpha \to 0} \frac{f(x + \alpha e_i) - f(x)}{\alpha}$$

where $e_i$ is the $i$th unit vector.
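The limit can be approximated numerically with a small step size; the function and evaluation point below are arbitrary examples:

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1]   # example f: R^2 -> R

def partial(f, x, i, eps=1e-6):
    """Forward-difference approximation of df/dx_i at x."""
    e_i = np.zeros_like(x)
    e_i[i] = 1.0
    return (f(x + eps * e_i) - f(x)) / eps

x = np.array([1.0, 2.0])
print(partial(f, x, 0))   # ~ 8.0, since df/dx_0 = 2*x_0 + 3*x_1
```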


Gradient:
If the partial derivatives with respect to all coordinates exist, $f$ is called differentiable at $x$ and its gradient at $x$ is defined as:

$$\nabla f(x) = \begin{pmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{pmatrix}$$

Note: $\nabla f$ is orthogonal to the corresponding level surface $f(x_1, \ldots, x_n) = C$.
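Stacking such finite-difference partials yields a numerical gradient, checked here against the analytic gradient of the same example function:

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1]

def grad(f, x, eps=1e-6):
    """Numerical gradient: stack forward-difference partials."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e_i = np.zeros_like(x)
        e_i[i] = 1.0
        g[i] = (f(x + eps * e_i) - f(x)) / eps
    return g

x = np.array([1.0, 2.0])
analytic = np.array([2*x[0] + 3*x[1], 3*x[0]])       # [8, 3]
print(np.allclose(grad(f, x), analytic, atol=1e-4))  # True
```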


Various Definitions
The function $f$ is called differentiable if it is differentiable at every $x \in \mathbb{R}^n$.

The function $f$ is said to be continuously differentiable if $\nabla f(x)$ exists for every $x$ and is a continuous function of $x$.

The function $f$ is said to be twice differentiable if the gradient $\nabla f(x)$ is itself a differentiable function.
Hessian Matrix
For this case, we can define the Hessian matrix of $f$ at $x$:

$$\nabla^2 f(x) = \left[ \frac{\partial^2 f(x)}{\partial x_i \partial x_j} \right]$$

i.e., the elements are the second partial derivatives of $f$ at $x$.

The function $f$ is twice continuously differentiable if $\nabla^2 f(x)$ exists and is continuous.
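The Hessian can likewise be approximated by second differences; since the example $f$ is quadratic, the result is the constant matrix predicted by hand:

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1]

def hessian(f, x, eps=1e-4):
    """Second-difference approximation of the Hessian of f at x."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + f(x)) / eps**2
    return H

x = np.array([1.0, 2.0])
print(hessian(f, x))          # ~ [[2. 3.], [3. 0.]]
```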


Vector-valued function
A vector-valued function $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable (continuously differentiable) if each component $f_i$ of $f$ is differentiable (continuously differentiable).

Gradient matrix of the vector-valued function
The gradient matrix of $f$, $\nabla f(x)$, is the $n \times m$ matrix whose $i$th column is $\nabla f_i(x)$:

$$\nabla f(x) = [\nabla f_1(x) \; \cdots \; \nabla f_m(x)]$$

Jacobian of the vector-valued function
Jacobian of $f$ = $(\nabla f(x))^T$
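A sketch that builds the gradient matrix column by column for an example $f : \mathbb{R}^2 \to \mathbb{R}^2$; under this convention the Jacobian is its transpose:

```python
import numpy as np

def f(x):
    return np.array([x[0]**2, x[0] * x[1]])   # example f: R^2 -> R^2

def grad_matrix(f, x, eps=1e-6):
    """n x m matrix whose ith column is the gradient of f_i at x."""
    n, m = x.size, f(x).size
    G = np.zeros((n, m))
    for j in range(n):
        e_j = np.zeros(n); e_j[j] = 1.0
        G[j, :] = (f(x + eps * e_j) - f(x)) / eps   # row j: df_i/dx_j for all i
    return G

x = np.array([1.0, 2.0])
print(grad_matrix(f, x))      # ~ [[2. 2.], [0. 1.]]
print(grad_matrix(f, x).T)    # Jacobian of f at x
```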


Chain rule
Let $f : \mathbb{R}^k \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^n$ be continuously differentiable functions, and let $h(x) = g(f(x))$.

The chain rule for differentiation states that

$$\nabla h(x) = \nabla f(x) \, \nabla g(f(x)), \quad \forall x \in \mathbb{R}^k$$
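The chain rule can be verified numerically with small example maps (gradient matrices approximated by forward differences, as in the earlier sketches):

```python
import numpy as np

def f(x):                       # f: R^2 -> R^2
    return np.array([x[0] * x[1], x[0] + x[1]])

def g(y):                       # g: R^2 -> R
    return np.array([y[0]**2 + y[1]])

def grad_matrix(fn, x, eps=1e-6):
    m = fn(x).size
    G = np.zeros((x.size, m))
    for j in range(x.size):
        e = np.zeros(x.size); e[j] = 1.0
        G[j, :] = (fn(x + eps * e) - fn(x)) / eps
    return G

x = np.array([1.0, 2.0])
h = lambda x: g(f(x))
lhs = grad_matrix(h, x)                          # grad h(x)
rhs = grad_matrix(f, x) @ grad_matrix(g, f(x))   # grad f(x) grad g(f(x))
print(np.allclose(lhs, rhs, atol=1e-3))          # True
```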

Special examples
If $f : \mathbb{R}^n \to \mathbb{R}^m$ is of the form $f(x) = Ax$, where $A$ is an $m \times n$ matrix, then

$$\nabla f(x) = A^T$$

If $f : \mathbb{R}^n \to \mathbb{R}$ is of the form $f(x) = x^T A x / 2$, where $A$ is a symmetric $n \times n$ matrix, then

$$\nabla f(x) = Ax, \qquad \nabla^2 f(x) = A$$
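Both special cases are easy to confirm numerically; here the quadratic-form gradient is checked for an arbitrary symmetric $A$:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])               # symmetric

def f(x):
    return 0.5 * x @ A @ x               # f(x) = x^T A x / 2

def grad(f, x, eps=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = 1.0
        g[i] = (f(x + eps * e) - f(x)) / eps
    return g

x = np.array([1.0, -1.0])
print(np.allclose(grad(f, x), A @ x, atol=1e-4))   # True: grad f(x) = A x
```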

5) Unconstrained Optimization Techniques

Local and global minima
$x^*$: local min if $\exists \varepsilon > 0$ s.t. $f(x^*) \le f(x)$, $\forall x$ with $\|x - x^*\| < \varepsilon$
$x^*$: strict local min if $\exists \varepsilon > 0$ s.t. $f(x^*) < f(x)$, $\forall x \ne x^*$ with $\|x - x^*\| < \varepsilon$

$x^*$: global min if $f(x^*) \le f(x)$, $\forall x \in \mathbb{R}^n$




[Figure: plot of $f(x)$ versus $x$ marking a strict local minimum, local minima, and the strict global minimum.]
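To make the distinction concrete, the sketch below runs plain gradient descent (one unconstrained optimization technique; the function is an arbitrary example with two minima) and shows that the minimum found depends on the starting point:

```python
def f(x):
    return x**4 - 3 * x**2 + x        # two local minima; the left one is global

def df(x):
    return 4 * x**3 - 6 * x + 1       # derivative of f

def gradient_descent(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * df(x)               # step against the gradient
    return x

print(gradient_descent(-2.0))   # ~ -1.30: strict global minimum
print(gradient_descent(+2.0))   # ~ +1.13: strict local (not global) minimum
```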
