
Introduction to Statistical Machine Learning

Christfried Webers
Statistical Machine Learning Group
NICTA
and
College of Engineering and Computer Science
The Australian National University

Canberra
February – June 2013

© 2013 Christfried Webers, NICTA, The Australian National University
Outlines
Overview
Introduction
Linear Algebra
Probability
Linear Regression 1
Linear Regression 2
Linear Classification 1
Linear Classification 2
Neural Networks 1
Neural Networks 2
Kernel Methods
Sparse Kernel Methods
Graphical Models 1
Graphical Models 2
Graphical Models 3
Mixture Models and EM 1
Mixture Models and EM 2
Approximate Inference
Sampling
Principal Component Analysis
Sequential Data 1
Sequential Data 2
Combining Models
Selected Topics
Discussion and Summary

(Many figures from C. M. Bishop, "Pattern Recognition and Machine Learning")

Part III
Linear Algebra

Basic Concepts
Linear Transformations
Trace
Inner Product
Projection
Rank, Determinant, Trace
Matrix Inverse
Eigenvectors
Singular Value Decomposition
Directional Derivative, Gradient
Books

Intuition

Geometry
Points and Lines
Vector addition and scaling
Humans have experience with 3 dimensions (less with 1 or 2, though)
Generalisation to $N$ dimensions (possibly $N \to \infty$)
Line $\to$ vector space $V$
Point $\to$ vector $x \in V$
Example: $X \in \mathbb{R}^{n \times m}$
The space of matrices $\mathbb{R}^{n \times m}$ and the space of vectors $\mathbb{R}^{nm}$ are isomorphic.

Vector Space $V$, underlying Field $F$

Given vectors $u, v, w \in V$ and scalars $\lambda, \mu \in F$, the following holds:
Associativity of addition: $u + (v + w) = (u + v) + w$.
Commutativity of addition: $v + w = w + v$.
Identity element of addition: $\exists\, 0$ such that $v + 0 = v$ for all $v \in V$.
Inverse of addition: for all $v \in V$ there exists $w \in V$ such that $v + w = 0$. The additive inverse is denoted $-v$.
Distributivity of scalar multiplication with respect to vector addition: $\lambda (v + w) = \lambda v + \lambda w$.
Distributivity of scalar multiplication with respect to field addition: $(\lambda + \mu) v = \lambda v + \mu v$.
Compatibility of scalar multiplication with field multiplication: $\lambda (\mu v) = (\lambda \mu) v$.
Identity element of scalar multiplication: $1 v = v$, where $1$ is the identity in $F$.

Matrix-Vector Multiplication

for i in xrange(m):
    R[i] = 0.0
    for j in xrange(n):
        R[i] = R[i] + A[i, j] * V[j]

Listing 1: Code for elementwise matrix-vector multiplication.

$$
A V =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
=
\begin{bmatrix}
a_{11} v_1 + a_{12} v_2 + \cdots + a_{1n} v_n \\
a_{21} v_1 + a_{22} v_2 + \cdots + a_{2n} v_n \\
\vdots \\
a_{m1} v_1 + a_{m2} v_2 + \cdots + a_{mn} v_n
\end{bmatrix}
$$
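The elementwise loop in Listing 1 can be checked against numpy's built-in matrix-vector product. This is a minimal sketch, not part of the original slides; the names m, n, A, V, R simply mirror the listing above.

import numpy as np

# Small test problem; A is m x n, V has n entries.
m, n = 3, 4
rng = np.random.default_rng(0)
A = rng.normal(size=(m, n))
V = rng.normal(size=n)

# Elementwise (row by row) computation, as in Listing 1.
R = np.zeros(m)
for i in range(m):
    R[i] = 0.0
    for j in range(n):
        R[i] = R[i] + A[i, j] * V[j]

# Should agree with numpy's matrix-vector product.
print(np.allclose(R, A @ V))  # True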


The same product can also be accumulated column by column:

R = A[:, 0] * V[0]
for j in xrange(1, n):
    R += A[:, j] * V[j]

Listing 2: Code for columnwise matrix-vector multiplication.

Denote the $n$ columns of $A$ by $a_1, a_2, \dots, a_n$.
Each $a_i$ is now a (column) vector.

$$
A V = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
= a_1 v_1 + a_2 v_2 + \cdots + a_n v_n
$$
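A quick numerical check of this column view, with $AV$ formed as a linear combination of the columns of $A$. A sketch of my own, not from the slides; the variables are chosen to match the notation above.

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
v = rng.normal(size=4)

# A v as a linear combination of the columns a_1, ..., a_n of A,
# weighted by the entries v_1, ..., v_n.
combo = sum(v[j] * A[:, j] for j in range(A.shape[1]))

print(np.allclose(A @ v, combo))  # True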

Transpose of R

Given $R = AV$, what is $R^T$?

$$R = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n$$

$$R^T = v_1 a_1^T + v_2 a_2^T + \cdots + v_n a_n^T$$

NOT equal to $A^T V^T$! (In fact, $A^T V^T$ is not even defined.)

Transpose

Reverse order rule: $(AV)^T = V^T A^T$

$$
R^T =
\underbrace{\begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}}_{V^T}
\underbrace{\begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{bmatrix}}_{A^T}
= v_1 a_1^T + v_2 a_2^T + \cdots + v_n a_n^T
$$
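The reverse-order rule holds for any conformable product, not only matrix-vector products. A tiny numerical check (my own sketch, not part of the slides):

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))

# (AB)^T = B^T A^T; note A^T B^T is not even defined here
# (shapes 4x3 and 2x4 do not match).
print(np.allclose((A @ B).T, B.T @ A.T))  # True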

Trace

The trace of a square matrix $A$ is the sum of all diagonal elements of $A$:
$$\operatorname{tr}\{A\} = \sum_{k=1}^{n} A_{kk}$$

Example
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \qquad \operatorname{tr}\{A\} = 1 + 5 + 9 = 15$$

The trace does not exist for a non-square matrix.

vec(x) operator

Define $\operatorname{vec}(X)$ as the vector which results from stacking all columns of a matrix $X$ on top of each other.

Example
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \qquad \operatorname{vec}(A) = \begin{bmatrix} 1 \\ 4 \\ 7 \\ 2 \\ 5 \\ 8 \\ 3 \\ 6 \\ 9 \end{bmatrix}$$

Given two matrices $X, Y \in \mathbb{R}^{n \times p}$, the trace of the product $X^T Y$ can be written as
$$\operatorname{tr}\{X^T Y\} = \operatorname{vec}(X)^T \operatorname{vec}(Y)$$
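The identity $\operatorname{tr}\{X^T Y\} = \operatorname{vec}(X)^T \operatorname{vec}(Y)$ is easy to verify numerically. This is a sketch of my own; note that numpy stores arrays in row-major order, so stacking columns requires flattening in Fortran ("F") order.

import numpy as np

def vec(X):
    # Stack the columns of X on top of each other.
    return X.flatten(order="F")

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 3))
Y = rng.normal(size=(4, 3))

lhs = np.trace(X.T @ Y)
rhs = vec(X) @ vec(Y)
print(np.allclose(lhs, rhs))  # True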

Inner Product

Mapping from two vectors $x, y \in V$ to a field of scalars $F$ ($F = \mathbb{R}$ or $F = \mathbb{C}$):
$$\langle \cdot, \cdot \rangle : V \times V \to F$$
Conjugate symmetry:
$\langle x, y \rangle = \overline{\langle y, x \rangle}$
Linearity:
$\langle a x, y \rangle = a \langle x, y \rangle$, and $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$
Positive-definiteness:
$\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0$ for $x = 0$ only.

Canonical Inner Products

Inner product for real numbers $x, y \in \mathbb{R}$:
$$\langle x, y \rangle = x\, y$$

Dot product between two vectors: given $x, y \in \mathbb{R}^n$,
$$\langle x, y \rangle = x^T y = \sum_{k=1}^{n} x_k y_k = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$

Canonical inner product for matrices: given $X, Y \in \mathbb{R}^{n \times p}$,
$$\langle X, Y \rangle = \operatorname{tr}\{X^T Y\} = \sum_{k=1}^{p} (X^T Y)_{kk} = \sum_{k=1}^{p} \sum_{l=1}^{n} (X^T)_{kl} (Y)_{lk} = \sum_{k=1}^{p} \sum_{l=1}^{n} X_{lk} Y_{lk}$$

Calculations with Matrices

Denote the $(i,j)$-th element of matrix $X$ by $X_{ij}$.
Transpose: $(X^T)_{ij} = X_{ji}$
Product: $(XY)_{ij} = \sum_{k} X_{ik} Y_{kj}$
Proof of the linearity of the canonical matrix inner product:
$$
\begin{aligned}
\langle X + Y, Z \rangle = \operatorname{tr}\{(X + Y)^T Z\}
&= \sum_{k=1}^{p} \left((X + Y)^T Z\right)_{kk} \\
&= \sum_{k=1}^{p} \sum_{l=1}^{n} ((X + Y)^T)_{kl}\, Z_{lk}
 = \sum_{k=1}^{p} \sum_{l=1}^{n} (X_{lk} + Y_{lk})\, Z_{lk} \\
&= \sum_{k=1}^{p} \sum_{l=1}^{n} \left( X_{lk} Z_{lk} + Y_{lk} Z_{lk} \right)
 = \sum_{k=1}^{p} \sum_{l=1}^{n} \left( (X^T)_{kl} Z_{lk} + (Y^T)_{kl} Z_{lk} \right) \\
&= \sum_{k=1}^{p} \left( (X^T Z)_{kk} + (Y^T Z)_{kk} \right)
 = \operatorname{tr}\{X^T Z\} + \operatorname{tr}\{Y^T Z\} = \langle X, Z \rangle + \langle Y, Z \rangle
\end{aligned}
$$

Projection

In linear algebra and functional analysis, a projection is a linear transformation $P$ from a vector space $V$ to itself such that
$$P^2 = P$$

A projection can have an effect only once: after a vector has been projected, applying the same transformation again does not move it any further.

[Figure: the point $(a, b)$ is projected onto the point $(a - b, 0)$ on the x-axis.]

$$A \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a - b \\ 0 \end{bmatrix}, \qquad A = \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}$$
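A minimal check, my own sketch, that the matrix $A$ above is idempotent and maps $(a, b)$ to $(a - b, 0)$:

import numpy as np

A = np.array([[1.0, -1.0],
              [0.0,  0.0]])

# Idempotence: applying the projection twice changes nothing.
print(np.allclose(A @ A, A))            # True

# The point (a, b) = (3, 1) is mapped to (a - b, 0) = (2, 0),
# and projecting again leaves it there.
x = np.array([3.0, 1.0])
print(A @ x)                            # [2. 0.]
print(np.allclose(A @ (A @ x), A @ x))  # True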

Orthogonal Projection

Orthogonality: need an inner product $\langle x, y \rangle$.
Choose two arbitrary vectors $x$ and $y$. For an orthogonal projection, $Px$ and $y - Py$ are orthogonal:
$$0 = \langle Px, y - Py \rangle = (Px)^T (y - Py) = x^T (P^T - P^T P)\, y$$
Orthogonal Projection:
$$P^2 = P \qquad P^T = P$$

Example: Given some unit vector $u \in \mathbb{R}^n$ characterising a line through the origin in $\mathbb{R}^n$, project an arbitrary vector $x \in \mathbb{R}^n$ onto this line by
$$P = u u^T$$
Proof: $P = P^T$, and
$$P^2 x = (u u^T)(u u^T) x = u u^T x = P x$$
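A sketch (an assumed example, not from the slides) of projecting onto the line spanned by a unit vector $u$ with $P = u u^T$:

import numpy as np

u = np.array([1.0, 2.0, 2.0])
u = u / np.linalg.norm(u)       # make u a unit vector

P = np.outer(u, u)              # P = u u^T

x = np.array([3.0, -1.0, 4.0])
p = P @ x                       # projection of x onto the line

print(np.allclose(P @ P, P))    # idempotent: True
print(np.allclose(P.T, P))      # symmetric:  True
# The residual x - p is orthogonal to the line.
print(np.isclose(u @ (x - p), 0.0))  # True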

Orthogonal Projection

Given a matrix $A \in \mathbb{R}^{n \times p}$ and a vector $x$. What is the closest point $\tilde{x}$ to $x$ in the column space of $A$?
Orthogonal projection into the column space of $A$:
$$\tilde{x} = A (A^T A)^{-1} A^T x$$
Projection matrix
$$P = A (A^T A)^{-1} A^T$$
Proof
$$P^2 = A (A^T A)^{-1} A^T A (A^T A)^{-1} A^T = A (A^T A)^{-1} A^T = P$$
Orthogonal projection?
$$P^T = \left( A (A^T A)^{-1} A^T \right)^T = A (A^T A)^{-1} A^T = P$$
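A numerical sketch (my own example) of the projection onto the column space of $A$; it also shows the link to least squares, since $A(A^T A)^{-1} A^T x$ is the point in the column space closest to $x$.

import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 2))    # tall matrix, generically of full column rank
x = rng.normal(size=5)

P = A @ np.linalg.inv(A.T @ A) @ A.T    # projection matrix
x_tilde = P @ x                         # closest point to x in col(A)

print(np.allclose(P @ P, P))            # idempotent: True
print(np.allclose(P.T, P))              # symmetric:  True

# Same point via least squares: x_tilde = A w with w = argmin ||A w - x||.
w, *_ = np.linalg.lstsq(A, x, rcond=None)
print(np.allclose(x_tilde, A @ w))      # True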

Linear Independence of Vectors

A set of vectors is linearly independent if none of them can be written as a linear combination of finitely many other vectors in the set.
Given a vector space $V$, $k$ vectors $v_1, \dots, v_k \in V$ and scalars $a_1, \dots, a_k$. The vectors are linearly independent if and only if
$$a_1 v_1 + a_2 v_2 + \cdots + a_k v_k = \mathbf{0}$$
(the zero vector) has only the trivial solution
$$a_1 = a_2 = \cdots = a_k = 0 .$$

Rank

The column rank of a matrix $A$ is the maximal number of linearly independent columns of $A$.
The row rank of a matrix $A$ is the maximal number of linearly independent rows of $A$.
For every matrix $A$, the row rank and column rank are equal, and are simply called the rank of $A$.
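Linear independence of a set of vectors can be checked numerically via the rank of the matrix that has those vectors as columns; a small sketch, assuming numpy:

import numpy as np

# Columns of A: v1, v2, v3 with v3 = v1 + v2, so they are dependent.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 3.0, 5.0]])

print(np.linalg.matrix_rank(A))     # 2 < 3 columns -> linearly dependent

# Row rank equals column rank.
print(np.linalg.matrix_rank(A.T))   # 2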

Determinant

Let $S_n$ be the set of all permutations of the numbers $\{1, \dots, n\}$. The determinant of the square matrix $A$ is then given by
$$\det\{A\} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} A_{i, \sigma(i)}$$
where $\operatorname{sgn}(\sigma)$ is the signature of the permutation ($+1$ if the permutation is even, $-1$ if the permutation is odd).
$\det\{AB\} = \det\{A\} \det\{B\}$ for $A, B$ square matrices.
$\det\{A^{-1}\} = \det\{A\}^{-1}$
$\det\{A^T\} = \det\{A\}$
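The permutation formula can be implemented directly for small matrices and compared against numpy; this is a sketch of my own, with the signature computed from the number of inversions.

import numpy as np
from itertools import permutations

def det_permutation(A):
    # Determinant via the sum over all permutations (only feasible for small n).
    n = A.shape[0]
    total = 0.0
    for sigma in permutations(range(n)):
        # sgn(sigma) = (-1)^(number of inversions)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if sigma[i] > sigma[j])
        sign = -1.0 if inversions % 2 else 1.0
        total += sign * np.prod([A[i, sigma[i]] for i in range(n)])
    return total

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

print(np.isclose(det_permutation(A), np.linalg.det(A)))                       # True
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))                       # True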

Matrix Inverse

Identity matrix $I$:
$$I = A A^{-1} = A^{-1} A$$
The matrix inverse $A^{-1}$ only exists for square matrices which are NOT singular.
Singular matrix:
at least one eigenvalue is zero,
determinant $|A| = 0$.
Inversion reverses the order (like transposition):
$$(AB)^{-1} = B^{-1} A^{-1}$$
(Only then is $(AB)(AB)^{-1} = (AB) B^{-1} A^{-1} = A A^{-1} = I$.)
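A quick numerical check (my own sketch) of the reverse-order rule for the inverse:

import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))   # random matrices are almost surely non-singular

print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))        # True
print(np.allclose((A @ B) @ np.linalg.inv(A @ B), np.eye(3)))  # True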

Matrix Inverse and Transpose

What is $(A^{-1})^T$?
Need to assume that $A^{-1}$ exists.
Rule: $(A^{-1})^T = (A^T)^{-1}$
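And the corresponding one-line check (sketch, assuming a generic invertible $A$):

import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(3, 3))   # almost surely invertible

print(np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T)))  # True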

Useful Identity

$$(A^{-1} + B^T C^{-1} B)^{-1} B^T C^{-1} = A B^T (B A B^T + C)^{-1}$$

How to analyse and prove such an equation?
$A^{-1}$ must exist, therefore $A$ must be square, say $A \in \mathbb{R}^{n \times n}$.
Same for $C^{-1}$, so let's assume $C \in \mathbb{R}^{m \times m}$.
Therefore, $B \in \mathbb{R}^{m \times n}$.

Prove the Identity

$$(A^{-1} + B^T C^{-1} B)^{-1} B^T C^{-1} = A B^T (B A B^T + C)^{-1}$$

Multiply by $(B A B^T + C)$:
$$(A^{-1} + B^T C^{-1} B)^{-1} B^T C^{-1} (B A B^T + C) = A B^T$$
Simplify the left-hand side:
$$
(A^{-1} + B^T C^{-1} B)^{-1} \left[ B^T C^{-1} B (A B^T) + B^T \right]
= (A^{-1} + B^T C^{-1} B)^{-1} \left[ B^T C^{-1} B + A^{-1} \right] A B^T
= A B^T
$$
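The identity can also be sanity-checked numerically, for example with symmetric positive definite $A$ and $C$ so that all inverses exist. A sketch of my own, not from the slides:

import numpy as np

rng = np.random.default_rng(8)
n, m = 4, 3
A = rng.normal(size=(n, n)); A = A @ A.T + n * np.eye(n)   # SPD, invertible
C = rng.normal(size=(m, m)); C = C @ C.T + m * np.eye(m)   # SPD, invertible
B = rng.normal(size=(m, n))

inv = np.linalg.inv
lhs = inv(inv(A) + B.T @ inv(C) @ B) @ B.T @ inv(C)
rhs = A @ B.T @ inv(B @ A @ B.T + C)

print(np.allclose(lhs, rhs))   # True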

Woodbury Identity

$$(A + B D^{-1} C)^{-1} = A^{-1} - A^{-1} B (D + C A^{-1} B)^{-1} C A^{-1}$$

Useful if $A$ is diagonal and easy to invert, and $B$ is very tall (many rows, but only a few columns) and $C$ is very wide (few rows, many columns).
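A numerical sanity check of the Woodbury identity (a sketch; the matrices are chosen so the required inverses exist, with diagonal $A$, tall $B$ and wide $C$ as described above):

import numpy as np

rng = np.random.default_rng(9)
n, k = 6, 2
A = np.diag(rng.uniform(1.0, 2.0, size=n))   # diagonal, easy to invert
B = rng.normal(size=(n, k))                  # tall
C = rng.normal(size=(k, n))                  # wide
D = np.eye(k)

inv = np.linalg.inv
lhs = inv(A + B @ inv(D) @ C)
rhs = inv(A) - inv(A) @ B @ inv(D + C @ inv(A) @ B) @ C @ inv(A)

print(np.allclose(lhs, rhs))   # True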

Manipulating Matrix Equations

Don't multiply by a matrix which does not have full rank. Why? In the general case, you will lose solutions.
Don't multiply by nonsquare matrices.
Don't assume a matrix inverse exists just because a matrix is square. The matrix might be singular, and you would be making the same mistake as deducing $a = b$ from $a \cdot 0 = b \cdot 0$.
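A small illustration (my own sketch) of why multiplying by a rank-deficient matrix is dangerous: it can make unequal quantities look equal, just like multiplying scalars by zero.

import numpy as np

M = np.array([[1.0, 1.0],
              [1.0, 1.0]])       # rank 1, not full rank

u = np.array([1.0, 0.0])
w = np.array([0.0, 1.0])

# u != w, but after multiplying both by M they become indistinguishable,
# so M u = M w does not imply u = w.
print(np.allclose(M @ u, M @ w))  # True, although u != w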

Eigenvectors

Every square matrix $A \in \mathbb{R}^{n \times n}$ has an eigenvector decomposition
$$A x = \lambda x$$
where $x \in \mathbb{C}^n$ and $\lambda \in \mathbb{C}$.
Example:
$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} x = \lambda x$$
$$\lambda = \{i, -i\} \qquad x = \begin{bmatrix} i \\ 1 \end{bmatrix}, \begin{bmatrix} -i \\ 1 \end{bmatrix}$$
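numpy reproduces this example; note that the real rotation matrix has complex eigenvalues and eigenvectors (a sketch, not part of the slides):

import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

lam, X = np.linalg.eig(A)        # columns of X are eigenvectors
print(lam)                        # eigenvalues i and -i (output order may differ)

# Check A x = lambda x for each eigenpair.
for k in range(2):
    print(np.allclose(A @ X[:, k], lam[k] * X[:, k]))   # True, True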

Eigenvectors

How many eigenvalue/eigenvector pairs?
$$A x = \lambda x$$
is equivalent to
$$(A - \lambda I) x = 0$$
This has a non-trivial solution only for $\det\{A - \lambda I\} = 0$:
a polynomial of $n$th order in $\lambda$; at most $n$ distinct solutions.

Real Eigenvalues

How can we enforce real eigenvalues?
Let's look at matrices with complex entries $A \in \mathbb{C}^{n \times n}$.
Transposition is replaced by the Hermitian adjoint, e.g.
$$\begin{bmatrix} 1 + 2i & 3 + 4i \\ 5 + 6i & 7 + 8i \end{bmatrix}^H = \begin{bmatrix} 1 - 2i & 5 - 6i \\ 3 - 4i & 7 - 8i \end{bmatrix}$$
Denote the complex conjugate of a complex number $a$ by $\overline{a}$.

Real Eigenvalues

How can we enforce real eigenvalues?
Let's assume $A \in \mathbb{C}^{n \times n}$ is Hermitian ($A^H = A$).
Calculate
$$x^H A x = \lambda\, x^H x$$
for an eigenvector $x \in \mathbb{C}^n$ of $A$.
Another possibility to calculate $x^H A x$:
$$
\begin{aligned}
x^H A x &= x^H A^H x && \text{($A$ is Hermitian)} \\
        &= (x^H A x)^H && \text{(reverse order)} \\
        &= (\lambda\, x^H x)^H && \text{(eigenvalue)} \\
        &= \overline{\lambda}\, x^H x
\end{aligned}
$$
and therefore $\lambda = \overline{\lambda}$ ($\lambda$ is real).

If $A$ is Hermitian, then all eigenvalues are real.
Special case: If $A$ has only real entries and is symmetric, then all eigenvalues are real.
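A numerical illustration (my own sketch): the eigenvalues of a random Hermitian matrix come out real, and numpy's eigvalsh exploits exactly this structure.

import numpy as np

rng = np.random.default_rng(10)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = M + M.conj().T                 # A is Hermitian: A^H = A

lam = np.linalg.eigvals(A)         # general solver returns complex numbers...
print(np.allclose(lam.imag, 0.0))  # ...but their imaginary parts vanish: True

print(np.linalg.eigvalsh(A))       # Hermitian solver returns real eigenvalues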

Singular Value Decomposition

Every matrix $A \in \mathbb{R}^{n \times p}$ can be decomposed into a product of three matrices
$$A = U \Sigma V^T$$
where $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{p \times p}$ are orthogonal matrices ($U^T U = I$ and $V^T V = I$), and $\Sigma \in \mathbb{R}^{n \times p}$ has nonnegative numbers on the diagonal.
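A quick numpy sketch of the decomposition; numpy returns the singular values as a vector, so $\Sigma$ has to be assembled explicitly for a non-square $A$. The example matrix is my own.

import numpy as np

rng = np.random.default_rng(11)
A = rng.normal(size=(5, 3))            # n = 5, p = 3

U, s, Vt = np.linalg.svd(A)            # U: 5x5, s: 3 singular values, Vt = V^T: 3x3

Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)             # embed the singular values on the diagonal

print(np.allclose(A, U @ Sigma @ Vt))        # True
print(np.allclose(U.T @ U, np.eye(5)))       # U orthogonal: True
print(np.allclose(Vt @ Vt.T, np.eye(3)))     # V orthogonal: True
print(np.all(s >= 0))                        # nonnegative singular values: True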

How to calculate a gradient?

Given a vector space $V$, and a function $f : V \to \mathbb{R}$.
Example: $X \in \mathbb{R}^{n \times p}$, arbitrary $C \in \mathbb{R}^{n \times n}$, $f(X) = \operatorname{tr}\{X^T C X\}$.
How to calculate the gradient of $f$?
Ill-defined question. There is no gradient in a vector space.
Only a Directional Derivative at point $X \in V$ in direction $\xi \in V$:
$$D f(X)(\xi) = \lim_{h \to 0} \frac{f(X + h\xi) - f(X)}{h}$$
For the example $f(X) = \operatorname{tr}\{X^T C X\}$:
$$
D f(X)(\xi) = \lim_{h \to 0} \frac{\operatorname{tr}\{(X + h\xi)^T C (X + h\xi)\} - \operatorname{tr}\{X^T C X\}}{h}
= \operatorname{tr}\{\xi^T C X + X^T C \xi\} = \operatorname{tr}\{\xi^T (C^T + C) X\}
$$

How to calculate a gradient?

Given a vector space $V$, and a function $f : V \to \mathbb{R}$. How to calculate the gradient of $f$?
1. Calculate the Directional Derivative.
2. Define an inner product $\langle A, B \rangle$ on the vector space.
3. The gradient is now defined by
$$D f(X)(\xi) = \langle \operatorname{grad} f, \xi \rangle$$

For the example $f(X) = \operatorname{tr}\{X^T C X\}$:
Define the inner product $\langle A, B \rangle = \operatorname{tr}\{A^T B\}$.
Write the Directional Derivative as an inner product:
$$D f(X)(\xi) = \operatorname{tr}\{\xi^T (C + C^T) X\} = \langle (C + C^T) X, \xi \rangle = \langle \operatorname{grad} f, \xi \rangle$$
so $\operatorname{grad} f = (C + C^T) X$.
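The result $\operatorname{grad} f = (C + C^T) X$ can be checked against a finite-difference approximation of the directional derivative. A sketch of my own; the direction Xi is just a random matrix of the same shape as X.

import numpy as np

rng = np.random.default_rng(12)
n, p = 4, 2
C = rng.normal(size=(n, n))            # arbitrary, not necessarily symmetric
X = rng.normal(size=(n, p))
Xi = rng.normal(size=(n, p))           # direction in the same space as X

f = lambda X: np.trace(X.T @ C @ X)

# Finite-difference approximation of the directional derivative D f(X)(Xi).
h = 1e-6
numeric = (f(X + h * Xi) - f(X)) / h

# Closed form: <grad f, Xi> with grad f = (C + C^T) X and <A, B> = tr(A^T B).
grad = (C + C.T) @ X
analytic = np.trace(grad.T @ Xi)

print(np.isclose(numeric, analytic, rtol=1e-4))   # True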

Books

Gilbert Strang, "Introduction to Linear Algebra", Wellesley-Cambridge Press, 2009.
David C. Lay, "Linear Algebra and Its Applications", Addison Wesley, 2005.
