Professional Documents
Culture Documents
Spring 2007
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki
Example 1: Term-Document matrices
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 1
Example 1 continued: Tasks
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 2
40
Example 2: measurement data
20
20
40
0 2 4 6 8 10 12 14 16
T672 4
x 10
60
40
20
20
0 2 4 6 8 10 12 14 16
NOx672 4
x 10
0.08
0.06
0.04
0.02
0
0 2 4 6 8 10 12 14 16
csink 4
x 10
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 3
Example 2 continued
Possible question:
how do days vary? how do measured variables depend on each other?
what separates days when phenomenon X occurs from those when it
doesnt?
are there (independent) (pollution) sources present?
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 4
Matrices
a11 a12 . . . a1n
a21 a22 . . . a2n
A= Rmn
.. .. ..
am1 am2 . . . amn
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 5
Basic concepts
vectors
eigenvalues, eigenvectors
orthogonal bases
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 6
Next: quick review of the following concepts:
matrix-vector multiplication, matrix-matrix multiplication
eigenvalues, eigenvectors
linear independence
basis
orthogonality
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 7
Matrix-vector multiplication
Pn
a11 a12 . . . a1n x1 j=1 a1j xj
Pn a x
a21 a22 . . . a2n x2
j=1 2j j
Ax = = =y
.. .. ..
.. ..
Pn
am1 am2 . . . amn xn j=1 amj xj
Symbolically
|
=
|
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 8
In practice
2 3 ?
5
6 4 = ?
2
1 0 ?
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 9
2 3 2 5 + 3 (2) 4
5
6 4 = 6 5 + 4 (2) = 22
2
1 0 1 5 + 0 (2) 5
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 10
Or
2 3 2 3 4
5
6 4 = 5 6 2 4 = 22
2
1 0 1 0 5
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 11
Alternative presentation of matrix-vector multiplication:
Denote the column vectors of the matrix A by aj. Then
x1
n
x2
X
y = Ax = (a1a2 . . . an) = xj aj
.. j=1
xn
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 12
Example
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 13
1 0 0
2
1 0 0
Aw = 0
0 1 0
1
0 0 1
1 0 0 2
1 0 0 2
=2 +0 +1 = = y,
0 1 0 0
0 0 1 1
Modern computing devices are able to exploit the fact that a vector
operation is a very regular sequence of scalar operations.
SAXPY, GAXPY
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 15
Matrix-matrix multiplication
Rmn 3 C = AB = (cij ),
s
X
cij = aik bkj , i = 1...m, j = 1...n.
k=1
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 16
In practice
2 3 ? ?
5 1
6 4 = ? ?
2 1
1 0 ? ?
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 17
2 3 2532 21+31 4 6
5 1
6 4 = 6 5 4 2 6 1 + 4 1 = 22 10
2 1
1 0 1502 11+01 5 1
2 3 2 3
= 5 6 2 4 1 6 +1 4
1 0 1 0
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 18
Matrix multiplication code
for i=1:m,
for j=1:n,
for k=1:s,
c(i,j)=c(i,j)+a(i,s)*b(s,j);
end
end
end
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 19
How to measure the size of a vector?
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 20
Vector norms
The most common vector norms are
Pn
1-norm: kxk1 = i=1 |xi|
pPn
Euclidean norm: kxk2 = 2
i=1 xi
all of the above are special cases of the Lp-norm (or p-norm):
Pn
kxkp = ( i=1 xpi)1/p
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 21
General definition of a vector norm
Generally, a vector norm is a mapping Rn R, with the properties
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 22
How to measure distance between vectors?
Alternative: use the angle between two vectors x and y to measure the
distance between them.
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 23
Angle between vectors
The cosine of the angle between two vectors x and y can be used to
measure the similarity between the two vectors:
if x and y are close, the angle between them is small, and cos 1.
x and y are orthogonal, if = 2 , i.e. xT y = 0.
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 24
Why not just use the Euclidean distance?
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 25
Example: term-document matrix
Each entry tells how many times a term appears in the document:
Doc1 Doc2 Doc3
Term1 10 1 0
Term2 10 1 0
Term3 0 0 1
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 26
Eigenvalues and eigenvectors
Av = v
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 27
In practice
2 1
Av = v = v.
1 3
2 1
det(A I) =
= (2 )(3 ) 1 = 0
1 3
1 = 3.62 2 = 1.38
0.52 0.85
v1 = v2 =
0.85 0.52
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 28
3
4 3 2 1 0 1 2 3 4
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 29
Matrix norms
Let k.k be a vector norm and A Rmn.
The corresponding matrix norm is kAk = supx6=0 kkAx k
xk .
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 30
Linear Independence
Given a set of vectors (vj )nj=1 in Rm, m n, consider the set of linear
Pn
combinations y = j=1 j vj for arbitrary coefficients j .
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 31
Example
The column vectors of the matrix
1 0 0 1
1 1 0 0
[v1 v2 v3 v4] =
0 1 1 0
0 0 1 1
holds for 1 = 3 = 1, 2 = 4 = 1.
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 32
Rank of a matrix
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 33
Example
The 4 4 matrix
1 0 0 1
1 1 0 0
0 1 1 0
0 0 1 1
has rank 3.
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 34
Example
Consider a m n term-document matrix A = [a1 a2 . . . an], where
aj Rm are the documents.
A = VW
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 35
Condition number
a 1
For any a 6= 1 the matrix A = is nonsingular and has the
1 1
1 1 1 1
inverse A = a1 .
1 a
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 36
Orthogonality
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 37
Why we like orthogonal matrices
An orthogonal matrix Q Rmm has rank m (since its columns are
linearly independent).
QT Q = I. QQT = I. (Proofs?)
XT X = (PQ)T PQ = QT PT PQ = QT Q = I.
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 38
References
[1] Lars Elden: Matrix Methods in Data Mining and Pattern Recognition,
SIAM 2007.
[2] G. H. Golub and C. F. Van Loan. Matrix Computations. 3rd ed. Johns
Hopkins Press, Baltimore, MD., 1996.
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 39