
Linear Algebra Methods for Data Mining


Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi

Spring 2007

1. Basic Linear Algebra

Example 1: Term-Document matrices

        Doc1  Doc2  Doc3  Doc4  Query
Term1     1     0     1     0     1
Term2     0     0     1     1     1
Term3     0     1     1     0     0

The documents and the query are each represented by a vector in R^n (here n = 3).

In applications matrices may be large!


Number of terms: 10^4, number of documents: 10^6.
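
As a quick illustration, the small matrix and query above can be entered directly in Matlab/Octave (a sketch; the variable names A and q are ours, not part of the course material):

% Term-document matrix: rows = terms, columns = documents
A = [1 0 1 0;
     0 0 1 1;
     0 1 1 0];
% Query vector in R^3
q = [1; 1; 0];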

Example 1 continued: Tasks

Find document vectors close to the query.

Use some distance measure in R^n.

Use linear algebra methods for data compression and retrieval enhancement.

Find topics or concepts from the term-document matrix.

Example 2: measurement data

[Figure: three stacked time-series panels, labelled T672, NOx672, and csink, each plotted against time index 0 to 16 × 10^4.]
Example 2 continued

At the Hyytiälä Forestry Field Station, 30-minute averages of some 100+ variables have been measured for 10+ years...

some 175 000 time points, 100 variables: a lot of data!

Possible questions:
how do days vary?
how do measured variables depend on each other?
what separates days when phenomenon X occurs from those when it doesn't?
are there (independent) (pollution) sources present?

Matrices


A = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{pmatrix} \in R^{m \times n}

Rectangular array of data: elements are real numbers.

Basic concepts
vectors

norms and distances

eigenvalues, eigenvectors

linearly independent vectors, basis

orthogonal bases

matrices, orthogonal matrices

orthogonal matrix decompositions: SVD

Next: quick review of the following concepts:
matrix-vector multiplication, matrix-matrix multiplication

vector norms, matrix norms

distances between vectors

eigenvalues, eigenvectors

linear independence

basis

orthogonality

Matrix-vector multiplication

Ax = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= \begin{pmatrix}
\sum_{j=1}^n a_{1j} x_j \\
\sum_{j=1}^n a_{2j} x_j \\
\vdots \\
\sum_{j=1}^n a_{mj} x_j
\end{pmatrix} = y

Symbolically: [m × n matrix] [column vector] = [column vector].


In practice


\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 5 \\ -2 \end{pmatrix}
= \begin{pmatrix} ? \\ ? \\ ? \end{pmatrix}


\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 5 \\ -2 \end{pmatrix}
= \begin{pmatrix} 2 \cdot 5 + 3 \cdot (-2) \\ 6 \cdot 5 + 4 \cdot (-2) \\ 1 \cdot 5 + 0 \cdot (-2) \end{pmatrix}
= \begin{pmatrix} 4 \\ 22 \\ 5 \end{pmatrix}

Or


2 3   2 3 4
5
6 4 = 5 6 2 4 = 22

2

1 0 1 0 5
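
Both views are easy to check numerically (a Matlab/Octave sketch; variable names ours):

A = [2 3; 6 4; 1 0];
x = [5; -2];
y_rows = A*x;                        % built-in (row-oriented) product
y_cols = x(1)*A(:,1) + x(2)*A(:,2);  % linear combination of the columns
% both give [4; 22; 5]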

Alternative presentation of matrix-vector multiplication:
Denote the column vectors of the matrix A by a_j. Then

y = Ax = (a_1 \; a_2 \; \ldots \; a_n)
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= \sum_{j=1}^n x_j a_j

So the vector y is a linear combination of the columns of A.

Often this is a useful way to consider matrix-vector multiplication:

Example

Let columns of A be different topics:


        Topic1  Topic2  Topic3
Term1      1       0       0
Term2      1       0       0     = A.
Term3      0       1       0
Term4      0       0       1

Then if we multiply A by the vector w = (2, 0, 1)^T, we get...


Aw = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}
= 2 \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ 0 \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}
+ 1 \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 2 \\ 2 \\ 0 \\ 1 \end{pmatrix} = y,

which represents a document dealing primarily with topic 1 and secondarily with topic 3.
Note on computational aspects

The column-oriented approach is also good from the point of view of computational efficiency.

Modern computing devices are able to exploit the fact that a vector operation is a very regular sequence of scalar operations.

This approach is embedded in packages like Matlab and LAPACK (and others).

SAXPY (y ← αx + y), GAXPY (generalized saxpy: y ← y + Ax, built from column saxpys).
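
A minimal sketch of a column-oriented gaxpy in Matlab/Octave (the function name and arguments are our own illustration, not a library routine):

% gaxpy: y <- y + A*x, built from one saxpy per column of A
function y = gaxpy(A, x, y)
  n = size(A, 2);
  for j = 1:n
    y = y + x(j) * A(:, j);   % saxpy with scalar x(j) and column A(:,j)
  end
end

For instance, gaxpy([2 3; 6 4; 1 0], [5; -2], zeros(3,1)) reproduces the product [4; 22; 5] computed earlier.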

Matrix-matrix multiplication

Let A \in R^{m \times s} and B \in R^{s \times n}. Then, by definition,

C = AB = (c_{ij}) \in R^{m \times n},
\qquad
c_{ij} = \sum_{k=1}^s a_{ik} b_{kj}, \quad i = 1, \ldots, m, \quad j = 1, \ldots, n.

Note: each column vector in B is multiplied by A.

In practice


2 3   ? ?
5 1
6 4 = ? ?

2 1

1 0 ? ?


\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix}
= \begin{pmatrix}
2 \cdot 5 + 3 \cdot (-2) & 2 \cdot 1 + 3 \cdot 1 \\
6 \cdot 5 + 4 \cdot (-2) & 6 \cdot 1 + 4 \cdot 1 \\
1 \cdot 5 + 0 \cdot (-2) & 1 \cdot 1 + 0 \cdot 1
\end{pmatrix}
= \begin{pmatrix} 4 & 5 \\ 22 & 10 \\ 5 & 1 \end{pmatrix}

= \left( 5 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} - 2 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix}
\;\;\middle|\;\;
1 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} + 1 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix} \right)

Matrix multiplication code

% assumes A is m-by-s, B is s-by-n, and c = zeros(m,n) beforehand
for i=1:m,
for j=1:n,
for k=1:s,
c(i,j)=c(i,j)+a(i,k)*b(k,j);
end
end
end

Note: loops may be permuted in 6 different ways!
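
For example, the jki ordering (a sketch, same assumptions as above) turns the innermost loop into a saxpy on column j of C, matching the column-oriented view:

for j=1:n,
for k=1:s,
for i=1:m,
c(i,j)=c(i,j)+a(i,k)*b(k,j);
end
end
end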

How to measure the size of a vector?

Vector norms
The most common vector norms are
1-norm: \|x\|_1 = \sum_{i=1}^n |x_i|

Euclidean norm: \|x\|_2 = \left( \sum_{i=1}^n x_i^2 \right)^{1/2}

max-norm: \|x\|_\infty = \max_{1 \le i \le n} |x_i|

All of the above are special cases of the L_p-norm (or p-norm):
\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}
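
In Matlab/Octave all of these are available through the built-in norm function (a quick sketch):

x = [3; -4; 0];
norm(x, 1)      % 1-norm: 7
norm(x, 2)      % Euclidean norm: 5
norm(x, inf)    % max-norm: 4
norm(x, 3)      % general p-norm, here with p = 3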

General definition of a vector norm
Generally, a vector norm is a mapping \| \cdot \| : R^n \to R with the properties

\|x\| \ge 0 for all x,

\|x\| = 0 if and only if x = 0,

\|\alpha x\| = |\alpha| \, \|x\| for all \alpha \in R,

\|x + y\| \le \|x\| + \|y\|, the triangle inequality.

How to measure distance between vectors?

Obvious answer: the distance between two vectors x and y is \|x - y\|, where \| \cdot \| is some vector norm.

Frequently one measures the distance by the Euclidean norm \|x - y\|_2. So usually, if the index is dropped, this is what is meant.

Alternative: use the angle between two vectors x and y to measure the
distance between them.

How to calculate the angle between two vectors?

Angle between vectors

The inner product of two vectors is defined by (x, y) = x^T y.

This is associated with the Euclidean norm: \|x\|_2 = (x^T x)^{1/2}.

The angle \theta between two vectors x and y satisfies \cos \theta = \frac{x^T y}{\|x\|_2 \|y\|_2}.

The cosine of the angle between two vectors x and y can be used to measure the similarity between the two vectors:
if x and y are close, the angle between them is small, and \cos \theta \approx 1;
x and y are orthogonal if \theta = \pi/2, i.e. x^T y = 0.

Why not just use the Euclidean distance?

Example: term-document matrix
Each entry tells how many times a term appears in the document:
        Doc1  Doc2  Doc3
Term1    10     1     0
Term2    10     1     0
Term3     0     0     1

Using the Euclidean distance, Documents 1 and 2 look dissimilar, and Documents 2 and 3 look similar. This is just due to the length of the documents!

Using the cosine of the angle between document vectors, Documents 1 and 2 are similar to each other and dissimilar to Document 3.
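
The following Matlab/Octave sketch (variable names ours) makes the comparison concrete:

A = [10 1 0; 10 1 0; 0 0 1];                  % columns = documents
norm(A(:,1) - A(:,2))                         % approx 12.7: Docs 1 and 2 look far apart
norm(A(:,2) - A(:,3))                         % approx 1.7: Docs 2 and 3 look close
A(:,1)'*A(:,2) / (norm(A(:,1))*norm(A(:,2)))  % cosine = 1: same direction
A(:,2)'*A(:,3) / (norm(A(:,2))*norm(A(:,3)))  % cosine = 0: orthogonal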

Eigenvalues and eigenvectors

Let A be an n × n matrix. A nonzero vector v that satisfies

Av = \lambda v

for some scalar \lambda is called an eigenvector of A, and \lambda is the eigenvalue corresponding to the eigenvector v.

In practice

 
Av = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix} v = \lambda v.

\det(A - \lambda I) = \begin{vmatrix} 2 - \lambda & 1 \\ 1 & 3 - \lambda \end{vmatrix} = (2 - \lambda)(3 - \lambda) - 1 = 0

\lambda_1 \approx 3.62, \qquad \lambda_2 \approx 1.38

v_1 \approx \begin{pmatrix} 0.52 \\ 0.85 \end{pmatrix}, \qquad v_2 \approx \begin{pmatrix} 0.85 \\ -0.52 \end{pmatrix}
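
The same computation in Matlab/Octave (a sketch; eig returns eigenvectors as columns of V and eigenvalues on the diagonal of D, possibly in a different order and up to sign):

A = [2 1; 1 3];
[V, D] = eig(A);
diag(D)   % eigenvalues, approx 1.38 and 3.62
V         % corresponding eigenvectors as columns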

Matrix norms
Let \| \cdot \| be a vector norm and A \in R^{m \times n}.
The corresponding matrix norm is \|A\| = \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|}.

\|A\|_2 = \left( \max_{1 \le i \le n} \lambda_i(A^T A) \right)^{1/2} = square root of the largest eigenvalue of A^T A. Heavy to compute!

\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^n |a_{ij}| (maximum over rows)

\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^m |a_{ij}| (maximum over columns)

\|A\|_F = \left( \sum_{i=1}^m \sum_{j=1}^n a_{ij}^2 \right)^{1/2}: the Frobenius norm does not correspond to any vector norm. Still, it is related to the Euclidean vector norm.
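
All four are available via Matlab/Octave's built-in norm (a quick sketch):

A = [2 3; 6 4; 1 0];
norm(A, 2)      % 2-norm: sqrt of the largest eigenvalue of A'*A
norm(A, 1)      % maximum absolute column sum
norm(A, inf)    % maximum absolute row sum
norm(A, 'fro')  % Frobenius norm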

Linear Independence

Given a set of vectors (v_j)_{j=1}^n in R^m, m \ge n, consider the set of linear combinations y = \sum_{j=1}^n \alpha_j v_j for arbitrary coefficients \alpha_j.

The vectors (v_j)_{j=1}^n are linearly independent, if \sum_{j=1}^n \alpha_j v_j = 0 if and only if \alpha_j = 0 for all j = 1, \ldots, n.

A set of m linearly independent vectors in R^m is called a basis of R^m: any vector in R^m can be expressed as a linear combination of the basis vectors.

Example
The column vectors of the matrix

[v_1 \; v_2 \; v_3 \; v_4] = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}

are not linearly independent, as

\alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 + \alpha_4 v_4 = 0

holds for \alpha_1 = \alpha_3 = 1, \alpha_2 = \alpha_4 = -1.

Rank of a matrix

The rank of a matrix is the maximum number of linearly independent column vectors.

A square matrix A \in R^{n \times n} with rank n is called nonsingular, and it has an inverse A^{-1} satisfying A A^{-1} = A^{-1} A = I.

The (outer product) matrix x y^T has rank 1: all columns of

x y^T = (y_1 x \;\; y_2 x \;\; \ldots \;\; y_n x)

are linearly dependent (and so are all the rows).

Example
The 4 × 4 matrix

\begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}

has rank 3.
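
This is easy to confirm numerically (a Matlab/Octave sketch):

V = [1 0 0 1; 1 1 0 0; 0 1 1 0; 0 0 1 1];
rank(V)            % 3: the columns are linearly dependent
V*[1; -1; 1; -1]   % the zero vector, confirming v1 - v2 + v3 - v4 = 0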

Example
Consider an m × n term-document matrix A = [a_1 \; a_2 \; \ldots \; a_n], where the columns a_j \in R^m are the documents.

If A has rank 3, then all the documents can be expressed as linear combinations of only three vectors v_1, v_2, v_3 \in R^m:

a_j = w_{1j} v_1 + w_{2j} v_2 + w_{3j} v_3, \qquad j = 1, \ldots, n.

The term-document matrix can be written as

A = V W,

where V = (v_1 \; v_2 \; v_3) \in R^{m \times 3} and W = (w_{ij}) \in R^{3 \times n}.

Condition number
 
For any a \neq 1 the matrix
A = \begin{pmatrix} a & 1 \\ 1 & 1 \end{pmatrix}
is nonsingular and has the inverse
A^{-1} = \frac{1}{a-1} \begin{pmatrix} 1 & -1 \\ -1 & a \end{pmatrix}.

As a \to 1, the norm of A^{-1} tends to infinity.

Nonsingularity is not always enough!

Define the condition number of a matrix to be \kappa(A) = \|A\| \, \|A^{-1}\|.

Large condition number means trouble!
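
In Matlab/Octave, cond computes the 2-norm condition number; a quick sketch of the blow-up as a approaches 1:

a = 1.001;
A = [a 1; 1 1];
cond(A)   % approx 4000: A is nearly singular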

Orthogonality

Two vectors x and y are orthogonal if x^T y = 0.

Let q_j, j = 1, \ldots, n, be orthogonal, i.e. q_i^T q_j = 0 for i \neq j. Then they are linearly independent. (Proof?)

Let the set of orthogonal vectors q_j, j = 1, \ldots, m, in R^m be normalized, \|q_j\|_2 = 1. Then they are orthonormal, and constitute an orthonormal basis of R^m.

A matrix Q = [q_1 \; q_2 \; \ldots \; q_m] \in R^{m \times m} with orthonormal columns is called an orthogonal matrix.

Why we like orthogonal matrices
An orthogonal matrix Q \in R^{m \times m} has rank m (since its columns are linearly independent).

Q^T Q = I and Q Q^T = I. (Proofs?)

The inverse of an orthogonal matrix Q is Q^{-1} = Q^T.

The Euclidean length of a vector is invariant under an orthogonal transformation Q:
\|Qx\|_2^2 = (Qx)^T Qx = x^T Q^T Q x = x^T x = \|x\|_2^2.

The product of two orthogonal matrices Q and P is orthogonal: if X = PQ, then
X^T X = (PQ)^T PQ = Q^T P^T P Q = Q^T Q = I.
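
A quick numerical check (Matlab/Octave sketch), using a 2 × 2 rotation matrix as the orthogonal Q:

theta = pi/3;
Q = [cos(theta) -sin(theta); sin(theta) cos(theta)];  % rotation matrix: orthogonal
Q'*Q                 % the identity matrix
x = [3; 4];
norm(Q*x) - norm(x)  % 0 up to rounding: lengths are preserved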

