
Linear Algebra Methods for Data Mining


Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi

Spring 2007

1. Basic Linear Algebra

Example 1: Term-Document matrices

        Doc1  Doc2  Doc3  Doc4  Query
Term1     1     0     1     0     1
Term2     0     0     1     1     1
Term3     0     1     1     0     0

The documents and the query are each represented by a vector in R^n (here n = 3).

In applications matrices may be large!


Number of terms: 10^4, number of documents: 10^6.
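
As a quick illustration, the small matrix and query above can be entered directly in Matlab/Octave (a sketch; the variable names A and q are ours, not part of the course material):

% Term-document matrix: rows = terms, columns = documents
A = [1 0 1 0;
     0 0 1 1;
     0 1 1 0];
% Query vector in R^3
q = [1; 1; 0];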

Example 1 continued: Tasks

Find document vectors close to the query.

Use some distance measure in R^n.

Use linear algebra methods for data compression and retrieval enhancement.

Find topics or concepts from the term-document matrix.

Example 2: measurement data

[Figure: three stacked time-series panels, labelled T672, NOx672, and csink, each plotted against time index 0 to 16 × 10^4.]
Example 2 continued

At the Hyytiälä Forestry Field Station, 30-minute averages of some 100+ variables have been measured for 10+ years...

some 175 000 time points, 100 variables: a lot of data!

Possible questions:
how do days vary?
how do measured variables depend on each other?
what separates days when phenomenon X occurs from those when it doesn't?
are there (independent) (pollution) sources present?

Matrices


A = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{pmatrix} \in R^{m \times n}

Rectangular array of data: elements are real numbers.

Basic concepts
vectors

norms and distances

eigenvalues, eigenvectors

linearly independent vectors, basis

orthogonal bases

matrices, orthogonal matrices

orthogonal matrix decompositions: SVD

Next: quick review of the following concepts:
matrix-vector multiplication, matrix-matrix multiplication

vector norms, matrix norms

distances between vectors

eigenvalues, eigenvectors

linear independence

basis

orthogonality

Matrix-vector multiplication

Ax = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= \begin{pmatrix}
\sum_{j=1}^n a_{1j} x_j \\
\sum_{j=1}^n a_{2j} x_j \\
\vdots \\
\sum_{j=1}^n a_{mj} x_j
\end{pmatrix} = y

Symbolically: [m × n matrix] [column vector] = [column vector].


In practice


\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 5 \\ -2 \end{pmatrix}
= \begin{pmatrix} ? \\ ? \\ ? \end{pmatrix}


\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 5 \\ -2 \end{pmatrix}
= \begin{pmatrix} 2 \cdot 5 + 3 \cdot (-2) \\ 6 \cdot 5 + 4 \cdot (-2) \\ 1 \cdot 5 + 0 \cdot (-2) \end{pmatrix}
= \begin{pmatrix} 4 \\ 22 \\ 5 \end{pmatrix}

Or


2 3   2 3 4
5
6 4 = 5 6 2 4 = 22

2

1 0 1 0 5
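
Both views are easy to check numerically (a Matlab/Octave sketch; variable names ours):

A = [2 3; 6 4; 1 0];
x = [5; -2];
y_rows = A*x;                        % built-in (row-oriented) product
y_cols = x(1)*A(:,1) + x(2)*A(:,2);  % linear combination of the columns
% both give [4; 22; 5]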

Alternative presentation of matrix-vector multiplication:
Denote the column vectors of the matrix A by a_j. Then

y = Ax = (a_1 \; a_2 \; \ldots \; a_n)
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= \sum_{j=1}^n x_j a_j

So the vector y is a linear combination of the columns of A.

Often this is a useful way to consider matrix-vector multiplication:

Example

Let columns of A be different topics:


        Topic1  Topic2  Topic3
Term1      1       0       0
Term2      1       0       0     = A.
Term3      0       1       0
Term4      0       0       1

Then if we multiply A by the vector w = (2, 0, 1)^T, we get...


Aw = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}
= 2 \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ 0 \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}
+ 1 \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 2 \\ 2 \\ 0 \\ 1 \end{pmatrix} = y,

which represents a document dealing primarily with topic 1 and secondarily with topic 3.
Note on computational aspects

The column-oriented approach is also good from the point of view of computational efficiency.

Modern computing devices are able to exploit the fact that a vector operation is a very regular sequence of scalar operations.

This approach is embedded in packages like Matlab and LAPACK (and others).

SAXPY (y ← αx + y), GAXPY (generalized saxpy: y ← y + Ax, built from column saxpys).
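
A minimal sketch of a column-oriented gaxpy in Matlab/Octave (the function name and arguments are our own illustration, not a library routine):

% gaxpy: y <- y + A*x, built from one saxpy per column of A
function y = gaxpy(A, x, y)
  n = size(A, 2);
  for j = 1:n
    y = y + x(j) * A(:, j);   % saxpy with scalar x(j) and column A(:,j)
  end
end

For instance, gaxpy([2 3; 6 4; 1 0], [5; -2], zeros(3,1)) reproduces the product [4; 22; 5] computed earlier.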

Matrix-matrix multiplication

Let A \in R^{m \times s} and B \in R^{s \times n}. Then, by definition,

C = AB = (c_{ij}) \in R^{m \times n},
\qquad
c_{ij} = \sum_{k=1}^s a_{ik} b_{kj}, \quad i = 1, \ldots, m, \quad j = 1, \ldots, n.

Note: each column vector in B is multiplied by A.

In practice


2 3   ? ?
5 1
6 4 = ? ?

2 1

1 0 ? ?


\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix}
= \begin{pmatrix}
2 \cdot 5 + 3 \cdot (-2) & 2 \cdot 1 + 3 \cdot 1 \\
6 \cdot 5 + 4 \cdot (-2) & 6 \cdot 1 + 4 \cdot 1 \\
1 \cdot 5 + 0 \cdot (-2) & 1 \cdot 1 + 0 \cdot 1
\end{pmatrix}
= \begin{pmatrix} 4 & 5 \\ 22 & 10 \\ 5 & 1 \end{pmatrix}

= \left( 5 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} - 2 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix}
\;\;\middle|\;\;
1 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} + 1 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix} \right)

Matrix multiplication code

% assumes A is m-by-s, B is s-by-n, and c = zeros(m,n) beforehand
for i=1:m,
for j=1:n,
for k=1:s,
c(i,j)=c(i,j)+a(i,k)*b(k,j);
end
end
end

Note: loops may be permuted in 6 different ways!
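
For example, the jki ordering (a sketch, same assumptions as above) turns the innermost loop into a saxpy on column j of C, matching the column-oriented view:

for j=1:n,
for k=1:s,
for i=1:m,
c(i,j)=c(i,j)+a(i,k)*b(k,j);
end
end
end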

How to measure the size of a vector?

Vector norms
The most common vector norms are
1-norm: \|x\|_1 = \sum_{i=1}^n |x_i|

Euclidean norm: \|x\|_2 = \left( \sum_{i=1}^n x_i^2 \right)^{1/2}

max-norm: \|x\|_\infty = \max_{1 \le i \le n} |x_i|

All of the above are special cases of the L_p-norm (or p-norm):
\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}
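
In Matlab/Octave all of these are available through the built-in norm function (a quick sketch):

x = [3; -4; 0];
norm(x, 1)      % 1-norm: 7
norm(x, 2)      % Euclidean norm: 5
norm(x, inf)    % max-norm: 4
norm(x, 3)      % general p-norm, here with p = 3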

General definition of a vector norm
Generally, a vector norm is a mapping \| \cdot \| : R^n \to R with the properties

\|x\| \ge 0 for all x,

\|x\| = 0 if and only if x = 0,

\|\alpha x\| = |\alpha| \, \|x\| for all \alpha \in R,

\|x + y\| \le \|x\| + \|y\|, the triangle inequality.

How to measure distance between vectors?

Obvious answer: the distance between two vectors x and y is \|x - y\|, where \| \cdot \| is some vector norm.

Frequently one measures the distance by the Euclidean norm \|x - y\|_2. So usually, if the index is dropped, this is what is meant.

Alternative: use the angle between two vectors x and y to measure the
distance between them.

How to calculate the angle between two vectors?

Angle between vectors

The inner product of two vectors is defined by (x, y) = x^T y.

This is associated with the Euclidean norm: \|x\|_2 = (x^T x)^{1/2}.

The angle \theta between two vectors x and y satisfies \cos \theta = \frac{x^T y}{\|x\|_2 \|y\|_2}.

The cosine of the angle between two vectors x and y can be used to measure the similarity between the two vectors:
if x and y are close, the angle between them is small, and \cos \theta \approx 1;
x and y are orthogonal if \theta = \pi/2, i.e. x^T y = 0.

Why not just use the Euclidean distance?

Example: term-document matrix
Each entry tells how many times a term appears in the document:
        Doc1  Doc2  Doc3
Term1    10     1     0
Term2    10     1     0
Term3     0     0     1

Using the Euclidean distance, Documents 1 and 2 look dissimilar, and Documents 2 and 3 look similar. This is just due to the length of the documents!

Using the cosine of the angle between document vectors, Documents 1 and 2 are similar to each other and dissimilar to Document 3.
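
The following Matlab/Octave sketch (variable names ours) makes the comparison concrete:

A = [10 1 0; 10 1 0; 0 0 1];                  % columns = documents
norm(A(:,1) - A(:,2))                         % approx 12.7: Docs 1 and 2 look far apart
norm(A(:,2) - A(:,3))                         % approx 1.7: Docs 2 and 3 look close
A(:,1)'*A(:,2) / (norm(A(:,1))*norm(A(:,2)))  % cosine = 1: same direction
A(:,2)'*A(:,3) / (norm(A(:,2))*norm(A(:,3)))  % cosine = 0: orthogonal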

Eigenvalues and eigenvectors

Let A be an n × n matrix. A nonzero vector v that satisfies

Av = \lambda v

for some scalar \lambda is called an eigenvector of A, and \lambda is the eigenvalue corresponding to the eigenvector v.

In practice

 
Av = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix} v = \lambda v.

\det(A - \lambda I) = \begin{vmatrix} 2 - \lambda & 1 \\ 1 & 3 - \lambda \end{vmatrix} = (2 - \lambda)(3 - \lambda) - 1 = 0

\lambda_1 \approx 3.62, \qquad \lambda_2 \approx 1.38

v_1 \approx \begin{pmatrix} 0.52 \\ 0.85 \end{pmatrix}, \qquad v_2 \approx \begin{pmatrix} 0.85 \\ -0.52 \end{pmatrix}
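
The same computation in Matlab/Octave (a sketch; eig returns eigenvectors as columns of V and eigenvalues on the diagonal of D, possibly in a different order and up to sign):

A = [2 1; 1 3];
[V, D] = eig(A);
diag(D)   % eigenvalues, approx 1.38 and 3.62
V         % corresponding eigenvectors as columns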

Matrix norms
Let \| \cdot \| be a vector norm and A \in R^{m \times n}.
The corresponding matrix norm is \|A\| = \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|}.

\|A\|_2 = \left( \max_{1 \le i \le n} \lambda_i(A^T A) \right)^{1/2} = square root of the largest eigenvalue of A^T A. Heavy to compute!

\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^n |a_{ij}| (maximum over rows)

\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^m |a_{ij}| (maximum over columns)

\|A\|_F = \left( \sum_{i=1}^m \sum_{j=1}^n a_{ij}^2 \right)^{1/2}: the Frobenius norm does not correspond to any vector norm. Still, it is related to the Euclidean vector norm.
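
All four are available via Matlab/Octave's built-in norm (a quick sketch):

A = [2 3; 6 4; 1 0];
norm(A, 2)      % 2-norm: sqrt of the largest eigenvalue of A'*A
norm(A, 1)      % maximum absolute column sum
norm(A, inf)    % maximum absolute row sum
norm(A, 'fro')  % Frobenius norm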

Linear Independence

Given a set of vectors (v_j)_{j=1}^n in R^m, m \ge n, consider the set of linear combinations y = \sum_{j=1}^n \alpha_j v_j for arbitrary coefficients \alpha_j.

The vectors (v_j)_{j=1}^n are linearly independent, if \sum_{j=1}^n \alpha_j v_j = 0 if and only if \alpha_j = 0 for all j = 1, \ldots, n.

A set of m linearly independent vectors in R^m is called a basis of R^m: any vector in R^m can be expressed as a linear combination of the basis vectors.

Example
The column vectors of the matrix

[v_1 \; v_2 \; v_3 \; v_4] = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}

are not linearly independent, as

\alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 + \alpha_4 v_4 = 0

holds for \alpha_1 = \alpha_3 = 1, \alpha_2 = \alpha_4 = -1.

Rank of a matrix

The rank of a matrix is the maximum number of linearly independent column vectors.

A square matrix A \in R^{n \times n} with rank n is called nonsingular, and it has an inverse A^{-1} satisfying A A^{-1} = A^{-1} A = I.

The (outer product) matrix x y^T has rank 1: all columns of

x y^T = (y_1 x \;\; y_2 x \;\; \ldots \;\; y_n x)

are linearly dependent (and so are all the rows).

Example
The 4 × 4 matrix

\begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}

has rank 3.
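
This is easy to confirm numerically (a Matlab/Octave sketch):

V = [1 0 0 1; 1 1 0 0; 0 1 1 0; 0 0 1 1];
rank(V)            % 3: the columns are linearly dependent
V*[1; -1; 1; -1]   % the zero vector, confirming v1 - v2 + v3 - v4 = 0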

Example
Consider an m × n term-document matrix A = [a_1 \; a_2 \; \ldots \; a_n], where the columns a_j \in R^m are the documents.

If A has rank 3, then all the documents can be expressed as linear combinations of only three vectors v_1, v_2, v_3 \in R^m:

a_j = w_{1j} v_1 + w_{2j} v_2 + w_{3j} v_3, \qquad j = 1, \ldots, n.

The term-document matrix can be written as

A = V W,

where V = (v_1 \; v_2 \; v_3) \in R^{m \times 3} and W = (w_{ij}) \in R^{3 \times n}.

Condition number
 
For any a \neq 1 the matrix
A = \begin{pmatrix} a & 1 \\ 1 & 1 \end{pmatrix}
is nonsingular and has the inverse
A^{-1} = \frac{1}{a-1} \begin{pmatrix} 1 & -1 \\ -1 & a \end{pmatrix}.

As a \to 1, the norm of A^{-1} tends to infinity.

Nonsingularity is not always enough!

Define the condition number of a matrix to be \kappa(A) = \|A\| \, \|A^{-1}\|.

Large condition number means trouble!
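
In Matlab/Octave, cond computes the 2-norm condition number; a quick sketch of the blow-up as a approaches 1:

a = 1.001;
A = [a 1; 1 1];
cond(A)   % approx 4000: A is nearly singular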

Orthogonality

Two vectors x and y are orthogonal if x^T y = 0.

Let q_j, j = 1, \ldots, n, be orthogonal, i.e. q_i^T q_j = 0 for i \neq j. Then they are linearly independent. (Proof?)

Let the set of orthogonal vectors q_j, j = 1, \ldots, m, in R^m be normalized, \|q_j\|_2 = 1. Then they are orthonormal, and constitute an orthonormal basis of R^m.

A matrix Q = [q_1 \; q_2 \; \ldots \; q_m] \in R^{m \times m} with orthonormal columns is called an orthogonal matrix.

Why we like orthogonal matrices
An orthogonal matrix Q \in R^{m \times m} has rank m (since its columns are linearly independent).

Q^T Q = I and Q Q^T = I. (Proofs?)

The inverse of an orthogonal matrix Q is Q^{-1} = Q^T.

The Euclidean length of a vector is invariant under an orthogonal transformation Q:
\|Qx\|_2^2 = (Qx)^T Qx = x^T Q^T Q x = x^T x = \|x\|_2^2.

The product of two orthogonal matrices Q and P is orthogonal: if X = PQ, then
X^T X = (PQ)^T PQ = Q^T P^T P Q = Q^T Q = I.
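
A quick numerical check (Matlab/Octave sketch), using a 2 × 2 rotation matrix as the orthogonal Q:

theta = pi/3;
Q = [cos(theta) -sin(theta); sin(theta) cos(theta)];  % rotation matrix: orthogonal
Q'*Q                 % the identity matrix
x = [3; 4];
norm(Q*x) - norm(x)  % 0 up to rounding: lengths are preserved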

