
Chapter 3

Linear least-squares problems


In this chapter we discuss overdetermined sets of linear equations

    Ax = b,

with A ∈ R^{m×n}, b ∈ R^m, and m > n. An overdetermined set of linear equations usually does not have a solution, so we have to accept an approximate solution, and define what we mean by solving the equations. The most widely used definition is the least-squares solution, which minimizes ‖Ax − b‖².
3.1 Least-squares problems
A vector x̂ minimizes ‖Ax − b‖² if ‖Ax̂ − b‖² ≤ ‖Ax − b‖² for all x. We use the notation

    minimize ‖Ax − b‖²                                              (3.1)

to denote the problem of finding an x that minimizes ‖Ax − b‖². This is called a least-squares problem (LS problem), for the following reason. Suppose A ∈ R^{m×n} and b ∈ R^m, and let r_i(x) be the ith component of Ax − b:

    r_i(x) = a_i1 x_1 + a_i2 x_2 + · · · + a_in x_n − b_i,   i = 1, . . . , m.      (3.2)

By definition of the Euclidean norm,

    ‖Ax − b‖² = Σ_{i=1}^m r_i(x)²,

so the function that we minimize in (3.1) is the sum of the squares of m functions r_i(x), hence the term least-squares. The problem is sometimes called linear least-squares to emphasize that each of the functions r_i in (3.2) is affine (meaning, a linear function Σ_j a_ij x_j plus a constant −b_i), and to distinguish it from nonlinear least-squares problems, in which we allow arbitrary functions r_i.
We sometimes omit the square in (3.1) and use the notation

    minimize ‖Ax − b‖

instead. This is justified because if x̂ minimizes ‖Ax − b‖², then it also minimizes ‖Ax − b‖.
Example
We consider the least-squares problem defined by

    A = [  2   0 ]        b = [  1 ]
        [ −1   1 ],           [  0 ]
        [  0   2 ]            [ −1 ].

The set of three equations in two variables Ax = b, i.e.,

    2x_1 = 1,   −x_1 + x_2 = 0,   2x_2 = −1,

has no solution. The corresponding least-squares problem is

    minimize (2x_1 − 1)² + (−x_1 + x_2)² + (2x_2 + 1)².

We can find the solution by setting the derivatives of the function

    f(x) = (2x_1 − 1)² + (−x_1 + x_2)² + (2x_2 + 1)²

equal to zero:

    ∂f(x)/∂x_1 = 4(2x_1 − 1) − 2(−x_1 + x_2) = 0
    ∂f(x)/∂x_2 = 2(−x_1 + x_2) + 4(2x_2 + 1) = 0.

This gives two equations in two variables

    10x_1 − 2x_2 = 4,   −2x_1 + 10x_2 = −4,

with a unique solution x = (1/3, −1/3).
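The same solution can be obtained in Matlab with the backslash operator, which solves the least-squares problem when A has more rows than columns (a quick sketch):

A = [2 0; -1 1; 0 2];
b = [1; 0; -1];
x = A\b        % least-squares solution; returns x = (1/3, -1/3)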
3.2 Applications
3.2.1 Data fitting
Example
Figure 3.1 shows 40 points (t_i, y_i) in a plane (shown as circles). The solid line shows a function g(t) = α + βt that satisfies

    g(t_i) = α + βt_i ≈ y_i,   i = 1, . . . , 40.
Figure 3.1 Least-squares fit of a straight line to 40 points in a plane.
To compute coefficients α and β that give a good fit, we first need to decide on an approximation criterion or error function. In this example we use the error function

    Σ_{i=1}^{40} (g(t_i) − y_i)² = Σ_{i=1}^{40} (α + βt_i − y_i)².

This is the sum of squares of the errors |g(t_i) − y_i|. We then choose values of α and β that minimize this error function. This can be achieved by solving a least-squares problem: if we define

    x = [ α ],    A = [ 1   t_1  ],    b = [ y_1  ]
        [ β ]         [ 1   t_2  ]         [ y_2  ]
                      [ ...      ]         [ ...  ]
                      [ 1   t_40 ]         [ y_40 ],

then ‖Ax − b‖² = Σ_{i=1}^{40} (α + βt_i − y_i)², so we can minimize the error function by solving the least-squares problem

    minimize ‖Ax − b‖².
The Matlab code for figure 3.1 is given below. (We assume that t and y are given as two 40 × 1 arrays t and y.)

x = [ones(40,1) t]\y;
plot(t, y, 'o', [-11; 11], [x(1)-11*x(2); x(1)+11*x(2)], '-');

Note that we use the command x = A\b to solve the least-squares problem (3.1).
Least-squares data fitting
In a general data fitting problem, we are given m data points (t_i, y_i) where y_i ∈ R, and we are asked to find a function g(t) such that

    g(t_i) ≈ y_i,   i = 1, . . . , m.                               (3.3)

In least-squares data fitting we restrict ourselves to functions g of the form

    g(t) = x_1 g_1(t) + x_2 g_2(t) + · · · + x_n g_n(t),

where the functions g_i(t) (called basis functions) are given, and the coefficients x_i are parameters to be determined. The simple fitting problem of figure 3.1 is an example with two basis functions g_1(t) = 1, g_2(t) = t.
We use

    Σ_{i=1}^m (g(t_i) − y_i)² = Σ_{i=1}^m (x_1 g_1(t_i) + x_2 g_2(t_i) + · · · + x_n g_n(t_i) − y_i)²

to judge the quality of the fit (i.e., the error in (3.3)), and determine the values of the coefficients x_i by minimizing the error function. This can be expressed as a least-squares problem with
    A = [ g_1(t_1)  g_2(t_1)  · · ·  g_n(t_1) ],    b = [ y_1 ]
        [ g_1(t_2)  g_2(t_2)  · · ·  g_n(t_2) ]         [ y_2 ]
        [   ...       ...              ...    ]         [ ... ]
        [ g_1(t_m)  g_2(t_m)  · · ·  g_n(t_m) ]         [ y_m ].
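As an illustration, the matrix A can be assembled column by column from the basis functions; a minimal Matlab sketch (the cell array of function handles and the data vectors t, y are placeholders):

basis = {@(t) ones(size(t)), @(t) t};   % example basis: g1(t) = 1, g2(t) = t
m = length(t);  n = numel(basis);
A = zeros(m, n);
for j = 1:n
    A(:,j) = basis{j}(t);               % jth column: g_j evaluated at t_1, ..., t_m
end
x = A\y;                                % least-squares coefficients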
Figure 3.2 shows an example in which we fit a polynomial

    g(t) = x_1 + x_2 t + x_3 t² + · · · + x_10 t⁹

to 50 points. (Here the basis functions are g_1(t) = 1, g_2(t) = t, . . . , g_10(t) = t⁹.) The Matlab code is given below. We assume that the data points are given by two arrays t and y of length 50.
n = 10;
A = fliplr(vander(t)); % mxm matrix with elements A_(ij) = t_i^(j-1)
A = A(:,1:n); % first n columns of A
x = A\y;
% to plot g, generate 1000 points ui between -1 and 1
u = linspace(-1,1,1000);
% evaluate g(ui) for all ui
g = x(n)*ones(1,1000);
for i = (n-1):-1:1
g = g.*u + x(i);
end;
% plot data points and g
plot(u, g, '-', t, y, 'o');
Figure 3.2 Least-squares polynomial fit of degree 9 (solid line) to 50 points (shown as circles).
3.2.2 Estimation
Suppose y ∈ R^m is a vector of measurements, x ∈ R^n is a vector of parameters to be estimated, and y and x are related as

    y = Ax + w.

The matrix A ∈ R^{m×n}, which describes how the measured values depend on the unknown parameters, is given. The vector w ∈ R^m is measurement error, and is unknown but presumably small (in the norm ‖ · ‖). The estimation problem is to make a sensible guess as to what x is, given y.

If we guess that x has the value x̂, then we are implicitly making the guess that w has the value y − Ax̂. Assuming that smaller values of w (measured by ‖w‖) are more plausible than larger values, a sensible guess is the least-squares solution, which minimizes ‖Ax̂ − y‖.
Example
We are asked to estimate the position (u, v) of a point in a plane, given noisy measurements of the distances

    √((u − p_i)² + (v − q_i)²)

of the point to four points with known positions (p_i, q_i). We denote the measured distances as ρ_i, i = 1, 2, 3, 4.
We assume that (u, v) ≈ (0, 0), and use the approximation

    √((u − p_i)² + (v − q_i)²) ≈ √(p_i² + q_i²) − u p_i/√(p_i² + q_i²) − v q_i/√(p_i² + q_i²),      (3.4)
which holds for small (u, v). With this approximation, we can write the measured distances as

    ρ_i = √(p_i² + q_i²) − p_i u/√(p_i² + q_i²) − q_i v/√(p_i² + q_i²) + w_i,

where the error w_i includes measurement error, as well as the error due to the linearization (3.4).
To compute estimates û and v̂, given the measurements ρ_i and the positions (p_i, q_i), we can solve a least-squares problem with

    x = [ u ],    A = [ −p_1/√(p_1² + q_1²)   −q_1/√(p_1² + q_1²) ],    b = [ ρ_1 − √(p_1² + q_1²) ]
        [ v ]         [ −p_2/√(p_2² + q_2²)   −q_2/√(p_2² + q_2²) ]         [ ρ_2 − √(p_2² + q_2²) ]
                      [ −p_3/√(p_3² + q_3²)   −q_3/√(p_3² + q_3²) ]         [ ρ_3 − √(p_3² + q_3²) ]
                      [ −p_4/√(p_4² + q_4²)   −q_4/√(p_4² + q_4²) ]         [ ρ_4 − √(p_4² + q_4²) ].
The Matlab code below solves an example problem with

    (p_1, q_1) = (10, 0),   (p_2, q_2) = (−10, 2),   (p_3, q_3) = (3, 9),   (p_4, q_4) = (10, 10),

and (ρ_1, ρ_2, ρ_3, ρ_4) = (8.22, 11.9, 7.08, 11.33).

p = [10; -10; 3; 10];
q = [0; 2; 9; 10];
rho = [8.22; 11.9; 7.08; 11.33];
b = rho - sqrt(p.^2+q.^2);
A = -[p./sqrt(p.^2+q.^2) q./sqrt(p.^2+q.^2)];
x = A\b;

The least-squares estimate is x̂ = (û, v̂) = (1.9676, 1.9064).
3.3 The solution of a least-squares problem
In this section we investigate conditions for uniqueness of the solution of a least-squares problem, and we derive an explicit expression for the solution. The condition for uniqueness is related to the rank of A.
3.3.1 Full rank matrices
The rank of a matrix A ∈ R^{m×n}, denoted rank(A), is the maximum number of linearly independent vectors among the columns of A. It immediately follows that the rank can never exceed the number of columns: rank(A) ≤ n.

Determining the rank of a matrix (analytically or numerically) is a difficult task, but fortunately it is rarely needed. In this course, we will only encounter the following simpler question: is rank(A) equal to n or is it less than n? (And if it is less than n, we will never be interested in the exact value.) We can answer this question by applying the definition of rank: rank(A) = n if the n columns of A are linearly independent, or in other words,

    Ax = 0  ⟹  x = 0.                                               (3.5)

(Recall that the vector Ax is a linear combination of the columns of A, with coefficients x_i. The columns are linearly dependent if there exists a nontrivial linear combination (x ≠ 0) that makes Ax = 0. The columns are linearly independent if the only way to make Ax = 0 is to choose the trivial linear combination x = 0.) If (3.5) holds, then rank(A) = n. If it does not, i.e., there exists an x ≠ 0 with Ax = 0, then rank(A) < n.

If A is square (m = n), then the condition (3.5) means that A is nonsingular. Hence, an n × n matrix is nonsingular if and only if rank(A) = n.

From linear algebra we know that rank(A) = rank(A^T). The columns of A^T are the rows of A, so the rank of A is also equal to the number of linearly independent rows in A. As a consequence, the rank of a matrix can never exceed the number of rows: rank(A) ≤ m for an m × n matrix A. To test whether rank(A) = m or rank(A) < m, we apply the condition (3.5) to A^T: rank(A) = m if and only if

    A^T x = 0  ⟹  x = 0.                                            (3.6)

To summarize, if A is an m × n matrix, then rank(A) ≤ min{m, n}. We say that A is full rank if

    rank(A) = min{m, n}.

Specifically, if A has fewer rows than columns (m < n), then it is full rank if rank(A) = m. If A has more rows than columns (m > n), then it is full rank if rank(A) = n. If A is square (m = n), full rank means the same as nonsingular. To determine whether a matrix is full rank, we can check condition (3.5) if m ≥ n, and condition (3.6) if m ≤ n.
Examples
The matrix

    A = [ 1  −1 ]
        [ 2   2 ]
        [ 1   0 ]
        [ 0   0 ]

has rank 2, so it is full rank. We can verify that (3.5) holds:

    Ax = [ 1  −1 ] [ x_1 ]   [ x_1 − x_2   ]
         [ 2   2 ] [ x_2 ] = [ 2x_1 + 2x_2 ]
         [ 1   0 ]           [ x_1         ]
         [ 0   0 ]           [ 0           ],

and Ax = 0 is only possible if x_1 = x_2 = 0. The matrix

    A = [  1  2  1  0 ]
        [ −1  2  0  0 ]

is also full rank for the same reason. The matrix

    A = [ 1  −1  3 ]
        [ 2   2  2 ]
        [ 1   0  2 ]
        [ 0   0  0 ]

is not full rank. We have

    Ax = [ 1  −1  3 ] [ x_1 ]   [ x_1 − x_2 + 3x_3    ]
         [ 2   2  2 ] [ x_2 ] = [ 2x_1 + 2x_2 + 2x_3  ]
         [ 1   0  2 ] [ x_3 ]   [ x_1 + 2x_3          ]
         [ 0   0  0 ]           [ 0                   ]

and Ax = 0 for x = (2, −1, −1).
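These claims are easy to check numerically; for example, in Matlab (a sketch using the matrices above):

A1 = [1 -1; 2 2; 1 0; 0 0];
A2 = [1 -1 3; 2 2 2; 1 0 2; 0 0 0];
rank(A1)            % 2: equals the number of columns, so A1 is full rank
rank(A2)            % 2: less than 3, so A2 is not full rank
A2*[2; -1; -1]      % the zero vector, confirming the columns are dependent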
3.3.2 The normal equations
In this section we prove the following result. If A ∈ R^{m×n} has rank n, then the solution of the least-squares problem

    minimize ‖Ax − b‖²

is unique and given by

    x̂ = (A^T A)^{−1} A^T b.                                         (3.7)
Proof
We first show that A^T A is positive definite if rank(A) = n. We have

    x^T A^T A x = (Ax)^T (Ax) = ‖Ax‖²,

so A^T A is certainly positive semidefinite (x^T A^T A x ≥ 0 for all x). Moreover if x ≠ 0, then Ax ≠ 0 (because A has rank n), and therefore ‖Ax‖ ≠ 0. This means that

    x^T A^T A x > 0

for all x ≠ 0, i.e., A^T A is positive definite. Positive definite matrices are nonsingular, so the inverse (A^T A)^{−1} in the expression (3.7) exists.
Next we show that x̂ satisfies

    ‖Ax̂ − b‖² < ‖Ax − b‖²   for all x ≠ x̂.

We start by writing

    ‖Ax − b‖² = ‖(Ax − Ax̂) + (Ax̂ − b)‖²
              = ‖Ax − Ax̂‖² + ‖Ax̂ − b‖² + 2(Ax − Ax̂)^T (Ax̂ − b),    (3.8)

where we use

    ‖u + v‖² = (u + v)^T (u + v) = u^T u + 2u^T v + v^T v = ‖u‖² + ‖v‖² + 2u^T v.
The third term in (3.8) is zero:

    (Ax − Ax̂)^T (Ax̂ − b) = (x − x̂)^T A^T (Ax̂ − b)
                          = (x − x̂)^T (A^T A x̂ − A^T b)
                          = 0,

because x̂ = (A^T A)^{−1} A^T b. With this simplification, (3.8) reduces to

    ‖Ax − b‖² = ‖A(x − x̂)‖² + ‖Ax̂ − b‖².

The first term is nonnegative for all x, and therefore

    ‖Ax − b‖² ≥ ‖Ax̂ − b‖².

Moreover we have equality only if A(x − x̂) = 0, i.e., x = x̂ (because rank(A) = n). We conclude that ‖Ax − b‖² > ‖Ax̂ − b‖² for all x ≠ x̂.
3.4 Solving LS problems by Cholesky factorization
The result in section 3.3.2 means that if rank(A) = n, we can compute the least-squares solution by solving a set of linear equations

    (A^T A) x = A^T b.                                              (3.9)

These equations are called the normal equations associated with the least-squares problem. As we have seen in the proof above, A^T A is symmetric positive definite, so we can use the Cholesky factorization.
Algorithm 3.1 Solving least-squares problems by Cholesky factorization.

given A ∈ R^{m×n} and b ∈ R^m with rank(A) = n.

1. Form C = A^T A and d = A^T b.
2. Compute the Cholesky factorization C = LL^T.
3. Solve Lz = d by forward substitution.
4. Solve L^T x = z by backward substitution.
The cost of forming C = A^T A in step 1 is roughly mn² flops. To see this, note that C is symmetric, so we only need to compute the (1/2)n(n+1) ≈ (1/2)n² elements in its lower triangular part. Each element of C is the inner product of two columns of A, and takes 2m − 1 ≈ 2m flops to compute, so the total is (1/2)n²(2m) = mn². The cost of calculating d = A^T b is 2mn. The cost of step 2 is (1/3)n³. Steps 3 and 4 each cost n². The total cost of the algorithm is therefore mn² + (1/3)n³ + 2mn + 2n², or roughly

    mn² + (1/3)n³.

Note that the condition rank(A) = n implies that m ≥ n, so the most expensive step is the matrix-matrix multiplication A^T A, and not the Cholesky factorization.
Example
We solve the least-squares problem with m = 4, n = 3, and

    A = (1/5) [ 3   6   26 ]        b = [  1 ]
              [ 4   8   −7 ]            [  1 ]
              [ 0   4   −4 ]            [ −1 ]
              [ 0   3   −3 ],           [  1 ].                     (3.10)

We first calculate C = A^T A and d = A^T b:

    C = [ 1  2   2 ]        d = [  7/5 ]
        [ 2  5   3 ]            [ 13/5 ]
        [ 2  3  30 ],           [  4   ].

To solve the normal equations

    [ 1  2   2 ] [ x_1 ]   [  7/5 ]
    [ 2  5   3 ] [ x_2 ] = [ 13/5 ]
    [ 2  3  30 ] [ x_3 ]   [  4   ],

we first compute the Cholesky factorization of C:

    C = [ 1  2   2 ]   [ 1   0  0 ] [ 1  2   2 ]
        [ 2  5   3 ] = [ 2   1  0 ] [ 0  1  −1 ]
        [ 2  3  30 ]   [ 2  −1  5 ] [ 0  0   5 ].

We then solve

    [ 1   0  0 ] [ z_1 ]   [  7/5 ]
    [ 2   1  0 ] [ z_2 ] = [ 13/5 ]
    [ 2  −1  5 ] [ z_3 ]   [  4   ]

by forward substitution and find (z_1, z_2, z_3) = (7/5, −1/5, 1/5). Finally we solve

    [ 1  2   2 ] [ x_1 ]   [  7/5 ]
    [ 0  1  −1 ] [ x_2 ] = [ −1/5 ]
    [ 0  0   5 ] [ x_3 ]   [  1/5 ]

by backward substitution. The solution is

    x̂ = (41/25, −4/25, 1/25).
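The arithmetic in this example can be reproduced numerically (a sketch, with A and b as in (3.10)):

A = (1/5)*[3 6 26; 4 8 -7; 0 4 -4; 0 3 -3];
b = [1; 1; -1; 1];
R = chol(A'*A);             % Cholesky factor of C = A'*A, with C = R'*R
x = R\(R'\(A'*b))           % returns (41/25, -4/25, 1/25) = (1.64, -0.16, 0.04)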
3.5 Solving LS problems by QR factorization
Although the Cholesky factorization method is often used in practice, it is not the
most accurate method for solving a least-squares problem, for reasons that we will
clarify later. The recommended method is based on the QR factorization of A.
3.5.1 Definition
If A ∈ R^{m×n} has rank n, then it can be factored as

    A = QR,

where Q ∈ R^{m×n} is orthogonal (Q^T Q = I) and R ∈ R^{n×n} is upper triangular with positive diagonal elements. This is called the QR factorization of A. The cost of a QR factorization is 2mn² flops.

As an example, one can verify that the matrix A given in (3.10) can be factored as A = QR with

    Q = [ 3/5   0    4/5 ]        R = [ 1  2   2 ]
        [ 4/5   0   −3/5 ]            [ 0  1  −1 ]
        [  0   4/5    0  ],           [ 0  0   5 ],                 (3.11)
        [  0   3/5    0  ]

and that Q^T Q = I.
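The factorization (3.11) can be checked directly in Matlab (a sketch; note that qr(A,0) may return factors with a different sign convention, so we verify the factors given above instead):

A = (1/5)*[3 6 26; 4 8 -7; 0 4 -4; 0 3 -3];
Q = [3/5 0 4/5; 4/5 0 -3/5; 0 4/5 0; 0 3/5 0];
R = [1 2 2; 0 1 -1; 0 0 5];
norm(Q*R - A)       % essentially zero: Q*R reproduces A
Q'*Q                % the 3x3 identity matrix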
We will see in section 3.6 how the QR factorization can be computed.
3.5.2 Orthogonal matrices
A matrix Q is orthogonal if Q^T Q = I. We have already encountered square orthogonal matrices before (in section 2.3.4). Here we extend the definition and allow Q to be non-square.

The following properties of orthogonal matrices are useful. Let Q ∈ R^{m×n} be an orthogonal matrix.

1. rank(Q) = n. We can verify that Qx = 0 implies x = 0: Qx = 0 implies Q^T Qx = 0, which implies x = 0 because Q^T Q = I.

2. m ≥ n. This follows from the first property and the fact that rank(Q) ≤ min{m, n} for any m × n matrix. In other words, an orthogonal matrix is either square (m = n) or skinny (m > n), and it is full rank.

3. ‖Qx‖ = ‖x‖ for all x, and (Qx)^T (Qy) = x^T y for all x and y. In other words, multiplying vectors with an orthogonal matrix preserves norms and inner products. As a consequence, it also preserves the angle between vectors:

    cos ∠(Qx, Qy) = (Qx)^T (Qy) / (‖Qx‖ ‖Qy‖) = x^T y / (‖x‖ ‖y‖) = cos ∠(x, y).

4. ‖Q‖ = 1. This follows from the definition of the matrix norm, and the previous property:

    ‖Q‖ = max_{x≠0} ‖Qx‖/‖x‖ = max_{x≠0} ‖x‖/‖x‖ = 1.

5. If m = n, then Q^{−1} = Q^T and we also have QQ^T = I.

It is important to keep in mind that the last property (QQ^T = I) only holds for square orthogonal matrices. If m > n, then QQ^T ≠ I.
3.5.3 Solving least-squares problems by QR factorization
We now return to the least-squares problem (3.1). We have seen in section 3.3.2 that if rank(A) = n, then the least-squares solution can be found by solving the normal equations (3.9). The QR factorization provides an alternative method for solving the normal equations.

The normal equations

    A^T A x = A^T b

simplify if we replace A by A = QR, and use the property that Q^T Q = I. We have

    A^T A = (QR)^T (QR) = R^T Q^T Q R = R^T R,

so the normal equations reduce to

    R^T R x = R^T Q^T b.

The matrix R is invertible (it is upper triangular with positive diagonal elements), so we can multiply both sides on the left with (R^T)^{−1} and obtain

    Rx = Q^T b.
This suggests the following method for solving the least-squares problem.
Algorithm 3.2 Solving least-squares problems by QR factorization.

given A ∈ R^{m×n} and b ∈ R^m with rank(A) = n.

1. Compute the QR factorization A = QR.
2. Compute d = Q^T b.
3. Solve Rx = d by backward substitution.
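In Matlab the three steps might be written as follows (a sketch; qr(A,0) computes the economy-size factorization, whose R is not guaranteed to have a positive diagonal, but the computed solution is the same):

[Q, R] = qr(A, 0);    % step 1: economy-size QR factorization
d = Q'*b;             % step 2
x = R\d;              % step 3: backward substitution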
The cost of step 1 (QR factorization) is 2mn². Step 2 costs 2mn, and step 3 costs n², so the total cost is 2mn² + 2mn + n², or roughly

    2mn².

We see that the QR factorization method is always slower than the Cholesky factorization method (which costs mn² + (1/3)n³). It is about twice as slow if m ≫ n.
Example
As an example, we can solve the least-squares problem with A and b given in (3.10), using the QR factorization of A given in (3.11). We have

    Q^T b = [ 3/5   4/5    0    0  ] [  1 ]   [  7/5 ]
            [  0     0    4/5  3/5 ] [  1 ] = [ −1/5 ]
            [ 4/5  −3/5    0    0  ] [ −1 ]   [  1/5 ].
                                     [  1 ]

To find the solution, we solve Rx = Q^T b, i.e.,

    [ 1  2   2 ] [ x_1 ]   [  7/5 ]
    [ 0  1  −1 ] [ x_2 ] = [ −1/5 ]
    [ 0  0   5 ] [ x_3 ]   [  1/5 ]

by backward substitution. The solution is x̂ = (41/25, −4/25, 1/25).
3.6 Computing the QR factorization
We now turn to the question of computing the QR factorization of a matrix A ∈ R^{m×n} with rank n.

We start by partitioning A, Q, R as follows:

    A = [ a_1  A_2 ],   Q = [ q_1  Q_2 ],   R = [ r_11  R_12 ]
                                                [  0    R_22 ],     (3.12)

where a_1 ∈ R^{m×1}, A_2 ∈ R^{m×(n−1)}, q_1 ∈ R^{m×1}, Q_2 ∈ R^{m×(n−1)}, r_11 ∈ R, R_12 ∈ R^{1×(n−1)}, R_22 ∈ R^{(n−1)×(n−1)}. We want Q to be orthogonal, i.e.,

    Q^T Q = [ q_1^T ] [ q_1  Q_2 ] = [ q_1^T q_1   q_1^T Q_2 ] = [ 1  0 ]
            [ Q_2^T ]                [ Q_2^T q_1   Q_2^T Q_2 ]   [ 0  I ],

which gives the following conditions on q_1 and Q_2:

    q_1^T q_1 = 1,   q_1^T Q_2 = 0,   Q_2^T Q_2 = I.                (3.13)

In addition we want R upper triangular with positive diagonal elements, so r_11 > 0, and R_22 is upper triangular with positive diagonal elements.
To determine Q and R we write the identity A = QR in terms of the partitioned matrices:

    [ a_1  A_2 ] = [ q_1  Q_2 ] [ r_11  R_12 ] = [ q_1 r_11   q_1 R_12 + Q_2 R_22 ].
                                [  0    R_22 ]

Comparing the first columns we see that a_1 = q_1 r_11. The vector q_1 must have unit norm (see the first condition in (3.13)), and r_11 must be positive. We therefore choose

    r_11 = ‖a_1‖,   q_1 = (1/r_11) a_1.
The next step is to determine R_12. Comparing the remaining columns of the partitioned identity, we have

    A_2 = q_1 R_12 + Q_2 R_22.

Multiplying both sides of the equality on the left with q_1^T, and using the fact that q_1^T q_1 = 1 and q_1^T Q_2 = 0 (from (3.13)), we obtain

    q_1^T A_2 = q_1^T q_1 R_12 + q_1^T Q_2 R_22 = R_12,

so we can compute R_12 as R_12 = q_1^T A_2. Finally, we compute Q_2, R_22 from the identity

    A_2 − q_1 R_12 = Q_2 R_22.                                      (3.14)

The matrix on the left is known, because we have already computed q_1 and R_12. The matrix Q_2 on the righthand side must be orthogonal (Q_2^T Q_2 = I), and R_22 must be upper triangular with positive diagonal. We can therefore interpret (3.14) as the QR factorization of a matrix of size m × (n − 1).
Continuing recursively, we arrive at the QR factorization of an m × 1 matrix, which is straightforward: if A ∈ R^{m×1}, then it can be factored as A = QR with Q = (1/‖A‖)A and R = ‖A‖.

This algorithm is called the modified Gram-Schmidt method. Referring to the notation of (3.12) we can summarize the algorithm as follows.
Algorithm 3.3 Modified Gram-Schmidt method for QR factorization.

given A ∈ R^{m×n} with rank(A) = n.

1. r_11 = ‖a_1‖.
2. q_1 = (1/r_11) a_1.
3. R_12 = q_1^T A_2.
4. Compute the QR factorization A_2 − q_1 R_12 = Q_2 R_22.

It can be shown that the total cost is 2mn² flops.
In step 1, we can be assured that ‖a_1‖ ≠ 0, because A has rank n, so its first column is nonzero. The matrix A_2 − q_1 R_12 in step 4 has rank n − 1 if A has rank n. Indeed, suppose x ≠ 0. Then

    (A_2 − q_1 R_12) x = A_2 x − (1/r_11) a_1 R_12 x
                       = [ a_1  A_2 ] [ −(1/r_11) R_12 x ]  ≠ 0,
                                      [         x        ]

because A = [ a_1  A_2 ] has rank n.
Example
We apply the modified Gram-Schmidt method to the matrix

    A = (1/5) [ 3   6   26 ]
              [ 4   8   −7 ]
              [ 0   4   −4 ]
              [ 0   3   −3 ].

We will denote the three columns as a_1, a_2, a_3, and factor A as

    [ a_1  a_2  a_3 ] = [ q_1  q_2  q_3 ] [ r_11  r_12  r_13 ]
                                          [  0    r_22  r_23 ]
                                          [  0     0    r_33 ]
                      = [ q_1 r_11    q_1 r_12 + q_2 r_22    q_1 r_13 + q_2 r_23 + q_3 r_33 ],

where the vectors q_i are mutually orthogonal and have unit norm, and the diagonal elements r_ii are positive.

We start with q_1 and r_11, which must satisfy a_1 = q_1 r_11 with ‖q_1‖ = 1. Therefore

    r_11 = ‖a_1‖ = 1,   q_1 = (1/r_11) a_1 = (3/5, 4/5, 0, 0).
Next we find r_12 and r_13 by multiplying

    [ a_2  a_3 ] = [ q_1 r_12 + q_2 r_22    q_1 r_13 + q_2 r_23 + q_3 r_33 ]

on the left with q_1^T and using q_1^T q_1 = 1, q_1^T q_2 = 0, q_1^T q_3 = 0:

    q_1^T [ a_2  a_3 ] = q_1^T [ q_1 r_12 + q_2 r_22    q_1 r_13 + q_2 r_23 + q_3 r_33 ] = [ r_12  r_13 ].

We have just computed q_1, so we can evaluate the matrix on the left, and find r_12 = 2, r_13 = 2. This concludes steps 1–3 in the algorithm.
The next step is to factor the matrix

    [ ã_2  ã_3 ] = [ a_2 − q_1 r_12   a_3 − q_1 r_13 ] = [  0     4   ]
                                                         [  0    −3   ]
                                                         [ 4/5   −4/5 ]
                                                         [ 3/5   −3/5 ]

as

    [ ã_2  ã_3 ] = [ q_2  q_3 ] [ r_22  r_23 ] = [ q_2 r_22    q_2 r_23 + q_3 r_33 ].    (3.15)
                                [  0    r_33 ]
We start with q_2 and r_22. We want ã_2 = q_2 r_22 with ‖q_2‖ = 1, so we take

    r_22 = ‖ã_2‖ = 1,   q_2 = (1/r_22) ã_2 = (0, 0, 4/5, 3/5).
Next, we determine r_23 from the second column in (3.15),

    ã_3 = q_2 r_23 + q_3 r_33.

If we multiply on the left with q_2^T and use the fact that q_2^T q_2 = 1, q_2^T q_3 = 0, we find

    q_2^T ã_3 = q_2^T (q_2 r_23 + q_3 r_33) = r_23.

We know q_2 and ã_3, so we can evaluate the inner product on the lefthand side, which gives r_23 = −1.
It remains to determine q_3 and r_33. At this point we know q_2 and r_23, so we can evaluate ã_3 − q_2 r_23, and from the last column in (3.15),

    ã_3 − q_2 r_23 = (4, −3, 0, 0) = q_3 r_33,

with ‖q_3‖ = 1. Hence r_33 = 5 and

    q_3 = (4/5, −3/5, 0, 0).
Putting everything together, the factorization of A is

    (1/5) [ 3   6   26 ]   [ 3/5   0    4/5 ] [ 1  2   2 ]
          [ 4   8   −7 ] = [ 4/5   0   −3/5 ] [ 0  1  −1 ]
          [ 0   4   −4 ]   [  0   4/5    0  ] [ 0  0   5 ].
          [ 0   3   −3 ]   [  0   3/5    0  ]
3.7 Comparison of Cholesky and QR factorization
In this final section we compare the relative merits of the two methods for solving least-squares problems.
3.7.1 Accuracy
The Cholesky factorization method for solving least-squares problems is less accurate than the QR factorization method, especially when the condition number of the matrix A^T A is high.

An example will illustrate the difference. We take

    A = [ 1      1      ]        b = [ 0       ]
        [ 0      10^{−5} ],          [ 10^{−5} ]
        [ 0      0      ]            [ 1       ].
The normal equations are

    [ 1   1            ] [ x_1 ]   [ 0        ]
    [ 1   1 + 10^{−10} ] [ x_2 ] = [ 10^{−10} ],                    (3.16)

and it is easily verified that the solution is x_1 = −1, x_2 = 1. We will solve the problem using both methods, but introduce small errors by rounding the intermediate results to eight significant decimal digits.
We first consider the Cholesky factorization method. Rounding the elements of A^T A and A^T b in (3.16) to eight digits yields

    [ 1  1 ] [ x_1 ]   [ 0        ]
    [ 1  1 ] [ x_2 ] = [ 10^{−10} ].

(Only one element changes: 1 + 10^{−10} is replaced by 1.) This set of equations is unsolvable because the coefficient matrix is singular, so the Cholesky factorization method fails.
In the QR factorization method we start by factoring A as

    A = [ 1   1       ]   [ 1  0 ]
        [ 0   10^{−5} ] = [ 0  1 ] [ 1   1       ]
        [ 0   0       ]   [ 0  0 ] [ 0   10^{−5} ].
Rounding to eight decimal digits does not change the values of Q and R. We then form the equations Rx = Q^T b:

    [ 1   1       ] [ x_1 ]   [ 0       ]
    [ 0   10^{−5} ] [ x_2 ] = [ 10^{−5} ].

Rounding to eight digits again does not change any of the values. We solve the equations by backward substitution and obtain the solution x_1 = −1, x_2 = 1.

In this example, the QR factorization method finds the correct solution, while the Cholesky factorization method simply fails.
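This behavior can be reproduced in Matlab, where double-precision rounding plays the role of the eight-digit rounding above; a sketch (with 10^{−8} in place of 10^{−5}, so that 1 + 10^{−16} rounds to 1 in double precision):

e = 1e-8;
A = [1 1; 0 e; 0 0];  b = [0; e; 1];
x_qr = A\b                % QR-based method: returns approximately (-1, 1)
x_ne = (A'*A)\(A'*b)      % normal equations: A'*A rounds to a singular matrix,
                          % so this warns and gives a meaningless result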
3.7.2 Efficiency
For dense matrices, the cost of the Cholesky factorization method is mn² + (1/3)n³, while the cost of the QR factorization method is 2mn². The QR factorization method is slower by a factor of at most 2 (if m ≫ n). For small and medium-size problems, the factor of two does not outweigh the difference in accuracy, and the QR factorization is the recommended method.

When A is large and sparse, the difference in efficiency can be much larger than a factor of two. As we have mentioned in section 2.7, there exist very efficient methods for the Cholesky factorization of a sparse positive definite matrix. When A is sparse, then usually A^T A is sparse (and can be calculated in much less than mn² operations), and we can solve the normal equations using a sparse Cholesky factorization (at a cost much less than (1/3)n³).

Exploiting sparsity in the QR factorization method is much more difficult. The matrices Q and R are usually quite dense, even when the matrix A is sparse. This makes the method impractical for large sparse least-squares problems. The Cholesky factorization method is therefore widely used when A is large and sparse, despite its lower accuracy.
Exercises
Definition
3.1 Formulate the following problems as least-squares problems. For each problem, give a matrix A and a vector b, such that the problem can be expressed as

    minimize ‖Ax − b‖².

(You do not have to solve the problems.)

(a) Minimize x_1² + 2x_2² + 3x_3² + (x_1 − x_2 + x_3 − 1)² + (x_1 − 4x_2 + 2)².
(b) Minimize (6x_2 + 4)² + (4x_1 + 3x_2 − 1)² + (x_1 + 8x_2 − 3)².
(c) Minimize 2(6x_2 + 4)² + 3(4x_1 + 3x_2 − 1)² + 4(x_1 + 8x_2 − 3)².
(d) Minimize x^T x + ‖Bx − d‖², where B ∈ R^{p×n} and d ∈ R^p are given.
(e) Minimize ‖Bx − d‖² + 2‖Fx − g‖², where B ∈ R^{p×n}, F ∈ R^{l×n}, d ∈ R^p and g ∈ R^l are given.
(f) Minimize x^T Dx + ‖Bx − d‖², where D ∈ R^{n×n} is diagonal with positive diagonal elements, B ∈ R^{p×n}, and d ∈ R^p. D, B and d are given.
3.2 Formulate the following problem as a least-squares problem. Find a polynomial

    p(t) = x_1 + x_2 t + x_3 t² + x_4 t³

that satisfies the following conditions.

- The values p(t_i) at 4 given points t_i in the interval [0, 1] should be approximately equal to given values y_i:

    p(t_i) ≈ y_i,   i = 1, . . . , 4.

  The points t_i are given and distinct (t_i ≠ t_j for i ≠ j). The values y_i are also given.

- The derivatives of p at t = 0 and t = 1 should be small:

    p′(0) ≈ 0,   p′(1) ≈ 0.

- The average value of p over the interval [0, 1] should be approximately equal to the value at t = 1/2:

    ∫_0^1 p(t) dt ≈ p(1/2).

To determine coefficients x_i that satisfy these conditions, we minimize

    E(x) = (1/4) Σ_{i=1}^4 (p(t_i) − y_i)² + p′(0)² + p′(1)² + ( ∫_0^1 p(t) dt − p(1/2) )².

Give A and b such that E(x) = ‖Ax − b‖². Clearly state the dimensions of A and b, and what their elements are.
Applications
3.3 Moore's law. The figure and the table show the number of transistors in 13 microprocessors, and the year of their introduction (from www.intel.com/technology/silicon/mooreslaw.htm).

    year   transistors
    1971         2,250
    1972         2,500
    1974         5,000
    1978        29,000
    1982       120,000
    1985       275,000
    1989     1,180,000
    1993     3,100,000
    1997     7,500,000
    1999    24,000,000
    2000    42,000,000
    2002   220,000,000
    2003   410,000,000

These numbers are also available as [t,n] = ch3ex3.m (t is the first column of the table (introduction year) and n is the second column (number of transistors)). The plot was produced by the Matlab command semilogy(t,n,'o').

[Figure: the number of transistors n versus the year t, plotted on a logarithmic vertical scale.]
The plot suggests that we can obtain a good fit with a function of the form

    n(t) = α^{t − t_0},

where t is the year, and n is the number of transistors. This is a straight line if we plot n(t) on a logarithmic scale versus t on a linear scale. In this problem we use least-squares to estimate the parameters α and t_0.

Explain how you would use least-squares to find α and t_0 such that

    n_i ≈ α^{t_i − t_0},   i = 1, . . . , 13,

and solve the least-squares problem in Matlab. Compare your result with Moore's law, which states that the number of transistors per integrated circuit roughly doubles every two years.

Remark. The Matlab command to solve a least-squares problem

    minimize ‖Ax − b‖²

is x = A\b, i.e., the same command as for solving a set of linear equations. In other words, the meaning of the backslash operator depends on the context. If A is a square matrix, then A\b solves the linear set of equations Ax = b; if A is rectangular with more rows than columns, it solves the least-squares problem.
3.4 The figure shows a planar spiral inductor, implemented in CMOS, for use in RF circuits. The inductor is characterized by four key parameters:

- n, the number of turns (which is a multiple of 1/4, but that needn't concern us)
- w, the width of the wire
- d, the inner diameter
- D, the outer diameter

[Figure: a planar spiral inductor, with the wire width w, the inner diameter d, and the outer diameter D labeled.]

The inductance L of such an inductor is a complicated function of the parameters n, w, d, and D. It can be found by solving Maxwell's equations, which takes considerable computer time, or by fabricating the inductor and measuring the inductance. In this problem you will develop a simple approximate inductance model of the form

    L̂ = α n^{β_1} w^{β_2} d^{β_3} D^{β_4},

where α, β_1, β_2, β_3, β_4 ∈ R are constants that characterize the approximate model. (Since L is positive, we have α > 0, but the constants β_2, . . . , β_4 can be negative.) This simple approximate model, if accurate enough, can be used for design of planar spiral inductors.

The file ch3ex4.m contains data for 50 inductors, obtained from measurements. Download the file, and execute it in Matlab using [n,w,d,D,L] = ch3ex4. This generates 5 vectors n, w, d, D, L of length 50. The ith elements of these vectors are the parameters n_i, w_i (in μm), d_i (in μm), D_i (in μm) and the inductance L_i (in nH) for inductor i. Thus, for example, w_13 gives the wire width of inductor 13.

Your task is to find α, β_1, . . . , β_4 so that

    L̂_i = α n_i^{β_1} w_i^{β_2} d_i^{β_3} D_i^{β_4} ≈ L_i   for i = 1, . . . , 50.

Your solution must include a clear description of how you found your parameters, as well as their actual numerical values.

Note that we have not specified the criterion that you use to judge the approximate model (i.e., the fit between L̂_i and L_i); we leave that to your judgment.

We can define the percentage error between L̂_i and L_i as

    e_i = 100 |L̂_i − L_i| / L_i.

Find the average percentage error for the 50 inductors, i.e., (e_1 + · · · + e_50)/50, for your model. (We are only asking you to find the average percentage error for your model; we do not require that your model minimize the average percentage error.)

Remark. For details on solving least-squares problems in Matlab, see the remark at the end of exercise 3.3.
3.5 We have N points in R², and a list of pairs of points that must be connected by links. The positions of some of the N points are fixed; our task is to determine the positions of the remaining points. The objective is to place the points so that some measure of the total interconnection length of the links is minimized. As an example application, we can think of the points as locations of plants or warehouses of a company, and the links as the routes over which goods must be shipped. The goal is to find locations that minimize the total transportation cost. In another application, the points represent the position of modules or cells on an integrated circuit, and the links represent wires that connect pairs of cells. Here the goal might be to place the cells in such a way that the total length of wire used to interconnect the cells is minimized.

The problem can be described in terms of a graph with N nodes, representing the N points. With each free node we associate a variable (u_i, v_i) ∈ R², which represents its location or position.

In this problem we will consider the example shown in the figure below.

[Figure: three free points (u_1, v_1), (u_2, v_2), (u_3, v_3), four fixed points, and seven links with lengths l_1, . . . , l_7.]

Here we have 3 free points with coordinates (u_1, v_1), (u_2, v_2), (u_3, v_3). We have 4 fixed points, with coordinates (1, 0), (0.5, 1), (0, 1), and (1, 0.5). There are 7 links, with lengths l_1, l_2, . . . , l_7. We are interested in finding the coordinates (u_1, v_1), (u_2, v_2) and (u_3, v_3) that minimize the total squared length

    l_1² + l_2² + l_3² + l_4² + l_5² + l_6² + l_7².
(a) Formulate this problem as a least-squares problem

    minimize ‖Ax − b‖²

where the vector x ∈ R⁶ contains the six variables u_1, u_2, u_3, v_1, v_2, v_3. Give the coefficient matrix A and the vector b.

(b) Show that you can also obtain the optimal coordinates by solving two smaller least-squares problems

    minimize ‖Ãu − b̃‖²,   minimize ‖Āv − b̄‖²,

where u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3). Give the coefficient matrices Ã, Ā and the vectors b̃, b̄. What is the relation between Ã and Ā?

(c) Solve the least-squares problems derived in part (a) or (b) using Matlab.
3.6 Least-squares model fitting. In this problem we use least-squares to fit several different types of models to a given set of input-output data. The data set consists of a scalar input sequence u(1), u(2), . . . , u(N), and a scalar output sequence y(1), y(2), . . . , y(N), with N = 100. The signals are shown in the following plots.

[Figure: the input u(t) and the output y(t), plotted versus t = 1, . . . , 100.]
We will develop and compare seven different models that relate the signals u and y. The models range in complexity from a simple constant to a nonlinear dynamic model:

(a) constant model: y(t) = α
(b) static linear: y(t) = βu(t)
(c) static affine: y(t) = α + βu(t)
(d) static quadratic: y(t) = α + βu(t) + γu(t)²
(e) linear, 2-tap: y(t) = β_1 u(t) + β_2 u(t − 1)
(f) affine, 2-tap: y(t) = α + β_1 u(t) + β_2 u(t − 1)
(g) quadratic, 2-tap: y(t) = α + β_1 u(t) + γ_1 u(t)² + β_2 u(t − 1) + γ_2 u(t − 1)² + δ u(t)u(t − 1).

The first four models are memoryless. In a memoryless model the output at time t, i.e., y(t), depends only on the input at time t, i.e., u(t). Another common term for such a model is static.

In a dynamic model, y(t) depends on u(s) for some s ≠ t. Models (e), (f), and (g) are dynamic models, in which the current output depends on the current input and the previous input. Such models are said to have a finite memory of length one. Another term is 2-tap system (the taps refer to taps on a delay line).

Each of the models is specified by a number of parameters, i.e., the scalars α, β, etc. You are asked to find least-squares estimates (α̂, β̂, . . . ) for the parameters, i.e., the values that minimize the sum-of-squares of the errors between predicted outputs and actual outputs. Your solutions should include:

- a clear description of the least-squares problems that you solve
- the computed values of the least-squares estimates of the parameters
- a plot of the predicted output ŷ(t)
- a plot of the residual y(t) − ŷ(t)
- the root-mean-square (RMS) residual, i.e., the squareroot of the mean of the squared residuals.
For example, the affine 2-tap model (part (f)) depends on three parameters α, β_1, and β_2. The least-squares estimates α̂, β̂_1, β̂_2 are found by minimizing

    Σ_{t=2}^N (y(t) − α − β_1 u(t) − β_2 u(t − 1))².

(Note that we start at t = 2 so u(t − 1) is defined.) You are asked to formulate this as a least-squares problem, solve it to find α̂, β̂_1, and β̂_2, plot the predicted output

    ŷ(t) = α̂ + β̂_1 u(t) + β̂_2 u(t − 1),

and the residual r(t) = y(t) − ŷ(t), for t = 2, . . . , N, and give the value of the RMS residual

    R_rms = ( (1/(N − 1)) Σ_{t=2}^N (y(t) − ŷ(t))² )^{1/2}.
The data for the problem are available from the class webpage in the m-file ch3ex6.m. The command is [u,y] = ch3ex6.

A final note: the sequences u, y are not generated by any of the models above. They are generated by a nonlinear recursion, with infinite (but rapidly fading) memory.
3.7 The figure shows an illumination system of n lamps illuminating m flat patches. The variables in the problem are the lamp powers x_1, . . . , x_n, which can vary between 0 and 1.

[Figure: lamp j at distance r_ij from the midpoint of patch i; θ_ij is the angle between the upward normal of patch i and the direction from patch i to lamp j.]

The illumination intensity at (the midpoint of) patch i is denoted I_i. We will use a simple linear model for the illumination intensities I_i as a function of the lamp powers x_j: for i = 1, . . . , m,

    I_i = Σ_{j=1}^n a_ij x_j.
The matrix A (with coefficients a_ij) is available from the class webpage (see below), and was constructed as follows. We take

    a_ij = r_ij^{−2} max{cos θ_ij, 0},

where r_ij denotes the distance between lamp j and the midpoint of patch i, and θ_ij denotes the angle between the upward normal of patch i and the vector from the midpoint of patch i to lamp j, as shown in the figure. This model takes into account self-shading (i.e., the fact that a patch is illuminated only by lamps in the halfspace it faces) but not shading of one patch caused by another. Of course we could use a more complex illumination model, including shading and even reflections. This just changes the matrix relating the lamp powers to the patch illumination levels.

The problem is to determine lamp powers that make the illumination levels I_i close to a given desired level I_des. In other words, we want to choose x ∈ R^n such that

    Σ_{j=1}^n a_ij x_j ≈ I_des,   i = 1, . . . , m,

but we also have to observe the power limits 0 ≤ x_j ≤ 1. This is an example of a constrained optimization problem. The objective is to achieve an illumination level that is as uniform as possible; the constraint is that the components of x must satisfy 0 ≤ x_j ≤ 1. Finding the exact solution of this minimization problem requires specialized numerical techniques for constrained optimization. However, we can solve it approximately using least-squares.

In this problem we consider two approximate methods that are based on least-squares, and compare them for the data generated by [A,Ides] = ch3ex7. The elements of A are the coefficients a_ij. In this example we have m = 11, n = 7, so A ∈ R^{11×7}, and I_des = 2.
(a) Saturate the least-squares solution. The first method is simply to ignore the bounds on the lamp powers. We solve the least-squares problem

    minimize Σ_{i=1}^m ( Σ_{j=1}^n a_ij x_j − I_des )²,

ignoring the constraints 0 ≤ x_j ≤ 1. If we are lucky, the solution will satisfy the bounds 0 ≤ x_j ≤ 1, for j = 1, . . . , n. If not, we replace x_j with zero if x_j < 0 and with one if x_j > 1.

Apply this method to the problem data generated by ch3ex7.m, and calculate the resulting value of the cost function Σ_{i=1}^m (I_i − I_des)².
(b) Weighted least-squares. The second method is to solve the problem

    minimize Σ_{i=1}^m ( Σ_{j=1}^n a_ij x_j − I_des )² + ρ Σ_{j=1}^n (x_j − 0.5)²,

where the constant ρ ≥ 0 is used to attach a cost to the deviation of the powers from the value 0.5, which lies in the middle of the power limits. For ρ = 0, this is the same least-squares problem as in part (a). If we take ρ large enough, the solution of this problem will satisfy 0 ≤ x_j ≤ 1.

Formulate this problem as a least-squares problem in the variables x, and solve it for ρ = 1, ρ = 2, ρ = 3, etc., until you find a value of ρ such that all components of the solution x satisfy 0 ≤ x_j ≤ 1. For that solution x, calculate the cost function Σ_{i=1}^m (I_i − I_des)² and compare with the value you obtained in part (a).
3.8 De-noising using least-squares. The figure shows a signal of length 1000, corrupted with noise. We are asked to estimate the original signal. This is called signal reconstruction, or de-noising, or smoothing. In this problem we apply a smoothing method based on least-squares.

[Figure: the corrupted signal x_cor,i plotted versus i = 1, . . . , 1000.]

We will represent the corrupted signal as a vector x_cor ∈ R^{1000}. (The values can be downloaded from the class webpage, as xcor = ch3ex8.) The estimated signal (i.e., the variable in the problem) will be represented as a vector x̂ ∈ R^{1000}.
The idea of the method is as follows. We assume that the noise in the signal is the small and rapidly varying component. To reconstruct the signal, we decompose x_cor in two parts

    x_cor = x̂ + v,

where v is small and rapidly varying, and x̂ is close to x_cor (x̂ ≈ x_cor) and slowly varying (x̂_{i+1} ≈ x̂_i). We can achieve such a decomposition by choosing x̂ as the solution of the least-squares problem

    minimize ‖x − x_cor‖² + μ Σ_{i=1}^{999} (x_{i+1} − x_i)²,       (3.17)

where μ is a positive constant. The first term ‖x − x_cor‖² measures how much x deviates from x_cor. The second term, Σ_{i=1}^{999} (x_{i+1} − x_i)², penalizes rapid changes of the signal between two samples. By minimizing a weighted sum of both terms, we obtain an estimate x̂ that is close to x_cor (i.e., has a small value of ‖x̂ − x_cor‖²) and varies slowly (i.e., has a small value of Σ_{i=1}^{999} (x̂_{i+1} − x̂_i)²). The parameter μ is used to adjust the relative weight of both terms.
Problem (3.17) is a least-squares problem, because it can be expressed as

    minimize ‖Ax − b‖²

where

    A = [ I    ],        b = [ x_cor ]
        [ √μ D ]             [ 0     ],

and D is defined as

    D = [ −1   1   0   0  · · ·   0   0   0   0 ]
        [  0  −1   1   0  · · ·   0   0   0   0 ]
        [  0   0  −1   1  · · ·   0   0   0   0 ]
        [  ...                              ... ]
        [  0   0   0   0  · · ·  −1   1   0   0 ]
        [  0   0   0   0  · · ·   0  −1   1   0 ]
        [  0   0   0   0  · · ·   0   0  −1   1 ]  ∈ R^{999×1000}.
The matrix A is quite large (1999 × 1000), but also very sparse, so we will solve the least-squares problem using the Cholesky factorization method instead of the QR-factorization method. You should verify that the normal equations are given by

    (I + μ D^T D) x = x_cor.                                        (3.18)

Matlab (but not Octave) provides special routines for solving sparse linear equations, and they are used as follows. There are two types of matrices: full (or dense) and sparse. If you define a matrix, it is considered full by default, unless you specify that it is sparse. You can convert a full matrix to sparse format using the command A = sparse(A), and a sparse matrix to full format using the command A = full(A).

When you type x = A\b where A is n × n, Matlab chooses different algorithms depending on the type of A. If A is full it uses the standard method for general matrices (LU or Cholesky factorization, depending on whether A is symmetric positive definite or not). If A is sparse, it uses an LU or Cholesky factorization algorithm that takes advantage of sparsity. In our application, the matrix I + μD^T D is sparse (in fact tridiagonal), so if we make sure to define it as a sparse matrix, the normal equations will be solved much more quickly than if we ignore the sparsity.

Matlab's command to create a sparse zero matrix of dimension m × n is A = sparse(m,n). The command A = speye(n) creates a sparse n × n identity matrix. If you add or multiply sparse matrices, the result is automatically considered sparse.

This means you can solve the normal equations (3.18) by the following Matlab code (assuming mu and xcor are defined):
D = sparse(999,1000);
D(:,1:999) = -speye(999);
D(:,2:1000) = D(:,2:1000) + speye(999);
xhat = (speye(1000) + mu*D'*D) \ xcor;
If you are using Octave, you can use the dense counterparts of these commands:
D = zeros(999,1000);
D(:,1:999) = -eye(999);
D(:,2:1000) = D(:,2:1000) + eye(999);
xhat = (eye(1000) + mu*D'*D) \ xcor;
This may be slower but it works.

Solve the least-squares problem (3.17) with the vector x_cor defined in ch3ex8.m, for three values of μ: μ = 1, μ = 100, and μ = 10000. Plot the three reconstructed signals x̂. Discuss the effect of μ on the quality of the estimate x̂.
The solution of a least-squares problem
3.9 Are the following matrices full rank?

(a) A = [ 1  2 ]
        [ 3  6 ]
        [ 2  1 ]

(b) A = [ 1  3  2 ]
        [ 2  6  1 ]
    (i.e., the transpose of the matrix in part (a))

(c) A = [ D ]
        [ B ],
    where B ∈ R^{m×n} and D ∈ R^{n×n} is diagonal with nonzero diagonal elements. We make no assumptions about the rank of B.

(d) A = I − U where U ∈ R^{n×n} with ‖U‖ < 1.
3.10 Let A ∈ R^{m×n} with m ≥ n and rank(A) = n.

(a) Show that the (m + n) × (m + n) matrix

    [ I    A ]
    [ A^T  0 ]

is nonsingular.

(b) Show that the solution x ∈ R^m, y ∈ R^n of the set of linear equations

    [ I    A ] [ x ]   [ b ]
    [ A^T  0 ] [ y ] = [ 0 ]

is given by x = b − Ax_ls and y = x_ls, where x_ls is the solution of the least-squares problem

    minimize ‖Ax − b‖².
3.11 Consider the set of p + q linear equations in p + q variables

    [ I     A ] [ y ]   [ b ]
    [ A^T  −I ] [ x ] = [ c ].

A ∈ R^{p×q}, b ∈ R^p, and c ∈ R^q are given. The variables are x ∈ R^q and y ∈ R^p.

(a) Show that the coefficient matrix

    [ I     A ]
    [ A^T  −I ]

is nonsingular, regardless of the rank and the dimensions of A.

(b) From part (a) we know that the solution x, y is unique. Show that x minimizes ‖Ax − b‖² + ‖x + c‖².
The QR factorization
3.12 QR factorization. What is the QR factorization of the matrix

    A = [ 2  8  13 ]
        [ 4  7   7 ]
        [ 4  2  13 ]?

You can use Matlab to check your answer, but you must provide the details of all intermediate steps on paper.
3.13 Explain how you can solve the following problems using the QR factorization.

(a) Find x ∈ R^n that minimizes

    ‖Ax − b_1‖² + ‖Ax − b_2‖².

The problem data are A ∈ R^{m×n}, b_1 ∈ R^m and b_2 ∈ R^m. The matrix A has rank n. If you know several methods, give the most efficient one.

(b) Find x_1 ∈ R^n and x_2 ∈ R^n that minimize

    ‖Ax_1 − b_1‖² + ‖Ax_2 − b_2‖².

The problem data are A ∈ R^{m×n}, b_1 ∈ R^m, and b_2 ∈ R^m. rank(A) = n.
3.14 Cholesky factorization versus QR factorization. In this problem we compare the accuracy of the two methods for solving a least-squares problem

    minimize ‖Ax − b‖².

We will take

    A = [ 1        1       ]        b = [ 10^{−k}     ]
        [ 10^{−k}  0       ],           [ 1 + 10^{−k} ]
        [ 0        10^{−k} ]            [ 1 − 10^{−k} ],

for k = 6, k = 7 and k = 8.

(a) Write the normal equations, and solve them analytically (i.e., on paper, without using Matlab).

(b) Solve the least-squares problem in Matlab, for k = 6, k = 7 and k = 8, using the recommended method x = A\b. This method is based on the QR factorization.

(c) Repeat part (b), using the Cholesky factorization method, i.e., x = (A'*A)\(A'*b). (We assume that Matlab recognizes that A^T A is symmetric positive definite, and uses the Cholesky factorization to solve A^T A x = A^T b.) Compare the results of this method with the results of parts (a) and (b).

Remark. Type format long to make Matlab display more than five digits.
3.15 Suppose x̂ is the solution of the least-squares problem

    minimize ‖Ax − b‖²

where A is an m × n matrix with rank(A) = n and b ∈ R^m.

(a) Show that the solution of the problem

    minimize ‖Ay − b‖² + (c^T y − d)²

with variable y (where c ∈ R^n, and d ∈ R) is given by

    ŷ = x̂ + ( (d − c^T x̂) / (1 + c^T (A^T A)^{−1} c) ) (A^T A)^{−1} c.

(b) Describe an efficient method for computing x̂ and ŷ, given A, b, c and d, using the QR factorization of A. Clearly describe the different steps in your algorithm. Give a flop count for each step and a total flop count. In your total flop count, include all terms that are cubic (n³, mn², m²n, m³) and quadratic (m², mn, n²). If you know several methods, give the most efficient one.