
Linear regression models in matrix terms

The regression function in matrix terms

Simple linear regression function

\[
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad \text{for } i = 1, \ldots, n
\]

That is:

\[
Y_1 = \beta_0 + \beta_1 x_1 + \varepsilon_1
\]
\[
Y_2 = \beta_0 + \beta_1 x_2 + \varepsilon_2
\]
\[
\vdots
\]
\[
Y_n = \beta_0 + \beta_1 x_n + \varepsilon_n
\]

Simple linear regression function in matrix notation

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
\]

That is, $Y = X\beta + \varepsilon$.

Definition of a matrix

An $r \times c$ matrix is a rectangular array of symbols or numbers arranged in r rows and c columns. A matrix is almost always denoted by a single capital letter in boldface type. For example:

\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 80 & 3.4 \\ 1 & 92 & 3.1 \\ 1 & 65 & 2.5 \\ 1 & 71 & 2.8 \\ 1 & 40 & 1.9 \end{bmatrix}
\qquad
X = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \\ 1 & x_{51} & x_{52} \\ 1 & x_{61} & x_{62} \end{bmatrix}
\]

Definition of a vector and a scalar

A column vector is an $r \times 1$ matrix, that is, a matrix with only one column:

\[
q = \begin{bmatrix} 2 \\ 5 \\ 8 \end{bmatrix}
\]

A row vector is a $1 \times c$ matrix, that is, a matrix with only one row:

\[
h = \begin{bmatrix} 21 & 46 & 32 & 90 \end{bmatrix}
\]

A $1 \times 1$ matrix is called a scalar, but it's just an ordinary number, such as 29 or 2.

Matrix multiplication

\[
Y = X\beta + \varepsilon
\]

The $X\beta$ in the regression function is an example of matrix multiplication. Two matrices can be multiplied together:
- Only if the number of columns of the first matrix equals the number of rows of the second matrix.
- The number of rows of the resulting matrix equals the number of rows of the first matrix.
- The number of columns of the resulting matrix equals the number of columns of the second matrix.

Matrix multiplication

If A is a $2 \times 3$ matrix and B is a $3 \times 5$ matrix, then the matrix multiplication AB is possible. The resulting matrix C = AB has 2 rows and 5 columns.

Is the matrix multiplication BA possible? No: B has 5 columns, but A has only 2 rows.

If X is an $n \times p$ matrix and $\beta$ is a $p \times 1$ column vector, then $X\beta$ is an $n \times 1$ column vector.

Matrix multiplication

\[
C = AB = \begin{bmatrix} 1 & 9 & 7 \\ 8 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 3 & 2 & 1 & 5 \\ 5 & 4 & 7 & 3 \\ 6 & 9 & 6 & 8 \end{bmatrix}
= \begin{bmatrix} 90 & 101 & 106 & 88 \\ 41 & 38 & 27 & 59 \end{bmatrix}
\]

The entry in the ith row and jth column of C is the inner product (element-by-element products added together) of the ith row of A with the jth column of B. For example:

\[
c_{11} = 1(3) + 9(5) + 7(6) = 90
\]
\[
c_{12} = 1(2) + 9(4) + 7(9) = 101
\]
\[
c_{23} = 8(1) + 1(7) + 2(6) = 27
\]
\[
c_{24} = 8(5) + 1(3) + 2(8) = 59
\]
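As a quick check, the product above can be reproduced with NumPy (assuming NumPy is available; `@` is matrix multiplication):

```python
import numpy as np

# A is 2x3 and B is 3x4, so the product C = AB is 2x4
A = np.array([[1, 9, 7],
              [8, 1, 2]])
B = np.array([[3, 2, 1, 5],
              [5, 4, 7, 3],
              [6, 9, 6, 8]])

# Each entry c_ij is the inner product of row i of A with column j of B,
# e.g. c11 = 1(3) + 9(5) + 7(6) = 90.
C = A @ B
print(C)
```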

The $X\beta$ multiplication in the simple linear regression setting

\[
X\beta = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{bmatrix}
\]

Matrix addition

\[
Y = X\beta + \varepsilon
\]

The $X\beta + \varepsilon$ in the regression function is an example of matrix addition. Simply add the corresponding elements of the two matrices. For example, add the entry in the first row, first column of the first matrix with the entry in the first row, first column of the second matrix, and so on. Two matrices can be added together only if they have the same number of rows and columns.

Matrix addition

\[
C = A + B = \begin{bmatrix} 2 & 4 & 1 \\ 1 & 8 & 7 \\ 3 & 5 & 6 \end{bmatrix}
+ \begin{bmatrix} 7 & 5 & 2 \\ 9 & 3 & 1 \\ 2 & 1 & 8 \end{bmatrix}
= \begin{bmatrix} 9 & 9 & 3 \\ 10 & 11 & 8 \\ 5 & 6 & 14 \end{bmatrix}
\]

For example:

\[
c_{11} = 2 + 7 = 9
\]
\[
c_{12} = 4 + 5 = 9
\]
\[
c_{23} = 7 + 1 = 8
\]

The $X\beta + \varepsilon$ addition in the simple linear regression setting

\[
Y = X\beta + \varepsilon = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 x_1 + \varepsilon_1 \\ \beta_0 + \beta_1 x_2 + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 x_n + \varepsilon_n \end{bmatrix}
\]

Multiple linear regression function in matrix notation

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} 1 & x_{11} & x_{12} & x_{13} \\ 1 & x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n2} & x_{n3} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}
+ \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
\]

That is, $Y = X\beta + \varepsilon$.

Least squares estimates of the parameters

Least squares estimates

The $p \times 1$ vector containing the estimates of the p parameters can be shown to equal:

\[
b = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{bmatrix}
= (X'X)^{-1} X'Y
\]

where $(X'X)^{-1}$ is the inverse of the $X'X$ matrix and $X'$ is the transpose of the X matrix.

Definition of the transpose of a matrix

The transpose of a matrix A is a matrix, denoted $A'$ or $A^T$, whose rows are the columns of A and whose columns are the rows of A, all in the same original order.

\[
A = \begin{bmatrix} 1 & 5 \\ 4 & 8 \\ 7 & 9 \end{bmatrix}
\qquad
A' = \begin{bmatrix} 1 & 4 & 7 \\ 5 & 8 & 9 \end{bmatrix}
\]

The $X'X$ matrix in the simple linear regression setting

\[
X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
= \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}
\]

Definition of the identity matrix

The (square) $n \times n$ identity matrix, denoted $I_n$, is a matrix with 1s on the diagonal and 0s elsewhere:

\[
I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

The identity matrix plays the same role as the number 1 in ordinary arithmetic:

\[
\begin{bmatrix} 9 & 7 \\ 4 & 6 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 9 & 7 \\ 4 & 6 \end{bmatrix}
\]

Definition of the inverse of a matrix

The inverse $A^{-1}$ of a square (!!) matrix A is the unique matrix such that:

\[
A^{-1}A = I = AA^{-1}
\]

Find X'X.

Least squares estimates in the simple linear regression setting

  x_i     y_i     x_i*y_i    x_i^2
  soap    suds    so*su      soap^2
   4.0     33      132.0      16.00
   4.5     42      189.0      20.25
   5.0     45      225.0      25.00
   5.5     51      280.5      30.25
   6.0     53      318.0      36.00
   6.5     61      396.5      42.25
   7.0     62      434.0      49.00
  ----    ---     ------     ------
  38.5    347     1975.0     218.75

\[
b = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = (X'X)^{-1}X'Y = ?
\]

\[
X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
= \begin{bmatrix} 7 & 38.5 \\ 38.5 & 218.75 \end{bmatrix}
\]

Find the inverse of X'X.

Least squares estimates in the simple linear regression setting

It's very messy to determine inverses by hand. We let computers find inverses for us:

\[
(X'X)^{-1}(X'X) =
\begin{bmatrix} 4.4643 & -0.78571 \\ -0.78571 & 0.14286 \end{bmatrix}
\begin{bmatrix} 7 & 38.5 \\ 38.5 & 218.75 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

Therefore:

\[
(X'X)^{-1} = \begin{bmatrix} 4.4643 & -0.78571 \\ -0.78571 & 0.14286 \end{bmatrix}
\]

Find X'Y.

Least squares estimates in the simple linear regression setting

Using the soap/suds data in the table above:

\[
X'Y = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
= \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}
= \begin{bmatrix} 347 \\ 1975 \end{bmatrix}
\]

Least squares estimates in the simple linear regression setting

\[
b = (X'X)^{-1}X'Y =
\begin{bmatrix} 4.4643 & -0.78571 \\ -0.78571 & 0.14286 \end{bmatrix}
\begin{bmatrix} 347 \\ 1975 \end{bmatrix}
\]

\[
b_0 = 4.4643(347) - 0.78571(1975) = -2.67
\]
\[
b_1 = -0.78571(347) + 0.14286(1975) = 9.51
\]

The regression equation is
suds = - 2.68 + 9.50 soap
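The whole calculation can be sketched in NumPy using the soap/suds data; the `(X'X)^{-1} X'Y` formula below mirrors the slides (in practice `np.linalg.lstsq` is numerically preferable to forming an explicit inverse):

```python
import numpy as np

# Soap/suds data from the table above
soap = np.array([4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0])
suds = np.array([33.0, 42, 45, 51, 53, 61, 62])

# Design matrix X: a column of 1s and the predictor column
X = np.column_stack([np.ones_like(soap), soap])

# Normal equations: b = (X'X)^{-1} X'Y
b = np.linalg.inv(X.T @ X) @ (X.T @ suds)
print(b)  # roughly [-2.68, 9.50], matching "suds = -2.68 + 9.50 soap"
```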

Linear dependence

The columns of the matrix:

\[
A = \begin{bmatrix} 1 & 2 & 4 & 1 \\ 2 & 1 & 8 & 6 \\ 3 & 6 & 12 & 3 \end{bmatrix}
\]

are linearly dependent, since (at least) one of the columns can be written as a linear combination of the others. Here, for example, the third column is 4 times the first. If none of the columns can be written as a linear combination of the others, then we say the columns are linearly independent.

Linear dependence is not always obvious

\[
A = \begin{bmatrix} 1 & 4 & 1 \\ 2 & 3 & 1 \\ 3 & 2 & 1 \end{bmatrix}
\]

Formally, the columns $a_1, a_2, \ldots, a_n$ of an $n \times n$ matrix are linearly dependent if there are constants $c_1, c_2, \ldots, c_n$, not all 0, such that:

\[
c_1 a_1 + c_2 a_2 + \cdots + c_n a_n = 0
\]

(Here, for example, $a_1 + a_2 - 5a_3 = 0$.)
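One way to detect dependence numerically is a rank check; a sketch with NumPy, using the matrix above (the dependence relation $a_1 + a_2 - 5a_3 = 0$ is verified directly):

```python
import numpy as np

# The 3x3 matrix whose dependence "is not always obvious"
A = np.array([[1, 4, 1],
              [2, 3, 1],
              [3, 2, 1]])

# Rank less than the number of columns means the columns are linearly dependent
print(np.linalg.matrix_rank(A))  # 2, not 3

# One dependence relation, found by inspection: a1 + a2 - 5*a3 = 0
print(A[:, 0] + A[:, 1] - 5 * A[:, 2])  # [0 0 0]
```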

Implications of linear dependence on regression

The inverse of a square matrix exists only if the columns are linearly independent. Since the regression estimate b depends on $(X'X)^{-1}$, the parameter estimates $b_0, b_1, \ldots$ cannot be (uniquely) determined if some of the columns of X are linearly dependent.

The main point about linear dependence

If the columns of the X matrix (that is, if two or more of your predictor variables) are linearly dependent (or nearly so), you will run into trouble when trying to estimate the regression function.

Implications of linear dependence on regression

  soap1   soap2   suds
   4.0      8      33
   4.5      9      42
   5.0     10      45
   5.5     11      51
   6.0     12      53
   6.5     13      61
   7.0     14      62

Here soap2 = 2 * soap1, so the columns of X are linearly dependent.

* soap2 is highly correlated with other X variables
* soap2 has been removed from the equation

The regression equation is
suds = - 2.68 + 9.50 soap1

Fitted values and residuals

Fitted values

\[
\hat{y}_1 = b_0 + b_1 x_1, \quad \hat{y}_2 = b_0 + b_1 x_2, \quad \ldots, \quad \hat{y}_n = b_0 + b_1 x_n
\]

Fitted values

The vector of fitted values

\[
\hat{y} = Xb = X(X'X)^{-1}X'y
\]

is sometimes represented as a function of the hat matrix H:

\[
H = X(X'X)^{-1}X'
\]

That is:

\[
\hat{y} = X(X'X)^{-1}X'y = Hy
\]
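A small sketch of the hat matrix for the soap/suds data, checking two standard properties (symmetry and idempotence) that follow from the formula above:

```python
import numpy as np

soap = np.array([4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0])
suds = np.array([33.0, 42, 45, 51, 53, 61, 62])
X = np.column_stack([np.ones_like(soap), soap])

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

y_hat = H @ suds   # fitted values: y_hat = Hy

print(np.allclose(H @ H, H))  # H is idempotent: HH = H
print(np.allclose(H, H.T))    # H is symmetric
```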

The residual vector

\[
e_i = y_i - \hat{y}_i \quad \text{for } i = 1, \ldots, n
\]

\[
e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
= \begin{bmatrix} y_1 - \hat{y}_1 \\ y_2 - \hat{y}_2 \\ \vdots \\ y_n - \hat{y}_n \end{bmatrix}
\]

The residual vector written as a function of the hat matrix

\[
e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
= \begin{bmatrix} y_1 - \hat{y}_1 \\ y_2 - \hat{y}_2 \\ \vdots \\ y_n - \hat{y}_n \end{bmatrix}
= y - \hat{y} = y - Hy = (I - H)y
\]

Sums of squares and the analysis of variance table

Analysis of variance table in matrix terms

  Source      DF    SS                            MS
  Regression  p-1   SSR = b'X'Y - (1/n)Y'JY       MSR = SSR/(p-1)
  Error       n-p   SSE = Y'Y - b'X'Y             MSE = SSE/(n-p)
  Total       n-1   SSTO = Y'Y - (1/n)Y'JY
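The three SS formulas in the table can be sketched for the soap/suds data (J is the $n \times n$ matrix of all 1s, introduced below; the decomposition SSTO = SSR + SSE should hold up to floating-point rounding):

```python
import numpy as np

# Soap/suds data from the earlier slides
soap = np.array([4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0])
suds = np.array([33.0, 42, 45, 51, 53, 61, 62])
n = len(suds)

X = np.column_stack([np.ones(n), soap])
b = np.linalg.inv(X.T @ X) @ (X.T @ suds)

J = np.ones((n, n))  # n x n matrix of all 1s

SSTO = suds @ suds - (suds @ J @ suds) / n       # Y'Y - (1/n) Y'JY
SSR  = b @ (X.T @ suds) - (suds @ J @ suds) / n  # b'X'Y - (1/n) Y'JY
SSE  = suds @ suds - b @ (X.T @ suds)            # Y'Y - b'X'Y

print(np.isclose(SSTO, SSR + SSE))  # True
```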

Sums of squares

In general, if you pre-multiply a vector by its transpose, you get a sum of squares:

\[
y'y = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
= y_1^2 + y_2^2 + \cdots + y_n^2 = \sum_{i=1}^{n} y_i^2
\]

Error sum of squares

\[
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (y - \hat{y})'(y - \hat{y})
\]

Total sum of squares

Previously, we'd write:

\[
SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2
\]

But it can be shown that, equivalently:

\[
SSTO = Y'Y - \frac{1}{n} Y'JY
\]

where J is a (square) $n \times n$ matrix containing all 1s.

An example of total sum of squares

If n = 2:

\[
\frac{1}{n}\left(\sum_{i=1}^{2} Y_i\right)^2
= \frac{1}{2}(Y_1 + Y_2)^2
= \frac{1}{2}\left(Y_1^2 + 2Y_1Y_2 + Y_2^2\right)
\]

But note that we get the same answer by:

\[
\frac{1}{n} Y'JY
= \frac{1}{2}\begin{bmatrix} Y_1 & Y_2 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}
= \frac{1}{2}(Y_1 + Y_2)^2
\]


Model assumptions

Error term assumptions

As always, the error terms $\varepsilon_i$ are:
- independent
- normally distributed (with mean 0)
- with equal variances $\sigma^2$

Now, how can we say the same thing using matrices and vectors?

Error terms as a random vector

The $n \times 1$ random error term vector, denoted as $\varepsilon$, is:

\[
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
\]

The mean (expectation) of the random error term vector

The $n \times 1$ mean error term vector, denoted as $E(\varepsilon)$, is:

\[
E(\varepsilon) = \begin{bmatrix} E(\varepsilon_1) \\ E(\varepsilon_2) \\ \vdots \\ E(\varepsilon_n) \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
= 0
\]

(The first and last equalities are definitions; the middle one is an assumption.)

The variance of the random error term vector

The $n \times n$ variance matrix, denoted as $\sigma^2(\varepsilon)$, is defined as:

\[
\sigma^2(\varepsilon) = \begin{bmatrix}
\sigma^2(\varepsilon_1) & \sigma(\varepsilon_1, \varepsilon_2) & \cdots & \sigma(\varepsilon_1, \varepsilon_n) \\
\sigma(\varepsilon_1, \varepsilon_2) & \sigma^2(\varepsilon_2) & \cdots & \sigma(\varepsilon_2, \varepsilon_n) \\
\vdots & \vdots & \ddots & \vdots \\
\sigma(\varepsilon_1, \varepsilon_n) & \sigma(\varepsilon_2, \varepsilon_n) & \cdots & \sigma^2(\varepsilon_n)
\end{bmatrix}
\]

Diagonal elements are variances of the errors. Off-diagonal elements are covariances between errors.

The ASSUMED variance of the random error term vector

BUT, we assume the error terms are independent (covariances are 0) and have equal variances ($\sigma^2$):

\[
\sigma^2(\varepsilon) = \begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}
\]

Scalar by matrix multiplication

Just multiply each element of the matrix by the scalar. For example:

\[
2 \begin{bmatrix} 1 & 4 & 0 \\ 7 & 6 & 5 \\ 1 & 3 & 2 \end{bmatrix}
= \begin{bmatrix} 2 & 8 & 0 \\ 14 & 12 & 10 \\ 2 & 6 & 4 \end{bmatrix}
\]

The ASSUMED variance of the random error term vector

\[
\sigma^2(\varepsilon) = \begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}
= \sigma^2 \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
= \sigma^2 I_n
\]

The general linear regression model

Putting the regression function and assumptions all together, we get:

\[
Y = X\beta + \varepsilon
\]

where:
- Y is an ($n \times 1$) vector of response values
- $\beta$ is a ($p \times 1$) vector of unknown parameters
- X is an ($n \times p$) matrix of predictor values
- $\varepsilon$ is an ($n \times 1$) vector of independent, normal error terms with mean 0 and (equal) variance $\sigma^2 I$.
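To tie the pieces together, here is a small simulation sketch of the general linear model; the parameter values, sample size, and noise level are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 2                   # n observations, p parameters (intercept + slope)
beta = np.array([-2.68, 9.50])  # "true" parameters (illustrative values)
sigma = 2.0                     # error standard deviation (illustrative)

# n x p design matrix: column of 1s plus one predictor
X = np.column_stack([np.ones(n), rng.uniform(4, 7, n)])
eps = rng.normal(0, sigma, n)   # independent N(0, sigma^2) errors
Y = X @ beta + eps              # the general linear model Y = X*beta + eps

# Least squares recovers something close to beta when n is large
b = np.linalg.inv(X.T @ X) @ (X.T @ Y)
print(b)
```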