
A derivation of the OLS estimator ($\hat{\boldsymbol{\beta}}$)

Definitions:

\[
\boldsymbol{y} \equiv \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}_{n \times 1}
\qquad
\boldsymbol{X} \equiv \begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{bmatrix}_{n \times (k+1)}
= \begin{bmatrix} \boldsymbol{1} & \boldsymbol{x}_1 & \boldsymbol{x}_2 & \cdots & \boldsymbol{x}_k \end{bmatrix}
\quad \text{where } \boldsymbol{x}_j \equiv \begin{bmatrix} x_{1j} \\ \vdots \\ x_{nj} \end{bmatrix}_{n \times 1}
\]

\[
\boldsymbol{\beta} \equiv \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}_{(k+1) \times 1}
\qquad
\boldsymbol{u} \equiv \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}_{n \times 1}
\]

Let the population model be $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{u}$. The estimated equation is $\hat{\boldsymbol{y}} = \boldsymbol{X}\hat{\boldsymbol{\beta}}$.

We can thus write the residual vector as $\hat{\boldsymbol{u}} = \boldsymbol{y} - \hat{\boldsymbol{y}} = \boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}$.

The sum of squared residuals is

\[
S(\hat{\boldsymbol{\beta}}) \equiv \sum_{i=1}^{n} \hat{u}_i^2
= \hat{\boldsymbol{u}}'\hat{\boldsymbol{u}}
= (\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}})'(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}})
= (\boldsymbol{y}' - \hat{\boldsymbol{\beta}}'\boldsymbol{X}')(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}})
= \boldsymbol{y}'\boldsymbol{y} - \boldsymbol{y}'\boldsymbol{X}\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{y} + \hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X}\hat{\boldsymbol{\beta}}.
\]

The two middle terms, $\boldsymbol{y}'\boldsymbol{X}\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{y}$, are equal scalars (each is the transpose of the other, and a scalar equals its own transpose). We can thus write them together as $-2\boldsymbol{y}'\boldsymbol{X}\hat{\boldsymbol{\beta}}$ and re-express $S(\hat{\boldsymbol{\beta}})$:

\[
S(\hat{\boldsymbol{\beta}}) = \boldsymbol{y}'\boldsymbol{y} - 2\boldsymbol{y}'\boldsymbol{X}\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X}\hat{\boldsymbol{\beta}}.
\]
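As a quick numerical sanity check of the expansion above (a sketch using NumPy with synthetic data; the dimensions $n$ and $k$ are arbitrary choices), we can verify both that the two cross terms are equal scalars and that the expanded form of $S(\hat{\boldsymbol{\beta}})$ equals the direct residual sum of squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
# n x (k+1) design matrix with a leading column of ones, as in the definitions
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = rng.normal(size=n)
b = rng.normal(size=k + 1)  # an arbitrary candidate beta-hat

# The two cross terms are equal scalars: y'Xb == b'X'y
assert np.isclose(y @ X @ b, b @ X.T @ y)

# The expanded SSR matches the direct residual sum of squares
S_direct = (y - X @ b) @ (y - X @ b)
S_expanded = y @ y - 2 * (y @ X @ b) + b @ X.T @ X @ b
assert np.isclose(S_direct, S_expanded)
```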

We want to choose the $\hat{\boldsymbol{\beta}}$ that minimizes $S(\hat{\boldsymbol{\beta}})$. The first-order condition is

\[
\frac{\partial S(\hat{\boldsymbol{\beta}})}{\partial \hat{\boldsymbol{\beta}}} = \boldsymbol{0}_{1 \times (k+1)}
\implies
-2\boldsymbol{y}'\boldsymbol{X} + 2\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X} = \boldsymbol{0}_{1 \times (k+1)}.
\]

[Note: By the rules of matrix differentiation, $\partial(-2\boldsymbol{y}'\boldsymbol{X}\hat{\boldsymbol{\beta}})/\partial\hat{\boldsymbol{\beta}} = -2\boldsymbol{y}'\boldsymbol{X}$ and $\partial(\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X}\hat{\boldsymbol{\beta}})/\partial\hat{\boldsymbol{\beta}} = 2\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X}$, the latter using the symmetry of $\boldsymbol{X}'\boldsymbol{X}$.]
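These differentiation rules can themselves be checked numerically (a sketch, again with arbitrary synthetic data): the analytic gradient $-2\boldsymbol{y}'\boldsymbol{X} + 2\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X}$ should agree with a central finite-difference approximation of $S$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = rng.normal(size=n)
b = rng.normal(size=k + 1)  # an arbitrary point at which to evaluate the gradient

def S(beta):
    """Sum of squared residuals at a candidate beta."""
    r = y - X @ beta
    return r @ r

# Analytic gradient from the differentiation rules above (a 1 x (k+1) row vector)
grad_analytic = -2 * y @ X + 2 * b @ X.T @ X

# Central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([
    (S(b + eps * e) - S(b - eps * e)) / (2 * eps)
    for e in np.eye(k + 1)
])

assert np.allclose(grad_analytic, grad_fd, atol=1e-4)
```

Since $S$ is quadratic in $\hat{\boldsymbol{\beta}}$, the central difference is exact up to floating-point rounding, so the tolerance here is generous.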

Manipulating the first order condition above, we get

\[
\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X} = \boldsymbol{y}'\boldsymbol{X}.
\]

Taking the transpose of both sides of the equation above yields

\[
\boldsymbol{X}'\boldsymbol{X}\hat{\boldsymbol{\beta}} = \boldsymbol{X}'\boldsymbol{y}.
\]

Pre-multiplying both sides by $(\boldsymbol{X}'\boldsymbol{X})^{-1}$, which exists provided $\boldsymbol{X}$ has full column rank, we get

\[
\hat{\boldsymbol{\beta}} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{y}. \qquad \blacksquare
\]
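The final formula is easy to verify numerically (a sketch with synthetic data; the particular coefficients and noise level are arbitrary choices). The closed-form estimator should agree with NumPy's least-squares solver, and in practice solving the normal equations via `np.linalg.solve` (or using `np.linalg.lstsq`) is numerically preferable to forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5, 0.3])  # hypothetical population coefficients
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Closed-form OLS: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically preferred alternatives
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)       # solve the normal equations
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # QR/SVD-based least squares

assert np.allclose(beta_hat, beta_solve)
assert np.allclose(beta_hat, beta_lstsq)
```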
