Examining the concept of expectation will provide us with a better feel for the concepts
of means, variances and covariances.
Consider a discrete random variable Y with a probability distribution p(y). The random
variable Y is said to be discrete if there is associated with Y a finite set of points
having positive probabilities that sum to unity. The probability distribution is
simply defined as the set of all pairs (y, p(y)) for which p(y) > 0. As an example, think
of a chlorophyll mutant that segregates 1:2:1 in the F2 for the phenotypes green,
yellow, and albino.
y      0     1     2
p(y)  .25   .50   .25

The probability that y takes on the value 0 is 0.25, etc. The expectation of Y is defined as

E(Y) = Σ y p(y)

or in this example

E(Y) = 0(.25) + 1(.5) + 2(.25) = 1.0
We call this the mean of the distribution and usually symbolize it with μ. You are
probably most familiar with unweighted means, where p(y) = 1/n for each of n observed values.
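The 1:2:1 example above can be checked in a few lines of Python:

```python
# E(Y) = sum of y * p(y) for the 1:2:1 segregation example.
values = [0, 1, 2]
probs = [0.25, 0.50, 0.25]

e_y = sum(y * p for y, p in zip(values, probs))
print(e_y)  # 1.0
```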
Consider two random variables X and Y that are jointly distributed. The joint (or
bivariate) probability distribution for X and Y is given by:
P(x,y) = P (X=x, Y=y)
It is useful to find a measure of association between X and Y. The covariance is such a
measure and is defined as follows:
Cov(X,Y) = E[(X − E(X))(Y − E(Y))]

This reduces to:

Cov(X,Y) = E(XY) − E(X)E(Y) = E(XY) − μxμy
This quantity may not always be easy to evaluate as stated. However, you will often be
given information about the two random variables. For example, you may be told that X
and Y are independent, which means that their joint expectation is the product of their
individual expectations, or:

E(XY) = E(X)E(Y)

and thus the covariance of X and Y is zero. Covariance is usually denoted σxy.
Note the similarity between this expression for covariance (E(XY) − μxμy) and the
numerator in the expression for a correlation coefficient. In fact, the estimator for σxy is

Cov(X,Y) = [Σxy − n x̄ȳ] / (n − 1)

and

σ̂xy = (1/n) Σ(x − x̄)(y − ȳ)
We know that if two variables are independent they are uncorrelated, and thus the numerator
in the correlation coefficient would be zero. This agrees with the foregoing on
covariance.
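As a sketch, the computing formula for the sample covariance can be applied to a small data set; the x and y values below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Sample covariance: Cov(X,Y) = (sum(xy) - sum(x)*sum(y)/n) / (n - 1)
x = [1.0, 2.0, 3.0, 4.0]  # hypothetical values
y = [2.0, 4.0, 5.0, 9.0]
n = len(x)

sxy = sum(a * b for a, b in zip(x, y))          # sum of cross-products
cov = (sxy - sum(x) * sum(y) / n) / (n - 1)
print(cov)  # 11/3, about 3.667
```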
Variance of a Linear Function of Random Variables
1. Assume that X and Y are normally distributed random variables with respective
means μ1 and μ2 and common variance σ². In general,

Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X,Y)

Since X and Y are independent, Cov(X,Y) = 0 and Var(X + Y) = Var(X) + Var(Y) = 2σ²
2. If C is a constant and X has variance σ², then

Var(CX) = C²σ²
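Both rules can be verified by exhaustive enumeration of two small independent discrete distributions; the particular values below are hypothetical and serve only to illustrate the algebra:

```python
from itertools import product

# Equally likely values of two independent discrete variables (hypothetical).
xs = [0, 1, 2]
ys = [10, 20, 30]

def pvar(values):
    """Population variance of a set of equally likely values."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# All 9 equally likely (x, y) pairs give the distribution of X + Y,
# so Var(X + Y) should equal Var(X) + Var(Y) exactly.
sums = [x + y for x, y in product(xs, ys)]
assert abs(pvar(sums) - (pvar(xs) + pvar(ys))) < 1e-9

# Var(CX) = C^2 Var(X)
c = 3
scaled = [c * x for x in xs]
assert abs(pvar(scaled) - c ** 2 * pvar(xs)) < 1e-9
```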
Nitrogen (x)   Yield (y)      xy       x²       y²
    30             73        2190      900     5329
    20             50        1000      400     2500
    60            128        7680     3600    16384
    80            170       13600     6400    28900
    40             87        3480     1600     7569
    50            108        5400     2500    11664
    60            135        8100     3600    18225
    30             69        2070      900     4761
    70            148       10360     4900    21904
    60            132        7920     3600    17424
   500           1100       61800    28400   134660
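The column totals in the table can be reproduced directly from the raw nitrogen and yield values:

```python
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]        # nitrogen
y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]  # yield

sx = sum(x)                             # 500
sy = sum(y)                             # 1100
sxy = sum(a * b for a, b in zip(x, y))  # 61800
sx2 = sum(a * a for a in x)             # 28400
sy2 = sum(b * b for b in y)             # 134660
print(sx, sy, sxy, sx2, sy2)
```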
[Figure: scatter plot with fitted regression line E(Y) = 9.5 + 2.1X; at X = 45, E(Yi) = 104. X-axis ticks at 25 and 45.]
The least squares estimators of the parameters β0 and β1, i.e., those that minimize Q, are:
b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

b0 = Ȳ − b1X̄
Using the data from the table above:

b1 = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]
   = [61,800 − (500)(1,100)/10] / [28,400 − (500)²/10]
   = 6,800 / 3,400 = 2.0

b0 = Ȳ − b1X̄ = (1/10)[1,100 − (2.0)(500)] = 10.0
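The same computation in Python, using the computing formulas and the column totals above:

```python
# Least squares estimates from the nitrogen-yield column totals.
n = 10
sx, sy, sxy, sx2 = 500, 1100, 61800, 28400

b1 = (sxy - sx * sy / n) / (sx2 - sx ** 2 / n)  # 6800 / 3400
b0 = (sy - b1 * sx) / n                         # (1100 - 1000) / 10
print(b1, b0)  # 2.0 10.0
```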
In terms of deviations:

yi − ȳ = (ŷi − ȳ) + (yi − ŷi)

or in words:

Total deviation = deviation of fitted value from mean + deviation around regression line

Summing squares over all observations:

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
Computing formulas:

SSTO = Σyi² − nȳ²
SSR = b1² Σ(xi − x̄)²
SSE = SSTO − SSR
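Applying these computing formulas to the nitrogen-yield data (with b1 = 2.0 from above):

```python
# Sums of squares from the column totals and the fitted slope.
n = 10
sy, sy2, sx, sx2 = 1100, 134660, 500, 28400
b1 = 2.0

ssto = sy2 - n * (sy / n) ** 2        # 134660 - 121000 = 13660
ssr = b1 ** 2 * (sx2 - sx ** 2 / n)   # 4 * 3400 = 13600
sse = ssto - ssr                      # 60
print(ssto, ssr, sse)
```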
r = cov(x,y) / √[var(x) var(y)]

or, in computing form,

r = [Σxy − (Σx)(Σy)/n] / √{[Σx² − (Σx)²/n][Σy² − (Σy)²/n]}
For the nitrogen-yield data:

Σxy − (Σx)(Σy)/n = 61,800 − 55,000 = 6,800
Σx² − (Σx)²/n = 28,400 − 25,000 = 3,400
Σy² − (Σy)²/n = 134,660 − 121,000 = 13,660

r = 6,800 / √[(3,400)(13,660)] = 6,800 / [(58.3095)(116.8760)] = 6,800 / 6,814.98 = 0.9978
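The same correlation can be computed directly from the column totals:

```python
import math

# Correlation coefficient from the nitrogen-yield column totals.
n = 10
sx, sy, sxy, sx2, sy2 = 500, 1100, 61800, 28400, 134660

num = sxy - sx * sy / n      # 6800
sxx = sx2 - sx ** 2 / n      # 3400
syy = sy2 - sy ** 2 / n      # 13660

r = num / math.sqrt(sxx * syy)
print(round(r, 4))  # 0.9978
```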