
Regression 29

Sometimes, however, it is also useful to quantify that association. This is
traditionally done using regression analysis. For example, in the instance of the
association between CaO and Al2O3 in the tonalites and trondhjemites of Table
2.2 the question 'If the CaO concentration were 3.5 wt %, what would be the
concentration of Al2O3?' can be answered by calculating the regression equation
for the variables CaO and Al2O3.
The quantification of an association is carried out by fitting a straight line
through the data and finding the equation of that line. The equation for a
straight line relating variables x and y is
y = a + bx . [2.4]
The constant a is the value of y given by the straight line at x = 0. The
constant b is the slope of the line and shows the number of units increase (or
decrease) in y that accompanies an increase in one unit of x. The constants a
and b are determined by fitting the straight line to the data. The relation above
is ideal and does not allow for any deviation from the line. In reality,
however, this is not the case, for most observations are made with some error;
often the data form a cloud of points to which a straight line must be fitted.
It is this which introduces some uncertainty to line-fitting procedures and has
resulted in a number of alternative approaches. Regression analysis is the
subject of a number of statistical texts (e.g. Draper and Smith, 1981) and a
useful review of fitting procedures in the earth sciences is given by Troutman
and Williams (1987). Below some of the more popular forms of regression are
described.
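
As a minimal sketch of what the constants in equation [2.4] mean, the Python
fragment below evaluates a straight line and measures how far a few
observations fall from it. The coefficients and the observations are
hypothetical values chosen only for illustration; they are not taken from
Table 2.2.

```python
# Hypothetical coefficients, for illustration only (not fitted to Table 2.2).
a = 14.0  # intercept: the value of y given by the line at x = 0
b = 0.5   # slope: change in y that accompanies a one-unit increase in x


def line(x, a, b):
    """Return the value of y on the straight line y = a + b*x (equation 2.4)."""
    return a + b * x


# Value of y given by the line at x = 3.5.
y_at_3_5 = line(3.5, a, b)

# Hypothetical observations (x, y). Real data carry error, so the points
# scatter about the line instead of lying exactly on it.
observations = [(1.0, 14.4), (2.0, 15.1), (3.0, 15.4), (4.0, 16.2)]

# Vertical deviation of each observation from the line.
deviations = [y - line(x, a, b) for x, y in observations]

print("y at x = 3.5:", y_at_3_5)
print("vertical deviations:", deviations)
```

The non-zero deviations are the reason a fitting procedure is needed: no
single pair of constants a and b passes exactly through every observation.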

2.4.1 Ordinary least squares regression

Ordinary least squares regression is traditionally one of the most commonly
used line-fitting techniques in geochemistry because it is relatively simple to
use and because computer software with which to perform the calculations is
generally readily available. Unfortunately, it is often not appropriate.
The least squares best-fit line is constructed so that the sum of the squares
of the vertical deviations about the line is a minimum. In this case the variable x
is the independent (non-random) variable and is assumed to have a very small
error; y, on the other hand, is the dependent variable (the random variable),
with errors an order of magnitude or more greater than the errors on x, and is
to be determined from values of x. In this case we say that y is regressed on x
(Figure 2.2a). It is possible to regress x on y and in this case the best-fit line
minimizes the sum of the squares of the horizontal deviations about the line
(Figure 2.2b). Thus there are two possible regression lines for the same data, a
rather unsatisfactory situation for physical scientists who prefer a unique line.
The two lines intersect at the mean of the sample (Figure 2.2c) and approach
each other as the value of the correlation coefficient (r) increases until they
coincide at r = 1.
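
The behaviour of the two regression lines can be checked numerically. The
sketch below, written in Python with hypothetical data invented for
illustration, fits both lines using the standard sums-of-squares formulas for
least squares, confirms that each passes through the sample mean, and compares
their slopes. The variable names are assumptions of this sketch.

```python
from statistics import mean

# Hypothetical (x, y) pairs, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.0]

x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# y regressed on x: y = a + b*x, minimizing the vertical deviations.
b = sxy / sxx
a = y_bar - b * x_bar

# x regressed on y: x = c + d*y, minimizing the horizontal deviations.
d = sxy / syy
c = x_bar - d * y_bar

# Re-express the second line as y against x so both share the same axes.
slope_x_on_y = 1.0 / d
y_on_x_at_mean = a + b * x_bar
x_on_y_at_mean = (x_bar - c) / d

# Product-moment correlation coefficient.
r = sxy / (sxx * syy) ** 0.5

print("slope, y on x:", b)
print("slope, x on y (plotted as y against x):", slope_x_on_y)
print("both lines at x_bar:", y_on_x_at_mean, x_on_y_at_mean, "y_bar:", y_bar)
print("ratio of slopes:", b / slope_x_on_y, "r squared:", r ** 2)
```

The ratio of the two slopes equals r squared, so the lines diverge when the
correlation is weak and coincide only when r = 1.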
In the case of ordinary least squares regression, where y is regressed on x,
the value of the intercept, a, may be computed from:
a = ȳ - b x̄ [2.5]
where x̄ and ȳ are the mean values for variables x and y and b is the slope of
the line. The slope b is computed from
b = r(Sy/Sx) [2.6]
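
Equations [2.5] and [2.6] can be sketched in a few lines of Python. The excerpt
ends before the symbols in [2.6] are defined, so this sketch assumes that r is
the product-moment correlation coefficient and that Sx and Sy are the sample
standard deviations of x and y. The function name and the data are
hypothetical and are not taken from Table 2.2.

```python
from statistics import mean, stdev


def ols_y_on_x(xs, ys):
    """Regress y on x; return (a, b, r) using equations 2.5 and 2.6.

    Assumes Sx and Sy are sample standard deviations (n - 1 divisor) and
    r is the product-moment correlation coefficient.
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Sample covariance, needed for the correlation coefficient.
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    r = cov / (s_x * s_y)
    b = r * (s_y / s_x)    # equation 2.6
    a = y_bar - b * x_bar  # equation 2.5
    return a, b, r


# Hypothetical measurements, for illustration only.
x_values = [1.5, 2.0, 2.8, 3.5, 4.1]
y_values = [14.6, 15.2, 15.3, 16.1, 16.0]

a, b, r = ols_y_on_x(x_values, y_values)
print("intercept a:", a)
print("slope b:", b)
print("correlation r:", r)
print("y predicted at x = 3.5:", a + b * 3.5)
```

Because the intercept is defined by equation [2.5], the fitted line always
passes through the point (x̄, ȳ), whatever the value of the slope.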
