
Introduction

A regression analysis is a useful tool in the hands of a capable manager. By describing the
relationship between different variables, regressions can help managers understand how the
business works and make useful predictions about its development.
This report covers the topic of multiple regression analysis. Multiple regression
analysis is a statistical technique that predicts the values of one variable on the basis of two or
more other variables.
It will be used here to estimate the demand for a commodity over the years 1981 to 1995, using three
explanatory variables (price, consumers' income and the price of substitutes), and meaningful
decisions will be drawn from the findings of this research.

The Ordinary Least Squares Method (OLS)


Regressions are used to determine the relationships between a dependent variable and one or
more independent or explanatory variables. A simple regression is concerned with the
relationship between a dependent variable and a single independent variable; a multiple
regression is concerned with the relationship between a dependent variable and a series
independent variables.
The Ordinary Least Squares Method is an estimation procedure used to find the parameters
β0, β1, ..., βk such that the regression equation best summarizes the relationship between the Xi values and the
dependent variable Y. Regression analysis is used to produce an equation that will predict a
dependent variable using one or more independent variables. The OLS method assumes that the
equation is linear, thus it has the form

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

where Y is the dependent variable you are trying to predict; X1, X2 and so on are the
independent variables you are using to predict it; β1, β2 and so on are the coefficients or
multipliers that describe the size of the effect each independent variable has on the
dependent variable Y; β0 is the intercept; and ε is the error term associated with each observation.
OLS is a mathematical process that chooses the intercept and the slope of the line of best fit such
that the sum of the squares of the deviations (or errors) is minimized. In other words, the OLS
formula minimizes what is known as noise, the deviation of the actual data from our model,
which is the straight line through the data points. OLS squares the errors to prevent the
positive deviations (points above the line of best fit) from cancelling out against the negative
deviations (points below the line of best fit), and to weigh the larger deviations more heavily.
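As a sketch, this minimization can be carried out directly with the closed-form formulas for a simple (one-variable) regression; the data points below are made up purely for illustration:

```python
import numpy as np

# Hypothetical sample data (not from the report): y roughly follows 2 + 3x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.0])

# Closed-form OLS for a simple regression: the slope and intercept that
# minimize the sum of squared deviations from the fitted line.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# The quantity OLS minimizes: the sum of squared residuals.
residuals = y - (intercept + slope * x)
sse = np.sum(residuals ** 2)
```

Any other choice of slope and intercept on the same data would give a larger `sse`, which is exactly what "least squares" means.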
This method is based on a number of statistical assumptions that need to be satisfied in order to
produce unbiased estimators and additional assumptions must be satisfied in order to produce
favorable results. These are:
o The model is correctly specified (the variables are relevant to the theory of the model)
o The model is linear in its parameters
o The errors are homoscedastic, meaning that they have a constant variance; that is,
deviations from the line of best fit occur randomly with respect to the magnitude of the
independent variables
o The data are derived from a normally distributed population
o The independent variables are not too strongly collinear
o The independent variables are measured precisely, such that measurement error is
negligible
o The expected value of the residuals is always zero
o The data are a random sample of the population; that is, the residuals are statistically
independent of (uncorrelated with) each other
The benefits of using the Ordinary Least Squares method are:
o If the expected value of the residuals is always zero, then the OLS estimator is unbiased
o If the residuals have homogeneous variance (homoscedasticity), then the OLS estimator has
the minimum variance of all linear unbiased estimators (it is BLUE) by the Gauss-Markov
Theorem
o If the residuals are also normally distributed, then t and F tests can be used in the
interpretation to form conclusions about the data
However, there are also disadvantages to using the Ordinary Least Squares method, these
include:
o Because the required assumptions are inflexible, if any is not met the OLS
estimation procedure breaks down and the estimator no longer retains all of the
properties discussed above
o If the assumptions of homoscedasticity and normal distribution are not met, the
estimator remains unbiased and consistent, but it gives inefficient estimates; that
is, OLS will give incorrect estimates of the parameter standard errors
o Another problem is multicollinearity (extreme correlation) among the explanatory
variables, which causes difficulties in computing the least squares estimates. The presence
of multicollinearity prevents the mathematical procedure from isolating and measuring
the contribution of each independent variable to the dependent variable
o If the error terms do not occur randomly but exhibit a systematic relationship with the
magnitude of one or more of the independent variables, the condition is called
heteroscedasticity. Heteroscedasticity gives misleading indications and causes the
coefficient of determination to overstate the explanatory power of the regression equation

Concepts used to Interpret Data

R2 (Coefficient of Determination)
The R-squared of the regression is the proportion of the variation in your dependent variable that is
accounted for (or predicted by) your independent variables; it measures how much of the change in y
the x variables explain.
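As an illustration, R-squared can be computed directly from two sums of squares, the residual and the total; the observations and fitted values below are hypothetical:

```python
import numpy as np

# Hypothetical observations and fitted values (illustration only).
y = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
y_hat = np.array([10.5, 11.5, 14.0, 16.5, 17.5])

ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot       # proportion of variation explained
```

Here the fit leaves very little unexplained variation, so `r_squared` is close to 1.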

P value (Probability Value)


Hypothesis tests are used to test the validity of a claim that is made about a population. This
claim that is on trial, in essence, is called the null hypothesis. When you perform a hypothesis test
in statistics, a p-value helps you determine the significance of your results. The p-value is the
probability of getting the results you did (or more extreme results) given that the null hypothesis
is true.
The alternative hypothesis is the one you would believe if the null hypothesis is concluded to be
untrue. The evidence in the trial is your data and the statistics that go along with it. All
hypothesis tests ultimately use a p-value to weigh the strength of the evidence (what the data are
telling you about the population).
Since it is a probability, the p-value is a number between 0 and 1, and it is interpreted in the
following way:

A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis,
so you reject the null hypothesis; it is unlikely that the result was obtained by chance.

A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail
to reject the null hypothesis.

P-values very close to the cutoff (0.05) are considered to be marginal (could go either
way). Always report the p-value so your readers can draw their own conclusions.
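The decision rule described above can be sketched as a small function; the cutoff of 0.05 (alpha) is the conventional choice used in this report, not something fixed by theory:

```python
def interpret_p_value(p, alpha=0.05):
    """Return the hypothesis-test decision for a given p-value."""
    if p <= alpha:
        # Strong evidence against the null hypothesis.
        return "reject"
    # Weak evidence against the null hypothesis.
    return "fail to reject"
```

For example, a p-value of 0.0108 leads to rejecting the null hypothesis at the 5% level, while 0.7887 does not.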

Standard Error
The standard error is a single summary number that tells you how accurate your
predictions are likely to be when you perform linear regression. It is an estimate of the standard
deviation of a coefficient, the amount it varies across samples. It can be thought of as a measure
of the precision with which the regression coefficient is measured. If a coefficient is large
compared to its standard error, then it is probably different from 0.
Under Regression Statistics in Excel, the standard error refers to the estimated standard deviation
of the error term ε. But, below the ANOVA table, the standard errors given are those (i.e.
the estimated standard deviations) of the least squares estimates of β1, β2 and β3.
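As a sketch of where coefficient standard errors come from, they are the square roots of the diagonal of s²(XᵀX)⁻¹, where s² is the residual variance estimate; the data below are simulated purely for illustration:

```python
import numpy as np

# Simulated data (illustration only): y is roughly 2 + 3x plus noise.
rng = np.random.default_rng(0)
x = np.arange(8, dtype=float)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, 8)

X = np.column_stack([np.ones_like(x), x])     # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficient estimates

resid = y - X @ beta
s2 = resid @ resid / (len(y) - X.shape[1])    # residual variance estimate
# Standard errors: sqrt of the diagonal of the coefficient covariance matrix.
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
```

Comparing each element of `beta` to the matching element of `se` is the basis of the t tests whose p-values appear in the Excel output.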

Coefficients
The relationship between two variables that move in the same direction is called a positive or
direct relationship (both variables increase or decrease together), whereas a negative or
inverse relationship is one in which one variable increases while the other decreases, and vice versa.
X1- Price of Commodity
The law of demand states that, all else being equal, an increase in price leads to a decrease in the
quantity demanded and vice versa. So, in this model, we would expect to observe an inverse
relationship between the price of the commodity and demand.
X2 - Consumers' Income
For most commodities, an increase in income brings about an increase in demand, and
consumers demand less when their income decreases (a positive relationship). Such goods
are called superior or normal goods. However, this does not hold for all goods:
commodities whose demand varies inversely with a change in income are called inferior goods.
So, in this model, we would expect the relationship between income and demand to be positive if
the commodity is a normal good (the usual case) and negative/inverse if it is an inferior good.
X3- Price of Substitutes
A substitute commodity is one that can be used in the place of another good. An increase in the price
of one good leads to a decrease in the demand for that good and an increase in the demand for its
substitute. So, in this model, we would expect a positive relationship between demand and the price
of the substitute.

Table showing the Price of the Product, Consumers' Income and Price of Substitutes, as well as
the Demand for the Product, used in the forecast:

Year / Period   Demand (Y)   Price (X1)   Consumer Income (X2)   Price of Substitute (X3)
1981                40            9               400                      10
1982                45            8               500                      14
1983                50            9               600                      12
1984                55            8               700                      13
1985                60            7               800                      11
1986                70            6               900                      15
1987                65            6              1000                      16
1988                65            8              1100                      17
1989                75            5              1200                      22
1990                75            5              1300                      19
1991                80            5              1400                      20
1992               100            3              1500                      23
1993                90            4              1600                      18
1994                95            3              1700                      24
1995                85            4              1800                      21
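As a cross-check, the regression can be estimated on this table with ordinary least squares in code; run on the data above, this should reproduce approximately the coefficients and R-squared reported in the Results section (this is a sketch using NumPy's least-squares solver rather than Excel's Regression tool):

```python
import numpy as np

# The report's 1981-1995 data.
demand = np.array([40, 45, 50, 55, 60, 70, 65, 65, 75, 75,
                   80, 100, 90, 95, 85], dtype=float)
price = np.array([9, 8, 9, 8, 7, 6, 6, 8, 5, 5, 5, 3, 4, 3, 4], dtype=float)
income = np.array([400, 500, 600, 700, 800, 900, 1000, 1100, 1200,
                   1300, 1400, 1500, 1600, 1700, 1800], dtype=float)
substitute = np.array([10, 14, 12, 13, 11, 15, 16, 17, 22, 19,
                       20, 23, 18, 24, 21], dtype=float)

# Design matrix: intercept, X1 (price), X2 (income), X3 (substitute price).
X = np.column_stack([np.ones_like(price), price, income, substitute])
beta, *_ = np.linalg.lstsq(X, demand, rcond=None)

# Coefficient of determination for the fitted model.
fitted = X @ beta
ss_res = np.sum((demand - fitted) ** 2)
ss_tot = np.sum((demand - demand.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
```

The signs of the estimates (negative on price, positive on income) match the expectations set out in the Coefficients section.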

Results

R2 (Coefficient of Determination)
From the results, it is seen that R2 has a value of 0.95. This means that 95% of the
variation in the dependent variable (y) is explained by changes in the independent
variables (x1, x2, x3).
We can also observe that R2 = 1 - (Residual SS / Total SS)

P value
X1 - 0.0108 (the price of the commodity has a significant p-value: if price had no effect, a
result this extreme would occur by chance only about 1% of the time.)
X2 - 0.0550 (consumers' income has a marginal p-value, sitting just above the conventional
5% cutoff, so it could go either way.)
X3 - 0.7887 (the price of substitutes has a weak p-value: a result like this would occur by
chance about 79% of the time, so it provides very little evidence of an effect.)

Standard Error
This is the estimated standard deviation of each coefficient calculated in the data analysis.
The standard error results are: β0 (intercept) = 19.78, β1 = 1.61, β2 = 0.01 and β3 = 0.64.

Coefficients
In multiple linear regression, the size of the coefficient for each independent variable gives you
the size of the effect that variable is having on your dependent variable, and the sign of the
coefficient (positive or negative) gives you the direction of the effect, with positive being an
increase and negative a decrease.
X1- Price of Commodity

From the results, we can observe that for every $1 increase in price, demand will decrease by
4.93 units.
X2- Consumers Income
The estimations show that for every $1 increase in income, demand will increase by 0.02 units.
X3- Price of Substitutes
From the results, an increase in the price of substitutes by $1 brings about an increase in
demand of 0.17 units, the positive relationship we expected.

Mathematical Model
Using the mathematical model, y=+1x1+1x2+3x3, we can input the coefficients derived
from our estimates to predict the demand for the company given some idea of the values of the
independent variables, X1, X2 AND X3.
For example if an estimation of the price of the product is five dollars ($5), consumers income is
eight hundred dollars ($800) and the price of the substitute is six dollars ($6), the demand
forecast would be equal to:
Y = + 1 X1 + 2 X2 + 3 X3
Y = 79.11 + (-4.93)(5) + (0.02)(800) + (0.17)(6)
Y= 79.11 + -24.65 + 16 + 1.02
Y= 71.48 units
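The same arithmetic can be verified in a few lines of code, plugging the estimated coefficients into the fitted equation:

```python
# Estimated coefficients from the report: intercept, price, income, substitute.
b0, b1, b2, b3 = 79.11, -4.93, 0.02, 0.17

# Scenario from the text: price $5, income $800, substitute price $6.
forecast = b0 + b1 * 5 + b2 * 800 + b3 * 6
print(round(forecast, 2))  # 71.48
```

Changing any of the three inputs reruns the forecast immediately, which is the practical value of having the fitted equation.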
