P545
Applied Econometrics
This module is partially based on the earlier module Applied Econometrics for the
Agricultural and Food Sector prepared for the University of London's External Programme
by Alison Burrell.
SOAS | 3736
SOAS CeDEP 1
P545 Applied Econometrics Module Introduction
MODULE INTRODUCTION
The focus of the module is on the classical linear regression model. This is the
basis for much econometric methodology and it provides the framework for
organising the module.
There is a limit to the distance that can be covered in the study time available.
In an econometrics module, the trade-off between breadth and depth is low
since, without good groundwork and sufficient information at each stage, ideas
may be misunderstood and techniques misapplied. This module follows the
standard itinerary of most econometrics textbooks. It deals only with single-
equation models, but by the end of the module the student is ready to tackle
simultaneous equation models, which would be the next stop-over on this
itinerary.
The practical exercises designed to be done with the help of the free computer
software package R are an important element of the module.
Module Aims
The specific aims of the module are:
To present the theory of the classical linear regression model and explain why
the conditions in such a model provide an ideal environment for ordinary least
squares regression.
To show how econometric models can be made more realistic through the use
of dummy variables and a dynamic specification.
understand and selectively and critically apply the basic principles of regression
analysis and statistical inference in the context of a single-equation regression
model
test hypotheses about economic behaviour and critically interpret the results of
these tests
STUDY MATERIALS
The single textbook for the module is:
This book has been chosen for this self-study module because of its attention to full
explanations of concepts and procedures, its long introductory section presenting the
basic statistical concepts used in regression modelling, and its avoidance of
unnecessary algebra and difficult notation. At least part of every chapter is required
reading for the module, and you are also encouraged to read those parts that are not
specifically identified in the module texts, since all the material here should be within
your grasp and will reinforce your understanding of the subject. By the end of the
module, you will know this textbook well and will be ready for other more advanced
readings.
Each unit in the study guide follows the same format and we will say more about the
reasons for choosing this format in Unit 1. Each unit starts with a section on ideas
and issues where the main ideas of the unit are explained in a relatively non-
technical way. This is followed by a study guide section which takes you through
the relevant parts of the textbook, commenting on and reinforcing the material.
Next, there is a section (except in Unit 6 and in Unit 10) containing one or more
worked examples in which various techniques and their application and
interpretation are illustrated in the context of a practical modelling exercise. These
examples are an important tool for learning: follow them through carefully, and later
on, when doing the exercises, take care not to skip the question that asks you to
reproduce the results of the worked example yourself. The worked example is
followed by a set of questions whose answers are contained at the end of each unit
and a summary of the material in the unit. Last but by no means least, each unit
(except Unit 6 and Unit 10) also has an R guide explaining any new computer
commands and procedures that are necessary for answering the questions in that
unit.
The R guide of Unit 1 explains how to install and get started with
this free software.
When studying each unit, we suggest that you study the text at your own pace and
then work through the first few unit questions which are always designed to test your
basic understanding of the unit material. If these questions reveal some weak spots,
revise the relevant material before going on to the applied, data-based questions which you
will need to answer in conjunction with the R Guide. When you have finished the
questions and checked your answers, you should make a note of any additional
knowledge or insights you have gained by doing the questions that you missed when
studying the module text. The summary at the end of each unit briefly describes the
topics covered and lists what you should have learnt through your study.
Applied statistics and econometrics are subjects with a great deal of specialised
jargon. It can be disconcerting to be faced with a number of unfamiliar new terms,
many of them quite long and often rather similar to each other. We recommend very
strongly that, right from the beginning of the module, you keep a glossary in which
you list each new term as you encounter it, together with an explanation of the
term in your own words. You should read through this glossary every few weeks,
updating your definitions if you find that, as the module progresses, your
understanding of the term develops along with your familiarity with the concept. A
space is provided at the back of the study guide for your glossary.
FURTHER READING
Intermediate textbooks
Greene W (2000) Econometric Analysis, 4th edn. Prentice Hall, New Jersey.
Judge GG, Hill CR, Griffiths WE, Lütkepohl H, Lee T-C (1982) Introduction to the
Theory and Practice of Econometrics. John Wiley, Chichester.
Advanced textbooks
Other
STUDY METHODS
Remember that people learn in different ways (and at different speeds). The units of
the module follow a logical progression in setting out and elaborating the principles
of the subject, but you can move about between units and topics if this suits you.
There is no single rule about how best to learn the kind of material presented in this
study guide. Perhaps the best thing to do at the very beginning is to flick through the
materials, picking out what is most interesting to you, noting what seems more
difficult and what seems easier. You will notice that the module is activity-intensive.
There are many questions and exercises to help you acquire the necessary analytical
skills. Answers to questions and exercises are provided at the end of each relevant
unit. Try to answer them on your own first before consulting the answers we have
provided!
The wealth of material means that it is necessary to pace yourself through it. The
Indicative Study Calendar at the end of this introduction gives you guidance on this
and a series of study tips are included below, which may give you some hints on how
best to study the module material.
In note taking, the activity of selecting what you think is most important, interesting
and relevant, and putting it into your own words, is a powerful means of acquiring
and developing a sound knowledge of the subject. As you read you should
simultaneously be
There are some features in the text of each unit that invite you to take some specific
action before reading further:
This icon invites you to halt and think about the question given. So
cover up the rest of the page unread, and write down what you think is
a reasonable answer to the question before reading on. This is
equivalent to lecturers asking a question of their class and using the
answers as a springboard for further explanation. The explanation,
where appropriate, will be given in the following text, or may be gleaned
from the relevant reading.
Question 0.0
Glossary
Key words and terms are often repeated at the end of each unit in the Summary.
You should ensure you know what they mean before moving on to the next unit.
Where you are not already familiar with the term from previous study, you should
include it in your own glossary. You should add other terms that are new to you to
your glossary whenever this seems likely to be helpful. Some key words are very
likely to be used in examination questions, and an explanation of the meaning of
relevant key words will nearly always attract credit in your answers.
TUTORIAL SUPPORT
There are two opportunities for receiving support from tutors during your study, and
you are strongly advised to take advantage of both. These opportunities involve
Additional features of the VLE include a technical area if you have any access
problems, an administrative area for any relevant queries and profile areas where
students and staff may introduce themselves. A very popular feature is the student
café where students may socialise and interact regarding any issue they choose
(tutors are not allowed entry).
ASSESSMENT
This module is assessed by:
Since the EA is an element of the formal examination process, please note the
following:
(a) The EA questions and submission date will be available on the Virtual Learning
Environment.
(c) The EA is marked by the module tutor and students will receive a percentage
mark and feedback.
(d) Answers submitted must be entirely the student's own work and not a product
of collaboration. For this reason, the Virtual Learning Environment is not an
appropriate forum for queries about the EA.
Unit 8 Heteroscedasticity 15
Unit 9 Autocorrelation 15
Examined Assignment 15
Check the virtual learning environment for submission deadline
Unit One: Introduction to Econometrics
Unit Information
Unit Overview
Unit Aims
Unit Learning Outcomes
Unit Summary
UNIT INFORMATION
Unit Overview
This unit introduces you to the study of econometrics. It begins by defining
econometrics and then explains how econometrics relates to and differs from other
branches of economics. The important roles of economic theory and data in
econometric work are emphasised. Regression analysis is identified as the basis of
econometric procedure. The aims and purpose of regression analysis are explained.
The main steps of a typical econometric investigation are described and illustrated
with an example.
Unit Aims
To define the nature and scope of econometrics
Economic theory is concerned with relationships between variables. You have already
met some of these, including demand and supply functions for agricultural products,
production functions, labour supply and demand functions, and so on. Economic
theory aims to explain economic behaviour; this involves studying the relationship
between economic variables and the factors that influence them.
Other definitions are possible: in your textbook you will come across a number of
definitions, each with a slightly different emphasis. Common to all definitions,
however, is the stress on the empirical nature of econometric work.
This module can be studied in its own right, but normally we would expect you to
take it as part of the MSc programme where, in Part I, you will have studied various
economic theories and models. You should therefore be familiar with a range of
questions raised in theoretical discussions and with the results of some applied
empirical studies. These are good foundations on which to build the study of
econometrics. If up to now you have approached empirical studies from the point of
view of theory or the consequences for policy making, we now invite you to look at
them from the point of view of an econometrician. What is the difference?
Non-experimental data
Economic theory develops models using a priori reasoning applied to relatively simple
assumptions. This procedure involves abstracting from secondary complications by
assuming that other things remain equal (or ceteris paribus), in order to
investigate the links between a few key economic variables.
For example, in demand theory we say that the quantity demanded of a commodity
(that is not a Giffen good) will fall if its price rises, other things being equal. These
other things which we assume are held constant include consumers' incomes and
income distribution, and the prices of substitutes and complementary goods.
This method is fruitful in economic theory but, unfortunately, it is rarely possible to
carry out controlled experiments to test such statements. Therefore, in empirical
economics the scope for observing such behaviour is severely limited. A researcher
cannot alter a commodity's price, holding other things constant, in order to see what
happens to its demand.
In general, economic data are not the outcome of experiments but rather are
observed and recorded in a non-experimental world where other things are never
equal. Therefore, econometrics involves untangling the effects of different factors
that act simultaneously rather than analysing the results of a laboratory experiment.
Stochastic relationships
Economic theory usually involves deterministic relationships between economic
variables. This can be explained with a simple example: the Keynesian consumption
function. In economic theory we assume that, if we know the level of aggregate real
income, consumption will be uniquely determined. That is, for each value of
aggregate real income there corresponds a given level of aggregate consumption.
In reality, however, we do not expect theoretical relationships to hold exactly. Even
when all the main factors that systematically affect the behaviour of an economic
variable are taken into account, there will still be some random variation due to non-
systematic, one-off factors and human variability.
Hence, in econometric work we deal with relationships between variables that
contain a random or stochastic element, and that are therefore not deterministic in
nature. We investigate functions between variables which we believe to be
reasonably stable on average, but there is always a degree of uncertainty about
them.
In econometrics we make explicit assumptions about these random components,
called disturbances. This is why econometrics draws heavily on probability theory
and statistical inference.
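The contrast between a deterministic and a stochastic relationship can be sketched in R. This is a minimal illustration only: the income levels, the intercept, the slope and the spread of the disturbance are all assumed values chosen for the sketch, not figures from the module or the textbook.

```r
# Illustrative sketch: every number below is an assumption, not module data.
set.seed(1)                         # make the random draws reproducible
income <- seq(100, 500, by = 50)    # hypothetical levels of aggregate real income

# Deterministic relationship: each income level gives exactly one consumption level
beta1 <- 20                         # assumed intercept
beta2 <- 0.8                        # assumed marginal propensity to consume
cons_deterministic <- beta1 + beta2 * income

# Stochastic relationship: the same systematic part plus a random disturbance
u <- rnorm(length(income), mean = 0, sd = 10)
cons_stochastic <- cons_deterministic + u

# Compare the two columns: the second scatters around the first
print(data.frame(income, cons_deterministic, cons_stochastic))
```

Re-running the sketch with a different seed changes the stochastic column but not the deterministic one, which is the distinction drawn above.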
Observed variables
In economic theory we work with theoretical variables. Econometrics, in contrast,
deals with observed data.
Obviously, there is a certain correspondence between them: data collection is
inspired by some theoretical framework. For example, the framework for measuring
national income account data derives from Keynesian economics, which is centred on
the analysis of theoretical aggregates such as output, demand, employment and the
price level.
(1) by recording how consumption and income move together over time, or
First, we may need to modify our theory for explaining consumption changes over
time before it can be applied to cross-sectional consumption analysis.
the fact that we cannot hold other things constant in empirical analysis
But in empirical work the relationships we wish to disentangle from the data may
involve a number of variables, and may be subject to uncertainties that our theories
could not possibly aim to explain. Econometric methodology therefore includes
approaches for dealing with these issues, as well as the statistical techniques of
parameter estimation.
You may be worried about studying econometrics. After all, it involves working with
mathematics and statistics, and you may feel that this is not one of your strengths.
Or perhaps you welcome more emphasis on mathematics and statistics. Whichever is
the case, it is useful to be aware of a particular problem that may arise when
studying econometrics.
This is normal and, indeed, necessary. But this preoccupation with technical detail
often means that students lose perspective on "What is it all about?" and "Why are
we doing this?" That is, there is a need to keep a grip on the kinds of basic questions,
which give substance to the subsequent technical exercises, uncluttered by notation
and technical detail. We need to get an overview of a problem before we attack it
aided by our technical armoury. We need to know the simple questions and intuitive
insights which have prompted elaborate technical enquiries.
For this reason, as we explained in the module introduction, each unit of the module
text will always start with a section on ideas or issues, whose purpose is to explain,
in simple words and with a minimum of technical notation, the basic substance of the
unit.
The aim is to give you an intuitive feel for the subject matter before going into
technical detail. If you feel that mathematics and statistics is not your strongest suit,
this regular section will give you a few analytical handles to hold on to when
studying relevant techniques. But even if you are confident with mathematics and
statistics, it is important not to skip this section.
Technical expertise is not just a question of one's ability to work out the steps in a
technical procedure or to understand a mathematical derivation. It also involves
understanding the type of questions a technique tries to address and the
assumptions on which it is based as well as judging the appropriateness of particular
technical procedures in specific conditions.
Next, the module units contain a study guide which guides your study of the
textbook. The purpose of this section is to structure your reading of the textbook as
well as to provide brief comments, elaborations and cross-references to exercises
and examples, and to suggest shortcuts in coping with the material.
Following on from this, you will find a section containing an example (except Units 6
and 10). The purpose of this section is twofold.
First, the example highlights a specific aspect of the topic under study in a
particular unit of the module.
Second, the example also tries to give you a glimpse of econometrics in action.
Sometimes, you will be asked to participate in analysing the example. The examples
aim to highlight the links between economic theory and empirical investigation, and
to illustrate the problems that can arise when we work with real data.
Next you will find a set of self-assessment questions. It is most important that you
work through all of these. Their purpose is threefold:
This is followed by a section that gives a brief summary of the main issues raised in
the unit.
At the end of each unit you will find answers to the unit self-assessment questions
and (except in Units 6 and 10) a guide that explains how to use R, the software
package you will use to carry out econometric exercises. This guide will help you to
master this particular econometrics software package.
To summarise
The section on ideas or issues aims to whet your appetite by giving you an overview
of the topic of the week, expressed in non-technical language.
The core of the module unit is the study guide. This guides you through your reading
of the textbook.
The example is meant to close off the study for that particular unit. It aims to
highlight a problem dealt with in the module material with real data.
The summary draws your attention to the main points made in the unit.
The self-assessment questions are important and you should always work through
them. They will help you to understand the module material, and the knowledge and
experience you gain from doing them will help you to write assignments and answer
examination questions.
The remainder of this unit presents an introduction to regression analysis. As you will
see, it is structured along the pattern outlined above.
What is regression?
Regression is the main statistical tool of econometrics. But what is regression?
This is indeed what one would expect: on average, poorer families spend a higher
proportion of their income on food in comparison with better-off families. Note that
we refer to the proportion of total household expenditures spent on food and not
total food consumption of the family (one would expect better-off families to spend
more money on food even though these expenditures are generally a smaller
proportion of their total expenditure).
This leads us to the concept of regression: regression methods bring out this
average relationship between a dependent variable (the $Y$-variable) on the one hand
and one or more independent variables (the $X$-variables, also called the explanatory
variables) on the other.
In our example, the average relationship between the share of food in household
expenditure and the level of household income is the regression of the former
variable on the latter.
Hence, in regression analysis we seek to model the chance variation around the
average line as well as the average line itself.
In summary, we hope that our model captures the basic structure of interaction
between economic variables. We expect that the behavioural relationships are
reasonably stable but we know that they do not hold exactly because of the random
component (the disturbance term). At most, we expect these relations to hold on
average.
Trying to determine this average relationship amidst the random variation in the data
is like trying to separate sound from noise when listening to a badly tuned radio.
A regression line: this models the average relationship between the dependent
variable and its explanatory variable(s). This requires us to make an explicit
assumption about the shape of the regression line: the function that expresses it
may be linear, quadratic, exponential, etc.
We are not interested in the disturbance term as a variable per se, but we are keen
to remove its blurred messages that hamper our attempts to investigate the
behavioural relationship between the variables of our model. To do this, we need to
model the stochastic (probabilistic) nature of the disturbance term. This is no easy
task and we always need to think carefully about whether the assumptions we make
about the behaviour of the disturbance term are indeed appropriate for the
relationship under study. Not surprisingly, a great deal of econometric theory and
practice revolves around these assumptions.
It is useful to express these important ideas more formally. We start with the
population regression function. This is a theoretical construct representing a
hypothesis about how the data are generated. For the simple, two-variable linear
regression model we have
$Y_i = \beta_1 + \beta_2 X_i + u_i$ (1.1)
Typically, the variables $Y_i$ and $X_i$ are observable for each observation $i$, the
disturbance $u_i$ takes different values for each $i$ but is not observable, whereas the
parameters $\beta_1$ and $\beta_2$ are unknown but constant for all observations.
The presence of the random disturbance means that $Y_i$ is stochastic: for each value of
the explanatory variable, $X_i$, there is a distribution of $Y$-values.
$E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$ (1.2)
The disturbance term, $u_i$, accounts for the variation in $Y$ around the population
regression line. In Unit 3 you will learn about the assumptions made concerning $u_i$.
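The idea that each value of the explanatory variable has its own distribution of values of the dependent variable, centred on the population regression line, can be illustrated with simulated data. The sketch below assumes a population line with intercept 2 and slope 0.5 and normally distributed disturbances; these are hypothetical choices made purely for illustration.

```r
# Illustrative sketch: the population parameters below are assumed values.
set.seed(42)
beta1 <- 2                               # assumed population intercept
beta2 <- 0.5                             # assumed population slope
x <- rep(c(10, 20, 30), each = 1000)     # three X-values, 1000 observations each
u <- rnorm(length(x), mean = 0, sd = 1)  # disturbances with mean zero
y <- beta1 + beta2 * x + u               # generated values of the dependent variable

# Sample mean of Y within each X-group approximates the conditional mean E(Y | X)
cond_means <- tapply(y, x, mean)
print(cond_means)

# Points on the assumed population regression line, for comparison
print(beta1 + beta2 * c(10, 20, 30))
```

The group means lie close to the line but do not coincide with it exactly, because each is computed from a finite sample of disturbances.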
$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$ (1.3)
in which $\hat{\beta}_1$ and $\hat{\beta}_2$ are random variables (the particular estimates obtained depend
on the particular sample of data on $Y$ and $X$ used) that differ from the population
parameters $\beta_1$ and $\beta_2$.
Whereas the disturbance term $u_i$ accounts for the variation in $Y_i$ around the
population regression line, the residuals $e_i$ give us the vertical deviations of the
observed $Y$-values from the estimated regression line derived from sample data.
The residuals, therefore, are not identical with the disturbances, but clearly they may
contain some information that can help us understand the behaviour of the
disturbances. How to analyse the information contained in the residuals is addressed
in later units.
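Because disturbances are never observed in real data, the difference between disturbances and residuals is easiest to see in a simulation, where the disturbances are known by construction. The sketch below uses R's `lm()` function, which fits the line by ordinary least squares (the estimation method itself is covered in a later unit). The sample size and parameter values are assumptions made for illustration.

```r
# Illustrative sketch with simulated data: all parameter values are assumed.
set.seed(123)
n <- 50
x <- runif(n, min = 0, max = 10)   # hypothetical explanatory variable
u <- rnorm(n, mean = 0, sd = 2)    # disturbances (known here only because we simulate)
y <- 1 + 0.5 * x + u               # assumed population line: intercept 1, slope 0.5

fit <- lm(y ~ x)                   # estimated (sample) regression line
e <- resid(fit)                    # residuals: deviations from the estimated line

print(coef(fit))                   # estimates differ from the assumed 1 and 0.5
print(head(cbind(disturbance = u, residual = e)))
print(cor(u, e))                   # residuals are informative about the disturbances
print(sum(e))                      # with an intercept, residuals sum to (almost) zero
```

The residuals track the disturbances closely without being equal to them, since they are measured from the estimated line rather than from the population line.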
$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$ (1.4)
in which $\hat{Y}_i$ is the fitted value of the dependent variable, the estimator of $E(Y_i \mid X_i)$, that
is the estimator of the population conditional mean (cf. equation (1.2)). The sample
linear regression line is an estimator of the population regression line.
= 1 + 2 +
2 = (1.5)
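$Y_i = A X_i^{\beta_2} \exp(u_i)$ (1.6)
which, after taking natural logarithms of both sides of the equation, can be written as
$\log Y_i = \beta_1 + \beta_2 \log X_i + u_i$ (1.7)
where $\beta_1 = \log A$.
This model is also linear in the parameters $\beta_1$ and $\beta_2$. We may view the model as
$Y_i^* = \beta_1 + \beta_2 X_i^* + u_i$ (1.8)
with $Y_i^* = \log Y_i$ and $X_i^* = \log X_i$.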
This model is known by various names (logarithmic, double log, log-log, log-linear
and constant elasticity) and is frequently used in applied work to characterise the
form of the functional relationship between the variables. It has the property that the
slope coefficient measures the elasticity of $Y$ with respect to $X$ because
$\beta_2 = \frac{d \log Y}{d \log X} = \frac{dY}{dX} \cdot \frac{X}{Y}$ (1.9)
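A short simulation can make the constant-elasticity property concrete. The sketch below assumes data generated by a multiplicative relationship of the form Y = A * X^beta2 * exp(u), with a hypothetical scale factor A = 3 and elasticity beta2 = 0.7. Regressing log Y on log X should then return a slope close to 0.7 and an intercept close to log(3).

```r
# Illustrative sketch: A, beta2 and the disturbance spread are assumed values.
set.seed(7)
n <- 200
x <- runif(n, min = 1, max = 100)   # hypothetical positive explanatory variable
u <- rnorm(n, mean = 0, sd = 0.1)   # disturbance on the log scale
A <- 3                              # assumed scale factor
beta2 <- 0.7                        # assumed elasticity of Y with respect to X
y <- A * x^beta2 * exp(u)           # multiplicative (constant-elasticity) model

# The model is linear in the parameters once both variables are logged
fit <- lm(log(y) ~ log(x))
print(coef(fit))                    # intercept estimates log(A); slope estimates beta2
print(log(A))
```

Note that `log()` in R is the natural logarithm, which matches the notation used in this module.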
Correlation analysis
Although regression analysis is related to correlation analysis, conceptually these two
types of analysis are very different.
The main aim of correlation analysis is to measure the degree of linear association
between two variables and this is summarised by a sample statistic, the correlation
coefficient.
Regression analysis, on the other hand, can deal with relationships between two or
more variables and the variables are not treated symmetrically:
the former is random whereas the latter are often assumed to take the same
values in different samples (often referred to as "fixed in repeated samples").
It is important to note that the regression of $Y$ on $X$ does not give the same
sample regression line as the regression of $X$ on $Y$.
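The asymmetry of regression, in contrast to the symmetry of correlation, can be checked numerically. The sketch below uses simulated data with assumed parameter values. It fits both regressions by least squares and compares the slope of Y on X with the slope implied by inverting the regression of X on Y. For least squares fits, the product of the two slopes equals the squared correlation coefficient, so the two lines coincide only when the correlation is perfect.

```r
# Illustrative sketch with simulated data: all parameter values are assumed.
set.seed(2024)
n <- 100
x <- rnorm(n, mean = 10, sd = 2)
y <- 3 + 0.5 * x + rnorm(n, mean = 0, sd = 1)

b_yx <- coef(lm(y ~ x))[["x"]]   # slope from the regression of Y on X
b_xy <- coef(lm(x ~ y))[["y"]]   # slope from the regression of X on Y
r <- cor(x, y)                   # correlation treats the two variables symmetrically

print(c(slope_y_on_x = b_yx,
        implied_by_x_on_y = 1 / b_xy,   # differs from b_yx unless |r| = 1
        product_of_slopes = b_yx * b_xy,
        r_squared = r^2))
```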
This does not mean, however, that data play only a passive role in economic
analysis. Empirical investigation is an active part of theoretical analysis inasmuch as
it involves testing theoretical hypotheses against the data as well as, in many
instances, providing clues and hints towards new avenues of theoretical enquiry.
Theoretical insights have to be translated into empirically testable hypotheses that
we can investigate with observed data. Hence, theory and data are interactive:
theoretical propositions should be continually tested empirically and theoretical
insights can be improved with the aid of signals from the data.
Most of the data we use in applied economic analysis are not obtained from
experiments but are the result of surveys and observational programmes. National
income accounts, agricultural and industrial surveys, financial accounts, employment
surveys, population census data, household budget surveys, and price and income
data are collected by various statistical offices. They are records of unplanned
events; they are not the outcome of experiments. The nature of this economic data
makes an econometrician's work quite different from that of a psychologist or an
agricultural scientist.
In the latter cases, experiments play a central role in empirical research, and much
emphasis is put on the careful design of experiments in order to single out the
stimulus-response relationship between two variables whilst controlling for the
influence of other variables (that is, by holding them constant).
In economics, the scope for experimentation is very limited. We cannot change the
price of a commodity, holding incomes and all other prices constant, just to see what
would happen to the demand for it. In economic theory, we assume that other
things are equal (ceteris paribus) and focus on cause and effect between the
remaining variables. But in empirical analysis other things are never equal, and we
have to observe the behaviour of economic agents from survey data. Multiple
regression techniques allow us to account for the influence of other variables whilst
investigating the interaction between two key variables, but this is not the same as
holding other variables constant.
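The following sketch illustrates why accounting for other variables matters when data are non-experimental. It simulates a hypothetical market in which price tends to be higher where income is higher, so the two influences on demand are entangled. All numbers are assumptions chosen for illustration. A regression of quantity on price alone mixes the price and income effects, whereas a multiple regression that includes income recovers a price coefficient close to the assumed value of -2.

```r
# Illustrative sketch with simulated data: all parameter values are assumed.
set.seed(99)
n <- 500
income <- rnorm(n, mean = 100, sd = 15)
price <- 5 + 0.05 * income + rnorm(n, mean = 0, sd = 1)    # price moves with income
quantity <- 50 - 2 * price + 0.3 * income + rnorm(n, mean = 0, sd = 2)

# Regression on price alone: income is left out although it never stays constant
slope_simple <- coef(lm(quantity ~ price))[["price"]]

# Multiple regression: the influence of income is accounted for statistically
slope_multiple <- coef(lm(quantity ~ price + income))[["price"]]

print(c(price_only = slope_simple, price_and_income = slope_multiple))
```

In this simulated setting the price-only slope is pulled far away from -2 because higher prices coincide with higher incomes. The multiple regression adjusts for this statistically, which is not the same as an experiment in which income is physically held fixed.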
A careful observer uses data not just to confirm his or her theories, but also to get
clues from empirical analysis to advance his/her theoretical grasp of a problem. It is
primarily this aspect that enables data to contribute to the process of analysis.
What is econometrics?
Why study econometrics?
The next section of the textbook is particularly important as it explains how you
might proceed in a typical econometric study. Gujarati and Porter identify eight steps
associated with the typical econometric investigation. Each of the first seven steps is
illustrated in the context of the decisions to enter the labour force.
You will see that in this example the data are plotted in a scatter diagram (or
scatter plot) that helps to visualise the relationship between two variables in the
data. Notice also the central role of estimating the parameters of the model and so
obtaining the estimated regression line.
Gujarati and Porter's discussion of the data steps distinguishes between time-series,
cross-section (or cross-sectional) and pooled data (one type being panel data).
Table 1.1 gives a summary of the key differences between our terminology and
notation, and that used by Gujarati and Porter throughout their textbook.
Item                    Notation in this module    Notation in Gujarati and Porter
Unknown parameters      Greek letters              upper-case Roman letters
Estimated parameters    Greek letters with hats    lower-case Roman letters
Logarithm to base e     log                        ln
$Y_t = \beta_1 + \beta_2 X_t$ (1.10)
$Y_t = \beta_1 + \beta_2 X_t + u_t$ (1.11)
Collection of data
The data to be used are annual time-series data for the UK covering the period
1955–1991. They are aggregate consumption expenditure and personal disposable
income, both measured in £ (1985) million. The source of the data is the Economic
Trends Annual Supplement 1991. Thus, our model represents a theory about the
behaviour of aggregate consumption over time. A scatter plot of these data is given
in Figure 1.1.
It is obvious from this scatter plot that the relationship is upward sloping and it
seems to be reasonably linear.
Parameter estimation
Using these data the parameters $\beta_1$ and $\beta_2$ can be estimated to obtain the average
relationship between consumption and income. Just how the coefficients of the population regression
function are estimated will be explained in Unit 3. The consumption function
estimated with our data is
and this represents the average relationship between consumption expenditure and
personal disposable income.
Prediction
We can use the estimated model to predict what consumption expenditure would be
if personal disposable income were a particular amount. Suppose personal disposable
income was £250 000 million. The predicted amount of consumption expenditure is
£226 202 million.
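The mechanics of such a prediction can be sketched in R. The data below are simulated stand-ins, not the UK series used in this example, and the intercept, slope and noise level are assumed values, so the predicted figure will not reproduce the one quoted above. The point is only to show that a prediction is the estimated intercept plus the estimated slope multiplied by the chosen income level, which is what R's `predict()` function computes for a fitted `lm` model.

```r
# Illustrative sketch: simulated data with assumed parameters, NOT the UK series.
set.seed(10)
income <- seq(150000, 300000, length.out = 37)   # hypothetical annual income levels
consumption <- 5000 + 0.9 * income + rnorm(37, mean = 0, sd = 3000)

fit <- lm(consumption ~ income)                  # estimated consumption function

# Prediction at a chosen income level, using predict()
new_obs <- data.frame(income = 250000)
pred <- predict(fit, newdata = new_obs)

# The same prediction computed by hand from the estimated coefficients
manual <- coef(fit)[["(Intercept)"]] + coef(fit)[["income"]] * 250000

print(c(predict_function = unname(pred), by_hand = manual))
```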
Question 1.1
What are the links between econometrics and both economic theory and mathematical
economics?
For the rest of the questions, you will need to use R software. Please turn to the
R Guide for Unit 1 now and follow the instructions given.
Question 1.2
The data file u1q2.txt contains annual time-series data for the United States over the period
1959–1991 on aggregate consumption expenditure and disposable income, both measured
per head of population and in billions of constant 1987$. The source of the data is Economic
Report of the President, 1992, table B-5, page 305.
Use R software to produce a scatter plot of consumption on the vertical axis and income on the
horizontal axis. Comment on the scatter plot: would a linear regression seem appropriate?
Use R software to obtain time-series plots of consumption and income. Describe the way consumption and
income have moved over the period 1958–1991.
Question 1.3
The hypothesis that foreign direct investment is determined by demand suggests that foreign
direct investment and gross domestic product are positively related, other variables remaining
constant. The data file u1q3.txt contains annual time-series data for the period 1958–1985 on
foreign direct investment, FDI, and gross domestic product, GDP, for Taiwan. The source of these
data is Pan-Long Tsai, "Determinants of foreign direct investment in Taiwan: an alternative
approach with time series data", World Development, 1991, Table A-1, page 285.
Use R software to obtain scatter plots of FDI on GDP and the logarithm of FDI on the logarithm
of GDP, both for the period 1958–1985. Comment on the two scatter plots. Which of the
following would you expect to be the more appropriate linear regression model:
(a) $\mathit{FDI}_t = \beta_1 + \beta_2 \mathit{GDP}_t + u_t$?
or
SOAS CeDEP 19
P545 Applied Econometrics Unit 1
Question 1.4
The data file u1q4.txt contains cross-section data from a sample of 100 rural households on the
value of their consumption and income during a given month. Income (Y) includes cash income
from all sources during the month concerned, plus the (imputed) market value of own production
consumed by the household. Consumption (C) includes the value of all purchased items, plus the
value of own production consumed by the household. The units are measured in the local
currency rounded to the nearest whole number.
(a) Obtain the scatter plot of C on Y. What is the main difference between this scatter plot
and the one constructed in Q 1.2?
(b) Use R software to obtain the histograms of C and Y. Income has the usual positively
skewed distribution that we would expect, whereas the distribution of consumption is less
skewed. Can you suggest a reason for this?
(c) Use R software
(i) to obtain the average propensity to consume at the sample means
(ii) to compare the degree of skewness of the two variables
(iii) to obtain their correlation coefficient.
Question 1.5
Weekly earnings can vary considerably in the case of casual dock labourers recruited on a day-to-
day basis. There are differences between workers as well as across weeks. Weekly earnings will
vary from week to week depending on the activity of the harbour which determines the demand
for labour. Daily recruitment will be high if demand is high, and vice versa. Earnings also vary
between workers in any given week. These depend on the number of days for which a worker manages
to get recruited in a particular week, on whether he or she is recruited for the day shift or the
night shift, and on the number of hours of overtime he or she works in that week.
In this exercise you will look at data on the weekly earnings of casual workers, ECAS, and the
recruitment of casual workers, CASREC. The data file u1q5.txt contains paired observations on
the two variables ECAS and CASREC. The data were taken from a field study carried out in
1980/81 by the Centre of African Studies in Mozambique (Eduardo Mondlane University, Maputo)
on casual labour on the docks of Maputo harbour. The earnings data are in units of 100 MT, the
local currency being the Metical.
(a) Using R software, calculate the means, standard deviations, and minimum and maximum
values for both variables.
(b) A particular worker is randomly chosen from the labour force in a particular week in
1980/1981 that is also randomly chosen. Using the information in your answer to part (a),
what is your best estimate of the weekly earnings of this randomly selected worker?
(c) With R software, obtain the scatter plot of ECAS against CASREC. Write down what you
observe.
UNIT SUMMARY
In this unit we have introduced some basic ideas on econometrics and regression
analysis. The most important points to remember are the following.
- You are familiar with the scatter plot as a practical tool of empirical analysis.
- You know how to load data into R software from a pre-existing data file.
A 1.1
Economic theory can be viewed as a set of qualitative relationships between variables. Such
theory can frequently be written in the form of a mathematical model. An econometric model
may be obtained from an appropriate mathematical model with the addition of a random error
term. By using data to estimate the econometric model we can in effect quantify economic
relationships.
A 1.2
(a) The scatter plot of C against Y for the United States data is given below.
[Figure: scatter plot of u1q2$C (vertical axis) against u1q2$Y (horizontal axis)]
The scatter plot shows that C and Y have the expected positive association. Their
underlying relationship appears to be approximately linear and seems to be relatively
strong in that the observations would appear close to a regression line drawn in the
scatter plot.
(b) Real consumer spending has been on an upward trend over the whole period, with a
downturn in just three sub-periods, the first of which in 1974 was triggered by the first
OPEC oil price rise. Real income has also been following a long-run rising trend. It has
shown more variability around its long-term trend since the early 1970s than in the earlier
years.
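The plots for this answer could be produced along the following lines. This is a minimal sketch: the column names C and Y follow the axis labels of the module's plot, but the YEAR column, the file layout assumed in the commented `read.table()` call, and every number in the stand-in data frame are assumptions made for illustration.

```r
# Hypothetical stand-in for u1q2.txt: the column names C and Y follow the
# module's plot labels; YEAR and all values are illustrative assumptions.
u1q2 <- data.frame(
  YEAR = 1959:1963,
  C    = c(7000, 7100, 7150, 7400, 7600),
  Y    = c(7900, 8000, 8100, 8400, 8600)
)
# With the real file, something like the following would replace the block
# above (assuming a header row and whitespace-separated columns):
# u1q2 <- read.table("u1q2.txt", header = TRUE)

# (a) Scatter plot: C on the vertical axis, Y on the horizontal axis.
plot(u1q2$Y, u1q2$C, xlab = "Y", ylab = "C")

# (b) Time-series plots: convert each column to an annual ts object that
# starts in the first year of the sample, then plot it against time.
C_ts <- ts(u1q2$C, start = u1q2$YEAR[1], frequency = 1)
Y_ts <- ts(u1q2$Y, start = u1q2$YEAR[1], frequency = 1)
plot(C_ts, ylab = "C")
plot(Y_ts, ylab = "Y")
```

Note that `plot(x, y)` takes the horizontal-axis variable first, so Y is passed before C.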
A 1.3
The scatter plots of FDI on GDP and logFDI on logGDP are given below.
[Figure: scatter plot of u1q3$FDI (vertical axis) against u1q3$GDP (horizontal axis)]
[Figure: scatter plot of u1q3new$LFDI (vertical axis) against u1q3new$LGDP (horizontal axis)]
When the variables FDI and GDP are used (equation (a)), it seems that an upward-sloping curve
may be more appropriate than a straight line. The point in the top right corner seems to lie a
long way from the linear function that one might eyeball through the other observations.
Moreover the points seem to spread out more as GDP increases. With the logarithms of the
variables, a positive relationship that is approximately linear is more clearly seen. Therefore,
equation (b) may be the more appropriate regression model.
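The comparison between levels and logarithms can be sketched as follows. The names `u1q3new`, `LFDI` and `LGDP` follow the axis labels of the module's second plot; the numbers in the stand-in data frame are hypothetical and only serve to make the example runnable.

```r
# Hypothetical stand-in for u1q3.txt (values are illustrative only).
u1q3 <- data.frame(
  GDP = c(1000, 2000, 4000, 8000, 16000),
  FDI = c(5, 12, 30, 70, 160)
)

# Add natural logarithms as new columns of a copy of the data frame.
u1q3new <- transform(u1q3, LFDI = log(FDI), LGDP = log(GDP))

# Scatter plot in levels, the form used in equation (a).
plot(u1q3$GDP, u1q3$FDI, xlab = "GDP", ylab = "FDI")

# Scatter plot in logarithms, the form used in equation (b).
plot(u1q3new$LGDP, u1q3new$LFDI, xlab = "log(GDP)", ylab = "log(FDI)")
```

Keeping the logged variables in a separate data frame leaves the original data untouched, so both plots can be redrawn at any time.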
A 1.4
(a) [Figure: scatter plot of consumption (vertical axis) against u1q4$Y (horizontal axis)]
The most striking difference relative to the time-series scatter plot obtained in Q 1.2 is that
here, in the cross-section data set, there are a number of households with the same or
very similar income levels but with very different consumption levels, so that the data
points are more dispersed in the vertical dimension. On the other hand, with the time-
series data, values of income are not repeated and the data points lie quite close to a
(straight) line. During the period 1959–1991 the US economy was growing so that both
real income and consumption were upward-trending, thus making repeated values of real
income unlikely.
(b) The range (maximum value minus minimum value) of C is much less (7325) than that for
Y (10322). This is due to the nature of the relationship between consumption and
income. If the average propensity to consume is declining, then high-income households
will consume a smaller proportion of their income than low-income households, and so
consumption will have a smaller positive skew than income.
(c) (i) The average propensity to consume at the sample means is given by
$\bar{C}/\bar{Y} = 0.98699 \approx 0.987$
(ii) The coefficient of skewness is 0.961725 for income compared with 0.290131 for
consumption.
Note: for a perfectly symmetrical distribution, this coefficient equals zero, and for
negatively skewed distributions (i.e. with the tail to the left) it takes negative values.
(iii) The simple correlation coefficient (or Pearson's correlation coefficient) for C and
Y is 0.7734, showing that the underlying correlation is positive. However, since the
coefficient is well below its maximum value of 1 (= perfect positive linear
correlation) this indicates that there is some scattering of the points around a
straight line.
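The three calculations in part (c) can be sketched in base R. The data frame below is a hypothetical stand-in for u1q4.txt, and `skew()` is a helper written for this illustration. It uses the moment coefficient of skewness, which is one common definition; the module's figures may rest on a slightly different formula.

```r
# Hypothetical stand-in for u1q4.txt (eight households, illustrative values).
u1q4 <- data.frame(
  Y = c(20, 25, 30, 35, 40, 50, 70, 100),
  C = c(22, 24, 30, 33, 38, 45, 55, 70)
)

# (i) Average propensity to consume at the sample means: mean(C) / mean(Y).
apc <- mean(u1q4$C) / mean(u1q4$Y)

# (ii) Moment coefficient of skewness: the third central moment divided by
# the second central moment raised to the power 3/2 (an assumed definition).
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
skew_Y <- skew(u1q4$Y)
skew_C <- skew(u1q4$C)

# (iii) Pearson correlation coefficient between C and Y.
r_CY <- cor(u1q4$C, u1q4$Y)
```

A positive value of `skew()` indicates a tail to the right, zero indicates symmetry, and a negative value indicates a tail to the left, in line with the note above.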
A 1.5
(a) The means, standard deviations, and minimum and maximum values for both variables are
given below:
ECAS CASREC
(b) Your best estimator of the weekly earnings of a randomly selected worker would be the
overall mean of the sample. The reason is as follows: you have no way of knowing
whether a good or a bad week for average earnings of workers was selected, nor whether
the particular worker concerned performed better or worse than others in that (unknown)
week. If you want to give a good estimate, and not gamble on the outcome, your best
guess is to take the average of the whole sample. The best estimate is 17.12 or 1712 MT
since this is the arithmetic mean of the whole sample.
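The summary statistics for part (a), and the estimate in part (b), can be obtained along these lines. The data frame is a hypothetical stand-in for u1q5.txt, so its numbers do not match the module's results.

```r
# Hypothetical stand-in for u1q5.txt (illustrative values only).
u1q5 <- data.frame(
  ECAS   = c(5, 12, 20, 8, 30, 15),
  CASREC = c(100, 150, 200, 100, 250, 150)
)

# (a) Mean, standard deviation, minimum and maximum of each variable,
# collected into one table with a column per variable.
stats <- sapply(u1q5, function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
})

# (b) With no information about the worker or the week, the overall
# sample mean of ECAS serves as the estimate of weekly earnings.
best_guess <- stats["mean", "ECAS"]
```

Printing `stats` shows the four statistics as rows and the two variables as columns.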
(c) [Figure: scatter plot of u1q5$ECAS (vertical axis) against u1q5$CASREC (horizontal axis)]
There is considerable variation between workers in terms of their earnings in any particular
week. The ranges are wide: in some cases from 0 to nearly 60 (that is, 6000 MT).
On the whole, the scatter slopes upwards as we move from lower to higher levels of
recruitment, but the relatively modest slope is hidden behind a great deal of variation
within weeks.
Did you notice that the range in the variation between workers' earnings tends to increase
when we move from the left to the right? That is, higher levels of recruitment go together
with a wider range in weekly earnings between workers.
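One way to check this widening spread numerically is to compute the range of earnings separately for each level of recruitment. The sketch below uses hypothetical data in which each value of CASREC is shared by several workers; whether the real file has this structure is an assumption.

```r
# Hypothetical stand-in for u1q5.txt: three workers observed at each of
# three recruitment levels (illustrative values only).
u1q5 <- data.frame(
  CASREC = c(100, 100, 100, 200, 200, 200, 300, 300, 300),
  ECAS   = c(9, 10, 11, 12, 16, 20, 10, 22, 34)
)

# Range (maximum minus minimum) of earnings within each recruitment level.
spread <- tapply(u1q5$ECAS, u1q5$CASREC, function(x) max(x) - min(x))
```

If the spread grows with recruitment, the entries of `spread` increase from the lowest to the highest level of CASREC.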