Applied Time Series and Box-Jenkins Models

Walter Vandaele

Academic Press
Boston  London  Sydney  Tokyo  Toronto

To my three girls, A. A., and L. P.

Find Us on the Web! http://www.apnet.com

Copyright © 1983 by Academic Press
All Rights Reserved
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Academic Press
A Division of Harcourt Brace & Company
525 B Street, Suite 1900, San Diego, California 92101

United Kingdom Edition published by
ACADEMIC PRESS LIMITED
24-28 Oval Road, London NW1 7DX

ISBN: 0-12-712650-3
Library of Congress Catalog Card Number: 82-79937
Printed in the United States of America

TABLE OF CONTENTS

Preface vii
Acknowledgments xi

CHAPTER 1  INTRODUCTION 1
1.1 The Importance of a Good Forecasting System 2
1.2 The Systematic Approach 2
1.3 What Is a Time Series? 3
1.4 Examples 5
1.5 Objectives of Time Series Analysis 8
1.6 Univariate and Multiple Time Series Models 8
1.7 The Univariate Box-Jenkins Time Series Methodology 9

CHAPTER 2  STATIONARITY 11
2.1 Calculating the Mean 12
2.2 Stationarity 13
2.3 Nonstationary Data: A Time Series Plot 16
2.4 Stabilization of the Variance 18
2.5 Removal of the Trend 20
2.6 Seasonal Fluctuations 25
2.7 Summary 29

CHAPTER 3  AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELS 31
3.1 Autoregressive Models 32
3.1.1 First-Order Autoregressive Models 32
3.1.2 Restrictions on the Autoregressive Parameters: Stationarity Condition 34
3.1.3 Special Characteristics of an AR Process 35
3.1.4 An Intuitive Look at the Stationarity Condition 37
3.1.5 Higher-Order Autoregressive Models 39
3.2 Moving Average Models 40
3.2.1 First-Order Moving Average Models 40
3.2.2 Parsimony 41
3.2.3 Restrictions on the Moving Average Parameters: Invertibility Condition 42
3.2.4 Special Characteristics of an MA Process 43
3.2.5 Higher-Order Moving Average Models 44
3.3 Mixed Autoregressive Moving Average Models 45
3.3.1 ARMA(1,1) Models 45
3.3.2 Restrictions on the Parameters of the Mixed Model 45
3.3.3 Special Characteristics of an ARMA(1,1) Process 47
3.3.4 Higher-Order ARMA Models 48
3.4 Nonstationary Models 49
3.4.1 Random Walk Model 49
3.4.2 Autoregressive Integrated Moving Average Models 51
3.5 Notation 52
3.5.1 Difference Operator 52
3.5.2 Backward Shift Operator 53
3.5.3 ARIMA Models 54
3.6 Seasonal Models 55
3.6.1 Seasonal Autoregressive Models 55
3.6.2 Seasonal Moving Average Models 57
3.6.3 Mixed Seasonal Models 57
3.6.4 General Multiplicative Seasonal Models 58
3.7 Summary 60

CHAPTER 4  IDENTIFICATION 61
4.1 Identification Strategy 63
4.2 Autocorrelation Functions of ARIMA Models 64
4.2.1 The Autocorrelation Function 65
4.2.2 Deciding If a Series Is Stationary 66
4.2.3 The Autocorrelation Function of Stationary Series 73
4.3 Partial Autocorrelation Functions of ARIMA Models 85
4.3.1 Partial Autocorrelations 86
4.3.2 Autoregressive Models 87
4.3.3 Moving Average Models 89
4.3.4 Mixed Models 91
4.4 Autocorrelation Functions and Partial Autocorrelation Functions of Seasonal Models 91
4.4.1 Autoregressive Multiplicative Seasonal Models 91
4.4.2 Moving Average Multiplicative Seasonal Models 97
4.4.3 General Characteristics of Multiplicative Seasonal Models 101
4.5 Sample Autocorrelations and Sample Partial Autocorrelations 106
4.5.1 Sample Autocorrelations 106
4.5.2 Sample Partial Autocorrelations 108
4.6 Summary 112

CHAPTER 5  ESTIMATION AND DIAGNOSTIC CHECKING 113
5.1 Model Estimation 114
5.2 An Example 117
5.3 Diagnostic Checks 121
5.3.1 Stationarity Analyses 123
5.3.2 Residual Analyses 124
5.3.3 Overspecified Model: Omitting Parameters 129
5.3.4 Underspecified Model: Fitting Extra Parameters 133
5.4 Summary 135

CHAPTER 6  FORECASTING 137
6.1 Overview and Notation 138
6.2 Optimal Forecasts and Cost Functions 139
6.3 Minimum Mean Squared Error Forecast 141
6.4 Stationary Models 142
6.4.3 An ARMA(1,1) Model 147
6.5 Nonstationary Models 147
6.5.1 An IMA(1,1) Model 147
6.5.2 An ARIMA(1,1,0) Model 150
6.6 Interval Forecasts 151
6.6.1 First-Order Autoregressive Model 154
6.6.2 First-Order Moving Average Model 154
6.6.3 An IMA(1,1) Model 154
6.7 An Example 155
6.8 Updating the Forecasts 157
6.9 Exponential Smoothing 159
6.10 Summary 160

CHAPTER 7  UNITED STATES RESIDENTIAL CONSTRUCTION 161
7.1 Preliminary Analysis 163
7.1.1 Data Examination 163
7.1.2 Comments on the Construction Data 164
7.2 Identification 164
7.2.1 Autocorrelation and Partial Autocorrelation Functions 166
7.2.2 Comments 169
7.2.3 Series Y_t 170
7.2.4 Series ∇Y_t 170
7.2.5 Series ∇∇₁₂Y_t 171
7.2.6 Summary 172
7.3 Estimation 172
7.3.1 Model 1 172
7.3.2 Model 1.1 175
7.3.3 Model 1.2 177
7.3.4 Model 2 180
7.3.5 Model 3 184
7.3.6 Model 3.1 186
7.3.7 Summary 189
7.4 Diagnostic Checks for Model 1 189
7.4.1 Overfitting and Underfitting 190
7.4.2 Residual Analysis 195
7.4.3 Cumulative Periodogram 197
7.5 Forecasting 200
7.6 Conclusions 205
Appendix: Model A 206
7A.1 Model A 206
7A.2 Model A.1 209
7A.3 Model A.1.1 211
7A.4 Model A.1.1.1 214
7A.5 Model A.1.2 217

CHAPTER 8  UNITED STATES UNEMPLOYMENT 221
8.1 Objective 222
8.2 Unemployment Data 222
8.3 Model Identification 229
8.4 Estimation 230
8.5 Diagnostic Checks 231
8.5.1 Overfitting 233
8.5.2 Residual Analysis 233
8.5.3 Cumulative Periodogram 237
8.6 Forecasting 238
8.7 Conclusions 243

CHAPTER 9  THE CLOROX COMPANY CASE 245

CHAPTER 10  TRANSFER FUNCTION MODELS 257
10.1 Transfer Function Model 259
10.2 Special Transfer Function Models 263
10.3 Seasonal Transfer Function Models 264
10.4 Overview 265

CHAPTER 11  IDENTIFICATION OF TRANSFER FUNCTION MODELS 267
11.1 Cross Covariance and Cross Correlation Function 268
11.2 Sample Cross Covariances and Cross Correlations 270
11.3 Identification Using the Cross Correlation Function 272
11.3.1 Basic Identification Rules 272
11.3.2 Stationarity 281
11.3.3 Prewhitening the Data 286
11.3.4 Identification of the Noise Model 292
11.3.5 Several Explanatory Variables 292
11.4 Example: Lydia Pinkham's Vegetable Compound Data 293
11.4.1 Univariate ARIMA Model of the Advertising Data 294
11.4.2 Cross Correlation Analysis 295
11.5 Summary 299

CHAPTER 12  ESTIMATION AND DIAGNOSTIC CHECKING OF TRANSFER FUNCTION MODELS 301
12.1 Transfer Function Model Estimation 302
12.2 Starting Values 303
12.3 The Lydia Pinkham Example 305
12.4 Diagnostic Checks 306
12.4.1 Residual Analyses—Theory 307
12.4.2 Residual Analyses—Sampling Results 312
12.4.3 Overspecified Model: Omitting Parameters 313
12.4.4 Underspecified Model: Fitting Extra Parameters 314
12.5 The Lydia Pinkham Example—Diagnostic Checks 314
12.6 Summary 321

CHAPTER 13  FORECASTING WITH TRANSFER FUNCTION MODELS 323
13.1 Explanatory Variables Forecasted 324
13.2 Interval Forecasts 326
13.3 Conditional Forecasting 327
13.4 Lydia Pinkham Example 328
13.5 Conclusion 331

CHAPTER 14  INTERVENTION ANALYSIS 333
14.1 The Intervention Model 334
14.2 Identification and Estimation of an Intervention Model 338
14.2.1 Univariate Intervention Model 338
14.2.2 Transfer Function Intervention Model 340
14.3 Example: Directory Assistance in Cincinnati 343
14.4 Conclusion 347

APPENDIX A  DATA BASES 349

APPENDIX B  MATHEMATICAL EXPECTATIONS 357
B.1 Mathematical Expectation 358
B.2 Variance and Covariance 359
B.3 Correlations 361

APPENDIX C  TIME SERIES PROGRAMS: A PRIMER 363
C.1 Preface 364
C.2 Introduction 365
C.2.1 Running the Time Series Programs 365
C.2.2 Conventions Common to All the Programs 368
C.3 Data Input 369
C.4 Univariate Time Series Programs 369
C.4.1 Model Identification 369
C.4.2 Model Estimation 375
C.4.3 Model Forecasting 382
C.5 Transfer Function Programs 386
C.5.1 Model Identification 386
C.5.2 Model Estimation 393
C.5.3 Model Forecasting 400

References 405
Author Index 411
Subject Index 413
PREFACE

In the Preface to the 1973 edition of Kendall's (1976) Time Series, Kendall wrote that "[i]n the last thirty years the theory of time-series has been transformed into a new subject." An even greater transformation has taken place since Box and Jenkins published their Time Series Analysis: Forecasting and Control (first edition 1970). Since this publication the terms Box-Jenkins models and ARIMA (AutoRegressive Integrated Moving Average) models have become synonymous. Indeed, Box and Jenkins have really revolutionized this subject in developing a practical approach for constructing and evaluating ARIMA time series models.

Although a number of books have been published since the Box and Jenkins book appeared, none has been written at a level accessible to students and practitioners with very little statistical training, and none has been written presenting the Box-Jenkins models in a truly applied way. When I developed, in 1977, a Business Forecasting course for MBA students at the Harvard University Graduate School of Business Administration, the need for such a book became very clear. This textbook is an outgrowth of one section of this course. The students or practitioners who work through this book will acquire confidence that they can effectively construct and use ARIMA forecasting models as well as transfer function and intervention models. Similarly, the practitioner will be able to recognize situations where these models can be successfully used, and will be able to translate the results obtained from these models into decision actions.

An added feature of this book is the set of chapters (Chapters 10-14) devoted to transfer function models and intervention analysis. Although these models are certainly more elaborate to construct and evaluate, the reader will be able to work through these chapters without any additional difficulties. These chapters clearly build on the material presented in earlier chapters.

METHODOLOGICAL APPROACH

In order to achieve the above objectives, I tried to build up the intuition of the time series user. Therefore, where possible, I not only derive the important theoretical results, but immediately confront the reader with generated data that reproduce these results. For example, in Chapter 4 I use data generated according to known ARIMA processes to reinforce the results derived in Chapter 3 and to build a bridge between theoretical and applied results, showing, in particular, the need for measures of uncertainty.
In the transfer function chapters (Chapter 10 and following), because the methodology is inherently more complex, I first show the important building blocks, relying on generated data, before deriving the theoretical generalizations.

Another important methodological tool is the computer. Currently there are quite a number of commercially available Box-Jenkins computer programs. I have developed a collection of highly interactive computer time series programs under the name TS. Inquiries about the package should be directed to Walter Vandaele, 5115 34th Street, N.W., Washington, D.C. 20008, U.S.A. A Primer to this computer package is available as Appendix C of this text. I strongly recommend that the readers have access to a time series computer package and work through the analysis of the many time series presented in the book. With this purpose in mind, I have listed in Appendix A all the data for the time series used in this book. Nobody should believe that it is possible to successfully construct time series models without touching real data.

Because I am convinced that a hands-on experience is a most important didactic tool, I have not included the traditional exercises that can be found in a number of other textbooks. I find many of these exercises of very little real interest to the applied-oriented audiences addressed by this book. In order to encourage this hands-on approach, I have devoted three chapters to the analysis of real data. Chapter 7 is written in a semiprogrammed text format; I ask the readers to analyze the data on U.S. residential construction jointly with me. This chapter should acquaint readers with the type of judgment required to formulate Box-Jenkins models. In Chapter 8 I summarize my analysis of the U.S. unemployment, again asking readers to use the computer to evaluate their own models. Finally, Chapter 9 is a case study presented for the purpose of solving a real decision problem. In this chapter readers must construct time series models and combine these with a linear programming problem to formulate a managerial solution to the underlying decision problem.

OVERVIEW OF THE BOOK

The first 9 chapters are devoted to the construction of univariate time series models. Chapters 10 to 13 cover the transfer function models, and in the final chapter the intervention model is presented.

Chapter 1 contains an overview of the whole book and introduces the univariate and transfer function models. It also contains an overview of the many fields in which these time series models have been successfully used. In Chapter 2, the notion of stationarity is introduced, the importance of working with stationary data is discussed, and how data can be made stationary is shown. I rely extensively on a quarterly U.S. gross national product series to illustrate intuitively the important concepts. In Chapter 3, a taxonomy of the most important ARIMA models and their properties is presented. The next chapter is devoted to the first prong of the Box-Jenkins time series methodology, the identification phase. After identifying some possible models that could have generated a particular data series, the reader is then presented in Chapter 5 with the estimation and diagnostic checking methodology. Finally, in Chapter 6, how to construct optimal time series forecasts is discussed. Chapters 7, 8, and 9 are devoted to the analysis of real time series.
These chapters should serve the important function of reaffirming that the reader can, after having worked through the earlier chapters, effectively construct ARIMA models.

Chapters 10 through 13 present the transfer function models. In many forecasting situations, other variables (events) will systematically influence the series to be forecasted, and therefore there is a need to go beyond a univariate time series model. Chapter 10 introduces the transfer function model as a model that incorporates more than one time series and takes explicitly into account the dynamic characteristics of the system. As with the univariate model building process, the transfer function model construction involves three stages that are similarly labeled identification, estimation and diagnostic checking, and forecasting. Chapter 11 presents identification and shows that the cross correlations form an important new tool. Chapter 12 is devoted to estimation and diagnostic checking and demonstrates that most of the tools are identical to those presented in Chapter 5. Finally, in Chapter 13, I discuss how to forecast with a transfer function model. These last chapters again illustrate the transfer function methodology with the use of a sales-advertising example.

The final chapter of the book shows that the intervention model is nothing but a structured dummy variable model of which the underlying model is either a univariate ARIMA model or a transfer function model. The methodology is illustrated with data on directory assistance calls before and after the introduction of a service charge.

Appendix A contains all the data explicitly used in this book. In Appendix B a basic explanation of the expectations operator is presented. Finally, Appendix C is the Primer to the Time Series package of interactive computer programs that I developed. A PC version is in the testing phase and should be available by mid-1990. The package is available from Walter Vandaele, 5115 34th Street, N.W., Washington, D.C. 20008, U.S.A.

INTENDED AUDIENCE

The book is intended for students, researchers, and practitioners who have had little statistical training but either want to study these new important time series methods or are confronted with decisions based on these methods. Of course, those with more rigorous statistical training but unfamiliar with the Box-Jenkins time series models will certainly benefit from studying this book. I have primarily written this book with an applied-oriented audience in mind. I refer to the above section on the Methodological Approach for suggestions on how to derive the greatest benefit from the text.

This book can be used in an advanced undergraduate course or beginning graduate course on time series forecasting. Such a course could be part of a business school, department of economics, or engineering curriculum. Social science departments (education, psychology, public health, medicine) have also started to introduce such a course in their curriculum, and researchers in these fields are using the time series methodology covered in this book in their applied work.

No matter who makes use of this book, it is important to remember that its orientation is an applied one. Therefore, I again strongly encourage readers to develop a hands-on experience using real data encountered in their respective fields of study or work.

Walter Vandaele
Washington, D.C.
Lincoln's Birthday, 1983

ACKNOWLEDGMENTS

This book would not have been written if, in the Spring of 1977, I had not chosen to teach a course on Business Forecasting for MBA students at the Harvard University Graduate School of Business Administration. In September 1979 I continued to work on this book while I was associated with the Center for Computational Research in Economics and Management Science (CCREMS), Sloan School of Management, MIT, and with the Department of Economics, Harvard University. The manuscript was completed during the late and wee hours of the days (more late than wee) while I was Economic Advisor to the Director, Bureau of Competition, Federal Trade Commission.

I also used the manuscript of the book at a series of time series seminars that I conducted for audiences of business and government managers as part of the Educational Program of Data Resources, Inc. The transfer function chapters were initially written for these seminars. I benefited enormously from this interaction with the business community. Certainly these seminars formed a fruitful laboratory for my ideas on how to make these methods understandable to an applied-oriented audience.

I am indebted to a great many colleagues at the Harvard Business School, CCREMS at MIT, and the Department of Economics, Harvard University. In particular I want to thank John Pratt, Robert Schlaifer, and Art Schleifer of the Harvard Business School, and Mark Watson of the Department of Economics at Harvard for reading and commenting on various drafts of the manuscript. I also want to acknowledge the help received from John Sch and Dan O'Reilly of Data Resources, Inc. They participated in the DRI Time Series seminars as instructors and they helped me clarify my thinking in innumerable ways. In addition, E. W. Swift, Georgia State University, and H. H. Stokes, University of Illinois at Chicago Circle, as well as several referees, made many important suggestions. Only I can be blamed for not following all these suggestions.

Furthermore, I want in particular to thank Robert Schlaifer for discussing with me how to write an efficient interactive computer package and for subsequently answering the many questions that I had about the programming of my interactive Time Series package. Without having access to the very proficient program library written by Robert Schlaifer, the writing of this interactive package would not have been possible.

From my former students, I want to specially recognize Sergio Koreisha, now at the University of Oregon. We had numerous discussions about the most effective way to present this material. He developed the idea of writing Chapter 7 in a programmed text format. Also under my guidance he wrote Chapters 8 and 9. I want to thank him for his contribution to this book. I would also like to acknowledge my intellectual debt to Charles R. Nelson of the University of Washington at Seattle, one of the first to introduce me to Box-Jenkins analysis. Another interactive version of proprietary Box-Jenkins programs, including IDENT, ESTIMATE, FORECAST, CROSSCORR, TRANSEST, and TRANSFOR, is available from Charles R. Nelson Associates, Inc., 4921 Northeast 89th Street, Seattle, Washington 98105.

My wife Annette, to whom this book is dedicated, is owed my thanks for her love. She is always able in a few words to bring matters down to earth.

Several secretaries have repeatedly typed different versions of the chapters and in doing so they brought the manuscript to its final form.
Nancy Hayes, Cheryl Levin, Martha Laisne, Kate Doyle, and in particular Karen Glennon are due many thanks. Michelle DuPree read the whole manuscript and provided invaluable editorial help. I also greatly appreciate the effort and guidance of Susan Elliott Loring, Senior Editor, and Georgia Lee Hadler, Project Editor at Academic Press.

Permission from Holden-Day to adapt from Box and Jenkins (1976) Time Series Analysis: Forecasting and Control, Figures 5.9(b) and 8.3(a) and Appendix A9.1, and from Nelson (1973) Applied Time Series Analysis for Managerial Forecasting, Figures 5.1 and 5.2, is gratefully acknowledged.

Of course, I assume the usual responsibility for those errors that undoubtedly still lurk somewhere in the book.

W.V.

CHAPTER 1  INTRODUCTION

1.1 THE IMPORTANCE OF A GOOD FORECASTING SYSTEM / 2
1.2 THE SYSTEMATIC APPROACH / 2
1.3 WHAT IS A TIME SERIES? / 3
1.4 EXAMPLES / 5
1.5 OBJECTIVES OF TIME SERIES ANALYSIS / 8
1.6 UNIVARIATE AND MULTIPLE TIME SERIES MODELS / 8
1.7 THE UNIVARIATE BOX-JENKINS TIME SERIES METHODOLOGY / 9

1.1 THE IMPORTANCE OF A GOOD FORECASTING SYSTEM

The managerial need for accurate and reliable forecasts can certainly not be denied. Every day inventory, production, scheduling, financial, and marketing decisions are made which depend on projections of future sales. Inaccurate forecasts can have serious disruptive effects on business operations. For example, if demand for a product in a certain region of the country is underestimated, then production schedules of the plant supplying this territory and possibly those of nearby plants will have to be modified to accommodate the unanticipated demand, or otherwise sales will be lost. Additional overtime and interplant shipping costs may be incurred. Special and more costly orders may have to be placed to assure continuation of production as raw material supplies become insufficient to meet demand. Furthermore, permanent losses in sales may result if the lead times involved in placing new orders and receiving shipments from the other plants are too long. Overestimation of demand can also be costly. Investment opportunities may be missed because unnecessary capital is tied up in inventory and warehousing costs.

Aside from the need for accurate and reliable forecasts, it is also essential for a company to operate with a uniform set of projection figures. Complications are bound to arise if the marketing department bases its promotion and advertising plans on forecasts which are different from the ones used by the production department to set manufacturing schedules. Uniformity in forecasts can be achieved by arranging meetings with the department heads responsible for formulating projections in order to obtain a consensus on the set of figures to be used throughout the company. It should be noted, however, that forecasts based solely on executive opinions may be influenced by political factors such as the necessity to compromise in order to be in line with upper management's disposition toward various interrelated matters. Yet forecasts generated using only mathematical models cannot generally assimilate all the managerial information, such as possible changes in production and delivery schedules caused by strikes or equipment failure. A good forecasting system is one in which systematically derived figures can be altered, if necessary, using available managerial expertise.
It is also important to realize that the implementation of any forecasting system will create conflicts in many company settings. A forecasting system, if not properly introduced, will be foreign to the existing decision-making process.

1.2 THE SYSTEMATIC APPROACH

There exist many methods and approaches for formulating forecasting models, but in this book we will deal exclusively with the time series forecasting model. In particular, we will discuss the Autoregressive Integrated Moving Average (ARIMA) models described by George Box and the late Gwilym Jenkins in their 1970 book entitled Time Series Analysis: Forecasting and Control.¹ Although time series models have been studied for many years,² Box and Jenkins popularized their use, demonstrated how to extend their application to seasonal data, and made the methodology operational. Their contribution to the field of time series analysis cannot be overestimated.

The Box-Jenkins approach possesses many appealing features. It allows the manager who only has data on past years' sales to forecast future sales without having to search for other time series data such as consumer's income, prices, etc. However, the Box-Jenkins approach also allows for the use of several time series to explain the behavior of another series if these other time series data are available.

In the next section we will discuss what a time series is, introduce some of the objectives of time series analysis, and present the distinction between univariate and multiple time series models. Throughout the chapter we will use examples to illustrate the concepts introduced.

1.3 WHAT IS A TIME SERIES?

A time series is a collection of observations generated sequentially through time. The special features of a time series are that the data are ordered with respect to time, and that successive observations are usually expected to be dependent. Indeed, it is this dependence from one time period to another which will be exploited in making reliable forecasts. The order of an observation is denoted by a subscript t. Therefore, we denote by z_t the t-th observation of a time series. The preceding observation is denoted by z_{t-1}, and the next observation by z_{t+1}.

It also will be useful to distinguish between a time series process and a time series realization. The observed time series is an actual realization of an underlying time series process. By a realization we mean a sequence of observed data points, and not just a single observation. The objective of time series analysis is to describe succinctly this theoretical process in the form of an observable model that has similar properties to those of the process itself.

In this book we will analyze measurements or readings made at predetermined and equally or almost equally spaced time intervals³ to generate hourly, daily, monthly, or quarterly data. The calendar month is an example where the interval between observations is not quite constant. Such time series are called discrete time series, in contrast with continuous time series, which exist at every point in time.

¹ A revised edition was published in 1976.
² See, e.g., Wold (1954) and Quenouille (1957) for a discussion of earlier work on time series.
³ Little theoretical work has been done on time series observed at unequal intervals. We can refer the reader to Quenouille (1958) on autoregressive series, and to Granger (1963) and Cleveland and Devlin (1980, 1982) on the effect of varying month length on economic series.
An example of a continuous time series is the temperature in a given place.

Discrete time series can arise in several ways. Given a continuous time series, it is possible to construct a discrete time series by taking measurements at equally spaced intervals of time. Granger and Newbold (1977) define such series as instantaneously recorded. Examples of this type of series are daily temperature readings at 3:00 P.M. and the Dow Jones index at closing time on successive days. Note again that because the New York Stock Exchange is closed on weekends, the time interval between successive daily observations of the Dow Jones index is not quite constant. Alternatively, discrete time series can arise by accumulating or aggregating a realization for a predetermined time interval. These are accumulated series, as described by Granger and Newbold (1977). Examples are monthly rainfall, quarterly industrial production, daily miles traveled by an airline, and monthly traffic fatalities. Notice that monthly traffic fatalities are actually an aggregation of discrete events. Therefore, although we do not formally analyze continuous time series, sometimes continuous data can be transformed into discrete data, which can then be analyzed by the methods presented in this book.

In many cases the data will exhibit a number of nuisance effects. The difference in the lengths of months is one such case; the fact that a month may include either four or five weekends will influence the observed data points. Movable feasts and holidays contribute their share of confusion, especially Easter, which may fall either in the first or the second quarter of the year. (We refer to footnote 3 for references dealing more specifically with these issues.) Strikes also introduce aberrations into the series.

In many situations the "raw" data may have to be "cleaned up" before we can start a formal analysis. The following are a few ways to accomplish this task (a short sketch of adjustments (1) and (3) appears at the end of this section):

1. A certain measure of comparability may be obtained for calendar month production data by correcting the figures to correspond to a standard month of 30 days, e.g., by multiplying the production in February by 30/28.
2. Short-term effects may sometimes be eliminated by aggregation. Variations due to a movable Easter can be eliminated by working with six-month rather than with three-month periods.
3. Data relating to nominal values may be divided by some index measuring changes in money values in order to create constant-value series (i.e., series in real terms).

No hard-and-fast rules exist for when and how the data should be cleaned, or preprocessed, or for when to use the raw data. The analyst has to consider the purpose of the forecast. If there is a need for quarterly forecasts, it makes no sense to use semi-annual data in order to eliminate the Easter problem. It is also unnecessary to clean up all the series we want to analyze. We may want to leave the monthly sales data unadjusted for the different lengths of the calendar months but allow the seasonal index to include the effect of changes in length from one month to the other.

In other cases, the user may be confronted with possible outliers in the data that would play havoc in analyzing such a series. Sometimes the user may be able to identify such outliers because their cause is known, such as a strike or holiday effect. In other cases the user may feel uneasy in identifying such outliers. In both cases the methods to deal with these problems are beyond the scope of this book. For more details the reader is referred to Cleveland et al. (1978), Kleiner et al. (1979), and Martin et al. (1983).
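To make adjustments (1) and (3) concrete, here is a minimal sketch in Python. It is not from the book, and the production figures, month lengths, and index values are invented purely for illustration:

```python
import numpy as np

# Hypothetical monthly production figures and the days in each month.
production = np.array([310.0, 295.0, 322.0])   # Jan, Feb, Mar (invented data)
days = np.array([31, 28, 31])

# Adjustment (1): correct to a standard 30-day month,
# e.g., February is multiplied by 30/28.
standardized = production * (30.0 / days)

# Adjustment (3): divide a nominal series by a price index (base = 1.0)
# to obtain a constant-value series in real terms.
price_index = np.array([1.00, 1.01, 1.03])     # invented index values
real_production = standardized / price_index

print(standardized.round(1))
print(real_production.round(1))
```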
1.4 EXAMPLES

Time series occur in a variety of fields. Some examples are:

■ Business  Successive weekly and/or monthly sales figures. Figure 1.1, taken from Chatfield and Prothero (1973), presents the monthly sales figures of an engineering product from 1965 to May 1971.⁴ The seasonality of this monthly series is very typical. November sales are high each year, and April and May sales are low. In addition, the series shows evidence of a trend; although the sales mean over the whole period equals 298.4 (see the dotted line in Figure 1.1), we would certainly predict a higher sales level for 1972 than for 1968. Other business examples include the number of tourists visiting Hawaii (Geurts and Ibrahim, 1975), the hourly electricity demand (Brubacher and Wilson, 1976), and the monthly total airline miles flown in the United Kingdom (Montgomery and Johnson, 1976).

FIGURE 1.1  Monthly Sales, January 1965-May 1971. (From Chatfield and Prothero, 1973.)

■ Economics  Agricultural commodity pricing (Leuthold et al., 1970), the market yield on U.S. government three-month Treasury Bills (see Figure 1.2), monthly export totals, monthly employment and unemployment statistics, the Federal Reserve Board Index of Industrial Production, and money supply statistics are but a few of many other examples.

FIGURE 1.2  Market Yield on U.S. Government Three-Month Treasury Bills, January 1956-December 1968. (From Federal Reserve Board.)

■ Sociology  Crime statistics (Boston armed robberies—see Deutsch and Alt, 1977), suicide rates, birth rates, and divorce rates are typical sociological examples.

■ Physics and Engineering  Numerous time series occur in the branches of the natural sciences: meteorology (Ledolter, 1976), marine science, and geophysics. Examples are wind velocity, rainfall statistics, degree-days by month, and solar activity (see Morris, 1977, for an analysis of the sunspot data). An example of engineering data is the yield data from a batch chemical process (Jenkins and Watts, 1968). This yield data is represented in Figure 1.3. In contrast with the Chatfield and Prothero sales data in Figure 1.1, this series does not show any signs of a trend. The mean of the series, therefore, could be used as an initial forecast of the level of the series (see the dotted line in Figure 1.3).

FIGURE 1.3  Chemical Batch Process Yields. (From Jenkins and Watts, 1968.)

■ Medicine and Public Health  Epidemiological statistics (see Figure 1.4 for the reported cases of rubella in the United States⁵), immunization statistics, immunogenetics, electrocardiograms, and electroencephalograms are some of the more familiar examples in the fields of medicine and public health.

FIGURE 1.4  Reported Cases of Rubella, 1966-1968, biweekly. (From Montgomery and Johnson, 1976.)

⁴ The data are listed in Appendix A. The dotted line in the main body of Figures 1.1-1.4 indicates the position of the sample mean of the data.
⁵ The reported cases of rubella by two-week periods in Ohio, Indiana, Illinois, Michigan, and Wisconsin have been collected by the Center for Disease Control, Atlanta, Georgia, and are reported in Montgomery and Johnson (1976).

1.5 OBJECTIVES OF TIME SERIES ANALYSIS
Since the reasons for studying time series often determine the choice of methods to use, it is helpful to be given an overview of some study objectives:

1. to obtain a concise description of the features of a particular time series process;
2. to construct a model to explain the time series behavior in terms of other variables and to relate the observations to some structural rules of behavior;
3. based on the results of (1) or (2), to use the analysis to forecast the behavior of the series in the future based upon a knowledge of the past. From (1) we assume that there is sufficient momentum in the system to ensure that past and future behavior will be the same. From (2) we have more insight into the underlying forces of the time series process and can exploit these to obtain more accurate forecasts; and
4. to control the process generating the series by examining what might happen when we alter some of the parameters of the model, or by establishing policies that intervene only when the process deviates from a target by more than a prescribed amount.

In this book we primarily will concentrate on the objectives specified in (1), (2), and (3). However, in Chapter 14 we also will discuss the intervention model, which will allow us to evaluate the effect of interventions.

1.6 UNIVARIATE AND MULTIPLE TIME SERIES MODELS

Aside from the distinction between discrete and continuous time series models, it is also important to classify time series models according to the number of variables included in the model. A time series model consisting of just one variable is appropriately called a univariate time series model. A univariate time series model will use only current and past data on one variable. For example, if we forecast the unemployment rate next month or two months from now using a univariate model, we could only use current and past unemployment data. Implicit in the formulation of such a model is the assumption that factors which influence unemployment have not changed or are not expected to change sufficiently to warrant introducing these explicitly into the model.

A time series model which makes explicit use of other variables to describe the behavior of the desired series is called a multiple time series model. The model expressing the dynamic relationship between these variables is called a transfer function model. The terms transfer function model and multiple time series model are used interchangeably. A transfer function model is related to the standard regression model in that both have a dependent variable and one or more explanatory variables. But, as will be shown in Chapter 10, a transfer function model can allow for a richer dynamic structure in the relationship between the dependent variable and each explanatory variable, and in the error term.

The usefulness of such a transfer function can be better appreciated if we consider an example. Suppose that aside from unemployment data, we also have data on the money supply. Then, by constructing a transfer function model, we could exploit the dynamic relationship between unemployment and the money supply. For instance, if we discover that a change in the money supply this month will trigger a response in the unemployment situation two months from now, we may then be in a much better position than if we had just used a univariate model to predict future unemployment.

Finally, a special form of transfer function model is the intervention model.
The special characteristic of such a model is not the number of variables in the model, but that one of the explanatory variables captures the effect of an intervention, a policy change, or a new law.

The next eight chapters will be devoted to univariate time series models. The univariate ARIMA model will also be the cornerstone of the transfer function analysis. Chapters 10 through 13 will discuss the transfer function model. The intervention model will be treated in Chapter 14.

1.7 THE UNIVARIATE BOX-JENKINS TIME SERIES METHODOLOGY

In the next eight chapters we will be discussing the univariate ARIMA analysis, commonly called the Box-Jenkins approach. This Box-Jenkins approach for analyzing time series data consists of extracting the predictable movements from the observed data. The time series is decomposed into several components, sometimes called filters. The Box-Jenkins approach primarily makes use of three linear filters: the autoregressive, the integration, and the moving average filter.

If we think of these filters as being special types of sieves, then the Box-Jenkins method can be viewed as an approach by which time series data are sifted through a series of progressively finer sieves. As the data pass through each sieve, some characteristic component of the series is filtered out. This process will terminate when what continues to go through the sieves is judged to be so fine that no additional information can be filtered out of it.

Figure 1.5 contains a very schematic representation of how the observed data, z_t, is transformed by the ARIMA filters. All the terms used in Figure 1.5 will be carefully explained in the next chapters. After applying the integration filter to the observed data, we obtain a filtered series, w_t. Next, the autoregressive filter produces an intermediate series, e_t, and finally the moving average filter generates random noise, a_t. The objective in applying these filters is to end up with random noise which is unpredictable.

FIGURE 1.5  The ARIMA Model: the observed series passes through the integration filter, the autoregressive filter, and the moving average filter in turn.

The natures of the different sieves and the grid sizes of the sieves are all the information we need to describe the behavior of the time series. Indeed, finding the number and nature of the filters is equivalent to finding the structure, identifying the form, and constructing the model for the series. The Box-Jenkins method provides a unified approach for identifying which filters are most appropriate for the series being analyzed; for estimating the parameters describing the filters (i.e., for estimating the grid sizes of the sieves); for diagnosing the accuracy and reliability of the models that have been estimated; and, finally, for forecasting.

In the next chapters we shall describe the details of each of these filters and the process of formulating such Box-Jenkins models. In Chapter 2 we will discuss, using an example, the integration filter, which is closely related to the concept of trend. Chapter 3 will give an overview of the different ARIMA models and their characteristics. In Chapter 4 we will present the identification strategy. Chapter 5 is devoted to the estimation of the ARIMA models and the important model diagnostic checking. Chapter 6 presents the forecasting. Chapter 7 is an overview chapter allowing the reader to verify the understanding of the material presented in earlier chapters. Chapter 7 is written in a programmed text format. The effectiveness of the Box-Jenkins technique is examined in Chapter 8. Finally, Chapter 9 concludes the univariate part of the book with a case study.
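The sieve metaphor above can be made concrete with a few lines of code. The following Python sketch is not from the book: it simulates data from a known ARIMA(1,1,1) process, with illustrative parameter values φ = 0.6 and θ = 0.4 assumed known, and passes the observations through the three filters in turn; what emerges at the end is, up to a start-up transient, the unpredictable random noise a_t. In practice the filters and their parameters must first be identified and estimated, which is the subject of Chapters 4 and 5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an ARIMA(1,1,1) process (parameter values are illustrative):
#   w_t = phi * w_{t-1} + a_t - theta * a_{t-1},   z_t = z_{t-1} + w_t
phi, theta, n = 0.6, 0.4, 500
a = rng.normal(size=n)                 # the unpredictable random noise
w = np.zeros(n)
for t in range(1, n):
    w[t] = phi * w[t - 1] + a[t] - theta * a[t - 1]
z = np.cumsum(w)                       # "integration" turns w_t into z_t

# Sieve 1 -- integration filter: difference the observed series.
w_hat = np.diff(z)

# Sieve 2 -- autoregressive filter: remove the dependence on the past w.
e = w_hat[1:] - phi * w_hat[:-1]       # e_t = a_t - theta * a_{t-1}

# Sieve 3 -- moving average filter: invert the MA part recursively
# to recover the noise, a_t = e_t + theta * a_{t-1}.
a_hat = np.zeros_like(e)
for t in range(len(e)):
    a_hat[t] = e[t] + theta * (a_hat[t - 1] if t > 0 else 0.0)

# Apart from a start-up error that dies out because |theta| < 1
# (the invertibility condition of Chapter 3), what is left is the noise.
print(np.allclose(a_hat[50:], a[52:]))   # True
```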
CHAPTER 2  STATIONARITY

2.1 CALCULATING THE MEAN / 12
2.2 STATIONARITY / 13
2.3 NONSTATIONARY DATA: A TIME SERIES PLOT / 16
2.4 STABILIZATION OF THE VARIANCE / 18
2.5 REMOVAL OF THE TREND / 20
2.6 SEASONAL FLUCTUATIONS / 25
2.7 SUMMARY / 29

In this chapter we analyze the integration filter presented in Figure 1.5. To focus the discussion, a modified section of this figure is represented in Figure 2.1. In discussing the integration filter we will define a related concept called stationarity, and indicate how to transform nonstationary data into stationary data. We shall begin by presenting some motivating issues for explaining this stationarity concept.

FIGURE 2.1  The Integration Filter.

2.1 CALCULATING THE MEAN

How would you calculate the mean or average of a time series of a specified length? Calculating the mean of a sequence of observations might appear to be a trivial problem, as we would just add all observations and divide this total by the number of observations. However, if the series is steadily increasing over time (i.e., shows evidence of a trend¹) and we make decisions based on this mean, we would certainly not want to select the same value for the start as for the end of the series. We would be hard pressed to claim that the dotted line representing the mean of the monthly sales data from Chatfield and Prothero (see Figure 1.1), or of the market yield on U.S. government three-month Treasury Bills (see Figure 1.2), would be a good forecast of the future level of each series. Indeed, if we regard the observed series as one realization of all possible series that could be generated by the same mechanism for that same time interval, we have only a sample of size one. We are therefore faced with the difficult task of estimating a mean for each time period based on one observation for that time period, and, clearly, with the even harder task of estimating variances and autocorrelations.²

The observed value of the series at a particular time period should be viewed as a random value; that is, if a new realization could be obtained under similar conditions, we would not obtain the identical numerical value. Let us measure at equal intervals the thickness of a steel wire made on a continuous extraction machine.³ Such a list of measurements can be interpreted as a realization of wire thicknesses. If we were to repeatedly stop the process, service the machine, and restart the process to obtain new wires under similar machine conditions, we would be able to obtain new realizations from the same stochastic process. These realizations could be used to calculate the mean thickness of the wire after one minute, two minutes, etc. The term stochastic simply means random, and the term process should be interpreted as the mechanism generating the data.

Let us denote the realizations by z_{jt}, j = 1, 2, . . . , J; t = 1, 2, . . . , n; with J being the number of realizations, and n the total time length of each realization. Therefore, we really have J time series of the same process.

¹ We define a trend as any systematic change in the level of a time series. In Section 2.5 we will present a more detailed discussion of trend and trend removal methods.
² The autocorrelations (to be defined below) are crucial elements in deciding which process is generating the data.
³ This example is taken from Granger and Newbold (1977).
Then a possible estimator of the mean of the wire thickness after t time units, denoted as μ̂_t, is defined as

    μ̂_t = (1/J) Σ_{j=1}^{J} z_{jt}.   (2.1)

However, in most situations we can only obtain one realization.⁴ For example, we cannot stop the economy, go back to some arbitrary point in time, and then restart the economic process to observe a new realization. With a single realization, we cannot estimate with any precision the mean at each time period t, and it is impossible to estimate the variance and autocorrelations. Therefore, to estimate the mean, variance, and autocorrelation parameters of a stochastic process based on a single realization, the time series analyst must impose restrictions on how the data can be generated.

2.2 STATIONARITY

In the previous section we indicated that it is not advisable to estimate the mean for each time period based on just one realization of a general stochastic process. But if there is no trend in the series, we might be willing to assume that the mean is constant for each time period and that the observed value at each time period is representative of that mean. We could then estimate the mean of the series by averaging over all the data as expressed by the following standard formula:

    μ̂ = (1/n) Σ_{t=1}^{n} z_t.   (2.2)

Notice that we dropped the subscript j from z because we have only one realization. Thus, to assume that the observed value at each time period is representative of the mean value, we must restrict the mean of the series to be constant. For the chemical batch process yield data represented in Figure 1.3, such an assumption could be quite plausible. This assumption is just one of the conditions for stationarity. In Section 2.5 we will discuss what can be done if this assumption is violated.

A second condition for stationarity is that the variance of the process be constant.⁵ The variance of the series expresses the degree of variation around the assumed constant mean level and as such gives a measure of uncertainty around this mean. If the variance is not constant but, say, increases as time goes on, it would be incorrect to believe that we can express the uncertainty around a forecasted mean level with a variance calculated based on all the data. If we were to do so, we would really have a kind of average variance which would deflate the uncertainty around the forecasted values. We will therefore impose that the series has a constant variance.⁶ In Section 2.4 we will present some transformations that in certain situations will make it more plausible to assume a constant variance for a transformed series.

⁴ For work on repeated measurements on the same process, see Anderson (1978).
⁵ For a definition and some properties of a variance, see Appendix B.
⁶ This assumption is similar to the assumption of homoscedasticity of the disturbance terms in regression analysis.
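To see the difference between the ensemble estimator (2.1) and the single-realization average (2.2), consider a small simulation. The sketch below (in Python; the AR(1) mechanism, the mean of 10, and all parameter values are invented for illustration and are not from the book) draws J realizations of one stationary process: averaging across realizations at a fixed t and averaging one realization over time both recover the same constant mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# J realizations of the same stationary process; a simple AR(1)
# around mu = 10 stands in for the wire-thickness mechanism.
J, n, mu, phi = 200, 120, 10.0, 0.5
z = np.empty((J, n))
z[:, 0] = mu + rng.normal(scale=1.0 / np.sqrt(1.0 - phi**2), size=J)
for t in range(1, n):
    z[:, t] = mu + phi * (z[:, t - 1] - mu) + rng.normal(size=J)

# Equation (2.1): average across the J realizations at each fixed t.
mu_hat_t = z.mean(axis=0)

# Equation (2.2): average one single realization over time; this is
# a sensible estimator only because the process mean is constant.
mu_hat = z[0].mean()

print(mu_hat_t[:3].round(2), round(mu_hat, 2))  # all close to mu = 10
```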
Finally, we must also impose a condition on the nature of the correlation between data at different time periods. The autocorrelation measures the correlation⁷ between an observation at time t, z_t, and an observation at time s, z_s. Given that this is a correlation between observations of the same series at different time periods, it is appropriately called autocorrelation. Given the values z_1, z_2, z_3, . . . , z_n, the autocorrelation between z_t and z_{t+1} measures the correlation between the pairs (z_1, z_2), (z_2, z_3), . . . , (z_{n-1}, z_n) and is denoted by ρ_1. Likewise, the autocorrelation between z_t and z_{t+2} equals the correlation between the (n − 2) pairs (z_1, z_3), (z_2, z_4), . . . , (z_{n-2}, z_n), and is similarly denoted by ρ_2. In general, ρ_k measures the correlation between pairs of observations k periods apart.

If, for example, we have for ρ_1 a value that is strongly positive, close to +1, and we currently have observed a value for z_t above the mean value of the series, we would again expect the next value to be above the mean. Similarly, if ρ_1 is strongly negative, close to −1, we would expect the next value to be below the mean if the current value is above the mean. It should be intuitively clear that autocorrelations will play an important role in forecasting time series.

Let us now divide the series in two halves and calculate the autocorrelations for the first half as well as for the second half of the series. If we observe really different values, say for ρ_1, based on the first half and based on the second half, we then should not use the autocorrelations for the first half in making predictions for the future but should rather rely on the second half only. And, similarly, the ρ_1 based on the whole sample, which can be viewed as a kind of average autocorrelation over the whole series, would be misleading. We therefore will impose the condition that an autocorrelation should not depend on which segment of the data is used to calculate the correlation. That is, we assume that the autocorrelations between z_t and z_s are independent of t and s and are only determined by the lag between t and s. Finally, we remark that if the autocorrelations depend only on the time interval between two points, then ρ_{t,s} = ρ_{t−s} and, in particular, ρ_{−k} = ρ_k. In other words, the autocorrelation ρ_k is the same whether the series is lagged forward or backward. Because the autocorrelations are symmetrical about lag zero, only the autocorrelations for positive values of k need to be examined.

Let us now summarize these three conditions. A process z_t (t = 1, 2, . . . , n) is stationary⁸ if

    mean⁹ of z_t = E(z_t) = μ   (2.3)
    variance of z_t = E(z_t − μ)² = σ²   (2.4)
    autocorrelation(z_t, z_s) = E[(z_t − μ)(z_s − μ)]/σ² = ρ_{t−s}.   (2.5)

That is, a time series process is stationary if the mean and the variance are constant over time (and both are finite) and if the autocorrelation between values of the process at two time periods, say t and s, depends only on the distance between these time points and not on the time period itself. (We arbitrarily assume that t > s.)

In this section we have tried to make it intuitively clear that if the process is stationary, we can meaningfully estimate the mean, variance, and autocorrelations from just one realization. In fact, we can then estimate the mean, variance, and autocorrelations of the process using methods as if a realization of length n constituted n samples of the same process. For example, the formula (2.2) constitutes a valid estimator of the mean.

⁷ It will suffice for now to know that if two series are strongly positively correlated, then when one series increases (decreases) we expect to see that the other series also increases (decreases); that if two series are strongly negatively correlated, then when one series increases (decreases) we expect to see that the other series decreases (increases); and that if two series are not correlated, there will be no relationship between one series increasing (decreasing) and the other. A precise definition of autocorrelation is given in Appendix B. In Chapter 3, we will explicitly calculate the autocorrelations of several ARIMA time series processes.
⁸ Strictly speaking, we define weak or covariance stationarity, also sometimes called stationarity in mean and variance. A stronger form of stationarity refers to the joint distribution of a set of random variables. If F(z_t, z_{t+1}, . . . , z_{t+m}) is the joint distribution of any set of m consecutive z's, then the series is strictly stationary if F is independent of t for all m > 0. The distribution of any set of m consecutive observations is the same wherever in the series we choose those m observations.
⁹ See Appendix B for an explanation and properties of the expectation operator E.
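The pairing definition of ρ_k and the split-half check just described translate directly into code. The following Python sketch is illustrative only: the AR(1) process and φ = 0.7 are assumptions, and the book's formal sample autocorrelation formula appears only in Chapter 4. For a stationary series, ρ_1 estimated from each half of one realization should roughly agree, and for this AR(1) the estimates at lags 1, 2, 3 should decay roughly like φ^k.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_autocorr(z, k):
    """Sample autocorrelation at lag k: correlation between the pairs
    (z_1, z_{1+k}), ..., (z_{n-k}, z_n), computed around the overall mean."""
    z = np.asarray(z, dtype=float)
    zbar = z.mean()
    num = np.sum((z[:-k] - zbar) * (z[k:] - zbar))
    den = np.sum((z - zbar) ** 2)
    return num / den

# One realization of a stationary AR(1) process (phi is illustrative).
n, phi = 400, 0.7
z = np.zeros(n)
for t in range(1, n):
    z[t] = phi * z[t - 1] + rng.normal()

# Split-half check: for a stationary series, rho_1 estimated on each
# half should be roughly the same.
first, second = z[: n // 2], z[n // 2 :]
print(round(sample_autocorr(first, 1), 2), round(sample_autocorr(second, 1), 2))

# The estimates decay with the lag, roughly like phi**k for this AR(1).
print([round(sample_autocorr(z, k), 2) for k in (1, 2, 3)])
```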
Since most economic and business series are not stationary (trends are frequently present in such series), our next task is to show how nonstationary series can be transformed into stationary series. In explaining in greater detail nonstationarity and the possible transformations, we will explicitly rely on an economic time series example.

2.3 NONSTATIONARY DATA: A TIME SERIES PLOT

The first step in any time series analysis should be to plot the available observations against time. This is often a very valuable part of any data analysis, since qualitative features such as trend, seasonality, discontinuities, and outliers¹⁰ will usually be visible if present in the data. Although the desirability of making such a plot at the outset is self-evident, we still come across studies in which an incorrect conclusion was reached because the data had not been plotted.

In this and subsequent sections we will make explicit use of a quarterly gross national product (GNP) data series. Figure 2.2 contains a plot of the GNP of the United States series from the first quarter of 1947 through the fourth quarter of 1970.¹¹ The data, seasonally unadjusted, are in billions of current dollars and, contrary to the convention, are expressed in quarterly rates rather than in annual rates (quarterly rates multiplied by four).

Examining this plot, we observe that for the entire 23-year period the GNP has been steadily growing, and because of this trend this series is therefore nonstationary. Changes in population, industrial and agricultural productivity, as well as changes in the standard of living are just some of the factors which have contributed to this long-term trend.

Also apparent from the graph is a recurring pattern within each year. The GNP data for the fourth quarter of each year is always higher than the data for the previous three quarters. The occurrence of such a seasonal pattern is generally attributed to a pickup in sales during the Christmas season. Seasonal variations, however, are also related to climate and customs. For example, unemployment among building and catering workers tends to be highest in the winter, while sales of gasoline in a summer resort area tend to be higher during the summer months.

Seasonal variations in general recur with a high degree of regularity

¹⁰ Outliers are somewhat harder to visualize in time series data because of the dependence of one observation on another. Indeed, time series outliers could be located in the "middle" of the series. See Martin et al. (1983).
¹¹ This example is from Roberts (1974). The data are listed in Appendix A, Table A.
