
Fuzzy Sets and Systems 64 (1994) 279-293

North-Holland

A comparison of fuzzy forecasting and Markov modeling

Joe Sullivan and William H. Woodall
Department of Management Science & Statistics, University of Alabama, Tuscaloosa, AL 35487-0226, USA

Received September 1993


Revised November 1993

Abstract: Fuzzy time series models were introduced by Song and Chissom [Fuzzy Sets and Systems 54 (1993) 269-277, 54 (1993)
1-9, 62 (1994) 1-8] to model and forecast processes whose values are described by linguistic variables. Song and Chissom used as
an application the forecasting of educational enrollments. This paper reviews the two methods set forth by Song and Chissom, a
first-order time-invariant fuzzy time series model and a first-order time-variant model. These models are compared with each
other and with a time-invariant Markov model using linguistic labels with probability distributions. The results of these methods
for the enrollment data are compared with three traditional time series models, a first-order autoregressive (AR(1)) model and
two second-order autoregressive (AR(2)) models, all of which are time-invariant.

Keywords: Fuzzy time series; fuzzy sets; linguistic variables; time-variant models; time-invariant models; Markov chains.

1. Introduction

Song and Chissom [7] propose fuzzy time series models and fuzzy forecasting to model and forecast
processes whose observations are linguistic values. They illustrate the methodology by forecasting the
enrollment at the University of Alabama from 20 years of data. Song and Chissom [8] (SC Part I) use a
time-invariant fuzzy time series model, while Song and Chissom [9] (SC Part II) use a time-variant
model for the same problem. In their example the crisp data were fuzzified into linguistic values to
illustrate the fuzzy time series method using fuzzy set theory. Ruimin [6] and Watada [11] give methods
for fuzzy time-series analysis based on fuzzy regression models using time as the independent variable.
In Section 2 we review and contrast the time-invariant model proposed in SC Part I and the
time-variant model proposed in SC Part II.
SC Parts I and II assert, 'all [traditional forecasting] methods fail when the historical enrollment data
are composed of linguistic values'. However, a Markov model, described in Section 3, can use linguistic
labels directly, but with the membership functions of the fuzzy approach replaced by analogous
probability functions. Traditionally, the parameters of a Markov model are estimated from observations
in which the state occupied is known with certainty. We show that observations for which several states
have non-zero probability, corresponding to memberships in fuzzy sets, may also be used. The Markov
model is applied to the example used by Song and Chissom with slightly increased accuracy.
Both the fuzzy forecasting and Markov models use the linguistic values directly to produce a 'fuzzy'
forecast that, in turn, is 'defuzzified' into a numerical point estimate. An alternative procedure would
simply defuzzify the linguistic data into numerical values, then use traditional time series methods to

Correspondence to: Prof. W.H. Woodall, Department of Management Science and Statistics, 300 Alston, University of
Alabama, Tuscaloosa AL 35487-0226 USA.

0165-0114/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved
SSDI 0165-0114(94)00039-A
280 J. Sullivan, W.H. Woodall / Fuzzy forecasting and Markov modeling

produce numerical forecasts. This approach is addressed in Section 4, using the numerical values for the
enrollment example.

2. Fuzzy time series concepts

As Song and Chissom point out, the fuzzy time series model deals with situations where the data
are linguistic values, in contrast to the conventional time series approaches that typically manipulate
crisp, numerical data. The procedure is illustrated by forecasting enrollment at the University of
Alabama. First we review the example, contrasting differences between the two methods given in the
papers.
In this example the data are available in crisp form and must be fuzzified before the fuzzy time series
methodology can be used. In both papers the fuzzification process starts by defining the universe of
discourse U that contains the historical data and upon which the fuzzy sets are defined. This was chosen
to be U = [13000, 20000]. Selection of U is important because, for example, it is not possible for a
forecast to be outside this interval. Next U is partitioned into intervals of equal length, referred to as u_i.
For this example the intervals chosen were: u1 = [13000, 14000], u2 = [14000, 15000], u3 =
[15000, 16000], u4 = [16000, 17000], u5 = [17000, 18000], u6 = [18000, 19000], and u7 = [19000, 20000].
Next, some fuzzy sets were defined on the universe. For this example the same number of fuzzy sets
and intervals were defined, but this need not always be the case. The fuzzy sets that were defined and
their corresponding linguistic values are: A1 = (not many), A2 = (not too many), A3 = (many),
A4 = (many many), A5 = (very many), A6 = (too many), and A7 = (too many many). The intervals and
fuzzy sets are related by assigning each interval a membership in each fuzzy set. The vectors of
memberships used by Song and Chissom are given in Table 1. In SC Part II the enrollment for each
year in the data has a membership of unity in one of these sets and membership zero in all the others,
so each observation is associated with a particular fuzzy set. On the other hand SC Part I sets forth for
each year a vector of memberships in these sets, so the enrollment for each observed year has non-zero
membership in several of the fuzzy sets. For example, 1977 has memberships {A1/0, A2/0.6, A3/1,
A4/0.6, A5/0.1, A6/0, A7/0} in SC Part I and {A1/0, A2/0, A3/1, A4/0, A5/0, A6/0, A7/0} in SC Part II.
Thus, the specifics of the fuzzy time series examples in these papers would differ because the input data
are different. Table 2 gives the vectors of membership values in the fuzzy sets and the crisp enrollment
for each year. The memberships used in SC Part II can be obtained from those of SC Part I by
replacing those membership values less than one with zero.
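
The fuzzification step can be illustrated with a minimal Python sketch. The function names are hypothetical, and the rule that a year is assigned to the fuzzy set whose interval contains its crisp enrollment is an assumption: it agrees with the last column of Table 2, but the treatment of values falling exactly on an interval boundary is our own choice. The graded membership vectors of SC Part I were set subjectively and cannot be generated this way.

```python
# Universe of discourse U = [13000, 20000], split into seven equal intervals u1..u7.
LOW, HIGH, N = 13000, 20000, 7
WIDTH = (HIGH - LOW) // N  # 1000


def interval_index(enrollment):
    """Return i (1-based) such that the crisp value falls in interval u_i."""
    if not LOW <= enrollment <= HIGH:
        raise ValueError("value outside the universe of discourse")
    # Assumption: a boundary value goes to the upper interval; 20000 is kept in u7.
    return min((enrollment - LOW) // WIDTH, N - 1) + 1


def table1_memberships(i):
    """Memberships of u1..u7 in fuzzy set A_i, following the pattern of Table 1."""
    return [1.0 if j == i else 0.5 if abs(j - i) == 1 else 0.0
            for j in range(1, N + 1)]


def crisp_fuzzify(enrollment):
    """SC Part II style input: membership one in a single set, zero elsewhere."""
    i = interval_index(enrollment)
    return [1.0 if j == i else 0.0 for j in range(1, N + 1)]


print(interval_index(15603))   # index of the fuzzy set assigned to 1977
print(crisp_fuzzify(15603))    # membership vector used for 1977 in SC Part II
```
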

Modeling the fuzzy time series


The fuzzy time series method begins by estimating the tendency for enrollment categories (or fuzzy
sets) to be followed by other categories. The hypothesis is that the enrollment for a given year indicates
the next year's enrollment by way of a fuzzy relationship [matrix], $R_f$, which is estimated from the

Table 1. Vectors of memberships of the intervals u_i in the fuzzy sets A_i

A1 = {u1/1,   u2/0.5, u3/0,   u4/0,   u5/0,   u6/0,   u7/0}
A2 = {u1/0.5, u2/1,   u3/0.5, u4/0,   u5/0,   u6/0,   u7/0}
A3 = {u1/0,   u2/0.5, u3/1,   u4/0.5, u5/0,   u6/0,   u7/0}
A4 = {u1/0,   u2/0,   u3/0.5, u4/1,   u5/0.5, u6/0,   u7/0}
A5 = {u1/0,   u2/0,   u3/0,   u4/0.5, u5/1,   u6/0.5, u7/0}
A6 = {u1/0,   u2/0,   u3/0,   u4/0,   u5/0.5, u6/1,   u7/0.5}
A7 = {u1/0,   u2/0,   u3/0,   u4/0,   u5/0,   u6/0.5, u7/1}

Table 2. Yearly enrollments and membership in fuzzy sets A_i

                   Membership vector, as given in SC Part I     Fuzzy set,
Year  Enrollment   A1    A2    A3    A4    A5    A6    A7       SC Part II

71    13055        1     0.5   0     0     0     0     0        A1
72    13563        1     0.8   0.1   0     0     0     0        A1
73    13867        1     0.9   0.2   0     0     0     0        A1
74    14696        0.8   1     0.8   0.1   0     0     0        A2
75    15460        0.2   0.8   1     0.2   0     0     0        A3
76    15311        0.2   0.8   1     0.2   0     0     0        A3
77    15603        0     0.6   1     0.6   0.1   0     0        A3
78    15861        0     0.5   1     0.7   0.2   0     0        A3
79    16807        0     0.1   0.5   1     0.9   0.2   0        A4
80    16919        0     0.1   0.5   1     0.9   0.2   0        A4
81    16388        0     0.2   0.8   1     0.5   0     0        A4
82    15433        0.2   0.8   1     0.2   0     0     0        A3
83    15497        0.2   0.8   1     0.2   0     0     0        A3
84    15145        0.2   0.8   1     0.2   0     0     0        A3
85    15163        0.2   0.8   1     0.2   0     0     0        A3
86    15984        0     0.2   1     0.7   0.2   0     0        A3
87    16859        0     0.1   0.5   1     0.8   0.1   0        A4
88    18150        0     0     0.1   0.5   0.8   1     0.7      A6
89    18970        0     0     0     0.25  0.55  1     0.8      A6
90    19328        0     0     0     0.3   0.5   0.8   1        A7
91    19337                                                     A7
92    18876                                                     A6

available historical data. Once $R_f$ is estimated, the enrollment for next year is then estimated in turn as
the current enrollment membership vector times $R_f$, using the rules of fuzzy matrix multiplication.
It is important to distinguish between the true (but unknown) value of model parameters, such as this
relationship matrix, and an estimate of the parameters derived from data. Naturally an estimate will
depend on the particular data that have been observed and the particular years selected to be used in
estimating the parameters. Song and Chissom do not distinguish between a true, unknown matrix and
any of its estimates. They use the same symbol, $R_f$, for both, a potential source of confusion
for some readers. This paper will generally use a caret over a symbol to indicate an estimate derived
from the data.
Estimation proceeds in two ways in the papers. In SC Part I the process is modeled as time-invariant,
so all the data are used to estimate $R_f$; the resulting matrix is valid for estimating the enrollment
transition for any successive years. In the second paper a time-varying model is used, so there are
several relationship matrices $R_f(t, t-1)$ to be estimated from those most recent past observations
falling within a moving window. The matrix used in a forecast is the one for which t corresponds to the
year for which the enrollment is to be forecast.
In the time-invariant case of the first paper, the estimated $R_f$ is the fuzzy union of several matrices,
called $R_{f,i}$, where i is an index corresponding to a transition between two consecutive years in the
observed data. For each transition the corresponding matrix is the outer fuzzy product of the A vector,
as defined in Table 1, for the initial year times the A vector for the subsequent year. The A vector used
is the one for which the year has membership of one, as given in the last column of Table 2. In this
example the first transition is from 1971 to 1972, which is from A1 to A1, since both 1971 and 1972 have
unit membership in fuzzy set A1. This information is captured in matrix $\hat{R}_{f,1}$, the outer fuzzy product of

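the vector A1 with itself. The result is

\[
(1, 0.5, 0, 0, 0, 0, 0)^{T} \times (1, 0.5, 0, 0, 0, 0, 0) = R_{f,1} =
\begin{bmatrix}
1   & 0.5 & 0 & 0 & 0 & 0 & 0 \\
0.5 & 0.5 & 0 & 0 & 0 & 0 & 0 \\
0   & 0   & 0 & 0 & 0 & 0 & 0 \\
0   & 0   & 0 & 0 & 0 & 0 & 0 \\
0   & 0   & 0 & 0 & 0 & 0 & 0 \\
0   & 0   & 0 & 0 & 0 & 0 & 0 \\
0   & 0   & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.
\]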

The second transition, 1972 to 1973, is also from A1 to A1, while the third transition, 1973 to 1974, is
from A1 to A2, and so forth, as easily determined from the last column of Table 2.
The fuzzy outer product corresponds to the conventional outer vector product with the operation of
multiplication replaced by the minimum of the operands. Alternatively expressed, the element in row i
and column j of the resulting product matrix would be the minimum of element i of the first vector and
element j of the second vector.
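
As a small illustration of this definition, the following Python sketch forms the fuzzy outer product of A1 with itself, the matrix for the 1971 to 1972 transition. The function name `fuzzy_outer` is hypothetical and chosen only for this example.

```python
def fuzzy_outer(a, b):
    """Fuzzy outer product: element (i, j) is min(a[i], b[j])."""
    return [[min(x, y) for y in b] for x in a]


A1 = [1, 0.5, 0, 0, 0, 0, 0]   # fuzzy set A1 from Table 1
R_f1 = fuzzy_outer(A1, A1)     # relation for the transition A1 -> A1

for row in R_f1:
    print(row)
```
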
The fuzzy union of several matrices corresponds to the conventional matrix addition operation with
the addition operator being replaced by the maximum operator. That is, the element in row i and
column j of the resulting matrix is the maximum of all the elements in row i and column j of the
constituent matrices instead of the sum of these elements. An important property of using the
maximum operator is that repeated appearances of transitions have no effect on the estimate of
relationship matrix $R_f$. This property is discussed in more detail at the end of Section 3.
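Using the data given, the time-invariant estimate for relationship matrix $R_f$ given in SC Part I is

\[
\hat{R}_f =
\begin{bmatrix}
1   & 1   & 0.5 & 0.5 & 0   & 0   & 0   \\
0.5 & 0.5 & 1   & 0.5 & 0.5 & 0   & 0   \\
0   & 0.5 & 1   & 1   & 0.5 & 0.5 & 0.5 \\
0   & 0.5 & 1   & 1   & 0.5 & 1   & 0.5 \\
0   & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 \\
0   & 0   & 0   & 0   & 0.5 & 1   & 1   \\
0   & 0   & 0   & 0   & 0.5 & 0.5 & 0.5
\end{bmatrix}
\tag{1}
\]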
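
The estimation just described (a fuzzy union of fuzzy outer products over the observed transitions) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code of either paper: the helper names are hypothetical, the fuzzy sets are generated from the pattern of Table 1, and the sequence of fuzzy set indices for 1971 through 1990 is read from the last column of Table 2. With these inputs the sketch yields the entries of matrix (1).

```python
def table1_set(i, n=7):
    """Membership vector of fuzzy set A_i over u1..un, as in Table 1."""
    return [1.0 if j == i else 0.5 if abs(j - i) == 1 else 0.0
            for j in range(1, n + 1)]


# Fuzzy set index for each year 1971..1990 (last column of Table 2).
LABELS = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 6, 6, 7]


def estimate_relation(labels, n=7):
    """Fuzzy union (element-wise max) of the outer products (element-wise min)."""
    R = [[0.0] * n for _ in range(n)]
    for cur, nxt in zip(labels, labels[1:]):
        a, b = table1_set(cur, n), table1_set(nxt, n)
        for i in range(n):
            for j in range(n):
                R[i][j] = max(R[i][j], min(a[i], b[j]))
    return R


R_hat = estimate_relation(LABELS)
for row in R_hat:
    print(row)
```

Because the union uses the maximum, `estimate_relation([1, 1, 2])` and `estimate_relation([1, 1, 1, 1, 2])` return the same matrix, which is the insensitivity to repeated transitions noted above.
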

Forecast for next year


In SC Part I the estimated enrollment is the product of the previous year's membership vector in the
fuzzy sets, given in the center of Table 2, times $\hat{R}_f$. This forecasting model is expressed in SC Part I as
\[
A_{t+1} = A_t \circ R_f. \tag{2}
\]
Fuzzy matrix multiplication is used, which is analogous to conventional matrix multiplication, but with
the operator substitutions of minimum for multiplication and maximum for sum. Thus, the element in
row i and column j of the fuzzy product would be the fuzzy scalar product of row vector i from the first
matrix and column j of the second matrix. The fuzzy scalar product of two vectors of the same length is
the maximum of all the minima of corresponding pairs. For this example, the estimated enrollment for
1978 is the product of the fuzzy membership vector for 1977 times $\hat{R}_f$, the first element of which is the
fuzzy scalar product of (0, 0.6, 1, 0.6, 0.1, 0, 0) and the first column of $\hat{R}_f$, (1, 0.5, 0, 0, 0, 0, 0). The
vector of pairwise minima is (0, 0.5, 0, 0, 0, 0, 0) of which the maximum is 0.5. This is the first element of
(0.5, 0.5, 1, 1, 0.5, 0.6, 0.5), the forecast for 1978.
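
The max-min product for this example can be checked with a short Python sketch. The function name `fuzzy_compose` is hypothetical; the matrix is the estimate (1) and the input is the 1977 membership vector from Table 2.

```python
def fuzzy_compose(a, R):
    """Max-min composition a o R: entry j is max over i of min(a[i], R[i][j])."""
    return [max(min(a[i], R[i][j]) for i in range(len(a)))
            for j in range(len(R[0]))]


# Time-invariant estimate of the fuzzy relation, equation (1).
R_HAT = [
    [1, 1, 0.5, 0.5, 0, 0, 0],
    [0.5, 0.5, 1, 0.5, 0.5, 0, 0],
    [0, 0.5, 1, 1, 0.5, 0.5, 0.5],
    [0, 0.5, 1, 1, 0.5, 1, 0.5],
    [0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    [0, 0, 0, 0, 0.5, 1, 1],
    [0, 0, 0, 0, 0.5, 0.5, 0.5],
]

a_1977 = [0, 0.6, 1, 0.6, 0.1, 0, 0]          # membership vector for 1977
forecast_1978 = fuzzy_compose(a_1977, R_HAT)
print(forecast_1978)
```
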
"[T]ranslating this fuzzy output into a regular number is indeed a necessary step," according to SC

Part I. Two distinct methods are used for defuzzification, depending on the characteristics of the fuzzy
output vector. In the first method, used when there is a single unique maximum or when the multiple
maxima are located in consecutive positions in the vector, the crisp value is the centroid of the ui
interval(s) corresponding to the maximum (maxima). In the example forecast for 1978 given above, the
crisp value would be the centroid of the interval composed of u3 and u4, which is 16000. A slightly
different defuzzification method is used for other vectors having repeated maxima that are not
consecutive. This method uses the centroid of all the ui intervals, not just the ones corresponding to
the maxima, with each weighted by its corresponding standardized membership value. A membership
vector is standardized by dividing by the sum of its elements. For example, the 1980 forecast has
membership vector (0.1,0.5, 1, 1,0.5, 1,0.5). Since the maxima are in positions three, four, and six,
which are not consecutive, the first rule would not apply. The crisp value for 1980 is the membership
vector times the vector of centroids of the ui intervals, (13500, 14500, 15500, 16500, 17500, 18500,
19500), divided by the sum of memberships. This value is 16869.6, although due to rounding an
approximate value of 16813 is given in SC Part I.
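
The two defuzzification rules, as we read them from the description above, can be sketched in Python. The function name `defuzzify` is hypothetical. For the first rule the sketch relies on the intervals having equal length, so that the centroid of a run of adjacent intervals equals the mean of their midpoints.

```python
# Midpoints (centroids) of the intervals u1..u7.
CENTROIDS = [13500, 14500, 15500, 16500, 17500, 18500, 19500]


def defuzzify(m):
    """Convert a fuzzy forecast vector m into a crisp enrollment."""
    peak = max(m)
    idx = [i for i, v in enumerate(m) if v == peak]
    if idx[-1] - idx[0] == len(idx) - 1:
        # Single maximum, or maxima in consecutive positions:
        # centroid of the interval(s) at the maximum.
        return sum(CENTROIDS[i] for i in idx) / len(idx)
    # Non-consecutive maxima: centroids of all intervals, weighted by
    # the standardized memberships.
    return sum(v * c for v, c in zip(m, CENTROIDS)) / sum(m)


print(defuzzify([0.5, 0.5, 1, 1, 0.5, 0.6, 0.5]))   # forecast vector for 1978
print(defuzzify([0.1, 0.5, 1, 1, 0.5, 1, 0.5]))     # forecast vector for 1980
```
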

Performance
Having developed a forecast, attention naturally turns to its accuracy. It is important to distinguish
between modeling accuracy and forecasting accuracy, the latter applying to years not used in estimating
the parameters of the model. Because the data used to build the forecasting model typically fit the
resulting model better than future data, modeling accuracy is typically better than forecasting accuracy.
In SC Part I the data for the years 1971 through 1990 were used to estimate the model parameters;
hence deviations between the model estimates and actual enrollments for those years would reflect
modeling accuracy, not forecasting accuracy. The average accuracy given for the SC Part I example is
3.2%, averaging the nineteen years of modeling deviations and the one forecasting deviation. The
transition from 1991 to 1992 is now known, but the forecasting accuracy cannot be calculated since the
necessary input vector, the fuzzy membership vector for 1991, must be determined subjectively and is
not given.
Song and Chissom compare the 3.2% average accuracy with other published results but do not
distinguish between modeling and forecasting accuracy. For example, Song and Chissom cite Warrack
and Russell [10] as forecasting 'enrollments of four universities' with forecasting errors ranging from
3.7% to 14.4%, whereas Warrack and Russell's paper itself gives the deviations as 0.8% to 18.3%. The
overall deviations for the three years forecasted are 1.6%, 2.5%, and 7.3% for an average of 3.8%.
Warrack and Russell use predictor variables developed from surveys of prospective enrollees, but it is
not clear from their paper whether the years used in estimating the model parameters were different
from the years used to evaluate the accuracy.
Another cited study is Weiler [12], characterized as having an average error of 9.7%. This is an
understatement of Weiler's accuracy, because it refers to an average across the components into which
Weiler divides his overall forecast. Naturally, there is a greater chance for deviation when a forecast is
divided into component forecasts. A better gauge is the overall enrollment forecast for the university,
which is given for intervals up to 26 weeks before the enrollment became known, and is thus a true
forecast. The numbers cited in SC Part I also appear in Chatman [2, p. 70], which is referring to the
Weiler forecast of eight weeks before enrollment was known. For this point in time Weiler's forecast
for the whole university deviated from the subsequently known actual enrollment by 3.6%. These errors
in the overall forecast decreased from 4% (26 weeks before the fall term) to 0.8% just before the term.

Fuzzy time-variant method


In SC Part II, Song and Chissom use a time-variant model of a fuzzy time series for the enrollment
example. There is no advice on deciding whether the time-variant or invariant model will work better in
a particular application. In the time-variant model the fuzzy relationship matrix may be different for
each transition; it is denoted by $R_f(t, t-1)$ and estimated from the w - 1 previous, observed

transitions in years t - w through t - 1, where w is the width of the moving window on the observed
values. SC Part II shows results for values of w ranging from 2 through 9.
Calculating these time-varying relationship matrices proceeds in the same manner described for the
time-invariant case, except only some of the most recent transitions are used. For example, for the case
where w is four, the matrix $\hat{R}_f(75, 74)$ is obtained from the three transitions observed in the years 1971
through 1974. These transitions are: {A1 → A1, A1 → A1, A1 → A2}. The matrix is the fuzzy union of
the three fuzzy vector products corresponding to these transitions,

\[
\hat{R}_f(75, 74) =
\begin{bmatrix}
1   & 1   & 0.5 & 0 & 0 & 0 & 0 \\
0.5 & 0.5 & 0.5 & 0 & 0 & 0 & 0 \\
0   & 0   & 0   & 0 & 0 & 0 & 0 \\
0   & 0   & 0   & 0 & 0 & 0 & 0 \\
0   & 0   & 0   & 0 & 0 & 0 & 0 \\
0   & 0   & 0   & 0 & 0 & 0 & 0 \\
0   & 0   & 0   & 0 & 0 & 0 & 0
\end{bmatrix}.
\]

The forecast for 1975 is the fuzzy set for 1974, which is A2 from Table 1, times $\hat{R}_f(75, 74)$, resulting in
(0.5, 0.5, 0.5, 0, 0, 0, 0). There is a difference between the procedures of the two papers that should be
noted at this point. In SC Part I the forecast does not correspond to this product. Although the fuzzy
sets defined in Table 1 are used in estimating the fuzzy relation $\hat{R}_f$ given in (1), in SC Part I the forecast
is calculated using the membership vectors for each year, determined from an additional subjective step
and given in the center of Table 2.
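
The moving-window calculation for 1975 can be reproduced with a minimal Python sketch. The names are hypothetical, the fuzzy sets follow the pattern of Table 1, and the window is taken to cover years t - w through t - 1 as described above. The neural-network defuzzification of SC Part II is not attempted here.

```python
def table1_set(i, n=7):
    """Membership vector of fuzzy set A_i over u1..un, as in Table 1."""
    return [1.0 if j == i else 0.5 if abs(j - i) == 1 else 0.0
            for j in range(1, n + 1)]


def fuzzy_compose(a, R):
    """Max-min composition a o R."""
    return [max(min(a[i], R[i][j]) for i in range(len(a)))
            for j in range(len(R[0]))]


def window_relation(labels, t, w, n=7):
    """Estimate R_f(t, t-1) from the w-1 transitions in years t-w .. t-1."""
    years = list(range(t - w, t))
    R = [[0.0] * n for _ in range(n)]
    for y0, y1 in zip(years, years[1:]):
        a, b = table1_set(labels[y0], n), table1_set(labels[y1], n)
        for i in range(n):
            for j in range(n):
                R[i][j] = max(R[i][j], min(a[i], b[j]))
    return R


# Fuzzy set index by year (last column of Table 2), for 1971..1974 only.
LABELS = {71: 1, 72: 1, 73: 1, 74: 2}

R_75 = window_relation(LABELS, t=75, w=4)
forecast_75 = fuzzy_compose(table1_set(LABELS[74]), R_75)
print(forecast_75)
```
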

Performance of the time-variant fuzzy method

In the time-variant method, performance is given in terms of true forecasting errors, since the year
being forecasted is not used in building the model from which its forecast is obtained. As noted
previously, true forecasting errors tend to be larger than modeling errors. The effect of this change in
the performance criteria could elevate the reported errors for the time-variant model, which are mostly
larger than the errors reported for the time-invariant model.
Calculating the forecast errors requires defuzzifying the forecast fuzzy set. SC Part II uses a 3-layer
neural network model for defuzzification. The average forecasting error reported for different values of
the window width w generally decreased with a decrease in w, ranging from 4.49% down to 3.15%.
However, the simple technique of using the previous year's crisp enrollment as the forecast produces an
average deviation of 3.16%, which is smaller than that reported for all the values of w except for the
value w = 2. As w decreases, less of the observed information is being incorporated into each estimate
of the relationship matrix. For the best performing case, where only one transition makes up the matrix,
it will often be the outer product of the vector describing the current state with itself, since many of the
transitions result in staying in the same category. When such is the case the forecast will be the current
category. Thus, the time-variant method with a small window largely mimics the simple method of using
the previous year's crisp enrollment as the forecast.
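
The benchmark of using the previous year's enrollment as the forecast is easy to check against Table 2. In the Python sketch below the function name is hypothetical, and two choices are assumptions on our part because the text does not state them: the forecasts are taken for 1972 through 1991, and each deviation is expressed as a percentage of the actual enrollment. Under these assumptions the average comes out at about 3.16%.

```python
# Crisp enrollments from Table 2.
ENROLLMENT = {
    1971: 13055, 1972: 13563, 1973: 13867, 1974: 14696, 1975: 15460,
    1976: 15311, 1977: 15603, 1978: 15861, 1979: 16807, 1980: 16919,
    1981: 16388, 1982: 15433, 1983: 15497, 1984: 15145, 1985: 15163,
    1986: 15984, 1987: 16859, 1988: 18150, 1989: 18970, 1990: 19328,
    1991: 19337, 1992: 18876,
}


def naive_mean_abs_pct_error(data, first, last):
    """Average |actual - previous year| / actual, in percent, for first..last."""
    errors = [abs(data[y] - data[y - 1]) / data[y] * 100
              for y in range(first, last + 1)]
    return sum(errors) / len(errors)


print(round(naive_mean_abs_pct_error(ENROLLMENT, 1972, 1991), 2))
```
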

3. Markov model

As Song and Chissom point out, traditional forecasting methodologies are not suited to data
composed of linguistic values. This is the motivation for the fuzzy time series method. There are
probability models such as the discrete state Markov models, however, that use categorical data which

can include linguistic labels. These are seldom applied in a traditional time series environment since
data are most often available in numerical form. In this section a Markov model for the enrollment data
will be described and the parameter estimation process compared with that of the fuzzy time series
method. Kemeny and Snell [4] give an introduction to Markov models.
A discrete-state Markov process has a (possibly infinite) number of states or categories that are
mutually exclusive. The Markov property requires that the probability of transition to a particular state
j, given the process is currently in state i, be independent of the history of states occupied before the
current state. Typically with a finite number of states these transition probabilities are arrayed in a
p × p transition matrix, where p is the number of states in the model. The element in row i and column
j gives the probability of transition to state j given the current state is i. If we let $P_t$ denote the vector of
state probabilities at time t, where entry n in this vector gives the probability the system is in state n at
t, then
\[
P_{t+1} = P_t * R_m, \tag{3}
\]
where $R_m$ is the transition matrix. The product $P_t * R_m^k$ gives the vector of state probabilities for time
t + k. Thus, in general,
\[
P_{t+k} = P_t * R_m^k, \quad k = 1, 2, \ldots.
\]
The transition matrix $R_m$ may vary with time, in which case a subscript is added to indicate the time to
which it applies. In this paper the model is based on a time-invariant transition matrix, but a
time-variant model could be used in a way analogous to SC Part II. The Markov model of Equation (3)
corresponds closely to that of the fuzzy time series forecasting in (2), but the multiplication used in (3)
is conventional matrix multiplication.
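
A short Python sketch illustrates the k-step propagation with conventional matrix multiplication. The two-state transition matrix below is purely hypothetical and is not taken from the enrollment example; the function names are likewise our own.

```python
def vec_mat(p, R):
    """Conventional row-vector times matrix product."""
    return [sum(p[i] * R[i][j] for i in range(len(p)))
            for j in range(len(R[0]))]


def propagate(p, R, k):
    """State probabilities after k steps: p * R^k."""
    for _ in range(k):
        p = vec_mat(p, R)
    return p


# Hypothetical two-state chain; each row sums to one.
R_m = [[0.9, 0.1],
       [0.5, 0.5]]
p_t = [1.0, 0.0]                 # state 1 occupied with certainty at time t

print(propagate(p_t, R_m, 1))    # probabilities at time t + 1
print(propagate(p_t, R_m, 2))    # probabilities at time t + 2
```

Unlike the max-min product of (2), each propagated vector remains a probability vector whose entries sum to one.
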
Even though the states of a Markov chain are mutually exclusive, the process being modeled does not
have to occupy one particular state with certainty for a Markov model to be valid. Several states can
have non-zero probabilities, analogous to the concept of fuzzy set membership. With the Markov
model, however, there is the requirement that all the state probabilities must sum to one for each
observation. Membership functions do not have this restriction.
Emulating the fuzzy time series model as closely as possible, we let each state in the Markov model
correspond to a probability density function (pdf), referred to as b_i, i = 1, ..., 7. Each pdf is defined on
the same universe, U, as the corresponding fuzzy set, having the same shape as the memberships of the
intervals ui in the corresponding fuzzy set, but scaled to unit area. The first two of these seven pdf's are
shown in Figure 1. In the Markov chain approach each state, analogous to a fuzzy set A, corresponds to
a probability distribution on the universe of discourse instead of a fuzzy set. In addition, a year's
observation will be a vector giving the probabilities for each state instead of a vector giving

[Figure 1: the densities b[1] and b[2] plotted against Enrollment.]

Fig. 1. Probability density functions corresponding to fuzzy sets A1 and A2.



memberships in the fuzzy sets. These probability distribution functions can be estimated in some
applications or determined subjectively.
The fuzzy enrollment data will be used to estimate the transition matrix $R_m$ for a Markov model,
using the same years (1971-1990) used by Song and Chissom. Each pair of successive years in the data
constitutes a transition, so there are 19 observed transitions in the 20 years of enrollment data.
Associated with each observed transition is a p × p matrix that is the outer vector product of the vector
of probabilities corresponding to the initial observation times the vector of probabilities corresponding
to the subsequent observation. Summing these matrices then normalizing all rows to sum to unity
produces the estimated transition matrix for the Markov model. The forecast is this estimated transition
matrix pre-multiplied by the previous year's observed (or forecasted) probability vector. Several
variations in this technique are possible by using different probability distributions for the observations.
We will illustrate two of the possibilities of a Markov model corresponding as closely as possible to
the methods set forth by Song and Chissom. In SC Part I the definitions of the fuzzy sets, given in Table
1, are used to estimate the fuzzy relation $R_f$. Then the membership vectors for each year, given in Table
2, are multiplied by $R_f$ to obtain the fuzzy forecast membership vector. Finally, the fuzzy forecast is
defuzzified using either of two methods according to the shape of the memberships in the forecast
vector. Analogously, the definitions of the fuzzy sets A, given in Table 1, are normalized to sum to one
and used as probability vectors to estimate the transition matrix for the Markov chain approach. The
state used for a year corresponds to the fuzzy set used in Song and Chissom, which is that fuzzy set with
membership of one for the year, as shown in the last column of Table 2. The resulting data is shown in
Table 3.
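The matrix corresponding to the first transition would be

\[
(\tfrac{2}{3}, \tfrac{1}{3}, 0, 0, 0, 0, 0)^{T} \cdot (\tfrac{2}{3}, \tfrac{1}{3}, 0, 0, 0, 0, 0) =
\begin{bmatrix}
4/9 & 2/9 & 0 & 0 & 0 & 0 & 0 \\
2/9 & 1/9 & 0 & 0 & 0 & 0 & 0 \\
0   & 0   & 0 & 0 & 0 & 0 & 0 \\
0   & 0   & 0 & 0 & 0 & 0 & 0 \\
0   & 0   & 0 & 0 & 0 & 0 & 0 \\
0   & 0   & 0 & 0 & 0 & 0 & 0 \\
0   & 0   & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.
\]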

Adding the matrices for all the transitions in the data and scaling the result so the rows sum to unity
produces an estimate for the transition matrix $R_m$. The estimate is

\[
\hat{R}_m =
\begin{bmatrix}
0.469 & 0.373 & 0.130 & 0.028 & 0     & 0     & 0     \\
0.141 & 0.254 & 0.356 & 0.217 & 0.033 & 0     & 0     \\
0     & 0.174 & 0.413 & 0.304 & 0.076 & 0.022 & 0.011 \\
0     & 0.132 & 0.353 & 0.309 & 0.118 & 0.059 & 0.029 \\
0     & 0.042 & 0.167 & 0.208 & 0.167 & 0.222 & 0.194 \\
0     & 0     & 0     & 0     & 0.125 & 0.417 & 0.458 \\
0     & 0     & 0     & 0     & 0.125 & 0.417 & 0.458
\end{bmatrix}.
\]
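
The estimation of the Markov transition matrix can be sketched in Python as follows. The helper names are hypothetical; the probability vectors are the fuzzy sets of Table 1 scaled to sum to one (as listed in Table 3), and the sequence of states for 1971 through 1990 is read from the last column of Table 2. Under these inputs the first, second, and sixth rows of the result agree with the estimate above to three decimals.

```python
def table1_set(i, n=7):
    """Membership vector of fuzzy set A_i over u1..un, as in Table 1."""
    return [1.0 if j == i else 0.5 if abs(j - i) == 1 else 0.0
            for j in range(1, n + 1)]


def normalize(v):
    """Scale a non-negative vector so that its entries sum to one."""
    s = sum(v)
    return [x / s for x in v]


# State (fuzzy set index) for each year 1971..1990, from Table 2.
LABELS = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 6, 6, 7]


def estimate_transition_matrix(labels, n=7):
    """Sum the conventional outer products over all transitions, then
    normalize each row to sum to one."""
    S = [[0.0] * n for _ in range(n)]
    for cur, nxt in zip(labels, labels[1:]):
        p, q = normalize(table1_set(cur, n)), normalize(table1_set(nxt, n))
        for i in range(n):
            for j in range(n):
                S[i][j] += p[i] * q[j]
    return [normalize(row) for row in S]


R_m_hat = estimate_transition_matrix(LABELS)
for row in R_m_hat:
    print([round(x, 3) for x in row])
```

Because the outer products are summed rather than combined with a maximum, a transition that is observed several times contributes several times to the estimate.
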

The forecast vector for a year would be $\hat{R}_m$ pre-multiplied by the previous year's probability vector,
obtained in this example by standardizing the appropriate membership vector from Table 2. In
converting these forecast vectors to point estimates we found, as did Song and Chissom, that better
results are produced by using two different techniques according to the skewness of the vector. When
the skewness coefficient, the unitless ratio of the third central moment to the cube of the standard
deviation, is one half or less, the mean is used as the point estimate. However, for skewed vectors the
mean is not representative, so the mode is used instead. Table 4 gives the resulting output vectors, their

Table 3. Standardized membership vectors

Year  Vector

71    0.667  0.333  0      0      0      0      0
72    0.667  0.333  0      0      0      0      0
73    0.667  0.333  0      0      0      0      0
74    0.25   0.5    0.25   0      0      0      0
75    0      0.25   0.5    0.25   0      0      0
76    0      0.25   0.5    0.25   0      0      0
77    0      0.25   0.5    0.25   0      0      0
78    0      0.25   0.5    0.25   0      0      0
79    0      0      0.25   0.5    0.25   0      0
80    0      0      0.25   0.5    0.25   0      0
81    0      0      0.25   0.5    0.25   0      0
82    0      0.25   0.5    0.25   0      0      0
83    0      0.25   0.5    0.25   0      0      0
84    0      0.25   0.5    0.25   0      0      0
85    0      0.25   0.5    0.25   0      0      0
86    0      0.25   0.5    0.25   0      0      0
87    0      0      0.25   0.5    0.25   0      0
88    0      0      0      0      0.25   0.5    0.25
89    0      0      0      0      0.25   0.5    0.25
90    0      0      0      0      0      0.333  0.667

skewness coefficients, the point estimates, and percentage deviation of the point estimate from the
actual enrollment.
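
The conversion of a forecast vector to a point estimate can be sketched in Python. Several details below are assumptions inferred from Table 4 rather than statements in the text: the skewness coefficient is computed over the interval midpoints, its absolute value is compared with one half (the negatively skewed rows of Table 4 also use the mode), the mean is the forecast-weighted average of the means of the state pdfs b_i, and the mode is the midpoint of the interval for the most probable state. All names are hypothetical.

```python
MIDPOINTS = [13500, 14500, 15500, 16500, 17500, 18500, 19500]


def table1_set(i, n=7):
    """Membership vector of fuzzy set A_i over u1..un, as in Table 1."""
    return [1.0 if j == i else 0.5 if abs(j - i) == 1 else 0.0
            for j in range(1, n + 1)]


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


# Mean of each state pdf b_i (Table 1 shape scaled to unit area).
STATE_MEANS = [sum(p * m for p, m in zip(normalize(table1_set(i)), MIDPOINTS))
               for i in range(1, 8)]


def skewness(p, x):
    """Third central moment divided by the cube of the standard deviation."""
    p = normalize(p)
    mu = sum(pi * xi for pi, xi in zip(p, x))
    var = sum(pi * (xi - mu) ** 2 for pi, xi in zip(p, x))
    m3 = sum(pi * (xi - mu) ** 3 for pi, xi in zip(p, x))
    return m3 / var ** 1.5


def point_estimate(p, threshold=0.5):
    """Mean for nearly symmetric forecast vectors, mode for skewed ones."""
    if abs(skewness(p, MIDPOINTS)) <= threshold:   # assumption: absolute value
        return sum(pi * m for pi, m in zip(normalize(p), STATE_MEANS))
    return MIDPOINTS[p.index(max(p))]


p_1972 = [0.360, 0.334, 0.205, 0.091, 0.011, 0, 0]            # Table 4, year 72
p_1975 = [0.191, 0.261, 0.306, 0.190, 0.039, 0.009, 0.004]    # Table 4, year 75
print(point_estimate(p_1972), round(point_estimate(p_1975)))
```
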
The average of the absolute deviations for these years is 2.6%, slightly better than the 3.2% for the
same years with the fuzzy time series model. Following Song and Chissom, the observation for 1991
was not used in estimating the model parameters; the forecasting error for that year is 0.8%, somewhat
less than the 1.7% from the fuzzy time series method. Figure 2 shows the resulting point estimates for the
Markov model and the actual enrollment. The Markov model can be used to obtain an estimated
probability distribution for the forecast, an improvement over a point estimate. The probability

Table 4. Forecast with Markov model

                                                             Skew    Point   Enroll-  Deviation
Year  Output vector                                          coef.   est.    ment     (%)

72    0.360  0.334  0.205  0.091  0.011  0      0             0.66   13500   13563     0.5
73    0.306  0.313  0.240  0.122  0.018  0.001  0.001         0.54   14500   13867     4.6
74    0.284  0.303  0.253  0.135  0.022  0.002  0.001         0.51   14500   14696    -1.3
75    0.191  0.261  0.306  0.190  0.039  0.009  0.004         0.40   15231   15460    -1.5
76    0.094  0.217  0.361  0.248  0.057  0.015  0.008         0.33   15563   15311     1.6
77    0.094  0.217  0.361  0.248  0.057  0.015  0.008         0.33   15563   15603     0.3
78    0.037  0.178  0.372  0.278  0.080  0.034  0.021         0.62   15500   15861     2.3
79    0.029  0.167  0.363  0.279  0.087  0.045  0.029         0.68   15500   16807    -7.8
80    0.005  0.105  0.276  0.248  0.124  0.131  0.112         0.38   16684   16919     1.4
81    0.005  0.105  0.276  0.248  0.124  0.131  0.112         0.38   16684   16388     1.8
82    0.011  0.137  0.335  0.280  0.107  0.075  0.054         0.67   15500   15433     0.4
83    0.094  0.217  0.361  0.248  0.057  0.015  0.008         0.33   15563   15497     0.4
84    0.094  0.217  0.361  0.248  0.057  0.015  0.008         0.33   15563   15145     2.8
85    0.094  0.217  0.361  0.248  0.057  0.015  0.008         0.33   15563   15163     2.6
86    0.094  0.217  0.361  0.248  0.057  0.015  0.008         0.33   15563   15984     2.6
87    0.013  0.155  0.364  0.288  0.094  0.051  0.033         0.75   15500   16859     8.1
88    0.006  0.111  0.291  0.260  0.122  0.116  0.094         0.47   16577   18150    -8.7
89    0      0.038  0.113  0.113  0.133  0.296  0.307        -0.73   19500   18970     2.8
90    0      0.022  0.069  0.074  0.133  0.341  0.361        -1.12   19500   19328     0.9
91    0      0.023  0.073  0.076  0.132  0.338  0.358        -1.10   19500   19337     0.8

[Figure 2: Number of Students (14000 to 19000) plotted against Year (75 to 90).]

Fig. 2. Actual (solid line) and forecast (dashed line) enrollments using the Markov model.

distribution is a weighted average of the probability distributions corresponding to the fuzzy label
states, using the output probabilities of the forecast as the weights. For comparison, Figure 3 gives the
similar results for the fuzzy time series method of SC Part I.
The Markov method requires a probability distribution over U for each linguistic label and a
probability distribution over the linguistic labels for each observation. In many cases these will be
determined subjectively. Thus, in this example other variations in the Markov procedure are possible.
The vectors in Table 1 that define the fuzzy sets A (after standardizing them into probability vectors)
can be used for both estimating $R_m$ and for obtaining the forecasts. Alternatively, the yearly
observation vectors from Table 2 (again, after conversion into probability vectors) can be used for both
purposes.
Illustrated next is another method which corresponds closely to the matrix estimation method of SC
Part II, in that only one fuzzy set represents a year's observation. This illustration is based on a
time-invariant model, so it does not make use of the moving window of SC Part II. Modified yearly

[Figure 3: Number of Students (14000 to 19000) plotted against Year (75 to 90).]

Fig. 3. Actual (solid line) and forecast (dashed line) enrollments with the fuzzy time series.

observation vectors are determined from the membership vectors in Table 2 by rounding all elements to
zero except for the one corresponding to the maximum membership. In terms of the Markov model the
uncertainty regarding the state occupied is removed so the states for each transition are known with
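certainty. These modified observation vectors are used for both estimating $R_m$ and producing the
forecast. This technique produces the following estimate for the transition matrix:

\[
\left[\begin{array}{ccccccc}
2/3 & 1/3 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
\multicolumn{7}{c}{\text{(rows 3 and 4 are illegible in the source)}} \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right].
\]
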

The observed data contain no transitions from states five or seven so these rows are not specified by
the data, but are subjectively completed to satisfy the requirement that all rows sum to unity. Using this
matrix with the modified observation vectors produces an average of absolute errors of 3.0%.
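
The estimation step just described (keep only the maximum membership, then count transitions) might be sketched as follows. The membership vectors are hypothetical, the function names are illustrative, and completing an unobserved row with a one on the diagonal is an assumed subjective choice, not a rule stated by the data.

```python
from fractions import Fraction


def max_membership_state(membership):
    """Index of the label with the largest membership value."""
    return max(range(len(membership)), key=lambda k: membership[k])


def estimate_transition_matrix(states, n_states):
    """Relative-frequency estimate from a sequence of 0-based state indices.

    Rows with no observed transitions are completed with a 1 on the diagonal
    (an assumed, subjective choice) so that every row sums to one.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(states, states[1:]):
        counts[i][j] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            matrix.append([Fraction(int(j == i)) for j in range(n_states)])
        else:
            matrix.append([Fraction(c, total) for c in row])
    return matrix


# Hypothetical membership vectors for five periods and three labels.
memberships = [[1.0, 0.4, 0.0], [0.9, 0.5, 0.0], [0.7, 0.6, 0.1],
               [0.3, 1.0, 0.4], [0.0, 0.5, 1.0]]
states = [max_membership_state(m) for m in memberships]
R_hat = estimate_transition_matrix(states, 3)
for row in R_hat:
    print([str(p) for p in row])
```

Exact fractions are used so that the relative frequencies read the same way as the entries of a hand-computed matrix.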

Comparison of Markov and fuzzy time series models


Both the Markov and fuzzy time series models make direct use of linguistic labels and the notation
for the models is quite similar. The fuzzy approach uses membership functions to reflect the ambiguity
of the linguistic labels and the categorization of each observed data value. The Markov method uses
probability distributions for this purpose. However, there are several important differences between
them. These models use the empirical data to estimate relationships in fundamentally different ways.
Repeated transitions are ignored in estimating the fuzzy relationship matrix, whereas the estimated
Markov transition probabilities make use of the relative frequency of transitions. This distinction can be
illustrated by simple examples.
Note that crisp membership is a special case of fuzzy membership, so letting the membership values
be zero except for a single one defines a fuzzy set for which the fuzzy time series method is applicable.
Suppose there is a process that obeys the Markov model with three crisp states, $C_1$, $C_2$, and $C_3$, and
the following transition matrix:

\[
R_a = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.
\]
Provided at least three transitions are observed, both the fuzzy time series and Markov methods will
produce the same estimated R matrix, which will be the matrix of true transition probabilities since the
transitions are certain. In this case the fuzzy and Markov chain methods produce the same forecasts,
which will reproduce the observed data without error.
However, suppose the true transition probabilities are different, as given below:

\[
R_b = \begin{bmatrix} 0 & 0.9 & 0.1 \\ 0.1 & 0 & 0.9 \\ 0.9 & 0.1 & 0 \end{bmatrix}.
\]
The estimated fuzzy relationship matrix $\hat{R}_f$ will have a one in each location $(i, j)$ corresponding to a
transition $C_i$ to $C_j$ observed in the data. Therefore, with a sufficiently large number of observations the
estimated fuzzy relation matrix $\hat{R}_f$ will tend to be

\[
\hat{R}_f = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}.
\]

This indicates the next category is equally likely to be one of two possibilities, whereas in fact one of
them is nine times more likely to occur. This will lead to inaccurate forecasts. In contrast, the estimated
Markov matrix $R_m$ has the desirable property of tending to become increasingly close to the true
transition matrix as more data are observed.
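
A small simulation can make this contrast concrete. The sketch below is illustrative only: the three-state transition matrix `P_true` is an assumed example in which each state moves to one successor with probability 0.9 and to the other with probability 0.1, and the function names, seed, and run length are hypothetical.

```python
import random


def simulate_chain(P, start, n_steps, seed=0):
    """Simulate a crisp-state Markov chain with transition matrix P."""
    rng = random.Random(seed)
    states = [start]
    for _ in range(n_steps):
        row = P[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def fuzzy_relation(states, n):
    """Fuzzy relation for crisp states: a 1 wherever a transition was seen.

    Repeated transitions add nothing, because the union (max) of identical
    crisp relations is unchanged.
    """
    R = [[0] * n for _ in range(n)]
    for i, j in zip(states, states[1:]):
        R[i][j] = 1
    return R


def markov_estimate(states, n):
    """Relative frequency of each observed transition."""
    counts = [[0] * n for _ in range(n)]
    for i, j in zip(states, states[1:]):
        counts[i][j] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


# Assumed true transition probabilities (illustrative values).
P_true = [[0.0, 0.9, 0.1],
          [0.1, 0.0, 0.9],
          [0.9, 0.1, 0.0]]
path = simulate_chain(P_true, start=0, n_steps=5000)
R_fuzzy = fuzzy_relation(path, 3)
R_markov = markov_estimate(path, 3)
print(R_fuzzy)
print([[round(p, 2) for p in row] for row in R_markov])
```

In a long run the fuzzy relation records only which transitions occurred, while the relative frequencies approach the assumed probabilities.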
Since incorporating a moving window adds another parameter to the model, there will be situations
where the proper choice of the parameter will improve performance. Greater skill is required, however,
in estimating the model. For example, consider the second process where the true transition
probabilities are given by $R_b$ and the window size is less than three transitions, say two transitions
($w = 3$). Suppose the following transitions were observed: $\{\{C_1 \to C_2\}, \{C_2 \to C_3\}\}$, ending in $C_3$ and
producing the following estimated fuzzy relationship matrix $R_e$:

\[
R_e = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.
\]

Now the current state is $C_3$, so the membership vector is (0, 0, 1), and the fuzzy product of this vector
with the matrix $R_e$ results in all zeros. Thus, it is possible for the moving window to be too short, so that
not enough information is used in the estimate. The Markov model can also be based on a time-variant
transition matrix which is estimated using a moving window if it is suspected that the transition matrix
changes over time. Thus, it is susceptible to this same problem.
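
The short-window failure can be reproduced directly. This is a minimal sketch with hypothetical function names, using 0-based indices for the three crisp states and the usual max-min composition for the fuzzy product.

```python
def fuzzy_relation(transitions, n):
    """Crisp fuzzy relation: 1 at (i, j) for each observed transition i -> j."""
    R = [[0] * n for _ in range(n)]
    for i, j in transitions:
        R[i][j] = 1
    return R


def max_min_compose(vec, R):
    """Max-min composition of a membership vector with a relation matrix."""
    return [max(min(v, R[i][j]) for i, v in enumerate(vec))
            for j in range(len(R[0]))]


window = [(0, 1), (1, 2)]        # C1 -> C2 and C2 -> C3, the two transitions in the window
R_window = fuzzy_relation(window, 3)
current = [0, 0, 1]              # process is now in C3
forecast = max_min_compose(current, R_window)
print(forecast)                  # [0, 0, 0]: no information about what follows C3
```

Because the window contains no transition out of the third state, the corresponding row of the relation is zero and the composition returns an empty forecast.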

4. Time series model

The fuzzy time series and Markov models described above use directly the linguistic values of the
data and their output is 'defuzzified' into a numerical point estimate. An alternative approach is to
defuzzify the linguistic data and then use conventional time series models for analysis, producing a
numerical point estimate. This approach will be illustrated using the actual crisp data for this example,
rather than defuzzifying the linguistic data.
One of the simplest forecasting methods is to use the previous year's enrollment as the forecast
for the current year. This gives an average error of 3.1% for this example, compared to the 3.2% error
for the fuzzy forecasting model and 2.6% for the Markov chain model.
Another simple model is a trend with time, in which case the ordinary least squares fit is
-2147 + 224 * (year),
where only the last two digits are used for the year. For this simple model the average deviation for
years 1972 through 1991 is 5.1%, using the years 1971-1990 in estimating the model parameters.
Since the residuals from this simple regression are positively correlated, an autoregressive model is
appropriate to consider. In such a model the value of the variable for year t is a linear combination of
the values in the p previous years plus an independent disturbance, where the parameter p denotes the
order of the model. The autoregressive model will be applied to the residuals from the regression
model, since there seems to be a linear trend in the data which is captured by the regression. Hence the
regression residuals will be the 'data'. The model for order one (AR(1)) is

\[
x_t = r_1 x_{t-1} + u_t ,
\]

where $x_t$ represents the regression residual at time $t$ and the $u_t$ are independent random variables. For
these data unity turns out to be a reasonable estimate for $r_1$. See Chatfield [1], Montgomery et al. [5]
and Greene [3] for procedures to use in estimating the parameters and model selection. Using the
estimated value $\hat{r}_1 = 1$, the best estimate of the residual (deviation from the trend line) is the previous
residual, so the enrollment estimate for a particular year would be the current trend estimate plus the
previous year's departure from the trend. Equivalently the estimate is the previous year's enrollment
Fig. 4. AR(1) forecasts (dashed line) and actual enrollments (solid line). [Plot: number of students (14000 to 19000) versus year (75 to 90).]

plus the yearly trend amount. Such a model gives an average deviation for years 1972-1991 of 2.8%,
using data for years 1971-1990 in estimating the model parameters. Figure 4 shows the forecasts
resulting from this model and the actual enrollments.
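
The equivalence between the two descriptions of this forecast can be checked in one line. The symbols $y_t$ (enrollment in year $t$), $T_t = a + b\,t$ (the fitted trend line, whose slope is $b = 224$ in the fit quoted earlier), and $\hat{y}_t$ (the forecast) are introduced only for this sketch. With the residual $x_t = y_t - T_t$ and the coefficient estimate equal to one,

\[
\hat{y}_t = T_t + x_{t-1} = T_t + (y_{t-1} - T_{t-1}) = y_{t-1} + (T_t - T_{t-1}) = y_{t-1} + b ,
\]

so each forecast is the previous year's enrollment plus the slope of the trend line.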
In this example no matter what coefficient is used with the AR(1) model, the resulting errors are still
autocorrelated. Thus, an AR(1) model cannot be properly specified for these data since the model
requires that the $u_t$ be independent. Since the order-one model cannot be properly specified, progression to an
order-two model is indicated, using the previous two years in the estimate

\[
x_t = r_1 x_{t-1} + r_2 x_{t-2} + u_t .
\]

With this model the estimated coefficients are $r_1 = 1.321$ and $r_2 = -0.647$. The AR(2) model gives an
average deviation for years 1972-1991 of 2.22%, using data for years 1971-1990 in estimating the
model parameters. With this model the yearly errors are uncorrelated. Figure 5 shows the forecasts for

Fig. 5. AR(2) forecasts (dashed line) and actual enrollments (solid line). [Plot: number of students (14000 to 19000) versus year (75 to 90).]



Fig. 6. AR(2) forecasts with back-casting (dashed line) and actual enrollments (solid line). [Plot: number of students (14000 to 19000) versus year (75 to 90).]

this model and the actual enrollments. A slight improvement is possible by fitting exactly the first two
years of the data by 'back-casting'. This reduces the average error to 2.0%. Figure 6 shows the resulting
forecasts.
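
A second-order autoregressive fit of this kind might be sketched as follows. The sketch estimates the two coefficients by ordinary least squares on a residual series, which is an assumption and not necessarily the estimation procedure used for the figures above; back-casting is not included. The residual series is synthetic and noise-free, generated with the coefficient values 1.321 and -0.647 only to show that the fit recovers them. Function names are hypothetical.

```python
def fit_ar2(x):
    """Least-squares estimates (r1, r2) for x_t = r1*x_{t-1} + r2*x_{t-2} + u_t."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(x)):
        s11 += x[t - 1] * x[t - 1]
        s12 += x[t - 1] * x[t - 2]
        s22 += x[t - 2] * x[t - 2]
        b1 += x[t] * x[t - 1]
        b2 += x[t] * x[t - 2]
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    r1 = (b1 * s22 - b2 * s12) / det
    r2 = (b2 * s11 - b1 * s12) / det
    return r1, r2


def ar2_forecasts(x, r1, r2):
    """One-step-ahead forecasts of x_t for t = 2, ..., len(x) - 1."""
    return [r1 * x[t - 1] + r2 * x[t - 2] for t in range(2, len(x))]


# Synthetic, noise-free residual series (illustrative starting values).
true_r1, true_r2 = 1.321, -0.647
residuals = [1.0, 0.5]
for _ in range(18):
    residuals.append(true_r1 * residuals[-1] + true_r2 * residuals[-2])

r1_hat, r2_hat = fit_ar2(residuals)
predicted = ar2_forecasts(residuals, r1_hat, r2_hat)
print(round(r1_hat, 3), round(r2_hat, 3))
```

To obtain an enrollment forecast, the forecast residual would be added back to the trend value for the same year.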

5. Conclusions

Traditional time series methods based on crisp data emphasize the distinction between the
deterministic and random aspects of a model of a process. They use historical data from the process to
estimate the deterministic part. Future results of a process are presumed to obey the model with the
same unknown but estimated deterministic part with unknown future random disturbances. Therefore,
the future values of the process are estimated by extrapolating the estimated deterministic part, about
which the future random disturbances will cause variation. This approach was illustrated in Section 4.
Data for several time periods are used to estimate the parameters of any model for forecasting values
in future time periods. Therefore, forecasting accuracy is measured by the error between the model
value and the actual value for time periods not used in estimating the model parameters. For most
models there will be a discrepancy between the model and actual values for the time periods used in
estimation. This is not a fair measure of forecasting power, since it can easily be reduced to zero by
using a model with sufficiently numerous parameters. The benefit to true forecasting accuracy, however,
is dubious at best. Thus, it is always important to distinguish between forecasting and modeling
accuracy in comparing models.
When the data are only available in linguistic form, both the Markov and the fuzzy time series
models accommodate the data using categories or fuzzy sets. However, the Markov model has the
advantage that any repeated transitions in the data are taken into account in estimating the model
parameters, whereas the fuzzy time series model based on the usual max-min operations cannot do this.
A simple example is given in which the Markov model would give better forecasts.
Although the Markov model was more accurate for the enrollment example, further testing would be
required to determine the generality of this result. There may be other examples for which the fuzzy
time series would give a more accurate result.
Finally, fuzzy data can be defuzzified into numerical values so several conventional time-series

models can be applied to the problem. Not surprisingly the time series models have greater accuracy
when the actual numerical values are known, as in the enrollment example.
Furthermore, the conventional time-series models are better than either the fuzzy time series or
Markov models in projecting strong trends to values outside the range of historically observed data. As
noted earlier, neither of these latter models can project a value outside the defined universe of
discourse.

Acknowledgment

We wish to thank Qiang Song for his generous and helpful comments on this paper. Any remaining
deficiencies are solely the responsibility of the authors.

References

[1] C. Chatfield, The Analysis of Time Series, An Introduction, fourth edition (Chapman & Hall, London, 1989).
[2] S.P. Chatman, Short-term forecasts of the number and scholastic ability of enrolling freshman by academic divisions,
Research in Higher Education 25 (1986) 68-81.
[3] W.H. Greene, Econometric Analysis, Chapters 15 and 18 (MacMillan, New York, 1990).
[4] J.G. Kemeny and J.L. Snell, Finite Markov Chains (Van Nostrand, Princeton, NJ, 1960).
[5] D.C. Montgomery, L.A. Johnson and J.S. Gardiner, Forecasting & Time Series Analysis, second edition (McGraw Hill, New
York, 1990).
[6] S. Ruimin, Fuzzy causal relation analysis in time series, in: J. Kacprzyk and M. Fedrizzi, eds., Fuzzy Regression Analysis
(Omnitech Press, Warsaw, 1992) 228-234.
[7] Q. Song and B. Chissom, Fuzzy time series and its models, Fuzzy Sets and Systems 54 (1993) 269-277.
[8] Q. Song and B.S. Chissom, Forecasting enrollments with fuzzy time series - Part I, Fuzzy Sets and Systems 54 (1993) 1-9.
[9] Q. Song and B.S. Chissom, Forecasting enrollments with fuzzy time series - Part II, Fuzzy Sets and Systems 62 (1994) 1-8.
[10] B.J. Warrack and C.N. Russell, Forecasting demand for postsecondary education in Manitoba: the motivational index and
the demand index as an enrollment forecasting tool, Research in Higher Education 19 (1983) 335-349.
[11] J. Watada, Fuzzy time-series analysis and forecasting of sales volume, in: J. Kacprzyk and M. Fedrizzi, eds., Fuzzy
Regression Analysis (Omnitech Press, Warsaw, 1992) 211-227.
[12] W.C. Weiler, A model for short-term institutional enrollment forecasting, Journal of Higher Education 51 (1980) 314-327.
