
Journal of Hydrology 224 (1999) 100–114
www.elsevier.com/locate/jhydrol

Application of fuzzy rule-based modeling technique to regional drought


R. Pongracz a,1, I. Bogardi b,*, L. Duckstein c,2

a Department of Meteorology, Eotvos Lorand University, Pazmany setany 1, Budapest, H-1117, Hungary
b Department of Civil Engineering, University of Nebraska-Lincoln, W359 Nebraska Hall, Lincoln, NE 68588-0531, USA
c Ecole Nationale du Genie Rural des Eaux et des Forets, 19, avenue du Maine, 75732 Paris Cedex 15, France

Received 12 January 1999; accepted 19 August 1999

Abstract

Fuzzy rule-based modeling is applied to the prediction of regional droughts (characterized by the modified Palmer index, PMDI) using two forcing inputs, El Nino/Southern Oscillation (ENSO) and large scale atmospheric circulation patterns (CPs), in a typical Great Plains state, Nebraska. Although there is a significant relationship between simultaneous monthly CP, lagged Southern Oscillation Index (SOI) and PMDI in Nebraska, the weakness of the correlations, the dependence between CP and SOI, and the relatively short data set limit the applicability of statistical modeling for prediction. Because of these difficulties, a fuzzy rule-based approach is presented to predict PMDI from monthly frequencies of daily CP types and lagged prior SOIs. The fuzzy rules are defined and calibrated using a subset, called the learning set, of the observed time series of premises and PMDI response. Then another subset, the validation set, is used to check how well the application of the fuzzy rules reproduces the observed PMDI. In all eight climate divisions of Nebraska and in the state as a whole, the fuzzy rule-based technique using the joint forcing of CP and SOI is able to learn the high variability and persistence of PMDI and results in almost perfect reproduction of the empirical frequency distributions. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Fuzzy rule-based modeling; Drought; ENSO; Circulation pattern; Palmer index

* Corresponding author. Tel: 1-402-472-1726; fax: 1-402-472-8934. E-mail addresses: prita@caesar.elte.hu (R. Pongracz), ibogardi@unlinfo.unl.edu (I. Bogardi), duckstein@engref.fr (L. Duckstein).
1 Tel: 36-1-209-0555/6615; fax: 36-1-372-2904.
2 Tel: 33-1-4549-8931; fax: 33-0-1-4549-8827.

0022-1694/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0022-1694(99)00131-6

1. Introduction

The purpose of this paper is to develop and apply fuzzy rule-based modeling to the prediction of regional droughts from the joint use of two forcing inputs or premises, namely El Nino/Southern Oscillation (ENSO) and large scale atmospheric circulation patterns (CPs), applied to the case study of a typical Great Plains state, Nebraska (Fig. 1).

Fig. 1. Climate divisions in Nebraska. 1: Western Nebraska; 2: Northern Nebraska; 3: Northeastern Nebraska; 5: Central Nebraska; 6: Eastern Nebraska; 7: Southwestern Nebraska; 8: South-Central Nebraska; 9: Southeastern Nebraska.

Drought is a normal part of the Great Plains climate, and it is different from other natural hazards that affect the region. Drought is a slow-onset, insidious hazard that is often well established before it is recognized as a threat, taking months or years to develop. Economic, environmental, and social impacts of drought can be enormous (WGA, 1996). The Federal Emergency Management Agency (FEMA, 1995) estimates annual drought losses in the US to be US$6–8 billion. The 1987–89 drought across much of the US totaled an estimated US$39.4 billion in direct and indirect losses, which is still the largest amount for any natural disaster in the US (Riebsame et al., 1991). Environmental and social impacts of drought are harder to measure, but no less significant.

In the Great Plains, droughts have always played a major role. During the second half of the 19th century, drought directly affected settlement patterns and population shifts as European Americans moved westward from the eastern US. In this century, drought conditions during the 1930s, and the associated dust storms, gave the Great Plains the nickname "the Dust Bowl", and again desperate farmers fled the Plains for the West Coast. Droughts in the 1950s and 1970s caused less social upheaval, but still resulted in large agricultural losses in the Plains. The late 1980s drought severely affected the northern Plains, while the recent droughts had a major effect in the southern Plains, causing losses in Texas of US$5 billion in 1996 and US$7 billion in 1998 (Chenault and Parsons, 1998).

Drought indices have become common tools to measure the intensity and spatial extent of droughts. One of the most commonly used climatic drought indices in the US is the Palmer Drought Severity Index (PDSI) (Palmer, 1965), which is based on the principle of a balance between moisture supply and demand when man-made changes are not considered. This index indicates the severity of a wet or dry spell: the greater the absolute value, the more severe the dry or wet spell. The PDSI was modified by the National Weather Service Climate Analysis Center to obtain another index (modified PDSI, or PMDI), which is more sensitive to the transition periods between dry and wet conditions (Heddinghause and Sabol, 1991). This paper considers the modified Palmer index. The methodology is, however, applicable to any other drought index, such as the Standardized Precipitation Index (McKee et al., 1993) or the Bhalme–Mooley drought index (Bogardi et al., 1994). This is an important point because it has been argued that Palmer Drought Indices have weaknesses that limit their application as a drought monitoring tool (Alley, 1984; Guttman et al., 1992). On the other hand, given the observed high variability and persistence of PMDI (Fig. 2), it is a more challenging task to reproduce these features with any modeling technique.

A long-term historical data set of PMDI values exists for climatic divisions around the US (Guttman and Quayle, 1996). In the present paper, PMDI is evaluated during the summer half-year (April–September) in eight climate divisions in Nebraska (Fig. 1). Drought conditions are quite different in these divisions; Fig. 2 shows the observed time series for divisions 1 and 3 (Western and Northeastern Nebraska, respectively) during the 1946–97 period. Several drought periods can be identified in these divisional PMDI time series. After the drought of the 1930s, the next most significant drought period occurred from 1952 through 1957 (Lawson et al., 1977), which is obvious in both Nebraskan regions (Fig. 2). Although the climate of the different divisions varies considerably, the main patterns are similar. The western part of Nebraska is in general colder and drier than the eastern part (Palecki, 1996), and also less variable in PMDI values.

Fig. 2. PMDI time series (1946–1997) in climate divisions 1 (Western Nebraska) and 3 (Northeastern Nebraska).

The question arises whether the monthly PMDI values are homogeneous. To this end, Fig. 3 shows the cumulative frequency distribution of PMDI in division 1 for two periods: the training sets of 1946–62 and 1978–94, and the validation set of 1963–77. According to the two-sample Kolmogorov–Smirnov test, the two frequency distributions differ at the 0.1 significance level but not at the 0.05 level. The other divisions behave similarly.

Fig. 3. Distributions of PMDI for the learning and validation sets in climate division 1.

Table 1
Categories defined on PMDI

PMDI interval     Drought category
PMDI below −3     Very dry
−3 to −1          Dry
−1 to 1           Normal
1 to 3            Wet
PMDI above 3      Very wet

2. Atmospheric circulation, ENSO and droughts

Great Plains droughts are strongly related to unusual and persistent synoptic meteorological conditions, mostly large scale circulation patterns and ENSO (El Nino and La Nina) events. The importance of ENSO effects on weather anomalies and crop production in the Midwest has been shown by many researchers, as summarized by Carlson et al. (1996). The association of PDSI with ENSO has been demonstrated for the whole United States (Piechota and Dracup, 1996), but the correlations are not strong enough to predict drought from an ENSO index alone. Pesti et al. (1996) used a fuzzy rule-based technique to identify the relationship between PDSI and CP types in New Mexico.

Large scale circulation patterns (CPs) can be represented by the daily geopotential height field of the 500 hPa level above a US–East-Pacific area centered on Nebraska. The grid consists of 49 points between latitudes 25–65°N and longitudes 80–130°W. This data set (NCAR and University of Washington, 1996) provides grid point values of daily geopotential height field observations for the 1946–94 period. To overcome the time scale difference between monthly droughts and daily CPs, the effects of CPs on droughts are represented by the monthly empirical relative frequencies of daily CP types.

The CP types were identified by a combined multivariate technique (Wilks, 1995), namely principal component analysis (PCA) and cluster analysis using the k-means method (MacQueen, 1967). The same methodology was used as in Matyasovszky et al. (1993), but with longer time series and a smaller number of clusters in the present study (Pongracz, 1999). The procedure starts with a PCA performed on the daily geopotential height fields of the 500 hPa level in order to obtain new uncorrelated variables for the classification. Then a system with k initial cluster centers is defined (here k = 6 is chosen as the number of CP types), and the first daily PCA grid is examined by calculating the distances between the grid and each cluster center. The grid is classified into the closest cluster, and the center of this cluster, which now has a new member, is recalculated. The same steps are applied to each daily PCA grid, one after the other. Then the final cluster centers obtained after classifying all PCA grids are taken as initial centers, and the whole classification procedure is repeated until the cluster centers stabilize.

The ENSO phenomena are represented by the time series of the SOI (Southern Oscillation Index), which is one of the most commonly used indices in ENSO research. Monthly values are available from the Internet (NOAA, 1997) for the years 1933–1997. The data set of the Palmer index, the PMDI, consists of monthly values for the period 1895–1998 (NOAA, 1998). Drought events occur in the case of negative PMDI values, while positive values imply wet conditions.

Possible statistical relationships between the three above-mentioned time series have been analyzed elsewhere (Pongracz et al., 1997), and some results are shown here. First, discrete categories are defined on the PMDI and SOI (Tables 1 and 2).

Table 2
Categories defined on SOI

SOI interval      ENSO phase
SOI below −1      El Nino
−1 to 1           Normal
SOI above 1       La Nina

The correlation coefficients between the monthly relative frequencies of CP types and lagged PMDI or SOI are smaller than 0.18, and mostly not significant. On the other hand, the empirical frequency distributions of CP types during the five drought categories are different at the 0.01 significance level. Fig. 4 shows the frequencies of CP types during the two most extreme PMDI categories: very dry and very wet conditions. The frequencies of CP types during the three ENSO phases are also significantly different.

Fig. 4. Empirical relative frequency distributions of CP types during extreme drought conditions (very dry and very wet) in climate division 8.

The correlation coefficients between PMDI and lagged SOI reach −0.39 and are significant (Fig. 5). Both directions of lag have been evaluated, since simultaneous, lag and pre-lag teleconnections of climate variables may be related to ENSO (Wright, 1985). The conditional frequency distributions of PMDI during El Nino and La Nina periods (Fig. 6) are also significantly different.

Fig. 5. Correlation coefficients between drought and the lagged SOI (1946–94; lag time in months).

Fig. 6. Empirical relative frequency distributions of drought conditions during El Nino and La Nina.

This simple statistical analysis reinforces earlier findings (e.g. Piechota and Dracup, 1996) that despite the strong teleconnection between ENSO and droughts, droughts have occurred in this region under various phases of ENSO (Carlson et al., 1996). Thus, in the Great Plains, the partial signals of ENSO and CP on drought are even weaker than in other regions. Also, CP and ENSO are evidently interdependent, as shown for example by Bartholy et al. (1996). Thus, the more traditional stochastic approach of regressing a drought index on SOI and the frequencies of CP types may not work, as shown later.

3. Fuzzy rule-based methodology

Because of the above difficulties of traditional statistical analysis, fuzzy rule-based modeling is used to exploit the existing linkage between the joint ENSO–CP forcing and the drought index. The fuzzy rule-based approach has a relatively simple structure and requires neither independence nor long data sets (Galambosi et al., 1999). In the following, a fuzzy rule-based technique, namely the weighted counting algorithm (Bardossy and Duckstein, 1995), is adapted to the present case and described step by step.

3.1. Selection of the input variables (premises) and definition of the training and validation data sets

Based on the previous analysis, SOI and the monthly frequency distribution of the six CP types constitute the input, forcing function, or premises. In addition, the question arises of how many prior monthly premises should be considered to predict the drought index. There is no strict rule for this; here we used a selection based on the correlation between the drought index and SOI at different lag periods. For the CP types, none of the prior months has any significant correlation; thus, only the simultaneous frequency distributions of the six CP types represent the first type of premises $X_1, \ldots, X_6$. For SOI, as expected, the picture is different (Fig. 5), since the lag correlations are significant up to the prior six months. The highest correlation between PMDI and SOI occurs for a lag of six months, but then the correlations weaken. However, another local maximum of the correlation can be seen at a lag of 4 months. Furthermore, an annual cycle is theoretically considered, so no lag periods beyond six months in either direction are taken into account. Based on these findings, we used four lagged periods (0, 2, 4 and 6 months) of high correlation as SOI-type premises ($X_7$, $X_8$, $X_9$, $X_{10}$). Note the trade-off between the increasing number of premises and the length of the data set.

The entire 1946–1994 data set $\{X_{i,j}, Y_j\}_{i=1,\ldots,k;\ j=1,\ldots,n}$ contains $k = 6 + 4 = 10$ premises $X_i$ and $n$ observations on the premises and the response $Y$. The entire time series is split into two parts: a training set $\tau$ (2/3 of the entire period) and a validation set $\nu$ (1/3 of the entire period). The training set is used to learn the fuzzy rules, so it must be long enough to provide valuable model outputs.
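As an illustration of how the ten premises and the two subsets could be assembled, here is a minimal Python sketch. It is not the authors' code. The function names and the input dictionaries `daily_cp` and `monthly_soi` are hypothetical, and the SOI values are made up. The day counts follow the April 1946 example given in Section 3.2.1, and the year ranges are the partition quoted in the paper (1946–62 and 1978–94 for training, 1963–77 for validation).

```python
from collections import Counter

# Partition quoted in the paper: 1946-62 and 1978-94 for training, 1963-77 for validation.
TRAIN_YEARS = set(range(1946, 1963)) | set(range(1978, 1995))
VALID_YEARS = set(range(1963, 1978))
SUMMER_MONTHS = (4, 5, 6, 7, 8, 9)   # April-September
SOI_LAGS = (0, 2, 4, 6)              # months before the target month
N_CP_TYPES = 6


def cp_frequencies(daily_cp_types):
    """Monthly relative frequency of each CP type (premises X1..X6)."""
    counts = Counter(daily_cp_types)
    n_days = len(daily_cp_types)
    return [counts.get(cp, 0) / n_days for cp in range(1, N_CP_TYPES + 1)]


def shift_month(year, month, lag):
    """Return the (year, month) that lies `lag` months before (year, month)."""
    index = year * 12 + (month - 1) - lag
    return index // 12, index % 12 + 1


def premise_vector(year, month, daily_cp, monthly_soi):
    """Ten premises: six CP frequencies followed by SOI at lags 0, 2, 4 and 6."""
    x = cp_frequencies(daily_cp[(year, month)])
    x += [monthly_soi[shift_month(year, month, lag)] for lag in SOI_LAGS]
    return x


def which_set(year):
    """Assign a year to the training or validation subset (None if outside both)."""
    if year in TRAIN_YEARS:
        return "training"
    if year in VALID_YEARS:
        return "validation"
    return None


# Hypothetical one-month input: 30 daily CP labels and made-up SOI values.
daily_cp = {(1946, 4): [1] * 9 + [3] * 6 + [4] * 2 + [5] * 1 + [6] * 12}
monthly_soi = {(1946, 4): -0.5, (1946, 2): 0.1, (1945, 12): 0.2, (1945, 10): 0.3}

x = premise_vector(1946, 4, daily_cp, monthly_soi)
n_train = len(TRAIN_YEARS) * len(SUMMER_MONTHS)
n_valid = len(VALID_YEARS) * len(SUMMER_MONTHS)
```

With six summer months per year, this partition gives 204 training months and 90 validation months, roughly the 2/3 to 1/3 split described above.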
The validation set is used to validate the rules derived from the training set, namely to check how correctly they estimate the observed response. Different partitions of the data set were used to check the sensitivity of the results to this operation; in the present case, the results are not sensitive to the selection of partitions. Therefore all the examples use the same partitioning (1946–62 and 1978–94 as the training period, and 1963–77 as the validation period).

3.2. Definition of fuzzy numbers

Fuzzy numbers are defined for each variable involved in the model. A fuzzy number $A_i$ consists of pairs $(x, \mu_{A_i}(x))$, where $x$ is an element of some type of continuous set and $\mu_{A_i}$ is the membership function, which must have no local minimum and must attain a maximum of one. Values of $\mu_{A_i}(x)$ vary in the range [0, 1] depending on the truth of the characteristics that are considered (Dubois and Prade, 1980). One of the simplest fuzzy numbers is the triangular fuzzy number, represented as $(a_1, a_2, a_3)_T$ using the notation of Fig. 7.

Fig. 7. A triangular fuzzy number.

Consider, for example, the fuzzy number for a very dry condition. Characterizing a very dry climatological condition by a single value of PMDI may not be appropriate. Even an interval has sharp limits, and the various values between the lower and upper limits are not distinguished. Fuzzy logic, by contrast, makes it possible to look at these PMDI values as a sort of continuum. Although interval partitioning can be appropriate in some cases, fuzzy numbers are closer to human thinking than intervals. So all possible PMDI values have some membership value in very dry climatological conditions (all the positive PMDI values, which represent wet conditions, must have a membership value of 0). Using a triangular fuzzy number, very dry climatological conditions can be defined with $a_1 = -6$, $a_2 = -4$ and $a_3 = -1$, or $(-6, -4, -1)_T$. For instance, the PMDI values −1.9, −2.1 and −4.5 have membership values of 0.30, 0.37 and 0.75, respectively, so they all represent a very dry condition, but to different degrees.

The definitions of the fuzzy numbers were based mainly on the ranges of the premises and the response variable. Then a linear partitioning was applied to each variable (SOI values, CP relative frequencies, PMDI values).

3.2.1. Fuzzy numbers defined on premises

The entire range of possible premise values is divided into several overlapping classes, each forming a fuzzy number. The more fuzzy numbers we define, the better the estimation that can be expected for the values of PMDI. However, if too many fuzzy numbers are defined on the premises, the validation set might contain too many observations that never occurred in the training set, so that fuzzy rules cannot be applied to them. As a compromise, all premises (relative frequencies of CP types, and lagged SOI time series) are divided into five regions, namely for monthly CP occurrence: very rare $A^{(1)}$, rare $A^{(2)}$, medium $A^{(3)}$, frequent $A^{(4)}$ and very frequent $A^{(5)}$ (Fig. 8); and for SOI: strong El Nino $A^{(1)}$, weak El Nino $A^{(2)}$, normal $A^{(3)}$, weak La Nina $A^{(4)}$ and strong La Nina $A^{(5)}$ phases (Fig. 9). The various CP types occur with different frequencies, so for the sake of comparability the highest monthly frequency that ever occurred in the data set is defined as the maximum of the given CP type premise (Table 3).

Fig. 8. Fuzzy numbers defined on the monthly relative frequency of a given CP type.

Fig. 9. Fuzzy numbers defined on SOI.

Table 3
Monthly maximum relative frequencies (max) for daily CP types and their proportions

           CP1    CP2    CP3    CP4    CP5    CP6
max        0.77   0.50   0.93   0.74   0.94   0.60
3/4 max    0.58   0.38   0.70   0.56   0.70   0.45
1/2 max    0.38   0.25   0.47   0.37   0.47   0.30
1/4 max    0.19   0.13   0.23   0.19   0.23   0.15

As an example, during the first month of data, April 1946, the occurrences of CP types are CP1: 9, CP2: 0, CP3: 6, CP4: 2, CP5: 1 and CP6: 12 days. Thus, for that month $X_1 = 0.30$ (relative frequency of CP1), $X_2 = 0$, $X_3 = 0.20$, $X_4 = 0.07$, $X_5 = 0.03$, $X_6 = 0.40$, $X_7 = -1.04$ (simultaneous SOI), $X_8 = 0.31$ (SOI 2 months before), $X_9 = 0.60$ (SOI 4 months before) and $X_{10} = 0.25$ (SOI 6 months before). The corresponding membership function values are given in Table 4. For example, the relative frequency of CP1, $X_{1,1} = 0.30$, has non-zero membership values in the two fuzzy sets "Rare monthly CP occurrence" and "Medium monthly CP occurrence", 0.44 and 0.56, respectively; and the relative frequency of CP5, $X_{5,1} = 0.03$, has non-zero membership values in the two fuzzy sets "Very rare monthly CP occurrence" and "Rare monthly CP occurrence", 0.86 and 0.14, respectively.

Table 4
Values of membership function for the first data point (April 1946)

CP premises
                 μA(1)       μA(2)   μA(3)    μA(4)      μA(5)
i    X_{i,1}     Very rare   Rare    Medium   Frequent   Very frequent
1    0.30        0           0.44    0.56     0          0
2    0           1.00        0       0        0          0
3    0.20        0.14        0.86    0        0          0
4    0.07        0.64        0.36    0        0          0
5    0.03        0.86        0.14    0        0          0
6    0.40        0           0       0.33     0.67       0

SOI premises
                 μA(1)            μA(2)          μA(3)    μA(4)          μA(5)
i    X_{i,1}     Strong El Nino   Weak El Nino   Normal   Weak La Nina   Strong La Nina
7    −1.04       0                0.69           0.31     0              0
8    0.31        0                0              0.79     0.21           0
9    0.60        0                0              0.60     0.40           0
10   0.25        0                0              0.83     0.17           0

3.2.2. Fuzzy numbers defined on response variable

PMDI as the response variable (Y) was considered for the eight climate divisions and for the spatial average over the entire state of Nebraska (NOAA, 1998). Different fuzzy number systems were defined on the range from extremely dry (large negative PMDI values) to extremely wet (large positive PMDI values) conditions. As the total number of fuzzy numbers increases (7, 8, 11, 12, 17, 18), the accuracy of the fuzzy rule-based model improves. So the last option was chosen, with fuzzy numbers $B^{(1)}, \ldots, B^{(18)}$ (Fig. 10). This number of fuzzy partitions offers a proper representation of the wide range of PMDI, and the data set can provide several points in each interval. For the example of April 1946, values of the PMDI membership function are given in Table 5 for the eight climate divisions and the spatial average.

Fig. 10. Fuzzy numbers defined on PMDI (extreme dry, dry 7 to dry 1, normal, wet 1 to wet 8, extreme wet; PMDI values from −8 to 9).

Table 5
Values of response membership function for the first data point (April 1946; PMDI in different regions of Nebraska)

                    μB(1)         μB(2)         μB(6)   μB(7)   μB(8)   μB(9)    μB(10)         μB(17)   μB(18)
Division   Y_1      Extreme dry   Dry 7   ...   Dry 3   Dry 2   Dry 1   Normal   Wet 1    ...   Wet 8    Extreme wet
1          −1.16    0             0       ...   0       0.16    0.84    0        0        ...   0        0
2          −1.71    0             0       ...   0       0.71    0.29    0        0        ...   0        0
3          −1.31    0             0       ...   0       0.31    0.69    0        0        ...   0        0
5          −2.47    0             0       ...   0.47    0.53    0       0        0        ...   0        0
6          −1.47    0             0       ...   0       0.47    0.53    0        0        ...   0        0
7          −2.08    0             0       ...   0.08    0.92    0       0        0        ...   0        0
8          −2.29    0             0       ...   0.29    0.71    0       0        0        ...   0        0
9          −1.76    0             0       ...   0       0.76    0.24    0        0        ...   0        0
NE         −1.84    0             0       ...   0       0.84    0.16    0        0        ...   0        0

3.3. Rule construction

Fuzzy rules are constructed using the training set $\tau: \{X_{i,j}, Y_j\}_{i=1,\ldots,k;\ j=1,\ldots,n_\tau}$ (where $n_\tau < n$ is the number of observations in the time series of the training set) by applying the following steps.

3.3.1. Determine the highest values of all membership functions for each data point

First, values of the membership functions are calculated for each observed premise and response variable: $\mu_{A_i^{(l_i)}}(X_{i,j})$ for $l_i = 1, \ldots, 5$; $i = 1, \ldots, k$, and $\mu_{B^{(l)}}(Y_j)$. Then the maximum values of the membership functions are selected. Thus, each data point $X_{i,j}$ within the data set ($j = 1, \ldots, n_\tau$) possesses a value $M_{i,j}$:

$$M_{i,j} = \max_{l_i = 1, \ldots, 5} \mu_{A_i^{(l_i)}}(X_{i,j}),$$

and each response $Y_j$ possesses a value $M_{0,j}$:

$$M_{0,j} = \max_{l = 1, \ldots, 18} \mu_{B^{(l)}}(Y_j).$$

Table 6 shows these selected maximum values for the first data point, April 1946.
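The membership calculations of Sections 3.2 and 3.3.1 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation. The function names are hypothetical, and the CP partition assumes five evenly spaced triangular numbers on [0, max] whose feet sit at the neighbouring peaks. That layout is an assumption, although it reproduces the CP entries of Table 4 to rounding.

```python
def triangular(x, a1, a2, a3):
    """Membership of x in the triangular fuzzy number (a1, a2, a3)_T."""
    if x <= a1 or x >= a3:
        return 0.0
    if x <= a2:
        return (x - a1) / (a2 - a1)   # rising limb, equals 1 at x == a2
    return (a3 - x) / (a3 - a2)       # falling limb


CP_LABELS = ["Very rare", "Rare", "Medium", "Frequent", "Very frequent"]


def cp_memberships(x, cp_max):
    """Memberships of a CP relative frequency in the five CP fuzzy numbers.

    Assumption: peaks at 0, max/4, max/2, 3*max/4 and max, with each
    triangle reaching zero at the neighbouring peaks.
    """
    step = cp_max / 4.0
    return [triangular(x, (l - 1) * step, l * step, (l + 1) * step)
            for l in range(5)]


def strongest(memberships, labels):
    """Return the label and value of the largest membership (the M value)."""
    value = max(memberships)
    return labels[memberships.index(value)], value


# "Very dry" example from the text: (-6, -4, -1)_T.
very_dry = (-6.0, -4.0, -1.0)
examples = [round(triangular(p, *very_dry), 2) for p in (-1.9, -2.1, -4.5)]

# CP1 in April 1946: X = 0.30 with max = 0.77 (Table 3).
cp1 = cp_memberships(0.30, 0.77)
label, m_11 = strongest(cp1, CP_LABELS)
```

Under these assumptions, `examples` matches the 0.30, 0.37 and 0.75 quoted for the very dry fuzzy number, and `strongest` selects "Medium" with a value of about 0.56 for CP1, in line with Tables 4 and 6.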

Table 6
Maximum membership function values and weights for the first data point (April 1946)

Premises
i    Name       Maximum value M_{i,1}   Name of the fuzzy number
1    CP1        0.56                    Medium
2    CP2        1.00                    Very rare
3    CP3        0.86                    Rare
4    CP4        0.64                    Very rare
5    CP5        0.86                    Very rare
6    CP6        0.67                    Frequent
7    SOI        0.69                    Weak El Nino
8    SOI (−2)   0.79                    Normal
9    SOI (−4)   0.60                    Normal
10   SOI (−6)   0.83                    Normal

DOF_1 = 0.049

Response variable
Name           Location       Maximum value M_{0,1}   Name of the fuzzy number   Weight of rule 1, v_1 = DOF_1 · M_{0,1}
PMDI div. 1    W-Ne           0.84                    dry 1                      0.041
PMDI div. 2    N-Ne           0.71                    dry 2                      0.035
PMDI div. 3    NE-Ne          0.69                    dry 1                      0.034
PMDI div. 5    Central-Ne     0.53                    dry 2                      0.026
PMDI div. 6    E-Ne           0.53                    dry 1                      0.026
PMDI div. 7    SW-Ne          0.92                    dry 2                      0.045
PMDI div. 8    S-Central Ne   0.71                    dry 2                      0.035
PMDI div. 9    SE-Ne          0.76                    dry 2                      0.037
PMDI/NE        Nebraska       0.84                    dry 2                      0.041
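The entries of Table 6 can be cross-checked with a short Python sketch: the degree of fulfillment is the product of the ten maximum premise memberships, and each rule weight is that product times the response membership. The variable and function names are hypothetical. Multiplying the rounded table values gives about 0.048, slightly below the published 0.049, presumably because the paper used unrounded memberships; the weights are therefore recomputed from the published DOF.

```python
from math import prod

# Maximum premise memberships M_{i,1} for April 1946 (Table 6).
premise_max = [0.56, 1.00, 0.86, 0.64, 0.86, 0.67, 0.69, 0.79, 0.60, 0.83]

# Maximum response memberships M_{0,1} per region (Table 6).
response_max = {
    "W-Ne": 0.84, "N-Ne": 0.71, "NE-Ne": 0.69, "Central-Ne": 0.53,
    "E-Ne": 0.53, "SW-Ne": 0.92, "S-Central Ne": 0.71, "SE-Ne": 0.76,
    "Nebraska": 0.84,
}

# Product of the rounded table values (about 0.048).
dof_from_rounded = prod(premise_max)

# Published degree of fulfillment, used here to reproduce the weight column.
DOF_PUBLISHED = 0.049
weights = {name: round(DOF_PUBLISHED * m0, 3) for name, m0 in response_max.items()}


def add_observation(rule_weights, premise_labels, response_label, dof, m0):
    """Accumulate DOF * M0 for a rule; repeated rules have their weights summed."""
    key = (tuple(premise_labels), response_label)
    rule_weights[key] = rule_weights.get(key, 0.0) + dof * m0
    return rule_weights


# Hypothetical rule base: the same rule observed twice accumulates weight.
rules = {}
labels = ["Medium", "Very rare", "Rare", "Very rare", "Very rare", "Frequent",
          "Weak El Nino", "Normal", "Normal", "Normal"]
add_observation(rules, labels, "dry 1", DOF_PUBLISHED, 0.84)
add_observation(rules, labels, "dry 1", 0.010, 0.50)   # made-up second occurrence
```

Keying the dictionary on the tuple of premise labels plus the response label means that a rule seen in several training months is stored once, with its weight growing by DOF · M0 each time.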

3.3.2. Combined effect of fuzzy numbers (using operator AND)

Since we have more than one premise, the effects of the premises should be combined. The two most commonly used operators for fuzzy numbers are AND and OR (Zimmermann, 1985). In the present model we used only the operator AND to combine the effects of the different premises. So a rule looks like this:

IF ($X_{1,j}$ is $A_1^{(l_1)}$ AND $X_{2,j}$ is $A_2^{(l_2)}$ AND … AND $X_{10,j}$ is $A_{10}^{(l_{10})}$) THEN $\hat{Y}_j$ is $B^{(l)}$.

The combined effect of all premises is represented here by the product of the membership functions, called the degree of fulfillment (DOF), which indicates the degree of applicability of the rule within the system. Thus, the DOF of the jth set of data points ($\mathrm{DOF}_j$) is calculated as:

$$\mathrm{DOF}_j = \prod_{i=1}^{k} M_{i,j}.$$

The first data point (April 1946) has $\mathrm{DOF}_1 = 0.56 \times 1.00 \times 0.86 \times 0.64 \times 0.86 \times 0.67 \times 0.69 \times 0.79 \times 0.60 \times 0.83 = 0.049$. At the very beginning, the fuzzy rule system is empty and contains no rules at all; the first rule is derived from the first observed values. In the present case, this first rule for the entire state of Nebraska, and for Northern, Central, Southwestern, South-Central and Southeastern Nebraska, looks as follows:

IF Medium CP1 occurrence AND Very rare CP2 occurrence AND Rare CP3 occurrence AND Very rare CP4 occurrence AND Very rare CP5 occurrence AND Frequent CP6 occurrence AND weak El Nino in the actual month AND Normal phase 2 months before AND Normal phase 4 months before AND Normal phase 6 months before THEN dry 2 drought condition    (1)

and for Western, Northeastern and Eastern Nebraska:

IF Medium CP1 occurrence AND Very rare CP2 occurrence AND Rare CP3 occurrence AND Very rare CP4 occurrence AND Very rare CP5 occurrence AND Frequent CP6 occurrence AND weak El Nino in the actual month AND Normal phase 2 months before AND Normal phase 4 months before AND Normal phase 6 months before THEN dry 1 drought condition    (2)

The rule system grows as more and more rules are added on the basis of observed data points. If a rule derived from a given set of data points is not yet included in the rule system, it is added to the rule system.

3.3.3. Assign a weight to each rule

Weights indicate the proportion of the training data set explained by a given (mth) rule. They are calculated as the sum of the products of $\mathrm{DOF}_j$ and the value of the membership function of the response variable, $M_{0,j}$:

$$v_m = \sum_{j=1}^{n_\tau} \mathrm{DOF}_j \, M_{0,j}.$$

For the first data point (April 1946), the weights of rule (1) or (2), depending on the area considered, are shown in Table 6. If the first rule (1) or (2) appears in more data points, the individual weights are summed. After proceeding through the entire training set, all derived rules possess a weight, which is used in the validation procedure when the estimated values of the response variable are calculated during the defuzzification step.

3.4. Validation procedure

Fuzzy rules are validated using the validation data set $\nu: \{X_{i,j}, Y_j\}_{i=1,\ldots,k;\ j=n_\tau+1,\ldots,n}$ in the following steps.

3.4.1. Calculate all possible DOF for each data point

All values of the membership functions are calculated for each premise, so we have all $\mu_{A_i^{(l_i)}}(X_{i,j})$ values (for $l_i = 1, \ldots, 5$; $i = 1, \ldots, k$). Since the fuzzy numbers are defined as overlapping regions, each data point falls into two different fuzzy regions of a given premise (Figs. 8 and 9). Thus, theoretically, there are $2^k$ possible rules, but most of them are either impossible or did not occur in the training set (the maximum number of rules is determined by the length of the training set, $n_\tau$, which is much less than $2^k$). Therefore only a few existing rules are taken into account in specifying the response output. As an example, for a data point from the validation set (July 1966), the possible membership values are calculated for the western Nebraskan region (Table 7). The total number of potentially applicable fuzzy rules is $2^{10} = 1024$ in the present study.

Table 7
Membership function values at the data point of July 1966 (Western Nebraska)

CP premises
                   μA(1)       μA(2)   μA(3)    μA(4)      μA(5)
i    X_{i,124}     Very rare   Rare    Medium   Frequent   Very frequent
1    0.13          0.33        0.67    0        0          0
2    0.07          0.48        0.52    0        0          0
3    0.23          0.03        0.97    0        0          0
4    0.32          0           0.26    0.74     0          0
5    0.22          0.03        0.97    0        0          0
6    0.03          0.79        0.21    0        0          0

SOI premises
                   μA(1)            μA(2)          μA(3)    μA(4)          μA(5)
i    X_{i,124}     Strong El Nino   Weak El Nino   Normal   Weak La Nina   Strong La Nina
7    −0.24         0                0.16           0.84     0              0
8    −0.65         0                0.43           0.57     0              0
9    −1.77         0.18             0.82           0        0              0
10   −1.33         0                0.89           0.11     0              0

3.4.2. Combine the fuzzy responses: defuzzification

At this stage, the application of each rule provides a fuzzy response. The defuzzification process combines the fuzzy responses and arrives at a crisp (real number) estimated response. The center of gravity is commonly used to obtain the estimated value of the response variable ($\hat{Y}_j$):

$$\hat{Y}_j = \frac{\sum_{m \in \tau} \mathrm{DOF}_m \, v_m \, B_m^{2}}{\sum_{m \in \tau} \mathrm{DOF}_m \, v_m},$$

where $B_m^{2}$ is the most likely value ($\mu_{B_m} = 1$) of the consequence fuzzy number $B_m$ defined on PMDI. In our example, for the data point of July 1966, five rules are applicable out of the 1024 possible fuzzy rules (Table 8). So the estimate for PMDI in Western Nebraska for July 1966 is:

$$\hat{Y}_{124} = \frac{10^{-5}\,[\,0.096 \cdot 0.64 \cdot (-1) + 0.25 \cdot 0.14 \cdot 1 + 3.59 \cdot 0.23 \cdot 3 + 1.01 \cdot 0.15 \cdot 1 + 9.12 \cdot 0.30 \cdot (-3)\,]}{10^{-5}\,[\,0.096 \cdot 0.64 + 0.25 \cdot 0.14 + 3.59 \cdot 0.23 + 1.01 \cdot 0.15 + 9.12 \cdot 0.30\,]} = \frac{-5.39}{3.73} = -1.45.$$

3.5. Evaluate the fuzzy rule-based model

The fuzzy rule-based model must be evaluated in terms of how well it reproduces the statistical properties and the actual time series of the consequences in the validation set.
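The center-of-gravity defuzzification of Section 3.4.2 can be sketched in Python as follows. The function name is hypothetical. The inputs are the rounded DOF and weight values printed in Table 8, with peak values of −1, 1, 3, 1 and −3 taken for dry 1, wet 1, wet 3, wet 1 and dry 3. With these rounded inputs the sketch gives about −1.47, close to but not identical with the −1.45 reported in the text.

```python
def defuzzify(rules):
    """Weighted center of gravity over the applicable rules.

    Each rule is a tuple (dof, weight, peak), where `peak` is the most
    likely value of the consequence fuzzy number. Returns None when no
    rule applies (zero denominator).
    """
    numerator = sum(dof * weight * peak for dof, weight, peak in rules)
    denominator = sum(dof * weight for dof, weight, _ in rules)
    if denominator == 0:
        return None
    return numerator / denominator


# Applicable rules for July 1966, Western Nebraska (rounded values of Table 8):
# (DOF_m, weight v_m, peak of the consequence fuzzy number).
table8_rules = [
    (0.96e-5, 6.4e-2, -1.0),   # -> dry 1
    (2.51e-5, 1.4e-2, 1.0),    # -> wet 1
    (35.91e-5, 2.3e-2, 3.0),   # -> wet 3
    (10.12e-5, 1.5e-2, 1.0),   # -> wet 1
    (91.20e-5, 3.0e-2, -3.0),  # -> dry 3
]

y_hat = defuzzify(table8_rules)
```

Returning `None` for an empty or zero-weight rule set is a design choice of this sketch; it makes explicit the case, mentioned in Section 3.2.1, where a validation observation matches no rule learned from the training set.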

4. Results The results of the model using 5 fuzzy numbers on each premise and 18 on the response variable are summarized in Table 9 by providing the means and standard deviations of observed and estimated PMDI, the root-mean squared errors (RMSE) and the correlation coefcients between observed and estimated time series for each climate division. These statistical characteristics serve as criteria of verication. It is evident from Table 9 that the fuzzy rule-based model preserves the statistical parameters of the PMDI; in

R. Pongracz et al. / Journal of Hydrology 224 (1999) 100114 Table 8 Characteristics of the applied rules for the data point of July 1966 (Western Nebraska) Applied (mth) rule VR,R,VR,R,R,R,N,wE,wE,N ! dry1 VR,R,VR,M,R,VR,wE,N,wE,N ! wet1 VR,R,R,R,R,VR,wE,wE,sE,wE ! wet3 R,VR,VR,N,R,R,wE,N,wE,wE ! wet1 R,R,R,R,R,R,wE,wE,wE,wE ! dry3 DOFm [10 5] 0.96 2.51 35.91 10.12 91.20 Weight v m [10 2] 6.4 1.4 2.3 1.5 3.0
2 B m

111

1 1 3 1 3

Table 9 Summary of results. Fuzzy numbers: 5 in CP time series, 5 in SOI time series, 18 in PMDI time series Div1 Ave. of observed Ave. of estimated Std. dev. of observed Std. dev. of estimated RMSE Correlation coeff. 0.18 0.15 2.13 1.99 1.53 0.74 Div2 0.75 1.00 2.87 2.76 1.82 0.80 Div3 0.65 0.59 2.85 2.78 1.63 0.84 Div5 0.97 0.89 2.70 2.83 1.74 0.82 Div6 0.71 0.59 2.59 2.53 1.57 0.82 Div7 0.53 0.62 2.60 2.54 1.73 0.79 Div8 0.83 0.68 2.47 2.46 1.64 0.79 Div9 0.53 0.40 2.75 2.60 1.65 0.82 NE 0.42 0.32 2.91 2.92 1.77 0.83

fact, there is no signicant difference in any of the regions considered. In addition, the distributions of the calculated PMDI reproduce the empirical distributions (Fig. 11). It is even more noteworthy if we consider the much larger difference between the empirical distributions in the learning set (from which the rules are derived) and the validation set (Fig. 3). However, the performance of the model is very sensitive to the selection of the number of classes in the premises. If, for instance, only three fuzzy numbers
are defined on the two types of premises (Rare, Medium, Frequent monthly CP occurrence, and El Niño, Normal, La Niña phases), the distributions differ significantly (Fig. 12) and the results are not as good.

Fig. 11. Comparison of cumulative frequency distributions of PMDI time series (1946-1994) in South-Central Nebraska (fuzzy partitions: 5 on CP, 5 on SOI, 18 on PMDI). Axes: relative frequency (0 to 1) versus PMDI values (-9 to 9). Series: observed PMDI; estimated PMDI / CP only; estimated PMDI / SOI only; estimated PMDI / CP + SOI.

Fig. 12. Cumulative frequency distribution of the observed and estimated PMDI time series (1946-1994) in Nebraska (fuzzy partitions: 3 on CP, 3 on SOI, 17 on PMDI). Axes: relative frequency (0 to 1) versus PMDI values (-9 to 9). Series: observed PMDI; estimated PMDI.

Fig. 13 shows the observed and estimated time series for two divisions. The model performs almost perfectly over the training set, and quite well over the entire period. However, the estimated values are not exactly the same as the observed drought index during the validation period. This was expected, since droughts are triggered by a large number of atmospheric, hydrologic, agricultural, and other phenomena in addition to the two types of premises this model considers. Another reason is that huge and persistent negative (1954-57) and positive (1992-94) peaks must be assimilated during the learning process. The model did learn all the peaks, which is necessary to apply the fuzzy rule-based model to the entire range of PMDI.
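
The verification criteria reported in Table 9 (means, standard deviations, RMSE and correlation coefficient) and the comparison of cumulative frequency distributions can be reproduced for any pair of observed and estimated series. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the use of the sample standard deviation is an assumption, and the series in the example are synthetic stand-ins for PMDI.

```python
import numpy as np


def verification_stats(observed, estimated):
    """Return Table 9 style criteria for one pair of PMDI series."""
    obs = np.asarray(observed, dtype=float)
    est = np.asarray(estimated, dtype=float)
    if obs.ndim != 1 or obs.shape != est.shape:
        raise ValueError("observed and estimated must be 1-D series of equal length")
    return {
        "mean_observed": float(obs.mean()),
        "mean_estimated": float(est.mean()),
        # Sample standard deviation (ddof=1) is an assumption of this sketch.
        "std_observed": float(obs.std(ddof=1)),
        "std_estimated": float(est.std(ddof=1)),
        "rmse": float(np.sqrt(np.mean((est - obs) ** 2))),
        "correlation": float(np.corrcoef(obs, est)[0, 1]),
    }


def cumulative_frequency(values, grid):
    """Empirical cumulative relative frequency of `values` at each grid point."""
    ordered = np.sort(np.asarray(values, dtype=float))
    return np.searchsorted(ordered, grid, side="right") / ordered.size


if __name__ == "__main__":
    # Synthetic data only: these are not PMDI observations.
    rng = np.random.default_rng(0)
    observed = np.clip(rng.normal(0.0, 2.5, size=240), -9.0, 9.0)
    estimated = observed + rng.normal(0.0, 1.5, size=240)

    for name, value in verification_stats(observed, estimated).items():
        print(f"{name}: {value:.2f}")

    grid = np.arange(-9, 10)  # same PMDI axis range as Figs. 11 and 12
    gap = np.abs(
        cumulative_frequency(observed, grid) - cumulative_frequency(estimated, grid)
    ).max()
    print(f"largest gap between cumulative frequencies: {gap:.2f}")
```

Applied per climate division, the first function yields one column of Table 9, and the second gives the curves compared in Figs. 11 and 12.
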
Fig. 13. Observed and estimated PMDI time series (1946-1994, summer half years). Fuzzy partitions: 5 on CP, 5 on SOI, 18 on PMDI. Panels: Western Nebraska; South-Central Nebraska. Axes: PMDI (-9 to 9) versus years (1946 to 1994). Series: observed PMDI; estimated PMDI. Marked periods: training set first half, validation set, training set second half.

Fig. 14. Observed and multivariate regression estimated PMDI time series for South-Central Nebraska. Axes: PMDI values (-9 to 9) versus years (1946 to 1994). Series: observed PMDI; predicted PMDI.
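
Fig. 14 refers to a multivariate regression estimate of PMDI, but the regression specification is not given here. The following is therefore only a minimal sketch of such a baseline, assuming ordinary least squares with an intercept, fitted on a learning subset and checked on a validation subset. The function names, the synthetic premise matrix (standing in for monthly CP frequencies and lagged SOI), and the split are all hypothetical.

```python
import numpy as np


def fit_linear_baseline(premises, response):
    """Least-squares coefficients (intercept first) of response on premises."""
    X = np.asarray(premises, dtype=float)
    y = np.asarray(response, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear_baseline(coef, premises):
    """Apply fitted coefficients to a new premise matrix."""
    X = np.asarray(premises, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])
    return design @ coef


if __name__ == "__main__":
    # Synthetic stand-in data: 240 months, 4 premise columns (assumed).
    rng = np.random.default_rng(1)
    premises = rng.normal(size=(240, 4))
    # A deliberately weak linear signal plus noise, not a model of PMDI.
    response = 0.5 * premises[:, 0] - 0.3 * premises[:, 3] + rng.normal(0.0, 2.5, 240)

    learn, valid = slice(0, 160), slice(160, 240)  # hypothetical split
    coef = fit_linear_baseline(premises[learn], response[learn])
    predicted = predict_linear_baseline(coef, premises[valid])

    rmse = np.sqrt(np.mean((predicted - response[valid]) ** 2))
    corr = np.corrcoef(predicted, response[valid])[0, 1]
    print(f"validation RMSE: {rmse:.2f}, correlation: {corr:.2f}")
```

Evaluating the baseline with the same criteria as Table 9 allows a like-for-like comparison with the fuzzy rule-based estimates.
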

5. Discussion and conclusions

A fuzzy rule-based methodology has been presented to estimate the modified Palmer index using global atmospheric circulation and ENSO for the climate divisions in Nebraska. Used separately, neither the relative frequencies of CP types nor the lagged SOI as premises can reproduce the empirical frequency distribution (Fig. 11); prediction based solely on SOI is the worst. Considering the joint forcing results in a dramatic improvement. Thus, one of the main findings of this paper is that both types of premises must be taken into consideration for the prediction of PMDI in Nebraska.

On the other hand, multivariate regression provides very poor results regardless of whether either or both types of premises are used. Fig. 14 presents typical results of multivariate regression; evidently this tool cannot be used in the regions considered with the amount of data available. Similar results were found for precipitation in Arizona by Galambosi et al. (1997).

The fuzzy rule-based technique has the potential to generate time series of drought indices under climate change scenarios. The main idea is to use GCM-produced data, instead of the historical CP and ENSO data, with the established linkage (fuzzy rules) to predict the drought indices. Several GCMs are able to reproduce features of present atmospheric general circulation patterns quite correctly (e.g. Simmons and Bengtsson, 1988; Mearns et al., 1999). On the other hand, because of the possible difficulty of obtaining GCM-produced ENSO indices, one may resort to a scenario analysis: for instance, an unchanged ENSO regime, a more frequent El Niño scenario, and a more frequent La Niña scenario can be assumed, although the rapid development of GCMs may provide meaningful ENSO outputs in the near future.

The following conclusions can be drawn:
1. Climate divisions in Nebraska reflect different drought conditions of high variability and persistence.
2. Although there is a significant relationship between simultaneous monthly CP, lagged SOI and PMDI in Nebraska, the weakness of the correlations, the dependence between CP and SOI, and the relatively short data set limit the applicability of statistical modeling for prediction.
3. Fuzzy rule-based modeling, which does not seek a mathematical function to describe the relationship, provides an excellent tool to predict PMDI from only two types of premises: monthly frequencies of daily CP types and lagged prior SOI.
4. The fuzzy rules can be ascertained from a subset, called the learning set, of the observed time series of premises and PMDI response. Another subset, the validation set, should then be defined to check how well the application of the fuzzy rules reproduces the observed PMDI.
5. In all eight climate divisions and in Nebraska as a whole, the fuzzy rule-based technique using the joint forcing of CP and SOI is able to learn the high variability and persistence of PMDI and results in almost perfect reproduction of the empirical frequency distributions; the real-time prediction is also acceptable.
6. A step-by-step description of the methodology has been provided to facilitate its application to other similar cases.

Acknowledgements

This research has been partially supported by the US National Science Foundation under grants CMS-9613654 and CMS-9614017.

References

Alley, W.M., 1984. The Palmer drought severity index: limitations and assumptions. J. Clim. Appl. Meteorol. 23 (7), 1100-1109.
Bardossy, A., Duckstein, L., 1995. Fuzzy Rule-Based Modeling with Applications to Geophysical, Biological and Engineering Sciences, CRC Press, Boca Raton, FL, 232 pp.
Bartholy, J., Matyasovszky, I., Duckstein, L., Bogardi, I., 1996. Interrelationship between ENSO and large-scale circulation patterns. Presentation at the Conference on Probability and Statistics, San Francisco, CA, 21-23 February.
Bogardi, I., Matyasovszky, I., Bardossy, A., Duckstein, L., 1994. A hydroclimatological model of areal drought. J. Hydrol. 153, 245-264.
Carlson, R.E., Todey, D.P., Taylor, S.E., 1996. Midwestern corn yield and weather in relation to extremes of the Southern Oscillation. J. Prod. Agric. 9 (3), 347-352.
Chenault, E.A., Parsons, G., 1998. Drought worse than '96; cotton crops one of worst ever. http://agnews.tamu.edu/stories/AGEC/Aug1998a.htm, Texas A&M Agricultural News Home Page, College Station, TX, August 19.
Dubois, P., Prade, H., 1980. Fuzzy Sets and Systems: Theory and Applications, Academic Press, San Diego, CA.
Federal Emergency Management Agency, 1995. National Mitigation Strategy: Partnerships for Building Safer Communities. FEMA, Washington DC.
Galambosi, A., Duckstein, L., Ozelkan, E., Bogardi, I., 1997. A fuzzy rule-based model to link circulation patterns, ENSO, and extreme precipitation. In: Haimes, Y.Y., Moser, D.A., Stakhiv, E.Z. (Eds.). Risk-Based Decision Making in Water Resources VIII, ASCE, Reston, VA, pp. 83-103.
Galambosi, A., Ozelkan, E., Duckstein, L., Bogardi, I., 1999. A fuzzy rule-based model for precipitation analysis under climate change in the US Southwest. Presentation at the 79th Annual Meeting, AMS, Dallas, TX, 10-15 January.
Guttman, N.B., Quayle, R.G., 1996. A historical perspective of U.S. climate divisions. Bull. Am. Met. Soc. 77 (2), 293-303.
Guttman, N.B., Wallis, J.R., Hosking, J.R.M., 1992. Spatial comparability of the Palmer drought severity index. Water Resour. Bull. 28, 1111-1119.
Heddinghause, T.R., Sabol, P., 1991. A review of the Palmer Drought Severity Index and where do we go from here. Proc. of the Seventh Conference on Applied Climatology, 242-246.
Lawson, M.P., Dewey, K.F., Neild, R.E., 1977. Climatic Atlas of Nebraska, University of Nebraska Press, Lincoln, NE, 88 pp.
MacQueen, J.B., 1967. Some methods for classification and analysis of multivariate observations. Proc. 5th Berkeley Symp. on Math. Stat. Probab. 1, 281-297.
Matyasovszky, I., Bogardi, I., Bardossy, A., Duckstein, L., 1993. Estimation of local precipitation statistics reflecting climate change. Water Resour. Res. 29 (12), 3955-3968.
McKee, T.B., Doeskin, N.J., Kleist, J., 1993. The relationship of drought frequency and duration to time scales. Presented at the Eighth Conference on Applied Climatology, AMS, Boston, MA.
Mearns, L.O., Bogardi, I., Giorgi, F., Matyasovszky, I., Palecki, M., 1999. Comparison of climate change scenarios generated from regional climate model experiments and statistical downscaling. J. Geophys. Res. 104 (8), 6603-6621.
NCAR Data Support Section and University of Washington Dept. of Atmospheric Sciences, 1996. NCEP Grid Point Data Set version III.
NOAA, 1997. SOI time series. http://nic.fb4.noaa.gov:80/data/cddb/cddb/soi.
NOAA, National Climatic Data Center, 1998. Modified Palmer Drought Severity Index. ftp://ftp/ncdc.noaa.gov/pub/data/cirs/9808.pmdi.
Palecki, M., 1996. Nebraska's climate: past and future. Nebraskaland Magazine 74 (1), 106-121.
Palmer, W.C., 1965. Meteorological drought. Research Paper 45, US Weather Bureau, Washington DC, 58 pp.
Pesti, G., Shrestha, B., Duckstein, L., Bogardi, I., 1996. A fuzzy rule-based approach to drought assessment. Water Resour. Res. 32 (6), 1741-1747.
Piechota, T.C., Dracup, J.A., 1996. Drought and regional hydrologic variation in the United States: Associations with the El Niño-Southern Oscillations. Water Resour. Res. 32 (5), 1359-1373.
Pongracz, R., 1999. ENSO impacts and climate change consequences in the Northern midlatitudes. PhD dissertation, Eotvos Lorand University, Budapest, Hungary (unpublished).
Pongracz, R., Bogardi, I., Duckstein, L., Bartholy, J., 1997. Risk of regional drought influenced by ENSO. In: Haimes, Y.Y., Moser, D.A., Stakhiv, E.Z. (Eds.). Risk-Based Decision Making in Water Resources VIII, ASCE, Reston, VA, pp. 114-125.
Riebsame, W.E., Changnon Jr., S.A., Karl, T.R., 1991. Drought and Natural Resources Management in the United States: Impacts and Implications of the 1987-89 Drought, Westview Press, Boulder, CO, 174 pp.
Simmons, A.J., Bengtsson, L., 1988. Atmospheric general circulation models: their design and use for climate studies. In: Schlesinger, M. (Ed.). Physically-Based Modelling and Simulation of Climate and Change, NATO ASI Series, II. Kluwer Academic, Dordrecht, pp. 627-652.
Western Governors' Association, 1996. Drought Response Action Plan. WGA, Denver, CO.
Wilks, D.S., 1995. Statistical Methods in the Atmospheric Sciences, Academic Press, San Diego, CA, 467 pp.
Wright, P.B., 1985. The Southern Oscillation: an ocean-atmosphere feedback system? Bull. Amer. Met. Soc. 66, 398-412.
Zimmermann, H.J., 1985. Fuzzy Set Theory and Its Applications, Kluwer-Nijhoff, Boston, MA, 363 pp.