
Journal of Hydrology, 153 (1994) 1-21

0022-1694/94/$07.00 1993 - Elsevier Science B.V. All rights reserved


Flood frequency analysis for ungauged sites using a region of influence approach

Zolt Zrinji, Donald H. Burn*


Department of Civil Engineering, University of Manitoba, Winnipeg, Man., R3T 2N2, Canada
(Received 4 January 1993; revision accepted 9 June 1993)

Abstract

A framework is presented for regional flood frequency analysis that is applicable for estimating extreme
flow quantiles at ungauged catchments. The methodology uses the Region of Influence (ROI) approach to
regionalization and explicitly incorporates a homogeneity test in the process of selecting the collection of
stations that comprise the 'region' for an ungauged site. The relative merits of the methodology are
demonstrated through an application to extreme flow data for sites in Newfoundland, Canada. The new
approach is compared with results obtained from regression analysis and is shown to provide improved
estimates of extreme flow quantiles at sites which are considered to be ungauged.

1. Introduction

The estimation of the flooding potential at a site is required for the design of
a variety of urban planning and river engineering works. Obtaining an
accurate estimation of the relationship between extreme flows and the
associated recurrence interval (the so-called Q-T relationship) becomes
more complicated if the length of the available streamflow gauging record
at the site of concern is shorter than the return period of interest. An even
greater difficulty occurs if there is no flow record available at the site of
interest. To compensate for an insufficient length of data record, the trade-off
between the spatial and the temporal characterization of extreme flows can
be effected through the use of regional flood frequency analysis. Regional
flood frequency analysis can facilitate the estimation of an extreme flow
value at a location for which limited flow data exist, based on an extreme
flow relationship derived using information from basins with similar
hydrologic responses.

* Corresponding author.
SSDI 0022-1694(93)02334-T

Regional flood frequency analysis has historically been used in one of two contexts.
In the first, the intent is to improve the accuracy and precision of the extreme
flow estimates at a gauged site by pooling at-site information with information from
hydrologically similar gauged sites. In the second, the task is to estimate extreme
flows at a site for which no streamflow data are available. Although there are obvious
fundamental differences between the two problem contexts, there are also several
common features involved which can be effectively exploited.
The purpose of the research described in this paper is to develop a new
approach to regional flood frequency analysis for ungauged sites. The framework
developed employs a novel regionalization approach that uses catchment
characteristics as the attributes that define the similarity between an ungauged
site and each of the available gauged stations. The next section of this paper presents
a brief overview of regional flood frequency analysis. This is followed by the
development of a framework for the estimation of extreme flows at ungauged
sites. An application of the methodology ensues along with a comparison of the
results with those obtained from an earlier study of the same area.

2. Background

Regional flood frequency analysis typically entails first defining regions
which are subsets of the entire collection of sites. A region comprises a
group of sites from which extreme flow information can be combined for
improving the estimation of extremes at any site in the region. This process
can be referred to as the regionalization step. The next step is to assign the site
of interest to one of the distinct regions defined in the first step. The manner of
accomplishing this task will depend on the process used for defining regions
and whether or not the site of interest is gauged. The third step entails pooling
extreme flow information from the stations in the region to obtain an estimate
of the extreme flow relationship for the site of interest. Again the options
available for this process are partly dictated by the problem context. Further
details on approaches to the above three components of the regional flood
frequency process are provided below with particular emphasis on methods
applicable for ungauged sites.

2.1. Regionalization

Traditionally, the delineation of regions relied on geographic, political,
administrative, or physiographic boundaries. The resulting regions were
assumed to be homogeneous in terms of hydrologic response, but this cannot
in general be guaranteed, particularly if the spatial variability of the
physiographic and hydrologic characteristics is large. The importance of
regional homogeneity has been demonstrated by, among others, Greis and
Wood (1981), Hosking et al. (1985a), and Lettenmaier et al. (1987).
Hydrologists have often used a regional regression model to estimate flow
characteristics at ungauged sites based on catchment characteristics. Many
studies use residuals from an overall regression relationship for an entire area,
to create geographically contiguous regions based on the sign and magnitude
of the residuals (Wandle, 1977). However, deriving regions using this method
requires a large amount of subjective input.
Cluster analysis is a more objective method of creating regions (Tasker,
1982). The essence of cluster analysis is to identify clusters (groups) of
gauging stations such that the stations within a cluster are similar while
there is dissimilarity between the clusters. Many different algorithms are
available for performing cluster analysis (see, for example, Everitt, 1980).
Because different types of variables (e.g. catchment characteristics or flood
statistics) can be used to define the similarity measure required to generate
clusters, the variables used have to be carefully selected and weighted
according to their importance for the actual problem (Burn, 1989).
Cluster analysis can be combined with discriminant analysis (Wiltshire,
1986b) to assign an ungauged site to an existing cluster. This method
is based on a discriminant score which defines the probability of
cluster membership for an ungauged site. The discriminant score, which
is a function of catchment characteristics, indicates the most likely
group to which the new site should be assigned. However, there is no
guarantee that the new site is hydrologically similar to the stations in that
region.
A novel regionalization approach involves dispensing with fixed regions.
This approach allows each site to have a potentially unique set of stations
which constitutes the region for that site. Thus, it is possible for two neighboring
sites to have completely different sets of stations that represent the
region for each site. This methodology, which involves the transfer of extreme
flow information from similar stations to the site of interest, was first
suggested by Acreman and Wiltshire (1987) and Acreman (1987). The
subsequent implementation of the method was referred to as the region of
influence (ROI) approach by Burn (1990a,b). This approach requires the
choice of a threshold value that functions as a cut-off point for the
dissimilarity measure. All stations which have a dissimilarity measure greater
than the threshold value are excluded from the region for that particular
site.

2.2. Assignment of sites to a region

Assigning an ungauged site of interest to one of the regions has traditionally
been a crucial stage in regional flood frequency analysis. With our present
knowledge of the physical environment, it is not possible to accurately model
the multitude of processes and interactions which influence the flood
frequency characteristics at any site. However, through the regionalization
process, it is possible to determine which catchment characteristics play an
influential role in the flood generation process.
Assigning an ungauged site to one of the regions is trivial if the sites are
grouped on the basis of geographical regions or political jurisdictions. In these
situations, borders between regions are clearly defined and the assignment is
determined by the physical location of the ungauged site. However, the
hydrologic similarity of regions defined in this way is often unacceptably
low. When the regions that have been delineated do not represent a contiguous
set of gauging stations, the assignment process is not as straightforward.
If the regions were defined in terms of catchment characteristics,
the ungauged site can be assigned to a region based on its location in the
catchment characteristic data space. Although this is a simple process, there is
no guarantee that similarity in terms of catchment characteristics implies
hydrologic similarity. Thus the ungauged site may be hydrologically different
from the gauged stations in the region. In addition, abrupt changes in the
extreme flow estimates could result as one moves over a data space partition
line.

2.3. Extreme flow estimation

With the definition of the stations that belong to a region, it is possible to
estimate at-site extreme flows incorporating information from all gauged
stations in the region. Approaches such as the index flood method, weighted
distribution based estimators, or multiple regression can be used to combine
information from the stations that belong to a region. The index flood method
involves standardizing the extreme flow information at each site in the region
by dividing the annual flood data by a standardizing flow (typically the mean
annual flood for the site). A regional 'parent' extreme flow relationship is then
derived by pooling standardized extreme flow information from the stations in
the region. From this relationship, one can obtain a standardized (dimensionless)
extreme flow estimate corresponding to any return period desired. The
weighted distribution approach entails estimating standardized extreme flows
at each station and then estimating the standardized extreme flow at the site of
interest as a weighted average of the estimates from the stations in the region.
For both of these approaches, the standardizing flow for the site of interest
must be estimated, typically by a regression relationship using catchment
characteristics as the explanatory variables. This standardizing flow is then
multiplied by the dimensionless extreme flow estimate to scale the regional
estimate for the site of interest.

3. A methodology for ungauged site regional flood frequency

The framework described below entails regionalization incorporating a
homogeneity test, the selection of a regional parent distribution function,
and the estimation of at-site extreme flow quantiles. Because both the homogeneity
test and the selection of the regional distribution function involve the
use of L-moments, a brief overview of pertinent results from the theory of
L-moments is presented in an Appendix.

3.1. Regionalization process

The region of influence (ROI) approach forms the basis for the regionalization
component of the framework for ungauged site regional flood frequency
analysis developed herein. In this approach, regions are created in a flexible
way such that there are no rigid boundaries, but rather the regions can
overlap. Each site can be considered to have its own region which consists
of the collection of stations that comprise the ROI for that site.
The previous applications of the ROI approach defined catchment
similarity as a function of flow statistics estimated from the annual flood
series for the stations. This is obviously not an option when dealing with
ungauged sites. One possible method for creating regions of influence for
ungauged sites involves regionalization based on catchment similarity
expressed by the weighted Euclidean distance in an M dimensional attribute
space, where the attributes are catchment characteristics. The distance metric
used is defined as

$$D_{jk} = \left[ \sum_{i=1}^{M} W_i \left( X_{ji} - X_{ki} \right)^2 \right]^{1/2} \tag{1}$$

where $D_{jk}$ is the weighted Euclidean distance from site j to site k, M is the
number of attributes (catchment characteristics) included in the distance
measure, $W_i$ is the weight associated with catchment characteristic i, and $X_{ji}$
is a standardized value for catchment characteristic i for site j. The weights in
the distance metric are intended to reflect the relative importance of the
catchment characteristics for defining hydrologic similarity. Standardization
of the attributes is necessary because the catchment characteristics will have
different units and also different relative magnitudes. While several methods
are available for data standardization, in this work the attribute is
standardized by dividing each value by the standard deviation for the
attribute. In this way, the dimensionality and the differences in the variability
of the attributes are considered.
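
As an illustration only, the standardization and the distance metric of Eq. (1) might be sketched in Python as follows. The function names `standardize` and `weighted_distance` are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
import math
from statistics import stdev


def standardize(values):
    """Divide each value of one attribute by that attribute's standard deviation.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    s = stdev(values)
    return [v / s for v in values]


def weighted_distance(x_j, x_k, weights):
    """Weighted Euclidean distance between two sites, as in Eq. (1).

    x_j, x_k: standardized attribute values for sites j and k.
    weights:  one weight W_i per attribute.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_j, x_k)))
```

With equal weights the metric reduces to the ordinary Euclidean distance in the standardized attribute space.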
Equation (1) may be used to form a lower triangular dissimilarity
matrix, with zeros on the main diagonal. The elements of this matrix
represent the distance (in attribute space) between pairs of sites and
constitute indices of the lack of similarity between the sites. The regionalization
process for a given site involves successively adding gauged stations
to the ROI for a site, starting with the most similar station and continually
adding the next most similar station. The process terminates when
adding an additional station would lead to a lack of homogeneity for the
collection of stations in the ROI for the site. The homogeneity of the ROI
is evaluated after each station addition using a homogeneity test described
below. In the process of sequentially adding sites to the region, the first site
added which results in a heterogeneous ROI is deleted and the next most
similar site is evaluated. The skipping of one site was implemented to avoid
premature cessation of the ROI building process resulting from one unusual
site.
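
The sequential building of an ROI can be sketched as below. This is a minimal reading of the description above, not the authors' implementation: it assumes that exactly one failing station may be skipped and that the second failure ends the process. The names `build_roi`, `ranked_stations`, and `is_homogeneous` are hypothetical; the homogeneity test is passed in as a function.

```python
def build_roi(ranked_stations, is_homogeneous, max_skips=1):
    """Grow a region of influence from a similarity-ranked list of stations.

    ranked_stations: station identifiers, most similar first.
    is_homogeneous:  callable taking a list of stations and returning True
                     when the collection passes the homogeneity test.
    max_skips:       number of failing stations that may be discarded before
                     the process stops (assumed to be one here).
    """
    roi, skipped = [], []
    for station in ranked_stations:
        candidate = roi + [station]
        if is_homogeneous(candidate):
            roi = candidate              # station accepted into the ROI
        elif len(skipped) < max_skips:
            skipped.append(station)      # discard the first unusual station
        else:
            break                        # a further failure ends the process
    return roi, skipped
```

For example, with stations ranked a, b, c, d, e and a test that fails whenever c or e is present, the sketch returns the ROI [a, b, d] with c skipped.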
The process of determining an ROI for a site uses both catchment
characteristic information and the recorded flow measurements for the
gauged stations. The determination of the order in which the gauged stations
are added to the ROI for a site depends on the catchment characteristic values,
while the homogeneity test, which controls the termination of the process, uses
extreme flow data. The homogeneity test thus serves the same role as the
similarity threshold used in the previous application of the ROI method to
gauged sites (Burn, 1990b).
Several types of regional homogeneity test have been developed (see, for
example, Wiltshire, 1986a,c). The homogeneity test used herein is based on an
idea developed by Chowdhury et al. (1991) and involves the use of sample
L-moment ratios to determine if the at-site L-moment ratios are similar to the
L-moment ratios of the regional parent distribution. The Appendix provides a
brief overview of L-moments.

3.2. Homogeneity test

In order to test if the L-moment ratios at a site are sufficiently similar to the
L-moment ratios of the parent distribution, the $\chi^2_{(R)}$ statistic has been
employed (Chowdhury et al., 1991). The $\chi^2_{(R)}$ statistic is defined as

$$\chi^2_{(R)} = \sum_{k=1}^{K} \chi^2_{(k)} \tag{2}$$

where $\chi^2_{(k)}$ is the contribution to $\chi^2_{(R)}$ from site k and K is the total
number of sites in the region. The value of $\chi^2_{(k)}$ is defined as

$$\chi^2_{(k)} = \begin{bmatrix} \hat{\tau}_2 - \tau_2^R \\ \hat{\tau}_3 - \tau_3^R \end{bmatrix}^{T} \begin{bmatrix} \mathrm{Var}(\hat{\tau}_2) & \mathrm{Cov}(\hat{\tau}_2, \hat{\tau}_3) \\ \mathrm{Cov}(\hat{\tau}_2, \hat{\tau}_3) & \mathrm{Var}(\hat{\tau}_3) \end{bmatrix}^{-1} \begin{bmatrix} \hat{\tau}_2 - \tau_2^R \\ \hat{\tau}_3 - \tau_3^R \end{bmatrix} \tag{3}$$

where $\hat{\tau}_2$ and $\hat{\tau}_3$ are the estimates for the L-skewness and L-kurtosis,
respectively, at site k. $\tau_2^R$ and $\tau_3^R$ are weighted regional estimates for
L-skewness and L-kurtosis defined as

$$\tau_r^R = \frac{\sum_{k=1}^{K} n_k \hat{\tau}_r^{(k)}}{\sum_{k=1}^{K} n_k} \quad \text{for } r = 2, 3 \tag{4}$$

where $\hat{\tau}_r^{(k)}$ is the estimate at site k and $n_k$ is the number of years of record
at site k. $\mathrm{Var}(\hat{\tau}_2)$, $\mathrm{Var}(\hat{\tau}_3)$ and $\mathrm{Cov}(\hat{\tau}_2, \hat{\tau}_3)$ are the asymptotic
variance and covariance of the sample L-moment ratios (see Appendix A of
Chowdhury et al., 1991).
The statistic $\chi^2_{(R)}$ follows the $\chi^2$ distribution with 2(K - 1) degrees of freedom
if the observations at each site are independent. The region can be
considered as homogeneous if the calculated value of $\chi^2_{(R)}$ is less than the
tabulated critical $\chi^2$ value at a selected level of significance for the
appropriate number of degrees of freedom.
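
A minimal sketch of the test statistic is given below. It assumes that Eq. (3) is the usual bivariate quadratic form in the deviations of the at-site ratios from the regional ratios, with the 2 x 2 asymptotic covariance matrix supplied by the caller (the covariance expressions of Chowdhury et al. (1991) are not reproduced here). All function names are hypothetical, and the critical value is assumed to be read from a chi-square table.

```python
def regional_ratios(site_ratios, record_lengths):
    """Record-length weighted regional L-moment ratios, as in Eq. (4).

    site_ratios: list of (L-skewness, L-kurtosis) pairs, one per site.
    """
    total = sum(record_lengths)
    t2 = sum(n * r[0] for n, r in zip(record_lengths, site_ratios)) / total
    t3 = sum(n * r[1] for n, r in zip(record_lengths, site_ratios)) / total
    return t2, t3


def site_contribution(ratio, regional, cov):
    """Quadratic form d' S^-1 d for one site (assumed form of Eq. 3).

    cov: ((var2, cov23), (cov23, var3)), supplied by the caller.
    """
    d2, d3 = ratio[0] - regional[0], ratio[1] - regional[1]
    (v2, c), (_, v3) = cov
    det = v2 * v3 - c * c
    return (v3 * d2 * d2 - 2.0 * c * d2 * d3 + v2 * d3 * d3) / det


def chi_square_statistic(site_ratios, record_lengths, covariances):
    """Sum the site contributions (Eq. 2); return the statistic and 2(K - 1)."""
    regional = regional_ratios(site_ratios, record_lengths)
    stat = sum(site_contribution(r, regional, s)
               for r, s in zip(site_ratios, covariances))
    return stat, 2 * (len(site_ratios) - 1)


def passes_homogeneity(stat, critical_value):
    """The region is taken as homogeneous when the statistic is below the
    tabulated critical value for the chosen significance level."""
    return stat < critical_value
```

A function of this kind could serve as the stopping rule when stations are added to an ROI one at a time.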

3.3. Selection of a regional distribution


An important use of the L-moments calculated from a random sample is to
identify the distribution from which the sample was drawn. It is possible to
determine this because the L-moment $\lambda_r$ exists if and only if $E|X|$ exists and
therefore a distribution whose mean exists is uniquely characterized by its
L-moments, even if some of the conventional moments of the distribution do
not exist (Wallis, 1989). Hosking (1986) gave L-moments for distributions
typically used in regional flood frequency analysis in terms of their
distribution parameters and expressions for parameter estimators in terms of the
sample estimates of the L-moments. An appropriate distribution can be identified
by plotting the sample $\tau_3$ vs. $\tau_4$ values on an L-moment ratio diagram along
with the plot of the L-moment ratios for commonly used distributions (see
Fig. 1).

[Figure 1 omitted. Legend: Generalized Logistic, Generalized Extreme Value, and Generalized Pareto curves; point distributions U-Uniform, E-Exponential, G-Gumbel, L-Logistic, N-Normal; site sample and sample mean markers. Horizontal axis: L-skewness.]
Fig. 1. L-Moment ratio diagram for the case study data.

3.4. Estimation of at-site extreme flow quantiles

Once a region of influence for an ungauged site has been determined,
extreme flow estimation at the ungauged site can proceed. Various methods
for combining information from stations in the ROI are specific to the
distribution function selected for extreme flows and the parameter estimation
technique used. The regional GEV estimator with probability weighted
moments (PWM) parameter estimation (Hosking et al., 1985b) has been
found to be an efficient way of combining flow data in regional flood
frequency analysis (Potter, 1987). PWMs for a site may be calculated from
the annual flood series as

$$M_{jr} = \frac{1}{n_j} \sum_{i=1}^{n_j} p_i^r x_i \quad \text{where } r = 0, 1, 2 \tag{5}$$

and

$$p_i = \frac{i - 0.35}{n_j} \tag{6}$$

where $M_{jr}$ is a sample estimate for the order r PWM for site j; $p_i$ is the plotting
position for the ith flood, $x_i$, of the annual maximum daily flow series and $n_j$ is
the number of annual maximum daily flow values for site j.
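
For illustration, Eqs. (5) and (6) might be coded as follows. The sketch assumes that the floods are ranked in ascending order before the plotting positions are assigned, which is the usual convention for this estimator but is not stated explicitly in the text; `sample_pwms` is a hypothetical name.

```python
def sample_pwms(annual_floods, a=0.35):
    """Sample probability weighted moments M_0, M_1, M_2 for one site.

    annual_floods: annual maximum flows for the site.
    a:             plotting position constant of Eq. (6).
    Assumption: flows are ranked in ascending order (i = 1 is the smallest).
    """
    x = sorted(annual_floods)
    n = len(x)
    p = [(i - a) / n for i in range(1, n + 1)]   # plotting positions, Eq. (6)
    return [sum(pi ** r * xi for pi, xi in zip(p, x)) / n for r in (0, 1, 2)]
```

The order zero PWM returned by this sketch is simply the mean of the annual flood series.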
Two types of regional estimators are presented herein involving different
weighting methods. The first method of weighting information from each site
included in the region of influence is by the record length at each gauged
station. Weighting by record length implies that the stations with longer
flow records provide greater amounts of information about the regional
flood frequency relationship. Scaled PWMs for each site j in the region are
calculated as

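$$t_1^j = \frac{M_{j1}}{M_{j0}} \quad \forall j \tag{7}$$

$$t_2^j = \frac{M_{j2}}{M_{j0}} \quad \forall j \tag{8}$$

Regional PWMs are then calculated as a weighted average of the scaled at-site
PWMs through

$$T_i = \frac{\sum_{j=1}^{K} t_i^j n_j}{\sum_{j=1}^{K} n_j} \quad \text{for } i = 1, 2 \tag{9}$$
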

On the basis of the above equations, it is possible to define the parameters of
the regionalized 'parent' distribution function using relationships that are
specific to the form of the distribution function (see, for example, Hosking,
1986).
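
The record-length weighting of Eqs. (7)-(9) can be sketched as below. The function names are hypothetical, and each site is assumed to be described by its three sample PWMs (orders 0, 1, and 2).

```python
def scaled_pwms(m):
    """Scale the order 1 and 2 PWMs by the order 0 PWM, as in Eqs. (7)-(8).

    m: [M_0, M_1, M_2] for one site.
    """
    return m[1] / m[0], m[2] / m[0]


def regional_pwms(site_pwms, record_lengths):
    """Record-length weighted regional PWMs T_1 and T_2, as in Eq. (9)."""
    total = sum(record_lengths)
    scaled = [scaled_pwms(m) for m in site_pwms]
    t1 = sum(n * t[0] for n, t in zip(record_lengths, scaled)) / total
    t2 = sum(n * t[1] for n, t in zip(record_lengths, scaled)) / total
    return t1, t2
```

Because each site is scaled by its own mean annual flood before pooling, large and small catchments contribute on the same dimensionless footing.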
A second method for weighting information from stations in the region is
by the similarity measure defined by Eq. (1). In this case, the distribution
parameters and extreme flows associated with various return periods are
estimated at each gauged site individually. The parameter estimation process
starts with the calculation of the PWMs as given by Eqs. (5) and (6). The
parameters of the distribution function for each site can then be estimated.
Once the parameters have been estimated, standardized extreme flows can be
obtained for each site and these standardized extreme flows then weighted to
obtain the estimated standardized extreme flow at the ungauged site. The
weights applied to the standardized extreme flows for a particular gauged
station are inversely proportional to the distance defined by Eq. (1). The
weight, $w_k$, for site k is defined as

$$w_k = \frac{1 / D_{uk}}{\sum_{j=1}^{K} 1 / D_{uj}} \tag{10}$$

where $D_{uk}$ represents the distance (in attribute space) between the ungauged
site and regional gauged station k. This weighting scheme implies that stations
that exhibit greater similarity with the site of interest are more important for
estimating at-site extreme flow quantiles.
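
A short sketch of the similarity weighting follows. It assumes that the weights of Eq. (10) are the inverse distances normalized to sum to one; the function names are hypothetical, and a zero distance (a gauged station identical to the ungauged site in attribute space) is not handled.

```python
def similarity_weights(distances):
    """Normalized inverse-distance weights (assumed form of Eq. 10).

    distances: D_uk for each gauged station k in the ROI; all must be > 0.
    """
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_standardized_flow(standardized_flows, distances):
    """Weighted average of the at-site standardized extreme flows."""
    weights = similarity_weights(distances)
    return sum(w * q for w, q in zip(weights, standardized_flows))
```

For distances of 1, 2, and 4 the weights are 4/7, 2/7, and 1/7, so the closest station dominates the estimate.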
In order to complete the extreme flow estimation at the ungauged site, an
estimate of the mean annual flood (i.e. the standardizing flow) is required. A
nonlinear regression relationship is used to obtain the required estimate. The
regression model has the form of

$$Q = a_0 X_1^{a_1} X_2^{a_2} \cdots X_P^{a_P} \tag{11}$$

where Q is the mean annual flood, $X_i$ is the ith catchment characteristic, P is
the number of catchment characteristics included in the regression, and the $a_i$
are the regression parameters.
To incorporate as much information as possible, all stations can be
included in the regression model with the exception of the station which is
considered to be ungauged. Thus we will obtain NS-1 regression relationships
where NS is the number of stations for which streamflow data are available. It
should be noted that if the total number of stations is sufficiently large, it
would be possible to further subdivide the data space to improve the
regression estimates.
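
The leave-one-out use of the regression model can be illustrated as follows. This is a sketch under stated assumptions: the power-law model of Eq. (11) is fitted by ordinary least squares on the logarithms of the flows and the catchment characteristics, which may differ from the fitting procedure actually used, and the helper names (`solve`, `fit_power_law`, `predict`, `leave_one_out`) are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_power_law(characteristics, flows):
    """Least-squares fit in log space; returns [a_0, a_1, ..., a_P]."""
    rows = [[1.0] + [math.log(v) for v in site] for site in characteristics]
    y = [math.log(q) for q in flows]
    k = len(rows[0])
    # Normal equations (X'X) b = X'y for the log-linear model.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    return [math.exp(coef[0])] + coef[1:]


def predict(params, site):
    """Evaluate Q = a_0 * X_1^a_1 * ... * X_P^a_P for one site."""
    q = params[0]
    for a, x in zip(params[1:], site):
        q *= x ** a
    return q


def leave_one_out(characteristics, flows):
    """Estimate the flow at each site from a model fitted without that site."""
    estimates = []
    for i in range(len(flows)):
        xs = characteristics[:i] + characteristics[i + 1:]
        qs = flows[:i] + flows[i + 1:]
        estimates.append(predict(fit_power_law(xs, qs), characteristics[i]))
    return estimates
```

Each call to `fit_power_law` inside `leave_one_out` excludes the station treated as ungauged, so no flow information from that station enters its own estimate.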
The approach presented above for regional flood frequency analysis
at ungauged sites includes several novel features. Firstly, the framework
represents the first application of the region of influence
approach to ungauged sites. The determination of an ROI for each ungauged
site means that the assignment of an ungauged site to an existing, fixed
region is no longer required. Secondly, the method entails the direct
incorporation of a homogeneity test into the regionalization task. The
inclusion of the homogeneity test in the regionalization process should
ensure that the sites in an ROI constitute a homogeneous grouping of
gauging stations. The application of the above steps for the extreme
flow estimation at ungauged sites should thus lead to improved extreme flow
estimates.

4. Application of the methodology

4.1. Description of case study area

The study area consists of the entire island of Newfoundland, Canada. The
island is roughly triangular in shape and is bounded on the west coast by the
Gulf of Saint Lawrence and on the south and northeast by the Atlantic Ocean.
Available catchment characteristics for a set of 22 gauging stations include the
catchment area, a catchment shape factor, the percentage of catchment area
controlled by lakes and swamps, the percentage of barren area, the mean
annual runoff for the catchment, and the latitude of the catchment centroid
(Panu and Smith, 1988). In addition, annual maximum flow data are available
for the 22 stations. The length of the flow records varies between 12 and 61
years, with an average length of 29 years.

4.2. Basis for comparison of results

The methods for extreme flow estimation described above were evaluated
through an application to the case study. A difficulty in any evaluation of
extreme flow estimators using real world data is that we never know the 'true'
extreme flow for a selected return period at a site. This is especially true when
methods for estimating extreme flows at ungauged sites are considered. Techniques
for estimating extreme flows at ungauged sites can be evaluated using a
set of stations for which gauging records do exist by sequentially considering
each of the gauged sites to be ungauged and conducting the analysis as if no
flow record existed for the site considered to be ungauged. However, this only
solves part of the evaluation problem in that the at-site flow record provides
an imperfect estimate of the true, but unknown, extreme flow relationship at
the site. The quality of the available estimates should improve with increases
in the record length. However, for the record lengths available for the case
study considered herein, the at-site extreme flow estimates based on single
station analysis must be deemed somewhat suspect, particularly for those
stations with very short record lengths. As such, the at-site extreme flows
estimated from single station analysis are unlikely to provide a reasonable
benchmark for comparison of the various ungauged site flood frequency
analysis options.
An alternative to using single station estimates is to use a regional estimate
of the extreme flows at each site incorporating information from both the site
of interest and from other similar stations. Although this option is also not
ideal, it does avoid the difficulty associated with the short at-site record
lengths and provides an assessment of the merit of the ungauged estimation
options by comparing regional estimates obtained with and without data at
the site of interest.
The regional method used herein also employs the ROI concept but for this
case the similarity metric used for defining the ROIs can include flood
statistics because in the regional analysis, extreme flow information is available
for all sites. The variables used in the similarity metric were the same as
those used by Burn (1990b) and included the coefficient of variation (CV) of
the annual flood series, a plotting position estimate of the 10 year flow event
(Q10) interpolated from the available annual flood series, and a variation on
the Pearson skewness (PS) measure defined as

$$PS = \frac{\mu - m}{\sigma} \tag{12}$$

where $\mu$ is the mean, m is the median, and $\sigma$ is the standard deviation of the
annual flood series. As with the case of forming ROIs for ungauged sites,
stations were added to the ROI for a site, in the order given by the similarity
metric, until adding an additional site results in the collection of stations not
passing the homogeneity test. For the gauged analysis case, streamflow
information from the site of interest is also included in the homogeneity test
(this was obviously not possible with the ungauged analysis described earlier).
Parameter estimates and extreme flows for the site of interest were then
calculated in accordance with the procedures outlined above, where at-site
flow information can now also be incorporated.
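
Two of the three flood statistics used in the gauged similarity metric, the CV and the PS measure of Eq. (12), might be computed as in the sketch below. The function name is hypothetical, the sample (n - 1) standard deviation is an assumption, and the Q10 estimate is omitted because the interpolation procedure is not detailed here.

```python
from statistics import mean, median, stdev


def flood_attributes(annual_floods):
    """Coefficient of variation and Pearson-type skewness of a flood series.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    mu = mean(annual_floods)
    sigma = stdev(annual_floods)
    return {
        "CV": sigma / mu,
        "PS": (mu - median(annual_floods)) / sigma,   # Eq. (12)
    }
```

A symmetric series gives PS = 0, while a series with a few large floods gives a positive PS because the mean exceeds the median.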
An examination of the L-moment ratio plot (see Fig. 1) revealed that the
GEV distribution is an appropriate option for the collection of gauging
stations. The same distribution function was used for each site and also for
all stages of the analysis to avoid introducing performance differences arising
from the distribution assumption. In this way, only the effect of the various
regionalization approaches is tested.
The GEV distribution function is defined as

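$$F(x) = \exp\left\{ -\left[ 1 - \frac{k}{\alpha}(x - \xi) \right]^{1/k} \right\} \quad \text{for } k \neq 0 \tag{13}$$

and

$$F(x) = \exp\left\{ -\exp\left[ -\frac{1}{\alpha}(x - \xi) \right] \right\} \quad \text{for } k = 0 \tag{14}$$

The terms $\xi$, $\alpha$, and k are the location, scale, and shape parameters,
respectively. When k = 0, the GEV distribution reduces to the Gumbel distribution.
The inverse distribution function is defined as

$$x_T = \xi + \frac{\alpha}{k} \left\{ 1 - \left[ -\log\left( 1 - \frac{1}{T} \right) \right]^{k} \right\} \quad \text{for } k \neq 0 \tag{15}$$

and

$$x_T = \xi - \alpha \log\left[ -\log\left( 1 - \frac{1}{T} \right) \right] \quad \text{for } k = 0 \tag{16}$$

The parameters for this distribution are estimated, using PWMs, through
(after Hosking et al., 1985b)

$$c = \frac{2T_1 - 1}{3T_2 - 1} - \frac{\log 2}{\log 3} \tag{17}$$

$$k = 7.8590c + 2.9554c^2 \tag{18}$$

$$\alpha = \frac{(2T_1 - 1)k}{\Gamma(1 + k)(1 - 2^{-k})} \tag{19}$$

$$\xi = 1 + \frac{\alpha}{k} \left( \Gamma(1 + k) - 1 \right) \tag{20}$$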
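
As a worked illustration, the parameter estimators of Eqs. (17)-(20) and the inverse distribution function of Eqs. (15)-(16) might be coded as below. The function names are hypothetical. The sketch takes the scaled regional PWMs T_1 and T_2 as inputs, so the resulting quantiles are dimensionless and still have to be multiplied by the mean annual flood; the parameter estimator does not handle the case where the estimated k is exactly zero.

```python
import math


def gev_parameters(t1, t2):
    """GEV location, scale, and shape from scaled PWMs, as in Eqs. (17)-(20)."""
    c = (2.0 * t1 - 1.0) / (3.0 * t2 - 1.0) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c ** 2
    g = math.gamma(1.0 + k)
    alpha = (2.0 * t1 - 1.0) * k / (g * (1.0 - 2.0 ** (-k)))
    xi = 1.0 + alpha * (g - 1.0) / k
    return xi, alpha, k


def gev_quantile(xi, alpha, k, return_period):
    """Dimensionless flow with the given return period, Eqs. (15)-(16)."""
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(k) < 1e-9:                      # Gumbel limit, Eq. (16)
        return xi - alpha * math.log(y)
    return xi + alpha / k * (1.0 - y ** k)
```

A usage example under the same assumptions: `xi, alpha, k = gev_parameters(T1, T2)` followed by `gev_quantile(xi, alpha, k, 100)` gives the dimensionless 100 year flow for the region.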

The results obtained using the framework developed herein were compared
with the extreme flow quantile estimates obtained by regression analysis,
following the work of Panu and Smith (1988). The extreme flow estimation
for each site by regression included two different grouping scenarios. Firstly,
the entire data set is defined as one group. Secondly, the area was divided into
two groups, namely the northern and the southern part of Newfoundland
(Panu and Smith, 1988). This station grouping corresponds to geographic
regionalization. The dividing line for the two groups follows the topographically
highest line in the central part of the island. An extensive analysis
(Panu and Smith, 1988) was conducted regarding the influential catchment
characteristics in support of the above grouping. It was found that the major
flood producing mechanism in the northern part of the island is spring snowmelt
aggravated by rainfall and in the southern part of the island, rainfall
storms and rainfall events contributing to snowmelt in both the winter and the
spring season are important. The influential catchment characteristics
included in the regression model when the entire island is considered as a
group are the catchment drainage area (DA), the mean annual runoff
(MAR), the percentage of catchment area controlled by lakes and swamps
(ACLS), and the catchment shape factor (SHAPE). For the southern group,
this same set of catchment characteristics is used. For the northern group, the
influential catchment characteristics are the DA, MAR, and the latitude of the
catchment centroid (LAT). The general form of the regression model used is

$$Q_T = a_0 X_1^{a_1} X_2^{a_2} \cdots X_p^{a_p} \tag{21}$$

where $X_i$ is one of the p influential catchment characteristics, the $a_i$ are the
regression parameters, and $Q_T$ is the estimated flow where T indicates the
return period. This type of regression relationship between estimated flow
and catchment characteristics must be developed for each desired return
period. The above nonlinear form of the regression relationship can be transformed
to a linear relationship using

$$\log Q_T = \log a_0 + a_1 \log X_1 + a_2 \log X_2 + \cdots + a_p \log X_p \tag{22}$$

The study by Panu and Smith (1988) used the annual maximum instantaneous
flow. This study, however, is based on the annual maximum daily flow series
and the regression parameters were therefore recalculated with respect to the
maximum daily flow for each year. Furthermore, for consistency, the values of
extreme flows corresponding to the various return periods were estimated, for
each gauged site, based on the GEV probability distribution using: (i) single
site frequency analysis, and (ii) regional at-site estimates. Finally, to avoid an
unfair comparison, separate regression relationships were derived for each
site. The extreme flows from the site of interest were not included in the
data set used to derive the individual regression relationships. In this way,
the regression approach is realistically evaluated as an ungauged site extreme
flow quantile estimation approach.
A total of eight estimation options were considered and are summarized
below.
T1. This is a single station at-site extreme flow estimation option which uses
at-site PWMs to estimate the parameters for the GEV distribution.
T2. This is a regional estimator based on PWMs estimated from extreme
flow data for all stations in the ROI defined for each site. PWMs from the
stations in the ROI are weighted in accordance with the record length at the
station to obtain at-site extreme flow estimates. In defining the ROI, the
station attributes used were the CV, PS, and Q10 with equal weights assigned
to each attribute. The 1% level of significance was used in the regional
homogeneity test which determines the stopping criteria for adding stations
to an ROI.
ROI1. This is the region of influence approach applied to ungauged sites
where catchment characteristics are used as the attributes defining station
similarity. The catchment characteristics selected were the DA, SHAPE,
ACLS, and MAR with equal weights assigned to each attribute. PWMs
from the stations in the ROI are weighted in accordance with the record
length at the station to obtain at-site extreme flow estimates. The 1% level of
significance was again used in the homogeneity test.
ROI2. This option is the same as ROI1 except that the PWMs from the
stations in the ROI are weighted in accordance with the similarity between the
catchment characteristics for each station and the catchment characteristics
for the site of interest.
RG1. This is a regression based estimator considering all the stations to
comprise a single region. The single station at-site extreme flow estimates
(option T1) at 5, 10, 20, 50, and 100 year return periods were used as the
'true' flow values to be estimated.
RG2. This option is the same as RG1 except the set of stations is divided
into two regions corresponding to the north and south portions of the study
area.
RG3. This option is equivalent to RG1 except that the 'true'
at-site estimates were obtained from Option T2 (the regional estimation
approach).
RG4. This option is the same as RG3 except the set of stations is divided
into two regions as in option RG2.
The first two estimators are two different approximations to the 'true'
extreme flow values at the sites, the third and fourth methods are ROI
ungauged options developed in this work and the remaining four options
are regression estimators similar in spirit to the approach of Panu and
Smith (1988).
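The two weighting schemes used by the ROI-based options (record length for T2 and ROI1, catchment similarity for ROI2) can both be viewed as a weighted average of station PWMs. The sketch below assumes the station PWMs have already been scaled so that they are comparable across stations, and that the weights are supplied by the caller; the function name `weighted_regional_pwms` and the form of any similarity weight are illustrative assumptions, not the formulation used in the paper.

```python
import numpy as np

def weighted_regional_pwms(station_pwms, weights):
    """Combine per-station PWMs into regional values by a weighted average.

    station_pwms: (n_stations, k) array, one row of PWMs per station in the ROI
                  (assumed to be scaled to a common basis beforehand).
    weights: (n_stations,) non-negative weights, e.g. record lengths
             (as in T2 and ROI1) or similarity scores (as in ROI2).
    """
    pwms = np.asarray(station_pwms, dtype=float)
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    # Each station's row is scaled by its weight, then normalized by the total weight.
    return (w[:, None] * pwms).sum(axis=0) / w.sum()

# Two stations with record lengths of 20 and 40 years.
regional = weighted_regional_pwms([[1.0, 0.6], [1.0, 0.7]], [20, 40])
```

Only the weight vector changes between the record-length and similarity schemes, so their effect can be compared on an identical set of stations.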

4.3. Presentation and analysis of results

The first stage in the analysis of the results was a comparison of the two
estimates of the 'true' extreme flow values, options T1 and T2. One would
anticipate that, if the regional estimator were a perfect predictor of the true
extreme flows, the agreement between options T1 and T2 would improve
with an increase in the record length and would also be better for shorter
return periods. This is the case because as the length of the record at a site
increases, the single station at-site estimates should approach the 'true' values.
In addition, for a fixed record length, the utility of regional information is
greater for longer return periods and in fact for return periods in excess of the
length of record, it can be expected that any regional information is of benefit
regardless of the similarity between the new information and the information
from the site of interest.
The results from the two 'true' flow estimators were compared at return
periods equal to 5, 10, 20, 50, and 100 years. It was noted that for the two sites
with the shortest data record length (12 and 15 years) the agreement between
the two options was comparatively poor. The lack of agreement in these
results was particularly pronounced for the longer return periods (50
and 100 years). For these two stations, the regional estimates are probably
providing more realistic extreme flow estimates. For the site with the longest
data record length (61 years) the agreement was also not particularly strong.
In this case, there was some indication of a systematic lack of fit. For the
remainder of the sites, the agreement was generally quite satisfactory with
greater differences in the results obtained noted for the longer return
periods, as was expected. There was, however, not as strong a relationship
as expected between the agreement in the results and the record length of the
site. The above results perhaps imply that a more sophisticated weighting of
regional and at-site information could be devised. This, however, was beyond
the scope of the present investigation. The results of this comparison indicate
that the regional estimator is probably a reasonable approximation to the unknown
'true' extreme flow values although the comparisons presented below will consider
both options T1 and T2 as predictors of the 'true' extreme flow values.
The ROI defined for each site when the site was assumed to be ungauged
tended to contain a similar set of stations to those included when the site was
considered to be gauged. The main differences were in the order in which the
sites were included in the region. For the ungauged case, there were generally
fewer stations included in the ROI. The main reason for this behaviour is that
when stations are selected based on similarity in basin characteristics, a
collection of stations is formed that contains stations which should not
have been added based on their extreme flow characteristics. As a result,
the homogeneity test will deem the set of stations to no longer be homoge-
neous and the regionalization process will terminate even though there are still
stations available, but not in the ROI, which are hydrologically similar to the
site of interest.
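The early termination described above can be illustrated with a short sketch of the region-growing loop. It assumes stations are added in order of increasing distance from the site of interest and that growth stops at the first candidate set failing the homogeneity test, with the failing station excluded. The homogeneity test itself is passed in as a hypothetical callable (`is_homogeneous`), since its details are not restated in this section.

```python
from typing import Callable, List, Sequence

import numpy as np

def build_roi(distances: Sequence[float],
              is_homogeneous: Callable[[List[int]], bool]) -> List[int]:
    """Grow a region of influence until the homogeneity test fails.

    distances: distance of each candidate station from the site of interest
               (based on catchment characteristics in the ungauged case).
    is_homogeneous: hypothetical test returning True when the given list of
                    station indices forms a homogeneous set.
    """
    roi: List[int] = []
    for idx in np.argsort(distances):
        candidate = roi + [int(idx)]
        if not is_homogeneous(candidate):
            break  # region growth stops at the first failure
        roi = candidate
    return roi

# Station 2 looks similar in catchment characteristics but is hydrologically
# dissimilar, so it blocks station 3 from ever being considered.
roi = build_roi([0.1, 0.2, 0.3, 0.4], lambda ids: 2 not in ids)
```

In this example the region contains only stations 0 and 1, although station 3 would have passed the test, which is the behaviour noted for the ungauged case.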
To evaluate and compare the relative merits of the ungauged site estimation
options, comparisons were made between the results from: option ROI1 and
ROI2 vs. both T1 and T2; option RG1 and RG2 vs. T1; RG3 and RG4 vs. T2.
It should be noted that for the regression options, the comparison is with at-
site estimates equivalent to those predicted by the corresponding regression
relationship (i.e., either the single station or the regional estimates). Each
estimator was evaluated in terms of a measure of the relative mean squared
error of the estimates averaged over return periods of 5, 10, 20, 50, and 100
years. This measure is defined as

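$$\mathrm{MSE} = \frac{1}{NS \cdot J} \sum_{i=1}^{NS} \sum_{j=1}^{J} \left( \frac{\hat{Q}_i^j - Q_i^j}{Q_i^j} \right)^2 \tag{23}$$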

Table 1
MSE values for the various estimation options

Estimation option    'True' extreme flow quantiles option
                     T1       T2

ROI1                 14.96    14.38
ROI2                 14.80    14.47
RG1                  15.14    -
RG2                  30.85    -
RG3                  -        14.79
RG4                  -        23.24

where MSE is the relative mean squared error measure, NS is the number of
sites in the data set, J is the number of return periods, $\hat{Q}_i^j$ is the estimate for the
extreme flow for the jth return period at site i, and $Q_i^j$ is the 'true' value for the
extreme flow for the jth return period at site i.
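A minimal sketch of this error measure is given below, assuming it is the squared relative error averaged over all sites and return periods. Any additional scaling that may have been applied to produce the values in Table 1 (for example, expressing the result as a percentage) is not reproduced, and the function name `relative_mse` is illustrative.

```python
import numpy as np

def relative_mse(q_est, q_true):
    """Mean squared relative error over sites (rows) and return periods (columns)."""
    q_est = np.asarray(q_est, dtype=float)
    q_true = np.asarray(q_true, dtype=float)
    if q_est.shape != q_true.shape:
        raise ValueError("q_est and q_true must have the same shape")
    # Relative error per site and return period, squared, then averaged.
    return float(np.mean(((q_est - q_true) / q_true) ** 2))

# Two sites, two return periods: relative errors of 0.1, -0.1, 0.0, and 0.1.
mse = relative_mse([[110.0, 180.0], [50.0, 88.0]],
                   [[100.0, 200.0], [50.0, 80.0]])
```

Because each error is divided by the 'true' quantile, large and small catchments contribute on the same relative scale.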
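The results for the various estimators are summarized in Table 1, which
reveals an improvement in the MSE associated with the ROI options relative to
the regression-based alternatives: a small one relative to the single-region
regressions (RG1 and RG3) and a large one relative to the two-region
regressions (RG2 and RG4). Options ROI1 and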
ROI2 result in essentially equivalent MSE values, implying that the two
approaches to weighting extreme flow information from sites in the ROI
have less impact on the estimation of the extreme flow quantiles than does
the identification of an appropriate set of gauging stations for inclusion in the
region for a site. It is also noted that there is a slight decrease in the MSE
values for ROI1 and ROI2 when the 'true' extreme flow quantiles are taken
to be the T2 (regional) estimates as opposed to the T1 (at-site
only) estimates. The fact that the improvement when T2 is considered as the
benchmark is fairly small implies that there has not been undue bias intro-
duced to the process as a result of employing similar regionalization
approaches for the ungauged and gauged analysis.
The regression results were also noted to provide better estimates when T2
was used to define the 'true' extreme flow quantiles as opposed to the case
when T1 was used. There is no apparent explanation for this pattern because
in both cases the regression equations were derived using the assumed 'true'
values as the dependent variables. However, a disadvantage of the regression
approach arises from the need to formulate separate regression relationships
for each return period of interest. This can lead to the definition of incon-
sistent Q - T relationships at some sites where in unusual cases, the predicted
extreme flow quantile for a return period $T_1$ could be greater than the value
for return period $T_2$, where $T_1$ is less than $T_2$. An anomalous result like this
cannot occur with the ROI-based approach in that the entire Q-T relationship
for a site is explicitly defined.
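A simple check for the inconsistency described above is to test whether the predicted quantiles at each site increase with return period. The sketch below assumes the quantiles are stored with one row per site and columns ordered by increasing return period; the function name `inconsistent_sites` is illustrative.

```python
import numpy as np

def inconsistent_sites(quantiles):
    """Return indices of sites whose quantiles decrease with return period.

    quantiles: (n_sites, n_periods) array, columns ordered by increasing
               return period (e.g. 5, 10, 20, 50, 100 years).
    """
    q = np.asarray(quantiles, dtype=float)
    # A negative difference between adjacent columns marks a crossing.
    return np.flatnonzero(np.any(np.diff(q, axis=1) < 0, axis=1))

# The second site has a 10 year quantile below its 5 year quantile.
flagged = inconsistent_sites([[10.0, 12.0, 15.0], [10.0, 9.0, 14.0]])
```

Such a check is only needed for estimators built from separate per-return-period regressions.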

5. Conclusions

A framework for regional flood frequency analysis at ungauged sites has
been developed that incorporates a region of influence approach and a
homogeneity test. The regionalization process uses the combination of catchment
characteristics at all sites and flow characteristics at the gauged sites to create a
flexible homogeneous region for each ungauged site. The L-moment approach
employed in the homogeneity test proved effective in identifying groupings of
similar stations and also was easily incorporated into the regionalization
process. The application of a homogeneity test at every step of the regionaliz-
ation ensures a homogeneous ROI and leads to improved extreme flow
quantile estimates.
Extreme flow estimation at ungauged sites using the ROI approach was
shown to result in an improvement relative to results from a regression
approach. The advantage associated with estimating extreme flow quantiles
using the ROI approach is, however, not substantial for this case study. This is
partly because of the unique characteristics of the data set employed. Because
the entire set of gauging stations forms a nearly homogeneous region, the
advantages of employing an efficient regionalization approach are not as great as would
be the case for a larger and less homogeneous collection of gauging stations.
From the results obtained with the ROI approach, it can be seen that the
major contribution arises from the formation of flexible regions. Both ROI
approaches, regardless of the method used for combining information from
the gauged sites in the regions, provide similar results, implying that the
regionalization component of regional flood frequency analysis is an important factor
for efficient extreme flow quantile estimation.

Acknowledgment

The work described in this paper was supported by a research grant from
the Natural Sciences and Engineering Research Council (NSERC) of Canada.
This support is gratefully acknowledged.

References

Acreman, M.C., 1987. Regional Flood Frequency Analysis in the UK: Recent Research--New
Ideas. Institute of Hydrology, Wallingford, UK.
Acreman, M.C. and Wiltshire, S.E., 1987. Identification of regions for regional flood frequency
analysis. EOS 68(44): 1262 (Abstract).
Burn, D.H., 1989. Cluster analysis as applied to regional flood frequency. J. Water Resour.
Manage., ASCE, 115(5): 576-582.
Burn, D.H., 1990a. An appraisal of the 'region of influence' approach to flood frequency
analysis. Hydrol. Sci. J., 35(2): 149-165.
Burn, D.H., 1990b. Evaluation of regional flood frequency analysis with a region of influence
approach. Water Resour. Res., 26(10): 2257-2265.
Chowdhury, J.U., Stedinger, J.R. and Lu, L.-H., 1991. Goodness-of-fit test for regional gen-
eralized extreme value flood distributions. Water Resour. Res., 27(7): 1765-1776.
Everitt, B., 1980. Cluster Analysis, 2nd edn. Wiley, New York, NY, pp. 24-40.
Greenwood, J.A., Landwehr, J.M., Matalas, N.C. and Wallis, J.R., 1979. Probability weighted
moments: definition and relation to parameters of several distributions expressible in inverse
form. Water Resour. Res., 15(5): 1049-1054.
Greis, N.P. and Wood, E.F., 1981. Regional flood frequency estimation and network design.
Water Resour. Res., 17(4): 1167-1177.
Hosking, J.R.M., 1986. The Theory of Probability Weighted Moments. IBM Research Report
RC12210, IBM, Yorktown Heights, NY, 160 pp.
Hosking, J.R.M., 1990. L-moments: analysis and estimation of distributions using linear
combinations of order statistics. J. R. Statist. Soc. B, 52(1): 105-124.
Hosking, J.R.M., Wallis, J.R. and Wood, E.F., 1985a. An appraisal of the regional flood
frequency procedure in the UK 'Flood Studies Report'. Hydrol. Sci. J., 30(1): 85-109.
Hosking, J.R.M., Wallis, J.R. and Wood, E.F., 1985b. Estimation of the generalized extreme
value distribution by the method of probability weighted moments. Technometrics, 27(3):
251-261.
Landwehr, J.M., Matalas, N.C. and Wallis, J.R., 1979. Probability weighted moments
compared with some traditional techniques in estimating Gumbel parameters and quantiles.
Water Resour. Res., 15(5): 1055-1064.
Lettenmaier, D.P., Wallis, J.R. and Wood, E.F., 1987. Effect of regional heterogeneity on flood
frequency estimation. Water Resour. Res., 23(2): 313-323.
Panu, U.S. and Smith, D.A., 1988. Estimating Flood Flows at Ungauged Sites in
Newfoundland. Water for World Development. Proceedings of the VIth IWRA World Congress on
the Water Resources, Volume II, pp. 207-221.
Potter, K.W., 1987. Research on flood frequency analysis: 1983-1986. Rev. Geophys., 25(2):
113-118.
Tasker, G.D., 1982. Comparing methods of hydrologic regionalization. Water Resour. Bull.,
18(6): 965-970.
Wallis, J.R., 1989. Regional Frequency Studies Using L-Moments. IBM Research Report
RC14597, Yorktown Heights, New York, 17 pp.
Wandle, S.W., 1977. Estimating the magnitude and frequency of flood on natural streams
in Massachusetts. U.S. Geological Survey Water Resources Investigations 77-39,
27 pp.
Wiltshire, S.E., 1986a. Regional flood frequency analysis I: Homogeneity statistics. Hydrol.
Sci. J., 31(3): 321-333.
Wiltshire, S.E., 1986b. Regional flood frequency analysis II: Multivariate classification of
drainage basins in Britain. Hydrol. Sci. J., 31(3): 335-346.
Wiltshire, S.E., 1986c. Identification of homogeneous regions for flood frequency analysis. J.
Hydrol., 84: 287-302.
Appendix

L-Moments
L-moments of a random variable were first introduced by Hosking (1986).
They are analogous to conventional moments, but are estimated as a linear
combination of order statistics. L-moments can be expressed using probability
weighted moment (PWM) estimators. For a random variable X, with a
cumulative distribution function of F(X), the order r PWM can be defined
as (Greenwood et al., 1979)
$$\beta_r = E\{X [F(X)]^r\} \tag{A1}$$
Hosking (1986, 1990) defined L-moments as a linear combination of the
PWMs. The first four L-moments are
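$$\lambda_1 = \beta_0 \tag{A2}$$

$$\lambda_2 = 2\beta_1 - \beta_0 \tag{A3}$$

$$\lambda_3 = 6\beta_2 - 6\beta_1 + \beta_0 \tag{A4}$$

$$\lambda_4 = 20\beta_3 - 30\beta_2 + 12\beta_1 - \beta_0 \tag{A5}$$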


The first L-moment, $\lambda_1$, is the arithmetic mean, while the second L-moment,
$\lambda_2$, is a measure of dispersion, analogous to the standard deviation. By
standardizing the higher L-moments through

$$\tau_r = \frac{\lambda_r}{\lambda_2} \quad \text{for } r = 3, 4 \tag{A6}$$

they become independent of the units of measurement for X. Analogous to
conventional moment ratios, such as the coefficient of skewness (CS), $\tau_3$ is the
L-skewness and reflects the degree of symmetry of a sample. Similarly, $\tau_4$ is a
measure of peakedness and is referred to as L-kurtosis. In addition, the
L-coefficient of variation, L-CV, is defined as
$$\tau_2 = \frac{\lambda_2}{\lambda_1} \tag{A7}$$
For practical use, the PWMs can be estimated from a finite sample. Let
$X_1 \le X_2 \le \cdots \le X_n$ be the ordered values of a sample of size n. Then, the
unbiased estimator of $\beta_r$ is (Landwehr et al., 1979)
$$b_r = n^{-1} \sum_{j=1}^{n} \frac{(j-1)(j-2)\cdots(j-r)}{(n-1)(n-2)\cdots(n-r)} X_j \tag{A8}$$
Therefore, the unbiased estimators for the $\lambda_r$ are given by

$$l_1 = b_0 \tag{A9}$$

$$l_2 = 2b_1 - b_0 \tag{A10}$$

$$l_3 = 6b_2 - 6b_1 + b_0 \tag{A11}$$

$$l_4 = 20b_3 - 30b_2 + 12b_1 - b_0 \tag{A12}$$

Sample estimates for $\tau_r$ are $t_r = l_r / l_2$ for r = 3, 4 and $t_2 = l_2 / l_1$. These estimates
are asymptotically unbiased for large values of n.
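The sample estimators above translate directly into code. The sketch below computes the unbiased PWM estimates $b_0$ to $b_3$ from an ordered sample and then forms the sample L-moments and L-moment ratios; the function names (`sample_pwms`, `sample_l_moments`) and the dictionary layout are illustrative choices, not part of the original formulation.

```python
import numpy as np

def sample_pwms(x, n_pwm=4):
    """Unbiased sample PWMs b_0 .. b_{n_pwm-1} from a sample x."""
    x = np.sort(np.asarray(x, dtype=float))  # ordered values X_1 <= ... <= X_n
    n = len(x)
    if n < n_pwm:
        raise ValueError("sample is too short for the requested PWM order")
    j = np.arange(1, n + 1)
    b = np.empty(n_pwm)
    for r in range(n_pwm):
        # Weight for rank j: (j-1)(j-2)...(j-r) / ((n-1)(n-2)...(n-r)); equals 1 for r = 0.
        w = np.ones(n)
        for k in range(1, r + 1):
            w *= (j - k) / (n - k)
        b[r] = np.mean(w * x)
    return b

def sample_l_moments(x):
    """Sample L-moments l1, l2 and the ratios t2 (L-CV), t3 (L-skewness), t4 (L-kurtosis)."""
    b0, b1, b2, b3 = sample_pwms(x, 4)
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return {"l1": l1, "l2": l2, "t2": l2 / l1, "t3": l3 / l2, "t4": l4 / l2}

# A small symmetric sample: the L-skewness should be zero.
result = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
```

For this sample the mean is 3 and the second L-moment is 1, so the L-CV is 1/3, and the symmetry of the sample gives an L-skewness of zero.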
