
Accident Analysis and Prevention 33 (2001) 799-808

www.elsevier.com/locate/aap

Overdispersion in modelling accidents on road sections and in Empirical Bayes estimation

Ezra Hauer *
35 Merton Street, Apartment 1706, Toronto, Ont., Canada M5S 3G4
Received 28 September 2000; received in revised form 08 October 2000; accepted 30 October 2000

Abstract

In multivariate statistical models of road safety one usually finds that the accident counts are overdispersed. The extent of the
overdispersion is itself subject to estimation. It is shown that the assumption one makes about the nature of overdispersion will
affect the maximum likelihood estimates of model parameters. If one assumes that the same overdispersion parameter applies to
all road sections in the data base, then, the maximum likelihood estimate of parameters will be unduly influenced by very short
road sections and insufficiently influenced by long road sections. The same assumption about the overdispersion parameter also
leads to an inconsistency when one estimates the safety of a road section by the Empirical Bayes method. A way to avoid both
problems is to estimate an overdispersion parameter (φ) that applies to a unit length of road, and to set the overdispersion
parameter for a road section of length L to φL. How this would change the estimates of regression parameters for road section
models now in use requires examination. Safety estimation by the Empirical Bayes method is altered substantially. © 2001
Elsevier Science Ltd. All rights reserved.

Keywords: Empirical Bayes; Overdispersion; Negative binomial; Multivariate model

1. Introduction

Entities have accident counts and traits. An entity can be a road section with traits such as length, traffic flow, number of lanes; entities can be drivers, with gender, age, annual distance travelled as traits; they can be grade crossings, with train traffic, vehicle traffic, sight distance, protective device used as traits; entities can be cities, with road density, number of inhabitants as traits, and so on. Often statistical regression models are formulated to express the expected accident count of an entity as a function of its traits. After the unknown model parameters are estimated, one usually finds that the accident counts are overdispersed. That is, the differences between the accident counts and model predictions are larger than what would be consistent with the assumption that accident counts are Poisson distributed. This led researchers to use the Negative Binomial (NB) distribution to represent the distribution of accident counts.

The NB distribution arises when accident counts for entity i are Poisson distributed with a mean p_i q while q is a random variable drawn from a gamma distribution with mean = 1 and variance = 1/φ (see, e.g. Guo 1996). We will call φ the overdispersion parameter. Let Y_i be the count of accidents on entity i. By the NB distribution,

$$P(Y_i = y_i) = \frac{\Gamma(y_i + \varphi)}{\Gamma(\varphi)\, y_i!} \left(\frac{\varphi}{p_i + \varphi}\right)^{\varphi} \left(\frac{p_i}{p_i + \varphi}\right)^{y_i}. \qquad (1)$$

The value of the overdispersion parameter is estimated along with all other unknown model parameters.

The overdispersion parameter features in two uses to which the results of statistical regression modelling are put. First, when the results of modelling are used to estimate what is the mean accident frequency of entities with given traits, the overdispersion parameter serves to indicate how widely the accident counts are distributed around the estimated mean. Second, when the results of modelling are used to estimate the safety of a specific entity for which the traits and the accident record are known by the Empirical Bayes (EB) method (see, e.g.

* Tel.: +1-416-483-4452; fax: +1-416-483-4415.
E-mail address: ezra.hauer@utoronto.ca (E. Hauer).

0001-4575/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved.
PII: S0001-4575(00)00094-4
E. Hauer / Accident Analysis and Prevention 33 (2001) 799-808

Hauer, 1997), the overdispersion parameter determines the relative weights given to the model prediction and the accident record.

When fitting models to accident counts it is often assumed that one overdispersion parameter is common to all entities. Thus, e.g. a single common overdispersion parameter is used by Maycock and Hall (1984) to model accidents at roundabouts; by Hausman et al. (1984) to model patent counts; by Hauer et al. (1989) to model accidents at urban signalized intersections; by Bonneson and McCoy (1993) to model accidents at stop-controlled rural intersections; by Guo (1996) to model quarterly counts of surgical procedures; by Miaou (1996) to model truck accidents; by Vogt and Bared (1998) to model accidents on rural road segments and intersections, and by many others.

To use an overdispersion parameter that is common to all entities may be found limiting. There is no reason to expect that the gamma distribution from which q is sampled is the same for all the entities that make up the data for the regression, or for all the entities to which the predictions may be thought to apply. One way to remove this limitation is to assume that φ_i = φ p_i^k (see, e.g. Cameron and Trivedi, 1998 and Maher and Summersgill, 1996). When the parameter k is set to 0, the standard NB model is obtained; when k > 0 is chosen then the variance of the gamma distribution from which q is sampled decreases as p_i increases. Miaou and Lum (1993) modelled crashes on rural interstate highways using k = 1. Using the data from Maycock and Hall (1984) for four-arm roundabouts, Maher and Summersgill (1996) report that k = 0.25 maximized the likelihood of the data. Yet another possibility is to assume that φ itself is a function of the various covariates and unknown parameters. Heydecker and Wu (1999) used this approach when modelling accidents at three-arm junctions. The attraction of such generalizations is that they can better represent the reality of specific data sets.

My experience with accident prediction models for road sections which differ in length leads me to believe that the choice of the expression for overdispersion is not only a matter of getting a better fit at the price of additional parameters in the model. The issue is not only empirical but also conceptual. In Section 2 I show why problems of parameter estimation may arise when the same value of φ is assumed to apply to road sections that differ in length. In Section 3 I discuss the general mechanism by which overdispersion arises in accident modelling. This understanding leads to a suggestion of a remedy to the aforementioned problem. The remedy is discussed in Section 4. In Section 5 I argue that the use of a single and common dispersion parameter also leads to a logical blemish in the Empirical Bayes Estimation of expected accident frequency and that the remedy in Section 4 removes the blemish.

Table 1
Data for two road sections

             Traits                       Accident counts
Section      L, length (km)   X, AADT     Y, accidents in a year
1            0.5              500         1
2            10.0             500         40

2. A problem of influence

In this section a numerical example is used to illustrate how two alternative assumptions about the distribution of accident counts influence parameter estimates. In this example a parameter is estimated once assuming that accident counts are Poisson distributed and next using the NB distribution assumption. The data in this example are for two road sections with the traits and accident counts given in Table 1. Take, e.g. the model equation p_i = h L_i X_i^0.5. In this, p_i is the expected number of accidents per year for road section i, L_i is the length of road section i, and X_i is its Annual Average Daily Traffic (AADT). The parameter h is to be estimated by the method of maximum likelihood.

2.1. Poisson regression

Assume first that the accident counts (Y) come from a Poisson distribution. The Poisson log-likelihood functions with respect to h for section 1, section 2, both sections on the same graph, and the sum for both sections together are shown in Figs. 1-4.

The maximum in Fig. 1 is at h_1 = 1/(0.5 × 500^0.5) = 0.089, in Fig. 2 at h_2 = 40/(10 × 500^0.5) = 0.179. Fig. 3 serves to show that the variation in log-likelihood for section 1 is very small compared to the variation in the log-likelihood for section 2. This is why the much longer road section 2 with its many accidents dominates the joint log-likelihood function in Fig. 4. As a result the maximum for the sum of two log-likelihoods is at h_{1&2} = 0.175, very close to the maximum for road section 2.

That a data point with many accidents should have more influence on the estimate of h than a data point with a few accidents is in accord with intuition and can be explained from basic principles. For road section i, h is its mean number of accidents divided by L_i X_i^0.5. When h is estimated, the accident count replaces the mean number of accidents. If the accident count is Poisson distributed, the variance of this estimate of h is mean_i/(L_i X_i^0.5)^2 = h/(L_i X_i^0.5). Thus, whatever the value of h, since L_1/L_2 = 1/20, the variance of h_1 is 20 times the variance of h_2. In combining two estimates of

Fig. 1. Section 1. Fig. 3. Sections 1 and 2.
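
The Poisson maxima quoted in Section 2.1 can be reproduced with a short script. This is a minimal sketch, assuming the model p_i = h L_i X_i^0.5 and the Table 1 data; the function names (`exposure`, `poisson_loglik`, `poisson_mle`) are hypothetical and not taken from the paper. It relies on the standard result that, for Poisson counts with mean h c_i, the likelihood is maximized at the total count divided by the total of the c_i.

```python
import math

# Table 1 data: (length in km, AADT, accidents in a year)
SECTIONS = [(0.5, 500, 1), (10.0, 500, 40)]

def exposure(length, aadt):
    """c_i = L_i * X_i**0.5, so that the model mean is p_i = h * c_i."""
    return length * math.sqrt(aadt)

def poisson_loglik(h, sections):
    """Poisson log-likelihood of h, dropping the constant log(y!) term."""
    total = 0.0
    for length, aadt, y in sections:
        mean = h * exposure(length, aadt)
        total += y * math.log(mean) - mean
    return total

def poisson_mle(sections):
    """Closed-form maximizer: sum of counts over sum of exposures."""
    counts = sum(y for _, _, y in sections)
    return counts / sum(exposure(l, x) for l, x, _ in sections)

h1 = poisson_mle(SECTIONS[:1])    # section 1 alone
h2 = poisson_mle(SECTIONS[1:])    # section 2 alone
h12 = poisson_mle(SECTIONS)       # both sections together
print(round(h1, 3), round(h2, 3), round(h12, 3))
```

Evaluating `poisson_loglik` on a grid of h values for each section separately would give curves of the kind described for Figs. 1-4.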

different precision, the combined estimate is most precise if we use a linear combination of h_1 and h_2 with the inverse of their variances as weights. That is, h_{1&2} = (0.089/20 + 0.179/1)/(1/20 + 1/1) = 0.175. In this, the weight of h_1 = 0.089 is 1/21 and the weight of h_2 = 0.179 is 1/1.05.

2.2. Negative binomial regression

Assume now that accident counts (Y) come from the negative binomial distribution in Eq. (1) and that it is known that φ = 1. The negative binomial log-likelihood functions with respect to h for section 1, section 2, both sections on the same graph, and the sum for both sections together are shown in Figs. 5-8. The maxima in Fig. 5 and Fig. 6 are, as before, at h_1 = 0.089 and h_2 = 0.179. However, as is evident from the comparison of Fig. 3 and Fig. 7, the log-likelihood function for road section 2 is now much flatter. Therefore, when the log-likelihoods for sections 1 and 2 (Fig. 8) are added, the h_{1&2} shifts a considerable way from h_2 in the direction of h_1. It is now 0.144. The mutual position of the estimates is depicted in Fig. 9.

It is commonly believed that the estimates of model parameters do not change much when accident counts are assumed to be NB instead of Poisson distributed; the belief is that alternative assumptions about the distribution of accident counts affect mainly the precision with which the model parameters are estimated. In this numerical example, in which there are large differences in road section length, the maximum likelihood parameter estimate does depend on what is assumed to be the distribution of accident counts.

The issue is now clear. We expect that road section 2, which is 20 times longer than road section 1 and has 40 times as many accidents, will always exert a strong pull on the maximum likelihood estimate h_{1&2}. It is disturbing to find that assuming accident counts to be NB distributed with a common φ considerably weakens the influence of what should be the dominant data point. Why is this so?

The transparent explanation is again provided by examining the relative precision of the estimates h_1 and h_2. Under the Poisson assumption the variance of an estimate of h was mean/(L X^0.5)^2. Under the NB assumption the variance of an estimate of h is [mean × (1 + mean/φ)]/(L X^0.5)^2. If we take mean_1 = 0.144

Fig. 2. Section 2. Fig. 4. Sum for sections 1 and 2.
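
The shift of the joint maximum under the NB assumption can be checked numerically. The sketch below is an illustration under stated assumptions: it differentiates the log of Eq. (1) with respect to h for the model p_i = h L_i X_i^0.5, which gives a score of the form sum of φ_i (y_i - p_i)/(h (p_i + φ_i)), and finds its root by bisection. The helper names (`nb_score`, `nb_mle`, `phi_of_length`) are hypothetical, and bisection is a convenience choice rather than the method used in the paper.

```python
import math

SECTIONS = [(0.5, 500, 1), (10.0, 500, 40)]  # (L in km, AADT, accidents)

def nb_score(h, sections, phi_of_length):
    """Derivative of the NB log-likelihood of Eq. (1) with respect to h."""
    score = 0.0
    for length, aadt, y in sections:
        p = h * length * math.sqrt(aadt)   # model mean p_i = h * L_i * X_i**0.5
        phi = phi_of_length(length)        # overdispersion assigned to this section
        score += phi * (y - p) / (h * (p + phi))
    return score

def nb_mle(sections, phi_of_length, lo=1e-6, hi=10.0, tol=1e-10):
    """Bisection on the score, which is positive below the maximum."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nb_score(mid, sections, phi_of_length) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

h_common = nb_mle(SECTIONS, lambda length: 1.0)   # common phi = 1 for both sections
print(round(h_common, 3))
```

Passing `lambda length: length` instead of the constant corresponds to an overdispersion parameter proportional to section length, the alternative examined in Section 4.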



Fig. 5. Section 1. Fig. 7. Sections 1 and 2.

× 0.5 × 500^0.5 = 1.6 accidents per year and mean_2 = 20 × 1.6 = 32 accidents per year, then the ratio of their variances is [1.6(1 + 1.6)/L_1^2]/[32(1 + 32)/(20 L_1)^2] = 4.16/2.64 ≈ 1.5. Combining the two h_i's we now obtain h_{1&2} = (0.089/4.16 + 0.179/2.64)/(1/4.16 + 1/2.64) = 0.144. Whereas under the Poisson assumption the variance of h_1 was 20 times that of h_2, under the NB assumption it is only 1.5 times.

The root of the problem in the numerical example is the innocuous but commonly made assumption that the overdispersion parameter φ is the same for road sections 1 and 2. Recall that the NB distribution arises when p_i is multiplied by a random variable q drawn from a gamma distribution with mean = 1 and variance = 1/φ_i. This amounts to assuming that the variance of accident counts is p_i(1 + p_i/φ_i). For road section 2, under the Poisson assumption, the variance of accident counts was the same as the mean (32); using φ = 1 in the NB assumption, the variance of accident counts is 33 times the mean [32(1 + 32/1) = 1056]. The reason for the dramatic increase in the variance of the accident counts of road section 2 is that its mean (32) is large compared to the magnitude of the overdispersion parameter φ.

The issue can be now stated generally:

  The lesser the variance of the (accident) counts for a data point (a road section), the larger is its influence on the estimates of model parameters. When counts are assumed to be NB distributed with a common overdispersion parameter φ, then, entities with a mean that is large in comparison to the magnitude of φ will suffer a dramatic increase in the variance of their (accident) counts. This brings about a correspondingly large decline in their influence over the most likely value of the model parameters, as compared to their influence when counts are assumed to be Poisson distributed. But entities with large means (e.g. long road sections and road sections with a lot of traffic) are precisely those entities that should influence parameter estimation most.

Miaou and Lum (1993) provide essentially the same argument when they note that in estimating the

Fig. 6. Section 2. Fig. 8. Sum for sections 1 and 2.
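
The inverse-variance argument used for both the Poisson and the NB case can be restated as a small calculation. This sketch assumes the rounded values used in the text (single-section maxima 0.089 and 0.179, means 1.6 and 32, φ = 1); the function names `combine` and `var_of_h` are hypothetical. With these inputs the NB variance ratio evaluates to roughly 1.58, which the text quotes as about 1.5.

```python
import math

def combine(estimates, variances):
    """Inverse-variance weighted average of independent estimates."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

def var_of_h(mean, length, aadt, phi=None):
    """Variance of y / (L * X**0.5); phi=None means Poisson, otherwise NB."""
    count_var = mean if phi is None else mean * (1.0 + mean / phi)
    return count_var / (length * math.sqrt(aadt)) ** 2

h = [0.089, 0.179]               # single-section maxima quoted in the text
means = [1.6, 32.0]              # means used in the text
lengths, aadt = [0.5, 10.0], 500

pois = [var_of_h(m, l, aadt) for m, l in zip(means, lengths)]
nb = [var_of_h(m, l, aadt, phi=1.0) for m, l in zip(means, lengths)]

print(pois[0] / pois[1], combine(h, pois))   # variance ratio and combined h, Poisson
print(nb[0] / nb[1], combine(h, nb))         # variance ratio and combined h, NB
```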



Fig. 9. Comparison of estimates.

(regression) coefficients for road sections with the same AADT the linear-regression models put considerably more weight on accidents observed on short road sections than those observed on long sections (p. 698). They argue in favour of using the Poisson models (where the variance of accident counts is the mean) as opposed to the Normal and Log-Normal models (which they, for some reason, call linear-regression models) in which the assumption is that the variance is proportional to the square of the road section length or the square of the mean.

That under the NB error structure with common φ, road section 2 does not dominate the estimate of h should raise eyebrows. To suggest a sensible remedy, it is necessary to be clear about why overdispersion arises in our circumstance.

3. Why are accident counts overdispersed?

The usual circumstance of statistical road safety modelling is that we have data for many entities (say, road sections) and we wish to fit a model to this data set. For each entity we know the count of accidents that have been reported to the police. In addition, for each entity we know some traits that have been measured or estimated with varying degrees of accuracy. The measured traits are often averages over the road section length (e.g. roadside hazard rating) or averages over time (e.g. Annual Average Daily Traffic, AADT). We then choose to include some such measured traits in the model equation as covariates. These will be called the represented traits. In addition, there are the measured traits that are not represented in the model, some measurable but unmeasured traits, and perhaps some altogether unmeasurable traits. These are the unrepresented traits. Both represented and unrepresented traits are linked to accident occurrence.

If for a specific entity, say entity 'a', it was possible to keep constant all its traits, both represented and unrepresented, and then to repeat the same time period over and over, each time recording the count of accidents, then the average of the counts would converge on the period mean for this specific entity, pq_a. Consider now a different entity, say entity 'b', with the same represented traits as entity 'a'. The mean for entity 'b' (pq_b) will likely differ from the mean for entity 'a' in spite of the fact that their represented traits are identical. This difference has several reasons. First, the measured traits are subject to estimation error. Thus, e.g. AADT is estimated on the basis of a few days of traffic counting that may have been carried out in some other year at a nearby location. Therefore identical estimates of AADTs do not mean that, in fact, road sections 'a' and 'b' served the same traffic. Second, the same average may represent very different distributions. If road section 'a' carries 100 vehicles at night and 900 during daytime while road section 'b' serves 500 vehicles at night and in daytime, 'a' and 'b' will have the same AADT but different accident means. Third, entities 'a' and 'b' are sure to differ in unrepresented traits; one road may lead to a pub, the other may be popular with deer. For all these reasons, if we imagine a large group of entities (a, b, c, ...) with the same represented traits, they must be thought to have different means (pq_a, pq_b, pq_c, ...). These means form a distribution with p as a mean (when E[q] = 1) and with p^2/φ as variance (when VAR[q] = 1/φ). The role of statistical modelling is to provide an estimate of p as a function of represented traits.

We are interested in how p can be expressed as a function of the represented traits for two common purposes. First, when we wish to ascertain how a certain trait (say, lane width or grade) is associated with (perhaps affects) the expected accident frequency. The second purpose is when we wish to estimate the expected accident frequency of a specific site by combining the prediction p for sites with identical represented traits with the accident history of the site. In both instances (using the common statistical parlance) this means that we wish to estimate a random effects model and this is the framework of discussion in this paper. (Were a time series of accident counts and traits available for each entity in a data set, it would also be possible to estimate a parameter q_i, i = a, b, c, ...; doing so might be of interest if, e.g. one tried to examine why some entities in the data set seem to have fewer or more accidents than is predicted by their represented traits. In this case we might be interested in a fixed effects model.)

Unless VAR[q] = 0, p (being the mean of the pq_i in the imagined group of entities i = a, b, c, ... that have identical represented traits) will not be the same as the pq of the entity in the data. On this basis it is possible to further clarify the nature of overdispersion. Recall that the definition of variance is the expected value of the squared difference between a random variable and its mean. The expected value of the squared differences between a random variable and any fixed value other than the mean is larger than the variance. Overdispersion is judged by the magnitude of the squared difference between accident counts and model predictions,

the squared residuals. But model predictions are estimates of the p for an imagined population of entities with identical represented traits; model predictions are not estimates of the pq_i of the entities for which we happen to have the accident counts. Since the squared residuals are computed with respect to a value other than the mean of the entities for which we have data, we must expect the residuals to be larger than the variance of the counts for the entities in the data set. Thus, overdispersion is the necessary consequence of the fact that the means of the entities that serve as data are not what the model aims to predict. The root cause of overdispersion is that entities with the same represented traits have different means.

In the numerical example we noted that the influence of a data point diminishes as the variance of its accident counts increases and that the NB distribution implies that the variance of the accident counts for entity i is p_i(1 + p_i/φ_i). Therefore, the central question is: how does VAR[q] = 1/φ_i change as a function of the value of represented traits? This question is answered in a very specific way if it is assumed that q is always drawn at random from a Gamma distribution with mean 1 and variance 1/φ. In this case, since φ_i is assumed to be φ, the same for all entities, the variance of the means in an imagined population of entities with identical represented traits is proportional to p^2. Thus, e.g. the use of φ = 1 for both road sections in the numerical example implies that roads with the same represented traits as section 2 (10 km long road sections with AADT = 500) have a VAR[pq] that is four hundred times that of roads with the same represented traits as road section 1 (0.5 km long road sections with AADT = 500). Should one expect such a large increase in the variance that is due to traits not represented in the model simply because one road section is much longer than the other road section?

The condition under which this would be true will be examined next. However, before doing so, it is important to comment briefly on why in available data sets some road sections are long and others short. Miaou and Lum (1993) (pp. 697-699) provide a good discussion of this issue. Usually road sections in state or provincial inventories are delimited by intersections. In principle, the aim is to define the road sections so that the road in-between the end-points is of one class (e.g. two-lane, four-lane divided) and is fairly homogeneous in traffic. Additional considerations having to do with the history of major reconstructions also apply. When so defined, road sections are usually not homogeneous in features such as grade, curvature, roadside hazard and the like. In some data bases (e.g. the HSIS, the Highway Safety Information System) an attempt is made to define road sections that are homogeneous in all (or most) measured traits.

4. A remedy

Road section 2 of the numerical example can be thought to consist of 20 subsections like road section 1. Therefore p_2 = 20p_1. With this, under the usual NB assumption, the variance of accident counts on road section 2 would be 20p_1(1 + 20p_1/φ). (The usual NB assumption is that the value of q, drawn from the Gamma distribution with mean = 1 and variance = 1/φ, applies to the entire road section 2.)

An alternative expression for the variance of accident counts on road section 2 is obtained when one assumes that for each of the 20 constituent subsections a separate q is selected from a Gamma distribution with mean 1 and variance 1/φ. Under this assumption the variance of accident counts on each such subsection is p_1(1 + p_1/φ). If the 20 subsection accident counts are statistically independent, the variance of their sum will be 20p_1(1 + p_1/φ). We now have two different expressions for the variance of accident counts on road section 2 and, obviously, 20p_1(1 + 20p_1/φ) ≠ 20p_1(1 + p_1/φ).

The reason for the noted difference can be stated in tangible terms. The expression 20p_1(1 + 20p_1/φ) has been obtained by assuming that all 20 constituent subsections have the same q. That is, all 20 subsections of which road section 2 is made up have the very same errors in variables, the same distribution that underlies averages, and the same unrepresented traits. In contrast, the expression 20p_1(1 + p_1/φ) is obtained by assuming that for each of the 20 constituent subsections a new q is selected at random from the Gamma distribution with mean = 1 and variance = 1/φ. This implies that the 20 constituent subsections differ amongst each other in their errors in variables, distributions underlying averages and in unrepresented traits in the same manner as any two road sections with the same represented traits would, even when they were not strung out one behind the other on the same road. In short, under the first assumption the 20 values of q are perfectly correlated while under the second assumption they are uncorrelated.

The assumption about the q's being either perfectly correlated or entirely uncorrelated brackets reality. Subsections of the same road will still differ in many respects. Therefore, there is no reason to think that the q's of all the subsections are the same. However, subsections with the same represented traits when strung out consecutively along the same road are likely to differ less than would subsections that are not spatially proximal. Therefore the q's of subsections along the same road are likely to be correlated, if only because they are used by the same drivers under the same climatic conditions.

A general expression for the variance of accident counts when the q's of subsections are not correlated can be obtained as follows. A road section of length L

Fig. 10. Section 1. Fig. 12. Sections 1 and 2.
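
The two bracketing variance expressions of Section 4 can be put side by side for the numerical example. This is a minimal sketch, assuming a subsection mean of 1.6 accidents per year (the value used earlier in the example), 20 subsections, and φ = 1; the name `nb_variance` is hypothetical.

```python
def nb_variance(mean, phi):
    """Variance of an NB count with the given mean and overdispersion phi."""
    return mean * (1.0 + mean / phi)

p1, n_sub, phi = 1.6, 20, 1.0   # subsection mean, number of subsections, phi

# One q shared by all 20 subsections (perfectly correlated q's):
# the whole section is a single NB count with mean 20 * p1.
var_shared = nb_variance(n_sub * p1, phi)

# A separate, independent q for each subsection (uncorrelated q's):
# variances of the 20 independent subsection counts add up.
var_independent = n_sub * nb_variance(p1, phi)

print(var_shared, var_independent)
```

The first value is the 1056 quoted in the text for road section 2; the second is more than ten times smaller, which is the gap that the length-dependent overdispersion parameter is meant to address.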

can be thought to consist of L/l subsections, all subsections having the same covariate values, and each subsection being l units long. If the entire road section has a mean accident frequency p, each subsection has mean accident frequency w = p/(L/l). As before a mixing variable q is drawn randomly from a distribution with mean = 1, but now we assume that its variance depends on the length of the subsection l. That is, VAR[q] = 1/φ(l). Using the NB distribution, for each subsection of length l, VAR[Accident count for a subsection] = w(1 + w/φ(l)). Let Y be the sum of accident counts for the L/l subsections. Since the subsection counts are statistically independent, VAR[Y] = (L/l)w[1 + (L/l)w/(φ(l)L/l)] = p[1 + p/(φ(l)L/l)].

The sum of independent NB random variables is not NB distributed. However, we can use the result about VAR[Y] as a motivation to postulate an alternative negative binomial model for the count of accidents Y for a road of length L. The alternative assumption is that the NB distribution arises when p for the road section of length L is multiplied by a random variable q drawn from a gamma distribution with mean = 1 and variance = 1/[φ(l)L/l].

To examine the implication of this alternative assumption on parameter estimation, I now return to the numerical example. Assume that φ(l = 1 km) = 1. Then, for road section 1, VAR[q] = 1/[φ(l = 1 km) × 0.5/1] = 1/0.5 = 2 and for section 2, VAR[q] = 1/[φ(l = 1 km) × 10/1] = 1/10. The corresponding four log-likelihood functions are in Figs. 10-13. The maxima in Figs. 10 and 11 are, as before, at h_1 = 0.089 and h_2 = 0.179. As is evident from the comparison of Figs. 10 and 11, the much larger influence of the log-likelihood function for the accident-rich road section 2 has been restored. Therefore, when the two log-likelihoods are added, the h_{1&2} (= 0.175) is again very close to h_2.

When it comes to parameter estimation by maximizing a likelihood function then, instead of using in Eq. (1) a φ that is common to all entities, for entity i we use φ_i = φ(l)L_i/l. The question is what subsection length l should be used? Suppose that using a subsection length l = 1 km we obtain the maximum likelihood estimate φ_ML(l = 1) of φ(l = 1). This determines the values of all the φ_i's. Naturally, the same values of φ_i maximize the likelihood function if subsections were, say, 2 km long. This implies that φ_ML(l = 1) =

Fig. 11. Section 2. Fig. 13. Sum of sections 1 and 2.
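
Why the joint maximum returns to 0.175 under the length-proportional assumption can be seen from the likelihood equation. The derivation below is a sketch, not part of the original argument: it assumes the NB form of Eq. (1) with the model p_i = h L_i X_i^{0.5}, sets φ_i = φ L_i, and uses the fact that both sections in Table 1 have the same AADT X.

```latex
\begin{align*}
\frac{\partial \ell}{\partial h}
  &= \sum_i \frac{\varphi_i\,(y_i - p_i)}{h\,(p_i + \varphi_i)},
  \qquad p_i = h L_i X_i^{0.5},\quad \varphi_i = \varphi L_i \\
  &= \sum_i \frac{\varphi\,(y_i - p_i)}{h\,(h X_i^{0.5} + \varphi)} .
\end{align*}
% With X_1 = X_2 = X the denominator is common to both sections, so
\[
\frac{\partial \ell}{\partial h} = 0
\iff \sum_i \left(y_i - h L_i X^{0.5}\right) = 0
\iff h = \frac{\sum_i y_i}{X^{0.5}\sum_i L_i}
      = \frac{41}{10.5 \times 500^{0.5}} \approx 0.175 .
\]
```

Under these assumptions the NB estimate coincides with the Poisson estimate, whatever the value of φ. When sections differ in AADT the denominators no longer cancel, so the two estimates need not agree exactly.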



φ_ML(l = 2)/2. That is, the ratio φ_ML(l)/l is constant. Therefore we can choose l = 1 unit of length, say 1 mile or 1 km, and the numerical values of φ_i = φ(1 mile) × L_i (measured in miles) or φ_i = φ(1 km) × L_i (measured in km) will be the same. Using the same argument, namely, that there is a unique set of values φ_i that maximize the likelihood function, we can establish the relationship between φ_ML(1 mile) and φ_ML(1 km); φ_ML(1 mile) × L (measured in miles) = φ_ML(1 mile) × 0.622 × L (measured in km). Therefore φ_ML(1 km) = 0.622 × φ_ML(1 mile).

The gist of the argument in this section was that reality is somewhere between assuming that all subsections of a road section have precisely the same represented and unrepresented traits, and, alternatively, assuming that these traits vary from subsection to subsection of a certain road just as they vary between any subsections of the road system that have the same represented traits. The mathematical equivalent of these two bracketing assumptions is to either use a constant φ that does not depend on road length or to use φ_i = φ × L_i. (Both options can be represented algebraically by φ_i = φ L_i^β with β = 0 or β = 1. A device for establishing a compromise that reflects reality might be to use a value of β from within the (0, 1) range.)

5. Implication for EB estimation

In EB estimation the model prediction p_i and the accident count Y_i are combined as follows (see, e.g. Hauer, 1997):

$$\text{Estimate of } p_i q = \text{weight}_i \times p_i + (1 - \text{weight}_i) \times Y_i. \qquad (2)$$

In this, p_i is the expected number of accidents for entities that have the same represented traits as entity i and

$$\text{weight}_i = \frac{1}{1 + \mathrm{VAR}[p_i q]/E[p_i q]} = \frac{1}{1 + (p_i^2/\varphi_i)/p_i} = \frac{1}{1 + p_i/\varphi_i}. \qquad (3)$$

The assumption that φ_i = φL^0 = φ was seen earlier to lead to potential distortions in parameter estimation. The same assumption may lead to an inconsistency when the safety of a road section is to be estimated by the EB method.

To show the potential for inconsistency, consider a road that has along its entire length the same represented traits. The model for this kind of road predicts p accidents in a period of specified length. Assume that we have in that period accident counts y_1 and y_2 on the first and the second half of this road. Since for the whole road weight_whole = (1 + p/φ)^(-1), the estimate of pq for the road is weight_whole × p + (1 - weight_whole) × (y_1 + y_2).

Consider now the first half of this road on which y_1 accidents were counted and ask what is to be the estimate of the expected accidents there. One extreme option is to estimate as if both halves of the road had the same q. To estimate in this manner reflects the belief that not only do the two halves of the road have the same represented traits, as postulated, but they also have the same errors in variables, the same distributions that underlie averages, and the same unrepresented traits. With these assumptions the estimate for the first half of the road would be half the estimate for the whole road. This amounts to asserting that not only is the count of accidents on the second half of the road (y_2) relevant to estimating what is true about the first half of the road, but that it is just as important as the accident count on the first half of the road (y_1). One may ask how far can this belief be stretched? How can one judge whether physically separate road segments with the same (or even similar) represented traits have the same q, in which case their accident counts should be used jointly to estimate the expected number of accidents on any one of these segments? In any case, if the belief that the same q applies to both halves of the road is correct, then the sum of EB estimates for the two halves equals the EB estimate for the whole road. With the assumption of a common q the EB estimates are consistent.

The other extreme option is to estimate as if the two road halves differed in their unrepresented traits, in variable errors and in the nature of averages. That is, even though the two halves have the same represented traits, their q's could differ in the same manner as q's differ amongst roads that are not physically contiguous; as if two values of q were sampled independently from the same distribution, one for each road half. With this, for the first half of the road the EB estimate is: weight_half × p/2 + (1 - weight_half) × y_1, and for the second half of the road the EB estimate is: weight_half × p/2 + (1 - weight_half) × y_2. The sum of the two estimates is: weight_half × p + (1 - weight_half) × (y_1 + y_2). As was shown earlier, the EB estimate for the road as a whole is weight_whole × p + (1 - weight_whole) × (y_1 + y_2). We would like the EB estimator to be such that the sum of the estimates for the two halves is the same as the estimate for the whole. The only general way to satisfy this equality is to make weight_half = weight_whole. The assumption φ_i = φL^0 = φ implies that weight_half = (1 + (p/2)/φ)^(-1) and weight_whole = (1 + p/φ)^(-1). Because weight_half ≠ weight_whole, the sum of the estimates for the two halves of a road section is not the same as the estimate for the whole. Under the assumption that φ_i = φ the EB estimates are inconsistent when the two halves have different q's. The alternative assumption that φ_i = φL does not have the same blemish. Now weight_half = [1 + (p/2)/(φL/2)]^(-1) = [1 + p/(φL)]^(-1) = weight_whole. Since the two weights are now

the same, the sum of the estimates for the two halves is the same as the estimate for the whole.

That in our thought we divided the road section into halves only made the algebra transparent. Division into unequal parts would lead to the same conclusion. It appears that if it is not true that all subsections of a road that is homogeneous in represented traits have the same expected number of accidents per unit length, then use of the assumption that $\phi_i = \phi$ would lead to inconsistent EB estimates.

6. Summary and discussion

In NB regression it is common to assume that the overdispersion parameter is the same for all entities. Experience with statistical modelling of accidents on road sections indicates that this assumption may have undesirable consequences. When road sections in the data base differ in length, two problems arise. First, if model parameters are estimated by maximum likelihood, the relative influence of long road sections is much diminished whereas very short road sections exert an unduly large influence. Second, when the expected accident frequency of a road section is estimated by the EB method and the entire road section does not have the same $q$, the sum of estimates for subsections does not add up to the estimate for the road section as a whole.

An alternative is to consider overdispersion for road section $i$ to be given by $\phi L_i$. This has been motivated by the argument that each road section can be viewed as consisting of many subsections that differ in traits that are not represented in the model and that, therefore, these subsections have different $q$'s drawn independently from the same Gamma distribution. The use of this assumption has been shown to remedy both problems.

It can be argued that reality is somewhere between the unsatisfactory assumption that $\phi_i = \phi L_i^0$ and the logically-pleasing-but-not-exactly-true assumption that $\phi_i = \phi L_i^1$. It is therefore tempting to consider $\phi_i = \phi L_i^{\beta_i}$ with $\beta_i$ in the range (0, 1). Two problems arise. First, any $\beta_i \neq 1$ will give rise to the problem that when the $q$ of a road section is not constant, the sum of EB estimates for parts will not be the EB estimate for the whole. Second, examination of residuals may give an illusory confirmation of incorrect assumptions about $\beta_i$. The problem is that a value of $\beta_i$ close to 0 will give little weight to data points with large $p$. This will make for large squared residuals at these data points. Large squared residuals for these data points will seem to confirm the assumption that $\beta_i$ is close to 0. Had a $\beta_i$ close to 1 been assumed, the data points where $p$ is large would have been more influential. The resulting parameter estimates would make these residuals smaller, and thereby seem to confirm the assumption of $\beta_i$ being close to 1. Whether the usual maximum likelihood estimation can overcome this built-in tendency needs to be explored.

The recommendation here is not to regard the overdispersion parameter $\phi$ as a constant common to all entities, at least not without justification that is based on the data. At this time, the relationship $\phi_i = \phi L_i$ seems to be the more attractive alternative. In this relationship $L_i$ is a logical variable. It represents the logical requirement that if a road section is made up of two non-overlapping sub-sections the sum of the model predictions for the two subsections should be the same as the model prediction for the road section as a whole. This can be true only if the model prediction is proportional to section length. At times, section length is used in models not only as a logical variable but also as a statistical covariate. Since the length of a road section is determined mainly by the distance between major intersections, it is entirely possible that section length is correlated with speed, driveway density, etc., and therefore legitimately serves as a proxy for several causal variables. Logical variables and statistical covariates must be kept separate. Thus, e.g. should parameter estimation lead to the result that accident frequency is proportional to $(\text{section length})^{0.79}$, this is best written as proportionality to $L \times L^{-0.21}$ in which $L$ is the logical variable and $L^{-0.21}$ the covariate which serves as proxy for causal variables associated with section length.

Would the generalization suggested by Cameron and Trivedi (1998) to use $\phi_i = \phi p_i^k$ with $k$ set to either 0 or 1 be a remedy? If $k$ is set to be 0 then we have the standard NB model and both noted deficiencies apply. If $k$ is chosen to be larger than 0 but less than 1, then $\mathrm{weight}_{part} \neq \mathrm{weight}_{whole}$ and the potential for estimation inconsistency discussed in Section 5 remains. When $k$ is set to be 1, then, in Eq. (2), $\mathrm{weight}_i = (1 + 1/\phi)^{-1}$. This would remove the estimation inconsistency. Unfortunately, use of a weight that does not diminish as $p_i$ increases would deprive EB estimation of its essence. The attraction of EB estimation is that it correctly accounts for differences in the precision of two separate sources of information. When $p_i$ is large, most of the information about the expected accident frequency for the specific site should come from the accident count for that site, not the model prediction. Therefore $\mathrm{weight}_i$ should be small. Conversely, when $p_i$ is small the accident count will be small and therefore much of the information should come from the model prediction. In this case $\mathrm{weight}_i$ should be large. It follows that the suggested generalization does not solve the problem.

Heydecker and Wu (1999) suggest that $\phi$ itself be modelled as a function of the various covariates and a new set of parameters. This avenue can be pursued provided that the structure $\phi_i = L_i\, f(\text{covariates}_i)$ is

preserved. However, since we suspect that in reality the $q$'s for subsections of the same road are neither uncorrelated nor perfectly correlated, the fundamental remedy is in research that might provide details about the variation of $q$'s among spatially proximate and spatially distant road sections.

To make our point clearly, the numerical example on which Section 2 was based made use of one very short and one very long road section. The question arises whether in real data sets which consist of a mixture of road section lengths, the assumption made about the overdispersion parameter would have a noticeable effect on the results of modelling. A related practical question is whether models that have been estimated in the past on the assumption of a single common overdispersion parameter should be revisited. These questions can be answered fully only by modelling several real data sets using both assumptions about the overdispersion parameter. It is a task for future research. However, one issue can be settled now. The value of the overdispersion parameter under the differing assumptions is sure to be different. Therefore, if any use is to be made of the overdispersion parameter, the models need to be re-estimated.

Another question is whether the two alternative assumptions about the dispersion parameter will have a noticeable effect on the EB estimation of the expected accidents. This question can be answered affirmatively. The EB estimate depends on the model prediction ($p$) and the weight (Eq. (2)). Even if the model prediction was affected only mildly by the alternative assumptions, the weight under the two assumptions will, as a rule, be substantially different. This may open another line of empirical inquiry. After fitting a model to data using both assumptions, one can then compare the performance of the two alternative EB estimators.

References

Bonneson, J.A., McCoy, P.T., 1993. Estimation of safety at two-way stop-controlled intersections on rural highways. Transportation Research Record 1401, 83–89.
Cameron, A.C., Trivedi, P.K., 1998. Regression Analysis of Count Data. Cambridge University Press, Cambridge.
Guo, G., 1996. Negative multinomial regression models for clustered event counts. Sociological Methodology 26, 113–132.
Hauer, E., 1997. Observational Before–After Studies in Road Safety. Pergamon, Oxford, UK.
Hauer, E., Ng, J.C.N., Lovell, J., 1989. Estimation of safety at signalized intersections. Transportation Research Record 1185, 48–61.
Hausman, J., Hall, B.H., Griliches, Z., 1984. Econometric models for count data with an application to the patents-R&D relationship. Econometrica 52 (4), 909–938.
Heydecker, B.G., Wu, J., January 1999. Identification of sites for road accident remedial work by Bayesian statistical methods: an example of uncertain inference. The Fifth International Conference on the Application of Artificial Intelligence to Civil and Structural Engineering, Oxford.
Maher, M., Summersgill, I., 1996. A comprehensive methodology for the fitting of predictive accident models. Accident Analysis and Prevention 28 (6), 281–296.
Maycock, G., Hall, R.D., 1984. Accidents at four-arm roundabouts. Laboratory Report LR 1120, Transport Research Laboratory, Crowthorne, Berkshire, UK.
Miaou, S.-P., 1996. Measuring the Goodness of Fit of Accident Prediction Models. FHWA-RD-96-040. FHWA, McLean, VA.
Miaou, S.-P., Lum, H., 1993. Modelling vehicle accidents and highway geometric design relationships. Accident Analysis and Prevention 25 (6), 689–709.
Vogt, A., Bared, J., 1998. Accident models for two-lane rural segments and intersections. Transportation Research Record 1635, 18–29.
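
The consistency argument of Section 5 can be checked numerically. The following is a minimal sketch, not part of the original analysis: it applies Eqs. (2) and (3) to a road split into two halves, once with a common overdispersion parameter and once with the overdispersion parameter set proportional to section length. The function names and all numerical values (model prediction, counts, overdispersion parameter `phi`, length) are hypothetical and chosen only for illustration.

```python
def eb_weight(p, phi):
    """Weight of Eq. (3): 1 / (1 + p / phi)."""
    return 1.0 / (1.0 + p / phi)


def eb_estimate(p, y, phi):
    """EB estimate of Eq. (2): weight * p + (1 - weight) * y."""
    w = eb_weight(p, phi)
    return w * p + (1.0 - w) * y


def halves_vs_whole(p, y1, y2, phi, length, per_unit_length):
    """Return (sum of EB estimates for the two halves, EB estimate for the whole).

    per_unit_length=False: every section uses the same phi.
    per_unit_length=True:  a section of length L uses phi * L.
    """
    phi_whole = phi * length if per_unit_length else phi
    phi_half = phi * length / 2.0 if per_unit_length else phi
    # Each half has model prediction p / 2 and its own accident count.
    halves = (eb_estimate(p / 2.0, y1, phi_half)
              + eb_estimate(p / 2.0, y2, phi_half))
    whole = eb_estimate(p, y1 + y2, phi_whole)
    return halves, whole


# Hypothetical inputs: prediction p for the whole road, counts on each half.
p, y1, y2, phi, length = 4.0, 1, 6, 2.0, 3.0

common = halves_vs_whole(p, y1, y2, phi, length, per_unit_length=False)
scaled = halves_vs_whole(p, y1, y2, phi, length, per_unit_length=True)

print("common phi   : halves = %.3f, whole = %.3f" % common)
print("phi times L  : halves = %.3f, whole = %.3f" % scaled)
```

With these illustrative numbers the common-parameter assumption gives different values for the sum over halves and for the whole road, whereas the length-proportional assumption gives the same value for both, in line with the algebra of Section 5.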
