You are on page 1of 4

Real-time Highway Traffic Accident Prediction Based on

the k-Nearest Neighbor Method



Yisheng Lv
Institute of Automation
Chinese Academy of Sciences
Beijing, China
E-mail: lvyisheng@gmail.com
Shuming Tang
Shandong University of Science and Technology
Qingdao, China
E-mail: sharron@ieee.org
Hongxia Zhao
Institute of Automation
Chinese Academy of Sciences
Beijing, China
E-mail: hongxia.zhao@ia.ac.cn


AbstractThe occurrence of a highway traffic accident is
associated with the short-term turbulence of traffic flow. In
this paper, we investigate how to identify the traffic accident
potential by using the k-nearest neighbor method with real-
time traffic data. This is the first time the k-nearest neighbor
method is applied in real-time highway traffic accident
prediction. Traffic accident precursors and their calculation
time slice duration are determined before classifying traffic
patterns. The experimental results show the k-nearest neighbor
method outperforming the conventional C-means clustering
method.
Keywords-real-time accident prediction; highway accident
prediction; k-nearest neighbor method; real-time traffic data;
pattern classification
I. INTRODUCTION
Many studies have tried to identify the relationship
between traffic accident occurrences and traffic conditions.
But little effort has been made to study the relationship
between traffic accident occurrences and dynamic traffic
flow characteristics described by real-time traffic flow
variables.
Oh et al. [1] developed a Bayesian classifier to classify
traffic patterns as leading or not leading to collision and used
5-min standard deviation of speed as the indicator. Lee et al.
[2] developed a log-linear model to predict crashes by
estimating crash precursors from loop detector data. In a later
study, the same authors [3] refined the aforementioned
model and proposed a method to determine observation time
slice duration prior to crash occurrence. Abdel-Aty et al.
developed a matched case-control model to model the crash
potential based on traffic loop data and the rain index [4].
Luo and Garber [5] found that the C-means clustering
method, the Nave-Bayes method and the Discriminant
Analysis was only able to identify the patterns leading to
collision with an overall classification error rate at about
50%, and they analyzed reasons for the unsuccessful
identification of the patterns leading to collision. Qi et al. [6]
used the random effects ordered probit model in predicting
freeway accident likelihood.
This research aims to predict the highway traffic accident
potential by identifying hazardous traffic conditions and
normal traffic conditions with real-time traffic data.
Hazardous traffic conditions mean traffic patterns leading to
traffic accidents and normal traffic conditions are traffic
patterns not leading to traffic accidents. The k-nearest
neighbor method (k-NN) and the C-means clustering method
(CM) are used to identify traffic patterns. This is the first
time the k-nearest neighbor method is applied in real-time
highway traffic accident prediction. The k-NN method is
easily implemented and tested and fast enough [7, 8] to
classify traffic conditions in Advanced Traffic Management
Systems (ATMS); the CM method has non-linear non-
parametric characteristics and requires the least knowledge
about the study data set [9, 10]. In addition, new
determination methods of calculation time slice duration of
accident precursors is used on the basis of experimental
results in this study.
The rest of this paper is organized as follows. Section 2
describes the framework of real-time highway traffic
accident prediction. Section 3 introduces the determination
methods of accident precursors and their calculation time
slice durations, the k-NN method and the CM method.
Section 4 presents the model application and the results are
described and analyzed. Section 5 gives the conclusions.
II. FRAMEWORK OF REAL-TIME HIGHWAY TRAFFIC
ACCIDENT PREDICTION
The framework of highway traffic accident forecast
based on real-time traffic flow data is shown in Fig. 1, which
has two parts, the design of classifiers (as shown in the left
part of Fig. 1.) and its application (as shown in the right part
of Fig. 1.). The designed classifier is used to identify
hazardous traffic conditions and normal traffic conditions,
2009 International Conference on Measuring Technology and Mechatronics Automation
978-0-7695-3583-8/09 $25.00 2009 IEEE
DOI 10.1109/ICMTMA.2009.657
547
meanwhile, with the number of samples increasing, the
classifier may be redesigned and accident precursors may be
reselected, which can improve the identification accuracy.
The work flow of the classifier design is as follows: firstly,
the traffic accident and corresponding real-time traffic data
are collected and preprocessed in order to obtain samples;
secondly, accident precursors are chosen; thirdly, the
classifier is designed, and then applied to identify traffic
accident potential. This framework is proposed from the
perspective of pattern recognition, and it considers feedback
to dynamically select accident precursors and design the
classifier.

Figure 1. Framework of real-time highway traffic accident prediction
III. METHODOLOGY
A. Selection of traffic accident precursors
In this paper, traffic accident precursors are defined as
traffic flow measures that can represent changes between
normal traffic conditions and hazardous traffic conditions.
Traffic accident precursors are the input features. Accident
precursors are selected based on the separability criterion of
the average Euclidean distance between classes, which is a
common separability criterion [9]. The average distance
between normal traffic conditions and hazardous traffic
conditions was calculated as

Where:
: the number of samples;

: feature vector of normal traffic conditions;

: feature vector of hazardous traffic conditions.


Note: During the classifier design,

and

should be
observed under the same conditions, such as weather and
road segment. The chosen precursors should maximize the
average distance between classes.
Time slice calculation () is another important factor
that significantly impacts the traffic accident prediction
because accident precursors are dependent of it. Most
previous studies arbitrarily chose the prior to crash
occurrence on the basis of subjective judgment. was
determined in [3] with a more objective method, which
considered distributions of accident precursors. The chosen
criterion of should maximize the difference of precursor
values for hazardous traffic conditions and normal traffic
conditions [3].The method to estimate in this study is one
that maximizes the sum of squares of differences between
precursors calculated from those two traffic conditions,
which is


Where:
: difference of accident precursor values between two
traffic conditions;
: the number of hazardous (or normal) traffic condition
samples;

: precursor values for an interval right before an


accident occurrence;

: precursor values for an interval 50 minutes


before an accident occurrence.

is the precursor value for hazardous traffic


conditions, and

is the precursor value for normal


traffic conditions. The reason for choosing the data 50
minutes before an accident occurrence to calculate

is
that, it is believed that traffic conditions 50 minutes before
an accident cannot affect the accident occurrence [1].
B. Design of hazardous traffic conditions and normal
traffic conditions classifier
1) k-nearest neighbor method.
The k-nearest neighbor method is a simple classification
method. The classification decision rule of the method is to
assign an observation to the class that receives the largest
vote amongst the k nearest neighbors [7].
The best choice of k depends on data. Generally speaking,
larger values of k can make classification results reliable, but
make boundaries between classes less distinct. This forces us
to choose a compromise k that is a small fraction of the
number of samples [10]. Details of the k-NN method can be
found in [9]. Based on numerical testing, k = 1 and k = 5
were used in this study.
2) C-means clustering method.
The C-means clustering method assigns the observation
to the cluster whose centroid is the nearest [9]. The objective
function of the CM method is expressed as


Where:

: number of observations in cluster ;

: centroid of cluster ;
: number of clusters in the data set;

: observation vector.
It can be seen from the objective function that the
criterion is to minimize the distances of the observations
New
Samples
Prediction
Model
Historical traffic accidents and
corresponding real-time traffic
data
Data reduction and
preprocessing
Samples
Traffic accident precursor
selection
Classifier design
(or Prediction model building)
Real-time traffic data
collection
Preprocessing data
Calculating accident
precursors
Classification and decision
(or Prediction and decision)
548
within the cluster to the cluster centroid. The main advantage
of the CM method is its simplicity. There is no need for data
training and testing. However, it has its own limitations.
Firstly, it may not achieve a global minimum. Secondly, the
classification result is sensitive to initial randomly chosen
centroids.
IV. MODEL APPLICATION
A. Data collection
For this paper, traffic accident data and its corresponding
real-time traffic data obtained from inductive loop detectors
were collected from the simulation software TSIS. The study
section used for collecting data was about 3.2 km long. This
section had three lanes with six stations spaced at about 0.5
km, as shown in Fig. 2. Traffic volume, speed, and
occupancy were collected every 20 seconds from three lanes.
The location and time of traffic accident occurrences were
designated by simulation parameters. 50 traffic accidents and
their matching traffic data within 60 minutes before an
accident occurrence on a dry road were collected from the
simulation. Note: the data were only collected from the
environment described above, so external effects on accident
occurrence such as road geometry and weather effects were
well controlled.
B. Selection of accident precursors
We chose six variables as candidate accident precursors.
These variables were the means and the standard deviations
(SD) of traffic volume, speed and time headway.

and


were assumed to be precursor values and were recorded 5-
min period 50minutes before an accident occurrence and 5-
min period right before an accident occurred, respectively. If
raw traffic data is directly used to determine precursors,
variables with large values may dominate those with small
values, and they would be more easily chosen as accident
precursors. So we first linearly scaled the mean and standard
deviation of traffic volume, speed and time headway to the
range [0, 1], respectively. Then we used (1) and (2) to
determine accident precursors and .The results were as
follows:
If one variable was selected as an accident precursor, it
was the SD of time headway (SDTH); if two accident
precursors were selected, they were the SDTH and SD of
speed (SDS); if three variables were chosen, they were the
SDS, SDTH and SD of traffic volume (SDV). And the best
calculation time slice duration for standard deviation of time
headway, speed and volume were 5 minutes, 2 minutes and 1
minute respectively based on (2), as shown in Fig. 3.
Table 1 shows the classification results. Firstly, the k-NN
method outperforms the CM model. As the number of
accident precursors increases, the classification result for the
k-NN method becomes more satisfying. Secondly, the
classification results of the CM method are not satisfying.
Their overall error rate is about 50% and the error rate of
assigning normal traffic conditions to hazardous traffic
conditions is about 90% which will produce a very high false
alarm rate. Thirdly, the overall error rate is always around 50%
based on one single variable no matter which method is
applied, which indicates that we cannot identify hazardous
traffic conditions from normal traffic conditions based on
one single variable. This result is also consistent with what
found in [5].

(a) Study site and location of detector station

(b) Simulation environment in TSIS
Figure 2. Study site used in this research

(a) SD of time headway

(b) SD of speed

(c) SD of volume
Figure 3. Determination of calculation time slice durations
0
50
100
150
200
250
! ? 3 1 b 8 9 !0
D
i
f
f
e
r
e
n
c
e

(
%
)
Calculation time slice duration(minutes)
0
!00
?00
300
! ? 3 1 b 8 9 !0
D
i
f
f
e
r
e
n
c
e

(
%
)
Calculation time slice duration(minutes)
0
200
400
600
! ? 3 1 b 8 9 !0
D
i
f
f
e
r
e
n
c
e

(
%
)
Calculation time slice duration(minutes)
549
TABLE I. CLASSIFICATION RESULTS BASED ON DIFFERENT ACCIDENT
PRECURSORS AND DIFFERENT CLASSIFICATION METHODS
Accident
Precursors
Methods
OER
(%)
NDER
(%)
DNER
(%)
5-min SDTH
CM 49 92 6
k-NN
k = 1 65 70 60
k = 5 50 70 30
5-min SDTH,
2-min SDS
CM 49 88 10
k-NN
k = 1 30 10 50
k = 5 35 30 40
5-min SDTH,
2-min SDS,
1-min SDV
CM

49 88 10
k-NN
k = 1 25 20 30
k = 5 30 40 20
The DNER of the 5-nearest neighbor method with three
variables is 20% in this study, which means 80% of
hazardous traffic conditions are identified by the 5-nearest
neighbor method with three variables. 59% of crashes were
identified in [4], where a matched case-control model was
used. We only use traffic conditions to identify potential
traffic accidents, so the 80% accident identification accuracy
is considered reasonable.
V. CONCLUSIONS
This research tries to identify traffic conditions leading to
traffic accidents more likely by using the k-nearest neighbor
method, while considering the joint effects of accident
precursors on traffic accident occurrences and controlling the
geometry, environmental factors, etc. The data used in this
study were collected from the simulation software TSIS.
The main contributions of this study are concluded as
follows:
A real-time highway traffic accident forecast framework
is presented. This framework is proposed from the
perspective of pattern recognition. It considers feedback to
dynamically select accident precursors and design the
classifier.
A new method of determining calculation time slice
duration of precursors, which maximizes the sum of squares
of differences between precursors of two traffic conditions,
is proposed. 80% of hazardous traffic conditions are
identified by the 5-nearest neighbor method with three
variables. And the k-nearest neighbor method outperforms
the C-means clustering model.
The experiment demonstrates that it is promising to use
the k-nearest neighbor method in real-time traffic accident
prediction. Real-time traffic accident prediction algorithm
can be integrated into ATMS. It is believed that real-time
traffic accident prediction can play an important part in
preventing traffic accident occurrence and improving the
safety and efficiency of transportation systems.
ACKNOWLEDGMENT
This work was supported in part by the MOST Grants
2006CB705500 and 2007AA11Z217, NNSFC Grant
60621001, CAS Grant 2F07N04, and Shandong Province
Taishan Chair Professor Fund.

REFERENCES
[1] C. Oh, J. Oh, S. G. Ritchie, and M. Chang, "Real-time Estimation of
Freeway Accident Likelihood," in the 80th Annual Meeting of
Transportation Research Board, Washington, D.C., 2001.
[2] C. Lee, F. Saccomanno, and B. Hellinga, "Analysis of Crash
Precursors on Instrumented Freeways," Transportation Research
Record: Journal of the Transportation Research Board, No.1784, pp.
18, 2002.
[3] C. Lee, B. Hellinga, and F. Saccomanno, "Real-time-crash prediction
model for application to crash prevention in freeway traffic," in
Statistical Methods and Modeling and Safety Data, Analysis, and
Evaluation Washington: Natl Acad Sci, 2003, pp. 67-77.
[4] M. A. Abdel-Aty and R. Pemmanaboina, "Calibrating a real-time
traffic crash-prediction model using archived weather and ITS traffic
data," IEEE Transactions on Intelligent Transportation Systems, vol.
7, pp. 167-174, 2006.
[5] L. Luo and N. J. Garber., "Freeway Crash Predictions Based on Real-
Time Pattern Changes in Traffic Flow Characteristics," Office of
University Programs, Research Innovation and Technology
Administration, U.S. Department of Transportation UVACTS-15-0-
101, 2006.
[6] Y. Qi, B. L. Smith, and J. H. Guo, "Freeway accident likelihood
prediction using a panel data analysis approach," Journal of
Transportation Engineering-Asce, vol. 133, pp. 149-156, 2007.
[7] S. M. Tang and H. J. Gao, "Traffic-incident detection-algorithm based
on nonparametric regression," IEEE Transactions on Intelligent
Transportation Systems, vol. 6, pp. 38-42, 2005.
[8] M. J. Islam, Q. M. J. Wu, M. Ahmadi, and M. A. Sid-Ahmed,
"nvestigating the performance of Naive- Bayes classifiers and K-
nearest neighbor classifiers," in 2nd International Conference on
Convergent Information Technology Gyongju, South Korea: Institute
of Electrical and Electronics Engineers Computer Society, Piscataway,
NJ 08855-1331, United States, 2007.
[9] Z. Bian, X. Zhang, and etc., Pattern recognition, 2nd ed. Beijing:
Tsinghua University Press, 2000.
[10] R. Duda, P. Hart, and D. Stork, Pattern Classification, Second
Edition. New York: John Wiley & Sons, Inc., 2001.

550

You might also like