Train Accidents Report

Analysis of Train Accidents in the U.S. During 2001-2012
Imran A. Khan
Summary
Many factors influence the severity of rail accidents. Based on my findings, season is an important factor
associated with more deaths: more fatalities occur during the summer. The estimated increase in fatalities
during summer is about 0.22, with a 95% confidence interval between 0.04 and 0.4. The FRA should therefore
apply extra safety measures for trains operating during the summer. Type of accident and cause of accident
significantly affect damage cost at the 5% level. A train accident at an RR grade crossing is associated with
higher damage cost, so stronger safety measures at RR grade crossings can reduce the severity of damage.
The FRA should also train its personnel well in safety in order to minimize human error.
Honor pledge: On my honor, I pledge that I am the sole author of this paper and I have accurately cited all
help and references used in its completion.
Imran A. Khan
1. Problem Description
1.1. Situation
According to Federal Railroad Administration (FRA) data from 2001-2012 [3], about 2,500 train
accidents occur annually in the U.S. These incidents cause injuries ranging from moderately severe to
fatal, as well as property damage. Many of the incidents involve little damage cost and no deaths. The
typical train accident in these 12 years involves no injuries, as shown in Figure 1. I observe that more
than 13% of the accidents lead to high damage cost and about 1% involve fatalities. These are the extreme
values above the upper whisker of the boxplots, and I consider them severe accidents.
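The upper-whisker rule used above to flag severe accidents can be sketched as follows. This is a minimal illustration, assuming the common convention that the whisker ends at the largest observation not exceeding Q3 + 1.5 x IQR. The function names and the sample damage values are hypothetical, and R's boxplot computes its hinges slightly differently from the quartile method used here, so thresholds may not match the report exactly.

```python
from statistics import quantiles


def upper_whisker(values):
    """Return the largest observation not exceeding Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    return max(v for v in values if v <= fence)


def severe_cases(values):
    """Return the observations that lie strictly above the upper whisker."""
    whisker = upper_whisker(values)
    return [v for v in values if v > whisker]


# Hypothetical damage figures in dollars (not taken from the FRA data).
damages = [12_000, 15_000, 18_000, 20_000, 22_000, 25_000, 30_000, 900_000]
print(upper_whisker(damages))
print(severe_cases(damages))
```

With these made-up values, only the single very large observation is flagged as severe.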
Figure 1. Boxplot of severity metrics
[Figure: two panels, "Boxplot of Total Damage" (Total Damage, x$1,000,000) and "Boxplot of Fatalities" (Fatalities).]
For a given accident, many factors come into play. Table 1 shows that human error is the most common cause,
accounting for 34% of train accidents, followed by track, roadbed and structures failures at 32%.
Derailment is the most common type of accident and accounts for the largest share of damage cost (Table 2),
but it causes very few fatalities. Only 7% of train accidents occur at highway-rail crossings, yet they
account for most of the fatalities (85%).
Table 1. Frequency table for severity metrics by type of cause

                                              ------------- # -------------------   ------------ % -----------
Type of cause                                 Total Number  Fatalities   Damage ($)  Total Number  Fatalities  Damage
Mechanical and Electrical Failures (E)                3617           -   561,518,909           12%          0%     17%
Train operation Human Factors (H)                    10655          35   756,877,988           34%          7%     23%
Miscellaneous Causes (M)                              6327         452   625,508,542           20%         90%     19%
Signal and Communication (S)                           589           -    24,838,419            2%          0%      1%
Track, Roadbed and Structures (T)                     9785          16 1,377,074,145           32%          3%     41%
(- : value not shown)
Table 2. Frequency table for severity metrics by type of accident

                              ------------- # -------------------   ------------ % -----------
Type of accident              Total Number  Fatalities   Damage ($)  Total Number  Fatalities  Damage
Derailment (1)                       20694          22 2,552,192,579           67%          4%     76%
Head on collision (2)                   99           -    83,868,583            0%          2%      3%
Rear-end collision (3)                 218           -    66,591,167            1%          1%      2%
Side collision (4)                    1221           -   121,312,370            4%          1%      4%
Raking collision (5)                   500           -    26,922,342            2%          0%      1%
Broken train collision (6)              62           -     4,455,089            0%          0%      0%
Highway-rail crossing (7)             2309         428   160,160,623            7%         85%      5%
RR Grade Crossing (8)                    -           -     7,324,022            0%          0%      0%
Obstruction (9)                        728          20    59,304,105            2%          4%      2%
Explosive (10)                          12           -    18,268,351            0%          0%      1%
Fire (11)                              231           -    37,921,122            1%          0%      1%
Other impacts (12)                    3428           -   134,453,596           11%          1%      4%
Others (13)                           1469          11    73,044,054            5%          2%      2%
(- : value not shown)
Figure 2. Boxplot of severity metrics vs. season
[Figure: two panels, "Total Damage vs. Season" and "Fatalities vs. Season"; x-axis: Season (Spring, Summer, Autumn, Winter).]
Figure 3. Boxplot of severity metrics vs. type of accident
[Figure: two panels, "Total Damage vs. Type of Accident" and "Fatalities vs. Type of Accident"; x-axis: Type (1 to 13).]
Figure 4. Boxplot of severity metrics vs. cause of accident
[Figure: two panels, "Total Damage vs. Cause of Accident" and "Fatalities vs. Cause of Accident"; x-axis: Cause.]
This study focuses on the severity of train accidents, i.e. incidents that lead to more fatalities and
higher damage cost, and on how severity can be reduced. Summer appears to lead to more fatalities than the
other seasons, while total damage is similar across the four seasons (Figure 2). Different types of accident
lead to different levels of severity. Among severe accidents, explosive detonation appears to be the type
associated with the highest damage cost (Figure 3). Cause of accident is another factor that affects damage
cost (Figure 4).
Figures 5 and 6 show a relationship between train speed and the severity metrics: higher speed tends to go
with more fatalities and higher damage cost. The plots also suggest that severity is lower when more people
are evacuated. Gross tonnage of the train (TONS) and number of head end locomotives (HEADEND1) are other
factors related to the severity metrics.
Figure 5. Scatterplot matrix between fatalities (TOTKLD) and the other quantitative variables
[Figure: scatterplot matrix of TOTKLD, TRNSPD, EVACUATE, HEADEND1, and TONS.]
Figure 6. Scatterplot matrix between total damage (ACCDMG) and the other quantitative variables
[Figure: scatterplot matrix of ACCDMG, TRNSPD, EVACUATE, HEADEND1, and TONS.]
The biplots in Figure 7 show that many factors are related to fatalities and damage cost, since many
vectors point in the same direction as TOTKLD and ACCDMG. This agrees with the scatterplot matrices in
Figures 5 and 6, and the variables behind those vectors are potential drivers of the severity metrics.
Figure 7. Biplot for human and cost damage
[Figure: two biplots, "Fatalities" and "Total Damage"; axes: Comp.1 and Comp.2; vectors: TOTKLD or ACCDMG together with CARS, CARSDMG, CARSHZD, EVACUATE, TEMP, TRNSPD, TONS, HEADEND1, Latitude, and Longitud.]
1.2. Goal
The purpose of this study is to provide recommendations that the FRA can act on to reduce the severity of
railroad accidents in terms of fatalities and total damage.
1.3. Metrics
I use multiple linear regression models to model the severity metrics. I consider several potential
factors, such as type of accident, cause of accident, and season, as predictors of the severity of rail
accidents. I use a significance level of 5% throughout this study. If the p-value is less than 0.05, the
null hypothesis is rejected in favor of the alternative; if the p-value is greater than 0.05, the null
hypothesis is not rejected. I also use the adjusted R2, AIC, and BIC criteria to compare models.
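The comparison criteria named above can be sketched as follows. This is a minimal illustration, assuming a linear model with normal errors and the usual likelihood-based definitions of AIC and BIC. The function names are hypothetical, and the parameter count k is assumed to include every estimated coefficient plus the error variance, which may or may not match the convention used in the report.

```python
import math


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p slope coefficients (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def gaussian_log_likelihood(rss, n):
    """Maximised log-likelihood of a linear model with normal errors."""
    return -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)


def aic(rss, n, k):
    """AIC = -2 logL + 2k, where k counts every estimated parameter."""
    return -2.0 * gaussian_log_likelihood(rss, n) + 2.0 * k


def bic(rss, n, k):
    """BIC = -2 logL + k log(n); it penalises extra parameters more than AIC for n > 7."""
    return -2.0 * gaussian_log_likelihood(rss, n) + math.log(n) * k


# R2 = 29.6%, 391 cases and 15 slope coefficients, as reported for the
# reduced fatalities model in Appendix B.
print(round(adjusted_r2(0.296, 391, 15), 4))
```

The printed value is close to the adjusted R2 of 26.79% reported for that model; the small gap comes from the rounded R2 used as input.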
1.4. Hypothesis
Based on my observations in Figures 2, 3, and 4, I have three hypotheses regarding the severity of rail
accidents:
Hypothesis 1:
H0: Season does not lead to more deaths.
H1: Season leads to more deaths.
Hypothesis 2:
H0: Type of accident does not affect damage cost.
H1: Type of accident affects damage cost.
Hypothesis 3:
H0: Cause of accident does not affect damage cost.
H1: Cause of accident affects damage cost.
2. Approach
2.1. Data
The data set is obtained from the FRA railroad accident records for the period 2001-2012 [3]. In total,
there are 42033 accidents over the 12 years with 140 relevant variables. I find that 20% of the data points
are duplicates. I also find one data point with an extreme evacuation value in 2002: the report gives 50000
people evacuated in a single accident, which is very unlikely. I treat this value as a typo because it is far
larger than in any other case (Figure B1, Appendix B). I exclude the duplicates and this data point from the
analysis. I also exclude data points from September 11, 2001, because the chance of such an event recurring
is almost zero and its damage cost is far higher than in the other cases. This leaves 30973 data points for
this study.
Since my interest is in severe accidents, only extreme cases above the upper whisker of the severity-metric
boxplots are taken into account, that is, accidents with at least one fatality (for the fatalities model) and
accidents with damage cost of at least $143,861 (for the total damage model). I use all potential predictor
variables, including confounding variables, in the initial models. In total, there are 21 predictors: 10
continuous variables and 11 categorical variables, including SEASON, which I create as a new variable
(Table A1, Appendix A). I remove cases with missing values because the methodology requires complete
observations. In total, I use 391 cases to model fatalities (TOTKLD) and 2954 to model total damage (ACCDMG).
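The derived SEASON variable can be illustrated with a short sketch. It assumes the month ranges given in Appendix A (spring: March to May, summer: June to August, autumn: September to November, winter: December to February). The function name is hypothetical, and the report does not state which date field of the FRA data the month was taken from.

```python
def season_of(month):
    """Map a calendar month (1-12) to the season levels defined in Appendix A."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    if 3 <= month <= 5:
        return "Spring"   # base case in the regression models
    if 6 <= month <= 8:
        return "Summer"
    if 9 <= month <= 11:
        return "Autumn"
    return "Winter"       # December, January, February


print([season_of(m) for m in (1, 4, 7, 10, 12)])
```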
2.2. Analysis
In the modeling of severity metrics, I perform multiple linear regression analysis using R software with a
general model.
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon $$
The stages of data analysis are as follows:
1. I convert all categorical predictor variables into dummy variables. For example, TYPE has 13 levels,
and R automatically encodes these 13 levels as 12 dummy variables with derailment as the base
case. See Table A1 in the Appendix for the base case selected for each categorical variable.
2. I fit a simple linear regression for each of the 21 potential predictor variables. A continuous
predictor with p-value > 0.25 is not considered in the initial model (Full Model).
3. I reduce the full models by dropping all non-significant predictors (Reduced Model). I use a
partial F-test to check whether the smaller set of predictors can be retained.
4. I also perform an alternative model selection, i.e. a stepwise selection procedure, to select important
predictors from the full model (Step Model).
5. I then compare the reduced model and the step model by adjusted R2 and AIC to select the
best model. I cannot use cross validation for model comparison because some folds lack certain
levels of the factor variables, so the regression model cannot be fitted and evaluated consistently.
6. I introduce second order and interaction terms for the selected model.
7. I use graphical diagnostic plots to examine how well the regression assumptions are satisfied.
8. I transform the response variable if the regression assumptions are violated.
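The dummy coding in step 1 can be sketched in a few lines. This is an illustration only, not the encoding routine R uses internally; the function name and the sample CAUSE values are hypothetical.

```python
def dummy_encode(values, base):
    """Encode a categorical variable as 0/1 indicator columns, omitting the base level.

    Returns (column_levels, rows): one indicator per non-base level found in
    `values`, and one row of indicators per observation.
    """
    levels = sorted(set(values))
    if base not in levels:
        raise ValueError(f"base level {base!r} does not occur in the data")
    others = [level for level in levels if level != base]
    rows = [[1 if value == level else 0 for level in others] for value in values]
    return others, rows


# Hypothetical CAUSE values; "E" is the base case, so it maps to all zeros.
columns, rows = dummy_encode(["E", "H", "T", "H", "M"], base="E")
print(columns)
for row in rows:
    print(row)
```

Because the columns are built only from the levels present in the data, a subset of the data that lacks a level yields fewer columns. That is the situation step 5 describes for cross-validation folds.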
For the fatalities model, there are 10 predictor variables in the full model (Table B1, Appendix). This
model can be reduced by dropping 7 variables, i.e. TRNSPD, TONS, HEADEND1, TYPE, TYPTRK, TRKCLAS,
and CAUSE (partial F-statistic 0.913, p-value 0.5743). The BIC favors the step model, which has the
smaller BIC value (881.76), but the AIC and the adjusted R2 both favor the reduced model (Table B2,
Appendix). I then add second order and interaction terms to the reduced model and find that the second
order model does not fit better than the first order model: the partial F-test gives F = 1.39 with p-value
0.24. The interaction terms and the second order term of EVACUATE are therefore not important, and the
first order model is preferable. Diagnostic plots show that the fitted model moderately violates the
regression assumptions, although the residuals are generally scattered randomly across the range of fitted
values and the points generally fall around the line in the QQ plot (Figure B2, Appendix). Transforming
the response variable with the Box-Cox method does not improve the fit (Figure B3, Appendix), so the fitted
model without interaction and second order terms is chosen for ease of interpretation. Table 3 summarizes
the estimated coefficients (standard errors) and the corresponding p-values for the first and second order
models.
Table 3. Comparison of the first and second order models for fatalities

                                         First order model                Second order model
Term                                     Estimate (Std. Error)  P-value   Estimate (Std. Error)  P-value
(Intercept)                              1.07 (0.08)            <0.0001   1.07 (0.08)            <0.0001
EVACUATE                                 0.001 (0)              <0.0001   0 (0.002)              0.87
Type of consist (base case = TYPEQ 1)
  TYPEQ 2                                0.46 (0.09)            <0.0001   0.46 (0.09)            <0.0001
  TYPEQ 3                                0.1 (0.14)             0.46      0.09 (0.14)            0.51
  TYPEQ 4                                -0.3 (0.7)             0.67      -0.3 (0.7)             0.67
  TYPEQ 6                                -0.07 (0.7)            0.92      -0.07 (0.7)            0.92
  TYPEQ 7                                0.19 (0.35)            0.58      0.18 (0.35)            0.60
  TYPEQ 8                                0.02 (0.2)             0.93      0.01 (0.2)             0.95
  TYPEQ 9                                -0.3 (0.5)             0.55      -0.3 (0.5)             0.55
  TYPEQ A                                -0.3 (0.7)             0.67      -0.3 (0.7)             0.67
  TYPEQ C                                0.08 (0.35)            0.82      0.1 (0.35)             0.78
  TYPEQ D                                0.43 (0.41)            0.29      0.44 (0.41)            0.28
  TYPEQ E                                -0.07 (0.7)            0.92      -0.07 (0.7)            0.92
Season (base case: Spring)
  Summer                                 0.23 (0.1)             0.02      0.22 (0.1)             0.03
  Autumn                                 -0.07 (0.11)           0.51      -0.06 (0.11)           0.61
  Winter                                 0.03 (0.1)             0.74      0.03 (0.1)             0.80
EVACUATE^2                                                                4.7E-7 (1.8E-6)        0.80
EVACUATE x Summer                                                         0.003 (0.004)          0.43
EVACUATE x Autumn                                                         0.0001 (0.003)         0.97
EVACUATE x Winter                                                         -0.001 (0.009)         0.95
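The Box-Cox transformation tried for the response variable can be written out as a small sketch. It uses the standard one-parameter Box-Cox family; the function name is hypothetical, and how the lambda values quoted in Appendix B (-2 for fatalities, -0.5 for total damage) were estimated is not stated in the report.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform of a strictly positive response value."""
    if y <= 0:
        raise ValueError("Box-Cox requires a strictly positive response")
    if lam == 0:
        return math.log(y)          # limiting case as lambda approaches 0
    return (y ** lam - 1.0) / lam


# Lambda values quoted in the Figure B3 and Figure B5 captions.
print(box_cox(2.0, -2.0))   # a response of 2 fatalities with lambda = -2
print(box_cox(4.0, -0.5))   # an illustrative response value with lambda = -0.5
```

Both responses are strictly positive in the modeled subsets (at least one fatality, at least $143,861 of damage), so the transform is defined for every case.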
For the total damage model, 16 predictor variables are considered in the initial model (Table B1,
Appendix). Seven variables are not significant in the full model, so only 9 predictors are kept in the
reduced model, i.e. CARSHZD, EVACUATE, TRNSPD, TONS, TYPE, TRNDIR, REGION, TYPTRK, and CAUSE. A partial
F-test shows that dropping these variables does not significantly worsen the fit (F-statistic 1.74,
p-value 0.08). The BIC favors the reduced model, which has the smaller BIC value (87957.48). However, the
model from the stepwise selection procedure is selected as the best model, since its AIC is smaller and
its adjusted R2 is larger than those of the reduced model (Table B3, Appendix). As shown in Table 4, the
stepwise model extended with second order and interaction terms (Model 2) fits significantly better than
the first order stepwise model (p-value < 0.0001). Model 2 is then reduced by a further stepwise selection
procedure. Some interaction and second order terms can be dropped without loss of fit, so Model 3 is
retained (p-value = 0.99).
Table 4. Partial F-test

Model                                                        Res. Df   RSS        Df   Sum of Square   F-test   P-value
Model 1: Step model (first order model)                      2906      1.32E+15
Model 2: Step model including second order and               2883      1.24E+15   23   7.80E+13        7.89     <0.0001
         interaction terms (second order model)
Model 3: Model 2 after performing stepwise                   2886      1.24E+15   -    4.35E+10        0.03     0.99
         selection procedure
(- : value not shown)
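The partial F-statistic in Table 4 can be reproduced approximately from the tabulated quantities. This is a minimal sketch of the standard nested-model F-test; the function name is hypothetical, and the inputs are the rounded values printed in the table, so the result only approximates the reported statistic.

```python
def partial_f(sum_sq, df_diff, rss_larger, df_resid_larger):
    """Partial F-statistic for two nested linear models.

    sum_sq          -- reduction in residual sum of squares from the extra terms
    df_diff         -- number of extra parameters in the larger model
    rss_larger      -- residual sum of squares of the larger model
    df_resid_larger -- residual degrees of freedom of the larger model
    """
    return (sum_sq / df_diff) / (rss_larger / df_resid_larger)


# Model 1 vs. Model 2, using the rounded entries of Table 4.
f_stat = partial_f(sum_sq=7.80e13, df_diff=23, rss_larger=1.24e15, df_resid_larger=2883)
print(round(f_stat, 2))  # close to the 7.89 reported in Table 4
```

The p-value then follows from the upper tail of an F distribution with (23, 2883) degrees of freedom, for example via `scipy.stats.f.sf` if SciPy is available.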
The diagnostic plots show that the selected model (Model 3) moderately violates the regression
assumptions (Figure B4, Appendix). As with the fatalities model, transforming the response variable with
the Box-Cox method does not improve the fit (Figure B5, Appendix). Therefore, the fitted model with second
order terms and no transformation of the response variable is chosen for ease of interpretation. Table
5 summarizes the estimated coefficients, standard errors, and p-values.
Table 5. The selected second order model for total damage

Term                                          Estimate (Std. Error)   P-value
Intercept                                     746000 (393000)         0.058
CARSDMG                                       8660 (6180)             0.161
CARSHZD                                       110000 (31500)          <0.0001
EVACUATE                                      283 (102)               0.006
TRNSPD                                        -24100 (4930)           <0.0001
TONS                                          26.9 (4.12)             <0.0001
Type of accident (base case: derailment)
  Head on collision                           1470000 (134000)        <0.0001
  Rearend collision                           465000 (106000)         <0.0001
  Side collision                              251000 (73000)          0.001
  Raking collision                            111000 (140000)         0.429
  Broken train collision                      -199000 (221000)        0.369
  Hwy-rail crossing                           -302000 (82000)         <0.0001
  RR Grade Crossing                           6760000 (658000)        <0.0001
  Obstruction                                 242000 (123000)         0.049
  Explosive detonation                        266000 (658000)         0.686
  Fire / violent rupture                      -56300 (113000)         0.620
  Other impacts                               104000 (73000)          0.155
  Others                                      -47600 (111000)         0.667
Train direction (base case: north)
  South                                       -70600 (65200)          0.279
  East                                        -116000 (61000)         0.058
  West                                        -158000 (63900)         0.013
FRA designated region (base case: Region 1)
  Region 2                                    -71000 (107000)         0.509
  Region 3                                    -260000 (107000)        0.015
  Region 4                                    -133000 (103000)        0.196
  Region 5                                    -207000 (96700)         0.033
  Region 6                                    -196000 (98600)         0.046
  Region 7                                    81800 (107000)          0.443
  Region 8                                    -108000 (105000)        0.306
Type of consist (base case: TYPEQ NA)
  TYPEQ 1                                     -225000 (383000)        0.556
  TYPEQ 2                                     -18500 (392000)         0.962
  TYPEQ 3                                     562000 (417000)         0.177
  TYPEQ 4                                     -411000 (409000)        0.315
  TYPEQ 5                                     -426000 (440000)        0.333
  TYPEQ 6                                     83500 (398000)          0.834
  TYPEQ 7                                     -85300 (384000)         0.824
  TYPEQ 8                                     -54500 (401000)         0.892
  TYPEQ 9                                     -117000 (426000)        0.783
  TYPEQ A                                     -39800 (418000)         0.924
  TYPEQ B                                     560000 (610000)         0.359
  TYPEQ D                                     -502000 (611000)        0.411
Type of track (base case: Main)
  Yard                                        -83900 (64900)          0.196
  Siding                                      377000 (132000)         <0.0001
  Industry                                    -27800 (110000)         0.801
Cause of accident (base case: E)
  H                                           -340000 (76400)         <0.0001
  M                                           -12400 (76100)          0.871
  S                                           -214000 (195000)        0.271
  T                                           -253000 (68400)         <0.0001
CARSHZD^2                                     -2820 (1230)            0.022
TRNSPD^2                                      232 (42.3)              <0.0001
TONS^2                                        0 (0)                   0.032
Longitud^2                                    12.3 (3.92)             <0.0001
TRNSPD x South                                96.2 (2410)             0.968
TRNSPD x East                                 7690 (2290)             <0.0001
TRNSPD x West                                 7900 (2400)             <0.0001
TRNSPD x Region 2                             2290 (3780)             0.545
TRNSPD x Region 3                             19500 (3770)            <0.0001
TRNSPD x Region 4                             12600 (3550)            <0.0001
TRNSPD x Region 5                             15700 (3370)            0.000
TRNSPD x Region 6                             17300 (3400)            <0.0001
TRNSPD x Region 7                             9200 (3690)             0.013
TRNSPD x Region 8                             12100 (3690)            <0.0001
TRNSPD x Yard                                 -7370 (5980)            0.218
TRNSPD x Siding                               -25500 (7220)           <0.0001
TRNSPD x Industry                             -12200 (10200)          0.231
TRNSPD x H                                    16400 (2740)            <0.0001
TRNSPD x M                                    4740 (2490)             0.056
TRNSPD x S                                    4680 (10200)            0.646
TRNSPD x T                                    14800 (2190)            <0.0001
3. Evidence
I find that season is an important factor for fatalities. The partial F-test shows that season cannot be
eliminated from the model (F-statistic: 3.49, p-value: 0.016). The p-value for summer is 0.03, so there is
evidence at the 5% level to reject the null hypothesis. The coefficient indicates that the number of
fatalities is higher during summer: the estimated increase relative to spring, the base case, is about
0.22, with a 95% confidence interval between 0.04 and 0.4. Based on the final model for fatalities, I
observe that TYPEQ is another important factor associated with more deaths.
For total damage, cause and type of accident are important factors. Different causes and types of accident
lead to different damage cost, and the differences are statistically significant. At the 5% level, these
effects cannot be dropped from the model (F-statistic 5.82, p-value < 0.0001). Therefore, I reject the
null hypotheses that cause and type of accident do not affect total damage. Train speed and cause of
accident also have an interaction effect on damage cost (Figure B6, Appendix), meaning that the
relationship between total damage and cause of accident depends on the train speed. I observe that at high
train speed, human error leads to more damage cost. Furthermore, with the other factors held fixed, the
expected total damage for a severe accident is higher at an RR grade crossing, i.e. $7,506,000, and the
effect is highly significant.
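The $7,506,000 figure appears to be the sum of the intercept and the RR Grade Crossing coefficient from Table 5. A worked version, assuming every other predictor is at its base case or at zero (the symbols below are introduced here for illustration only):

```latex
\widehat{\mathrm{ACCDMG}}_{\text{RR grade crossing}}
  = \hat{\beta}_0 + \hat{\beta}_{\text{RR}}
  = 746{,}000 + 6{,}760{,}000
  = 7{,}506{,}000
```

Under the same assumption, the corresponding value for a derailment, the base case, is the intercept alone, $746,000.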
4. Recommendation
The evidence shows that several factors can lead to severe train accidents, including season, type of
accident, and cause of accident. The models selected to test my hypotheses explain about 25% of the
variation in the severity of rail accidents, based on the adjusted R2 (Tables B2 and B3, Appendix). At
the 5% level, the effect of season on fatalities is statistically significant: the estimated increase in
fatalities during summer is about 0.22, with a 95% confidence interval between 0.04 and 0.4. The effects
of type of accident and cause of accident on total damage are also significant. At the 5% level, these
factors cannot be eliminated from the model, so they are important to the severity of train accidents.
This confirms my findings from the plots in Figures 2, 3, and 4. The results suggest that the FRA should
put extra safety requirements in place for trains running during summer. Human errors are often
unavoidable, yet modeling the damage cost shows that human error is one of the most important factors
behind higher damage cost. The FRA should train its personnel well in safety so that human errors can be
minimized. In addition, it is important to strengthen safety for trains at RR grade crossings.
5. References
[1] D. E. Brown and L. Barnes, "Laboratory 1: Train accidents," August 2013, assignment in class SYS
4021.
[2] D. E. Brown and L. Barnes, "Laboratory 1: Train accidents template," August 2013, assignment in
class SYS 4021.
[3] Federal Railroad Administration, "Federal Railroad Administration Office of Safety Analysis," August 2012.
[Online]. Available: http://safetydata.fra.dot.gov/officesafety/
Appendix A
Table A1. Accident Description

No  Field Name  Type                  Description
1   TOTKLD      Response variable     Fatalities - total killed for railroads
2   ACCDMG      Response variable     Total reportable damage on all reports in $
3   CARS        Continuous variable   # of cars carrying hazmat
4   CARSDMG     Continuous variable   # of hazmat cars damaged or derailed
5   CARSHZD     Continuous variable   # of cars that released hazmat
6   EVACUATE    Continuous variable   # of persons evacuated
7   TEMP        Continuous variable   Temperature in degrees Fahrenheit
8   TRNSPD      Continuous variable   Speed of train in miles per hour
9   TONS        Continuous variable   Gross tonnage, excluding power units
10  HEADEND1    Continuous variable   # of head end locomotives
11  Latitude    Continuous variable   Latitude in decimal degrees, explicit decimal, explicit +/- (WGS84)
12  Longitud    Continuous variable   Longitude in decimal degrees, explicit decimal, explicit +/- (WGS84)
13  TYPE        Categorical variable  Type of accident:
                                      01=derailment (base case), 02=head on collision, 03=rearend collision,
                                      04=side collision, 05=raking collision, 06=broken train collision,
                                      07=hwy-rail crossing, 08=RR Grade Crossing, 09=obstruction,
                                      10=explosive detonation, 11=fire / violent rupture, 12=other impacts,
                                      13=other (described in narrative)
14  VISIBLTY    Categorical variable  Daylight period:
                                      1=dawn (base case), 2=day, 3=dusk, 4=dark
15  WEATHER     Categorical variable  Weather conditions:
                                      1=clear (base case), 2=cloudy, 3=rain, 4=fog, 5=sleet, 6=snow
16  TRNDIR      Categorical variable  Train direction:
                                      1=north (base case), 2=south, 3=east, 4=west
17  REGION      Categorical variable  FRA designated region (1 = base case)
18  TYPEQ       Categorical variable  Type of consist:
                                      1=freight train (base case), 2=passenger train, 3=commuter train,
                                      4=work train, 5=single car, 6=cut of cars, 7=yard / switching,
                                      8=light loco(s), 9=maint / inspect. car, A=spec. MoW q
19  TYPTRK      Categorical variable  Type of track:
                                      1=main (base case), 2=yard, 3=siding, 4=industry
20  TRKCLAS     Categorical variable  FRA track class: 1-9, X (1 = base case)
21  RCL         Categorical variable  Remote control locomotive = 0, 1, 2, or 3:
                                      0=not a remotely controlled operation (base case),
                                      1=remote control portable transmitter, 2=remote control tower operation,
                                      3=remote control portable transmitter (more than one remote control)
22  CAUSE       Categorical variable  Primary cause of incident:
                                      E=Mechanical and Electrical Failures (base case), H=Human Factors,
                                      M=Miscellaneous Causes, S=Signal and Communication,
                                      T=Track, Roadbed and Structures
23  SEASON      Categorical variable  Season of the accident:
                                      1=spring (Mar-May) (base case), 2=summer (Jun-Aug),
                                      3=autumn (Sep-Nov), 4=winter (Dec-Feb)
Appendix B
Table B1. P-value of the overall F-statistic in simple regression model

Predictor           Fatalities   Total Damage
CARS                0.96         0.89
CARSDMG             0.40         0.00
CARSHZD             0.86         0.00
EVACUATE            0.00         0.00
TEMP                0.27         0.74
TRNSPD              0.01         0.00
TONS                0.13         0.00
HEADEND1            0.10         0.81
Latitude            0.82         0.00
Longitud            0.71         0.00
factor(TYPE)        0.00         0.00
factor(VISIBLTY)    0.99         0.24
factor(WEATHER)     0.34         0.69
factor(TRNDIR)      0.83         0.00
factor(REGION)      0.28         0.00
factor(TYPEQ)       0.05         0.00
factor(TYPTRK)      0.00         0.00
factor(TRKCLAS)     0.17         0.00
factor(RCL)         NA*          0.00
factor(CAUSE)       0.01         0.05
factor(SEASON)      0.05         0.46
*NA: cannot be estimated since only one level is available under the RCL variable for the fatalities model
Table B2. Model comparison for fatalities

                       Full Model                   Reduced Model               Stepwise Model
Response variable      TOTKLD                       TOTKLD                      TOTKLD
Predictor variables    EVACUATE, TRNSPD,            EVACUATE, TYPEQ,            EVACUATE, TRNSPD,
                       TONS, HEADEND1,              SEASON                      CAUSE, SEASON
                       TYPE, TYPEQ, TYPTRK,
                       TRKCLAS, CAUSE,
                       SEASON
R2                     33.22%                       29.6%                       26.66%
adjusted R2            26.43%                       26.79%                      25.32%
AIC                    867.419                      846.043                     846.044
BIC                    1018.23                      913.51                      881.76
Overall significance   F-statistic: 4.892 on 36     F-statistic: 10.51 on 15    F-statistic: 19.89 on 7
                       and 354 DF,                  and 375 DF,                 and 383 DF,
                       p-value: 8.785e-16           p-value: < 2.2e-16          p-value: < 2.2e-16
Partial F-test: Full vs. Reduced Model: F: 0.913, p-value: 0.5743
Table B3. Model comparison for total damage

                       Full Model                   Reduced Model               Stepwise Model
Response variable      ACCDMG                       ACCDMG                      ACCDMG
Predictor variables    CARSDMG, CARSHZD,            CARSHZD,                    CARSDMG, CARSHZD,
                       EVACUATE, TRNSPD,            EVACUATE, TRNSPD,           EVACUATE, TRNSPD,
                       TONS, Latitude, Longitud,    TONS, TYPE, TRNDIR,         TONS, Longitud, TYPE,
                       TYPE, VISIBLTY, TRNDIR,      REGION, TYPTRK,             TRNDIR, REGION,
                       REGION, TYPEQ,               CAUSE                       TYPEQ, TYPTRK,
                       TYPTRK, TRKCLAS, RCL,                                    CAUSE
                       CAUSE
R2                     27.2%                        25.14%                      26.67%
adjusted R2            25.61%                       24.29%                      25.49%
AIC                    87725.23                     87747.80                    87714.55
BIC                    88114.63                     87957.48                    88008.11
Overall significance   2890 DF,                     p-value: < 2.2e-16          p-value: < 2.2e-16
                       p-value: < 2.2e-16
Partial F-test: Full vs. Reduced Model: F-test: 1.74, p-value: 0.08
Figure B1. Boxplot for fatalities and number of people evacuated for each year to identify potential
outliers
[Figure: "Number of people evacuated in each year"; x-axis: Year (2001 to 2012); y-axis: Number of People Evacuated (0 to 50000).]
Figure B2. Diagnostic plot for the selected fatalities model before transformation
[Figure: four panels, Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage.]
Figure B3. Diagnostic plot for fatalities model after transformation with Box-Cox method (lambda=-2)
[Figure: four panels, Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage.]
Figure B4. Diagnostic plot for the selected total damage model before transformation
[Figure: four panels, Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage.]
Figure B5. Diagnostic plot for the selected total damage model after transformation with Box-Cox
method (lambda=-0.5)
[Figure: four panels, Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage.]
Figure B6. Interaction plot train speed and cause with damage cost of accident
[Figure: ACCDMG plotted against TRNSPD, with one line per CAUSE level (E, H, M, S, T).]
