
2012

Factors That Affect Transit Ridership in Southern California


2005-2011

A STATISTICAL STUDY OF EXTERNAL FACTORS, 2005-2011


SAMANTHA BEIER, ANDREW REKER, ELIZA YU, XINYU XU

UNIVERSITY OF CALIFORNIA, IRVINE PPD 204 QUANTITATIVE ANALYSIS FOR PLANNERS | Project Submitted for Final Grade


INTRODUCTION

For this project, we chose to explore public transportation in order to describe and better understand the statistical relationships between public transit ridership in Southern California and the factors that may affect it. The geographical scope of our study is the Los Angeles Combined Metropolitan Statistical Area (CMSA), which comprises Los Angeles, Orange, Riverside, San Bernardino, and Ventura counties. We chose transit ridership because high utilization of public transit is often considered a form of social welfare and may be associated with positive change and well-being at the community level. We analyzed several variables using monthly data from January 2005 through December 2011: non-farm employment levels, average unleaded gasoline price, and precipitation. We hypothesize that there is a relationship between public transit ridership and each of these variables. In addition, we hypothesize that public transit ridership differs between school and non-school months.

Due to limitations such as our regional scope, the large number of transit agencies in this region, and the availability of applicable data, we do not analyze internal factors of transit ridership such as fare rates or the quantity or quality of public transit service. Income is another factor that we believe may help explain increases or decreases in transit ridership in the LA CMSA region. We analyze income qualitatively rather than quantitatively because reliable monthly data are not available. Annual data collected by the U.S. Census Bureau allow us to describe how income changed over this period across our area of interest.


QUALITATIVE ANALYSIS

We identified four time-series datasets to gain insight into our research topic: unlinked transit ridership in Southern California, monthly total precipitation in inches at the downtown Los Angeles Civic Center weather station, average monthly price of unleaded regular gasoline in the Los Angeles metropolitan area, and non-farm employment in the Los Angeles-Orange County-Inland Empire-Ventura County Combined Metropolitan Statistical Area. This paper is structured to examine the relationship between transit ridership and each factor independently. The literature on factors that affect transit ridership generally divides these factors into two categories: internal organizational factors and external economic or societal factors (Taylor & Fink, 2003). We examine how factors external to transit operators, namely economic, climatological, and seasonal factors, affect transit ridership. The data span January 2005 to December 2011 and were recorded at monthly intervals. We examine the descriptive statistics for each of the four variables in the subsections that follow. Afterward, we examine the correlation between transit ridership and precipitation, unleaded gasoline price, and non-farm employment, respectively. This is followed by a seasonal comparison, using a t-test, between ridership in summer (non-school) months and school months. Descriptive statistics for each variable may be found in the table below (Table 1).

TABLE 1 GENERAL STATISTICS FOR DATA SETS

| Statistic | Unlinked Transit Ridership in Southern California | Group Id. Summer versus NonSummer | Monthly Total Precip in Inches at LA Civic Center | Unleaded Price LAOC (USD) | LA-OC-IE-VC Combined MSA Non-Farm Employment |
|---|---|---|---|---|---|
| N | 84 | 84 | 84 | 84 | 84 |
| Mean | 59,876,176.93 | 1.25 | 1.2268 | 3.04165 | 6,882,615.48 |
| Median | 59,915,069.50 | 1.00 | 0.3100 | 3.03550 | 6,994,200.00 |
| Mode | 51,088,939 (a) | 1 | .00 | 2.519 (a) | 7,202,300 |
| Std. Deviation | 3,364,748.747 | .436 | 2.21565 | .577614 | 288,170.031 |
| Variance | 11,321,534,130,019.254 | .190 | 4.909 | .334 | 83,041,966,866.036 |
| Range | 16,686,804 | 1 | 11.02 | 2.664 | 730,900 |
| Minimum | 51,088,939 | 1 | .00 | 1.798 | 6,493,100 |
| Maximum | 67,775,743 | 2 | 11.02 | 4.462 | 7,224,000 |
| Percentile 25 | 57,962,155.25 | 1.00 | .0000 | 2.57475 | 6,551,825.00 |
| Percentile 50 | 59,915,069.50 | 1.00 | .3100 | 3.03550 | 6,994,200.00 |
| Percentile 75 | 62,009,243.50 | 1.75 | 1.6250 | 3.33875 | 7,162,975.00 |

a. Multiple modes exist. The smallest value is shown.

VARIABLE 1: Transit Ridership

To begin the qualitative portion of our research project, we describe transit ridership as the number of trips made using public transportation in the Southern California area. The dataset we selected for transit ridership is from the US DOT National Transit Database and is a sum of all unlinked rides served by transit operators in the Los Angeles-Orange-Riverside-San Bernardino-Ventura County Combined Metropolitan Statistical Area, calculated monthly (United States Department of Transportation National Transit Database, 2012). We collected monthly time-series data for January 2005 to December 2011, a total of 84 months. The list of operators for this CMSA can be found in Appendix A.

When looking at this dataset, we found that monthly public transit ridership spans a relatively broad range. The range over these 84 monthly observations is 16,686,804, with a minimum value of 51,088,939 and a maximum value of 67,775,743 (Figure 1). The standard deviation is 3,364,749. Although the range is seemingly broad, the standard deviation is small relative to the mean, so most of the data points are expected to lie close to the mean.

FIGURE 1: MONTHLY TRANSIT RIDERSHIP 2005-2011 (y-axis: public transit ridership, in millions)

The mean of the ridership data is 59,876,177. Considered alongside the range, this indicates that there are very few outliers. The median is 59,915,069, so the mean and median are particularly close in value; they differ by only 38,892. Given the mean, median, standard deviation, and range, we expect the distribution to be close to normal, without a substantial positive or negative skew. To check this expectation, we plotted a histogram, which confirms that monthly ridership is distributed approximately normally, centered near the mean of 59,876,177 (Figure 2). The histogram also shows a curve fitted to a theoretical normal distribution.

FIGURE 2 HISTOGRAM OF MONTHLY RIDERSHIP WITH NORMAL DISTRIBUTION FITTED
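The summary figures discussed above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the values reported in Table 1, and the coefficient of variation is an extra measure introduced here (not one reported in the study) to express how small the standard deviation is relative to the mean.

```python
# Summary figures copied from Table 1 (monthly unlinked transit trips, n = 84).
mean = 59_876_176.93
median = 59_915_069.50
std_dev = 3_364_748.747
minimum = 51_088_939
maximum = 67_775_743

value_range = maximum - minimum        # spread between the two extremes
mean_median_gap = median - mean        # a small gap suggests little skew
coeff_of_variation = std_dev / mean    # std. deviation relative to the mean (illustrative)

print(f"range: {value_range:,}")
print(f"median - mean: {mean_median_gap:,.2f}")
print(f"coefficient of variation: {coeff_of_variation:.1%}")
```

A coefficient of variation of only a few percent is consistent with the reading that most months lie close to the mean.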


VARIABLE 2: Unleaded Gas Prices

The next factor is gas price. In this section we describe the average gas price within the Los Angeles-Orange-Riverside-San Bernardino-Ventura County Combined Metropolitan Statistical Area from January 2005 to December 2011. We use the monthly average unleaded gas price as an independent variable in our analysis of the factors that affect transit ridership. From the data summarized below, we can follow the progression of average gas prices over time and describe possible patterns in the data.

FIGURE 3 MONTHLY UNLEADED GASOLINE PRICE 2005-2011

The mean gas price is $3.042 and the median is $3.036; they are almost the same. The range is $2.664, while the interquartile range is just $0.764. From these descriptive statistics, we conclude that gas prices in this period are not spread evenly across their range: the middle half of the monthly prices falls within a band much narrower than the full range. The box plot below shows that the median lies near the midpoint between the highest and lowest prices, but slightly closer to Q3 than to Q1 (Figure 4). Two data points, the highest prices in the series, are extreme outliers and are not included in this figure. To better understand the distribution, we also plotted a histogram with a fitted curve; it shows that the distribution of gas prices approximately follows a normal curve (Figure 5).

FIGURE 4 BOX PLOT OF MONTHLY UNLEADED GAS PRICE

FIGURE 5 HISTOGRAM OF MONTHLY UNLEADED GAS PRICES, FITTED NORMAL
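As a small worked example, the interquartile range and the position of the median between the quartiles can be reproduced from the percentiles in Table 1. This is a sketch using only the reported quartiles; the variable names are illustrative.

```python
# Quartiles of the monthly unleaded gasoline price (USD), copied from Table 1.
q1, median, q3 = 2.57475, 3.03550, 3.33875

iqr = q3 - q1                  # interquartile range
lower_half = median - q1       # distance from Q1 to the median
upper_half = q3 - median       # distance from the median to Q3

print(f"IQR: ${iqr:.3f}")
print(f"Q1 to median: ${lower_half:.3f}, median to Q3: ${upper_half:.3f}")
```

The shorter distance from the median to Q3 is what places the median closer to Q3 than to Q1 in the box plot.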


VARIABLE 3: Total Non-Farm Employment

The second factor we analyze is the number of non-farm jobs. We use total non-farm employment, which is representative of jobs in the Combined Metropolitan Statistical Area. Mean non-farm employment is 6,882,615.48 with a standard deviation of 288,170.03. Looking at the month-to-month trend, employment rose from 2005 to 2007 (Figure 6). From 2008 to 2009, however, non-farm jobs decreased, which is likely a result of the deep economic crisis. After that, employment began to recover, though with regular fluctuations.

FIGURE 6 MONTHLY NON-FARM EMPLOYMENT LEVELS 2005-2011

When we examine the distribution of monthly non-farm employment, we see that it is not normal (Figure 7). Using both a histogram and a stem-and-leaf plot, we see two peaks in the data: the first at approximately 6,500,000 and the second at approximately 7,100,000.

FIGURE 7 HISTOGRAM AND STEM-AND-LEAF OF MONTHLY NON-FARM EMPLOYMENT LEVELS


VARIABLE 4: Precipitation

The third factor we discuss is weather. Anecdotally, when it rains in this region, many people postpone, alter, or cancel their travel plans. We therefore selected precipitation to represent inclement weather that may lead transit riders to defer or cancel trips. Monthly precipitation fluctuated dramatically from 2005 to 2011 (Figure 8). Because Southern California has a dry, semi-arid climate, precipitation in many months is very close to zero.

FIGURE 8 MONTHLY PRECIPITATION RECORDED AT LA CIVIC CENTER 2005-2011

TABLE 2 DESCRIPTIVE STATISTICS FOR MONTHLY PRECIPITATION

The mean monthly precipitation is 1.227 inches, with a much lower median of 0.310 inches. Because the mean is so much higher than the median, we can conclude that the distribution is positively skewed; we also calculated a skewness of 2.915, which indicates a very strong positive skew (Figure 9).

FIGURE 9 HISTOGRAM OF MONTHLY PRECIPITATION
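To illustrate how a skewness statistic of this kind can be computed, the sketch below implements the adjusted Fisher-Pearson sample skewness, the form commonly reported by statistical packages. The rainfall values are hypothetical and are not the study's data; they only mimic a series with many dry months and a few very wet ones.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)**3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

# Hypothetical monthly rainfall totals in inches (NOT the study's data).
rain = [0.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.5, 1.2, 2.8, 6.5]
print(f"mean={mean(rain):.2f}, skewness={sample_skewness(rain):.2f}")
```

A positive result indicates a long right tail, which is the same pattern as a mean that sits well above the median.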


VARIABLE 5: Household Income

Household income in the LA CMSA region is shown below in a table and a graph (Table 3, Figure 10). Income increased steadily from 2005 to 2008, when it peaked, and then declined from 2008 to 2011, flattening out between 2010 and 2011. Based on this information, we expect that the higher the household income, the less likely residents within the LA CMSA service areas are to take public transportation, and that the lower the income, the more likely they are to do so.

TABLE 3 ANNUAL HOUSEHOLD INCOME IN THE LA CMSA

| Year | Household Income |
|---|---|
| 2005 | $52,069 |
| 2006 | $55,678 |
| 2007 | $58,648 |
| 2008 | $60,141 |
| 2009 | $58,005 |
| 2010 | $56,542 |
| 2011 | $56,231 |

FIGURE 10 ANNUAL HOUSEHOLD INCOME IN THE LA CMSA (CSA median household income, 2005-2011)


QUANTITATIVE ANALYSIS

Now that we have described our data more generally, we test our hypotheses (Table 4). First, we test the strength of the correlations between transit ridership and each of employment level, weather, and average unleaded gasoline price. Then, we test for seasonality in our data; that is, we look for a significant difference in transit ridership between school months and non-school (summer) months.

TABLE 4: RESEARCH HYPOTHESES

| Hypothesis | Research Hypothesis | Null Hypothesis |
|---|---|---|
| Hypothesis 1: Employment level, weather, and average gasoline price correlate with transit ridership. | There is a correlation between the three identified factors and transit ridership. | There is no correlation. |
| Hypothesis 2: There is a seasonal trait in transit ridership based on school year. | There is a significant difference in transit ridership means between school months and non-school (summer) months. | There is no significant difference. |

CORRELATION: Transit Ridership and Total Non-Farm Employment

First, we examine the relationship between public transit ridership and non-farm employment. We calculated a correlation coefficient of 0.431, which is significant at the 0.01 level and can be interpreted as a modest positive relationship. As the number of people employed in non-farm jobs increased, so did the number of passenger trips on transit (Figure 11). The positive correlation is clearly visible when the two variables are graphed against each other. While this correlation shows a relationship between public transit ridership and non-farm employment, we are unable to determine a causal relationship from this form of analysis.

FIGURE 11: CORRELATION OF TRANSIT RIDERSHIP AND NON-FARM EMPLOYMENT, 2005-2011
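The significance of a Pearson correlation can be checked from the coefficient and the sample size alone, since under the null hypothesis of no correlation the quantity r * sqrt((n - 2) / (1 - r^2)) follows a t distribution with n - 2 degrees of freedom. The following is a minimal sketch of that check, assuming SciPy is available; the helper name `pearson_p_value` is hypothetical and is not part of the original SPSS analysis.

```python
from math import sqrt
from scipy import stats

def pearson_p_value(r, n):
    """Two-tailed p-value for a Pearson r computed from n paired observations."""
    t_stat = r * sqrt((n - 2) / (1 - r ** 2))   # t statistic with n - 2 degrees of freedom
    return 2 * stats.t.sf(abs(t_stat), df=n - 2)

n_months = 84                                    # January 2005 through December 2011
p_employment = pearson_p_value(0.431, n_months)  # r reported for non-farm employment
print(f"r = 0.431, n = {n_months}: p = {p_employment:.5f}")
```

With 84 monthly observations, a coefficient of 0.431 yields a p-value below 0.01, in line with the significance level reported above.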

CORRELATION: Transit Ridership and Precipitation

To better understand the relationship between transit ridership and weather patterns, we use monthly precipitation (in inches) to characterize weather conditions during any given month. We found a correlation of -0.589 between these two variables, which is significant at the 0.01 level and indicates a moderate negative relationship. As monthly precipitation increases, the number of passenger trips on transit tends to decrease (Figure 12). When graphed, there is a relatively clear negative correlation between the two variables. While this negative correlation shows a relationship between public transit ridership and precipitation, we are unable to determine a causal relationship from this form of analysis.

FIGURE 12: CORRELATION OF TRANSIT RIDERSHIP AND PRECIPITATION, 2005-2011

CORRELATION: Transit Ridership & Gas Prices

We examined the influence of gas price on transit ridership by calculating Pearson's correlation coefficient between ridership and the average price of unleaded gasoline within the LA CMSA. The correlation coefficient for these two variables is 0.334, with a p-value of 0.001. The correlation is statistically significant because the p-value is below the 0.05 significance level (Table 5). The coefficient also shows that there is a weak positive correlation between unleaded gasoline price and transit ridership.

TABLE 5 TRANSIT RIDERSHIP & GAS PRICE CORRELATION TEST RESULT

| Correlation | p-significance | n |
|---|---|---|
| 0.334 | 0.001 | 84 |

An alternative way of representing this relationship is a scatter plot (Figure 13), with gasoline price on the x-axis and transit ridership on the y-axis. Each point represents one month in our data set: the x-value is the average gas price for that month and the y-value is the transit ridership during that same month. The 84 points (n = 84) are plotted along with a line of best fit for reference. When rendered for our data, the graph shows a weak positive correlation between the two variables, indicating that as gas price increases, on average, so does public transit ridership.

FIGURE 13: CORRELATION OF TRANSIT RIDERSHIP AND GAS PRICE, 2005-2011
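The construction described above, one (x, y) point per month plus a line of best fit, can be sketched with NumPy. The eight data points below are hypothetical and are not the study's data; they only show how the least-squares line and the correlation coefficient are obtained before the points are handed to a plotting tool.

```python
import numpy as np

# Hypothetical values for illustration only (NOT the study's data):
# average gas price in USD and monthly transit trips in millions.
gas_price = np.array([2.10, 2.55, 2.80, 3.05, 3.30, 3.60, 4.10, 4.45])
ridership = np.array([58.2, 57.1, 61.0, 59.4, 60.8, 58.9, 62.3, 61.5])

# Least-squares line of best fit: ridership ~ slope * gas_price + intercept
slope, intercept = np.polyfit(gas_price, ridership, deg=1)

# Pearson correlation between the two series
r = np.corrcoef(gas_price, ridership)[0, 1]

print(f"best-fit line: y = {slope:.2f}x + {intercept:.2f}, r = {r:.2f}")
```

The slope of the fitted line always has the same sign as the correlation coefficient, which is why an upward-sloping line on the scatter plot corresponds to a positive correlation.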


T-TEST: Transit Ridership & School Status

For our next test, we sought to explain differences in transit ridership as they relate to school status, here categorized as school and non-school months. We hypothesize that transit ridership in the Los Angeles CMSA is higher during the academic school year than during non-school (summer) months. Based on the public school academic calendar in the LA CMSA region, we categorized September through May as school months and June through August as non-school months. To test this hypothesis, we ran an independent-samples t-test in SPSS (Figure 14). For the purposes of this test, we assigned the school and non-school categories group IDs of 1 and 2, respectively.

FIGURE 14: SCHOOL SEASONALITY - INDEPENDENT SAMPLES T-TEST RESULTS
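The grouping rule described above can be written out explicitly. The sketch below assigns a group ID to each of the 84 months in the study period; the function and variable names are illustrative and not taken from the original SPSS file.

```python
# Group 1 = school months (September-May), group 2 = non-school months (June-August).
NON_SCHOOL_MONTHS = {6, 7, 8}

def group_id(month):
    """Return 2 for June through August, otherwise 1."""
    return 2 if month in NON_SCHOOL_MONTHS else 1

# One entry per month from January 2005 through December 2011.
groups = [group_id(month) for year in range(2005, 2012) for month in range(1, 13)]

n_school = groups.count(1)
n_non_school = groups.count(2)
print(len(groups), n_school, n_non_school, sum(groups) / len(groups))
# prints: 84 63 21 1.25
```

The resulting counts of 63 school months and 21 non-school months, and the mean group ID of 1.25, agree with the group sizes and the Group Id. column reported in the study's tables.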

From the results of this t-test, we conclude that there is no significant difference in transit ridership between school and non-school months. This is seen in the p-value, labeled Sig. (2-tailed) in our table, which is greater than 0.05. This value means that the mean transit ridership of the two groups is not statistically different. Therefore, we fail to reject the null hypothesis. Looking at the group-level descriptive statistics, the two groups' means are close to each other and their ranges overlap, which further reinforces the conclusion of the t-test (Table 6).

TABLE 6: SCHOOL AND NON-SCHOOL STATUS IN THE LA-CMSA REGION IN CORRELATION WITH TRANSIT RIDERSHIP

| Statistic | Group 1 (School) | Group 2 (Non-school) |
|---|---|---|
| N (Valid) | 63 | 21 |
| Mean | 59,495,727.59 | 61,017,524.95 |
| Median | 59,557,368.00 | 60,326,975.00 |
| Mode | None | None |
| Std. Deviation | 3,546,591.19 | 2,483,518.28 |
| Variance | 12,578,309,076,811.90 | 6,167,863,065,431.20 |
| Range | 16,686,804.00 | 9,077,955.00 |
| Minimum | 51,088,939.00 | 57,898,818.00 |
| Maximum | 67,775,743.00 | 66,976,773.00 |
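An independent-samples t-test can be reproduced from group summaries alone. The sketch below feeds the means, standard deviations, and group sizes from Table 6 into SciPy's `ttest_ind_from_stats`. It assumes the pooled-variance (equal variances assumed) form of the test; which row of the SPSS output the authors read is not stated, so that choice is an assumption.

```python
from scipy import stats

# Group summaries copied from Table 6 (monthly unlinked transit trips).
school = {"mean": 59_495_727.59, "std": 3_546_591.19, "n": 63}
non_school = {"mean": 61_017_524.95, "std": 2_483_518.28, "n": 21}

# equal_var=True requests the pooled-variance (Student) t-test.
# Assumption: this matches the "equal variances assumed" row of the SPSS output.
result = stats.ttest_ind_from_stats(
    mean1=school["mean"], std1=school["std"], nobs1=school["n"],
    mean2=non_school["mean"], std2=non_school["std"], nobs2=non_school["n"],
    equal_var=True,
)
print(f"t = {result.statistic:.3f}, two-tailed p = {result.pvalue:.3f}")
```

Under this assumption the two-tailed p-value is above 0.05, which matches the decision not to reject the null hypothesis.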


Conclusions

Our statistical analysis produced some interesting conclusions about our selected factors. We found significant correlations for all three factors tested; however, none was more than moderately strong, so no single factor is strongly correlated with transit ridership (Table 7). Further research could involve time-series analysis with regression. In addition, a model could be built to produce generalized forecasts of transit ridership in Southern California.

TABLE 7: SUMMARY TABLE OF CORRELATIONS AND T-TEST RESULTS

| Transit Ridership and | Correlation | Relationship |
|---|---|---|
| Gasoline Prices | 0.334 | Weak Positive |
| Employment Levels | 0.431 | Weak Positive |
| Precipitation | -0.589 | Moderate Negative |
| Seasonality | | Seasonality Not Found |

One interesting point from our data is the relative strength of the negative correlation between precipitation and transit ridership, which was surprising compared with the other two variables we tested. One possible explanation is that December often has increased precipitation in Southern California; it is also a month that typically has fewer workdays because of a greater number of prominent holidays and increased use of vacation days at the year's end. The multiple factors that affect transit ridership, both internal and external, make it such an interesting topic for research, as no single factor can exclusively predict change.


REFERENCES

State of California Employment Development Department. (2012). MSA Seasonal Adjusted Total Non-farm Employment. [Data File]. Retrieved from http://www.calmis.ca.gov/file/indhist/msa$shws.xls

Taylor, Brian D., and Camille N. Y. Fink. The Factors Influencing Transit Ridership: A Review and Analysis of the Ridership Literature. Fall 2003, p. 681

United States Census Bureau. (2005-2011). American Fact Finder. Los Angeles-Long Beach-Riverside, CA CSA

United States Department of Labor Bureau of Labor Statistics. (2012). Consumer Price Index Average Price Data Los Angeles-Riverside-Orange County, CA. [Data File]. Retrieved from http://data.bls.gov/timeseries/APUA42174714

United States Department of Transportation National Transit Database. (2012). Monthly Raw Data. [Data File]. Retrieved from http://www.ntdprogram.gov/ntdprogram/pubs/MonthlyData/MONTHLY_RAW_DATA_10_03_2012.xls

Western Regional Climate Center. (2012). Monthly Precipitation Los Angeles Civic Center, California. [Data File]. Retrieved from http://www.wrcc.dri.edu/cgi-bin/cliMONtpre.pl?ca5115


APPENDIX A Transit Operators for LA - CMSA

Santa Monica's Big Blue Bus
Access Services
Anaheim Transportation Network
Antelope Valley Transit Authority
City of Arcadia Transit
City of Commerce Municipal Buslines
City of Corona
City of Gardena Transportation Department
City of La Mirada Transit
City of Los Angeles Department of Transportation
City of Redondo Beach - Beach Cities Transit
Culver City Municipal Bus Lines
DAVE Transportation Services, Inc.
Foothill Transit
Gold Coast Transit
Laguna Beach Municipal Transit
Laidlaw Transit Services
Long Beach Transit
Los Angeles County Metropolitan Transportation Authority dba: Metro
Montebello Bus Lines
Norwalk Transit System
Omnitrans
Orange County Transportation Authority
Riverside Transit Agency
Ryder/ATE
Santa Clarita Transit
Simi Valley Transit
Southern California Regional Rail Authority dba: Metrolink
SunLine Transit Agency
Thousand Oaks Transit
Torrance Transit System
Ventura Intercity Service Transit Authority
Victor Valley Transit Authority
