
METHODOLOGICAL ISSUES IN DEVELOPING MODE CHOICE MODELS FOR DHAKA CITY

Annesha Enam
Department of Civil Engineering,
Bangladesh University of Engineering and Technology
protiti56@gmail.com

Charisma F. Choudhury*
Department of Civil Engineering,
Bangladesh University of Engineering and Technology
cfc@alum.mit.edu

Word count:
Tables and figures: 5 x 250 = 1250
Text: 6403
Total: 7653

*Corresponding Author
Abstract
This paper presents the issues and challenges associated with development of a comprehensive
mode choice model for Dhaka, the capital of Bangladesh and the 11th largest city in the world.
Similar to most other developing countries, reliable level-of-service (LOS) data for the wide
variety of motorized and non-motorized modes used by the travelers are not available in
Bangladesh. In addition, the 12 million inhabitants of the city have wide differences in
affordability and accessibility to various modes. These result in substantial heterogeneity in their
choice-sets. These choice-sets are however also unobserved in the data and not easily inferable
from the limited information of the network.

In this paper, we identify the key limitations of the available data and propose methods to
overcome the limitations. A probabilistic choice-set of modes based on a small scale stated
preference (SP) survey has been used to account for the absence of actual choice-set data. The
systematic and stochastic errors in the network-derived time and cost data are explicitly
accounted for in the model structure. The improvements from the proposed approaches are
demonstrated by prediction tests using hold-out samples.

The proposed approaches have immense potential to improve travel mode choice models for
other cities of Bangladesh as well as cities in other developing countries which very often face
similar dearth of data.
1. Background
Bangladesh, a country of about 150 million people, has been going through rapid economic
development. Dhaka, the capital city with a population of 12 million, has been influenced the
most by this development and has been subjected to a high rate of urbanization. The
current urbanization level is around 30 percent and is expected to rise to 50 percent by the
year 2050 (STP 2005). The city’s transport sector has been adversely affected by the rapid
urbanization and the economic development of the country. Traffic congestion is now an
issue of great concern for the inhabitants of Dhaka, resulting in commuter frustration, longer
travel times, lost productivity, increased accidents, higher fuel consumption, and deterioration in
air quality. Increasing the physical capacity is a very difficult option for the city since the share of
built-up area is already estimated to be higher than 70% (Bari and Hasan 2001). The
solution therefore requires increasing the operational capacity through demand and
supply management.

Though Dhaka is an old city (dating back to the 16th century), very few travel demand models have
been developed for it so far. Among the previous models, a four-step travel demand
modeling process was adopted in the Dhaka Metropolitan Area Integrated Transport Study (DITS
1993), where the mode choice model was simplified into a binomial choice between
private and public modes. Habib (2002) also developed a four-step model for Dhaka, but
the results of the mode choice model were counterintuitive, with positive signs on the coefficients
of time and cost. Moreover, in his study the coefficient of comfort was greater
than those of time and cost, which is not expected for a developing country like Bangladesh. The
most extensive travel demand model for Dhaka in recent years is that of the Strategic Transport Plan
(STP 2005), for which a wide-scale household interview survey was conducted for the first time
with financial contribution from the World Bank. In the mode choice model of STP (Louis Berger
Inc. and BCL 2005), only two modes were considered, i.e. Public Transport (PT) and Individualized
Motorized Vehicles (IMV). In the IMV group, cars and taxis were grouped together, overlooking
their very different attributes (e.g. running cost, availability, accessibility, etc.). In addition,
non-motorized vehicles (rickshaws) were not considered in the mode choice model even though 37% of
person trips are made by rickshaw, as reported in the same study (STP 2005). Separate Binomial
Logit (BL) models were developed for three income groups, and only
two explanatory variables (travel time and travel cost) were used. The model adopted
pre-set rules for determining choice-sets and ignored the heterogeneity among respondents. Hasan
(2007) developed a four-step travel demand model in which he adopted a rule-based choice model
for car (assuming that a traveler with access to a car will always select car as the travel mode
regardless of situational constraints) and a Multinomial Logit (MNL) model for the choice
among rickshaw, auto-rickshaw, taxi and bus. In the MNL model, it was assumed that all four
modes were available to all travelers. Only two explanatory variables (travel time and travel
cost) were used in the model specification, and separate models were developed by trip
purpose. Hasan’s model was based on STP data, but the LOS variables were updated using a
supplemental survey (for cost) and outputs of the software EMME/2 (for travel time). The
potential measurement errors introduced in this process were, however, ignored.
From the literature review, it was evident that the limitations of the available datasets played
key roles behind the deficiencies of the previously developed mode choice models and this has
prompted the current research. In this paper, the STP data (the most comprehensive travel data
source collected from Dhaka to date) have been explored in detail, the key modeling issues have
been identified and modeling approaches have been proposed to overcome these issues for
development of a more rigorous mode choice model. The improvements from the proposed
approaches have been demonstrated by prediction tests using hold-out samples.

The rest of the paper is organized as follows. A short description of the STP data is presented
first and the main limitations of the data are highlighted. The two key modeling issues, viz.
addressing the unobserved choice-sets of respondents and correcting for measurement error in
LOS data, are detailed in the subsequent sections. In each case, the problem is described first, a
review of the literature on state-of-the-art modeling approaches relevant to the problem
is presented next, and the proposed methodology and estimation results follow. Results of
prediction tests using the improved model are then presented, followed by a summary of the
research and future research directions.

2. Data
In this research, the Household Interview Survey (HIS) conducted as part of the Strategic
Transport Plan (STP) Study in 2005 has been used as the main data source. This is supplemented
by small-scale detailed surveys.

In the STP HIS, more than 6,000 households (STP 2005) were interviewed and a large amount of
information was collected regarding the location and type of the households, the socioeconomic
characteristics of their members, vehicle ownership of the households, daily trip information of
the respondents and also some attitudinal information about the respondents. The
socioeconomic data are extensive and cover age, gender, education level,
employment status, occupation, driving license status, and address of the worksite and educational
institutions. The daily trips of each member of the households were reported, yielding
information on more than 30,000 trips. The trip diary consists of the origin-destination locations,
start and end times between origin and destination (but not for each modal segment of the trip
chain), and the purpose and transport modes of each trip segment. The attitudinal part
comprises, among others, questions on the reasons behind choosing the current mode of travel, existing problems
of the current travel modes, and suggestions for improvement of the traffic
situation. More details of the data are available in STP (2005).

Detailed exploration of the data revealed two key limitations:

• Unobserved choice-sets
• Unobserved LOS, including absence of data on travel time and cost for unchosen modes

The problems and their consequences on model development are elaborated in the next two
sections, along with proposed solutions.
In addition to the STP HIS, data from a small-scale supplemental survey have been used in
this study. The supplemental survey was a stated preference (SP) survey in which 1016 samples
were collected as part of this research. The survey questionnaire included a specific question on
the respondent’s available modes. Besides the mode choice information in hypothetical
scenarios, the survey provided information regarding the socio-economic characteristics of the
respondent (e.g. age, gender, occupation, education, household income, car ownership,
chauffeur availability, etc.) as well as attributes of the trip in question (e.g. purpose, duration,
distance traveled, etc.).

3. Choice-set Generation
3.1. The problem
The classical mode choice model can be expressed as follows:
$$P(i \mid C_n) = P\left(U_{in} \ge U_{jn},\ \forall j \in C_n\right) \qquad (1)$$
where
P(i|Cn) = probability of choosing mode i among all modes in the choice-set Cn of respondent n;
Ujn = utility of mode j for respondent n, for all j ∈ Cn.

A basic premise for the theoretical development and practical utilization of discrete choice
models is that the analyst is correctly able to specify the set of modes from which an individual
decision maker chooses a given alternative. However, in practical model development, it is often
the case that only the chosen mode is known with certainty and it is unclear what modes were
available in the choice context and/or were actually considered by the respondent while making
the decision. The analyst is thus burdened with the task of specifying the choice-set. Previous
research has shown that incorrect representation of choice-sets and the imputation of choice-
sets by the use of logical rules can lead to biases in parameter estimation and errors in
forecasting due to misallocation of the alternatives (William and Ortuzar 1979). McFadden and
Reid (1975) and Westin (1974) have shown similar results when the choice-sets are imputed by
logical rules. Stopher (1980) has empirically shown the impact of captivity on the
estimation of parameters and forecasting with a binary mode choice model. Stopher showed
that the estimated coefficients were smaller and less significant, and the alternative specific
constants larger and more significant, than in the “true” model, where the true model was
estimated by excluding the captive users. Hensher (1983) has pointed out that for the accurate
estimation of travel time the choice model must include only the “choosers”, i.e. the model must
exclude the captive users.

In developed countries, transport network data, car ownership and driving license
possession provide reasonable bases for constructing the choice-sets for mode selection. In
developing countries like Bangladesh, however, the transport network data are not structured.
Para-transits like human-haulers (and in some cases even traditional transit such as buses) operate
beyond their permitted routes, which makes it impossible to infer the accessibility of these modes
from a given zone using transport network data. Moreover, in developing countries
affordability plays a vital role in determining the choice-set, for both business and
non-business trips. Because of large household sizes and low car ownership rates, the number
of users per car is generally very high, and car ownership alone is often not a suitable proxy for
car availability. For instance, if there are five members and a single car in the household, some
household members (e.g. school-going children, elderly people, etc.) may get priority over
others in using the car. Moreover, shared use of the same chauffeur-driven car adds complexity to
car availability, since possession of a driving license is then no longer correlated with car
availability either (see footnote 1). Therefore, a car is often not chosen, not because of the LOS tradeoffs, but
rather because it is not in the choice-set in the first place.

On the other hand, affluent people who have access to a car are often unaware of what public
transport options exist. This is exacerbated by the fact that the public transport modes have very
poor traveler information systems. For example, there are no published timetables, no
information regarding routes or timetables at the bus stops, and no options to access public
transport related information via internet or phone. Therefore, a bus or para-transit is often
not chosen, not because of the LOS considerations, but rather because of its exclusion from the
choice-set.

As described in section 1, the previous mode choice models developed for Dhaka City used
different deterministic rules for specifying the choice-sets: for example, for households
with a car, the car is always in the choice-set, while public transport is available either to all
travelers (e.g. STP 2005) or only to non-car owners (e.g. Hasan 2007). Such simplistic rule-based
choice-sets can lead to wrong definitions of choice-sets and subsequently wrong parameter estimates. The
importance of correctly specifying choice-sets and the deficiencies in the choice-set generation
of the previous mode choice models prompt this research, in which we develop and test a
choice-set generation model that predicts the choice-set probabilistically for the choice context of
Dhaka using socio-economic characteristics, origin-destinations and trip purposes.

3.2. State-of-the-art Approaches


In discrete choice models before the 1970s, either all alternatives were assumed to be
available to all decision makers or logical rules were used to impute the choice-set
(e.g. in mode choice models, no car was assumed to be available to an individual without a
driving license). Lerman (1975) recognized the inappropriateness of allocating all alternatives
to all individuals, which led to the widespread use of imputed choice-sets in discrete choice
modeling (e.g. Ben-Akiva and Lerman 1974).

In classical economic choice theories, individual choice behavior in cases with unobserved
choice-sets has been modeled as a two-stage sequential process (Manski 1977): i) The
determination of an individual’s choice-set Cn; and ii) With the Cn well defined the individual
chooses an alternative according to some pre-established decision rules (e.g. utility
maximization).
Footnote 1: In Bangladesh, the majority of households with a car can afford to employ a chauffeur, since the average
monthly wage of a chauffeur is often as little as $50.
An analyst who has limited information about an individual’s choice-set can treat the
choice-set generation model as either deterministic or probabilistic, depending upon the degree
of confidence s/he places in the information at hand. There are many examples in the literature of
applications of deterministic choice-sets (e.g. Ben-Akiva and Lerman 1974, Train 1980) as well
as probabilistic choice-set generation models (e.g. Wermuth 1978, Swait 1984). In the
probabilistic approach, a separate choice-set generation model is used to predict the choice-set
stochastically; the probability of observing alternative j being chosen by individual n can
therefore be expressed as follows:

$$P_n(j) = \sum_{C \in G_n} P_n(j \mid C)\, P_n(C) \qquad (2)$$
where
C = an element of Gn (C ⊆ Mn);
Gn = the set of all nonempty subsets of Mn;
Mn = the set of all deterministically feasible alternatives for individual n (Mn ⊆ M); and
M = the universal choice-set, made up of all possible alternatives available for the choice context
and population in question.

Equation 2 reflects a three-part model of the choice process:

1. A probabilistic choice model, Pn(j|C), conditioned on the choice-set being C ∈ Gn, which by
definition yields choice probabilities of zero for j ∉ C;
2. A deterministic choice-set generation model that determines the subset Mn from the set M;
and
3. A probabilistic choice-set generation model, Pn(C), expressing the probability that set C ⊆ Mn
is the individual’s actual choice-set.

A high degree of computational complexity is implied by equation 2. If |X| denotes the number
of elements in any set X, then |Gn| is equal to 2^|Mn| - 1, of which 2^(|Mn| - 1) choice-sets actually
contain any given alternative j ∈ Mn.
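
The counting argument and the summation in equation 2 can be checked with a short sketch. It is only an illustration: the utilities, the uniform choice-set probabilities and the logit form used for Pn(j|C) are assumptions made here, not values or specifications from this study.

```python
import math
from itertools import combinations


def nonempty_subsets(feasible_modes):
    """Return G_n, the list of all non-empty subsets of M_n."""
    modes = sorted(feasible_modes)
    return [
        frozenset(combo)
        for size in range(1, len(modes) + 1)
        for combo in combinations(modes, size)
    ]


def conditional_prob(mode, choice_set, utility):
    """P_n(j | C): zero when j is outside C, a logit share otherwise (assumed form)."""
    if mode not in choice_set:
        return 0.0
    denominator = sum(math.exp(utility[m]) for m in choice_set)
    return math.exp(utility[mode]) / denominator


def marginal_prob(mode, feasible_modes, utility, set_prob):
    """Equation 2: sum of P_n(j | C) * P_n(C) over every C in G_n."""
    return sum(
        conditional_prob(mode, c, utility) * set_prob[c]
        for c in nonempty_subsets(feasible_modes)
    )


# Hypothetical inputs, chosen only to exercise the formula.
M_n = ["bus_tempo", "car", "cng_taxi", "rickshaw"]
utility = {"bus_tempo": 0.2, "car": 1.0, "cng_taxi": 0.5, "rickshaw": -0.3}
G_n = nonempty_subsets(M_n)
set_prob = {c: 1.0 / len(G_n) for c in G_n}  # uniform P_n(C), an assumption

n_sets = len(G_n)                               # 2**4 - 1 choice-sets
n_sets_with_car = sum("car" in c for c in G_n)  # 2**(4 - 1) of them contain car
probs = {m: marginal_prob(m, M_n, utility, set_prob) for m in M_n}
```

With four feasible modes the enumeration is trivial, but the number of terms in the sum doubles with every additional mode, which is the source of the computational burden noted above.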

To the best of our knowledge, McFadden (1976a) was the first to formulate a model following the
above choice process; he considered a choice situation where an individual is either captive to
an alternative or free to choose from Mn. This logit captivity model was also independently
developed by Ben-Akiva (1977) and Gaudry and Dagenais (1979). The latter named it the
“Dogit” model. Other approaches to probabilistic choice-set generation include the
Independent Availability Logit Model (Swait 1984) and the Parameterized Logit Captivity model (e.g.
Swait and Ben-Akiva 1985).
In the Captivity Logit Model, an individual is assumed either to be captive to a single alternative or
to be free to choose from among the full set of deterministically available alternatives. The
Independent Availability Logit Model assumes that the probability of availability of an
alternative is independent of the availability, or lack thereof, of any other alternative. In the
Parameterized Logit Captivity model, the choice-set is modeled as a function of independent
variables, such that the probability of a mode being included in the choice-set is a function of
the socio-economic characteristics of the travelers. This can be expressed as follows:

Where D = (d1,…,dk,…,dK) is a vector of parameters, Xi is a vector of socio-economic
characteristics of the decision-maker and attributes of alternative i, B = (b1,…,bl,…,bL) is a
vector of parameters of the MNL choice model, and Yi is a vector of socio-economic
characteristics of the individual and attributes of alternative i. Note that the
two vectors Xi and Yi do not generally contain the same variables. Xi should include those
variables thought to explain captivity to alternative i, whereas Yi should include variables
(perhaps partially or totally overlapping with Xi) which influence the choice of i from among the
alternatives of C.
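
For orientation, the parameterized logit captivity model is commonly written in the form sketched below. This is an assumption based on the standard specification in the captivity literature and on the definitions of D, Xi, B and Yi given here; it is not a verbatim reproduction of this paper's expression. The first term is the probability of being captive to alternative i, and the second is the probability of being a free chooser multiplied by an MNL probability over Mn.

```latex
P_n(i) = \frac{\exp(D^{\top} X_i)}{1 + \sum_{j \in M_n} \exp(D^{\top} X_j)}
       + \frac{1}{1 + \sum_{j \in M_n} \exp(D^{\top} X_j)} \cdot
         \frac{\exp(B^{\top} Y_i)}{\sum_{j \in M_n} \exp(B^{\top} Y_j)}
```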

Both the Logit Captivity model and the Independent Availability Logit model have been applied to
investigate the mode choice process in the city of Maceio, Brazil (Swait and Ben-Akiva 1985a).
The model focused on home-based work mode choice for full-time workers. The outcome of
the model highlighted an important practical challenge: the application of a
probabilistic choice-set generation process cannot be arbitrary. Rather, it must account for the
population in question as well as the source of constraints on it. In this particular choice context,
the logit captivity model performed better than the independent availability logit model in the
low income group; the opposite was true for the high income group; and in the middle income
group the two models performed comparably.

A drawback of the probabilistic choice-set formation models is the greatly increased difficulty of
calibrating them. The departure from the standard logit linear-in-parameters formulation can be
costly because the convenient property of concavity of the log-likelihood function, which
guarantees the uniqueness of the parameters at the point of convergence, is lost. Hence a
greater degree of care and sophistication on the part of the analyst as well as specialized
estimation software are necessary.

In this paper, stated choice-set data (from the supplemental SP data) has been used for
developing the choice-set generation model. This developed model will be applied to revealed
preference data of the HIS to predict the missing choice-set information.

3.3. Proposed Methodology


In the proposed approach, the parameters of a choice-set probability model are estimated first
using stated preference (SP) survey data. The supplementary survey included an explicit question
on the availability of different modes as perceived by the respondents. Each
respondent was presented with a list of typical travel modes in Dhaka and asked which modes
were available to them for the particular trip. The answers explicitly revealed
the modes considered by the respondents for the trip in question. These stated choice-sets,
along with the comprehensive socio-economic and trip-related data of the travelers, have been
used to form the choice-sets in the RP data.

For the mode choice context and the population of Dhaka city, the universal choice-set consists
of bus/tempo, car, CNG/taxi and rickshaw (see footnote 2). In the extreme cases, the traveler is captive to a
single mode or considers all modes in the choice-set. The total number of non-empty subsets of
this universal choice-set is fifteen. The subsets of the universal choice-set are as follows:

1. bus, tempo
2. bus, rickshaw, tempo
3. bus, CNG/taxi, tempo
4. bus, CNG/taxi, rickshaw, tempo
5. bus, car, CNG/taxi, rickshaw, tempo
6. bus, car, rickshaw
7. bus, car, CNG/taxi
8. bus, car
9. rickshaw, tempo
10. CNG/taxi, tempo
11. CNG/taxi, rickshaw
12. car, rickshaw
13. car, CNG/taxi, rickshaw
14. car, CNG/taxi
15. car

These choice-sets can be broadly categorized into one of the following six groups:
• Public Transport Group: includes bus and tempo;
• Public and Personalized Public Transport Group: includes bus/tempo, CNG/taxi and rickshaw;
• Personalized Public Transport Group: includes CNG/taxi and rickshaw;
• All Mode Group: includes all the four modes mentioned earlier;
• Car and Personalized Public Transport Group: includes all the four modes except bus/tempo;
• Car Group: includes only car.

The candidate socioeconomic attributes and mode characteristics expected to affect the choice-set
generation process, and the related a priori hypotheses, are presented in Table 1.

Footnote 2: Tempos are low-cost para-transits; CNGs are auto-rickshaws that run on compressed natural gas.
Table 1: Candidate Variables and a priori Hypotheses

Attribute | General Causal Relationship
Monthly Household Income (HHI) | Due to considerable disparity in monthly household income, household income is expected to play a vital role in determining the choice-set of the individual, with the propensity of having choice-sets with cars and individual modes increasing with income.
Gender | Due to social norms and culture, male and female passengers do not feel free to share transit vehicles, especially in congested situations, and female passengers also try to avoid public transit due to safety concerns. Female respondents are therefore less likely to have information about public transport routes and to include these modes in their choice-sets.
Education & Occupation | These variables are expected to be somewhat correlated with HHI; highly educated white-collar employees are likely to have higher propensities of having choice-sets with cars and individual modes.
Age | Due to relatively low fares, students are typically more inclined towards public transit and more familiar with the routes and timetables. They are therefore more likely to include these modes in their choice-sets. Aged people, on the other hand, are less likely to have choice-sets that include public transport modes.
Trip Purpose | Consideration of the modes may also be strongly affected by the purpose of the trip; e.g. people may be willing to consider transit in their choice-sets for work trips while excluding the same modes from their choice-sets for social trips with family members.
Travel Duration | Due to ease of accessibility and door-to-door service, rickshaws have higher probabilities of inclusion in the choice-sets for shorter trips, while for long trips mobility is more important and rickshaws are not considered viable options.

To get an idea of how these variables affect the availability of different modes
in the choice-sets of an individual, an exploratory analysis was carried out; the correlation with
the choice-sets was found to be highest for the income, age and gender of the traveler and the duration
and purpose of the trip.

Based on these findings, a discrete choice modeling technique has been used to estimate the
utility parameters of different choice-sets. A linear utility function is associated with each
choice-set. The utility of a choice-set i of individual n can be expressed as follows:

$$U_{in} = \beta_i X_{in} + \varepsilon_{in}, \quad \forall i \in C_n \qquad (4)$$
where
Xin = socio-economic characteristics of the individuals and attributes of different modes;
βi = coefficient of Xin;
εin = random error term;
Cn = universal choice-set or the choice-set determined deterministically for individual n.
Ideally, the parameters of the choice-set generation model should be estimated jointly
with the corresponding mode choice model. In the current research, however, the parameters of the two
models have been estimated sequentially due to limitations of the estimation software. This
implies that the correlation between the error terms of the choice-set generation model and the
mode choice model has been ignored.

3.4. Results
As discussed earlier, choice-set consideration is affected by the attributes of the alternative
modes and the socioeconomic characteristics of individuals. However, not all of the candidate
variables mentioned in Table 1 were found to be statistically significant and/or to have intuitive
signs. The estimation was started with monthly household income, and other attributes were
added step by step. A variable was included only if it produced a significant
improvement in the goodness-of-fit (adjusted rho-square) and if its parameter was
significant and had an intuitive sign. For example, the socioeconomic attribute education
was not found to be significant.

The utility functions are presented below and the estimation results using BIOGEME
(http://roso.epfl.ch/biogeme) are presented in Table 2.

Uall = αCNG_taxi * one + αrickshaw * one + αcar * one + αbus_tempo * one + βincome2 * income2
Ubus_car = αbus_tempo * one + αcar * one
Ubus_car_CNG_taxi = αbus_tempo * one + αCNG_taxi * one + αcar * one
Ubus_CNG_taxi = αCNG_taxi * one + αbus_tempo * one + βincome1 * income1 + βttvlong * vlong + βtp1 * tp1 + βage1 * age1
Ubus_CNG_taxi_rick = αbus_tempo * one + αCNG_taxi * one + αrickshaw * one + βincome1 * income1 + βttvlong * vlong + βtp1 * tp1 + βage1 * age1
Ubus_rick = αbus_tempo * one + αrickshaw * one + βincome1 * income1 + βttvlong * vlong + βtp1 * tp1 + βage1 * age1
Ubus_tempo = αbus_tempo * one + βttvlong * vlong + βtp1 * tp1 + βincome1 * income1 + βage1 * age1
Ucar_CNG_taxi = αCNG_taxi * one + βfemale * female + βincome2 * income2
Ucar_CNG_taxi_rick = αcar * one + αCNG_taxi * one + αrickshaw * one + βfemale * female + βincome2 * income2
Ucar_rickshaw = αcar * one + αrickshaw * one + βfemale * female + βincome2 * income2
UCNG_taxi_rick = αCNG_taxi * one + αrickshaw * one + βfemale * female + βincome2 * income2
Urickshaw = αrickshaw * one + βttvshort * vshort + βfemale * female + βincome2 * income2

Where,
αbus_tempo, αcar, αCNG_taxi, αrickshaw are the alternative specific constants associated with the
corresponding modes;
βage1 = coefficient of the age dummy for individuals 18 to 25 years old;
βfemale = coefficient of the female dummy;
βincome1 = coefficient of the income dummy for monthly incomes of less than or equal to 20000 BDT;
βincome2 = coefficient of the income dummy for monthly incomes of more than or equal to 50000 BDT;
βtp1 = coefficient of the trip purpose dummy for educational and work trips;
βttvlong = coefficient of the travel time dummy for trip durations greater than 45 minutes;
βttvshort = coefficient of the travel time dummy for trip durations of 15 to 30 minutes.

Table 2: Estimation Results of Choice-set Generation Model

Model: Multinomial Logit
Number of estimated parameters: 10
Number of observations: 756
Number of individuals: 756
Null log-likelihood: -1878.589
Init log-likelihood: -1878.589
Final log-likelihood: -1372.242
Likelihood ratio test: 1012.694
Rho-square: 0.270
Adjusted rho-square: 0.264

Name | Value | t-test
αbus_tempo | 0 (fixed) | -
αcar | 0.344 | 2.55
αCNG_taxi | 1.14 | 12.65
αrickshaw | -1.15 | -13.6
βage1 | 0.555 | 3.26
βfemale | 0.912 | 4.42
βincome1 | 2.28 | 7.13
βincome2 | 0.43 | 2.1
βtp1 | 1.09 | 6.87
βttvlong | 1.4 | 8.73
βttvshort | 2.03 | 6.07
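
The summary statistics in Table 2 are internally consistent, which can be verified with a few lines of arithmetic. The sketch below assumes that the null model assigns equal probability to each of the twelve choice-sets for which utility functions are specified, and that all twelve are available to every respondent; the variable names are illustrative.

```python
import math

n_obs = 756            # observations reported in Table 2
n_choice_sets = 12     # utility functions specified for the model
n_params = 10          # estimated parameters reported in Table 2
final_ll = -1372.242   # final log-likelihood reported in Table 2

# Equal-shares null model: every choice-set has probability 1/12.
null_ll = n_obs * math.log(1.0 / n_choice_sets)

# Likelihood ratio statistic and the two rho-square measures.
lr_stat = -2.0 * (null_ll - final_ll)
rho_sq = 1.0 - final_ll / null_ll
adj_rho_sq = 1.0 - (final_ll - n_params) / null_ll

print(f"null LL = {null_ll:.3f}, LR = {lr_stat:.3f}, "
      f"rho2 = {rho_sq:.3f}, adjusted rho2 = {adj_rho_sq:.3f}")
```

Under these assumptions the computed values agree with the reported null log-likelihood, likelihood ratio test and rho-square figures to within rounding.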

Twelve of the fifteen choice-sets mentioned earlier have
been used in the choice-set generation model. Three choice-sets have not been considered
separately because of their very small numbers of observations; they have been merged with the
choice-sets whose characteristics most closely resemble theirs. For example, the car-only
choice-set has been merged with the car, CNG/taxi group.

The results indicate that all else being equal, the rickshaws have the smallest probabilities of
being included in the choice-set. This is probably due to the fact that rickshaws are banned in
the major streets and certain OD pairs are served by rickshaws only through small streets and
alleys (which are often not known to everyone). Therefore, people are likely to exclude
rickshaws from their choice-sets because of lack of this network information. CNG/taxi on the
other hand has the highest value of alternative specific constant (ASC) indicating higher
likelihood of it being included in the choice-sets. However, it should be noted that the ASCs in SP
studies are not representative of market shares; rather, they merely indicate the part of the
utility unexplained by the explanatory variables (Bliemer et al. 2009).

Coefficients of income have different values and sensitivities for different income groups. The
coefficient of income for monthly incomes of less than or equal to 20000 BDT (see footnote 3) is highly
significant (at more than 95% confidence level) for the public and personalized public transport user
groups, which is intuitive for the context of the city since members of this income group generally
have no or shared access to a car and are well aware of public transport availability. On the other
hand, the coefficient of income for monthly incomes of greater than or equal to 50000 BDT
is statistically significant for people whose choice-sets include car and do not
include bus.

The dummy term introduced for female respondents indicates that female respondents have a
significantly higher likelihood of considering choice-sets which include car, CNG/taxi and
rickshaw.

The trip purpose dummy for educational and work trips indicates a significantly higher preference
for choice-sets which include bus and other personalized public transport, for the same
reason as for the income group of 20000 BDT or less.

Two travel duration dummies have been introduced in the model. Both of them are highly
significant. The long trip duration dummy (trip duration more than 45 minutes) indicates that
public transports are more likely to be included in the choice-set for long trips while short trip
dummy (trip duration 15 to 30 minutes inclusive) indicates that rickshaw is more likely to be
included in the choice-set for short trips.

The coefficient of age indicates that, as hypothesized, the young commuters have a significant
preference for the choice-sets which include bus.
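
As an illustration of how the estimates translate into choice-set probabilities, the sketch below evaluates the twelve utility functions with the Table 2 coefficients and applies the MNL formula. The traveler profile is hypothetical, the dictionary keys are illustrative names, and the utility for the car, CNG/taxi set is transcribed exactly as printed (with only the CNG/taxi constant).

```python
import math

# Point estimates from Table 2 (alpha_bus_tempo is the base, fixed at 0).
ASC = {"bus_tempo": 0.0, "car": 0.344, "cng_taxi": 1.14, "rickshaw": -1.15}
# Keys follow the dummy names in the utility functions (vlong -> beta_ttvlong, etc.).
BETA = {"age1": 0.555, "female": 0.912, "income1": 2.28, "income2": 0.43,
        "tp1": 1.09, "vlong": 1.4, "vshort": 2.03}

LOW = ["income1", "vlong", "tp1", "age1"]
HIGH = ["female", "income2"]

# choice-set -> (constants in its utility, dummies in its utility)
SPEC = {
    "all": (["cng_taxi", "rickshaw", "car", "bus_tempo"], ["income2"]),
    "bus_car": (["bus_tempo", "car"], []),
    "bus_car_cng_taxi": (["bus_tempo", "cng_taxi", "car"], []),
    "bus_cng_taxi": (["cng_taxi", "bus_tempo"], LOW),
    "bus_cng_taxi_rick": (["bus_tempo", "cng_taxi", "rickshaw"], LOW),
    "bus_rick": (["bus_tempo", "rickshaw"], LOW),
    "bus_tempo": (["bus_tempo"], LOW),
    "car_cng_taxi": (["cng_taxi"], HIGH),
    "car_cng_taxi_rick": (["car", "cng_taxi", "rickshaw"], HIGH),
    "car_rickshaw": (["car", "rickshaw"], HIGH),
    "cng_taxi_rick": (["cng_taxi", "rickshaw"], HIGH),
    "rickshaw": (["rickshaw"], ["vshort"] + HIGH),
}


def choice_set_probabilities(traveler):
    """MNL probabilities over the twelve choice-sets for one traveler."""
    utilities = {
        name: sum(ASC[a] for a in consts)
        + sum(BETA[d] * traveler.get(d, 0) for d in dummies)
        for name, (consts, dummies) in SPEC.items()
    }
    shift = max(utilities.values())  # subtract the max for numerical stability
    expu = {k: math.exp(v - shift) for k, v in utilities.items()}
    total = sum(expu.values())
    return {k: v / total for k, v in expu.items()}


# Hypothetical traveler: low income, aged 18 to 25, education/work trip over 45 minutes.
student = {"income1": 1, "age1": 1, "tp1": 1, "vlong": 1}
probs = choice_set_probabilities(student)
most_likely = max(probs, key=probs.get)
```

For this profile the bus-based choice-sets dominate, in line with the interpretation of the income, age, trip purpose and long-duration coefficients given above.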

4. Unobserved Level-of-service (LOS)


4.1. The Problem
High quality LOS data are essential for accurate estimation of the mode choice model.
Walker et al. (2007) use synthetic data to demonstrate that a model with measurement error may
yield inconsistent parameter estimates.

In the context of Dhaka, the available data from the STP Household Interview Survey (HIS) contain
only the stated travel times of the respondents for the chosen mode. The travel times
of the unchosen modes and the fares of all the modes are missing from the data set.

(Footnote 3: 1 BDT = 0.014 USD)

The data have been supplemented by Hasan (2007), who calculated the distances between
different Traffic Analysis Zones (TAZ) from network assignment using the network coded in
EMME/2 for the STP study. These distances have been used along with the assumed speeds of
different vehicles to determine the travel time between OD pairs by different modes.
It has been acknowledged that the resulting travel times contain some measurement errors
arising from the two major sources stated below:

1. The distances are zone-to-zone distances, i.e. they are not the actual distances between the
origin and destination of the traveler. Though zone-to-zone distances may suffice for the needs
of aggregate analysis (e.g. trip distribution and trip assignment), such assumptions may not be
sufficient for models estimated at the disaggregate level.

2. The traffic on the streets of Dhaka is heterogeneous in nature, and due to the congested traffic
situation as well as the chaos introduced by weak lane discipline, the vehicles do not necessarily
run at their free-flow speeds. In fact, the speed of a vehicle is not only a function of the
mode; it also depends on the road characteristics (e.g. width, surface quality, traffic mix,
presence of non-motorized traffic, etc.). Since a single speed per mode has been used for
the calculation of the travel time, ignoring these sources of heterogeneity, it might have
introduced some errors.
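
The calculation described above can be sketched as follows. This is only an illustration of how a single assumed speed per mode turns a zone-to-zone distance into a travel time; the speed values and the function name are hypothetical and are not the values used by Hasan (2007).

```python
# Hypothetical assumed speeds (km/h), one per mode. Illustrative values only.
ASSUMED_SPEED_KMPH = {"rickshaw": 10.0, "bus": 20.0, "car": 30.0}


def network_travel_time_minutes(zone_distance_km, mode, speeds=ASSUMED_SPEED_KMPH):
    """Travel time implied by a zone-to-zone distance and one speed per mode."""
    return 60.0 * zone_distance_km / speeds[mode]


# The same 5 km zone-to-zone distance is applied to every traveler in the zone
# pair, regardless of the actual origin, destination or road conditions.
for mode in ASSUMED_SPEED_KMPH:
    print(mode, network_travel_time_minutes(5.0, mode))
```

Both error sources enter through the two inputs: the distance is shared by all travelers of a zone pair, and the speed is shared by all roads used by a mode.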

The available data from the STP Household Interview Survey (HIS), on the other hand, contain the stated
travel times of the respondents/travelers, but only for the chosen modes, and cost data are
entirely missing from the collected HIS data. Hasan (2007) conducted a small-scale HIS to
calculate an average fare for the different modes.

4.2. State-of-the-art Approaches


Brownstone et al. (2001) have developed and demonstrated a technique to overcome
measurement errors in travel data. In this method, multiple imputed values are generated for
each observation and separate choice models are estimated for each set of imputed data. However,
this multiple imputation technique is useful only when validation data are available (Ben-Akiva et
al. 2002). Later, Steimetz and Brownstone (2005) presented a similar method for
measurement error correction which also involves multiple imputations, but the drawback of
the method is that it can only be used when a subsample of accurate observations is available.
Other works also focus on the improvement of level-of-service measurements
(e.g. Ortuzar and Ivelic 1987, Ortuzar and Willumsen 2001, Kim et al. 2006). The task of
measurement error correction also extends to other types of transport models (e.g., Gunn and
Whittaker 1981).

Recently, Walker et al. (2007) have proposed a method to address the measurement errors in
network-derived LOS data for the work trip context of Chengdu, China. This method is
described here in brief due to the similarity of the context.

Walker et al. treated the true travel time as a latent variable which is known only up to a
distribution $f_T(\cdot)$ and a set of estimated parameters $\theta$, i.e. $f_T(t_{True}; \theta)$. For example, the true travel time
may be normally distributed as $t_{True} \sim N(\mu, \sigma_T^2)$. The distribution of the measured travel time is
conditional on the true travel time and a set of estimated parameters $\lambda$ as follows:

$f_M(t_{Measured} \mid t_{True}; \lambda)$    (5)

Now the probability of choosing mode i, conditional on a set of estimated parameters $\beta$ and
the explanatory variable $t_{True}$, can be expressed as follows:

$P(i \mid t_{True}; \beta)$    (6)

Since $t_{True}$ is unknown, it is necessary to integrate the choice probability given in equation (6)
over the distribution of $t_{True}$ as follows:

$P(i) = \int P(i \mid t_{True}; \beta) \, f_T(t_{True}; \theta) \, dt_{True}$    (7)

Now, accounting for $t_{Measured}$, the likelihood function for the entire framework becomes:

$P(i, t_{Measured}) = \int P(i \mid t_{True}; \beta) \, f_M(t_{Measured} \mid t_{True}; \lambda) \, f_T(t_{True}; \theta) \, dt_{True}$    (8)

In the above equation, each latent variable adds a dimension of integration, and when the number
of latent variables exceeds three, simulation must be used in place of numerical integration.
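
The simulation mentioned above can be illustrated with a minimal sketch for a single latent travel time. It is not the estimator used by Walker et al. (2007): the normal form of both densities, the function names and all numerical values are assumptions made for illustration only.

```python
import math
import random


def logit_prob(utilities, chosen):
    """Multinomial logit probability of the chosen alternative."""
    top = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {mode: math.exp(v - top) for mode, v in utilities.items()}
    return exp_u[chosen] / sum(exp_u.values())


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (assumed form of the measurement density)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulated_likelihood(chosen, t_measured, mu, sigma_t, sigma_m, beta_time,
                         fixed_utils, latent_mode, draws=2000, seed=0):
    """Approximate the integral over the latent true travel time by averaging
    P(chosen | t_true) * f_M(t_measured | t_true) over draws of t_true."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        t_true = rng.gauss(mu, sigma_t)  # assumed: t_true ~ N(mu, sigma_t^2)
        utils = dict(fixed_utils)
        utils[latent_mode] = utils.get(latent_mode, 0.0) + beta_time * t_true
        total += logit_prob(utils, chosen) * normal_pdf(t_measured, t_true, sigma_m)
    return total / draws


# Hypothetical two-mode example: only the bus travel time is treated as latent.
fixed = {"bus": 0.0, "car": -1.0}
print(simulated_likelihood("bus", 32.0, 30.0, 4.0, 5.0, -0.1, fixed, "bus"))
```

With several latent travel times the same averaging is carried out over joint draws, which is why simulation replaces numerical integration as the dimension grows.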

Walker et al. (2007) treated the travel time for one mode (out of five in the choice-
set) as a latent variable. The fit of the model improved only slightly when compared with the logit
model, but the Value of Time (VOT) increased significantly from 7.72 yuan/hour to 12.94
yuan/hour. The VOT estimated using the hybrid model was closer to the average income of
the area (15 yuan/hour).

In the context of Dhaka, the set of stated travel times was too limited for using the imputation
method. Using complicated, integrated methods like the one proposed by Walker et al.
(2007) was also not possible due to limitations of the available software. Therefore, a simpler
and tractable approach is proposed in the following section.

4.3. Proposed Methodology


In the present study, the stated travel time (which is available only for the chosen alternatives) is
assumed to be the true value of the travel time. A relationship has been developed between the true
travel time and the travel time obtained from network assignment and assumed speeds, in order to
explore potential systematic variations between the two and to determine the necessary correction
factors.

It should be mentioned that a similar measurement relationship could have been developed to
address the measurement errors in the travel fares of the respondents as well. Unfortunately,
however, no stated travel fares were available from the respondents.

In order to address the measurement errors of the travel time, the data were cleaned first:
observations with very large discrepancies between the zone-to-zone distance and the distance
stated by the traveler between the origin and destination locations were excluded. A
regression analysis was then performed on the cleaned data to develop a relationship between the
true and measured travel times between the OD pairs. The travel times stated by the travelers
have been considered as the true measure of the travel time, while the calculated travel times
have been used as the measured travel times with errors. The proposed relationship between the true
and measured travel times can be expressed as follows:

$t_{True} = \alpha + \beta \, t_{Measured} + \epsilon$    (9)
Where,
$t_{True}$ is the true value of the travel time, i.e. the travel time stated by the
traveler;
$t_{Measured}$ is the measured value of the travel time;
$\alpha$, $\beta$ are the systematic components to be estimated by regression analysis;
$\epsilon$ is the random error component, whose mean has been taken to be zero
for the analysis.
In this analysis, we test the hypothesis of the presence of systematic components of error
through statistical tests. It may be noted that $\alpha$ indicates the fixed component of the
measurement error (if any), $\beta$ indicates the systematic scale difference between the true and
the measured values (if any) and $\epsilon$ represents the random part of the error.
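
A minimal sketch of such a regression, written with ordinary least squares formulas, is given below. The data points are invented for illustration and the function name is hypothetical; the sketch only shows how the two systematic components, their t-statistics and the adjusted R square can be obtained from pairs of measured and stated travel times.

```python
import math


def fit_measurement_relation(t_measured, t_stated):
    """Ordinary least squares fit of t_stated = alpha + beta * t_measured + error."""
    n = len(t_measured)
    mean_x = sum(t_measured) / n
    mean_y = sum(t_stated) / n
    sxx = sum((x - mean_x) ** 2 for x in t_measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(t_measured, t_stated))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    sse = sum((y - alpha - beta * x) ** 2 for x, y in zip(t_measured, t_stated))
    sst = sum((y - mean_y) ** 2 for y in t_stated)
    s2 = sse / (n - 2)  # residual variance with two estimated parameters
    se_beta = math.sqrt(s2 / sxx)
    se_alpha = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    r2 = 1.0 - sse / sst
    return {
        "alpha": alpha,
        "beta": beta,
        "t_alpha": alpha / se_alpha,  # tests for a fixed component of the error
        "t_beta": beta / se_beta,     # tests for a systematic scale difference
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - 2),
    }


# Invented travel times in minutes (not taken from the STP data).
measured = [10.0, 20.0, 30.0, 40.0, 50.0]
stated = [13.0, 23.0, 37.0, 47.0, 60.0]
print(fit_measurement_relation(measured, stated))
```

If the t-statistic of the constant is small, the constant can be dropped and the relationship re-estimated through the origin, which changes the formulas above accordingly.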

4.4. Results
The estimation was started by pooling all the modes together. The regression equation obtained
from this estimation has an adjusted R square value of
0.849. The constant was statistically insignificant, with a t-statistic of -0.880, and
was consequently dropped in the second trial of the estimation. Further, separate models were
estimated for the different modes to check whether they differ significantly from
the pooled model or from each other.
Based on the estimated coefficients, the seven modes were combined into four groups
and another set of regression analyses was carried out for each group separately. The final
estimation results are provided in the table below:

Table 3: Final estimation results of regression analysis

Modes                                 Regression Equation     Adjusted R square value
Tempo & Public Transport (bus)        [equation missing]      0.973
Taxicab & CNG                         [equation missing]      0.929
Private Car/Microbus & Motor Cycle    [equation missing]      0.890
Rickshaw                              [equation missing]      0.873

The estimation results indicate that as hypothesized there are some systematic measurement
errors and the above equations can be used to correct the measurement errors of the calculated
travel times before estimating the mode choice models.
5. Prediction Tests
To test the improvements in the model from the proposed approaches, mode choice
models with and without the corrections were estimated using 7431 observations of the STP HIS
data. The estimated MNL model consisted of a choice among only four modes, i.e. rickshaw, car
(private car and microbus), CNG & taxi and public transit (bus and tempo), and had generic time
and cost coefficients. The same specification was used for both models, but the base model
was estimated without incorporating the measurement error corrections and assuming the
universal choice-set, while the second model was estimated incorporating both the
measurement error correction and the probabilistic choice-set.

Table 4: Estimation results

Parameters        Estimated parameter values (t-statistics)
                  Base Model            Proposed Model
α_bus             0                     0
α_car             -4.59 (-46.91)        -6.9981 (-24.51)
α_rick            1.75 (37.72)          3.8645 (-21.07)
α_cng_taxi        -2.35 (-31.59)        -4.4326 (-22.45)
β_traveltime      -0.137 (-40.23)       -0.2379 (-23.283)
β_travelcost      -0.0307 (-20.25)      -0.0413 (-15.401)

Here, an increase in the scale factor of the MNL model is noted, since the utility function
parameters are greater in magnitude in the proposed model than in the base model. This scale parameter,
which is not separately identifiable in linear-in-parameters specifications, is inversely related to the variance of the
underlying Gumbel distribution. Hence, an increase in the scale factor corresponds to a decrease in the
variance of the stochastic component of the utility functions (Ben-Akiva and Lerman 1985).
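
The relationship between the scale and the error variance can be written out explicitly. The block below states the standard MNL result; the symbols $\mu$ (scale of the Gumbel error), $V_i$ (systematic utility) and $\hat{\beta}$ (estimated coefficient) are notation assumed here for illustration and do not appear in the tables above.

```latex
% MNL probability with Gumbel scale \mu
P(i) = \frac{\exp(\mu V_i)}{\sum_{j} \exp(\mu V_j)}, \qquad V_i = \beta' x_i

% Only the product \mu\beta is identified, so the estimated coefficients are
\hat{\beta} = \mu \, \beta

% Variance of the Gumbel-distributed error term
\operatorname{Var}(\varepsilon) = \frac{\pi^{2}}{6 \mu^{2}}
```

Under this reading, coefficients that are uniformly larger in magnitude are consistent with a larger $\mu$ and therefore a smaller error variance, although changes in the coefficients themselves cannot be ruled out from the magnitudes alone.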

The estimated parameters were then applied to a hold-out sample of 100 observations (not
used for estimation), and the goodness-of-fit statistics were calculated in terms of log-likelihood
(LL) using the two sets of estimated parameters. The LL value was -73.90 with the corrected
model parameters, whereas the value was -107.63 for the base model. This substantial
improvement in goodness-of-fit demonstrates the superiority of the corrected model in terms
of forecasting.
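
The hold-out calculation can be sketched as follows. The sketch assumes a universal choice-set, as in the base model, and uses the base model coefficients of Table 4; the two hold-out observations, their travel times and costs, and the function names are hypothetical and serve only to show the mechanics.

```python
import math

# Base model coefficients from Table 4 (bus is the reference alternative).
BASE_ASC = {"bus": 0.0, "car": -4.59, "rick": 1.75, "cng_taxi": -2.35}
BASE_BETA_TIME = -0.137
BASE_BETA_COST = -0.0307


def mnl_probabilities(utilities):
    """Multinomial logit probabilities for one observation."""
    top = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {mode: math.exp(v - top) for mode, v in utilities.items()}
    denom = sum(exp_u.values())
    return {mode: e / denom for mode, e in exp_u.items()}


def holdout_log_likelihood(observations, asc, beta_time, beta_cost):
    """Sum of the log probabilities of the chosen modes over a hold-out sample."""
    ll = 0.0
    for obs in observations:
        utils = {
            mode: asc[mode] + beta_time * obs["time"][mode] + beta_cost * obs["cost"][mode]
            for mode in obs["time"]
        }
        ll += math.log(mnl_probabilities(utils)[obs["chosen"]])
    return ll


# Two invented hold-out observations (times and costs are illustrative only).
holdout = [
    {"chosen": "bus",
     "time": {"bus": 40.0, "car": 25.0, "rick": 60.0, "cng_taxi": 28.0},
     "cost": {"bus": 10.0, "car": 120.0, "rick": 50.0, "cng_taxi": 90.0}},
    {"chosen": "rick",
     "time": {"bus": 20.0, "car": 10.0, "rick": 15.0, "cng_taxi": 12.0},
     "cost": {"bus": 5.0, "car": 60.0, "rick": 15.0, "cng_taxi": 40.0}},
]
print(holdout_log_likelihood(holdout, BASE_ASC, BASE_BETA_TIME, BASE_BETA_COST))
```

A log-likelihood closer to zero on the same hold-out sample indicates better predictions, which is the comparison reported above for the two sets of parameters.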

6. Conclusions
In this study, an effort has been made to identify the key limitations of the RP mode choice data
available for Dhaka and to correct them by developing a choice-set generation model and a
measurement error correction model (to correct the LOS variables).
The estimation results of the model parameters have several policy implications. The
estimated parameters of the choice-set generation model indicate that, in general,
female travelers and travelers with comparatively high monthly incomes are reluctant to
include public transport modes in their choice-sets. Therefore, in order to increase the ridership
of these groups, special incentives (e.g. women-only buses/compartments, low-floor
vehicles etc.) need to be provided. The estimated parameters also imply that people
have a strong inclination to include public transport in their choice-sets, compared to other
modes, for educational and work trips. Policy makers can utilize this fact by providing
some additional benefits to this group of travelers. The additional benefits may include low fares
for students, frequent departure of buses during morning and evening peak hours, etc.

The study, however, has several limitations as well. For example, instead of estimating an
integrated model with the choice-set generation, measurement error correction and actual
mode choice, the sub-models have been estimated separately. As mentioned in Section 4.2, this
was due to the limitations of the available software and will be addressed in future research.
Besides, in the case of the measurement error correction, only the travel times of the modes were
corrected; a similar approach will be explored in the future to correct the travel costs of the different
modes. Further, a very simple model structure was used in the prediction tests. It may be noted
that this was done only for demonstration purposes. In future research, we plan to enrich
these prediction models by using more advanced model forms (e.g. nested logit, cross-nested logit,
mixed logit etc.), by including more socio-economic data, and by combining SP choice scenarios
in which respondents were given the option to compare improved public transport modes like bus
rapid transit (BRT) and metro rail with their current modes and select the one they perceive to
be the best. The proposed integrated model structure, using a structural and measurement
equation framework, is presented in Figure 01.

[Figure 01 diagram. Elements: Characteristics, Measured Values, Supplementary survey, Choice-set,
Attributes of Modes, Utility, Revealed Choice, Stated Choice, Errors. Legend: observable variable,
unobservable variable, behavioral relationship, measurement relationship.]
Figure 01: Choice model with the consideration of choice-set generation model and correction
for measurement errors of the LOS values

The figure shows that the choice-set of the respondents, the attributes of the modes and the utility of
the different modes are not directly stated by the respondents and are therefore unobserved by the
analyst. The observed variables include the socio-economic characteristics of the respondents,
the measured values of the trip attributes from network analysis and the revealed and stated
choices of the respondents. The analyst has to rely on the measurement relationships among
the observed and unobserved variables to arrive at the unobserved variables. In this process,
error terms are associated with the unobserved variables. However, the model may take a closed
form (logit) if the errors are assumed to be independent and identically distributed (iid).

The developed models have considerable potential to better predict the ridership of proposed
improved urban transport initiatives in Dhaka city. It may be noted that, though the
modifications proposed in this paper have been formulated with the context of Dhaka and the
available data in mind, the methodologies adopted in this research can provide useful guidance
for mode choice model development in other developing countries, which often face a similar dearth of
data and similar modeling challenges.
Acknowledgements

The Stated Preference Survey data collection for this study has been supported by the Bureau of
Testing Research and Consultancy, BUET and the Japan International Cooperation Agency
(JICA). Any opinions, findings and conclusions or recommendations expressed in this publication
are those of the authors and do not necessarily reflect the views of BUET or JICA.

References
1. Bari M. F. and Hasan M. (2001), “Effect of Urbanization on Storm Runoff Characteristics of
Dhaka City”, Tsinghua University Press. XXIX IAHR Congress. Beijing.
2. Ben-Akiva M, McFadden D, Train K, Walker J, Bhat C, Bierlaire M, Bolduc D, Boersch-Supan A,
Brownstone D, Bunch D, Daly A, de Palma A, Gopinath D, Karlstrom A, Munizaga A, (2002),
"Hybrid Choice Models: Progress and Challenges", Marketing Letters 13(3), 163-175.
3. Ben-Akiva, M. (1977), "Choice Models with Simple Choice-set Generation Processes", Working
Paper, Dept. of Civil Engineering, MIT, Cambridge, MA.
4. Ben-Akiva, M. and Lerman, S. (1974), “Some Estimation Results of a simultaneous model of
Auto Ownership and Mode Choice to Work”, Transportation, 4, 4, 357-376.
5. Ben-Akiva, M. and Lerman, S.R. (1985), “Discrete Choice Analysis: Theory and Application to
Travel Demand” The MIT Press, Cambridge, MA.
6. Bierlaire, M. (2003) BIOGEME: A free package for the estimation of discrete choice models,
Proceedings of the 3rd Swiss Transportation Research Conference, Ascona, Switzerland.
7. Bliemer M., Rose J. and van Blokland (2009) Experimental Design Influences on Stated Choice
Outputs: An Empirical Study in Air Travel Choice, 12th Conference of the International
Association of Travel Behavior Research, Jaipur, India.
8. Brownstone, D., Golob, T. F., and Kazimi, C. (2001), "Modeling Non-ignorable Attrition and
Measurement Error in Panel Surveys: An application to Travel Demand Modeling" , Chapter 25
in Survey Nonresponse, Editors, R.M. Groves, D. Dillman, J. L. Eltinge and R.J.A. Little, New York:
Wiley, forthcoming.
9. DITS (1993), "Greater Dhaka Metropolitan Area Integrated Transport Study" , Prepared by PPk
Consultants Declan International and Development Design Consultant (DDC), Dhaka .
10. Gaudry, M. and Dagenais M. (1979), "The Dogit Model", Trans. Res. B., 13B, 105-111.
11. Gunn HF, Whittaker JC, (1981), "Estimation errors for well-fitting gravity models", Working
Paper 149. Institute for Transport Studies, University of Leeds.
12. Habib, K. M. N. (2002), "Evaluation of Planning Options to Alleviate Traffic Congestion and
Resulting Air Pollution in Dhaka City", M.Sc. Thesis, Department of Civil Engineering, BUET,
Dhaka.
13. Hasan, S. (2007), "Development of a Travel Demand Model for Dhaka City", M.Sc. Thesis,
Department of Civil Engineering, BUET, Dhaka.
14. Heijden, R E C M and Timmermans, H J P (1984), "Modeling Choice-set Generating Processes
via Stepwise Logit Regression Procedures: Some Empirical Results", Environment and Planning
A, 16, 1249-1255.
15. Hensher, D. (1978), “Valuation of Journey Attributes: Some Existing Empirical Evidence”, in
Determinants of Travel Choice, D. Hensher and Q. Dalvi, eds., Saxon House.
16. Kim HK, Wu SK, M Hunger M, (2006), "A Case Study on Measuring Travel Time, Speed, and
Delay Using GPS-Instrumented Test Vehicles. Applications of Advanced Technology in
Transportation", 9th International Conference, Chicago IL.
17. Lerman, S. (1975), “A Disaggregate Behavioral Model of Urban Mobility Decisions”,
Unpublished Ph.D. Thesis, Department of Civil Engineering, Massachusetts Institute of
Technology, Cambridge, MA.
18. Manski, C. (1977), “The Structure of Random Utility Models”, Theory and Decision, 8, 229-
254.
19. McFadden, D. (1976a), “The Multinomial Logit Model When the Population Contains
‘Captive’ Subpopulations”, Unpublished Memorandum, September 13, 1976.
20. McFadden, D. (1981), "Econometric Models of Probabilistic Choice", in Structural Analysis of
Discrete Data with Econometric Applications (Edited by C. Manski and D. McFadden), MIT
Press, Cambridge, MA.
21. McFadden, D. and Reid, F. (1975), “Aggregate Travel Demand Forecasting from Disaggregate
Behavioral Models”, Presented at the Annual TRB Meeting, Washington, DC.
22. Ortúzar J de D, Ivelic AM, (1987), "Effects of using more accurately measured level-of-service
variables in the specification and stability of mode choice model.", Proceeding 15th PTRC
Summer Annual Meeting, P290,117-130. PTRC, London.
23. Ortúzar J de D, Willumsen LG, (2001), "Modeling Transport", Wiley.
24. Steimetz SSC, Brownstone D (2005), "Estimating Commuters ‘Value of Time’ with Noisy Data:
a Multiple Imputation Approach", Transportation Research B 36, 865-889.
25. Stopher, P. (1980), “Captivity and Choice in Travel-Behavior Models”, Trans. Eng. Journal of
ASCE, 106, TE4, 427-435.
26. STP (2005), "Strategic Transport Plan for Dhaka", Prepared by Louis Berger Group and
Bangladesh Consultant Ltd.
27. Swait, J. (1984), "Probabilistic Choice-set Generation in Transportation Demand Models", Ph.D.
Dissertation, Massachusetts Institute of Technology, Cambridge.
28. Swait, J. and Ben-Akiva, M. (1984), "Incorporating Random Constraints in Discrete Models of
Choice-set Generation", Transp. Res. B., 21B, 91-102.
29. Swait, J. and Ben-Akiva, M. (1985a), "Constraints on Individual Travel Behavior in a Brazilian
City", Transp. Res. Record, 1085, 75-85.
30. Swait, J. and Ben-Akiva, M. (1987), "Empirical Test of a Constrained Choice Discrete Model:
Mode choice in Sao Paulo, Brazil", Transp. Res. B., 21B, 103-115.
31. Thill, Jean-Claude (1992), "Choice-set Formation for Destination choice Modelling", Progress
in Human Geography 16, 3, 361-382.
32. Train, K. (1980), “A Structured Logit Model of Auto Ownership and Mode Choice”, Review of
Economic Studies, XLVII, 357-370.
33. Walker, J. et al. (2007), "Travel Demand Models in the Developing World: Correcting for
Measurement Errors", TRB 2008 Annual Meeting CD-ROM.
34. Wermuth, M. (1978), “Structure and Calibration of Behavioral and Attitudinal Binary Choice
Model Between Public Transport and Private Car”, Presented at PTRC Summer Annual Meeting,
10-13 July, University of Warwick, England.
35. Westin, R. (1974), “Predictions from Binary Choice Models”, J. of Econometrics, 2, 1-16.
36. Williams, H. and Ortuzar, J. (1979), “Behavioral Travel Theories, Model Specification and the
Response Error Problem”, Working Paper 116, Inst. for Transport Studies, The University of
Leeds.