
Measuring Multi-Channel Advertising Response

Using Consumer-Level Data

Daniel Zantedeschi
Eleanor McDonnell Feit
Eric T. Bradlow
The Wharton School
University of Pennsylvania
DRAFT
PLEASE DO NOT CIRCULATE WITHOUT PERMISSION
July 6, 2014

We thank the Wharton Customer Analytics Initiative and one of its corporate partners who provided us data. WCAI staff
Melissa Hartz, Ben Adams and Ankur Roy all assisted with data preparation. We also thank Peter Danaher for helpful comments
on an earlier version of this manuscript.
Abstract
Advances in data collection have made it increasingly easy to collect data on consumer-level purchases
that are linked to those same customers' advertising exposures. However, advances in advertising
response modeling (i.e., marketing mix modeling) have lagged behind the availability of this
granular data. Extending extant models to multi-channel consumer-level data, we develop a Bayesian
Tobit model that can be used to measure the effectiveness of advertising exposures. Building on the
traditional ad-stock framework, we estimate a separate decay rate for each
advertising medium. This allows us to estimate channel-specific short- and long-term effects of advertising
and illustrate how the model can be used to inform advertising strategies. We demonstrate
the model using data on direct marketing that includes randomized holdouts for each campaign across
multiple channels (mitigating potential endogeneity problems) and find that catalogs have a substantially
longer-lasting impact on customer purchases than emails, and that there are significant synergies
(i.e., interactions) between channels. We also illustrate how the model can be used to score and target
individual customers on the basis of their advertising responsiveness, and find that targeting the
most responsive customers substantially increases returns to advertising versus targeting the heavy
purchasers. In the conclusion, we discuss the potential for applying this model in other contexts where
individual-level advertising response data is readily available, including online display advertising and
addressable television.
Keywords: advertising response, media mix, multi-channel, dynamic linear model, Tobit model,
hierarchical Bayes, single-source data
1 Introduction
It is becoming increasingly common that advertising can be targeted to individual consumers; direct mar-
keting, online advertising, social media, and emerging addressable TV systems all allow advertisers to
target individual customers. A major advantage of these targeted advertising platforms is that they col-
lect consumer-level exposure data for all the advertisements they deliver, and so marketers now have the
opportunity to know not just how much they spend on each advertising platform, but also exactly which
viewers were exposed to the advertising at each point in time. When this advertising exposure data can be
linked to data on individuals' purchases, a possibility that is also quickly emerging across a wide variety of
platforms, the long-wished-for fused consumer-level advertising and response data becomes available
at a very low cost. While it has long been recognized that linking individual consumers' advertising
exposures to those same consumers' purchase behavior (or any other target behavior) would be helpful
in measuring advertising response (Little 1979, Tellis 2004), until recently that data was only available
through relatively expensive single-source panels. Consequently, the literature has focused on methods
that use aggregate advertising and sales data (cf. Dekimpe and Hanssens 2000).
In light of these changes in the advertising industry, we return to the question of how to measure
advertising response using this type of consumer-level data. By modeling the advertising response at the
customer-level, we can inform our estimates of advertising response not only from temporal variation
in sales and aggregate advertising, but also by comparing subsequent purchases for customers who are
exposed to an advertisement at a particular point in time versus those who are not. Modeling at the
consumer level helps to break some of the multicollinearity that is often seen in aggregate advertising
spending data for different channels. This places our work as one of the first academic papers to focus on
the problem practitioners now sometimes refer to as "multi-channel advertising attribution" (see also Li
and Kannan, 2014; Bollinger et al., 2013).
Importantly, looking at advertising response at the consumer level allows us to exploit random variation
in exposures across individuals that was created, in this research, through the use of firm-initiated
randomized holdout groups. To create randomized holdout groups, the marketer randomly assigns a fraction
of the targeted customers to be held out from a particular campaign; i.e., the holdout group does
not receive any exposures. Because treatment is assigned randomly, a simple comparison of the purchases
made subsequent to the campaign between the treated and the holdout groups gives an estimate of the causal
effect of exposure to the campaign for the targeted population. It is increasingly common for companies
practicing direct marketing to create holdout groups consisting of 5-10% of the population that was tar-
geted to receive each campaign. As we will discuss, randomized holdouts resolve some of the endogeneity
issues that arise in observational consumer-level advertising response data such as endogeneity induced
by targeting customers with a high baseline propensity to purchase (sometimes called activity bias; see
Blake, Nosko and Tadelis, 2014).
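The treated-versus-holdout comparison described above can be sketched in a few lines of Python. This is only an illustration on simulated data, not the retailer's data: the sample size, purchase probabilities, spend distribution and the function name `holdout_lift` are all assumptions made for the example.

```python
import random
import statistics

def holdout_lift(treated, holdout):
    """Return (difference in mean spend, standard error) for treated vs. held-out customers."""
    diff = statistics.mean(treated) - statistics.mean(holdout)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(holdout) / len(holdout)) ** 0.5
    return diff, se

# Simulated campaign; every number below is hypothetical.
rng = random.Random(0)
n_customers = 20000
holdout_rate = 0.08      # fraction randomly withheld from the campaign
p_buy_holdout = 0.050    # purchase probability without exposure
p_buy_treated = 0.055    # purchase probability with exposure
mean_spend = 50.0        # average dollars spent, given a purchase

treated, holdout = [], []
for _ in range(n_customers):
    held_out = rng.random() < holdout_rate          # random assignment
    p_buy = p_buy_holdout if held_out else p_buy_treated
    spend = rng.expovariate(1.0 / mean_spend) if rng.random() < p_buy else 0.0
    (holdout if held_out else treated).append(spend)

lift, se = holdout_lift(treated, holdout)
print(f"estimated lift = {lift:.2f}, standard error = {se:.2f}")
```

Because the holdout group is small and purchases are sparse, the standard error in a setup like this tends to be of the same order as the lift built into the simulation (0.25 dollars per customer here).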
However, simple hypothesis tests for individual advertising campaigns are often underpowered due
to the small size of the holdout group and relatively small advertising effects (Lewis and Rao, 2012). In
this paper, we move beyond simple hypothesis testing and employ a hierarchical modeling framework,
which allows us to pool information over multiple campaigns to estimate both short-term response and
long-term decay rates for advertising by channel (something that experiments reported in the literature do
not do). Because we also estimate response parameters for each customer, we can both answer marketers'
questions about the ROI of costly marketing communication and help make future communications more
efficient by identifying customers who have been more responsive to advertising in the past.
Specifically, we utilize an individual-level advertising response function in the tradition of the ad-stock
literature (Nerlove and Arrow 1962, Jorgenson 1966). This response function is linked to purchases
through a Tobit I structure, to accommodate the high levels of sparsity in purchase behavior. (While
we focus on purchases measured in dollars, it is straightforward to extend the model to other measures
of advertising response, such as quantity purchased, sign-ups, conversions, new subscriptions, etc.) The
ad-stock response function allows us to relate ad exposures to subsequent sales with just two parameters
for each channel (estimated for each individual): the first characterizing the contemporaneous effect of
advertising and the second related to the carry-over to the next period. By allowing for different pairs
of parameters for each channel, we can gauge the differential contribution of different forms of advertising.
The model includes interactions among the different channels, i.e., synergies (Naik, Raman and
Winer 2005, Naik and Peters 2009), and we find a substantial (positive) interaction effect between the two
channels that we study. Consistent with the literature on dynamic choice models (McAlister et al. 1991,
Seetharaman 2004), we also control for state dependence in outcomes and serially correlated errors in the
response. Without controlling for these latent and dynamic features of the response model, as we demonstrate,
we risk biasing the estimates of advertising response, particularly the carry-over of advertising.
Finally, using a hierarchical Bayesian framework, advertising response, interaction and state-dependence
parameters are all estimated for each consumer, allowing us to score and target consumers based on their
predicted cumulative response to an advertisement on a particular channel.
The proposed model can be applied to any data set that includes marketing response, e.g., purchases,
for each consumer in each time period, as well as the amount of advertising exposure on each channel.
As a decision support tool, the output can be used to assess which channels are most effective overall and
how advertising response plays out over time for each channel. Using a direct marketing data set, we show
that e-mails and catalogs have about the same overall advertising response, but catalog response is spread
out over a more extended period of time, which has important implications for whether and how often you
send catalogs or emails. We also show how targeting the most responsive customers can lead to substantial
increases in advertising ROI versus the common practice of targeting customers based on the volume of
their past purchases, irrespective of their prior advertising responsiveness.
While we have developed a model that can be used across a wide variety of contexts, the data we use
in this paper involves direct-marketing exposures and responses. There are several advantages to focusing
on this setting. Typically, direct marketers find it easier to link exposures and purchases; in our particular
data set, the multi-channel retailer uses name and address matching to track whether those who received
catalogs and emails made purchases via any channel. But more importantly, the retailer is in complete
control of how often each customer receives a catalog or email. This makes it straightforward for the
company to execute randomized holdouts where those who are randomly selected to be treated with
a particular advertising message all receive exactly the same dose of advertising. In the conclusion
section, we discuss the challenges of applying this approach in other advertising platforms, such as online
or addressable television advertising, where customers control how many opportunities the advertiser has
to serve an ad to a particular customer.
To summarize, building on the traditional ad-stock framework, we develop a model that can be used to
exploit consumer-level advertising response data with randomized holdouts to gauge advertising response
by channel for each consumer. We apply the model to a direct marketing data set to estimate the effect of
catalog and email mailings on subsequent purchases for each customer. These estimates can be compared
with costs to determine the ROI of mailing to a particular customer. The model we propose can be applied
across the wide variety of rapidly emerging sources of consumer-level advertising response data to estimate
the effects of advertising.
1.1 Consumer-Level Advertising Response Data
We focus on a data set that was collected by a multi-channel speciality retailer and includes all the marketing
communications that were sent to customers along with records of those same customers' purchases
(in dollars) across all channels over a two-year period. Like many other retailers such as Target, Tesco,
Kroger and Macy's, the retailer we worked with maintains a database of known customers so that they
can send those customers marketing emails and direct mail. Customers at this speciality retailer nearly
always use a credit card to transact, so that as soon as the customer makes her first purchase, the company
has access to her name and address and can begin sending marketing materials. The retailer sends out
emails and catalogs[1] that are designed to drive purchases at the website or stores, typically by informing
customers about newly-available products. Although customers have to provide their address in order to
receive communications, all of the customers in our database have done this and so our estimates of adver-
tising response are conditional on having expressed this basic level of interest in the retailer. Arguably, the
effect of marketing conditional on being mailable is what the company wants to know, since they cannot
market to people who are not mailable anyway. The company has complete control over which addressable
customers are mailed and when, making this an ideal setting to execute randomized holdout groups. The
retailer does almost no mass media advertising, so the email and catalog exposures we observe represent
all the marketing communications that the customers see from the retailer.
When a customer makes a purchase, the company matches the name and address associated with the
credit card back to the customer's record in the database, resulting in data on advertising exposures and
purchases that is linked at the individual level. (If the customer pays at the store in cash, clerks are trained
to request an email address or phone number, which can also be used to match the purchase to a customer
record in the database.) This retailer finds that nearly 95% of customer transactions can be tracked back to
an existing customer. This system allows them to assemble something akin to single-source advertising
response data for all their customers at virtually no additional cost. Although tracking mechanisms may
vary, more and more retailers now have access to this type of CRM data either through name and address
matching, loyalty card programs, or other tracking strategies. Our model is particularly useful in this
direct-marketing context, because the company can track customers over months or years, making it more
feasible to estimate individual customers' response (and separate state dependence, heterogeneity and
serial correlation as we discuss below), and the company can re-target customers based on the model
findings.
[1] We treat catalogs as a form of advertising rather than a channel because the company uses catalogs as advertising that
directs customers to online or physical stores. There is no way to purchase through the catalog.
Our analysis focuses on a sample of 3500 North American customers who purchased at least once
over the two-year period from April 1, 2012 to March 30, 2014. Each of these customers had also made a
purchase in the year prior to the observation window and were thus mailable on both marketing channels,
meaning that they had both physical and email addresses on file and had not requested to be put on a
"do not mail" list.
Figure 1 shows each customer's total spending with the retailer over time. The plot has been arranged
so that customers are sorted in order of their total purchases over the two-year period. It is clear from the
plot that there is substantial variation (that we exploit for estimation) in purchases across these customers.
Figure 1: Weekly purchasing volume at the individual level shows substantial variation across customers.
Customers are sorted according to total volume of purchases.
Similarly, Figure 2 displays the email and catalog mailings for each customer over the two-year period
(where customers are sorted in the same order as Figure 1). As can be seen from the diagonal ("northwest"
to "southeast") ridges in Figure 2(a), the company runs regular catalog campaigns about once per month.
The pattern is similar for email exposures, except that email campaigns are more frequent. These regular
campaigns might make it difficult to estimate an aggregate data model, since there is only moderate temporal
variation in advertising. However, as can be seen in Figure 2, for any given campaign there are a
substantial number of customers who do not receive catalog or email (primarily because they were held out
from the campaign as we discuss below) and this variation can be exploited in estimating an advertising
response model.
(a) Catalog exposures incidence (b) Email exposures incidence
Figure 2: Weekly incidence of catalog and email exposures at the individual level shows substantial vari-
ation between customers. Customers are sorted according to total volume of purchases.
To summarize the variation in exposures across customers, Table 1 describes the distribution across
customers in the total number of emails and catalogs received over the two-year period. The first row in
the table shows that the average customer receives 125 emails, but that the amount varies widely across the
population with some customers receiving very few emails (2.5%-tile = 16) and other customers receiving
almost twice the average (97.5%-tile = 240). Catalog mailings are less frequent with the average customer
receiving 15 catalogs, but some receiving far fewer (2.5%-tile = 3) and some receiving a few more
(97.5%-tile = 22). Purchases and total spending similarly vary widely across customers with long right-hand tails.
                    mean    2.5%-tile    97.5%-tile
Email mailings       125           16           240
Catalog mailings      15            3            22
Purchases              6            0            31
Total spending      $176          $37         $1005
Table 1: Direct mail data shows substantial variation in the number of marketing communications received
and the number of purchases made over a two-year period.
There are two sources of the variation in email and catalog exposures across customers. First, for
each email and catalog campaign that the company planned, the company randomly selected 5-10% of
customers to be held out from the campaign.[2] As we will discuss, the randomized holdouts produce
exogenous variation, which we will exploit to estimate causal effects of advertising. The second reason a
customer might not be exposed to a particular campaign is that she wasn't targeted for the campaign. In
planning each campaign, the company selects a target group based on customers' prior purchase behavior.
As we will discuss and illustrate with simulations, targeting based on prior observed behavior is ignorable
(Little and Rubin, 1987, Florens, Mouchart and Rolin, 1990) for the purposes of estimating our proposed
model.
In summary, this data set is ideal for estimating advertising responsiveness across multiple advertising
channels. There is substantial variation in customers' exposure to advertising across both channels and this
variation is either random and exogenous or ignorable. Having laid out our objectives and described the
data, we next conclude the introduction by placing this work within the broader literature on advertising
response.
[2] The company varied the holdout rate from campaign to campaign during the two-year period, as they tried to balance the
opportunity cost of not mailing against the value of increasing the statistical power of the comparison between the holdout and
treated groups. The average holdout fraction we observe is 8%.
1.2 Related Literature
Given the long history of advertising response models in marketing, we do not attempt to provide a com-
prehensive overview of the literature. (See Tellis 2004 and Tellis and Ambler 2007 for comprehensive
reviews.) Instead, we focus on explaining how our work relates to the literature. To frame our contribution
to advertising response modeling, we position our work at the intersection between consumer-level mod-
eling and multi-channel advertising response (see Table 2), noting the sparsity of research that combines
these two features.
Aggregate advertising response models are one of the earliest tools proposed in marketing science
and very shortly thereafter, researchers extended the models to account for advertising across multiple
channels, establishing marketing response modeling and marketing mix optimization as key tools for advertisers
(see Bowman and Gatignon 2010). Research in this literature typically employs a variety of
time-series methods suitable for aggregate advertising spending and aggregate sales data and has explored
many questions about how advertising works such as whether there are long-term effects of advertising
(Ataman, van Heerde and Mela 2010) and whether there are advertising synergies (i.e., interactions)
between channels (Naik and Raman 2003, Danaher, Bonfrer and Dhar 2008, Naik and Peters 2009).
While these are just a few representative papers, there is a long literature on understanding multi-channel
advertising response with aggregate data.
Researchers as early as Little (1979) recognized that there are advantages to modeling advertising response
at the consumer level, but were stymied by the lack of data. Research with consumer-level data
first emerged when single-source panel data became available in the late 1980s and this work tended
to focus on just one channel: television (Pedrick and Zufryden 1991, Deighton, Henderson and Neslin
1994, Tellis 1998). Collecting single-source data for multiple channels was cost prohibitive. More re-
cently researchers have turned to using single-channel data on display advertising and online purchases
(Manchanda et al. 2006, Braun and Moe 2013), where advertising exposures and purchases are regularly
tracked at the cookie level. So, most of the work on advertising response with individual-level data has
focused on a single channel, largely due to data limitations.
There are only two recent papers that we are aware of that propose multi-channel advertising response
models with consumer-level data: Danaher and Dagger (2013) and Bollinger, Cohen and Jiang (2013).
Danaher and Dagger (2013) develop a clever measurement strategy for collecting multi-channel advertis-
ing exposure data by surveying consumers and asking them to recall which media channels they watched
or read and linking that back to the media plan. Because their application focused on advertising for a
clearance sale at a retailer that lasted only 26 days, they focused on same-period response to advertising
and did not address the issue of how to relate exposures to purchases in subsequent time periods as we
do here. More closely related to our work is that of Bollinger, Cohen and Jiang (2013), who also develop
a hierarchical ad-stock framework and estimate different decay parameters for each channel (but not for
interactions) by extending the competitive advertising model of Dubé, Hitsch and Manchanda (2005).
                 | Aggregate Expenditures & Sales   | Consumer Exposures & Purchases
Single Channel   | Little 1979                      | Tellis 1998
                 | Broadbent 1984*                  | Pedrick and Zufryden 1991
                 | Danaher, Bonfrer and Dhar 2008   | Deighton, Henderson and Neslin 1994
                 |                                  | Manchanda, Rossi and Chintagunta 2004
                 |                                  | Braun and Moe 2013
Multi-channel    | Tellis 2004                      | This Research
                 | Raman and Naik 2006              | Bollinger, Cohen and Jiang 2013
                 | Naik and Peters 2009             | Danaher and Dagger 2013
Table 2: A categorization of prior advertising response models along two key dimensions: number of
channels and level of aggregation. Our paper is among the first investigating individual-level exposures
and purchases in a multi-channel environment.
As presented in Table 2, our work fits within the tradition of parametric advertising response modeling
which has emphasized statistical methods and predictive implications. The other and more recent emphasis
among those who study advertising is on establishing or evaluating the causal impact of advertising using
model-free evidence such as hypothesis testing. Lodish et al. (1995), in their comprehensive meta-analysis
of 389 advertising studies, have argued that the ideal framework for measuring effectiveness is through
the use of a randomized controlled experiment. Unfortunately, the experiments reported in the literature
have often failed to find significant effects because the advertising effects are small and the sample sizes
used in reported experiments are too small to detect these effects. For instance, the recent works by
Lewis and Reiley (2013), Lewis, Reiley and Rao (2014), and Lewis and Rao (2013) consider internet
advertising experiments with samples reaching into the millions which are still too small to obtain
reasonable statistical power. The latter paper points out the gap in the advertising effectiveness literature
between large scale experiments and parametric modeling. Our parametric modeling approach attempts
to fill this gap by focusing on data that includes exogenous variation through randomized holdouts but
uses statistical modeling to reduce the noise in the response variables and improve the estimation of the
effectiveness parameters. As described next, this required some methodological extensions.
In particular, our work has a number of important methodological differences with extant modeling
approaches: (i) our paper includes both a Tobit I selection model (to handle sparsity) and ad-stock error
terms; and (ii) we develop decision support tools to target channels and customers based on their predicted
overall responsiveness to advertising akin to impulse response functions common in the aggregate VAR
modeling literature (Dekimpe and Hanssens, 2000). These methodological features are complementary
to Bollinger, Cohen and Jiang (2013) who focus on the competitive implications of multiple advertising
channels by modeling consumers' choice among competitors and pooling advertising response parameters
across brands within a category.
In addition to these aforementioned methodological extensions, our paper also has novelty in its empirics.
That is, while some large field experiments have established significant links between advertising
and demand, in aggregate (Bertrand et al., 2010), there are no published academic papers that describe
randomized advertising exposures in the multi-channel framework. However, not all of the variation in
advertising exposures in our data is completely randomized, so we use a careful modeling strategy to
obtain unbiased estimates of advertising response from the data we have. We will further discuss our
approach to endogeneity in Section 3, after we have formally laid out the model.
2 Model Development and Computation
The foundation of our model is the discrete-time exponentially decaying ad-stock model that was first
introduced by Koyck (1954) and Jorgenson (1966). The key construct in the Koyck model is that at time $t$
each individual $i = 1, \ldots, N$ accumulates a latent stock variable, $W_{ikt}$, for each channel as a result of current
and previous periods' exposures to advertising, $X_{ikt}$, on that channel $k$:
\begin{align}
W_{ikt} &= \sum_{j=0}^{t} \lambda_{ik}^{j} \left( X_{i,k,t-j} + \nu_{i,k,t-j} \right) \tag{1a} \\
        &= \lambda_{ik} W_{i,k,t-1} + X_{ikt} + \nu_{ikt} \tag{1b}
\end{align}
where $\lambda_{ik}$ is a consumer- and channel-specific decay parameter constrained to $[0, 1)$ and $\nu_{ikt}$ is a series of
demand shocks which are also consumer- and channel-specific.
The formula in (1a) presents an exponentially decaying stock variable which has a long tradition in
marketing (see the survey by Huang, Leng and Liang 2012). However, there are practical difficulties with
working with this representation since it extends over possibly many time periods. Fortunately, it can be
easily shown to be equivalent to the autoregressive process in (1b). We should also note that the inclusion
of the error term in equation (1b) is noteworthy, as there is some disagreement in the literature about
whether the shocks should be included. Dubé, Hitsch and Manchanda (2005) justify the error term on the
basis that the data (in their case aggregate exposures and in our case individual-level exposures) does not
fully capture all aspects of the advertising, such as the quality of the copy. In our case, there may also be
variations in how much the customer attended to the ad during a particular exposure or how persuasive the
copy is to a particular consumer. The error term is also important in the computation of the stocks since it
allows us to represent the ad-stock model as a state-space model as we describe below.
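The equivalence between the summation form (1a) and the autoregressive form (1b) can be checked numerically. The sketch below is illustrative only: the function names, the simulated exposure series and the decay value of 0.6 are assumptions, and the stock is assumed to start at zero before the first period.

```python
import random

def adstock_sum(x, shocks, decay):
    """Equation (1a): explicit geometrically weighted sum over all past periods."""
    return [sum(decay ** j * (x[t - j] + shocks[t - j]) for j in range(t + 1))
            for t in range(len(x))]

def adstock_recursive(x, shocks, decay):
    """Equation (1b): one-step autoregressive update, starting from a zero stock."""
    stocks, prev = [], 0.0
    for x_t, e_t in zip(x, shocks):
        prev = decay * prev + x_t + e_t
        stocks.append(prev)
    return stocks

rng = random.Random(1)
exposures = [rng.choice([0, 0, 1]) for _ in range(52)]   # hypothetical weekly exposures
shocks = [rng.gauss(0.0, 0.1) for _ in range(52)]        # hypothetical demand shocks

w_sum = adstock_sum(exposures, shocks, decay=0.6)
w_rec = adstock_recursive(exposures, shocks, decay=0.6)
max_gap = max(abs(a - b) for a, b in zip(w_sum, w_rec))
print(f"largest difference between (1a) and (1b): {max_gap:.2e}")
```

The recursive form needs only the previous period's stock, which is what makes the state-space treatment practical for long series.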
While in the original Koyck framework the shocks, $\nu_{ikt}$, are assumed i.i.d., it is important to generalize
this to allow for (potential) serial correlation in the errors. As we will discuss in Section 3, this improves
the model's handling of potential endogeneity caused by the use of lagged outcomes. However, this can
be computationally challenging; we describe our approach to allowing for serial correlation in the ad stock
below.
We extend the single latent stock formulation given in equation (1b) to a multi-channel environment
by introducing K stock variables, one for each channel, as in (1a):
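\begin{align}
W_{i1t} &= \sum_{j=0}^{t} \lambda_{i1}^{j} \left( X_{i,1,t-j} + \nu_{i,1,t-j} \right) \tag{2a} \\
W_{i2t} &= \sum_{j=0}^{t} \lambda_{i2}^{j} \left( X_{i,2,t-j} + \nu_{i,2,t-j} \right) \tag{2b} \\
        &\;\;\vdots \notag \\
W_{iKt} &= \sum_{j=0}^{t} \lambda_{iK}^{j} \left( X_{i,K,t-j} + \nu_{i,K,t-j} \right) \tag{2c}
\end{align}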
We also include additional stocks for the interactions among the different channels, a topic of significant
empirical interest (see Wilbur 2008, Naik, Raman and Winer 2005, Bollinger, Cohen and Jiang 2013).
For illustration, consider the interaction between exposures on channel 1 and channel 2. We can form an
additional stock variable with a separate decay and error term as follows:
\[
W_{i,1:2,t} = \sum_{j=0}^{t} \lambda_{i,1:2}^{j} \left( X_{i,1,t-j} X_{i,2,t-j} + \nu_{i,1:2,t-j} \right). \tag{3}
\]
The specification in equation (3) allows for the possibility that exposures on channels 1 and 2
in the same period give rise to a contemporaneous interaction effect given by $X_{i,1,t} X_{i,2,t}$ and
a carry-over effect at lag $j$ given by $\lambda_{i,1:2}^{j} X_{i,1,t-j} X_{i,2,t-j}$. These, together with the series of error terms
$\nu_{i,1:2,t-j}$, define the interaction stock $W_{i,1:2,t}$. (For a similar treatment of interactions, see Bass et al. 2010,
although at the aggregate level.)
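As a small worked illustration of equation (3), the interaction stock can be built from the product of same-period exposures on the two channels. The sketch below uses the recursive update, which matches the summation in (3) by the same argument that links (1a) and (1b); the function name, the six-week exposure series and the decay of 0.5 are hypothetical, and shocks are set to zero so the arithmetic is easy to follow.

```python
def interaction_stock(x1, x2, decay, shocks=None):
    """Stock driven by the product of same-period exposures on two channels (equation (3))."""
    shocks = shocks if shocks is not None else [0.0] * len(x1)
    stocks, prev = [], 0.0            # stock assumed to start at zero
    for a, b, e in zip(x1, x2, shocks):
        prev = decay * prev + a * b + e
        stocks.append(prev)
    return stocks

catalog = [1, 0, 0, 1, 0, 0]   # hypothetical weekly catalog exposures
email   = [1, 1, 0, 0, 1, 0]   # hypothetical weekly email exposures

w12 = interaction_stock(catalog, email, decay=0.5)
print(w12)
```

Only week 0 has exposures on both channels, so the interaction stock starts at 1 and halves each week; weeks with an exposure on just one channel add nothing to it.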
We relate the ad-stocks to the observed purchases, $Y_{it}$ (measured in dollars in our application), through
a latent variable, $Y^{*}_{it}$, for individual $i$ at time $t$. We assume that the latent process, $Y^{*}_{it}$, is related to the
ad-stock variables, including interactions, by sensitivity parameters $\beta_{ik}$ which measure the instantaneous
effect of the stock on latent $Y^{*}_{it}$.
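\[
Y^{*}_{it} = \alpha_i + \sum_{k} \beta_{ik} W_{ikt} + \sum_{k' > k} \beta_{i,k:k'} W_{i,k:k',t} + \varepsilon_{it},
\qquad \varepsilon_{it} \sim N\left(0, \sigma^{2}\right) \tag{4}
\]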
Note also the presence of the intercept, $\alpha_i$, in equation (4), which can account for baseline differences between
individuals in their purchasing behavior. This consumer-level intercept also helps to avoid biases caused by
the advertiser targeting customers who are more likely to buy, because the estimated effect of advertising
measured by $\beta_{ik}$ is above and beyond that expected given the individual's estimated baseline propensity.
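To make the short-term versus long-term distinction concrete: under equations (1a) and (4), a single unit of exposure adds the sensitivity times the decay raised to the power j to the latent index j periods later, so its cumulative contribution is the sensitivity divided by one minus the decay. The sketch below works this out for two hypothetical channels; the parameter values and function names are assumptions, and the calculation ignores interactions, state dependence and the Tobit censoring of observed spending.

```python
def lagged_effects(sensitivity, decay, horizon):
    """Contribution of one unit of exposure to the latent index at lags 0..horizon-1."""
    return [sensitivity * decay ** j for j in range(horizon)]

def cumulative_effect(sensitivity, decay):
    """Geometric-series total of the lagged contributions; requires 0 <= decay < 1."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must lie in [0, 1)")
    return sensitivity / (1.0 - decay)

# Two hypothetical channels with the same total effect but different timing.
fast_channel = {"sensitivity": 2.0, "decay": 0.2}   # large immediate effect, quick decay
slow_channel = {"sensitivity": 0.5, "decay": 0.8}   # small immediate effect, slow decay

for name, ch in (("fast", fast_channel), ("slow", slow_channel)):
    first_four = [round(v, 3) for v in lagged_effects(ch["sensitivity"], ch["decay"], 4)]
    total = cumulative_effect(ch["sensitivity"], ch["decay"])
    print(name, first_four, round(total, 3))
```

Two channels can therefore have a similar overall response while differing sharply in how that response is spread over time, which matters for how often each should be used.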
We then specify a Tobit I selection process, accounting, as mentioned earlier, for sparsity in the purchases,
by relating the latent values, $Y^{*}_{it}$, to the observed outcomes, $Y_{it}$:
\[
Y_{it} =
\begin{cases}
Y^{*}_{it} & \text{if } Y^{*}_{it} > 0 \\
0          & \text{if } Y^{*}_{it} \le 0
\end{cases} \tag{5}
\]
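The censoring rule in equation (5) is what produces sparse purchase histories: whenever the latent value is not positive, the observed spend is zero. The sketch below simulates this for a single customer; the intercept, the error standard deviation and the 104-week horizon are hypothetical values chosen for illustration.

```python
import random

def tobit_observe(latent):
    """Equation (5): report the latent value when it is positive, otherwise zero."""
    return latent if latent > 0 else 0.0

rng = random.Random(2)
baseline = -30.0   # hypothetical negative intercept, so most weeks show no purchase
noise_sd = 25.0    # hypothetical standard deviation of the latent error

latent = [baseline + rng.gauss(0.0, noise_sd) for _ in range(104)]
observed = [tobit_observe(v) for v in latent]

share_zero = sum(1 for y in observed if y == 0.0) / len(observed)
print(f"share of weeks with no purchase: {share_zero:.2f}")
```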
Furthermore, we incorporate a state dependence component to account for the possibility that pur-
chasing decisions made in the previous period impact the purchasing decision in the current period, in-
dependently of advertising exposures. We leverage the same ad-stock framework to accommodate state
dependence, by adding an additional "stock" of the form:
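\begin{align}
W_{iSt} &= \sum_{j=0}^{t} \lambda_{iS}^{j} \left( \mathbf{1}\left(Y_{i,t-j-1} > 0\right) + \nu_{i,S,t-j} \right) \tag{6a} \\
        &= \lambda_{iS} W_{i,S,t-1} + \mathbf{1}\left(Y_{i,t-1} > 0\right) + \nu_{iSt}. \tag{6b}
\end{align}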
where $\mathbf{1}\left(Y_{i,t-1} > 0\right)$ is an indicator for whether customer $i$ purchased in the prior period. The form we posit
here to allow for state dependence nests several extant models for state dependence; e.g., when $\lambda_{iS} = 0$ in
equation (6a) we get the classic single-period "bump" (i.e., first-order Markov process) for buying
a given product in the previous period. We note that many empirical studies have shown support for state
dependence (see Dubé, Hitsch and Rossi, 2010) and without controlling for state dependence the effect
of advertising can be severely over-estimated (Seetharaman 2004). However, if state dependence is not
supported in a particular data set, the estimate for the effect of $W_{iSt}$ on the response variable will go to
zero.
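The nesting property can be seen in a short example. The sketch below implements the recursive form (6b) with the shocks set to zero and compares a decay of 0, which reduces the stock to last period's purchase indicator, with a positive decay, under which a purchase keeps influencing later periods. The function name and the spending series are hypothetical, and the customer is assumed not to have purchased just before the first period.

```python
def state_dependence_stock(purchases, decay, shocks=None):
    """Equation (6b): W_S[t] = decay * W_S[t-1] + 1(Y[t-1] > 0) + shock[t]."""
    shocks = shocks if shocks is not None else [0.0] * len(purchases)
    stocks, prev_stock, prev_purchase = [], 0.0, 0.0   # nothing before period 0 (assumption)
    for y, shock in zip(purchases, shocks):
        prev_stock = decay * prev_stock + (1.0 if prev_purchase > 0 else 0.0) + shock
        stocks.append(prev_stock)
        prev_purchase = y
    return stocks

weekly_spend = [0, 40, 0, 0, 25, 0]   # hypothetical dollars spent per week

one_period = state_dependence_stock(weekly_spend, decay=0.0)
persistent = state_dependence_stock(weekly_spend, decay=0.5)
print(one_period)
print(persistent)
```

With zero decay the stock is nonzero only in the week immediately after a purchase; with a decay of 0.5 the week-1 purchase is still contributing when the week-4 purchase adds to it.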
Finally, as mentioned above, we allow for the possibility of (first-order) serial correlation in the
errors. We implement this by creating a "stock" for serial correlation:
\[
W_{iCt} = \rho_i W_{i,C,t-1} + \nu_{iCt} \tag{7}
\]
that is equivalent to allowing for a first-order autocorrelation in $\nu_{ikt}$. With the additional stocks that
capture state dependence and serial autocorrelation, the resulting full-model equation for $Y^{*}_{it}$ is:
Y

it
=
i
+

ik
W
ikt
+

>k

i,k:k
W
i,k:k

,t
+
iS
W
iSt
+ W
iCt
+
it

it
N
_
0,

_
(8)
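To make the data-generating process concrete, the following sketch simulates one customer from equations (5) through (8). It is a simplified illustration rather than the estimation code: interaction stocks are left out for brevity, all parameter values and the exposure schedule are hypothetical, and the function name `simulate_customer` is our own.

```python
import random


def simulate_customer(exposures, alpha, beta, lam, beta_s, lam_s, rho,
                      sigma=1.0, stock_sd=0.1, seed=0):
    """Simulate latent and observed purchases for one customer.

    exposures: list over periods, each a list of exposure counts per channel.
    Interaction stocks are omitted to keep the sketch short.
    """
    rng = random.Random(seed)
    ad_stock = [0.0] * len(beta)
    state_stock = 0.0
    serial_stock = 0.0
    bought_last_period = False
    latent, observed = [], []
    for x_t in exposures:
        # Ad stocks: geometric carry-over plus new exposures plus a shock.
        ad_stock = [lam[k] * ad_stock[k] + x_t[k] + rng.gauss(0.0, stock_sd)
                    for k in range(len(beta))]
        # State-dependence stock, equation (6b).
        state_stock = (lam_s * state_stock
                       + (1.0 if bought_last_period else 0.0)
                       + rng.gauss(0.0, stock_sd))
        # Serial-correlation stock, equation (7).
        serial_stock = rho * serial_stock + rng.gauss(0.0, stock_sd)
        # Latent purchase amount, equation (8) without interactions.
        y_star = (alpha
                  + sum(b * w for b, w in zip(beta, ad_stock))
                  + beta_s * state_stock + serial_stock
                  + rng.gauss(0.0, sigma))
        # Tobit I censoring, equation (5).
        y = max(y_star, 0.0)
        latent.append(y_star)
        observed.append(y)
        bought_last_period = y > 0.0
    return latent, observed


# Hypothetical schedule: 52 weeks, a catalog every 4th week, an email every 2nd week.
schedule = [[1 if t % 4 == 0 else 0, 1 if t % 2 == 0 else 0] for t in range(52)]
latent, observed = simulate_customer(
    schedule, alpha=-2.0, beta=[0.4, 0.9], lam=[0.25, 0.10],
    beta_s=0.2, lam_s=0.2, rho=-0.1)
share_zero = sum(1 for y in observed if y == 0.0) / len(observed)
```

With a strongly negative intercept most weeks are censored at zero, which is the sparsity that motivates the Tobit I structure.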
2.1 Empirical Identification
As is the tradition in the choice modeling literature, our consumer-level advertising response model includes both state dependence and serial correlation in addition to the dynamic response to advertising exposures. We view the state dependence and serial correlation largely as controls that allow us to give conservative estimates of advertising response and we do not place too much emphasis on the causal interpretation of these model components. However, because simultaneously including all of these elements in the model is known to present challenges, we consider the empirical identification of these latent constructs.
In Appendix D, we show a parameter recovery study that demonstrates that the parameters for advertising response, state dependence, and serial correlation can all be recovered with data that is similar to the data we use in our case study in both the number of respondents and the duration of the data. We also find that the state dependence can be distinguished from the heterogeneity in consumers' baseline propensity to buy when the time series is sufficiently long, consistent with what is found by Andrews, Ainslie and Currim (2008) in brand choice models. In addition to the parameter recovery reported in Appendix D, we also find substantial predictive differences in model fit between models that do and do not include state dependence and serial autocorrelation (as did Hyslop (1999) in a dynamic binary choice model of labor participation, and Seetharaman (2004) with brand choice models).
Intuitively, this identification comes from different observable data patterns, arising from different volumes and different purchase/no-purchase states, in the purchase behavior implied by heterogeneity versus state dependence versus serial correlation in the errors. For example, if a customer purchases from the retailer regularly over the entire observation period, that suggests that the customer has a high baseline propensity to buy, whereas if the customer is observed to purchase regularly for several months and then stops purchasing, the data is more consistent with that customer having high state dependence. Similarly, the effects of serial correlation in the error structure will be quite transient, while state dependence can be quite long-lasting. While some of the identification is based on the proposed parametric forms, there are clearly patterns in the purchase behavior that can identify advertising effects, state dependence, serial correlation, and heterogeneity (Hyslop, 1999).
2.2 State Space Formulation
Importantly, we note that there is a substantial computational advantage to defining the interactions, the state dependence and the serial correlation using the same form as the ad stocks, since the entire vector of stocks, consisting of the K multi-channel stocks, the $(K^2 - K)/2$ interaction stocks, the state-dependence stock and the serial-correlation stock, can be stacked as a vector
$$\mathbf{W}_{it} = \Big( \underbrace{W_{i1t}, \ldots, W_{iKt}}_{\text{Channels}},\ \underbrace{W_{i,1:2,t}, \ldots, W_{i,(K-1):K,t}}_{\text{Interactions}},\ \underbrace{W_{iSt}}_{\text{State Dependence}},\ \underbrace{W_{iCt}}_{\text{Serial Correlation}} \Big)'$$
As a result, we derive a more compact representation of the dynamics in the model by simply writing:
$$\mathbf{W}_{it} = \boldsymbol{\Lambda}_i \mathbf{W}_{i,t-1} + \mathbf{X}_{it} + \boldsymbol{\nu}_{it}, \qquad \boldsymbol{\nu}_{it} \sim N\left(\mathbf{0}, \boldsymbol{\Sigma}_{\nu}\right) \qquad (9)$$
where the bold fonts denote the vectorized components of the multi-channel ad-stock model over channels, interactions, state-dependence stocks and their exposures. As we show in Appendix B, this representation allows for an exact likelihood evaluation for each individual at each time period by virtue of an efficient forward filtering and backward smoothing algorithm. This substantially improves the speed of the estimation algorithm.
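A minimal sketch of one transition step of equation (9) is given below. It assumes a diagonal carry-over matrix (one coefficient per stock) and, purely for illustration, that the input to an interaction stock is the product of the two channels' exposures; that product form, the helper names, and the numerical values are assumptions of this sketch and are not taken from this section.

```python
from itertools import combinations


def stack_inputs(channel_x, bought_last_period):
    """Stack the inputs: K channel exposures, K(K-1)/2 interaction inputs,
    the state-dependence indicator, and a zero input for the serial stock."""
    # Assumption: an interaction input is the product of the two exposures.
    interactions = [channel_x[a] * channel_x[b]
                    for a, b in combinations(range(len(channel_x)), 2)]
    return list(channel_x) + interactions + [1.0 if bought_last_period else 0.0, 0.0]


def step_stocks(stocks, carryover, inputs, shocks):
    """One step of equation (9) with a diagonal carry-over matrix."""
    return [l * w + x + v for l, w, x, v in zip(carryover, stocks, inputs, shocks)]


K = 3
inputs = stack_inputs([1, 0, 2], bought_last_period=True)
n_stocks = K * (K + 1) // 2 + 2  # channels + interactions + state + serial
stocks = step_stocks([0.0] * n_stocks, [0.5] * n_stocks, inputs, [0.0] * n_stocks)
stocks_next = step_stocks(stocks, [0.5] * n_stocks, [0.0] * n_stocks, [0.0] * n_stocks)
```

Because every stock follows the same linear recursion, the whole state can be advanced with a single vector operation, which is what makes a filtering-based likelihood evaluation convenient.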
To summarize, we propose a heterogeneous model of consumer purchasing behavior where the advertising response follows a Koyck structure. The model allows for separate stock variables for each channel and an interaction between them. The set of parameters determining the dynamics of the stock variables can be easily interpreted as the simultaneous effectiveness and exponential decay rate of each advertising channel (or interaction or state dependence). To account for sparsity in the consumer-level purchase data we have specified a Tobit I process for purchasing as a function of advertising stocks. We also incorporate important controls for dynamic features of the response, including state dependence and serial correlation. Both are well-founded in the theory of choice models and, as we will show in Section 4, allow us to better identify the effectiveness of the different advertising channels.
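2.3 Priors and Computation
We employ a Bayesian approach to accommodate individual heterogeneity in both the instantaneous and carry-over effects (Rossi and Allenby 2003). We use conjugate Normal-Inverse Wishart distributions for all individual-level parameters. We define $\boldsymbol{\beta}_i$ as the stacked vector of $\alpha_i$, $\beta_{ik}$, $\beta_{i,k:k'}$, and $\beta_{iS}$. Similarly, we define $\boldsymbol{\lambda}_i$ as the stacked vector of $\lambda_{ik}$, $\lambda_{i,k:k'}$, $\lambda_{iS}$ and $\rho_i$. We then define the distribution of $\boldsymbol{\beta}_i$ and $\boldsymbol{\lambda}_i$ across the population of consumers as follows:
$$\boldsymbol{\beta}_i \sim N_{\frac{K(K+1)}{2}+2}\left(\bar{\boldsymbol{\beta}}, \boldsymbol{\Sigma}_{\beta}\right)$$
$$\boldsymbol{\lambda}_i \sim N_{\frac{K(K+1)}{2}+2}\left(\bar{\boldsymbol{\lambda}}, \boldsymbol{\Sigma}_{\lambda}\right)$$
where $N_d$ represents a multivariate normal distribution of dimension d. To ensure stationarity in the latent ad-stock processes, we additionally constrain each $0 \le \lambda_{ik} < 1$ and $-1 < \rho_i < 1$ via truncation and rejection sampling. We put weakly informative priors on the population-level parameters and the error terms:
$$\bar{\boldsymbol{\beta}} \sim N_{\frac{K(K+1)}{2}+2}\left(\mathbf{0}, 5\mathbf{I}\right), \qquad \boldsymbol{\Sigma}_{\beta} \sim IW\left(\mathbf{I}, \frac{K(K+1)}{2} + 4\right)$$
$$\bar{\boldsymbol{\lambda}} \sim N_{\frac{K(K+1)}{2}+2}\left(\mathbf{0}, \mathbf{I}\right), \qquad \boldsymbol{\Sigma}_{\lambda} \sim IW\left(\mathbf{I}, \frac{K(K+1)}{2} + 4\right)$$
$$\sigma^2_{k} \sim IG(1, 1), \qquad \boldsymbol{\Sigma}_{\nu} \sim IW\left(\mathbf{I}, \frac{K(K+1)}{2} + 4\right)$$
where IW is an inverse-Wishart distribution, IG is an inverse-Gamma distribution, and $\mathbf{I}$ is an identity matrix of appropriate dimension. Given the state-space representation for the latent parameters, we employ Kalman Dynamic Linear Model relationships (West and Harrison 1997) with the modifications provided in Harvey (1993, Section 3.7.2) and Liao, Anderson and Vahid (2014) to account for the Tobit I structure. See also Naik and Raman (2003) and Ataman, van Heerde and Mela (2010). Appendix D provides additional details of the estimation algorithm.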
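The truncation step can be sketched with a simple rejection sampler. This is a generic illustration of drawing from a normal distribution restricted to an interval, not the sampler described in the appendix; the means, standard deviations and the function name are hypothetical.

```python
import random


def draw_truncated_normal(mean, sd, lower, upper, rng, max_tries=100_000):
    """Rejection sampler: redraw from N(mean, sd^2) until lower <= draw < upper."""
    for _ in range(max_tries):
        draw = rng.gauss(mean, sd)
        if lower <= draw < upper:
            return draw
    raise RuntimeError("no draw accepted; the interval has very low probability")


rng = random.Random(42)
# Hypothetical population mean and spread for one carry-over coefficient in [0, 1).
lam_draws = [draw_truncated_normal(0.2, 0.3, 0.0, 1.0, rng) for _ in range(1000)]
# Serial-correlation coefficient restricted to (-1, 1); hitting -1 exactly has
# probability zero for a continuous draw, so the closed lower bound is harmless.
rho_draws = [draw_truncated_normal(-0.1, 0.5, -1.0, 1.0, rng) for _ in range(1000)]
```

Rejection sampling is efficient here only when the unconstrained distribution places substantial mass inside the interval; when most of the mass lies outside it, an inverse-CDF draw would be the usual alternative.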
2.4 Cumulative Impulse Response (CIR)
The key parameters of the model, $\beta_{ik}$ and $\lambda_{ik}$, define the response of individual i to advertising on channel k. However, because of the ad-stock formulation and the Tobit mechanism, these parameters do not directly relate to the economic value of delivering an additional advertisement to consumer i on channel k at time t. Instead, the cumulative impulse response, defined as the expected cumulative incremental effect on future purchases for a one-impression impulse in $X_{ikt}$, is a more economically meaningful measure of the expected return from an additional exposure to consumer i on channel k. In this section, we derive the cumulative impulse response in closed form, demonstrating that it is easy to compute from the estimated consumer-level parameters, and therefore can be used to categorize individuals into those who are expected to be more or less responsive to advertising. This provides a basis for evaluating counterfactual advertising strategies that we illustrate in Section 4.
First, consider the change in the expected value of $Y_{it}$ due to an increase in advertising exposures from channel k at time t. This instantaneous marginal effect can be written as
$$\frac{\partial E\left(Y_{it}\right)}{\partial X_{ikt}} = \underbrace{P\left(Y^*_{it} > 0\right) \frac{\partial E\left[Y_{it} \mid Y^*_{it} > 0\right]}{\partial X_{ikt}}}_{I} + \underbrace{E\left[Y_{it} \mid Y^*_{it} > 0\right] \frac{\partial P\left(Y^*_{it} > 0\right)}{\partial X_{ikt}}}_{II} \qquad (10)$$
which is referred to as the McDonald and Moffitt (1981) decomposition. This allows one to see that a change in exposures on channel k affects the conditional mean of $Y^*_{it}$ in the positive part of the distribution (I) and it affects the probability that the expected purchases will be non-zero (II).
By means of simple transformations dealing with the truncated Normal distribution due to the Tobit I structure (see Greene 2008), it follows that the marginal effect on the expected value of $Y_{it}$ is given by:
$$\frac{\partial E\left(Y_{it}\right)}{\partial X_{ikt}} = \Phi\left(\frac{\alpha_i + V_{it}}{\sigma}\right) \beta_{ik} \qquad (11)$$
$$V_{it} = \sum_{k} \beta_{ik} W_{ikt} + \sum_{k' > k} \beta_{i,k:k'} W_{i,k:k',t} + \beta_{iS} W_{i,S,t} + W_{i,C,t} \qquad (12)$$
where $\Phi$ denotes the standard normal CDF. As can be seen in equations (11) and (12), the contemporaneous effect on expected purchases due to advertising depends both on $\beta_{ik}$ and on the expected activation of individual i at time t captured by the $\Phi(\cdot)$ function, which is time-varying because each individual's ad-stock varies with his or her prior advertising exposures and shocks. Thus, given two customers with identical $\beta_{ik}$ and $\lambda_{ik}$ parameters, it is more beneficial to advertise to the one who has had more prior exposures and therefore a higher ad-stock, as the activation propensity will be greater.
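As a small numerical illustration of the marginal effect in equation (11), the sketch below evaluates it for two customers who share the same response coefficient but differ in their current stock index. The parameter values and function names are hypothetical.

```python
from statistics import NormalDist


def purchase_activation(alpha, v_it, sigma):
    """Standard normal CDF evaluated at (alpha + V_it) / sigma."""
    return NormalDist().cdf((alpha + v_it) / sigma)


def marginal_effect(alpha, v_it, sigma, beta_k):
    """Contemporaneous marginal effect of one more exposure, equation (11)."""
    return purchase_activation(alpha, v_it, sigma) * beta_k


# Two hypothetical customers with identical coefficients but different stocks.
low_stock = marginal_effect(alpha=-2.0, v_it=0.5, sigma=1.0, beta_k=0.9)
high_stock = marginal_effect(alpha=-2.0, v_it=2.0, sigma=1.0, beta_k=0.9)
```

The customer with the larger stock index has the larger marginal effect, which is the comparison made in the text for customers with identical coefficients.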
While equations (11) and (12) give the contemporaneous response to advertising, marketers are typically concerned with the total cumulative response (CIR) that they can expect from an additional exposure to channel k at time t, i.e., the area under the impulse response curve. In the limit, by taking $T \to \infty$ and by the properties of the sum of a geometric series, the closed form for the expected CIR is given by:
$$CIR_{ikt} = \sum_{j=0}^{\infty} \frac{\partial E\left(Y_{i,t+j}\right)}{\partial X_{ikt}} = \Phi\left(\frac{\alpha_i + V_{it}}{\sigma}\right) \left[ \frac{\beta_{ik}}{1 - \lambda_{ik}} + \Phi\left(\frac{\alpha_i + V_{it}}{\sigma}\right) \frac{\beta_{ik}\, \beta_{iS}}{1 - \lambda_{iS}} \right] \qquad (13)$$
While the effect of an impression on any individual can be forward simulated from the model, equation (13) provides a computationally convenient way to identify customers who would be most affected by advertising at time t. We will illustrate the use of the CIR for the generation of counterfactual policies at the aggregated level.
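Equation (13) is simple enough to evaluate directly, which is what makes it convenient for scoring customers. The sketch below computes the closed-form CIR and ranks two hypothetical customers by it; all parameter values, the customer labels and the function name are illustrative assumptions.

```python
from statistics import NormalDist


def cumulative_impulse_response(alpha, v_it, sigma, beta_k, lam_k, beta_s, lam_s):
    """Closed-form CIR of one extra exposure on channel k, following equation (13)."""
    activation = NormalDist().cdf((alpha + v_it) / sigma)
    direct = beta_k / (1.0 - lam_k)  # geometric sum of the ad-stock effect
    via_state_dependence = activation * beta_k * beta_s / (1.0 - lam_s)
    return activation * (direct + via_state_dependence)


# Hypothetical customer-level posterior means.
customers = {
    "A": dict(alpha=-2.0, v_it=1.0, sigma=1.0, beta_k=0.9, lam_k=0.10,
              beta_s=0.2, lam_s=0.2),
    "B": dict(alpha=-2.0, v_it=1.0, sigma=1.0, beta_k=0.4, lam_k=0.25,
              beta_s=0.2, lam_s=0.2),
}
ranking = sorted(customers,
                 key=lambda c: cumulative_impulse_response(**customers[c]),
                 reverse=True)
```

Sorting customers by this quantity gives one possible responsiveness score; in practice the score would be averaged over posterior draws rather than evaluated at point estimates.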
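It is easy to extend the reasoning to derive a closed-form expression for a shock on both channels k and k' in the same period:
$$CIR_{i,k:k',t} = \Phi\left(\frac{\alpha_i + V_{it}}{\sigma}\right) \left[ \frac{\beta_{i,k}}{1 - \lambda_{i,k}} + \frac{\beta_{i,k'}}{1 - \lambda_{i,k'}} + \frac{\beta_{i,k:k'}}{1 - \lambda_{i,k:k'}} + \Phi\left(\frac{\alpha_i + V_{it}}{\sigma}\right) \frac{\beta_{i,S} \left( \beta_{ik} + \beta_{i,k'} + \beta_{i,k:k'} \right)}{1 - \lambda_{i,S}} \right] \qquad (14)$$
which allows us to gauge the additional contribution of the interaction terms.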
3 Endogeneity in Advertising Response Models
Endogeneity in advertising studies is a well-known problem. In this section, we focus on a series of potential concerns related to any multi-channel ad-stock model (not just ours), noting that an exhaustive treatment of the problem in the context of advertising effectiveness is beyond the scope of the paper. General considerations, for instance, are presented in Chintagunta, Dubé and Goh (2005), Shugan (2004) and, from a Bayesian perspective, in Liu, Otter and Allenby (2006); yet a brief discussion of endogeneity in our setting is important to clarify how these concerns apply to our model. We address potential endogeneity concerns for multi-channel advertising, which we categorize into two types: endogenous targeting and simultaneity. When describing these concerns we elaborate how randomized holdouts, in the form used in our empirical analysis, as well as our model-based controls can potentially mitigate them.
3.1 Response-based Targeting and Endogeneity
A concerning and common potential cause of endogeneity in individual-level advertising response models is that the exposures $X_{it}$ are correlated with $\alpha_i$ or $\beta_i$. This could happen in a direct-marketing setting like ours if the marketer knows something about (a) each customer's potential to buy and/or (b) their responsiveness to advertising, and uses that information to target customers accordingly (see Manchanda, Rossi and Chintagunta (2004), who describe this in the context of pharmaceutical detailing).
In our data set, the company adopted the common strategy of targeting customers based on RFM statistics; that is, they select people to target for a specific campaign based on observed past purchases, i.e., targeting of individuals is based on $Y_{i,t}$ for t prior to the current period and not based on $\alpha_i$ or $\beta_i$.³ As there is common confusion in the marketing literature, we next show that the researcher can admissibly ignore targeting based on prior $Y_{it}$ without biasing the recovery of the posterior distribution. (See also Liu, Otter and Allenby, 2006, who describe this in the context of conjoint analysis.) We first review the theoretical argument; at the end of this section we present a simplified simulation that illustrates this point.
Assume that the data generating mechanism follows the individual-level advertising response model presented in Section 2, with an additional equation determining how the exposures are set for each individual by the firm:
$$X_{it} \sim G\left(Y_{i,t-1}, X_{i,t-1}, \alpha_i, \beta_i, \lambda_i, \omega_{it}\right) \qquad (15)$$
where the targeting mechanism, G, is allowed to depend on previous observed purchasing history, $Y_{i,t-1}$, exposures, $X_{i,t-1}$, and some additional targeting parameters, $\omega_{it}$, which are assumed observed or set by the firm. The targeting parameters may not be observed by the researcher, but this, as we show, is irrelevant when the purpose of the analysis is to make inference about the advertising response parameters. Denoting with $\theta_i = \{\alpha_i, \beta_i, \lambda_i, \sigma_i^2\}'$ the block of the individual-level response parameters, the individual-level conditional posterior distribution for this extended model is given by:
$$f\left(\theta_i \mid \ldots\right) \propto \underbrace{\pi\left(\theta_i \mid \Omega\right)}_{\text{Prior}} \ \underbrace{\prod_{t=1}^{T} f\left(Y^*_{it} \mid Y^*_{i,t-1}, Y_{i,t-1}, X_{it}, \theta_i\right)}_{\text{Ad Response Likelihood}} \ \underbrace{\prod_{t=1}^{T} f\left(X_{it} \mid Y_{i,t-1}, \theta_i, \omega_i\right)}_{\text{Targeting Likelihood}} \qquad (16a)$$
$$\phantom{f\left(\theta_i \mid \ldots\right)} = \pi\left(\theta_i \mid \Omega\right) \prod_{t=1}^{T} f\left(Y^*_{it}, X_{it} \mid Y^*_{i,t-1}, Y_{i,t-1}, X_{i,t-1}, \theta_i, \omega_i\right) \qquad (16b)$$
where $\Omega$ are the hyper/population-level parameters. Ignoring the contribution of the targeting likelihood defined by equation (15) leads to inference based on an approximate posterior (Mouchart and Scheihing, 2004). We now introduce a simple theoretical argument as to when it is permissible to treat the exposures $X_{it}$ as if they were exogenously determined without biasing the posterior.
³ We had numerous discussions with the retailer about this point. The retailer has almost no information about the customers other than the purchase data and therefore it would be nearly impossible for them to target based on anything else.
Consider the following definition (see Florens, Mouchart and Rolin, 1990, Ch. 3):
Definition 1. [Sequential Bayesian Cut] $X_{it}$ and $\theta_i$ form a sequential Bayesian cut if (a) $Y_{it} \perp \omega_i \mid X_{it}, \theta_i$, (b) $X_{it} \perp \theta_i \mid \omega_i$ and (c) $\theta_i \perp \omega_i$ a priori.
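When a sequential Bayesian cut holds,
$$\prod_{t=1}^{T} f\left(X_{it} \mid Y_{i,t-1}, \theta_i, \omega_i\right) = \prod_{t=1}^{T} f\left(X_{it} \mid Y_{i,t-1}, X_{i,t-1}, \omega_i\right)$$
that is, the targeting mechanism does not (directly⁴) depend on $\theta_i$, and
$$\pi\left(\theta_i \mid X_{it}, \Omega\right) = \pi\left(\theta_i \mid \Omega\right),$$
that is, the prior for $\theta_i$ does not depend on the previous and current exposures. Consequently, a form of posterior conditional independence applies:
$$f\left(\theta_i \mid \ldots\right) \propto \pi\left(\theta_i \mid \Omega\right) \prod_{t=1}^{T} f\left(Y^*_{it} \mid Y^*_{i,t-1}, X_{it}, \theta_i, \sigma^2_{\varepsilon_{it}}\right) \times \prod_{t=1}^{T} G\left(Y_{i,t-1}, X_{i,t-1}, \omega_i\right).$$
In other words, the posterior distribution of $\theta_i$ is conditionally independent of the targeting mechanism G, which is therefore admissibly ignorable.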
The definition above has important implications: targeting rules that are based on prior observables, and where the prior structure of the response parameters, $\theta_i$, is independent of the targeting parameters, $\omega_i$, allow for a (sequential) Bayesian cut. This means that the targeting equation can be ignored for the purposes of inference.
Importantly, notice that G can be virtually any (measurable) function of previous purchases $Y_{i,t-1}$ and exposures $X_{i,t-1}$. The previous advertising exposures and purchases are observed at time t and therefore do not prevent the sequential cut between $\theta_i$ and $X_{it}$. In the dataset considered in this work, as mentioned earlier, the targeting rules are consistent with the Bayesian cut.
Violations of the conditions presented in Definition 1 lead to scenarios where the targeting mechanism is not ignorable; this could happen, for example, if targeting is based on future expectations, or if the targeting is a direct function of the unobserved advertising effectiveness parameters, as is often the case in online advertising. This could happen if there is a correlation between $X_{it}$ and $\alpha_i$ or $\beta_i$, either because the advertiser is targeting customers with high $\alpha_i$ and $\beta_i$ or because the customer has some control over their propensity to be exposed and the likelihood of exposure is correlated with the propensity to buy. The latter, as mentioned earlier, is sometimes referred to as activity bias in display advertising, where users (cookies) who are more active on the internet are also more likely to be exposed to an advertisement and more likely to make an online purchase simply because they are more active users (see Lewis and Rao, 2012). It can also occur in traditional media, where customers who tend to watch the content on a particular channel (e.g., Animal Planet) are both more likely to be exposed to a television ad for related products (e.g., pet food) and more likely to purchase those products. While neither of these potential problems is at play in our direct marketing data set, a possible solution is the one adopted by Manchanda, Rossi and Chintagunta (2004), where the targeting mechanism G is empirically included in the model and the inference on $\theta_i$ is based on the joint likelihood function of the response model and the targeting mechanism as in formula (16a-c).⁵
⁴ The targeting mechanism depends indirectly on $\theta_i$ through $Y_{i,t-1}$ and $X_{i,t-1}$. However, this is not problematic in the Bayesian approach as the likelihood function at time t is built sequentially by conditioning on the information set available at time t − 1.
3.2 Simultaneity in Advertising Response Models
A standard and common form of endogeneity in advertising response models, which we will refer to as simultaneity, occurs when $X_{ikt}$ is correlated with $\varepsilon_{ikt}$. The canonical example of this is the positive bias in advertising response produced by the combination of the advertiser advertising more in periods of peak demand and the modeler failing to account for these observed-to-the-advertiser shocks in the model.⁶ In theory, it is also possible that a correlation occurs because the advertiser is targeting particular channels or even individual customers based on some knowledge that $\varepsilon_{ikt}$ will be particularly high for some i or some k in some period t. A general solution to this problem is to include randomized holdouts; the cross-sectional exogenous variation produced by the holdouts moderates the correlation between $\varepsilon_{ikt}$ and $X_{ikt}$. As we will show in the following simulated data example, this helps to identify the $\beta_{ik}$ parameters independently of the carry-over and the state-dependence or serial correlation components.
⁵ Note that in their empirical application Manchanda, Rossi and Chintagunta (2004) do not find that making inference based on both the targeting mechanism and the advertising response function substantially changes the posterior means, suggesting that the bias caused by ignoring targeting based on advertising responsiveness is not large in their data set. However, they do find that the posterior is substantially narrower.
⁶ Note that in the data set we use we do not see any seasonal variation in advertising, suggesting the retailer is not heavying up advertising during periods of peak demand. For this reason we believe that there is less chance of simultaneity in our data.
3.3 Illustration With a Simplified Example
For the purpose of illustrating the theoretical arguments presented in the previous subsections, and without loss of generality, consider the following simplified model where we abstract away from (a) the Tobit structure (which accommodates sparsity in the purchases) and (b) the multi-channel framework. Consider the simplified ad-stock model:
$$Y^*_{it} = \alpha_i + \lambda_i Y^*_{i,t-1} + \beta_i X_{it} + \varepsilon_{it} \qquad (17a)$$
$$\varepsilon_{it} = \rho_i\, \varepsilon_{i,t-1} + \eta_{it} \qquad (17b)$$
$$X_{it} = G\left(Y^*_{i,t-1}, \omega_i\right) \qquad (17c)$$
where $\lambda_i$ proxies for a general form of state dependence in the latent space and $\rho_i$ drives serial correlation. In model (17) we have both sources of endogeneity discussed above. There is obviously response-based targeting following $G(\cdot)$, but more subtly there is simultaneity that is induced by the presence of serial correlation combined with the lag dependence on $Y^*_{it}$, generating correlation between the error term $\varepsilon_{it}$ and the covariates. This is not immediate to see, but it is easy to show by noting that:
$$E\left[\varepsilon_{it} Y^*_{it}\right] = X_{it}\, \beta_i\, E\left(\varepsilon_{it}\right) + \lambda_i E\left[\varepsilon_{it} Y^*_{i,t-1}\right] + \mathrm{var}\left(\varepsilon_{it}\right)$$
$$E\left[Y^*_{i,t-1}\, \varepsilon_{it}\right] = \frac{\rho_i}{1 - \lambda_i \rho_i}\, \mathrm{var}\left(\varepsilon_{it}\right)$$
$$E\left[\varepsilon_{it} \mid Y^*_{i,t-1}\right] \neq 0$$
unless $\rho_i = 0$, i.e., there is no serial autocorrelation.
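For readers who want the intermediate step behind the second expression, a short derivation follows. It is a sketch under assumptions that we state explicitly: the processes are stationary, the error has mean zero, the exposures are treated as fixed (as in the first line of the display above), and the innovation in the error process is independent of everything dated t − 1 or earlier. The symbols used here ($\lambda_i$ for the coefficient on the lagged latent outcome, $\rho_i$ for the autocorrelation, $\eta_{it}$ for the innovation) are notation assumed for this sketch.

```latex
% c denotes the covariance between the lagged latent outcome and the current error.
\begin{aligned}
c &\equiv E\left[Y^*_{i,t-1}\,\varepsilon_{it}\right]
   = E\left[Y^*_{i,t-1}\left(\rho_i\,\varepsilon_{i,t-1} + \eta_{it}\right)\right]
   = \rho_i\,E\left[Y^*_{i,t-1}\,\varepsilon_{i,t-1}\right], \\
E\left[Y^*_{i,t-1}\,\varepsilon_{i,t-1}\right]
  &= \lambda_i\,E\left[Y^*_{i,t-2}\,\varepsilon_{i,t-1}\right] + \mathrm{var}(\varepsilon_{it})
   = \lambda_i\,c + \mathrm{var}(\varepsilon_{it}), \\
c &= \rho_i\left(\lambda_i\,c + \mathrm{var}(\varepsilon_{it})\right)
  \;\Longrightarrow\;
  c = \frac{\rho_i}{1-\lambda_i\rho_i}\,\mathrm{var}(\varepsilon_{it}).
\end{aligned}
```

The covariance vanishes only when the autocorrelation is zero, which is why omitting serial correlation makes the lagged outcome, and any exposure rule built on it, correlated with the error.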
We generate data according to this model consistent with values of the parameters that allow for identification of $\lambda$ and $\rho$ (see Maddala, 1987); details on the simulations are provided in Appendix C. The specific targeting mechanism we used to generate the data mimics the one described by the retailer who provided data to us, in which customers are selected to receive a particular campaign based on their ranking by the previous 12 months' sales. After generating this data, which includes targeting, we estimate the model using the likelihood that does not include the targeting mechanism, i.e., the model defined by equations (17a) and (17b). As the upper-left section of Table 3 shows, when we estimate our proposed model we accurately recover the population-level parameters even when this endogenous targeting is present, due to the ignorability of the targeting mechanism. The lower-left sections of the table show parameter recovery when we add randomized holdouts, which is largely the same, indicating that randomized holdouts have no role to play in preventing bias due to targeting based on prior purchases, since there is no bias without randomized holdouts. Thus, the results in the left-hand columns of Table 3 illustrate the Bayesian cut argument in Definition 1.
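The data-generating side of such a study can be sketched as follows. This is not the simulation code from the appendix: it uses homogeneous parameters set to the true values listed in Table 3, an assumed targeting share, and an assumed ranking rule on trailing sales, all of which are simplifications chosen for illustration.

```python
import random


def simulate_targeted_panel(n_customers=200, n_periods=40, alpha=-2.0, lam=0.33,
                            beta=1.0, rho=0.5, target_share=0.3,
                            holdout_share=0.1, window=12, seed=1):
    """Generate outcomes y and exposures x from model (17) with targeting on past sales."""
    rng = random.Random(seed)
    n_targeted = int(target_share * n_customers)
    y = [[0.0] * n_periods for _ in range(n_customers)]
    x = [[0] * n_periods for _ in range(n_customers)]
    eps = [0.0] * n_customers
    for t in range(n_periods):
        # Targeting rule (17c): rank customers on sales over the trailing window.
        trailing = [sum(y[i][max(0, t - window):t]) for i in range(n_customers)]
        ranked = sorted(range(n_customers), key=lambda i: trailing[i], reverse=True)
        for i in ranked[:n_targeted]:
            # Randomized holdout: a targeted customer is withheld with fixed probability.
            x[i][t] = 0 if rng.random() < holdout_share else 1
        for i in range(n_customers):
            eps[i] = rho * eps[i] + rng.gauss(0.0, 1.0)               # (17b)
            y_prev = y[i][t - 1] if t > 0 else 0.0
            y[i][t] = alpha + lam * y_prev + beta * x[i][t] + eps[i]  # (17a)
    return y, x


y, x = simulate_targeted_panel()
exposed_per_period = [sum(row[t] for row in x) for t in range(40)]
```

Setting `holdout_share` to zero reproduces pure response-based targeting, while positive values add the cross-sectional exogenous variation discussed in Section 3.2.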
In the right side of Table 3, we show the results obtained when we estimate a model that assumes that serial correlation is zero. When we do this, we are performing a non-admissible reduction by ignoring a part of the likelihood in the response equation. Hence, the argument in Definition 1 does not hold and we generate a loss of information in the likelihood, leading to biased posterior means.
Without Randomized Holdouts

| Parameter | True | Proposed Model: Post. Mean | 2.5%-tile | 97.5%-tile | Ignoring Serial Correlation: Post. Mean | 2.5%-tile | 97.5%-tile |
|---|---|---|---|---|---|---|---|
| $\alpha$ | -2 | -2.031 | -2.202 | -1.830 | -1.278 | -1.440 | -1.139 |
| $\lambda$ | 0.33 | 0.329 | 0.257 | 0.400 | 0.602 | 0.532 | 0.671 |
| $\bar{\beta}$ | 1 | 0.9810 | 0.559 | 1.345 | 0.843 | 0.523 | 1.153 |
| $\bar{\rho}$ | 0.5 | 0.5106 | 0.433 | 0.583 | NA | NA | NA |

With 10% Randomized Holdouts

| Parameter | True | Proposed Model: Post. Mean | 2.5%-tile | 97.5%-tile | Ignoring Serial Correlation: Post. Mean | 2.5%-tile | 97.5%-tile |
|---|---|---|---|---|---|---|---|
| $\alpha$ | -2 | -2.055 | -2.225 | -1.885 | -1.331 | -1.471 | -1.182 |
| $\lambda$ | 0.33 | 0.334 | 0.256 | 0.414 | 0.587 | 0.520 | 0.658 |
| $\bar{\beta}$ | 1 | 1.067 | 0.725 | 1.428 | 1.034 | 0.695 | 1.364 |
| $\bar{\rho}$ | 0.5 | 0.502 | 0.413 | 0.577 | NA | NA | NA |

With 50% Randomized Holdouts

| Parameter | True | Proposed Model: Post. Mean | 2.5%-tile | 97.5%-tile | Ignoring Serial Correlation: Post. Mean | 2.5%-tile | 97.5%-tile |
|---|---|---|---|---|---|---|---|
| $\alpha$ | -2 | -2.008 | -2.166 | -1.846 | -1.416 | -1.541 | -1.290 |
| $\lambda$ | 0.33 | 0.330 | 0.251 | 0.399 | 0.571 | 0.494 | 0.653 |
| $\bar{\beta}$ | 1 | 1.042 | 0.714 | 1.351 | 1.004 | 0.683 | 1.332 |
| $\bar{\rho}$ | 0.5 | 0.503 | 0.418 | 0.580 | NA | NA | NA |

Table 3: Synthetic data parameter recovery study illustrating potential causes of endogeneity in advertising response models
This is borne out in the simulation study; we get a clear bias in the estimated parameters due to simultaneity induced by ignoring the serial correlation (Table 3, upper-right section); the model underestimates instantaneous advertising response and over-estimates the decay parameter, suggesting that the model confuses the serial correlation for a delayed effect of advertising. Importantly, and reflecting the choice of our direct mailing data set here where advertising is "controllable" by the firm, the lower panels in Table 3 show that if randomized holdouts are included in the data generating mechanism, even in small percentages, the additional exogenous variation makes it possible to recover the correct $\bar{\beta}$, but the bias in $\lambda$ remains. Thus the randomized holdouts allow us to obtain an unbiased estimate of the same-period effects of advertising, but the bias in the carry-over remains. This suggests that simultaneity biases can be effectively mitigated, but not completely resolved, by randomized holdouts. Thus, it is critical that serial correlation is also included in the model specification to avoid potential biases in $\lambda$, while randomized holdouts provide a safeguard against bias in $\bar{\beta}$.
In summary, we believe that our dual-pronged framework for addressing potential endogeneity concerns, which combines a Bayesian modeling approach with state dependence and serial correlation with the use of randomized holdouts, provides a safeguard against endogeneity produced both by targeting and by simultaneity.
4 Empirical Analysis: Direct Marketing Case Study
4.1 Nested Model Comparison
To understand the importance of the various features of the model presented in Section 2 and provide evidence that the full specification we propose is necessary, we fit a series of nested models to a random sample of 300 customers from our data set. In addition to estimating the model proposed in Section 2 (model 4), we estimated a model with a common stock for the advertising channels, interactions, and state dependence (model 1),⁷ our proposed model without serial correlation (model 2), and our proposed model without state dependence (model 3). To estimate each model we ran a Metropolis-within-Gibbs sampler for 100,000 draws with a burn-in of 20,000 draws. All chains converged according to the Raftery and Lewis (1996) test implemented in the CODA package.
As suggested in West and Harrison (1997, p. 393-394) we evaluated the model fit based on a combination of in-sample and out-of-sample posterior predictive checks.⁸ To assess in-sample fit, we use the root mean squared error between the predicted and actual purchases for each individual in each week as well as the average log-likelihood for each observation. To assess out-of-sample fit, we use the root mean squared error of the one-step-ahead forecast, which conditions on the amount of advertising exposure. These are defined as
$$RMSE = \frac{1}{N} \sum_{i=1}^{N} \sqrt{\frac{\sum_{t=1}^{T} \left(\hat{Y}_{it} - Y_{it}\right)^2}{T}}$$
$$RMFE = \frac{1}{N} \sum_{i=1}^{N} \sqrt{\left(\hat{Y}^{F}_{i,T+1} - Y_{i,T+1}\right)^2}$$
where the predicted values, $\hat{Y}_{i,t}$ for the in-sample data and $\hat{Y}^{F}_{i,T+1}$ for the out-of-sample data, are obtained for each individual from the posterior predictive distribution of the individual-level parameters, consistent with our Bayesian framework. We also compute the harmonic estimator of the integrated likelihood (Gelman et al. 2013) and the log-likelihood for each model as a way to complement the aggregated individual-level in-sample fit with an aggregated measure. Table 4 reports the in- and out-of-sample model fits across all models.

| Metric \ Features | Model 1: Common Decay | Model 2: Proposed Model w/o Serial Correlation | Model 3: Proposed Model w/o State Dependence | Model 4: Proposed Model |
|---|---|---|---|---|
| LL | -17655 | -16593 | -16929 | -15606 |
| HAR | 2868 | 3354 | 4164 | 6045 |
| RMSE | 1.45 | 1.33 | 1.28 | 1.10 |
| RMFE | 1.82 | 1.65 | 1.47 | 1.34 |

Table 4: Model comparison based on in- and out-of-sample goodness-of-fit measures across models with different components. LL represents the total log-likelihood, HAR the harmonic estimator of integrated likelihood, RMSE the averaged root mean squared error, RMFE the averaged root mean forecast error. The best fitting model according to each measure is highlighted in boldface.
⁷ This is derived, by virtue of the Koyck (1952) transformation, by setting all the relevant carry-over parameters to be equal.
⁸ Alternative approaches to model selection are based on entropy measures such as the Deviance Information Criterion (DIC) of Gelfand and Ghosh (1998). However, when dealing with hierarchical models with several latent variables, as noted by Shirley et al. (2010), model selection based on indicators such as the DIC can lead to contrasting results. See also the discussion in Duan, McAlister and Sinha (2011) in the context of choice models with cross-brand pass-through effects. For this reason we rely on classic in- and out-of-sample procedures in the spirit of West and Harrison (1997).
Comparing model 1 to the other models, we find there is substantial support for allowing for multiple decay rates: one for each channel and the interaction. Models 2-4, which all include multiple stocks, dominate model 1, with a single stock, and the full model 4 reduces the in-sample RMSE by about 24% and the out-of-sample RMFE by about 26% over model 1. Second, comparing models 2 and 3 to model 4, we find substantial improvements in in- and out-of-sample fit when we include both state dependence and serial correlation in the model.
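The two fit measures are straightforward to compute from predicted and actual purchases. The sketch below implements them as averages of per-customer errors and adds a small helper for the relative reduction between two models; the toy inputs and function names are illustrative, and the fit values passed to the helper are the RMSE and RMFE entries reported in Table 4 for models 1 and 4.

```python
from math import sqrt


def rmse(predicted, actual):
    """Average over customers of the per-customer root mean squared error."""
    per_customer = [
        sqrt(sum((p - a) ** 2 for p, a in zip(p_i, a_i)) / len(a_i))
        for p_i, a_i in zip(predicted, actual)
    ]
    return sum(per_customer) / len(per_customer)


def rmfe(forecast, actual_next):
    """Average over customers of the one-step-ahead absolute forecast error."""
    per_customer = [sqrt((f - a) ** 2) for f, a in zip(forecast, actual_next)]
    return sum(per_customer) / len(per_customer)


def relative_reduction(baseline, candidate):
    """Share by which the candidate model lowers the baseline error."""
    return (baseline - candidate) / baseline


# Toy example: two customers, two in-sample weeks each.
toy_rmse = rmse([[1.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
toy_rmfe = rmfe([2.0, 0.0], [1.0, 1.0])
# Reductions of model 4 relative to model 1, using the Table 4 entries.
rmse_gain = relative_reduction(1.45, 1.10)
rmfe_gain = relative_reduction(1.82, 1.34)
```

Averaging per-customer errors, rather than pooling all observations, weights each customer equally regardless of how volatile their purchases are.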
To provide an exploratory check that there was no targeting based on $\alpha_i$ or $\beta_i$ in this data set (which might induce a bias), we compared the individual-level estimated posterior means ($\hat{\alpha}_i$ and $\hat{\beta}_i$) of the response parameters with the average level of exposures for a given individual ($\bar{X}_i$) observed in the data. We failed to detect any significant relationship (via regression, p-value > 0.10, or inspection of the bivariate scatterplots) that could potentially indicate targeting based on $\beta_i$ (as in Manchanda, Rossi and Chintagunta, 2004) or a correlation between $\alpha_i$ and $X_{it}$. With this added assurance (above the retailer's assertion that they only target based on past purchases), we now turn to interpreting the estimated model parameters.
Table 5 shows the estimated parameters for the four models. Focusing first on the full proposed model, the estimated parameters related to the instantaneous effectiveness of advertising, $\bar{\beta}_k$, suggest that email communications are instantaneously more effective than catalogs (Bayesian p-value < 0.01). There is also weak evidence of potential interaction effects between the two channels. The different levels of instantaneous effectiveness in combination with a significant interaction effect may be explained (albeit somewhat speculatively, and as an open area for field experimentation) by the email advertising's ability to stimulate and initiate information search which is then completed using information presented in the catalog (see also Dinner, Van Heerde and Neslin 2014).
The carry-over coefficients, $\lambda_k$, also present an interesting pattern. In all the models considered, catalog has a significantly higher carry-over than email (Bayesian p-value < 0.01). This suggests that while email is more effective at increasing same-week sales, a catalog exposure has a more long-lasting effect. These parameters indicate that 90% of the email advertising effects dissipate on average in one week, while catalogs are more long-lasting, with their effects dissipating in about three to four weeks. Interestingly, on average the company mails catalogs approximately once per month, which suggests the brand's marketers have an intuitive sense of this carry-over.
Turning to the effects of state dependence and serial correlation in Model 4, we find that the state-dependence effects are positive and comparable with, but somewhat smaller than, the advertising effects, which is consistent with other studies such as Seetharaman (2004). Across the models that include serial correlation, we also find that, on average, there is a weak negative serial correlation. Although we do not report it in the table, we also find that there is substantial heterogeneity in the serial correlation coefficients. Across individuals, we find that the posterior means for the $\rho_i$ range from -0.70 to 0.60, indicating that unobserved shocks generate different persistence patterns at the individual level.
4.1.1 The Role of State-Dependence and Serial Correlation in the Aggregate
Figure 3 graphically illustrates the role of state dependence in the model dynamics by comparing the
predicted lift from an impulse of one catalog exposure combined with one email exposure in the same
week for the full model (model 4) versus model 2, which does not include state-dependence. As Figure 3
shows, the shape of the response function is substantially different between the two models. The model
with no state dependence predicts a response that peaks in the week of the impulse (week 0) and is
monotonically decreasing in subsequent weeks. The model with state dependence predicts a peak response in
the first week after the impulse due to the large state-dependence effects, which begin in week 1.
Furthermore, as can be seen visually in Figure 5, which shows the cumulative impulse response, the overall
predicted response due to the combination of the advertising and the state dependence is also substantially
higher. Thus, the model without state dependence may under-predict the overall (long-term) response to
an impulse in advertising. The better fit statistics in Table 4 for models 3 and 4 reflect the fact that the
response with state dependence may better represent the shape of the response observed in the data (albeit
this is conditional on the specific parametric assumption for the exponentially weighted decay response
function). Further testing of different functional forms for advertising response, including response
functions that allow for the advertising response to peak even later than the subsequent week, would be a
fruitful area for future research.

Parameter           Model 1               Model 2               Model 3               Model 4
                    Common Decay          Proposed Model w/o    Proposed Model w/o    Complete Model
                                          Serial Correlation    State Dependence
------------------------------------------------------------------------------------------------------
                    1.71 (3.45, 1.03)     2.83 (4.32, 1.31)     2.85 (4.35, 1.16)     2.74 (4.37, 1.04)
$\beta_{CAT}$       0.64 (0.03, 1.27)     0.55 (0.25, 0.85)     0.35 (0.05, 0.65)     0.40 (0.05, 0.75)
$\beta_{EM}$        0.87 (0.19, 1.55)     1.30 (0.68, 2.05)     0.82 (0.34, 1.30)     0.92 (0.34, 1.50)
$\beta_{k:k'}$      0.33 (0.10, 0.55)     0.26 (0.20, 0.45)     0.15 (0.02, 0.31)     0.21 (0.02, 0.41)
$\beta_{S}$         0.35 (0.07, 0.57)     0.28 (0.25, 0.53)     NA                    0.22 (0.02, 0.46)
$\rho_{CAT}$        0.09 (0.01, 0.19)     0.25 (0.10, 0.40)     0.29 (0.02, 0.56)     0.24 (0.01, 0.52)
$\rho_{EM}$         0.09 (0.01, 0.19)     0.10 (0.01, 0.18)     0.12 (0.01, 0.26)     0.10 (0.01, 0.21)
$\rho_{k:k'}$       0.09 (0.01, 0.19)     0.15 (0.01, 0.31)     0.17 (0.01, 0.33)     0.18 (0.01, 0.35)
$\rho_{S}$          0.09 (0.01, 0.19)     0.20 (0.02, 0.42)     NA                    0.17 (0.02, 0.36)
Serial corr. (C)    -0.11 (-0.73, 0.47)   NA                    -0.18 (-0.88, 0.43)   -0.12 (-0.80, 0.51)

Table 5: Population-level parameter estimates for our proposed model along with three nested models.
4.1.2 Comparing the Dynamic Response to Different Advertising Channels
[Figure 3: plot titled "Aggregate Impulse Response Curve with and without State Dependence"; x-axis: Weeks from Impulse; y-axis: Incremental Spending ($); series: With State Dependence, Without State Dependence]
Figure 3: Comparison of predicted impulse response functions for models with and without state dependence.
The plot depicts the predicted lift in sales for one additional email and catalog exposure for 300
customers.

To illustrate the model's dynamic predictions about how the response to advertising plays out over time,
we plot predicted impulse response curves in Figure 4. The plot shows the impulse response for sending
one email to all 300 customers versus one catalog to the same 300. Note that the full predicted impulse
response includes both the direct contribution of the advertising and the indirect contribution of state
dependence. The large estimated state dependence effect for our brand induces a peak response in week 1
for both channels. As was suggested by the estimated population parameters, emails are instantaneously
more effective, as can be seen by the predicted increase in sales in week zero, which is much higher for
the email impulse. However, the effect of catalogs dominates in weeks 1 and beyond and is larger overall.
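
The crossover between the two channels can be illustrated with a stripped-down calculation. The sketch below computes only the direct ad-stock contribution of a single exposure in week 0, effectiveness times carry-over to the power t, using the Model 4 catalog and email estimates in Table 5 as inputs (0.40 and 0.92 for effectiveness, 0.24 and 0.10 for carry-over). It works on the latent scale and leaves out state dependence, the channel interaction, censoring and customer heterogeneity, so it is an assumption-laden illustration and will not reproduce the dollar values in the figures; the function name is hypothetical.

```python
import numpy as np

def direct_impulse_response(beta, rho, weeks=20):
    """Direct (ad-stock only) response to one exposure in week 0:
    beta * rho**t for t = 0, ..., weeks - 1."""
    t = np.arange(weeks)
    return beta * rho ** t

# Population-level point estimates read from the Model 4 column of Table 5.
email = direct_impulse_response(beta=0.92, rho=0.10)
catalog = direct_impulse_response(beta=0.40, rho=0.24)

print("week 0 (email, catalog):", email[0], catalog[0])
print("week 1 (email, catalog):", email[1], catalog[1])
```

Even in this simplified form, email leads in the week of the exposure and catalog leads from week 1 onward, which is the qualitative pattern described above.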
[Figure 4: plot titled "Aggregated Impulse Response Curve (CIR) for either one additional exposure on Catalog or Email"; x-axis: Weeks from Impulse; y-axis: Incremental Spending ($); series: +1 EMAIL, +1 CATALOG]
Figure 4: Comparison between aggregated cumulative impulse response curve for a one-impression shock
to all customers on either email or catalog.

As noted above, we find an interaction between catalog and email, and Figure 5 shows the predicted
impulse response curve for a simultaneous exposure to catalog and email for all 300 customers. The chart
breaks down the overall response into that directly attributable to the catalog and email, and that driven
by the interaction and the state dependence. In week zero, the same-week response is about $200, with a
modest contribution from the catalog, a larger contribution from email and a fairly large contribution due
to the interaction between email and catalog. However, in week 1, the large effect of state dependence
kicks in and we see that state dependence accounts for about 60% of the total response. That is, the model
suggests that a major effect of advertising comes through the advertising inducing an initial purchase
and that purchase then induces the customer to be more likely to buy in subsequent periods due to state
dependence.
The impulse response functions we present here are illustrative of how advertisers can use the model to
decompose their observed advertising response and attribute credit to each channel for the observed lift
in sales. This is one of the major benefits to advertisers of using the model to analyze their past advertising
and predict which channels exceed their costs and bring the largest return. We next turn to the other major
potential use for the model: targeting advertising to individual customers.
[Figure 5: plot titled "Differential Contributions of the Different AdStocks to the Aggregate Impulse Response Curve"; x-axis: Weeks from Impulse; y-axis: Incremental Spending ($); series: CATALOG, EMAIL, INTERACTION, STATE DEPENDENCE, SERIAL CORRELATION]
Figure 5: Comparison of the differential effect of each component in the model on the aggregated impulse
response curve for an increase in exposures on both email and catalog for the full model. Note: the legend
on the Y-axis has been omitted per request of the corporate sponsor providing the data.
4.1.3 Targeting Individual Customers
While the population-level parameters give us a sense of the overall effectiveness of each channel, the
model also gives us information about the responsiveness of each customer through the $\beta_{ik}$ and
$\rho_{ik}$ parameters. Figure 6 plots posterior means of these parameters for the individuals in our dataset.
Panel (A) suggests that there is a positive correlation between the instantaneous effectiveness parameters of
catalogs and email. In Panels (B) and (C) we find a positive a posteriori relationship between $\beta_{ik}$
and $\rho_{ik}$ for both catalog and email. That is, individuals with higher initial response to advertising
seem to have systematically more carryover for both channels. However, there is not a significant
correlation between $\rho_{ik}$ for email and the $\rho_{ik}$ for catalog (Panel D). Most importantly, we find
that there is a great deal of heterogeneity between customers in their advertising response, particularly for
catalog, which opens up the opportunity to target or classify catalog advertising (which is relatively
expensive) to the most responsive customers.
[Figure 6: four panels of individual-level posterior means.
Panel A: Comparison Between Contemporaneous Effects for Different Channels (axes: Beta for Catalog, Beta for Email).
Panel B: Comparison Between Contemporaneous and Carryover Effects for Catalogs (axes: Beta for Catalog, Rho for Catalog).
Panel C: Comparison Between Contemporaneous and Carryover Effects for Email (axes: Beta for Email, Rho for Email).
Panel D: Comparison Between Carryover Effects for Different Channels (axes: Rho for Catalog, Rho for Email).]
Figure 6: Comparison of posterior mean of the effectiveness and carry-over parameters across individuals
for different channels.
In Table 6, we illustrate the economic advantages of targeting the most responsive customers. (We
report this for two different points in time, since the predicted CIR for a customer will vary over time
depending on the customer's exposure to advertising and associated ad stock.) In the first row, we report
the average model-predicted CIR across all customers; this represents the per-customer lift in sales that is
obtained by exposing every individual to one additional email and catalog in the same week. The values in
parentheses represent the variation across the population in the estimated CIR. As shown in the first row
of Table 6, the expected response from sending a catalog and an email simultaneously is about $0.07, but
ranges from about $0.01 to about $0.34 per customer across customers. In the second row of Table 6 we
show the results of targeting the top 10% of customers based on their spending in the prior year, a targeting
strategy that was used by the company (and is common in practice). This approach substantially raises the
average CIR for the targeted customers to about $0.25 per customer. However, it is even more effective
to target customers who have shown responsiveness to advertising as measured by their model-estimated
parameters and associated individual CIR. In the last row of Table 6, we show the average per-customer
response that can be achieved when targeting the most responsive 10% of customers. Table 6 shows that
the response increased by almost 50% to about $0.32-0.37. Thus, there are clear economic advantages to
using the proposed model to score customers for their responsiveness based on the CIR and then using
that score to target advertising. (We should note that there is an associated disadvantage. Targeting in this
way would raise a potential endogeneity bias were the model to be estimated from data where customers
had been targeted in this way. This bias could potentially be mitigated by using randomized holdouts or
randomly selecting a segment of the non-targeted population to expose to create exogenous variation.)
Targeting strategy                                   at week 50              at week 101
-----------------------------------------------------------------------------------------------
No Targeting (all customers)                         0.080 (0.017, 0.344)    0.077 (0.011, 0.321)
Targeting based on prior year's spending (top 10%)   0.256 (0.188, 0.501)    0.288 (0.210, 0.489)
Targeting based on predicted CIR (top 10%)           0.321 (0.216, 0.448)    0.369 (0.299, 0.510)

Table 6: Per-customer cumulative impulse response for various customer targeting strategies. For each
entry we report the average marginal response for the targeted group and the 2.5th and 97.5th quantiles
across the targeted group.
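
The comparison between targeting rules can be sketched in a few lines once each customer has a predicted CIR. The example below is a minimal illustration on synthetic data, not the paper's data or estimation output: the CIR values, the prior-year spending values and the helper name are all made up for the purpose of showing the mechanics of ranking customers and averaging the predicted CIR within the targeted top 10%.

```python
import numpy as np

def mean_cir_of_top_share(score, cir, top_share=0.10):
    """Average predicted CIR among the customers ranked highest on `score`."""
    score = np.asarray(score, dtype=float)
    cir = np.asarray(cir, dtype=float)
    n_target = max(1, int(round(top_share * len(score))))
    top = np.argsort(score)[::-1][:n_target]  # indices of the highest scores
    return cir[top].mean()

rng = np.random.default_rng(0)
n_customers = 300
# Synthetic per-customer predicted CIR in dollars (hypothetical values).
cir = rng.gamma(shape=1.0, scale=0.08, size=n_customers)
# Synthetic prior-year spending, only loosely related to responsiveness.
prior_spend = 50.0 * cir + rng.gamma(shape=2.0, scale=5.0, size=n_customers)

no_targeting = cir.mean()
by_spend = mean_cir_of_top_share(prior_spend, cir)
by_cir = mean_cir_of_top_share(cir, cir)
print(no_targeting, by_spend, by_cir)
```

By construction, ranking on the predicted CIR itself can never do worse on average predicted CIR than any other rule that selects the same number of customers; how much better it does in practice depends on how well prior spending tracks responsiveness.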
5 Discussion and Model Extensions
In the previous sections, we have presented a model which can be applied broadly to different types of
data to gauge consumers' response to different advertising channels and target individuals based on their
responsiveness to advertising in a particular channel. In developing the model, we have focused on the
features that are common across many data sets and, in particular, the direct marketing data set described
in the previous section. In this section, we discuss several model extensions that can be used to tailor the
approach to other contexts. These approaches include (a) incorporating saturation effects, (b) modeling
competitive advertising effects and spillovers between brands, and (c) incorporating alternative forms of
state dependence. Our aim is to provide advice to those looking to implement our model, and so we
discuss when these extensions might be most important in practice and propose several implementation
choices that can mitigate the need for these extensions.
5.1 Additional Modeling Features
Saturation Effects. In our proposed model, consumers' response to advertising is a linear function of the
number of advertising exposures. In aggregate models, researchers have found evidence of saturation and
wear-in/out effects, which are usually introduced using some non-linear function of the actual exposures
(Bass et al. 2007, Dubé, Hitsch and Manchanda 2005). It is certainly possible to extend this idea to the
consumer level, estimating saturation functions for each individual. For example, we could allow for a
diminishing response to $X_{ikt}$ with just one additional parameter as follows:

$$W_{ikt} = \rho_{ik} W_{ik,t-1} + X_{ikt}^{\gamma_{ik}} + \eta_{ikt} \qquad (18)$$

where $0 \le \gamma_{ik} \le 1$. While conceptually simple, this would be computationally burdensome, requiring
non-linear filtering techniques beyond the standard DLM filters employed in our estimation algorithm. Going
well beyond this simple formulation, which can only account for within-period saturation, one might
also consider employing a wear-out/restoration framework, like that of Braun and Moe (2013). However,
we note that if data sets are very sparse in $X_{ikt}$, as is common, this may result in extremely weak
identification of saturation or wear-out effects.
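
To make the within-period saturation idea in equation (18) concrete, the sketch below iterates the deterministic part of such a recursion: last week's stock is carried over at a fixed rate, and this week's exposures enter after being raised to a power between 0 and 1. The stochastic disturbance is omitted, the stock is assumed to start at zero, and the names `rho` (carry-over rate) and `gamma` (saturation exponent) are hypothetical labels chosen for this illustration.

```python
def saturating_adstock(exposures, rho, gamma):
    """Deterministic ad-stock path with diminishing returns to exposures:
    stock_t = rho * stock_{t-1} + exposures_t ** gamma, starting from zero.
    Weeks with zero exposures add nothing to the stock."""
    if not (0.0 <= rho < 1.0 and 0.0 <= gamma <= 1.0):
        raise ValueError("expected 0 <= rho < 1 and 0 <= gamma <= 1")
    stock, path = 0.0, []
    for x in exposures:
        contribution = float(x) ** gamma if x > 0 else 0.0
        stock = rho * stock + contribution
        path.append(stock)
    return path

# Three exposures in week 0, none afterwards.
linear = saturating_adstock([3, 0, 0], rho=0.5, gamma=1.0)
concave = saturating_adstock([3, 0, 0], rho=0.5, gamma=0.5)
print(linear)
print(concave)
```

With an exponent of 1 the recursion reduces to the linear case, while a smaller exponent compresses the contribution of multiple exposures in the same week. Because the exposures then enter non-linearly, the convenient linear filtering structure is lost, which is the computational cost noted above.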
Competitive Advertising. As discussed above, we have not considered competitive advertising effects
because advertisers often do not have access to this data. (In our data set, there is no way the retailer could
gain access to data on which competitors' emails or catalogs their customers have received.) However,
if data on competitive advertising is available, as is the case with single-source panel data, it could be
readily incorporated into our model by adding a cross-effect term to the ad-stock equation to control for
the competitors' advertising:

$$W_{ikt} = \rho_{ik} W_{ik,t-1} + X_{ikt} + \delta_{k} \tilde{X}_{ikt} + \eta_{ikt} \qquad (19)$$

where $\tilde{X}_{ikt}$ refers to the advertising exposures on the same channel for the competitive brand and
$\delta_{k}$ represents the advertising spillover from another brand. A more comprehensive, yet computationally
burdensome approach would be to simultaneously estimate stocks for each brand and model consumers'
choice among brands (cf. Bollinger, Cohen and Jiang 2013). These extensions addressing competitive
effects are left for future research, again not because they are conceptually difficult, but rather because
they are specific to certain data structures which are not considered in this work.
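
The cross-effect term in equation (19) leaves the recursion linear, so the deterministic part is straightforward to iterate. The following is a minimal sketch under the assumption that the stock starts at zero and the disturbance is dropped; the argument names and the negative spillover value in the example are hypothetical, and the sign of the spillover is an empirical question that the sketch does not settle.

```python
def adstock_with_competitor(own, competitor, rho, spillover):
    """Deterministic ad-stock path with a same-channel competitor term:
    stock_t = rho * stock_{t-1} + own_t + spillover * competitor_t."""
    if len(own) != len(competitor):
        raise ValueError("own and competitor series must have equal length")
    stock, path = 0.0, []
    for x_own, x_comp in zip(own, competitor):
        stock = rho * stock + x_own + spillover * x_comp
        path.append(stock)
    return path

# One own exposure in week 0 and one competitor exposure in week 1.
path = adstock_with_competitor([1, 0, 0], [0, 1, 0], rho=0.5, spillover=-0.5)
print(path)
```

In this toy case the competitor's exposure exactly offsets the stock carried over from the focal brand's earlier exposure.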
Alternative Forms of State-Dependence. In this work, we have adopted a form of state dependence that
allows for the prior period's incidence of purchase ($\mathbf{1}\{Y_{i,t-1} > 0\}$) to affect future purchases
with an exponential decay rate that is estimated. Our approach nests the common form of state dependence
where only prior period purchase incidence can affect next period sales. One could extend the framework to
allow for other forms of state dependence such as habit formation where there is a carry-over effect of
$Y^{*}_{it}$ or a brand loyalty variable as in Guadagni and Little (1983). Since our focus is on measuring ad
response, we leave the comparison of these different forms of state dependence to future research and, from
a practical point of view, we note that different forms of state-dependence may not be observationally
distinguishable in an individual-level model with sparse data. See Table 7 for a classification of different
effects that can be recast in the stochastic ad-stock framework presented in this work.
$X_{S,t}$                                    Interpretation                   Reference
-----------------------------------------------------------------------------------------------------------------
$\mathbf{1}\{Y_{t-1} > 0\}$                  Structural State-Dependence      Heckman (1981a,b) and Jeuland (1979)
$Y^{*}_{t-1}$                                Habit Formation                  Heckman (1981a,b) and Seetharaman (2004)
$\mathbf{1}\{Y_{t-1} > 0\}\, Y^{*}_{t-1}$    Hybrid Habit-State Dependence
$(1 - \rho)\, \mathbf{1}\{Y_{t-1} > 0\}$     Brand Loyalty                    Guadagni and Little (1983)
$-\mathbf{1}\{Y_{t-1} > 0\}\, Y^{*}_{t-1}$   Stockpiling                      Seetharaman (2004)
$-\mathbf{1}\{Y_{t-1} > 0\}$                 Negative State-Dependence        McAlister et al. (1991)

Table 7: Sources of possible state-dependence in the multi-channel advertising framework. Notation:
$\mathbf{1}\{\cdot\}$ is an indicator function over the set; $Y^{*}_{t-1}$ defines the amount purchased by the
individual in the previous period.
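
The form of state dependence adopted in this work, a decaying stock driven by the prior period's purchase incidence, can be sketched as a simple recursion. The code below is an illustration only: it tracks the deterministic part of the stock for one customer, assumes the stock starts at zero, and uses the hypothetical name `rho_s` for the state-dependence decay rate.

```python
def state_dependence_stock(purchases, rho_s):
    """State-dependence stock for one customer:
    stock_t = rho_s * stock_{t-1} + 1 if the customer bought in week t-1,
    and stock_t = rho_s * stock_{t-1} otherwise (stock starts at zero).
    purchases[t] is the amount spent in week t (0 when nothing was bought)."""
    stock, path, bought_last_week = 0.0, [], False
    for amount in purchases:
        stock = rho_s * stock + (1.0 if bought_last_week else 0.0)
        path.append(stock)
        bought_last_week = amount > 0
    return path

# A single purchase in week 0, followed by three weeks without purchases.
decaying = state_dependence_stock([25.0, 0.0, 0.0, 0.0], rho_s=0.5)
one_period = state_dependence_stock([25.0, 0.0, 0.0, 0.0], rho_s=0.0)
print(decaying)
print(one_period)
```

Setting the decay rate to zero recovers the common special case in which only the immediately preceding period's purchase incidence matters.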
5.2 Application to Other Targeted Advertising Platforms
While we have applied our proposed method in a direct marketing context, the model that we propose
could be applied to data from other advertising platforms with some modifications and adjustments. The
key enablers of our approach are (1) the ability to record advertising exposures at the individual level,
(2) the potential to link those exposures to purchases for the same customers and (3) a way to execute
randomized holdouts (generate exogenous advertising variation), which typically requires some way for
the firm to "address" or target advertising to particular customers who are randomly selected to be held
out. In this section, we briefly describe two popular advertising platforms that have these features - digital
advertising and addressable television advertising - and describe the modifications to our method that
would be required for application to these other platforms.
5.2.1 Digital Advertising
Digital advertisers have pioneered the practice of tracking advertising exposures at the individual level.
Driven largely by the structure of online environments, which deliver content directly to individual
devices, online advertisers found it very natural to track each advertising exposure to a specific user. Today,
nearly all the major web-based advertising platforms, e.g., Google DoubleClick, track each device that
they serve advertisements to with a cookie, record advertising exposures at the cookie level and
provide this data to the advertiser, e.g., through the DoubleClick Dynamic Advertising Reporting & Tracking
(DART) system (footnote 9). For businesses where purchases are also made online, the data on advertising exposures
can easily be linked via the cookie to purchases that are tracked as part of the clickstream data collected
on the advertiser's online store. So, for advertisers that advertise exclusively online, and where all
transactions occur online, it is relatively easy to assemble linked consumer-level advertising and response data.
Manchanda et al. (2006) and Braun and Moe (2013) were among the early researchers to model advertising
response using online data. While there are some challenges, largely due to users choosing to delete
cookies, measurement of exposures and conversions is well established in the online space.

Footnote 9: In fact, online advertising is often priced according to the number of consumer impressions
delivered, so individual exposures must be tracked.

On mobile advertising platforms, the tracking technology is different, but exposures and purchases
can easily be tracked to the mobile device id (which cannot be deleted like a cookie). The major
challenge in measurement today is tracking customers across the many devices that they now use, and
several major internet companies are working on ways to do this, primarily by getting the users to
identify themselves across their devices by logging in. Some companies are even using name-matching
technologies to track offline purchases for customers exposed to digital advertising. So, while there are
many complexities in tracking digital advertising exposures as new digital platforms emerge and linking
that to data on customers' online and offline purchases, there are many opportunities to do this in practice,
making digital advertising an ideal platform for using models to measure advertising response.
As we have discussed, a critical component of our approach is using randomized holdouts to generate
the exogenous variation required to obtain unbiased estimates of advertising response. Hypothetically,
it should be very easy for online advertisers to execute randomized holdouts, since advertisements are
served at the cookie level; however, the practice is not common (footnote 10) due to several challenges in
executing randomized holdouts.
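
The reason randomization matters can be seen in the simplest possible analysis of a holdout: when assignment is random, the difference in mean spending between exposed and held-out customers is an unbiased estimate of the average lift. The sketch below shows that calculation with a standard error; it is a generic difference-in-means estimator, not the model proposed in this paper, and the spending figures are made up.

```python
import math

def holdout_lift(treated_spend, holdout_spend):
    """Difference in mean spending (treated minus holdout) and its standard
    error, treating the two randomized groups as independent samples."""
    def size_mean_variance(values):
        n = len(values)
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
        return n, mean, variance

    n_t, mean_t, var_t = size_mean_variance(treated_spend)
    n_h, mean_h, var_h = size_mean_variance(holdout_spend)
    lift = mean_t - mean_h
    std_err = math.sqrt(var_t / n_t + var_h / n_h)
    return lift, std_err

# Hypothetical weekly spending for five exposed and five held-out customers.
lift, std_err = holdout_lift([0, 0, 40, 0, 20], [0, 0, 0, 30, 0])
print(lift, std_err)
```

With purchase data this sparse, the standard error is large relative to the lift, which illustrates why pooling information across campaigns and time periods is attractive.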
Footnote 10: Berman (2013) proposes a mechanism for executing randomized holdouts across multiple online
publishers, but we have been unable to find an example of a digital campaign with randomized holdouts in
practice despite discussions with several major online advertising platforms and digital advertising agencies.

First, as we discussed earlier, online advertisers are not completely in control of when and (especially)
how much each user is exposed to an ad. Users visit websites when they want to visit and so the opportunity
to advertise to a particular cookie is under control of the user. New cookies can show up at any time, so
a mechanism for executing randomized holdouts would have to be able to assign cookies to holdout or
treatment on-the-fly and would have to address whether a user who has been exposed once should be
exposed again if another opportunity arises. If the number of exposures is not controlled, then there is
the potential to re-introduce an activity bias among those who are treated, which could induce a bias in
the model estimates. Second, digital advertising platforms will typically choose which ads to show each
cookie based on how that cookie is likely to respond, in a way that will maximize revenue for the
platform. It can be challenging to get the platform to randomly select which cookies receive a particular
campaign since this goal is in conflict with their profit motives. Finally, even if the advertising platform
will assign treatment and control groups randomly, the control group also needs to be served some ad and
so one has to consider what that ad is. If the control group ends up being exposed to more competitor ads,
we may see a difference between treatment and control that is caused by the competitor ads and not the
target ads. The standard solution to this is to pay for and serve a non-competitive ad (e.g., a public service
announcement as described in Lewis and Rao 2013), yet there seems to be little appetite for this among
advertisers due to the added expense. Add these three challenges to the overall complexity of the digital
advertising ecosystem - with publishers, major ad platforms, ad resellers, ad agencies and advertisers who
would all have to coordinate - and it becomes very difficult to execute randomized holdouts on digital
platforms. Yet, the opportunity to truly understand the returns to advertising remains enticing. We hope
that this paper illustrates the utility of randomized holdouts and can serve as a roadmap for those who want
to overcome these challenges.
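
One common way to handle cookies that appear at arbitrary times is to derive the assignment from a hash of the cookie identifier, so that no assignment table needs to exist in advance and a returning cookie always receives the same group. The sketch below illustrates that idea only; it is an assumption of this example rather than a mechanism described in the paper or used by any particular platform, and the function name, identifiers and 10% holdout share are hypothetical. It does not address the exposure-frequency or control-ad issues raised above.

```python
import hashlib

def assign_group(cookie_id, campaign_id, holdout_share=0.10):
    """Deterministically map a cookie to 'holdout' or 'treatment' for a campaign.
    The same cookie always lands in the same group for a given campaign, so
    repeat visits do not change its assignment."""
    key = f"{campaign_id}:{cookie_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2 ** 64
    return "holdout" if u < holdout_share else "treatment"

groups = [assign_group(f"cookie-{i}", "spring-campaign") for i in range(10000)]
holdout_fraction = groups.count("holdout") / len(groups)
print(holdout_fraction)
```

Including the campaign identifier in the hashed key means that a cookie held out of one campaign is not systematically held out of the next.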
5.2.2 Addressable Television Advertising
Television is an expensive advertising medium, and widely thought to be the most powerful, and so many
advertisers are anxious to get accurate estimates of the effect of TV advertising on sales. While traditional
television does not track advertising exposures, cable systems are rapidly installing newer set-top box
systems that allow cable providers to track and potentially target advertising exposures to individual cable
subscribers (cf. Kent and Schweidel 2011). One can easily foresee a future where this data is provided
back to their advertisers, just as is common practice in digital advertising today. It will also be relatively
easy for cable companies to offer randomized holdouts to their advertisers where the assignment of
treatment and control will be at the household (cable subscriber) level. Cable companies will still need to
address the issue that viewers control when they watch TV and, therefore, the opportunities to advertise
to a particular household, and a randomized holdout scheme will need to address this issue, but we are
optimistic that randomized holdouts will soon be feasible on modern cable systems. Note that it would
be nearly impossible to execute randomized holdouts using a single-source panel where the tracking of
household-level advertising exposures is divorced from the targeting process.

Since it isn't (yet) possible to make a purchase via television (other than on-demand video content),
ad exposure data will need to be linked to purchase data from other channels, but this will be facilitated
by the fact that cable companies have the name and address for each household. For advertisers that sell
directly to customers, such as retailers, it will be easy to match data on TV ad exposures to purchases
(footnote 11). For CPG companies, whose products are not purchased directly, companies like Tivo Research
Analytics are working to match set-top box data from multiple cable and satellite providers with purchase
data collected through the loyalty card programs of major retailers (TRA 2013), making linked exposure and
purchase data available to those advertisers. Credit card companies are also a potential source of purchase
data that could be matched (by name and address) to cable viewing data, albeit only at the level of the
retailer and not the specific product. While these tracking systems are still emerging and will vary among
advertisers, it will soon be possible for many major television advertisers to regularly collect
consumer-level data on exposures and purchases and use randomized holdouts to add exogenous variation to
this data.
6 Conclusion
In this paper, we have developed an integrated approach to measuring advertising effectiveness with
consumer-level advertising response data. Our model accounts for (i) multiple channels and their
interactions, (ii) dynamic advertising effectiveness, (iii) sparsity in the response variable, (iv) heterogeneity
across individuals, (v) state-dependence effects, and (vi) serial correlation in the errors. The model is grounded in
the traditional ad-stock literature, but we have employed a state-space formulation allowing for efficient
and scalable computation of the latent stock variables.
Our intent was to develop a tool that would be useful to practitioners across many contexts, and so
our specification is intentionally simple. As we discuss, our model should be applied (preferably) to
data with some form of exogenous variation, such as randomized holdouts. This, as we have discussed,
is rapidly becoming available for many different types of advertising. We have focused on using the
data to identify how much lift is produced by each advertising channel and which consumers are most
responsive to advertising. However, we recognize that the model could be extended in many different
directions depending on what data is available and which decisions the advertiser wants to focus on.
For example, if the data included information about the specific advertising copy, one could estimate
the effects of individual advertising creatives (Braun and Moe 2013) or even the interaction between the
copy and the advertising channel. One could develop models that allow advertisers to combine data
that is observed at different scales, e.g. weekly direct mail exposures and daily online advertising. If
there was more information about customers' browsing behavior, one could develop more complex models
of how advertising affects customers as they move through the purchase funnel (Abhishek, Fader and
Hosanagar, 2012, Li and Kannan, 2014). With sufficient variation in the advertising exposures, one could
also allow for non-contemporaneous interactions between advertising channels. Bollinger, Cohen and
Jiang (2013), for instance, introduce a novel way to parametrize interactions symmetrically across stocks
and exposures. Moreover, with detailed single-source panels comprising different brands and competitors,
one could extend the model presented in this work in the spirit of Danaher and Dagger (2013) to provide
individual-level estimates of cross-competitor (brands) elasticities. What we have attempted to show here
is that whichever way data evolves, the framework developed in this work is a flexible basis for a decision
support tool to manage multi-channel advertising.

Footnote 11: Privacy regulations are evolving and vary regionally, but will likely require that this matching
be done through a third party in such a way that the advertiser will only have access to an anonymized
version of the linked data.
References
[1] K. H. Abhishek, K. Hosanagar, and P. Fader. Aggregation bias in sponsored search data: The curse
and the cure. Working paper available at http://ssrn.com/abstract=1490169, CMU, 2012.
[2] R. L. Andrews, A. Ainslie, and I. S. Currim. On the recovery of choice behaviors with random
coefficients choice models in the context of limited data and unobserved effects. Management Science,
54(1):83–99, 2008.
[3] B. Ataman, H. J. van Heerde, and C. F. Mela. The long-term effect of marketing strategy on brand
sales. Journal of Marketing Research, 47(5):866–882, 2010.
[4] F. Bass, N. Bruce, S. Majumdar, and B. Murthi. Wearout effects of different advertising themes: A
dynamic Bayesian model of the advertising-sales relationship. Marketing Science, 26(2):179–195,
2007.
[5] R. Berman. Beyond the last touch: Attribution in online advertising. Working paper available at
http://ron-berman.com/papers/attribution.pdf, Berkeley-Haas, 2013.
[6] M. Bertrand, D. Karlan, S. Mullainathan, E. Shafir, and J. Zinman. What's advertising content worth?
Evidence from a consumer credit marketing field experiment. Quarterly Journal of Economics, pages
263–306, 2010.
[7] T. Blake, C. Nosko, and S. Tadelis. Consumer heterogeneity and paid search effectiveness: A
large scale field experiment. Working paper available at
http://faculty.haas.berkeley.edu/stadelis/tadelis.pdf, 2014.
[8] B. K. Bollinger, M. A. Cohen, and L. Jiang. Measuring asymmetric persistence and interaction effects
of media across platforms. Working paper available at http://ssrn.com/abstract=2342349,
NYU, 2013.
[9] D. Bowman and H. Gatignon. Market response and marketing mix models. Number 14. Now
Publishers, 2010.
[10] M. Braun and W. Moe. Online display advertising: Modeling the effects of multiple creatives and
individual impression histories. Marketing Science, 32(5):753–767, 2013.
[11] S. Broadbent. Modeling with ad-stock. Journal of the Market Research Society, 26:295–312, 1984.
[12] R. E. Bucklin and C. Sismeiro. Click here for internet insights: Advances in clickstream data
analysis. Journal of Interactive Marketing, 23(1):35–48, 2009.
[13] S. Chib. Bayes inference in the tobit censored regression model. Journal of Econometrics, 51:79–99,
1992.
[14] P. Danaher, A. A. Bonfrer, and S. S. Dhar. The effect of competitive advertising interference on sales
for packaged goods. Journal of Marketing Research, 45(2):211–25, 2008.
[15] P. J. Danaher and T. S. Dagger. Comparing the relative effectiveness of advertising channels: A case
study of a multimedia blitz campaign. Journal of Marketing Research, 50(4):517–534, 2013.
[16] R. Davidson and J. G. MacKinnon. Estimation and Inference in Econometrics. Oxford University
Press, 1993.
[17] J. Deighton, C. M. Henderson, and S. A. Neslin. The effects of advertising on brand switching and
repeat purchasing. Journal of Marketing Research, 31:28–43, 1994.
[18] M. G. Dekimpe and D. M. Hanssens. Time-series models in marketing: Past, present and future.
International Journal of Research in Marketing, 17:183–193, 2000.
[19] F. X. Diebold. Elements of Forecasting. South-Western Publishing, 2006.
[20] I. M. Dinner, H. J. Van Heerde, and S. Neslin. Driving online and offline sales: The cross-channels
effects of digital versus traditional advertising. Journal of Marketing Research, 2014. Forthcoming.
[21] J. A. Duan, L. McAlister, and S. Sinha. Reexamining Bayesian-model-comparison evidence of
cross-brand pass-through. Marketing Science, 30(3):550–561, 2011.
[22] J. P. Dubé, G. Hitsch, and P. Manchanda. An empirical model of advertising dynamics. Quantitative
Marketing and Economics, 3(2):107, 2005.
[23] J. P. Dubé, G. Hitsch, and P. E. Rossi. State dependence and alternative explanations for consumer
inertia. RAND Journal of Economics, 41(3):417–445, 2010.
[24] R. F. Engle, D. F. Hendry, and J. F. Richard. Exogeneity. Econometrica, 51(2):277–304, 1983.
[25] J. F. Florens, M. Mouchart, and J. M. Rolin. Elements of Bayesian Statistics. Chapman and Hall, 1st
edition, 1990.
[26] A. Gelfand and S. K. Ghosh. Model choice: A minimum posterior predictive loss approach.
Biometrika, 85(1):1–11, 1998.
[27] A. Gelman, J. B. Carlin, H. S. Stern, A. Vehtari, and D. B. Rubin. Bayesian Data Analysis. Chapman
and Hall/CRC, 3rd edition, 2013.
[28] W. Greene. Econometric Analysis. Pearson Prentice Hall, Upper Saddle River, 6th edition, 2008.
[29] P. M. Guadagni and J. D. C. Little. A logit model of brand choice calibrated on scanner data.
Marketing Science, 2(3):203–238, 1983.
[30] A. C. Harvey. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge
University Press, 1991.
[31] J. J. Heckman. Heterogeneity and state dependence. In Studies in Labor Markets, University of
Chicago Press, pages 91–140, 1981.
[32] J. J. Heckman. Statistical models for discrete choice data. In C. Manski, D. McFadden, eds,
Structural Analysis of Discrete Data with Applications, MIT Press, Cambridge, MA, pages 179–195,
1981.
[33] J. Huang, M. Leng, and L. Liang. Recent developments in dynamic advertising research. European
Journal of Operations Research, 220(3):591–609, August 2012.
[34] D. Hyslop. State dependence, serial correlation and heterogeneity in intertemporal labor force participation of married women. Econometrica, 67(6):1255-1294, 1999.
[35] D. W. Jorgenson. Rational distributed lags. Econometrica, 32:135-149, 1966.
[36] R. J. Kent and D. A. Schweidel. Introducing the ad ECG: How the set-top box tracks the lifeline of television. Journal of Advertising Research, 51(4):586-593, 2011.
[37] L. M. Koyck. Distributed lags and investment analysis. North-Holland, Amsterdam, 1954.
[38] R. A. Lewis and J. M. Rao. On the near impossibility of measuring the returns to advertising. Working paper available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2367103, 2013.
[39] R. A. Lewis, J. M. Rao, and D. H. Reiley. Measuring the Effects of Advertising: The Digital Frontier. NBER Press, 2014. Forthcoming.
[40] R. A. Lewis and D. H. Reiley. Online advertising and offline sales: Measuring the effects of retail advertising via a controlled experiment on Yahoo! Unpublished manuscript, 2013.
[41] H. Li and P. K. Kannan. Attribution modeling: understanding the influence of channels in the online purchase funnel. Journal of Marketing Research, 2014. Forthcoming.
[42] Y. Liao, H. M. Anderson, and F. Vahid. Multivariate dynamic Tobit model. Working paper available at https://editorialexpress.com/cgi-bin/conference/download.cgi?db_name=ESAMACE2014&paper_id=404, 2014.
[43] J. D. C. Little. Aggregate advertising model: The state of the art. Operations Research, 27(4):629, 1979.
[44] R. J. A. Little and D. Rubin. Statistical Analysis with Missing Data. New York: John Wiley, 1987.
[45] L. Lodish, M. Abraham, S. Kalmenson, J. Livelsberger, B. Lubetkin, B. Richardson, and M. Stevens. How TV advertising works: A meta-analysis of 389 real world split cable TV advertising experiments. Journal of Marketing Research, 32(2):125-139, 1995.
[46] G. S. Maddala. Limited Dependent and Qualitative Variables. 1982.
[47] G. S. Maddala. Limited dependent variable models using panel data. The Journal of Human Resources, 22(3):307-338, 1987.
[48] P. Manchanda, J. P. Dubé, K. Y. Goh, and P. Chintagunta. The effect of banner advertising on internet purchasing. Journal of Marketing Research, 43(1):98-108, 2006.
[49] P. Manchanda, P. E. Rossi, and P. K. Chintagunta. Response modeling with non-random marketing mix variables. Journal of Marketing Research, 41(3):467-478, 2004.
[50] L. McAlister, R. Srivastava, J. Horowitz, M. Jones, W. Kamakura, J. Kulchitsky, B. Ratchford, G. Russel, F. Sultan, and T. Yai. Incorporating choice dynamics in models of consumer behavior. Marketing Letters, 2(3):241-252, 1991.
[51] J. F. McDonald and R. A. Moffitt. The uses of Tobit analysis. The Review of Economics and Statistics, 62(2):318-321, 1981.
[52] M. Mouchart and E. Scheihing. Bayesian evaluation of non-admissible conditioning. Journal of Econometrics, 2(123), 2004.
[53] P. A. Naik and K. Peters. A hierarchical marketing communications model of online and offline media synergies. Journal of Interactive Marketing, 23(4):288-299, 2009.
[54] P. A. Naik and K. Raman. Understanding the impact of synergy in multimedia communications. Journal of Marketing Research, 40(3):375-388, 2003.
[55] P. A. Naik, K. Raman, and R. S. Winer. Planning marketing-mix strategies in presence of interactions. Marketing Science, 24(1):25-34, 2005.
[56] M. Nerlove and K. J. Arrow. Optimal advertising policy under dynamic conditions. Economica, New Series, 29(114):129-142, 1962.
[57] J. H. Pedrick and F. S. Zufryden. Evaluating the impact of advertising media plans: A model of consumer purchase dynamics using single-source data. Marketing Science, 10(2), May 1991.
[58] A. E. Raftery and S. M. Lewis. Implementing Markov chain Monte Carlo. In Markov Chain Monte Carlo in Practice (W. R. Gilks, D. J. Spiegelhalter and S. Richardson, eds.), pages 115-130. Chapman and Hall, London, 1996.
[59] P. E. Rossi. Bayesian Non- and Semi-parametric Methods and Applications. Princeton University Press, Tinbergen Institute, 2014.
[60] P. E. Rossi and G. M. Allenby. Bayesian statistics and marketing. Marketing Science, 22(3):304-328, 2003.
[61] P. E. Rossi, G. M. Allenby, and R. McCulloch. Bayesian Statistics and Marketing. John Wiley and Sons, New York, 2005.
[62] D. A. Schweidel and G. Knox. Incorporating direct marketing activity into latent attrition model. Marketing Science, 32(3):471-487, 2013.
[63] P. B. Seetharaman. Modeling multiple sources of state dependence in random utility models: A distributed lag approach. Marketing Science, 23(2):263-271, 2004.
[64] N. Shepard. Partial non-Gaussian state space models. Biometrika, 81(1):115-131, 1994.
[65] K. Shirley, D. Small, K. Lynch, S. Maisto, and D. Oslin. Hidden Markov models for alcoholism treatment trial data. Annals of Applied Statistics, 4:366-395, 2010.
[66] S. M. Shugan. Endogeneity in marketing decision models. Marketing Science, 23(1):1-3, 2004.
[67] G. J. Tellis. Advertising and Sales Promotion Strategy. Reading, MA, 1998.
[68] G. J. Tellis. Effective Advertising: Understanding when, how, and why advertising works. Sage Publications, Thousand Oaks, 2004.
[69] G. J. Tellis and T. Ambler. Handbook of Advertising. Sage Publications, London, UK, 2007.
[70] M. West and J. Harrison. Bayesian forecasting and dynamic models. Springer-Verlag, New York, 2nd edition, 1999.
[71] K. C. Wilbur. A two-sided, empirical model of television advertising and viewing markets. Marketing Science, 27(3):356-378, 2008.
A Background Material: Conditional Expectations for the Left-Truncated Normal Distribution
Consider a normally distributed random variable $Y$ with mean $\mu$ and variance $\sigma^2$. The corresponding first two moments for the left-truncated (at zero) distribution of $Y$, denoted by the $-$ superscript, are given by:
\[ \mu^{-} = E(Y \mid Y < 0) = \mu - \sigma \frac{\phi(\gamma)}{\Phi(\gamma)} \tag{20a} \]
\[ \sigma^{2-} = \mathrm{Var}(Y \mid Y < 0) = \sigma^{2} \left[ 1 - \gamma \frac{\phi(\gamma)}{\Phi(\gamma)} - \left( \frac{\phi(\gamma)}{\Phi(\gamma)} \right)^{2} \right] \tag{20b} \]
where $\gamma = -\mu/\sigma$, and $\phi$ and $\Phi$ are the standard normal density and distribution functions. These formulas allow for a direct change of variable that can be used to derive a set of second-order efficient sufficient statistics for the Kalman-DLM relationships when the measurement equation is truncated at zero. This permits us to efficiently compute the latent stocks without having to sample the latent variable with a computationally costly data augmentation step as in Chib (1993).
These derivations have been employed in a series of papers such as Harvey (1991, Section 3.7.2) and Shepard (1994) in the context of partial non-linear state-space models, such as the dynamic Tobit, and more recently by Liao, Anderson and Vahid (2014), who present a multivariate extension that can be used to estimate dynamic higher-order Tobit/Probit models.
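As a numerical companion to the moments in (20), the following is a minimal Python sketch, not the authors' code. It assumes the standardized truncation point is $-\mu/\sigma$ and uses the standard textbook expressions for a normal variable conditioned on being negative; the function names are hypothetical.

```python
import math


def std_normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def moments_below_zero(mu, sigma):
    """Mean and variance of Y ~ N(mu, sigma^2) conditional on Y < 0.

    Assumption: the standardized truncation point is -mu / sigma.
    """
    point = -mu / sigma
    ratio = std_normal_pdf(point) / std_normal_cdf(point)
    mean = mu - sigma * ratio
    var = sigma ** 2 * (1.0 - point * ratio - ratio ** 2)
    return mean, var


# A standard normal conditioned on being negative (half-normal case).
mean0, var0 = moments_below_zero(0.0, 1.0)
print(mean0, var0)
```

For a standard normal the conditional mean is $-\sqrt{2/\pi}$ and the conditional variance is $1 - 2/\pi$, which gives a quick sanity check of the expressions.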
B Estimation Algorithm
We draw samples from the posterior of all parameters using a Gibbs sampler, which combines closed-form draws, Metropolis-Hastings draws and a DLM filtering procedure. After initialization, parameters are drawn in the following blocks:
1. Sample $W_{ikt}$ for each individual using modified second-order efficient DLM relationships accounting for the truncation at 0.
2. Sample $\alpha_i$, $\beta_i$, $\lambda_i$, $\sigma^2_{\varepsilon}$, $\sigma^2_{\nu}$ for each individual using Metropolis-Hastings updates.
3. Sample $\bar{\alpha}$, $\bar{\beta}$, $\bar{\lambda}$, and $\Sigma$ using standard conjugate draws (cf. Rossi, Allenby and McCulloch 2003, pp. 72-73).
We provide details of the initialization and of steps 1 and 2 below.
B.1 Initialization
In order to initialize the relevant parameters in the model we estimate a pooled model, i.e., a model with common parameters for all individuals, where the latent stocks for each period are constructed explicitly based on the ten prior periods. Specifically, we estimate the model:
\[ \mathbf{Y}^{*}_{t} = \alpha + \sum_{k=1}^{K} \beta_k \sum_{j=0}^{9} \lambda_k^{j} \mathbf{X}_{t-j,k} + \boldsymbol{\varepsilon}_t \tag{21} \]
\[ \mathbf{Y}^{*}_{t} = \mathbf{Y}_t \ \text{if } \mathbf{Y}_t > 0, \qquad \mathbf{Y}^{*}_{t} < 0 \ \text{if } \mathbf{Y}_t = 0 \]
where the bold fonts denote the vectorized observations stacked individual by individual. (For clarity, we suppress the notation for the interaction terms and state dependence.) We obtain estimates $\hat{\beta}_k$, $\hat{\lambda}_k$, $\hat{\alpha}$ and $\hat{\sigma}_{\varepsilon}$ by constrained maximum likelihood. These ML estimates, together with the predicted $\hat{\mathbf{Y}}^{*}_{t} < 0$, are used to initialize the sampler.
The latent stocks $W_{ik0}$ are initialized at the steady state implied by the maximum likelihood estimates. Namely, for each individual denote by $\bar{X}_{ik}$ the in-sample mean of the number of exposures for channel $k$ and individual $i$. Then we initialize $W_{ik0} = \bar{X}_{ik} / (1 - \hat{\lambda}_k)$. Note that these starting values for the stocks are updated in each pass of the Gibbs sampler by means of smoothing densities derived using the DLM relationships in step 1 of the algorithm. Finally, $\Sigma$ is initialized at the identity matrix.
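The two constructions used in the initialization, a stock built from the ten most recent periods and a steady-state starting value, can be sketched as below. This is an illustrative Python sketch under assumptions: the decay rate is a single number per channel, the current period enters with weight one, and the function names and the exposure series are hypothetical.

```python
def truncated_adstock(exposures, decay, n_lags=10):
    """Stock at the last period built from the n_lags most recent exposures.

    Assumption: the current period has weight 1 and the exposure j periods
    back has weight decay ** j.
    """
    recent = exposures[-n_lags:]
    return sum(decay ** j * x for j, x in enumerate(reversed(recent)))


def steady_state_stock(exposures, decay):
    """Starting stock: mean exposure divided by (1 - decay)."""
    mean_exposure = sum(exposures) / len(exposures)
    return mean_exposure / (1.0 - decay)


# Hypothetical weekly email counts for one customer.
weekly_emails = [1, 0, 2, 1, 0, 0, 1, 3, 0, 1, 2, 0]
print(truncated_adstock(weekly_emails, decay=0.4))
print(steady_state_stock(weekly_emails, decay=0.4))
```

With a constant exposure of one per period, the ten-lag stock approaches the steady-state value $1/(1-\lambda)$ as the number of lags grows, which is why the steady state is a natural starting point for the stocks.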


B.2 Step 1. Update of the Latent Stocks
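Recall the vectorized system of equations determining the dynamics of the stocks in the latent space:
\[ Y^{*}_{t} = \alpha + \beta W_t + \varepsilon_t \]
\[ W_t = \lambda W_{t-1} + X_t + \nu_t \]
(We suppress the index for the individual, $i$, for clarity.) The DLM relationships determine a system of recursive densities based on a set of sufficient statistics for the predicted (and corrected) mean and variance of the stocks: $W_t \sim N\left( \mu^{t}_{W_t}, V^{t}_{W_t} \right)$, where $\mu^{t}_{W_t}$, $V^{t}_{W_t}$ represent the set of sufficient statistics based on all the information available up to time $t$. We also denote with the index $t+1$ the predicted or estimated sufficient statistics, as is traditional in filtering studies (West and Harrison, 1997). These statistical summaries are derived in two steps commonly referred to as the Forward Filtering algorithm (FF). For time $t = 1, \ldots, T-1$:

Forward Filtering when $Y_{t+1}$ is positive (nothing changes with respect to the classic Kalman DLM relationships):
\[
\begin{aligned}
\mu^{t+1}_{W_t} &= \lambda \mu^{t}_{W_t} + X_t \\
V^{t+1}_{W_t} &= \lambda V^{t}_{W_t} \lambda' + \sigma^{2}_{\nu} \\
k_{t+1} &= V^{t+1}_{W_t} \beta' \left( \beta V^{t+1}_{W_t} \beta' + \sigma^{2}_{\varepsilon} \right)^{-1} \\
\mu^{t+1}_{W_{t+1}} &= \mu^{t+1}_{W_t} + k_{t+1} \left( Y^{*}_{t+1} - \alpha - \beta \mu^{t+1}_{W_t} \right) \\
V^{t+1}_{W_{t+1}} &= V^{t+1}_{W_t} - k_{t+1} \left( \beta V^{t+1}_{W_t} \beta' + \sigma^{2}_{\varepsilon} \right) k'_{t+1}
\end{aligned}
\]

Forward Filtering when $Y_{t+1}$ is zero (using second-order sufficient statistics):
\[
\begin{aligned}
\mu^{t+1}_{W_t} &= \lambda \mu^{t}_{W_t} + X_t \\
V^{t+1}_{W_t} &= \lambda V^{t}_{W_t} \lambda' + \sigma^{2}_{\nu} \\
k_{t+1} &= V^{t+1}_{W_t} \beta' \left( \beta V^{t+1}_{W_t} \beta' + \sigma^{2}_{\varepsilon} \right)^{-1} \\
\mu^{t+1}_{W_{t+1}} &= \mu^{t+1}_{W_t} + k_{t+1} \left( E_t\left( Y^{*}_{t+1} \mid Y^{*}_{t+1} < 0 \right) - \alpha - \beta \mu^{t+1}_{W_t} \right) \\
V^{t+1}_{W_{t+1}} &= V^{t+1}_{W_t} - k_{t+1} \left( \beta V^{t+1}_{W_t} \beta' + \sigma^{2}_{\varepsilon} - \mathrm{Var}_t\left( Y^{*}_{t+1} \mid Y^{*}_{t+1} < 0 \right) \right) k'_{t+1}
\end{aligned}
\]
Note that the corresponding conditional expectations under truncation can be easily computed using the formulas in (20) with the predictive moments $\mu^{t+1}_{W_t}$ and $V^{t+1}_{W_t}$, that is, applied to the one-step-ahead predictive distribution of $Y^{*}_{t+1}$, which has mean $\alpha + \beta \mu^{t+1}_{W_t}$ and variance $\beta V^{t+1}_{W_t} \beta' + \sigma^{2}_{\varepsilon}$.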
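To make the two forward-filtering cases concrete, here is a minimal scalar sketch in Python for a single channel. It is an illustration under assumptions rather than the authors' implementation: all parameters are scalars, a purchase of zero marks a censored period, the censored update replaces the observation by the conditional mean of the negative latent outcome and shrinks the innovation variance by its conditional variance, and every name is hypothetical.

```python
import math


def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def moments_below_zero(mean, var):
    """Mean and variance of N(mean, var) conditional on being negative."""
    sd = math.sqrt(var)
    point = -mean / sd
    ratio = _norm_pdf(point) / _norm_cdf(point)
    return mean - sd * ratio, var * (1.0 - point * ratio - ratio * ratio)


def forward_filter(y, x, intercept, response, decay, var_obs, var_state, m0, v0):
    """Scalar forward filter; y[t] == 0 is treated as a censored period.

    x[t] is the exposure that enters the stock observed through y[t].
    Returns a list of (filtered mean, filtered variance) pairs.
    """
    m, v = m0, v0
    filtered = []
    for y_next, x_t in zip(y, x):
        pred_m = decay * m + x_t                      # predicted stock mean
        pred_v = decay * v * decay + var_state        # predicted stock variance
        obs_m = intercept + response * pred_m         # predictive mean of latent outcome
        obs_v = response * pred_v * response + var_obs
        gain = pred_v * response / obs_v
        if y_next > 0:
            # Classic Kalman update with the observed purchase amount.
            m = pred_m + gain * (y_next - obs_m)
            v = pred_v - gain * obs_v * gain
        else:
            # Censored period: use the moments of the negative latent outcome.
            e_neg, v_neg = moments_below_zero(obs_m, obs_v)
            m = pred_m + gain * (e_neg - obs_m)
            v = pred_v - gain * (obs_v - v_neg) * gain
        filtered.append((m, v))
    return filtered


# Hypothetical purchases (zeros are censored) and exposures for five periods.
path = forward_filter([2.0, 0.0, 0.0, 1.5, 0.0], [1.0, 0.0, 1.0, 1.0, 0.0],
                      intercept=0.0, response=1.0, decay=0.5,
                      var_obs=1.0, var_state=1.0, m0=0.0, v0=1.0)
for mean_t, var_t in path:
    print(round(mean_t, 3), round(var_t, 3))
```

In a censored period the filtered variance stays between the classic Kalman posterior variance and the predicted variance, reflecting that a zero carries less information than an observed amount. When the predictive mean is far above zero the normal distribution function in the ratio becomes very small, so a production implementation would need a numerically stable evaluation of that ratio.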
Similarly, once time $T$ is reached, it is possible to smooth the densities back in time. Importantly, this provides a way to estimate the best (in the mean-squared sense) prediction of the initial conditions of the stock equations: these updates are also characterized by a set of sufficient statistics denoted as $\mu^{T}_{W_{it}}$ and $V^{T}_{W_{it}}$, where the $T$ superscript points out that the smoothing update is based on all the information in the sample and $t$ in this case runs from $T-1$ to 0. Thus from time $T$ consider the most recent set of sufficient statistics $\mu^{T}_{W_T}$, $V^{T}_{W_T}$ from the forward filtering step and move backwards in time to obtain:

Backward Sampling (BS)
\[
\begin{aligned}
g_t &= V^{t}_{W_t} \lambda' \left( V^{t+1}_{W_t} \right)^{-1} \\
\mu^{T}_{W_t} &= \mu^{t}_{W_t} + g_t \left( \mu^{T}_{W_{t+1}} - \mu^{t+1}_{W_t} \right) \\
V^{T}_{W_t} &= V^{t}_{W_t} - g_t \left( V^{t+1}_{W_t} - V^{T}_{W_{t+1}} \right) g'_t
\end{aligned}
\]
A detailed treatment of the derivations leading to the above can be found in West and Harrison (1997) and similarly in Bass et al. (2007).
Armed with the set of sufficient statistics derived from the FF and BS steps, $\mu^{T}_{W_{it}}$, $V^{T}_{W_{it}}$, we can then sample the latent stocks for each individual:
\[ W_t \sim N\left( \mu^{T}_{W_t}, V^{T}_{W_t} \right). \]
To provide a simple illustration of the effect of the modified sufficient statistics, consider Figure 7, where we generated 100 data points with a level of censoring/sparsity equivalent to our median customer. The unobserved signal is recovered well (at the sufficient statistics) even on the truncated side of the y-axis.
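The backward pass and the final draw of the stocks can be sketched in the same scalar setting. The Python below is a hedged illustration, not the authors' code: it assumes the filtered moments and the one-step-ahead predicted moments have already been stored, that the smoothed mean is corrected by the gap between the smoothed next-period mean and its one-step-ahead prediction, and that the stocks are drawn from their smoothed marginals as described above. Function and argument names are hypothetical.

```python
import math
import random


def backward_smooth(filt_mean, filt_var, pred_mean, pred_var, decay):
    """Backward recursion over stored filtering output.

    filt_mean[t], filt_var[t]: moments of the stock at t given data up to t.
    pred_mean[t], pred_var[t]: moments of the stock at t + 1 given data up to t.
    The last period's smoothed moments equal its filtered moments.
    """
    n = len(filt_mean)
    sm_mean = list(filt_mean)
    sm_var = list(filt_var)
    for t in range(n - 2, -1, -1):
        g = filt_var[t] * decay / pred_var[t]
        sm_mean[t] = filt_mean[t] + g * (sm_mean[t + 1] - pred_mean[t])
        sm_var[t] = filt_var[t] - g * (pred_var[t] - sm_var[t + 1]) * g
    return sm_mean, sm_var


def draw_stocks(sm_mean, sm_var, rng):
    """One draw of each stock from its smoothed normal marginal."""
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(sm_mean, sm_var)]


# Two-period toy example with decay 0.5 and one exposure in the first period.
filt_mean, filt_var = [1.0, 2.0], [1.0, 0.5]
pred_mean, pred_var = [0.5 * 1.0 + 1.0, 0.0], [0.25 * 1.0 + 1.0, 1.0]
sm_mean, sm_var = backward_smooth(filt_mean, filt_var, pred_mean, pred_var, 0.5)
print(sm_mean, sm_var)
print(draw_stocks(sm_mean, sm_var, random.Random(1)))
```

In the toy example the smoothed variance of the first period is smaller than its filtered variance, since the backward pass adds the information contained in the second period.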
B.3 Step 2. Update the Individual-level Parameters
We draw the individual-level parameters using a Metropolis algorithm which uses candidate sampling distributions that are customized to the unit-level likelihoods. We use a fractional likelihood approach as in Rossi, Allenby and McCulloch (2005, p. 135) to set the proposal density for each individual:
\[ L^{*}_{i} = L_i \times \bar{L}^{w} \tag{22} \]
where $\bar{L}$ is the pooled likelihood described in the initialization step. The weight $w$ is set to $T/(2N)$ so that it does not dominate the unit-level likelihood. The pooled likelihood has the purpose of regularizing the likelihood for the units that, due to the high sparsity, do not have a local maximum. We can use the maximum and Hessian of this likelihood to construct a proposal for each individual as follows: let $\hat{\theta}_i$ be the set of individual-level parameters that maximizes the likelihood in equation (22) and let $\hat{V}_i = -\left. \frac{\partial^{2} L^{*}_{i}}{\partial \theta_i \, \partial \theta'_i} \right|_{\theta_i = \hat{\theta}_i}$. These can then be combined with the priors presented in Section 2.3 to form a Metropolis proposal distribution. The update for the individual parameters then uses a standard M-H update with an additional rejection step to ensure that the $\lambda_i$ and $\rho_i$ coefficients are sampled from the stationary region as in Chib (1993).

[Figure 7: Recovery of the latent purchasing process, under a sparse response. Panel title: Illustration of the Modified Kalman-DLM Relationships under Truncation. Horizontal axis: Weeks (0 to 100). Series: Uncensored Data; Truncated Data (Used for the Estimation); Recovered Data at the Modified Sufficient Statistics.]
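On the log scale, the fractional likelihood in (22) is the unit-level log-likelihood plus a down-weighted pooled log-likelihood. The sketch below is a minimal Python illustration with hypothetical names; the log-likelihood values are made up, and the panel dimensions (104 periods, 300 customers) are borrowed from the simulation design purely as an example.

```python
def fractional_log_likelihood(unit_loglik, pooled_loglik, n_periods, n_units):
    """Unit log-likelihood plus the pooled log-likelihood scaled by T / (2N)."""
    weight = n_periods / (2.0 * n_units)
    return unit_loglik + weight * pooled_loglik


# Hypothetical values: one customer's log-likelihood and the pooled one.
value = fractional_log_likelihood(unit_loglik=-10.0, pooled_loglik=-600.0,
                                  n_periods=104, n_units=300)
print(value)
```

Because the pooled term sums over all customers, the weight $T/(2N)$ scales it to roughly half of one customer's worth of data, so it regularizes sparse units without overwhelming their own likelihood.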
Finally, we note that the individual-level Tobit I likelihood is potentially invariant to sign transformations. Specifically, for any $(\beta, W)$ there is a sign transformation $(-\beta, -W)$ providing the same value of the likelihood. This will manifest itself while filtering $W_{it}$ using the DLM relationship described above, by means of reflected paths at zero. In practice we have verified that the situation arises when the exposures $X_{ikt}$ are sparse. To overcome the potential unidentifiability due to sign transformation, we suggest either post-processing the draws as in Rossi, Allenby and McCulloch (2005, Ch. 4) or (equivalently) restricting the $\beta$s over the positive real line for those individuals having sparse exposures.
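One simple way to post-process the draws is to map every draw onto the branch with a positive response coefficient. The Python sketch below assumes that the sign-symmetric pair is the response coefficient and the stock path of one channel, so that flipping both leaves their product unchanged; the function name and the numbers are hypothetical.

```python
def normalize_sign(response_draw, stock_path):
    """Return the sign-equivalent draw with a non-negative response coefficient."""
    if response_draw < 0:
        return -response_draw, [-w for w in stock_path]
    return response_draw, list(stock_path)


# A draw that landed on the reflected branch.
coef, stocks = normalize_sign(-0.8, [-1.5, -0.2, 0.1])
print(coef, stocks)
```

The contribution of the channel to the latent outcome, coefficient times stock, is identical before and after the flip, which is why the two branches cannot be told apart by the likelihood.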
C Details on the Simulation in Section 3.3
We generated data for $N = 300$ individuals with a level of sparsity in response comparable to that of the real dataset. Importantly, the values of the population parameters are set to allow for the identification of both state dependence, parametrized by the lagged dependent variable, and serial correlation (see Maddala, 1987, for a comprehensive discussion of the parametric restrictions allowing for a separation of state dependence and serial correlation). The targeting mechanism can be described by the following algorithm, which, as stated in the main text, mimics the one adopted by the firm:

Algorithm 1 Endogenous Targeting, $G_{it}$, with randomizing probabilities $p_1$ and $p_2$
1: For each individual $i$, at time $t$, compute the ranking, $R_{it}$, based on $Y_{it}$ in ascending order.
2: Set $X_{it+1} \sim R_{it}/N$ with probability $1 - p_1$, or $X_{it+1} \sim p_2$ with probability $p_1$.

In the randomized generation we set $p_1 \in \{0, 0.1, 0.2, 0.5\}$ while keeping $p_2 = 0.1$ fixed. The first case, $p_1 = 0$, corresponds to the limiting case of targeting with no randomized variation.
Also, the serial correlation parameters $\rho_i$ are generated with a high value of persistence compared to the carry-over coefficient, on average 0.5 vs. 0.33, so as to exacerbate potential endogeneity biases and render the recovery of the unobserved persistent heterogeneity $\alpha_i$ more difficult (as $\rho \to 1$ the two components become indistinguishable and the model non-stationary).
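The targeting mechanism of Algorithm 1 can be sketched as follows. This Python sketch makes assumptions that the text does not state: next-period exposure is a single Bernoulli draw per customer, ties in purchases are broken by customer index, and the function name and purchase values are hypothetical.

```python
import random


def target_next_period(purchases, p1, p2, rng):
    """Draw next-period exposures for all customers.

    With probability 1 - p1 a customer is exposed with probability rank / N,
    where rank 1 is the smallest purchase; with probability p1 the exposure
    probability is the fixed value p2 (randomized variation).
    """
    n = len(purchases)
    order = sorted(range(n), key=lambda i: purchases[i])  # ascending purchases
    rank = [0] * n
    for position, i in enumerate(order, start=1):
        rank[i] = position
    exposures = []
    for i in range(n):
        if rng.random() < p1:
            prob = p2
        else:
            prob = rank[i] / n
        exposures.append(1 if rng.random() < prob else 0)
    return exposures


rng = random.Random(2014)
purchases = [0.0, 35.0, 0.0, 120.0, 10.0]
print(target_next_period(purchases, p1=0.2, p2=0.1, rng=rng))
```

With $p_1 = 0$ the heaviest buyer is always exposed, so exposures depend entirely on past purchases; raising $p_1$ injects exposure variation that is unrelated to purchase history.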
D Empirical Identification: Parameter Recovery
Table 8 illustrates the recovery of the parameters at the population level for a simulated example in which state dependence and serial correlation both have positive carry-over coefficients. This scenario can be considered pessimistic because, in principle, positive serial correlation and state dependence should render the recovery of the parameters more difficult. In detail, we generate 104 observations for 300 customers according to the population parameters presented in the second column of Table 8. The amount of sparsity, i.e., the fraction of time periods in which the customer does not buy, is similar to the one presented in the empirical analysis.
Pop. Parameter        True     Post Mean   Post q2.5   Post q97.5
$\alpha$              -2       -1.98       -3.15       -0.73
$\beta_{CAT}$          1        1.03        0.70        1.35
$\beta_{EM}$           2        1.92        1.80        2.05
$\beta_{k:k'}$         0.5      0.52        0.44        0.60
$\beta_{S}$            1.5      1.63        1.12        2.14
$\lambda_{CAT}$        0.66     0.65        0.51        0.79
$\lambda_{EM}$         0.4      0.43        0.32        0.55
$\lambda_{k:k'}$       0.2      0.21        0.05        0.37
$(\,\cdot\,)_{S}$      0.5      0.52        0.40        0.64
$(\,\cdot\,)_{C}$      0.8      0.77        0.63        0.90
Table 8: Parameter recovery for the population parameters in the complete model specification.
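A quick way to read Table 8 is to check whether each true value falls inside its 95% posterior interval and how far the posterior mean is from the truth. The Python sketch below transcribes the table's numbers; it assumes the first row is negative throughout (the only reading under which the 2.5% quantile lies below the 97.5% quantile), and the row labels are placeholders rather than the paper's parameter names.

```python
# (label, true value, posterior mean, 2.5% quantile, 97.5% quantile)
table8 = [
    ("row 1", -2.0, -1.98, -3.15, -0.73),
    ("row 2 (CAT)", 1.0, 1.03, 0.70, 1.35),
    ("row 3 (EM)", 2.0, 1.92, 1.80, 2.05),
    ("row 4 (k:k')", 0.5, 0.52, 0.44, 0.60),
    ("row 5 (S)", 1.5, 1.63, 1.12, 2.14),
    ("row 6 (CAT)", 0.66, 0.65, 0.51, 0.79),
    ("row 7 (EM)", 0.4, 0.43, 0.32, 0.55),
    ("row 8 (k:k')", 0.2, 0.21, 0.05, 0.37),
    ("row 9 (S)", 0.5, 0.52, 0.40, 0.64),
    ("row 10 (C)", 0.8, 0.77, 0.63, 0.90),
]

covered = [low <= true <= high for _, true, _, low, high in table8]
abs_errors = [abs(mean - true) for _, true, mean, _, _ in table8]

for (label, *_), inside, err in zip(table8, covered, abs_errors):
    print(f"{label:14s} covered={inside} abs_error={err:.2f}")
print("all covered:", all(covered), "max abs error:", round(max(abs_errors), 2))
```

Every true value lies inside its interval, and the largest gap between posterior mean and truth is 0.13, in the fifth row.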