An Introduction To Predictive Customer Lifetime Value Modeling

29/06/2018 An Introduction to Predictive Customer Lifetime Value Modeling
Jean-Rene Gauthier 02.27.17
How can you predict the value of a customer over the course of his or her
interactions with your business? That's a question many companies are trying
to answer, and it was the subject of my Feb. 28 webcast on O’Reilly Media.
Customer lifetime value (CLV) is the "discounted value of future profits generated by a customer." The word "profits" here covers both cost and revenue estimates, as both are very important in estimating true CLV; however, many CLV models focus on the revenue side. Revenue is more difficult to forecast than cost, so it is where a model is most needed (and knowing the revenue a customer will generate can inform how much you spend on that customer). These types of models are often called "customer equity models."
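To make the "discounted value of future profits" definition concrete, here is a minimal sketch. It assumes per-period profit forecasts are already available and that a single constant discount rate applies to every period; the function name `discounted_clv` and the figures in the example are hypothetical.

```python
def discounted_clv(expected_profits, discount_rate):
    """Sum of expected per-period profits, each discounted back to today.

    expected_profits[0] is the profit expected at the end of period 1,
    expected_profits[1] at the end of period 2, and so on.
    """
    if discount_rate <= -1:
        raise ValueError("discount_rate must be greater than -1")
    return sum(
        profit / (1 + discount_rate) ** period
        for period, profit in enumerate(expected_profits, start=1)
    )


# Hypothetical customer: $100 of profit expected in each of the next 3 years,
# discounted at 10% per year.
print(round(discounted_clv([100, 100, 100], 0.10), 2))  # 248.69
```

The hard part, and the subject of the rest of this post, is producing the `expected_profits` forecasts in the first place.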
https://www.datascience.com/blog/intro-to-predictive-modeling-for-customer-lifetime-value
Customers can generate revenue for a company in many different ways. A customer who makes direct purchases obviously increases his or her lifetime value. In addition, referrals from that customer, indirect marketing, and word-of-mouth effects ultimately contribute to the value of a customer. Referrals are very important, and there's nothing a company likes more than a "Like" on Facebook or a share on LinkedIn, for example.
Accounting for these network effects can be challenging at first, which is why,
for the sake of simplicity, I will be focusing on direct purchases only in this post.
Historical Customer Lifetime Value
Many methodologies deal with the portion of CLV associated with direct purchases, but they generally fall into two broad classes: historical and predictive CLV. Historical methods look at past data and judge the value of customers solely on past transactions, without any attempt to predict what those customers will do next.
In principle, this is a valid approach if the customers behave similarly and have
been interacting with the company for roughly the same amount of time.
However, there’s generally a fair amount of heterogeneity among customers.
The chart below shows a few purchasing trajectories to illustrate my point.
Time goes from left to right. The vertical dashed line represents the present
time, and each small, vertical line represents an order/purchase made by a
customer:

Typical historical approaches apply a criterion based on the recency of the last purchase to distinguish between active and inactive users. Average past purchase behavior is then used to measure the relative (or in some cases, absolute) value of customers.
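A minimal sketch of such a historical rule is shown below. The 180-day inactivity cutoff, the function name `historical_summary`, and both order histories are hypothetical, chosen only to show how this kind of rule values customers purely on what they have already spent.

```python
from datetime import date


def historical_summary(orders, today, inactive_after_days=180):
    """Value one customer purely from past orders (no prediction).

    orders: list of (order_date, amount) tuples for a single customer.
    Returns (is_active, average_order_value, total_spend).
    """
    if not orders:
        return False, 0.0, 0.0
    last_order = max(order_date for order_date, _ in orders)
    # Hard recency cutoff decides "active" vs. "inactive".
    is_active = (today - last_order).days <= inactive_after_days
    amounts = [amount for _, amount in orders]
    total = sum(amounts)
    return is_active, total / len(amounts), total


today = date(2017, 2, 27)
# Hypothetical customer 1: a purchase every month of 2015, none since.
c1 = [(date(2015, month, 1), 40.0) for month in range(1, 13)]
# Hypothetical customer 2: only two purchases, both recent.
c2 = [(date(2016, 11, 1), 40.0), (date(2017, 1, 15), 40.0)]
print(historical_summary(c1, today))  # (False, 40.0, 480.0)
print(historical_summary(c2, today))  # (True, 40.0, 80.0)
```

On total spend, customer 1 looks six times as valuable as customer 2, even though customer 1 has not bought anything in over a year.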
However, there are several problems with such methodologies. For example,
the first customer in the chart above has made more purchases than the
second customer, but in fact, the first customer is more likely to be
inactive than the second one. Value based on past averages would claim that
the first customer is more valuable — yet the second customer is still active and
could make many more purchases in the future. Methods that account for
variation in the behavior of customers will allow us to arrive at more accurate
conclusions about customer lifetime and purchase behavior.
Predictive Customer Lifetime Value
The goal of predictive CLV is to model the purchasing behavior of customers in order to infer what their future actions will be. Whether a predictive CLV model and methodology make sense for your use case will largely be determined by the business context. For the purpose of this post, business context is defined along two dimensions: non-contractual vs. contractual business settings, and continuous vs. discrete purchase opportunities. This definition should cover the vast majority of business cases. Below, I have included a table highlighting the differences between these contexts:

Below are some examples of business cases belonging to each one of the four
quadrants. CLV models for fitness clubs or insurance policies will differ from
the ones targeting grocery purchases, for example:
Probabilistic Models For The Non-Contractual And Continuous Purchase Setting
Perhaps the most common business context is the non-contractual one in which the purchase opportunity is continuous. A large number of probabilistic models have been built to address the challenges of modeling lifetime value in this context, and they have now been in use for several decades. They are applicable to a wide variety of business situations and, in many cases, are your "go-to" models. Probabilistic models are definitely a good first step (and sometimes the only one!) toward CLV modeling.
Machine learning and Markov models are also worthy approaches to CLV
modeling, but they need to be tweaked and sometimes customized to fit the
particulars of a business situation. In the few case studies comparing the
outcome of these different models, probabilistic approaches and machine
learning models tend to produce results that are of a similar quality.
Different Probabilistic Models, But Similar Modeling Frameworks
Let’s take a closer look at probabilistic models. There are several different flavors of probabilistic models; however, they tend to share a similar modeling framework. In this framework, CLV models typically constrain the same three latent (unobserved) parameters that characterize customer behavior:
Lifetime: the period over which a customer maintains his or her relationship with the company
Purchase rate: the number of purchases a customer will make over a given period of time
Monetary value: the dollar amount assigned to each future transaction
In the non-contractual setting, these parameters are unobserved. Probabilistic models will help us constrain these parameters at the customer level and make inferences about future purchases and value.
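Although these parameters are never observed directly, the Pareto/NBD likelihood depends on a customer's history only through three observed quantities: the number of repeat purchases x, the time of the last purchase t_x, and the customer's age T, with both times measured from the first purchase. The sketch below computes that summary for one customer; the function name `rfm_summary` and the use of days as the time unit are assumptions for illustration.

```python
def rfm_summary(purchase_times, end_of_period):
    """Summarize one customer's purchase history as (x, t_x, T).

    purchase_times: purchase times in days (any origin), at least one entry.
    x   = number of repeat purchases (every purchase after the first)
    t_x = time of the last purchase, measured from the first purchase
    T   = time from the first purchase to the end of the observation period
    """
    times = sorted(purchase_times)
    if not times:
        raise ValueError("need at least one purchase")
    first, last = times[0], times[-1]
    if end_of_period < last:
        raise ValueError("end_of_period precedes the last purchase")
    return len(times) - 1, last - first, end_of_period - first


# Hypothetical customer: purchases on days 10, 40, 55 and 90; data ends on day 365.
print(rfm_summary([10, 40, 55, 90], end_of_period=365))  # (3, 80, 355)
```

A customer with a large x but a t_x far below T (many purchases, then a long silence) is exactly the pattern the model reads as a probable dropout.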
The Pareto/NBD Model: A Good First Step Toward CLV Modeling
The Pareto/NBD model is perhaps the most well-known and frequently applied
probabilistic model in the non-contractual context. I created the chart below to
illustrate how the model works:

The Pareto/NBD portion is shown on the left side of the chart, in the dashed rectangle. Pareto/NBD focuses only on purchase count and lifetime; it does not address the monetary value component. A few models address monetary value; I've chosen the Gamma-Gamma extension to the Pareto/NBD model (as seen in the chart above).
The Pareto/NBD model makes the following assumptions regarding the customer population:
While a customer is active, purchase count follows a Poisson distribution with rate λ. In other words, the timing of individual purchases is random, but the rate (in counts per unit time) is constant. In turn, this implies that the inter-purchase time at the customer level follows an exponential distribution.
Lifetime follows an exponential distribution with rate μ. The expected value of this distribution, 1/μ, is the customer's expected lifetime.
The latent parameters λ and μ are constrained by two prior gamma distributions representing our belief of how these latent parameters are distributed across the population of customers. These two gamma distributions have parameters (r, α) for the purchase count and (s, β) for the lifetime. The goal is to estimate these four parameters; from them, all actionable metrics can be derived.
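One way to check these assumptions is to simulate them directly: gamma-distributed λ and μ, an exponential lifetime, and Poisson purchasing while the customer is active. The sketch below does this and compares the simulated mean count with the standard closed-form Pareto/NBD expectation for a customer observed from time 0, E[X(t)] = rβ/(α(s−1)) · [1 − (β/(β+t))^(s−1)]. The parameter values are hypothetical, both gamma distributions use shape and rate as in (r, α) and (s, β), and this is a sanity check of the generative story, not a fitting procedure.

```python
import random


def simulate_pareto_nbd_counts(r, alpha, s, beta, T, n_customers, seed=0):
    """Simulate each customer's purchase count over (0, T] under Pareto/NBD assumptions."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_customers):
        # Heterogeneity across customers: each one draws its own latent rates.
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        lam = rng.gammavariate(r, 1.0 / alpha)  # purchase rate λ ~ Gamma(r, α)
        mu = rng.gammavariate(s, 1.0 / beta)    # dropout rate μ ~ Gamma(s, β)
        # Unobserved lifetime: exponential with rate μ, i.e. mean 1/μ.
        active_until = min(rng.expovariate(mu), T)
        # While active, purchases form a Poisson process with rate λ,
        # so the gaps between purchases are exponential with rate λ.
        n = 0
        t = rng.expovariate(lam)
        while t <= active_until:
            n += 1
            t += rng.expovariate(lam)
        counts.append(n)
    return counts


def expected_purchases(r, alpha, s, beta, t):
    """Closed-form E[X(t)] for a customer observed from time 0 (requires s != 1)."""
    return (r * beta) / (alpha * (s - 1)) * (1 - (beta / (beta + t)) ** (s - 1))


# Hypothetical population parameters, with time measured in weeks.
params = dict(r=2.0, alpha=10.0, s=2.0, beta=20.0)
sim = simulate_pareto_nbd_counts(**params, T=52.0, n_customers=20000, seed=1)
print(sum(sim) / len(sim))                   # simulated mean, close to the line below
print(expected_purchases(**params, t=52.0))  # about 2.89
```

The demo uses gamma shapes above 1; fitting real data means running the process in reverse, estimating r, α, s and β from each customer's (x, t_x, T).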
In practice, this is how we train a Pareto/NBD model to find these four parameters. Below is a simple chart demonstrating the process:

First, train the model over a training period whose length is at least three times the typical inter-purchase time of your customers. Using customer data and simulations, we found that three times is the minimum; five to ten times is definitely better.
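The rule of thumb above can be turned into a quick calculation. The sketch below takes the median gap between consecutive purchases, pooled across customers, as the "typical inter-purchase time". That choice is an assumption (a mean or a per-segment value would also be reasonable), and the function name and purchase histories are hypothetical.

```python
from statistics import median


def min_training_length(purchase_times_by_customer, multiple=3.0):
    """Training-window length = multiple x typical inter-purchase time.

    purchase_times_by_customer: dict of customer id -> list of purchase times.
    The typical gap is taken here as the median of all per-customer gaps.
    """
    gaps = []
    for times in purchase_times_by_customer.values():
        ordered = sorted(times)
        # Gaps between consecutive purchases of the same customer.
        gaps.extend(later - earlier for earlier, later in zip(ordered, ordered[1:]))
    if not gaps:
        raise ValueError("need at least one customer with two or more purchases")
    return multiple * median(gaps)


history = {  # hypothetical purchase days per customer
    "a": [0, 20, 45, 60],
    "b": [5, 35],
    "c": [12],  # single purchase: contributes no gap
}
print(min_training_length(history))              # 67.5  (3x, the minimum)
print(min_training_length(history, multiple=5))  # 112.5 (5-10x is preferable)
```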
The training period gives you an estimate of the model parameters. You should then compare what the model predicts with what you observed in the training period at the customer level. If the purchase counts agree, the next step is to compare predictions with observations in a validation/holdout period, which the model has not seen. If the model performs well in the validation/holdout period, you can then forecast over a horizon of several months to several years, depending on your business needs.
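The calibration/holdout step can be sketched as follows. For one customer, the split computes the observed summary from the calibration period only (repeat purchases x, time of last purchase t_x, and age T, both times measured from the first purchase) and counts that customer's purchases in the holdout window. The predicted counts would come from whichever fitted model you use, so the numbers below are placeholders. Mean absolute error is just one possible comparison metric and is an assumption, not part of the procedure described above; the function names are hypothetical.

```python
def split_calibration_holdout(purchase_times, calibration_end, holdout_end):
    """Split one customer's purchases at calibration_end.

    Returns ((x, t_x, T) for the calibration period, holdout purchase count).
    Assumes the customer's first purchase falls inside the calibration period.
    """
    times = sorted(purchase_times)
    calibration = [t for t in times if t <= calibration_end]
    holdout = [t for t in times if calibration_end < t <= holdout_end]
    if not calibration:
        raise ValueError("customer has no purchases in the calibration period")
    first = calibration[0]
    summary = (len(calibration) - 1, calibration[-1] - first, calibration_end - first)
    return summary, len(holdout)


def mean_absolute_error(predicted, observed):
    """Average absolute gap between predicted and observed holdout counts."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical customer: calibration ends on day 180, holdout runs to day 365.
print(split_calibration_holdout([3, 30, 70, 200, 260], 180, 365))  # ((2, 67, 177), 2)
# Placeholder model predictions vs. observed holdout counts for two customers.
print(mean_absolute_error([1.5, 0.25], [2, 0]))  # 0.375
```

Comparing aggregate counts as well as per-customer errors is common, since a model can match the total while misranking individual customers.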
The Gamma-Gamma Extension To The Pareto/NBD Model
As mentioned above, the Pareto/NBD model focuses on modeling lifetime and purchase count. Gamma-Gamma, the monetary value extension shown on the right side of the chart, makes a few assumptions:
At the customer level, the transaction/order value varies randomly around each customer's average transaction value. (That, in itself, isn't too controversial.)
The observed mean value is an imperfect estimate of the latent mean transaction value E(M), where M represents the monetary value.
Average transaction value varies across customers, though each customer's average is stationary over time. (This is a big assumption to make.)
The distribution of average values across customers is independent of the transaction process. In other words, monetary value can be modeled separately from the purchase count and lifetime components of the model. This may or may not hold in typical business situations.
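Together, these assumptions lead to a shrinkage estimate. A customer's expected transaction value is a weighted average of the population mean and that customer's observed mean, and the weight on the observed mean grows as more transactions are seen. The sketch below implements the standard Gamma-Gamma conditional expectation, E(M | p, q, γ, x, m̄) = (q − 1)/(px + q − 1) · pγ/(q − 1) + px/(px + q − 1) · m̄, where m̄ is the observed mean over x transactions. The parameter values are hypothetical; in practice p, q and γ are estimated from data.

```python
def expected_average_value(p, q, gamma, x, mean_value):
    """Gamma-Gamma conditional expectation of a customer's latent mean value E(M).

    x          : number of transactions behind mean_value (x = 0 gives the
                 population mean)
    mean_value : the customer's observed average transaction value
    Requires q > 1 so that the population mean p * gamma / (q - 1) is finite.
    """
    if q <= 1:
        raise ValueError("q must be greater than 1")
    population_mean = p * gamma / (q - 1)
    # Weight on the customer's own average grows toward 1 as x increases.
    weight = p * x / (p * x + q - 1)
    return (1 - weight) * population_mean + weight * mean_value


# Hypothetical parameters (population mean = 6 * 15 / 3 = 30).
p, q, gamma = 6.0, 4.0, 15.0
print(expected_average_value(p, q, gamma, x=1, mean_value=100.0))   # about 76.67
print(expected_average_value(p, q, gamma, x=10, mean_value=100.0))  # about 96.67
```

Both customers spent an average of $100, but the one with a single transaction is pulled much harder toward the $30 population mean.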
Tying These Two Models Together: CLV Estimates At The Customer Level
The Pareto/NBD model lets you compute the expected number of purchases in a forecast period at the customer level, and the Gamma-Gamma model lets you assign a value to each of those future purchases. Forecasting CLV for each customer then becomes a simple exercise: multiply the expected values from the two models. This is justified by the Gamma-Gamma assumption that monetary value is independent of the transaction process. It also lets you compare predicted and observed CLV over the holdout period before making any forecasts.
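The combination step is small enough to show directly. The sketch below multiplies each customer's expected purchase count by the expected value per purchase and ranks customers by the result. The per-customer forecasts are hypothetical stand-ins for fitted Pareto/NBD and Gamma-Gamma outputs, and no discounting is applied in this simple form.

```python
def customer_clv(expected_purchases, expected_value_per_purchase):
    """CLV over the forecast period as the product of the two model outputs.

    Relies on the Gamma-Gamma assumption that monetary value is independent
    of the purchase process; no discounting is applied here.
    """
    return expected_purchases * expected_value_per_purchase


# Hypothetical per-customer outputs: (Pareto/NBD expected purchases, Gamma-Gamma E(M)).
forecasts = {"a": (3.2, 45.0), "b": (0.4, 120.0), "c": (1.5, 60.0)}
clv = {cid: customer_clv(n, m) for cid, (n, m) in forecasts.items()}
ranking = sorted(clv, key=clv.get, reverse=True)
print(ranking)  # ['a', 'c', 'b']
```

Customer "b" has the highest value per purchase but the lowest CLV, because the purchase model expects very few future purchases.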
Additional Information
To make these concepts concrete, I have created a public GitHub repository that contains a notebook and a test dataset from an online retailer to supplement my O'Reilly webcast. In the notebook, you will find the steps to train both the Pareto/NBD and Gamma-Gamma models and compute CLV at the customer level.
JEAN-RENE GAUTHIER
I am a former astronomer working at DataScience as a Data Scientist. Oh, and btw, I love hockey :-)
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered
trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective
owners.
