Exponential Families
Generalized Linear Models
1
Why exponential families?
• So far, we’ve been studying fairly concrete models,
capturing phenomena such as coin flips, dice rolls,
drawing balls from urns…
2
Why exponential families?
• Many standard distributions are in the exponential family
(Gaussian, Bernoulli, Dirichlet, Poisson, exponential, …)
• Conjugate priors
3
Exponential family distributions
in a nutshell
• Suppose we want to come up with a new probability distribution.
What’s the simplest way that parameters and data could be mapped to
probabilities?
– How about a linear mapping via a dot product?
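A minimal sketch of the dot-product idea, assuming a finite set of outcomes. A raw dot product can be negative and does not sum to one, so this sketch exponentiates the linear scores and normalizes them. The function name, feature values, and parameter values are all hypothetical.

```python
import math

def dot_product_distribution(theta, features):
    """Turn linear scores (dot products) into probabilities."""
    # One linear score per outcome: theta . features[k]
    scores = [sum(t * f for t, f in zip(theta, row)) for row in features]
    # Exponentiate so every value is positive (subtract the max for stability)
    top = max(scores)
    unnormalized = [math.exp(s - top) for s in scores]
    # Divide by the total so the values sum to one
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical example: three outcomes, each described by two features.
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
theta = [0.5, -0.2]
probs = dot_product_distribution(theta, features)
```

The normalizing total plays the role of the partition function that appears in the formal definition.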
4
Entropy
9
Maximum entropy motivation
10
Statistical mechanics
• The second law of thermodynamics:
“the entropy of the universe tends
towards a maximum”
Rudolf Clausius
12
Learning outcomes
By the end of the lesson, you should be able to:
13
Exponential family distributions
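• A pdf or pmf is said to be in the exponential
family if it can be written in the form:
$$p(x \mid \theta) = \frac{1}{Z(\theta)}\, h(x) \exp\left(\theta^\top \phi(x)\right)$$
where $Z(\theta)$ is the “partition function”
• Or:
$$p(x \mid \theta) = h(x) \exp\left(\theta^\top \phi(x) - A(\theta)\right), \qquad A(\theta) = \log Z(\theta)$$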
18
With normalizing constant
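• Two ways to write it: divide the unnormalized density by the
“partition function” $Z(\theta)$
• Or: subtract the log partition function $A(\theta) = \log Z(\theta)$
inside the exponent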
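As a rough check that the two ways of writing the normalizer agree, the sketch below uses a toy family on a hypothetical finite support with base measure h(x) = 1 and sufficient statistic phi(x) = x. These choices are assumptions made only so that the partition function can be summed exactly.

```python
import math

OUTCOMES = [0, 1, 2, 3]  # hypothetical finite support

def h(x):
    return 1.0  # base measure (assumed constant here)

def phi(x):
    return float(x)  # sufficient statistic (assumed to be x itself)

def partition_function(theta):
    # Z(theta): sum of the unnormalized probabilities
    return sum(h(x) * math.exp(theta * phi(x)) for x in OUTCOMES)

def pmf_partition_form(x, theta):
    # First form: divide by Z(theta)
    return h(x) * math.exp(theta * phi(x)) / partition_function(theta)

def pmf_log_partition_form(x, theta):
    # Second form: subtract A(theta) = log Z(theta) inside the exponent
    log_partition = math.log(partition_function(theta))
    return h(x) * math.exp(theta * phi(x) - log_partition)
```

Both functions return the same probabilities, and either set sums to one over `OUTCOMES`.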
19
Sufficient statistics
• $\phi(x)$ is called sufficient because the likelihood
depends on the data only through $\sum_i \phi(x_i)$
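To illustrate sufficiency, here is a small sketch using the Bernoulli distribution, where the sufficient statistic of a data set is the count of ones (for a fixed number of observations). The two data sets are hypothetical; they contain the same number of ones in a different order, so their likelihoods should coincide for every parameter value.

```python
import math

def bernoulli_log_likelihood(data, mu):
    # Log-likelihood computed from the raw observations
    return sum(x * math.log(mu) + (1 - x) * math.log(1 - mu) for x in data)

def log_likelihood_from_statistics(n, total, mu):
    # Same quantity computed only from the count n and the sum of the data
    return total * math.log(mu) + (n - total) * math.log(1 - mu)

data_a = [1, 1, 0, 0, 1, 0]  # hypothetical data
data_b = [0, 0, 0, 1, 1, 1]  # same sum, different order
```

Once the sum and the count are known, the individual observations add nothing further to the likelihood.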
21
Summary / notation
22
Writing distributions in
exponential family form: Bernoulli
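One way the Bernoulli rewrite can go, sketched under the assumption that the mean parameter is called $\mu$ and that the form $h(x)\exp(\theta^\top \phi(x) - A(\theta))$ is the target. The symbol names are assumptions, not taken from the slides.

```latex
\begin{align*}
\mathrm{Ber}(x \mid \mu)
  &= \mu^{x} (1-\mu)^{1-x}, \qquad x \in \{0, 1\} \\
  &= \exp\left[ x \log \mu + (1-x) \log(1-\mu) \right] \\
  &= \exp\left[ \theta^\top \phi(x) \right],
  \qquad
  \phi(x) = \begin{pmatrix} x \\ 1-x \end{pmatrix},
  \quad
  \theta = \begin{pmatrix} \log \mu \\ \log(1-\mu) \end{pmatrix}
\end{align*}
```

In this over-complete version, $h(x) = 1$ and $A(\theta) = 0$.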
23
Minimal exponential families
• The sufficient statistics are in some sense redundant,
as we can compute x and 1 – x as linear functions of
each other.
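A sketch of one minimal alternative, assuming the single sufficient statistic phi(x) = x, the natural parameter theta = log(mu / (1 - mu)), and the log partition function A(theta) = log(1 + exp(theta)). The function names are hypothetical.

```python
import math

def natural_parameter(mu):
    # Log-odds of the mean parameter
    return math.log(mu / (1.0 - mu))

def log_partition(theta):
    # A(theta) = log(1 + exp(theta))
    return math.log1p(math.exp(theta))

def bernoulli_minimal(x, mu):
    # p(x) = exp(theta * x - A(theta)) with a single statistic phi(x) = x
    theta = natural_parameter(mu)
    return math.exp(theta * x - log_partition(theta))
```

For example, `bernoulli_minimal(1, 0.3)` recovers the success probability 0.3 up to floating-point error, and `bernoulli_minimal(0, 0.3)` recovers 0.7.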
27
Definitions
• Natural exponential family:
• Canonical form:
34
Gaussian distribution
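A sketch of the univariate Gaussian in exponential family form, assuming the parameterization theta = (mu / sigma^2, -1 / (2 sigma^2)), phi(x) = (x, x^2), and h(x) = 1 / sqrt(2 pi). The function names are hypothetical; the second function is the usual density, included only for comparison.

```python
import math

def gaussian_exponential_family(x, mu, sigma):
    # Natural parameters
    theta1 = mu / sigma**2
    theta2 = -1.0 / (2.0 * sigma**2)
    # Log partition function A(theta) = mu^2 / (2 sigma^2) + log(sigma)
    log_partition = -theta1**2 / (4.0 * theta2) - 0.5 * math.log(-2.0 * theta2)
    base_measure = 1.0 / math.sqrt(2.0 * math.pi)
    # h(x) * exp(theta . phi(x) - A(theta)) with phi(x) = (x, x^2)
    return base_measure * math.exp(theta1 * x + theta2 * x**2 - log_partition)

def gaussian_density(x, mu, sigma):
    # Standard form of the normal density, for comparison
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

The two functions should agree up to floating-point error for any x, mu, and sigma > 0.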
35
Poisson distribution
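A sketch of the Poisson case, assuming the rate is called $\lambda$ and the target form is $h(x)\exp(\theta\,\phi(x) - A(\theta))$. The symbol names are assumptions.

```latex
\begin{align*}
\mathrm{Poi}(x \mid \lambda)
  &= \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x \in \{0, 1, 2, \dots\} \\
  &= \frac{1}{x!} \exp\left[ x \log \lambda - \lambda \right] \\
\text{so}\quad
  \theta &= \log \lambda, \quad
  \phi(x) = x, \quad
  h(x) = \frac{1}{x!}, \quad
  A(\theta) = e^{\theta}
\end{align*}
```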
39
40
Gamma distribution
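A sketch of the Gamma case, assuming the shape/rate parameterization with shape $a$ and rate $b$. Other parameterizations (for example shape/scale) would change the natural parameters accordingly.

```latex
\begin{align*}
\mathrm{Ga}(x \mid a, b)
  &= \frac{b^{a}}{\Gamma(a)}\, x^{a-1} e^{-b x}, \qquad x > 0 \\
  &= \exp\left[ (a-1) \log x - b x - \left( \log \Gamma(a) - a \log b \right) \right] \\
\text{so}\quad
  \theta &= \begin{pmatrix} a - 1 \\ -b \end{pmatrix}, \quad
  \phi(x) = \begin{pmatrix} \log x \\ x \end{pmatrix}, \quad
  h(x) = 1, \quad
  A(\theta) = \log \Gamma(\theta_1 + 1) - (\theta_1 + 1) \log(-\theta_2)
\end{align*}
```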
41
Bayesian inference
42
Gradient of the log partition function is the
expected sufficient statistics
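A numerical sketch of this property for the Bernoulli distribution in minimal form, assuming A(theta) = log(1 + exp(theta)) and phi(x) = x. The derivative of A is approximated by a central finite difference and compared with the expected value of x. The function names and step size are hypothetical choices.

```python
import math

def log_partition(theta):
    # A(theta) for the Bernoulli distribution in minimal form
    return math.log1p(math.exp(theta))

def expected_sufficient_statistic(theta):
    # E[x] = P(x = 1) = exp(theta - A(theta))
    return math.exp(theta - log_partition(theta))

def numerical_derivative(f, theta, eps=1e-6):
    # Central finite-difference approximation of df/dtheta
    return (f(theta + eps) - f(theta - eps)) / (2.0 * eps)
```

Calling `numerical_derivative(log_partition, theta)` should return a value close to `expected_sufficient_statistic(theta)` for any theta.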
46
MLE for exponential families
• The likelihood function is:
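Under the assumed notation $p(x \mid \theta) = h(x)\exp(\theta^\top \phi(x) - A(\theta))$ and $N$ independent observations $x_1, \dots, x_N$, the likelihood and log-likelihood can be sketched as follows. The symbol names are assumptions.

```latex
\begin{align*}
p(\mathcal{D} \mid \theta)
  &= \left[ \prod_{i=1}^{N} h(x_i) \right]
     \exp\left( \theta^\top \sum_{i=1}^{N} \phi(x_i) - N A(\theta) \right) \\
\log p(\mathcal{D} \mid \theta)
  &= \sum_{i=1}^{N} \log h(x_i)
     + \theta^\top \sum_{i=1}^{N} \phi(x_i) - N A(\theta)
\end{align*}
```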
51
MLE for exponential families
• Take the derivative and set to zero
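A sketch of this step, assuming the log-likelihood $\log p(\mathcal{D} \mid \theta) = \sum_i \log h(x_i) + \theta^\top \sum_i \phi(x_i) - N A(\theta)$ and the fact that $\nabla_\theta A(\theta) = \mathbb{E}_\theta[\phi(x)]$. The symbol names are assumptions.

```latex
\begin{align*}
\nabla_\theta \log p(\mathcal{D} \mid \theta)
  &= \sum_{i=1}^{N} \phi(x_i) - N \nabla_\theta A(\theta) = 0 \\
\Longrightarrow \quad
\mathbb{E}_{\hat{\theta}}[\phi(x)]
  &= \nabla_\theta A(\hat{\theta})
   = \frac{1}{N} \sum_{i=1}^{N} \phi(x_i)
\end{align*}
```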
55
MLE for exponential families
• At the MLE, the expected sufficient statistics under
the model match the average observed statistics
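A small sketch of moment matching for the Poisson distribution, assuming theta = log(lambda), phi(x) = x, and A(theta) = exp(theta), so that the model's expected statistic is exp(theta). The counts below are hypothetical data.

```python
import math

counts = [3, 0, 2, 5, 1, 4]  # hypothetical observations

def poisson_mle_natural_parameter(data):
    # Moment matching: exp(theta_hat) must equal the sample mean
    mean = sum(data) / len(data)
    return math.log(mean)

def log_likelihood_gradient(theta, data):
    # d/dtheta of [theta * sum(x) - N * exp(theta)]; the h(x) term drops out
    return sum(data) - len(data) * math.exp(theta)

theta_hat = poisson_mle_natural_parameter(counts)
```

At `theta_hat` the gradient is zero up to floating-point error, and `math.exp(theta_hat)` equals the sample mean of the counts.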
60
Mean parameterization
• Many distributions are traditionally parameterized by
their mean
61
Generalized linear models
• Linear regression:
• Logistic regression:
• What about other types of data?
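For reference, linear and logistic regression are commonly written as below, assuming a weight vector $w$, inputs $x$, noise variance $\sigma^2$, and the logistic sigmoid $\mathrm{sigm}$. This notation is an assumption rather than a reconstruction of the slide.

```latex
\begin{align*}
\text{Linear regression:} \quad
  & y \mid x \sim \mathcal{N}\left( w^\top x,\ \sigma^2 \right) \\
\text{Logistic regression:} \quad
  & y \mid x \sim \mathrm{Ber}\left( \mathrm{sigm}(w^\top x) \right),
  \qquad \mathrm{sigm}(a) = \frac{1}{1 + e^{-a}}
\end{align*}
```

In both cases a linear function of the inputs sets the mean of an exponential family distribution over the response.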
62
Generalized linear models
Linear predictor
Mean parameter
63
Link functions
64
Example: Poisson regression
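A minimal sketch of Poisson regression, assuming the canonical log link, so that the mean count is the exponential of the linear predictor. The function names are hypothetical.

```python
import math

def linear_predictor(weights, features):
    # eta = w . x
    return sum(w * f for w, f in zip(weights, features))

def poisson_mean(weights, features):
    # Inverse of the log link: mu = exp(eta), always positive
    return math.exp(linear_predictor(weights, features))

def poisson_log_likelihood(weights, X, y):
    # Sum over observations of: y * eta - exp(eta) - log(y!)
    total = 0.0
    for features, count in zip(X, y):
        eta = linear_predictor(weights, features)
        total += count * eta - math.exp(eta) - math.lgamma(count + 1)
    return total
```

The log link keeps the predicted mean positive for any real-valued weights, which is one reason it suits count data.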
65
Estimation of GLMs
• MLE or MAP estimation:
– Second-order optimization methods are common:
Newton-Raphson /
iteratively reweighted least squares (IRLS)
• Bayesian inference:
– MCMC, or Laplace approximation
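As an illustration of the second-order approach, here is a sketch of Newton-Raphson for Poisson regression with one input, an intercept, and the log link. With this canonical link each Newton step coincides with an IRLS step. The function name, the fixed iteration count, and the lack of step-size control are simplifying assumptions, so this is not a robust fitting routine.

```python
import math

def fit_poisson_regression(xs, ys, iterations=25):
    """Newton-Raphson for log E[y] = b0 + b1 * x."""
    # Start from the intercept-only fit
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            # Gradient of the log-likelihood: X^T (y - mu)
            g0 += y - mu
            g1 += (y - mu) * x
            # Negative Hessian: X^T W X with weights W = diag(mu)
            h00 += mu
            h01 += mu * x
            h11 += mu * x * x
        # Solve the 2x2 system (X^T W X) delta = gradient in closed form
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data in which the counts double with each unit of x
b0, b1 = fit_poisson_regression([0, 1, 2, 3], [1, 2, 4, 8])
```

On this toy data the fitted mean reproduces the counts exactly, so the estimates approach b0 = 0 and b1 = log 2.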
66
GLM example: Think-pair-share
• You are a data scientist working on a competitor to
Google Maps. Your boss asks you to design a GLM to
predict the number of cars that pass each of various
stretches of the 5 freeway on any given hour on any
given day in the next 12 months.
67