Approximate Inference
Sargur Srihari
srihari@cedar.buffalo.edu
Machine Learning
Srihari
Plan of Discussion
For observed data X and latent variables Z, the posterior requires an intractable normalizer:

p(Z | X, θ) = p(Z, X | θ) / Σ_Z p(Z, X | θ)

Gaussian mixture example, with component means μ_{1:K} and assignments z_{1:n}:

p(μ_{1:K}, z_{1:n}, x_{1:n}) = ∏_{k=1}^{K} p(μ_k) ∏_{i=1}^{n} p(z_i) p(x_i | z_i, μ_{1:K})
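To see why the normalizer is the hard part, a minimal numpy sketch (toy means and data, not from the slides) can enumerate the K^n assignment configurations explicitly. Note that with the means fixed, as here, the sum happens to factorize across data points; once μ_{1:K} is unknown and shared across all observations, the K^n-term sum no longer factorizes, which is what makes the posterior intractable.

```python
import numpy as np
from itertools import product

# Toy setup (hypothetical values): K=2 components with known means,
# unit variance, and uniform assignment prior p(z_i) = 1/K.
mu = np.array([-2.0, 2.0])       # component means
x = np.array([-1.9, 2.1, 1.8])   # n=3 observations

def log_gauss(xi, m):
    return -0.5 * np.log(2 * np.pi) - 0.5 * (xi - m) ** 2

K, n = len(mu), len(x)
evidence = 0.0
for z in product(range(K), repeat=n):      # K^n terms in the normalizer
    log_joint = n * np.log(1.0 / K)        # prod_i p(z_i)
    log_joint += sum(log_gauss(x[i], mu[z[i]]) for i in range(n))
    evidence += np.exp(log_joint)          # sum_z p(z_{1:n}, x_{1:n})

print(evidence)
```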
Complete-data log likelihood:

ln p(X, Z | μ, Σ, π) = Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk [ ln π_k + ln N(x_n | μ_k, Σ_k) ]

Expectation:

E_Z[ ln p(X, Z | μ, Σ, π) ] = Σ_{n=1}^{N} Σ_{k=1}^{K} γ(z_nk) [ ln π_k + ln N(x_n | μ_k, Σ_k) ]
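A minimal numpy sketch of these two quantities, with toy parameter values (not from the slides): compute the responsibilities γ(z_nk) by normalizing each row of ln π_k + ln N(x_n | μ_k, Σ_k), then take the weighted sum.

```python
import numpy as np

# Toy mixture (hypothetical values): N=3 one-dimensional points, K=2 components
X = np.array([0.0, 1.0, 5.0])
pi = np.array([0.5, 0.5])            # mixing coefficients pi_k
mu = np.array([0.0, 5.0])            # means mu_k
var = np.array([1.0, 1.0])           # variances sigma_k^2

# ln pi_k + ln N(x_n | mu_k, sigma_k^2), shape (N, K)
logp = (np.log(pi)
        - 0.5 * np.log(2 * np.pi * var)
        - 0.5 * (X[:, None] - mu[None, :]) ** 2 / var)

# responsibilities gamma(z_nk): normalize each row (log-sum-exp for stability)
gamma = np.exp(logp - logp.max(axis=1, keepdims=True))
gamma /= gamma.sum(axis=1, keepdims=True)

# E_Z[ln p(X, Z | mu, Sigma, pi)] = sum_{n,k} gamma(z_nk)[ln pi_k + ln N(...)]
expected_cll = np.sum(gamma * logp)
print(gamma.round(3), expected_cll)
```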
Types of Approximations
1. Stochastic
Markov chain Monte Carlo
Have allowed the use of Bayesian methods across many domains
Computationally demanding
They can generate exact results (given unlimited compute)
2. Deterministic
Variational inference: analytical approximations that scale well, but can never generate exact results
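As a concrete instance of the stochastic family, here is a minimal Metropolis sampler; the target (a standard normal) and the random-walk proposal are illustrative choices, not taken from the slides.

```python
import numpy as np

# Minimal Metropolis sampler: only an unnormalized log density is needed,
# which is exactly the situation when the posterior normalizer is intractable.
rng = np.random.default_rng(0)

def log_target(z):
    return -0.5 * z ** 2            # unnormalized log of N(0, 1)

z, samples = 0.0, []
for _ in range(20000):
    z_prop = z + rng.normal(scale=1.0)      # Gaussian random-walk proposal
    # accept with probability min(1, target(z_prop) / target(z))
    if np.log(rng.uniform()) < log_target(z_prop) - log_target(z):
        z = z_prop
    samples.append(z)               # keep the current state either way

samples = np.array(samples[5000:])  # discard burn-in
print(samples.mean(), samples.var())  # approximately 0 and 1
```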
Role of KL Divergence
Measure closeness of distributions q(Z)
and p(Z|X) using KL divergence
KL(q ‖ p) = E_q[ ln ( q(Z) / p(Z|X) ) ]
          = ∫ q(Z) ln [ q(Z) / p(Z|X) ] dZ
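For discrete distributions the same quantity is a sum rather than an integral; a small numpy check with toy probabilities (not from the slides) shows the key properties: non-negativity, zero only when the distributions coincide, and asymmetry.

```python
import numpy as np

# KL(q||p) = sum_Z q(Z) [ln q(Z) - ln p(Z)] for discrete distributions
def kl(q, p):
    q, p = np.asarray(q, float), np.asarray(p, float)
    return float(np.sum(q * (np.log(q) - np.log(p))))

p_post = np.array([0.7, 0.2, 0.1])   # stand-in for the posterior p(Z|X)
q      = np.array([0.6, 0.3, 0.1])   # proposed approximation q(Z)

print(kl(q, p_post))                 # > 0 since q != p_post
print(kl(p_post, p_post))            # 0.0: equality iff q == p
```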
Log Marginal Probability

ln p(X) = L(q) + KL(q ‖ p)

where

L(q) = ∫ q(Z) ln [ p(X, Z) / q(Z) ] dZ     (the functional we wish to maximize)

and

KL(q ‖ p) = − ∫ q(Z) ln [ p(Z|X) / q(Z) ] dZ     (Kullback-Leibler divergence between the proposed q and p, the desired posterior of interest, to be minimized)

Also applicable to discrete distributions by replacing integrations with summations
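The decomposition can be verified numerically on a discrete toy model, using sums in place of the integrals; all numbers below are hypothetical.

```python
import numpy as np

# Numerical check of ln p(X) = L(q) + KL(q||p) on a discrete toy model.
# joint[k] = p(X = x0, Z = k) for one fixed observation x0 and K = 4 states.
joint = np.array([0.3, 0.1, 0.05, 0.15])
pX = joint.sum()                        # p(X) = sum_Z p(X, Z)
post = joint / pX                       # p(Z | X)

q = np.array([0.4, 0.2, 0.1, 0.3])      # any valid q(Z)
L = np.sum(q * np.log(joint / q))       # lower bound L(q)
KL = np.sum(q * np.log(q / post))       # KL(q||p) >= 0

print(L + KL, np.log(pX))               # the two are identical
```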
Observations on optimization
Plan:
We seek the distribution q(Z) for which L(q) is largest
Since the true posterior is intractable, we consider a restricted family for q(Z)
We then seek the member of this family for which the KL divergence is minimized
Variational Inference
If we allow a fully flexible family of distributions q, the optimum of the lower bound is the exact posterior; in practice we restrict q to a parametric family, or to factorized groups of variables, and optimize the bound within that restricted family.
Leonhard Euler
Swiss
Mathematician
1707-1783
Functional Derivative
How a functional changes in response to small changes in its input function
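As a worked instance (a standard textbook example, not from the slides), take the negative-entropy functional and find its stationary point under a normalization constraint:

```latex
F[q] = \int q(z)\,\ln q(z)\,dz,
\qquad
\frac{\delta F}{\delta q(z)} = \ln q(z) + 1 .

% Stationarity of F[q] - \lambda \left( \int q(z)\,dz - 1 \right):
\ln q(z) + 1 - \lambda = 0
\;\Rightarrow\;
q(z) = e^{\lambda - 1} = \text{const}.
```

So the minimizer of F (the maximum-entropy distribution) on a bounded domain is the uniform distribution; the same calculus of variations underlies the factorized-q updates below.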
Variational methods
Nothing intrinsically approximate about them
But they naturally lend themselves to approximation
By restricting the range of functions, e.g.,
Quadratic forms
Linear combinations of fixed basis functions
Factorization assumptions
[Figure: Laplace approximation compared with the original distribution, shown both directly and as negative logarithms]
Factorized (mean field) approximation:

q(Z) = ∏_{i=1}^{M} q_i(Z_i)

Substituting into the lower bound:

L(q) = ∫ ∏_i q_i [ ln p(X, Z) − Σ_i ln q_i ] dZ
Bivariate Gaussian example, with precision matrix

Λ = [ Λ_11  Λ_12
      Λ_21  Λ_22 ]
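A small numpy sketch of the coordinate-ascent updates for this bivariate example (μ and Λ are toy values): each factor update sets the variational mean of one coordinate given the other, and the mean-field variance comes out as 1/Λ_ii.

```python
import numpy as np

# Mean-field q(z1)q(z2) for a correlated 2-D Gaussian N(mu, Lambda^{-1}).
mu = np.array([1.0, -1.0])
Lam = np.array([[2.0, 0.8],
                [0.8, 2.0]])          # precision matrix

m = np.zeros(2)                        # variational means, initialized at 0
for _ in range(50):                    # coordinate ascent sweeps
    m[0] = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m[1] - mu[1])
    m[1] = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m[0] - mu[0])

print(m)                               # converges to the true mean mu
print(1 / Lam[0, 0], 1 / Lam[1, 1])    # mean-field variances: too compact
# (the true marginal variances, diag of Lam^{-1}, are larger)
```

This reproduces the behavior noted below: the mean is captured correctly, while the marginal variances are under-estimated.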
Minimization based on
KL divergence KL(q||p)
Mean correctly captured
But variance under-estimated: q too compact
Minimization based on
reverse KL divergence KL(p||q)
Form used in
Expectation Propagation
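The contrast can be shown numerically with a toy grid search (my own illustrative setup, not the slides' figure): fit a single fixed-width Gaussian q to a bimodal p by minimizing each divergence over the mean of q. KL(q||p) is zero-forcing and locks onto one mode; KL(p||q) is zero-avoiding and averages over both.

```python
import numpy as np

z = np.linspace(-10, 10, 2001)
dz = z[1] - z[0]

def gauss(z, m, s):
    g = np.exp(-0.5 * ((z - m) / s) ** 2)
    return g / (g.sum() * dz)          # normalized on the grid

p = 0.5 * gauss(z, -3, 1) + 0.5 * gauss(z, 3, 1)   # bimodal target

def kl(a, b):
    return np.sum(a * np.log((a + 1e-300) / (b + 1e-300))) * dz

means = np.linspace(-5, 5, 101)        # candidate means for q, width fixed
best_qp = min(means, key=lambda m: kl(gauss(z, m, 1.0), p))  # KL(q||p)
best_pq = min(means, key=lambda m: kl(p, gauss(z, m, 1.0)))  # KL(p||q)

print(best_qp)   # near one mode (+-3): zero-forcing
print(best_pq)   # near 0, between the modes: zero-avoiding
```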
The alpha family of divergences:

D_α(p ‖ q) = ( 4 / (1 − α²) ) ( 1 − ∫ p(x)^{(1+α)/2} q(x)^{(1−α)/2} dx )

KL(p||q) corresponds to α → 1; KL(q||p) corresponds to α → −1
For all α, D_α(p ‖ q) ≥ 0 with equality iff p(x) = q(x)
For α = 0:

D_0(p ‖ q) = 2 ∫ ( p(x)^{1/2} − q(x)^{1/2} )² dx

which is linearly related to the squared Hellinger distance
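These identities are easy to check numerically on discrete distributions (toy probabilities below): α = 0 matches the Hellinger-style expression exactly, and α near ±1 approaches the two KL divergences.

```python
import numpy as np

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])

def D_alpha(p, q, a):
    # alpha divergence, with sums in place of the integral
    return (4.0 / (1 - a ** 2)) * (1 - np.sum(p ** ((1 + a) / 2) * q ** ((1 - a) / 2)))

kl_pq = np.sum(p * np.log(p / q))       # KL(p||q): the alpha -> 1 limit
kl_qp = np.sum(q * np.log(q / p))       # KL(q||p): the alpha -> -1 limit
hell = 2 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

print(D_alpha(p, q, 0.0), hell)          # equal
print(D_alpha(p, q, 0.999), kl_pq)       # close
print(D_alpha(p, q, -0.999), kl_qp)      # close
```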
Prior over mixing coefficients (Dirichlet):

p(π) = Dir(π | α_0) = C(α_0) ∏_{k=1}^{K} π_k^{α_0 − 1}

Prior over means and precisions (Gaussian-Wishart):

p(μ, Λ) = ∏_{k=1}^{K} N( μ_k | m_0, (β_0 Λ_k)^{−1} ) W( Λ_k | W_0, ν_0 )
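A numpy-only sketch of drawing one parameter set from these priors; the hyperparameter values are hypothetical, and the Wishart draw uses the sum-of-outer-products construction (valid for integer degrees of freedom) to avoid extra dependencies.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 3, 2
alpha0 = 0.1                          # small alpha0 favours sparse mixtures
m0, beta0 = np.zeros(D), 1.0
W0, nu0 = np.eye(D), 2                # Wishart scale and (integer) dof

pi = rng.dirichlet([alpha0] * K)      # p(pi) = Dir(pi | alpha0)

mus, Lams = [], []
for _ in range(K):
    A = rng.multivariate_normal(np.zeros(D), W0, size=nu0)
    Lam_k = A.T @ A                   # Wishart(W0, nu0) sample
    mu_k = rng.multivariate_normal(m0, np.linalg.inv(beta0 * Lam_k))
    mus.append(mu_k)
    Lams.append(Lam_k)

print(pi.round(3))                    # sums to 1; often dominated by one k
```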
Variational Distribution
Factorizes between the latent variables and the parameters (mixing coefficients, means, and precisions):

q(Z, π, μ, Λ) = q(Z) q(π, μ, Λ)
K=6 components
After convergence only two components have non-negligible mixing coefficients
Density of red ink indicates the mixing coefficients