Announcements
• You have until Tuesday to submit Homework 2
– You can submit now if you want to
The art of latent variable modeling: Box's loop
[Diagram: Box's loop. Complicated, noisy, high-dimensional data → latent variable model → low-dimensional, semantically meaningful representations. Explore, predict, evaluate, understand, and iterate. The (algorithm, model) pair is carefully co-designed for tractability.]
Goals of evaluation
• Usefulness at a task
Evaluation of supervised models
[Figure: data matrix X (N data points × D features) and labels Y (a class label, or a regression output), split into train and test sets.]
• Cross-validate: predict the test labels, measure accuracy or loss
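These steps can be sketched in code. The following is a minimal illustration (not from the slides) using a toy 1-D dataset and a hypothetical nearest-centroid classifier; the data and k = 5 folds are illustrative choices.

```python
import random

random.seed(0)

# Toy 1-D dataset: class 0 clustered near -1, class 1 near +1.
data = [(random.gauss(-1, 0.5), 0) for _ in range(50)] + \
       [(random.gauss(+1, 0.5), 1) for _ in range(50)]
random.shuffle(data)

def nearest_centroid_accuracy(train, test):
    """Fit per-class centroids on train; report accuracy on held-out test."""
    centroid = {c: sum(x for x, y in train if y == c) /
                   sum(1 for _, y in train if y == c) for c in (0, 1)}
    correct = sum(1 for x, y in test
                  if min(centroid, key=lambda c: abs(x - centroid[c])) == y)
    return correct / len(test)

# 5-fold cross-validation: average held-out accuracy over the folds.
k = 5
folds = [data[i::k] for i in range(k)]
scores = [nearest_centroid_accuracy(
              [d for j, fold in enumerate(folds) if j != i for d in fold],
              folds[i])
          for i in range(k)]
cv_accuracy = sum(scores) / k
```

Each data point is held out exactly once, so `cv_accuracy` estimates generalization performance rather than training fit.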
Evaluation of unsupervised models
[Figure: data X (D features) and latent variables Z, with train and test splits; the predictions are marked "?"]
No labels available.
How to evaluate?
Learning outcomes
By the end of the lesson, you should be able to:
Evaluation of unsupervised models
• Quantitative evaluation
– Measurable, quantifiable performance metrics
• Qualitative evaluation
– Exploratory data analysis (EDA) using the model
– Human evaluation, user studies,…
Evaluation of unsupervised models
• Intrinsic evaluation
– Measure inherently good properties of the model
• Fit to the data, interpretability,…
• Extrinsic evaluation
– Study usefulness of model for external tasks
• Classification, retrieval, part-of-speech tagging,…
Extrinsic evaluation:
What will you use your model for?
• If you have a downstream task in mind, you should probably evaluate based on it!
• Goal: retrieve relevant documents given a query
• Query likelihood model:
– Each document has a language model
– Score documents by the likelihood of generating the query
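A minimal sketch of query-likelihood scoring, assuming unigram document language models with additive smoothing; the smoothing constant, documents, and query here are all illustrative, not from the slides.

```python
import math
from collections import Counter

def query_log_likelihood(query_terms, doc_terms, vocab_size, alpha=0.1):
    """Score a document by log p(query | smoothed unigram doc LM)."""
    counts = Counter(doc_terms)
    total = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Additive smoothing so unseen query terms get nonzero probability.
        p = (counts[term] + alpha) / (total + alpha * vocab_size)
        score += math.log(p)
    return score

docs = {
    "d1": "topic models learn latent topics from text".split(),
    "d2": "neural networks learn features from images".split(),
}
vocab = {w for terms in docs.values() for w in terms}
query = "latent topics".split()
ranked = sorted(docs, reverse=True,
                key=lambda d: query_log_likelihood(query, docs[d], len(vocab)))
```

Documents are ranked by how likely their language model is to generate the query, which is the extrinsic (retrieval) use of the fitted per-document models.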
Prediction of held-out data
[Figure: learn parameters Θ and latent variables Z from the training data X; the held-out test portion is marked "?"]
• Compute the log-probability of held-out data under the posterior predictive distribution:
  log p(X_test | X_train) = log ∫ p(X_test | Θ) p(Θ | X_train) dΘ
Computing log-likelihood from posterior samples
Computing the likelihood with latent variables
• We may need to marginalize out the latent variables to compute the likelihood:
  p(x | Θ) = Σ_z p(x, z | Θ)
Computing the likelihood with latent variables
• Mixture model: p(x) = Σ_k p(z = k) p(x | z = k)
• Topic model: p(w_1, …, w_N | α, Φ) = ∫ p(θ | α) Π_i Σ_k θ_k φ_k,w_i dθ (the integral over θ is intractable to compute exactly)
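For the mixture case, the marginalization can be computed stably with the log-sum-exp trick. A small sketch for a 1-D Gaussian mixture; the two-component example parameters are hypothetical.

```python
import math

def log_mixture_likelihood(x, weights, means, sigmas):
    """log p(x) = log sum_k pi_k N(x | mu_k, sigma_k^2):
    marginalize the discrete latent component assignment z."""
    log_terms = []
    for pi, mu, s in zip(weights, means, sigmas):
        log_norm = -0.5 * math.log(2 * math.pi * s * s)
        log_terms.append(math.log(pi) + log_norm - (x - mu) ** 2 / (2 * s * s))
    m = max(log_terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

# Two equally weighted unit-variance components at -1 and +1.
ll = log_mixture_likelihood(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

Subtracting the maximum log term before exponentiating avoids underflow when the per-component likelihoods are tiny, which matters once x is high-dimensional.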
Importance sampling to estimate the likelihood
Importance sampling
• Can be used to estimate the ratio of partition functions between p(x) = f_p(x)/Z_p and q(x) = f_q(x)/Z_q:
  E_{x ~ q}[ f_p(x) / f_q(x) ] = Z_p / Z_q
Importance sampling to estimate the likelihood
• Target distribution: the posterior p(z | x) = p(x, z) / p(x), with unnormalized density f(z) = p(x, z) and partition function Z = p(x)
• Proposal distribution: q(z)
  – Normalized distribution we can draw samples from
• Then p(x) ≈ (1/S) Σ_s p(x, z^(s)) / q(z^(s)), with z^(s) ~ q
How to choose a proposal distribution
• If we use the prior, we recover the simple Monte Carlo algorithm. Importance weights:
  w^(s) = p(x, z^(s)) / p(z^(s)) = p(x | z^(s))
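A toy illustration of this simple Monte Carlo estimator, for a model where z ~ N(0, 1) and x | z ~ N(z, 1), so the true marginal p(x) = N(x; 0, 2) is known in closed form; the model and sample size are illustrative, not from the slides.

```python
import math
import random

random.seed(0)

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy model: z ~ N(0, 1), x | z ~ N(z, 1).  True marginal: x ~ N(0, 2).
x_obs = 1.0
S = 200_000
# Simple Monte Carlo = importance sampling with the prior as proposal;
# the weights are w^(s) = p(x, z^(s)) / p(z^(s)) = p(x | z^(s)).
est = sum(normal_pdf(x_obs, random.gauss(0, 1), 1.0) for _ in range(S)) / S
true = normal_pdf(x_obs, 0.0, 2.0)
```

The estimator is unbiased, but when the prior and posterior disagree badly most weights are near zero and the variance explodes, which motivates better proposals and annealing.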
Annealed importance sampling
(Neal, 2001)
Annealed importance sampling
Technical details
• Need:
  – Target distribution f_0 (may be unnormalized)
  – Initial distribution f_n, easy to sample from
  – A sequence of intermediate distributions, e.g. geometric averages f_j(x) = f_0(x)^(β_j) f_n(x)^(1 − β_j)
Annealed importance sampling
Technical details
• Since the final state x of an annealing run, paired with its accumulated weight w, is an importance sample from the target f_0, averaging the weights over many runs estimates the ratio of partition functions.
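A small AIS sketch in the spirit of Neal (2001), estimating the ratio of normalizing constants between two 1-D Gaussians via a geometric path of intermediate distributions; the distributions, schedule, step size, and run counts are illustrative choices, not from the slides.

```python
import math
import random

random.seed(1)

def log_f_init(x):       # unnormalized N(0, 1); Z_init = sqrt(2*pi)
    return -0.5 * x * x

def log_f_target(x):     # unnormalized N(1, 0.5^2); Z_target = 0.5*sqrt(2*pi)
    return -(x - 1.0) ** 2 / (2 * 0.25)

def ais_run(K=100):
    """One annealing run along the geometric path
    f_b(x) = f_init(x)**(1-b) * f_target(x)**b, with b going 0 -> 1.
    Returns a weight w with E[w] = Z_target / Z_init (= 0.5 here)."""
    x = random.gauss(0.0, 1.0)           # exact sample from the initial dist.
    log_w = 0.0
    for k in range(1, K + 1):
        b = k / K
        # Weight update: ratio of consecutive intermediate densities at x.
        log_w += (1.0 / K) * (log_f_target(x) - log_f_init(x))
        # One Metropolis step leaving the intermediate f_b invariant.
        log_f = lambda y: (1 - b) * log_f_init(y) + b * log_f_target(y)
        prop = x + random.gauss(0.0, 0.5)
        if math.log(random.random()) < log_f(prop) - log_f(x):
            x = prop
    return math.exp(log_w)

weights = [ais_run() for _ in range(2000)]
ratio_estimate = sum(weights) / len(weights)   # should be near 0.5
```

Because each intermediate distribution is close to its neighbor, the weights stay well-behaved even when the initial and target distributions are far apart, which is exactly the failure mode of plain importance sampling.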
Example: modeling influence in citation networks
Foulds and Smyth (2013), EMNLP
Topical influence regression
• Latent variables for document influence and citation edge influence
Model Validation Using Metadata:
• Number of times the citation occurs in the text
• Self-citations
Example: Using labeled data
• Model: a topic model, with seeded topics and hierarchical structure enforced
Variation of information
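Variation of information compares two clusterings via VI(A, B) = H(A) + H(B) − 2 I(A; B), and is zero exactly when the clusterings agree up to relabeling. A minimal sketch; the labelings below are toy examples.

```python
import math
from collections import Counter

def variation_of_information(labels_a, labels_b):
    """VI(A, B) = H(A) + H(B) - 2 I(A; B), computed from label counts."""
    n = len(labels_a)
    pa = Counter(labels_a)                  # cluster sizes under A
    pb = Counter(labels_b)                  # cluster sizes under B
    joint = Counter(zip(labels_a, labels_b))  # contingency table
    h_a = -sum(c / n * math.log(c / n) for c in pa.values())
    h_b = -sum(c / n * math.log(c / n) for c in pb.values())
    mi = sum(c / n * math.log((c / n) / ((pa[i] / n) * (pb[j] / n)))
             for (i, j), c in joint.items())
    return h_a + h_b - 2 * mi

# Same partition, different label names: VI = 0.
same = variation_of_information([0, 0, 1, 1], [1, 1, 0, 0])
# Independent partitions: VI = H(A) + H(B) = 2 log 2.
diff = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike raw label accuracy, VI is invariant to permuting cluster labels, which is what makes it usable when the model's latent clusters have no fixed correspondence to the reference labels.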
Posterior predictive checks
• Sampling data from the posterior predictive distribution
allows us to “look into the mind of the model” – G. Hinton
“This use of the word mind is not intended to be metaphorical. We believe that a mental
state is the state of a hypothetical, external world in which a high-level internal
representation would constitute veridical perception. That hypothetical world is what the
figure shows.” Geoff Hinton et al. (2006), A Fast Learning Algorithm for Deep Belief Nets.
Posterior predictive checks
• Does data drawn from the model differ from the observed data, in ways that we care about?
• PPC:
  – Define a discrepancy function (a.k.a. test statistic) T(X).
    • Like a test statistic for a p-value: how extreme is my data set?
  – Simulate new data X^(rep) from the posterior predictive
    • Use MCMC to sample parameters from the posterior, then simulate data
  – Compute T(X^(rep)) and T(X), compare. Repeat, to estimate the posterior predictive p-value:
    p = Pr( T(X^(rep)) ≥ T(X) | X )
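A minimal PPC sketch for a toy normal model with a conjugate posterior over the mean, using the sample range as the discrepancy T(X); the data, prior, known variance, and choice of discrepancy are all illustrative assumptions, not from the slides.

```python
import random

random.seed(2)

x = [2.1, 1.9, 2.4, 2.2, 1.8, 2.0, 2.3, 1.7]   # observed data (toy)
n = len(x)
sigma2 = 0.05                 # assumed known observation variance
mu0, tau2 = 0.0, 100.0        # vague normal prior on the unknown mean

# Conjugate normal-normal posterior over the mean.
post_var = 1.0 / (1.0 / tau2 + n / sigma2)
post_mean = post_var * (mu0 / tau2 + sum(x) / sigma2)

def T(data):                  # discrepancy function: the sample range
    return max(data) - min(data)

R = 5000
extreme = 0
for _ in range(R):
    mu = random.gauss(post_mean, post_var ** 0.5)              # theta ~ posterior
    x_rep = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]  # replicate data
    if T(x_rep) >= T(x):
        extreme += 1
ppp = extreme / R   # posterior predictive p-value; near 0 or 1 flags misfit
```

Here exact posterior sampling replaces the MCMC step; a p-value close to 0 or 1 would indicate the model fails to reproduce the spread of the observed data.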
Example: Belin & Rubin (1995)
• Modeling response times for schizophrenics
and non-schizophrenics at a task
• Three discrepancies:
– largest observed variance for schizophrenics
– smallest observed variance for schizophrenics
– average within-person variance across all subjects
Qualitative evaluation
• Modeling U.S. Presidential State of the Union addresses
J. R. Foulds, S. H. Kumar, and L. Getoor. Latent topic networks: A versatile probabilistic programming framework for topic models. ICML, 2015.
Human evaluation
Best practices
“Jimmy’s law of evaluation”
[Plot: y-axis: probability your paper is accepted at an ML conference (0 to 0.5); x-axis: 1, 2, 3.]
Think-pair-share:
• You have a Bayesian model of the density of geolocated Twitter tweets, per individual, over time, in Southern California. Your end-goal is to use the model to detect identity fraud. How will you evaluate it?
M. Lichman and P. Smyth. Modeling human location data with mixtures of kernel densities. Proceedings of the 20th ACM SIGKDD Conference (KDD 2014).