CSE291-D
Latent Variable Models
Instructor: Dr. Jimmy Foulds
Email: jfoulds@ucsd.edu
Office Hours: TuTh 5-6pm Atkinson Hall 4401

TA: Long Jin


Email: longjin@eng.ucsd.edu
Office Hours:

Course website: cseweb.ucsd.edu/classes/sp16/cse291-d/


Piazza: piazza.com/ucsd/spring2016/cse291d
Poll everywhere: PollEv.com/jamesfoulds656
Latent Variable Models
• Latent variable modeling is a general, principled
approach for making sense of complex data sets

• Core principles:
– Dimensionality reduction
– Probabilistic graphical models
– Statistical inference, especially Bayesian inference

Latent variable models are, basically, PCA on steroids!
Probabilistic latent variable modeling

[Figure: complicated, noisy, high-dimensional Data feeds into a Latent variable model, which yields low-dimensional, semantically meaningful representations used to understand, explore, and predict.]
Underlying principle:
Dimensionality reduction

The quick brown fox jumps over the sly lazy dog
[5 6 37 1 4 30 5 22 570 12]

Foxes  Dogs  Jumping
[40%   40%   20%]
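The compression on this slide can be sketched concretely. The word-to-topic assignments below are invented purely to reproduce the slide's numbers, not learned from data:

```python
# Toy sketch of the slide's compression: a 10-word sentence reduced to
# proportions over three topics. The word-to-topic mapping is hypothetical,
# chosen only to reproduce the [40% 40% 20%] figure on the slide.
sentence = "the quick brown fox jumps over the sly lazy dog".split()

topic_of = {  # invented assignments for some content words
    "quick": "Foxes", "fox": "Foxes",
    "lazy": "Dogs", "dog": "Dogs",
    "jumps": "Jumping",
}

counts = {"Foxes": 0, "Dogs": 0, "Jumping": 0}
for word in sentence:
    if word in topic_of:
        counts[topic_of[word]] += 1

total = sum(counts.values())
proportions = {topic: count / total for topic, count in counts.items()}
print(proportions)  # {'Foxes': 0.4, 'Dogs': 0.4, 'Jumping': 0.2}
```

A real topic model would infer both the word-to-topic assignments and the proportions; this sketch only shows the shape of the resulting low-dimensional representation.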
Latent variable models

[Diagram: for each data point, latent variables Z generate observed data X, governed by parameters Φ.]

dimensionality(X) >> dimensionality(Z)

Z is a bottleneck, which finds a compressed, low-dimensional representation of X.
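Since the slide bills these models as "PCA on steroids", a minimal PCA sketch makes the bottleneck concrete. The data, dimensions, and noise level below are made up for illustration:

```python
import numpy as np

# A minimal sketch of the bottleneck idea using plain PCA: project
# high-dimensional X down to a low-dimensional Z via the top principal
# components. All sizes here are arbitrary choices for illustration.
rng = np.random.default_rng(0)
n, d, k = 100, 50, 2          # 100 points, 50 observed dims, 2 latent dims

# Generate data that truly lies near a 2-dimensional subspace, plus noise.
Z_true = rng.normal(size=(n, k))
W = rng.normal(size=(k, d))
X = Z_true @ W + 0.1 * rng.normal(size=(n, d))

# PCA: center, then project onto the top-k right singular vectors.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
Z = X_centered @ Vt[:k].T     # the compressed, low-dimensional representation

print(X.shape, Z.shape)       # dimensionality(X) >> dimensionality(Z)
```

Latent variable models generalize this picture: instead of a linear projection, Z and X are related through a full probabilistic model.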
Motivating Applications
• Industry:
– recommender systems, user modeling and
personalization, text analysis …

Motivating Applications
• Computational biology:
– Sequence alignment, phylogeny
Motivating Applications
• Computational social science:
– Cognitive psychology, digital humanities, …

The digital humanities

Mimno, D. (2012). Computational historiography: Data mining in a century of classics journals. ACM Journal on Computing and Cultural Heritage, Vol. 5, No. 1, Article 3.
Latent space models for social network analysis

Social network: a model that only captures homophily performs the best.

Protein–protein interaction network: a model that captures both homophily and stochastic equivalence performs the best.

Hoff, P. (2008). Modeling homophily and stochastic equivalence in symmetric relational data. NIPS.
Latent Feature Models for Social Networks

[Figure: a social network linking Alice, Bob, and Claire, annotated with latent features: Alice – Cycling, Fishing, Running; Bob – Running, Tango, Salsa; Claire – Running, Waltz.]
Latent Feature Relational Model
Miller, Griffiths, Jordan (2009)

[Figure: the Alice–Bob–Claire network again, together with a binary feature matrix Z whose rows are Alice, Bob, and Claire and whose columns are Cycling, Fishing, Running, Tango, Salsa, and Waltz.]
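A rough sketch of how the feature matrix Z drives link probabilities in this model: in the latent feature relational model, the probability of a link between i and j is sigmoid(z_i W z_jᵀ). The weight matrix below (the identity) is a made-up choice so that each shared feature raises the log-odds of a link by 1:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sketch of the latent feature relational model: each person has a binary
# feature vector (a row of Z), and P(link i-j) = sigmoid(z_i @ W @ z_j).
# The feature assignments and the identity W are illustrative choices.
features = ["Cycling", "Fishing", "Running", "Tango", "Salsa", "Waltz"]
Z = np.array([
    [1, 1, 1, 0, 0, 0],   # Alice:  Cycling, Fishing, Running
    [0, 0, 1, 1, 1, 0],   # Bob:    Running, Tango, Salsa
    [0, 0, 1, 0, 0, 1],   # Claire: Running, Waltz
])

W = np.eye(len(features))   # hypothetical weights: only shared features count
P = sigmoid(Z @ W @ Z.T)    # P[i, j] = link probability between persons i, j

print(np.round(P, 2))
```

With the identity W, every pair shares only "Running", so all off-diagonal link probabilities equal sigmoid(1); in the full model, W is learned and can encode interactions between different features.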
Automatically illustrating a guacamole recipe from https://www.youtube.com/watch?v=H7Ne3s202lU
Probability and Inference

[Figure: probability maps the data generating process forward to the observed data; inference maps the observed data back to the data generating process.]

Figure based on one by Larry Wasserman, "All of Statistics".
Inference Algorithms
• Exact inference
– Belief propagation on polytrees, junction tree

• Approximate inference
– Optimization approaches
• EM
• Variational inference
– Variational Bayes, mean field
– Message passing: loopy BP, TRW, expectation propagation
– Simulation approaches
• Importance sampling, particle filtering
• Markov chain Monte Carlo
– Gibbs sampling, Metropolis-Hastings, Hamiltonian Monte Carlo…
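As a taste of the approximate-inference toolbox above, here is a minimal importance-sampling sketch. The target, proposal, and test function are chosen arbitrarily for illustration:

```python
import numpy as np

# Importance sampling: estimate E_p[f(x)] under a target p using samples
# from an easier proposal q, reweighted by p(x)/q(x).
rng = np.random.default_rng(0)

# Target p: standard normal. Proposal q: wider normal N(0, 2^2).
def p(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def q(x):
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))

xs = 2.0 * rng.normal(size=100_000)   # draw from the proposal q
weights = p(xs) / q(xs)               # importance weights

# Self-normalized estimate of E_p[x^2] (true value: 1).
estimate = np.sum(weights * xs**2) / np.sum(weights)
print(estimate)
```

The lecture's point about high dimensions: as the dimensionality grows, the weights become dominated by a few samples and the estimator's variance blows up, which is why MCMC takes over.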
The art of latent variable modeling: Box’s loop

[Figure: complicated, noisy, high-dimensional Data feeds into a Latent variable model, with the (algorithm, model) pair carefully co-designed for tractability; the model yields low-dimensional, semantically meaningful representations used to understand, explore, and predict; results are evaluated, and the loop iterates.]
This course: CSE291-D
Latent Variable Models
• Three themes:
– Models

– Inference

– Evaluation

• Themes correspond to the steps in Box’s loop

Learning Goals
• By the end of the course you will be able to:

– Apply a variety of probabilistic models

– Formulate new probabilistic models to solve the data science tasks you care about

– Derive inference algorithms for these models using Bayesian inference techniques

– Evaluate their performance in order to critique and improve them.
Weeks 1-2
• Foundations
– Bayesian inference
– Generative models for discrete data
– Exponential family models

Week 3
• Monte Carlo Methods
– Importance sampling, rejection sampling, why
they fail in high dimensions

– Markov chain Monte Carlo (MCMC)

Week 4
• Modeling
– Mixture models (revisited)

– Latent linear models


• Factor analysis, probabilistic PCA, ICA

Week 5
• Hidden Markov models (revisited)
– Applications, extensions, Bayesian inference

• Evaluating unsupervised models

Week 6
• Markov random fields

• Statistical relational learning and probabilistic programming
– Markov logic networks, probabilistic soft logic, Stan, WinBUGS
Week 7
• Variational inference
– Foundations, mean field updates
– Examples: Gaussian models, linear regression. Variational Bayes EM
Week 8
• Topic models and mixed membership models
– LSA/PLSA, Genetic admixtures, LDA

– Inference: MCMC, variational inference

Week 9
• Social network models
– Exponential family random graph models
– Stochastic blockmodels, mixed membership stochastic blockmodels, latent space models
Week 10
• Models for computational biology
– Profile HMMs, phylogenetic models, coalescent

• Nonparametric Bayesian models
– Chinese restaurant process / Dirichlet process, Indian buffet process
Pedagogy: Active learning

• Student-centered instruction

• Actively engage with the material in class

• In-class quizzes and polls, peer instruction, discussion with peers
“If the experiments [were] medical interventions, … the control condition might be discontinued because the treatment being tested was clearly more beneficial.”
Pedagogy: Active learning
• Everyone gets to participate, not just students in the front row

• With traditional lectures only, course content is “transmitted” in class, and you have to do the hard yards of learning on your own

• With active learning to augment lectures, learning, synthesis, and integration with prior knowledge occur in class, with support from instructor and peers
Peer instruction
• No need to buy a clicker: polleverywhere.com

Peer instruction
• You can respond to Poll Everywhere polls with your laptop, tablet, or smartphone (bring it to class!)
PollEv.com/jamesfoulds656

• I recommend using the Poll Everywhere app, for Android and iPhone. Find it in the app store.

• If you do not have a smartphone or laptop, you can use colored voting cards.

The most important thing is that you vote!
Course Readings
• Required textbook:
Machine Learning: A Probabilistic Perspective. Murphy (2012)

• Readings need to be completed before each class. From Murphy, and/or other articles. We will do reading quizzes at the start of each class.

• It is very important that you do the readings so that we can make effective use of our limited lecture time together (a “flipped classroom” approach).
Reading for Today’s Lecture

• Blei, David M. (2014). Build, compute, critique, repeat: Data analysis with latent variable models. Annual Review of Statistics and Its Application, Sections 1–3.

• This article is a good overview of what CSE291-D is all about, if you’re still deciding whether this course is for you.

• Note how discussion is framed around Box’s loop.
Assessment

• Homeworks 25% (5 of them, 5% each)

• Group Project 35%

• Final 35%

• Participation 5%

Group Project
• Groups of 2–4

• An open-ended research project, to give you an opportunity to explore the techniques and principles covered in the course

• Must involve one or more of the themes of the course: models, inference, evaluation (ideally all three)

• May overlap with your other research, but not any other class project
Group Project
• Milestones / deliverables
– Project proposal, 4/19/2016
– Midterm progress report, 5/10/2016
– Project report, 6/9/2016

• Note that you may have to read ahead to start your project. All readings are listed on the syllabus, on the course webpage.
Piazza
• Use Piazza to ask questions about the course, instead of emailing me or the TA directly, so that everyone in the class can benefit from the answer

• Piazza will also be used for announcements, including information on the readings

• Please sign up at: https://piazza.com/ucsd/spring2016/cse291d
How to Succeed in CSE291-D
• While the course will be challenging in the sense that we have a lot of material to cover in 10 weeks, the course is designed so that everyone has the opportunity to succeed. I do not grade on a curve.

• Learning goals for each lesson will be clearly stated – if you achieve these you will be well prepared for the exam.

• Homeworks are designed to give you practice and feedback on the learning goals.

• The 5% participation marks are there for the taking
– Participate in peer instruction, class discussions, etc., and on Piazza
Required Knowledge
• CSE250A is the only prerequisite
– Basics of directed graphical models
– d-separation, explaining away, Markov blanket
– Maximum likelihood estimation
– Expectation maximization
– A little exposure to mixture models, HMMs,…
– Prerequisites of CSE250A also apply
(elementary math, programming…)

• If you’re rusty, please read Murphy Ch. 10.


[In-class quiz: figure not shown]

Answer
• D, 5. {2,3,6,7,4}
Recap: Markov blanket

• The Markov blanket of a node is the union of its
– Parents
– Children
– Co-parents (other parents of its children)

• The joint distribution is a product of Pr(x|parents(x)) factors. The Markov blanket of x is the set of nodes it co-occurs with in these factors.
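The recap above translates directly into code. A small sketch on a made-up DAG, with the graph represented by each node's parent list:

```python
# Markov blanket of a node x in a DAG: parents(x) ∪ children(x) ∪
# co-parents (other parents of x's children). The graph below is a
# made-up example: 1 -> 3, 2 -> 4, 3 -> 4, 3 -> 5.
parents = {
    1: [], 2: [], 3: [1], 4: [2, 3], 5: [3],
}

def markov_blanket(x, parents):
    children = [c for c, ps in parents.items() if x in ps]
    co_parents = {p for c in children for p in parents[c] if p != x}
    return set(parents[x]) | set(children) | co_parents

print(markov_blanket(3, parents))  # {1, 2, 4, 5}
```

For node 3 the blanket is its parent 1, its children 4 and 5, and the co-parent 2 (the other parent of child 4), matching the union rule above.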
[In-class quiz: figure not shown]

Answer
• C, 4. {3,5,7,4}
[In-class quiz: figure not shown]

Answer
• Turns out none of them are!
Recap: d-separation
• “Bayes ball” is blocked by: [figure not shown]
Recap: d-separation
• “Bayes ball” passes through: [figure not shown]
Recap: Explaining Away (Example)
• X, Z independent coin flips encoded as 0 or 1
• Y = X + Z
• If we know Y, then X and Z become coupled
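The coupling can be verified by brute-force enumeration of the four equally likely (X, Z) outcomes:

```python
from itertools import product

# X and Z are independent fair coin flips; Y = X + Z. Marginally X tells
# us nothing about Z, but conditioning on Y couples them.
joint = {(x, z): 0.25 for x, z in product([0, 1], repeat=2)}

# P(Z = 1) with no evidence.
p_z1 = sum(p for (x, z), p in joint.items() if z == 1)

# Condition on Y = X + Z = 1: only (0,1) and (1,0) survive.
evidence = {xz: p for xz, p in joint.items() if sum(xz) == 1}
norm = sum(evidence.values())

# Given Y = 1, learning X = 1 forces Z = 0: explaining away.
p_z1_given_y1 = sum(p for (x, z), p in evidence.items() if z == 1) / norm
p_z1_given_y1_x1 = (
    sum(p for (x, z), p in evidence.items() if z == 1 and x == 1)
    / sum(p for (x, z), p in evidence.items() if x == 1)
)
print(p_z1, p_z1_given_y1, p_z1_given_y1_x1)  # 0.5 0.5 0.0
```

So P(Z=1) = P(Z=1 | Y=1) = 0.5, yet P(Z=1 | Y=1, X=1) = 0: X and Z are independent until their common child Y is observed.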
Recap: d-separation
• Boundary cases: [figure not shown]
Recap: d-separation
• Why we need the boundary cases: [figure not shown]
• If y’ is a copy of y, this reduces to explaining away
[In-class quiz: figure not shown]

Answer
• Turns out none of them are! Even with 1 observed, you can still pass through 5 via explaining away.