
A Brief Compilation

of Guides, Walkthroughs, and Problems

Discrete Mathematics and Probability Theory
at the University of California, Berkeley

Alvin Wan
Contents

0.1 Purpose
    0.1.1 Contributors
    0.1.2 Structure
    0.1.3 Breakdown
    0.1.4 Resources

1 Modular Arithmetic and Polynomials
  1.1 Guide
    1.1.1 Modular Arithmetic
    1.1.2 Polynomial Properties
    1.1.3 Fermat's Little Theorem
    1.1.4 Lagrange Interpolation
    1.1.5 Error-Correcting Codes
    1.1.6 RSA
    1.1.7 Secret Sharing
  1.2 Secret-Sharing Walkthrough

2 Counting
  2.1 Guide
    2.1.1 Fundamental Properties
    2.1.2 Stars and Bars
    2.1.3 Order, Replacement, and Distinguishability
    2.1.4 Combinatorial Proofs
    2.1.5 Inclusion-Exclusion Principle
  2.2 Stars and Bars Walkthrough
  2.3 Problems

3 Probability
  3.1 Guide
    3.1.1 Random Variables
    3.1.2 Law of Total Probability
    3.1.3 Conditional Probability
    3.1.4 Bayes' Rule
    3.1.5 Independence
    3.1.6 Symmetry
  3.2 Symmetry Walkthrough
  3.3 Problems

4 Expectation
  4.1 Guide
    4.1.1 Expectation Definition
    4.1.2 Linearity of Expectation
    4.1.3 Conditional Expectation
    4.1.4 Law of Total Expectation
  4.2 Linearity of Expectation Walkthrough
  4.3 Dilution Walkthrough
  4.4 Problems

5 Distributions and Estimation
  5.1 Guide
    5.1.1 Important Distributions
    5.1.2 Combining Distributions
    5.1.3 Variance
    5.1.4 Covariance
    5.1.5 Linearity of Variance
    5.1.6 Linear Regression
  5.2 Variance Walkthrough

6 Bounds
  6.1 Guide
    6.1.1 Markov's Inequality
    6.1.2 Chebyshev's Inequality
    6.1.3 Law of Large Numbers
  6.2 Confidence Intervals Walkthrough

7 Markov Chains
  7.1 Guide
    7.1.1 Definition
    7.1.2 Characterization
    7.1.3 Transition Probability Matrices
    7.1.4 Balance Equations
    7.1.5 Important Theorems
  7.2 Hitting Time Walkthrough

8 Solutions
  8.1 Counting
  8.2 Probability
  8.3 Expectation


0.1 Purpose
This compilation is (unofficially) written for the Spring 2016 CS70: Discrete
Mathematics and Probability Theory class taught by Professor Satish Rao
and Professor Jean Walrand at UC Berkeley. Its primary purpose is to offer
additional practice problems and walkthroughs to build intuition, as a supplement
to official course notes and lecture slides. Including more difficult problems in
walkthroughs, there are over 35 exam-level problems.

0.1.1 Contributors
A special thanks to Sinho Chewi for spending many hours suggesting improvements,
catching bugs, and discussing ideas and solutions for problems with me. Additionally,
thanks to Dibya Ghosh and Blake Tickell, who helped review problems for clarity
and correctness.

0.1.2 Structure
Each chapter is structured so that it can be read on its own. A
minimal guide at the beginning of each section covers essential materials and
misconceptions but does not provide a comprehensive overview. Each guide
is then followed by walkthroughs covering classes of difficult problems and 3-5
exam-level (or harder) problems that I've written specifically for this book.
Note: As of Spring 2016, not all chapters have problems. However, all chapters
have at least a walkthrough. This will be amended in Fall 2016.

0.1.3 Breakdown
For the most part, guides are cheat sheets for select chapters from official
course notes, with additional comments to help build intuition.
For more difficult parts of the course, guides may be accompanied by breakdowns
and analyses of problem types that might not have been explicitly introduced
in the course. These additional walkthroughs will attempt to provide a more
regimented approach to solving complex problems.
Problems are divvied up into two parts: (1) walkthroughs, a string of problems
that evolve from the most basic to the most complex, and (2) exam-level
questions, erring on the side of difficulty where needed. The hope is that with
walkthroughs, students can reduce a relatively difficult problem into smaller,
simpler subproblems.

0.1.4 Resources
Additional resources, including 20+ quizzes with 80 practice questions and
other random worksheets and problems, are posted online at alvinwan.com/cs70.

Chapter 1

Modular Arithmetic and Polynomials

1.1 Guide
1.1.1 Modular Arithmetic
In modulo p, only the numbers {0, 1, ..., p - 1} exist. Additionally, division is
not well-defined. Instead, we define a multiplicative inverse. We know that
outside of modular arithmetic, for any number n, the inverse n^{-1} is the number
that, multiplied by n, gives 1 (n · n^{-1} = 1). Thus, we extend the definition of
an inverse to the modular setting in this manner, where for any number n,

n · n^{-1} = 1 (mod p)

Do not forget that division, and thus fractions, do not exist in modular
arithmetic; always use multiplicative inverses instead.
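As a concrete illustration, here is a minimal Python sketch of computing an inverse with the extended Euclidean algorithm; the function names are my own, not from the course notes.

    def extended_gcd(a, b):
        """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
        if b == 0:
            return a, 1, 0
        g, x, y = extended_gcd(b, a % b)
        return g, y, x - (a // b) * y

    def mod_inverse(n, p):
        """Multiplicative inverse of n mod p; exists iff gcd(n, p) == 1."""
        g, x, _ = extended_gcd(n % p, p)
        if g != 1:
            raise ValueError(f"{n} has no inverse mod {p}")
        return x % p

    assert (5 * mod_inverse(5, 7)) % 7 == 1  # 5 * 3 = 15 = 1 (mod 7)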

1.1.2 Polynomial Properties


For polynomials, we have two critical properties.

1. A polynomial of degree d has at most d roots.


2. A polynomial of degree d is uniquely identified by d + 1 distinct points.

Note that taking a polynomial over a Galois field of size p (denoted GF(p))
simply means that all operations and elements in that field are taken (mod p).

1.1.3 Fermat's Little Theorem

Fermat's Little Theorem states that if p is prime, for any a, a^p = a (mod p). If
p does not divide a, we additionally know the following:

a^{p-1} = 1 (mod p)

Applying Fermat's Little Theorem repeatedly until the exponent of a is less
than p - 1 gives us an interesting corollary: a^y = a^{y mod (p-1)} (mod p)
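As a quick sanity check, Python's built-in three-argument pow exercises both the theorem and the corollary (p = 13 is an arbitrary prime of my choosing):

    p = 13
    for a in range(1, p):                 # p prime, p does not divide a
        assert pow(a, p - 1, p) == 1      # a^(p-1) = 1 (mod p)

    # Corollary: reduce a huge exponent y mod (p - 1) before exponentiating.
    y = 10**18
    assert pow(3, y, p) == pow(3, y % (p - 1), p)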

1.1.4 Lagrange Interpolation


For a given set of points (x_1, y_1), ..., (x_n, y_n), first compute the basis
polynomial Δ_i for each coordinate, where

Δ_i(x) = ∏_{j ≠ i} (x - x_j) / (x_i - x_j)

Then, to recover the original polynomial, multiply each Δ_i by the respective y_i
and sum:

P(x) = Σ_i Δ_i(x) · y_i
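Below is a short Python sketch of interpolation over GF(p); the function name and the example polynomial are my own choices, and the modular inverse uses Python 3.8's pow(..., -1, p).

    def lagrange_interpolate(points, p, x):
        """Evaluate at x the unique degree-(len(points)-1) polynomial
        through points = [(x_i, y_i), ...], working over GF(p)."""
        total = 0
        for i, (xi, yi) in enumerate(points):
            num, den = 1, 1
            for j, (xj, _) in enumerate(points):
                if i != j:
                    num = num * (x - xj) % p   # numerator of Delta_i(x)
                    den = den * (xi - xj) % p  # denominator of Delta_i(x)
            # Division becomes multiplication by the inverse mod p.
            total = (total + yi * num * pow(den, -1, p)) % p
        return total

    # P(x) = x^2 + 1 over GF(7), from the points (1, 2), (2, 5), (3, 3):
    pts = [(1, 2), (2, 5), (3, 3)]
    assert lagrange_interpolate(pts, 7, 0) == 1  # recovers P(0) = 1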

1.1.5 Error-Correcting Codes


Across a lossy channel, where at most k packets are lost, send n + k packets.
Across a corruption channel, where at most k packets are corrupted, send n + 2k
packets. To recover the polynomial P across a corruption channel, apply
Berlekamp-Welch.

1.1.6 RSA
In RSA, we have a public key (N, e), where N is a product of two primes, p and
q, and e is co-prime to (p - 1)(q - 1). Here are Encrypt and Decrypt:

E(x) = x^e (mod N)
D(y) = y^d (mod N)

Why are they defined this way? We have that y = E(x), so we plug in:

D(E(x)) = E(x)^d = (x^e)^d = x^{ed} = x (mod N)

If the above equation x^{ed} = x is satisfied, then D(y) returns the original message.
How do we generate d? By Fermat's Little Theorem's corollary, we know ed = 1
(mod (p - 1)(q - 1)). Given we have e, we see that we can compute d if and
only if we know p and q. Thus, breaking RSA equates to factorizing N into p, q.
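A toy end-to-end run in Python; the tiny primes here are purely illustrative (real RSA uses primes hundreds of digits long):

    from math import gcd

    p, q = 11, 13
    N, phi = p * q, (p - 1) * (q - 1)     # N = 143, phi = 120
    e = 7
    assert gcd(e, phi) == 1               # e must be co-prime to (p-1)(q-1)
    d = pow(e, -1, phi)                   # d = e^{-1} mod (p-1)(q-1)

    def encrypt(x): return pow(x, e, N)   # E(x) = x^e (mod N)
    def decrypt(y): return pow(y, d, N)   # D(y) = y^d (mod N)

    assert decrypt(encrypt(42)) == 42     # D(E(x)) = x^{ed} = x (mod N)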

1.1.7 Secret Sharing


In a secret-sharing problem, our goal is to create a secret such that only a group
meeting specific requirements can uncover it. We will explore secret-sharing
problems in 1.2 Secret-Sharing Walkthrough.


1.2 Secret-Sharing Walkthrough


We begin with the most elementary form of secret-sharing, which requires
consensus from some subset of k people before the secret can be revealed.

Question: Basic
Construct a scheme that requires at least k of n people to come together
in order to unlock the safe.
Answer: a polynomial of degree k - 1

We need at least k people, meaning this polynomial should require k points.
Thus, we create a polynomial of degree k - 1, and distribute n distinct points
along this polynomial to n people. The secret is this polynomial evaluated at 0.

Question: Combining Polynomials

Develop a scheme that requires x_1 people from group A and x_2 people from B.
Answer: polynomials of degrees x_1 - 1 and x_2 - 1, and a degree-1 polynomial
using p_1(0), p_2(0)

Create a polynomial p_1 of degree x_1 - 1 for A and a second polynomial p_2 of
degree x_2 - 1 for B. Use the secrets of p_1 and p_2 (p_1(0) and p_2(0)) to create a
third polynomial p_3 of degree 1. The secret is p_3(0).

Question: Combining Polynomials Generalized

Construct a scheme that requires x_i people from each of the n groups of people.
Answer: n polynomials P_i of degree x_i - 1, and one degree-(n - 1) polynomial
using (i, P_i(0)) for all i

Create n polynomials, where the ith group's polynomial has degree x_i - 1. Use
the secrets (i, p_i(0)) of all n polynomials to create an (n + 1)th polynomial of
degree n - 1. This (n + 1)th polynomial evaluated at 0 is the secret.

Question: Re-weighting
Each group elects o_i officials. Construct a scheme that requires a_i of the o_i
officials from each group, where 10 citizens can replace an official.
Answer: degree-(10a_i - 1) polynomials, where each official gets 10 points and
each citizen gets 1

For the ith group, create a polynomial of degree 10a_i - 1, and give each of the
o_i officials 10 points. Then, give each citizen 1 point. Use the secrets (i, p_i(0))
of all n polynomials to create an (n + 1)th polynomial of degree n - 1. Since
each official has 10 times the number of points for the same polynomial, any 10
citizens can merge to become a single official.

Question: Erasure Channel

Construct a scheme that requires k workers to unlock the safe. Make sure to
specify: if at most m workers do not respond to requests, how many workers
do we need to ensure we can unlock the safe?
Answer: k + m, one degree-(k - 1) polynomial

The intuitive response is to simply request the number of people that may not
respond in addition to the number of people we need to unlock the safe. Thus,
we need to request k + m workers to reconstruct a degree-(k - 1) polynomial.

Question: Corruption Channel

Construct a scheme that requires k workers to unlock the safe. Make sure to
specify: if at most m workers mis-remember information, how many people do
we need to ensure we can unlock the safe?
Answer: k + 2m, one degree-(k - 1) polynomial

Per our knowledge of corruption channels, we need n + 2k packets, or in this
case, k + 2m workers, where again, we construct a degree-(k - 1) polynomial.
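Tying the chapter together, here is a minimal sketch of the basic k-of-n scheme over GF(p), reusing the lagrange_interpolate sketch from 1.1.4; all parameter values are arbitrary.

    import random

    p, k, n, secret = 97, 3, 5, 42
    # Degree k-1 polynomial with P(0) = secret and random higher coefficients.
    coeffs = [secret] + [random.randrange(p) for _ in range(k - 1)]

    def P(x):
        return sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p

    shares = [(x, P(x)) for x in range(1, n + 1)]  # one point per person
    # Any k of the n shares recover the secret; fewer reveal nothing.
    assert lagrange_interpolate(random.sample(shares, k), p, 0) == secret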

Chapter 2

Counting

2.1 Guide
Counting bridges discrete mathematics and probability theory, to some degree
providing a transition from one to the next. Though it is a seemingly trivial
topic, this section provides foundations for probability.

2.1.1 Fundamental Properties


We have two counting properties, as follows:

1. If, for each of k items, we have {n_1, n_2, ..., n_k} options, the total number
of possible combinations is n_1 · n_2 ⋯ n_k = ∏_i n_i.

2. To find the total number of unordered combinations, divide the number
of ordered combinations by the number of orderings: C(n, k) = n! / ((n - k)! k!).

2.1.2 Stars and Bars


Given n balls and k bins, count all the ways to distribute n balls among k bins.
Given a list of n balls, we need to slice this list in k - 1 places to get k partitions.
In other words, given all of our stars (balls, n) and bars (slices, k - 1), we
have n + k - 1 total items. We can either choose to place our k - 1 bars or our
n stars. Thus, the total number of ways to distribute n balls among k bins is:

C(n + k - 1, n) = C(n + k - 1, k - 1)
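A brute-force check of the formula in Python, counting distributions directly (n = 10 balls and k = 3 bins are arbitrary small values):

    from math import comb
    from itertools import product

    def count_distributions(n, k):
        """Count ways to put n identical balls into k labeled bins."""
        return sum(1 for bins in product(range(n + 1), repeat=k)
                   if sum(bins) == n)

    n, k = 10, 3
    assert count_distributions(n, k) == comb(n + k - 1, k - 1) == 66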

2.1.3 Order, Replacement, and Distinguishability


This is a list of translations, from counting to English. One of the most
common mistakes is over-counting or under-counting the number of combinations.

Indistinguishable balls = Order doesn't matter, combinations
Distinguishable balls = Order does matter, permutations
Only one ball in each bin = Without replacement
Multiple balls allowed in each bin = With replacement

2.1.4 Combinatorial Proofs


Several of the following are broadly applicable, for all sections in probability.
However, we will introduce them here, as part of a set of approaches you can
use to tackle combinatorial proofs.

Addition is Or, and multiplication is And.

Expand coefficients. 2 · C(n, 2) = C(n, 2) + C(n, 2), so consider all pairs from n,
Or consider all pairs from (another) n.

Distribute quantities. C(n, 2)^(a+b) = C(n, 2)^a · C(n, 2)^b, so we consider all pairs
from n, a times, And consider all pairs from (another) n, b times.

Switch between equivalent forms for combinations, to see which makes
more sense.

Rewrite quantities as "choose 1". n = C(n, 1), so we pick one from n items.

Toggle between the two quantities. C(a + b, b) = C(a + b, a), as choosing
which b to include is the same as choosing which a to leave out of a + b.

Try applying the first rule of counting as well.

2^n is the equivalent of picking all (possibly empty) subsets. In other
words, we consider 2 possibilities for each of n items: {Include, Don't Include}.

C(n, k) · k! = n! / (n - k)!, which is k samples from n items without replacement.

Make sure not to just prove the equality mathematically, or to merely write
in words what happens mathematically; a combinatorial proof interprets both
sides as counting the same set.

2.1.5 Inclusion-Exclusion Principle

This principle prevents over-counting; we can visualize it with Venn diagrams.

For |A ∪ B|, note that the area denoting A ∩ B is counted twice. So, we subtract
the intersection to get the size of the union of A and B. Thus,

|A ∪ B| = |A| + |B| - |A ∩ B|
|A ∪ B ∪ C| = |A| + |B| + |C| - |A ∩ B| - |B ∩ C| - |C ∩ A| + |A ∩ B ∩ C|
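The two-set identity is easy to check mechanically with Python sets (the example sets are arbitrary):

    A = {1, 2, 3, 4}
    B = {3, 4, 5}
    # |A union B| = |A| + |B| - |A intersect B|; {3, 4} is subtracted once.
    assert len(A | B) == len(A) + len(B) - len(A & B) == 5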


2.2 Stars and Bars Walkthrough


Stars and bars problems can come in many different types and flavors. The
following problems build on one case, increasing in complexity. Here are several
other stars and bars variations:

All possible phone numbers, given that the digits sum to a particular
value.
Distributing chairs among rooms, given we have only a particular number
of chairs.
Bringing at least k of each sock, given you can only fit n socks in your
suitcase.

Notice that in all of the mentioned scenarios, there is some bound on the
number of items we can distribute. That should immediately trigger stars and
bars.

Question: Basic
How many ways are there to sprinkle 10 oreo pieces on our 3 scoops of ice
cream? Assume that each scoop is a different flavor of ice cream. (i.e., Each
scoop is distinguishable.)
Answer: C(12, 2)

This reduces to a stars and bars problem, where we have 3 possible buckets
(2 bars) and 10 stars. Note that to specify 3 buckets, we need only 2 bars to
separate the 10 stars. Thus, we can choose the 2 bars or the 10 stars from 12
slots.

Question: At Least
How many ways are there to sprinkle 10 sprinkles on 3 scoops, such that the
first scoop gets at least 5 pieces?
Answer: C(7, 2)

This reduces to a second stars and bars problem. Simply, we first give the first
scoop 5 pieces. Why do this? This means that regardless of how many
additional pieces we distribute, the first scoop will have at least 5 pieces.
We are then left with a sub-stars-and-bars problem, with 10 - 5 = 5 sprinkles
and 3 scoops. We proceed as usual, noting this is 5 stars and 2 bars.

Question: At Most
Assume that each scoop can only hold a maximum of 8 pieces. How many
ways are there to sprinkle 10 sprinkles on 3 scoops?
Answer: C(12, 2) - C(3, 1)C(2, 1) - C(3, 1)

Page 11
Discrete Mathematics and Probability Theory aaalv.in/abcDMPT

First, we count all the possible ways to distribute 10 sprinkles among 3
scoops. This is the answer to the basic problem above: C(12, 2).

Then, we count the number of invalid combinations. The only invalid combinations
are when a scoop has 9 or more sprinkles. We consider each case:

1. One scoop has 9 sprinkles. There are C(3, 1) ways to pick this one scoop with
9 sprinkles. Then, there are two other scoops to pick from, to give the
final sprinkle, making C(2, 1) ways to distribute the last sprinkle.

2. One scoop has 10 sprinkles. There are C(3, 1) ways to pick this one scoop.
There are no more sprinkles for the other scoops.

Thus, we take all combinations and then subtract both invalid combinations.
We note that the invalid combinations are mutually exclusive.

Question: At Least and At Most


Assume that each scoop can only hold a maximum of 8 pieces and a minimum
of 2. How many ways are there to sprinkle 14 sprinkles on our 3 scoops of ice
cream?
Answer: C(10, 2) - C(3, 1)C(2, 1) - C(3, 1)

Given the first problem in this walkthrough, we know that we can reduce the
problem to a sub-stars-and-bars problem. We first distribute 2 sprinkles to each
scoop, guaranteeing that each scoop will have at least 2 sprinkles distributed to
it.

Then, we count all the possible ways to distribute the remaining 8 sprinkles
among 3 scoops. This is, by stars and bars, C(10, 2).

Then, we count the number of invalid combinations. Since each scoop already
has 2 sprinkles, it can take at most 6 more, to satisfy the 8-sprinkle maximum.
Thus, the invalid combinations are when we distribute 7 or more sprinkles to a
single scoop.
We consider each case:

1. One scoop has 7 sprinkles. There are C(3, 1) ways to pick this one scoop with
7 sprinkles. Then, there are two other scoops to pick from, to give the
final sprinkle, making C(2, 1) ways to distribute the last sprinkle.

2. One scoop has 8 sprinkles. There are C(3, 1) ways to pick this one scoop.
There are no more sprinkles for the other scoops.

Thus, we take all combinations and then subtract both invalid combinations.
We note that the invalid combinations are mutually exclusive, making
C(10, 2) - C(3, 1)C(2, 1) - C(3, 1).


2.3 Problems
1. If we roll a standard 6-sided die 3 times, how many ways are there to roll
a sum total of 14 pips where all rolls have an even number of pips?
2. Given a standard 52-card deck and a 5-card hand, how many unique hands
are there with at least 1 club and no aces?
3. Given a standard 52-card deck and a 5-card hand, how many unique hands
are there with at least 1 club or no aces?
4. Given a standard 52-card deck and a 3-card hand, how many unique
hands are there with cards that sum to 15? (Hint: Each card is uniquely
identified by both a number and a suit. This problem is more complex than
phone numbers.)

Chapter 3

Probability

3.1 Guide
3.1.1 Random Variables
Let Ω be the sample space. A random variable is by definition a function
mapping outcomes to real numbers: X : Ω → R, X(ω) ∈ R. An indicator variable is
a random variable that only assumes values {0, 1} to denote success or failure for
a single trial. Note that for an indicator, expectation is equal to the probability
of success:

E[X_i] = 1 · Pr[X_i = 1] + 0 · Pr[X_i = 0] = Pr[X_i = 1]

3.1.2 Law of Total Probability


The law of total probability states that Pr[A] = Pr[A|B] Pr[B] + Pr[A|B̄] Pr[B̄],
since the only possibilities are B and B̄. More generally speaking, for a set of B_i
that partition Ω,

Pr[A] = Σ_i Pr[A|B_i] Pr[B_i]

Do not forget this law. On the exam, students often forget to multiply by Pr[B_i]
when computing Pr[A].
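A tiny numeric example of the law, with made-up numbers, showing why the Pr[B_i] weights matter:

    # Two bags partition the sample space: bag 1 is 3/4 red, bag 2 is 1/4 red.
    # We pick bag 1 with probability 0.6 and bag 2 with probability 0.4.
    pr_b = [0.6, 0.4]            # Pr[B_i]
    pr_a_given_b = [0.75, 0.25]  # Pr[A | B_i]
    pr_a = sum(pa * pb for pa, pb in zip(pr_a_given_b, pr_b))
    print(pr_a)  # 0.55 -- not the unweighted average 0.5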

3.1.3 Conditional Probability


Conditional probability gives us the probability of an event given priors. By
definition, the probability of A given B is

Pr[A|B] = Pr[A ∩ B] / Pr[B]


3.1.4 Bayes' Rule

Bayes' rule expands on this idea, using the law of total probability.

Pr[A|B] = Pr[B|A] Pr[A] / Pr[B]

3.1.5 Independence
Note that if an implication only goes in one direction, the converse is not
necessarily true. This is a favorite for exams, where the crux of a True-False
question may be rooted in a converse of one of the following implications.

X, Y independent ⇒ Pr[X, Y] = Pr[X] Pr[Y]
X, Y independent ⇒ E[XY] = E[X] E[Y]
X, Y independent ⇒ Var(X + Y) = Var(X) + Var(Y)
X, Y independent ⇒ Cov(X, Y) = 0

Using the above definition for independence with probabilities, we have the
following corollary:

X, Y independent ⇒ Pr[X|Y] = Pr[X, Y] / Pr[Y] = Pr[X] Pr[Y] / Pr[Y] = Pr[X]

3.1.6 Symmetry
Given a set of trials, the principle of symmetry states that, without additional
information, each trial has the same distribution as any other trial. See 3.2
Symmetry Walkthrough for more details and concrete examples.


3.2 Symmetry Walkthrough


Let us take 10 marbles from a bag of N marbles, of which r > 10 are red,
without replacement. Let X_i be an indicator equal to 1 if and only if the ith
marble is red.

Question: Single Unconditioned, None Conditioned

With what probability is the first marble red? The second? Third? The tenth?
Answer: all r/N

The probability of the first marble being red is r/N. However, by symmetry, the
probability of each marble being red is also r/N! We know this is true, because
we are not given any information about any of the marbles. With or without
replacement, symmetry applies. However, as we will see, symmetry breaks down
or applies to a limited degree when we are given information and condition our
probabilities.

Question: Single Unconditioned, Single Conditioned

Given that the first marble is red, with what probability is the second marble
red? The third? The tenth?
Answer: for 1 < i ≤ 10, (r - 1)/(N - 1)

We are given that the first marble is red. As a result, we are actually computing
Pr[X_i = 1 | X_1 = 1]. Since one red marble has already been removed, we have
that for the second marble, Pr[X_2 = 1 | X_1 = 1] = (r - 1)/(N - 1). Again, by
symmetry, we in fact know that this is true for all i > 1, as we do not have
additional information that would tell us otherwise.

Question: Single Unconditioned, Multiple Conditioned

Given that the first and fifth marbles are red, with what probability is the
second marble red? The tenth?
Answer: (r - 2)/(N - 2)

The positions of the marbles that we have information about do not matter.
Thus, again applying symmetry, we have that all remaining marbles have
probability (r - 2)/(N - 2) of being red.

Question: Multiple Unconditioned, None Conditioned

What is the probability that the first two marbles are red? The second and
third? The ninth and tenth?
Answer: (r/N) · ((r - 1)/(N - 1))

Again, by symmetry, we can argue that regardless of which two marbles, the
probability that any pair is red is the probability that one particular pair is red.
Note that symmetry doesn't apply within the pair, however. When considering
the second marble, we know that the first marble is necessarily red. When
computing the probability that the first and second marbles are red, we are
then computing Pr[X_1] Pr[X_2 | X_1], which is (r/N) · ((r - 1)/(N - 1)).


Question: Multiple Unconditioned, Multiple Conditioned

Given that the first marble is red and the second is not red, what is the
probability that the seventh marble is red and the ninth marble is not red?
Answer: ((r - 1)/(N - 2)) · ((N - r - 1)/(N - 3))

At this point, it should be apparent that we could have asked for the probability
that any two marbles are red. For the seventh marble, which is red, we consider
the number of red marbles we have no information about, which is r - 1, and
the total number of marbles we have no information about, N - 2. This makes
the probability that the seventh marble is red (r - 1)/(N - 2).

For the ninth marble, there are now N - 3 marbles we do not have information
about. There are r - 2 red marbles we do not have information about. Thus,
the number of remaining non-red marbles is (N - 3) - (r - 2) = N - r - 1,
making the probability that the ninth marble is not red (N - r - 1)/(N - 3).


3.3 Problems
1. We sample k times at random, without replacement, a coin from our wallet.
There are p pennies, n nickels, d dimes, and q quarters, making N total
coins. Given that the first three coins are pennies, what is the probability
that we will sample a nickel, 2 pennies, and a nickel in sequential order?
Note that this sequence of 4 coins does not need to be consecutive and
can span more than 4 coins.
2. We are playing a game with our apartment-mate Wayne. There are three
coins, one biased with probability p of heads and the other two fair coins.
First, each player is assigned one of three coins uniformly at random.
Players then flip simultaneously, where each player earns h points per
head. The winning player is the one with the most points. If Wayne earns
k points after n coin flips, what is the probability that Wayne has the
biased coin?
3. Let X and Y be the results from two numbers, chosen uniformly at random
from the range {1, 2, 3, ..., k}. Define Z = |X - Y|.

(a) Find the probability that Z < k - 2.

(b) Find the probability that Z ≥ 2.

4. Consider a 5 in. × 3 in. board, where each square inch is occupied by a
single tile. A monkey hammers away at the board, choosing a position
uniformly at random; assume a single strike completely removes a tile from
that position. (Note that the monkey can choose to strike a position with
no tiles.) By knocking off tiles, the monkey can create digits. For example,
the following would form the digit 3, where 0 denotes a missing tile and
1 denotes a tile.

0 0 0
1 1 0
0 0 0
1 1 0
0 0 0

(a) Given the monkey strikes n times, where n > 15, what is the probability
that the monkey knocks out tiles, such that the board forms 8?
(b) Given the monkey strikes n times, where n > 15, what is the probability
that the monkey knocks out tiles, such that the board forms 2?
(c) Given the monkey strikes n times, where n > 15, what is the probability
that the monkey knocks out tiles, such that the board forms an even
number?
(d) Which digit is most likely to occur?

Chapter 4

Expectation

4.1 Guide
With expectation, we begin to see that some quantities no longer make sense.
Expressions that we compute the expectation for may in fact be far detached
from any intuitive meaning. We will specifically target how to deal with these,
in the below regurgitations of expectation laws and definitions.

4.1.1 Expectation Definition


Expectation is, intuitively, the mean. We multiply each value by the probability
that that value occurs. The formula is as follows:

E[X] = Σ_x x · Pr[X = x]

However, we also know the following:

E[g(X)] = Σ_x g(x) · Pr[X = x]

Said another way, the expression in E[...], which we will call g(X), can be
needlessly complex. To solve such an expectation, simply plug the expression
into the summation. For example, say we need to solve for E[X^{2/5}]. This makes
little intuitive sense. However, we know the expression in terms of X affects
only the values, not the probabilities:

E[X^{2/5}] = Σ_x x^{2/5} · Pr[X = x]

This also extends to multiple random variables, so

E[g(X, Y)] = Σ_{x,y} g(x, y) · Pr[X = x, Y = y]

For example, we have that

E[(X + Y)^2] = Σ_{x,y} (x + y)^2 · Pr[X = x, Y = y]

4.1.2 Linearity of Expectation


Regardless of independence, linearity of expectation always holds. Said
succinctly, it is true that E[Σ_i a_i X_i] = Σ_i a_i E[X_i]. Said once more in a less
dense format, using constants a_i and random variables X_i:

E[a_1 X_1 + a_2 X_2 + ⋯ + a_n X_n] = a_1 E[X_1] + a_2 E[X_2] + ⋯ + a_n E[X_n]

Given a more complex combination of random variables, apply linearity of
expectation to solve.

4.1.3 Conditional Expectation


Here, we expand our definition of expectation:

E[Y | X = x] = Σ_y y · Pr[Y = y | X = x]

We know how to solve for Pr[Y = y | X = x], using definitions from the last
chapter.

4.1.4 Law of Total Expectation


The Law of Total Expectation states simply that

E[E[Y | X]] = E[Y]

However, the projection property provides a more general form, showing that
the law of total expectation is actually the special case where f(X) = 1:

E[E[Y | X] f(X)] = E[Y f(X)]


4.2 Linearity of Expectation Walkthrough


Let us begin with a simple linearity of expectation problem, where all of the
random variables are independent. However, as we will see, linearity of expectation
always holds, regardless of dependence between the involved random variables.
From this walkthrough, you should learn:

When a question asks how many, immediately think indicator variable.
The indicator variable is then used to indicate whether or not the ith trial
is a success. Sum over all indicators to compute the number of successes.

Option 1 for computing expectation: Directly compute a complex
expression for random variables, using the fact that
E[g(X, Y)] = Σ_{x,y} g(x, y) Pr[X = x, Y = y].

Option 2 for computing expectation: Expand an expectation that makes
little intuitive sense, such as E[(X + Y)^2] = E[X^2 + 2XY + Y^2].
Using linearity of expectation, we can then compute each term separately.

Question: Variables
Knowing that E[X] = 3, E[Y] = 2, E[Z] = 1, compute E[1X + 2Y + 3Z].
Answer: 10

Use linearity of expectation to expand E[X + 2Y + 3Z] = E[X] + 2E[Y] + 3E[Z] =
3 + 2(2) + 3(1) = 10.

Question: Independence
Let Z = X_1 + X_2 + X_3 + X_4, where each X_i is the number of pips for a dice
roll. What is E[Z]?
Answer: 14

We begin by computing E[Z] = E[X_1 + X_2 + X_3 + X_4], which, by linearity of
expectation, is the same as E[X_1] + E[X_2] + E[X_3] + E[X_4]. All of the X_i's are
identically distributed, making E[X_i] = 7/2 for all i. This makes
E[Z] = 4E[X_1] = 4 · 7/2 = 14.
We will now see a more surprising application of linearity of expectation, where
the random variables are dependent.

Question: Dependence
Consider a bag of k red marbles and k blue marbles. Without replacement, we
pull 4 marbles from the bag in sequential order; what is the expected number
of red marbles?
Answer: 2

The question asks for how many, so we know we need indicators to count
successes. Let us define Z = Σ_{i=1}^{4} X_i, where each X_i is 1 if the ith marble is
red. We know E[Z] = E[Σ_{i=1}^{4} X_i], and by linearity of expectation

E[Z] = Σ_{i=1}^{4} E[X_i]

For all indicators, E[X] = Pr[X = 1], so

E[Z] = Σ_{i=1}^{4} Pr[X_i = 1] = Σ_{i=1}^{4} 1/2 = 4 · 1/2 = 2

As linearity of expectation shows, the probability of a red marble on the first
draw is the same as the probability of a red marble on the fourth. This
symmetry applies to any set of samples where we are not given additional
information about the samples drawn.
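A quick simulation confirms the answer without any independence assumption (k = 10 is an arbitrary choice):

    import random

    k, trials = 10, 100_000
    bag = ["red"] * k + ["blue"] * k
    total = sum(random.sample(bag, 4).count("red") for _ in range(trials))
    print(total / trials)  # ~2.0, even though the draws are dependent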

Question: Algebra with Independence

Consider three random variables X, Y, and Z, where X is the outcome of a
dice roll, Y is the outcome of another dice roll, and Z = X + Y (Z is the sum
of the two dice roll outcomes). Compute E[Z] and E[Z^2]. Is E[Z^2] = E[Z]^2?
Answer: 7, 329/6 ≈ 54.8, No

First, we'll compute E[Z].

E[Z] = E[X + Y]
= E[X] + E[Y]
= 7/2 + 7/2
= 7

E[Z^2] is not guaranteed to be 49; E[Z^2] ≠ E[Z]^2 in general. While equality is
possible, it is not guaranteed.

E[Z^2] = E[(X + Y)^2]
= E[X^2 + 2XY + Y^2]
= E[X^2] + 2E[XY] + E[Y^2]

We will compute each term separately, to provide clarity.

First, we compute 2E[XY]. Since X and Y are independent, we can rewrite
this as

2E[X]E[Y] = 2 · (7/2)(7/2) = 49/2

Second, we compute E[X^2] = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)/6 = 91/6.

Third, note that E[Y^2] = E[X^2], since these are both dice rolls. Thus,

E[X^2] + 2E[XY] + E[Y^2]
= 91/6 + 49/2 + 91/6
= 329/6

This makes E[Z^2] = 329/6 ≈ 54.8, which is not E[Z]^2 = 49.

Question: More Dependence

Angie goes to the music building daily to practice singing. Each day, she
chooses one of n pieces at random. Given that some m < n of the pieces are
arias, let us impose an absolute ordering on these arias. If Angie practices for k
days, how many times will Angie practice all arias in sequential order, without
repeating an aria? (Note that this means Angie will necessarily spend m days,
practicing one aria a day, to finish one sequence.)
Answer: (k - m + 1) · 1/n^m

The question asks for how many, so we know we need indicators to count
successes. We define X to be the total number of sequences and X = Σ_i X_i,
where X_i is 1 iff Angie begins a sequence on day i. The last day Angie can
begin a sequence is day k - m + 1. Thus, we actually consider k - m + 1 trials.
Now, we compute the probability that, over m days, Angie picks exactly the right
m arias in sequential order. The probability for a particular X_i is thus 1/n^m.

E[X] = E[Σ_i X_i]
= Σ_i E[X_i]
= (k - m + 1) E[X_i]
= (k - m + 1) · 1/n^m


4.3 Dilution Walkthrough


For the following problems, some answers are so complex syntactically that
placing an upside-down miniature version would render the answer completely
illegible. As a consequence, most answers in this section are placed right-side
up in plain sight.

It is now year 3000, and watermelon is a sacred fruit. Everyone receives N
watermelons at birth. However, citizens of this future must participate in the
watermelon ceremonies annually. At this ritual, citizens pick 1 melon at
random, to replace with a cantaloupe.

Question: Two Cases

Given that a citizen has m watermelons at the nth year, what are all the
possible numbers of watermelons that this citizen can have in the (n + 1)th year,
and with what probability does each of these situations occur?

X_{n+1} = m      w.p. 1 - m/N
X_{n+1} = m - 1  w.p. m/N

In the first case, our number of watermelons does not change. This only occurs
if our pick is a cantaloupe. Since there are m watermelons, there are N - m
cantaloupes. Thus, the probability of picking a single cantaloupe is
(N - m)/N = 1 - m/N.
The second case falls out, as there are m watermelons, making the probability m/N.

Question: Three Cases

Let us suppose that a citizen now picks two melons at random at this
ritual. Given that a citizen has m watermelons at the nth year, what are all
the possible numbers of watermelons that this citizen can have in the (n + 1)th
year, and with what probability does each of these situations occur?

X_{n+1} = m      w.p. (N - m)(N - m - 1) / (N(N - 1))
X_{n+1} = m - 1  w.p. 2m(N - m) / (N(N - 1))
X_{n+1} = m - 2  w.p. m(m - 1) / (N(N - 1))

In the first case, our number of watermelons does not change. This only occurs
if both of our picks are cantaloupes. Since there are m watermelons, there are
N - m cantaloupes. Thus, the probability of picking a single cantaloupe is
(N - m)/N, and the probability of picking two cantaloupes is
(N - m)(N - m - 1) / (N(N - 1)).
Likewise, if the number of watermelons decreases by 1, we have chosen one
watermelon and one cantaloupe. This means we either chose the cantaloupe
first and the watermelon second, ((N - m)/N)(m/(N - 1)), or we chose the
watermelon first and the cantaloupe second, (m/N)((N - m)/(N - 1)). Summed
together, we have that the probability of one watermelon and one cantaloupe is
2m(N - m) / (N(N - 1)).

Finally, the probability of picking two watermelons is m(m - 1) / (N(N - 1)).

Question: Conditional Expectation, Two Cases

Again, let us consider the original scenario, where each citizen picks only 1
melon at random at the ritual. Given that a citizen has m watermelons at the
nth year, how many watermelons will the citizen have in year n + 1, on
average?
Answer: X_n (1 - 1/N)

We are effectively computing E[X_{n+1} | X_n = m]. We already considered all
possible values of X_{n+1} with their respective probabilities. So,

E[X_{n+1} | X_n = m] = Σ_x x · Pr[X_{n+1} = x | X_n = m]
= m(1 - m/N) + (m - 1)(m/N)
= m - m^2/N + m^2/N - m/N
= m - m/N
= m(1 - 1/N)

Since X_n = m, we substitute it in:

E[X_{n+1} | X_n] = X_n (1 - 1/N)
Question: Conditional Expectation, Three Cases
Let each citizen pick two melons per ritual. Given that a citizen has m
watermelons at the nth year, how many watermelons will the citizen have
in year n + 1, on average?
Answer: m(1 - 2/N)

We are effectively computing E[X_{n+1} | X_n = m]. We already considered all
possible values of X_{n+1} with their respective probabilities. So,

E[X_{n+1} | X_n = m] = Σ_x x · Pr[X_{n+1} = x | X_n = m]
= [m(N - m)(N - m - 1) + (m - 1) · 2m(N - m) + (m - 2) · m(m - 1)] / (N(N - 1))
= m(N - 1)(N - 2) / (N(N - 1))
= m(N - 2)/N

Since X_n = m, we substitute it in:

E[X_{n+1} | X_n] = X_n (1 - 2/N)

Question: Expectation
Let each citizen pick two melons per ritual. After n years, compute the
average number of watermelons a particular citizen will have left.
Answer: (1 - 2/N)^{n-1} N

We are now computing E[X_n]. First, we note that the law of total expectation
allows us to conclude the following:

E[X_{n+1}] = E[E[X_{n+1} | X_n]]
= E[X_n (1 - 2/N)]
= (1 - 2/N) E[X_n]

We have the following relationship:

E[X_n] = (1 - 2/N) E[X_{n-1}]

Since E[X_n] is recursively defined, we see that the constant in front of E[X_{n-1}]
will simply be multiplied repeatedly. Thus, we can express this in terms of
E[X_1]:

E[X_n] = (1 - 2/N)^{n-1} E[X_1]

Finally, we note that we began with N watermelons, so E[X_1] = N.

E[X_n] = (1 - 2/N)^{n-1} N

Question: Algebra
Let each citizen pick two melons per ritual. If all citizens begin with 100
watermelons, what is the minimum number of years such that the expected
number of cantaloupes a citizen has is at least 99?
Answer: 229

Just plug into our expression for E[X_n]. 99 cantaloupes means 1 watermelon.
Thus, we are solving for E[X_n] = 1.

E[X_n] = (1 - 2/100)^{n-1} · 100 = 1
(49/50)^{n-1} = 1/100
log (49/50)^{n-1} = log (1/100)

Using the log rule log a^n = n log a, we get:

(n - 1) log (49/50) = log (1/100)
n - 1 = log (1/100) / log (49/50)
n = 1 + log (1/100) / log (49/50)
n ≈ 229

It will take approximately 229 years.


4.4 Problems
1. Consider a set of books B. The spine of each book has a single character
inscribed, and this character is the only distinguishing characteristic. Placed
side-by-side in the right order, the spines of all books in B spell some string
s, RaoWalrandRule. What is the expected number of times s shows
up, if a monkey picks n books from B uniformly at random and places
them, uniformly at random, on a shelf? The books may be upside-down,
but the spines are always facing out. Hint: Do repeating characters affect
the probability?
2. Four square-inch tiles make up a 2 in. × 2 in. insignia, and each tile is
distinct. We have 257^2 such tiles. When the tiles are randomly arranged
in a 257 × 257 grid, what is the expected number of properly-formed
insignias? Assume that tiles can be rotated in any of 4 orientations but
not flipped.
3. Let X_i be the indicator that the ith dice roll has an even number of pips. Let
Z be the product of all X_i, from 1 to 10. Formally, Z = ∏_{i=1}^{10} X_i. Let Y
be the sum of all X_i, from 1 to 10. Formally, Y = Σ_{i=1}^{10} X_i.

(a) Compute E[Y |Z]. Remember that your result is a function of Z.


(b) Using your derivation from part (a), compute E[Y |Z > 0]. Explain
your answer intuitively.
(c) Using your derivation from part (a), compute E[Y |Z = 0].

Chapter 5

Distributions and Estimation

5.1 Guide
Distributions help us model common patterns in real-life situations. In the
end, being able to recognize distributions quickly and effectively is critical to
completing difficult probability problems.

5.1.1 Important Distributions


Here are several of the most important distributions and when to use them.

Binomial Distribution: Number of successes in n independent trials, where
each trial has probability p of success.

Geometric Distribution: Number of trials until the first success, where
each trial is independent and has probability p of success.

Poisson Distribution: The probability of k successes per unit of time,
given some average number λ of successes per unit of time.

5.1.2 Combining Distributions


The minimum across k geometric distributions, each with the same parameter
p, is X' ~ Geo(1 - (1 - p)^k).
The sum of any k independent Poisson distributions is another Poisson
distribution, X' ~ Pois(Σ_{i=1}^{k} λ_i).
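A quick empirical check of the geometric fact (p and k are arbitrary values of mine): the minimum of k independent Geo(p) variables behaves like a single Geo(1 - (1 - p)^k).

    import random

    def geo(p):
        """Sample Geo(p): number of trials until the first success."""
        n = 1
        while random.random() >= p:
            n += 1
        return n

    k, p, trials = 3, 0.2, 200_000
    mean = sum(min(geo(p) for _ in range(k)) for _ in range(trials)) / trials
    print(mean, 1 / (1 - (1 - p) ** k))  # both ~2.05 (a geometric's mean is 1/p)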

5.1.3 Variance
Variance is by definition equal to E[(X - E[X])^2]. After a brief derivation, we
get that Var(X) is then the following:

Var(X) = E[X^2] - E[X]^2

Now, we discuss how to algebraically manipulate variance. First, note that
shifting all values of X by a constant c will not change the variance. Second,
pulling a constant out of the variance squares the constant.

Var(X + c) = Var(X)
Var(aX) = a^2 Var(X)

5.1.4 Covariance
Covariance is by definition equal to E[(X - E[X])(Y - E[Y])]. After a brief
derivation, we get that Cov(X, Y) is then equal to the following:

Cov(X, Y) = E[XY] - E[X]E[Y]

Now, we discuss how to algebraically manipulate covariance. First, we can split sums.
Second, we can move constants out and apply the constant to either variable.

Cov(X_1 + X_2, Y) = Cov(X_1, Y) + Cov(X_2, Y)
a Cov(X, Y) = Cov(aX, Y) = Cov(X, aY)

5.1.5 Linearity of Variance


The variance of two random variables summed together is:

Var(X + Y) = Var(X) + 2Cov(X, Y) + Var(Y)

However, if X and Y are independent, we have that Cov(X, Y) = 0 and
Var(X + Y) = Var(X) + Var(Y). More generally, if all X_i's are independent,
linearity of variance holds: Var(X_1 + ⋯ + X_n) = Var(X_1) + ⋯ + Var(X_n), or:

Var(Σ_i X_i) = Σ_i Var(X_i)

If all X_i share a common distribution with identical parameters, then we also
know Σ_i Var(X_i) = n Var(X_1).

5.1.6 Linear Regression

L[Y | X] = E[Y] + (Cov(X, Y) / Var(X)) (X - E[X])


5.2 Variance Walkthrough


We will begin with the most basic form of variance, with independent random
variables. There are several important takeaways from this walkthrough:

1. If the random variables X_i are mutually independent, then Var(Σ_i X_i) = Σ_i Var(X_i).
2. The converse of this statement is not necessarily true.
3. Otherwise, we apply the more general definition of variance:
Var(X) = E[X^2] - E[X]^2.

Question: Variance Algebra

Let X be the number of pips for a single dice roll. Compute Var(6X^2 + 3).
Answer: 5369

Shifting a distribution (or similarly, a random variable's values) by a constant
does not affect the variance.

Var(6X^2 + 3) = Var(6X^2)

We pull out the constant, and substitute in the definition of variance.

= 36 Var(X^2)
= 36 (E[X^4] - E[X^2]^2)

We know that E[X^2] = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)/6 = 91/6. Likewise,
E[X^4] = (1^4 + 2^4 + 3^4 + 4^4 + 5^4 + 6^4)/6 = 2275/6. Finally, plug in and simplify.

= 36 (2275/6 - (91/6)^2)
= 36 (2275/6 - 8281/36)
= 6 · 2275 - 8281
= 5369

Question: Independent Events

Let X and Y be the number of pips for two separate dice rolls. Compute
Var(√6·X - √6·Y + 3).
Answer: 35

We first know that the shift does not affect variance. We then apply linearity of
variance, as X and Y are independent. Thus,

Var(√6·X - √6·Y + 3) = Var(√6·X - √6·Y)
= Var(√6·X) + Var(√6·Y)
= 6(Var(X) + Var(Y))

We can compute Var(X) = E[X^2] - E[X]^2 separately. We know that E[X] =
(1 + 2 + 3 + 4 + 5 + 6)/6 = 7/2. Likewise, E[X^2] = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)/6 = 91/6.
Thus,

Var(X) = E[X^2] - E[X]^2
= 91/6 - (7/2)^2
= 91/6 - 49/4
= 35/12

Now, we substitute in.

Var(√6·X - √6·Y + 3) = 6(Var(X) + Var(Y))
= 6(35/12 + 35/12)
= 6 · 35/6
= 35
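A Monte Carlo sanity check of the first question's answer (the sample size is an arbitrary choice):

    import random

    trials = 1_000_000
    vals = [6 * random.randint(1, 6) ** 2 + 3 for _ in range(trials)]
    mean = sum(vals) / trials
    var = sum((v - mean) ** 2 for v in vals) / trials
    print(var)  # ~5369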

Chapter 6

Bounds

6.1 Guide
6.1.1 Markov's Inequality
Markov's inequality offers a bound in one direction. Intuitively, it gives us an
upper bound on the probability that we are greater than some value. Keep in mind
that a > 0, and X cannot take negative values.

Pr[X ≥ a] ≤ E[X]/a

More generally, for a strictly non-negative, monotonically increasing function f,

Pr[X ≥ a] ≤ E[f(X)]/f(a)

6.1.2 Chebyshev's Inequality

Keep in mind that a > 0; intuitively, Chebyshev's gives an upper bound on the
probability that we are more than a distance a from the mean.

Pr[|X - E[X]| ≥ a] ≤ Var(X)/a^2

However, we may be interested in a lower bound on the probability that we are
less than a distance a from the mean. Thus, if Chebyshev's offers an upper bound
of p, we are actually interested in 1 - p.
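Both inequalities are loose but safe, as a quick check with a fair die shows (the threshold a = 5 and the distance 2 are arbitrary choices):

    import random

    trials = 100_000
    xs = [random.randint(1, 6) for _ in range(trials)]
    mean, var = 3.5, 35 / 12

    print(sum(x >= 5 for x in xs) / trials, mean / 5)             # 1/3 <= 0.7
    print(sum(abs(x - mean) >= 2 for x in xs) / trials, var / 4)  # 1/3 <= ~0.73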

6.1.3 Law of Large Numbers


Let E[X] be the average across all samples of data. Let E[X] be the actual
average. The Law of Large Numbers states that with many i.i.d. random
approaches E[X].
variables, E[X]


6.2 Confidence Intervals Walkthrough


This question was taken from the Spring 2016 CS70 Discussion 12.
On the planet Vegas, everyone carries a coin. Many people are honest and carry
a fair coin (heads on one side and tails on the other), but a fraction p of them
cheat and carry a trick coin with heads on both sides. You want to estimate p
with the following experiment: you pick a random sample of n people and ask
each one to flip his or her coin. Assume that each person is independently likely
to carry a fair or a trick coin.

Question: Estimation
Given the results of your experiment, how should you estimate p?

We are looking for p, the fraction of people with trick coins. Let p̃ denote our
estimate of this fraction, and let q̃ be the fraction of people we observe flipping
heads. Trick coins always show heads, and fair coins show heads half the time,
so in terms of p̃:

q̃ = p̃ + (1 - p̃)/2

This implies that

2q̃ = 2p̃ + (1 - p̃) = p̃ + 1
p̃ = 2q̃ - 1

Note that p̃ is the fraction of people we think have trick coins. This is different
from p, which is the actual fraction of people with trick coins. To express the
actual p in terms of our actual q, we drop the tildes. Note that this is in theory
dependent on the people we sample, so this is only an approximate equality
that becomes exact as we approach an infinite number of samples.

p ≈ 2q - 1

Question: Chebyshev's
How many people do you need to ask to be 95% sure that your answer is off
by at most 0.05?

We are looking for the difference between p̃ and p to be less than 0.05 with
probability 95%. We first note that Chebyshev's inequality naturally follows,
as Chebyshev's helps us bound the distance from the mean with a certain
probability. Formally, this is Chebyshev's:

Pr[|X - μ| ≥ a] ≤ var(X)/a^2


However, we are interested in finding an n so that we are off by at most 0.05 with
probability 95%. This is equivalent to being off by at least 0.05 with probability
5%. The latter is answerable by Chebyshev's.
Then, we follow three steps.

Step 1: Fit to |X - μ| ≥ a

We first deal only with "your answer is off by at most 0.05". We can re-express
this mathematically, with the following:

|p̃ - p| ≤ 0.05

We don't have p directly, so we plug in the expressions in terms of q̃ and q:

|(2q̃ - 1) - (2q - 1)| ≤ 0.05
|2q̃ - 2q| ≤ 0.05
|q̃ - q| ≤ 0.025

First, note that with infinitely many samples, the observed fraction q̃ naturally
converges to the true fraction q; in particular, E[q̃] = q. We can thus read the
expression above as something in Chebyshev's form:

|q̃ - E[q̃]| ≤ 0.025

However, we need to incorporate the number of people we are sampling. So, we
multiply everything by n:

|q̃n - qn| ≤ 0.025n

Let us consider again: what is q̃? We know that q̃ was previously defined to be
the fraction of people that we observe to have heads. We are inherently asking
for the number of heads in n trials. In other words, we want to count successes
among n trials, so this calls for Bernoulli (indicator) random variables! We will
define X_i to be 1 if the ith person reports heads. This makes

q̃ = (1/n) Σ_{i=1}^{n} X_i

To make our life easier, let us define another random variable Y = q̃n:

Y = q̃n = Σ_{i=1}^{n} X_i

Seeing that this now matches the format we need, our deviation a is 0.025n.
Our final form is

Pr[|Y - qn| ≥ 0.025n] ≤ var(Y) / (0.025n)^2

Equivalently,

Pr[|Y - qn| < 0.025n] ≥ 1 - var(Y) / (0.025n)^2
Step 2: Compute var(Y)/a^2

We first compute var(X_i).

var(X_i) = E[X_i^2] - E[X_i]^2
= q - q^2
= q(1 - q)

We then compute var(Y). Since the X_i are independent, linearity of variance
applies.

var(Y) = var(Σ_{i=1}^{n} X_i)
= Σ_{i=1}^{n} var(X_i)
= n var(X_i)
= nq(1 - q)

Thus, we have the value of our right-hand side.

var(Y)/a^2 = nq(1 - q) / (0.025n)^2
= q(1 - q) / (n(0.025)^2)

Step 3: Compute Bound

We now consider the remainder of our question: "How many people do you need
to ask to be 95% sure...". Per the paragraph right before Step 1, we are
actually interested in a probability of 5%. Thus, we want the following:

var(Y)/a^2 = q(1 - q) / (n(0.025)^2) ≤ 0.05

We have an issue, however: there are two variables, and we don't know q.
However, we can upper bound the quantity q(1 - q). Since Chebyshev's computes
an upper bound for the probability, we can substitute q(1 - q) with its maximum
value.

q(1 - q) = q - q^2

To find its maximum, we take the derivative and set it equal to 0:

1 - 2q = 0
q = 1/2

This means that q(1 - q) is maximized at q = 1/2, making the maximum value
for q(1 - q) equal to (1/2)(1/2) = 1/4. We now plug in 1/4.

1 - q(1 - q) / (n(0.025)^2) ≥ 0.95
q(1 - q) / (n(0.025)^2) ≤ 0.05
(1/4) / (n(0.025)^2) ≤ 0.05
1 / (4n(0.025)^2) ≤ 1/20
n ≥ 5 / (0.025)^2
n = 8000
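An empirical check of the final bound (the true p = 0.3 is made up): with n = 8000, the estimate 2q̃ - 1 lands within 0.05 of p far more than 95% of the time, since Chebyshev's is conservative.

    import random

    n, p, trials = 8000, 0.3, 2_000
    q = p + (1 - p) / 2                  # true chance a flip shows heads
    within = 0
    for _ in range(trials):
        heads = sum(random.random() < q for _ in range(n))
        p_hat = 2 * (heads / n) - 1      # plug-in estimate of p
        within += abs(p_hat - p) <= 0.05
    print(within / trials)               # well above 0.95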

Chapter 7

Markov Chains

7.1 Guide
Markov chains are closely tied to both linear algebra and differential equations.
We will explore connections with both to build a better sense of how Markov
chains work.

7.1.1 Definition
Formally, a Markov chain is a countable sequence of random variables satisfying
the memoryless (Markov) property: transitions to the next state depend
only on the current state.

7.1.2 Characterization
We are interested in three properties of Markov chains: (1) reducibility, (2)
periodicity, and (3) transience.

A Markov chain is irreducible if it can go from every state i to every other
state j, possibly in multiple steps.

d(i) := gcd{n > 0 | P^n(i, i) = Pr[X_n = i | X_0 = i] > 0} for every state i, where
d(i) = 1 if and only if the state i is aperiodic. The Markov chain is
aperiodic if and only if all states are aperiodic.

A distribution π is invariant for the transition probability matrix P if it
satisfies the following balance equation: π = πP. If a time-dependent
distribution converges, lim_{n→∞} π_n = π, the resulting distribution is then
called the stationary distribution or steady-state distribution.

7.1.3 Transition Probability Matrices


The Transition Probability Matrix (TPM) is written so that the rows sum to 1.
Each (i, j)th entry corresponds to the probability that we transition from state i
(row) to state j (column).


7.1.4 Balance Equations


Balance equations consider incoming edges in the Markov chain, where each
π(i) is the sum over all previous states h of π(h) multiplied by the probability
that h transitions to i. More succinctly,

∀i, π(i) = Σ_h P_{hi} π(h)

Note that we can obtain the balance equations by multiplying the TPM on the
left by the row vector π = [π(0), π(1), ..., π(n)].
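Numerically, the balance equations plus the normalization Σ_i π(i) = 1 form a linear system. A sketch using numpy with a made-up TPM:

    import numpy as np

    P = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])   # rows sum to 1
    n = P.shape[0]
    # Solve pi P = pi, i.e. (P^T - I) pi^T = 0, with sum(pi) = 1 appended.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    print(pi)        # [0.25 0.5 0.25]
    print(pi @ P)    # equals pi: the balance equations hold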

7.1.5 Important Theorems


Any finite, irreducible Markov chain has a unique invariant distribution.
Any irreducible, aperiodic Markov chain has a unique invariant distribution
that it will converge to, independent of the chains initial state.


7.2 Hitting Time Walkthrough


Hitting time questions ask for the expected amount of time to hit a state.
Note that in CS70, we consider only discrete-time Markov chains. The first
3 variants introduced below reduce to solving a system of linear equations.

[Markov chain diagram with states s0, s1, s2, s3: from s0, probability 1/4
self-loop, 1/4 to s1, and 1/2 to s2; from s1, probability 1/4 to s0, 1/2
self-loop, and 1/4 to s3; from s2, probability 1 to s0.]

Question: Linear Algebra with Transition Probability

Let X_i be the number of steps needed to reach s3 from state s_i. Compute E[X_0].
Answer: 16

Begin by writing the First Step Equations, with β(s_i) = E[X_i].

β(s0) = 1 + (1/4)β(s0) + (1/4)β(s1) + (1/2)β(s2)
β(s1) = 1 + (1/4)β(s0) + (1/2)β(s1) + (1/4)β(s3)
β(s2) = 1 + β(s0)
β(s3) = 0

Bring all constants to one side and all β(s_i) to the other.

1 = (3/4)β(s0) - (1/4)β(s1) - (1/2)β(s2)
1 = -(1/4)β(s0) + (1/2)β(s1) - (1/4)β(s3)
1 = -β(s0) + β(s2)
0 = β(s3)

Write the system of equations as an augmented matrix.

[ 3/4  -1/4  -1/2    0  |  1 ]
[-1/4   1/2    0   -1/4 |  1 ]
[ -1     0     1     0  |  1 ]
[  0     0     0     1  |  0 ]

Finally, solve the system of equations.

[ 1  0  0  0 | 16 ]
[ 0  1  0  0 | 10 ]
[ 0  0  1  0 | 17 ]
[ 0  0  0  1 |  0 ]

This means β(s0) = 16.
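The augmented system above can be handed directly to a linear solver; a sketch using numpy:

    import numpy as np

    # First Step Equations for the first question, as A beta = b.
    A = np.array([[ 3/4, -1/4, -1/2,  0.0],
                  [-1/4,  1/2,  0.0, -1/4],
                  [-1.0,  0.0,  1.0,  0.0],
                  [ 0.0,  0.0,  0.0,  1.0]])
    b = np.array([1.0, 1.0, 1.0, 0.0])
    print(np.linalg.solve(A, b))  # [16. 10. 17.  0.]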

[The same Markov chain, now annotated with per-state wait times in
parentheses: s0 (4), s1 (1), s2 (2).]

Question: Linear Algebra with Transition Probability, Wait Time

Let X_i be the number of steps needed to reach s3 from state s_i. In the Markov
chain above, the number in parentheses for a state represents the number of
steps needed to pass through that state. Compute E[X_0].

Begin by writing the First Step Equations.

β(s0) = 4 + (1/4)β(s0) + (1/4)β(s1) + (1/2)β(s2)
β(s1) = 1 + (1/4)β(s0) + (1/2)β(s1) + (1/4)β(s3)
β(s2) = 2 + β(s0)
β(s3) = 0

Bring all constants to one side and all β(s_i) to the other.

4 = (3/4)β(s0) - (1/4)β(s1) - (1/2)β(s2)
1 = -(1/4)β(s0) + (1/2)β(s1) - (1/4)β(s3)
2 = -β(s0) + β(s2)
0 = β(s3)

Write the system of equations as an augmented matrix.

[ 3/4  -1/4  -1/2    0  |  4 ]
[-1/4   1/2    0   -1/4 |  1 ]
[ -1     0     1     0  |  2 ]
[  0     0     0     1  |  0 ]

Finally, solve the system of equations.

[ 1  0  0  0 | 44 ]
[ 0  1  0  0 | 24 ]
[ 0  0  1  0 | 46 ]
[ 0  0  0  1 |  0 ]

This means β(s0) = 44.

[The same Markov chain with wait times s0 (4), s1 (1), s2 (2), and with edge
times of (4) on the edges s0 → s1, s1 → s3, and s2 → s0.]

Question: Lin. Alg. with Trans. Prob., Wait Time, Trans. Time
Let X_i be the number of steps needed to reach s3 from state s_i. In the Markov
chain above, the number in parentheses for a state represents the number of
steps needed to pass through that state. The numbers in parentheses for an edge
represent the number of steps needed to pass through that edge. If no
number is specified, the edge takes only 1 step. Compute E[X_0].

Begin by writing the First Step Equations.

β(s0) = 4 + (1/4)β(s0) + (1/4)(β(s1) + 4) + (1/2)β(s2)
β(s1) = 1 + (1/4)β(s0) + (1/2)β(s1) + (1/4)(β(s3) + 4)
β(s2) = 2 + (β(s0) + 4)
β(s3) = 0

Simplify all constants.

β(s0) = 5 + (1/4)β(s0) + (1/4)β(s1) + (1/2)β(s2)
β(s1) = 2 + (1/4)β(s0) + (1/2)β(s1) + (1/4)β(s3)
β(s2) = 6 + β(s0)
β(s3) = 0

Bring all constants to one side and all β(s_i) to the other.

5 = (3/4)β(s0) - (1/4)β(s1) - (1/2)β(s2)
2 = -(1/4)β(s0) + (1/2)β(s1) - (1/4)β(s3)
6 = -β(s0) + β(s2)
0 = β(s3)

Write the system of equations as an augmented matrix.

[ 3/4  -1/4  -1/2    0  |  5 ]
[-1/4   1/2    0   -1/4 |  2 ]
[ -1     0     1     0  |  6 ]
[  0     0     0     1  |  0 ]

Finally, solve the system of equations.

[ 1  0  0  0 | 72 ]
[ 0  1  0  0 | 40 ]
[ 0  0  1  0 | 78 ]
[ 0  0  0  1 |  0 ]

This means β(s0) = 72.

Chapter 8

Solutions

This section contains completely-explained solutions for each of the problems
provided. Each one of these problems is designed to be at exam level or harder,
erring on the side of difficulty. The goal is to touch on all major topics presented
in that chapter. In each of the following solutions, we identify Takeaways for
every question at the bottom. You should understand just how the solution
appeals to those takeaways, and on the exam, be prepared to apply the tips and
tricks presented here.


8.1 Counting
1. If we roll a standard 6-sided die 3 times, how many ways are there to roll
a sum total of 14 pips where all rolls have an even number of pips?
Solution: We can reduce all of our dice to 3-sided dice that contain only even numbers. Additionally, we can consider a reduced subproblem. In the original problem, we can only combine the numbers 2, 4, 6 for a total of 14. This is the same as counting the number of ways to combine 1, 2, 3 for a total of 7.
Distributing $x$ to a die roll is the same as assigning it $2x$ pips. Since a die roll can have at most 6 pips, $x$ is at most 3 for a single roll. Since dice do not have a 0 side, $x$ is at least 1. Thus, we are distributing 7 balls among 3 bins with at most 3 balls and at least 1 ball in each bin.
By 2.2 Stars and Bars Walkthrough: At Least, we first distribute 1 ball to each bin, reducing the problem to 4 balls and 3 bins, for $\binom{6}{2}$ ways.
By 2.2 Stars and Bars Walkthrough: At Most, we can identify two classes of invalid combinations:

- Distribute all 4 balls to one bin: $\binom{3}{1}$.
- Distribute 3 balls to one bin. There are then 2 other bins to pick from, for the last ball: $\binom{3}{1}\binom{2}{1}$.
In sum, we then have (1) all ways to distribute 7 balls with at least 1 in each bin, minus (2) all the ways to get more than 3 balls in a single bin.

$$\binom{6}{2} - \binom{3}{1} - \binom{3}{1}\binom{2}{1} = 15 - 3 - 6 = 6$$

Takeaway: Reduce complex stars and bars to the most basic form.
Alternate Solution:
First, we distribute 1 ball to each bin, reducing the problem to distributing
4 balls among 3 bins such that each bin contains no more than 2 balls.
The possibilities can be enumerated: 2 + 2 + 0, 2 + 0 + 2, 0 + 2 + 2, 1 + 1 +
2, 1 + 2 + 1, 2 + 1 + 1. Hence, there are 6 total ways.
Takeaway: When the options are few enough, enumerate.
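Either count is small enough to confirm by brute force; a quick sketch of our own verification, not part of the original solution:

```python
from itertools import product

# Count ordered rolls (a, b, c) of a standard die that sum to 14,
# where every roll shows an even number of pips.
count = sum(1 for rolls in product(range(1, 7), repeat=3)
            if sum(rolls) == 14 and all(r % 2 == 0 for r in rolls))
print(count)  # 6
```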
2. Given a standard 52-card deck and a 5-card hand, how many unique hands
are there with at least 1 club and no aces?
Solution: Let $A$ be the event "at least 1 club" and $B$ be the event "no aces". We are looking for $|A \cap B|$.
Note that computing $|A \cap B|$ directly is potentially tedious. Instead of considering $A$, at least 1 club, it is simpler to consider $\bar{A}$, no clubs. Thus, we rewrite $|A \cap B| = |B| - |\bar{A} \cap B|$. To compute $|\bar{A} \cap B|$, we examine all combinations with no aces and no clubs. We are drawing from $52 - 4 - 12 = 36$ cards (removing the 4 aces and the 12 non-ace clubs), making $|\bar{A} \cap B| = \binom{36}{5}$. Consider all hands with no aces. This is $|B| = \binom{48}{5}$. Thus, we have

$$|A \cap B| = |B| - |\bar{A} \cap B| = \binom{48}{5} - \binom{36}{5}$$

3. Given a standard 52-card deck and a 5-card hand, how many unique hands
are there with at least 1 club or no aces?
Solution: Again, let $A$ be the event "at least 1 club" and $B$ the event "no aces". We are looking for $|A \cup B| = |A| + |B| - |A \cap B|$.
From the previous part, we have $|A \cap B| = |B| - |\bar{A} \cap B|$. Thus, we first simplify the original $|A \cup B|$ algebraically.

$$|A \cup B| = |A| + |B| - |A \cap B| = |A| + |B| - (|B| - |\bar{A} \cap B|) = |A| + |\bar{A} \cap B|$$

We now compute $|A|$. Again, $|\bar{A}|$ is easier to compute, so we consider $|A| = |\Omega| - |\bar{A}|$. $|\bar{A}|$, the number of combinations without any clubs, is $\binom{39}{5}$. Thus, the number of combinations with at least one club is $|A| = \binom{52}{5} - \binom{39}{5}$. We now compute $|A \cup B| = |A| + |\bar{A} \cap B|$:

$$\binom{52}{5} - \binom{39}{5} + \binom{36}{5}$$
Takeaway: Draw a Venn diagram, and compute the simpler portion.
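Both card-counting answers are easy to sanity-check numerically; a minimal sketch with math.comb:

```python
from math import comb

no_aces = comb(48, 5)                          # |B|
no_aces_no_clubs = comb(36, 5)                 # |A-bar ∩ B|
at_least_one_club = comb(52, 5) - comb(39, 5)  # |A|

print(no_aces - no_aces_no_clubs)              # problem 2: |A ∩ B|
print(at_least_one_club + no_aces_no_clubs)    # problem 3: |A ∪ B|
```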
4. Given a standard 52-card deck and a 3-card hand, how many unique
hands are there with cards that sum to 15? (Hint: Each card is uniquely
identified by both a number and a suit. This problem is more complex than
phone numbers.)
Solution:
Numbers
First, a card has a maximum value of 13. Implicitly, we are thus looking for all the ways to distribute 15 among 3 bins, where any bin has at most 13 and at least 1. First, we distribute a value of 1 to each bin, so the problem reduces to distributing 12 among 3 bins, where each bin holds at most 12. By stars and bars, the answer here is just $\binom{14}{2}$.
Suits
There are a total of $\binom{4}{1}^3$ ways to pick suits. The only problem is when we assign the same number and suit to two different cards. We will thus consider all invalid combinations.


- It is possible that each card is a 5. There are $4^3$ total assignments of suits to the cards, and only $4!/1! = 24$ of them are valid. Hence, there are $4^3 - 24$ invalid combinations here.
- Let us count the number of invalid combinations in which exactly 2 of the cards are assigned the same number. The number that the two cards share can range from 1 to 7, excluding 5 (since if two cards are 5, this forces the third card to also be 5, which corresponds to the previous case of all 3 cards having the same value). There are $\binom{3}{2}$ ways to choose the locations of the repeated numbers, $\binom{6}{1}$ ways to choose the number which is repeated, 4 ways to choose the suit of the repeated numbers (in an invalid combination, the repeated numbers have the same suit), and 4 ways to choose the suit of the last number, for a total of $3 \cdot 6 \cdot 4 \cdot 4 = 288$ invalid combinations.

We thus have (1) all ways to distribute 15 validly, multiplied by (2) all possible suit choices, and finally, minus the invalid suit and number assignments.

$$\binom{14}{2}\binom{4}{1}^3 - (4^3 - 24) - 288 = 5496$$

Takeaway: Beware of over-counting.
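As a sanity check, a brute-force enumeration over ordered triples of distinct cards reproduces the count above (our own verification, not part of the original solution):

```python
from itertools import permutations

# Cards are (value, suit) pairs; values run 1..13, suits 0..3.
deck = [(v, s) for v in range(1, 14) for s in range(4)]
count = sum(1 for triple in permutations(deck, 3)
            if sum(card[0] for card in triple) == 15)
print(count)  # 5496
```

Note that 5496 counts ordered triples of distinct cards; if hands are treated as unordered, divide by $3!$ to get 916.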


8.2 Probability
1. We sample $k$ coins at random, without replacement, from our wallet.
There are p pennies, n nickels, d dimes, and q quarters, making N total
coins. Given that the first three coins are pennies, what is the probability
that we will sample a nickel, 2 pennies, and a nickel in sequential order?
Note that this sequence of 4 coins does not need to be consecutive and
can span more than 4 coins.
Solution: First, we consider what we know about our wallet. We know that there are $N - 3$ coins remaining, of which $p - 3$ are pennies. Let $D_i$ denote the denomination of the $i$th coin, and let $a < b < c < d$ be the indices in sequential order. Let $X_a$ be the event that $D_a$ is a nickel, $X_b$ be the event that $D_b$ is a penny, etc. We are thus interested in computing the following.

$$Pr[X_a \cap X_b \cap X_c \cap X_d]$$

The probability of a nickel at some position is $\frac{n}{N-3}$. There are then $N - 4$ coins remaining, of which $n - 1$ are nickels.

$$Pr[X_a] = \frac{n}{N-3}$$

The probability of a penny at some later position, regardless of which position it is, is $\frac{p-3}{N-4}$. There are then $N - 5$ coins remaining, of which $p - 4$ are pennies.

$$Pr[X_b|X_a] = \frac{p-3}{N-4}$$

The probability of another penny at some later position is $\frac{p-4}{N-5}$. There are then $N - 6$ coins remaining.

$$Pr[X_c|X_a, X_b] = \frac{p-4}{N-5}$$

The probability of another nickel is $\frac{n-1}{N-6}$.

$$Pr[X_d|X_a, X_b, X_c] = \frac{n-1}{N-6}$$

Since all samples are made at random, the following holds.

$$Pr[X_a \cap X_b \cap X_c \cap X_d] = Pr[X_a]\,Pr[X_b|X_a]\,Pr[X_c|X_a, X_b]\,Pr[X_d|X_a, X_b, X_c] = \frac{n}{N-3}\cdot\frac{p-3}{N-4}\cdot\frac{p-4}{N-5}\cdot\frac{n-1}{N-6}$$

We made this argument by symmetry; see 3.2 Symmetry Walkthrough if this is confusing.

Takeaway: Symmetry can greatly simplify problems.
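A Monte Carlo check of the product, under an assumed wallet (the coin counts below are our own illustration). By the symmetry argument, it suffices to test the four positions immediately after the three known pennies:

```python
import random

p, n, d, q = 10, 8, 6, 4      # assumed wallet: pennies, nickels, dimes, quarters
N = p + n + d + q

# Conditioning on the first three coins being pennies leaves this multiset.
rest = ["P"] * (p - 3) + ["N"] * n + ["D"] * d + ["Q"] * q

trials, hits = 100_000, 0
for _ in range(trials):
    random.shuffle(rest)
    hits += rest[:4] == ["N", "P", "P", "N"]

exact = n/(N-3) * (p-3)/(N-4) * (p-4)/(N-5) * (n-1)/(N-6)
print(hits / trials, exact)    # the two values should be close
```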


2. We are playing a game with our apartment-mate Wayne. There are three
coins, one biased with probability p of heads and the other two fair coins.
First, each player is assigned one of three coins uniformly at random.
Players then flip simultaneously, where each player earns h points per
head. The winning player is the one with the most points. If Wayne earns
k points after n coin flips, what is the probability that Wayne has the
biased coin?

Solution: Let $X$ be the number of heads that Wayne obtains, and let $Y$ be the event that Wayne has the biased coin. By Bayes Rule, we know the following.

$$Pr[Y|X] = \frac{Pr[X|Y]\,Pr[Y]}{Pr[X|Y]\,Pr[Y] + Pr[X|\bar{Y}]\,Pr[\bar{Y}]}$$

We will first compute the easiest terms. Given that we have three coins with one biased, the probability of being assigned the biased coin is $\frac{1}{3}$ and, for a fair coin, $\frac{2}{3}$.

$$Pr[Y] = \frac{1}{3}, \qquad Pr[\bar{Y}] = \frac{2}{3}$$

We will now compute both conditionals, given $Y$ and $\bar{Y}$. Restated in English, we are computing the probability of $\frac{k}{h}$ successes given $n$ trials. The number of successes follows the $Bin(n, p)$ distribution, and the general formula is

$$Pr[Bin(n, p) = k] = \binom{n}{k}(1-p)^{n-k}p^k$$

For the biased coin, the probability of heads is $p$ (case $Y$). For the fair coin, the probability of heads is $\frac{1}{2}$ (case $\bar{Y}$). To simplify, we will define $k' = \frac{k}{h}$, which gives us the number of heads that Wayne obtained.

$$Pr[X = k'|Y] = \binom{n}{k'}(1-p)^{n-k'}p^{k'}$$

$$Pr[X = k'|\bar{Y}] = \binom{n}{k'}\frac{1}{2^n}$$

We have computed all values, so we plug into Bayes Rule and simplify.


$$Pr[Y|X] = \frac{Pr[X|Y]\,Pr[Y]}{Pr[X|Y]\,Pr[Y] + Pr[X|\bar{Y}]\,Pr[\bar{Y}]} = \frac{Pr[X|Y]\cdot\frac{1}{3}}{Pr[X|Y]\cdot\frac{1}{3} + Pr[X|\bar{Y}]\cdot\frac{2}{3}} = \frac{Pr[X|Y]}{Pr[X|Y] + 2\,Pr[X|\bar{Y}]}$$

$$= \frac{\binom{n}{k'}(1-p)^{n-k'}p^{k'}}{\binom{n}{k'}(1-p)^{n-k'}p^{k'} + \binom{n}{k'}\frac{1}{2^n}\cdot 2} = \frac{(1-p)^{n-k'}p^{k'}}{(1-p)^{n-k'}p^{k'} + 2^{1-n}}$$

Takeaway: Remember Bayes Rule.
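A small numeric sketch of the final expression, with illustrative values $n = 10$ flips, $k' = 8$ heads, and bias $p = 0.8$ (all assumed for the example):

```python
def posterior_biased(n, k_heads, p):
    """Pr[Wayne has the biased coin | k_heads heads in n flips]."""
    like_biased = (1 - p)**(n - k_heads) * p**k_heads  # binomial terms cancel
    like_fair = 2**(1 - n)                             # 2 * (1/2)**n
    return like_biased / (like_biased + like_fair)

print(posterior_biased(10, 8, 0.8))
```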


3. Let $X$ and $Y$ be two numbers chosen uniformly at random from the range $\{1, 2, 3, \ldots, k\}$. Define $Z = |X - Y|$.

(a) Find the probability that $Z < k - 2$.
Solution: Instead of computing $Pr[Z < k - 2]$ directly, it is easier to compute $1 - Pr[Z < k - 2] = Pr[Z \geq k - 2]$, as this includes only two possible values for $Z$.

$$Pr[Z \geq k - 2] = Pr[Z = k - 2] + Pr[Z = k - 1]$$

We can now compute each probability independently. There are a total of $k^2$ possible combinations:

- For $Pr[Z = k - 1]$, we have 1 possible combination of numbers, $(k, 1)$, with 2 ways of rolling it, $(k, 1)$ or $(1, k)$, making 2 total combinations: $\frac{2}{k^2}$.
- For $Pr[Z = k - 2]$, we have 2 possible combinations of numbers, $(k, 2)$ and $(k - 1, 1)$, with 2 ways to roll each, making 4 total combinations: $\frac{4}{k^2}$.

$$Pr[Z \geq k - 2] = \frac{4}{k^2} + \frac{2}{k^2} = \frac{6}{k^2}$$

We thus solve for $Pr[Z < k - 2]$.

$$Pr[Z < k - 2] = 1 - Pr[Z \geq k - 2] = 1 - \frac{6}{k^2}$$
Takeaway: Use counting where applicable.
(b) Find the probability that $Z \geq 2$.
Solution: Again, we apply the same trick as in the previous part. It is easier to compute $Pr[Z < 2] = 1 - Pr[Z \geq 2]$, since we then have only two possible values of $Z$ to account for.

$$Pr[Z < 2] = Pr[Z = 0] + Pr[Z = 1]$$

There are a total of $k^2$ combinations.

- We know that $Z = 0$ implies that both rolls yielded the same number. Thus, the first roll has $k$ options and, for each value the first roll assumes, the second has 1, making $k$ total combinations: $\frac{k}{k^2}$.
- We know that $Z = 1$ implies that the rolls are within 1 of each other. Let us consider two cases: (1) $X \in \{1, k\}$ or (2) $1 < X < k$. If the first roll is 1 or $k$, then the second roll has only one option each, 2 or $k - 1$, respectively. If the first roll satisfies $1 < X < k$ ($k - 2$ possibilities), then $Y$ has two options each, making $2(k - 2)$ possible combinations. In total: $\frac{2 + 2(k - 2)}{k^2} = \frac{2(k - 1)}{k^2}$.

$$Pr[Z < 2] = \frac{k}{k^2} + \frac{2(k - 1)}{k^2} = \frac{3k - 2}{k^2}$$

We thus solve for $Pr[Z \geq 2]$.

$$Pr[Z \geq 2] = 1 - Pr[Z < 2] = 1 - \frac{3k - 2}{k^2} = \frac{k^2 - 3k + 2}{k^2} = \frac{(k - 2)(k - 1)}{k^2}$$
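Both closed forms can be verified exactly by enumerating all $k^2$ pairs; a short check with $k = 10$ (an arbitrary choice):

```python
from fractions import Fraction
from itertools import product

k = 10
pairs = list(product(range(1, k + 1), repeat=2))
total = len(pairs)

pr_a = Fraction(sum(abs(x - y) < k - 2 for x, y in pairs), total)
pr_b = Fraction(sum(abs(x - y) >= 2 for x, y in pairs), total)

print(pr_a == 1 - Fraction(6, k**2))              # part (a): True
print(pr_b == Fraction((k - 2) * (k - 1), k**2))  # part (b): True
```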

4. Consider a 5 in. × 3 in. board, where each square inch is occupied by a single tile. A monkey hammers away at the board, choosing a position uniformly at random; assume a single strike completely removes a tile from that position. (Note that the monkey can choose to strike a position with no tiles.) By knocking off tiles, the monkey can create digits. For example, the following would form the digit 3, where 0 denotes a missing tile and 1 denotes a tile.


0 0 0
1 1 0
0 0 0
1 1 0
0 0 0

(a) Given the monkey strikes n times, where n > 15, what is the probability
that the monkey knocks out tiles, such that the board forms 8?
Solution: To knock out an 8, the monkey needs to achieve the
following board.

0 0 0
0 1 0
0 0 0
0 1 0
0 0 0

We can consider this as balls and bins, where we have $n$ balls being thrown into 15 bins, except that all balls are distinguishable. The total number of ways to make $n$ strikes among the 15 positions is $15^n$. We wish to avoid the 2 tiles and to strike all other 13 tiles. Thus, we are throwing $n$ balls into 13 bins, where each bin receives at least 1 ball. Since order matters, we cannot apply stars and bars directly.
i. Consider all the ways to distribute $n$ balls among 13 bins: $13^n$.
ii. Subtract all cases where 1 bin is left empty. First, we choose the bin that is left empty, $\binom{13}{1}$; then we distribute the $n$ balls among the remaining 12 bins, $12^n$. Together, this is $-\binom{13}{1}12^n$.
iii. By inclusion-exclusion, we have double-counted all cases where 2 bins are left empty. Choose the 2 bins that are left empty, $\binom{13}{2}$, and distribute to the remaining bins, $11^n$. This is $+\binom{13}{2}11^n$.

We notice a pattern: for $0 \leq i \leq 13$, we select $i$ to be the number of bins that are empty and then distribute $n$ balls among the remaining $13 - i$ bins. By inclusion-exclusion, we include and exclude alternately. The total number of ways to distribute $n$ balls among 13 bins, where each bin receives at least one ball, is thus

$$\sum_{i=0}^{13}(-1)^i\binom{13}{i}(13 - i)^n$$

The probability that this digit occurs is thus


$$\frac{1}{15^n}\sum_{i=0}^{13}(-1)^i\binom{13}{i}(13 - i)^n$$

Generally, let $a$ be the number of 0s in the matrix. The probability that the monkey forms a particular number with $a$ zeros is

$$\frac{1}{15^n}\sum_{i=0}^{a}(-1)^i\binom{a}{i}(a - i)^n$$

Takeaway: Don't overthink, and consider counting.
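The inclusion-exclusion sum is easy to evaluate numerically; a minimal sketch (the strike count n = 40 is an arbitrary illustration):

```python
from math import comb

def prob_digit(a, n):
    """Probability that n uniform strikes land only on a given set of `a`
    of the 15 positions and cover all of them (inclusion-exclusion)."""
    return sum((-1)**i * comb(a, i) * (a - i)**n for i in range(a + 1)) / 15**n

print(prob_digit(13, 40))  # probability the board shows an 8 after 40 strikes
```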


(b) Given the monkey strikes n times, where n > 15, what is the probability
that the monkey knocks out tiles, such that the board forms 2?
Solution: To knock out a 2, the monkey needs to achieve the following board.

0 0 0
1 1 0
0 0 0
0 1 1
0 0 0
By the same logic as the previous part, we can see that there are 4
positions we avoid and 11 positions we must hit. Using the general
form above, we see that the number of 0s (a) is 11. Thus, the
probability of a 2 is:

$$\frac{1}{15^n}\sum_{i=0}^{11}(-1)^i\binom{11}{i}(11 - i)^n$$

(c) Given the monkey strikes n times, where n > 15, what is the probability
that the monkey knocks out tiles, such that the board forms an even
number?
Solution: First, let us count the number of 0s required to form
each digit.
digit 0 : 12 zeros
digit 2 : 11 zeros
digit 4 : 9 zeros
digit 6 : 12 zeros
digit 8 : 13 zeros
Let the probability that a digit with $a$ zeros occurs be

$$p_a = \frac{1}{15^n}\sum_{i=0}^{a}(-1)^i\binom{a}{i}(a - i)^n$$

Applying $p_a$, we see that the probability that any of these even numbers occurs is:

$$2p_{12} + p_{11} + p_9 + p_{13}$$

(d) Which digit is most likely to occur?


Solution: The probability that a digit occurs depends solely upon the number of zeros present in that digit, and for large $n$ the digit requiring the most zeros is the most likely. Thus, we look for the digit with the most zeros. We enumerate all digits and their corresponding 1s and 0s.

Digit   ones   zeros
0       3      12
1       6      9
2       4      11
3       4      11
4       6      9
5       4      11
6       3      12
7       6      9
8       2      13
9       3      12
We note that 8 has the most zeros. Thus, the digit 8 is most likely
to occur.
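Reusing prob_digit from the sketch above, we can confirm the comparison numerically for a fixed large n (again, n = 40 is arbitrary):

```python
zeros = {0: 12, 1: 9, 2: 11, 3: 11, 4: 9, 5: 11, 6: 12, 7: 9, 8: 13, 9: 12}
n = 40
probs = {digit: prob_digit(a, n) for digit, a in zeros.items()}
print(max(probs, key=probs.get))  # 8
```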


8.3 Expectation
1. Consider a set of books B. The spine of each book has a single character
inscribed, and this character is the only distinguishing characteristic. Placed
side-by-side in the right order, the spines of all books in B spell some string
s, RaoWalrandRule. What is the expected number of times s shows
up, if a monkey picks n books from B uniformly at random and places
them, uniformly at random, on a shelf? The books may be upside-down,
but the spines are always facing out. Hint: Do repeating characters affect
the probability?
Solution: This is similar to the last problem in the 4.2 Linearity of Expectation Walkthrough. However, we are drawing letters from s instead of the entire alphabet. We realize the question is asking for an expected number of successes once more, so we immediately define our indicator variable: $X_i$ is 1 if an instance of s ends at the $i$th position along the shelf.

The string is 14 letters long, meaning that the first 13 books on the shelf cannot be the position where an instance of s ends. This makes $X = \sum_{i=14}^{n} X_i$ the total number of times s appears on the shelf. By linearity of expectation:

$$E[X] = E\left[\sum_{i=14}^{n} X_i\right] = \sum_{i=14}^{n} E[X_i] = \sum_{i=14}^{n} Pr[X_i = 1]$$

We now compute the probability of success, which is the probability that s is spelled, given some random sampling of letters from s. For the first letter, sampling randomly from s gives us a $\frac{3}{14}$ probability of retrieving an R, as there are 14 total letters, 3 of which are R. We proceed to compute the product of these probabilities, one for each letter.

$$E[X] = \sum_{i=14}^{n} \frac{3 \cdot 3 \cdot 1 \cdot 1 \cdot 3 \cdot 2 \cdot 3 \cdot 3 \cdot 1 \cdot 1 \cdot 3 \cdot 1 \cdot 2 \cdot 1}{14^{14}} = \sum_{i=14}^{n} \frac{3^6 \cdot 2^2}{14^{14}} = (n - 13)\,\frac{3^6 \cdot 2^2}{14^{14}}$$
Takeaway: Be clever with how you define success for your indicator
variables. Remember linearity of expectation always holds.
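A compact check of the per-position success probability, treating upper- and lower-case letters as the same character, as the solution does (the shelf length n = 100 is an assumed illustration):

```python
from collections import Counter

s = "RaoWalrandRule".lower()
counts = Counter(s)

# Probability that 14 uniform draws from the letters of s spell s.
prob = 1.0
for ch in s:
    prob *= counts[ch] / len(s)

n = 100
print((n - 13) * prob)  # expected number of appearances of s
```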


2. Four square-inch tiles make up a 2 in. × 2 in. insignia, and each tile is distinct. We have $257^2$ such tiles. When the tiles are randomly arranged in a $257 \times 257$ grid, what is the expected number of properly-formed insignias? Assume that tiles can be rotated in any of 4 orientations but not flipped.
Solution: There are $256 \times 256 = 4^8$ interior vertices. Let $X_i$ be 1 if the $i$th interior vertex is at the center of a valid insignia. Thus, we know that $X = \sum_{i=1}^{4^8} X_i$ is the total number of valid insignias. We can apply linearity of expectation and then invoke properties of an indicator variable.

$$E[X] = E\left[\sum_{i=1}^{4^8} X_i\right] = \sum_{i=1}^{4^8} E[X_i] = 4^8\,E[X_i] = 4^8\,Pr[X_i = 1]$$

We can use counting to compute the probability that vertex $i$ is the center of a valid insignia. First, there are 4 valid orientations for the insignia. Second, we compute all possible configurations of the 4 surrounding tiles. Each of the four tiles is chosen at random and then rotated at random, making 4 possible tiles with 4 possible orientations each (16 options per tile). Thus there are a total of $16^4 = 4^8$ combinations.

$$Pr[X_i = 1] = \frac{4}{4^8} = \frac{1}{4^7}$$
We now plug it back in to solve for E[X].

$$E[X] = 4^8\,Pr[X_i = 1] = 4^8 \cdot \frac{1}{4^7} = 4$$

The takeaway below was mentioned prior, but it is worth mentioning


again.

Takeaway: Define your indicator variables cleverly.


3. Let $X_i$ be the indicator that the $i$th die roll has an even number of pips. Let $Z$ be the product of all $X_i$, from 1 to 10; formally, $Z = \prod_{i=1}^{10} X_i$. Let $Y$ be the sum of all $X_i$, from 1 to 10; formally, $Y = \sum_{i=1}^{10} X_i$.


(a) Compute E[Y |Z]. Remember that your result is a function of Z.


Solution: We will begin by making algebraic manipulations. First,
we plug in Y . Then, by linearity of expectation, we move the summation
out.

$$E[Y|Z] = E\left[\sum_{i=1}^{10} X_i \,\middle|\, Z\right] = \sum_{i=1}^{10} E[X_i|Z] = 10\,E[X_i|Z]$$

Separately, we will now compute $E[X_i|Z]$. We note that for an indicator variable $X_i$, $E[X_i] = Pr[X_i = 1]$. The analogous conditional form states the following.

$$E[X_i|Z] = Pr[X_i = 1|Z]$$

Applying Bayes Rule, we then get the following.

$$= \frac{Pr[Z|X_i = 1]\,Pr[X_i = 1]}{Pr[Z|X_i = 1]\,Pr[X_i = 1] + Pr[Z|X_i = 0]\,Pr[X_i = 0]}$$
First, note that $Pr[X_i = 1]$, the probability of rolling an even number of pips, is $\frac{1}{2}$. The probability of rolling an odd number, $Pr[X_i = 0]$, is also $\frac{1}{2}$. Thus, this simplifies to

$$= \frac{Pr[Z|X_i = 1]\cdot\frac{1}{2}}{Pr[Z|X_i = 1]\cdot\frac{1}{2} + Pr[Z|X_i = 0]\cdot\frac{1}{2}} = \frac{Pr[Z|X_i = 1]}{Pr[Z|X_i = 1] + Pr[Z|X_i = 0]}$$
We will additionally assign $Z$ a value $k$. After solving $E[X_i|Z = k]$, we can then substitute all instances of $k$ with $Z$ to get $E[X_i|Z]$.

$$E[X_i|Z = k] = \frac{Pr[Z = k|X_i = 1]}{Pr[Z = k|X_i = 1] + Pr[Z = k|X_i = 0]}$$

We now substitute $Z$ in. For $Pr[Z = k|X_i = 0]$: since $Z$ is the product of all $X_j$, including $X_i = 0$, we have $Z = 0$. Thus, $Pr[Z = k|X_i = 0] = Pr[k = 0]$.

$$E[X_i|Z = k] = \frac{Pr[\prod_{j=1}^{10} X_j = k|X_i = 1]}{Pr[\prod_{j=1}^{10} X_j = k|X_i = 1] + Pr[k = 0]}$$
Consider the probability specified in the numerator, $Pr[\prod_{j=1}^{10} X_j = k|X_i = 1]$. Since $X_i$ is 1, it does not affect the product. Thus, this probability reduces to $Pr[\prod_{j=1, j \neq i}^{10} X_j = k]$.


$$= \frac{Pr[\prod_{j=1, j \neq i}^{10} X_j = k]}{Pr[\prod_{j=1, j \neq i}^{10} X_j = k] + Pr[k = 0]}$$

Finally, to convert $E[X_i|Z = k]$ back to $E[X_i|Z]$, we substitute $Z$ for all $k$. After that, we use our first result to obtain an expression for $E[Y|Z]$. We can continue to reason about the numerator, but we will leave that for part (c); a more intuition-based approach is shown in the alternate solution that follows.

$$E[X_i|Z] = \frac{Pr[\prod_{j=1, j \neq i}^{10} X_j = Z]}{Pr[\prod_{j=1, j \neq i}^{10} X_j = Z] + Pr[Z = 0]}$$

$$E[Y|Z] = 10\,\frac{Pr[\prod_{j=1, j \neq i}^{10} X_j = Z]}{Pr[\prod_{j=1, j \neq i}^{10} X_j = Z] + Pr[Z = 0]}$$

Takeaway: Be able to switch between (1) algebraic manipulations and (2) intuition.

Alternate Solution:
Since $Z$ can only be 0 or 1, let us consider the two cases. First, when $Z = 1$, every $X_i$ must be 1, which means that the sum of the $X_i$'s must be 10. Therefore, $E[Y|Z = 1] = 10$.

Otherwise, Z = 0. Computing this case requires a bit more work.


From the definition:

$$E[Y] = \sum_{y=0}^{10} y\,Pr[Y = y]$$

Conditioned on $Z = 0$, we know that at least one of the $X_i$'s is 0. This rules out the possibility that every $X_i = 1$, which occurs with probability $2^{-10}$. The probability that at least one $X_i$ is 0 is therefore $1 - 2^{-10}$, and the conditional distribution $Pr[Y = y|Z = 0]$ has to be rescaled:

$$Pr[Y = y|Z = 0] = \frac{Pr[Y = y, Z = 0]}{Pr[Z = 0]} = \frac{Pr[Y = y]}{1 - 2^{-10}}, \quad 0 \leq y \leq 9$$


Therefore, the conditional expectation is

$$E[Y|Z = 0] = \sum_{y=0}^{9} y\,Pr[Y = y|Z = 0] = \frac{1}{1 - 2^{-10}}\sum_{y=0}^{9} y\,Pr[Y = y]$$

$$= \frac{1}{1 - 2^{-10}}\left(\sum_{y=0}^{10} y\,Pr[Y = y] - \frac{10}{2^{10}}\right) = \frac{1}{1 - 2^{-10}}\left(E[Y] - \frac{10}{2^{10}}\right) = \frac{1}{1 - 2^{-10}}\left(5 - \frac{10}{2^{10}}\right)$$

We have obtained the two different cases for E[Y |Z = z], which is
enough to specify E[Y |Z] as a function of Z. Just for fun, we can
stitch the two expressions into one:

$$E[Y|Z] = 10Z + \frac{1}{1 - 2^{-10}}\left(5 - \frac{10}{2^{10}}\right)(1 - Z)$$

(b) Using your derivation from part (a), compute E[Y |Z > 0]. Explain
your answer intuitively.
Solution: To gain some insight, we evaluate this expression for $Z > 0$. Notice that if $Z > 0$, then $Pr[Z = 0] = 0$. Thus,

$$E[Y|Z > 0] = 10\,\frac{Pr[\prod_{j=1, j \neq i}^{10} X_j = Z]}{Pr[\prod_{j=1, j \neq i}^{10} X_j = Z] + 0} = 10$$

This makes sense: if $Z > 0$, then $X_i \neq 0$ for all $i$. Thus, every $X_i = 1$, making $Y = \sum_{i=1}^{10} X_i = 10$.
(c) Using your derivation from part (a), compute E[Y |Z = 0].
Solution: Next, we evaluate this expression at Z = 0. Note that
P r[Z = 0] = 1.

$$E[Y|Z = 0] = 10\,\frac{Pr[\prod_{j=1, j \neq i}^{10} X_j = 0]}{Pr[\prod_{j=1, j \neq i}^{10} X_j = 0] + 1}$$

We will now compute $Pr[\prod_{j=1, j \neq i}^{10} X_j = 0]$ separately. Note that the probability that the product is 0 is the probability that at least one of the $X_j$, $j \neq i$, is 0. As a result, we consider the probability that none of the $X_j$ are 0 and subtract that from 1. Formally,

$$Pr[\prod_{j=1, j \neq i}^{10} X_j = 0] = 1 - Pr[\prod_{j=1, j \neq i}^{10} X_j \neq 0]$$

Since the $X_j$ are independent, we can multiply their probabilities together.

$$= 1 - \prod_{j=1, j \neq i}^{10} Pr[X_j \neq 0]$$

For each $X_j$, $Pr[X_j \neq 0]$ is the probability that a die roll is even, which is $\frac{1}{2}$.

$$= 1 - \prod_{j=1, j \neq i}^{10} \frac{1}{2} = 1 - \frac{1}{2^9}$$
Plugging this back in, we get

$$E[Y|Z = 0] = 10\,\frac{1 - \frac{1}{2^9}}{\left(1 - \frac{1}{2^9}\right) + 1} = 10\,\frac{1 - \frac{1}{2^9}}{2 - \frac{1}{2^9}} = 10\,\frac{2^9 - 1}{2^{10} - 1}$$

Takeaway: Apply De Morgan's laws to handle grotesque probabilities.
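Because there are only $2^{10}$ equally likely outcomes, the conditional expectation can be checked exactly by enumeration (our own verification):

```python
from fractions import Fraction
from itertools import product

num = den = 0
for xs in product((0, 1), repeat=10):
    if min(xs) == 0:          # the event Z = 0
        num += sum(xs)        # accumulate Y over the conditioned outcomes
        den += 1

print(Fraction(num, den))                    # 5110/1023
print(Fraction(10 * (2**9 - 1), 2**10 - 1))  # the closed form above; equal
```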

