Alvin Wan
Contents
0.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.1.1 Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.1.2 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.1.3 Breakdown . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.1.4 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Counting 9
2.1 Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Fundamental Properties . . . . . . . . . . . . . . . . . . . 9
2.1.2 Stars and Bars . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.3 Order, Replacement, and Distinguishability . . . . . . . . 9
2.1.4 Combinatorial Proofs . . . . . . . . . . . . . . . . . . . . 10
2.1.5 Inclusion Exclusion Principle . . . . . . . . . . . . . . . . 10
2.2 Stars and Bars Walkthrough . . . . . . . . . . . . . . . . . . . . 11
2.3 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3 Probability 14
3.1 Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1.1 Random Variables . . . . . . . . . . . . . . . . . . . . . . 14
3.1.2 Law of Total Probability . . . . . . . . . . . . . . . . . . . 14
3.1.3 Conditional Probability . . . . . . . . . . . . . . . . . . . 14
3.1.4 Bayes' Rule . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.5 Independence . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.6 Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Discrete Mathematics and Probability Theory aaalv.in/abcDMPT
4 Expectation 19
4.1 Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.1.1 Expectation Definition . . . . . . . . . . . . . . . . . . . . 19
4.1.2 Linearity of Expectation . . . . . . . . . . . . . . . . . . . 20
4.1.3 Conditional Expectation . . . . . . . . . . . . . . . . . . . 20
4.1.4 Law of Total Expectation . . . . . . . . . . . . . . . . . . 20
4.2 Linearity of Expectation Walkthrough . . . . . . . . . . . . . . . 21
4.3 Dilution Walkthrough . . . . . . . . . . . . . . . . . . . . . . . . 24
4.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6 Bounds 33
6.1 Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.1.1 Markov's Inequality . . . . . . . . . . . . . . . . . . . . . 33
6.1.2 Chebyshev's Inequality . . . . . . . . . . . . . . . . . . . . 33
6.1.3 Law of Large Numbers . . . . . . . . . . . . . . . . . . . . 33
6.2 Confidence Intervals Walkthrough . . . . . . . . . . . . . . . . . 34
7 Markov Chains 38
7.1 Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.1.2 Characterization . . . . . . . . . . . . . . . . . . . . . . . 38
7.1.3 Transition Probability Matrices . . . . . . . . . . . . . . . 38
7.1.4 Balance Equations . . . . . . . . . . . . . . . . . . . . . . 39
7.1.5 Important Theorems . . . . . . . . . . . . . . . . . . . . . 39
7.2 Hitting Time Walkthrough . . . . . . . . . . . . . . . . . . . . . 40
8 Solutions 44
8.1 Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
8.2 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
8.3 Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
0.1 Purpose
This compilation is (unofficially) written for the Spring 2016 CS70: Discrete
Mathematics and Probability Theory class taught by Professor Satish Rao
and Professor Jean Walrand at UC Berkeley. Its primary purpose is to offer
additional practice problems and walkthroughs to build intuition, as a supplement
to official course notes and lecture slides. Counting the more difficult problems
in walkthroughs, there are over 35 exam-level problems.
0.1.1 Contributors
Special thanks to Sinho Chewi for spending many hours suggesting improvements,
catching bugs, and discussing ideas and solutions for problems with me. Additionally,
thanks to Dibya Ghosh and Blake Tickell, who helped review problems for clarity
and correctness.
0.1.2 Structure
Each chapter is structured so that it can be read on its own. A
minimal guide at the beginning of each section covers essential materials and
misconceptions but does not provide a comprehensive overview. Each guide
is then followed by walkthroughs covering classes of difficult problems and 3-5
exam-level (or harder) problems that I've written specifically for this book.
Note: As of Spring 2016, not all chapters have problems. However, all chapters
have at least a walkthrough. This will be amended in Fall 2016.
0.1.3 Breakdown
For the most part, guides are cheat sheets for select chapters from official
course notes, with additional comments to help build intuition.
For more difficult parts of the course, guides may be accompanied by breakdowns
and analyses of problem types that might not have been explicitly introduced
in the course. These additional walkthroughs will attempt to provide a more
regimented approach to solving complex problems.
Problems are divvied up into two parts: (1) walkthroughs - a string of problems
that evolve from the most basic to the most complex - and (2) exam-level
questions, erring on the side of difficulty where needed. The hope is that with
walkthroughs, students can reduce a relatively difficult problem into smaller,
simpler subproblems.
0.1.4 Resources
Additional resources, including 20+ quizzes with 80 practice questions, and
other random worksheets and problems are posted online at alvinwan.com/cs70.
Chapter 1
1.1 Guide
1.1.1 Modular Arithmetic
In modulo p, only the numbers {0, 1, ..., p − 1} exist. Additionally, division is
not well-defined. Instead, we define a multiplicative inverse. We know that
outside of modular arithmetic, for any number n, multiplying n by its inverse
n^(-1) yields 1 (n · n^(-1) = 1). Thus, we extend the definition of an inverse to the
modulo field in this manner, where for any number n,

n · n^(-1) = 1 (mod p)

Do not forget that division, and thus fractions, do not exist in a modulo field.
Note that taking a polynomial over a Galois Field of modulo p (denoted GF(p))
simply means that all operations and elements in that field are (mod p).
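As a sketch of how inverses behave in practice, the snippet below computes n^(-1) mod p via Fermat's Little Theorem (n^(p-2) · n = n^(p-1) = 1 (mod p)); the prime p = 7 is illustrative.

```python
# Sketch: multiplicative inverses mod a prime p (p = 7 is illustrative).
p = 7

def inverse(n, p):
    """Return n^-1 mod p via Fermat: n^(p-2) * n = n^(p-1) = 1 (mod p)."""
    return pow(n, p - 2, p)

# Every nonzero n has an inverse mod a prime.
for n in range(1, p):
    assert n * inverse(n, p) % p == 1
```

Note that `inverse(2, 7)` is 4, not the fraction 1/2: fractions do not exist here.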
By Fermat's Little Theorem, for any a coprime to p,

a^(p−1) = 1 (mod p)

Lagrange interpolation recovers the unique degree-(k − 1) polynomial through k
points (x_i, y_i):

Δ_i(x) = ∏_{j ≠ i} (x − x_j) / (x_i − x_j)

P(x) = Σ_i y_i Δ_i(x)
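A minimal Python sketch of Lagrange interpolation over GF(p); the prime p = 13 and the points are illustrative, and division is done by multiplying with a modular inverse, as above.

```python
# Sketch: Lagrange interpolation over GF(p); p and the points are illustrative.
p = 13
points = [(1, 4), (2, 9), (3, 11)]  # (x_i, y_i) pairs

def interpolate(points, x, p):
    """Evaluate the unique degree-(k-1) polynomial through k points at x, mod p."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p       # numerator of Delta_i(x)
                den = den * (xi - xj) % p      # denominator of Delta_i(x)
        # Delta_i(x) = num / den, where "/" is multiplication by den^-1 mod p
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total
```

By construction, Δ_i(x_j) is 1 when i = j and 0 otherwise, so the polynomial passes through every given point.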
1.1.6 RSA
In RSA, we have a public key (N, e), where N is a product of two primes, p and
q, and e is coprime to (p − 1)(q − 1). Here are Encrypt and Decrypt.

E(x) = x^e (mod N)
D(y) = y^d = x (mod N)

Why are they defined this way? We have that y = E(x), so we plug in: if the
equation x^(ed) = x is satisfied, then D(y) returns the original message.
How do we generate d? By the corollary to Fermat's Little Theorem, we know ed = 1
(mod (p − 1)(q − 1)). Given we have e, we see that we can compute d if and
only if we know p and q. Thus, breaking RSA equates to factoring N into p, q.
Question: Basic
Construct a scheme that requires at least k of n people to come together
in order to unlock the safe.
Answer: A polynomial of degree k − 1.
Create n polynomials, with degree x_i − 1 for the ith group. Use the secrets
(i, p_i(0)) of all n polynomials to create an (n + 1)th polynomial of degree n − 1.
The root of this (n + 1)th polynomial is the secret.
Question: Re-weighting
Each group elects o_i officials. Construct a scheme that requires a_i ≤ o_i officials
from each group, where 10 citizens can replace an official.
Answer: Degree-(10a_i − 1) polynomials, where each official gets 10 points and each citizen gets 1.
Create a polynomial of degree 10a_i − 1, and give each of the a_i officials 10 points
each. Then, give each citizen 1 point each. Use the secrets (i, p_i(0)) of all n
polynomials to create an (n + 1)th polynomial of degree n − 1. Since each official
has 10 times the number of packets for the same polynomial, any 10 citizens
can merge to become a single official.
The intuitive response is to simply request, in addition to the k people we need
to unlock the safe, the number of people m that may not respond. Thus,
we need to request k + m workers to reconstruct a degree-(k − 1) polynomial.
Chapter 2
Counting
2.1 Guide
Counting bridges discrete mathematics and probability theory, to some degree
providing a transition from one to the next. Though a seemingly trivial topic,
this section provides foundations for probability.
1. If, for each of k items, we have {n_1, n_2, ..., n_k} options, the total number
of possible combinations is n_1 · n_2 · · · n_k = ∏_i n_i.
2. To find the total number of unordered combinations, divide the number
of ordered combinations by the number of orderings: C(n, k) = n! / ((n − k)! k!).
Toggle between the two quantities: C(a + b, b) = C(a + b, a), as choosing b
items is the same as choosing a items from a + b items.
Try applying the first rule of counting as well.
2^n is the equivalent of picking all (possibly empty) subsets. In other
words, we consider 2 possibilities for each of the n items: {Include, Don't Include}.
C(n, k) · k! = n! / (n − k)!, which is k samples from n items without replacement.
Make sure not to prove the equality mathematically, or to just write
in words what happens mathematically.
For |A ∪ B|, note that the area denoting A ∩ B is counted twice. So, we subtract
the intersection to get the size of all of A and B. Thus,

|A ∪ B| = |A| + |B| − |A ∩ B|
|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |B ∩ C| − |C ∩ A| + |A ∩ B ∩ C|
All possible phone numbers, given that the digits sum to a particular
value.
Distributing chairs among rooms, given we have only a particular number
of chairs.
Bringing at least k of each sock, given you can only fit n socks in your
suitcase.
Notice that in all of the mentioned scenarios, there is some bound on the
number of items we can distribute. That should immediately trigger stars and
bars.
Question: Basic
How many ways are there to sprinkle 10 oreo pieces on our 3 scoops of ice
cream? Assume that each scoop is a different flavor of ice cream. (i.e., Each
scoop is distinguishable.)
Answer: C(12, 2)
This reduces to a stars and bars problem, where we have 3 possible buckets
(2 bars) and 10 stars. Note that to specify 3 buckets, we need only 2 bars to
separate the 10 stars. Thus, we can choose the 2 bars (or the 10 stars) from 12
slots.
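As a quick check, the stars-and-bars formula can be compared against a brute-force enumeration of all distributions; this is a sketch using this question's numbers.

```python
# Sketch: stars and bars, C(10 + 3 - 1, 3 - 1), vs. brute force.
from math import comb
from itertools import product

formula = comb(10 + 3 - 1, 3 - 1)   # C(12, 2)

# Brute force: all (a, b, c) with a + b + c = 10
brute = sum(1 for a, b, c in product(range(11), repeat=3) if a + b + c == 10)
assert formula == brute == 66
```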
Question: At Least
How many ways are there to sprinkle 10 sprinkles on 3 scoops, such that the
first scoop gets at least 5 pieces?
Answer: C(7, 2)
This reduces to a second stars and bars problem. Simply, we first give the first
scoop 5 pieces. Why do this? This means that regardless of however many
additional pieces we distribute, the first scoop will have at least 5 pieces.
We are then left with a sub-stars-and-bars problem, with 10 − 5 = 5 sprinkles
and 3 scoops. We proceed as usual, noting this is 5 stars and 2 bars.
Question: At Most
Assume that each scoop can only hold a maximum of 8 pieces. How many
ways are there to sprinkle 10 sprinkles on 3 scoops?
Answer: C(12, 2) − C(3, 1) · C(2, 1) − C(3, 1)
First, we count all the possible ways to distribute 10 oreo sprinkles among 3
scoops. This is the answer to the last quiz, problem 3: C(12, 2).
Then, we count the number of invalid combinations. The only invalid combinations
are when a scoop has 9 or more sprinkles. We consider each case:
1. One scoop has 9 sprinkles. There are C(3, 1) ways to pick this one scoop with
9 sprinkles. Then, there are two other scoops to pick from, to give the
final sprinkle, making C(2, 1) ways to distribute the last sprinkle.
2. One scoop has 10 sprinkles. There are C(3, 1) ways to pick this one scoop.
Thus, we take all combinations and then subtract both invalid combinations.
We note that the invalid combinations are mutually exclusive.
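The subtraction can be verified by brute force, enumerating only distributions that respect the 8-piece maximum; this sketch uses this question's numbers.

```python
# Sketch: verify C(12,2) - C(3,1)*C(2,1) - C(3,1) by brute force.
from math import comb
from itertools import product

answer = comb(12, 2) - comb(3, 1) * comb(2, 1) - comb(3, 1)

# Brute force: each scoop holds 0..8 pieces, total must be 10.
brute = sum(1 for scoops in product(range(9), repeat=3)
            if sum(scoops) == 10)
assert answer == brute == 57
```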
Given the first problem in this walkthrough, we know that we can reduce the
problem to a sub-stars-and-bars problem. We first distribute 2 sprinkles to each
scoop, guaranteeing that each scoop will have at least 2 sprinkles distributed to
it.
Then, we count all the possible ways to distribute the remaining 8 oreo sprinkles
among 3 scoops. This is - by stars and bars - C(10, 2).
Then, we count the number of invalid combinations. Since each scoop already
has 2 sprinkles, it can take at most 6 more, to satisfy the 8-sprinkle maximum.
Thus, the invalid combinations are when we distribute 7 or more sprinkles to a
single scoop.
We consider each case:
1. One scoop has 7 sprinkles. There are C(3, 1) ways to pick this one scoop with
7 sprinkles. Then, there are two other scoops to pick from, to give the
final sprinkle, making C(2, 1) ways to distribute the last sprinkle.
2. One scoop has 8 sprinkles. There are C(3, 1) ways to pick this one scoop.
Thus, we take all combinations and then subtract both invalid combinations.
We note that the invalid combinations are mutually exclusive, making
C(10, 2) − C(3, 1) · C(2, 1) − C(3, 1).
2.3 Problems
1. If we roll a standard 6-sided die 3 times, how many ways are there to roll
a sum total of 14 pips where all rolls have an even number of pips?
2. Given a standard 52-card deck and a 5-card hand, how many unique hands
are there with at least 1 club and no aces?
3. Given a standard 52-card deck and a 5-card hand, how many unique hands
are there with at least 1 club or no aces?
4. Given a standard 52-card deck and a 3-card hand, how many unique
hands are there with cards that sum to 15? (Hint: Each card is uniquely
identified by both a number and a suit. This problem is more complex than
phone numbers.)
Chapter 3
Probability
3.1 Guide
3.1.1 Random Variables
Let Ω be the sample space. A random variable is by definition a function
mapping outcomes to real numbers: X : Ω → ℝ. An indicator variable is
a random variable that only assumes values {0, 1} to denote success or failure for
a single trial. Note that for an indicator, expectation is equal to the probability
of success: E[X] = Pr[X = 1].

3.1.2 Law of Total Probability

Pr[A] = Σ_i Pr[A | B_i] Pr[B_i]

Do not forget this law. On the exam, students often forget to multiply by Pr[B_i]
when computing Pr[A].
3.1.3 Conditional Probability

Pr[A | B] = Pr[A ∩ B] / Pr[B]
3.1.4 Bayes' Rule

Pr[A | B] = Pr[B | A] Pr[A] / Pr[B]
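As a quick numeric illustration of Bayes' rule combined with total probability (the rates below are made up for illustration, not from this text):

```python
# Sketch: Bayes' rule with illustrative numbers. A = "has condition",
# B = "test positive"; base rate 1%, Pr[B|A] = 0.99, false-positive rate 0.05.
pr_a = 0.01
pr_b_given_a = 0.99
# Total probability: Pr[B] = Pr[B|A]Pr[A] + Pr[B|not A]Pr[not A]
pr_b = pr_b_given_a * pr_a + 0.05 * (1 - pr_a)
# Bayes' rule: Pr[A|B] = Pr[B|A]Pr[A] / Pr[B]
pr_a_given_b = pr_b_given_a * pr_a / pr_b
```

Despite the accurate test, Pr[A | B] is only about 1/6, because the prior Pr[A] is small.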
3.1.5 Independence
Two events A and B are independent if and only if Pr[A ∩ B] = Pr[A] Pr[B].
Note that if an implication only goes in one direction, the converse is not
necessarily true. This is a favorite for exams, where the crux of a True-False
question may be rooted in the converse of one of these implications.
Using the definition of independence with probabilities, we have the
corollary that Pr[A | B] = Pr[A] when A and B are independent.
3.1.6 Symmetry
Given a set of trials, the principle of symmetry states that, without additional
information, each trial has the same probability as any other trial. See the 3.2
Symmetry Walkthrough for more details and concrete examples.
The probability of the first marble being red is r/N. However, by symmetry, the
probability of each marble being red is also r/N! We know this is true, because
we are not given any information about any of the marbles. With or without
replacement, symmetry applies. However, as we will see, symmetry breaks down
or applies to a limited degree when we are given information and condition our
probabilities.
We are given that the first marble is red. As a result, we are actually computing
Pr[X_i = 1 | X_1 = 1]. Since one red marble has already been removed, we have
that for the second marble, Pr[X_2 = 1 | X_1 = 1] = (r − 1)/(N − 1). Again, by symmetry,
we in fact know that this is true for all i > 1, as we do not have additional
information that would tell us otherwise.
The position of the marble that we have information about does not matter.
Thus, again applying symmetry, we have that all remaining marbles have probability
(r − 2)/(N − 2) of being red.
Again, by symmetry, we can argue that regardless of which two marbles, the
probability that any pair is red is the probability that one pair is red. Note
that symmetry doesn't apply within the pair, however. When considering the
second marble, we know that the first marble is necessarily red. When
computing the probability that the first and second marbles are red, we are
then computing Pr[X_1] Pr[X_2 | X_1], which is (r/N) · ((r − 1)/(N − 1)).
At this point, it should be apparent that we could have asked for the probability
that any two marbles are red. For the seventh marble, which is red, we consider
the number of red marbles we have no information about, which is r − 1, and
the total number of marbles we have no information about, N − 2. This makes
the probability that the seventh marble is red (r − 1)/(N − 2).
For the tenth marble, there are now N − 3 marbles we do not have information
about. There are r − 2 red marbles we do not have information about. Thus,
the number of remaining non-red marbles is (N − 3) − (r − 2) = N − r − 1,
making the probability that the tenth marble is not red (N − r − 1)/(N − 3).
3.3 Problems
1. We sample k times at random, without replacement, a coin from our wallet.
There are p pennies, n nickels, d dimes, and q quarters, making N total
coins. Given that the first three coins are pennies, what is the probability
that we will sample a nickel, 2 pennies, and a nickel in sequential order?
Note that this sequence of 4 coins does not need to be consecutive and
can span more than 4 coins.
2. We are playing a game with our apartment-mate Wayne. There are three
coins, one biased with probability p of heads and the other two fair coins.
First, each player is assigned one of three coins uniformly at random.
Players then flip simultaneously, where each player earns h points per
head. The winning player is the one with the most points. If Wayne earns
k points after n coin flips, what is the probability that Wayne has the
biased coin?
3. Let X and Y be two numbers, each chosen uniformly at random
from the range {1, 2, 3, ..., k}. Define Z = |X − Y|.
(a) Given the monkey strikes n times, where n > 15, what is the probability
that the monkey knocks out tiles, such that the board forms 8?
(b) Given the monkey strikes n times, where n > 15, what is the probability
that the monkey knocks out tiles, such that the board forms 2?
(c) Given the monkey strikes n times, where n > 15, what is the probability
that the monkey knocks out tiles, such that the board forms an even
number?
(d) Which digit is most likely to occur?
Chapter 4
Expectation
4.1 Guide
With expectation, we begin to see that some quantities no longer make sense.
Expressions that we compute the expectation for may in fact be far detached
from any intuitive meaning. We will specifically target how to deal with these,
in the below regurgitations of expectation laws and definitions.
E[X] = Σ_x x Pr[X = x]

E[g(X)] = Σ_x g(x) Pr[X = x]
Said another way, the expression in E[...], which we will call g(X), can be
needlessly complex. To solve such an expectation, simply plug the expression
into the summation. For example, say we need to solve for E[X^(2/5)]. This makes
little intuitive sense. However, we know the expression in terms of X affects
only the value.

E[X^(2/5)] = Σ_x x^(2/5) Pr[X = x]
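This sum can be computed directly; below is a sketch for a fair six-sided die (the die is an illustrative choice of distribution). It also shows that E[g(X)] is generally not g(E[X]).

```python
# Sketch: E[X^(2/5)] for a fair six-sided die, plugging g(x) into the sum.
expectation = sum((x ** 0.4) * (1 / 6) for x in range(1, 7))

# E[g(X)] is not g(E[X]): here E[X] = 3.5, and 3.5^(2/5) overshoots.
assert expectation < 3.5 ** 0.4
```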
E[g(X, Y)] = Σ_{x,y} g(x, y) Pr[X = x, Y = y]

E[(X + Y)²] = Σ_{x,y} (x + y)² Pr[X = x, Y = y]

E[Y | X = x] = Σ_y y Pr[Y = y | X = x]
We know how to solve for P r[Y = y|X = x], using definitions from the last
chapter.
However, the projection property provides a more general form, showing that
the law of total expectation is actually a special case where f (x) = 1.
Question: Variables
Knowing that E[X] = 3, E[Y ] = 2, E[Z] = 1, compute E[1X + 2Y + 3Z].
Answer 10
Question: Independence
Let Z = X1 + X2 + X3 + X4 , where each Xi is the number of pips for a dice
roll. What is E[Z]?
Answer 14
Question: Dependence
Consider a bag of k red marbles and k blue marbles. Without replacement, we
pull 4 marbles from the bag in sequential order; what is the expected number
of red marbles?
Answer 2
The question asks for how many, so we know we need indicators to count
successes. Let us define Z = Σ_{i=1}^{4} X_i, where each X_i is 1 if the ith marble is
red. We know E[Z] = E[Σ_{i=1}^{4} X_i], and by linearity of expectation,
E[Z] = Σ_{i=1}^{4} E[X_i]
     = Σ_{i=1}^{4} Pr[X_i = 1]
     = Σ_{i=1}^{4} 1/2
     = 4 · 1/2 = 2
As linearity of expectation shows, the probability of a red marble on the first
marble is the same as the probability of a red marble on the fourth. This
symmetry applies to any set of samples where we are not given additional
information about the samples drawn.
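The answer can be confirmed exactly by enumerating every ordered draw for a concrete bag; k = 3 below is an illustrative choice, and linearity says the answer is 2 for any k ≥ 2.

```python
# Sketch: expected red marbles in 4 draws without replacement, k = 3 each color.
from itertools import permutations
from fractions import Fraction

bag = ['R'] * 3 + ['B'] * 3
draws = list(permutations(bag, 4))    # all equally likely ordered 4-marble draws
expected = Fraction(sum(d.count('R') for d in draws), len(draws))
assert expected == 2
```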
E[Z] = E[X + Y]
     = E[X] + E[Y]
     = 7/2 + 7/2
     = 7
E[Z 2 ] = E[(X + Y )2 ]
= E[X 2 + 2XY + Y 2 ]
= E[X 2 ] + 2E[XY ] + E[Y 2 ]
The question asks for how many, so we know we need indicators to count
successes. We define X to be the total number of sequences and X = Σ_i X_i,
where X_i is 1 iff Angie begins a sequence on day i. This means that the
last day Angie can begin a sequence is k − m + 1. Thus, we actually consider
k − m + 1 trials.
Now, we compute the probability that, over m trials, Angie picks exactly the right
m arias in sequential order. Thus, the probability for a particular X_i is 1/n^m.

E[X] = E[Σ_i X_i]
     = Σ_i E[X_i]
     = (k − m + 1) E[X_i]
     = (k − m + 1) · 1/n^m
In the first case, our number of watermelons does not change. This only occurs
if our pick is a cantaloupe. Since there are m watermelons, there are N − m
cantaloupes. Thus, the probability of picking a single cantaloupe is (N − m)/N =
1 − m/N.
The second case falls out, as there are m watermelons, making the probability m/N.
In the first case, our number of watermelons does not change. This only occurs
if both of our picks are cantaloupes. Since there are m watermelons, there are
N − m cantaloupes. The probability of picking two cantaloupes is
(N − m)(N − m − 1)/(N(N − 1)).
Likewise, if the number of watermelons decreases by 1, we have chosen one
watermelon and one cantaloupe. This means we either chose the watermelon
second and the cantaloupe first, ((N − m)/N)(m/(N − 1)), or we chose the watermelon
first and the cantaloupe second, (m/N)((N − m)/(N − 1)). Summed together, we have
that the probability of one watermelon and one cantaloupe is 2m(N − m)/(N(N − 1)).
Finally, the probability of picking two watermelons is m(m − 1)/(N(N − 1)).

E[X_{n+1} | X_n = m] = Σ_x x Pr[X_{n+1} = x | X_n = m]
                     = m(1 − m/N) + (m − 1)(m/N)
                     = m − m²/N + m²/N − m/N
                     = m − m/N
                     = m(1 − 1/N)

Since X_n = m, we substitute it in:

E[X_{n+1} | X_n] = X_n(1 − 1/N)
Question: Conditional Expectation, Three Cases
Let each citizen pick two melons per ritual. Given that a citizen has m
watermelons at the nth year, how many watermelons will a citizen then have
in year n + 1, on average?
Answer: m(1 − 2/N)

E[X_{n+1} | X_n = m] = Σ_x x Pr[X_{n+1} = x | X_n = m]
E[X_{n+1} | X_n] = X_n(1 − 2/N)
Question: Expectation
Let each citizen pick two melons per ritual. After n years, compute the
average number of watermelons a particular citizen will have left.
Answer: (1 − 2/N)^(n−1) N
We are now computing E[X_n]. First, we note that the law of total expectation
allows us to conclude the following:

E[X_n] = (1 − 2/N) E[X_{n−1}]

Since E[X_n] is recursively defined, we see that the constant in front of E[X_{n−1}]
will simply be multiplied repeatedly. Thus, we can express this in terms of
E[X_1]:

E[X_n] = (1 − 2/N)^(n−1) E[X_1]

Finally, we note that we began with N watermelons, so E[X_1] = N.

E[X_n] = (1 − 2/N)^(n−1) N
Question: Algebra
Let each citizen pick two melons per ritual. If all citizens begin with 100
watermelons, what is the minimum number of years such that the expected
number of cantaloupes a citizen has is at least 99?
Answer: 229
Just plug into our expression for E[X_n]. 99 cantaloupes means 1 watermelon.
Thus, we are solving for E[X_n] = 1.

E[X_n] = (1 − 2/100)^(n−1) · 100 = 1
(49/50)^(n−1) = 1/100

log (49/50)^(n−1) = log (1/100)

Using the log rule log aⁿ = n log a, we get:

(n − 1) log (49/50) = log (1/100)

n − 1 = log (1/100) / log (49/50)

n = 1 + log (1/100) / log (49/50)

n ≈ 229
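The final step is a one-liner to verify numerically; rounding up gives the minimum whole number of years.

```python
# Sketch: minimum n with E[X_n] <= 1, for N = 100 and E[X_1] = 100.
from math import log, ceil

n = 1 + log(1 / 100) / log(49 / 50)   # n is about 228.95
assert ceil(n) == 229
```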
4.4 Problems
1. Consider a set of books B. The spine of each book has a single character
inscribed, and this character is the only distinguishing characteristic. Placed
side-by-side in the right order, the spines of all books in B spell some string
s, RaoWalrandRule. What is the expected number of times s shows
up, if a monkey picks n books from B uniformly at random and places
them, uniformly at random, on a shelf? The books may be upside-down,
but the spines are always facing out. Hint: Do repeating characters affect
the probability?
2. Four square-inch tiles make up a 2 in. × 2 in. insignia, and each tile is
distinct. We have 257² such tiles. When the tiles are randomly arranged
in a 257 × 257 grid, what is the expected number of properly-formed
insignias? Assume that tiles can be rotated in any of 4 orientations but
not flipped.
3. Let X_i be the event that the ith die roll has an even number of pips. Let
Z be the product of all X_i, from 1 to 10. Formally, Z = ∏_{i=1}^{10} X_i. Let Y
be the sum of all X_i, from 1 to 10. Formally, Y = Σ_{i=1}^{10} X_i.
Chapter 5
Distributions and
Estimation
5.1 Guide
Distributions help us model common patterns in real-life situations. In the
end, being able to recognize distributions quickly and effectively is critical to
completing difficult probability problems.
5.1.3 Variance
Variance is by definition equal to E[(X E[X])2 ]. After a brief derivation, we
get that Var(X) is then the following.
Var(X + c) = Var(X)
Var(aX) = a2 Var(X)
5.1.4 Covariance
Covariance is by definition equal to E[(X − E[X])(Y − E[Y])]. After a brief
derivation, we get that Cov(X, Y) = E[XY] − E[X] E[Y].
Now, we discuss how to algebraically manipulate Cov. First, we can split sums.
Second, we can move constants out and apply the constant to either variable.

Var(Σ_i X_i) = Σ_i Var(X_i)

L[Y | X] = E[Y] + (Cov(X, Y) / Var(X)) (X − E[X])
1. If events X_i are mutually independent, then Var(Σ_i X_i) = Σ_i Var(X_i).
2. The converse of this statement is not necessarily true.
3. Otherwise, we apply the more general definition of variance, that Var(X) =
E[X²] − E[X]².
Var(6X² + 3) = Var(6X²)
             = 36 Var(X²)
             = 36(E[X⁴] − E[X²]²)

We know that E[X²] = (1² + 2² + 3² + 4² + 5² + 6²)/6 = 91/6. Likewise,
E[X⁴] = (1⁴ + 2⁴ + 3⁴ + 4⁴ + 5⁴ + 6⁴)/6 = 2275/6. Finally, plug in and simplify.

= 36(2275/6 − (91/6)²)
= 36(2275/6 − 8281/36)
= 6 · 2275 − 8281
= 5369
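The arithmetic can be double-checked with exact fractions; this sketch recomputes the same quantities for a fair die.

```python
# Sketch: Var(6X^2 + 3) for a fair die, via exact fractions.
from fractions import Fraction

die = range(1, 7)

def E(g):
    return sum(Fraction(g(x), 6) for x in die)

EX2 = E(lambda x: x ** 2)          # 91/6
EX4 = E(lambda x: x ** 4)          # 2275/6
var = 36 * (EX4 - EX2 ** 2)        # shift by 3 dropped, scale 6 squared
assert var == 5369
```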
We first know that the shift does not affect variance. We apply linearity of
variance, as X and Y are independent. Thus,
Var(√6 X − √6 Y + 3) = Var(√6 X − √6 Y)
                     = Var(√6 X) + Var(−√6 Y)
                     = 6(Var(X) + Var(Y))
                     = 6(35/12 + 35/12)
                     = 6 · 35/6
                     = 35
Chapter 6
Bounds
6.1 Guide
6.1.1 Markov's Inequality
Markov's inequality offers a bound in one direction. Intuitively, it gives us an
upper bound on the probability that we are greater than some value. Keep in mind
that a > 0, and X cannot take negative values.

Pr[X > a] ≤ E[X]/a

More generally, for a strictly non-negative, monotonically increasing function f,

Pr[X > a] ≤ E[f(X)]/f(a)
6.1.2 Chebyshev's Inequality

Pr[|X − E[X]| ≥ α] ≤ Var(X)/α²

However, we may be interested in a lower bound on the probability that we are
less than a distance α from the mean. Thus, if Chebyshev's offers a bound of p,
we are actually interested in 1 − p.
Question: Estimation
Given the results of your experiment, how should you estimate p?
We are looking for p, the fraction of people with trick coins, so let us begin by
assuming that the fraction of people with trick coins is p̃.
Let q̃ be the fraction of people we observe with heads, in terms of p̃.

q̃ = p̃ + (1 − p̃) · 1/2    (1)

This implies that

2q̃ = 2p̃ + (1 − p̃)
p̃ = 2q̃ − 1

Note that p̃ is the fraction of people we think have trick coins. This is different
from p, which is the actual fraction of people with trick coins. To express the
actual p in terms of our actual q, we rewrite the tilde expressions. Note that this
is in theory dependent on the people we sample, so this is only an approximate
equality that should become true as we approach an infinite number of samples.

p ≈ 2q − 1
Question: Chebyshev's
How many people do you need to ask to be 95% sure that your answer is off
by at most 0.05?
We are looking for the difference between p̃ and p to be less than 0.05 with
probability 95%. We first note that Chebyshev's inequality naturally follows,
as Chebyshev's helps us find distance from the mean with a certain probability.
Formally, this is Chebyshev's:

Pr[|X − μ| ≥ a] ≤ Var(X)/a²
However, we are interested in finding an n so that we are off by at most 0.05 with
probability 95%. This is equivalent to being off by at least 0.05 with probability
5%. The latter is answerable by Chebyshev's.
Then, we follow three steps.
Step 1: Fit to |X − μ| ≥ a
We first only deal with "your answer is off by at most 0.05". We can re-express
this mathematically, with the following:

|p̃ − p| ≤ 0.05
|(2q̃ − 1) − (2q − 1)| ≤ 0.05
|2q̃ − 2q| ≤ 0.05
|q̃ − q| ≤ 0.025

First, note that with infinitely many samples, the fraction q̃ should naturally
converge to the fraction q, so the mean of q̃ is q and

|q̃ − q| ≤ 0.025

matches the distance-from-the-mean form. However, we need to incorporate the
number of people we are sampling. So, we multiply all by n:

|q̃n − qn| ≤ 0.025n

q̃ = (1/n) Σ_{i=1}^{n} X_i

To make our life easier, let us define another random variable Y = q̃n.
Y = q̃n = Σ_{i=1}^{n} X_i

Seeing that this now matches the format we need, our a is 0.025n. Our final form
is

Pr[|Y − qn| ≥ 0.025n] ≤ Var(Y)/(0.025n)²

Equivalently,

Pr[|Y − qn| < 0.025n] ≥ 1 − Var(Y)/(0.025n)²
Step 2: Compute Var(Y)/a²

Var(Y) = Var(Σ_{i=1}^{n} X_i)
       = Σ_{i=1}^{n} Var(X_i)
       = n Var(X_i)
       = nq(1 − q)

Var(Y)/a² = nq(1 − q)/(0.025n)²
          = q(1 − q)/(n(0.025)²)
Step 3: Solve for n

Var(Y)/a² = q(1 − q)/(n(0.025)²) = 0.05

We have an issue, however: there are two variables, and we don't know q.
However, we can upper bound the quantity q(1 − q). Since Chebyshev's computes
an upper bound for the probability, we can substitute q(1 − q) for its maximum
value.

q(1 − q) = q − q²
(q − q²)′ = 1 − 2q = 0
q = 1/2

1 − q(1 − q)/(n(0.025)²) = 0.95
q(1 − q)/(n(0.025)²) = 0.05
(1/4)/(n(0.025)²) = 0.05
1/(4n(0.025)²) = 1/20
n = 5/(0.025)²
n = 8000
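The final arithmetic is worth confirming; the sketch below plugs the worst-case q(1 − q) = 1/4, the tolerance 0.025 on q̃, and the 5% failure probability into the bound.

```python
# Sketch: solve (1/4) / (n * 0.025^2) = 0.05 for n.
n = (1 / 4) / (0.05 * 0.025 ** 2)
assert abs(n - 8000) < 1e-6
```

Chebyshev's is loose; 8000 is a sufficient sample size, not a necessary one.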
Chapter 7
Markov Chains
7.1 Guide
Markov chains are closely tied to both linear algebra and differential equations.
We will explore connections with both to build a better sense of how Markov
chains work.
7.1.1 Definition
Formally, a Markov Chain is a countable set of random variables that satisfy
the memoryless (Markov) property, where transitions to the next state depend
only on the current state.
7.1.2 Characterization
We are interested in three properties of Markov Chains: (1) reducibility, (2)
periodicity, and (3) transience.
∀i, π(i) = Σ_h π(h) P_{hi}

Note that we can obtain the balance equations by left-multiplying the TPM by
π⃗ = [π(0) π(1) . . . π(n)]ᵀ
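One way to find a π satisfying the balance equations is repeated left-multiplication (power iteration); the 2-state TPM below is an illustrative example, not the chain from the walkthrough.

```python
# Sketch: find pi with pi = pi P by repeated left-multiplication.
P = [[0.5, 0.5],   # illustrative 2-state TPM; rows sum to 1
     [0.2, 0.8]]

pi = [1.0, 0.0]    # any starting distribution
for _ in range(1000):
    pi = [sum(pi[h] * P[h][i] for h in range(2)) for i in range(2)]

# Balance: pi(i) = sum_h pi(h) P[h][i]; here pi converges to (2/7, 5/7).
for i in range(2):
    assert abs(pi[i] - sum(pi[h] * P[h][i] for h in range(2))) < 1e-12
```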
[Markov chain diagram: states s0, s1, s2, s3, with transition probabilities 1/4, 1/2, and 1 on the edges.]
1 1 1
(s0 ) = 1 + (s0 ) + (s1 ) + (s2 )
4 4 2
1 1 1
(s1 ) = 1 + (s0 ) + (s1 ) + (s3 )
4 2 4
(s2 ) = 1 + (s0 )
(s3 ) = 0
Bring all constants to one side and all $\beta(s_i)$ to the right.
$$1 = \frac{3}{4}\beta(s_0) - \frac{1}{4}\beta(s_1) - \frac{1}{2}\beta(s_2)$$
$$1 = -\frac{1}{4}\beta(s_0) + \frac{1}{2}\beta(s_1) - \frac{1}{4}\beta(s_3)$$
$$1 = -\beta(s_0) + \beta(s_2)$$
$$0 = \beta(s_3)$$
Row-reducing the corresponding augmented matrix yields
$$\left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 16 \\ 0 & 1 & 0 & 0 & 10 \\ 0 & 0 & 1 & 0 & 17 \\ 0 & 0 & 0 & 1 & 0 \end{array}\right]$$
so $\beta(s_0) = 16$, $\beta(s_1) = 10$, $\beta(s_2) = 17$, $\beta(s_3) = 0$.
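As a check (my own, not from the text), the rearranged system can be solved exactly with a small Gaussian elimination over stdlib fractions; the result matches $\beta = (16, 10, 17, 0)$.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(A)
    M = [[F(x) for x in row] + [F(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [a - M[r][col] * p for a, p in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Rearranged first-step equations: coefficient matrix times beta = constants.
A = [[F(3, 4), F(-1, 4), F(-1, 2), 0],
     [F(-1, 4), F(1, 2), 0, F(-1, 4)],
     [-1, 0, 1, 0],
     [0, 0, 0, 1]]
b = [1, 1, 1, 0]
beta = solve(A, b)
print(beta)  # [Fraction(16, 1), Fraction(10, 1), Fraction(17, 1), Fraction(0, 1)]
```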
[Figure: the same chain, now with $s_2$ labeled (2); the modified constants appear in the equations below.]

$$\beta(s_0) = 4 + \frac{1}{4}\beta(s_0) + \frac{1}{4}\beta(s_1) + \frac{1}{2}\beta(s_2)$$
$$\beta(s_1) = 1 + \frac{1}{4}\beta(s_0) + \frac{1}{2}\beta(s_1) + \frac{1}{4}\beta(s_3)$$
$$\beta(s_2) = 2 + \beta(s_0)$$
$$\beta(s_3) = 0$$
Bring all constants to one side and all $\beta(s_i)$ to the right.
$$4 = \frac{3}{4}\beta(s_0) - \frac{1}{4}\beta(s_1) - \frac{1}{2}\beta(s_2)$$
$$1 = -\frac{1}{4}\beta(s_0) + \frac{1}{2}\beta(s_1) - \frac{1}{4}\beta(s_3)$$
$$2 = -\beta(s_0) + \beta(s_2)$$
$$0 = \beta(s_3)$$
Row-reducing the corresponding augmented matrix yields
$$\left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 44 \\ 0 & 1 & 0 & 0 & 24 \\ 0 & 0 & 1 & 0 & 46 \\ 0 & 0 & 0 & 1 & 0 \end{array}\right]$$
so $\beta(s_0) = 44$, $\beta(s_1) = 24$, $\beta(s_2) = 46$, $\beta(s_3) = 0$.
[Figure: the same chain, with $s_2$ labeled (2) and some edges labeled (4); the labels can be read off the equations below.]

Question: Lin. Alg. with Trans. Prob., Wait Time, Trans. Time
Let $X_i$ be the number of steps needed to reach $s_3$. In the Markov Chain above, the number in parentheses for a state represents the number of steps needed to pass through that state. The numbers in parentheses for an edge represent the number of steps needed to pass through that edge. If no number is specified, the edge takes only 1 step. Compute $E[X_0]$.
$$\beta(s_0) = 4 + \frac{1}{4}\beta(s_0) + \frac{1}{4}\big(\beta(s_1) + 4\big) + \frac{1}{2}\beta(s_2)$$
$$\beta(s_1) = 1 + \frac{1}{4}\beta(s_0) + \frac{1}{2}\beta(s_1) + \frac{1}{4}\big(\beta(s_3) + 4\big)$$
$$\beta(s_2) = 2 + \big(\beta(s_0) + 4\big)$$
$$\beta(s_3) = 0$$
$$\beta(s_0) = 4 + \frac{1}{4}\beta(s_0) + \frac{1}{4}\beta(s_1) + 1 + \frac{1}{2}\beta(s_2)$$
$$\beta(s_1) = 1 + \frac{1}{4}\beta(s_0) + \frac{1}{2}\beta(s_1) + \frac{1}{4}\beta(s_3) + 1$$
$$\beta(s_2) = 2 + \beta(s_0) + 4$$
$$\beta(s_3) = 0$$
Bring all constants to one side and all $\beta(s_i)$ to the right.
$$5 = \frac{3}{4}\beta(s_0) - \frac{1}{4}\beta(s_1) - \frac{1}{2}\beta(s_2)$$
$$2 = -\frac{1}{4}\beta(s_0) + \frac{1}{2}\beta(s_1) - \frac{1}{4}\beta(s_3)$$
$$6 = -\beta(s_0) + \beta(s_2)$$
$$0 = \beta(s_3)$$
Row-reducing the corresponding augmented matrix yields
$$\left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 72 \\ 0 & 1 & 0 & 0 & 40 \\ 0 & 0 & 1 & 0 & 78 \\ 0 & 0 & 0 & 1 & 0 \end{array}\right]$$
so $E[X_0] = \beta(s_0) = 72$. (Substituting back into the rearranged equations confirms $\beta(s_1) = 40$ and $\beta(s_2) = 78$.)
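A simulation sketch of this last chain, with the transition probabilities and step costs read off the first-step equations above (my reading of the figure, so treat the exact labels as assumptions), gives an estimate close to $E[X_0] = 72$.

```python
import random

# Transition probabilities and step costs read off the first-step equations
# (an assumption on my part, since the figure itself is not reproducible):
# from s0 go to s0/s1/s2 w.p. 1/4, 1/4, 1/2; from s1 go to s0/s1/s3
# w.p. 1/4, 1/2, 1/4; from s2 go to s0 w.p. 1; s3 is the target.
TRANS = {0: ((0, 0.25), (1, 0.25), (2, 0.50)),
         1: ((0, 0.25), (1, 0.50), (3, 0.25)),
         2: ((0, 1.00),)}
STATE_COST = {0: 4, 1: 1, 2: 2}                 # steps to pass through a state
EDGE_COST = {(0, 1): 4, (1, 3): 4, (2, 0): 4}   # extra steps on labeled edges

def steps_to_s3(rng):
    state, steps = 0, 0
    while state != 3:
        steps += STATE_COST[state]
        r = rng.random()
        for nxt, prob in TRANS[state]:   # sample the next state
            r -= prob
            if r < 0:
                break
        steps += EDGE_COST.get((state, nxt), 0)
        state = nxt
    return steps

rng = random.Random(42)
trials = 60_000
estimate = sum(steps_to_s3(rng) for _ in range(trials)) / trials
print(estimate)  # should be close to E[X0] = beta(s0) = 72
```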
Chapter 8
Solutions
8.1 Counting
1. If we roll a standard 6-sided die 3 times, how many ways are there to roll
a sum total of 14 pips where all rolls have an even number of pips?
Solution: We can reduce each of our dice to a 3-sided die containing only the even numbers. Additionally, we can consider a reduced subproblem: in the original problem, we can only combine the values 2, 4, 6 for a total of 14. This is the same as counting the number of ways to combine 1, 2, 3 for a total of 7.
Distributing x to a dice roll is the same as assigning it 2x pips. Since a
dice roll can have at most 6 pips, x is at most 3 for a single roll. Since
dice do not have a 0 side, x is at least 1. Thus, we are distributing 7 balls
among 3 bins with at most 3 balls and at least 1 ball for a single bin.
By 2.2 Stars and Bars Walkthrough: At Least, we first distribute 1 ball to each bin, reducing the problem to 4 balls and 3 bins, for $\binom{6}{2}$ arrangements.
By 2.2 Stars and Bars Walkthrough: At Most, we can identify two classes of invalid combinations:
- Distribute all 4 balls to one bin. There are 3 bins to pick from: $\binom{3}{1}$.
- Distribute 3 balls to one bin. There are then 2 other bins to pick from, for the last ball: $\binom{3}{1}\binom{2}{1}$.
In sum, we then have (1) all ways to distribute 7 balls, with at least 1 in each bin, minus (2) all the ways to get more than 3 balls in a single bin.
$$\binom{6}{2} - \binom{3}{1} - \binom{3}{1}\binom{2}{1} = 15 - 3 - 6 = 6$$
Takeaway: Reduce complex stars and bars to the most basic form.
Alternate Solution:
First, we distribute 1 ball to each bin, reducing the problem to distributing
4 balls among 3 bins such that each bin contains no more than 2 balls.
The possibilities can be enumerated: $2+2+0$, $2+0+2$, $0+2+2$, $1+1+2$, $1+2+1$, $2+1+1$. Hence, there are 6 total ways.
Takeaway: When the options are few enough, enumerate.
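Both solutions can be checked by brute force over the original dice; a minimal sketch:

```python
from itertools import product

# Brute force: roll three dice restricted to the even faces and count the
# ordered outcomes totaling 14 pips.
ways = sum(1 for rolls in product((2, 4, 6), repeat=3) if sum(rolls) == 14)
print(ways)  # 6
```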
2. Given a standard 52-card deck and a 5-card hand, how many unique hands
are there with at least 1 club and no aces?
Solution: Let $A$ be the event at least 1 club, and $B$ be the event no aces. We are looking for $|A \cap B|$.
Note that computing $|A \cap B|$ directly is potentially tedious. Instead of considering $A$, at least 1 club, it is simpler to consider $\bar{A}$, no clubs. Thus, we rewrite $|A \cap B| = |B| - |\bar{A} \cap B|$. To compute $|\bar{A} \cap B|$, we examine all combinations with no aces and no clubs. We are drawing from $52 - 4 - 12 = 36$ cards, making $|\bar{A} \cap B| = \binom{36}{5}$. Consider all hands with no aces: this is $|B| = \binom{48}{5}$. Thus, we have
$$|A \cap B| = |B| - |\bar{A} \cap B| = \binom{48}{5} - \binom{36}{5}$$
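As a cross-check (mine, not the text's), the same count can be computed directly by summing over the number of clubs in the hand; the two methods agree.

```python
from math import comb

# Complementary counting: no-ace hands minus no-ace, no-club hands.
complementary = comb(48, 5) - comb(36, 5)

# Direct counting: choose c of the 12 non-ace clubs and the remaining
# 5 - c cards from the 36 cards that are neither aces nor clubs.
direct = sum(comb(12, c) * comb(36, 5 - c) for c in range(1, 6))

print(complementary, direct)  # 1335312 1335312
```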
3. Given a standard 52-card deck and a 5-card hand, how many unique hands
are there with at least 1 club or no aces?
Solution: Again, let $A$ be the event at least 1 club, and $B$ be the event no aces. We are looking for $|A \cup B| = |A| + |B| - |A \cap B|$.
From the previous part, we have $|A \cap B| = |B| - |\bar{A} \cap B|$. Thus, we first simplify the original $|A \cup B|$ algebraically.
$$|A \cup B| = |A| + |B| - |A \cap B| = |A| + |B| - \big(|B| - |\bar{A} \cap B|\big) = |A| + |\bar{A} \cap B|$$
We thus have (1) all ways to distribute 15 validly, multiplied by (2) all possible suit choices, and finally, minus the invalid suit and number assignments.
$$\binom{14}{2}\binom{4}{1}^3 - (4^3 - 24) - 288 = 5496$$
8.2 Probability
1. We sample k times at random, without replacement, a coin from our wallet.
There are p pennies, n nickels, d dimes, and q quarters, making N total
coins. Given that the first three coins are pennies, what is the probability
that we will sample a nickel, 2 pennies, and a nickel in sequential order?
Note that this sequence of 4 coins does not need to be consecutive and
can span more than 4 coins.
Solution: First, we consider what we know about our wallet. We know that there are $N - 3$ coins remaining, of which $p - 3$ are pennies. Let $D_i$ denote the denomination of the $i$th coin. Let $a < b < c < d$ be the indices in sequential order. Let $X_a$ be the event that $D_a$ is a nickel, $X_b$ be the event that $D_b$ is a penny, etc. We are thus interested in computing the following.
$$Pr[X_a \cap X_b \cap X_c \cap X_d]$$
The probability of a nickel at some position is $\frac{n}{N-3}$. There are then $N - 4$ coins remaining, of which $n - 1$ are nickels.
$$Pr[X_a] = \frac{n}{N-3}$$
The probability of a penny at some later position, regardless of which position it is, is $\frac{p-3}{N-4}$. There are then $N - 5$ coins remaining, of which $p - 4$ are pennies.
$$Pr[X_b \mid X_a] = \frac{p-3}{N-4}$$
The probability of another penny at some later position is $\frac{p-4}{N-5}$. There are then $N - 6$ coins remaining.
$$Pr[X_c \mid X_a, X_b] = \frac{p-4}{N-5}$$
The probability of another nickel is $\frac{n-1}{N-6}$.
$$Pr[X_d \mid X_a, X_b, X_c] = \frac{n-1}{N-6}$$
Since all samples are made at random, multiplying these conditional probabilities by the chain rule gives
$$Pr[X_a \cap X_b \cap X_c \cap X_d] = \frac{n}{N-3} \cdot \frac{p-3}{N-4} \cdot \frac{p-4}{N-5} \cdot \frac{n-1}{N-6}$$
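A simulation sketch of this answer, with made-up coin counts ($p=5$, $n=4$, $d=3$, $q=2$, so $N=14$, all my own choices) and the four positions taken, by symmetry, to be the next four draws:

```python
import random
from fractions import Fraction as F

# Hypothetical wallet (my own numbers): pennies, nickels, dimes, quarters.
p, n, d, q = 5, 4, 3, 2
N = p + n + d + q

# After the first three draws are pennies, N - 3 coins remain.
remaining = ["P"] * (p - 3) + ["N"] * n + ["D"] * d + ["Q"] * q

# The solution's product: n/(N-3) * (p-3)/(N-4) * (p-4)/(N-5) * (n-1)/(N-6).
predicted = F(n, N - 3) * F(p - 3, N - 4) * F(p - 4, N - 5) * F(n - 1, N - 6)

# By symmetry, any fixed positions a < b < c < d give the same probability,
# so we simply check the next four draws for nickel, penny, penny, nickel.
rng = random.Random(1)
trials = 300_000
hits = 0
for _ in range(trials):
    rng.shuffle(remaining)
    hits += remaining[:4] == ["N", "P", "P", "N"]

estimate = hits / trials
print(float(predicted), estimate)
```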
2. We are playing a game with our apartment-mate Wayne. There are three
coins, one biased with probability p of heads and the other two fair coins.
First, each player is assigned one of three coins uniformly at random.
Players then flip simultaneously, where each player earns h points per
head. The winning player is the one with the most points. If Wayne earns
k points after n coin flips, what is the probability that Wayne has the
biased coin?
Solution: Let $Y$ be the event that Wayne has the biased coin and $X$ the event that Wayne earns $k$ points after $n$ flips. By Bayes Rule,
$$Pr[Y \mid X] = \frac{Pr[X \mid Y]\,Pr[Y]}{Pr[X \mid Y]\,Pr[Y] + Pr[X \mid \bar{Y}]\,Pr[\bar{Y}]}$$
We will first compute the easiest terms. Given that we have three coins with one biased, the probability of holding the biased coin is $\frac{1}{3}$ and of holding a fair coin, $\frac{2}{3}$.
$$Pr[Y] = \frac{1}{3}, \qquad Pr[\bar{Y}] = \frac{2}{3}$$
Next, recall the binomial distribution:
$$Pr[\mathrm{Bin}(n, p) = k] = \binom{n}{k} (1-p)^{n-k} p^k$$
For the biased coin, the probability of heads is $p$ (event $Y$). For the fair coin, the probability of heads is $\frac{1}{2}$ (event $\bar{Y}$). To simplify, we will define $k' = \frac{k}{h}$, which gives us the number of heads that Wayne obtained.
$$Pr[X = k' \mid Y] = \binom{n}{k'} (1-p)^{n-k'} p^{k'}$$
$$Pr[X = k' \mid \bar{Y}] = \binom{n}{k'} \frac{1}{2^n}$$
We have computed all values, so we plug into Bayes Rule and simplify.
$$Pr[Y \mid X] = \frac{Pr[X \mid Y]\,Pr[Y]}{Pr[X \mid Y]\,Pr[Y] + Pr[X \mid \bar{Y}]\,Pr[\bar{Y}]}$$
$$= \frac{Pr[X \mid Y] \cdot \frac{1}{3}}{Pr[X \mid Y] \cdot \frac{1}{3} + Pr[X \mid \bar{Y}] \cdot \frac{2}{3}}$$
$$= \frac{Pr[X \mid Y]}{Pr[X \mid Y] + 2\,Pr[X \mid \bar{Y}]}$$
$$= \frac{\binom{n}{k'} (1-p)^{n-k'} p^{k'}}{\binom{n}{k'} (1-p)^{n-k'} p^{k'} + \binom{n}{k'} \frac{1}{2^n} \cdot 2}$$
$$= \frac{(1-p)^{n-k'} p^{k'}}{(1-p)^{n-k'} p^{k'} + \frac{1}{2^{n-1}}}$$
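One quick sanity check of the final expression (my own): if the biased coin has $p = \frac{1}{2}$, the flips carry no information, so the posterior should collapse back to the prior $\frac{1}{3}$.

```python
def posterior(n, k_heads, p):
    """Pr[biased | k_heads in n flips], from the simplified Bayes expression."""
    biased = (1 - p) ** (n - k_heads) * p ** k_heads
    fair = 1 / 2 ** (n - 1)   # 2 * (1/2)^n, with the prior ratio folded in
    return biased / (biased + fair)

# With p = 1/2 the flips are uninformative: the posterior equals the prior.
print(posterior(10, 7, 0.5))  # 0.3333333333333333
```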
$$Pr[Z \ge k-2] = \frac{4}{k^2} + \frac{2}{k^2} = \frac{6}{k^2}$$
We thus solve for $Pr[Z < k-2]$:
$$Pr[Z < k-2] = 1 - \frac{6}{k^2}$$
Takeaway: Use counting where applicable.
(b) Find the probability that $Z \ge 2$.
Solution: Again, we apply the same trick as in the previous part. It is easier to compute $Pr[Z < 2] = 1 - Pr[Z \ge 2]$. Thus, we have only two possible values of $Z$ to account for.
$$Pr[Z < 2] = \frac{k}{k^2} + \frac{2(k-1)}{k^2} = \frac{3k-2}{k^2}$$
We thus solve for $Pr[Z \ge 2]$:
$$Pr[Z \ge 2] = 1 - \frac{3k-2}{k^2}$$
no tiles.) By knocking off tiles, the monkey can create digits. For example,
the following would form the digit 3, where 0 denotes a missing tile and
1 denotes a tile.
0 0 0
1 1 0
0 0 0
1 1 0
0 0 0
(a) Given the monkey strikes n times, where n > 15, what is the probability
that the monkey knocks out tiles, such that the board forms 8?
Solution: To knock out an 8, the monkey needs to achieve the
following board.
0 0 0
0 1 0
0 0 0
0 1 0
0 0 0
We can consider this as balls and bins, where we have $n$ balls being thrown into 15 bins, except that all balls are distinguishable. The total number of ways for $n$ strikes to land on 15 tiles is $15^n$.
We wish to avoid the 2 tiles and to strike all other 13 tiles. Thus, we are throwing $n$ balls into 13 bins, where each bin receives at least 1 ball. Since order matters, we cannot apply stars and bars directly.
i. Consider all the ways to distribute $n$ balls among 13 bins: $13^n$.
ii. Subtract all cases where 1 bin is left empty. First, we choose the bin that is left empty, $\binom{13}{1}$, then we distribute the $n$ balls among the remaining 12 bins, $12^n$. Together, this is $\binom{13}{1} 12^n$.
iii. By inclusion-exclusion, we have double-counted all cases where 2 bins are left empty. Choose the 2 bins that are left empty, $\binom{13}{2}$. Distribute to the remaining bins, $11^n$. This is $+\binom{13}{2} 11^n$.
We notice a pattern: for $0 \le i \le 13$, we select $i$ to be the number of bins that are empty. Then, we distribute $n$ balls among the remaining $13 - i$ bins. By inclusion-exclusion, we include and exclude alternately. The total number of ways to distribute $n$ balls among 13 bins, where each bin receives at least one ball, is thus
$$\sum_{i=0}^{13} (-1)^i \binom{13}{i} (13-i)^n$$
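The alternating sum is the standard inclusion-exclusion count of surjections, and it can be sanity-checked by brute force on a smaller instance, say 4 bins and 6 balls (my own check, not part of the solution).

```python
from itertools import product
from math import comb

def surjections(bins, balls):
    """Inclusion-exclusion count of ways to throw labeled balls into
    labeled bins with no bin left empty."""
    return sum((-1) ** i * comb(bins, i) * (bins - i) ** balls
               for i in range(bins + 1))

# Brute force over all 4^6 assignments of 6 labeled balls to 4 labeled bins.
brute = sum(1 for assign in product(range(4), repeat=6) if len(set(assign)) == 4)

print(surjections(4, 6), brute)  # 1560 1560
```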
$$\frac{1}{15^n} \sum_{i=0}^{13} (-1)^i \binom{13}{i} (13-i)^n$$
$$\frac{1}{15^n} \sum_{i=0}^{11} (-1)^i \binom{11}{i} (11-i)^n$$
(c) Given the monkey strikes $n$ times, where $n > 15$, what is the probability that the monkey knocks out tiles such that the board forms an even number?
Solution: First, let us count the number of 0s required to form each digit.
digit 0: 12 zeros
digit 2: 11 zeros
digit 4: 9 zeros
digit 6: 12 zeros
digit 8: 13 zeros
Let the probability that a digit with $a$ zeros occurs be
$$p_a = \frac{1}{15^n} \sum_{i=0}^{a} (-1)^i \binom{a}{i} (a-i)^n$$
Applying $p_a$, we see that the probability that any of these even numbers occurs is:
8.3 Expectation
1. Consider a set of books B. The spine of each book has a single character
inscribed, and this character is the only distinguishing characteristic. Placed
side-by-side in the right order, the spines of all books in B spell some string
s, RaoWalrandRule. What is the expected number of times s shows
up, if a monkey picks n books from B uniformly at random and places
them, uniformly at random, on a shelf? The books may be upside-down,
but the spines are always facing out. Hint: Do repeating characters affect
the probability?
Solution: This is similar to the last problem in the 4.2 Linearity of Expectation Walkthrough. However, we are drawing letters from $s$ instead of the entire alphabet. We realize the question is asking for an expected number of successes once more, so we immediately define our indicator variable $X_i$ to be 1 if one instance of $s$ ends at the $i$th position along the shelf.
The string is 14 letters long, meaning that the first 13 books on the shelf cannot be the position where an instance of $s$ ends. This makes $X = \sum_{i=14}^{n} X_i$ the total number of times $s$ appears on the shelf. By linearity of expectation:
$$E[X] = E\Big[\sum_{i=14}^{n} X_i\Big] = \sum_{i=14}^{n} E[X_i] = \sum_{i=14}^{n} Pr[X_i = 1]$$
$$= \sum_{i=14}^{n} \frac{3 \cdot 3 \cdot 1 \cdot 1 \cdot 3 \cdot 2 \cdot 3 \cdot 3 \cdot 1 \cdot 1 \cdot 3 \cdot 1 \cdot 2 \cdot 1}{14^{14}} = \sum_{i=14}^{n} \frac{3^6 \cdot 2^2}{14^{14}} = (n - 13)\,\frac{3^6 \cdot 2^2}{14^{14}}$$
Takeaway: Be clever with how you define success for your indicator
variables. Remember linearity of expectation always holds.
2. Four square-inch tiles make up a 2 in. $\times$ 2 in. insignia, and each tile is distinct. We have $257^2$ such tiles. When the tiles are randomly arranged in a $257 \times 257$ grid, what is the expected number of properly-formed insignias? Assume that tiles can be rotated in any of 4 orientations but not flipped.
Solution: There are $256 \cdot 256 = 4^8$ interior vertices. Let $X_i$ be 1 if the $i$th interior vertex is at the center of a valid insignia. Thus, we know that $X = \sum_{i=1}^{4^8} X_i$ is the total number of valid insignias. We can apply linearity of expectation, and then invoke properties of an indicator variable.
$$E[X] = E\Big[\sum_{i=1}^{4^8} X_i\Big] = \sum_{i=1}^{4^8} E[X_i] = 4^8\,E[X_i] = 4^8\,Pr[X_i = 1]$$
We can use counting to compute the probability that vertex $i$ is the center of a valid insignia. First, there are 4 valid orientations for the insignia. Second, we now compute all possible combinations of 4 tiles. Each of the four tiles is chosen at random and then rotated at random, making 4 possible tiles with 4 possible orientations each (16 possibilities per tile). Thus there are a total of $16^4 = 4^8$ combinations.
$$Pr[X_i = 1] = \frac{4}{4^8} = \frac{1}{4^7}$$
We now plug this back in to solve for $E[X]$.
$$E[X] = 4^8 \cdot Pr[X_i = 1] = 4^8 \cdot \frac{1}{4^7} = 4$$
$$E[Y \mid Z] = E\Big[\sum_{i=1}^{10} X_i \mid Z\Big] = \sum_{i=1}^{10} E[X_i \mid Z] = 10\,E[X_i \mid Z]$$
Since $X_i$ is an indicator, $E[X_i \mid Z] = Pr[X_i = 1 \mid Z]$, which we expand with Bayes Rule:
$$E[X_i \mid Z] = \frac{Pr[Z \mid X_i = 1] \cdot \frac{1}{2}}{Pr[Z \mid X_i = 1] \cdot \frac{1}{2} + Pr[Z \mid X_i = 0] \cdot \frac{1}{2}} = \frac{Pr[Z \mid X_i = 1]}{Pr[Z \mid X_i = 1] + Pr[Z \mid X_i = 0]}$$
We will additionally assign $Z$ a value $k$. After solving $E[X_i \mid Z = k]$, we can then substitute all instances of $k$ with $Z$ for $E[X_i \mid Z]$.
$$E[X_i \mid Z = k] = \frac{Pr[Z = k \mid X_i = 1]}{Pr[Z = k \mid X_i = 1] + Pr[Z = k \mid X_i = 0]}$$
We now substitute $Z$ in. For $Pr[Z = k \mid X_i = 0]$: since $Z$ is the product of all $X_j$, including $X_i = 0$, we have $Z = 0$. Thus, $Pr[Z = k \mid X_i = 0] = Pr[k = 0]$.
$$E[X_i \mid Z = k] = \frac{Pr[\prod_{j=1}^{10} X_j = k \mid X_i = 1]}{Pr[\prod_{j=1}^{10} X_j = k \mid X_i = 1] + Pr[k = 0]}$$
Consider the probability specified in the numerator, $Pr[\prod_{j=1}^{10} X_j = k \mid X_i = 1]$. Since $X_i$ is 1, it does not affect the product. Thus, this probability reduces to $Pr[\prod_{j=1, j \ne i}^{10} X_j = k]$.
$$E[X_i \mid Z = k] = \frac{Pr[\prod_{j=1, j \ne i}^{10} X_j = k]}{Pr[\prod_{j=1, j \ne i}^{10} X_j = k] + Pr[k = 0]}$$
$$E[X_i \mid Z] = \frac{Pr[\prod_{j=1, j \ne i}^{10} X_j = Z]}{Pr[\prod_{j=1, j \ne i}^{10} X_j = Z] + Pr[Z = 0]}$$
$$E[Y \mid Z] = 10 \cdot \frac{Pr[\prod_{j=1, j \ne i}^{10} X_j = Z]}{Pr[\prod_{j=1, j \ne i}^{10} X_j = Z] + Pr[Z = 0]}$$
Alternate Solution:
Since $Z$ can only be 0 or 1, let us consider the two cases. First, when $Z = 1$, every $X_i$ must be 1, which means that the sum of the $X_i$s must be 10. Therefore, $E[Y \mid Z = 1] = 10$.
For the $Z = 0$ case, start from the definition
$$E[Y] = \sum_{y=0}^{10} y\,Pr[Y = y]$$
and condition on $Z = 0$:
$$Pr[Y = y \mid Z = 0] = \frac{Pr[Y = y, Z = 0]}{Pr[Z = 0]} = \frac{Pr[Y = y]}{1 - 2^{-10}}, \qquad 0 \le y \le 9$$
We have obtained the two different cases for $E[Y \mid Z = z]$, which is enough to specify $E[Y \mid Z]$ as a function of $Z$. Just for fun, we can stitch the two expressions into one:
$$E[Y \mid Z] = 10Z + \frac{1}{1 - 2^{-10}}\left(5 - \frac{10}{2^{10}}\right)(1 - Z)$$
(b) Using your derivation from part (a), compute E[Y |Z > 0]. Explain
your answer intuitively.
Solution: To gain some insight, we evaluate the expression from part (a) for $Z > 0$. Notice that if $Z > 0$, then $Pr[Z = 0] = 0$. Thus,
$$E[Y \mid Z > 0] = 10 \cdot \frac{Pr[\prod_{j=1, j \ne i}^{10} X_j = Z]}{Pr[\prod_{j=1, j \ne i}^{10} X_j = Z] + 0} = 10$$
Similarly, for $Z = 0$:
$$E[Y \mid Z = 0] = 10 \cdot \frac{Pr[\prod_{j=1, j \ne i}^{10} X_j = 0]}{Pr[\prod_{j=1, j \ne i}^{10} X_j = 0] + 1}$$
We compute the numerator probability by looking at the complement:
$$Pr\Big[\prod_{j=1, j \ne i}^{10} X_j = 0\Big]$$
$$= 1 - Pr\Big[\prod_{j=1, j \ne i}^{10} X_j \ne 0\Big] = 1 - \prod_{j=1, j \ne i}^{10} Pr[X_j \ne 0]$$
For all $X_j$, $Pr[X_j \ne 0]$ is the probability that a dice roll is not even, which is $\frac{1}{2}$.
$$= 1 - \prod_{j=1, j \ne i}^{10} \frac{1}{2} = 1 - \frac{1}{2^9}$$
Plugging it back in, we get
$$E[Y \mid Z = 0] = 10 \cdot \frac{1 - \frac{1}{2^9}}{\left(1 - \frac{1}{2^9}\right) + 1} = 10 \cdot \frac{1 - \frac{1}{2^9}}{2 - \frac{1}{2^9}} = 10 \cdot \frac{2^9 - 1}{2^{10} - 1}$$
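As a closing sanity check (my own, assuming the $X_j$ are i.i.d. fair 0/1 indicators with $Y = \sum_j X_j$ and $Z = \prod_j X_j$, as the two derivations suggest), a brute force over all $2^{10}$ equally likely outcomes agrees with both closed forms for $E[Y \mid Z = 0]$.

```python
from fractions import Fraction as F
from itertools import product

# Enumerate all 2^10 equally likely outcomes of ten fair 0/1 indicators.
num = den = 0
for xs in product((0, 1), repeat=10):
    z = 1
    for x in xs:
        z *= x
    if z == 0:                # condition on Z = 0
        num += sum(xs)        # accumulate Y over the conditioned outcomes
        den += 1

brute = F(num, den)
closed = F(10 * (2**9 - 1), 2**10 - 1)
stitched = (5 - F(10, 2**10)) / (1 - F(1, 2**10))  # the Z = 0 branch above

print(brute, closed, stitched)  # 5110/1023 5110/1023 5110/1023
```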