
3.

Theory of Probability

3.1 Basic Definitions and Properties

3.2 Conditional Probability and Independent Events

3.3 Bayes' Formula

3.4 Applications

3.5 Problems
Ismor Fischer, 5/29/2012 3.1-1
3. Probability Theory
3.1 Basic Ideas, Definitions, and Properties
POPULATION = Unlimited supply of five types of fruit, in equal proportions.
O1 = Macintosh apple
O2 = Golden Delicious apple
O3 = Granny Smith apple
O4 = Cavendish (supermarket) banana
O5 = Plantain banana

Experiment 1: Randomly select one fruit from this population, and record its type.
Sample Space: The set S of all possible elementary outcomes of an experiment.
S = {O1, O2, O3, O4, O5}, #(S) = 5
Event: Any subset of a sample space S. ("Elementary outcomes" = simple events.)
A = "Select an apple." = {O1, O2, O3}, #(A) = 3
B = "Select a banana." = {O4, O5}, #(B) = 2
[Figure: the relative frequency #(Event) / #(trials) plotted against the number of trials
of the experiment, for a sample sequence of outcomes, e.g., A B A A B A ... The relative
frequency of A settles toward 3/5 = 0.6, and that of B toward 2/5 = 0.4.]

Event    P(Event)
A        3/5 = 0.6
B        2/5 = 0.4
Total    5/5 = 1.0

As # trials → ∞:
P(A) = 0.6  The probability of randomly selecting an apple is 0.6.
P(B) = 0.4  The probability of randomly selecting a banana is 0.4.
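
The long-run behavior of the relative frequency can be illustrated with a short simulation. This is only a sketch: the function name `relative_frequency` and the fixed seed are assumptions introduced here for illustration, not part of the notes.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Estimate P(A) by repeating Experiment 1 n_trials times."""
    rng = random.Random(seed)                # fixed seed, so a run is reproducible
    fruits = ["O1", "O2", "O3", "O4", "O5"]  # five fruit types, equal proportions
    apples = {"O1", "O2", "O3"}              # event A = "Select an apple"
    hits = sum(rng.choice(fruits) in apples for _ in range(n_trials))
    return hits / n_trials                   # #(Event) / #(trials)

# The relative frequency should drift toward 0.6 as the number of trials grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With few trials the estimate can be far from 0.6; only the long-run value is expected to settle near it.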

General formulation may be facilitated with the use of a Venn diagram:

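Experiment ⇒ Sample Space: S = {O1, O2, …, Ok}, #(S) = k

[Venn diagram: the sample space S contains the outcomes O1, O2, …, Ok; event A is the
region enclosing O1, O2, …, Om, while Om+1, Om+2, …, Ok lie outside A.]

Event A = {O1, O2, …, Om} ⊆ S, #(A) = m ≤ k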


Definition: The probability of event A, denoted P(A), is the long-run relative
frequency with which A is expected to occur, as the experiment is repeated indefinitely.

Fundamental Properties of Probability

For any event A = {O1, O2, …, Om} in a sample space S,

1. $0 \le P(A) \le 1$

2. $P(A) = \sum_{i=1}^{m} P(O_i) = P(O_1) + P(O_2) + P(O_3) + \cdots + P(O_m)$

Special Cases:

$P(\emptyset) = 0$

$P(S) = \sum_{i=1}^{k} P(O_i) = 1$ ("certainty")

3. If all the elementary outcomes of S are equally likely, i.e.,

$P(O_1) = P(O_2) = \cdots = P(O_k) = \frac{1}{k}$,

then

$P(A) = \frac{\#(A)}{\#(S)} = \frac{m}{k}$.

Example: P(A) = 3/5 = 0.6, P(B) = 2/5 = 0.4



Experiment 2: Select a card at random from a standard deck (and replace).


Sample Space: S = {A♠, …, K♦}, #(S) = 52

[Figure: the 52 cards laid out in four rows, one per suit, each row showing the ranks
A 2 3 4 5 6 7 8 9 10 J Q K. Event A is the column of 2s; event B is the row of one suit.]

Events: A = "Select a 2" = {2♠, 2♣, 2♥, 2♦}, #(A) = 4

B = "Select a ♣" = {A♣, 2♣, …, K♣}, #(B) = 13

Probabilities: Since all elementary outcomes are equally likely, it follows that

$P(A) = \frac{\#(A)}{\#(S)} = \frac{4}{52}$ and $P(B) = \frac{\#(B)}{\#(S)} = \frac{13}{52}$.

New Events from Old Events


complement

(1) $A^c$ = "not A" = {All outcomes that are in S, but not in A.}

$P(A^c) = 1 - P(A)$

Example: $A^c$ = "Select either A, 3, 4, …, or K." $P(A^c) = 1 - \frac{4}{52} = \frac{48}{52}$.

Example: Experiment = Toss a coin once.
Events: A = {Heads}, $A^c$ = {Tails}
Probabilities:
Fair coin: P(A) = 0.5, so $P(A^c)$ = 1 − 0.5 = 0.5
Biased coin: P(A) = 0.7, so $P(A^c)$ = 1 − 0.7 = 0.3

intersection

(2) $A \cap B$ = "A and B" = {All outcomes in S that A and B share in common.}
= {All outcomes that result when events A and B occur simultaneously.}

Example: $A \cap B$ = "Select a 2 and a ♣" = {2♣}, so $P(A \cap B) = \frac{1}{52}$.

Definition: Two events A and B are said to be disjoint, or mutually exclusive,
if they cannot occur simultaneously, i.e., $A \cap B = \emptyset$, hence $P(A \cap B) = 0$.

[Venn diagram: two non-overlapping regions A and B inside S.]

Example: A = "Select a 2" and C = "Select a 3" are disjoint events.


Exercise: Are A = {$2^4, 3^4, 4^4, 5^4, \ldots$} and B = {$2^6, 3^6, 4^6, 5^6, \ldots$} disjoint?
If not, find $A \cap B$.

union

(3) $A \cup B$ = "A or B" = {All outcomes in S that are either in A or B, inclusive.}

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$, where $P(A \cap B) = 0$ if A and B are disjoint.

Example: $A \cup B$ = "Select either a 2 or a ♣" has probability

$P(A \cup B) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52}$.

Example: $A \cup C$ = "Select either a 2 or a 3" has probability

$P(A \cup C) = \frac{4}{52} + \frac{4}{52} - 0 = \frac{8}{52}$.
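
The complement, intersection, and union calculations for the card experiment can be checked by brute-force counting. The sketch below assumes that event B is the club suit (any single suit gives the same counts); the helper `prob` and the suit names are illustrative choices, not notation from the notes.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "clubs", "hearts", "diamonds"]
S = set(product(ranks, suits))   # sample space: 52 equally likely cards

def prob(event):
    """P(event) = #(event) / #(S), valid for equally likely outcomes."""
    return Fraction(len(event), len(S))

A = {card for card in S if card[0] == "2"}      # "Select a 2"
B = {card for card in S if card[1] == "clubs"}  # "Select a club" (assumed suit)
C = {card for card in S if card[0] == "3"}      # "Select a 3"

print(prob(S - A))                                   # complement of A
print(prob(A & B))                                   # intersection
print(prob(A | B), prob(A) + prob(B) - prob(A & B))  # union, computed two ways
print(prob(A | C), prob(A) + prob(C))                # union of disjoint events
```

Using `Fraction` keeps the arithmetic exact, although the printed fractions appear in lowest terms (for example, 16/52 prints as 4/13).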

Note: Formula (3) extends to n ≥ 3 disjoint events in a straightforward manner:

(4) $P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n)$.

Question: How is this formula modified if the n events are not necessarily disjoint?
Example: Take n = 3 events A, B, C. [Venn diagram: three mutually overlapping regions
A, B, and C inside S.] Then

$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.

Exercise: For S = {January, …, December}, verify this formula for the three events
A = "Has 31 days," B = "Name ends in r," and C = "Name begins with a vowel."
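
One way to check the three-event formula numerically is to count months directly. The sketch below assumes the twelve months are equally likely (the exercise does not say how a month is chosen), and the helper `prob` is introduced only for illustration.

```python
from fractions import Fraction

# Days per month in a non-leap year (February's length does not affect event A).
days = {"January": 31, "February": 28, "March": 31, "April": 30,
        "May": 31, "June": 30, "July": 31, "August": 31,
        "September": 30, "October": 31, "November": 30, "December": 31}
S = set(days)

def prob(event):
    """Probability under the assumption that all 12 months are equally likely."""
    return Fraction(len(event), len(S))

A = {m for m in S if days[m] == 31}    # "Has 31 days"
B = {m for m in S if m.endswith("r")}  # "Name ends in r"
C = {m for m in S if m[0] in "AEIOU"}  # "Name begins with a vowel"

lhs = prob(A | B | C)
rhs = (prob(A) + prob(B) + prob(C)
       - prob(A & B) - prob(A & C) - prob(B & C)
       + prob(A & B & C))
print(lhs, rhs)  # the two sides should agree
```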

Exercise: A single tooth is to be randomly selected for a certain dental procedure.
Draw a Venn diagram to illustrate the relationships between the three following events:
A = "upper jaw," B = "left side," and C = "molar," and indicate all corresponding
probabilities. Calculate the probability that all of these three events, A and B and C,
occur. Calculate the probability that none of these three events occur. Calculate the
probability that exactly one of these three events occurs. Calculate the probability that
exactly two of these three events occur. (Think carefully.) Assume equal likelihood in
all cases.

[Figure: dental chart of both jaws, with the incisors, canines, premolars, and molars labeled.]

The three set operations (union, intersection, and complement) can be unified via...

DeMorgan's Laws

$(A \cap B)^c = A^c \cup B^c$

$(A \cup B)^c = A^c \cap B^c$

Exercise: Using a Venn diagram, convince yourself that these statements are true in
general. Then verify them for a specific example, e.g., A = "Pick a picture card" and
B = "Pick a black card."

Slight Detour

Suppose that out of the last n = 40 races, a certain racing horse won x = 25, and lost
the remaining n − x = 15. Based on these statistics, we can calculate the following
probability estimates for future races:

$P(\text{Win}) \approx \frac{x}{n} = \frac{25}{40} = \frac{5}{8} = 0.625 = p$

$P(\text{Lose}) \approx 1 - \frac{x}{n} = \frac{15}{40} = \frac{3}{8} = 0.375 = 1 - p = q$

(Out of every 8 races, the horse wins 5 and loses 3, on average.)

Odds of winning $= \frac{P(\text{Win})}{P(\text{Lose})} = \frac{5/8}{3/8} = \frac{5}{3}$, i.e., "5 to 3."

Definition: For any event A, let P(A) = p, thus $P(A^c) = q = 1 - p$. The odds of event A
$= \frac{p}{q} = \frac{p}{1 - p}$, i.e., the probability that A does occur, divided by the probability that it
does not occur. (In the preceding example, A = "Win" with probability p = 5/8.)
Note that if odds = 1, then A and Ac are equally likely to occur. If odds > 1 (likewise,
< 1), then the probability that A occurs is greater (likewise, less) than the probability
that it does not occur.

Example: Suppose the probability of contracting a certain disease in a particular
group of "high risk" individuals is P(D+) = 0.75, so that the probability of being
disease-free is P(D−) = 0.25. Then the odds of contracting the disease in this group are
equal to 0.75/0.25 = 3 (or "3 to 1").* Likewise, if in a reference group of "low risk"
individuals, the prevalence of the same disease is only P(D+) = 0.02, so that P(D−) =
0.98, then their odds = 0.02/0.98 = 1/49 (≈ 0.0204). As its name suggests, the
corresponding "odds ratio" between the two groups is defined as the ratio of their
respective odds, i.e., $\frac{3}{1/49} = 147$. That is, the odds of the high-risk group contracting
the disease are 147 times the odds of the low-risk reference group. (Odds
ratios have nice properties, and are used extensively in epidemiological studies.)

* That is, within this group, the probability of disease is three times the probability of no disease.
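
The odds and odds-ratio calculations above reduce to two small functions. This is a minimal sketch; the function names `odds` and `odds_ratio` are illustrative assumptions.

```python
def odds(p):
    """Odds of an event with probability p, i.e. p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

def odds_ratio(p_group, p_reference):
    """Ratio of the odds in one group to the odds in a reference group."""
    return odds(p_group) / odds(p_reference)

print(odds(25 / 40))           # racing horse: odds of winning
print(odds(0.75))              # high-risk group
print(odds(0.02))              # low-risk reference group
print(odds_ratio(0.75, 0.02))  # high risk versus low risk
```

The results are floating-point values, so the last one may differ from 147 by a tiny rounding error.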

3.2 Conditional Probability and Independent Events


Using population-based health studies to estimate probabilities relating
potential risk factors to a particular disease, evaluate efficacy of medical
diagnostic and screening tests, etc.
Example: Events: A = lung cancer, B = smoker

[Venn diagram of A and B in S: A only = 0.03, A ∩ B = 0.12, B only = 0.04,
outside both = 0.81.]

                          Disease Status
Smoker         Lung cancer (A)    No lung cancer (A^c)    Total
Yes (B)             0.12                 0.04             0.16
No (B^c)            0.03                 0.81             0.84
Total               0.15                 0.85             1.00

Probabilities: P(A) = 0.15, P(B) = 0.16, $P(A \cap B)$ = 0.12

Definition:

Conditional Probability of Event A, given Event B (where P(B) ≠ 0)

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.12}{0.16} = 0.75 \gg 0.15 = P(A)$.

Comments:
$P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{0.12}{0.15} = 0.80$, so $P(A \mid B) \ne P(B \mid A)$ in general.
General formula can be rewritten: $P(A \cap B) = P(A \mid B)\, P(B)$ ← IMPORTANT
Example: P(Angel barks) = 0.1
P(Brutus barks) = 0.2
P(Angel barks | Brutus barks) = 0.3
Therefore, P(Angel and Brutus bark) = (0.3)(0.2) = 0.06
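
The conditional probabilities for the lung cancer and smoker example can be recomputed from the four joint probabilities in the table. The dictionary layout and the helper `marginal` below are assumptions made for this sketch.

```python
# Joint probabilities from the lung cancer (A) / smoker (B) table.
joint = {
    ("A", "B"): 0.12, ("Ac", "B"): 0.04,
    ("A", "Bc"): 0.03, ("Ac", "Bc"): 0.81,
}

def marginal(label, position):
    """Sum the joint probabilities whose key has `label` at `position`."""
    return sum(p for key, p in joint.items() if key[position] == label)

p_A = marginal("A", 0)         # P(A) = 0.12 + 0.03
p_B = marginal("B", 1)         # P(B) = 0.12 + 0.04
p_A_and_B = joint[("A", "B")]

p_A_given_B = p_A_and_B / p_B  # P(A | B) = P(A and B) / P(B)
p_B_given_A = p_A_and_B / p_A  # P(B | A) = P(A and B) / P(A)

print(round(p_A, 2), round(p_B, 2))
print(round(p_A_given_B, 2), round(p_B_given_A, 2))
```

The two conditional probabilities differ, which illustrates that P(A | B) and P(B | A) are not interchangeable.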

Example: Suppose that two balls are to be randomly drawn, one after another,
from a container holding four red balls and two green balls. Under the scenario
of sampling without replacement, calculate the probabilities of the events
A = "First ball is red," B = "Second ball is red," and $A \cap B$ = "First ball is red
AND second ball is red." (As an exercise, list the 6 × 5 = 30 outcomes in the
sample space of this experiment, and use "brute force" to solve this problem.)

[Figure: six balls labeled R1, R2, R3, R4 (red) and G1, G2 (green).]

This type of problem, known as an "urn model," can be solved with the use of
a tree diagram, where each branch of the tree represents a specific event,
conditioned on a preceding event. The product of the probabilities of all such
events along a particular sequence of branches is equal to the corresponding
intersection probability, via the previous formula. In this example, we obtain the
following values:
1st draw          2nd draw                 Intersection (product along the branch)
P(A) = 4/6        P(B | A) = 3/5           P(A ∩ B) = 12/30
                  P(B^c | A) = 2/5         P(A ∩ B^c) = 8/30
P(A^c) = 2/6      P(B | A^c) = 4/5         P(A^c ∩ B) = 8/30
                  P(B^c | A^c) = 1/5       P(A^c ∩ B^c) = 2/30

We can calculate the probability P(B) by adding the two intersection values that involve B,
i.e., $P(B) = P(A \cap B) + P(A^c \cap B)$ = 12/30 + 8/30 = 20/30, or P(B) = 2/3.
This last formula, which can be written as $P(B) = P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)$,
can be extended to more general situations, where it is known as the Law of Total
Probability, and is a useful tool in Bayes' Theorem (next section).
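
The "brute force" approach suggested for the urn example can be sketched by enumerating all 30 ordered draws. The ball labels follow the figure; the helper names `prob`, `first_red`, and `second_red` are assumptions introduced here.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R1", "R2", "R3", "R4", "G1", "G2"]
outcomes = list(permutations(balls, 2))  # ordered pairs, drawn without replacement

def prob(event):
    """Fraction of the equally likely outcomes for which event(outcome) is true."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

def first_red(o):
    return o[0].startswith("R")

def second_red(o):
    return o[1].startswith("R")

p_A = prob(first_red)
p_B = prob(second_red)
p_A_and_B = prob(lambda o: first_red(o) and second_red(o))

# Law of Total Probability: P(B) = P(B | A) P(A) + P(B | A^c) P(A^c)
total = Fraction(3, 5) * Fraction(4, 6) + Fraction(4, 5) * Fraction(2, 6)

print(len(outcomes), p_A, p_B, p_A_and_B, total)
```

The enumeration and the tree-diagram calculation should give the same value for P(B).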

Suppose event C = coffee drinker.

[Venn diagram of A and C in S: A only = 0.09, A ∩ C = 0.06, C only = 0.34,
outside both = 0.51.]

                          Disease Status
Coffee Drinker   Lung cancer (A)    No lung cancer (A^c)    Total
Yes (C)               0.06                 0.34             0.40
No (C^c)              0.09                 0.51             0.60
Total                 0.15                 0.85             1.00

Probabilities: P(A) = 0.15, P(C) = 0.40, $P(A \cap C)$ = 0.06

Therefore, $P(A \mid C) = \frac{P(A \cap C)}{P(C)} = \frac{0.06}{0.40} = 0.15 = P(A)$,

i.e., the occurrence of event C gives no information about the probability of event A.
Definition:
Two events A and B are said to be statistically independent if either:
(1) P(A | B) = P(A), i.e., P(B | A) = P(B),
or equivalently,
(2) $P(A \cap B) = P(A)\,P(B)$.

Exercise: Prove that if events B and C are statistically independent, then so are
each of the following: B and "Not C"; "Not B" and C; "Not B" and "Not C."
Hint: Let P(B) = b, P(C) = c, and construct a 2 × 2 probability table.

Summary
A, B disjoint: If either event occurs, then the other cannot occur: $P(A \cap B) = 0$.
A, B independent: If either event occurs, this gives no information about the other:
$P(A \cap B) = P(A)\,P(B)$.

Example: A = "Select a 2" and B = "Select a ♣" are not disjoint events, because
$A \cap B$ = {2♣} ≠ ∅. However, $P(A \cap B)$ = 1/52 = 1/13 × 1/4 = P(A) × P(B); hence
they are independent events. Can two disjoint events ever be independent? Why?
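
The product rule for independence can be turned into a small numerical check. In this sketch, the function `is_independent` and its tolerance are assumptions; a tolerance is used because decimal probabilities are stored only approximately as floating-point numbers.

```python
import math

def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """True if P(A and B) equals P(A) P(B), within a numerical tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)

print(is_independent(0.15, 0.40, 0.06))        # lung cancer and coffee drinker
print(is_independent(0.15, 0.16, 0.12))        # lung cancer and smoker
print(is_independent(4 / 52, 13 / 52, 1 / 52)) # "Select a 2" and one suit
```

The first and third pairs satisfy the product rule, while the second (lung cancer and smoking) does not.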

A VERY IMPORTANT AND USEFUL FACT: It can be shown that for
any event A, all of the elementary properties of probability P(A) covered in
the notes extend to conditional probability P(A | B), for any other event B.
For example, since we know that $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$
for any two events A1 and A2, it is also true that
$P(A_1 \cup A_2 \mid B) = P(A_1 \mid B) + P(A_2 \mid B) - P(A_1 \cap A_2 \mid B)$ for any other event B.
As another example, since we know that $P(A^c) = 1 - P(A)$, it therefore also
follows that $P(A^c \mid B) = 1 - P(A \mid B)$.

Exercise: Prove these two statements. (Hint: Sketch a Venn diagram.)

HOWEVER, there is one important exception! We know that if A and B are
two independent events, then $P(A \cap B) = P(A)\,P(B)$. But this does not
extend to conditional probabilities! In particular, if C is any other event, then
$P(A \cap B \mid C) \ne P(A \mid C)\,P(B \mid C)$ in general. The following example illustrates
this, for three events A, B, and C:

[Venn diagram of A, B, and C in S, with region probabilities: A only = .20,
A ∩ B only = .20, B only = .20, A ∩ B ∩ C = .05, A ∩ C only = .05, B ∩ C only = .05,
C only = .10, outside all three = .15.]

Exercise:
Confirm that $P(A \cap B) = P(A)\,P(B)$, but $P(A \cap B \mid C) \ne P(A \mid C)\,P(B \mid C)$.

In other words, two events that may be independent in a general population
may not necessarily be independent in a particular subgroup of that population.
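
The three-event example can be checked numerically. The sketch below assumes the following reading of the diagram's region probabilities: A only = .20, A and B only = .20, B only = .20, all three = .05, A and C only = .05, B and C only = .05, C only = .10, and none = .15. Both this reading and the helper names are assumptions, not values confirmed by the notes.

```python
# Assumed region probabilities; each key records membership in (A, B, C).
regions = {
    (True, False, False): 0.20,   # A only
    (True, True, False): 0.20,    # A and B, not C
    (False, True, False): 0.20,   # B only
    (True, True, True): 0.05,     # A and B and C
    (True, False, True): 0.05,    # A and C, not B
    (False, True, True): 0.05,    # B and C, not A
    (False, False, True): 0.10,   # C only
    (False, False, False): 0.15,  # none of the three
}

def in_A(r): return r[0]
def in_B(r): return r[1]
def in_C(r): return r[2]
def in_A_and_B(r): return r[0] and r[1]
def anything(r): return True

def prob(event, given=anything):
    """P(event | given), computed by summing region probabilities."""
    numerator = sum(p for r, p in regions.items() if event(r) and given(r))
    denominator = sum(p for r, p in regions.items() if given(r))
    return numerator / denominator

# Unconditionally, the product rule holds ...
print(round(prob(in_A_and_B), 4), round(prob(in_A) * prob(in_B), 4))
# ... but within the subgroup C, it does not.
print(round(prob(in_A_and_B, in_C), 4),
      round(prob(in_A, in_C) * prob(in_B, in_C), 4))
```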

More on Conditional Probability and Independent Events


Another example from epidemiology

[Figure: two stylized Venn diagrams, each with S = POPULATION and A = lung cancer.
In the first, A overlaps B = obese; in the second, A overlaps C = smoker.]

Suppose that, in a certain study population, we wish to investigate the prevalence of lung cancer
(A), and its associations with obesity (B) and cigarette smoking (C), respectively. From the first
of the two stylized Venn diagrams above, by comparing the scales drawn, observe that the
proportion of the size of the intersection A ∩ B (green) relative to event B (blue + green), is about
equal to the proportion of the size of event A (yellow + green) relative to the entire population S.
That is,

$\frac{P(A \cap B)}{P(B)} = \frac{P(A)}{P(S)}$.

(As an exercise, verify this equality for the following probabilities: yellow = .09, green = .07,
blue = .37, white = .47, to two decimals, before reading on.) In other words, the probability that a
randomly chosen person from the obese subpopulation has lung cancer, is equal to the probability
that a randomly chosen person from the general population has lung cancer (.16). This equation
can be equivalently expressed as
P(A | B) = P(A),

since the left side is conditional probability by definition, and P(S) = 1 in the denominator of
the right side. In this form, the equation clearly conveys the interpretation that knowledge of
event B (obesity) yields no information about event A (lung cancer). In this example, lung cancer
is equally probable (.16) among the obese as it is among the general population, so knowing that
a person is obese is completely unrevealing with respect to having lung cancer. Events A and B
that are related in this way are said to be independent. Note that they are not disjoint!

In the second diagram however, the relative size of A ∩ C (orange) to C (red + orange), is larger
than the relative size of A (yellow + orange) to the whole population S, so P(A | C) > P(A), i.e.,
events A and C are dependent. Here, as is true in general, the probability of lung cancer is
indeed influenced by whether a person is randomly selected from among the general population
or the smoking subset, where it is much higher. Statistically, lung cancer would be a rare disease
in the U.S., if not for cigarettes (although it is on the rise among nonsmokers).

Application: Are Blood Antibodies Independent?


An example of conditional probability in human genetics
(Adapted from Rick Chappell, Ph.D., UW Dept. of Biostatistics & Medical Informatics)

Background: The surfaces of human red blood cells (erythrocytes) are coated with antigens
that are classified into four disjoint blood types: O, A, B, and AB. Each type is associated
with blood serum antibodies for the other types, that is,

Type O blood contains both A and B antibodies.
(This makes Type O the "universal donor," but capable of receiving only Type O.)
Type A blood contains only B antibodies.
Type B blood contains only A antibodies.
Type AB blood contains neither A nor B antibodies.
(This makes Type AB the "universal recipient," but capable of donating only to Type AB.)

In addition, blood is also classified according to the presence (+) or absence (−) of Rh factor
(found predominantly in rhesus monkeys, and to varying degree in human populations; it
is important in obstetrics). Hence there are eight distinct blood groups corresponding to this
joint classification system: O+, O−, A+, A−, B+, B−, AB+, AB−. According to the American
Red Cross, the U.S. population has the following blood group relative frequencies:

                    Rh factor
Blood Type       +        −      Totals
O              .384     .077     .461
A              .323     .065     .388
B              .094     .017     .111
AB             .032     .007     .039
Totals         .833     .166     .999

From these values (and from the background information above), we can calculate the
following probabilities:

P (A antibodies) = P (Type O or B) = P (O) + P (B) = .461 + .111 = .572

P (B antibodies) = P (Type O or A) = P (O) + P (A) = .461 + .388 = .849

P (B antibodies and Rh+) = P (Type O+ or A+) = P (O+) + P (A+) = .384 + .323 = .707

Using these calculations, we can answer the following.

Question: Is having A antibodies independent of having B antibodies?

Solution: We must check whether or not

P(A and B antibodies) = P(A antibodies) × P(B antibodies),

i.e., whether
P(Type O) = .572 × .849,
or
.461 ≈ .486.

This indicates near independence of the two events; there does exist a slight
dependence. The dependence would be much stronger if America were
composed of two disjoint (i.e., non-interbreeding) groups: Type A (with B
antibodies only) and Type B (with A antibodies only), and no Type O (with
both A and B antibodies). Since this is evidently not the case, the implication is
that either these traits evolved before humans spread out geographically, or they
evolved later but the populations became mixed in America.

Question: Is having B antibodies independent of Rh+?

Solution: We must check whether or not

P (B antibodies and Rh+) = P (B antibodies) × P (Rh+),

that is,
.707 = .849 × .833,

which is true (to three decimal places), so we have exact independence of these events. These traits


probably predate diversification in humans (and were not differentially selected
for since).

Exercises:

Is having A antibodies independent of Rh+?
Find P (A antibodies | B antibodies) and P (B antibodies | A antibodies).
Conclusions?
Is "Blood Type" independent of "Rh factor"? (Do a separate calculation for
each blood type: O, A, B, AB, and each Rh factor: +, −.)
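
The two independence checks worked out in this application can be reproduced from the table of relative frequencies. This is a sketch only: the dictionary `freq` transcribes the table (whose entries sum to .999 because of rounding), and the helper `prob` is an assumption introduced for illustration.

```python
# Relative frequencies of the eight blood groups, keyed by (type, Rh factor).
freq = {
    ("O", "+"): .384, ("O", "-"): .077,
    ("A", "+"): .323, ("A", "-"): .065,
    ("B", "+"): .094, ("B", "-"): .017,
    ("AB", "+"): .032, ("AB", "-"): .007,
}

def prob(types, rh=("+", "-")):
    """Total relative frequency of the given blood types and Rh factors."""
    return sum(p for (t, r), p in freq.items() if t in types and r in rh)

p_anti_A = prob({"O", "B"})   # A antibodies: Type O or B
p_anti_B = prob({"O", "A"})   # B antibodies: Type O or A
p_both = prob({"O"})          # both antibodies: Type O
p_rh_pos = prob({"O", "A", "B", "AB"}, rh=("+",))
p_anti_B_rh_pos = prob({"O", "A"}, rh=("+",))

# Compare each joint probability with the product of its marginals.
print(round(p_both, 3), round(p_anti_A * p_anti_B, 3))
print(round(p_anti_B_rh_pos, 3), round(p_anti_B * p_rh_pos, 3))
```

The same helper can be reused for the exercises by changing the sets of types and Rh factors passed to `prob`.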

3.3 Bayes' Formula


Suppose that, for a certain population of individuals, we are interested in
comparing sleep disorders (in particular, the occurrence of event A = "Apnea")
between M = Males and F = Females.

[Venn diagram: S = Adults under 50, partitioned into M and F; event A overlaps both,
forming A ∩ M and A ∩ F.]

Also assume that we know the following information:

Prior probabilities     Conditional probabilities
P(M) = 0.4              P(A | M) = 0.8 (80% of males have apnea)
P(F) = 0.6              P(A | F) = 0.3 (30% of females have apnea)

Given here are the conditional probabilities of having apnea within each
respective gender, but these are not necessarily the probabilities of interest. We
actually wish to calculate the probability of each gender, given A. That is, the
posterior probabilities P(M | A) and P(F | A).

To do this, we first need to reconstruct P(A) itself from the given information.

Prior     Conditional      Intersection
P(M)      P(A | M)         P(A ∩ M) = P(A | M) P(M)
          P(A^c | M)       P(A^c ∩ M) = P(A^c | M) P(M)
P(F)      P(A | F)         P(A ∩ F) = P(A | F) P(F)
          P(A^c | F)       P(A^c ∩ F) = P(A^c | F) P(F)

Adding the two branches that end in A gives
P(A) = P(A | M) P(M) + P(A | F) P(F).

So, given A...

$P(M \mid A) = \frac{P(M \cap A)}{P(A)} = \frac{P(A \mid M)\,P(M)}{P(A \mid M)\,P(M) + P(A \mid F)\,P(F)} = \frac{(0.8)(0.4)}{(0.8)(0.4) + (0.3)(0.6)} = \frac{0.32}{0.50} = 0.64$

and

$P(F \mid A) = \frac{P(F \cap A)}{P(A)} = \frac{P(A \mid F)\,P(F)}{P(A \mid M)\,P(M) + P(A \mid F)\,P(F)} = \frac{(0.3)(0.6)}{(0.8)(0.4) + (0.3)(0.6)} = \frac{0.18}{0.50} = 0.36$

(These are the posterior probabilities.)

[Venn diagram: S partitioned into M and F, with A ∩ M = 0.32, A ∩ F = 0.18,
A^c ∩ M = 0.08, and A^c ∩ F = 0.42.]

Thus, the additional information that a randomly selected individual has apnea (an
event with probability 50%; why?) increases the likelihood of being male from a prior
probability of 40% to a posterior probability of 64%, and likewise, decreases the
likelihood of being female from a prior probability of 60% to a posterior probability
of 36%. That is, knowledge of event A can alter a prior probability P(B) to a posterior
probability P(B | A), of some other event B.

Exercise: Calculate and interpret the posterior probabilities P(M | Ac) and P(F | Ac)
as above, using the prior probabilities (and conditional probabilities) given.

More formally, consider any event A, and two complementary events B1 and B2,
(e.g., M and F) in a sample space S. How do we express the posterior
probabilities P(B1 | A) and P(B2 | A) in terms of the conditional probabilities
P(A | B1) and P(A | B2), and the prior probabilities P(B1) and P(B2)?

Bayes' Formula for posterior probabilities P(Bi | A) in terms
of prior probabilities P(Bi), i = 1, 2

$P(B_i \mid A) = \frac{P(B_i \cap A)}{P(A)} = \frac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2)}$

In general, consider an event A, and events B1, B2, …, Bn, disjoint and exhaustive.

[Venn diagram: S partitioned into B1, B2, …, Bn; event A overlaps each of them,
forming A ∩ B1, A ∩ B2, …, A ∩ Bn.]

Prior Probabilities    Conditional       Intersection
P(B1)                  P(A | B1)         P(A ∩ B1)
                       P(A^c | B1)       P(A^c ∩ B1)
P(B2)                  P(A | B2)         P(A ∩ B2)
                       P(A^c | B2)       P(A^c ∩ B2)
P(B3)                  P(A | B3)         P(A ∩ B3)
                       P(A^c | B3)       P(A^c ∩ B3)
...                    ...               ...
P(Bn)                  P(A | Bn)         P(A ∩ Bn)
                       P(A^c | Bn)       P(A^c ∩ Bn)

Law of Total Probability

$P(A) = \sum_{j=1}^{n} P(A \cap B_j) = \sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)$

Bayes' Formula (general version)

For i = 1, 2, …, n, the posterior probabilities are

$P(B_i \mid A) = \frac{P(B_i \cap A)}{P(A)} = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)}$.

[Portrait: Reverend Thomas Bayes, 1702 - 1761]
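
The general version of Bayes' Formula translates directly into a few lines of code. The sketch below applies it to the apnea example; the function name `posterior` and its argument names are assumptions made here for illustration.

```python
def posterior(priors, likelihoods):
    """Return ([P(B_i | A) for each i], P(A)) from P(B_i) and P(A | B_i).

    Assumes the events B_1, ..., B_n are disjoint and exhaustive.
    """
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]  # P(A and B_i)
    p_a = sum(joint)                           # Law of Total Probability
    return [j / p_a for j in joint], p_a

# Apnea example: B_1 = M, B_2 = F.
post, p_apnea = posterior(priors=[0.4, 0.6], likelihoods=[0.8, 0.3])
print(round(p_apnea, 2), [round(x, 2) for x in post])
```

The posterior probabilities always sum to 1, because the denominator is the sum of the numerators.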

3.4 Applications
Evidence-Based Medicine: Screening Tests and Disease Diagnosis
[Venn diagram: S partitioned into D+ and D−; the test-positive region T+ overlaps both,
giving the four regions T+ ∩ D+, T+ ∩ D−, T− ∩ D+, and T− ∩ D−.]

Clinical tests are frequently used in medicine and epidemiology to diagnose or screen
for the presence (T+) or absence (T−) of a particular condition, such as pregnancy or
disease. Definitive disease status (either D+ or D−) is often subsequently determined
by means of a "gold standard," such as data resulting from follow-up, invasive
radiographic or surgical procedures, or autopsy. Different measures of the test's
merit can then be estimated via various conditional probabilities. For instance, the
sensitivity or true positive rate of the test is defined as the probability that a
randomly selected individual has a positive test result, given that he/she actually has
the disease. Other terms are defined similarly; the following example, using a
random sample of n = 200 patients, shows how they are estimated from the data.

                        Disease Status
Test Result        Diseased (D+)    Nondiseased (D−)    Total
Positive (T+)       16 (= TP)          9 (= FP)           25
Negative (T−)        4 (= FN)        171 (= TN)          175
Total               20               180                 200

True Positive rate = P(T+ | D+):    Sensitivity = 16/20 = .80
False Positive rate = P(T+ | D−):   1 − specificity = 9/180 = .05
False Negative rate = P(T− | D+):   1 − sensitivity = 4/20 = .20
True Negative rate = P(T− | D−):    Specificity = 171/180 = .95
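
The four rates can be computed from the counts TP, FP, FN, and TN in the 2 × 2 table. The function name `screening_rates` and the dictionary keys below are illustrative assumptions.

```python
def screening_rates(tp, fp, fn, tn):
    """Estimate the four conditional rates of a test from a 2 x 2 table of counts."""
    diseased = tp + fn       # column total for D+
    nondiseased = fp + tn    # column total for D-
    return {
        "sensitivity": tp / diseased,             # P(T+ | D+)
        "false_negative_rate": fn / diseased,     # P(T- | D+)
        "specificity": tn / nondiseased,          # P(T- | D-)
        "false_positive_rate": fp / nondiseased,  # P(T+ | D-)
    }

rates = screening_rates(tp=16, fp=9, fn=4, tn=171)
for name, value in rates.items():
    print(name, round(value, 2))
```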

In order to be able to apply this test to the general population, we need accurate
estimates of its predictive values of a positive and negative test, PV+ = P(D+ | T+)
and PV− = P(D− | T−), respectively. We can do this via the basic definition

$P(B \mid A) = \frac{P(B \cap A)}{P(A)}$

which, when applied to our context, becomes

$P(D{+} \mid T{+}) = \frac{P(D{+} \cap T{+})}{P(T{+})}$ and $P(D{-} \mid T{-}) = \frac{P(D{-} \cap T{-})}{P(T{-})}$,

often written $PV{+} = \frac{TP}{TP + FP}$ and $PV{-} = \frac{TN}{FN + TN}$.

Here, PV+ = 16/25 = 0.64 and PV− = 171/175 = 0.977.

However, a more accurate determination is possible, with the use of

Bayes' Formula: $P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid B^c)\,P(B^c)}$

which, when applied to our context, becomes

$P(D{+} \mid T{+}) = \frac{P(T{+} \mid D{+})\,P(D{+})}{P(T{+} \mid D{+})\,P(D{+}) + P(T{+} \mid D{-})\,P(D{-})}$,

i.e., $PV{+} = \frac{(\text{Sensitivity})(\text{Prevalence})}{(\text{Sensitivity})(\text{Prevalence}) + (\text{False Positive rate})(1 - \text{Prevalence})}$

and

$P(D{-} \mid T{-}) = \frac{P(T{-} \mid D{-})\,P(D{-})}{P(T{-} \mid D{-})\,P(D{-}) + P(T{-} \mid D{+})\,P(D{+})}$,

i.e., $PV{-} = \frac{(\text{Specificity})(1 - \text{Prevalence})}{(\text{Specificity})(1 - \text{Prevalence}) + (\text{False Negative rate})(\text{Prevalence})}$.

All the ingredients are obtainable from the table calculations, except for the
baseline prevalence of the disease in the population, P(D+), which is usually
grossly overestimated by the corresponding sample-based value, in this case,
20/200 = .10. We must look to outside published sources and references for a
more accurate estimate of this figure.

Suppose that we are able to determine the prior probabilities:

P(D+) = .04 and therefore, P(D−) = .96.

Then, substituting, we obtain the following posterior probabilities:

$PV{+} = \frac{(.80)(.04)}{(.80)(.04) + (.05)(.96)} = .40$ and $PV{-} = \frac{(.95)(.96)}{(.95)(.96) + (.20)(.04)} = .99$.

Therefore, a positive test result increases the probability of having this disease from
4% to 40%; a negative test result increases the probability of not having the disease
from 96% to 99%. Hence, this test is extremely specific for the disease (i.e., low
false positive rate), but is not very sensitive to its presence (i.e., high false negative
rate). A physician may wish to use a screening test with higher sensitivity (i.e., low
false negative rate). However, such tests also sometimes have low specificity (i.e.,
high false positive rate), e.g., MRI screening for breast cancer. An ideal test
generally has both high sensitivity and high specificity (e.g., mammography), but such
tests are often expensive. Typically, health insurance companies favor tests with three
criteria: cheap, fast, and easy, e.g., Fecal Occult Blood Test (FOBT) vs. colonoscopy.
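
The dependence of the predictive values on prevalence can be sketched as a function of sensitivity, specificity, and prevalence, following the Bayes' Formula expressions for PV+ and PV−. The function name `predictive_values` is an assumption introduced for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PV+, PV-) via Bayes' Formula."""
    false_pos = 1 - specificity   # P(T+ | D-)
    false_neg = 1 - sensitivity   # P(T- | D+)
    pv_pos = (sensitivity * prevalence) / (
        sensitivity * prevalence + false_pos * (1 - prevalence))
    pv_neg = (specificity * (1 - prevalence)) / (
        specificity * (1 - prevalence) + false_neg * prevalence)
    return pv_pos, pv_neg

# Prevalence .04 (outside estimate) versus .10 (the sample-based value 20/200).
for prevalence in (0.04, 0.10):
    pv_pos, pv_neg = predictive_values(0.80, 0.95, prevalence)
    print(prevalence, round(pv_pos, 3), round(pv_neg, 3))
```

Setting the prevalence to the sample-based value .10 reproduces the table-based estimates 16/25 and 171/175, which shows how strongly PV+ depends on the assumed prevalence.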
FOBT: Patient-obtained fecal smears are analyzed for presence of blood in stool, a
possible sign of colorectal cancer. High false positive rate (e.g., bleeding hemorrhoid).

[Figure: three options labeled "High cost," "Low cost," and "No cost!"; the last is the
"FUITA Procedure," captioned "Overwhelmingly preferred by most insurance companies."]



Evidence-Based Medicine: Receiver Operating Characteristic (ROC) Curves


Originally developed in the electronic communications field for displaying
Signal-to-Noise Ratio (SNR), these graphical objects are used when numerical cutoff
values are used to determine T+ versus T−.
Example: Using blood serum markers in a screening test (T) for detecting fetal
Down syndrome (D) and other abnormalities, as maternal age changes.
Triple Test: Uses three maternal serum markers (alpha-fetoprotein,
unconjugated oestriol, and human chorionic gonadotrophin) to calculate a woman's
individual risk of having a Down syndrome pregnancy.

[Figure: ROC curve for the Triple Test, with points marked for maternal ages 20, 25, 30,
35, and 40, and an "optimal cutoff" indicated. The IDEAL TEST has AUC = 1; the diagonal,
where True + = False + and True − = False −, is a nondiscriminatory test with AUC = 0.5.
One end of the curve is labeled "specific, but not sensitive" and the other "sensitive,
but not specific."]

The True Positive rate (from 0 to 1) of the test is graphed against its False Positive
rate (from 0 to 1), for a range of age levels, and approximated by a curve contained
in the unit square. The farther this graph lies above the diagonal (i.e., the closer it
comes to the ideal level of 1), the better the test. This is often measured by the
Area Under Curve (AUC), which has a maximum value of 1, the total area of the
unit square. Often in practice, the "curve" is simply the corresponding polygonal
graph (as shown), and AUC can be numerically estimated by the Trapezoidal Rule.
(It can also be shown that this value corresponds to the probability that a random
pregnancy can be correctly classified as Down, using this screening test.) Illustrated
below are the ROC curves corresponding to three different Down syndrome
screening tests; although their relative superiorities are visually suggestive, formal
comparison is commonly performed by a modified version of the Wilcoxon Rank
Sum Test (covered later).

[Figure: ROC curves for three Down syndrome screening tests, one of which is labeled
"Triple + dimeric inhibin A (DIA)."]
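
The Trapezoidal Rule estimate of AUC for a polygonal ROC graph can be sketched as follows. The points in `roc` are hypothetical values invented for illustration (they are not read from the figures), and the function name `auc_trapezoid` is likewise an assumption.

```python
def auc_trapezoid(points):
    """Area under a polygonal ROC graph given (false positive rate, true positive rate) points."""
    pts = sorted(points)  # order the vertices by false positive rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # area of one trapezoid
    return area

# Hypothetical ROC vertices, running from (0, 0) to (1, 1).
roc = [(0.0, 0.0), (0.05, 0.40), (0.20, 0.70), (0.50, 0.90), (1.0, 1.0)]

print(auc_trapezoid(roc))
print(auc_trapezoid([(0, 0), (1, 1)]))          # diagonal: nondiscriminatory test
print(auc_trapezoid([(0, 0), (0, 1), (1, 1)]))  # ideal test
```

The two reference cases bracket any useful test: the diagonal gives an area of one half, and the ideal test fills the unit square.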



Further Applications: Relative Risk and Odds Ratios


Measuring degrees of association between disease (D) and exposure (E) to a
potential risk (or protective) factor, using a prospective cohort study:

PRESENT → FUTURE (TIME)
Given: Exposed (E+) and Unexposed (E−)
Investigate: Association with D+ and D−

From the resulting data, various probabilities can be estimated. Approximately,

                         Disease Status
Risk Factor         Diseased (D+)    Nondiseased (D−)    Total
Exposed (E+)            p11               p12            p11 + p12
Unexposed (E−)          p21               p22            p21 + p22
Total                p11 + p21         p12 + p22             1

$P(D{+} \mid E{+}) = \frac{P(D{+} \cap E{+})}{P(E{+})} = \frac{p_{11}}{p_{11} + p_{12}}$, $P(D{-} \mid E{+}) = \frac{P(D{-} \cap E{+})}{P(E{+})} = \frac{p_{12}}{p_{11} + p_{12}}$

$P(D{+} \mid E{-}) = \frac{P(D{+} \cap E{-})}{P(E{-})} = \frac{p_{21}}{p_{21} + p_{22}}$, $P(D{-} \mid E{-}) = \frac{P(D{-} \cap E{-})}{P(E{-})} = \frac{p_{22}}{p_{21} + p_{22}}$

Odds of disease, given exposure $= \frac{P(D{+} \mid E{+})}{P(D{-} \mid E{+})} = \frac{p_{11} / (p_{11} + p_{12})}{p_{12} / (p_{11} + p_{12})} = \frac{p_{11}}{p_{12}}$

Odds of disease, given no exposure $= \frac{P(D{+} \mid E{-})}{P(D{-} \mid E{-})} = \frac{p_{21} / (p_{21} + p_{22})}{p_{22} / (p_{21} + p_{22})} = \frac{p_{21}}{p_{22}}$

Odds Ratio: $OR = \frac{P(D{+} \mid E{+}) \,/\, P(D{-} \mid E{+})}{P(D{+} \mid E{-}) \,/\, P(D{-} \mid E{-})} = \frac{p_{11} / p_{12}}{p_{21} / p_{22}} = \frac{p_{11}\, p_{22}}{p_{12}\, p_{21}}$ ("cross product ratio")

Comment: If OR = 1, then "odds, given exposure" = "odds, given no exposure," i.e.,
no association exists between disease D and exposure E. What if OR > 1 or OR < 1?

Relative Risk: $RR = \frac{P(D{+} \mid E{+})}{P(D{+} \mid E{-})} = \frac{p_{11} / (p_{11} + p_{12})}{p_{21} / (p_{21} + p_{22})} = \frac{p_{11}\,(p_{21} + p_{22})}{p_{21}\,(p_{11} + p_{12})}$ ("cross product ratio")

Comment: RR directly measures the effect of exposure on disease, but OR has
better statistical properties. However, if the disease is rare in the population, i.e.,
if $p_{11} \approx 0$ and $p_{21} \approx 0$, then $RR = \frac{p_{11}\,(p_{21} + p_{22})}{p_{21}\,(p_{11} + p_{12})} \approx \frac{p_{11}\, p_{22}}{p_{12}\, p_{21}} = OR$.

Recall our earlier example of investigating associations between lung cancer and
the potential risk factors of smoking and coffee drinking. First consider the former:

                         Lung Cancer
Smoking             Diseased (D+)    Nondiseased (D−)    Total
Exposed (E+)            .12               .04             .16
Not Exposed (E−)        .03               .81             .84
Total                   .15               .85            1.00

$P(D{+} \mid E{+}) = \frac{P(D{+} \cap E{+})}{P(E{+})} = \frac{.12}{.16} = \frac{3}{4}$; therefore, $P(D{-} \mid E{+}) = \frac{.04}{.16} = \frac{1}{4}$.
A random smoker has a 3 out of 4 (i.e., 75%) probability of having lung cancer;
a random smoker has a 1 out of 4 (i.e., 25%) probability of not having lung cancer.

Therefore, the odds of the disease, given exposure, $= \frac{P(D{+} \mid E{+})}{P(D{-} \mid E{+})} = \frac{3/4}{1/4}$ or $\frac{.12}{.04} = 3$.
The probability that a random smoker has lung cancer is 3 times the probability that
he/she does not have it.

$P(D{+} \mid E{-}) = \frac{P(D{+} \cap E{-})}{P(E{-})} = \frac{.03}{.84} = \frac{1}{28}$; therefore, $P(D{-} \mid E{-}) = \frac{.81}{.84} = \frac{27}{28}$.
A random nonsmoker has a 1 out of 28 (i.e., 3.6%) probability of having lung cancer;
a random nonsmoker has a 27 out of 28 (i.e., 96.4%) probability of not having lung cancer.

Therefore, the odds of the disease, given no exposure, $= \frac{P(D{+} \mid E{-})}{P(D{-} \mid E{-})} = \frac{1/28}{27/28}$ or $\frac{.03}{.81} = \frac{1}{27}$.
The probability that a random nonsmoker has lung cancer is 1/27 (= .037) times the probability
that he/she does not have it.
Or equivalently,
The probability that a random nonsmoker does not have lung cancer is 27 times the
probability that he/she does have it.

Odds Ratio: $OR = \frac{\text{odds}(D \mid E{+})}{\text{odds}(D \mid E{-})} = \frac{3}{1/27}$ or the cross product ratio $\frac{(.12)(.81)}{(.04)(.03)} = 81$.
The odds of having lung cancer among smokers are 81 times
the odds of having lung cancer among nonsmokers.

Relative Risk: $RR = \frac{P(D{+} \mid E{+})}{P(D{+} \mid E{-})} = \frac{3/4}{1/28}$ or the cross product ratio $\frac{(.12)(.84)}{(.16)(.03)} = 21$.
The probability of having lung cancer among smokers is 21 times
the probability of having lung cancer among nonsmokers.

The findings that OR >> 1 and RR >> 1 suggest a strong association between lung
cancer and smoking. (But how do we formally show that this is significant? Later…)
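
The odds ratio and relative risk follow directly from the four cell probabilities of a 2 × 2 table. In this sketch the function names are assumptions; the `smoking` tuple transcribes the table above, and the `coffee` tuple transcribes the coffee drinker table from Section 3.2.

```python
def odds_ratio(p11, p12, p21, p22):
    """Cross product ratio (p11 p22) / (p12 p21)."""
    return (p11 * p22) / (p12 * p21)

def relative_risk(p11, p12, p21, p22):
    """P(D+ | E+) / P(D+ | E-)."""
    return (p11 / (p11 + p12)) / (p21 / (p21 + p22))

# Cell order: (E+ and D+, E+ and D-, E- and D+, E- and D-).
smoking = (0.12, 0.04, 0.03, 0.81)
coffee = (0.06, 0.34, 0.09, 0.51)

for name, table in (("smoking", smoking), ("coffee", coffee)):
    print(name, round(odds_ratio(*table), 6), round(relative_risk(*table), 6))
```

The results are rounded for display because the cell probabilities are decimal values stored as floating-point numbers.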

Now consider measures of association between lung cancer and caffeine consumption.

Lung Cancer
Diseased (D+) Nondiseased (D )

.06 .34 .40


Caffeine
Exposed (E+)

Not Exposed (E ) .09 .51 .60

.15 .85 1.00

P(D+ E+) .06 .34


P(D+ | E+) = = = .15 ; therefore, P(D | E+) =
P(E+) .40 .40 = .85 .
A random caffeine consumer has a 15% probability of having lung cancer;
a random caffeine consumer has an 85% probability of not having lung cancer.
NOTE: P(D+ | E+) = .15 = P(D+), so D+ and E+ are independent events!
P(D+ | E+) .15 .06
Therefore, the odds of the disease, given exposure, = = .85 or .34 = .176 .
P(D | E+)
The probability that a random caffeine consumer has lung cancer is .176 times the probability
that he/she does not have it.
P(D+ E ) .09 .51
P(D+ | E ) = = .60 = .15 ; therefore, P(D | E ) = .60 = .85 .
P(E )
A random caffeine non-consumer has a 15% probability of having lung cancer;
a random caffeine non-consumer has an 85% probability of not having lung cancer.
P(D+ | E ) .15 .09
Therefore, the odds of the disease, given no exposure, = = .85 or .51 = .176 .
P(D | E )
The probability that a random caffeine non-consumer has lung cancer is .176 times the probability
that he/she does not have it.
odds(D | E+) .176 (.06) (.51)
Odds Ratio: OR = = .176 or the cross product ratio = 1.
odds(D | E ) (.34) (.09)
The odds of having lung cancer among caffeine consumers are equal to
the odds of having lung cancer among caffeine non-consumers.
Relative Risk: RR = P(D+ | E+) / P(D+ | E−) = .15 / .15, or the cross product ratio (.06)(.60) / [(.40)(.09)] = 1.
The probability of having lung cancer among caffeine consumers is equal to
the probability of having lung cancer among caffeine non-consumers.

NOTE: The findings that OR = 1 and RR = 1 are to be expected, since D+ and E+ are independent!
Thus, no association exists between lung cancer and caffeine consumption. (In truth, there actually is a
spurious association, since many coffee drinkers also smoke, which commonly leads to lung cancer.
In this context, smoking is a variable that confounds the association between lung cancer and caffeine,
and should be adjusted for. For a well-known example of a study where this was not done carefully
enough, with substantial consequences, see MacMahon B., Yen S., Trichopoulos D., et al., "Coffee and
Cancer of the Pancreas," New England Journal of Medicine, March 12, 1981; 304: 630-33.)
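
The odds ratio and relative risk calculations above can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function names are hypothetical. The caffeine table is taken directly from the text. For the smoking table, only the D+ cells (.12, .03) and the row totals (.16, .84) are quoted above, so the D− cells (.04, .81) are an assumption obtained by subtraction.

```python
def odds_ratio(a, b, c, d):
    """Cross product ratio (a*d)/(b*c) for the joint table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """P(D+ | E+) / P(D+ | E-), each conditional taken from its own row."""
    risk_exposed = a / (a + b)      # P(D+ | E+)
    risk_unexposed = c / (c + d)    # P(D+ | E-)
    return risk_exposed / risk_unexposed


# Cell order: (E+ and D+, E+ and D-, E- and D+, E- and D-).
caffeine = (0.06, 0.34, 0.09, 0.51)
# Assumption: D- cells inferred as row total minus D+ cell (.16 - .12, .84 - .03).
smoking = (0.12, 0.04, 0.03, 0.81)

for name, table in (("caffeine", caffeine), ("smoking", smoking)):
    print(name, round(odds_ratio(*table), 2), round(relative_risk(*table), 2))
```

Up to floating-point rounding, this should reproduce OR = RR = 1 for caffeine, and OR = 81, RR = 21 for smoking.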

Adjusting for Age (and other confounders)


Once again, consider the association between lung cancer and smoking in the earlier
example. A legitimate argument can be made that the reason for such a high
relative risk (RR = 21) is that age is a confounder that was not adequately taken
into account in the study. That is, there is a naturally higher risk of many cancers as
age increases, regardless of smoking status, so how do you "tease apart" the effects
of age versus smoking on the disease? The answer is to adjust, or standardize,
for age. First, recall that relative risk RR = P(D+ | E+) / P(D+ | E−) by definition, i.e., we are
confining our attention only to individuals with disease (D+), and measuring the
effect of exposure (E+ vs. E−). Therefore, we can restrict our analysis to the two
cells in the first column of the previous 2 × 2 table. However, suppose now that the
probability estimates are stratified on age, as shown:

Exposed (E+), with disease D+:

Age      ni+ = #(E+)    xi+ = #(D+ ∩ E+)    pi+ = P(D+ | E+) = xi+ / ni+
50-59    250            5                   5/250 = .02
60-69    150            15                  15/150 = .10
70-79    100            40                  40/100 = .40
Total    n+ = 500       x+ = 60             p+ = 60/500 = .12 (as before)

Not exposed (E−), with disease D+:

Age      ni− = #(E−)    xi− = #(D+ ∩ E−)    pi− = P(D+ | E−) = xi− / ni−
50-59    300            3                   3/300 = .01
60-69    200            8                   8/200 = .04
70-79    100            7                   7/100 = .07
Total    n− = 600       x− = 18             p− = 18/600 = .03 (as before)

For each age stratum (i = 1, 2, 3),

ni+ = # individuals in the study who were exposed (E+), regardless of disease status

ni− = # individuals in the study who were not exposed (E−), regardless of disease status

xi+ = # of exposed individuals (E+), with disease (D+)

xi− = # of unexposed individuals (E−), with disease (D+)

Therefore,
pi+ = xi+ / ni+ = proportion of exposed individuals (E+), with disease (D+)

pi− = xi− / ni− = proportion of unexposed individuals (E−), with disease (D+)



From this information, we can imagine a combined table of age strata for D+:

Age      ni = ni+ + ni−    pi+    pi−
50-59    550               .02    .01
60-69    350               .10    .04
70-79    200               .40    .07
Total    n = 1100

Now, to estimate the age-adjusted numerator P(D+ | E+) of RR, we calculate the
weighted average of the proportions pi+, using their corresponding combined
sample sizes ni as the weights. That is,

P(D+ | E+) = Σ ni pi+ / Σ ni = [(550)(.02) + (350)(.10) + (200)(.40)] / (550 + 350 + 200) = 126 / 1100 = 0.1145

and similarly, the age-adjusted denominator P(D+ | E−) of RR is estimated by the
weighted average of the proportions pi−, again using the same combined sample
sizes ni as the weights:

P(D+ | E−) = Σ ni pi− / Σ ni = [(550)(.01) + (350)(.04) + (200)(.07)] / (550 + 350 + 200) = 33.5 / 1100 = 0.0305

whereby we obtain

RRadj = P(D+ | E+) / P(D+ | E−) = 126 / 33.5 = 3.76.

Note that in this example, there is a substantial difference between the adjusted and
unadjusted risks. The same ideas extend to the age-adjusted odds ratio ORadj.
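
The age adjustment above can be reproduced in a few lines. This is a minimal Python sketch under the assumption that the stratum counts are exactly those in the tables above; the names `strata` and `adjusted_risks` are hypothetical and do not come from the notes.

```python
# Age strata from the tables above: (n_exposed, x_exposed, n_unexposed, x_unexposed)
strata = {
    "50-59": (250, 5, 300, 3),
    "60-69": (150, 15, 200, 8),
    "70-79": (100, 40, 100, 7),
}


def adjusted_risks(strata):
    """Weighted averages of the stratum proportions, weighted by combined size n_i."""
    total_n = 0
    weighted_exposed = 0.0
    weighted_unexposed = 0.0
    for n_exp, x_exp, n_unexp, x_unexp in strata.values():
        n_i = n_exp + n_unexp                            # combined stratum size
        total_n += n_i
        weighted_exposed += n_i * (x_exp / n_exp)        # n_i * p_i+
        weighted_unexposed += n_i * (x_unexp / n_unexp)  # n_i * p_i-
    return weighted_exposed / total_n, weighted_unexposed / total_n


risk_exposed, risk_unexposed = adjusted_risks(strata)
rr_adjusted = risk_exposed / risk_unexposed
print(round(risk_exposed, 4), round(risk_unexposed, 4), round(rr_adjusted, 2))
```

Using the same weights ni in both the numerator and the denominator is what makes the two adjusted risks comparable: each exposure group is evaluated as if it had the age distribution of the combined sample.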
Ismor Fischer, 8/26/2016 3.5-1

3.5 Problems
1. In a certain population of males, the following longevity probabilities are determined.
P(Live to age 60) = 0.90
P(Live to age 70, given live to age 60) = 0.80
P(Live to age 80, given live to age 70) = 0.75
From this information, calculate the following probabilities.
P(Live to age 70)
P(Live to age 80)
P(Live to age 80, given live to age 60)

2. Refer to the "barking dogs" problem in section 3.2.

(a) Are the events "Angel barks" and "Brutus barks" statistically independent?

(b) Calculate each of the following probabilities.

P(Angel barks OR Brutus barks)
P(NEITHER Angel barks NOR Brutus barks), i.e., P(Angel does not bark AND Brutus does not bark)
P(Only Angel barks), i.e., P(Angel barks AND Brutus does not bark)
P(Only Brutus barks), i.e., P(Angel does not bark AND Brutus barks)
P(Exactly one dog barks)
P(Brutus barks | Angel barks)
P(Brutus does not bark | Angel barks)
P(Angel barks | Brutus does not bark)

Also construct a Venn diagram, and a 2 × 2 probability table, including marginal sums.

3. Referring to the "urn model" in section 3.2, are the events A = "First ball is red" and
B = "Second ball is red" independent in this "sampling without replacement" scenario? Does this
agree with your intuition? Rework this problem in the "sampling with replacement" scenario.

4. After much teaching experience, Professor F has come up with a conjecture about office hours:
There is a 75% probability that a random student arrives to a scheduled office hour within the
first 15 minutes (event A), from among those students who come at all (event B). Furthermore,
there is an 80% probability that no students will come to the office hour, given that no students
arrive within the first 15 mins. Answer the following. (Note: Some algebra may be involved.)
(a) Calculate P(B), the probability that any students come to the office hour.
(b) Calculate P(A), the probability that any students arrive in the first 15 mins of the office hour.
(c) Sketch a Venn diagram, and label all probabilities in it.

5. Suppose that, in a certain population of cancer patients having similar ages, lifestyles, etc., two
categorical variables I = Income (Low, Middle, High) and J = Disease stage (1, 2, 3, 4) have
probabilities corresponding to the column and row marginal sums in the 3 × 4 table shown.

                        Cancer stage
                     1      2      3      4
Income    Low                                    0.5
Level     Middle                                 0.3
          High                                   0.2
                    0.1    0.2    0.3    0.4     1.0

(a) Suppose I and J are statistically independent. Complete all entries in the table.

(b) For each row i = 1, 2, 3, calculate the following conditional probabilities, across the columns
j = 1, 2, 3, 4:

P(Low Inc | Stage 1), P(Low Inc | Stage 2), P(Low Inc | Stage 3), P(Low Inc | Stage 4)
P(Mid Inc | Stage 1), P(Mid Inc | Stage 2), P(Mid Inc | Stage 3), P(Mid Inc | Stage 4)
P(High Inc | Stage 1), P(High Inc | Stage 2), P(High Inc | Stage 3), P(High Inc | Stage 4)

Confirm that, for j = 1, 2, 3, 4:


P(Low Income | Stage j) are all equal to the unconditional row probability P(Low Income).
P(Mid Income | Stage j) are all equal to the unconditional row probability P(Mid Income).
P(High Income | Stage j) are all equal to the unconditional row probability P(High Income).

That is, P(Income i | Stage j) = P(Income i). Is this consistent with the information in (a)? Why?

(c) Now for each column j = 1, 2, 3, 4, compute the following conditional probabilities, down the
rows i = 1, 2, 3:

P(Stage 1 | Low Inc), P(Stage 2 | Low Inc), P(Stage 3 | Low Inc), P(Stage 4 | Low Inc)
P(Stage 1 | Mid Inc), P(Stage 2 | Mid Inc), P(Stage 3 | Mid Inc), P(Stage 4 | Mid Inc)
P(Stage 1 | High Inc), P(Stage 2 | High Inc), P(Stage 3 | High Inc), P(Stage 4 | High Inc)

Likewise confirm that, for i = 1, 2, 3:

P(Stage 1 | Income i) are all equal to the unconditional column probability P(Stage 1).
P(Stage 2 | Income i) are all equal to the unconditional column probability P(Stage 2).
P(Stage 3 | Income i) are all equal to the unconditional column probability P(Stage 3).
P(Stage 4 | Income i) are all equal to the unconditional column probability P(Stage 4).

That is, P(Stage j | Income i) = P(Stage j). Is this consistent with the information in (a)? Why?

Technically, we have only defined statistical independence for events, but it can be formally extended to general random
variables in a natural way. For categorical variables such as these, every category (viewed as an event) in I is statistically
independent of every category (viewed as an event) in J, and vice versa.

6. A certain medical syndrome is usually associated with two overlapping sets of symptoms, A and B.
Suppose it is known that:
If B occurs, then A occurs with probability 0.80 .
If A occurs, then B occurs with probability 0.90 .
If A does not occur, then B does not occur with probability 0.85 .
Find the probability that A does not occur if B does not occur. (Hint: Use a 2 × 2 probability
table; label the marginal probabilities a = P(A), b = P(B), and the intersection probability
c = P(A ∩ B). Fill out the rest of the table from the given statements using these symbols, then
use some algebra.)

7. The progression of a certain disease is typically characterized by the onset of up to three distinct
symptoms, with the following properties:
Each symptom occurs with 60% probability.
If a single symptom occurs, there is a 45% probability that the two other symptoms will also occur.
If any two symptoms occur, there is a 75% probability that the remaining symptom will also occur.
Answer each of the following. (Hint: Use a Venn diagram.)
(a) What is the probability that all three symptoms will occur?
(b) What is the probability that at least two symptoms occur?
(c) What is the probability that exactly two symptoms occur?
(d) What is the probability that exactly one symptom occurs?
(e) What is the probability that none of the symptoms occurs?
(f) Is the event that a symptom occurs statistically independent of the event that any other
symptom occurs?

8. I have a nephew Berkeley and niece Chelsea (true) who, when very young, would occasionally
visit their Uncle Ismor on weekends (also true). Furthermore,
i. Berkeley and Chelsea visited independently of one another.
ii. Berkeley visited with probability 80%.
iii. Chelsea visited with probability 75%.
However, it often happened that some object in his house (especially if it was fragile)
accidentally broke during such visits (not true). Furthermore,
iv. The probability of such an accident occurring, given that both children visited, was 90%.
v. The probability of such an accident occurring, given that only Berkeley visited, was 35%.
vi. The probability of such an accident occurring, given that only Chelsea visited, was 20%.
vii. The probability of such an accident occurring, given that neither child visited, was 2%.
Sketch and label a Venn diagram for events A = Accident, B = Berkeley visited, and C = Chelsea
visited. (Hint: The Exercise on page 3.2-3 might be useful.)

9. At a certain meteorological station, data are being collected about the behavior of
thunderstorms, using two lightning rods A and B. It is determined that, during a typical storm,
there is a 99% probability that lightning will strike at least one of the rods. Moreover, if A is
struck, there is a 60% probability that B will also be struck, whereas if B is struck, there is a 75%
probability that A will also be struck. Calculate the probability of each of the following events.
(Hint: See PowerPoint section 3.2, slide 30.)

Both rods A and B are struck by lightning
Rod A is struck by lightning
Rod B is struck by lightning

Are the two events "A is struck" and "B is struck" statistically independent? Explain.

10. The Monty Hall Problem (simplest version)


Between 1963 and 1976, a popular game show called "Let's Make A Deal" aired on network
television, starring charismatic host Monty Hall, who would engage in "deals" (small games
of chance) with randomly chosen studio audience members (usually dressed in outrageous
costumes) for cash and prizes. One of these games consisted of first having a contestant pick
one of three closed doors, behind one of which was a big prize (such as a car), and behind the
other two were "zonk" prizes (often a goat, or some other farm animal). Once a selection was
made, Hall, who knew what was behind each door, would open one of the other doors that
contained a zonk. At this point, Hall would then offer the contestant a chance to switch their
choice to the other closed door, or stay with their original choice, before finally revealing the
contestant's chosen prize.

Question: In order to avoid getting "zonked," should the optimal strategy for the contestant be
to switch, stay, or does it not make a difference?

11.
(a) Given the following information about three events A, B, and C.

P(A ∪ B) = 0.69    P(A ∩ B) = 0.19
P(A ∪ C) = 0.70    P(A ∩ C) = 0.20
P(B ∪ C) = 0.71    P(B ∩ C) = 0.21

Find the values of P(A), P(B), and P(C).

(b) Suppose it is also known that the two events A C and B are statistically independent.
Sketch a Venn diagram for events A, B, and C.

12. Recall that in a prospective cohort study, exposure (E+ or E−) is given, so that the odds ratio is
defined as

OR = (odds of disease given exposure) / (odds of disease given no exposure) = [P(D+ | E+) / P(D− | E+)] / [P(D+ | E−) / P(D− | E−)].

Recall that in a retrospective case-control study, disease status (D+ or D−) is given; in this case,
the corresponding odds ratio is defined as

OR = (odds of exposure given disease) / (odds of exposure given no disease) = [P(E+ | D+) / P(E− | D+)] / [P(E+ | D−) / P(E− | D−)].

Show algebraically that these two definitions are mathematically equivalent, so that the same
cross product ratio calculation can be used in either a cohort or case-control study, as the
following two problems demonstrate. (Recall the definition of conditional probability.)

13. Suppose that some entries of the joint probability table of two independent events (given by,
respectively, the rows and columns below) have been accidentally deleted.

.10
.20
.25

Restore the missing values. (Hint: Start by letting x and y be the first and third column marginal
probabilities respectively, then solve for the two row marginal probabilities.)
NOTE: There are two possible solutions. Find both of them.

14. Under construction

15. An observational study investigates the connection between aspirin use and three vascular
conditions (gastrointestinal bleeding, primary stroke, and cardiovascular disease) using a group
of patients exhibiting these disjoint conditions with the following prior probabilities:
P(GI bleeding) = 0.2, P(Stroke) = 0.3, and P(CVD) = 0.5, as well as with the following
conditional probabilities: P(Aspirin | GI bleeding) = 0.09, P(Aspirin | Stroke) = 0.04, and
P(Aspirin | CVD) = 0.02.
(a) Calculate the following posterior probabilities: P(GI bleeding | Aspirin), P(Stroke | Aspirin),
and P(CVD | Aspirin).
(b) Interpret: Compare the prior probability of each category with its corresponding posterior
probability. What conclusions can you draw? Be as specific as possible.

16. On the basis of a retrospective study, it is determined (from hospital records, tumor registries, and
death certificates) that the overall five-year survival (event S) of a particular form of cancer in a
population has a prior probability of P(S) = 0.4. Furthermore, the conditional probability of
having received a certain treatment (event T) among the survivors is given by P(T | S) = 0.8, while
the conditional probability of treatment among the non-survivors is only P(T | Sc) = 0.3.
[Timeline diagram: PAST, treatment (T), with P(T | S) = 0.8 and P(T | Sc) = 0.3; 5 years later, PRESENT, given survivors (S) vs. non-survivors (Sc), with P(S) = 0.4.]

(a) A cancer patient is uncertain about whether or not to undergo this treatment, and consults with
her oncologist, who is familiar with this study. Compare the prior probability of overall
survival given above with each of the following posterior probabilities, and interpret in context.
Survival among treated individuals, P(S | T)
Survival among untreated individuals, P(S | Tc)
(b) Calculate the Relative Risk of survival for this disease. Interpret this value.

(c) Also calculate the following.


Odds of survival, given treatment
Odds of survival, given no treatment
Odds Ratio of survival for this disease. Interpret this value.

17. WARNING! This problem is not for the mathematically timid.

Recall that two events A and B are statistically independent if P(A ∩ B) = P(A) P(B).
It therefore follows that the difference

P(A ∩ B) − P(A) P(B)

is a measure of how far from statistical independence any two arbitrary events A and B are.
Prove that |P(A ∩ B) − P(A) P(B)| ≤ 1/4. When is the inequality sharp? (That is, when is equality achieved?)

18. First, recall that, for any two events A and B, the union A ∪ B defines the "inclusive or," i.e.,
"Either A occurs, or B occurs, or both."
Now, consider the event "Only A" (i.e., "Event A occurs, and event B does not occur"), defined
as the intersection A ∩ Bc, also denoted as the difference A − B. Likewise, "Only B" = "B and
not A" = B ∩ Ac = B − A. Using these, we can define "xor" (the so-called "exclusive or," i.e.,
"Either A occurs, or B occurs, but not both") as the union (A − B) ∪ (B − A), or equivalently,
(A ∪ B) − (A ∩ B). This is also sometimes referred to as the symmetric difference between A and
B, denoted A Δ B. (See the two regions corresponding to the highlighted formulas below.)

[Venn diagram: events A and B, with regions A − B = A ∩ Bc (left), A ∩ B (center), and B − A = Ac ∩ B (right).]
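
The set identities in this problem can be tried out on a small example before working parts (a) and (b). The following Python sketch uses a hypothetical sample space of ten equally likely outcomes; the events A and B below are made up for illustration and are not the treatments described in the problem.

```python
# Hypothetical sample space of ten equally likely outcomes, labeled 0-9.
sample_space = set(range(10))
A = {0, 1, 2, 3}        # hypothetical event A
B = {2, 3, 4, 5, 6}     # hypothetical event B

only_a = A - B          # "Only A": A intersect B-complement
only_b = B - A          # "Only B": B intersect A-complement
xor = only_a | only_b   # (A - B) union (B - A)

# The different descriptions of "exclusive or" agree.
assert only_a == A & (sample_space - B)
assert xor == (A | B) - (A & B)
assert xor == A ^ B     # Python's built-in symmetric difference


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return len(event) / len(sample_space)


print(sorted(xor), prob(A | B), prob(xor))
```

Note that P(A or B) counts the overlap A ∩ B once, while P(A xor B) excludes it entirely.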

(a) Suppose that two treatment regimens A and B exist for a certain medical condition. It is
reported that 35% of the total patient population receives Treatment A, 40% receives
Treatment B, and 14% receives both treatments. Construct the corresponding Venn diagram
and 2 × 2 probability table. Are the two treatments A and B statistically independent of
one another?
Calculate P(A or B), and P(A xor B).

(b) Suppose it is discovered that an error was made in the original medical report, and it is
actually the case that 35% of the population receives only Treatment A, 40% receives only
Treatment B, and 14% receives both treatments. Construct the corresponding Venn diagram
and 2 × 2 probability table. Are the two treatments A and B statistically independent of
one another?
Calculate P(A or B), and P(A xor B).

19. Three of the most common demographic variables used in epidemiological studies are age, sex,
and race. Suppose it is known that, in a certain population,

30% of whites are men, 40% of males are white men, 50% of white males are men.

(a) What percentage of whites are male? Formally justify your answer!

(b) What percentage of males are white? Formally justify your answer!

Hint: Follow the same notation as the example in section 3.2, slide 26, of the PowerPoint slides.

20. In another epidemiological study, it is known that, for a certain population,

10% of adults are men, 20% of males are white, 30% of whites are adults

40% of males are men, 50% of whites are male.

What percentage of adults are white?

Hint: Find a connection between the products P(A | B) P(B | C) P(C | A) and P(B | A) P(C | B) P(A | C).

21. The Shell Game. In the traditional version, a single pea is placed under one of three walnut
half-shells in full view of an observer. The shells are then quickly shuffled into a new random
arrangement, and the observer then guesses which shell contains the pea. If the guess is correct,
the observer wins.

(a) For the sake of argument, suppose there are 20 half-shells instead of three,
and the observer plays the game a total of n times. What is the probability
that he/she will guess correctly at least once out of those n times? How
large must n be, in order to guarantee that the probability of winning is over
50%? What happens to the probability as n → ∞?

(b) Now suppose there are n half-shells, and the observer plays the game a total of n times.
What is the probability that he/she will guess correctly at least once out of those n times?
What happens to this probability as n → ∞?

Hint (for both parts): First calculate the probability of losing all n times.

22. (a) By definition, two events A and B are statistically independent if and only if P(A | B) = P(A).
Prove mathematically that two events A and B are independent if and only if P(A | B) = P(A | Bc).
[Hint: Let P(A) = a, P(B) = b, P(A ∩ B) = c, and use either a Venn diagram or a 2 × 2 table.]

(b) More generally, let events A, B1, B2, …, Bn be defined as in Bayes' Theorem. Prove that:

A and B1 are independent, A and B2 are independent, …, A and Bn are independent

if and only if P(A | B1) = P(A | B2) = … = P(A | Bn).

[Hint: Use the Law of Total Probability.]

23. Prove that the relative risk RR is always between 1 and the odds ratio OR. (Note there are three
possible cases to consider: RR < 1, RR = 1, and RR > 1.)

24. Consider the following experiment. Pick a random integer from 1 to 10^12.
(a) What is the probability that it is either a perfect square (1, 4, 9, 16, …) or a perfect cube
(1, 8, 27, 64, …)?

(b) What is the probability that it is either a perfect fourth power (1, 16, 81, 256, …) or a perfect
sixth power (1, 64, 729, 4096, …)?

25. As defined at the beginning of this chapter, the probability of Heads of a coin is formally
identified with the limit of X(n)/n as n → ∞ (when that limiting value exists), where n = # tosses, and X = # Heads
in those n tosses. Show by a mathematical counterexample that in fact, this limit need not
necessarily exist. That is, provide an explicit sequence of Heads and Tails (or ones and zeros)
for which the ratio X(n)/n does not converge to a unique finite value, as n increases.
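
To get a feel for the ratio X(n)/n, it may help to compute it for a concrete sequence. The following Python sketch is only an illustration: it uses a simulated fair coin (an assumption, with an arbitrary fixed seed), for which the ratio is expected to settle near 0.5. The helper `relative_frequencies` is a hypothetical name; it accepts any sequence of ones and zeros, so it can also be applied to a candidate counterexample.

```python
import random


def relative_frequencies(tosses):
    """Return the running ratio X(n)/n for a sequence of 0/1 tosses (1 = Heads)."""
    heads = 0
    ratios = []
    for n, toss in enumerate(tosses, start=1):
        heads += toss
        ratios.append(heads / n)
    return ratios


rng = random.Random(2016)                             # fixed seed, so the run is repeatable
tosses = [rng.randint(0, 1) for _ in range(100_000)]  # simulated fair coin
ratios = relative_frequencies(tosses)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, ratios[n - 1])
```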

26. Warning: These may not be quite as simple as they look.

(a) Consider two independent events A and B. Suppose A occurs with probability 60%, while
B only occurs with probability 30%. Calculate the probability that B occurs, i.e., P(B).

(b) Consider two independent events C and D. Suppose they both occur together with
probability 72%, while there is a 2% probability that neither event occurs. Calculate the
probabilities P(C) and P(D).

27. Solve for the middle cell probability (?) in the following partially-filled probability table.

.01 .02
? .50
.03 .04
.60

28. How far away can a prior probability be from its posterior probabilities?
Consider two events A and B, and let P(A | B) = p and P(A | Bc) = q be fixed probabilities.
If p = q, then A and B are statistically independent (see problem 22 above), and thus the
prior probability P(B) coincides with its corresponding posterior probabilities P(B | A) and
P(B | Ac) exactly, yielding a minimum value of 0 for the absolute differences
|P(B) − P(B | A)| and |P(B) − P(B | Ac)|.

In terms of p and q (with p ≠ q), what must P(B) be for the maximum absolute differences to
occur, and what are their respective values?

29. Let A, B, and C be three pairwise-independent events, that is, A and B are independent, B and C
are independent, and A and C are independent. It does not necessarily follow that
P(A ∩ B ∩ C) = P(A) P(B) P(C), as the following Venn diagram illustrates. Provide the details.

[Venn diagram of events A, B, and C, with region probabilities:]
A only: a(1 − b − c) + d
A ∩ B only: ab − d        A ∩ C only: ac − d
A ∩ B ∩ C: d
B only: b(1 − a − c) + d        B ∩ C only: bc − d        C only: c(1 − a − b) + d
Outside A ∪ B ∪ C: 1 − a − b − c + ab + ac + bc − d

30. Bar Bet


(a) Suppose I ask you to pick any four cards at random from a deck of 52, without replacement,
and bet you one dollar that at least one of the four is a face card (i.e., Jack, Queen, or King).
Should you take the bet? Why? (Hint: See how the probability of this event compares to 50%.
If this is too hard, try it with replacement first.)

(b) What if the bet involves picking three cards at random instead of four? Should you take the
bet then? Why?

(c) Refer to the posted Rcode folder for this part. Please answer all questions.

31.
(a) True or False? If event A and event B are statistically independent, and if event A and event C
are statistically independent, then it follows that event B and event C are statistically
independent. Prove or find a counterexample (e.g., via a Venn diagram).

(b) Repeat part (a), with the word "independent" replaced by "dependent."

32.
(a) True or False? If event A and event B are statistically independent, and if event A and event C
are statistically independent, then it follows that event A and event B ∩ C (i.e., "B and C") are
statistically independent. Prove or find a nontrivial counterexample (e.g., via a Venn diagram).

(b) True or False? If event A and event B are statistically independent, and if event A and event C
are statistically independent, and if event A and event B ∩ C (i.e., "B and C") are statistically
independent, then it follows that event A and event B ∪ C (i.e., "B or C") are statistically
independent. Prove or find a nontrivial counterexample (e.g., via a Venn diagram).

(c) Repeat part (b), but with the underlined condition replaced by the condition B C .
(See Venn diagram.)

B A C
