this solution set.
2. Let x denote the gain on one of these policies. Then x = 36 if there is no fire and x = 36 - 15,000 = -14,964 if there is a fire. Now write down the prob. dist'n. of x:

x          P(x)
-14,964    0.002  (2 fires per 1000)
36         0.998  (1 - .002)

E[x] = Σ xP(x) = (-14964)(.002) + 36(0.998) = $6.00
No, on this policy the company will earn $36 or lose $14,964.

3. Let a success be a number greater than 2. Then P(success) = 4/6 = 2/3. Let x be the number of successes in 10 rolls of the die, and this sounds like (and is) a binomial problem where x ~ b(n = 10, p = 2/3), and we want to find P(x ≥ 8) = 1 - P(x ≤ 7); remember: draw a picture. Since 0.67 (2/3 rounded) is not tabled, we would have to either compute the needed probability manually (not the sort of messy calculation one would want to deal with) or get Excel to do it for us. Excel is the clear choice, and the formula would be
Excel: 1 - binom.dist(7,10,2/3,1) = 0.2991

4. Let a success be the event that a person takes a room. We're given that P(success) = 0.1. P(motel still will not be full) = P(fewer than 2 successes). If we let x = number of successes out of 20 trials, it seems clear that x is a binomial random variable (check the three conditions that must be satisfied). Thus, x ~ b(n = 20, p = 0.1), and we want to find P(x < 2) = P(x ≤ 1), which is
Excel: binom.dist(1,20,0.1,1) = 0.3917

Excel: binom.dist(3,20,0.3,1) = 0.1071

6. If Larry is simply guessing the shape shown on each card drawn, we basically have the "guessing on a multiple-choice exam" problem, in this case with 16 trials (questions) and P(success) = 0.20 on each trial. The conditions for modeling x = number of correct answers as a binomial random variable appear to be well-satisfied, so we assume x ~ b(n = 16, p = 0.20), and we wish to find P(at most 3 correct) = P(x ≤ 3) and P(at least 10 wrong) = P(at most 6 correct) = P(x ≤ 6):
Excel: binom.dist(3,16,0.2,1) = 0.5981
       binom.dist(6,16,0.2,1) = 0.9733

7. It is probably reasonable to assume that such injuries occur randomly over time, and we'll assume that in this plant individual injuries are the rule and simultaneous injuries to multiple individuals are very rare (although clearly the plants in some industries would not satisfy this description), which should make the Poisson distribution a suitable probability model for the number of work-related injuries within an interval of time. In this case the concern is with a one-year time period, and the expected annual frequency is 12 x 0.75 = 9 = λ. Thus, we assume that x = number of work-related injuries in the one-year period is distributed P(λ = 9) assuming the new safety procedures have no impact on injury frequency. The question of interest becomes "How unlikely was it to see only 7 injuries if the expected number was 9?" and this is then commonly rephrased as "What's the chance the plant would have seen 7 or even fewer injuries in the year following adoption of the new safety procedures if the expected number were still 9?"
HW 2 Solution - page 2
Excel: poisson.dist(7,9,1) = 0.3239
A chance of roughly 1/3 says that a count of 7 or fewer would not be that surprising if the expected count were equal to 9, so the results are not very convincing that the new safety rules have been effective.

8. A batch of 60 cookies will contain 450 chocolate chips, which means an average of 450 / 60 = 7.5 chocolate chips per cookie. Some will have more and some will have less (and in fact none will have exactly 7.5 chips assuming individual chips remain intact during the making of the cookies). We're told that a Poisson probability model is reasonable for the number of chips (x) in a cookie, so we assume x ~ P(λ = 7.5), and we want to find P(x < 5) = P(x ≤ 4):
Excel: poisson.dist(4,7.5,1) = 0.1321

The area between 70 and x0 is 0.40, which means that x0 is about 1.28σ above 70, so x0 = 70 + 1.28(12) = 85.36 ==> $85,360.
Excel: norm.inv(0.90,70,12) = $85,379

b. The area between 70 and x0 is 0.25, which means that x0 is 0.675σ above μ, so x0 = 70 + 0.675(12) = 78.1 ==> $78,100
Excel: norm.inv(0.75,70,12) = $78,094
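The binomial and Poisson values quoted from Excel in problems 2 through 8 can be cross-checked without a spreadsheet. The following is a minimal sketch, assuming only the Python standard library; the helper names binom_cdf and poisson_cdf are illustrative and are not part of the original solution set.

```python
from math import comb, exp, factorial


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ b(n, p); plays the role of Excel's binom.dist(k, n, p, 1)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); plays the role of Excel's poisson.dist(k, lam, 1)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


# Problem 2: expected gain per policy, E[x] = sum of x * P(x)
expected_gain = (-14964) * 0.002 + 36 * 0.998

checks = {
    "Problem 3, P(x >= 8), n=10, p=2/3": 1 - binom_cdf(7, 10, 2 / 3),
    "Problem 4, P(x <= 1), n=20, p=0.1": binom_cdf(1, 20, 0.1),
    "Problem 6, P(x <= 3), n=16, p=0.2": binom_cdf(3, 16, 0.2),
    "Problem 6, P(x <= 6), n=16, p=0.2": binom_cdf(6, 16, 0.2),
    "Problem 7, P(x <= 7), lambda=9": poisson_cdf(7, 9),
    "Problem 8, P(x <= 4), lambda=7.5": poisson_cdf(4, 7.5),
}

print(f"Problem 2, E[x] = {expected_gain:.2f}")
for label, value in checks.items():
    print(f"{label}: {value:.4f}")
```

Summing the probability mass function term by term is adequate for the small n and λ used here; for much larger values a library routine would be the safer choice.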
4.12.e. 0.7
f. 0.9
g. E[x] = 1(0.1) + 3(0.2) + ... + 9(0.1) = 5.0
e. P(x>13.24)=0.5-0.3686=0.1314
Since half (0.50) the distribution lies below the mean, it follows that the area between 50 and x0 must be 0.3413, so x0 must be 1.0σ above the mean. Thus, x0 = 50 + 1.0(3) = 53.0.
Excel: norm.inv(0.8413,50,3)

Clearly, P(container is underfilled) = P(x < 10) = 0.5 and P(overfilled) = P(x > 10) = 0.5.
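The norm.inv values used in this solution set are inverse normal CDF lookups, which can be reproduced as follows. This is a minimal sketch, assuming Python 3.8 or later for statistics.NormalDist; the wrapper name norm_inv is illustrative and simply mirrors the argument order of the Excel function.

```python
from statistics import NormalDist


def norm_inv(prob, mean, sd):
    """Value x0 with P(X <= x0) = prob for X ~ N(mean, sd)."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(prob)


# (cumulative probability, mean, standard deviation) as used in the solutions
cases = [
    (0.90, 70, 12),    # 0.50 below the mean plus an area of 0.40 above it
    (0.75, 70, 12),    # 0.50 below the mean plus an area of 0.25 above it
    (0.8413, 50, 3),   # 0.50 below the mean plus an area of 0.3413 above it
]

for prob, mean, sd in cases:
    print(f"norm_inv({prob}, {mean}, {sd}) = {norm_inv(prob, mean, sd):.3f}")
```

The small gaps between the table-based answers (85.36 and 78.1) and the software answers come from rounding the z multiples to 1.28 and 0.675.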
= 0.40 + 0.39- 0 = 0.79
Answers: a. 1/4 b. 1/2, 1/2 c. 1/3
6.27. x̄ = 30/6 = 5; s² = [(4-5)² + ... + (3-5)²]/(6-1) = 26 / 5 = 5.2 ==> s = √5.2 = 2.28

a. 90%: 5 ± 2.015 (2.28 / √6) = 5 ± 1.876
b. 95%: 5 ± 2.571 (2.28 / √6) = 5 ± 2.393
c. 99%: 5 ± 4.032 (2.28 / √6) = 5 ± 3.753

Now assume same x̄ and s, but let n = 25.

a. 90%: 5 ± 1.711 (2.28 / √25) = 5 ± 0.780
b. 95%: 5 ± 2.064 (2.28 / √25) = 5 ± 0.941
c. 99%: 5 ± 2.797 (2.28 / √25) = 5 ± 1.275

6.32.a. x̄ = 3.8, s = 1.2, n = 20
90% C.I. est. for μ:
3.8 ± 1.729 (1.2 / √20) = 3.8 ± 0.464
b. We can be 90% confident that the interval (3.336, 4.264) contains the true mean LOS for women in this state's hospitals in 2008.
c. It's an interval constructed in such a way that 90% of such intervals will contain the population measure being estimated.

6.43.a. Your text uses the rule that n is large enough if np̂ and n(1 - p̂) are both ≥ 15. These conditions are satisfied here, so n is large enough.
b. p̂ ± 1.96 σ_p̂ = 0.46 ± 1.96(0.033) = 0.46 ± 0.0647
c. 95% confident that this interval contains the true population proportion (in the sense that 95% of the intervals constructed this way would do so).

6.45.a. p̂ = 818 / 2045 = 0.40
b. p̂ is distributed approximately normal with mean = p and std. error = √(p(1 - p)/n). To build a CI, must estimate p in the standard error with p̂.
c. 95% CI: p̂ ± mult √[p̂(1 - p̂)/n] =

6.122.a. 99% CI: x̄ ± mult σ_x̄ = x̄ ± mult σ/√n. Estimate σ with s, multiple becomes t_{n-1}, or
x̄ ± t_{n-1} (s/√n) = 1.13 ± 2.648 (2.21/√72) = 1.13 ± 0.690.
We are 99% confident that the true mean number of pecks is contained in the interval (0.44, 1.82).
Note: The t-multiple of 2.648 is for d.f.=70, which is the closest value to 71 that is higher up in the t table in the 12th edition of MBS. In earlier editions, the t table jumped from 30 to 40 to 60 to 120, so d.f.=70 was not in the table. In that older version of the table, the closest value to 71 that is higher up in the table is 2.660, which corresponds to d.f.=60; using 2.660 would produce a slightly wider CI.
b. The result in part a indicates that it is very likely that the mean for blue string is less than 2 pecks, which provides very convincing evidence that chickens are more likely to peck at white string (in light of previous research supporting a μ of 7.5 for white string).

6.123.a. 99% CI: x̄ ± mult σ_x̄ = x̄ ± mult σ/√n. Estimate σ with s, multiple becomes t_{n-1}, or
x̄ ± t_{n-1} (s/√n) = 49.3 ± 9.925 (1.5/√3) = 49.3 ± 8.60.
b. We are 99% confident that the true mean percentage of B(a)p removed by the toxin is contained in the interval (40.7%, 57.9%).
c. The probability distribution of x - the pct. of B(a)p removed from a soil specimen by the toxin - is roughly mound-shaped and symmetric.
d. Based on the confidence interval in part a, 50% is certainly a plausible value for the true mean percent removed.
e. Omit this part.
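The t-based intervals in 6.27, 6.32, 6.122 and 6.123 all follow the same recipe, x̄ ± t_{n-1}(s/√n). The following is a minimal sketch, assuming the t multiples are read from the table exactly as quoted in the solutions (the standard library has no t quantile function); the helper name mean_interval is illustrative.

```python
from math import sqrt


def mean_interval(xbar, s, n, t_mult):
    """Return (half_width, lower, upper) for xbar ± t_mult * s / sqrt(n)."""
    half_width = t_mult * s / sqrt(n)
    return half_width, xbar - half_width, xbar + half_width


# (label, xbar, s, n, t multiple taken from the solutions)
cases = [
    ("6.27 a, 90%, n = 6", 5, 2.28, 6, 2.015),
    ("6.27 a, 90%, n = 25", 5, 2.28, 25, 1.711),
    ("6.32 a, 90%", 3.8, 1.2, 20, 1.729),
    ("6.122 a, 99%", 1.13, 2.21, 72, 2.648),
    ("6.123 a, 99%", 49.3, 1.5, 3, 9.925),
]

for label, xbar, s, n, t_mult in cases:
    half, lo, hi = mean_interval(xbar, s, n, t_mult)
    print(f"{label}: {xbar} ± {half:.3f} -> ({lo:.3f}, {hi:.3f})")
```

Comparing the two 6.27 rows shows both effects of a larger sample at once: the t multiple shrinks and so does s/√n.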
7.11. H0: p = 0.07 vs. Ha: p < 0.07

7.13. H0: μ = 863 vs. Ha: μ < 863

7.53. H0: μ = 6000 vs. Ha: μ < 6000
Test Stat: (pt. est. - hyp. value) / (std. error of pt. est.) = (x̄ - μ0) / σ_x̄
= (x̄ - μ0) / (s/√n) ~ t_{n-1} (estimate σ with s)

e. The probability distribution of breaking strength of the new bonding adhesive is roughly mound-shaped and symmetric.

7.78. H0: p = 0.70 vs. Ha: p ≠ 0.70
Test Stat: (pt. est. - hyp. value) / (std. error of pt. est.) = (p̂ - p0) / σ_p̂
= (p̂ - p0) / √(p0(1 - p0)/n) ~ appx N(0,1)
Test Stat: (β̂1 - (β1)0) / s_β̂1

tcalc = (.078767 - 0) / .007938 = 9.92 > 2.080 ==> Reject H0, conclude average tweet rate one week before movie's release is a useful predictor of opening weekend box office revenue.
c. Estimate of σ is s = 13.3165. Note that s is an estimate of the standard deviation of y for a given value of x. In this problem, for a given average hourly tweet rate one week before opening, there is a high probability (≈95%) that a movie's actual opening weekend box office revenue will fall within $27M (≈2 times s) of the estimated expected revenue.
d. r² = 82.4% ==> 82.4% of the variation in opening weekend box office revenue across the movies in this sample can be explained by variation in the average hourly one-week-ahead tweet rate across those movies.
e. β̂1 = 0.078767, which says that on the average, opening weekend box office revenue increases by $0.078767M, or $78,767, for each additional tweet in the average hourly one-week-ahead tweet rate. Equivalently, it says that on the average, opening weekend box office revenue increases by roughly $7.88M for each additional 100 tweets in the average hourly one-week-ahead tweet rate.
f. See page 3. (Note in this scatter plot that the data suggest a violation in one of the underlying population assumptions: the variability in y appears to increase as x increases.)

tcalc = (-0.27081 - 0) / .03036 = -8.92 < -2.086 ==> Reject H0, conclude percentage of students below the poverty level is a useful predictor of a school's average 3rd-grade score on the FCAT reading exam.
c. Estimate of σ is s = 3.42319. In this problem, for a given percentage of students below the poverty level, there is a high probability (≈95%) that a school's actual average 3rd-grade score on the FCAT reading exam will fall within 6.85 points (≈2 times s) of the estimated average.
d. r² = 79.9% ==> 79.9% of the variation in average 3rd-grade score on the FCAT reading exam across the schools in this sample can be explained by variation in the percentage of students below the poverty level across those schools.
e. β̂1 = -0.27081, which says that on the average, the average 3rd-grade score on the FCAT reading exam drops by 0.27081 points (? - units aren't provided) for each one percentage point increase in the percentage of students below the poverty level.
f. See page 3.

11.103.
a. ŷ = 44.130 + 0.2366x, where
y = manager success index
and x = no. of interactions w/ outsiders
b. Test H0: β1 = 0 against Ha: β1 ≠ 0
variability in y appears to increase as x (\
-2.//C 0 1.110
140 • jj ~ /, I f o f 0. 0 7! 7 t 7 X
./'
120
100 ~¥
·I
Q)
::l
=
Q)
::>
80
Q)
~ 60
(z.oo 1 !1..7o) (t'fO~ 1 1/j, 1/1. J X "
40 f
20
•
l-00 /,Jfo f o.rnf7?7 (1-uo)=- /!.. 9o
0
l'f oo
0 200 400 600 800 1000 1200 1400
Tweets
~ (t.o,tfU.) • •
~ 170 • ~ • 0oo, /Pl. 9)
~
165 X y
160 lh, 013-0. J:?of 1 ()..o) = /!/,?
• v'
10 20 30 40 50 60 70 80 90 100
/r!O
POVERTY
d
40
• 0 1/ L/, /30 f 0.1-3 U (o) 11. J3o
30 • • .o0
• • I
20
0 10 20 30 40 50 60 70
9o (fo) ~ iJ.'-Ilt/
80 90
Interacts
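The slope tests and interpretations in the tweet-rate and poverty-level regressions reduce to a few lines of arithmetic. The following is a minimal sketch using the estimates, standard errors and critical values quoted in those solutions; the function names slope_t and rejects_h0 are illustrative only.

```python
def slope_t(b1, se_b1, hypothesized=0.0):
    """t statistic for H0: beta1 = hypothesized, i.e. (b1 - hypothesized) / s_b1."""
    return (b1 - hypothesized) / se_b1


def rejects_h0(t_calc, t_crit):
    """Two-tailed decision: reject when t_calc falls in either tail."""
    return abs(t_calc) > t_crit


# Tweet-rate regression: slope, its standard error, two-tailed critical value
t_tweets = slope_t(0.078767, 0.007938)
# Poverty-level regression
t_poverty = slope_t(-0.27081, 0.03036)

print(f"tweets:  t = {t_tweets:.2f}, reject H0: {rejects_h0(t_tweets, 2.080)}")
print(f"poverty: t = {t_poverty:.2f}, reject H0: {rejects_h0(t_poverty, 2.086)}")

# Part c: roughly 95% of y values fall within about 2s of the estimated mean
print(f"tweets:  2s = {2 * 13.3165:.2f} ($M)")
print(f"poverty: 2s = {2 * 3.42319:.2f} (points)")

# Part e: revenue change, in $M, per additional 100 tweets per hour
print(f"per 100 tweets: {100 * 0.078767:.2f} ($M)")
```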
BZAN 6310 - Solution to HW 9a
1.a. ŷ = 45.103192 + 8.592449x1 + 1.212715x2 + 9.945568x3, where
y = systolic blood pressure (mm of mercury)
x1 = Quetelet index (metric version of BMI)
x2 = age in years
x3 = smoking preference (1=yes, 0=no)
b. Test H0: β1 = 0 against Ha: β1 ≠ 0
Test Stat: (β̂1 - (β1)0) / s_β̂1 ~ t_{n-(k+1)}
[Sketch: t distribution with 28 d.f., critical values -2.048 and 2.048]
tcalc = (8.592449 - 0) / 4.498681 = 1.910, which does not fall in either tail, so cannot reject H0 at the 5% significance level. (Note that the p-value for this test is 0.0664, and the decision not to reject H0 could also be made by noting that 0.0664 > 0.05.)
Interpretation: QUET does not contribute significantly to explaining the variation in SBP after the effects of AGE and SMK have been accounted for.
c. H0: β1 = β2 = β3 = 0
Ha: At least one slope ≠ 0
Test Stat: F = MSR / MSE ~ F3,28.
I don't have a handy picture to use, but the F3,28 sampling distribution (which is correct if H0 is actually true) is anchored at F=0 and skewed to the right. For α = 0.01, the critical value (tail cut-off value) for the test is 4.568 using =F.INV(0.99,3,28), so we reject H0 if Fcalc > 4.568.
Fcalc = 1629.942 / 54.862252 = 29.710 >> 4.568, so reject H0, conclude the regression is significant (i.e., at least one of the three slopes is ≠ 0). Equivalently, SAS gives the p-value as 0.0001 << 0.01 -> reject H0.
d. s = 7.406906 is the point estimate for σ.
Interpretation: The estimated standard deviation of SBP around its mean (i.e., the population regression line) for any given set of values for the predictors is 7.407. This statistic s is therefore an estimate of a measure of the variation in SBP in the population. More precisely, for given values of QUET, AGE, and SMK, there is a sub-population of people who have those given values for those three predictors. SBP in this sub-population is a random variable whose mean, by our assumptions, is given by β0 + β1QUET + β2AGE + β3SMK. We assume that SBP is normally dist'd in that sub-population, and that it has a standard deviation σ. Our estimate of that standard deviation in this case is 7.407, and this value gives us an indication of how spread out the systolic blood pressures are in this sub-population. For example, we would expect roughly 95% of those SBPs to fall within about ±15 (≈ 2 x 7.407) mm of mercury of the true mean SBP for that sub-population.
e. For specified values of relative size and smoking preference (i.e., holding QUET and SMK constant), SBP increases by 1.212715 mm of mercury, on average, for each additional year in a person's age.
f. For specified values of QUET and AGE, the average SBP of smokers is 9.945568 mm of mercury more than the avg for non-smokers.
g. R² = 0.7609. Interpretation: Roughly 76% of the variation in systolic blood pressure across the 32 males in this sample is explained by variation in the relative sizes (measured by the Quetelet index), ages, and smoking preferences of those males.
h. The highest r² would be obtained by using AGE if we wanted a one-predictor model. In that case, r² would be (0.7752)² = 0.601. The other questions can be answered by using the fact that SST = 6425.969, so that SSR in this single-variable model would be (0.601) x (6425.969) = 3862. The corresponding ANOVA table would then be

Source        SS     df    MS
Regression    3862    1    3862
Error         2564   30    85.47
Total         6426   31

s = √MSE = √85.47 = 9.245
Fcalc = MSR / MSE = 3862 / 85.47 = 45.2

(Note: The corresponding t-statistic for testing H0: β1 = 0 can be found from tcalc = √Fcalc = √45.2 = 6.72, and it would be +6.72 because β̂1 would be positive since the correlation between SBP and AGE is positive.
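The one-predictor ANOVA table in part h can be rebuilt from just the correlation, SST and the sample size. The following is a minimal sketch of that arithmetic, assuming r = 0.7752, SST = 6425.969 and n = 32 as stated in the solution; the function name one_predictor_anova is illustrative. Because it carries r² unrounded rather than as 0.601, its results differ from the table in the last digit or so.

```python
from math import sqrt


def one_predictor_anova(r, sst, n):
    """Rebuild the simple-regression ANOVA quantities from r, SST and n."""
    r_squared = r ** 2
    ssr = r_squared * sst          # regression sum of squares, 1 d.f.
    sse = sst - ssr                # error sum of squares, n - 2 d.f.
    mse = sse / (n - 2)
    f_calc = ssr / mse             # MSR / MSE, with MSR = SSR / 1
    return {
        "r_squared": r_squared,
        "SSR": ssr,
        "SSE": sse,
        "MSE": mse,
        "s": sqrt(mse),
        "F": f_calc,
        "t": sqrt(f_calc),         # sign follows the sign of r
    }


table = one_predictor_anova(r=0.7752, sst=6425.969, n=32)
for name, value in table.items():
    print(f"{name}: {value:.3f}")

# Full three-predictor model from parts c and g, for comparison
msr_full, mse_full = 1629.942, 54.862252
print(f"full-model F  = {msr_full / mse_full:.3f}")
print(f"full-model R2 = {3 * msr_full / 6425.969:.4f}")
```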