
Huang Beiqing STAT107 Homework #2

1.

[Figure: Scatterplot of urban vs fertilit. x-axis: fertilit (0 to 8); y-axis: urban (0 to 100).]

The scatterplot shows a negative association between a country's fertility rate and the percentage of its population living in urban areas. Countries with larger urban percentages tend to have lower fertility rates (mostly between 1 and 3), while countries with higher fertility rates usually have smaller urban percentages.

2.

Pearson correlation of urban and fertilit = -0.640

There seems to be a negative association between the two variables, but the relationship is not necessarily linear.

3.

[Figure: Fitted Line Plot. x-axis: urban (0 to 100); y-axis: fertilit (0 to 8).]
fertilit = 5.583 - 0.04483 urban
S = 1.24880, R-Sq = 40.9%, R-Sq(adj) = 40.5%

R² = 0.4096

This means that about 41% of the variance in fertility rates is explained by the regression line.

4.

[Figure: Normal Probability Plot (response is fertilit). x-axis: Residual; y-axis: Percent.]

[Figure: Versus Fits (response is fertilit). x-axis: Fitted Value; y-axis: Residual.]

The distribution of the residuals doesn't appear perfectly normal, nor do the residuals seem random enough when viewed against the fitted values.

5.

[Figure: Fitted Line Plot. x-axis: urban (0 to 100); y-axis: fertilit (0 to 8).]
fertilit = 6.775 - 0.1010 urban + 0.000534 urban**2
S = 1.21775, R-Sq = 44.2%, R-Sq(adj) = 43.4%

This seems a better description: R² is 44.2%, so the quadratic regression explains about 3.3 percentage points more of the variation than the linear one (40.9%).
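Because R-Sq never decreases when a term is added to a model, the adjusted R-Sq values in the two Minitab outputs are a fairer basis for this comparison. The Python sketch below illustrates the calculation. The R-Sq figures are the ones reported in the fitted line plots, but the function name `adjusted_r2` and the sample size `n = 150` are assumptions made for illustration only, since the homework does not state n.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters (n > p + 1)")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# R-Sq values reported in the two fitted line plots.
r2_linear = 0.409
r2_quadratic = 0.442

# ASSUMPTION: the homework does not give the sample size; 150 is a placeholder.
n = 150

# Raw gain in explained variation, in percentage points.
gain_points = (r2_quadratic - r2_linear) * 100

# The linear model has one predictor (urban); the quadratic has two
# (urban and urban**2), so it pays a larger penalty.
adj_linear = adjusted_r2(r2_linear, n, p=1)
adj_quadratic = adjusted_r2(r2_quadratic, n, p=2)

print(f"R-Sq gain: {gain_points:.1f} percentage points")
print(f"adjusted R-Sq (linear):    {adj_linear:.3f}")
print(f"adjusted R-Sq (quadratic): {adj_quadratic:.3f}")
```

If the adjusted value still rises after the extra term is penalized, the quadratic term is plausibly adding real explanatory power rather than just absorbing noise.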

6.

[Figure: Normal Probability Plot (response is fertilit). x-axis: Residual; y-axis: Percent.]

[Figure: Versus Fits (response is fertilit). x-axis: Fitted Value; y-axis: Residual.]

This model fits slightly better than the linear model. In the normal probability plot, dots in the lower left corner form a straighter line. In the Versus Fits plot, the dots are more evenly distributed in the upper and lower regions, though they still cluster densely on the left side.

7.

[Figure: Fitted Line Plot. x-axis: Year (1930 to 1980); y-axis: Population (5 to 35).]
Population = 1167 - 0.5868 Year
S = 1.44434, R-Sq = 97.7%, R-Sq(adj) = 97.4%
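As a rough check on how this fitted line behaves, the sketch below evaluates it in Python and flags years that fall outside the plotted data. The coefficients come from the Minitab output; the function name `predict_population` and the 1930 to 1980 range (read off the Year axis) are assumptions made for illustration.

```python
# Coefficients from the fitted line: Population = 1167 - 0.5868 Year
INTERCEPT = 1167.0
SLOPE = -0.5868

# ASSUMPTION: range of years covered by the data, read off the plot's Year axis.
FITTED_YEARS = (1930, 1980)

def predict_population(year):
    """Return (predicted population in millions, True if year is an extrapolation)."""
    prediction = INTERCEPT + SLOPE * year
    extrapolated = not (FITTED_YEARS[0] <= year <= FITTED_YEARS[1])
    return prediction, extrapolated

for year in (1950, 1990):
    value, extrapolated = predict_population(year)
    note = " (outside fitted range)" if extrapolated else ""
    print(f"{year}: {value:.3f} million{note}")
```

A high R-Sq describes the fit only over the years used to build it, so any flagged prediction should be treated with caution.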

(a) The equation: Population = 1167 - 0.5868 * Year
Slope: -0.5868
Intercept: 1167

Between 1930 and 1980, year predicts population quite well: 97.7% of the variance in population is explained by the regression. However, for much earlier or later years (such as before 1900 or after 2000), the prediction is likely to fail.

(b) 1167 - 0.5868 * 1990 = -0.732 (million)

The answer doesn't make sense, because a population can't be negative.

8.

(a)

[Figure: Scatterplot of Remote vs Weight. x-axis: Weight (120 to 180); y-axis: Remote (0 to 30).]

There appears to be a positive association between weight and the number of times a subject used the remote control.

(b) Pearson correlation of Weight and Remote = 0.876

(c) Pearson correlation of Weight and Remote = -0.213

(d) Pearson correlation of C5 and C6 = -0.345

(e) In this case, gender is the lurking variable. Weight and the number of times a subject used the remote control are both directly related to gender. Between themselves, however, there's no significant relationship. When assessing the relationship between two variables, it is critical to make sure that there are no other large, relevant differences among the subjects considered. Moreover, the relationship a scatterplot shows should not be taken as causal without further study.
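The lurking-variable effect described in (e) can be reproduced with a small Python sketch. The numbers below are invented purely for illustration and are not the homework data; the group labels and the helper name `pearson` are likewise assumptions. The sketch shows how two groups that differ in both variables can produce a strong pooled correlation even when the correlation within each group is weak.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# HYPOTHETICAL data, invented for illustration; NOT the homework data set.
women_weight = [120, 125, 130, 135, 140]
women_remote = [6, 4, 7, 5, 5]
men_weight = [160, 165, 170, 175, 180]
men_remote = [22, 25, 21, 24, 22]

# Pooling the groups ignores the lurking variable (gender).
all_weight = women_weight + men_weight
all_remote = women_remote + men_remote

r_pooled = pearson(all_weight, all_remote)
r_women = pearson(women_weight, women_remote)
r_men = pearson(men_weight, men_remote)

print(f"pooled: {r_pooled:.3f}")
print(f"women:  {r_women:.3f}")
print(f"men:    {r_men:.3f}")
```

In this made-up example the pooled correlation is strongly positive while both within-group correlations are close to zero, the same pattern as the results in (b) through (d).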
