
Holland Cummisford

STAT-143-E
Honors Project

Math 143 for Honors

Part 1: Background Information and Lit Review:

Low birth weight is defined as a weight at birth of less than 2500 grams, or
about 5.5 pounds. Low birth weight continues to be a persistent problem in our society,
even in modern times when medical care is so readily available. It carries a long list of adverse
effects that begin with high infant mortality rates and can last
throughout the child's lifetime, with higher risk of disease, poor language development, and
brain abnormalities being just a few of the possible consequences (Kogan 1995). Because of these
many negative effects, researchers have looked for possible causes of low
birth weight and for ways to prevent it in the future. They have, however, come to a
variety of different conclusions about which factors are most important in determining birth
weight in infants.
Biological and genetic factors were some of the first variables to be examined as
possible causes of low birth weight. A study conducted by Stanford University in 2000
determined that race could be a significant determinant of an infant's birth weight: infants with
a black mother and father had the lowest mean birth weights, while infants with two white parents
had the highest. Stanford also concluded that birth weights of Chinese and Japanese
infants were significantly lower than those of white infants. Furthermore, the age of an
infant's mother was determined to be significant. Stanford University wrote in 2000 that
the incidence of low birth weight is higher among mothers under the age of 18 or over the age
of 35.
Social factors have also been determined to be significant in an infant's birth
weight. One possible cause is social class, as described by Kogan (1995). He writes that this
correlation most likely exists because poverty could affect maternal health status at the time of
conception through lower physiologic reserves. Exactly what is meant by health status is not
specified, but it is clear that access to health care, important vitamins, and even simply the
daily calories needed to sustain a healthy pregnancy is often lacking for women in poverty.
Nutrient intake and a lack of access to prenatal care were also determined to be significantly
related to birth weight by Stanford University (2000). The World Health Organization (2004)
also states that mothers with a low social status have a tendency to give birth to babies with low
birth weights, and agrees with Kogan that this is due to decreased nutrition and overall health in
mothers in poverty.
Finally, specific behaviors of the mother can have adverse effects on her
child's birth weight, smoking being the most important. Stanford University reports
that smoking is thought to be the cause of up to 30% of cases of low birth weight. The
correlation was also reported by Health Day News in 2009, in a study that compared
mothers who stopped smoking six weeks into pregnancy with mothers who smoked during their
entire pregnancy. The birth weights of infants whose mothers stopped smoking
were significantly higher than those of infants whose mothers smoked throughout
pregnancy.
Clearly, there are a variety of factors that could affect birth weight, as shown by
these studies, and many of them were also recorded in the data collected at Baystate Medical
Center in Springfield, MA in 1986. The study tracked 189 mothers and various
factors that the researchers believed could affect the birth weights of their children. The variables included
were the age of the mother, her weight in pounds at the last menstrual period, her race, her
smoking status during pregnancy, her history of premature labor and of hypertension, the
presence of uterine irritability, and the number of physician visits during the first trimester of
pregnancy. These data were recorded along with the birth weight of each mother's infant. The
data can now be examined to determine which factors have a statistically significant
effect on the birth weight of an infant and compared to the other studies, which all have similar
samples of women, so results should be similar.
References

Baystate Medical Center (1986). Low Birth Weight. Available from:
http://www.umass.edu/statdata/statdata/data/

Health Day News (2009). Smoking-low birthweight link explained in part. U.S. News & World
Report. Retrieved from:
http://health.usnews.com/health-news/family-health/womens-health/articles/2009/02/02/smoking-low-birth-weight-link-explained-in-part

Kogan, M. D. (1995). Social causes of low birthweight. Journal of the Royal Society of
Medicine, 88, pp. 611-615.

Stanford University (2000). Primary determining factors of low birthweight in infants. Retrieved
from: http://web.stanford.edu/group/virus/herpes/2000/primaryf.htm

United Nations Children's Fund, World Health Organization. (2004). Low birthweight: Country,
regional, and global estimates. New York: UNICEF.
Part 2: Graphs and Variables

Summary statistics for Birth Weight:


Column Mean Variance Std. dev. Std. err. Median Range Min Max Q1 Q3
BWT 2944.6561 531473.68 729.02242 53.028578 2977 4281 709 4990 2414 3475

Birth weights have a relatively normal distribution, ranging from the minimum of 709 grams to
the maximum of 4990 grams. Because the graph is mostly symmetrical, the mean of 2944.66 grams
and the standard deviation of 729.02 grams are appropriate summary statistics.
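
As a quick check on the summary table, the standard deviation, standard error, range, and interquartile range can be recomputed from the other values in the BWT row. This is only a sketch in Python (the table itself came from a statistics package); the function name `derived_summaries` is made up for this illustration.

```python
import math

def derived_summaries(n, variance, minimum, maximum, q1, q3):
    """Recompute derived summary statistics from the basic ones."""
    std_dev = math.sqrt(variance)        # standard deviation
    std_err = std_dev / math.sqrt(n)     # standard error of the mean
    value_range = maximum - minimum      # range
    iqr = q3 - q1                        # interquartile range (Q3 - Q1)
    return std_dev, std_err, value_range, iqr

# Values copied from the BWT row of the summary table
std_dev, std_err, value_range, iqr = derived_summaries(
    n=189, variance=531473.68, minimum=709, maximum=4990, q1=2414, q3=3475
)
print(round(std_dev, 2), round(std_err, 2), value_range, iqr)
```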

0 = Normal Birth Weight 1 = Low Birth Weight


Frequency table results for LOW:
Count = 189
LOW Frequency Relative Frequency
0 130 0.68783069
1 59 0.31216931

A little more than 2/3 of babies born in this study were normal weight (68.78%), while the
other 1/3 (31.22%) were underweight.
Summary statistics for Mother's Age:
Column Mean Variance Std. dev. Std. err. Median Range Min Max Q1 Q3
AGE 23.238095 28.075988 5.2986779 0.38542211 23 31 14 45 19 26

Mother's age has a relatively normal distribution (with one outlier at 45 years), ranging from a
minimum of 14 years to a maximum of 45 years. Because of its symmetry, the mean of 23.24 years and
standard deviation of 5.3 years are appropriate summary measures.

Summary statistics for Mother's Weight:


Column Mean Variance Std. dev. Std. err. Median Range Min Max Q1 Q3
LWT 129.81481 935.0985 30.57938 2.2243226 121 170 80 250 110 140

Mother's weight has a right-skewed distribution, ranging from a minimum of 80 pounds to a maximum of 250
pounds. Because the data are skewed, the median of 121 pounds and the interquartile range
(Q3 - Q1) of 30 pounds should be used as the summary statistics.
1 = White 2 = Black 3 = Other
Frequency table results for RACE:
Count = 189
RACE Frequency Relative Frequency
1 96 0.50793651
2 26 0.13756614
3 67 0.35449735

Slightly over half of the mothers in the study were white (50.79%), while 13.76% were black and
35.45% were some other race.

0 = Non-Smoker 1 = Smoker
Frequency table results for SMOKE:
Count = 189
SMOKE Frequency Relative Frequency
0 115 0.60846561
1 74 0.39153439

Almost 40% of mothers in this study smoked (39.15%), and 60.85% were reported as non-smokers.
0 = None 1 = Premature Labor Once 2 = Premature Labor Twice, etc.
Frequency table results for PTL:
Count = 189
PTL Frequency Relative Frequency
0 159 0.84126984
1 24 0.12698413
2 5 0.026455026
3 1 0.0052910053

A large majority of mothers in this study (84.13%) had no premature labor before this study,
while 12.7% had it occur once. Only 5 participants had premature labor twice, and 1 participant
had it three times.

0 = No History of Hypertension 1 = History of Hypertension


Frequency table results for HT:
Count = 189
HT Frequency Relative Frequency
0 177 0.93650794
1 12 0.063492063

Almost all of the mothers had no history of hypertension (93.65%), but 6.35% did have a history
of hypertension.
0 = No History of Uterine Irritability 1 = History of Uterine Irritability
Frequency table results for UI:
Count = 189
UI Frequency Relative Frequency
0 161 0.85185185
1 28 0.14814815

Only 14.81% of mothers had a history of uterine irritability, and the remaining 85.19% had no
history of any uterine irritability.

0 = No Doctor Visits in First Trimester 1 = One Visit 2 = Two Visits, etc.


Frequency table results for FTV:
Count = 189
FTV Frequency Relative Frequency
0 100 0.52910053
1 47 0.24867725
2 30 0.15873016
3 7 0.037037037
4 4 0.021164021
6 1 0.0052910053

A little over half of the mothers in this study did not have a single doctor visit in the first
trimester of pregnancy (52.91%). 24.87% had one visit, 15.87% had two visits, and the
remaining 6.35% had three or more visits, with one participant having six doctor visits during her
first trimester.
Part 3: Inferential Statistics

One Sample Confidence Interval for Proportions:


I want to know what proportion of expectant mothers smoke in the whole population (in 1986),
so I used a one sample 95% confidence interval for proportions. The assumptions for this interval are
that this is a simple random sample, the observations are independent of one another, the sample
is less than 10% of the population, and there are at least ten in each group (n1=74 and n2=115),
and all of these requirements are met.

One sample proportion confidence interval:


Outcomes in : SMOKE
Success : 1
p : Proportion of successes
Method: Standard-Wald

95% confidence interval results:


Variable Count Total Sample Prop. Std. Err. L. Limit U. Limit
SMOKE 74 189 0.39153439 0.035503574 0.32194867 0.46112012
Based on these results, I am 95% confident that the percentage of all expectant mothers who
smoke is between 32.19% and 46.11%.
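
The limits in the table can be reproduced by hand from the count and the total. The sketch below assumes the Standard-Wald method named in the output, that is, sample proportion plus or minus z times the standard error; the function name `wald_interval` and the rounded critical value 1.959964 are my own choices for illustration.

```python
import math

def wald_interval(successes, n, z=1.959964):
    """Standard-Wald confidence interval for one proportion (z defaults to 95%)."""
    p_hat = successes / n
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * std_err, p_hat + z * std_err

# 74 smokers out of 189 mothers, as in the SMOKE frequency table
lower, upper = wald_interval(74, 189)
print(f"{lower:.4f} to {upper:.4f}")
```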

One Sample Hypothesis Test for Proportions:


The Centers for Disease Control and Prevention states that the current percent of babies born with low birth
weight is 8.00%. I would like to know if there is a similar percent in this study, or if the percent
has changed significantly since the study was done in 1986. I used a one sample hypothesis test for proportions.
The assumptions for this test are that this is a simple random sample, the observations are
independent of one another, the sample is less than 10% of the population, and there are at least
ten in each group (n1=59 and n2=130), and all of these requirements are met.

One sample proportion hypothesis test:


Outcomes in : LOW
Success : 1
p : Proportion of successes
H0 : p = 0.08
HA : p ≠ 0.08

Hypothesis test results:


Variable Count Total Sample Prop. Std. Err. Z-Stat P-value
LOW 59 189 0.31216931 0.019733677 11.765132 <0.0001
Using an alpha of .05 and the p-value of less than .0001, p is less than alpha, so we reject the
null hypothesis that the proportion was 8.00%. There is significant evidence that the percent of
babies with low birth weight has changed since this sample was taken in 1986, and by looking at
the actual percentages (the sample percent is 31.22%) we can see that the proportion has dropped
dramatically in the past 30 years.
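
The z-statistic in the output can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the usual one-proportion z-test in which the standard error is computed from the null value p0 = 0.08; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-sided one-sample z-test for a proportion."""
    p_hat = successes / n
    std_err = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / std_err
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided tail area
    return z, std_err, p_value

# 59 low birth weight babies out of 189, tested against 8.00%
z, std_err, p_value = one_proportion_z_test(59, 189, 0.08)
print(round(z, 3), round(std_err, 5), p_value < 0.0001)
```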
Two Sample Confidence Interval for Proportions:
I want to find out the difference in proportions of low birth weight for mothers with a history of
premature labor and mothers without this history. For this, I will use a two-sample 95%
confidence interval for proportions. The assumptions for this test are that this is a simple random
sample, the observations are independent of one another, the sample is less than 10% of the
population, and there are at least ten in each group for each of the two samples. For the first
group, n1=18 and n2=12 and the second group has n1=41 and n2=118. All of these requirements
are met.

Two sample proportion confidence interval:


p1 : Proportion of successes (Success = 1) for LOW where PTL=1 OR PTL=2 OR PTL=3
p2 : Proportion of successes (Success = 1) for LOW where PTL=0
p1 - p2 : Difference in proportions

95% confidence interval results:


Difference Count1 Total1 Count2 Total2 Sample Diff. Std. Err. L. Limit U. Limit
p1 - p2 18 30 41 159 0.34213836 0.095935284 0.15410866 0.53016807
Based on these results, we are 95% confident that the rate of low birth weight babies is between
15.41 and 53.02 percentage points higher for mothers with a history of premature labor than for
mothers without a history of premature labor.
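
For reference, the interval for the difference can be recomputed from the four counts in the table. The sketch assumes the unpooled standard error that is standard for a two-proportion confidence interval; `two_proportion_interval` is a name invented for this example.

```python
import math

def two_proportion_interval(x1, n1, x2, n2, z=1.959964):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    std_err = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, std_err, diff - z * std_err, diff + z * std_err

# 18 of 30 mothers with a history of premature labor, 41 of 159 without
diff, std_err, lower, upper = two_proportion_interval(18, 30, 41, 159)
print(round(diff, 4), round(lower, 4), round(upper, 4))
```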

Two Sample Hypothesis Test for Proportions:


I want to see if mothers with a history of uterine irritability are more likely to have babies with
low birth weights than mothers without this history. To do this, I will use a two-sample
hypothesis test for proportions. The assumptions for this test are that this is a simple random
sample, the observations are independent of one another, the sample is less than 10% of the
population, and there are at least ten in each group for each of the two samples. For the first
group, n1=45 and n2=116 and the second group has n1=14 and n2=14. All of these requirements
are met.

Two sample proportion hypothesis test:


p1 : Proportion of successes (Success = 1) for LOW where UI=0
p2 : Proportion of successes (Success = 1) for LOW where UI=1
p1 - p2 : Difference in proportions
H0 : p1 - p2 = 0
HA : p1 - p2 < 0

Hypothesis test results:


Difference Count1 Total1 Count2 Total2 Sample Diff. Std. Err. Z-Stat P-value
p1 - p2 45 161 14 28 -0.22049689 0.094880033 -2.3239547 0.0101
With an alpha of .05 and p-value of .0101, we reject the null hypothesis. Mothers with a history
of uterine irritability have a significantly higher proportion of low birth weight babies than
mothers without a history of uterine irritability.
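
The test statistic above can be reproduced from the counts. This sketch assumes the usual pooled standard error for a two-proportion z-test and a one-sided (less than) alternative, matching HA : p1 - p2 < 0; the function name is my own.

```python
import math
from statistics import NormalDist

def two_proportion_z_test_less(x1, n1, x2, n2):
    """One-sided (p1 < p2) two-proportion z-test with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    p_value = NormalDist().cdf(z)  # lower-tail area for the "<" alternative
    return z, std_err, p_value

# UI=0: 45 low of 161; UI=1: 14 low of 28
z, std_err, p_value = two_proportion_z_test_less(45, 161, 14, 28)
print(round(z, 4), round(p_value, 4))
```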
One Sample Confidence Interval for Means: Average Age of Expectant Mothers
I want to know the average age of all expectant mothers, so I will use a one sample 95%
confidence interval for means using the data from this study. The assumptions for this test are
that the data was obtained with a simple random sample, the observations are independent of one
another, and the quantitative variable has a normal distribution. The only issue is with the third
assumption, for it appears there is a bit of a right skew for mothers' age. Luckily, with a sample
size of 189, we can still use t-procedures since they are robust against skewness with a large
enough sample size. Therefore, all assumptions are met.

One sample T confidence interval:


μ : Mean of variable

95% confidence interval results:


Variable Sample Mean Std. Err. DF L. Limit U. Limit
AGE 23.238095 0.38542211 188 22.477787 23.998403
Based on these results, I am 95% confident that the average age of all expectant mothers (in
1986) is between 22.48 and 24.00 years.

One Sample Hypothesis Test for Means:


The average birth weight of all babies, according to the Center for Disease Control, is 3401.94
grams. I want to know if this statistic is the same in this study, or if it is different, so I will use a
one sample hypothesis test for means. The assumptions for this test are that the data was obtained
with a simple random sample, the observations are independent of one another, and the
quantitative variable has a normal distribution. All these assumptions are met.

One sample T hypothesis test:


μ : Mean of variable
H0 : μ = 3401.94
HA : μ ≠ 3401.94

Hypothesis test results:


Variable Sample Mean Std. Err. DF T-Stat P-value
BWT 2944.6561 53.028578 188 -8.6233486 <0.0001
Using an alpha of .05, the p-value is far lower than alpha. Because of this, we reject the null
hypothesis and conclude that the mean birth weight of babies in the study is significantly
different from the current mean of 3401.94 grams. After examining the data, we can see that the
sample mean of 2944.66 grams in 1986 is lower than the current mean birth weight, which suggests
that the mean birth weight has risen since then.
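
The T-Stat in the table is just the distance between the sample mean and the hypothesized mean, measured in standard errors. The sketch below recomputes it; because the Python standard library has no t distribution, the p-value shown is a normal approximation (an assumption of this example, reasonable with 188 degrees of freedom), not the exact t-based value from the output.

```python
from statistics import NormalDist

def one_sample_t_statistic(sample_mean, mu0, std_err):
    """t = (sample mean - hypothesized mean) / standard error."""
    return (sample_mean - mu0) / std_err

# Values copied from the BWT row of the hypothesis test table
t_stat = one_sample_t_statistic(2944.6561, 3401.94, 53.028578)

# Normal approximation to the two-sided p-value (assumption: large df)
approx_p = 2 * NormalDist().cdf(-abs(t_stat))
print(round(t_stat, 4), approx_p < 0.0001)
```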

Two Sample Confidence Interval for Means:


I want to see the difference in mean ages for mothers that gave birth to babies with low birth
weights compared to mothers that gave birth to babies of healthy weights. To accomplish this, I
will use a two-sample 95% confidence interval for means. The assumptions for this test are that
the samples are independent of each other, the samples were gathered using simple random
samples, and that both populations are normally distributed. The histogram for ages of mothers
with low birth weight babies is approximately normal, but the histogram for ages of mothers that
gave birth to healthy weight babies is right-skewed. Luckily, we have a big enough
sample size, and this procedure is robust against skew. Therefore, all of these assumptions are
met.

Two sample T confidence interval:


μ1 : Mean of AGE where LOW=0
μ2 : Mean of AGE where LOW=1
μ1 - μ2 : Difference between two means
(without pooled variances)

95% confidence interval results:


Difference Sample Diff. Std. Err. DF L. Limit U. Limit
1 - 2 1.3564537 0.76477138 136.94075 -0.15583491 2.8687423
Based on these results, there is no significant difference between the mean ages of mothers that
gave birth to low birth weight babies and those that gave birth to healthy babies. We know this
because the 95% confidence interval goes from negative to positive, and therefore includes 0 as a
possible difference between means.

Two Sample Hypothesis Test for Means:


I want to see if there is a significant difference in the mean weights (at their last menstrual
period) of mothers that gave birth to low weight and healthy weight babies. To do this, I will use
a two-sample hypothesis test for means. The assumptions for this test are that the samples are
independent of each other, the samples were gathered using simple random samples, and that
both populations are normally distributed. Looking at the histograms for the data, both are
slightly right-skewed, but luckily this test is robust against skew with a big enough sample, and
both of these have large enough samples. Therefore, all the assumptions are met.

Two sample T hypothesis test:


μ1 : Mean of LWT where LOW=0
μ2 : Mean of LWT where LOW=1
μ1 - μ2 : Difference between two means
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 ≠ 0
(without pooled variances)

Hypothesis test results:


Difference Sample Diff. Std. Err. DF T-Stat P-value
1 - 2 11.164407 4.4381853 132.45996 2.5155342 0.0131
Using an alpha of .05 and the p-value of .0131, p is less than alpha, so we reject the null
hypothesis. There is a statistically significant difference in mean weight of mothers that gave
birth to babies with low birth weight and those that gave birth to healthy weight babies. By
looking at the results, it appears that the weight of those that gave birth to healthy babies tends to
be higher than those with babies of low birth weight.
I also want to see if there is a difference in the mean birth weights of babies with mothers of
smoking or non-smoking status, so I will use another two-sample hypothesis test for means. The
assumptions for this test are that the samples are independent of each other, the samples were
gathered using simple random samples, and that both populations are normally distributed. All
the assumptions are met.

Two sample T hypothesis test:


μ1 : Mean of BWT where SMOKE=0
μ2 : Mean of BWT where SMOKE=1
μ1 - μ2 : Difference between two means
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 ≠ 0
(without pooled variances)

Hypothesis test results:


Difference Sample Diff. Std. Err. DF T-Stat P-value
1 - 2 281.71328 103.97406 170.00132 2.7094575 0.0074
With an alpha of .05 and the p-value of .0074, we see that p is less than alpha and we reject the
null hypothesis. There is a significant difference in the mean birth weights of babies with
smoking and non-smoking mothers, and from the sample difference we can also see that the
weights of babies with non-smoking mothers (0) are significantly higher than those with
smoking mothers (1).

Chi-Square Test
I want to see if there is a relationship between race and low birth weight. Because both of these
are categorical variables, I will use the chi-square test. The assumptions for this test are that the
data was collected with a simple random sample, the observations are independent of one
another, there are expected counts of at least five in each cell, and the sample is less than 10% of
the population. All of these assumptions are met.

Contingency table results:


Rows: RACE
Columns: LOW

Cell format
Count
(Row percent)
(Column percent)
(Expected count)

RACE      0           1           Total
1         73          23          96
          (76.04%)    (23.96%)    (100%)
          (56.15%)    (38.98%)    (50.79%)
          (66.03)     (29.97)
2         15          11          26
          (57.69%)    (42.31%)    (100%)
          (11.54%)    (18.64%)    (13.76%)
          (17.88)     (8.12)
3         42          25          67
          (62.69%)    (37.31%)    (100%)
          (32.31%)    (42.37%)    (35.45%)
          (46.08)     (20.92)
Total     130         59          189
          (68.78%)    (31.22%)    (100%)
          (100%)      (100%)      (100%)

Chi-Square test:
Statistic DF Value P-value
Chi-square 2 5.004813 0.0819
Marginal Percents for Race: White (1) = 50.79% Black (2) = 13.76% Other (3) = 35.45%
For Birth Weight: Healthy (0) = 68.78% Low (1) = 31.22%
Conditional Percents Divided by Race for White: Healthy = 76.04% Low = 23.96%
For Black: Healthy = 57.69% Low = 42.31%
For Other: Healthy = 62.69% Low = 37.31%
From these results, it is hard to tell if there is a significant relationship between race and low
birth weight, so we will look at the chi-square test. For this test, the null hypothesis is that there
is no association, and the alternative hypothesis is that there is an association. Using an alpha of
.05 and the p-value of .0819, p is greater than alpha, so we fail to reject the null hypothesis. There is no
statistically significant association between race and birth weight. However, the differences
between races in the percent with healthy weight babies seem important and are worth further
study.
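
To show where the chi-square value comes from, the sketch below rebuilds the expected counts from the row and column totals and sums (observed - expected)^2 / expected over the six cells. The function name is hypothetical. The p-value line relies on a special case: with exactly 2 degrees of freedom, the upper-tail area of the chi-square distribution equals exp(-x/2), so no statistics library is needed here.

```python
import math

def chi_square_statistic(table):
    """Chi-square statistic, degrees of freedom, and expected counts for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count = (row total * column total) / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Rows: RACE 1, 2, 3; columns: LOW = 0, LOW = 1
observed = [[73, 23], [15, 11], [42, 25]]
statistic, df, expected = chi_square_statistic(observed)

# Valid only because df == 2: upper-tail area is exp(-x / 2)
p_value = math.exp(-statistic / 2)
print(round(statistic, 3), df, round(p_value, 4))
```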

Linear Regression:
I want to see if there is an association between number of doctor visits in the first trimester and
the birth weight of babies. To do this, I will analyze a scatterplot and use linear regression. The
assumptions for this test are that the data was obtained with a simple random sample, the
observations are independent of one another, and the residuals have a normal
distribution (by looking at a histogram, I can see this is true). All of these assumptions are met.

Simple linear regression results:


Dependent Variable: BWT
Independent Variable: FTV
BWT = 2912.833 + 40.097141 FTV
Sample size: 189
R (correlation coefficient) = 0.058262057
R-sq = 0.0033944673
Estimate of error standard deviation: 729.7274
Parameter estimates:
Parameter Estimate Std. Err. Alternative DF T-Stat P-value
Intercept 2912.833 66.388752 0 187 43.875399 <0.0001
Slope 40.097141 50.242175 0 187 0.79807732 0.4258

Analysis of variance table for regression model:


Source DF SS MS F-stat P-value
Model 1 339165.17 339165.17 0.63692741 0.4258
Error 187 99577887 532502.07
Total 188 99917053

For this test the null hypothesis is that the population slope is 0, and the alternate is that the
population slope is not 0. With an alpha of .05 and a p-value of .4258 for the slope, we fail to reject the
null hypothesis. There is no significant evidence that the population slope differs from 0, so we cannot
conclude that there is an association between first trimester
visits and birth weight. This is also supported by an R-squared of practically zero.
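
The pieces of the regression output fit together arithmetically: R-squared is the model sum of squares divided by the total, the F-statistic is the ratio of the two mean squares, and in simple regression F equals the square of the slope's t-statistic. The sketch below checks these relationships using numbers copied from the tables; the function names are my own.

```python
def r_squared(ss_model, ss_total):
    """Share of total variation explained by the model."""
    return ss_model / ss_total

def f_statistic(ss_model, df_model, ss_error, df_error):
    """Ratio of the model mean square to the error mean square."""
    return (ss_model / df_model) / (ss_error / df_error)

def predict_bwt(ftv):
    """Fitted line from the output: BWT = 2912.833 + 40.097141 * FTV."""
    return 2912.833 + 40.097141 * ftv

r2 = r_squared(339165.17, 99917053)
f_stat = f_statistic(339165.17, 1, 99577887, 187)
slope_t = 40.097141 / 50.242175  # estimate divided by its standard error

print(round(r2, 6), round(f_stat, 4), round(slope_t ** 2, 4))
print(round(predict_bwt(2), 2))  # fitted birth weight for two visits
```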

ANOVA:
I want to see if there is a difference between mean birth weights for mothers of different races.
To do this, I will use analysis of variance. The assumptions for this test are that subjects were
chosen using a simple random sample, the observations are independent of one another, the
response variable is normally distributed in each group (this was confirmed by examining
histograms, which all show normal distributions), and the population standard deviations for
each group are the same. To check the standard deviations, we see if the larger s.d. is less than
twice the smaller s.d. In this case, 727 divided by 638 is less than two, so the standard deviations
are close enough to be considered equal. Therefore, all of the assumptions are met.

Analysis of Variance results:


Responses: BWT
Factors: RACE

Response statistics by factor


RACE n Mean Std. Dev. Std. Error
1 96 3103.7396 727.72424 74.273045
2 26 2719.6923 638.68388 125.25621
3 67 2804.0149 721.30115 88.120961

ANOVA table
Source DF SS MS F-Stat P-value
RACE 2 5070607.6 2535303.8 4.9718944 0.0079
Error 186 94846445 509927.12
Total 188 99917053

Tukey HSD results (95% level)


1 subtracted from
Difference Lower Upper P-value
2 -384.04728 -757.04584 -11.048707 0.042
3 -299.72466 -568.30259 -31.146726 0.0245
2 subtracted from
Difference Lower Upper P-value
3 84.322618 -305.49992 474.14515 0.8661
For this test, the null hypothesis is that the mean birth weights for all races of mothers are equal,
and the alternative is that they are not. Using an alpha of .05 and the p-value of .0079, p is less
than alpha, so we reject the null hypothesis. There is a significant difference between at least two
of the race categories, so we then look at the Tukey comparisons between races. From this, we
see that there is a significant difference between white (1) and black (2) (p=.042) and between
white and other (3) (p=.0245).
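
The ANOVA table can be rebuilt from the group sizes, means, and standard deviations alone. The sketch below (with a made-up function name) computes the between-group and within-group sums of squares and the F-statistic, and also applies the rule of thumb used above for the equal standard deviation assumption. The p-value and the Tukey comparisons need distributions that are not in the Python standard library, so they are left out.

```python
def anova_from_summary(groups):
    """One-way ANOVA from (n, mean, std_dev) summaries of each group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

# (n, mean, std. dev.) for RACE = 1, 2, 3, copied from the table
race_groups = [
    (96, 3103.7396, 727.72424),
    (26, 2719.6923, 638.68388),
    (67, 2804.0149, 721.30115),
]
ss_between, ss_within, df_between, df_within, f_stat = anova_from_summary(race_groups)

# Rule of thumb: largest s.d. should be less than twice the smallest
sds = [sd for _, _, sd in race_groups]
sd_ratio = max(sds) / min(sds)
print(round(f_stat, 3), df_between, df_within, round(sd_ratio, 2))
```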
Multiple Regression:
Finally, I want to use all of these variables to predict birth weights of babies based on
statistics about the mother. For this, I will use multiple regression with the variables of age, last
weight, race, smoking status, pre-term labor, hypertension, uterine irritability, and first-trimester
visits as predictors of birth weight. The assumptions are that this data was obtained using a simple
random sample and that the observations are all independent, and this is true. P-values
(in parentheses) below .05 indicate significant correlations.

Correlation matrix:
        AGE            LWT            RACE           SMOKE          PTL            HT             UI             FTV
LWT     0.18007315
        (0.0132)
RACE    -0.17281795    -0.16504854
        (0.0174)       (0.0232)
SMOKE   -0.044346182   -0.044179084   -0.33903074
        (0.5446)       (0.5461)       (<0.0001)
PTL     0.071606386    -0.140029      0.0079512927   0.18755706
        (0.3275)       (0.0546)       (0.9135)       (0.0098)
HT      -0.015837      0.2363604      0.019929917    0.013407037    -0.015399579
        (0.8288)       (0.0011)       (0.7855)       (0.8547)       (0.8334)
UI      -0.07515558    -0.15276317    0.053602088    0.062158997    0.22758534     -0.10858506
        (0.304)        (0.0359)       (0.4638)       (0.3955)       (0.0016)       (0.1369)
FTV     0.21539394     0.14052746     -0.098336254   -0.028013141   -0.04442966    -0.072372547   -0.059523412
        (0.0029)       (0.0538)       (0.1782)       (0.702)        (0.5438)       (0.3223)       (0.4159)
BWT     0.089866389    0.18578871     -0.19620283    -0.189113      -0.15473173    -0.14607479    -0.28346803    0.058262057
        (0.2188)       (0.0105)       (0.0068)       (0.0092)       (0.0335)       (0.0449)       (<0.0001)      (0.4258)

From this, we can see that last weight, race, smoking status, pre-term labor, hypertension, and
uterine irritability are all correlated with birth weight. However, there is also a problem with
multicollinearity, since some of these variables are also correlated with each other. Therefore, we
look at the results of the multiple linear regression to see what our regression equation will be.
Multiple linear regression results:
Dependent Variable: BWT
Independent Variable(s): FTV, AGE, LWT, RACE, SMOKE, PTL, HT, UI

Stepwise results:
P-value to enter: 0.15
P-value to leave: 0.25
Step Variable Action P-value RMSE R-squared R-squared (adj)
1 UI Entered <0.0001 700.98596 0.0804 0.0754
2 RACE Entered 0.0094 690.19706 0.1132 0.1037
3 SMOKE Entered 0.0003 667.37531 0.1753 0.162
4 HT Entered 0.0127 657.96787 0.2028 0.1854
5 LWT Entered 0.0386 652.07777 0.2212 0.2

BWT = 3105.6821 + -522.3361 UI + -188.76419 RACE + -364.70804 SMOKE +
-595.73339 HT + 3.4321335 LWT

Parameter estimates:
Parameter Estimate Std. Err. Alternative DF T-Stat P-value
Intercept 3105.6821 271.57472 0 183 11.435829 <0.0001
UI -522.3361 135.9513 0 183 -3.8420824 0.0002
RACE -188.76419 56.339257 0 183 -3.3504913 0.001
SMOKE -364.70804 104.32333 0 183 -3.4959395 0.0006
HT -595.73339 201.47857 0 183 -2.9568077 0.0035
LWT 3.4321335 1.6476489 0 183 2.0830491 0.0386

Analysis of variance table for multiple regression model:


Source DF SS MS F-stat P-value
Model 5 22104462 4420892.4 10.397075 <0.0001
Error 183 77812591 425205.41
Total 188 99917053

Summary of fit:
Root MSE: 652.07777
R-squared: 0.2212
R-squared (adjusted): 0.2

From this, we see that pre-term labor, first trimester visits, and age were left out of the
equation by the stepwise procedure, most likely because of multicollinearity. Therefore, the equation that we come up with is:

Birth weight = 3105.68 - 522.34 (UI) - 188.76 (RACE) - 364.71 (SMOKE) - 595.73 (HT)
+ 3.43 (LWT)

Using this equation, we can predict the birth weight of babies based on the status of the mother.
However, from the R-square, which is 22.12%, we can see that this is not a very good predictor
of birth weight, since it only accounts for 22.12% of variation in birth weights of babies.
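
As an illustration of how the equation would be used, the sketch below plugs values into the fitted coefficients from the parameter estimates table. The example mothers are hypothetical, not taken from the data. Note that the model treats the RACE code (1, 2, 3) as a number, exactly as the regression output does.

```python
def predict_birth_weight(ui, race, smoke, ht, lwt):
    """Predicted birth weight in grams from the fitted multiple regression.

    ui, smoke, ht: 0 or 1; race: 1, 2, or 3 (used as a numeric code);
    lwt: mother's weight in pounds at the last menstrual period.
    """
    return (
        3105.6821
        - 522.3361 * ui
        - 188.76419 * race
        - 364.70804 * smoke
        - 595.73339 * ht
        + 3.4321335 * lwt
    )

# Hypothetical mothers: RACE=1, 130 pounds, no hypertension or uterine irritability
non_smoker = predict_birth_weight(ui=0, race=1, smoke=0, ht=0, lwt=130)
smoker = predict_birth_weight(ui=0, race=1, smoke=1, ht=0, lwt=130)
print(round(non_smoker, 1), round(smoker, 1))
```

The two predictions differ by exactly the SMOKE coefficient, about 365 grams, which is how a coefficient in this model is read: the change in predicted birth weight when that variable goes up by one and the others stay fixed.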
Part 4: Conclusion

After all of this analysis, I can see which variables were statistically significant and
which were not. Mothers with a history of premature labor, uterine irritability, or hypertension,
mothers with a low weight at the last menstrual period, mothers who smoked, and mothers of a race
other than white gave birth to babies
with significantly lower weights than other mothers. Age and number of first-trimester doctor visits,
on the other hand, do not have a statistically significant effect on birth weight.
Many of these results correspond to the studies that I examined earlier. The Stanford
University study confirms the results about race, saying that Asian and black babies have
significantly lower birth weights than white babies, on average. The same study, along with
Health Day, confirms the results on smoking, showing that smoking mothers give birth to lower
weight babies than non-smoking mothers.
On the other hand, my results for age and number of doctor visits do not necessarily fit
with the studies that I previously reviewed. Stanford said that age was a significant factor in birth
weight of babies, but this study does not correspond with that statement. Also, the World Health
Organization talked about the importance of having access to doctors, but according to my
analysis, number of doctor visits in the first trimester has no effect on the birth weights of babies.
There are still many other factors discussed by the World Health Organization that are related to
poverty that were not covered in this analysis, too, so I cannot conclude anything about those
topics.
All in all, the results of my analysis corresponded quite well with the related studies that I
reviewed before looking at these statistics. There are many factors that can affect the birth
weights of babies, and more studies need to be done in order to further confirm which have the
largest effects. This project has helped me have a better view of the world concerning this topic. I
had never really considered this topic before doing this analysis, so it was very interesting to
learn more about it and see for myself which factors do actually affect the birth weights of
babies.
I think this study does a good job of showing a picture of the world, overall. It is a very
complicated topic, so the fact that there were not any super specific results is not surprising, but
it also shows me how even when lots of studies are done it can be hard to find specific results in
statistical analysis. This helps me see that there are some limits to statistics. However, overall,
they are very useful for figuring out the causes of many problems, and can be extremely useful in
seeing what needs to be looked into more in important topics such as the birth weight of babies. I
see now that statistics, especially when used for things like medical and health topics, can be
invaluable in helping people save lives of others.
