
SOLUTIONS TO THE LAB 1 ASSIGNMENT

Question 1

Excel produces the following histogram of pull strengths for the 100 resistors:

[Figure: Histogram of Pull Strengths (lb). Horizontal axis: pull strength, 59 to 75 lb; vertical axis: Frequency, 0 to 25.]

(a) The histogram is one-peaked, bell-shaped, and approximately symmetric. Given the relatively
small spread, there is one observation (between 74 and 75) lying far above the main body of the
data. This observation may be considered an outlier. We will verify in Question 2 that indeed, the
single observation is an outlier in a formal sense. The tails of the distribution are relatively short.

(b) The center of the distribution is at approximately 65 pounds. As the distribution is approximately
symmetric, we expect the values of the mean and the median to be very similar, and close to 65.

(c) If all 100 PST values were overestimated by approximately the same small positive value due to a
poorly calibrated measuring device, the histogram of the accurate values would have approximately
the same shape as the histogram of the overestimated values. However, the center (peak) of the
histogram would be shifted to the left by the difference between the overestimated values and the
accurate values. The mean and the median would also be shifted to the left by that difference, but
the standard deviation and the interquartile range would not be affected (they would be the same as
the values obtained for the overestimated PST values).
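
The effect of a constant measurement offset can be checked numerically. The sketch below is only an illustration: the eight readings and the 0.5 lb offset are hypothetical values, not the lab data, and the quartile method is one of several in common use.

```python
import statistics

# Hypothetical pull strengths (lb) as recorded by the miscalibrated device.
recorded = [61.2, 63.0, 64.3, 64.3, 65.1, 66.8, 67.4, 69.0]
offset = 0.5  # hypothetical constant overestimate (lb)

# Accurate values: every reading shifted down by the same amount.
accurate = [x - offset for x in recorded]


def iqr(values):
    """Interquartile range using the 'inclusive' quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Measures of center move by the offset ...
mean_shift = statistics.mean(recorded) - statistics.mean(accurate)
median_shift = statistics.median(recorded) - statistics.median(accurate)

# ... while measures of spread stay the same.
sd_change = statistics.stdev(recorded) - statistics.stdev(accurate)
iqr_change = iqr(recorded) - iqr(accurate)

print(f"mean shift:   {mean_shift:.3f}")
print(f"median shift: {median_shift:.3f}")
print(f"SD change:    {sd_change:.3f}")
print(f"IQR change:   {iqr_change:.3f}")
```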


Question 2

(a) The summary statistics for the pull strengths obtained with the Descriptive Statistics tool are
displayed below:


Summary Statistics

Mean 64.859
Standard Error 0.29214323
Median 64.45
Mode 64.3
Standard Deviation 2.921432297
Sample Variance 8.534766667
Kurtosis 0.566577167
Skewness 0.282186648
Range 16.3
Minimum 58.2
Maximum 74.5
Sum 6485.9
Count 100
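
Several entries in the table are tied together by simple formulas (mean = sum / count, variance = SD squared, standard error = SD / sqrt(count), range = maximum - minimum). As a rough consistency check, the sketch below recomputes them from the other table entries; the variable names are ours, not Excel's.

```python
import math

# Values copied from the Summary Statistics table above.
count = 100
total = 6485.9
sd = 2.921432297
minimum, maximum = 58.2, 74.5

mean = total / count                 # reported as 64.859
variance = sd ** 2                   # reported as 8.534766667
std_error = sd / math.sqrt(count)    # reported as 0.29214323
value_range = maximum - minimum      # reported as 16.3

print(f"mean           = {mean:.3f}")
print(f"variance       = {variance:.6f}")
print(f"standard error = {std_error:.6f}")
print(f"range          = {value_range:.1f}")
```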


(b) The Paste Function feature applied to our data returns the following values of the first quartile, the
third quartile, and the interquartile range:

First Quartile Q1 = 63.175
Third Quartile Q3 = 66.800
Interquartile range = 3.625

(c) As the distribution of pull strengths is approximately symmetric, the mean and standard deviation
are appropriate measures of center and variation. The median and the interquartile range are used
for skewed distributions.

Question 3

According to the 1.5*IQR criterion, an outlier is any data point that lies below Q1 - 1.5*IQR or above
Q3 + 1.5*IQR. Taking into account the values of the lower and upper quartiles, and the interquartile
range obtained in Question 2, an outlier lies below 57.7375 or above 72.2375. There is only one observation
that satisfies the condition, the value of 74.5, the largest observation in the data set.
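
The arithmetic behind the two cut-off values can be written out as a short sketch. It uses only the quartiles reported in Question 2 and the minimum and maximum from the summary tables; the helper name is_outlier is ours.

```python
# Quartiles reported in Question 2.
q1 = 63.175
q3 = 66.800

iqr = q3 - q1                  # interquartile range
lower_fence = q1 - 1.5 * iqr   # observations below this are flagged
upper_fence = q3 + 1.5 * iqr   # observations above this are flagged


def is_outlier(x):
    """Apply the 1.5*IQR criterion to a single observation."""
    return x < lower_fence or x > upper_fence


print(f"IQR = {iqr:.3f}")
print(f"lower fence = {lower_fence:.4f}, upper fence = {upper_fence:.4f}")

# Extremes of the data: the minimum, the maximum, and the second-largest value
# (the maximum once 74.5 is removed, per the second summary table).
for value in (58.2, 74.5, 71.6):
    print(value, "outlier" if is_outlier(value) else "not an outlier")
```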

The outlier 74.5 lies far above the main body of the data. Thus we expect that the mean and the standard
deviation of the remaining 99 observations would decrease. We do not expect a significant change in the
value of the median.

The summary statistics for the data without the outlier are displayed below:


Summary Statistics (Outlier Removed)

Mean 64.76161616
Standard Error 0.278230661
Median 64.4
Mode 64.3
Standard Deviation 2.768360123
Sample Variance 7.66381777
Kurtosis -0.109386988
Skewness 0.002956345
Range 13.4
Minimum 58.2
Maximum 71.6
Sum 6411.4
Count 99


The table confirms the conclusions we have reached before.
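
The new mean and standard deviation can also be recovered from the original summary statistics alone, without the raw data. The sketch below assumes the usual sample-variance formula (divisor n - 1) and rebuilds the sum of squares from the reported variance and mean; it is a cross-check, not the way Excel computes the table.

```python
# Values from the original Summary Statistics table (all 100 observations).
n = 100
total = 6485.9
variance = 8.534766667
outlier = 74.5

# Sum of squares recovered from: variance = (sum_sq - n * mean^2) / (n - 1).
mean = total / n
sum_sq = (n - 1) * variance + n * mean ** 2

# Drop the outlier from the count, the sum, and the sum of squares.
n_new = n - 1
total_new = total - outlier
sum_sq_new = sum_sq - outlier ** 2

mean_new = total_new / n_new
variance_new = (sum_sq_new - n_new * mean_new ** 2) / (n_new - 1)
sd_new = variance_new ** 0.5

print(f"mean without outlier     = {mean_new:.5f}")
print(f"variance without outlier = {variance_new:.5f}")
print(f"SD without outlier       = {sd_new:.5f}")
```

Both the mean and the standard deviation come out smaller than for the full data set, in line with the second table.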




Question 4

In order to convert all 100 PST measurements to kilograms, it is necessary to multiply each value in the
column PST by 0.454. As a consequence, the new mean and the new median can also be obtained by
multiplying the mean and the median of the measurements expressed in pounds by 0.454.
Moreover, given the formula for the standard deviation and the above, the new standard deviation can be
obtained from the standard deviation for the original data by multiplying it by 0.454. The interquartile
range for the data in kilograms is likewise equal to the interquartile range for the data on the original
scale of measurement multiplied by 0.454.

The histogram for the data expressed in kilograms will have the same shape as the histogram obtained in
Question 1. The peak of the new histogram will be at approximately 65*0.454 = 29.51 kg.
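
As a small illustration of the rescaling, the sketch below applies the factor 0.454 to the summary values reported in Question 2. The dictionary layout and names are ours; only the numbers come from the tables above.

```python
LB_TO_KG = 0.454  # conversion factor used in this solution

# Summary values in pounds, as reported in Question 2.
summaries_lb = {
    "mean": 64.859,
    "median": 64.45,
    "standard deviation": 2.921432297,
    "interquartile range": 3.625,
}

# Multiplying every observation by a positive constant multiplies each of
# these summaries by the same constant.
summaries_kg = {name: value * LB_TO_KG for name, value in summaries_lb.items()}

for name, value in summaries_kg.items():
    print(f"{name}: {value:.4f} kg")
```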

Question 5

In order to answer the question whether the new ozone-friendly cleaning process produces similarly strong or
stronger solder-joints, on the average, we look at the summary statistics for the distribution. The mean of the
pull strengths obtained is 64.761616, and it is almost identical to the mean of pull strengths for the old
technology (64.8). The small difference is due to sampling variability. Thus the new technology produces
solder-joints of similar strength, on the average.

Now we compare the variability of the two processes. The standard deviation for the old technology is 2.25 lb.
This value is smaller than the value of 2.7683 lb obtained in Question 3 (after excluding the outlier). Given the
large sample size that the new standard deviation is based on (99), it is safe to conclude that the new process
results in slightly higher variability than the old process. More advanced statistical methods are required to
determine whether the difference is statistically significant. The new process can be examined thoroughly to
determine whether some sources of extra variation can be eliminated.


Question 6

The histogram of electrical resistance for the 100 boards is displayed below:

[Figure: Histogram of Electrical Resistances. Horizontal axis: electrical resistance, bins from 0.2 to 3 and "More"; vertical axis: Frequency, 0 to 25.]



(a) The histogram is one-peaked and skewed to the right. Most of the observations lie between 0 and
1, but there are several observations outside the range. The right tail is longer than the left tail of
the distribution. There is one outlier.

(b) As the distribution is skewed, the median and the interquartile range are appropriate measures of
center and spread, respectively.

Question 7

The scatterplot of electrical resistance (RES) versus pull strength (PST) displays the relationship between
the two variables. It allows you to assess the type of relationship (linear, nonlinear), direction (positive,
negative), and its strength.

(a) The scatterplot for the data is displayed below:

[Figure: Scatterplot of RES vs. PST. Horizontal axis: Pull Strength (in pounds), 55 to 75; vertical axis: Electrical Resistance (in teraohms), 0.00 to 3.50.]

(b) There is no clear pattern in the plot. It seems that the points in the plot are randomly scattered.
However, it is worth noting a substantial difference in the variation of pull strength values for
low electrical resistance values relative to that for the high electrical resistance values. There
are no obvious outliers in the plot.


LAB 1 ASSIGNMENT MARKING SCHEMA


Proper Header and appearance: 10 points

1. Correctly formatted histogram: 6 points
(a) Analysis of the shape of the histogram: 3 points
(b) Center (estimates of the mean and the median): 2 points
(c) Histogram of accurate measurements: 2 points
Mean, Median, standard deviation and IQR of accurate values: 2 points

2. Summary Statistics:
(a) Descriptive Statistics output (mean, median, standard deviation, IQR): 4 points
(b) First Quartile, Third Quartile, IQR: 3 points
(c) Discussion of appropriateness: 2 points

3. Determining the lower and upper range for outliers: 2 points
Identifying the outlier: 2 points
Effect of removing the outlier on some summary statistics: 3 points

4. Effect of expressing the PST values in kilograms on summaries: 2 points
Effect of expressing the PST values in kilograms on histogram: 2 points

5. Comparing the average strength of resistors: 2 points
Comparing the variability of the two processes: 2 points

6. Correctly formatted histogram: 6 points
(a) Analysis of the shape of the histogram: 3 points
(b) Numerical measures to describe typical resistance and the spread: 2 points

7. Relationship between pull strengths and resistance
(a) Discussion of the pattern in the scatterplot: 3 points
Outliers: 1 point
(b) Correctly formatted scatterplot: 6 points

TOTAL = 70