
6-1 Numerical Summaries

Definition: Sample Mean



Example 6-1


Figure 6-1 The sample mean as a balance point for a system of weights.



Population Mean

For a finite population with N measurements, the mean is

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

The sample mean is a reasonable estimate of the population mean.
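As a rough illustration of the sample mean and its balance-point interpretation, here is a minimal Python sketch. The function name `sample_mean` and the four measurements are hypothetical, chosen only for illustration; they are not the data from Example 6-1.

```python
def sample_mean(observations):
    """Return the sum of the observations divided by their count n."""
    if not observations:
        raise ValueError("need at least one observation")
    return sum(observations) / len(observations)


# Hypothetical measurements, used only to illustrate the calculation.
sample = [12.0, 13.5, 12.5, 14.0]
x_bar = sample_mean(sample)

# Balance-point property: the deviations from the mean sum to zero.
deviations = [x - x_bar for x in sample]
deviation_total = sum(deviations)

print(x_bar)
print(deviation_total)
```

The zero total of the deviations is what makes the mean the balance point of the observations when each is treated as an equal weight.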



Definition: Sample Variance



How Does the Sample Variance Measure Variability?

Figure 6-2 How the sample variance measures variability through the deviations $x_i - \bar{x}$.



Example 6-2




Computation of $s^2$



Population Variance

When the population is finite and consists of N values, we may define the population variance as

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

The sample variance is a reasonable estimate of the population variance.
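The following Python sketch compares the usual definition of the sample variance (squared deviations divided by n - 1) with the algebraically equivalent shortcut that uses the sum and the sum of squares, alongside the population variance with divisor N. The function names and the four data values are hypothetical, and the formulas are the standard textbook forms, which is an assumption since the slides' own equations are not shown here.

```python
def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Equivalent form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def population_variance(xs):
    """Treat xs as the whole finite population: divide by N, not N - 1."""
    n = len(xs)
    if n < 1:
        raise ValueError("need at least one value")
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


# Hypothetical measurements, used only to illustrate the calculation.
data = [12.0, 13.5, 12.5, 14.0]
s2 = sample_variance(data)
s2_short = sample_variance_shortcut(data)
sigma2 = population_variance(data)
print(s2, s2_short, sigma2)
```

The shortcut form avoids computing each deviation, but it can lose precision when the values are large relative to their spread, so the definitional form is the safer default in floating-point code.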



Definition

6-2 Stem-and-Leaf Diagrams

Steps for Constructing a Stem-and-Leaf Diagram



Example 6-4




Figure 6-4 Stem-and-leaf diagram for the compressive strength data in Table 6-2.



Example 6-5


Figure 6-5 Stem-and-leaf displays for Example 6-5. Stem: Tens digits. Leaf: Ones digits.

Try your own stem-and-leaf plot with the following temperatures for plastic extrusion:

77 80 82 68 65 59 61 57 50 62 61 70 69 64 67 70 62 65 65 73 76 87 80 82 83 79 79 71 80 77

Data Features

The median is a measure of central tendency that divides the data into two equal parts, half below the median and half above. If the number of observations is even, the median is halfway between the two central values. From Fig. 6-6, the 40th and 41st ordered values of strength are 160 and 163, so the median is (160 + 163)/2 = 161.5. If the number of observations is odd, the median is the central value.

The range is a measure of variability that can be easily computed from the ordered stem-and-leaf display. It is the maximum minus the minimum measurement. From Fig. 6-6, the range is 245 - 76 = 169.
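A minimal Python sketch of the median and range rules follows. The function names are hypothetical, and the four-value list is only a stand-in built from the numbers quoted above (the two central values and the two extremes); the full 80-observation data set is not reproduced here.

```python
def median(values):
    """Central value for odd n; average of the two central values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def data_range(values):
    """Maximum minus minimum measurement."""
    return max(values) - min(values)


# Stand-in data: extremes and central pair taken from the text above.
stand_in = [76, 160, 163, 245]
med = median(stand_in)
rng = data_range(stand_in)
print(med, rng)
```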

When an ordered set of data is divided into four equal parts, the division points are called quartiles. The first or lower quartile, q1, is a value that has approximately one-fourth (25%) of the observations below it and approximately 75% of the observations above. The second quartile, q2, has approximately one-half (50%) of the observations below its value. The second quartile is exactly equal to the median. The third or upper quartile, q3, has approximately three-fourths (75%) of the observations below its value. As in the case of the median, the quartiles may not be unique.

The compressive strength data in Figure 6-6 contains n = 80 observations. Minitab software calculates the first and third quartiles as the (n + 1)/4 and 3(n + 1)/4 ordered observations and interpolates as needed. For example, (80 + 1)/4 = 20.25 and 3(80 + 1)/4 = 60.75. Therefore, Minitab interpolates between the 20th and 21st ordered observations to obtain q1 = 143.50 and between the 60th and 61st ordered observations to obtain q3 = 181.00.
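The (n + 1)/4 rule with linear interpolation can be sketched in Python as below. This is an illustration of the rule as described above, not Minitab's actual code; the function names are hypothetical, and the integers 1 to 80 are stand-in data chosen so that each interpolated value equals its position.

```python
import math


def ordered_value_at(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    n = len(ordered)
    position = min(max(position, 1), n)  # keep the position inside 1..n
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Linear interpolation between the two neighbouring ordered observations.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def quartiles(values):
    """First and third quartiles at positions (n + 1)/4 and 3(n + 1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = ordered_value_at(ordered, (n + 1) / 4)
    q3 = ordered_value_at(ordered, 3 * (n + 1) / 4)
    return q1, q3


# Stand-in data with n = 80, so the positions are 20.25 and 60.75.
stand_in = list(range(1, 81))
q1, q3 = quartiles(stand_in)
print(q1, q3)
```

With n = 80 the fractional parts 0.25 and 0.75 are the interpolation weights between the 20th and 21st and between the 60th and 61st ordered observations.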

The interquartile range is the difference between the upper and lower quartiles, and it is sometimes used as a measure of variability. In general, the 100kth percentile is a data value such that approximately 100k% of the observations are at or below this value and approximately 100(1 - k)% of them are above it.

Temperatures

Tens | Ones
5 | 0 7 9
6 | 1 1 2 2 4 5 5 5 7 8 9
7 | 0 0 1 3 6 7 7 9 9
8 | 0 0 0 2 2 3 7

Begin with the lowest temperature. The lowest temperature of the month was 50, so enter 5 in the tens column and 0 in the ones column. The next lowest temperature is 57, so enter 7 in the ones column; next is 59, so enter 9. Now find all of the temperatures in the 60s, 70s, and 80s, and enter them in increasing order until your stem-and-leaf plot contains all of the data.
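The procedure above can be sketched in Python as follows, using the 30 extrusion temperatures given earlier. The function name `stem_and_leaf` is hypothetical, and the sketch assumes two-digit whole-number data so that the tens digit is the stem and the ones digit is the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each tens digit (stem) to its ones digits (leaves) in increasing order."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(stems)


temperatures = [
    77, 80, 82, 68, 65, 59, 61, 57, 50, 62,
    61, 70, 69, 64, 67, 70, 62, 65, 65, 73,
    76, 87, 80, 82, 83, 79, 79, 71, 80, 77,
]

plot = stem_and_leaf(temperatures)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Sorting the values first means each row of leaves comes out already ordered, which is what makes the median and range easy to read off the display.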

6-4 Box Plots


The box plot is a graphical display that simultaneously describes several important features of a data set, such as center, spread, departure from symmetry, and identification of observations that lie unusually far from the bulk of the data.

Key terms: whisker, outlier, extreme outlier.


Figure 6-13 Description of a box plot.


Figure 6-15 Comparative box plots of a quality index at three plants.

The following numbers are the strengths of fifteen weld specimens, arranged from least to greatest:

18 27 34 52 54 59 61 68 78 82 85 87 91 93 100

Construct the box plot and write your inference.

First, find the median. The median is the value exactly in the middle of an ordered set of numbers.* Here, 68 is the median.

Next, consider only the values to the left of the median: 18 27 34 52 54 59 61. Find the median of this set of numbers. Remember, the median is the value exactly in the middle of an ordered set of numbers. Thus 52 is the median of the values below the overall median, and it is therefore the lower quartile.

52 is the lower quartile.

Now consider only the values to the right of the median: 78 82 85 87 91 93 100. Find the median of this set of numbers. The median of this set, 87, is the upper quartile.

87 is the upper quartile.

(*If you are finding the median of an ordered set with an even number of values, take the average of the two middle numbers. For example, for 3, 5, 7, and 10, add the two middle numbers: 5 + 7 = 12. Divide 12 by 2 to get the average: 12 / 2 = 6. Therefore 6 is the median of the ordered set 3, 5, 7, and 10.)

You are now ready to find the interquartile range (IQR). The interquartile range is the difference between the upper quartile and the lower quartile. In our case, IQR = 87 - 52 = 35. The IQR is a useful measurement because it is less influenced by extreme values: it limits the range to the middle 50% of the values.

35 is the interquartile range.
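The weld-strength calculation can be checked with a short Python sketch that finds the quartiles as medians of the lower and upper halves, as done above. The outlier limits are an assumption not stated in this example: the sketch uses the common convention that points more than 1.5 IQR beyond a quartile are outliers and points more than 3 IQR beyond are extreme outliers. The function names are hypothetical.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def box_plot_summary(values):
    """Quartiles by the median-of-halves method, plus assumed 1.5/3 IQR limits."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For odd n the overall median itself is left out of both halves.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    q1, q2, q3 = median(lower_half), median(ordered), median(upper_half)
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    extreme = [x for x in ordered if x < outer_low or x > outer_high]
    outliers = [
        x for x in ordered
        if (x < inner_low or x > inner_high) and x not in extreme
    ]
    # Whiskers end at the most extreme points still inside the inner limits.
    whisker_low = min(x for x in ordered if x >= inner_low)
    whisker_high = max(x for x in ordered if x <= inner_high)
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "whiskers": (whisker_low, whisker_high),
        "outliers": outliers, "extreme_outliers": extreme,
    }


weld_strengths = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]
summary = box_plot_summary(weld_strengths)
print(summary)
```

Under these assumed limits, none of the fifteen weld strengths is flagged, so the whiskers would run from the minimum to the maximum.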

6-5 Time Sequence Plots


A time series or time sequence is a data set in which the observations are recorded in the order in which they occur. A time series plot is a graph in which the vertical axis denotes the observed value of the variable (say x) and the horizontal axis denotes the time (which could be minutes, days, years, etc.). When measurements are plotted as a time series, we often see trends, cycles, or other broad features of the data.


Figure 6-16 Company sales by year (a) and by quarter (b).

6-6 Probability Plots


Probability plotting is a graphical method for determining whether sample data conform to a hypothesized distribution based on a subjective visual examination of the data.

Probability plotting typically uses special graph paper, known as probability paper, that has been designed for the hypothesized distribution. Probability paper is widely available for the normal, lognormal, Weibull, and various chi-square and gamma distributions.



Example 6-7



Example 6-7 (continued)



Figure 6-19 Normal probability plot for battery life.



Figure 6-20 Normal probability plot obtained from standardized normal scores.
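One way to obtain standardized normal scores for such a plot is sketched below in Python. It assumes the plotting position (j - 0.5)/n for the j-th ordered observation, which is a common choice but is not stated in the text above, and it uses hypothetical battery-life values rather than the data from Example 6-7. The function name `normal_scores` is also hypothetical.

```python
from statistics import NormalDist


def normal_scores(values):
    """Pair each ordered observation with the z value of its plotting position."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    pairs = []
    for j, x in enumerate(ordered, start=1):
        cumulative = (j - 0.5) / n  # assumed plotting position
        pairs.append((x, standard_normal.inv_cdf(cumulative)))
    return pairs


# Hypothetical battery-life measurements in hours, for illustration only.
hours = [181, 195, 188, 202, 176, 210, 192, 199, 185, 206]
scores = normal_scores(hours)
for x, z in scores:
    print(f"{x:5d}  {z:6.2f}")
```

Plotting each observation against its z value on ordinary axes plays the role of normal probability paper: points that fall close to a straight line are consistent with a normal distribution.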

Example: calculating a z score

The scores on a class test have a mean of 70 and a standard deviation of 12. Assuming the scores are approximately normally distributed, what percentage of students is expected to score more than 85? The z score for the given data is z = (85 - 70)/12 = 1.25.

Use of z score: From the z score table, the fraction of the data below this z score is 0.8944. This means that 89.44% of the students scored 85 or less, and hence the percentage of students who scored above 85 is (100 - 89.44)% = 10.56%. Hence the required percentage is 10.56%.
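The table lookup can be reproduced with Python's standard library, as in the sketch below. It assumes the test scores follow a normal distribution, which the example implies by using the z table; the variable names are illustrative.

```python
from statistics import NormalDist

mean_score = 70
std_dev = 12
cutoff = 85

# Standardize the cutoff score.
z = (cutoff - mean_score) / std_dev

# Area under the standard normal curve to the left of z.
fraction_below = NormalDist().cdf(z)

# Percentage of students expected to score above the cutoff.
percent_above = 100 * (1 - fraction_below)

print(f"z = {z:.2f}")
print(f"fraction below = {fraction_below:.4f}")
print(f"percent above = {percent_above:.2f}%")
```

The computed area replaces the printed z table, so the same three lines work for any cutoff, mean, and standard deviation.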

Area under the curve

Right Skew - If the plotted points appear to bend up and to the left of the normal line, that indicates a long tail to the right.

Left Skew - If the plotted points bend down and to the right of the normal line, that indicates a long tail to the left.

Short Tails - An S-shaped curve indicates shorter than normal tails, i.e., less variance than expected.

Long Tails - A curve that starts below the normal line, bends to follow it, and ends above it indicates long tails. That is, you are seeing more variance than you would expect in a normal distribution.
