MECH 215 Instrumentation and Measurement
Week 3, Lecture 1
Data Analysis Statistics and Probability

January 18, 2009 Page 1


Data Analysis Stats. and Prob.

Measurements taken repeatedly (even under identical conditions) will show variations. Sources of variation include:
Measurement system: resolution and repeatability
Procedures and techniques
Measured variable: temporal and spatial variation


For a set of measurements we need to quantify:
1. A single representative value (the average)
2. A measure of the variation in the data
3. How well these estimates represent the true average and variation


A sample of data is a set of repeated measurements taken under fixed operating conditions.
Note 1: "Fixed" may not mean exactly fixed; the conditions are held nominally constant.
Note 2: Systematic errors are assumed to be negligible. Consider only random errors for now.


Because all measured results are estimates of the true values, we can only estimate the true value $x'$ using
$$x' = \bar{x} \pm u_x \quad (P\%)$$
$x'$ = true value
$\bar{x}$ = most probable estimate of $x'$ based on the available data
$u_x$ = the uncertainty interval
$(P\%)$ = the probability level


Note: $u_x$ combines both random error and systematic error. We are ignoring systematic error for now but will include it later (Week 5, Chapter 5).

Because all measurement samples vary with respect to the true value, a set of measured samples will display a spread of values clustering about the true value.


N individual measurements: $x_i$, $i = 1, 2, 3, \ldots, N$

[Figure: single-axis representation of $x_i$]
The true value $x'$ lies in the middle of the clump.


To display these data as a histogram, we divide the range of values into K small intervals, one for each bin of the histogram.
For small N, choose K so that at least one bin contains 5 or more measurements.
For intermediate values of N, $K = 1.87(N - 1)^{0.40} + 1$
For large values of N, $K = N^{1/2}$
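The two bin-count rules can be sketched as small helper functions. This is a minimal sketch: the function names are hypothetical, and rounding K to the nearest integer is an assumption, since the slide does not say how to round. The slide also does not give the N at which "intermediate" becomes "large", so the two rules are kept separate.

```python
import math


def histogram_bin_count(n: int) -> int:
    """Suggested K for intermediate N: K = 1.87 (N - 1)^0.40 + 1, rounded."""
    if n < 2:
        raise ValueError("need at least two measurements")
    return round(1.87 * (n - 1) ** 0.40 + 1)


def histogram_bin_count_large(n: int) -> int:
    """Suggested K for large N: K = N^(1/2), rounded."""
    return round(math.sqrt(n))


print(histogram_bin_count(20))         # 7  (1.87 * 19**0.40 + 1 is about 7.07)
print(histogram_bin_count_large(400))  # 20
```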


Using the data from Table 4.1:
N = 20
$K = 1.87(20 - 1)^{0.40} + 1 = 7.1$, so K = 7 bins
Minimum value = 0.68
Maximum value = 1.34
$$\text{bin width} = \frac{1.34 - 0.68}{7} = \frac{0.66}{7} \approx 0.094$$
Therefore a bin width of 0.10 is chosen.
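The bin-width arithmetic above can be reproduced with a short helper. It is a sketch under two assumptions not stated on the slide: the raw width is rounded up to a multiple of a convenient step (0.05 here, which gives the 0.10 chosen above), and the bins start at the minimum value. Rounding up, rather than to the nearest step, guarantees that the K bins span the whole range. The name `bin_width` is hypothetical.

```python
import math


def bin_width(x_min, x_max, k, step=0.05):
    """Return (raw width, width rounded up to a multiple of `step`)."""
    raw = (x_max - x_min) / k
    chosen = round(math.ceil(raw / step) * step, 10)  # round() trims float noise
    return raw, chosen


raw, chosen = bin_width(0.68, 1.34, 7)
print(f"raw width = {raw:.3f}, chosen width = {chosen:.2f}")  # raw width = 0.094, chosen width = 0.10

# Bin edges, assuming the first bin starts at the minimum value
edges = [round(0.68 + j * chosen, 2) for j in range(8)]
print(edges)
```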

This histogram is an estimate of the probability density function of the data set.

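p(x) = probability density function
As $N \to \infty$ and $\delta x \to 0$,
$$p(x) = \lim_{N \to \infty,\ \delta x \to 0} \frac{n_j}{N\,(2\,\delta x)}$$
where $n_j$ is the number of samples in bin j and $2\,\delta x$ is the bin width.
p(x) describes how likely a measured variable is to take on values near any particular x in an individual measurement; the probability that a value falls within an interval is the area under p(x) over that interval.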
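The finite-N version of this limit, $n_j / (N \cdot \text{bin width})$, can be sketched as follows. The data are simulated with `random.gauss` (mean 1.0 and standard deviation 0.15 are arbitrary, hypothetical choices, not Table 4.1), and equal-width bins spanning the minimum to maximum value are an assumption. Because each bin's density times its width is $n_j / N$, the total area under the estimate is 1.

```python
import random


def histogram_density(data, k):
    """Estimate p(x) at each bin centre as n_j / (N * bin_width).

    Assumes the data are not all identical (otherwise the width would be zero).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / k                      # bin width, i.e. 2 * delta_x
    counts = [0] * k
    for x in data:
        j = min(int((x - lo) / width), k - 1)  # the maximum value goes in the last bin
        counts[j] += 1
    n = len(data)
    centres = [lo + (j + 0.5) * width for j in range(k)]
    densities = [c / (n * width) for c in counts]
    return centres, densities, width


rng = random.Random(1)  # simulated data only
data = [rng.gauss(1.0, 0.15) for _ in range(1000)]
centres, p, w = histogram_density(data, 12)
print(sum(pj * w for pj in p))  # total area: 1 up to round-off
```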

Normal (Gaussian) Distribution
Log-Normal Distribution
Poisson Distribution
Weibull Distribution
Binomial Distribution
Student's t Distribution
$\chi^2$ (Chi-Squared) Distribution
Uniform Distribution
Beta Distribution


In the absence of systematic errors, the true mean value $x'$ of a set of measurements is
$$x' = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\,dt$$
For discrete data,
$$x' = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i$$


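The width of the probability density function reflects the data variation.
$$\text{variance} = \sigma^2 = \lim_{T \to \infty} \frac{1}{T} \int_0^T [x(t) - x']^2\,dt$$
For discrete data,
$$\sigma^2 = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} (x_i - x')^2$$
Standard deviation: $\sigma = \sqrt{\sigma^2}$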
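The discrete mean and variance formulas can be sketched in Python. This is a minimal illustration under two assumptions: the list of values is made up, and because the true mean $x'$ is unknown in practice, the mean of the list stands in for it. The 1/N divisor matches the limiting formula above; it is not the N − 1 divisor used later for finite samples. `population_stats` is a hypothetical name.

```python
import math


def population_stats(x):
    """Mean, variance and standard deviation with the 1/N divisor.

    These mirror the discrete formulas above, which hold in the limit
    N -> infinity; for a finite list they only approximate x' and sigma.
    """
    n = len(x)
    mean = sum(x) / n
    variance = sum((xi - mean) ** 2 for xi in x) / n
    return mean, variance, math.sqrt(variance)


print(population_stats([2, 4, 4, 4, 5, 5, 7, 9]))  # (5.0, 4.0, 2.0)
```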
Data Analysis Infinite Stats.

Normal / Gaussian distribution (bell curve)
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\frac{(x - x')^2}{\sigma^2}\right]$$
$x'$ = true mean of x
$\sigma^2$ = true variance of x
The probability that any future measurement will fall within a stated interval is the area under the p(x) curve over that interval.
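To make "probability = area under p(x)" concrete, the sketch below evaluates the Gaussian p(x) and integrates it numerically over an interval. Composite Simpson's rule is my choice of integration method, not something the slides specify, and the mean 10.0 and σ = 1.0 are hypothetical values. For the interval $x' \pm \sigma$ the result should match the closed-form value $\mathrm{erf}(1/\sqrt{2}) \approx 0.6827$.

```python
import math


def normal_pdf(x, mean, sigma):
    """Gaussian p(x) with true mean x' = mean and true standard deviation sigma."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def prob_between(x1, x2, mean, sigma, n=1000):
    """Area under p(x) from x1 to x2 by composite Simpson's rule (n must be even)."""
    h = (x2 - x1) / n
    total = normal_pdf(x1, mean, sigma) + normal_pdf(x2, mean, sigma)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * normal_pdf(x1 + i * h, mean, sigma)
    return total * h / 3


# Probability that a measurement lies within one sigma of the mean
print(prob_between(9.0, 11.0, mean=10.0, sigma=1.0))  # about 0.6827
```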


$$P(x' - \delta x \le x \le x' + \delta x) = \int_{x' - \delta x}^{x' + \delta x} p(x)\,dx$$
If we define the standardized normal variate for any x as
$$\beta = \frac{x - x'}{\sigma}$$
and we specify an interval on p(x) using
$$z_1 = \frac{x_1 - x'}{\sigma}$$


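Then $dx = \sigma\,d\beta$, and
$$P(-z_1 \le \beta \le z_1) = \frac{1}{\sqrt{2\pi}} \int_{-z_1}^{z_1} e^{-\beta^2/2}\,d\beta$$
And because p(x) is symmetrical about $x'$,
$$P(-z_1 \le \beta \le z_1) = 2 \times \frac{1}{\sqrt{2\pi}} \int_{0}^{z_1} e^{-\beta^2/2}\,d\beta$$
The one-sided integral $\frac{1}{\sqrt{2\pi}} \int_0^{z_1} e^{-\beta^2/2}\,d\beta$ is the normal error function (see Table 4.3).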
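Table 4.3 values can be checked without a table by using Python's `math.erf`, through the identity $\frac{1}{\sqrt{2\pi}}\int_0^{z_1} e^{-\beta^2/2}\,d\beta = \tfrac{1}{2}\,\mathrm{erf}(z_1/\sqrt{2})$. This is a sketch, and the function names are hypothetical. The two-sided probability uses the symmetry step above.

```python
import math


def normal_error_function(z1):
    """One-sided integral (1/sqrt(2*pi)) * int_0^z1 exp(-beta^2/2) d(beta).

    Uses the identity: that integral equals 0.5 * erf(z1 / sqrt(2)).
    """
    return 0.5 * math.erf(z1 / math.sqrt(2))


def two_sided_probability(z1):
    """P(-z1 <= beta <= z1), using the symmetry of p(x) about x'."""
    return 2 * normal_error_function(z1)


for z1 in (1.0, 2.0, 3.0):
    print(f"z1 = {z1}: one-sided {normal_error_function(z1):.4f}, "
          f"two-sided {100 * two_sided_probability(z1):.2f}%")
```

This prints one-sided values of 0.3413, 0.4772 and 0.4987, and two-sided values of 68.27%, 95.45% and 99.73%. The 68.26% often quoted for $z_1 = 1$ comes from doubling the rounded table entry 0.3413.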


The above are statements of probability.
Integrating p(x) over $x' \pm z_1\sigma$ for $z_1 = 1.0$ yields 68.26% of the total area under p(x). That means there is a 68.26% chance that a measurement of x will fall within $x' \pm 1.0\sigma$.
For $z_1 = 2.0$: 95.45% of the area under p(x) lies within $x' \pm 2.0\sigma$.
For $z_1 = 3.0$: 99.73% of the area under p(x) lies within $x' \pm 3.0\sigma$.
Data Analysis Finite Stats.

Finite statistics describe only the behavior of the finite data set.
Recall that the mean value is
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
Sample variance is
$$S_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$$
Sample standard deviation is $S_x = \sqrt{S_x^2}$
Deviation of $x_i$ is $(x_i - \bar{x})$
Degrees of freedom of the data set is $v = N - 1$
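The finite-sample formulas translate directly into code. This is a minimal sketch with a made-up list of readings and a hypothetical function name. For comparison, Python's `statistics.variance` and `statistics.stdev` use the same N − 1 divisor.

```python
import math


def sample_stats(x):
    """Sample mean, variance (N - 1 divisor), standard deviation, degrees of freedom."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(x) / n
    deviations = [xi - mean for xi in x]           # x_i - x_bar
    s2 = sum(d * d for d in deviations) / (n - 1)  # sample variance S_x^2
    return mean, s2, math.sqrt(s2), n - 1


mean, s2, s, v = sample_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(mean, round(s2, 4), round(s, 4), v)  # 5.0 4.5714 2.1381 7
```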

Even though finite statistics do not represent the true statistics, they can still be used to predict useful results using
$$x_i = \bar{x} \pm t_{v,P} S_x \quad (P\%)$$
$t_{v,P}$ is given in Table 4.4 and is known as the t estimator.
$\pm t_{v,P} S_x$ is a precision interval at probability P%. It is used to predict the range within which a measured sample will fall at that probability.
See Example 4.4
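A precision interval is simple arithmetic once $t_{v,P}$ has been looked up. In this sketch the mean and $S_x$ are illustrative numbers, not results from Table 4.1 or Example 4.4. The t value 2.093 is the standard two-sided 95% entry for v = 19 (N = 20). `precision_interval` is a hypothetical helper, and $t_{v,P}$ is passed in rather than computed because the standard library has no t distribution.

```python
def precision_interval(mean, s_x, t_vp):
    """Return (low, high) for x_bar +/- t_{v,P} * S_x.

    At probability P%, a single new measurement is expected to fall in this range.
    t_vp must come from a t table (e.g. Table 4.4) for v = N - 1 and the chosen P.
    """
    half_width = t_vp * s_x
    return mean - half_width, mean + half_width


# Illustrative numbers only: N = 20, so v = 19; two-sided 95% t value is 2.093
low, high = precision_interval(mean=1.00, s_x=0.17, t_vp=2.093)
print(f"{low:.3f} to {high:.3f} (95%)")
```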

The range over which the possible values of the true mean might lie at a given probability level P, based on the information from a given data set, is
$$\bar{x} \pm t_{v,P} S_{\bar{x}} \quad (P\%)$$
This is the confidence interval at probability P%.


The confidence interval is a measure of the random error in the estimate of the true mean of x.
The estimate of the true mean based on a finite data set is then
$$x' = \bar{x} \pm t_{v,P} S_{\bar{x}} \quad (P\%)$$
The standard deviation of the means, $S_{\bar{x}}$, is a precision indicator for the mean value estimated from a finite data set.
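The confidence-interval estimate of the true mean can be sketched as follows, using $S_{\bar{x}} = S_x/\sqrt{N}$ (the standard deviation of the means given in the slides). The readings are hypothetical. The value 2.365 is the standard two-sided 95% t entry for v = 7 and is supplied by hand, and `mean_confidence_interval` is a hypothetical name.

```python
import math


def mean_confidence_interval(x, t_vp):
    """Return (low, high) for x_bar +/- t_{v,P} * S_xbar, with S_xbar = S_x / sqrt(N)."""
    n = len(x)
    mean = sum(x) / n
    s_x = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))  # sample std dev
    s_xbar = s_x / math.sqrt(n)                                   # std dev of the means
    return mean - t_vp * s_xbar, mean + t_vp * s_xbar


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical readings: N = 8, v = 7
low, high = mean_confidence_interval(data, t_vp=2.365)  # t for v = 7, P = 95%
print(f"interval for the true mean: {low:.2f} to {high:.2f} (95%)")
```

Compared with the precision interval $\pm t_{v,P} S_x$ for a single reading, this interval is narrower by a factor of $\sqrt{N}$.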


By plotting the sample standard deviations for many data sets we can generate $p(\chi^2)$, the chi-squared distribution.
For a normal distribution,
$$\chi^2 = \frac{v S_x^2}{\sigma^2}$$
$v = N - 1$ = degrees of freedom
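One way to see $p(\chi^2)$ emerge is to draw many normal data sets with a known σ, compute $\chi^2 = v S_x^2/\sigma^2$ for each, and look at the resulting values. The sketch below does that with simulated data. σ = 0.5, N = 10 and the number of sets are hypothetical choices. A histogram of `chis` approximates $p(\chi^2)$. As a quick check, the average of the values should come out near v, which is the mean of the χ² distribution with v degrees of freedom.

```python
import random
import statistics


def chi_squared_statistic(sample, sigma):
    """chi^2 = v * S_x^2 / sigma^2 with v = N - 1 (statistics.variance uses N - 1)."""
    v = len(sample) - 1
    return v * statistics.variance(sample) / sigma ** 2


rng = random.Random(42)   # simulated data only
sigma, n = 0.5, 10        # hypothetical population sigma and set size, so v = 9
chis = [chi_squared_statistic([rng.gauss(0.0, sigma) for _ in range(n)], sigma)
        for _ in range(5000)]
print(statistics.mean(chis))  # close to v = 9
```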


The $\chi^2$ distribution estimates the discrepancy in measurements due to random error.
See Table 4.5 for values of $\chi^2$.


Standard Deviation of the Means
If many sets of samples are compared, their distributions will fall around the true distribution.

The standard deviation of the means is
$$S_{\bar{x}} = \frac{S_x}{\sqrt{N}}$$
The statistics of several finite data sets will differ from one another and from the true values because no finite sample set contains all the information about the measured variable.
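The meaning of $S_{\bar{x}}$ can be checked by simulation. Draw many data sets of size N from a population with known σ, take each set's mean, and measure how much those means scatter. The scatter should be close to $\sigma/\sqrt{N}$, the population counterpart of $S_x/\sqrt{N}$. The population values and set counts below are hypothetical, and the data are simulated.

```python
import math
import random
import statistics

rng = random.Random(7)              # simulated data only
true_mean, sigma, n = 1.0, 0.2, 16  # hypothetical population and set size

means = []
for _ in range(2000):
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    means.append(statistics.mean(sample))

spread_of_means = statistics.stdev(means)  # observed scatter of the x_bar values
predicted = sigma / math.sqrt(n)           # sigma / sqrt(N) = 0.05
single_set_estimate = statistics.stdev(sample) / math.sqrt(n)  # S_x / sqrt(N) from one set
print(spread_of_means, predicted, single_set_estimate)
```

The single-set estimate `S_x / sqrt(N)` also scatters around 0.05, but more widely, because it is based on only one set of N readings.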


Example 4.4

Next Time

Regression Analysis
