MECH215 Week03Lecture1 DataAnalysis

MECH 215 Instrumentation and
Measurement
Week 3, Lecture 1
Data Analysis Statistics and Probability
January 18, 2009 Page 1

Data Analysis Stats. and Prob.
Measurements taken repeatedly (even under identical conditions) will show variations.
Sources of variation include:
Measurement system: resolution and repeatability
Procedures and techniques: repeatability
Measured variable: temporal and spatial variation

For a set of measurements we need to quantify:
1. A single representative value (average)
2. A measure of the variation in the data
3. How well these estimates represent the true average and variation

A sample of data refers to a set of repeated measurements taken under fixed operating conditions.
Note 1: Fixed may not mean exactly fixed.
Note 2: Systematic errors are considered to be negligible. Consider only random errors for now.

Because all the measured results will be estimates of the true values, we can only estimate the true value $x'$ using
$x' = \bar{x} \pm u_x \; (P\%)$
$x'$ = true value
$\bar{x}$ = most probable estimate of $x'$ based on available data
$\pm u_x$ = the uncertainty interval
$P\%$ = the probability level

Note: $u_x$ combines both random error and systematic error. We are ignoring systematic error for now, but will include it later (week 5, chapter 5).
Because all measurement samples vary with respect to the true value, a set of measured samples will display a spread of values clustering about the true value.

Data Analysis
N individual measurements: $x_i$, $i = 1, 2, 3, \ldots, N$
Single axis representation of $x_i$
The true value of x is in the middle of the clump.

Using a histogram to display this data, we need to choose K small intervals (bins) for the histogram.
For small N, the number of measurement results in at least one bin should be >= 5.
For intermediate values of N, $K = 1.87 (N - 1)^{0.40} + 1$
For large values of N, $K = N^{1/2}$

Using data from Table 4.1:
N = 20
$K = 1.87 (20 - 1)^{0.40} + 1 = 7.1$, so K = 7 bins are used.
Minimum value = 0.68
Maximum value = 1.34
$2\delta x = (1.34 - 0.68) / 7 = 0.66 / 7 \approx 0.094$, rounded up to 0.10
Therefore a bin width $2\delta x$ of 0.10 is chosen.
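The bin-count arithmetic above can be checked with a short script. This is only a sketch: the function name `bin_count` is an illustrative choice, and the only numbers used are N and the minimum and maximum values quoted from Table 4.1.

```python
def bin_count(n):
    """Intermediate-N estimate of the number of bins: K = 1.87 (N - 1)^0.40 + 1."""
    return 1.87 * (n - 1) ** 0.40 + 1


n = 20                       # number of measurements (from the slide)
x_min, x_max = 0.68, 1.34    # extreme values quoted from Table 4.1

k_estimate = bin_count(n)            # roughly 7.07, quoted as 7.1 on the slide
k = round(k_estimate)                # use a whole number of bins
raw_width = (x_max - x_min) / k      # roughly 0.094 before rounding to 0.10

print(f"K estimate = {k_estimate:.2f}, K = {k}, raw bin width = {raw_width:.3f}")
```

Rounding the raw width up to 0.10 gives convenient bin edges while still covering the full range of the data.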
This histogram is an estimate of the data set's probability density function.
p(x) = probability density function
As $N \to \infty$ and the bin width $2\delta x \to 0$,
$p(x) = \lim_{N \to \infty,\ \delta x \to 0} \frac{n_j}{N (2\delta x)}$
$n_j$ is the number of samples in bin j.

p(x) defines the probability that a measured variable might assume any particular value upon any individual measurement.
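As a rough illustration of the density estimate, the sketch below divides each bin count by N times the bin width. The sample values are synthetic (drawn from a normal distribution with an arbitrary mean and spread), and the helper name `histogram_density` is hypothetical; none of this comes from Table 4.1.

```python
import math
import random


def histogram_density(samples, x_min, width, k):
    """Estimate p(x) in each of k equal-width bins as n_j / (N * width)."""
    counts = [0] * k
    for x in samples:
        j = math.floor((x - x_min) / width)   # index of the bin containing x
        if 0 <= j < k:                        # ignore values outside the bins
            counts[j] += 1
    n = len(samples)
    return [c / (n * width) for c in counts]


random.seed(0)
# Synthetic, hypothetical data: 2000 draws with mean 1.0 and sigma 0.15.
samples = [random.gauss(1.0, 0.15) for _ in range(2000)]

x_min, width, k = 0.0, 0.1, 20
density = histogram_density(samples, x_min, width, k)

# The estimated density should integrate to about 1 over the binned range.
area = sum(d * width for d in density)
print(f"area under histogram density = {area:.3f}")
```

With more samples and narrower bins, the bar heights approach the smooth p(x) curve.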


Normal (Gaussian) distribution

Log Normal Distribution
Poisson Distribution
Weibull Distribution
Binomial Distribution
Student t Distribution
Chi-Squared ($\chi^2$) Distribution
Uniform Distribution
Beta Distribution

In the absence of systematic errors, the true mean value, $x'$, of a set of measurements is
$x' = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, dt$
For discrete data,
$x' = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i$

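The width of the probability density function reflects the data variation.
variance $= \sigma^2 = \lim_{T \to \infty} \frac{1}{T} \int_0^T [x(t) - x']^2\, dt$
For discrete data,
$\sigma^2 = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} (x_i - x')^2$
Standard Deviation $= \sigma = \sqrt{\sigma^2}$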
Data Analysis Infinite Stats.
Normal / Gaussian distribution (bell curve)

$p(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \frac{(x - x')^2}{\sigma^2} \right]$

$x'$ = true mean of x
$\sigma^2$ = true variance of x
The probability that any future measurement will fall within any stated interval is the value of the area below the p(x) curve.

$P(x' - \delta x \le x \le x' + \delta x) = \int_{x' - \delta x}^{x' + \delta x} p(x)\, dx$
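If we define the standardized normal variate for any x as
$\beta = (x - x') / \sigma$
and we specify an interval on p(x) using
$z_1 = (x_1 - x') / \sigma$
then $dx = \sigma\, d\beta$ and
$P(-z_1 \le \beta \le z_1) = \frac{1}{\sqrt{2\pi}} \int_{-z_1}^{z_1} e^{-\beta^2 / 2}\, d\beta$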
And because p(x) is symmetrical about $x'$,
$P(-z_1 \le \beta \le z_1) = 2 \left[ \frac{1}{\sqrt{2\pi}} \int_0^{z_1} e^{-\beta^2 / 2}\, d\beta \right]$

The bracketed term is the normal error function (see Table 4.3).


The above are statements of probability.

Integration of p(x) between $x' \pm z_1 \sigma$ for $z_1 = 1.0$ yields 68.26% of the total area under p(x).
That means there is a 68.26% chance that a measurement of x will fall within $x' \pm 1.0\sigma$.
For $z_1 = 2.0$: 95.45% of the area under p(x) lies within $x' \pm 2.0\sigma$.
For $z_1 = 3.0$: 99.73% of the area under p(x) lies within $x' \pm 3.0\sigma$.
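These percentages can be reproduced without a table. The two-sided integral of the standard normal density equals erf(z1 / sqrt(2)), which Python's standard library provides as `math.erf`. The function name `prob_within` below is an illustrative choice, not something from the lecture.

```python
import math


def prob_within(z1):
    """P(-z1 <= beta <= z1) for the standardized normal variate beta."""
    return math.erf(z1 / math.sqrt(2.0))


for z1 in (1.0, 2.0, 3.0):
    print(f"z1 = {z1}: {100 * prob_within(z1):.2f}% of the area under p(x)")
```

Direct evaluation gives 68.27% for z1 = 1.0. The 68.26% quoted above is consistent with doubling a four-digit table entry, so the small difference is rounding.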

Data Analysis Finite Stats.
Finite statistics describe only the behavior of the finite data set.
Recall that the mean value is
$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$
Sample Variance is
$S_x^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})^2$
Sample Standard Deviation is $S_x = \sqrt{S_x^2}$

Deviation of $x_i$ is $(x_i - \bar{x})$
Degrees of freedom of the data set is $\nu = N - 1$
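The finite-statistics definitions translate directly into code. The following is a minimal sketch; the five data values are hypothetical and chosen only to make the arithmetic easy to follow, and `sample_stats` is an illustrative name.

```python
import math


def sample_stats(xs):
    """Return (mean, sample variance, sample standard deviation, degrees of freedom)."""
    n = len(xs)
    mean = sum(xs) / n
    # Divide by N - 1 (the degrees of freedom), not N, for the sample variance.
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, variance, math.sqrt(variance), n - 1


data = [0.98, 1.07, 0.86, 1.16, 0.96]   # hypothetical measurements
mean, variance, std_dev, dof = sample_stats(data)

print(f"mean = {mean:.3f}, S_x^2 = {variance:.5f}, S_x = {std_dev:.4f}, nu = {dof}")
```

The division by N - 1 reflects that one degree of freedom is used up in computing the sample mean from the same data.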
Even though finite statistics do not represent the true statistics, they can still be used to predict useful results using
$x_i = \bar{x} \pm t_{\nu,P} S_x \; (P\%)$
$t_{\nu,P}$ is given in Table 4.4 and is known as the t estimator.
$\pm t_{\nu,P} S_x$ is a precision interval at probability P%. It is used to predict the probability of measured samples falling within any given value range.
See Example 4.4
The range over which the possible values of the true mean might lie at a given probability level, P, based on information from a given data set is given as
$\bar{x} \pm t_{\nu,P} S_{\bar{x}} \; (P\%)$
This is the confidence interval at the probability P%.

The confidence interval is a measure of the random error in the estimate of variable x.
The estimate of the true mean based on a finite data set is then
$x' = \bar{x} \pm t_{\nu,P} S_{\bar{x}} \; (P\%)$
The Standard Deviation of the means, $S_{\bar{x}}$, is a precision indicator for the mean value estimate based on a finite number of data sets.
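To see how the precision interval and the confidence interval differ in size, the sketch below evaluates both for one set of summary statistics. The mean and S_x values are hypothetical, the function name `intervals` is illustrative, and the t value 2.093 is the standard tabulated two-sided 95% value for 19 degrees of freedom. The standard deviation of the means is taken as S_x divided by the square root of N.

```python
import math


def intervals(s_x, n, t):
    """Return (precision interval, standard deviation of the means, confidence interval).

    precision interval  = t * S_x          (spread of individual measurements)
    confidence interval = t * S_x / sqrt(N) (uncertainty in the mean value)
    """
    s_mean = s_x / math.sqrt(n)
    return t * s_x, s_mean, t * s_mean


n = 20              # number of measurements, so nu = 19
t_19_95 = 2.093     # tabulated t estimator for nu = 19, P = 95%
mean, s_x = 1.02, 0.16   # hypothetical sample mean and sample standard deviation

precision, s_mean, confidence = intervals(s_x, n, t_19_95)

print(f"x_i = {mean:.2f} +/- {precision:.2f} (95%)")    # individual measurements
print(f"x'  = {mean:.2f} +/- {confidence:.3f} (95%)")   # true mean
```

The confidence interval is narrower than the precision interval by a factor of the square root of N, which is why averaging more measurements tightens the estimate of the true mean.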

By plotting the sample Standard Deviations for many data sets we can generate $p(\chi^2)$, the Chi-Squared Distribution.
For a normal distribution,
$\chi^2 = \nu S_x^2 / \sigma^2$
$\nu = N - 1$ = Degrees of freedom

The $\chi^2$ distribution estimates the discrepancy in measurements due to random error.

See Table 4.5 for values of $\chi^2$
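One way to get a feel for the chi-squared statistic is to simulate many data sets from a known normal distribution and compute the statistic for each. The sketch below does this with synthetic data; the true mean, sigma, sample size, and number of trials are all arbitrary assumptions. The average of the simulated values should land near the degrees of freedom, which is the mean of a chi-squared distribution.

```python
import random
import statistics


def chi_squared(sample, sigma):
    """Chi-squared statistic nu * S_x^2 / sigma^2 for one data set."""
    nu = len(sample) - 1
    return nu * statistics.variance(sample) / sigma ** 2


random.seed(1)
sigma, n, trials = 2.0, 10, 5000    # assumed true sigma, sample size, data sets

values = []
for _ in range(trials):
    sample = [random.gauss(5.0, sigma) for _ in range(n)]   # synthetic data set
    values.append(chi_squared(sample, sigma))

mean_chi2 = statistics.mean(values)
print(f"mean chi-squared over {trials} data sets = {mean_chi2:.2f} (nu = {n - 1})")
```

A histogram of `values` would approximate the chi-squared density for 9 degrees of freedom.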

Standard Deviation of Means
If many sets of samples are compared, these distributions will fall around the true distribution.
The Standard Deviation of the means is
$S_{\bar{x}} = S_x / \sqrt{N}$
The Standard Deviation of the means of several data sets will differ from the true Standard Deviation because the sample sets do not contain all the information.
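The square-root-of-N relationship can be illustrated by simulation. In this sketch, many synthetic data sets of size N are drawn from one normal distribution, and the observed scatter of their mean values is compared with sigma divided by the square root of N. All numerical settings are assumptions made for the illustration.

```python
import math
import random
import statistics

random.seed(2)
sigma, n, sets = 1.0, 25, 4000    # assumed true sigma, sample size, data sets

# Mean value of each synthetic data set of n measurements.
means = [
    statistics.mean([random.gauss(10.0, sigma) for _ in range(n)])
    for _ in range(sets)
]

observed = statistics.stdev(means)    # scatter of the sample means
predicted = sigma / math.sqrt(n)      # standard deviation of the means

print(f"observed = {observed:.3f}, predicted = {predicted:.3f}")
```

In practice the true sigma is unknown, so the sample standard deviation of a single data set stands in for it.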

Example 4.4


Next Time
Regression Analysis
