
Correlation and Causation

(don’t abuse the power of a plot)


But they are powerful
Don’t get too busy
And don’t be silly
Graphical Excellence

“… is that which gives to the viewer the greatest number of ideas in
the shortest time with the least ink in the smallest space”
The Best Example
His example
What about this?
Graphical Presentation of Data
(From Beckwith)
A graph should be used when it will convey information and portray
significant features more efficiently than words or tabulations.
Graphs should:
1) Require minimal effort from the reader in understanding and
interpreting the information they convey
2) The axes should have clear labels that name the quantity plotted,
its units, and its symbol
3) Axes should be clearly numbered and should have tick marks for
significant numerical divisions. Typically, ticks should appear in
increments of 1, 2, or 5 units. Not every tick need be numbered.
Too many will clutter the axis.
4) Use scientific notation to avoid placing too many digits on the
graph.
5) When plotting on logarithmic axes, place ticks at powers of 10 and
minor ticks at 10, 20, 50, 100, 200, etc.
6) Axes should usually include 0.
7) The choice in scales and proportions should be commensurate with
the relative importance of the variations shown in the results. Use
a square axis as a starting point.
8) Use symbols, not dots, for data points. Open symbols should be
used before closed (Beckwith’s opinion).
9) Either place error bars on the plot that indicate uncertainty or use
symbols that are the size of the uncertainty.
10) When several curves appear on the same plot, use different line
styles to distinguish them. Avoid using colors.
11) Minimize lettering on graphs
12) Labels on the axes and curves should be oriented to be read from the
bottom or from the right. Avoid forcing the reader to rotate the figure
to read it.
13) The graph should have a descriptive but concise title.
14) Software defaults are seldom what you want!
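Guideline 3 (ticks in increments of 1, 2, or 5 units) is easy to automate when the software default is not what you want. The sketch below is one possible way to pick such a step; the names `nice_step` and `nice_ticks` and the `max_ticks` parameter are illustrative assumptions, not something taken from Beckwith.

```python
import math

def nice_step(lo, hi, max_ticks=8):
    """Smallest step of the form 1, 2, or 5 times a power of ten that
    covers [lo, hi] with at most max_ticks intervals (assumed helper)."""
    span = hi - lo
    if span <= 0:
        raise ValueError("hi must be greater than lo")
    raw = span / max_ticks                      # ideal, but probably ugly, step
    exponent = math.floor(math.log10(raw))
    for k in (exponent, exponent + 1):
        for m in (1, 2, 5):
            step = m * 10.0 ** k
            if step >= raw * (1 - 1e-12):       # first "nice" step that is big enough
                return step

def nice_ticks(lo, hi, max_ticks=8):
    """Tick positions on multiples of the nice step that bracket [lo, hi]."""
    step = nice_step(lo, hi, max_ticks)
    first = math.floor(lo / step + 1e-9)
    last = math.ceil(hi / step - 1e-9)
    return [round(i * step, 12) for i in range(first, last + 1)]

print(nice_ticks(0, 0.1))   # a time axis spanning 0 to 0.1 s
print(nice_ticks(0, 35))    # a velocity axis spanning 0 to 35 m/s
```

Not every tick returned needs a number on the final plot; the list only fixes where the tick marks go.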

Bottom line: You want to communicate information to your reader. The
burden to get your point across falls to you. The chances of
successfully communicating your point are improved considerably
when you make it easy on the reader. Never think of your plot as
pretty graphics. If that is all it is, you should remove it.
Common Mistakes
• Grid lines
• Too many labels, not enough ticks
• Meaningless color
• Lines connecting data points
• Multiple plots to make a comparison in
which the scale of the plots is changed
Uncertainty
Instrument Performance Ratings
Accuracy: The difference between the measured value and the
actual value, reported as a maximum.
Precision: The difference between the instrument’s reported values
during repeated measurements of the same quantity.
Resolution: The smallest increment of change in the measured value
that can be determined from the instrument’s readout.
Usually similar to or smaller than the precision.
Sensitivity: The change in the output of an instrument per unit
change in the input.

Reading Errors (1/2 last digit)


Example--Endevco
3.3 Introduction to Uncertainty
The overall uncertainty of a measurement is a combination of the
bias uncertainty $B_x$ and the precision uncertainty $P_x$, which we put
together in the familiar root-sum-square sense:
$$U_x = \left(B_x^2 + P_x^2\right)^{1/2}$$
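As a quick numerical illustration of the combination above, here is a minimal sketch. The function name `total_uncertainty` and the bias and precision values are made up for the example.

```python
import math

def total_uncertainty(bias, precision):
    """Root-sum-square combination: U = sqrt(B**2 + P**2)."""
    return math.hypot(bias, precision)

# Hypothetical bias and precision uncertainties, in the same units.
B_x = 0.3
P_x = 0.4
U_x = total_uncertainty(B_x, P_x)
print(f"U_x = {U_x:.3f}")
```

Because the terms are squared before they are added, the larger of the two dominates: shrinking a precision uncertainty that is already well below the bias uncertainty barely changes the total.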
3.4 Estimation of Precision Uncertainty
3.4.1 Sample versus Population

Sample Must Be Random!


3.4.2 Probability Distributions

Distributions (pdf): characterizes the probability that an error
of a given size will occur.

Normal (Gaussian, bell curve)


Finite Samples
Estimate of the mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Estimate of the standard deviation $S_x$:
$$S_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left[x_i - \bar{x}\right]^2 = \frac{\left(\sum_{i=1}^{n} x_i^2\right) - n\bar{x}^2}{n-1}$$
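The two forms of the standard deviation estimate can be checked against each other numerically. This is a minimal sketch using only the Python standard library; the `readings` list is hypothetical data, and the function names are illustrative.

```python
import math
import statistics

def sample_mean(x):
    """Estimate of the mean: (1/n) * sum(x_i)."""
    return sum(x) / len(x)

def sample_std_definition(x):
    """S_x from the sum of squared deviations about the sample mean."""
    n, xbar = len(x), sample_mean(x)
    return math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))

def sample_std_shortcut(x):
    """S_x from the algebraically equivalent form (sum(x_i^2) - n*xbar^2)/(n-1)."""
    n, xbar = len(x), sample_mean(x)
    return math.sqrt((sum(xi * xi for xi in x) - n * xbar ** 2) / (n - 1))

# Hypothetical velocity readings in m/s.
readings = [13.9, 14.3, 13.1, 14.1, 14.0, 14.8]

print(sample_mean(readings))
print(sample_std_definition(readings))
print(sample_std_shortcut(readings))
print(statistics.stdev(readings))   # library version, also divides by n - 1
```

The shortcut form subtracts two large, nearly equal numbers, so in floating point it can lose accuracy when the mean is large compared with the spread; the definition form is the safer default.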

Based on a finite sample, we would like to:


1) Estimate the mean and standard deviation, and their uncertainty
2) Infer the distribution of the data (pdf).
Central Limit Theorem
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
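The relation between the scatter of the sample mean and σ/√n can be checked with a small simulation. The sketch below draws from a uniform (non-Gaussian) population as an assumed example; the function name, the seed, and the trial count are arbitrary choices.

```python
import math
import random
import statistics

def std_of_sample_means(n, trials=2000, seed=0):
    """Standard deviation of the means of `trials` independent samples of
    size n, each drawn from a uniform population on [0, 1]."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.uniform(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

sigma = 1.0 / math.sqrt(12.0)       # population std dev of uniform(0, 1)
for n in (4, 25, 100):
    observed = std_of_sample_means(n)
    predicted = sigma / math.sqrt(n)
    print(f"n={n:4d}  observed={observed:.4f}  sigma/sqrt(n)={predicted:.4f}")
```

The draws here are independent by construction. With correlated samples, the scatter of the mean shrinks more slowly than σ/√n.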
Confidence Intervals
Given a large but finite sample, we estimate the true mean as
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \approx \mu$$
We’d like to be able to say how sure we are of this estimate. Let’s look
at the probability that our estimate of the mean is within some bound.
We can say that there is a c% chance that our estimate of the mean lies
within
$$\mu \pm z_{c/2}\frac{\sigma}{\sqrt{n}}$$

The larger we make the confidence level c, the larger z becomes
(look at Table 3.2) and therefore the larger the range that our
measurement may land in. A wider dispersion in the population
(large σ) also makes this interval larger. Sampling more data
(increasing n) makes the interval smaller.
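These three trends can be seen numerically. The sketch below gets z from the standard library's `statistics.NormalDist` rather than from Table 3.2; the helper names `z_half` and `half_width` and the σ and n values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def z_half(c):
    """z value leaving a central area c (given as a fraction, e.g. 0.95)
    under the standard normal curve."""
    return NormalDist().inv_cdf(0.5 + c / 2.0)

def half_width(c, sigma, n):
    """Half-width z * sigma / sqrt(n) of the interval about the mean."""
    return z_half(c) * sigma / math.sqrt(n)

for c in (0.68, 0.95, 0.99):
    print(f"c={c:.2f}  z={z_half(c):.3f}")

# Hypothetical population standard deviation of 4 m/s.
for n in (30, 300, 3000):
    print(f"n={n:5d}  95% half-width={half_width(0.95, 4.0, n):.3f} m/s")
```

Because n enters under a square root, cutting the interval in half takes four times as many samples.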
This means that we are c% confident that the true mean µ lies within the
interval about our measurement:
$$\bar{x} - z_{c/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{c/2}\frac{\sigma}{\sqrt{n}}$$
The only trouble is that we don’t know the value of σ either. If n is
large enough, we can use our estimate $S_x$, so
$$\bar{x} - z_{c/2}\frac{S_x}{\sqrt{n}} < \mu < \bar{x} + z_{c/2}\frac{S_x}{\sqrt{n}}$$
Standard Error of the Sample Mean:
$$S_{\bar{x}} = \frac{S_x}{\sqrt{n}}$$
Example: Turbulent Jet Flow
[Figure: time trace of u [m/s] (axis 0 to 35) versus t [sec] (0 to 0.1)]
Histogram of the data
Data taken in the edge of the calibrator jet
[Figure: histogram of u [m/s], Count (0 to 1×10^4) versus Range (3 to 31)]
Statistic      u [m/s]
Minimum        3.017
Maximum        31.309999
Sum            1410552.8
Points         100000
Mean           14.105528
Median         13.931
RMS            14.727658
Std Deviation  4.2353468
Variance       17.938163
Std Error      0.013393343
Skewness       0.23356729
Kurtosis       -0.35308256
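The Std Error entry in the statistics above follows directly from the standard deviation and the number of points. The short check below uses the numbers from the slide; the 95% interval uses z = 1.96 and assumes the samples are independent, which is not guaranteed for this record.

```python
import math

# Values copied from the histogram statistics for the jet data.
n_points = 100_000
mean = 14.105528        # m/s
std_dev = 4.2353468     # m/s

# Standard error of the sample mean, S_x / sqrt(n).
std_error = std_dev / math.sqrt(n_points)

# Large-sample 95% interval for the true mean, valid only if the
# samples are independent (an assumption here).
z95 = 1.96
low, high = mean - z95 * std_error, mean + z95 * std_error

print(f"Std Error = {std_error:.9f} m/s")
print(f"95% interval for the mean: {low:.3f} to {high:.3f} m/s")
```

The computed standard error reproduces the tabulated 0.013393343 m/s, so the plotting package appears to use the same S_x/√n definition.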
Points outside 2σ
1180 points below 5.8
3021 points above 22.4
---------------------------------
4201 outside of 2 sigma
[Figure: time trace of u [m/s] (axis 0 to 35) versus t [sec] (0 to 0.1), shown with the same statistics as the histogram]
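The counts on this slide can be compared with what a Gaussian distribution would predict. The sketch below uses the slide's point counts and the standard library's normal CDF. Treating the data as Gaussian is only a reference assumption, since the reported skewness is not zero.

```python
from statistics import NormalDist

# Counts copied from the slide.
n_points = 100_000
below = 1180            # points below the lower 2-sigma bound
above = 3021            # points above the upper 2-sigma bound

observed_fraction = (below + above) / n_points

# Fraction a Gaussian population leaves outside +/- 2 sigma.
gaussian_fraction = 2.0 * (1.0 - NormalDist().cdf(2.0))

print(f"observed outside 2 sigma: {observed_fraction:.4f}")
print(f"Gaussian expectation:     {gaussian_fraction:.4f}")
print(f"high-side share of outliers: {above / (below + above):.2f}")
```

The total is close to the Gaussian figure, but the outliers are lopsided toward high velocity, which is consistent with the positive skewness reported for this record.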
Independent Samples?
Are these samples independent?
[Figure: time trace of u [m/s] (axis 5 to 30) versus t [sec] (0 to 0.01)]
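One common way to probe this question is the autocorrelation of the record: if neighboring samples are strongly correlated, they are not independent. The sketch below is an illustration on synthetic data, not the hot-wire trace; `ar1_series` is a hypothetical stand-in signal, and the lag-based test is one possible diagnostic rather than the method used in the lecture.

```python
import random

def autocorrelation(x, lag):
    """Normalized autocorrelation of sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

def ar1_series(n, rho, seed=0):
    """Synthetic signal where each point keeps a fraction rho of the last one."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, 1.0))
    return x

smooth = ar1_series(5000, rho=0.9)   # sampled much faster than the signal changes
white = ar1_series(5000, rho=0.0)    # every sample is a fresh draw

for name, series in (("smooth", smooth), ("white", white)):
    values = [autocorrelation(series, lag) for lag in (1, 5, 20)]
    print(name, [f"{v:.2f}" for v in values])
```

A lag-1 value near 1 means consecutive samples carry almost the same information; keeping only samples separated by a lag at which the autocorrelation has fallen close to zero gives a more nearly independent set.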
Uncertainty of the Mean
[Figure: six histograms of u [m/s] (Count versus Range) from the hwtrace record, each built from about 500 points]
Statistic      Sample 1      Sample 2      Sample 3      Sample 4      Sample 5      Sample 6
Minimum        4.934         4.7529998     4.158         5.0640001     3.925         7.1430001
Maximum        28.25         23.617001     25.896        25.41         26.697001     26.554001
Sum            6964.915      7167.129      6557.72       7052.453      7004.394      7394.187
Points         500           500           500           500           500           501
Mean           13.92983      14.334258     13.11544      14.104906     14.008788     14.758856
Median         14.0625       14.3925       12.526        13.7865       13.821        14.342
RMS            14.497875     14.751818     13.768174     14.734009     14.681991     15.232915
Std Deviation  4.0225087     3.4884854     4.193211      4.2636846     4.3992524     3.7744285
Variance       16.180577     12.16953      17.583019     18.179007     19.353422     14.24631
Std Error      0.17989206    0.15600981    0.1875261     0.19067777    0.19674055    0.16862903
Skewness       0.072202698   -0.06651041   0.48626292    0.18801474    0.23795119    0.4556875
Kurtosis       -0.35469733   -0.075114321  -0.1589086    -0.68577901   -0.53367869   -0.37815647
Uncertainty on the Mean
[Figure: two panels of Mean (axis 13 to 15) versus Last Sample (0 to 6000), with error bars]
Left panel: These are 1 σ error bars, meaning 68% of the error bars should contain the actual mean.
Right panel: These are 2 σ error bars, meaning 95% of the error bars should contain the actual mean.
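The same coverage idea can be tried on the six roughly 500-point samples whose statistics accompany the histograms on the preceding slide. The sketch below counts how many of their 1σ and 2σ error bars contain the mean of the full 100,000-point record, which is used here as a stand-in for the actual mean (an assumption). Six samples are far too few to test the 68% and 95% figures rigorously.

```python
# Mean of the full 100,000-point record, used as a proxy for the true mean.
full_record_mean = 14.105528

# (mean, standard error) of each ~500-point sample, copied from the slides.
samples = [
    (13.92983, 0.17989206),
    (14.334258, 0.15600981),
    (13.11544, 0.1875261),
    (14.104906, 0.19067777),
    (14.008788, 0.19674055),
    (14.758856, 0.16862903),
]

def count_covering(samples, target, k):
    """Number of samples whose mean +/- k standard errors contains target."""
    return sum(1 for mean, se in samples if abs(mean - target) <= k * se)

within_1_sigma = count_covering(samples, full_record_mean, 1)
within_2_sigma = count_covering(samples, full_record_mean, 2)

print(f"1-sigma bars containing the mean: {within_1_sigma} of {len(samples)}")
print(f"2-sigma bars containing the mean: {within_2_sigma} of {len(samples)}")
```

If the 2σ bars miss more often than about one time in twenty over many samples, one possible explanation is that the points within each sample are not independent, so S_x/√n understates the uncertainty of the mean.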


Confidence Intervals for Small Samples
We do not always have the luxury of taking large samples (n > 30).
For smaller sample sizes, we cannot assume that σ ≈ $S_x$. If we derive
the distribution of the quantity
$$t = \frac{\bar{x} - \mu}{S_x/\sqrt{n}}$$
assuming that the population is Gaussian,
we get the Student t-distribution.
Student’s t-distribution
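Small-Sample Confidence Interval
$$\alpha = 1 - c$$
$$\bar{x} - t_{\alpha/2,\nu}\frac{S_x}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\nu}\frac{S_x}{\sqrt{n}} \quad (c\%)$$
Recall that we are seeking an estimate of our precision uncertainty,
which will be part of our total uncertainty
$$U_x = \left(B_x^2 + P_x^2\right)^{1/2}$$
$$P_x = t_{\alpha/2,\nu}\frac{S_x}{\sqrt{n}} \quad (c\%)$$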
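Example 3.5
If we have 12 samples with an average of $\bar{x}$ and a standard
deviation of $S_x$, what is the 95% confidence interval for the true
value of µ?
$$\alpha = 1 - c = 0.05, \quad \nu = n - 1 = 11$$
$$\bar{x} - t_{\alpha/2,\nu}\frac{S_x}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\nu}\frac{S_x}{\sqrt{n}} \quad (c\%)$$
$$\bar{x} - 2.201\frac{S_x}{\sqrt{12}} < \mu < \bar{x} + 2.201\frac{S_x}{\sqrt{12}} \quad (95\%)$$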
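To put numbers on Example 3.5, the sketch below evaluates the interval for made-up values of the sample average and standard deviation. Only n = 12 and t = 2.201 come from the example; `t_interval` is an illustrative helper name, and the comparison with z = 1.96 shows what the large-sample formula would have given.

```python
import math

def t_interval(xbar, s_x, n, t_crit):
    """Interval xbar +/- t_crit * s_x / sqrt(n) for the true mean."""
    half = t_crit * s_x / math.sqrt(n)
    return xbar - half, xbar + half

# Hypothetical sample statistics; n and the t value are from Example 3.5.
xbar, s_x, n = 14.0, 4.0, 12
t_95 = 2.201            # t for alpha/2 = 0.025, nu = 11
z_95 = 1.96             # large-sample value, for comparison only

low_t, high_t = t_interval(xbar, s_x, n, t_95)
low_z, high_z = t_interval(xbar, s_x, n, z_95)

print(f"t-based 95% interval: {low_t:.2f} to {high_t:.2f}")
print(f"z-based 95% interval: {low_z:.2f} to {high_z:.2f}")
```

The t-based interval is wider, which reflects the extra uncertainty in using S_x from only 12 samples in place of σ.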
3.10 Error Propagation
More often than not, the quantity we are interested in measuring is a
function of a few variables. The book cites the example of
estimating volume flow rate by measuring the time it takes to fill a
bucket of known volume, Q = V/t. If you knew the uncertainty of V
and the uncertainty of t, how would you find the uncertainty of Q?

A good scientist measures dimensionless quantities, which are
always a function of several other variables.
$$C_p = \frac{\Delta P}{\tfrac{1}{2}\rho u^2}$$
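Error Propagation
$$x = f(y, z, \ldots)$$
$$x_i = f(y_i, z_i, \ldots)$$
$$u_x^2 \cong u_y^2\left(\frac{\partial x}{\partial y}\right)^2 + u_z^2\left(\frac{\partial x}{\partial z}\right)^2 + \cdots + 2u_{yz}\left(\frac{\partial x}{\partial y}\right)\left(\frac{\partial x}{\partial z}\right)$$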
This is the Error Propagation Formula. In the first two terms, the
uncertainty u is the uncertainty of a fundamental quantity while the
partial derivative comes from the relation between x and y, z.

The last term accounts for the extent to which fluctuations in y are
correlated with fluctuations in z. We will generally assume that this is
not significant. If y and z fluctuate independently, then this term is zero.
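When the partial derivatives are awkward to write out, they can be estimated numerically. The sketch below applies the formula without the covariance term to the bucket example Q = V/t, using central finite differences. The `propagate` helper, the step size, and all numerical values are assumptions made for illustration.

```python
import math

def propagate(f, values, uncertainties, rel_step=1e-6):
    """Uncertainty of f(*values), ignoring covariance between inputs.
    Partial derivatives are estimated by central finite differences."""
    total = 0.0
    for i, (v, u) in enumerate(zip(values, uncertainties)):
        h = abs(v) * rel_step or rel_step      # step scaled to the variable
        up = list(values)
        dn = list(values)
        up[i] = v + h
        dn[i] = v - h
        dfdv = (f(*up) - f(*dn)) / (2.0 * h)   # estimate of df/dv_i
        total += (u * dfdv) ** 2
    return math.sqrt(total)

def flow_rate(V, t):
    """Volume flow rate from bucket volume V and fill time t."""
    return V / t

# Hypothetical measurements: 10 L bucket known to 0.1 L, 20 s timed to 0.5 s.
V, u_V = 10.0, 0.1
t, u_t = 20.0, 0.5

Q = flow_rate(V, t)
u_Q = propagate(flow_rate, [V, t], [u_V, u_t])
print(f"Q = {Q:.3f} L/s, u_Q = {u_Q:.4f} L/s ({100 * u_Q / Q:.1f}%)")
```

In this made-up case the timing uncertainty (2.5% of t) outweighs the volume uncertainty (1% of V), so a better stopwatch would help more than a better bucket.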
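Propagation of Uncertainty
$$u_x^2 \cong u_y^2\left(\frac{\partial x}{\partial y}\right)^2 + u_z^2\left(\frac{\partial x}{\partial z}\right)^2 + \cdots + 2u_{yz}\left(\frac{\partial x}{\partial y}\right)\left(\frac{\partial x}{\partial z}\right)$$
We will ignore the covariance (last) term and assume that our
uncertainties behave like standard deviations. Thus
$$u_x^2 \cong u_y^2\left(\frac{\partial x}{\partial y}\right)^2 + u_z^2\left(\frac{\partial x}{\partial z}\right)^2 + \cdots$$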
Uncertainty Analysis Procedure
1) Find the functional form of what you will measure (e.g. Re = Ud/v).
2) Identify all variables to be measured (U, d, v).
3) For each of these quantities, determine the bias error based on
instrument specs and calibration information.
E.g., the velocity probe has an accuracy of 2% of reading (0.02U) or
perhaps 1% of full scale. The diameter is known to the resolution
of the measuring caliper, which is 0.001”.
4) For each of the quantities, if repeated measurements produce
different results, sample the quantity until the desired precision
uncertainty is obtained. ENSURE ALL SAMPLES ARE
INDEPENDENT. (If not, your precision error is larger than you
have estimated.) A desirable precision uncertainty is similar to the
bias uncertainty. Compute precision uncertainty to 95% confidence.
5) Root sum the bias and precision uncertainty for each quantity.
6) Propagate the uncertainty. If component uncertainties are provided
as percentages (relative uncertainties, as opposed to absolute
uncertainties), and if the functional form is multiplications,
divisions and powers, it may be convenient to write the propagation
equation in terms of relative uncertainties by dividing through by
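the function.
$$u_{Re}^2 = u_U^2\left(\frac{\partial Re}{\partial U}\right)^2 + u_d^2\left(\frac{\partial Re}{\partial d}\right)^2 + u_v^2\left(\frac{\partial Re}{\partial v}\right)^2$$
$$u_{Re}^2 = u_U^2\left(\frac{d}{v}\right)^2 + u_d^2\left(\frac{U}{v}\right)^2 + u_v^2\left(-\frac{dU}{v^2}\right)^2$$
$$\frac{u_{Re}^2}{Re^2} = \left(\frac{u_U}{U}\right)^2 + \left(\frac{u_d}{d}\right)^2 + \left(\frac{u_v}{v}\right)^2$$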
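The absolute and relative forms for the Reynolds number give the same answer, which is easy to confirm numerically. In the sketch below, the 2% of reading velocity accuracy and the 0.001 in caliper resolution come from the procedure; the operating point (U, d, v) and the 1% viscosity uncertainty are hypothetical values chosen only to make the example run.

```python
import math

def reynolds(U, d, v):
    """Reynolds number Re = U * d / v."""
    return U * d / v

def u_re_absolute(U, d, v, u_U, u_d, u_v):
    """Absolute uncertainty from the partial derivatives of Re."""
    return math.sqrt((u_U * d / v) ** 2
                     + (u_d * U / v) ** 2
                     + (u_v * U * d / v ** 2) ** 2)

def u_re_relative(U, d, v, u_U, u_d, u_v):
    """Relative uncertainty u_Re / Re from the relative input uncertainties."""
    return math.sqrt((u_U / U) ** 2 + (u_d / d) ** 2 + (u_v / v) ** 2)

# Hypothetical operating point in SI units.
U = 10.0                    # velocity, m/s
d = 0.0254                  # diameter, m (1 in)
v = 1.5e-5                  # kinematic viscosity, m^2/s

u_U = 0.02 * U              # 2% of reading, from the procedure
u_d = 0.001 * 0.0254        # 0.001 in caliper resolution, converted to m
u_v = 0.01 * v              # assumed 1% uncertainty in viscosity

Re = reynolds(U, d, v)
rel = u_re_relative(U, d, v, u_U, u_d, u_v)
absolute = u_re_absolute(U, d, v, u_U, u_d, u_v)
print(f"Re = {Re:.0f}")
print(f"u_Re/Re = {100 * rel:.2f}%  (absolute form gives {100 * absolute / Re:.2f}%)")
```

With these numbers the velocity term dominates the total, so the relative form makes it clear at a glance which measurement to improve first.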
