This discussion is extracted from the chapter in Silent Risk on meta-probability.
arXiv:1603.07532v3 [stat.AP] 8 Apr 2016

Abstract: We present an exact probability distribution (meta-distribution) for p-values across ensembles of statistically identical phenomena, as well as the distribution of the minimum p-value among m independent tests. We derive the distribution [...] regardless of the sample size n, and vary greatly across repetitions of exactly the same protocols under identical stochastic copies of the phenomenon; such volatility makes the minimum p-value diverge significantly from the "true" one. Setting the power is shown to offer little remedy unless sample size is increased markedly or the p-value is lowered by at least one order of magnitude.
The formulas allow the investigation of the stability of the reproduction of results and "p-hacking" and other aspects of meta-analysis.
From a probabilistic standpoint, neither a p-value of .05 nor a "power" at .9 appears to make the slightest sense.

Fig. 1. The different values for Equ. 1 showing convergence to the limiting distribution. [Curves for n=5, n=10, n=15; horizontal axis p, 0.00 to 0.20.]
ASSUME that we know the "true" p-value, $p_s$: what would its realizations look like across various attempts on statistically identical copies of the phenomena? By true value $p_s$, we mean its expected value by the law of large numbers across an $m$ ensemble of possible samples for the phenomenon under scrutiny, that is $\frac{1}{m} \sum_{i \le m} p_i \xrightarrow{P} p_s$ (where $\xrightarrow{P}$ denotes convergence in probability). A similar convergence argument can also be made for the corresponding "true median" $p_M$. The main result of the paper is that the distribution of $n$ small samples can be made explicit (albeit with special inverse functions), as well as its parsimonious limiting one for $n$ large, with no other parameter than the median value $p_M$. We were [...]

[...] an in-sample .01 p-value has a significant probability of having a true value > .3.

So clearly we don't know what we are talking about when we talk about p-values.

Earlier attempts for an explicit meta-distribution in the literature were found in [1] and [2], though for situations of Gaussian subordination and less parsimonious parametrization. The severity of the problem of significance of the so-called "statistically significant" has been discussed in [3], and a remedy via Bayesian methods was offered in [4], which in fact recommends the same tightening of standards to p-values ≈ .01. But the gravity of the extreme skewness of the distribution [...]
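To make the opening question concrete, here is a minimal simulation sketch, not the paper's method. It draws repeated samples from one fixed Gaussian phenomenon and records the one-tailed p-value of each. It assumes a z-test with known unit variance rather than the Student T setting of the derivations; the function name `simulate_p_values`, the seed, and the parameter values are illustrative choices.

```python
import random
from statistics import NormalDist, median

STD_NORMAL = NormalDist()


def simulate_p_values(true_median_p, n, trials, seed=0):
    """One-tailed z-test p-values over `trials` independent samples of size n.

    Assumption (not from the text): Gaussian data with known unit variance,
    so the test statistic is exactly N(zeta_bar, 1).
    """
    rng = random.Random(seed)
    # Shift of the test statistic whose median p-value equals true_median_p.
    zeta_bar = STD_NORMAL.inv_cdf(1.0 - true_median_p)
    mu = zeta_bar / n ** 0.5  # per-observation mean, tested against 0
    p_values = []
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        zeta = sample_mean * n ** 0.5  # (m_v - m_h) / (sigma / sqrt(n)), m_h = 0
        p_values.append(1.0 - STD_NORMAL.cdf(zeta))
    return p_values


if __name__ == "__main__":
    ps = simulate_p_values(true_median_p=0.10, n=30, trials=20_000)
    print("median p-value:  ", round(median(ps), 3))
    print("share below .05: ", round(sum(p < 0.05 for p in ps) / len(ps), 3))
    print("share above .30: ", round(sum(p > 0.30 for p in ps) / len(ps), 3))
```

Every sample comes from the same phenomenon, yet the realized p-values spread over much of [0, 1], which is the volatility described in the abstract.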
N. N. Taleb 1
FAT TAILS RESEARCH PROGRAM
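where $\lambda_p = I^{-1}_{2p}\left(\frac{n}{2}, \frac{1}{2}\right)$, $\lambda_{p_M} = I^{-1}_{1-2p_M}\left(\frac{1}{2}, \frac{n}{2}\right)$, $\lambda'_p = I^{-1}_{2p-1}\left(\frac{1}{2}, \frac{n}{2}\right)$, and $I^{-1}_{(.)}(.,.)$ is the inverse beta regularized function.

Remark 1. For $p = \frac{1}{2}$ the distribution doesn't exist in theory, but does in practice, and we can work around it with the sequence $p_{m_k} = \frac{1}{2} \pm \frac{1}{k}$, as in the graph showing a convergence to the Uniform distribution on [0, 1] in Figure 3. Also note that what is called the "null" hypothesis is effectively a set of measure 0.

Proof: Let $Z$ be a random normalized variable with realizations $\zeta$, from a vector $\vec{v}$ of $n$ realizations, with sample mean $m_v$ and sample standard deviation $s_v$, $\zeta = \frac{m_v - m_h}{s_v / \sqrt{n}}$ (where $m_h$ is the level it is tested against), hence assumed $\sim$ Student T with $n$ degrees of freedom, and, crucially, supposed to deliver a mean of $\bar{\zeta}$,

$$f(\zeta; \bar{\zeta}) = \frac{\left(\frac{n}{(\bar{\zeta} - \zeta)^2 + n}\right)^{\frac{n+1}{2}}}{\sqrt{n}\, B\left(\frac{n}{2}, \frac{1}{2}\right)}$$

where $B(.,.)$ is the standard beta function. Let $g(.)$ be the one-tailed survival function of the Student T distribution with zero [...]

[...] where erfc(.) is the complementary error function and $\operatorname{erfc}(.)^{-1}$ its inverse.

The limiting CDF $\Phi(.)$:

$$\Phi(k; p_M) = \frac{1}{2} \operatorname{erfc}\left(\operatorname{erf}^{-1}(1 - 2k) - \operatorname{erf}^{-1}(1 - 2p_M)\right) \quad (3)$$

Proof: For large $n$, the distribution of $Z = \frac{m_v}{s_v / \sqrt{n}}$ becomes that of a Gaussian, and the one-tailed survival function $g(.) = \frac{1}{2} \operatorname{erfc}\left(\frac{\zeta}{\sqrt{2}}\right)$, $\zeta(p) \to \sqrt{2}\, \operatorname{erfc}^{-1}(2p)$.

[Figure (caption not recovered): PDF/frequency of p-value realizations, annotated "53% of realizations < .05", "25% of realizations < .01", and "5% p-value cutpoint (true mean)".]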
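The limiting CDF in Equ. 3 can be evaluated with the Python standard library alone. The sketch below assumes the identity $\operatorname{erf}^{-1}(1 - 2x) = z_{1-x} / \sqrt{2}$, where $z_q$ is the standard normal quantile, which turns Equ. 3 into a single normal CDF evaluation. `limiting_cdf` is a hypothetical helper name, and the values of `p_median` are illustrative.

```python
from statistics import NormalDist

_N = NormalDist()


def limiting_cdf(k, p_median):
    """Limiting CDF of Equ. 3: P(realized p-value < k) for a 'true' median p_median.

    (1/2) erfc(erf^{-1}(1-2k) - erf^{-1}(1-2 p_M)) equals
    Phi_N(z_{1-p_M} - z_{1-k}) with Phi_N the standard normal CDF.
    """
    if not (0.0 < k < 1.0 and 0.0 < p_median < 1.0):
        raise ValueError("k and p_median must lie strictly between 0 and 1")
    return _N.cdf(_N.inv_cdf(1.0 - p_median) - _N.inv_cdf(1.0 - k))


if __name__ == "__main__":
    for p_median in (0.05, 0.12):
        print(
            f"p_M={p_median}: "
            f"P(p<.05)={limiting_cdf(0.05, p_median):.3f}, "
            f"P(p<.01)={limiting_cdf(0.01, p_median):.3f}"
        )
```

Evaluating the function at `k = p_median` returns one half, which is a quick check that $p_M$ is indeed the median of the limiting distribution.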
[Figure (caption not recovered): curves labeled .025, .1, .15, 0.5; horizontal axis p, 0.0 to 1.0.]

Fig. 4. The "p-hacking" value across m trials for $p_M = .15$ and $p_s = .22$. [Horizontal axis: m trials, 2 to 14.]
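A rough way to see the effect of keeping the best of m attempts is to compute the expected minimum p-value under the Gaussian limiting CDF of Equ. 3. This sketch assumes m independent tests, so that $P(\min > p) = (1 - \Phi(p; p_M))^m$, and integrates that survival function numerically with a midpoint rule. It is not expected to reproduce the finite-sample values plotted in Fig. 4; `expected_min_p`, `limiting_cdf`, and the step count are illustrative choices.

```python
from statistics import NormalDist

_N = NormalDist()


def limiting_cdf(k, p_median):
    """Equ. 3 rewritten with the standard normal CDF and quantile."""
    return _N.cdf(_N.inv_cdf(1.0 - p_median) - _N.inv_cdf(1.0 - k))


def expected_min_p(m, p_median, steps=4000):
    """E[min of m independent p-values] = integral over (0, 1) of (1 - CDF(p))^m."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h  # midpoint rule keeps p strictly inside (0, 1)
        total += (1.0 - limiting_cdf(p, p_median)) ** m
    return total * h


if __name__ == "__main__":
    for m in (1, 2, 5, 10, 14):
        print(f"m={m:2d}  expected min p-value = {expected_min_p(m, 0.15):.4f}")
```

With m = 1 the integral is simply the mean of the limiting distribution; as m grows, the expected minimum falls well below the median $p_M$, which is the mechanism behind "p-hacking" by repetition.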
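[...] usefully calculated as:

$$\varphi(p; p_M) = \sqrt{2\pi}\, p_M \sqrt{\log\left(\frac{1}{2\pi p_M^2}\right)}\; e^{\sqrt{-\log\left(2\pi \log\left(\frac{1}{2\pi p^2}\right)\right) - 2\log(p)}\; \sqrt{-\log\left(2\pi \log\left(\frac{1}{2\pi p_M^2}\right)\right) - 2\log(p_M)}} + O(p^2). \quad (4)$$

The approximation works more precisely for the band of relevant values $0 < p < \frac{1}{2\pi}$.

[...]

Proposition 4. Let $\beta_c$ be the projection of the power of the test from the realizations assumed to be Student T distributed and evaluated under the parameter $\theta$. We have

$$\Phi(\beta_c) = \begin{cases} \Phi(\beta_c)_L & \text{for } \beta_c < \frac{1}{2} \\ \Phi(\beta_c)_H & \text{for } \beta_c > \frac{1}{2} \end{cases}$$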
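The small-p approximation of Equ. 4 can be checked numerically against the density obtained by differentiating the limiting CDF of Equ. 3 with respect to k, which gives $e^{-a(a - 2b)}$ with $a = \operatorname{erfc}^{-1}(2p_M)$ and $b = \operatorname{erfc}^{-1}(2p)$. The sketch below is a sanity check under that reading of the two formulas. The helper names are hypothetical, and the inverse erfc is computed from the standard normal quantile rather than from a dedicated special-function library.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def erfc_inv(x):
    """Inverse complementary error function via the normal quantile."""
    return _N.inv_cdf(1.0 - x / 2.0) / math.sqrt(2.0)


def limiting_density(p, p_median):
    """Derivative in k of the limiting CDF (Equ. 3), evaluated at k = p."""
    a = erfc_inv(2.0 * p_median)
    b = erfc_inv(2.0 * p)
    return math.exp(-a * (a - 2.0 * b))


def _tail_term(p):
    # -log(2*pi*log(1/(2*pi*p^2))) - 2*log(p), i.e. the small-p expansion
    # of 2 * erfc_inv(2p)^2 that appears under each square root in Equ. 4.
    inner = math.log(1.0 / (2.0 * math.pi * p * p))
    return -math.log(2.0 * math.pi * inner) - 2.0 * math.log(p)


def small_p_density(p, p_median):
    """Approximation of Equ. 4 (leading term only, O(p^2) remainder dropped)."""
    prefactor = math.sqrt(2.0 * math.pi) * p_median * math.sqrt(
        math.log(1.0 / (2.0 * math.pi * p_median ** 2))
    )
    return prefactor * math.exp(math.sqrt(_tail_term(p) * _tail_term(p_median)))


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05):
        exact = limiting_density(p, 0.05)
        approx = small_p_density(p, 0.05)
        print(f"p={p}: exact={exact:.3f}  approx={approx:.3f}")
```

Both `p` and `p_median` should stay inside the band stated with Equ. 4 so that every logarithm and square root remains real.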
ACKNOWLEDGMENT
Marco Avellaneda, Pasquale Cirillo, Yaneer Bar-Yam,
friendly people on twitter ...
REFERENCES
[1] H. J. Hung, R. T. O'Neill, P. Bauer, and K. Kohne, "The behavior of the p-value when the alternative hypothesis is true," Biometrics, pp. 11-22, 1997.
[2] H. Sackrowitz and E. Samuel-Cahn, "P values as random variables - expected p values," The American Statistician, vol. 53, no. 4, pp. 326-331, 1999.
[3] A. Gelman and H. Stern, "The difference between 'significant' and 'not significant' is not itself statistically significant," The American Statistician, vol. 60, no. 4, pp. 328-331, 2006.
[4] V. E. Johnson, "Revised standards for statistical evidence," Proceedings of the National Academy of Sciences, vol. 110, no. 48, pp. 19 313-19 317, 2013.
[5] O. S. Collaboration et al., "Estimating the reproducibility of psychological science," Science, vol. 349, no. 6251, p. aac4716, 2015.