
This article reviews, without mathematics, the important principles governing the acquisition and use of normative data in electrodiagnostic medicine. Common flaws in neurophysiological normative data include vague clinical criteria for establishing freedom from disease, samples that are too small and inadequately stratified, and application of Gaussian statistics to non-Gaussian variables. Other problematic issues concern the trade-off between permissible false-positivity and false-negativity in defining the limits of normative from sample data, test-retest variability, and the use of multiple independent test measurements in each electrodiagnostic examination. The following standards for normative data are proposed: (1) standardized objective determination of freedom from disease; (2) appropriately large sample of normal subjects; (3) proportional stratification of normal subjects for known relevant variables; (4) test of Gaussian fit for application of Gaussian statistics; and (5) data presentation by percentiles when Gaussian fit is in doubt. Many existing normative studies in clinical neurophysiology do not meet these standards. High-quality normative data, readily accessible, are essential for the accurate electrodiagnosis of neuromuscular diseases. © 1997 American Association of Electrodiagnostic Medicine. Published by John Wiley & Sons, Inc.

Key words: normal limits; reference values; diagnostic testing
MUSCLE & NERVE 20:4-14 1997

AAEM MINIMONOGRAPH #47:


NORMATIVE DATA IN
ELECTRODIAGNOSTIC MEDICINE
LESLIE J. DORFMAN, MD, and LAWRENCE R. ROBINSON, MD

From the Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, California; and the Department of Rehabilitation Medicine, University of Washington School of Medicine, Seattle, Washington.
Address reprint requests to American Association of Electrodiagnostic Medicine, 21 Second Street S.W., Suite 103, Rochester, MN 55902.
Accepted for publication August 1, 1996.
CCC 0148-639X/97/010004-11

More often than not I've observed that convenient approximations bring you closest to comprehending the true nature of things.
Haruki Murakami

Most of the tests employed in electrodiagnostic medicine depend upon normative data for their interpretation. No matter how meticulously a neurophysiological test procedure is carried out, its interpretation is problematic unless there exists a body of valid normative data against which each patient result may be compared. As new and more refined tests are developed, each demands its own base of normative values for clinical decision making. And yet, clinicians and clinical scientists devote less attention to normative data than to other aspects of test development and implementation. There are several reasons why this is so. Testing large numbers of normal subjects is perceived as less interesting than testing diseased patients. If the test is uncomfortable or possibly hazardous, it may be more difficult to recruit normal subjects than patients, and institutional review bodies may be reluctant to grant approval for normative studies. External funding agencies are usually not eager to pay the costs of gathering normative data. Many young academic physicians believe, correctly, that compiling normative data is not the best strategy for career advancement. Consequently, the acquisition and analysis of normative data have received less attention than they deserve.

The purpose of this article is to review some of the statistical principles governing the compilation and use of normative data in medicine; to critique some widely used approaches to normative data gathering; and to recommend standards for quality and implementation of normative databases that will facilitate diagnostic clinical applications. The emphasis will be upon electrodiagnostic medicine, with examples drawn specifically from the discipline, but many of the principles to be considered apply equally well to diagnostic testing in other medical specialties. The intent is to make the reader into a better informed and more critical consumer of normative data, and to encourage investigators to

4 AAEM Minimonograph #47 MUSCLE & NERVE January 1997


compile normative databases of high quality in order to facilitate the electrodiagnosis of neuromuscular diseases.

A WORD ABOUT TERMINOLOGY
The adjective normative (data, values) has been chosen to designate test results derived from disease-free individuals. Note that disease free is not the same as asymptomatic, and neither means healthy.13 The word normal is preferable in some respects, but has judgmental overtones,22-24 and too many alternative meanings, so its use here is restricted to mean disease free. The term reference5,12 has broader scope, including both normative data and disease-reference data, and is used here to designate only the latter.

DISTRIBUTION OF A VARIABLE IN A POPULATION
Types of Variables. There are three main types of variables in electrodiagnostic medicine. Nominal variables are categorical classifications without magnitude; gender is one example. A subset of nominal, the existential variable, refers to phenomena which are classified as either present or absent. Certain pathological signals recorded during the needle examination of muscle fall into this category, such as myotonic and myokymic discharges. Ordinal variables have magnitude, but not a linear scale, so the magnitudes cannot be mathematically manipulated. An example of an ordinal scale is the 0 to 4+ abundance rating given to spontaneous muscle activity such as positive sharp waves, fibrillation, and fasciculation potentials. These ratings are crude, subjective, and somewhat relative. It is understood that a 2+ rating is greater than a 1+ rating, but it is not known whether the magnitude of the difference between them is the same as the difference between a 4+ and a 3+ rating, and so on. Consequently, it makes no sense to use these ratings computationally; a description of fibrillation potentials as 2.17+ has no numerical meaning.

Most of the variables in electrodiagnostic medicine are of the interval kind, which is to say that they have magnitude and a linear scale that is continuous or finely divisible, so that values may be added, multiplied, and so on, according to the usual rules of arithmetic. Nerve conduction velocities, compound action potential amplitudes, and motor unit action potential durations are all examples of interval variables.

Types of Samples. Because it is not possible to measure the normative values of a variable in the entire human population, the distribution (range and frequency) of the values of the variable is measured in a sample of normal individuals (the "convenient approximation") and extrapolated to the whole population. For this procedure to be valid, the sample must be of sufficient size, and must resemble the parent population in those properties which may affect the variable under study. When relevant properties are known, the sample must be stratified to include appropriate representation of those properties. Examples of properties relevant to electrodiagnostic tests include age and height, and possibly gender. Stratification is unnecessary when the relationship of the property to the variable under study is exactly known and can be compensated for mathematically; this is rarely the case in clinical practice. The relevance of some properties (weight, race) is still uncertain. Also, it is important to keep in mind that there may exist still other population properties whose effects on the variable being studied have yet to be recognized. The best way to deal with the possibility of uncertain (or so-called "lurking"18) properties is to use a large and variously stratified sample. If the sample is sufficiently large and representative, it will be possible to analyze the influence of the different population properties.

It is useful to distinguish between a normal sample and what may be termed a feasibility sample. The latter term refers to the common practice of trying out a new test procedure on a group of handy subjects composed of a few colleagues, trainees, and laboratory staff. This is a reasonable approach for demonstrating that the test is practicable, but the values generated usually do not meet the criteria for quality normative data.

A sample of "normal" individuals may be selected according to different criteria of normalcy.13 The minimal criterion is that the individuals seem to be healthy, or have no obvious external signs of being diseased. A somewhat more rigorous criterion is to use either an interview or a questionnaire to screen candidates for history or symptoms of disease. More rigorous still is the use of a physical examination, and possibly some ancillary tests, to eliminate candidates who may unknowingly be not entirely healthy; this strategy may be necessary if it is important to eliminate common subclinical pathology such as mild carpal tunnel syndrome. These approaches will yield different normal samples corresponding to different subpopulations of people. If the selection criteria are too lax, the sample may include subclinically diseased individuals, and the resulting range of normative values will be overly broad. If the criteria are too stringent, some normal individuals may be excluded, and the normative range will be artificially narrow. Different criteria of



normalcy are collectively one of the reasons that normative data in the electrodiagnostic literature are not in all cases identical.

The use of patients with disease or injury as pseudonormal subjects should be discouraged, because these individuals are not properly representative of the normal population. This practice is sometimes employed in studies of pediatric patients, because it is difficult to recruit normal children for uncomfortable or intimidating tests. When a child is hospitalized, for example, with trauma, the argument is sometimes made that the untraumatized limbs can be considered normal. There are several reasons why this may not be so: (1) other limbs may have sustained subclinical trauma sufficient to affect the variable under study; (2) medications or other treatments may have systemic effects that influence the test outcome; and (3) the population of children with major trauma may well contain a higher proportion than normal of preexisting conditions which predispose to trauma, and these conditions, even if subclinical, may affect the test outcome. Normative data derived from such patients should therefore be considered tentative and subject to verification in true normal subjects.

Even less acceptable is the practice, still occasionally encountered, of using as normal subjects those patients referred to the electrodiagnostic laboratory for evaluation of symptoms, but whose test results turn out to be within the normal range. These are not normal subjects; they are symptomatic patients whose conditions have not yet been diagnosed. As discussed further below, test results that fall within the normative range do not necessarily mean that the subject is free of the disease in question (nor does an abnormal result necessarily mean that the subject is diseased). To judge someone normal on the basis of test results (in whole or part) and then to incorporate those results, or similar ones, into a normative database is to engage in circular reasoning and to accumulate spurious data.

Sources of Variation in Sampling. In medicine, as in much of biological science, a test administered to a population of individuals rarely gives the same result in all. Instead, a distribution of values is usually obtained, which is customarily described in terms of one measure of central tendency and one of variation, most often the mean and standard deviation, respectively. This variation has multiple sources. One is the set of known population properties, such as age, which influence the test results in a predictable or measurable way.3,4,10,28,31 Another is the set of partially or wholly unknown genetic, developmental, and experiential factors which collectively make up what is referred to as inherent biological variability. There are also factors associated with the testing procedure itself which contribute to the variation in results. One is the set of technical or procedural issues more or less under the control of the individual performing the test. For example, in performing a nerve conduction velocity study, these might include such things as limb temperature,8,9 adequacy of stimulation, and the physical characteristics of the recording electrodes and the system of amplification and display. There is also a component of variation attributable to who does the testing, or to the exact methodology employed. This issue has been carefully analyzed in the setting of estimating the properties of motor unit action potentials.32 The same recorded data sets were measured by different experienced electrodiagnostic medicine consultants and by different computer algorithms. A certain amount of variation was observed between the results obtained by humans and those obtained by computers; however, this variation was similar to that between different humans, and between different computer programs. In other words, there is an observer factor which contributes to the final test outcome. Similar findings have been obtained in serial nerve conduction studies on the same subjects.2,6,21 Finally, even when the subject, test, and tester are identical, there is still an irreducible uncertainty to the outcome, which is referred to as test-retest variability. Studies in humans suggest that this component contributes less variation than interindividual differences, but may yet account for 2 to 10% of response variability.27

It is useful to bear in mind that test-retest variation can be a diagnostic measure in its own right, independent of the actual values of the individual test results. One may therefore encounter a situation in which the same test performed sequentially on the same individual gives results both times which are within the normative range, and yet the difference between the two results may exceed the normative range for test-retest variation, suggesting the presence of pathology in the intertest interval. The magnitude of normative test-retest variation cannot be inferred from the results of individual tests, but must be determined empirically.

THE GAUSSIAN DISTRIBUTION
Many population variables show a bell-shaped frequency distribution which is known as the normal, or Gaussian, distribution. This familiar curve has features which are well suited to statistical analysis: it is symmetrical; its mean, median, and mode are identical; and the area under the curve (the total population of measurements) can be conveniently described in standard deviation, or z, units (Fig. 1). It is important to note that the Gaussian distribution tends asymptotically to the baseline at both ends. This means that a population variable with such a distribution may include a small proportion of extreme, or outlying, high and low values. The likelihood of encountering an outlying value in a sample drawn from such a population is directly related to sample size. The range of sample values is likewise related to sample size. Moreover, the range, unlike other methods for deriving normative values, is critically dependent upon two individual values, the lowest and the highest, and essentially disregards all the other sample data. Extreme values, such as the lowest or the highest, are more likely to have been derived from subjects with subclinical disease, or to

FIGURE 1. Frequency distribution curves for a hypothetical Gaussian variable (above) and one with strong positive skew (below). For the Gaussian distribution, the numbers to the right indicate representative frequency values, and those to the left indicate the area under the curve for each increment of z-score (standard deviation). For the skewed distribution, the vertical bars indicate the values corresponding to ±2.5 standard deviations about the mean; because of the erroneous Gaussian assumption, too many cases fall above the upper limit, and the lower limit is not meaningful.
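
The situation in the lower panel of Figure 1 can be reproduced numerically. The following is a minimal Python sketch, assuming a hypothetical log-normal variable (the shape parameter 0.75 and the sample size of 1000 are illustrative choices, not values taken from the figure). It applies the mean ± 2.5 standard deviations to skewed data and counts how many normal values fall above the upper limit.

```python
import math
from statistics import NormalDist, mean, stdev

def skewed_sample(n=1000, sigma=0.75):
    """Hypothetical positively skewed (log-normal) sample.

    Built from evenly spaced quantiles rather than random draws,
    so the result is identical on every run.
    """
    z = NormalDist()
    return [math.exp(sigma * z.inv_cdf((i + 0.5) / n)) for i in range(n)]

def gaussian_limits(values, n_sd=2.5):
    """Return (lower, upper) = mean -/+ n_sd standard deviations."""
    m, s = mean(values), stdev(values)
    return m - n_sd * s, m + n_sd * s

sample = skewed_sample()
lower, upper = gaussian_limits(sample)

# Fraction of normal values that would be called abnormally high.
frac_above = sum(v > upper for v in sample) / len(sample)
# Fraction expected above +2.5 SD if the variable really were Gaussian.
expected_if_gaussian = 1.0 - NormalDist().cdf(2.5)

print(f"limits: {lower:.2f} to {upper:.2f}")
print(f"above upper limit: {frac_above:.1%} "
      f"(Gaussian expectation {expected_if_gaussian:.1%})")
print(f"smallest observed value: {min(sample):.2f}")
```

Under these assumed parameters the lower limit falls below zero although every sampled value is positive, and the proportion above the upper limit is several times the Gaussian expectation.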



be the result of methodological error. For all these
reasons, the range is generally considered not a
good descriptor of the normative limits.
On the other hand, the occurrence of multiple extreme (or outlying) values in a test subject may
be a sensitive indicator of pathology.33,34
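
The dependence of the range on sample size is easy to check by simulation. The sketch below is illustrative only: it draws repeated samples from a standard Gaussian population (an assumption; the trial count and seed are arbitrary) and averages the observed range, expressed in standard deviation units.

```python
import random
from statistics import mean

def mean_sample_range(n, trials=200, seed=0):
    """Average (max - min) over repeated Gaussian samples of size n."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return mean(ranges)

for n in (10, 100, 1000):
    print(f"n = {n:4d}: mean range = {mean_sample_range(n):.2f} SD")
```

The average range keeps widening as n grows even though the underlying population is unchanged, which is why limits based on the lowest and highest observed values are unstable.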

Normative Limits of the Gaussian Distribution.


When a variable of interest has a Gaussian distribution in the population, it is customary to set the normative limits at ±2 standard deviations about the mean, which includes about 95% (actually 95.44%) of the observations. About 4.5% of normative values will fall outside limits defined in this way, and represent false-positive test results, half at each end of the range. Some variables in electrodiagnostic medicine are of interest only to the extent that the results deviate in a single direction from the mean. For example, it is not biologically meaningful if a nerve conduction velocity value is too fast, or if a distal latency value is too short; these can only be normal outlier values, or the result of methodological error. In such cases, if a certain incidence of false-positivity is considered appropriate (say 5%), this may be defined in single-ended fashion, i.e., only in the direction of interest. It may be seen from Table 1, or from a more comprehensive table of areas of the Gaussian curve, that 5% of measurements fall outside the limit of the mean + (or −, but not both) 1.64 standard deviations.

In most medical tests, there is some overlap between the values obtained in normal subjects and those in diseased individuals (Fig. 2B). The most important single factor that governs the definition of the normative range is the desired trade-off between false-positivity and false-negativity.26,27,30 With the normative limits set at ±2 standard deviations about the mean, either 2.25% or 4.5% of measurements in normal individuals will be false-positives, i.e., fall outside the normative range, depending on whether a one-tailed or two-tailed approach is used. The incidence of false-positivity can be reduced by broadening the limits, e.g., to ±2.5 standard deviations, in which case the false-positive rate falls to about 1% in aggregate, but the risk of false-negativity is correspondingly increased. In other words, when the limits of normative are very broad, more diseased individuals will give results that fall within those wide limits, and the sensitivity of the test (its ability to distinguish diseased from normal) is diminished, but the specificity is correspondingly improved.16 Accordingly, as the normative limits are made wider or narrower there is a continuous trade-off between sensitivity and specificity.27 Sophisticated algorithms, such as receiver operating characteristics curves, may be applied to overlapping distributions in order to achieve a desired level of discrimination.11

FIGURE 2. Examples of frequency distributions for a hypothetical Gaussian variable in a normal population (left) and a disease population (right). In A there is little overlap between the distributions, so normative limits set widely will permit good discrimination between normal and diseased. In B there is some overlap, so the definition of normative involves some inevitable trade-off between sensitivity (false-negativity) and specificity (false-positivity). A cutoff at * minimizes the sum of false-positivity and false-negativity; a cutoff at ** minimizes false-positivity, at the expense of increased false-negativity. In C the distributions overlap extensively; a variable with such characteristics does not make for a good diagnostic test.

Table 1. Some values of z corresponding to areas under the Gaussian distribution curve.

% area    Double-ended (Mean ± z)    Single-ended (Mean + or − z)
90        1.64                       1.28
95        1.96                       1.64
98        2.33                       2.05
99        2.57                       2.33
99.9      3.03
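
The false-positive rates quoted in this section, and the entries of Table 1, follow from the area under the standard Gaussian curve. The sketch below uses Python's `statistics.NormalDist` to compute them. The disease distribution and the conduction-velocity numbers in the second half are hypothetical, chosen only to show the direction of the sensitivity and specificity trade-off, and the function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def false_positive_rate(n_sd, two_tailed=True):
    """Fraction of normal values falling outside limits set at n_sd SD."""
    tail = 1.0 - STD_NORMAL.cdf(n_sd)
    return 2.0 * tail if two_tailed else tail

def z_for_area(area, two_tailed=True):
    """z enclosing `area` within mean +/- z, or below mean + z if one-tailed."""
    p = 0.5 + area / 2.0 if two_tailed else area
    return STD_NORMAL.inv_cdf(p)

def lower_cutoff_performance(n_sd, normal, disease):
    """Specificity and sensitivity when values below mean - n_sd SD are abnormal."""
    cutoff = normal.mean - n_sd * normal.stdev
    specificity = 1.0 - normal.cdf(cutoff)  # normals correctly above the cutoff
    sensitivity = disease.cdf(cutoff)       # diseased correctly below the cutoff
    return specificity, sensitivity

for n_sd in (2.0, 2.5):
    print(f"{n_sd} SD: two-tailed {false_positive_rate(n_sd):.2%}, "
          f"one-tailed {false_positive_rate(n_sd, two_tailed=False):.2%}")

for area in (0.90, 0.95, 0.98):
    print(f"{area:.0%}: double-ended z = {z_for_area(area):.2f}, "
          f"single-ended z = {z_for_area(area, two_tailed=False):.2f}")

# Hypothetical conduction velocities in m/s; illustrative values only.
normal = NormalDist(mu=55.0, sigma=5.0)
disease = NormalDist(mu=42.0, sigma=6.0)
for n_sd in (2.0, 2.5):
    spec, sens = lower_cutoff_performance(n_sd, normal, disease)
    print(f"{n_sd} SD cutoff: specificity {spec:.3f}, sensitivity {sens:.3f}")
```

Widening the limits from 2 to 2.5 standard deviations raises specificity and lowers sensitivity, which is the trade-off described in the text.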



There is no hard and fast rule to guide the setting of the normative range in all cases. In general, the less overlap that exists between the normative and disease-reference values for a particular test (Fig. 2A), the more inclusive (broader) the normative range should be. It is also the opinion of the authors, derived from the admittedly biased perspective of academic practices at major referral centers, that clinical neurophysiologists, and particularly the less experienced ones, tend to err on the side of false-positivity, i.e., interpreting as abnormal too many isolated, borderline test results. Many of the diagnoses derived from electrodiagnostic testing call for expensive or potentially hazardous interventions (e.g., surgery, immunotherapy), making false-positive results especially problematic. These observations lead the authors, in general, to favor wider ranges of normative.

When the distributions of values in normal subjects and diseased individuals overlap very extensively, the test is a poor one for diagnostic purposes, and no amount of adjusting the normative range will make it into a good test (Fig. 2C). In scientific research reports, powerful tests of statistical significance are often used to show that two distributions of measurements are significantly different from each other, even though they overlap considerably. The scientific conclusion may be valid, yet the test may have little or no diagnostic usefulness, because a single patient value cannot be confidently assigned to a normal or disease category.

Non-Gaussian Distributions. The distribution of a variable in a population may deviate from Gaussian in one or both of two major ways. The distribution is said to be skewed if it is asymmetrical about the peak, with a larger tail in one direction than the other; the skewness is positive if the larger tail is in the direction of increasing variable values. The distribution is said to exhibit kurtosis if the "bell" shape is too wide (platykurtic) or too narrow (leptokurtic). Of these two, skewness poses the greater problem for constructing good normative data. Many variables in electrodiagnostic medicine are likely to have skewed distributions in the normal population, because they are physiologically constrained in one direction: distal latencies and compound action potential amplitudes are two such classes of variables.

When the mean ± n standard deviations of a sample set are taken to define the normative range, this assumes implicitly that the variable under study has a Gaussian distribution in the population. Many statistical tests are relatively robust to small deviations from Gaussian; however, statistical methods which focus on the extremes (ends) of the distribution (such as setting normative limits) are markedly influenced by skewness. An example using hypothetical data is shown in Figure 1 (lower). Setting the normative range of this positively skewed distribution at the mean ± 2.5 standard deviations, as illustrated, is problematic. At the high end of the range, too many normal individuals would be misclassified as abnormal (false-positives). The low end of the "normal" range defined in this way is not meaningful, because it encompasses some values that do not occur in normal subjects. It has been estimated that up to 10% of electrodiagnostic classifications may be in error because Gaussian statistics have been applied to non-Gaussian distributions.20

There are three strategies for dealing with sample distributions which are, or may be, non-Gaussian. The first is to test whether or not the observed values fit a Gaussian distribution. This is most commonly done using the chi-square statistic to determine the probability that the observed frequencies of sample values over different intervals are or are not compatible with the (null) hypothesis that those values were drawn from a population in which the variable has a Gaussian distribution. If the probability is high, Gaussian statistics may be applied to the sample data. If the probability is low, the null hypothesis is rejected, indicating that the underlying distribution of the variable in the population is unlikely to be Gaussian, and some other strategy must be employed to define the limits of normative. There also exist other tests (e.g., the Kolmogorov-Smirnov test) that can assign a probability to the likelihood that a data set is derived from a Gaussian distribution. These tests have limitations and requirements, such as sample size, which are beyond the scope of this discussion but which should be reviewed before the tests are applied. A simple plot of the sample data, preferably in histogram format, is often helpful to visualize the distribution and confirm the quantitative findings from statistical testing.

As a second strategy, if the sample data do not appear to fit a Gaussian distribution, these can sometimes be transformed so as to be more tractable to Gaussian manipulations. For example, the logarithm (natural or base 10), square root, or negative inverse (−1/x) transform of all data points will bring positively skewed data into a more Gaussian shape. Squaring or cubing the data points will render a negatively skewed distribution (rare in electrophysiological data) more Gaussian. If one then takes the mean ± n standard deviations of the transformed data and converts these end points



back to original units, one can derive meaningful normative limits for clinical application.

The third strategy is to use descriptive statistics which are distribution free, such as the percentile method described in the next section.

THE PERCENTILE METHOD
When the sample data permit, Gaussian statistics represent a convenient approximation to the distribution of the variable in the population. An alternative method for describing the frequency distribution of the variable is to divide it into percentiles.17,20,36 In theory, this consists of dividing the frequency distribution into 100 equal parts. In practice, larger divisions (quantiles) are commonly employed: 4 quartiles, 5 quintiles, 10 deciles, and so on. Quantiles are calculated from the cumulative frequency distribution of the variable, as illustrated in Figure 3, in a manner entirely analogous to determining the median value, which is, in fact, the 50th percentile. The limits of a normative distribution may be defined directly from the percentile divisions: 5th and 95th percentiles, 2.5th and 97.5th percentiles, as desired.

The major advantage of percentile analysis is that it makes no assumptions about the shape of the sample and population distributions (except that they are presumably similar). Percentile analysis works equally well for variables which have Gaussian and non-Gaussian distributions, including those which are heavily skewed or which have more than one mode (Fig. 3D). Its major disadvantage is a requirement for a large sample size if the quantile divisions are to be accurate. A general guideline is that for quantile demarcations on the order of 2.5%, which is a common normative boundary, the sample size should be not less than 100.

NORMATIVE DATA STANDARDS
Based upon the foregoing considerations, a set of standards is proposed that could be applied to new

FIGURE 3. Frequency distributions of hypothetical variables (below) and their corresponding cumulative frequency distributions (above), illustrating the percentile method. A shows the regular sigmoid cumulative curve corresponding to the symmetrical Gaussian distribution in B. The median (and mean) corresponds to the 50th percentile. The positively skewed, bimodal distribution in D yields the cumulative frequency distribution in C. Note that the quintiles (20th percentile points) are highly nonuniform.
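
The percentile method illustrated in Figure 3 can be sketched in a few lines of Python. The example assumes a hypothetical skewed sample of 200 values and uses `statistics.quantiles` with 40 divisions, so that the first and last cut points are the 2.5th and 97.5th percentiles. The minimum sample size of 100 follows the guideline given in the text; the function name and the sample itself are illustrative assumptions.

```python
import math
from statistics import NormalDist, median, quantiles

def percentile_limits(values, min_n=100):
    """Return the (2.5th, 97.5th) percentiles as distribution-free limits."""
    if len(values) < min_n:
        raise ValueError(f"need at least {min_n} subjects, got {len(values)}")
    cuts = quantiles(values, n=40)  # 39 cut points spaced 2.5% apart
    return cuts[0], cuts[-1]

# Hypothetical positively skewed (log-normal) sample, built deterministically.
z = NormalDist()
sample = [math.exp(0.75 * z.inv_cdf((i + 0.5) / 200)) for i in range(200)]

low, high = percentile_limits(sample)
outside = sum(v < low or v > high for v in sample)

print(f"normative limits: {low:.2f} to {high:.2f}")
print(f"values outside limits: {outside} of {len(sample)}")

# The median is simply the 50th percentile.
fiftieth = quantiles(sample, n=100)[49]
print(math.isclose(median(sample), fiftieth))
```

No assumption about the shape of the distribution enters the calculation; both limits are positive and roughly 5% of the sample lies outside them regardless of skew.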



normative data that are being considered for publi- The Normal Subjects Should Be Proportionately
cation. In the authors judgment, these proposed Stratified for Known Relevant Variables. In the
standards are not unduly rigorous, and they are not case of neuromuscular electrodiagnosis, the known
intended to make even more burdensome the task of relevant variables are age, height, and possibly gen-
gathering and presenting new data. The objective is der. Suspected relevant variables are weight, race,
to improve the accuracy of medical diagnostic test- and athleticism. Less likely but still possible relevant
ing. It must be acknowledged, however, that much of variables are numerous and might include occupa-
the existing body of normative data in electrodiag- tion, diet, alcohol intake, smoking, and a host of
nostic medicine does not meet these standards. other experiential factors. It is obviously impossible
to control for all these variables. Good normative
The Criteria for Establishing that the Normal Sub-
data should be stratified for the known variables, i.e.,
jects Are Free from Disease Should Be Reasonable,
include a proportional representation of subjects of
Objective, and Consistently Applied. It is not the
different ages, heights, and genders, similar to the
intention to specify which particular criteria should
proportions of those variables in the intended test
be employed, nor whether these should be imple-
population. The data may then be organized in such
mented using a questionnaire, interview, examina-
tion, ancillary tests, or some variation or combination of these. However,
it should be clearly stated exactly what was done, and that the screening
procedure(s) was applied equally to all subjects. It is insufficient to note
that the subjects ". . . were healthy," ". . . had no evidence of disease,"
". . . had no apparent health problems," ". . . were asymptomatic," or other
similar statements.

The Sample of Normal Subjects Should Be Appropriately Large. Unfortunately,
it is not easy to specify a sample size that will be appropriate for all
circumstances, but some guidelines have broad application. In general, the
larger the sample size, the better for statistical purposes. With sample
sizes of fewer than 10 subjects, the sampling error increases substantially,
so no subgroup within the normal sample should consist of fewer than 10 if
the intention is to perform subgroup analysis. Twenty normal subjects may be
considered an absolute bare minimum number for validation of a new test
procedure, for example, but many more are required for a valid, stratified
normative database to be used as a clinical reference. More subjects are
needed for percentile analysis than for application of Gaussian statistics.
Note that the critical value in considering sample size is the number of
individuals, not the number of nerves, muscles, or motor units. Tests on two
nerves or muscles from the same subject are likely to exhibit less variation
than tests in different individuals. More than one test of a given kind may
be performed on a normal subject when, for example, side-to-side comparison
is of interest; but only one test result (one side) from each subject should
be entered into the normative database, because entering both risks
spuriously lowering the measure of interindividual variability (standard
deviation).

a way that separate normative limits are calculated for each subset (or
"cell") according to each variable; or, alternatively, the data may be
subjected to multiple regression analysis, in which the influence of each
variable is individually computed, and a model developed to predict the
normative limits for each patient. The former approach generally calls for
larger numbers of subjects; the latter is statistically more complex because
the regression relationships are often nonlinear, particularly at the
extremes, and there may be interaction effects between the variables.

A Test for Goodness of Gaussian Fit Should Be Applied to the Sample Data
before Using Gaussian Descriptive Statistics. When Gaussian statistics are
used inappropriately, serious errors may result. It is desirable that the
chi-square or Kolmogorov-Smirnov test be applied and the value of P reported.
At a bare minimum, the coefficients of skewness and kurtosis should be
computed and reported, to allow an estimate of the deviation from Gaussian.

Whenever There Is Doubt, the Normative Data Should Be Described in
Percentile Format. It is possible to argue that percentile analysis should
be the "gold standard" for normative data presentation, because Gaussian
statistics apply only in a subset of special cases. However, Gaussian
statistics are deeply ingrained in medical diagnostic thinking, and do serve
as a convenient approximation in many cases. Whenever there is uncertainty
about the appropriateness of the Gaussian model, a description of the data
should be given in percentile format. In borderline instances, it may be
reasonable to present both formats.
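The shape check and the percentile description recommended here can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the amplitude values are invented for the example and are not
data from any study, the function names are hypothetical, skewness and
kurtosis are computed from simple population-moment formulas, and the 2.5th
and 97.5th percentiles are just one possible choice of limits. A formal
chi-square or Kolmogorov-Smirnov test would normally be run with a
statistics package rather than by hand.

```python
import statistics


def shape_coefficients(values):
    """Return (skewness, excess kurtosis) from population moments.

    Both are close to 0 for Gaussian data; large magnitudes suggest
    that Gaussian descriptive statistics may be inappropriate.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def gaussian_limits(values, k=2.0):
    """Return (mean - k*SD, mean + k*SD) using the sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean - k * sd, mean + k * sd


def percentile_limits(values, lower=2.5, upper=97.5):
    """Return the lower and upper percentile limits.

    Percent values must be multiples of 0.5, because the data are cut
    into 200 equal-probability intervals (199 cut points).
    """
    cuts = statistics.quantiles(values, n=200, method="inclusive")
    return cuts[round(lower * 2) - 1], cuts[round(upper * 2) - 1]


# Hypothetical right-skewed sample: one value per subject (microvolts).
amplitudes_uv = [8, 9, 10, 10, 11, 12, 12, 13, 14, 15,
                 16, 18, 20, 22, 25, 28, 32, 38, 45, 60]

skewness, excess_kurtosis = shape_coefficients(amplitudes_uv)
print("skewness:", round(skewness, 2))
print("excess kurtosis:", round(excess_kurtosis, 2))
print("Gaussian limits:", gaussian_limits(amplitudes_uv))
print("percentile limits:", percentile_limits(amplitudes_uv))
```

With a right-skewed sample such as this one, the Gaussian lower limit falls
below zero, which is impossible for an amplitude, whereas the percentile
limits stay within the observed range. Twenty values are far too few for
stable percentile limits; the sample is kept small only for readability.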



ADDITIONAL CONSIDERATIONS

Multiple Tests. No discussion about setting normative limits is complete
without considering the consequences of performing multiple independent
tests on a single patient. In general, the likelihood of finding an abnormal
value by chance rises with the number of tests; when the individual
false-positive rates are small, it is approximately the sum of the
probabilities in each of the individual tests.15,26,29,35 Consider an
electrodiagnostic evaluation for a typical clinical problem such as arm
pain: depending upon the precise clinical circumstances, this might involve
median and ulnar motor and sensory conduction studies, perhaps a median
transcarpal conduction study, and an F-wave study in one nerve, in addition
to the needle examination of selected muscles. Each motor conduction study
involves four independent electrophysiologic measurements (two latencies and
two amplitudes of the compound muscle action potentials); each sensory and
transcarpal study involves two measurements (one latency and one amplitude
of the compound nerve action potential); and the F-wave study involves a
single latency measurement. The total number of independent
electrophysiologic measurements performed on this patient is therefore 15,
not counting the needle examination, which has its own special problems of
interpretation. If the normative range for each measurement is set to allow
a 2.5% rate of false-positivity (a common criterion), it may be seen from
Table 2 that the examination has a probability of almost 1 in 3 (31.5%) of
turning up one or more abnormal values on the basis of chance alone, and a
probability of greater than 5% of turning up two or more such false-positive
abnormalities.

Bayes' theorem1,14 provides an approach for dealing numerically with the
conditional probabilities of multiple diagnostic testing, but its
implementation is beyond the scope of this discussion. For clinical
application, three strategies can be recommended for dealing with the
inevitable false-positive results. First, attention should be directed to
the magnitude of the deviation from normal. A result that falls just outside
the normative limit is more likely to be a false-positive finding in a
normal subject than a result that is far outside the normative range.
Second, the questionable result should be evaluated critically in light of
the patient's clinical situation; a borderline abnormal value that makes no
clinical sense is more likely to be a false-positive finding. Finally, the
examiner should seek to identify a pattern of multiple confirmatory abnormal
results that fit with each other, and with the clinical circumstance.

Excessive Precision. Normative electrodiagnostic data should be reported in
a format that is commensurate with the precision of the measurements. In
general, latencies can be measured to tenths of milliseconds, compound
muscle action potential amplitudes to tenths of millivolts, and compound
nerve action potential amplitudes to microvolts. Conduction velocity, in
meters per second, is a quotient derived from a latency measurement and one
of distance. The arithmetic rule states that the precision of a quotient
corresponds to the less precise of the divisor and dividend. Although
latency may be accurate to tenths of milliseconds, the distance measurement
is at best accurate to whole millimeters, so the conduction velocity should
be expressed in whole-number values only (i.e., no decimals or fractions).
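As a rough check on the false-positive figures quoted under Multiple Tests
and listed in Table 2, the chance of obtaining k or more abnormal values
among n measurements can be computed from the binomial distribution. The
sketch below assumes that the measurements are fully independent and that
each has the same 2.5% false-positive rate; the function name is
hypothetical. Under those assumptions the results agree with Table 2 to
within a few tenths of a percentage point (for example, about 31.6% for one
or more abnormalities among 15 measurements, against the tabulated 31.5%).

```python
from math import comb


def prob_at_least(k, n, p=0.025):
    """Probability of k or more false positives among n measurements.

    Assumes independent measurements that share one false-positive
    rate p (binomial model).
    """
    prob_fewer_than_k = sum(
        comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k)
    )
    return 1.0 - prob_fewer_than_k


# Percent chance of 1+, 2+, and 3+ abnormal values for selected n.
for n in (1, 5, 10, 15, 20):
    row = [round(100 * prob_at_least(k, n), 1) for k in (1, 2, 3)]
    print(n, row)
```

Measurements made on the same patient are seldom fully independent, so these
values are best read as an approximate upper guide, not as exact
probabilities for any particular examination.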
Table 2. Probability (percent) of finding false-positive abnormal results,
on the basis of chance, according to the number of independent measurements
made.

                     Number of abnormalities
Measurements     1+      2+      3+      4+      5+
      1          2.5
      2          4.9     0.1
      3          7.3     0.2    <0.1
      4          9.6     0.4    <0.1    <0.1
      5         11.9     0.6    <0.1    <0.1    <0.1
      6         14.0     0.9    <0.1    <0.1    <0.1
      7         16.2     1.2    <0.1    <0.1    <0.1
      8         18.3     1.6     0.1    <0.1    <0.1
      9         20.4     2.0     0.1    <0.1    <0.1
     10         22.4     2.5     0.2    <0.1    <0.1
     15         31.5     5.2     0.5    <0.1    <0.1
     20         39.5     8.5     1.3     0.1    <0.1

Each measurement has a 2.5% false-positive rate (a limit set 2 standard
deviations from the mean for a Gaussian distribution).

Presenting Normative Data. The results of normative data studies are
sometimes presented in graphic format. Graphic representation has several
important advantages, including data compression and ease of communicating
concepts. Medical journals prefer graphs to tables of data because tables
require more labor and are less visually interesting to the casual reader.
However, access to the actual numeric data is often essential for medical
decision making. We therefore encourage authors and journals to publish
their high-quality normative data in tabular form whenever this would
facilitate clinical usage. Normative data are most useful when they are
readily available at the time of the electrodiagnostic examination. The
simplest way to accomplish this is for the normative data to reside within
the instruments used for the diagnostic testing. Most contemporary
electrodiagnostic machines are built around digital computers of
considerable storage and computational power. Investigators, publishers, and
instrument manufacturers should work together toward the goal of making
high-quality normative data readily available on-line for clinical use.

Disease Reference Data. Many of the concepts and principles discussed with
respect to normative data apply equally well in theory to disease reference
data, i.e., data collected from patients with specific known diseases
(diagnosed by other methods). Disease reference data can facilitate the
interpretation of the electrodiagnostic examination by allowing the
interpreting physician to pose the question, "What is the likelihood of
finding ulnar motor conduction velocity value X in disease Y?" In this way,
it can be determined how well the examination findings fit the proposed
interpretive diagnosis. With access to valid disease-reference and normative
data, one can set the normative limits so as to optimize the balance between
the sensitivity and the specificity of the test (Fig. 2).27,30

For disease reference data to be valid, they must conform to standards
similar to those proposed here for normative data. Each disease patient
sample must be sufficiently large and appropriately stratified for important
subject variables, and the descriptive statistics applied must be
appropriate to the data. In thinking about disease reference population
samples, "freedom from disease" obviously is not an appropriate issue, and
is replaced by considerations having to do with accuracy of diagnosis and
freedom from other coexisting diseases. It is important to bear in mind that
the relationship between the electrophysiological test results and certain
subject and test variables may not be the same in disease as in
health.7,9,19,25 Motor conduction velocity in diabetic nerves, for example,
may not show the same relationship to age or to limb temperature as in
normal nerves. The interpretation of disease reference data is further
complicated by the additional dimension of disease severity, for which there
is no analog in the normal population. For many conditions, therefore, the
compilation of useful electrophysiologic reference data must await the
development of valid, quantitative clinical indices of disease severity.

REFERENCES

1. Bailar JC III, Mosteller F: Medical Uses of Statistics. Waltham, Mass Med
Soc, 1986, pp 160-162.
2. Bleasel AF, Tuck RR: Variability of repeated nerve conduction studies.
Electroencephalogr Clin Neurophysiol 1991;81:417-420.
3. Bolton CF, Carter KM: Human sensory nerve compound action potential
amplitude: variation with sex and finger circumference. J Neurol Neurosurg
Psychiatry 1980;43:925-928.
4. Buchthal F, Rosenfalck A: Evoked action potentials and conduction
velocity in human sensory nerves. Brain Res 1966;3(suppl):1-122.
5. Campbell WW, Robinson LR: Deriving reference values in electrodiagnostic
medicine. Muscle Nerve 1993;16:424-428.
6. Chaudry V, Cornblath DR, Mellits ED, Avila O, Freimer ML, Glass JD, Reim
J, Ronnett GV, Quaskey SA, Kunel RW: Inter- and intra-examiner reliability
of nerve conduction measurements in normal subjects. Ann Neurol
1991;30:841-843.
7. Chaudry V, Corse AM, Freimer ML, Glass JD, Mellits ED, Kunel RW, Quaskey
SA, Cornblath DR: Inter- and intra-examiner reliability of nerve conduction
measurements in patients with diabetic neuropathy. Neurology
1994;44:1459-1462.
8. Denys EH: The influence of temperature in clinical neurophysiology.
Muscle Nerve 1991;14:795-811.
9. Dioszeghy P, Stålberg E: Changes in motor and sensory nerve conduction
parameters with temperature in normal and diseased nerve. Electroencephalogr
Clin Neurophysiol 1992;85:229-235.
10. Dorfman LJ, Bosley TM: Age-related changes in peripheral and central
nerve conduction in man. Neurology 1979;29:38-44.
11. Eisen A, Schulzer M, Pant B, MacNeil M, Stewart H, Trueman S, Mak E:
Receiver operating curve analysis in the prediction of carpal tunnel
syndrome: a model for reporting electrophysiological data. Muscle Nerve
1993;16:787-796.
12. Elveback LR: The population of healthy persons as a source of reference
information. Hum Pathol 1973;4:9-16.
13. Elveback LR, Guillier CL, Keating FR: Health, normality, and the ghost
of Gauss. JAMA 1970;211:69-75.
14. Fagan TJ: Nomogram for Bayes theorem. N Engl J Med 1975;293:257.
15. Files JB, van Peenen HJ, Lindberg DAB: Use of normal range in
multiphasic testing. JAMA 1968;205:94-98.
16. Gorry GA, Pauker SG, Schwartz WB: The diagnostic importance of the
normal finding. N Engl J Med 1978;298:486-489.
17. Herrera L: The precision of percentiles in establishing normal limits in
medicine. J Lab Clin Med 1958;52:34-41.
18. Joiner BL: Lurking variables: some examples. Am Statist 1981;35:227-233.
19. Lang AH, Forsstrom J, Bjorkqvist SE, Kuusela V: Statistical variation of
nerve conduction velocity: an analysis in normal subjects and uraemic
patients. J Neurol Sci 1977;33:229-241.
20. Mainland D: Remarks on clinical norms. Clin Chem 1971;17:267-274.
21. McQuillen MP, Gorin FJ: Serial ulnar nerve conduction velocity
measurements in normal subjects. J Neurol Neurosurg Psychiatry
1969;32:144-148.
22. Murphy EA: A scientific viewpoint on normalcy. Perspect Biol Med
1966;9:333-348.
23. Murphy EA: The normal, and the perils of the sylleptic argument.
Perspect Biol Med 1972;15:566-582.
24. Murphy EA: The normal. Am J Epidemiol 1973;98:403-411.
25. Notermans NC, Franssen H, Wieneke G, Wokke JHJ: Temperature dependence
of nerve conduction and EMG in neuropathy associated with gammopathy. Muscle
Nerve 1994;17:516-522.
26. Rivner MH: Statistical errors and their effect on electrodiagnostic
medicine. Muscle Nerve 1994;17:811-814.
27. Robinson LR, Temkin NR, Fujimoto WY, Stolov WC: Impact of statistical
methodology on normal limits in nerve conduction studies. Muscle Nerve
1991;14:1084-1090.
28. Robinson LR, Rubner DE, Wahl PW, Fujimoto WY, Stolov WC: Influences of
height and gender on normal nerve conduction studies. Arch Phys Med Rehab
1993;74:1134-1138.
29. Schoen I, Brooks SH: Judgment based on 95% confidence limits: a
statistical dilemma involving multitest screening and
proficiency testing of multiple specimens. J Clin Pathol 1970;53:190-193.
30. Schulzer M: Diagnostic tests: a statistical review. Muscle Nerve
1994;17:815-819.
31. Soundmand R, Ward LC, Swift TR: Effect of height on nerve conduction
velocity. Neurology 1982;32:407-410.
32. Stålberg E, Andreassen S, Falck B, Lang H, Rosenfalck A, Trojabborg W:
Quantitative analysis of individual motor unit action potentials: a
proposition for standardized terminology and criteria for measurement. J
Clin Neurophysiol 1986;3:313-348.
33. Stålberg E, Bischoff C, Falck B: Outliers, a way to detect abnormality
in quantitative EMG. Muscle Nerve 1994;17:392-399.
34. Stevens JP: Outliers and influential data points in regression analysis.
Psychol Bull 1984;95:334-344.
35. Sunderman FW Jr: Expected distributions of normal and abnormal results
in multitest surveys of healthy subjects. Am J Clin Pathol 1970;53:288-291.
36. Thompson WR: Biological applications of normal range and associated
significance tests in ignorance of original distribution forms. Ann Math
Statist 1938;9:281-288.

SUGGESTED ADDITIONAL READINGS

Benson ES, Strandjord PE (eds): Multiple Laboratory Screening. New York,
Academic Press, 1969.
Feinstein AR: Clinical Biostatistics. St. Louis, Mosby, 1977.
Galen RS, Gambino SR: Beyond Normality: The Predictive Value and Efficiency
of Medical Diagnoses. New York, Wiley, 1975.
Hacking I: The Taming of Chance. New York, Cambridge University Press, 1991.
