Journal of Autism and Childhood Schizophrenia, Vol. 6, No. 3, 1976

The Reliability and Diagnostic Validity of the
Physical and Neurological Examination
for Soft Signs (PANESS)¹

John S. Werry and Michael G. Aman²
University of Auckland, Auckland, New Zealand

Twenty-one children, mean age of 8 years, were each examined on separate
occasions by two pediatric residents, blind to diagnosis, using the
neurological examination (PANESS) included in the group of instruments
recommended by the National Institute of Mental Health for psychotropic drug
studies in children. Half the children were hyperactive/aggressive, one
quarter were normal, and one quarter had histories or signs strongly
presumptive of brain damage. Many of the signs, though reliable, did not
occur in the majority of children. Examiners did achieve a high level of
agreement about global neurological status. It was concluded that the
neurological examination probably contains a substantial number of
noncontributory items and should be regarded as experimental rather than
definitive.
INTRODUCTION
It has long been recognized that children with psychiatric disorders, while
seldom having major neurological signs, often have a cluster of what have
come to be called "soft" or equivocal signs involving minor abnormalities
of reflexes and tone, but above all of sensorimotor coordination (Werry,
1972). Thus, to have any potential usefulness in child psychiatry, a system
of examination must include minor as well as major signs.
¹This study was supported in part by a grant to Professor Werry from the Medical Research
Council of New Zealand and USPHS grant #MH 18909 from the National Institute of Health
to R. L. Sprague, Ph.D. Drs. M. Hudson and M. Morris performed the examinations. We
should like to pay particular tribute to Dr. Thelma Becroft, a school doctor in Auckland, who
supplied the normal and neurological subjects.
²Requests for reprints should be addressed to Prof. J. S. Werry, Department of Psychiatry,
School of Medicine, University of Auckland, P.B., Auckland, New Zealand.
© 1976 Plenum Publishing Corporation, 227 West 17th Street, New York, N.Y. 10011. No
part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, microfilming, recording,
or otherwise, without written permission of the publisher.
While there have been efforts in the past (e.g., Paine & Oppe, 1966;
Ozer, 1969; Rutter, Graham, & Yule, 1970; Werry, Minde, Guzman, Weiss,
Dogan, & Hoy, 1970) to systematize wide-range neurological examinations
for child psychiatric patients, there is as yet no generally accepted method
(Werry et al., 1970). Also, conspicuously lacking with only a few exceptions
(Ozer, 1969; Rutter et al., 1970; Werry et al., 1970) are psychometric studies
of the reliability and validity of this type of neurological examination in
children (Werry, 1972). The most sophisticated examination of all, a
children's version of the Reitan Battery (Reitan & Heinemann, 1969), is
cumbersome, requires expensive equipment, and is difficult to score and
interpret.
One of the problem areas in pediatric psychopharmacology is the pre-
diction of those children who are likely to respond to medication. Broad
psychiatric diagnostic pointers are known, as are behavioral target
symptoms (Close, 1973), but there has been interest in predictors which
relate more directly to central nervous function (Wender, 1971). A recurrent
theme through the literature is that children with neurological signs or
"organic" children respond (or perhaps equally often do not respond) to
medication better than children who lack these signs (Conners, 1972;
Kornetsky, 1970; Wender, 1971; Werry, 1972).
Close (1973) has compiled a neurological examination especially for
drug studies in children, and this has been incorporated into the recently
published (Psychopharmacology Bulletin, 1973) children's battery of
psychopharmacological measures compiled by the Early Clinical Drug
Evaluation Unit (ECDEU) of the National Institute of Mental Health, where it
appears as PANESS (Physical and Neurological Examination for Soft
Signs). Since no such data appear to be available yet for this now official
instrument, it is important that data attesting to its reliability and validity
be acquired before another test of unknown scientific worth becomes
inextricably molded into the technique and literature of pediatric
psychopharmacology. The types of reliability of interest are interexaminer and
test-retest reliability. The validities of concern are the ability to
discriminate among children who are normal, who have minimal brain
dysfunction, or who are neurologically disordered, and the ability to make
predictions about drug response.
The study to be described in this paper is concerned with interex-
aminer reliability and discriminative power of the individual signs and the
examination as a whole. However, as will become apparent below, certain
admittedly unproven assumptions about test-retest reliability were
necessarily made.
METHOD
Subjects
The children for this study were selected from three sources to pro-
vide, it was hoped, a wide spectrum of both type and number of signs. Six
subjects were normal children in the local school system. Ten were in an on-
going project involved with the evaluation and treatment of hyperactive/
aggressive children, and five were children from the local school system
considered by the school doctor to have major neurological impairment but
without mental retardation. The hyperactive/aggressive children employed
in the study were fairly extreme behaviorally as judged by Conners' Teacher
Questionnaire (1969). Their standard scores (relative to a group of normals)
(Sprague, Christensen, & Werry, 1974) on the Conduct, Inattentive, and
Hyperactive factors were 4.07, 2.10, and 3.49, respectively. Background
details of the three groups are presented in Table I.
Procedure
Two senior residents in pediatrics at one of the University of
Auckland's teaching hospitals served as the examiners. Due to logistical
problems, 62% of the children were seen first by Examiner A then by Exam-
iner B while the remainder were first seen by Examiner B. For similar
reasons the interval between examinations varied from 1 to 110 days
(median of 5 days). A stipend for each child examined was paid to the
doctors since the examinations were necessarily somewhat tedious.

Table I. Characteristics of the Three Groups

                                    Normal   Hyperactive   Neurological   Total
N                                      6         10              5          21
Mean age (months)                   119.8       90.2          105.6        96.6
Age range                          104-143     61-121         72-146      61-146
Median time between exams (days)       4         8.5             2           5
Percent examined by A first          83.3       50.0           60.0        61.9

In order to enhance the prospects of obtaining reliability, a number of
features of the procedure were maintained constant. None of the children
received psychotropic medication on or immediately preceding the days
they were tested. All examinations were conducted in the same room and
the time of day of the two tests was always the same for a given child.
Neither doctor had any knowledge of the personal history or the diagnosis
of the subjects.
The system of examination was strictly according to the published
protocol (Close, 1973) and the two doctors practiced the examination on
other children until they felt reasonably familiar with it. Each child was
offered a reward for his cooperation over the two half-hour examinations.
RESULTS AND DISCUSSION
Reliability and Occurrence of Items
The examination consists of 43 items, some of which are scored in
more than one dimension to yield a total of 56 scores. Most range in some
way or other over a 4-point scale of "no impairment" through "severe
impairment."
To be useful, an item should be reliable and contribute to the ability
of the examination to discriminate between different diagnostic groups. The
subjects of this study were chosen on the assumption, based on previous
studies (Werry, 1972; Werry et al., 1970), that they would present a range of
scores enabling judgments to be made about the items. The following
possibilities obtain for a given item: (1) It is unreliable. (2) It is reliable, but it
does not occur or it occurs too infrequently to be of much use and simply
prolongs the examination. (3) It is reliable and occurs in a range of values
but does not discriminate between diagnostic groups. (4) It is reliable,
occurs, and discriminates.
Eighty-six percent of the total signs occurred at least once in the
judgments of both examiners. However, most of these signs appeared in
only one or two children, and only 36% of the items were judged as
occurring in a sample as small as 20% of the children. Only 12% of the signs
occurred when the criterion was raised to 50% of the children. This suggests
that a large number of items are probably noncontributory. The level of
occurrence for each item of PANESS has been listed in Table II.
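The occurrence criterion described above can be illustrated with a minimal sketch. It assumes, since the paper does not spell this out, that a sign counts as "occurring" in a child when both examiners score it above 1 on the 4-point scale. The data layout, function names, and scores are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: for each item, one (Examiner A, Examiner B) score pair
# per child, on the 1-4 scale where 1 is taken to mean "no impairment".
Scores = Dict[str, List[Tuple[int, int]]]


def occurrence_rate(pairs: List[Tuple[int, int]]) -> float:
    """Fraction of children in whom BOTH examiners scored the sign above 1."""
    if not pairs:
        raise ValueError("no children scored for this item")
    present = sum(1 for a, b in pairs if a > 1 and b > 1)
    return present / len(pairs)


def share_of_items_reaching(scores: Scores, threshold: float) -> float:
    """Fraction of items whose occurrence rate is at least `threshold`."""
    rates = [occurrence_rate(pairs) for pairs in scores.values()]
    return sum(1 for rate in rates if rate >= threshold) / len(rates)


# Made-up scores for three items and five children (illustration only).
demo: Scores = {
    "item_x": [(1, 1), (1, 1), (1, 1), (1, 1), (2, 2)],  # seen in 1 of 5 children
    "item_y": [(2, 3), (3, 3), (1, 1), (2, 2), (1, 2)],  # seen in 3 of 5 children
    "item_z": [(1, 1), (1, 1), (1, 1), (1, 1), (1, 1)],  # never seen
}

share_at_half = share_of_items_reaching(demo, 0.5)  # only item_y qualifies
```

Raising the threshold from 20% to 50% of children removes item_x from the qualifying set in this toy data, which is the kind of attrition the percentages in the text describe.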

Table II. Level of Occurrence for Items of PANESS

% of subjects in whom sign
was observed by both examiners   Itemsᵃ
 0-10%     1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 14, 15, 16, 17, 18, 19, 20,
           21, 25, 27, 28, 29, 30, 31, 32, 35A, 35B, 36B, 38B, 38C, 43A, 43B
11-20%     10, 23, 33, 36A, 37B, 37C, 40B
21-30%     13, 24, 26, 34, 39B, 42B
31-40%     22, 40C
41-50%     38A, 39C, 41B
51-60%     None
61-70%     37A, 42C
71-80%     None
81-90%     40A, 41C, 42A
91-100%    39A, 41A

ᵃWhere letters occur following numbers, they refer to the respective
subcomponent of that item. Items falling into the highest frequency
categories would appear to occur too often. Inspection of the data indicated
that the degree of "pathology" was more extreme in the hyperactive and
neurological groups than in the normal group. Items are identified in the
appendix.

One way of depicting the qualities of reliability and occurrence is by
constructing a four by four contingency table (such as is done for a χ² test)
for each item in which each child's position is plotted with one examiner's
score along the abscissa and the other along the ordinate. The ideal
diagnostic sign then emerges with scores along the diagonal while a reliable but
nonoccurring sign clusters in one corner and an unreliable one has scores
scattered randomly around the table. Examples of these three possibilities
are set out in Table III. Unfortunately, Example 1 (reliable but not
occurring) represents by far the commonest situation.
It is customary, however, to rely on some numerical method of
describing this phenomenon of reliability and occurrence. A crude estimate
of reliability was made by calculating the percentage of items in which there
was agreement within one point for 75% or more of the children.
Ninety-three percent of items met this criterion of agreement within one point.
However, this takes no cognizance of occurrence and, as pointed out by
Fleiss, Spitzer, Endicott, and Cohen (1972), makes no allowance for
agreement based simply on chance.
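The crude criterion can be written out as a short sketch. The function names are hypothetical; the first data set uses the cell counts read from Example 1 of Table III, and the second is an invented item included only for contrast.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (Examiner A score, Examiner B score), each from 1 to 4


def within_one_rate(pairs: Sequence[Pair]) -> float:
    """Fraction of children whose two scores differ by at most one point."""
    if not pairs:
        raise ValueError("no children scored for this item")
    return sum(1 for a, b in pairs if abs(a - b) <= 1) / len(pairs)


def meets_criterion(pairs: Sequence[Pair], min_rate: float = 0.75) -> bool:
    """True when agreement within one point holds for 75% or more of children."""
    return within_one_rate(pairs) >= min_rate


def share_meeting_criterion(items: Sequence[Sequence[Pair]]) -> float:
    """Fraction of items that satisfy the within-one-point criterion."""
    return sum(1 for pairs in items if meets_criterion(pairs)) / len(items)


# Cell counts as read from Example 1 of Table III (reliable but not occurring).
example_1: List[Pair] = [(1, 1)] * 18 + [(2, 1)] * 2 + [(2, 2)]

# Invented item on which the examiners often differ by two or more points.
hypothetical_poor: List[Pair] = [(1, 4)] * 4 + [(4, 1)] * 3 + [(2, 2)] * 3
```

Every child in Example 1 falls within one point, so the item passes the criterion even though the sign was almost never recorded. This is the sense in which the crude estimate takes no account of occurrence.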
Selection of a statistic presented problems due to the ordinal nature of
the scale and the small number of gradations. TB, a correlation coefficient
proposed by Kendall and Stuart (1961) to measure the association between
two rankings with many ties, appeared particularly suitable for these data,
though it is vulnerable to near perfect nonoccurrence, as exemplified in
Example 1, Table III. Compared to the other T statistics, TC and TG, it has
the property of reaching an intermediate magnitude, and after inspection of
the data, TB was selected as the most appropriate summary statistic of the
three (Table IV).
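The three statistics can be computed directly from an item's contingency table. The sketch below assumes that TB, TC, and TG correspond to what are now usually called Kendall's tau-b, Stuart's tau-c, and Goodman and Kruskal's gamma. The paper does not print the formulas, so the definitions in the code are the standard textbook ones and not necessarily the authors' exact procedure; the function name and table layout are illustrative. Under that assumption, the cell counts read from Examples 1 and 2 of Table III give the values printed there to three decimals.

```python
from math import sqrt
from typing import Dict, List


def ordinal_agreement(table: List[List[int]]) -> Dict[str, float]:
    """Tau-b, tau-c, and gamma for table[b - 1][a - 1] = count of children
    scored a by Examiner A and b by Examiner B (assumed layout)."""
    rows, cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)

    # Count pairs of children ordered the same way (concordant) or the
    # opposite way (discordant) by the two examiners.
    concordant = 0
    discordant = 0
    for b_lo in range(rows):
        for a_lo in range(cols):
            for b_hi in range(b_lo + 1, rows):
                for a_other in range(cols):
                    product = table[b_lo][a_lo] * table[b_hi][a_other]
                    if a_other > a_lo:
                        concordant += product
                    elif a_other < a_lo:
                        discordant += product

    total_pairs = n * (n - 1) // 2
    tied_on_b = sum(t * (t - 1) // 2 for t in (sum(row) for row in table))
    tied_on_a = sum(t * (t - 1) // 2 for t in (sum(col) for col in zip(*table)))
    untied = (total_pairs - tied_on_a) * (total_pairs - tied_on_b)
    if untied == 0 or concordant + discordant == 0:
        raise ValueError("undefined: no pair of children differs on both scores")

    s = concordant - discordant
    m = min(rows, cols)
    return {
        "tau_b": s / sqrt(untied),
        "tau_c": 2 * m * s / (n * n * (m - 1)),
        "gamma": s / (concordant + discordant),
    }


# Cell counts read from Table III; row index 0 is Examiner B's score of 1.
example_1 = [[18, 2, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
example_2 = [[1, 2, 1, 0], [3, 6, 1, 0], [0, 2, 3, 0], [0, 0, 0, 1]]

stats_1 = ordinal_agreement(example_1)  # tau_b 0.548, tau_c 0.109, gamma 1.0
stats_2 = ordinal_agreement(example_2)  # tau_b 0.411, tau_c 0.353, gamma 0.582
```

Example 1 shows the behaviour described in the text: gamma ignores ties and reaches 1.00, tau-c is pulled toward zero by the empty cells, and tau-b lands between the two.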

Table IIIᵃ

Example 1: Reliable but not occurring

               4 |   0    0    0    0
  EXAMINER B   3 |   0    0    0    0      TB = .548
               2 |   0    1    0    0      TC = .109
               1 |  18    2    0    0      TG = 1.00
                     1    2    3    4
                      EXAMINER A

Example 2: Reliable and occurring

               4 |   0    0    0    1
  EXAMINER B   3 |   0    2    3    0      TB = .411
               2 |   3    6    1    0      TC = .353
               1 |   1    2    1    0      TG = .582
                     1    2    3    4
                      EXAMINER A

Example 3: Unreliable

               4 |   1    3    5    9
  EXAMINER B   3 |   0    0    0    0      TB = .012
               2 |   0    0    0    0      TC = .006
               1 |   1    0    0    2      TG = .029
                     1    2    3    4
                      EXAMINER A

ᵃThe frequencies falling into each cell represent the number of children
jointly rated to have that degree of impairment.

It can be seen that the majority of items achieve a TB value of .35,
particularly if they are of the nonoccurring type clustering in the corner of
the diagonal (Example 1, Table III). A few items actually had negative
correlations (sharp disagreement), while a disappointingly small number had
correlations of any magnitude.
Those which did seem satisfactory (an arbitrary level of .40 or more
was employed) came almost exclusively from two circumscribed parts of the
examination, Persistence (particularly 30, 32, 33, and 34), and Repetitive
Movements (notably 38, 39, 41, and 42).
These sections are distinctive in that they provide actual measurements
or quantified values rather than categories based on judgments of
"quality of performance." The Persistence items use values based on
clear-cut time intervals obtained by a stopwatch, while scores on the Repetitive
Movement items are derived from the number of taps. The additional
tapping items, relating to Adventitious Movement and Quality of Movement,
proved to be of questionable reliability. This suggests, as would be
expected, that dimensions which require some subjective global assessment
on the part of the examiner will prove to be less reliable than those that
provide a unit of measurement. Unfortunately, the repetitive movement
items tended to occur in the normal group as well as in the hyperactive and
neurological groups, indicating that the items, as presently graded, may be
too stringent (see Table II).

Table IV. Numbers of Items Falling within Various Correlation Ranges Using TBᵃ

Value of TB    Frequency   Item numbers
.80 to .90         1       34
.70 to .80         1       32
.60 to .70         2       28, 42A
.50 to .60         6       3, 5, 24, 30, 39A, 40B
.40 to .50         8       4, 7, 12, 33, 38A, 40A, 41A, 42C
.30 to .40        12       2, 8, 13, 14, 20, 22, 25, 35, 37A, 38C, 40C, 41C
.20 to .30         6       10, 26, 27, 37C, 38B, 39B
.10 to .20         5       6, 9, 18, 23, 41B
0 to .10           4       15, 16, 36, 39C
-.10 to 0          1       42B
< -.10             6       11, 21, 19, 42B, 43A, 43B

ᵃIt was not stated in Kendall and Stuart (1961) (or a number of other
sources available to the authors) how much variance is accounted for by TB
correlation. If analogous to the product-moment correlation, the variance
accounted for would be equivalent to TB squared.

Contribution of Items to Judgment of Neurological Status: Validity
As an adjunct to the present study, an additional question was added
which asked the examiner to assess the subject's degree of neurological
abnormality as (1) none, (2) equivocal, (3) probable, or (4) definite. The
correlations on this item were very high, indicating that the examiners
usually agreed on the neurological status of the child: TB = .762, TC =
.55, TG = .958. Indeed, the examiners agreed exactly on 17 of the 21
children, while there was only minimal disagreement in the other four.
This high degree of consensus regarding the status of the children is
difficult to reconcile with the generally poor reliability of individual test
items. It seems probable that the examiners were making some global
judgment regarding the subject's status during the course of the lengthy session.
Whether this is due to the "success" of the examination or to the experience
of the examiners can only be considered conjecture. An attempt was made
to assess the validity of the exam by determining the number of signs
observed in each of the groups (Table V). The results were less than
convincing, in that the average number of signs differed only slightly over
groups and the amount of overlap was very substantial indeed.³

Table V. Effectiveness of PANESS in Discriminating Groups: Validity

Group          Median number of signs   Range
               (both examiners)         (number of signs)
Normal           7                       2-13
Hyperactive     11.5                     6-49
Neurological    15                       6-18

Another approach to the validity of the exam might involve its value
in predicting response to drug treatment. It has been stated that hyperactive
children with signs of neurological dysfunction respond better to drug thera-
py than do children with no impairment (Kornetsky, 1970; Satterfield, Cant-
well, Saul, Lesser, & Podosin, 1973; Wender, 1971). Nine of the hyperactive
subjects also participated in a project assessing methylphenidate and
haloperidol (Werry & Aman, 1975). When these children were classified as either
"dysfunctional" or "normal" on the basis of PANESS, the results suggested
that the normals actually responded more favorably (according to parental
reports) to the treatments than did those children with neurological signs.
This can only be very tentatively stated, as the numbers clearly falling into
the categories (three and four, respectively) were exceedingly small, but it
suggests that PANESS may be less than successful as a predictive instrument.
CONCLUSIONS
While the system of examination is certainly largely reliable, it is
definitely so only because of the absence of most of the signs. As with previous
similar efforts (Werry, 1972; Werry et al., 1970), PANESS as an examina-
tion specifically for soft signs appears to be rather less than successful since
less than half the signs appeared in as few as one child in five. Of course, it
is possible that the findings here are due to the sample of children (unlikely
in view of the global neurological findings) or to the inexperience of the
examiners. Whatever the reason, it is imperative that further studies be
done investigating the reliability, diagnostic power, and predictive qualities
of both individual items and the examination as a whole. In the meantime,
despite its official status (Psychopharmacology Bulletin, 1973), PANESS
should be regarded as strictly experimental and quite unproven in value as a
diagnostic or predictive instrument in pediatric psychopharmacology.
³Another study which specifically examined the diagnostic validity of PANESS has recently
come to the attention of the authors. Camp, Bialer, Press, and Winsberg (in press) studied 111
normal and 33 behaviorally deviant boys. Although PANESS showed a developmental trend,
it failed to discriminate between the two groups.
APPENDIX
Abbreviated Summary of PANESS Items
(For complete information, see Close, 1973)

1. Touch finger to nose
2. Touch other finger to nose
3. Touch finger to nose, eyes closed
4. Touch other finger to nose, eyes closed
5. Touch one heel to other heel
6. Repeat with other heel
7. Touch heel to leg, eyes closed
8. Touch other heel to leg, eyes closed

Graphesthesia (palm tracing)
Items 9-16
9. □ Right hand
10. X Left hand
11. 0 Right hand
12. □ Left hand
13. X Right hand
14. 3 Left hand
15. 0 Right hand
16. 3 Left hand

Stereognosis (Object recognition)
Items 17-20
17. Coin Right hand
18. Ring Left hand
19. Safety pin Right hand
20. Key Left hand

Balance
Items 21-26
21. Walk line on toes
22. Walk back on heels
23. Hop on one foot to end of line
24. Hop back on other foot
25. Walk to the end tandem fashion
26. Walk backwards, tandem fashion

Cortical Sensibility
27. Face-hand. Face and/or hand brushed with cotton fluff (eyes closed)
28. Face-Noise. Brush face and/or click ipsilateral ear (eyes closed)
29. Two-point discrimination. 1 cm separation, dorsum of digiti minimi

Persistence Measurements
30. Stick out tongue
31. Raise arms out in front
32. Close eyes
33. Stand on one foot
34. Stand on other foot
35A Close your eyes and stand still
  B Tendency to fall?
36A Repeat 35A, tandem fashion
  B Tendency to fall?

Repetitive Movements
Items 37A-42C
37. Tap, at demonstrated rate, left hand
  A Number of taps
  B Adventitious movements
  C Quality
38. Repeat 37, right hand
  A Number of taps
  B Movements
  C Quality
39. Tap with left foot
  A Number of taps
  B Movements
  C Quality
40. Tap with right foot
  A Number of taps
  B Movements
  C Quality
41. Tap with left finger and foot
  A Number of taps
  B Movements
  C Quality
42. Tap with right finger and foot
  A Number of taps
  B Movements
  C Quality
43. Number of times patient follows five motions (string test)
  A To the left
  B To the right
REFERENCES
Camp, J. A., Bialer, I., Press, M., and Winsberg, B. G. (in press). The physical and neuro-
logical examination for soft signs (PANESS): Pediatric norms and comparisons be-
tween normal and deviant boys. Psychopharmacology Bulletin.
Close, J. Scored neurological examination in pharmacotherapy of children. Psychopharmacology
Bulletin, Special Issue--Pharmacotherapy of Children, 1973, 142-148.
Conners, C. K. A teacher rating scale for use in drug studies with children. American Journal
of Psychiatry, 1969, 6, 152-156.
Conners, C. K. Pharmacotherapy of psychopathology in children. In H. C. Quay & J. S.
Werry (Eds.), Psychopathological disorders of childhood. New York: Wiley, 1972.
Fleiss, J. L., Spitzer, R. L., Endicott, J., & Cohen, J. Quantification of agreement in multiple
psychiatric diagnosis. Archives of General Psychiatry, 1972, 26, 169-171.
Kendall, M. G., & Stuart, A. The advanced theory of statistics (Vol. 2). London: Charles
Griffin & Company Limited, 1961.
Kornetsky, C. Psychoactive drugs in the immature organism. Psychopharmacologia, 1970, 17,
105-136.
Ozer, M. N. The neurological evaluation of school age children. Journal of Learning Dis-
abilities, 1969, 1, 87-84.
Paine, R. S., & Oppe, T. E. Neurological examination of children. Clinics in developmental
medicine (#20/21). London: Heinemann, 1966.
Psychopharmacology Bulletin Pharmacotherapy of children. Special issue, 1973 (No vol-
ume #).
Reitan, R., & Heinemann, C. Interactions of neurological deficits and emotional disturbances
in children with learning disorders. Method for differential assessment. In Learning dis-
orders 111. Seattle: Special Child Publication, 1969. Pp. 93-136.
Rutter, M., Graham, P., & Yule, W. A neuropsychiatric study in childhood. Clinics in devel-
opmental medicine. (#35/36). London: Heinemann, 1970.
Satterfield, J. H., Cantwell, D. P., Saul, R. E., Lesser, L. I., & Podosin, R. L. Response to
stimulant drug treatment in hyperactive children: Prediction from EEG and Neurolog-
ical findings. Journal of Autism and Childhood Schizophrenia, 1973, 3, 36-48.
Sprague, R., Christensen, D., & Werry, J. Experimental psychology and stimulant drugs. In
C. K. Conners (Ed.), Clinical use of stimulant drugs in children. Amsterdam: Excerpta
Medica, 1974.
Wender, P. Minimal brain dysfunction in children. New York: Wiley, 1971.
Werry, J. S. Studies on the hyperactive child IV--an empirical analysis of the syndrome of
minimal brain dysfunction. Archives of General Psychiatry, 1968, 19, 9-16.
Werry, J. S. Organic factors in psychopathology. In H. C. Quay & J. S. Werry (Eds.),
Psychopathological disorders of childhood. New York: Wiley, 1972.
Werry, J. S., & Aman, M. G. Methylphenidate and haloperidol in children. Effects on atten-
tion, memory, and activity. Archives of General Psychiatry, 1975, 32, 790-795.
Werry, J. S., Minde, K., Guzman, A., Weiss, G., Dogan, K., & Hoy, E. Studies on the hyper-
active child VII: Neurological status compared with neurotic and normal children.
American Journal of Orthopsychiatry, 1970, 42, 441-451.
