
The Development and Well-Being Assessment: Description and Initial Validation of an Integrated Assessment of Child and Adolescent Psychopathology
Article in Journal of Child Psychology and Psychiatry, August 2000
Impact Factor: 6.46; DOI: 10.1111/j.1469-7610.2000.tb02345.x; Source: PubMed


J. Child Psychol. Psychiat. Vol. 41, No. 5, pp. 645-655, 2000
Cambridge University Press
© 2000 Association for Child Psychology and Psychiatry
Printed in Great Britain. All rights reserved
0021-9630/00 $15.00 + 0.00

Robert Goodman, Tamsin Ford, and Hilary Richards
Institute of Psychiatry, London, U.K.

Rebecca Gatward and Howard Meltzer


Office for National Statistics, London, U.K.
The Development and Well-Being Assessment (DAWBA) is a novel package of questionnaires, interviews, and rating techniques designed to generate ICD-10 and DSM-IV
psychiatric diagnoses on 5-16-year-olds. Nonclinical interviewers administer a structured
interview to parents about psychiatric symptoms and resultant impact. When definite
symptoms are identified by the structured questions, interviewers use open-ended questions
and supplementary prompts to get parents to describe the problems in their own words.
These descriptions are transcribed verbatim by the interviewers but are not rated by them.
A similar interview is administered to 11-16-year-olds. Teachers complete a brief
questionnaire covering the main conduct, emotional, and hyperactivity symptoms and any
resultant impairment. The different sorts of information are brought together by a computer
program that also predicts likely diagnoses. These computer-generated summary sheets and
diagnoses form a convenient starting point for experienced clinical raters, who decide
whether to accept or overturn the computer diagnosis (or lack of diagnosis) in the light of
their review of all the data, including transcripts. In the present study, the DAWBA was
administered to community (N = 491) and clinic (N = 39) samples. There was excellent
discrimination between community and clinic samples in rates of diagnosed disorder. Within
the community sample, subjects with and without diagnosed disorders differed markedly in
external characteristics and prognosis. In the clinic sample, there was substantial agreement
between DAWBA and case note diagnoses, though the DAWBA diagnosed more comorbid
disorders. The use of screening questions and skip rules greatly reduced interview length by
allowing many sections to be omitted with very little loss of positive information. Overall, the
DAWBA successfully combined the cheapness and simplicity of respondent-based measures
with the clinical persuasiveness of investigator-based diagnoses. The DAWBA has
considerable potential as an epidemiological measure, and may prove to be of clinical value
too.
Keywords: Diagnosis, epidemiology, interviewing, mental health, methodology.
Abbreviations: ADHD: attention deficit hyperactivity disorder; CAPA: Child and Adolescent Psychiatric Assessment; DAWBA: Development and Well-Being Assessment;
DISC: Diagnostic Interview Schedule for Children; NOS: not otherwise specified; SDQ:
Strengths and Difficulties Questionnaire.

Introduction
The Development and Well-Being Assessment
(DAWBA) is an integrated package of measures of child
and adolescent psychopathology. It was initially designed
for a nationwide epidemiological survey of common
emotional and behavioural disorders in a representative
sample of over 10,000 British children and adolescents,
with the primary aim of informing the planning and
provision of services for affected children (Meltzer,
Gatward, Goodman, & Ford, 2000). Drawing upon
previous research findings, several related considerations
influenced the design of the DAWBA.

The Need to Measure Impact as Well as Symptoms


Defining psychiatric disorder solely in terms of psychiatric symptoms can result in implausibly high caseness
rates. For example, Bird et al. (1988) estimated from their
epidemiological study that 49.5% of Puerto Rican
children aged between 4 and 16 years met criteria for at
least one DSM-III diagnosis. As Bird et al. (1990) noted,
many of the children who were eligible for DSM-III
diagnoses were not significantly socially impaired by their
symptoms, did not seem in need of treatment, and did not
correspond to what clinicians would normally recognise
as cases. This underlines the importance of defining
psychiatric disorders not only in terms of symptom
constellations, but also in terms of significant impact.
Including impact criteria can dramatically alter prevalence estimates. For example, in the Virginia Twin Study,
the population prevalence of DSM-III-R disorder was
41.8% as judged by symptoms alone, falling to 11.4%
when impairment criteria were included (Simonoff et al.,
1997). In DSM-IV (American Psychiatric Association,
1994), most of the common child psychiatric disorders
are now defined in terms of impact as well as symptoms;
operational criteria stipulate that symptoms must result
either in substantial distress for the child or in significant
impairment in the child's ability to fulfil normal role
expectations in everyday life. This same requirement for
impact, in terms of significant distress or social incapacity, characterises the diagnostic criteria employed in
the research version of ICD-10 (World Health Organisation, 1994).

Requests for reprints to: Professor Robert Goodman, Department of Child and Adolescent Psychiatry, Institute of
Psychiatry, De Crespigny Park, London SE5 8AF, U.K.

The Need for Multiple Informants


There are two main reasons why a comprehensive
assessment of child psychopathology depends on information about the child's behaviour both at home and
at school. First, some diagnoses, most notably hyperkinesis (World Health Organisation, 1994) and attention
deficit hyperactivity disorder (ADHD; American Psychiatric Association, 1994), can only be made when
there is evidence that the disorder is present in two or
more settings, usually home and school. Second, other
behavioural problems may be highly situational, e.g.
severe conduct problems may be present at school but not
at home, or vice versa. The school perspective is also
important because troubled children and adolescents
commonly obtain help through the school system rather
than through mental health services (Burns et al., 1995).
Although it is possible to ask parents whether teachers
have complained of problems at school, it is clearly
preferable not to rely on such hearsay evidence but to
collect information directly from teachers as well as
parents. Young people's self-reports can provide a
valuable third source of information. For example,
teenagers may describe worries or antisocial activities
that they have successfully hidden from the adults around
them.

Respondent-based Measures Are Easier to Administer
In a nationwide epidemiological study, it is much easier
and cheaper to use questionnaires and structured interviews (administered by lay interviewers) to obtain respondent-based information on symptoms and impact
than to use semistructured interviews (administered by
clinical or highly trained interviewers) to obtain investigator-based information. Questionnaires are particularly
suitable for teachers, who are often unable to spare the
time for interviews, especially for long investigator-based
ones.

Clinically Informed Ratings Enhance Validity and Clinical Relevance
An exclusive reliance on respondent-based information
is liable to undermine validity and clinical relevance;
clinically informed ratings are particularly useful at three
points in the diagnostic process:
(1) Clarifying symptoms and impact. Respondents do
not necessarily understand the wording of questions, and
even if they do, their answers may reflect unrealistically
high (or low) expectations of what can normally be
expected of children at any particular age. Giving
respondents the opportunity to describe any possible
problems in their own words will often enable a clinically
informed rater to recognise misunderstandings or unrealistic standards.
(2) Combining information from different informants.
Having obtained information from multiple sources, how
can the diagnostic process integrate and reconcile conflicting information on symptoms and impact? There are
serious problems with many of the standard approaches.
One approach is to use a priori rules about whom to
believe, e.g. stipulating that teenagers are always better
than their parents at knowing if they are anxious or
depressed. Clinicians are likely to be suspicious of any
such rules, preferring to prioritise different informants
according to circumstances. For example, parents may
give a convincing account of their child being depressed,
whereas the child may insist that everything is fine,
whether out of bravado or a desire to be left alone.
Clinicians often choose to judge which informant to
believe from the quality of the narrative and other subtle
clues. A previous study has shown that clinically informed
raters can synthesise multi-source information in a way
that is both reliable and valid (Goodman, Yude,
Richards, & Taylor, 1996).
A second approach to combining multi-source information is to believe anyone who reports a positive
symptom or impairment. This is sometimes known as the
"OR" rule, since symptom X is deemed to be present if a
parent or a teacher or the young person reports it.
Unfortunately, this approach is bound to push up
apparent rates of disorder since false positive answers
always take priority over true negative ones. If you asked
enough informants (parents, grandparents, neighbours,
friends, teachers, and so on), then many symptoms would
be present according to someone.
A third approach is to abandon any attempt at
integration and simply report rates of disorders separately
according to the type of informant, e.g. reporting that the
rate of depressive disorders in teenagers is, say, 20% by
their own account, 5% by their parents' account, and 1%
by their teachers' account. This approach fails to address
some of the key concerns of clinicians and service
planners. Clinicians often need to make binary decisions (such as whether to treat or not to treat) in the
face of conflicting accounts. Service planners need to
know whether to provide depression clinics for 1% or
20% of the teenage population.
(3) Assigning not otherwise specified (NOS) diagnoses. Despite having psychiatric symptoms that result
in distress and social impairment, some children do not
meet the full criteria for an operationalised diagnosis
such as ADHD, separation anxiety disorder, or oppositional defiant disorder. With clinical judgement, these
children can be assigned nonoperationalised diagnoses,
e.g. ADHD, NOS; anxiety disorder, NOS; disruptive
behavior disorder, NOS. A substantial minority of
children with psychiatric disorders may fall between the
cracks of the operationalised diagnostic categories
(Angold, Costello, Farmer, Burns, & Erkanli, 1999;
Goodman et al., 1996). Using clinical judgement to
recognise nonoperationalised disorders need not undermine reliability and validity (Goodman et al., 1996).
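
The inflation produced by the OR rule described under point (2) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that every informant gives a false positive answer independently and with the same probability; the 5% rate and the function names are hypothetical and are not taken from the study.

```python
def or_rule(reports):
    """OR rule: the symptom counts as present if any informant reports it."""
    return any(reports)


def or_rule_false_positive_rate(per_informant_rate, n_informants):
    """Chance that a symptom-free child is counted as positive under the OR rule.

    Assumes (hypothetically) independent informants who each give a false
    positive answer with probability `per_informant_rate`.
    """
    return 1.0 - (1.0 - per_informant_rate) ** n_informants


if __name__ == "__main__":
    # Illustrative 5% false positive rate per informant (an assumption).
    for k in (1, 2, 3, 6):
        rate = or_rule_false_positive_rate(0.05, k)
        print(f"{k} informant(s): {rate:.1%} of symptom-free children flagged")
```

Under these assumptions the combined false positive rate can only rise as informants are added, which is the point made in the text about asking "enough informants".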

A Focus on the Present State


There are two main reasons why it is preferable to ask
about the present and the recent past rather than about
the child's lifetime history of emotional and behavioural
problems. First, enquiries about long time frames are
generally of unsatisfactory validity (Tanur, 1992), and
one recent investigation into methods for assessing the
prevalence of child and adolescent mental disorders
recommended focusing enquiries on the last month if
possible (Shaffer et al., 1996). Second, for a study focusing
on service planning, the need for services is clearly related
to ongoing problems rather than to problems that have
long since resolved.

The Prevalence of Uncommon Disorders Requires a Different Approach
Even with a sample size of over 10,000 children from
the general community, it is not possible to generate
precise estimates of less common disorders such as
psychosis or selective mutism because these are only
likely to affect a handful of children in the sample.
Consequently, it is not appropriate to devote a lot of
interview time asking about these less common disorders.
Predicting the service need for rare but severe disorders is
likely to require a very different strategy, such as
surveying all the clinics and clinicians in a region.

Method
Overview
The DAWBA measures were administered along with independent measures of mental health and service provision to
samples of 5-15-year-olds drawn from the community and from
psychiatric clinics. All assessments were carried out between
January and March, avoiding the autumn term when teachers
do not yet know their pupils well, and avoiding the summer
term when teachers are often caught up in end-of-year examinations. The diagnosis of clinic cases was independently established by a review of case notes. Most of the community
sample was followed up by questionnaire between 4 and
6 months later.

Community Sample
In order to pilot a nationwide survey of children and teenagers
aged between 5 and 15, the Office for National Statistics used
experienced nonclinical interviewers to carry out a survey of
children drawn from 12 different areas in England and Scotland.
Children in each area were identified from child benefit records;
child benefits are available without means testing and are
claimed on behalf of around 98% of British children. Parents of
a random sample of children were invited to participate via the
Child Benefit Office, and 5% opted out at this stage. Of the
remainder, 471 participated in the pilot study, representing
82% of those approached (15% refusal, 3% noncontacts). The
pilot community sample consisted of these 471 individuals plus
an additional 20 community subjects who had participated a
year earlier in a pre-pilot survey, having been located through
household sampling (Goodman, Meltzer, & Bailey, 1998). A
parent interview was available for all but 1 of the 491 community
subjects. There were 207 subjects aged between 11 and 15, of
whom 201 (97%) were interviewed. Nearly all 491 families gave
their permission for a teacher to be contacted by postal
questionnaire: completed teacher questionnaires were returned
on 353 children (72%). For the community sample as a whole,
the mean age (SD) was 9.9 years (3.2), and 51% were male.
Although the pilot sample was drawn from 12 areas chosen to
provide a good geographical spread while being fairly representative, the sample was not selected and weighted to provide
an unbiased estimate of the true prevalence of psychiatric
disorder in British 5-15-year-olds; better estimates are available from the subsequent survey of over 10,000 children
(Meltzer et al., 2000).

Psychiatric Clinic Sample


In parallel with the pilot study of community subjects, nonclinical interviewers from the Office for National Statistics
assessed 39 subjects recruited from 3 child and adolescent
mental health clinics in Manchester and London. These clinic
subjects had all had a clinical assessment and nearly all were still
receiving treatment from the clinic when reassessed as part of
the current study. A parent interview was available for all clinic
subjects. There were 20 subjects aged between 11 and 15, of
whom 16 (80%) were interviewed. Completed teacher questionnaires were obtained on 17 children (44%). For the clinic
sample as a whole, the mean age (SD) was 11.0 years (2.6), and
79% were male. The community and clinic samples differed
significantly in age [t(527) = 2.1, p < .05] and gender [continuity-adjusted χ²(1) = 10.5, p < .001]. Although these gender
and age differences were not taken into account in the analyses
reported here, the pattern of findings was not altered when
analyses were repeated after stratifying the sample by age or
gender.

The Development and Well-Being Assessment (DAWBA)
The DAWBA involves four components: a parent interview,
an interview for young people aged 11 or more, a teacher
questionnaire, and a computer-assisted clinical diagnostic
rating based on the interviews and questionnaires. The measures
were designed with 5-16-year-olds in mind, though the measures
were not applied to 16-year-olds in the present study since
a previous nationwide survey of adult mental health had already included 16-year-olds (Meltzer, Gill, Petticrew, & Hinds,
1995). The DAWBA interviews and questionnaires are available
from http://www.iop.kcl.ac.uk/IoP/Departments/ChildPsy/dawba/intro.stm
along with a more detailed account of the
measures. As an indication of the length of the measures, the
paper version of the parent interview is 36 sides long and takes
around 50 minutes to administer to a community sample
(provided the skip rules described below are in use). The
corresponding youth interview is 33 sides long and takes around
30 minutes to administer to a community sample. The teacher
questionnaire is four sides long. In the present study, the
interviews were computer assisted, with the interview being
programmed in Blaise (Statistics Netherlands). The survey
interviewers had no experience of child psychiatric surveys
beyond a 1-day introduction to the field; all the interviewers
found the interviews challenging but feasible and interesting,
and all were keen to participate in the subsequent main stage
study. The parents and young people who were interviewed
were also generally very positive about the study.
The DAWBA parent interview. The interview covers several
disorders in detail: separation anxiety, specific and social
phobias, post-traumatic stress disorder, obsessive compulsive
disorder, generalised anxiety, major depression, hyperkinesis/ADHD,
and conduct-oppositional disorders. For each of these
disorders, the interview asks about all the symptoms and other
criteria needed for an operationalised diagnosis according to
both DSM-IV (American Psychiatric Association, 1994) and
the research diagnostic version of ICD-10 (World Health
Organisation, 1994). Panic disorder, agoraphobia, autistic
disorders, eating disorders, tic disorders, and any other concerns
are covered more briefly, with clinical diagnoses of these
disorders being correspondingly more dependent on rating the
open-ended transcript.
The time frame of the interview is the present and the recent
past. For many disorders, the ICD-10 and DSM-IV diagnostic
criteria stipulate that the symptoms need to have persisted for a
specified number of months, e.g. a minimum of 6 months for
hyperactivity, oppositional-defiant disorder, and generalised
anxiety disorders. In these instances, the relevant section of the
DAWBA interview focuses on the child's symptoms over this
stipulated period. The time frame is longest for conduct disorder
(since DSM-IV criteria include the number of relevant behaviours displayed over the previous 12 months), and shortest
for most of the emotional disorders, where the focus is on the
last month, in line with previous recommendations (Shaffer et
al., 1996).
The interview incorporates skip rules that allow the
interviewer to omit many of the questions in a section unless
enough screening questions are positive. When the skip rules do
not operate, respondents are asked about all relevant ICD-10
and DSM-IV symptoms. Unless at least one of these symptoms
is definitely present (or two symptoms in the case of the
hyperactivity section), the final interview questions about
duration, onset, and impact are omitted. The skip rules were
formulated and refined during the pre-pilot study. To determine
if these skip rules worked satisfactorily in an independent
sample, the interviews in the current study were administered
without skip rules on around half of the community subjects
and on all the clinic subjects (N = 262). This made it possible to
examine the number of instances in which a positive section
would have been inappropriately omitted had the skip rules
been operational. An interview section was counted as positive
if the responses to that section met the ICD-10 or DSM-IV
criteria for a disorder. The section was also counted as positive
if the respondent reported subthreshold symptoms and impact
but the clinical reviewer subsequently assigned the relevant
diagnosis.
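
The two-stage skip logic described above, and the check on how many positive sections the skip rules would have missed, can be sketched roughly as follows. This is a minimal illustration and not the DAWBA's actual Blaise program: the class and function names are hypothetical, and the number of positive screening questions needed to continue a section is an assumed parameter, since the text does not state it. Only the symptom thresholds (one definite symptom, or two for hyperactivity) come from the text.

```python
from dataclasses import dataclass


@dataclass
class SectionRules:
    """Skip-rule settings for one interview section."""
    min_screen_positives: int        # assumed value; not given in the text
    min_definite_symptoms: int = 1   # from the text: 1, or 2 for hyperactivity


def blocks_administered(rules, screen_positives, definite_symptoms=0):
    """Return the blocks of a section that would be asked under the skip rules.

    `definite_symptoms` only matters once the full symptom list is asked.
    """
    blocks = ["screening"]
    if screen_positives < rules.min_screen_positives:
        return blocks  # too few positive screening questions: skip the rest
    blocks.append("all_symptoms")
    if definite_symptoms >= rules.min_definite_symptoms:
        blocks.append("duration_onset_impact")
    return blocks


def missed_positive_sections(sections):
    """Count positive sections that the skip rules would have omitted.

    `sections` holds (rules, screen_positives, section_positive) tuples taken
    from interviews that were administered WITHOUT skip rules.
    """
    return sum(
        1
        for rules, screen_positives, section_positive in sections
        if section_positive and screen_positives < rules.min_screen_positives
    )


if __name__ == "__main__":
    hyperactivity = SectionRules(min_screen_positives=2, min_definite_symptoms=2)
    print(blocks_administered(hyperactivity, screen_positives=1))
    print(blocks_administered(hyperactivity, screen_positives=3, definite_symptoms=2))
```

Running the second function over interviews given without skip rules mirrors the validation design described in the text: a section counts as missed only if it was positive and its screening answers would have triggered a skip.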
In the presence of positive symptoms in any domain, parents
are asked supplementary questions about the impact of these
problems on the child's life. These domain-specific impact
questions cover resultant distress and interference with family
life, learning, friendships, and leisure activities.
The information elicited by the structured questions about
symptoms and impact is supplemented by semistructured
information. If definite symptoms are identified by the structured
questions, interviewers are instructed to use open-ended questions and supplementary prompts to get the respondent to
describe the problems in their own words. These descriptions
are transcribed verbatim by the interviewers but are not rated
by them. Interviewers are also encouraged to provide additional
comments, where appropriate, on the respondent's understanding and motivation.
The DAWBA interview for 11-16-year-olds. In most respects, the interview for 11-16-year-olds is exactly the same as
the interview for parents, except that it is in the first rather than
the third person. The sections on hyperactivity and oppositionality are much abbreviated, since previous work suggests that
youth self-report in these domains is of very limited validity
(Schwab-Stone et al., 1996). Conversely, more questions about
panic attacks were asked of young people than of parents since
the fleeting and largely subjective nature of these symptoms
makes self-report far more relevant than informant accounts. A
lower limit of 11 years stemmed from previous studies showing
that symptoms are not reliably reported by younger children
(Fallon & Schwab-Stone, 1994; Schwab-Stone, Fallon, Briggs,
& Crowther, 1994) and from similar experience during prepiloting of the DAWBA with 8-10-year-olds.
The DAWBA questionnaire for teachers. The teacher questionnaire covers the inattentive, impulsive, hyperactive, and
oppositional-conduct behaviours relevant to ICD-10 and DSM-IV diagnostic criteria, and also asks about common emotional
symptoms and any other concerns. Reports of definite problems
in the hyperactivity, conduct, or emotional domains are
followed by supplementary questions on the impact of these
problems on the child's life. These domain-specific impact
questions cover resultant distress and interference with learning
and peer relationships. There are free text sections throughout
the questionnaire for descriptions of problems or additional
concerns.
Computer-assisted clinical diagnosis. Experienced clinicians
review the data from all sources (structured interviews, questionnaires, and transcripts from parents, young people, and
teachers) before assigning each child ICD-10 and DSM-IV
diagnoses (or no diagnosis). For ease and speed of rating, the
different sorts of information are brought together by computer.
The raters are also assisted by computerised diagnostic algorithms that determine whether the child meets the operationalised criteria for the commoner ICD-10 and DSM-IV
diagnoses, as judged from the respondents' answers to structured questions. The computer diagnoses are not definitive, but
simply form a convenient starting point for the clinical raters
who decide whether to accept or overturn the computer
diagnosis (or lack of diagnosis) in the light of their review of all
the data, including transcripts.
The clinical raters perform four major tasks. First, they use
the transcripts to check whether respondents appear to have
understood the fully structured questions. Second, they decide
which informant to believe when presented with conflicting
information. Third, they assign a not otherwise specified
diagnosis when the child has clinically significant problems that
do not meet operationalised diagnostic criteria. Fourth, they
use information from the transcripts to diagnose less common
disorders such as anorexia nervosa or Tourette syndrome. In
the current study, the clinical ratings were done by two
experienced child psychiatrists (HR, RG), who discussed all
children with complex or borderline diagnoses before reaching
a consensus diagnosis.

External Validating Characteristics


Various measures of the community sample were obtained
independently of the DAWBA measures. Several of these
measures served as external validators of the DAWBA diagnosis. Parents, teachers, and young people over the age of 11
all completed the extended version of the Strengths and
Difficulties Questionnaire (SDQ; Goodman, 1999). This version
of the SDQ includes a question that asks respondents if they
think that the child has difficulties in one or more of the
following areas: emotions, concentration, behaviour, or being
able to get on with other people. Possible response categories
are "No", "Yes - minor difficulties", "Yes - definite difficulties", and "Yes - severe difficulties". The "definite" and "severe"
difficulties categories were combined to form the basis for three
variables: "Parents say there is a problem", "Teacher says
there is a problem", and "Young person says there is a
problem". Parent, teacher, and self-completed SDQs were also
used to generate emotional, conduct, and hyperactivity scores
in the standard manner (Goodman, 1997; Goodman et al.,
1998). The parent interview included questions on recent
consultations with child and adolescent psychiatrists, psychologists, psychotherapists, or psychiatric nurses. Consultations
with one or more of these professionals formed the basis for the
variable "Mental health care provided". The teacher questionnaire asked whether the child had received any specific
help for emotional or behavioural problems from teachers,
educational psychologists, or other professionals working
within the school setting during that school year. This question
formed the basis for the variable "School help provided".

Clinical Case Note Diagnosis


An experienced child psychiatrist (TF) who was blind to the
DAWBA findings reviewed the case notes of all 39 children
from the psychiatric clinic sample to determine which diagnoses,
if any, the child would have met at the time when the DAWBA
measures were administered. Because of small cell sizes, case
note diagnoses were merged into three categories for most
analyses: hyperkinesis or ADHD; oppositional or conduct
disorders; and emotional disorders (anxiety and depressive
disorders). For each category, disorders were rated as absent,
possible, or definite. Using this system, 2 researchers independently rated 20 case notes from 1 of the clinics used in the
present study; the kappa coefficients were .93 for hyperkinesis/ADHD,
1.0 for oppositional-conduct disorders, and .67 for
emotional disorders (Goodman, Renfrew, & Mullick, unpublished data). In the present study, DAWBA and case note
diagnoses were not compared until each had been finalised
independently of the other. At this stage, agreement was
examined for each broad-band diagnosis. Overall agreement
between the detailed DAWBA and case note formulations was
also examined, being rated on a 3-point scale: 0 = poor
agreement, 1 = partial agreement, and 2 = substantial or total
agreement. When 2 psychiatrists independently made this rating
on all 39 clinic cases, the kappa was .64.
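
For readers unfamiliar with the kappa coefficients quoted above, the following minimal Python sketch shows how an unweighted Cohen's kappa is computed for two raters. The text does not say whether the reported kappas were weighted or unweighted, so the unweighted form is an assumption, and the example ratings are invented for illustration and are not study data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical ratings on the 3-point scale (0 = absent, 1 = possible,
    # 2 = definite); these values are invented for illustration.
    rater_1 = [0, 0, 1, 1, 2, 2]
    rater_2 = [0, 0, 1, 2, 2, 2]
    print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred to simple percentage agreement when categories are unevenly used.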

Questionnaire Follow-up
Between 4 and 6 months after the original assessment, a
postal copy of the SDQ was sent to the 471 parents who had
originally been recruited into the community sample via child
benefit records. Complete SDQ data was available both initially
and at follow-up on 350 individuals (74%). The total difficulties
score was calculated in the standard manner (Goodman, 1997).

Results
Comparing Community and Clinic Samples
The first analytic strategy used to examine the validity
of the DAWBA involved a comparison of the clinical and
community samples on rates of DAWBA-diagnosed
disorders. The only assumption underlying this comparison was that the true rate of psychiatric disorder was
substantially higher in the clinic than in the community
sample, so that demonstrating contrasting rates with the
DAWBA measures would support their validity. It was
not necessary to assume that all clinic cases still had a
psychiatric disorder at the time of the DAWBA assessment, nor was it necessary to assume that community
cases were free from all psychiatric disorder. As shown in
Table 1, there were marked differences between the rates
of DAWBA-diagnosed disorders in the community and
clinic samples, with odds ratios between 13 and 102. At
least one ICD-10 or DSM-IV disorder was diagnosed in
11% of the community sample as compared with 92% of
the clinic sample. This corresponds to a minimum
estimate of 89% specificity in the community sample and
92% sensitivity in the clinic sample (based on the extreme
and implausible assumption that all of the community
sample with DAWBA diagnoses were false positives and
all of the clinic sample without psychiatric diagnoses were
false negatives).

The Community Sample
The second approach to validation considered only the
community sample. If the DAWBA measures were valid,
then individuals with and without DAWBA diagnoses
should differ in predictable ways on independent
measures. Children with a DAWBA diagnosis were
predicted to be substantially more likely to be known to
child mental health professionals, to be receiving help for
emotional or behavioural problems at school, and to be
judged to have a psychiatric problem by parents, teachers,
or the young people themselves. All these predictions
were confirmed with odds ratios of between 8 and 27
(Table 2). Children with different categories of DAWBA
diagnoses (emotional, conduct, or hyperactivity disorders) were predicted to have contrasting profiles of
SDQ scores in these domains. This was indeed the case:
for each disorder and each class of rater (parent, teacher,
self), the highest SDQ scores were in the predicted domain

Table 1
Rates of DAWBA Diagnoses in Community and Clinic Children

                                      Psychiatric clinic    Community
                                      sample (N = 39)       sample (N = 491)    Odds ratio
Any disorder (a)                      92.3% (36)            10.6% (52)          101.3
Anxiety disorder (a)                  43.6% (17)             5.5% (27)           13.3
Major depressive disorder (a)         20.5% (8)              0.8% (4)            31.4
Conduct-oppositional disorders (a)    46.2% (18)             3.5% (17)           23.9
Hyperkinesis (b)                      41.0% (16)             1.4% (7)            48.1
ADHD (c)                              48.7% (19)             2.4% (12)           37.9

(a) ICD-10 or DSM-IV.
(b) ICD-10.
(c) DSM-IV.
p < .001 for all comparisons of clinic and community sample (continuity-adjusted χ²).
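
The odds ratios in Table 1 and the minimum specificity and sensitivity quoted in the text can be checked from the published counts. The sketch below is a minimal illustration: the function and variable names are hypothetical, and it uses the plain unadjusted odds ratio, which is an assumption about how the published values were computed (it does reproduce them to one decimal place).

```python
def odds_ratio(cases_a, n_a, cases_b, n_b):
    """Odds of diagnosis in group A divided by odds of diagnosis in group B."""
    odds_a = cases_a / (n_a - cases_a)
    odds_b = cases_b / (n_b - cases_b)
    return odds_a / odds_b


# Counts from Table 1: (diagnosed in clinic sample, diagnosed in community sample).
CLINIC_N, COMMUNITY_N = 39, 491
TABLE_1 = {
    "Any disorder": (36, 52),
    "Anxiety disorder": (17, 27),
    "Major depressive disorder": (8, 4),
    "Conduct-oppositional disorders": (18, 17),
    "Hyperkinesis": (16, 7),
    "ADHD": (19, 12),
}

# Worst case described in the text: every community diagnosis is a false
# positive and every undiagnosed clinic child is a false negative.
min_specificity = 1 - TABLE_1["Any disorder"][1] / COMMUNITY_N
min_sensitivity = TABLE_1["Any disorder"][0] / CLINIC_N

if __name__ == "__main__":
    for label, (clinic_cases, community_cases) in TABLE_1.items():
        ratio = odds_ratio(clinic_cases, CLINIC_N, community_cases, COMMUNITY_N)
        print(f"{label}: odds ratio = {ratio:.1f}")
    print(f"Minimum specificity = {min_specificity:.0%}")
    print(f"Minimum sensitivity = {min_sensitivity:.0%}")
```

The same function applies to Table 2, where denominators vary by informant; for example, `odds_ratio(20, 52, 12, 438)` gives approximately 22.2 for "Parents say there is a problem".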

Table 2
Independent Correlates of a DAWBA Diagnosis in the Community Sample

                                        DAWBA                 No DAWBA
                                        diagnosis (N = 52)    diagnosis (N = 439)    Odds ratio
Parents say there is a problem          38.5% (20/52)         2.7% (12/438)          22.2
Teachers say there is a problem         50.0% (18/36)         7.3% (23/317)          12.8
Young person says there is a problem    25.0% (6/24)          4.1% (7/177)            8.1
Mental health care provided             26.9% (14/52)         1.4% (6/438)           26.5
School help provided                    41.6% (15/36)         5.7% (18/317)          11.9

p < .001 for all comparisons of children with and without DAWBA diagnoses (continuity-adjusted χ²).

Figure 1. SDQ profiles of community children with different DAWBA diagnoses. [Panel label: Emotional disorders]

(Fig. 1). Children with DAWBA diagnoses were predicted
to have more persistent problems than comparably
symptomatic children without diagnoses, as confirmed by
the results shown in Fig. 2. Of the 350 children who were
assessed using parent SDQs both initially and at follow-up 4-6 months later, 35 had a DAWBA diagnosis; as a
group, their SDQ total difficulties score did not fall with
time. Of the remaining children without DAWBA diagnoses, there were 63 children who scored in the top 20%
on total difficulties score; as a group, they regressed
substantially towards the mean. The lack of regression
towards the mean in the DAWBA-diagnosed group was
significant, with the presence or absence of a DAWBA
diagnosis predicting Time 2 score after covarying for
Time 1 score (p < .002).

The Clinic Sample


The third approach to validation considered only the
clinic sample. How far did the DAWBA and case note

Figure 2. DAWBA-identified caseness predicts persistence of problems. [Plot of mean parent-rated SDQ score at the initial survey and at the 4-6 month follow-up for three groups: diagnosed disorder (N = 35); no disorder, high scorers (N = 63); no disorder, low scorers (N = 252).]

Table 3
DAWBA and Case Note Diagnoses on the Clinic Sample
Case note diagnosis
DAWBA diagnosis
Emotional disorders (ICD-10 or DSM-IV)
Absent

Present
χ²(1) for trend = 11.6, p = .001; Kendall's tau b = 0.52
Conduct-oppositional disorders (ICD-10 or DSM-IV)
Absent

Present

Absent

Possible

Definite

16
6

11
2

9
10

16
3

3
1

1
15

χ²(1) for trend = 9.4, p = .002; Kendall's tau b = 0.47

ADHD-hyperkinetic disorders (ICD-10 or DSM-IV)


Absent
Present
χ²(1) for trend = 20.3, p = .001; Kendall's tau b = 0.70

diagnoses coincide? The underlying assumption was that
if the DAWBA were a valid measure, there should be
substantial overlap between the DAWBA formulation
and the independent formulation of a good psychiatric
clinic. The diagnosis based on case note review was not
considered the "gold standard" that could form the basis
for calculating sensitivity and specificity. This was partly
because the clinical assessments were not standardised
and were carried out by a variety of professionals of
different levels of seniority. In addition, case notes were
often insufficiently detailed for a "definite" rating of
disorder, leading the researcher to opt in many cases for
a "possible" rating instead; this was particularly a
problem for comorbid diagnoses since case notes often
focused on what the clinician regarded as the primary
diagnosis.
Table 3 shows the cross-tabulation of DAWBA and
case note diagnoses for each of the three main diagnostic
groupings: emotional disorders, conduct disorders, and
hyperkinesis/ADHD. Review of the psychiatric clinic
notes suggested a definite diagnosis in 1 of these areas in
30 instances; the DAWBA diagnosed the same disorder
in 28 instances (93%). However, the DAWBA also
diagnosed disorders that were not rated as definitely
present on the basis of a case note review; 17 of these
28 "false positives" had been rated from the case notes as
having possible rather than absent disorders. Furthermore, 19 of the DAWBA's 28 "false positives" were of
comorbid diagnoses, i.e. the DAWBA agreed with the
principal diagnosis reported in the case notes but also
diagnosed 1 or more additional disorders. Overall agreement between the DAWBA and case note formulations
on the 39 clinic cases was classified as substantial or total
in 49% (19/39), partial in 46% (18/39), and poor in 5%
(2/39). The DAWBA correctly identified all six children
with case note diagnoses of less common disorders:


Table 4
Efficacy of Skip Rules

                                             Positive cases   Proportion who skip section
                                             missed^a with    ---------------------------
Disorder                         Informant   skip rules       Community^b    Clinic^c
Separation anxiety               Parent      0/8              78%            31%
                                 Child       0/2              77%            56%
Specific phobia                  Parent      2/14             77%            33%
                                 Child       0/2              72%            37%
Social phobia                    Parent      0/9              79%            31%
                                 Child       0/4              79%            56%
Post-traumatic stress disorder   Parent      0/5              93%            77%
                                 Child       0/4              91%            62%
Obsessive-compulsive disorder    Parent      1/3              83%            33%
                                 Child       0/3              79%            50%
Generalised anxiety              Parent      0/14             77%            36%
                                 Child       0/7              74%            37%
Depression                       Parent      0/7              61%            26%
                                 Child       0/5              45%            31%
ADHD/Hyperkinesis                Parent      1/23             73%            10%
Oppositional-defiant disorder    Parent      1/16             76%            23%
Conduct disorder                 Parent      0/9              91%            28%
                                 Child       1/7              71%            62%

Average                                      4% (6/142)       76%            40%

a Missed cases are those where a positive section would have been skipped had the skip rules
been applied.
b Community sample = 223 subjects: all parents interviewed without skip rules; 91 young
people were interviewed without skip rules.
c Psychiatric clinic sample = 39 subjects: all parents interviewed without skip rules; 16 young
people were interviewed without skip rules.

three with pervasive developmental disorders, one with
schizophrenia, one with anorexia nervosa, and one with
Tourette syndrome.

Skip Rules
The interviews were administered without using the
skip rules on 262 subjects, 223 of whom were from the
community sample and 39 from the psychiatric clinic
sample. Even when the respondent answered the screening question(s) negatively, the interviewer continued with
the rest of the section. This made it possible to examine
how many sections would have been omitted inappropriately had the skip rules been operating. Table 4 shows
that 4.2% (95% confidence interval 0.9-7.5%) of positive sections would have been missed had the skip rules
been in place. The cost in missed diagnoses can be set
against the extent to which skip rules shorten the
interview. With skip rules in place, 76% of sections for
the community sample could have been omitted after the
screening questions; the corresponding proportion for
clinic cases was 40%.
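The confidence interval for the proportion of missed sections can be approximated from the counts in Table 4 (6 missed out of 142 positive sections). The sketch below assumes a normal-approximation (Wald) interval; the article does not say which method was used, and the function name wald_ci is ours. Under that assumption the result agrees with the reported 4.2% (0.9-7.5%).

```python
import math


def wald_ci(successes, n, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval.

    Returns (proportion, lower, upper). The Wald method is an assumption
    here, not a method stated in the article.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# 6 positive sections would have been skipped out of 142 (Table 4).
p, lower, upper = wald_ci(6, 142)
print(f"{100 * p:.1f}% (95% CI {100 * lower:.1f}-{100 * upper:.1f}%)")
```

With so few missed sections, an interval designed for small counts (for example a Wilson interval) would give somewhat different limits, so the agreement above should be read only as a plausibility check.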

Overturning Computer Diagnoses
The clinical raters found the computer-assigned diagnoses helpful as a starting point for their clinical review
and formulation. Nevertheless, the clinician-assigned
diagnoses commonly differed from the computer-assigned diagnoses. Thus 43 (8.8%) of the 491 community
subjects had an ICD-10 or DSM-IV diagnosis according
to the computerised algorithms; the clinical raters judged
11 (2.2%) of these subjects not to have a psychiatric
disorder (false positives) while giving diagnoses to an
additional 20 subjects (4.1%) who had not been diagnosed by the computer (false negatives). Numbers were
too small to warrant detailed breakdowns of the types of
false positives and negatives, or to permit meaningful
comparisons of the predictive or concurrent validity of
computer-assigned and clinician-assigned diagnoses. The
following three case vignettes provide illustrative examples of subjects whose computer-assigned diagnoses
were changed by the clinical raters.
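These counts can be reconciled with the community rate in Table 1. The short sketch below only restates the arithmetic in the paragraph above; the variable names are ours.

```python
n_community = 491
computer_positive = 43   # diagnosed by the computerised algorithms
false_positives = 11     # computer diagnoses removed by the clinical raters
false_negatives = 20     # diagnoses added by the clinical raters

# Clinician-assigned diagnoses = computer diagnoses - removed + added.
clinician_positive = computer_positive - false_positives + false_negatives

for label, count in [
    ("Computer-assigned", computer_positive),
    ("Removed by raters", false_positives),
    ("Added by raters", false_negatives),
    ("Clinician-assigned", clinician_positive),
]:
    print(f"{label}: {count} ({100 * count / n_community:.1f}%)")
```

The clinician-assigned total of 52 (10.6%) matches the "any disorder" count for the community sample in Table 1.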
Subject 1: Excluding a computer-assigned diagnosis.
A 13-year-old boy was given a computer diagnosis of a
specific phobia because he had a fear that resulted in
significant distress and avoidance. In his open-ended
description of the fear, he explained that boys from
another school had threatened him on his way home on
several occasions. Since then, he had been afraid of this
gang and had taken a considerably longer route home
every day in order to avoid them. The clinical rater judged
his fear and avoidance to be appropriate responses to a
realistic danger and not a phobia. (Relying on young
respondents to judge whether their own fears are realistic
or exaggerated would clearly be unsatisfactory, since
many young people with phobias lack insight into the
unrealistic nature of their fears.)
Subject 2: Including a diagnosis not made by the
computer. A 7-year-old girl fell just short of the computer algorithm's threshold for a diagnosis of ADHD
because the teacher reported that the problems with
restlessness and inattentiveness resulted in very little
impairment in learning and peer relationships at school.
A review of all the evidence showed that the girl had
officially recognised special educational needs as a result
of hyperactivity problems, could not concentrate in class
for more than 2 minutes at a time even on activities she
enjoyed, and had been offered a trial of medication. The
clinician concluded that the teacher's report of minimal
impairment was an understatement, allowing a clinical
diagnosis of ADHD to be made.
Subject 3: Both adding to and subtracting from computer-assigned diagnoses. A 14-year-old girl received
computer-assigned diagnoses of simple phobia, major
depression, and oppositional-defiant disorder. Both the
girl and her mother had also answered "yes" to the
interview question about concern about dieting or thinness. The transcripts of the open-ended comments provided by the girl and her mother included convincing
descriptions not only of a depressive disorder but also of
anorexia nervosa of 1 year's duration. The supposed
phobia was an anorexic fear of food, and the oppositionality had only been present for a year and was primarily
related to battles over food intake. Consequently, the
clinical rater made the additional diagnosis of anorexia
nervosa and overturned the diagnoses of simple phobia
and oppositional-defiant disorder.

Discussion
Three lines of evidence support the validity of the
DAWBA. First, the rates of all psychiatric disorders were
substantially higher in the clinic than in the community
sample. Second, in the community sample, subjects with
and without DAWBA diagnoses differed markedly in
external characteristics and prognosis. Third, in the
clinical sample, there was considerable overlap between
DAWBA and case note diagnoses; when the diagnoses
differed, this was nearly always due to the DAWBA
diagnosing comorbid disorders not diagnosed from the
case notes. Further studies will need to clarify whether the
DAWBA over-diagnoses comorbidity or whether British
clinicians tend to under-diagnose comorbidity.
Interviewers and interviewees generally enjoyed the
interviews, particularly when use of the skip rules kept the
interview brief. These skip rules functioned well, allowing
76% of sections to be skipped in the community sample
at the relatively small cost that 4% of positive interview
sections were wrongly omitted.
The DAWBA successfully combines the features of
respondent-based and investigator-based measures. It
resembles a respondent-based measure such as the Diagnostic Interview Schedule for Children (DISC; Shaffer et
al., 1996) in that it uses lay interviewers, fixed questions,
and computerised diagnostic algorithms. The two main
differences are that the lay interviewers also transcribe
detailed verbatim responses to open-ended questions, and
that clinical raters use these transcripts to generate
clinically informed diagnoses that sometimes over-rule
the computerised diagnoses. Including a clinical review
only added about 10% to the cost of the survey
(unpublished data). We predict that using clinical rather
than computer diagnoses will generate findings that are
more relevant to service planning. To test this, ongoing
prospective studies of larger samples are comparing the
predictive validity of computer-generated and clinician-generated diagnoses in terms of outcome and service use.
Existing investigator-based measures such as the Child
and Adolescent Psychiatric Assessment (CAPA; Angold
et al., 1995) use clinicians or highly trained nonclinical
interviewers to administer semistructured interviews to
parents and children. Using flexible questioning, the
interviewer elicits enough information to rate the presence and severity of symptoms and resultant impairments. These interviewer-based ratings can form the basis
for computerised diagnostic algorithms. The clinical
rating involved in the DAWBA fulfils a similar role but
has some distinctive disadvantages and advantages. With
a traditional semistructured interview, the person who
rates the symptoms is the same person who carries out the
interview, so interviewers can go on asking questions and
clarifying details until they are confident that they can
make their ratings. By contrast, the DAWBA clinical
raters have to judge whether symptoms were present or
not on the basis of the answers obtained by lay interviewers at some earlier time. Detailed transcripts of
answers to open-ended questions generally provide
enough information to do this, but when they do not, the
clinical raters cannot themselves ask supplementary questions. This undoubted disadvantage is offset by a major
economy: expensive and scarce clinical time is not
wasted either on routine interviewing or on travelling to
and from households scattered over a large geographical
area. The main stage survey that followed this study used
over 200 lay interviewers in the field but only required
three clinical raters back at base. In addition, the
DAWBA clinical raters combine information from all
sources to make two important judgements: which
informants to prioritise when there is a clash of information, and whether to assign "not otherwise specified" diagnoses when children have substantial problems
that do not meet operationalised diagnostic criteria.
These two key judgements cannot generally be made at
the time of the initial interviews, whether semistructured
or not, which is part of the rationale for the DAWBA
method of using clinical input at the overview rather
than the interview stage.
The DAWBA's manner of combining the cheapness
and simplicity of respondent-based measures with the
clinical persuasiveness of investigator-based measures is
novel. Previously, researchers wanting to combine the
advantages of respondent- and investigator-based measures have used multi-phase designs. For example, many
studies have used screening questionnaires or structured
interviews in a first phase and have then selected screen-positive and some screen-negative subjects for a second
phase involving semistructured interviews administered
by clinically trained interviewers (e.g. Costello et al., 1996;
Rutter, Cox, Tupling, Berger, & Yule, 1975; Taylor,
Sandberg, Thorley, & Giles, 1991). By contrast, the
DAWBA approach combines respondent- and investigator-based measures in a single phase, which has several
practical advantages including ease of analysis and
avoidance of the risk of families dropping out between
phases (Deming, 1977). Although common sense suggests
that interviewing all families is bound to be considerably
more expensive than a multi-stage design that only
involves interviewing a proportion of families, this is not
necessarily true (Newman, Shrout, & Bland, 1990). To
estimate prevalence with adequate precision in a multi-phase study, it is often necessary to interview a surprisingly high proportion of screen-negative subjects.
When this requirement is combined with the need for
repeated visits to families who are participating in more
than one phase, then the economies of the multi-phase
design are generally modest unless the disorder is rare and
the screening test has excellent sensitivity and specificity.
When the benefits of a multi-phase design are modest, the
other advantages of a one-phase design may commend it
to researchers. The DAWBA seems to be a suitable
assessment battery for a one-phase study that aims to
combine respondent- and investigator-based measures.
Alternatively, the DAWBA can be used as a second phase
in a two-phase study.
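The trade-off between one-phase and multi-phase designs can be made concrete with a simple count of in-depth interviews. The sketch below is illustrative only: the sample size, screen-positive rate, and sampling fractions are hypothetical values chosen for the example, not figures from this study, and the function name second_phase_interviews is ours. It also ignores the cost of the screening phase and of repeat visits, both of which narrow the gap further.

```python
def second_phase_interviews(n_screened, screen_positive_rate, screen_negative_fraction):
    """Expected number of phase-two interviews in a two-phase design.

    All screen-positive subjects are interviewed, plus a sampled
    fraction of the screen-negative subjects.
    """
    positives = n_screened * screen_positive_rate
    negatives = n_screened - positives
    return positives + negatives * screen_negative_fraction


# Hypothetical survey: 1000 families screened, 20% screen positive.
n_families = 1000
for fraction in (0.1, 0.25, 0.5):
    interviews = second_phase_interviews(n_families, 0.2, fraction)
    print(
        f"screen-negative sampling fraction {fraction:.0%}: "
        f"{interviews:.0f} of {n_families} families interviewed in depth"
    )
```

In this hypothetical case, sampling half of the screen-negative families means that 600 of the 1000 families still receive the full interview, in addition to the screening contact made with every family.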
The current study examined validity rather than
reliability. Of course, the evidence for validity provides
indirect evidence for reliability too; an unreliable set of
measures would also have done poorly on tests of validity.
A previous study of the clinical rating system incorporated into the DAWBA showed that the method was of
satisfactory inter-rater reliability with kappas of around
.7 (Goodman et al., 1996). Further tests of the inter-rater
reliability of the clinical rating component of the
DAWBA are currently under way.
Ideally, we would have wanted to measure the test-retest reliability of the DAWBA on a large sample of
children from psychiatric clinics and the community. We
would also have liked to compare the DAWBA with
other well-established assessments, including respondent-based measures such as the DISC (Shaffer et al., 1996)
and investigator-based measures such as the CAPA
(Angold et al., 1995). It was not possible to do this, partly
because we did not wish to over-burden families whom
we wanted to engage in a longitudinal study. In addition,
however, designs that involve administering two lengthy
interviews in fairly rapid succession are problematic. The
main problem is that participating in the first interview
markedly alters the way respondents behave in the second
interview (e.g. Jensen et al., 1995). Particularly in community samples, respondents admit to fewer problems on
the second interview, perhaps in part because they are
bored with the process and want to get it over with faster.
This attenuation of "yes" responses is more marked if
the interval between interviews is brief, but increasing the
interval to avoid this leads instead to a different problem,
namely that the child's mental state is more likely to have
changed in the interim. In effect, there is a psychiatric
"uncertainty principle": it is not possible to assess psychopathology both accurately and frequently because the
most accurate measures are too long to be repeated
without inducing respondent fatigue, while the most
repeatable measures are less accurate as a consequence of
their brevity. This poses serious problems for the investigator who wants to establish the test-retest reliability
of a lengthy assessment: measured reliability is likely to
be artefactually low. Comparing the validity of two
lengthy assessments is potentially less problematic since
subjects can be randomised to receive one
measure or the other, subsequently comparing the two
measures in terms of concurrent and predictive validity,
and also monetary cost. It will be important in the future
to compare the DAWBA with other respondent- or
investigator-based assessment tools, both in epidemiological and clinical settings.
Acknowledgements: We are very grateful to all the parents,
teachers, children, and interviewers who took part in the
study, to the staff of the three participating psychiatric clinics
(Royal Manchester Children's Hospital; Withington Hospital,
Manchester; Department of Child and Adolescent Psychiatry, Hounslow), and to Pippa Hoad, Helen Simmons, and other
colleagues. The study was funded by the Department of
Health.
References
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

Angold, A., Costello, E. J., Farmer, E. M. Z., Burns, B. J., & Erkanli, A. (1999). Impaired but undiagnosed. Journal of the American Academy of Child and Adolescent Psychiatry, 38, 129-137.

Angold, A., Prendergast, M., Cox, A., Harrington, R., Simonoff, E., & Rutter, M. (1995). The Child and Adolescent Psychiatric Assessment (CAPA). Psychological Medicine, 25, 739-753.

Bird, H. R., Canino, G., Rubio-Stipec, M., Gould, M. S., Ribera, J., Sesman, M., Woodbury, M., Huertas-Goldman, S., Pagan, A., Sanchez-Lacay, A., & Moscoso, M. (1988). Estimates of the prevalence of childhood maladjustment in a community survey in Puerto Rico. Archives of General Psychiatry, 45, 1120-1126.

Bird, H. R., Yager, T. J., Staghezza, B., Gould, M. S., Canino, G., & Rubio-Stipec, M. (1990). Impairment in the epidemiological measurement of childhood psychopathology in the community. Journal of the American Academy of Child and Adolescent Psychiatry, 29, 796-803.

Burns, B. J., Costello, E. J., Angold, A., Tweed, D., Stangl, D., Farmer, E. M. Z., & Erkanli, A. (1995). Children's mental health service use across service sectors. Health Affairs, 14, 147-159.

Costello, E. J., Angold, A., Burns, B. J., Stangl, D. K., Tweed, D. L., Erkanli, A., & Worthman, C. M. (1996). The Great Smoky Mountains study of youth: Goals, design, methods, and the prevalence of DSM-III-R disorders. Archives of General Psychiatry, 53, 1129-1136.

Deming, W. E. (1977). An essay on screening, or on two-phase sampling, applied to surveys of a community. International Statistical Review, 45, 29-37.

Fallon, T., & Schwab-Stone, M. (1994). Determinants of reliability in psychiatric surveys of children aged 6 to 12. Journal of Child Psychology and Psychiatry, 35, 1391-1408.

Goodman, R. (1997). The Strengths and Difficulties Questionnaire: A research note. Journal of Child Psychology and Psychiatry, 38, 581-586.

Goodman, R. (1999). The extended version of the Strengths and Difficulties Questionnaire as a guide to child psychiatric caseness and consequent burden. Journal of Child Psychology and Psychiatry, 40, 791-801.

Goodman, R., Meltzer, H., & Bailey, V. (1998). The Strengths and Difficulties Questionnaire: A pilot study on the validity of the self-report version. European Child and Adolescent Psychiatry, 7, 125-130.

Goodman, R., Yude, C., Richards, H., & Taylor, E. (1996). Rating child psychiatric caseness from detailed case histories. Journal of Child Psychology and Psychiatry, 37, 369-379.

Jensen, P. S., Roper, M., Fisher, P., Piacentini, J., Canino, G., Richters, J., Rubio-Stipec, M., Dulcan, M. K., Goodman, S., Davies, M., Rae, D., Shaffer, D., Bird, H. R., Lahey, B. B., & Schwab-Stone, M. E. (1995). Test-retest reliability of the Diagnostic Interview Schedule for Children (DISC 2.1). Archives of General Psychiatry, 52, 61-71.

Meltzer, H., Gatward, R., Goodman, R., & Ford, T. (2000). Mental health of children and adolescents in Great Britain. London: The Stationery Office.

Meltzer, H., Gill, B., Petticrew, M., & Hinds, K. (1995). OPCS surveys of psychiatric morbidity in Great Britain, Report 1. The prevalence of psychiatric morbidity among adults living in private households. London: HMSO.

Newman, S. C., Shrout, P. E., & Bland, R. C. (1990). The efficiency of two-phase designs in prevalence surveys of mental disorders. Psychological Medicine, 20, 183-193.

Rutter, M., Cox, A., Tupling, C., Berger, M., & Yule, W. (1975). Attainment and adjustment in two geographical areas. I-The prevalence of psychiatric disorder. British Journal of Psychiatry, 126, 493-509.

Schwab-Stone, M., Fallon, T., Briggs, M., & Crowther, B. (1994). Reliability of diagnostic reporting for children aged 6-11 years: A test-retest reliability study of the Diagnostic Interview Schedule for Children-Revised. American Journal of Psychiatry, 151, 1048-1054.

Schwab-Stone, M. E., Shaffer, D., Dulcan, M. K., Jensen, P. S., Fisher, P., Bird, H. R., Goodman, S. H., Lahey, B. B., Lichtman, J. H., Canino, G., Rubio-Stipec, M., & Rae, D. S. (1996). Criterion validity of the NIMH Diagnostic Interview Schedule for Children Version 2.3 (DISC-2.3). Journal of the American Academy of Child and Adolescent Psychiatry, 35, 878-888.

Shaffer, D., Fisher, P., Dulcan, M. K., Davies, M., Piacentini, J., Schwab-Stone, M. E., Lahey, B. B., Bourdon, K., Jensen, P. S., Bird, H. R., Canino, G., & Regier, D. A. (1996). The NIMH Diagnostic Interview Schedule for Children Version 2.3 (DISC-2.3): Description, acceptability, prevalence rates, and performance in the MECA study. Journal of the American Academy of Child and Adolescent Psychiatry, 35, 865-877.

Simonoff, E., Pickles, A., Meyer, J. M., Silberg, J., Maes, H. H., Loeber, R., Rutter, M., Hewitt, J. K., & Eaves, L. J. (1997). The Virginia twin study of adolescent behavioral development: Influences of age, gender and impairment on rates of disorder. Archives of General Psychiatry, 54, 801-808.

Tanur, J. M. (1992). Questions about questions: Inquiries into the cognitive bases of surveys. New York: Russell Sage Foundation.

Taylor, E., Sandberg, S., Thorley, G., & Giles, S. (1991). The epidemiology of childhood hyperactivity. Institute of Psychiatry: Maudsley Monographs, 33. Oxford: Oxford University Press.

World Health Organisation. (1994). The ICD-10 classification of mental and behavioural disorders: Diagnostic criteria for research. Geneva: Author.

Manuscript accepted 17 January 2000