
Acta psychiatr. scand.

1985:72:239-245
Key words: Observer depression scales; Hamilton Depression
Scale; Bech Rafaelsen Melancholia Scale; Montgomery
Asberg Depression Rating Scale.

Comparative analysis of observer depression scales


W. Maier and M. Philipp
Department of Psychiatry (Head: Prof. Dr. O. Benkert),
University of Mainz, W. Germany

ABSTRACT - The Hamilton Depression Scale (HAMD), Bech Rafaelsen Melancholia
Scale (BRMS) and Montgomery Asberg Depression Rating Scale (MADRS) are analyzed
according to mean discriminatory power, internal consistency, homogeneity and
transferability. The analysis was done separately in different samples of patients with
depressive syndromes: a) operationally defined depressive syndrome; b) Major Depressive
Disorder (RDC); c) Major Depressive Disorder, endogenous type (RDC). BRMS
and MADRS were superior to HAMD in all evaluated aspects. Further, the BRMS was
superior to MADRS according to the criteria of homogeneity and transferability.

Received November 25, 1984; accepted for publication February 9, 1985

The Hamilton Depression Scale (HAMD) (1,2) is the most often used observer
scale for the assessment of severity of depression. Nevertheless, there are
several shortcomings associated with this scale: heterogeneous and unstable
factor analytical structure (3); missing general factor (so-called severity
factor) (4); missing course validation (5); neglect of self-reported feelings
of distress in favor of the assessment of behavioral symptoms and somatic
complaints (5); insufficient separation between different scores of global
assessment of depression (6); intermingling between frequency and intensity
of symptoms associated with depression (i.e. patients with many low scoring
symptoms and patients with few high scoring core symptoms may have the same
sum score in the HAMD).
In recent years two alternative scales have been developed which are intended
to overcome the shortcomings of the HAMD. These are the Bech Rafaelsen
Melancholia Scale (BRMS) (7,8) and the Montgomery Asberg Depression Rating
Scale (MADRS) (5). Both scales differ from the HAMD in the following respects:

1. Categories of degree are more precisely described.
2. Items are restricted to representing only those symptoms considered to be
   core symptoms of depressive syndromes.
3. There are fewer items (11 in BRMS and 10 in MADRS).
4. Items representing somatic complaints have been reduced.

It is not clarified which observer depression scale should be used (9).
As a comparative analysis of all three scales (HAMD, BRMS, MADRS) has not yet
been made, it has been suggested that the scales should now be compared in
order to demonstrate their strengths and deficiencies (10). A consensus on the
criteria for judging the validity of a severity scale of depression is
lacking. Cattell and Bech have introduced a new criterion for scale
transferability which has significance for severity scales in psychiatry (11).
We aim to apply a list of criteria of adequacy which should be met by a
severity scale of depression. In the process we will concentrate on the
cross-sectional analysis of transferability and homogeneity and ignore the
aspects of course.

Patients and methods

Patients
Ratings were done with 151 consecutively admitted in-patients of the
Psychiatric Department of the University of Mainz. All patients were required
to meet the following conditions: depressive syndrome defined by criteria A
and B of Minor Depressive Disorder of the Research Diagnostic Criteria (RDC)
(12); age between 20 and 60 years; absence of physical disorders, organic
brain damage and drug or alcohol dependence (Table 1).

Raters
Ratings were done by eight different raters. All raters were physicians who
had been trained in psychiatry for at least 2 years and were familiar with the
Present State Examination (PSE), the Hamilton Depression Scale and the
Research Diagnostic Criteria. In a test run before the study itself was begun,
interrater reliability on the HAMD was shown to be sufficiently high in a
test-retest setting with a time lag of up to 3 h between two ratings. Each
patient was rated by one rater only, applying all scales mentioned below in a
fixed order.

Interview and rating procedure
All patients were interviewed with the Present State Examination (13) during
the first 3 days after admission. Immediately after the PSE was given the same
rater judged the patient separately on a number of rating scales including the
three scales under study. These ratings were done by an additional free
interview, always using the following order of scales: HAMD, BRMS, MADRS.
Finally, several criteria lists of operational diagnostic systems were checked
including the criteria A and B of Minor Depressive Disorder of the Research
Diagnostic Criteria. The whole rating procedure lasted for about 2 to 3 h.

Choice of criteria of adequacy
Because of methodological considerations we decided to restrict the evaluation
to those criteria of adequacy which can be judged by analyses within a given
scale; we did not use those criteria which rely on a comparison of scales with
each other. Therefore no statements are given about the validity of the scales
in terms of content validity and concurrent validity.
The following criteria of adequacy were chosen:

Discriminating power. In item analysis the discrimination coefficient should
differ significantly from 0 in every item and the mean should not be low (14).
A mean scale value of at least 0.32 is considered to be adequate (cf. (3)).

Distribution of sum scores. In item analysis the distribution of the sum
scores should not differ significantly from normal distribution. This should
be true for any sample under study.

Internal consistency. The scale should show a satisfactory internal
consistency as a measure of test reliability (the alpha-coefficient of
Cronbach (see Lienert (15)) should be higher than 0.75).

Content validity. There should be no correlation between the sum score and
other variables which are not connected with the quality to be assessed by the
scale (age, sex, etc.).

Table 1
Patients and RDC diagnoses

All patients                          n = 151
  male                                n = 43
  female                              n = 108
Age                                   mean = 38.4 years
                                      range = 20-60 years
RDC 9   MDD definite                  n = 99
        (9f MDD endogenous type       n = 66)
        (7 MDD BP1                    n = 16)
        (8 MDD BP2                    n = 9)
RDC 10  MIN definite                  n = 25
     10 MIN probable                  n = 2
RDC 11  INT                           n = 2
RDC 3   SAD depr. type                n = 17
RDC 4   DS/res.schiz.                 n = 6
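The discriminating-power and internal-consistency criteria can be illustrated with a minimal sketch. It assumes that the part-whole-corrected discrimination coefficient is the correlation of each item with the sum of the remaining items (one common reading, not confirmed by the source) and uses the standard formula for Cronbach's alpha. The function names and the tiny ratings matrix are hypothetical and are not data from this study.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of patients, each a list of item scores."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def corrected_discrimination(scores):
    """Item-rest correlation per item (assumed form of the part-whole correction)."""
    k = len(scores[0])
    coefficients = []
    for j in range(k):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]  # sum score without item j
        coefficients.append(pearson(item, rest))
    return coefficients


# Hypothetical ratings: 5 patients x 3 items (illustration only).
ratings = [
    [0, 1, 0],
    [1, 1, 1],
    [2, 2, 1],
    [3, 2, 2],
    [4, 3, 2],
]

alpha = cronbach_alpha(ratings)
disc = corrected_discrimination(ratings)
mean_disc = mean(disc)

print("alpha > 0.75:", alpha > 0.75)
print("mean discrimination >= 0.32:", mean_disc >= 0.32)
```

The two printed comparisons apply the thresholds named in the criteria (alpha above 0.75, mean discrimination of at least 0.32) to the toy data.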
COMPARISON OF OBSERVER DEPRESSION SCALES 241

Homogeneity. The relevance of this criterion has been stressed by Bech
(11,16). The concept of homogeneity aims at the uniformity of the relation
between the scores of each pair of two different items of the scale,
operationalized by a coefficient for each pair of items (Mokken, referring to
the Loevinger coefficient); the coefficients can be summed up to the
coefficient of scalability of the scale. If this coefficient is between 0.30
and 0.40 the scale is said to be weak; if it exceeds 0.40 the scale is medium
(17).

Transferability. Transferability means that the item interrelations are
independent of the sample under study and of the external criteria of the
patient sample. This concept is operationalized in the statistical concept of
local independence, a characteristic feature of the Rasch model and of other
latent-class models. The importance of this criterion has been stressed by
Cattell (20) and Bech (11,16,18). Bech has furthermore emphasized the
relevance of Rasch model testing for the assessment of transferability.
Lord & Novick (14, p. 382) vote for a high factor analytical homogeneity
(one-dimensionality in factor analysis of a scale) as a prerequisite of scales
with adequate transferability (latent-class models with local independence).
However, the

Table 2
Descriptive analysis of observer rating scales for depression

                          Total group            Subgroup I           Subgroup II
                          Depressive syndrome    Major depressive     Endogenous type
                          (n=151)                disorder (n=99)      (n=66)

HAMD (21 items)
  Mean                    23.4                   25.1                 26.8
  Median                  24.0                   25.5                 27.5
  SD                      0.69                   0.67                 0.75
  Intern. consistency*    0.76                   0.64                 0.62
  Homogeneity**           0.13                   0.09                 0.08
  Discrim. power (mean)   0.39                   0.24                 0.23

HAMD (17 items)
  Mean                    20.8                   22.5                 24.0
  Median                  21.3                   22.7                 24.7
  SD                      0.62                   0.62                 0.72
  Intern. consistency*    0.77                   0.67                 0.66
  Homogeneity**           0.15                   0.11                 0.11
  Discrim. power (mean)   0.38                   0.28                 0.28

BRMS
  Mean                    17.8                   19.1                 20.5
  Median                  17.9                   1x.x                 20.5
  SD                      0.56                   0.62                 0.80
  Intern. consistency*    0.81                   0.77                 0.80
  Homogeneity**           0.30                   0.26                 0.31
  Discrim. power (mean)   0.52                   0.47                 0.50

MADRS
  Mean                    26.0                   29.2                 31.5
  Median                  27.0                   30.0                 32.2
  SD                      0.90                   0.91                 1.10
  Intern. consistency*    0.86                   0.XI)                0.80
  Homogeneity**           0.31                   0.26                 0.24
  Discrim. power (mean)   0.60                   0.51                 0.59

* alpha-coefficient of Cronbach (15)
** Loevinger coefficient (17)
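The Loevinger coefficient reported in Table 2 can be sketched for dichotomized items as follows. This is a minimal illustration under stated assumptions: a score is coded 1 when it lies above the item's sample mean, and the scale coefficient is taken as the sum of the observed inter-item covariances divided by the sum of their maxima given the item marginals. The function names and the response patterns are hypothetical, and items are assumed not to be constant within the sample.

```python
def dichotomize_at_mean(scores):
    """Code each item score as 1 if it exceeds that item's sample mean, else 0."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    return [[1 if row[j] > means[j] else 0 for j in range(k)] for row in scores]


def loevinger_h(binary):
    """Scale coefficient H for 0/1 items: sum of covariances over sum of maximal covariances."""
    n, k = len(binary), len(binary[0])
    p = [sum(row[j] for row in binary) / n for j in range(k)]
    cov_sum = 0.0
    max_sum = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            p_ij = sum(row[i] * row[j] for row in binary) / n
            lo, hi = min(p[i], p[j]), max(p[i], p[j])
            cov_sum += p_ij - p[i] * p[j]  # observed covariance
            max_sum += lo * (1 - hi)       # largest covariance the marginals allow
    return cov_sum / max_sum


# Hypothetical patterns: a perfect cumulative (Guttman) pattern ...
perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
# ... and the same pattern with one patient who breaks the ordering.
with_error = perfect + [[0, 1, 0]]

h_perfect = loevinger_h(perfect)
h_error = loevinger_h(with_error)
coded = dichotomize_at_mean([[0, 2], [1, 0], [2, 4]])

print(h_perfect, h_error, coded)
```

A perfectly cumulative pattern yields the maximum value, and a single inconsistent patient lowers the coefficient, which is the sense in which low values in Table 2 indicate heterogeneity.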


relevance of this criterion has been questioned (Cattell (20, p. 379) and
Bech (11)). Therefore factor analysis is not presented here.

Statistical analyses
The above-mentioned criteria of adequacy require several different methods of
analysis. The following methods are necessary: item analysis including the
aspects of discriminating power, test of deviation from normal distribution
(Kolmogorov-Smirnov test (21)), internal consistency, homogeneity (Mokken
analysis (17)) and Rasch model fitting (19,22).
Both item analysis and analysis of consistency were made. After dichotomizing
the item scores according to the mean in the sample (in order to preserve
variance (23)) a Rasch analysis was done and the coefficient of homogeneity
(Loevinger; see Mokken (17)) determined. To control this particular method of
dichotomizing item scores a further method was applied: all items were
dichotomized between score 1 (the second score) and score 2 (the third score).
We analysed the data of the whole patient group as well as the data of two
subgroups: patients with Major Depressive Disorder (RDC) and patients with
Major Depressive Disorder, endogenous subtype (RDC).

Results
The mean part-whole-corrected coefficient of discrimination shows lower values
with the HAMD than with the other two scales (Table 2). Only the HAMD included
items which have discrimination coefficients that do not differ significantly
(P>0.01) from zero. This is found for items 15 (hypochondriasis), 16 (loss of
weight) and 18 (diurnal variation). Items 3 (suicide), 6 (insomnia, delayed),
13 (somatic, general) and 14 (loss of libido) in at least one of the
populations show relatively low coefficients of discrimination (0.18)
indicating high heterogeneity. In the analysis of Baumann (3) the
discriminating power of items 4 and 6 of the 17-item HAMD does not differ
significantly from zero (P>0.01). In the MADRS and the BRMS all items show a
significant deviation from zero in all groups. While the MADRS has a
discrimination coefficient of more than 0.40 in all items the BRMS

Table 3
Correlation of observer rating scale sum scores with other variables
(Spearman's correlation coefficient)

                    Total group            Subgroup I           Subgroup II
                    Depressive syndrome    Major depressive     Endogenous type
                    (n=151)                disorder (n=99)      (n=66)

HAMD (21 items)
  Age               0.13                   0.16                 0.05
  Diagnosis*        -                      -                    0.11/0.34**

HAMD (17 items)
  Age               0.14                   0.12                 0.10
  Diagnosis*        -                      -                    0.40/0.32**

BRMS
  Age               0.06                   0.20                 0.03
  Diagnosis*        -                      -                    0.35/0.30**

MADRS
  Age               0.13                   0.18                 0.08
  Diagnosis*        -                      -                    0.46/0.39**

* Point biserial correlation coefficient
** Left figure: total group; right figure: subgroup of Major depressive disorder
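The two coefficients named in Table 3 can be sketched in a few lines. This is a minimal illustration, assuming Spearman's coefficient is computed as a Pearson correlation on average ranks and the point biserial coefficient as a Pearson correlation with a 0/1 group indicator. The ages, sum scores and group codes below are hypothetical and are not data from this study.

```python
from statistics import mean


def pearson(x, y):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def point_biserial(group, y):
    """Point biserial correlation: Pearson correlation with a 0/1 indicator."""
    return pearson(group, y)


# Hypothetical values for five patients (illustration only).
age = [25, 32, 41, 47, 58]
sum_score = [18, 22, 20, 27, 30]
endogenous = [0, 0, 1, 1, 1]  # 1 = member of the diagnostic group

rho = spearman(age, sum_score)
r_pb = point_biserial(endogenous, sum_score)
print(round(rho, 2), round(r_pb, 2))
```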

has values of 0.10 in item 9 and 0.20 in item 10. All other item scores are
higher than 0.40. The lower limit of the discriminating power of a scale
should be 0.32 according to Baumann (3). This is the case for BRMS and MADRS
but not for HAMD in subgroups I and II.
None of the sum scores of the three scales shows a significant deviation from
normal distribution in the Kolmogorov-Smirnov test (21). This is true for all
groups (Table 2).
The alpha-coefficient of Cronbach is adequate for the MADRS and the BRMS in
all populations. For the HAMD it is relatively lower. Nevertheless, the
internal consistency of the HAMD is satisfactory (Table 2).
No significant correlations were found between the degree of severity and such
external variables as age or sex (Table 3).
After dichotomizing the item scores at the empirical mean in each sample all
Loevinger coefficients of homogeneity (17) significantly differ from zero
(Table 2). In HAMD they show a high heterogeneity (0.08 to 0.13); in the other
two scales they are slightly higher than in the HAMD: here they are clearly
more acceptable for BRMS (0.26 to 0.31) and for MADRS (0.24 to 0.31) according
to Mokken's criterion of >0.30 for a weak homogeneous scale, although even for
these scales there are still problems. The results are identical after
dichotomizing the item scores according to the second method (see statistical
analyses).
When examining the homogeneity of a scale by the method of Rasch the
transferability for different populations is of decisive importance. Therefore
the samples under study have been divided into two parts in which the
parameters of difficulty are estimated independently (ratio scale). After
this, both sets of parameters were compared by a Chi-square test. For our
purpose the total sample was divided according to the following criteria: by
chance, by age, by the median of the sum score of the scale under study, and
division of the sample according to the diagnosis (ICD 296 versus ICD
non-296). The fitting of the Rasch model will be expressed by a Chi-square
value according to Andersen (22) which shows the significant deviations from
the conditions of the Rasch model.
After division of the samples by chance there

Table 4
Observer rating scales for depression: adequacy of RASCH model

                         Rationale for partition

                         Randomized      Partition   Partition   Partition      Partition by
Scale                    partition       by age      by sex      by diagnosis   median of scale

HAMD (21 items)    a)    21.9 (20 df)    63.0        50.7        98.7           62.6
                   b)    0.35            .00         .00         .00            .00

HAMD (17 items)    a)    20.0 (16 df)    48.1        39.3        67.9           39.6
                   b)    0.22            .00         .00         .00            .00

BRMS               a)    12.3 (10 df)    21.9        24.6        20.9           65.6
                   b)    0.27            0.02        0.01        0.03           .00

MADRS              a)    6.0 (9 df)      26.1        25.5        34.1           35.5
                   b)    0.74            .00         .00         .00            .00

a) Chi-square values   b) empirical P-values
*: after dichotomizing at the mean
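The relation between the Chi-square values and the P-values in Table 4 can be checked with a short sketch. It assumes that each P-value is the upper-tail probability of a Chi-square distribution with the stated degrees of freedom, and it uses the standard closed-form series for integer degrees of freedom. Only the randomized-partition column is recomputed, since that is the only column for which the table states the degrees of freedom; the function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a Chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term = 1.0
        total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^(r - 1/2) / (1*3*...*(2r-1))
    total = 0.0
    term = math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at sqrt(x)
    return math.erfc(math.sqrt(x / 2)) + 2 * phi * total


# Randomized-partition column of Table 4: (Chi-square, df, reported P-value).
random_partition = {
    "HAMD (21 items)": (21.9, 20, 0.35),
    "HAMD (17 items)": (20.0, 16, 0.22),
    "BRMS": (12.3, 10, 0.27),
    "MADRS": (6.0, 9, 0.74),
}

recomputed = {
    scale: chi2_sf(chi2, df) for scale, (chi2, df, _) in random_partition.items()
}
for scale, p in recomputed.items():
    print(scale, round(p, 2))
```

Under this assumption the recomputed values agree with the reported P-values to two decimals, which supports reading row b) as conventional Chi-square tail probabilities.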

are no significant deviations from the model in any of the scales. In all
other divisions the HAMD is more inhomogeneous in comparison with the BRMS and
the MADRS. Neither of the latter scales meets the conditions of the model in
all division procedures. However, in divisions according to diagnosis, age and
sex, the BRMS, in contrast to the MADRS, shows no significant deviation from
the Rasch model. The items on the BRMS which interfere most with the Rasch
model are Nos. 9 (insomnia) and 10 (tiredness and pains). These results are
true for the total group as well as for the patients with Major Depressive
Disorder. The significant deviations from the Rasch model (P=0.05 and P=0.01)
are identical when item scores are dichotomized according to the second method
(see statistical analyses).

Discussion
The 21-item version of the HAMD is not shown to be superior to the 17-item
version. None of the above-listed criteria is better satisfied by the 21-item
version than by the 17-item version. Even the expected gain in reliability is
lacking. Furthermore, high heterogeneity is indicated by the low mean
discrimination power of the items, the Loevinger homogeneity coefficient and
the Rasch analysis.
Our results confirm those in other studies, e.g. Bech et al. (24) in which the
HAMD only showed a low degree of homogeneity (Mokken's coefficient of
scalability, mean discriminating power, internal consistency) and a low Rasch
model fitting. Both BRMS and MADRS have a higher degree of homogeneity and a
better Rasch model fitting than the HAMD. The BRMS only deviates from the
Rasch model regarding the partition criterion (partition of the sample at the
median of the sum score of the scale). This is mostly because items
representing somatic complaints (particularly item 9) are as frequent and
intensive in patients with low sum scores as they are in those with high sum
scores on the BRMS. This, however, is not the case for all other items. The
MADRS shows an insufficient Rasch model fitting in all partitions of the
sample with the exception of random partition. This demonstrates a low degree
of transferability, although better than is the case with the HAMD. Thus,
according to these criteria the BRMS is superior. However, both scales need
increased homogeneity and transferability. Our results indicate that the
homogeneity of the BRMS could be increased by omitting items 9 (insomnia) and
10 (tiredness and pains). The same applies to item 4 (insomnia) in the MADRS.
In another study (25) it has been shown that by shortening to 6 items the HAMD
is better able to meet the criteria of adequacy used here. In this case the
shortened version of the HAMD would have even better validity than the BRMS
and the MADRS: the homogeneity of the HAMD subscales is higher (Rasch analysis
and Loevinger coefficient) than in the more comprehensive scales HAMD, MADRS
and BRMS; no criterion used in this study has shown a superiority of these
three scales. Further, interrater reliability of the HAMD subscale of Bech is
similar to the reliability of the BRMS (7). This is another argument for
reducing the BRMS and MADRS to obtain more homogeneous and shorter scales
without loss of validity and reliability.
In conclusion we consider that the BRMS and the MADRS are superior to the HAMD
according to all criteria under study. These findings argue for further
comparisons of these three scales in validation studies. We suggest the BRMS
and MADRS should be used in conjunction with the HAMD in therapy studies.
According to the criteria of transferability and homogeneity the BRMS seems
more adequate than the MADRS. However, the results are not conclusive and
require further affirmation. In addition, in order to decide between the BRMS
and MADRS, the two scales should be compared on sensitivity to change.

Acknowledgement
This study was supported in part by Smith Kline Dauelsberg GmbH, München, and
Troponwerke, Köln.

References
1. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatr
   1960:23:56-62.

2. Hamilton M. Development of a depression scale for primary depressive
   illness. Br J Soc Clin Psychol 1967:6:278-296.
3. Baumann U. Methodische Untersuchungen zur Hamilton-Depressions-Skala. Arch
   Psychiatr Nervenkr 1976:222:359-375.
4. Maier W, Philipp M, Gerken A. Dimensionen der Hamilton-Depressions-Skala.
   Arch Psychiatr Nervenkr: (in press).
5. Montgomery P, Asberg M. A new depression scale designed to be sensitive to
   change. Br J Psychiatry 1979:134:382-389.
6. Bech P, Gram L F, Dein E, Jacobson O, Vitger J, Bolwig T G. Quantitative
   rating of depressive states. Acta Psychiatr Scand 1975:51:161-170.
7. Bech P, Bolwig T G, Kramp P, Rafaelsen O J. The Bech-Rafaelsen Melancholia
   Scale and the Hamilton Depression Scale. Acta Psychiatr Scand
   1979:59:420-430.
8. Rafaelsen O J, Bech P, Bolwig T G, Kramp P, Gjerris A. The Bech Rafaelsen
   Combined Rating Scale for Mania and Melancholia. In: Achte K, Aalberg V,
   Lonquist J, eds. Psychopathology of depression. Acta Psychiatr Fennica
   Suppl. 1980.
9. Checkley S A, Rush A J, Beckmann H, et al. Functional indices of biological
   disturbances. In: Angst J, ed. The origins of depression: current concepts
   and approaches. Berlin: Springer, 1984.
10. Snaith R P. Rating scales. Br J Psychiatry 1981:138:512-514.
11. Bech P. Rating scales for affective disorders: their validity and
    consistency. Acta Psychiatr Scand 1981: Suppl. 295.
12. Spitzer R L, Endicott J, Robins E. Research Diagnostic Criteria: rationale
    and reliability. Arch Gen Psychiatry 1978:35:773-785.
13. Wing J K, Cooper J E, Sartorius N. The measurement and classification of
    psychiatric symptoms. (An instruction manual for the PSE and CATEGO
    system.) London: Cambridge University Press, 1974.
14. Lord F M, Novick M R. Statistical theories of mental test scores. Reading:
    Addison-Wesley, 1968.
15. Lienert G A. Testaufbau und Testanalyse. Weinheim: Beltz Verlag, 1969.
16. Bech P. The instrumental use of rating scales for depression.
    Pharmacopsychiatry 1984:17:22-29.
17. Mokken R J. A theory and procedure of scale analysis. The Hague, 1974.
18. Bech P. Assessment scales for depression: the next 20 years. In: Bech P,
    Hippius H, eds. Acta Psychiatr Scand 1983: Suppl. 310:117-130.
19. Rasch G. Probabilistic models for some intelligence and attainment tests.
    Copenhagen: Danish Institute of Educational Research, 1960.
20. Cattell R B. Personality and mood by questionnaire. San Francisco:
    Jossey-Bass Publishers, 1973.
21. Kendall M G, Stuart A. The advanced theory of statistics. vol 2. London:
    Griffin, 1976.
22. Andersen E B. A goodness of fit test for the RASCH model. Psychometrika
    1973:38:123-140.
23. Cohen J. The cost of dichotomization. Appl Psychol Measurement
    1983:7:249-255.
24. Bech P, Allerup P, Gram F, et al. The Hamilton Depression Scale.
    Evaluation of objectivity, using logistic models. Acta Psychiatr Scand
    1981:63:292-299.
25. Maier W, Philipp M. Improving the assessment of the severity of depressive
    states: reduction of the Hamilton Depression Scale. Pharmacopsychiatry
    1985:18:114-115.

Address:
Wolfgang Maier
Michael Philipp
Dept. of Psychiatry
University of Mainz
Langenbeckstrasse 1
6500 Mainz
W. Germany
