
Psychological Bulletin
2015, Vol. 141, No. 1, 250–260
© 2015 American Psychological Association
0033-2909/15/$12.00 http://dx.doi.org/10.1037/a0038445

REPLY

Standards, Accuracy, and Questions of Bias in Rorschach Meta-Analyses: Reply to Wood, Garb, Nezworski, Lilienfeld, and Duke (2015)

Joni L. Mihura and Gregory J. Meyer
University of Toledo

George Bombel
The Menninger Clinic, Houston, Texas

Nicolae Dumitrascu
Boston University

This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
This document is copyrighted by the American Psychological Association or one of its allied publishers.

Wood, Garb, Nezworski, Lilienfeld, and Duke (2015) found our systematic review and meta-analyses of
65 Rorschach variables to be accurate and unbiased, and hence removed their previous recommendation
for a moratorium on the applied use of the Rorschach. However, Wood et al. (2015) hypothesized that
publication bias would exist for 4 Rorschach variables. To test this hypothesis, they replicated our
meta-analyses for these 4 variables and added unpublished dissertations to the pool of articles. In the
process, they used procedures that contradicted their standards and recommendations for sound Ror-
schach research, which consistently led to significantly lower effect sizes. In reviewing their meta-
analyses, we found numerous methodological errors, data errors, and omitted studies. In contrast to their
strict requirements for interrater reliability in the Rorschach meta-analyses of other researchers, they did
not report interrater reliability for any of their coding and classification decisions. In addition, many of
their conclusions were based on a narrative review of individual studies and post hoc analyses rather than
their meta-analytic findings. Finally, we challenge their sole use of dissertations to test publication bias
because (a) they failed to reconcile their conclusion that publication bias was present with the analyses
we conducted showing its absence, and (b) we found numerous problems with dissertation study quality.
In short, one cannot rely on the findings or the conclusions reported in Wood et al.

Keywords: Rorschach, meta-analysis, construct validity, comprehensive system, psychological assessment

Joni L. Mihura and Gregory J. Meyer, Department of Psychology, University of Toledo; George Bombel, The Menninger Clinic, Houston, Texas; Nicolae Dumitrascu, The Danielsen Institute at Boston University.

Joni L. Mihura and Gregory J. Meyer receive royalties from a test manual (Meyer, Viglione, Mihura, Erard, & Erdberg, 2011) and associated products for using the Rorschach.

We thank Wei-Cheng (Wilson) Hsiao, Andrea B. Kiss, and Joseph A. Reed for their assistance.

Correspondence concerning this article should be addressed to Joni L. Mihura, Department of Psychology, University of Toledo, 2801 W. Bancroft Street, Toledo, OH 43606. E-mail: joni.mihura@utoledo.edu

¹ When we use the term "authors" or "Wood et al." to refer to previous publications by these authors, we are referring to one or all of the first four authors, as this is the first time the fifth author has published with this group.

Wood et al. (2015) Found Our Meta-Analyses to Be Accurate and Unbiased

Wood, Garb, Nezworski, Lilienfeld, and Duke (2015) reviewed our Rorschach validity meta-analyses (Mihura, Meyer, Dumitrascu, & Bombel, 2013) and concluded "The estimated validity coefficients reported by the authors provided an unbiased and trustworthy summary of the published literature" (p. 243). For readers unfamiliar with the Rorschach empirical literature, we stress the significance of their approval of our meta-analyses. During the last two decades, the authors¹ have been the most active self-described critics of the Rorschach method. Other than their own work, ours is the first meta-analytic review of Rorschach validity they have not criticized as being methodologically flawed, fraught with errors, and biased (Garb, 1999; Garb, Florio, & Grove, 1998, 1999; Garb, Wood, Nezworski, Grove, & Stejskal, 2001; Lilienfeld, Wood, & Garb, 2000; Wood & Lilienfeld, 1999).

Wood et al. (2015) concluded, "Nearly 15 years ago, one of the authors of this Comment published a recommendation that a moratorium be placed upon use of the Rorschach in clinical and forensic settings because of the test's weak scientific foundation (Garb, 1999). He and the other authors of this Comment agree that, in light of the compelling evidence laid out by Mihura et al. (2013), the time has come to withdraw this recommendation so far as it applies to the [14 Rorschach variables that Wood et al. call the] Cognitive Quartet" (Wood et al., 2015, p. 243). Wood et al.'s withdrawal of their proposed moratorium on the Rorschach is a noteworthy event and we
appreciate how seriously they considered the implications of our meta-analyses.

The Nature of Wood et al.'s (2015) Comment and Our Reply

Wood et al. (2015) did not find problems with our meta-analyses or our conclusions; instead, their article focused on possible publication bias for four Rorschach variables as well as various other criticisms of the Rorschach. To test for publication bias, Wood et al. conducted four new meta-analyses that added unpublished dissertations to the pool of peer-reviewed articles. In reviewing their meta-analyses, we discovered numerous methodological errors, data errors, and omitted studies. They also did not report interrater reliability for any of their decisions, a serious oversight given they have applied strict interrater reliability standards in other Rorschach validity meta-analyses (see Garb et al., 2001). Further, Wood et al.'s conclusions were largely based on a narrative review of studies and post hoc analyses instead of their meta-analytic results (or ours). Consequently, in this Reply, we review their evidence in each of these areas.

Wood et al.'s (2015) Main Argument

Wood et al.'s (2015) argument can be summarized as follows: Two groups of Rorschach Comprehensive System (CS) scores exist; one is valid, the other is not. The validity of one group, the cognitive variables, is already well-recognized; however, its component scores are largely redundant and most should be replaced with an IQ test. The other group of scores, the noncognitive variables, only appear to be valid due to publication bias, which can be proven by comparing the results in peer-reviewed articles to those in unpublished dissertations.

It is important to note two problems with this argument before reviewing the evidence presented to support it. First, the dichotomous cognitive/noncognitive grouping of variables, as well as the label "Cognitive Quartet,"² were created by Wood et al. (2015) for their article. They are not used in the CS literature. But mainly, their cognitive/noncognitive dichotomization is not supported because the variables (a) overlap significantly³ and (b) are not supported by research as cohering to form two latent factors (Meyer, 2007). Second, it cannot be argued that the noncognitive scores' validity is threatened by publication bias but the cognitive scores are irrefutably valid without providing a reason as to why the cognitive scores would be immune to such publication bias. Given the overlap across their cognitive/noncognitive categories and the fact their cognitive variables were not included in their test for publication bias, Wood et al.'s meta-analyses can only speak to the validity of the constructs for the four specific variables they investigated.⁴

² Wood et al. (2015) refer to one group of CS variables as Deviant Verbalizations but this should be called Critical Special scores; Deviant Verbalizations is the name for one of the six Critical Special scores.

³ See the descriptive labels in Tables 1 and 2 in Wood et al. (2015). Note, for example, that Weighted Sum of Color in Table 2 (p. 238) is one half of Experience Actual in Table 1 (p. 237). To identify the other overlapping areas, see the description of the variables in Mihura et al. (2013).

⁴ It is also noteworthy that in other publications the authors have strongly endorsed the validity of Rorschach variables that are decidedly noncognitive (i.e., the Rorschach Oral Dependency Scale, Elizur Anxiety and Hostility scales, and Rorschach Prognostic Rating Scale; e.g., Wood, Nezworski, & Garb, 2003).

Evaluating Wood et al.'s (2015) Meta-Analyses by Their Standards

As we noted above, these authors have previously concluded that "all the [Rorschach] meta-analyses have had serious methodological flaws" (Lilienfeld et al., 2000, p. 35). Therefore, Wood et al. (2015) have published methodological recommendations for Rorschach research as well as standards for Rorschach validity meta-analyses (Garb, 1999; Garb et al., 1998, 1999, 2001; Lilienfeld et al., 2000; Wood & Lilienfeld, 1999; Wood, Lilienfeld, Garb, & Nezworski, 2000; Wood, Nezworski, Stejskal, Garven, & West, 1999). These guidelines have helped raise Rorschach researchers' awareness of potential methodological or statistical problems and pitfalls. In developing, executing, and evaluating our Rorschach meta-analyses we aimed to be responsive to these issues. We also checked our decisions and data analyses many times. Therefore, the standards set by these authors helped make our study better and our resulting findings more reliable and unbiased. In the following sections, we use Wood et al.'s standards and recommendations to review the evidence reported in their article.

Data Treatment Standards

In addition to adding dissertations as a way to address publication bias, Wood et al. (2015) made two key changes to our methodology: (a) they replaced study control groups with International Norms and (b) they did not correct for the Number of Responses. As we describe below, they omitted the fact that this contradicts their previous stance on both of these issues. This inconsistency raises concerns about the objectivity of these decisions, especially because the reversal of their previous stance produces results in the direction of their hypothesis (i.e., that these four noncognitive Rorschach variables are invalid).

Replacing local comparison samples with general norms. Wood et al. (2015) have repeatedly argued against Rorschach researchers using general norms instead of a locally collected control group. They state: "As Exner, Kinder, and Curtiss (1995, p. 151) have pointed out, one of the most common errors noted in Rorschach research occurs when investigators attempt to use normative data as some control or reference sample" (Garb et al., 2001, p. 442). Wood et al. have called this procedure a methodological flaw and placed it first in a list of six "widespread and serious problems in the Rorschach validity literature" (Wood et al., 1999, 2000), citing two reasons: (a) the local comparison group might control for other factors like age, socioeconomic status, and educational level, and (b) different scoring conventions can exist so, optimally, it is best to use the same coders for the target and comparison sample.

Yet in Wood et al.'s (2015) meta-analyses, they replaced the local samples that researchers collected with the International Norms in six instances. In two of these instances, they replaced local cultural samples with the International Norms (Skinstad, Troland, & Mortensen, 1999 [Norway]; Yamamoto et al., 2010
[Japan]⁵) though they have consistently said it is necessary to use culturally specific comparison samples (Garb et al., 2001; Lilienfeld et al., 2000; Wood, Garb, Lilienfeld, & Nezworski, 2002; Wood & Lilienfeld, 1999). Previously, when Wood et al. have labeled this use of remote norms an error and a methodological flaw, it was to assail positive evidence for Rorschach validity (Garb, Wood, & Nezworski, 2000; Garb et al., 2001; Wood et al., 1999, 2000). Now, by reversing their stance, it reduces the average r for these findings from .20 to .03 (k = 6; Auker-Keller, 1998; Exner, 1986; Owens, 1982; Skinstad et al., 1999; Yamamoto et al., 2010; Zodan, 2010).

Controlling for the Number of Responses. As noted by Rorschach critics and researchers alike for many years, positive findings for a target variable can be an artifact of the relationship between the Number of Responses (R) provided during the task and the criterion variable of interest⁶ (Cronbach, 1949; Hunsley & Bailey, 1999; Lilienfeld et al., 2000; Meyer, 1992; Wood, Nezworski, & Stejskal, 1996; Wood, Nezworski, Lilienfeld, & Garb, 2003). As stated by Wood, Nezworski, Lilienfeld et al. (2003), "The drab and unassuming R is the power lurking behind the Rorschach throne, the unnoticed figure in the background that influences nearly everything else about the test . . . R is the ocean, and the other Rorschach variables the boats that rise and fall on its tides" (p. 150). Because of this, and consistent with the principles of psychometric meta-analysis where the goal is to correct validity coefficients for known sources of error and bias (Borenstein, Hedges, Higgins, & Rothstein, 2009; Hunter & Schmidt, 2004), we statistically adjusted the effects when the target variable showed a statistically significant association with the Number of Responses. For all of these reasons, we were very surprised Wood et al. (2015) chose to abandon our methodology. Adjusting for the Number of Responses does not consistently decrease or increase the magnitude of the association between the target and criterion variable; it depends on the criterion variable, hypothesis, and direction of differences in the Number of Responses. However, in this situation, Wood et al.'s reversal of our methodology and their decision to not correct for this known artifact reduced the average r for these findings from .22 to .08 (k = 8; Black, 2002; Burns, 2003; Daley, 1992; Exner, 1986; Flahault & Sultan, 2010; Meyer, 1993; Porcelli & Mihura, 2010⁷).

Meta-Analytic Standards

Wood et al. (2015) have used the following methodological standards to evaluate Rorschach meta-analyses: (a) judges coding the studies should be blind to the results; (b) all predictors should be coded; (c) coding judges should use a study review protocol (or what they call a "coding book"); (d) coding should be free of error; (e) good interrater reliability should exist for coding decisions; (f) either kappa or an intraclass correlation should be used as the interrater reliability statistic to control for chance agreement; and (g) the appropriate formula should be used for aggregating effect sizes (Garb, 1999; Garb et al., 1998, 1999, 2001; Lilienfeld et al., 2000; Wood & Lilienfeld, 1999; Wood et al., 1999, 2000). Meta-analyses that violate these standards have been deemed methodologically flawed, and fatally so. The authors have also recommended that Rorschach meta-analyses include unpublished studies, suggesting this might be accomplished with dissertations (Garb et al., 2001).

Blinding the coding judges. Although Wood et al. (2015) have stated that the coding judges for Rorschach meta-analyses should be blind to study results, they did not require this of their judge(s). These authors previously labeled the use of nonblinded coding judges a methodological flaw due to the potential for cherry picking or confirmation bias (e.g., choosing findings that confirm Rorschach validity). Although they did not mention why they did not follow their standard here, it could be for the same reasons we did not use this method: (a) it is exceedingly time-consuming to prepare results-blinded review materials for judges in large scale meta-analyses, and (b) although the authors have expressed concern that using Rorschach researchers as coders might bias the results because they will be familiar with many of the studies, it is unfeasible to use coders who are not familiar with either the Rorschach or its research methodology. Because our coding judges were not blind to the results, we conducted formal statistical analyses to test for potential bias and found no evidence of it (Mihura et al., 2013, pp. 563–564). In contrast, Wood et al. did not provide any evidence against their potential confirmation bias (i.e., to confirm Rorschach invalidity).

Applying the study selection criteria. The authors have used strict standards for "mistakes in coding the data" in other Rorschach meta-analyses (Garb et al., 2001, p. 441). However, it is important to note that sometimes when reviewers of meta-analyses consider something an error it is actually a judgment call.⁸ Therefore, in reviewing Wood et al.'s (2015) work we strive to distinguish between a judgment call and an objective error.

⁵ Wood et al. (2015) substituted the International Norms for the Japanese norms used by Yamamoto et al. (2010; i.e., Takahashi, Takahashi, & Nishio, 2007), claiming that their norms were "seriously discrepant" from both the International Norms and a Japanese sample used in creating the International Norms (Nakamura, Fuchigami, & Tsugawa, 2007) (p. 3). However, the only notable differences are to be expected because Yamamoto et al. did not use the standard CS tables for Form Quality coding; they used Form Quality tables developed with Japanese nonpatients specifically for use in Japan (see Meyer, Viglione, Mihura, Erard, & Erdberg, 2011; Takahashi, Takahashi, & Nishio, 2009). The coding guidelines for the other 46 variables, including the target variable in Wood et al.'s meta-analysis (Sum of Shading), are the same (and therefore directly comparable).

⁶ Wood et al. (2015) criticized the use of semipartial correlations in our meta-analyses, but they omitted the facts that semipartial correlations were (a) used by us only once for the variables in their meta-analyses and (b) included in their other Rorschach meta-analysis (Wood et al., 2010, p. 340).

⁷ Wood et al. (2015) reported that the effect was from Porcelli and Meyer (2002), but their supplemental materials show it was from Porcelli and Mihura (2010).

⁸ For example, when Garb et al. (2001) said Hiller, Rosenthal, Bornstein, Berry, and Brunell-Neuleib's (1999) Rorschach meta-analysis was flawed due to coding mistakes, their only example was that Hiller had given the judges the results of a Rorschach index but not its six component variables. The judges' task was to determine if individual predictor-criterion pairs should count as a validity coefficient using results-blinded synopses of each study. Garb et al. (2001) emphasized that the results for these six variables were nonsignificant, "leading to a high estimate for the average effect size" (p. 441), suggesting Hiller engaged in selective reporting by omitting them from the judges' materials. However, these variables were being used to test discriminant validity; i.e., they were predicted not to be associated with the criterion variable. Further, given the two-step procedure used by Hiller et al. (1999, pp. 280–281) to determine which effects should be sent to the judges, it also can be argued that Hiller appropriately omitted these variables in the first step.
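The Number of Responses adjustment described above can be sketched in general terms. The function below is an illustration only, with hypothetical correlation values, and is not the authors' actual procedure: it computes a first-order semipartial correlation, one standard way to remove R's contribution from a target score before relating it to a criterion.

```python
import math

def semipartial_r(r_ct: float, r_cR: float, r_tR: float) -> float:
    """Correlation between criterion c and target score t after
    partialling the Number of Responses (R) out of t only.

    r_ct: zero-order target-criterion correlation
    r_cR: criterion-R correlation
    r_tR: target-R correlation
    """
    return (r_ct - r_cR * r_tR) / math.sqrt(1.0 - r_tR ** 2)

# Hypothetical values: a .30 validity coefficient for a target score
# that correlates .50 with R, against a criterion that correlates .40
# with R, shrinks once R's shared contribution is removed.
print(round(semipartial_r(0.30, 0.40, 0.50), 3))  # → 0.115
```

As the text notes, the adjustment can also move an effect upward: when r_cR and r_tR have opposite signs, the numerator grows rather than shrinks, so the direction of the change depends on how R relates to both variables.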
Author-hypothesized associations. In selecting study findings for inclusion, there were several instances when Wood et al. (2015) did not accurately apply our central study selection criteria, which was to (a) "[identify] all the author-hypothesized associations in the literature, (b) then [identify] all instances when any other author had evaluated the same predictor criterion relation but had not hypothesized it, and (c) [then judge] the fit of all these previously hypothesized criteria to the target scale's intended core construct, retaining only those that [fall] in the conceptual bulls-eye" (Mihura et al., 2013, p. 559). In seven instances, Wood et al. did not use the study authors' hypotheses and devised their own, which reduced the average r for these findings from .18 to .07 (k = 7; Auker-Keller, 1998, p. 6; Meyer, 1993, p. 156; Smith, 2008, p. 29; Zodan, 2010, pp. 79–80).

In two other cases, Wood et al. (2015) used dissertation findings that the authors clearly stated were not hypothesized. In one dissertation with a sample of custody litigants, the author hypothesized that faking-good response sets would significantly "suppress the true correlative relationship between conceptually related Rorschach and MMPI-2 variables" (Gregg, 1998, p. 138). Despite this expected suppressive response set, Wood et al. compared Gregg's litigants to the International Norms. The second dissertation used an archival sample of Rorschach protocols lacking interrater reliability data that were obtained from homeless, indigent women applying for social security benefits. The author asserted the study "was descriptive in nature and did not involve formal hypothesis testing" (Burns, 2003, p. 114), though Wood et al. compared Burns' data to the International Norms. Including these two studies introduced an average negative effect of −.26.

Construct matching. Wood et al. (2015) have recommended that "all predictors" in a study be coded and included in the meta-analysis (Garb et al., 2001, p. 443). This certainly sounds reasonable, but in construct validity meta-analyses for psychological tests, whether or not a finding qualifies as a validity coefficient requires significantly more judgment than most meta-analyses in psychology, which generally focus on variables predicting an a priori clearly defined criterion such as reliable change on a treatment outcome measure or observer ratings of five factor model items. Our Method section addresses this dilemma (Mihura et al., 2013, pp. 558–559); but as eloquently stated in Cronbach and Meehl's (1955) classic article, "Construct validity must be investigated whenever no criterion or universe of content is accepted as entirely adequate to define the quality to be measured" (p. 282). Consequently, all Rorschach construct validity meta-analyses require that coders make judgments as to whether specific predictor-criterion findings qualify as relevant validity coefficients.

In our meta-analyses, the judgment task was to decide whether "the criterion variable's construct fit the relevant Rorschach construct label" (Mihura et al., 2013, p. 559). This judgment required that we (a) select validity criteria that hit "the bulls-eye" of the Rorschach construct and (b) judge whether "the criterion variable itself is a measure of [its] intended construct [and] when in doubt, we consulted the relevant research" (pp. 559–560). But Wood et al. (2015) did not follow this procedure. What they did is more akin to a criterion validity meta-analysis, not a construct validity meta-analysis. Using Anatomy and X-ray as an example, they did not review and judge each predictor-criterion association in every study to see if it fit the inclusion criteria.⁹ Instead, they searched the literature looking only for the criterion variables that we included for this variable (sexually or physically abused; observer ratings of somatic concerns; and physical illness, including children who have a parent with a serious physical illness). They also included sex offenders as a criterion variable because they said it was associated with Anatomy and X-ray "in the existing research literature" (Wood et al., 2015, p. 242). However, Wood et al. did not identify all studies that used the Anatomy and X-ray variable in order to determine if each study's criterion variable would fit its construct label (Preoccupations with Body Vulnerability or Its Functioning).

⁹ In our meta-analyses this step required that we make judgments on 3,074 potentially relevant validity coefficients to determine whether the criterion's construct matched the Rorschach variable's construct (Mihura et al., 2013, p. 559).

Choosing the appropriate findings to serve as the validity coefficients is the most challenging task in a construct validity meta-analysis. Making these judgments requires an understanding of the predictor variable, the criterion variable, each of their proposed constructs, and the research that supports the criterion variable as a valid measure of its own construct. This is why evidence of interrater reliability is required in Rorschach validity meta-analyses, which Wood et al. did not provide (as we discuss in a subsequent section).

Omitting relevant findings or including irrelevant findings. In our meta-analyses. We appreciate Wood et al.'s (2015) endorsement of our "thorough search of the published Rorschach literature" (p. 242). It was indeed a time-consuming, meticulous endeavor requiring thousands of hours over several years. They did think we overlooked findings from three articles, however, that they included in their meta-analyses. Although they considered this "a minor error," we did not overlook these findings, one of which was our own study. For two of these studies (Porcelli & Mihura, 2010; Skinstad et al., 1999), Wood et al. compared their descriptive statistics to the International Norms. We excluded these findings because comparing descriptive statistics to the International Norms was only to be used if (a) an author hypothesized "differences between a target group . . . and normative data" and (b) another author had "only [italics added] provided descriptive statistics for the target group" [that is, and had no comparison sample] (Mihura et al., 2013, p. 559). The latter was implemented specifically to ensure that we found all relevant effect sizes for any hypothesized association. Otherwise, if one were to scour the literature and lift various descriptive statistics from studies to compare to norms, one could choose from tens of thousands of results. Wood et al. misapplied this rule to the two aforementioned articles plus two dissertations. One dissertation even specifically stated "The Comprehensive System was strictly utilized for the system's interpretation method and not to compare results with the Comprehensive System's norms" (Auker-Keller, 1998, p. 46). The misapplication of our descriptive statistics rule introduced an average r of .09 (k = 4; Auker-Keller, 1998; Belcher, 1995; Porcelli & Mihura, 2010; Ritsher, 1997).

The third finding Wood et al. (2015) thought we overlooked was from McCraw and Pegg-McNab's (1989) study comparing Anatomy and X-ray scores of sex offenders and non-sex-offenders. To justify their inclusion of this finding, Wood et al. cited the Variable Selection and Validity chapter from the Rorschach Performance
Assessment System (R-PAS) manual (Meyer, Viglione, Mihura, Erard, & Erdberg, 2011). However, this chapter clearly states it was based on the results of an earlier version of our meta-analyses (p. 441), not the final published version.¹⁰ We originally included McCraw and Pegg-McNab because another set of authors (Hughes, Deville, Chalhoub, & Romboletti, 1992), which Wood et al. overlooked, had hypothesized that sexual assault offenders had poorer relational maturity than other offenders. Hughes et al. believed this would manifest on the Rorschach by seeing internal body parts (anatomy) in contrast to whole or partial (e.g., a face) human images. However, during the editorial review process we were asked to tighten the bulls-eye when matching predictor-criterion constructs, which resulted in the exclusion of 161 findings. The findings from these two articles were among these excluded findings because we judged that sexual offender characteristics did not target the bulls-eye of Anatomy and X-ray's construct label, Preoccupations with Body Vulnerability or Its Functioning. Therefore, we did not overlook McCraw and Pegg-McNab; we deliberately excluded it. We do not understand why Wood et al. utilized a criterion from an earlier version of our meta-analyses that we ultimately clearly excluded. But having done so, they should have informed readers that their rationale was based on an earlier version of our meta-analyses. Including sex offenders as a criterion for Anatomy and X-ray in Wood et al. introduced an average r of .07 from three studies (Belcher, 1995; McCraw & Pegg-McNab, 1989; Smith, 2008).

¹⁰ In its next printing, the R-PAS test manual will be updated and corrected to eliminate the suggestion that sex offenders fit Anatomy and X-ray's core construct.

In Wood et al.'s meta-analyses. We identified many findings that Wood et al. (2015) overlooked that fit their study selection criteria (which includes the requirement that the dissertation be available for free download¹¹). Due to time constraints we could not fully examine the relevant literature to double-check their meta-analyses. Instead, starting alphabetically, we focused on Anatomy and X-ray. We targeted the specific variables that Wood et al. believed served as valid criteria for Anatomy and X-ray, whether or not we agreed. This strategy allowed us to more objectively evaluate whether relevant findings were overlooked. We focused on the criterion variables Wood et al. used for Anatomy and X-ray that had the easiest search terms: sex offenders, children or adults with a history of abuse, and ratings of children's somatic concerns (i.e., sex, offender, abuse, somatic). We did not search dissertations for findings with Anatomy and X-ray that used their broad physical illness criterion because this search would have been too time-consuming.

¹¹ Wood et al. (2015) said they included dissertations starting in 1997 but also "a few earlier dissertations that could be downloaded without special payment" (p. 238). This process sounded unsystematic so we contacted J. M. Wood for clarification. He explained that their institution

Within our very circumscribed search parameters, we found 14 Anatomy and X-ray findings that Wood et al. overlooked. These included eight Anatomy and X-ray findings for sex offenders (Celenza & Hilsenroth, 1997, p. 102; Csercsevits, 2000, p. 61; Fong Hartsfield, 2000, p. 551; Gacono, Meloy, & Bridges, 2000, pp. 768–774; Hughes et al., 1992, p. 327; Yanovsky, 1994, pp. 38, 71), four Anatomy and X-ray findings for children or adults with a history of sexual or physical abuse (Hayman, 2000, p. 241; Malone, 1995, p. 185; Moore, 2003, p. 172; Worgul, 1989, p. 68), and two Anatomy and X-ray findings for ratings of children's somatic concerns (Hayman, 2000, p. 197; Moore, 2003, pp. 154, 268). Four of these relevant but overlooked findings were from dissertations already included in Wood et al.'s meta-analysis (Hayman, 2000;

. . . "close readings of that article [García-Alba, 2004] we were unable to find such a comparison in a form that allowed the extraction of a validity coefficient" (p. 241). They are correct; the data were not in the article. When the article lacked sufficient information to compute an effect size and was based on a dissertation, the judge was required to obtain the dissertation, if possible, to see if it contained the relevant data (Mihura et al., 2013, p. 560). In the author note (p. 40), García-Alba stated that her study was based on a dissertation. The effect size was r = .25.

Interrater Reliability

Of all the problems present in Wood et al.'s (2015) new meta-analyses, we were most surprised that they did not report interrater reliability for their decisions about which effect sizes to include. These critics have had strict interrater reliability standards for other Rorschach meta-analyses, including (a) the requirement of study review protocols, (b) good levels of agreement regarding which findings to include in the meta-analysis, and (c) the appropriate statistics to compute interrater reliability (Garb, 1999; Garb et al., 2001; Wood & Lilienfeld, 1999). When these standards have not been met, the authors have dismissed the meta-analytic findings (e.g., Garb et al., 2001). We met all three of these standards; Wood et al. met none. For example, Wood et al. said they used our methodology to replicate our meta-analyses but they did not request our review protocol (Mihura et al., 2013, p. 558). They also did not report reliability for their judgments (nor who made the judgments or how disagreements were resolved). The interrater reliability between our decisions and theirs on which effects to include in their meta-analyses was .22. This is particularly low agreement considering their decisions for articles were not conducted blind to ours.

Statistical Analyses

The meta-analytic aggregation model. Wood et al. (2015) frequently expressed perplexity that their results differed from ours. In each instance, the discrepancy was due to an error on their part. For example, they found it "puzzling" that for Weighted Sum of Color we "arrived at an estimated validity of .38, whereas [they] arrived at an estimate of .49 for the same studies and samples" (p. 240). The discrepancy occurred because they used the formula for a fixed-effects model rather than a random-effects model, as our methodology, which they were following, required.¹² Their effect size is larger because a fixed-effect model weights sample size more heavily than does a random-effects model, and Weighted Sum of Color's largest samples had larger effects than the smaller samples.
has free access to dissertations in ProQuest but not all are available before
Moore, 2003). 1997.
Finally, Wood et al. (2015) excluded a finding for Anatomy and 12
We describe our use of a random-effects model several times (Mihura
X-ray that was in our meta-analyses because even after several et al., 2013, pp. 562564).
REPLY TO WOOD ET AL. (2015) 255

ples.13 The opposite impact occurred for Anatomy and X-ray, for their other Rorschach meta-analytic study to claim that The only
which their aggregated effect size should be .13 instead of .07. meta-analysis to systematically compare published Rorschach
Other statistical errors. We found other problems with studies and Rorschach dissertations with respect to methodological
Wood et al.s (2015) data but we will mention only two. In three quality found no difference (Wood et al., 2010) (Wood et al.,
instances, Wood et al. used the wrong sample size (De Vincent, 2015, p. 242). However, their 2010 meta-analytic study did not
2009, N 24 not 50, p. 65; Leifer, Shapiro, Martone, & Kassem, focus on study quality; it focused on the relationship between the
1991, N 64 not 111, p. 17; Zodan, 2010, N 60 not 45 for Sum Rorschach scores and psychopathy. In one minor portion they
of Shading, p. 156). But most importantly, their analyses did not indicated whether the individual meta-analytic studies reported
target publication bias. Their goal was to test publication bias by interrater reliability. Wood et al. (2010) reported that five (50%)
showing that the published articles had significantly higher of the 10 published studies and three (25%) of the 12 dissertations
effect sizes than unpublished dissertations. Instead, they com-
neglected to report reliabilities for the Rorschach scores in the
pared the studies we included in our meta-analyses (articles)
meta-analysis (p. 342). However, when correcting for errors15
with the new studies they identified (which included both
and distinguishing between chapters and peer-reviewed articles as
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

dissertations and articles).


publications, 100% articles, 0% chapters, and 67% dissertations
This document is copyrighted by the American Psychological Association or one of its allied publishers.

in Wood et al.s (2010) meta-analyses reported interrater reliabil-


Publication Bias?
ity, which would indicate lower quality for chapters and disserta-
tions compared to peer-reviewed articles.
Contending With the Methods We Used to Address
Nonequivalence of the journal and dissertation review
Publication Bias
process. As another argument for equivalence in quality between
Surprisingly, Wood et al.s (2015) comment failed to reconcile unpublished dissertations and peer-reviewed articles, Wood et al.
their conclusions about the presence of publication bias with the (2015) claimed that many dissertations in their meta-analyses were
analyses we conducted showing its absence (Mihura et al., 2013, p. chaired by leading Rorschach researchers (p. 242). Actually,
564). Our funnel plot, tau, Eggers regression test, and fail-safe N most of them were not. Regardless, chairing a dissertation is not
did not find evidence of publication biasnor, importantly, did equivalent to conducting a study oneself, where one has control
these analyses support Wood et al.s hypothesis that noncognitive over the data collection, coding, and statistical analyses. Rorschach
variables are more susceptible to publication bias than cognitive validity research is complex and difficult to do well. Data collec-
variables. We also conducted other analyses to target potential tors must have extensive training to properly administer and code
hindsight bias for study authors and selection bias for us. Our the Rorschach, as well as access to well-trained interrater reliabil-
findings did not support either. For example, we ensured that every ity coders. Further, an enormous amount of data, often derived
time an author hypothesized an effect, that same predictor from statistical analyses the student has never before run or inter-
criterion pair was obtained from any and every other study in
preted with real data, must be managed and interpreted in disser-
which it had not been hypothesized. The effect sizes for author-
tations. In tables alone, the dissertations in Wood et al.s meta-
hypothesized and non-author-hypothesized effects were virtually
analyses contained an average of 1,366 Rorschach data points
identical (.26 and .29, respectively). Wood et al. did not acknowl-
each. Dissertations are also typically 100 to 300 pages long, taxing
edge these systematic efforts to address bias or discuss the dis-
the carefulness and attention to methodological and statistical
crepancies between our findings and theirs.
detail that committee members can provide. Attesting to this, as
Evaluating the Complete Literature previously discussed, Wood et al. themselves overlooked four
relevant findings in dissertations already included in their meta-
Wood et al. (2015) stated, Because unpublished studies were analyses (Hayman, 2000; Moore, 2003); two of these were in a
omitted, the [Mihura et al., 2013] meta-analysis [sic]14 sometimes dissertation with almost 7,000 Rorschach data points (Hayman).
substantially overestimated the validity of Rorschach scores in the Wood et al. also overlooked the relevant descriptive data for
complete scientific literature (p. 242). However, in addition to the Anatomy and X-Ray in two dissertations (Daley, 1992, p. 64;
other issues with their meta-analyses we described, Wood et al. did Sakowicz, 2010, p. 95) and chose instead to use only the Anatomy
not evaluate the complete scientific literature. They included find- descriptive data.
ings from only one category of unpublished studies unpublished
dissertationsand only ProQuest dissertations freely available
13
through their library. They omitted books, book chapters, and Their use of a fixed-effect instead of random-effects model also led to
conference proceedings and did not search study references or notably larger differences in their moderator analyses that tested for pub-
lication bias (ps .001, .008, .001 vs. ps .019, .047, .049).
request unpublished studies from established researchers. They 14
In the text of their Comment, Wood et al. (2015) consistently refer to
also excluded the same type of studies listed in our limitations our meta-analyses of 53 variables with relevant data as a meta-analysis in
sectionthose written in a non-English language, published the singular (27 out of 27 times) and refer to their review of four variables
before 1974, and conducted using a Rorschach system other than as meta-analyses in the plural (8 out of 8 times).
15
the CS (Mihura et al., 2013, p. 577). Wood et al. (2010) mistakenly counted (a) one dissertation as report-
ing interrater reliability that reports none (Cunliffe, 2002) and (b) four
articles as not reporting interrater reliability that did not actually contribute
Study Quality in Unpublished Rorschach Dissertations data to their meta-analyses; as additional errors, two of these four articles
(marked with asterisks) actually did contain interrater reliability (Gacono
Errors in Wood et al.s (2015) evidentiary study. As an & Meloy, 1991; Gacono, Meloy, & Berg, 1992, p. 37; Gacono, Meloy, &
argument for dissertation study quality, Wood et al. (2015) cite Heaven, 1990; Meloy & Gacono, 1992, p. 108).
Further, unlike the peer-reviewed process for journals, no blinding exists between student author and dissertation committee members, who usually have ongoing working relationships in which the members have made a significant investment in the student. Lovitts (2005) conducted focus groups of 276 dissertation chairs and committee members and found they rarely, if ever, failed a dissertation. Instead, they "seek excuses [and] take into consideration such things as their feelings about the person rather than the objective document. In the end, most defer to the adviser, hold their noses, and vote to pass" (p. 21).

Student errors can occur when administering the Rorschach, recording and coding the responses, entering the codes into a scoring program, transferring data from the scoring program to a statistical software package, matching the scores to de-identified clinical data, and so forth. Completing a dissertation can be an especially challenging situation for clinical psychology students, who do not typically pursue a research career (Norcross & Karpiak, 2012) and are commonly completing their first or second research project. Not one of the 23 dissertation authors in Wood et al.'s meta-analyses went on to become lead authors on other Rorschach research, and only two became lead authors on any published research at all.

Data errors in the dissertations. The argument for study quality ideally should be based on the studies themselves, not remote indicators. When reviewing Wood et al.'s (2015) dissertations we found many errors. As part of our meta-analytic review protocol, "When errors were discovered (e.g., findings reported in the text contradicted those in a table), the data were included only if the study's author(s) corrected the discrepancies when contacted" (Mihura et al., 2013, p. 560). At least eight of the 23 dissertations (35%) in Wood et al. had disqualifying errors,¹⁶ which included impossible results of various types, such as dichotomous percentages that sum to greater than 1.0, mathematically impossible combinations of values in tables, and impossibly high proportions of diagnoses (Bank, 2001; Daley, 1992; Hayman, 2000; Moore, 2003; Owens, 1982; Sakowicz, 2010; Saraydarian, 1990; Verias, 2007).¹⁷ Several other methodological problems existed, such as absent (Burns, 2003) or unrealistically high interrater reliability for Rorschach coding, that is, mean ICC = .969 (Zodan, 2010, p. 153) and mean κ = .971 (Smith, 2008, p. 93), suggesting either failure to blind coders, use of an inaccurate formula, overly simplistic Rorschach responses, or miscomputed or fabricated data. Another dissertation labeled statistically significant but nonhypothesized findings as "unanticipated hypotheses" (Grosso, 1999, p. 188). These dissertation errors do not disprove publication bias, but they do suggest we should expect lower effect sizes in Rorschach dissertations than in peer-reviewed articles. Given the impossible results and problems we observed, the absence of publication may reflect the peer review process working as it is supposed to by keeping untrustworthy findings from reaching the public domain.

Wood et al.'s Negative Conclusions Are Not Based on Their Meta-Analyses or Ours

Wood et al. (2015) concluded that their analyses revealed that "none of these [four Rorschach] scores was truly strongly supported" (p. 243). But this statement is misleading. Their meta-analyses found effects greater than .40 for Weighted Sum of Color and Suicide Constellation. Instead, after completing their meta-analyses, they (a) reverted to using a narrative review and critique and (b) reran their analyses on a few studies using a different comparison sample, and based on this they overturned their meta-analytic findings. But it is methodologically unsound to conduct a meta-analysis and then undo it by picking apart the studies you included and the method by which you included them. Instead, from the start, one must conduct a meta-analysis with the desired methodology applied evenly across all studies.

Weighted sum of color. Wood et al. (2015) overturned their Weighted Sum of Color meta-analytic results post hoc by combining the raw mean scores for this variable in four borderline personality patient samples and then comparing the result to the International Norms. After doing so, they claimed the borderline patients could not be differentiated from nonpatients and that this variable's meaning was thus presently "a mystery" (p. 241). However, averaging raw scores rather than effect sizes is inconsistent with meta-analytic procedures because it discards the study design features that may be connected with the raw score values. It also introduces the two problems previously discussed, ones that Wood et al. themselves have repeatedly described as problems in other publications: replacing local study comparison samples with general norms and not controlling for the Number of Responses. For three of the four studies (Exner, 1986, p. 460; Skinstad et al., 1999, p. 138; Zodan, 2010, p. 162), the composite mean Number of Responses was lower in both the borderline (18.69) and the comparison samples (17.84) than in the International Norms (22.31; Meyer, Erdberg, & Shaffer, 2007, p. S203). The fourth study (Saraydarian, 1990, p. 61) did not report the mean Number of Responses; however, they allowed protocols with as few as 10 responses, whereas the International Norms have at least the required 14 responses per protocol. These study-specific problems attest to the need for local comparison samples and statistical control for the Number of Responses. They also illustrate precisely why meta-analyses rely on effect sizes rather than raw scores when pooling results.

The other argument Wood et al. (2015) used to overturn their meta-analytic findings for Weighted Sum of Color was based on the results of just one study. They asserted that "Any attempt to understand Weighted Sum of Color must also come to terms with the findings of a dissertation by De Vincent (2009)" (p. 240). Why? Was this a large multiple-study dissertation or a best-evidence synthesis of the literature? No, it was a small single-study dissertation (N = 24). One cannot conduct a meta-analysis and then use the results of one small study to challenge it. If one small study were so powerful, Psychological Bulletin would be the size of a pamphlet.

Suicide Constellation. Wood et al. (2015) overturned their positive meta-analytic findings for the Suicide Constellation with a post hoc study-by-study narrative review and critique. They agreed with the positive findings of one study (Fowler, Piers, Hilsenroth, Holdwick, & Padawer, 2001) but argued against three others. First, Wood et al. argued against a finding that utilized cerebrospinal fluid levels of 5-hydroxyindoleacetic acid (CSF 5-HIAA) in recent suicide attempters as a suicide severity risk indicator (Lundbäck et al., 2006) because suicide risk "was not the central validity question" for the Suicide Constellation (Wood et al., 2015, p. 239). However, "Suicide Risk" is exactly the construct label for the Suicide Constellation, making CSF 5-HIAA levels in recent attempters a relevant criterion for that Rorschach variable. Their argument is particularly confusing because in the previous paragraph they overtly state that the validation goal is "to evaluate the Suicide Constellation as a measure of suicide risk" (p. 239). Wood et al. also criticized our inclusion of CSF 5-HIAA as a validity criterion for the Suicide Constellation because we did not include other suicide risk factors as validity criteria, "such as major depressive disorder, bipolar disorder, and substance abuse" (p. 239). However, as previously discussed, our methodology requires identifying all the author-hypothesized associations in the literature (Mihura et al., 2013, p. 559). Wood et al. should know these risk factors were not hypothesized for the Suicide Constellation in the entire pool of articles.

Next, Wood et al. (2015) reported that "Meyer (1993) found no correlation (r = .00; N = 90) between Suicide Constellation scores and patient suicide attempts" (p. 239). However, this result is not from Meyer's hypothesis, which was that "average-length [Rorschach] protocols would demonstrate the greatest external validity" (p. 156). This hypothesis was formulated to address the problematic influence of varying Number of Responses (as we have discussed). The hypothesis-based finding was r = .22. Finally, Wood et al. claimed that Exner and Wylie's (1977) Suicide Constellation finding should be excluded because it involved "extensive fishing" (p. 240). However, we had an exclusion criterion to address this problem, which Wood et al. could have used if they believed it fit, and we used it in several cases: "We also excluded studies that selectively reported only significant findings if it was not possible to determine whether relevant variables had also been examined" (Mihura et al., 2013, p. 560). For Exner and Wylie we conservatively computed an average effect across all possible predictor scale cut points because the authors did not specify a validity cut point a priori. Therefore, our effect size for this study was substantially lower than that derived by Wood et al. (.39 vs. .54).

All of the studies included in our meta-analysis for Suicide Constellation fit the inclusion criteria. If Wood et al. disagreed with our criteria, then they would need to conduct a meta-analysis with different criteria applied evenly across all studies. Our meta-analytic findings for Suicide Constellation should not be overturned.

Complexity/Synthesis variables. Finally, Wood et al. (2015) gave an extended critique of the six Complexity/Synthesis variables, a subcomponent of the four types of variables they considered most valid, to argue that they are largely redundant and should be replaced by an intelligence test. The authors begin their argument by reporting these variables' intercorrelations (M = |.58|, k = 6) from one of their studies (Wood, Krishnamurthy, & Archer, 2003); but, inexplicably, they omitted the Complexity Ratio variable, which had the lowest intercorrelations (M = |.35|, k = 4). Regardless, neither value is atypical for psychological tests. The analogous average intercorrelations on the WAIS-IV (an adult intelligence test) and WMS-IV (a memory test) are .63 and .44, respectively.¹⁸ Next, Wood et al. claim that the Complexity/Synthesis variables all predict essentially the same criteria, including IQ scores, dementia, and head injury, "as can be verified by consulting the Appendix of the Mihura et al. meta-analysis [sic]" (p. 237).¹⁹ It is expected and good that these Rorschach variables are related to broad measures of cognitive complexity. Importantly, they also had distinctive relationships with 10 other criterion variables in our meta-analyses, which Wood et al. did not mention.

Wood et al. (2015) conclude their argument against the Complexity/Synthesis variables by stating that "cognitive impairment and cognitive ability are more appropriately assessed by a well-validated intelligence test than by the Rorschach" (p. 243). This statement is true, but it is a straw man argument. We did not evaluate Rorschach variables as measures of "cognitive impairment" and "cognitive ability." Cognitive tests are designed as maximal performance measures of cognitive ability and impairment (Cronbach, 1990); the Rorschach is not, even though scores derived from it relate to everyday cognitive sophistication under task demands that pull for typical performance. In our meta-analytic methodology, we placed considerable importance on targeting the bull's-eye of the Rorschach variable's construct when matching the criterion variables. The constructs assessed by the six Complexity/Synthesis variables are provided in our Table 1 (Mihura et al., 2013, pp. 550–553); none mention intelligence.

16 By identifying dissertation errors, we do not wish to discourage students or their mentors from conducting challenging Rorschach research for a dissertation project, though we do wish to encourage great care with the project.

17 For example, two dissertations (Moore, 2003, p. 183; Sakowicz, 2010, pp. 92, 95) reported a mean for Anatomy and X-ray that was lower than for Anatomy alone. Moore also reported a mean for Anatomy and Morbid that was lower than for either Anatomy or Morbid alone (p. 183) and reported that Anatomy was correlated with itself at a magnitude of r = .18 (p. 269). Bank (2001) reported a mean for the sum of Anatomy, X-ray, and Blood (p. 34) that was smaller than for Anatomy alone (p. 35). Two dissertations (Daley, 1992, p. 103; Owens, 1982, pp. 22, 83) reported a mean for Sum of Shading that was larger than the sum of the means for its component parts. In Daley, there were also discrepancies between descriptive data in an appendix and the tabular results in the text for Anatomy and X-ray (pp. 64, 105, 106). Saraydarian (1990) reported values for Weighted Sum of Color, EA, es, and Lambda for the schizotypal personality sample that are impossible unless the mean Number of Responses was at least twice the normative value. There were also serious problems with the methodology and the archival data in this study, such that 10 years of inpatient data needed to be reviewed just to obtain 13 eligible borderline personality disorder patients; further, the borderline diagnosis was unreliable and the Rorschach administration was unstandardized (pp. 87–92). In a series of tables, Hayman (2000) reported impossible percentages of youth with particular diagnoses or clinical problems. For example, one table reported that eating disorders were observed in 5% of about 150 youth who were positive on a test indicator and 80% (identified by DSM diagnosis) or 88% (determined by chart review) of about 850 youth who were negative on that test indicator (Table 13, p. 144). Further, these large differences in base rates were reported as nonsignificant. The agreement for chart ratings of somatic concerns was also very unreliable at kappa = .17 (p. 119). Verias (2007) reported that their depressive criterion was formed by clinician ratings on five symptom dimensions. In order to be considered depressed, the patient had to reach a cut-off on at least three of the five ratings. It was reported that 166 people were positive on at least three of the five ratings, yet that was impossible given that the numbers of people who were positive on the five individual criteria were 16, 52, 69, 114, and 208 (see pp. 68–72). This is only a small sample of the errors we encountered.

18 Derived from the normative samples in the respective test manuals (Wechsler, 2008, 2009).

19 As a correction, the relevant validity criteria are listed in our Results section, not our Appendix.
258 MIHURA, MEYER, BOMBEL, AND DUMITRASCU

Where Do We Go From Here? illness (Doctoral dissertation). Available from ProQuest Dissertations
and Theses database. (UMI No. 9934015)
Although we have largely focused on the problems in Wood et Bank, C. A. (2001). Predicting sexually abused victims using the Ror-
al.s (2015) meta-analyses and related arguments, we agree that the schach Inkblot Test (Doctoral dissertation). Available from ProQuest
key issues they raised are important and should be addressed. For Dissertations and Theses database. (UMI No. 3014406).
example, we agree that reducing redundancy among Rorschach Belcher, P. L. (1995). A comparison of male adolescent sexual offenders
variables is important. Redundancy among variables steered our using the Rorschach (Doctoral dissertation). Available from ProQuest
decision to omit or combine several variables in our new Ror- Dissertations and Theses database. (UMI No. 9536333)
schach system even though, independently, these variables ap- Black, E. M. (2002). The use of the Rorschach Inkblot Technique in the
detection and diagnosis of child sexual abuse (Doctoral dissertation).
peared to possess good validity (see Meyer et al., 2011, pp.
Available from ProQuest Dissertations and Theses database. (UMI No.
459 463). Wood et al. also recommended that, to avoid false
3068669)
inferences about pathology in clinical practice, Rorschach inter- Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009).
pretations be based on comparisons with the CS International Introduction to meta-analysis. Chichester, England: Wiley. http://dx.doi
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Norms (Meyer et al., 2007) rather than with the regular CS norms .org/10.1002/9780470743386
This document is copyrighted by the American Psychological Association or one of its allied publishers.

(Exner, 2003) (Wood et al., 2015, p. 243). Although this recom- Burns, J. C. (2003). Use of the Rorschach to identify trauma in a sample
mendation was not in response to findings in our article, we agree of homeless and indigent women (Doctoral dissertation). Available from
with them and also recognize their earlier work suggesting prob- ProQuest Dissertations and Theses database. (UMI No. 3084479)
lems with the CS norms (Wood, Teresa, Garb, & Lilienfeld, 2001, Celenza, A., & Hilsenroth, M. (1997). Personality characteristics of mental
p. 350). Not only do we agree on this point, we also use the health professionals who have engaged in sexualized dual relationships:
International Norms as reference data in our new Rorschach sys- A Rorschach investigation. Bulletin of the Menninger Clinic, 61, 90
tem (Meyer et al., 2011). 107. Retrieved from http://unboundmedicine.com/medline/citation/
9066179/
As previously discussed, for many years Wood et al. (2015)
Cronbach, L. J. (1949). Statistical methods applied to Rorschach scores; a
have raised concerns about the Number of Responses as a potential
review. Psychological Bulletin, 46, 393 429. http://dx.doi.org/10.1037/
artifact (e.g., Wood et al., 1996; Wood, Nezworski, Lilienfeld et h0059467
al., 2003). We agree. To address this problem, we revised admin- Cronbach, L. J. (1990). Essentials of psychological testing (5th ed.). New
istration guidelines to reduce the variability in Rorschach re- York, NY: Harper & Row.
sponses (Meyer et al., 2011). So far, evidence suggests these Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological
efforts have led to significantly reduced variability (e.g., Reese, tests. Psychological Bulletin, 52, 281302. http://dx.doi.org/10.1037/
Viglione, & Giromini, 2014). We also developed a scoring pro- h0040957
gram to statistically adjust for the overall complexity of a persons Csercsevits, M. (2000). A comparative study of juvenile sex offenders,
Rorschach protocol, which includes Number of Responses (Meyer juvenile delinquents, and juvenile dependents using the Rorschach Ink-
et al., 2011). Lastly, for many years the authors have recom- blot Test (Doctoral dissertation). Available from ProQuest Dissertations
mended that Rorschach validity be more systematically and thor- and Theses database. (UMI No. 9969876)
Cunliffe, T. (2002). A Rorschach investigation of incarcerated female
oughly addressed (Lilienfeld et al., 2000; Wood et al., 1996).
psychopaths. (Doctoral dissertation). Available from ProQuest Disser-
Obviously we agree because we devoted significant efforts to
tations and Theses database. (UMI No. 3043373)
conduct construct validity meta-analyses for 65 Rorschach vari- Daley, B. S. (1992). A descriptive study of the Rorschach and chronic pain
ables (Mihura et al., 2013).20 (Doctoral dissertation). Available from ProQuest Dissertations and The-
To close, Wood et al. (2015) found our meta-analyses to be ses database. (UMI No. 9236716)
accurate and unbiased and, hence, lifted their moratorium on the De Vincent, T. (2009). A validation of the controls cluster of the Rorschach
Rorschach (Garb, 1999); but they also cautioned that our results Comprehensive System (Doctoral dissertation). Available from ProQuest
should not be seen as the final word regarding the scientific status Dissertations and Theses database. (UMI No. 3353725)
of clinical psychologys most contested measure (p. 243). We Exner, J. E. (1986). Some Rorschach data comparing schizophrenics
agree. This is how we concluded our article: Like Smith and with borderline and schizotypal personality disorders. Journal of
Glasss [1977] situation, we do not expect ours to be the final word Personality Assessment, 50, 455 471. http://dx.doi.org/10.1207/
s15327752jpa5003_14
on the topic. Indeed, we hope that it is not . . . (Mihura et al.,
Exner, J. E. (2003). The Rorschach: A Comprehensive System. Vol. 1:
2013, p. 580). We thank the authors for the opportunity to continue
Basic foundations and principles of interpretation (4th ed.). New York,
this dialogue in the quest to make the Rorschach a more valid and NY: Wiley.
useful clinical instrument. Exner, J. E., Jr., & Wylie, J. (1977). Some Rorschach data concerning
suicide. Journal of Personality Assessment, 41, 339 348. http://dx.doi
.org/10.1207/s15327752jpa4104_1
20
There were also many other areas where we applied their recommen- Flahault, C., & Sultan, S. (2010). On being a child of an ill parent: A
dations (e.g., no Rorschach inkblot criterion, attend to possible criterion Rorschach investigation of adaptation to parental cancer compared to
contamination, convert various statistics to a common effect size metric, other illnesses. Rorschachiana, 31, 43 69. http://dx.doi.org/10.1027/
attend to possible hindsight bias).
1192-5604/a000004
Fong Hartsfield, R. F. (2000). A Rorschach study of schizophrenic rapists
(Doctoral dissertation). Available from ProQuest Dissertations and The-
References
ses database. (UMI No. 9973838)
Auker-Keller, A. A. (1998). Female breast cancer patients: The differ- Fowler, J. C., Piers, C., Hilsenroth, M. J., Holdwick, D. J., Jr., & Padawer,
ences in psychological and family functioning based on severity of J. R. (2001). The Rorschach Suicide Constellation: Assessing various
degrees of lethality. Journal of Personality Assessment, 76, 333–351. http://dx.doi.org/10.1207/S15327752JPA7602_13
Gacono, C. B., & Meloy, J. R. (1991). A Rorschach investigation of attachment and anxiety in antisocial personality disorder. Journal of Nervous and Mental Disease, 179, 546–552. http://dx.doi.org/10.1097/00005053-199109000-00005
Gacono, C. B., Meloy, J. R., & Berg, J. L. (1992). Object relations, defensive operations, and affective states in narcissistic, borderline, and antisocial personality disorder. Journal of Personality Assessment, 59, 32–49. http://dx.doi.org/10.1207/s15327752jpa5901_4
Gacono, C. B., Meloy, J. R., & Bridges, M. R. (2000). A Rorschach comparison of psychopaths, sexual homicide perpetrators, and nonviolent pedophiles: Where angels fear to tread. Journal of Clinical Psychology, 56, 757–777. http://dx.doi.org/10.1002/(SICI)1097-4679(200006)56:6<757::AID-JCLP6>3.0.CO;2-I
Gacono, C. B., Meloy, J. R., & Heaven, T. R. (1990). A Rorschach investigation of narcissism and hysteria in antisocial personality. Journal of Personality Assessment, 55, 270–279. http://dx.doi.org/10.1207/s15327752jpa5501&2_26
Garb, H. N. (1999). Call for a moratorium on the use of the Rorschach Inkblot Test in clinical and forensic settings. Assessment, 6, 313–317. http://dx.doi.org/10.1177/107319119900600402
Garb, H. N., Florio, C. M., & Grove, W. M. (1998). The validity of the Rorschach and the Minnesota Multiphasic Personality Inventory: Results from meta-analyses. Psychological Science, 9, 402–404. http://dx.doi.org/10.1111/1467-9280.00075
Garb, H. N., Florio, C. M., & Grove, W. M. (1999). The Rorschach controversy: Reply to Parker, Hunsley, & Hanson. Psychological Science, 10, 293–294. http://dx.doi.org/10.1111/1467-9280.00154
Garb, H. N., Wood, J. M., & Nezworski, M. T. (2000). Projective techniques and the detection of child sexual abuse. Child Maltreatment, 5, 161–168. http://dx.doi.org/10.1177/1077559500005002007
Garb, H. N., Wood, J. M., Nezworski, M. T., Grove, W. M., & Stejskal, W. J. (2001). Toward a resolution of the Rorschach controversy. Psychological Assessment, 13, 433–448. http://dx.doi.org/10.1037/1040-3590.13.4.433
García-Alba, C. (2004). Anorexia and depression: Depressive comorbidity in anorexic adolescents. The Spanish Journal of Psychology, 7, 40–52. http://dx.doi.org/10.1017/S113874160000473X
Gregg, P. A. (1998). The effect of impression management on correlations between Rorschach and MMPI-2 variables (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 9807469)
Grosso, C. G. (1999). An investigation of the construct and concurrent validity of variables from the Comprehensive System for the Rorschach in a child psychiatric inpatient population (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 9940446)
Hayman, J. A. (2000). An actuarial and confirmatory investigation of the interpersonal and self-perception clusters of the Rorschach Comprehensive System (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 9965254)
Hiller, J. B., Rosenthal, R., Bornstein, R. F., Berry, D. T. R., & Brunell-Neuleib, S. (1999). A comparative meta-analysis of Rorschach and MMPI validity. Psychological Assessment, 11, 278–296. http://dx.doi.org/10.1037/1040-3590.11.3.278
Hughes, S. A., Deville, C., Chalhoub, M., & Romboletti, R. (1992). The Rorschach human anatomy response: Predicting sexual offending behavior in juveniles. The Journal of Psychiatry & Law, 20, 313–333. Retrieved from http://psycnet.apa.org/Psycinfo/1994-00100-001
Hunsley, J., & Bailey, J. M. (1999). The clinical utility of the Rorschach: Unfulfilled promises and an uncertain future. Psychological Assessment, 11, 266–277. http://dx.doi.org/10.1037/1040-3590.11.3.266
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research synthesis (2nd ed.). Thousand Oaks, CA: Sage.
Leifer, M., Shapiro, J. P., Martone, M. W., & Kassem, L. (1991). Rorschach assessment of psychological functioning in sexually abused girls. Journal of Personality Assessment, 56, 14–28. http://dx.doi.org/10.1207/s15327752jpa5601_2
Lilienfeld, S. O., Wood, J. M., & Garb, H. N. (2000). The scientific status of projective techniques. Psychological Science in the Public Interest, 1, 27–66.
Lovitts, B. E. (2005). How to grade a dissertation. Academe, 91, 18–23. Retrieved from http://www.aaup.org/reports-and-publications/academe
Lundbäck, E., Forslund, K., Rylander, G., Jokinen, J., Nordström, P., Nordström, A. L., & Åsberg, M. (2006). CSF 5-HIAA and the Rorschach test in patients who have attempted suicide. Archives of Suicide Research, 10, 339–345. http://dx.doi.org/10.1080/13811110600790942
Malone, J. A. (1995). Rorschach correlates of childhood incest history in adult women in psychotherapy (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 9600581)
McCraw, R. K., & Pegg-McNab, J. (1989). Rorschach comparisons of male juvenile sex offenders and nonsex offenders. Journal of Personality Assessment, 53, 546–553. http://dx.doi.org/10.1207/s15327752jpa5303_11
Meloy, J. R., & Gacono, C. B. (1992). The aggression response and the Rorschach. Journal of Clinical Psychology, 48, 104–114. http://dx.doi.org/10.1002/1097-4679(199201)48:1<104::AID-JCLP2270480115>3.0.CO;2-1
Meyer, G. J. (1992). Response frequency problems in the Rorschach: Clinical and research implications with suggestions for the future. Journal of Personality Assessment, 58, 231–244. http://dx.doi.org/10.1207/s15327752jpa5802_2
Meyer, G. J. (1993). The impact of response frequency on the Rorschach constellation indices and on their validity with diagnostic and MMPI-2 criteria. Journal of Personality Assessment, 60, 153–180. http://dx.doi.org/10.1207/s15327752jpa6001_13
Meyer, G. J. (2007, March). Some old and new factors to help conceptualize CS scores. In R. J. Ganellen (Chair), What might this become? Future directions for Rorschach assessment. Symposium presented at the annual meeting of the Society for Personality Assessment, Washington, DC.
Meyer, G. J., Erdberg, P., & Shaffer, T. W. (2007). Toward international normative reference data for the comprehensive system. Journal of Personality Assessment, 89(Suppl. 1), S201–S216. http://dx.doi.org/10.1080/00223890701629342
Meyer, G. J., Viglione, D. J., Mihura, J. L., Erard, R. E., & Erdberg, P. (2011). Rorschach Performance Assessment System: Administration, coding, interpretation, and technical manual. Toledo, OH: Author.
Mihura, J. L., Meyer, G. J., Dumitrascu, N., & Bombel, G. (2013). The validity of individual Rorschach variables: Systematic reviews and meta-analyses of the comprehensive system. Psychological Bulletin, 139, 548–605. http://dx.doi.org/10.1037/a0029406
Moore, T. L. (2003). Sexually victimized and non-sexually victimized adolescent sexual offenders: One group or two? (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 8217265)
Nakamura, N., Fuchigami, Y., & Tsugawa, R. (2007). Rorschach Comprehensive System data for a sample of 240 adult nonpatients from Japan. Journal of Personality Assessment, 89(Suppl. 1), S97–S102. http://dx.doi.org/10.1080/00223890701583291
Norcross, J. C., & Karpiak, C. P. (2012). Clinical psychologists in the 2010s: 50 years of the APA Division of Clinical Psychology. Clinical Psychology: Science and Practice, 19, 1–12. http://dx.doi.org/10.1111/j.1468-2850.2012.01269.x
Owens, T. H. (1982). Personality characteristics of female psychotherapy patients with a history of incest (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 8217265)
Porcelli, P., & Meyer, G. J. (2002). Construct validity of Rorschach variables for alexithymia. Psychosomatics, 43, 360–369. http://dx.doi.org/10.1176/appi.psy.43.5.360
Porcelli, P., & Mihura, J. L. (2010). Assessment of alexithymia with the Rorschach Comprehensive System: The Rorschach Alexithymia Scale (RAS). Journal of Personality Assessment, 92, 128–136. http://dx.doi.org/10.1080/00223890903508146
Reese, J. B., Viglione, D. J., & Giromini, L. (2014). A comparison between comprehensive system and an early version of the Rorschach performance assessment system administration with outpatient children and adolescents. Journal of Personality Assessment, 96, 515–522. http://dx.doi.org/10.1080/00223891.2014.889700
Ritsher, J. E. B. (1997). Depression assessment of Russian psychiatric patients: Validity of MMPI and Rorschach scales (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 9816520)
Sakowicz, K. (2010). Rorschach correlates of childhood sexual abuse (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3437950)
Saraydarian, L. (1990). Diagnostic clarity of borderline personality disorder: A Rorschach study (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 9027329)
Skinstad, A. H., Troland, K., & Mortensen, J. K. (1999). Rorschach responses in borderline personality disorder with alcohol dependence. European Journal of Psychological Assessment, 15, 133–142. http://dx.doi.org/10.1027//1015-5759.15.2.133
Smith, A. L. (2008). Verbal and social functioning among boys in the criminal justice system (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3320210)
Takahashi, M., Takahashi, Y., & Nishio, H. (2007). [Rorschach test interpretation method]. Tokyo, Japan: Kongo Shuppan.
Takahashi, M., Takahashi, Y., & Nishio, H. (2009). [Rorschach form quality table]. Tokyo, Japan: Kongo Shuppan.
Verias, E. A. (2007). Further investigation into the Rorschach and the utility of a modified DEPI in relation to adolescent depression (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3270246)
Wechsler, D. (2008). WAIS-IV: Administration and scoring manual. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2009). WMS-IV: Administration and scoring manual. San Antonio, TX: The Psychological Corporation.
Wood, J. M., Garb, H. N., Lilienfeld, S. O., & Nezworski, M. T. (2002). Clinical assessment. Annual Review of Psychology, 53, 519–543. http://dx.doi.org/10.1146/annurev.psych.53.100901.135136
Wood, J. M., Garb, H. N., Nezworski, M. T., Lilienfeld, S. O., & Duke, M. C. (2015). A second look at the validity of widely used Rorschach indices: Comment on Mihura, Meyer, Dumitrascu, and Bombel (2013). Psychological Bulletin, 141, 236–249. http://dx.doi.org/10.1037/a0036005
Wood, J. M., Krishnamurthy, R., & Archer, R. P. (2003). Three factors of the comprehensive system for the Rorschach and their relationship to Wechsler IQ scores in an adolescent sample. Assessment, 10, 259–265. http://dx.doi.org/10.1177/1073191103255493
Wood, J. M., & Lilienfeld, S. O. (1999). The Rorschach Inkblot Test: A case of overstatement? Assessment, 6, 341–351. http://dx.doi.org/10.1177/107319119900600405
Wood, J. M., Lilienfeld, S. O., Garb, H. N., & Nezworski, M. T. (2000). The Rorschach test in clinical diagnosis: A critical review, with a backward look at Garfield (1947). Journal of Clinical Psychology, 56, 395–430. http://dx.doi.org/10.1002/(SICI)1097-4679(200003)56:3<395::AID-JCLP15>3.0.CO;2-O
Wood, J. M., Lilienfeld, S. O., Nezworski, M. T., Garb, H. N., Allen, K. H., & Wildermuth, J. L. (2010). Validity of Rorschach Inkblot scores for discriminating psychopaths from non-psychopaths in forensic populations: A meta-analysis. Psychological Assessment, 22, 336–349. http://dx.doi.org/10.1037/a0018998
Wood, J. M., Nezworski, M. T., & Garb, H. N. (2003). What's right with the Rorschach? The Scientific Review of Mental Health Practice, 2, 142–146.
Wood, J. M., Nezworski, M. T., Lilienfeld, S. O., & Garb, H. N. (2003). What's wrong with the Rorschach? Science confronts the controversial inkblot test. San Francisco, CA: Jossey-Bass.
Wood, J. M., Nezworski, M. T., & Stejskal, W. J. (1996). The Comprehensive System for the Rorschach: A critical examination. Psychological Science, 7, 3–10. http://dx.doi.org/10.1111/j.1467-9280.1996.tb00658.x
Wood, J. M., Nezworski, M. T., Stejskal, W. J., Garven, S., & West, S. G. (1999). Methodological issues in evaluating Rorschach validity: A comment on Burns and Viglione (1996), Weiner (1996), and Ganellen (1996). Assessment, 6, 115–129. http://dx.doi.org/10.1177/107319119900600202
Wood, J. M., Nezworski, M. T., Garb, H. N., & Lilienfeld, S. O. (2001). The misperception of psychopathology: Problems with the norms of the Comprehensive System for the Rorschach. Clinical Psychology: Science and Practice, 8, 350–373. http://dx.doi.org/10.1093/clipsy.8.3.350
Worgul, K. J. (1989). Rorschach comparisons of physically abused, problematic, and normal school-age children (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 8926554)
Yamamoto, K., Kanbara, K., Mutsuura, H., Ban, I., Mizuno, Y., Abe, T., . . . Fukunaga, M. (2010). Psychological characteristics of Japanese patients with chronic pain assessed by the Rorschach test. BioPsychoSocial Medicine, 4, 20. http://dx.doi.org/10.1186/1751-0759-4-20
Yanovsky, A. (1994). The role of denial in adolescent sex offenders who were sexually victimized as children as measured by the Rorschach (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 9533358)
Zodan, J. (2010). Rorschach assessment of childhood sexual abuse and borderline pathology: A comparison of clinical samples (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3425222)

Received May 26, 2014
Revision received September 7, 2014
Accepted October 28, 2014