
RUNNING HEAD: SPATIAL TRAINING META-ANALYSIS

Training Spatial Skills:

What Works, for Whom, Why and for How Long?

Linda L. Liu, David H. Uttal, Loren M. Marulis, Alison R. Lewis, and Christopher M. Warren

Northwestern University

Nora S. Newcombe

Temple University

This work was supported by the Spatial Intelligence and Learning Center (NSF Grant ). We
thank Spyros Konstantopoulos and Larry Hedges for their help. Send correspondence to
David Uttal (duttal@northwestern.edu) or Nora Newcombe (newcombe@temple.edu).

Abstract

We meta-analyzed 113 research studies that attempted to improve spatial reasoning. We investigated the magnitude, durability, and generalizability of training effects, as well as the factors that moderate them. We found that effect sizes were affected substantially by the presence and type of control or comparison groups. After treatment group improvement was separated from control group improvement, the mean effect size for treatment was g = .75 (SE = .03). Treatment group effect sizes did not differ for children (g = .70, SE = .05) and adults (g = .77, SE = .03) or for males (g = .58, SE = .06) and females (g = .59, SE = .06). Training effects were also stable: effect sizes did not differ depending on whether the posttest was administered immediately or after a delay. Although the magnitude of transfer was not as great as the magnitude of the original training effect, there was evidence of both near and far transfer; effect sizes were still evident, although significantly lower, when far rather than near transfer of training was examined. Considered together, the results suggest that spatially enriched education could pay high dividends in terms of improved participation in mathematics, science, and engineering.



Training Spatial Skills:

What Works, for Whom, Why and for How Long?

The central question in the study of development is how a mature form arises from initial

departure points, and what influences the course of such development. Related questions include

the origins and determinants of individual variation, and the existence of sensitive periods for

influencing development for better or for worse (Bornstein, 1989). Although spirited debate on

these matters continues, there has been an increasing emphasis on malleability in the

neuroscience of development (e.g., Johnson, Munakata & Gilmore, 2002; Shonkoff & Phillips,

2000). Relatedly, there has been a focus on education that maximizes human potential and

reduces inequality, both in preschool children (e.g., Heckman & Masterov, 2007) and for a

variety of subjects taught in school, such as reading (e.g., Rayner, Foorman, Perfetti, Pesetsky &

Seidenberg, 2001) and mathematics (e.g., National Mathematics Advisory Panel, 2008).

Spatial skills are not a school subject, but they have been shown to be an important

predictor of students’ interest and success in science, technology, engineering and mathematics

(STEM; Hedges & Chung, in prep; Humphreys, Lubinski & Yao, 1993; Shea, Lubinski &

Benbow, 2001). Building on such considerations, the National Research Council (2006) recently

published a report, Learning to Think Spatially, which emphasized the importance of spatial

thinking in science and mathematics education and called for educators to incorporate

interventions to enhance spatial skills into the curriculum.

Such efforts to improve spatial skills are predicated on the assumption that spatial

skills are in fact malleable. Yet, the fundamental question of whether long-lasting improvements

in spatial skill can be attained through training has yet to be resolved conclusively. Diverse

claims have been made regarding the effectiveness of spatial training. Some investigators have

argued that training spatial performance leads only to fleeting improvement, limited to cases

where there is a high degree of similarity between the tasks trained and the outcome measures of

interest (Eliot, 1987; Eliot & Fralley, 1976; Maccoby & Jacklin, 1974; Sims & Mayer, 2002). In

fact, the recent NRC report questioned the generality of training effects and concluded that

transfer of spatial improvements has not been convincingly demonstrated. The report called for

research aimed at determining how to improve spatial performance in a generalizable way

(Learning to Think Spatially, 2006). Therefore, we tested the

hypothesis that training, education, or life experience can improve spatial skills. In particular, we

address four main questions about spatial training, using a meta-analysis of existing training

studies in the literature.

First, we began simply by examining the magnitude of the improvements that can be

obtained, broken down by the type of spatial skill assessed and subsequently by variables such as

duration of training and study design. We grouped major spatial dependent variables into a set of

conceptual categories, and examined the impact of coded study characteristics (e.g., training

duration, frequency, type of control group) for each category of dependent measure. If the sizes

of training effects are heavily dependent on the spatial measures chosen in training studies, then

the effect sizes of these conceptual categories might vary a great deal. On the other hand, these

conceptual categories might not vary in malleability, yielding effect sizes that are similar in

magnitude. Finding the latter could suggest either that training exerts a general, as opposed to

task-specific, influence on spatial ability or that differences in study characteristics (e.g., training

duration, study methodology) are stronger determinants of the size of training effects than is the

precise identity of measures being trained.



Second, we examined the issue of transfer. If spatial training indeed produces

generalizable effects, then we should expect performance to improve not only on tasks directly

trained but also on transfer tests—measures that were not directly trained but that were

administered along with the trained task to assess whether there was transfer to related skills. We

should expect some limits of this transfer, with the magnitude of training effects being larger for

near transfer, when the training and reference tests are similar, dropping off as training and

reference tests become less similar (i.e., far transfer).

Third, we addressed the question of durability. Are training-related gains maintained and,

if so, for how long? We ascertained the durability of training and transfer by estimating the point

at which, if ever, training effects declined to pretest levels. We performed these analyses in

two ways: 1) across studies, comparing the effects of posttesting after the different intervals of

time used in different studies, and 2) within studies, for those studies that included both

immediate and delayed posttests.

Fourth, we analyzed whether training effects are more pronounced for some groups than

for others. For example, it is often argued that females should improve more with training

than males because they have been more deprived of spatial experience (e.g. Sherman, 1967);

however, a prior meta-analysis found that males and females improved in parallel (Baenninger &

Newcombe, 1989). It might also be predicted that children would improve more than adults,

either because they are lower in spatial skill and hence have more to gain, or are in a sensitive

period that has closed by adulthood. Across training studies, do children, in fact, show larger

effects of training compared to adults? What is the impact of other grouping variables, such as

SES or participant screening criteria, on the size of training effects?



In summary, despite the high volume of research investigating the impact of various

training interventions on spatial outcomes, the field lacks a systematic and comprehensive

analysis of training effects for a variety of spatial skills for specified groups, and, especially,

lacks an accurate assessment of durability and transfer. In this paper, we examine the questions

of what spatial outcome measures are most and least amenable to training as a function of the

various types of training used to improve each spatial outcome and address how the effects of

training are moderated by individuals’ pre-existing levels of performance. We examine the effect

of study characteristics such as participant screening criteria, method of training, and frequency

of training as a means for accounting for the variability that exists in the magnitude of training

effects. We hope to shed light on the controversy surrounding what constitutes the most

appropriate methods for training different spatial skills as well as questions regarding the

durability and generalizability of spatial training effects. Ultimately, we hope to be able to make

informed recommendations to educators about the most appropriate ways to train spatial skills, to

establish guidelines regarding the extent to which different types of spatial skills are typically

improved with training, and to inform the design of educational interventions to help improve

students’ performance in STEM disciplines.

Method

Eligibility Criteria

Several criteria were used to determine whether a study would be included in this meta-

analysis:

1. The study included at least one spatial outcome variable. Examples include, but are

not limited to, performance on published psychometric subtests of spatial ability,



reaction time on a spatial task (e.g., mental rotation or finding an embedded figure), or

measures of environmental learning (e.g., navigating a maze). 1

2. The study used training, education, or another type of specific intervention that was

designed to improve performance on a spatial task.

3. The study employed an experimental or quasi-experimental design. In other words,

either the study included a control group that did not receive the training, the study

used a pretest-posttest design that assessed performance relative to a baseline measure

obtained before the intervention was given, or the study compared the effects of

training on pre-existing groups (e.g., engineering and liberal arts students) that were

not randomly assigned to treatment or control groups.

4. The study focused on a non-clinical population. Thus, we excluded studies that used

spatial training to improve spatial skills after brain damage or to ameliorate the

negative effects of Alzheimer’s disease. Also excluded were studies that focused exclusively on the rehabilitation of high-risk or at-risk populations.

Literature Search and Retrieval

We operationalized the inclusion criteria through a search of several electronic databases,

including PsycINFO, Web of Science, Dissertations Abstracts International and ERIC

(Educational Resources Information Center) through May 31, 2007. The search included foreign-

language articles provided that they included an English abstract. The goal was to perform a

comprehensive screening of all studies reporting on the effects of spatial training so that each

study would have a fair opportunity to be considered for inclusion.



We used the following search term: (training OR practice OR education OR experience

OR instruction) AND (pre-measure OR premeasure OR post-measure OR postmeasure OR pre-

test OR post-test OR pretest OR post test OR experiment OR post treatment OR post-treatment)

AND (spatial OR visuospatial OR geospatial). This search resulted in 788 hits. We read the

abstracts of these articles to determine whether they met the criteria described above. To ensure

that these decisions were made reliably, two researchers read through 155 (20%) randomly

selected abstracts and rated their eligibility. Any disagreements were resolved by consensus through discussion among a group consisting of the two raters and two additional raters. This process resulted in 158

abstracts that were deemed to be potentially relevant.

We also contacted experts and authors in the field for any published and

unpublished data (of their own or that of their colleagues) and relevant references. We sent out

approximately 205 inquiries, and received 34 responses (17%), which yielded 43 manuscripts.

Two researchers independently read each of these additional studies, again rating the relevance

of the articles on the criteria mentioned above. In total, 201 abstracts were acquired and reviewed

(158 through electronic searches, 43 through author correspondence). Ninety-nine of these

studies were deemed relevant after being read in full by at least two researchers. The level of

agreement between the two raters on their decisions for inclusion and exclusion was substantial (Cohen’s kappa = .74), as defined by Landis and Koch (1977), and just below the .75 cutoff for “excellent” defined by Capozzoli, McSweeney and Sinha (1999). A review of the

reference lists of these articles yielded another 14 relevant articles. In all, the review process

produced a sample of 113 (i.e., 99 + 14) studies. These included articles published in scientific

journals as well as institutional technical reports, unpublished manuscripts and dissertations. It



also included several non-English manuscripts, translated from Korean, Dutch, Romanian, Japanese, French, and German by translators familiar with the psychology literature.
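As a concrete illustration of the agreement statistic reported above, Cohen’s kappa can be computed from two raters’ parallel include/exclude decisions. The following is a minimal Python sketch with hypothetical binary inputs, not the procedure actually used in this project:

def cohens_kappa(a, b):
    """Cohen's kappa for two raters' binary include/exclude decisions.

    A minimal sketch of chance-corrected agreement. Inputs are parallel
    lists of 0/1 decisions (hypothetical; the abstract screening
    reported above yielded kappa = .74).
    """
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    p_a, p_b = sum(a) / n, sum(b) / n                  # marginal "include" rates
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)          # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)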

We took steps to avoid the “file drawer problem” (Rosenthal, 1979), which potentially could inflate our estimate of the magnitude of training effects. Because studies that find large effects of training are more likely to be published than those that report little or no effect (Rosenthal, 1979), we reduced this publication bias by increasing our access to

unpublished work. First, when we wrote to authors and experts, we explicitly asked them to

include unpublished work. Second, we searched the reference lists of our articles for relevant

unpublished conference proceedings, and we also looked through the table of contents of any

recent relevant conference proceedings that were accessible online. Third, our search of

Dissertation Abstracts International yielded many unpublished articles, which were included

when they were relevant. For dissertations that were eventually published, we examined both the

published article and original dissertation. We augmented the data from the published article if

the dissertation contained additional, unpublished data that were relevant to our objectives.

In some cases, authors did not provide sufficient information in the papers to allow us

to calculate effect sizes. To address this problem, we contacted the authors. For example, in

some cases, we requested separate means for control and treatment groups when only the F- or t-

statistics summarizing significant group differences were reported. Authors provided usable data

in approximately 20% of these cases and we used these data to compute effect sizes, separately

for males and females and control and treatment groups whenever possible.

Conversion to Effect Sizes

The data from each study were entered into the software program Comprehensive Meta-

Analysis (CMA; Borenstein, Hedges, Higgins, & Rothstein, 2005). The program accepts multiple types of data input, including not only means and standard deviations but also categorical data, odds ratios, and so on. Measures of effect size typically quantify the magnitude of gain associated with a particular treatment relative to the improvement observed in the control group. Effect sizes can be computed not only from means but also from an F statistic, t statistic, or chi-square value, as well as from change scores representing the difference in mean performance at two points in time. Thus, in some cases, it was possible to obtain effect sizes without having the actual mean scores associated with a treatment (see Lipsey & Wilson, 2001).
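To illustrate one such conversion, the following minimal Python sketch recovers a standardized mean difference from a reported independent-groups t statistic. It implements a standard textbook formula (see Lipsey & Wilson, 2001) rather than CMA’s internal routines, which we treat as an assumption here:

import math

def g_from_t(t, n1, n2):
    """Hedges' g from an independent-groups t statistic.

    A sketch of a standard conversion, not CMA's implementation. For an
    F test with 1 numerator df, substitute t = sqrt(F), assigning the
    sign from the direction of the group difference.
    """
    d = t * math.sqrt(1.0 / n1 + 1.0 / n2)        # Cohen's d recovered from t
    j = 1.0 - 3.0 / (4.0 * (n1 + n2 - 2) - 1.0)   # small-sample bias correction
    return j * d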

All effect sizes were expressed as Hedges’ g, a slightly more conservative derivative of Cohen’s d. We chose Hedges’ g as a measure of effect size because it includes a correction for biases due to sample size, weighting each effect size by its standard error so that less precise estimates are given less weight in the analyses. Hedges’ g is computed as:

g = (M_treatment − M_control) / √(MS_within subjects)

where √(MS_within subjects) is the pooled within-group standard deviation.
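For concreteness, here is a minimal Python sketch of this computation from group means and standard deviations, including the standard small-sample correction; how CMA applies the correction internally is an assumption:

import math

def hedges_g(m_treat, m_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Hedges' g and its approximate sampling variance.

    A minimal sketch of the standard formulas; the variance is what
    inverse-variance weighting uses so that less precise estimates
    receive less weight.
    """
    df = n_treat + n_ctrl - 2
    # Pooled within-group variance (the MS-within term in the formula above).
    ms_within = ((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2) / df
    d = (m_treat - m_ctrl) / math.sqrt(ms_within)
    j = 1.0 - 3.0 / (4.0 * df - 1.0)              # small-sample bias correction
    g = j * d
    var_g = j**2 * ((n_treat + n_ctrl) / (n_treat * n_ctrl)
                    + d**2 / (2.0 * (n_treat + n_ctrl)))
    return g, var_g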

We used a random effects model to ensure that our results can be generalized beyond the studies selected and to increase the likelihood that our inferences apply more generally to research on the effectiveness of training and experience on spatial ability. Random effects models are used when there is reason to suspect that variability is not due solely to sampling error (Lipsey & Wilson, 2001). Given the breadth and complexity of spatial cognition, the studies we included differed in numerous ways beyond the effects of sampling, making it unlikely that all sources of variance could be accounted for by a single model. Using random effects allowed stronger and wider generalizations to be made.
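To make the random effects computation concrete, the following sketch pools a set of effect sizes with the widely used DerSimonian-Laird method-of-moments estimator of between-study variance; whether CMA uses exactly this estimator is an assumption:

import numpy as np

def random_effects_pool(g, v):
    """Pool effect sizes under a random effects model (DerSimonian-Laird).

    g, v: arrays of effect sizes and their sampling variances. Returns
    the pooled mean, its standard error, the heterogeneity statistic Q,
    and the between-study variance tau^2.
    """
    g, v = np.asarray(g, float), np.asarray(v, float)
    w = 1.0 / v                                   # fixed-effect weights
    mean_fixed = np.sum(w * g) / np.sum(w)
    q = np.sum(w * (g - mean_fixed) ** 2)         # heterogeneity (Q) statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(g) - 1)) / c)       # between-study variance
    w_star = 1.0 / (v + tau2)                     # random-effects weights
    mean_random = np.sum(w_star * g) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return mean_random, se, q, tau2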

Effect sizes were meta-analyzed using Wilson’s (2002) SPSS 16.0 macros to calculate mean effect size (MEANES) and to perform meta-analytic analogs to analysis of variance (METAF) and modified weighted multiple regression (METAREG) for effect sizes. The macros allowed us to test for both simple effects and interactions among grouping variables (e.g., study

characteristics) on effect size. Our sample before trimming consisted of 113 manuscripts (87

published and 26 unpublished) with relevant, available and usable data, yielding 634 effect sizes.

Coding of Study Descriptors

We developed a coding system for characterizing the methods and procedures used in each study. The coding scheme addressed the following characteristics of each study: the sample, the spatial measures used, the study design, the nature of the control group, and details about the procedure, such as the length and frequency of training and the delay between the end of training and the posttest.

Both outcome measures and methods of training were classified into categories to

facilitate generating conclusions about groups of studies. We defined five categories of outcome

measures: spatial perception, perspective taking, assembly/ transformation, spatial principles,

and mental rotation. As summarized in Table 1, these five categories of outcome measures

overlapped in part with the three categories that Linn and Petersen (1985) used in their meta-analysis of gender differences in spatial skill. Our categories also map onto three of the five factors established by Carroll’s (1993) extensive re-analysis of individual differences

data (see also Lohman, 1988; Miyake et al., 2001).

Our category of Spatial perception encompasses tasks previously categorized as

measuring Visuospatial Perceptual Speed (Carroll, 1993) or Spatial visualization (Linn &

Petersen, 1985). Likewise, Assembly/transformation corresponds to Spatial Visualization,

categories used by both sets of researchers. Mental rotation, a category also included in Linn and

Petersen’s classification, was designated as Spatial Relations (also called Speeded Rotation) by

Carroll (1993). Carroll’s two remaining factors, Closure Speed and Flexibility of Closure, are largely subsumed within our Spatial perception category.

We also included two additional categories. We added the category of Spatial principles because this domain represents a distinct and highly specific skill that was not

included in the factor-analytic studies reanalyzed by Carroll (1993), as it grew out of a different

(Piagetian) research tradition; this category also corresponds to what Linn and Petersen (1985)

referred to as spatial perception. We also included Perspective taking, which was not included in

Linn and Petersen’s analysis. Perspective taking also grew out of the Piagetian tradition and has

been shown to be distinct from mental rotation even though the two can be considered

computationally equivalent (Hegarty & Waller, 2004; Huttenlocher & Presson, 1973, 1979).

Seven studies that reported the effects of training on an entire spatial test battery without

providing means for the individual subtests within the battery (making it impossible to calculate

separate effect sizes for the spatial components represented by the subtests) were excluded.

In addition to forming five conceptual categories of dependent measures, we also

classified studies by method of training using the flowchart shown in Figure 1. Coders read the

Method and Results sections of each study and evaluated the training procedure used to obtain

each effect size. Coders then classified whether each effect size was the result of training using a

course, videogame, or a spatial task (either specific practice involving the outcome measure of

interest or transfer to an untrained spatial task). Within the course category, we distinguished

studies in which enrollment in a course was the sole manipulation from those studies that

compared the effects of an enhanced course to the typical version of the course. The former were

typically a semester in length whereas the latter were conducted in a shorter length of time within

a semester. Within instances of videogame training, we coded whether the videogame focused on

mental rotation skill.

Finally, we also coded more general details about the studies themselves, such as

publication year, country of origin, and socioeconomic status of country. Two trained coders

coded all 113 studies and inter-rater agreement was 95% on categorical measures and 91% on

continuous measures.

Outliers and Publication Bias

Outliers. We first considered the presence of outliers. Five studies that reported very extreme effect sizes (some as large as 8.33)

were excluded because they were conducted with participants from significantly underprivileged

countries who might have been expected to have very little experience with testing or spatial

testing in general. These five studies differed significantly from the rest of the sample in terms of

mean effect size (the mean, g = 1.71, SE = .07, k = 93, was more than twice the group mean of

the remaining sample, g = .65, SE = .03, k = 509), Q (1, 601) = 187.99, p < .001. 2 The countries

represented in these studies were also ranked significantly lower according to the Human

Development Index, a composite of standard of living, life expectancy, well-being and education

that provides a general indicator of a nation’s quality of life (Human Development Report,

2007/2008). The three countries represented in these studies, Papua New Guinea, Bahrain

and Nigeria, were ranked 145, 41 and 158, respectively, compared to an average ranking of 12.19

for the remaining 101 studies in our sample. Because studies focusing on more underprivileged

populations typically report effect sizes many times larger than those obtained from more typical

populations (REFS), the inclusion of these studies would lead to an exaggerated view of the

malleability of spatial training. Consistent with this view, when these studies were included, we

found a significant correlation in our sample between HDI ranking and effect size,

Spearman’s rho (df = 600) = .30, p < .001, suggesting that the inclusion of these studies from

very low HDI ranking countries would inflate the overall effect size. Thus, these studies were

excluded from subsequent analyses.

We also took steps to curb the effects of less extreme outliers. We recoded (i.e., Winsorized) extreme effect sizes that were more than 4 standard deviations (SD = .475) above the mean of the remaining sample by capping them at 4 SD above the mean. In other words, the value of any effect size greater than 2.55 was reset to 2.55. We did this for 14 effect sizes in all. Winsorizing had a negligible effect on the mean effect size of the overall sample (unadjusted g = .65, SE = .03 vs. Winsorized g = .64, SE = .02) but had the desired effect of curtailing the values of the effect sizes at the upper extreme (Figure 2). This recoding of extreme values also reduced their impact on the mean effect sizes of the subgroups considered in the next section. All analyses reported below were conducted on the Winsorized sample.
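A minimal sketch of this recoding follows; whether the mean and SD were recomputed after the exclusions described above, as assumed here, reflects our reading of the text:

import numpy as np

def winsorize_upper(es, n_sd=4.0):
    """Cap effect sizes at n_sd standard deviations above the sample mean.

    Values above the cutoff are reset to the cutoff; with the sample
    reported above (mean g = .65, SD = .475), the cutoff is 2.55.
    """
    es = np.asarray(es, float)
    cutoff = es.mean() + n_sd * es.std(ddof=1)
    return np.minimum(es, cutoff)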

Assessing Publication Bias. Although we performed a thorough search for unpublished

studies, we were concerned that the mean effect size of the sample could be biased upward due

to a publication bias toward statistically significant findings (Lipsey & Wilson, 1993). To

measure the size of this potential publication bias, we first compared the average effect size of

published (g = .66, SE = .03) and unpublished (g = .58, SE = .04) studies in our sample and

found only a marginally significant difference, Q (1, 507) = 2.71, p = .10. To ascertain whether

this difference was large enough to affect our conclusions, we also calculated the Fail-safe n,

which estimates the number of studies with a null outcome (i.e., the number of studies reporting

g = 0) that would be required to render the overall mean effect size negligible in magnitude.

For the purposes of calculation, we set the value for a negligible effect size to be .10, which is

smaller than Cohen’s (1979) criterion for a low effect size. We used Orwin’s (1983) formula to calculate the fail-safe n: K0 = k[ESk/ESc − 1], where ESk is the mean effect size, ESc is a mean effect size judged to be negligible in magnitude (in this case, .10), and K0 is the fail-safe n, or the number of studies with null results that would render the mean effect size negligible (i.e., equal to .10). This analysis revealed that there would need to be 2794 “file drawer” studies reporting an effect size of zero to reduce the mean effect size of the sample to .10. Taken together, these

results suggest that our sampling procedure produced an adequate and representative sample of

both published and unpublished research.
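Orwin’s formula is simple enough to transcribe directly. The following sketch reproduces the calculation; the sample values in the comment are approximate:

def orwin_failsafe_n(k, es_mean, es_negligible=0.10):
    """Orwin's (1983) fail-safe n: the number of unretrieved null-result
    studies (g = 0) needed to pull the mean effect size down to the
    negligible criterion."""
    return k * (es_mean / es_negligible - 1)

# With the trimmed sample (k = 509, mean g of roughly .65), this yields
# approximately 2800, in line with the 2794 reported above.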

Results from Trimmed Sample

Our final sample consisted of 101 studies with 509 effect sizes: 76 (75%) that were

published in journals and 29 (25%) from dissertations, unpublished data or conference papers. In

our sample, 85 studies (84%) were conducted in the United States. The characteristics of these

studies are summarized in Table 2.



Even with the outliers removed, the sample was highly heterogeneous (Q = 2351.57, df =

508, p < .0001), indicating that the effect sizes included in our sample differed in systematic

ways that might be uncovered by partitioning them into smaller groups. We followed the

procedure described by Hedges and Becker (1986) of using study descriptors to create effect size

partitions for reducing heterogeneity. The goal was to devise subgroups of interest in which the

main source of variability is sampling error (Lipsey & Wilson, 2001). Put another way, Voyer et al. (2007) conceptualized homogeneous clusters as groups of studies that “can be considered replications of each other” (p. 29).

Homogeneity can be difficult to obtain in meta-analyses (e.g., Voyer et al., 2005). Therefore, we followed the method of Voyer, Voyer and Bryden (1995) and Voyer, Postma, Brake and Imperato-McGinley (2007) of relaxing the criterion for homogeneity, from the standard p > .05 to also include p values that were less than .05 but greater than .005. By this convention, a sample of effect sizes whose homogeneity statistic has a significance level p > .005 would be considered statistically homogeneous. Partitioning had the desired effect of reducing heterogeneity, although complete homogeneity was difficult to attain, a common obstacle in meta-analyses (Voyer et al., 2005); because the amount of variability depends on the underlying literature, it is particularly difficult to attain homogeneity in highly diverse samples of research. To determine the appropriate partitions to make sense of the variation in effect sizes, we identified variables that were likely to have a significant moderating effect on effect size.
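Partition-based moderator tests of this kind use the meta-analytic analog to ANOVA (the METAF macro described earlier), which splits total heterogeneity into between-group and within-group components. Below is a minimal fixed-effect sketch with assumed inverse-variance weights, not the METAF macro itself:

import numpy as np

def meta_anova(g, v, groups):
    """Meta-analytic analog to ANOVA.

    Decomposes total heterogeneity Q into a between-group component
    (tested against chi-square with #groups - 1 df) and a within-group
    remainder; a sketch under standard inverse-variance weighting.
    """
    g, v, groups = np.asarray(g, float), np.asarray(v, float), np.asarray(groups)
    w = 1.0 / v
    grand_mean = np.sum(w * g) / np.sum(w)
    q_total = np.sum(w * (g - grand_mean) ** 2)
    q_within = 0.0
    for grp in np.unique(groups):
        m = groups == grp
        group_mean = np.sum(w[m] * g[m]) / np.sum(w[m])
        q_within += np.sum(w[m] * (g[m] - group_mean) ** 2)
    q_between = q_total - q_within
    return q_between, q_within, q_total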

Research designs. An effect size indicating the success of a single training intervention is

typically expressed as the extent to which the treatment group outperforms the control group

(noted as E vs. C, or Ec). In other words, this effect size summarizes the impact of training on

spatial skill measured either as a pretest to posttest change, as a between subjects comparison of

treatment to control group, or as a combination of the two in a pretest-posttest with control

design.

When multiple training interventions are compared across different studies, however, Ec

can be difficult to interpret. Because Ec reflects the improvement of trained groups relative to control groups, and control groups themselves improve to varying degrees, it cannot be determined whether a large Ec reflects large improvements in trained groups or small improvements in control groups without examining the groups separately.

To illustrate, in the studies sampled here, we found that the presence of a control group

had a significant effect on the magnitude of Ec. Comparing the mean effect sizes for the three

categories of study design, pretest-posttest with control, treatment vs. control only, and pretest-

posttest only, we found that mean effect sizes differed significantly depending on whether the

effectiveness of a treatment intervention was evaluated relative to a control group, Q (2, 507) =

17.88, p < .001. A post-hoc comparison, using an adjusted alpha of .01 to reduce the Type I error

rate, revealed that studies that used a pretest-posttest only design, in which no control group

offsets the training-related gains of the treatment group, reported the highest mean Ec overall (g =

.85, SE = .07, k = 47). The Pretest-posttest only group was significantly higher than both the

Treatment vs. Control group (g = .50, SE = .05, k = 127, p < .001) and the Pretest-posttest with

control group (g = .64, SE = .03, k = 334, p < .01), which were also marginally different from

each other (p < .05). Thus, studies that did not include a control group tended to report higher

effect sizes than studies that included a control group.



We also tested whether the nature of the control group manipulation affected the magnitude of control group improvement by comparing effect sizes by type of control group (No control group, No treatment, Treatment as usual, Diluted treatment, or Alternative treatment). Effect sizes differed significantly by type of control group, Q (4, 507) = 16.74, p < .01, but this result reflected only the difference between studies with and without control groups: studies with no control group yielded significantly higher Ec effect sizes than those with a control group. There were no other significant differences among the different types of control groups.

These results suggest that it is critically important to consider training-related gains

against the backdrop of the gains experienced by the control group. To isolate the magnitude of

treatment-related gains, we focused our analyses on the studies in which it was possible to

calculate effect sizes for the treatment and control group separately. Before doing so, we tested

whether the mean effect size for the “inseparable” studies (those for which separate treatment and control group effect sizes could not be calculated) differed significantly from the “separable” studies on which we planned to base our analyses. In fact,

there was no significant difference between the Ec effect sizes from the inseparable studies, M =

.70, SE = .05, k = 79, and the Ec effect sizes from the separable studies, M = .62, SE = .04, k =

256, Q (1, 334) = 2.81, p > .09. This result suggests that the nonseparability of the studies was

likely due only to a difference in preference for how the data were reported and not due to

substantive differences in the separable and nonseparable studies. Consequently, we examined

the improvements made by treatment and control groups separately (Wilson, Lipsey, & Derzon,

2003).

Among the 55 studies that used a pretest-posttest with control design and provided sufficient data for this analysis, treatment groups (g = .75, SE = .03, k = 246) improved significantly more than control groups did, g = .56, SE = .03, k = 224, Q (1, 453) = 23.38, p < .001. In other words, when treatment effect sizes are isolated from the improvements made by the control groups, we can conclude that training improves spatial skills by an average of .75, or about three-quarters of a standard deviation. Our remaining analyses focus on these separable studies, but we retained the inseparable studies for specific comparisons when appropriate. The remainder of the paper focuses on the factors that moderate this improvement of treatment groups alone (as opposed to treatment relative to control), and we will refer to this improvement as the treatment effect size. Mean effect sizes and key characteristics for each study are summarized in Appendix A.

Analysis plan

Our analysis plan addressed our four major questions. First, does average effect size

differ for different categories of spatial outcome measures and how do they rank against one

another in terms of their malleability to training? Second, what is the magnitude of the test-retest

effect for spatial training and how large are the gains attributable to training, above and beyond

the retesting effect? Third, what is the duration of training effects? Are training-related gains maintained and, if so, for how long? Fourth, to what degree do individuals starting at different spatial skill levels benefit from spatial training?

I. Does the size of training effects vary as a function of outcome measure?

To what extent are some spatial measures more malleable than others? In other words,

does the size of training effects depend on the outcome measure? To what extent is malleability attributable to the effects of repeated practice? How much does training transfer to untrained tasks? These questions have important implications for the design of

educational interventions, in which it is necessary to know both the magnitude of gains that can

be expected and the generality of the benefits that can be expected with training.

To answer these questions, we first drew upon the largest number of studies that could be

compared by using Ec as the measure of effect size. This focus allowed us to make a

comprehensive determination of whether there were large differences in malleability among the

five categories of outcome measures: spatial perception, perspective taking, assembly/

transformation, spatial principles, and mental rotation. Overall, we found statistically significant

differences in mean effect size (Ec) by outcome measure, Q (4, 508) = 12.68, p < .05. Post-hoc

comparisons of the Ec effect sizes (with alpha reduced to .01 to control the Type I error

rate) suggested that spatial principles showed the largest gains with training while studies of

spatial perception showed the smallest gains with training (p < .01).

We also analyzed the treatment and control group effect sizes separately to eliminate the

ambiguity of evaluating effect sizes relative to control groups. Among the smaller sample of 55

studies with complete data, we still found significant differences by outcome measure, Q (4, 245)

= 11.20, p < .05. However, as shown in Table 3, the ordering of effect sizes revealed a different

pattern once the effect of control groups was removed and only the effect of treatment was

considered. In fact, although spatial perception studies yielded the lowest mean Ec effect size of

all outcome measures, the mean effect size for spatial perception treatment groups was the

highest of all groups, significantly higher than mental rotation (p < .01), with no other pairwise

differences significant at the .01 level.

Across the five outcome measures, there were also significant differences in the pretest to

posttest gains of the control groups, Q (4, 223) = 25.80, p < .001. Control-group improvement

was highest for the Spatial perception and Assembly/transformation categories. Assembly/transformation control groups (M = .70, SE = .05) also showed significantly larger gains than mental rotation control groups did, M = .51, SE = .04, p < .01. On the other hand, control groups for spatial principles (M = .18, SE = .10) showed significantly smaller gains than all other control groups (ps < .01). No other pairs were significantly different. These results suggest that spatial perception measures, in fact, yielded

the largest gains with training but that these gains may have been underestimated because of the

relatively large gains made concurrently by the control groups. In contrast, the spatial principles

treatment groups showed relatively modest gains with training, but when gains were evaluated

relative to control groups, spatial principles appeared to be highly malleable because the control

groups showed particularly small gains with training.

If differences in mean effect size between spatial perception and spatial principles were

primarily the result of differences in control group improvement, then we would also expect the

difference in control group improvement to be significantly larger relative to the difference in

their treatment groups. To test this prediction, we performed a meta-regression (using the METAREG macro described above) of

effect size onto Group (treatment vs. control), Outcome measure (spatial perception vs. spatial

principles), and Group x Outcome. The results were consistent with the apparent difference in

malleability being the result of differential control group improvement. As shown in Figure 3, we

found a significant interaction between group and outcome measure, providing strong evidence

that the difference between treatment and control group improvement was smaller in studies of

spatial perception than in studies of spatial principles. Thus, the apparent difference in

malleability between spatial perception and spatial principles is not due to the effects of training

but is, instead, primarily the result of differences in the improvement of control groups; s spatial

perception control groups improving a great deal and spatial principles control groups improving

very little. In sum, our results suggest that spatial perception is, in fact, highly malleable to

training, as evidenced by the high gains observed in its treatment groups, while spatial principles
Training of spatial ability 22

is significantly less malleable. Both of these results are consistent with our claim that control

groups moderate the size of training effects observed for different outcome measures. These

results also illustrate what is emerging as an important theme of this metaanalysis: the fact that

control groups play an important, moderating role for the size of training effects.
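For concreteness, a weighted meta-regression of this form can be sketched as follows; this is a minimal fixed-effect version with inverse-variance weights, not the METAREG macro itself:

import numpy as np

def weighted_meta_regression(g, v, X):
    """Weighted least-squares meta-regression.

    X is a design matrix with an intercept column plus dummy codes,
    e.g. Group (treatment vs. control), Outcome (spatial perception vs.
    spatial principles), and their product for the Group x Outcome
    interaction. Returns coefficients, standard errors, and z tests.
    """
    g, v, X = np.asarray(g, float), np.asarray(v, float), np.asarray(X, float)
    sw = np.sqrt(1.0 / v)                         # square-root weights
    Xw, gw = X * sw[:, None], g * sw
    beta, *_ = np.linalg.lstsq(Xw, gw, rcond=None)
    cov = np.linalg.inv(Xw.T @ Xw)                # coefficient covariance
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se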

Why is it that control groups for spatial perception improved to a larger extent than

control groups for spatial principles? One possibility is that a greater proportion of “reactive” manipulations were used for some outcome measures than for others. For example, spatial perception control groups might show larger effect sizes because a larger proportion of them received reactive manipulations. In contrast, spatial principles control groups could yield smaller effect sizes because a larger proportion of these studies used inert control group manipulations. To operationalize reactivity, we rank ordered the different types of control groups by effect size to determine which control group manipulations were associated with higher and lower effect sizes; those yielding higher effect sizes were judged to be more reactive.

We analyzed the relative frequency with which each type of control group was used

within each outcome category (Figure 4). The majority of spatial perception studies used an alternative treatment as a control group (51 out of 87, or 59%, of spatial perception effect sizes). In contrast, in studies of spatial principles the control groups were most likely to receive no treatment at all (66 out of 100, or 66%, of spatial principles effect sizes). The difference in proportions of control groups observed

for these two outcome categories was statistically significant, χ2 (4, 187) = 87.04, p < .001.

However, we found that control group effect sizes did not differ significantly for the four types of

control group (i.e., received nothing, treatment as usual, diluted treatment, alternative treatment),

Q (3, 223) = 5.69, p = .13. Among these groups, control groups that received a diluted version of

the treatment improved the least (M = .42, SE = .09) while those that received nothing (M = .62,

SE = .05) or received an alternative treatment (M = .60, SE = .05) improved slightly more. Thus,

the high proportion of alternative treatments used for spatial perception studies might explain the

high control group improvement observed for this group. In contrast, the high proportion of

control groups that received nothing in the spatial principles group does not seem to explain the

low performance of its control groups.

The previous analysis rank ordered the outcome measures in terms of their malleability to

training, although most of the measures exhibited similar effect sizes, on average. Significant

heterogeneity remained in each outcome category, suggesting that other study characteristics might moderate the size of training effects. We next considered the effect sizes of the treatment groups for each outcome category separately. Whenever possible, we used coded study

variables and differences in training to account for variability that remained within each outcome

measure category.

Coded study variables. As shown in Table 4, the average length of training across all

studies was 21 days (range = 195). This translated into an average of 6.13 (SD = 400) hours of

training, with the majority of studies administering training in one single session. The number of

test items or trials varied considerably by outcome measure, with studies training to improve

mental rotation skills using the largest number of trials (M = 213.68, SD = 108.46) and studies of

perspective taking skill using the smallest number of trials (M = 6.71, SD = 1.60), F (4, 52) =

24.85, p < .001.



Several variables had no significant effect on effect size for any of the outcome measures

we tested: Neither age (younger than 13 vs. 13 – 18 years vs. older than 18 years) nor publication

status (published or not) was significantly related to treatment group effect size. The location of training (in a classroom or elsewhere) also had no

significant impact on training effect size, either overall (p > .39) or within any of the outcome

measure categories (all ps > .11).

Three variables had a significant effect on some but not all outcome measures (Table 4).

For the sample as a whole and within 4 out of the 5 outcome measure categories there was no

significant effect of training frequency on treatment effect size. The lone exception was mental

rotation, which showed significantly higher effect sizes when multiple sessions of training were

used instead of a single session; single-session mental rotation training yielded the lowest treatment effect size overall (g = .38, SE = .08). Whether feedback was provided also had a

significant impact on effect size. For Assembly/transformation and spatial principles, studies that

provided feedback during training yielded significantly higher treatment effect sizes than studies

that did not, ps < .05. The reverse was true for spatial perception, where the mean treatment

effect size was lower for studies that provided feedback than it was for those that did not.

Finally, random (vs. nonrandom) assignment was generally associated with significantly lower

effect sizes for the overall sample (p < .01) as well as for Assembly/transformation and mental

rotation (ps < .05) training studies, but this was not true for any other outcome measure.

Types of training. We also classified training into three major categories: course training,

videogame playing, and performance of spatial tasks. Each category was subdivided further to

help pinpoint the components of each type of training that were related to the size of training

effects.

Course training. We distinguished course training, in which simply being enrolled in a

course constituted the treatment, from training in which a short-term course enhancement was

compared to the standard version of the course. In the former case, the control group was

typically enrolled in a different, nonspatial course (e.g., water purification as opposed to

drafting). In the latter case, training consisted of a single unit or course module, administered to a

small number of students who were compared to students receiving the course as usual.

We found that the average treatment effect size for pure course training (M = 1.11, SE =

.10, k = 15) was significantly higher than the treatment effect size when training consisted of an

enhanced course, M = .58, SE = .05, k = 60, Q (1, 74) = 21.66, p < .001. A similar trend was

observed within the outcome measure categories in which there were sufficient numbers of effect

sizes to perform a comparison. As shown in Table 5, mean treatment effect size was higher for

pure course training compared to enhanced course training for Assembly and transformation,

Q(1, 36) = 7.69, p < .01, and to a lesser degree for mental rotation, Q (1, 34) = 3.34, p < .07.

Because these analyses compared only the performance of the treatment groups, the results are not confounded by control group performance. One disadvantage of this approach is

that many full-length courses used designs in which it was not possible to calculate separate

treatment and control effect sizes and, thus, were not included in the previous analysis.

Furthermore, because many studies examining course training provided few details about how

training was administered (i.e., frequency of sessions, session length, etc.) focusing only on

treatment groups made it difficult to obtain enough cases to identify characteristics that might

explain the advantage of full-length course training vs. short-term enhanced course training.

Therefore, we also compared the effects of full-length course training and short-term

enhanced course training for the two categories of “inseparable” studies: those using pretest-

posttest only designs and those using treatment vs. control (i.e., between subjects) designs. We

found largely the same pattern of results that was observed when only treatment groups were

considered. As summarized in Table 5, among both pretest-posttest only studies and treatment vs.

control studies, the mean effect size for full-length course training was again significantly higher

than the mean for enhanced courses. Within outcome measure categories, this was true for

mental rotation and assembly/transformation (although the result did not reach significance for

the pretest-posttest studies of assembly/transformation). Given the similarity in the pattern of

results, we pooled the effect sizes from the separable and inseparable studies in order to compare full-length courses with short-term enhanced courses.

We performed an ANOVA comparing timing characteristics of full-length and short-term

courses and found a number of significant differences. On average, training for full courses was

longer in duration (p < .01) and was administered in more sessions (p < .001). Full course

training took place over an average of 78.91 days (SD = 12.94) compared to 45.17 days (SD =

38.98) for enhanced courses. Full course training was given in nearly twice as many sessions (M

= 16.27, SD = 5.24) compared to enhanced courses (M = 8.69, SD = 4.90). On the other hand, the total number of hours spent training did not differ significantly, although the mean number of hours was higher for enhanced course training (M = 52.54, SD = 85.86) than for full course training (M = 32.07, SD = 11.23), p > .43. Consequently, when training was converted to a per diem rate (total hours of training divided by total length of the training period in days), enhanced courses could be characterized as more intense, packing more hours of training into a shorter number of days. Enhanced courses trained participants at a rate of 2.07

hours per day (SD = 3.03) compared to full-length courses, whose training averaged .43 hours

per day, p = .078.

These results are consistent with the well-documented performance advantage of distributed over massed practice (Ebbinghaus, 1885; Donovan & Radosevich, 1999): Short-term enhanced

courses and full-length courses require a similar number of hours of training; however, in short-

term enhanced courses, the distribution of these hours over a shorter training period and into a

smaller number of sessions may account for their smaller effect sizes relative to traditional,

semester-long courses that distribute the same amount of training over a longer period and spread

it out across a greater number of sessions. In short, our results are consistent with past research

demonstrating that distributed training is more effective than massed training.

Videogame training. Training effects from videogame play were similar for all outcome

measure categories, Q (4, 56) = 2.45, p > .65. However, videogames that entailed mental

rotation (e.g., Tetris) were associated with stronger training effects (M = .87, SE = .09) compared

to games that did not involve mental rotation, M = .63, SE = .06, Q (1, 56) = 5.22, p < .05. The

same was true of Assembly/transformation tasks: Videogames involving mental rotation (M =

.91, SE = .09) yielded larger effects than non-mental rotation videogames (M = .35, SE = .09) did

on tasks requiring spatial assembly or transformation, Q (1, 11) = 17.54, p < .001. Whether

videogames involved mental rotation did not have a significant impact on mental rotation task

performance, however. This may have been because treatment effect sizes were large for

videogame training on mental rotation outcome measures regardless of whether the videogames

used mental rotation (M = .85, SE = .12) or not (M = .72, SE = .08), p > .36.

Spatial task training. The final category encompassed studies that used the administration of spatial tasks as a form of training. These were mutually exclusive of studies that used courses or videogames as training.

Within this category, we examined the effectiveness of training that involved direct or repeated

practice on a task of interest and then measured the improvements on the task of interest as well

as the transfer of training to other reference tests administered in the study. We distinguish

practice from other types of training in that we define practice as repetitions of the same task and

training as more varied and less task-specific. We consider them separately as two different

methods of training but acknowledge that either can lead to generalization and transfer. In cases

of transfer, following Barnett and Ceci’s (2002) distinction, we distinguished near transfer, in which training and the reference test were highly similar (e.g., Tetris-playing to mental rotation, or the water-level task (WLT) using a round flask to the WLT using an irregularly-shaped flask), from far transfer, in which

training and reference test were more dissimilar (e.g., Tetris-playing to Paper Folding or

inclusion of a spatial module in an engineering course to Spatial Visualization test score).

In this section, we focus on the treatment effect sizes for training that involved repeated

practice on the same task versus training that required transfer to an untrained task. However, we

acknowledge that the absolute number of tests given may also play an important role in

increasing training effects. Thus, we focus on treatment effect sizes here but, in the next section,

consider the effects of training that includes one single task versus training that includes

multiple, different tests during training.

Whenever an outcome measure was not identical to the training task, it was counted as a

test of transfer. Only for Assembly/transformation tasks was the mean treatment effect size

significantly higher for repeated practice than when transfer was required (p < .05). The

similarity between effect sizes for repeated practice and transfer tests within the Mental rotation

category initially seems to suggest a uniformly high rate of successful transfer and to contradict

past findings suggesting that mental rotation training effects are highly specific (e.g., Sims &

Mayer, 1996). We found, however, that this result was largely due to our broad construal of

transfer in this analysis.

Because our definition of transfer included a wide range of training and tests, we also

subdivided our transfer effect sizes and compared cases of “near” transfer, where the training

task and outcome measure were highly similar (e.g., training on rotating 2-D figures and testing

on Card Rotations Test), with cases of “far” transfer, where training and outcome were more

dissimilar. In most cases, near transfer produced significantly higher effect sizes than far transfer

did. The mean treatment effect size for near transfer was significantly higher than for far transfer

for the overall sample, p < .001, spatial perception, p < .001, and mental rotation, p < .001. The

significant difference between near and far transfer makes sense in light of past work: although

there was no difference between treatment effect sizes for repeated practice and transfer overall,

training that constituted “near” transfer was significantly more effective in improving spatial

skills than training that was considered to be “far” transfer. In sum, consistent with past work suggesting that spatial training is more effective when it is similar to the task of interest, training that more closely approximated the outcome measure of interest was more effective than training that was less similar.

Finally, to gain a sense for what has “worked” in terms of type of training and outcome

measure, we rank ordered treatment effect sizes into quartiles and compared the proportion of

each type of training and each outcome measure found in each quartile. This provided another

way of examining how effect sizes clustered in order to identify the characteristics of the most

successful training studies.
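To make this procedure concrete, the following minimal sketch (in Python, with entirely hypothetical data and column names; it illustrates the quartile-by-category test rather than reproducing our actual analysis) bins treatment effect sizes into quartiles and tests their association with outcome measure:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: one row per treatment effect size.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "outcome": rng.choice(["spatial perception", "perspective taking",
                           "assembly/transformation", "spatial principles",
                           "mental rotation"], size=200),
    "g": rng.normal(loc=0.75, scale=0.40, size=200),
})

# Rank-order effect sizes into quartiles (Q1 = lowest, Q4 = highest).
df["quartile"] = pd.qcut(df["g"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# Cross-tabulate outcome measure by quartile and test the association.
table = pd.crosstab(df["outcome"], df["quartile"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2(df={dof}) = {chi2:.2f}, p = {p:.3f}")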



We found no significant association between type of training and treatment effect size

quartile, χ2 (df = 9) = 12.03, p > .21. There was, however, a significant association between

outcome measure and treatment effect size quartile, χ2 (df = 12) = 21.32, p < .05. As summarized in Table 6, the majority of treatment effect sizes for spatial perception fell in the highest (4th) quartile, whereas the majority of treatment effect sizes for spatial principles fell in the lowest (1st) quartile. This analysis also reflects an underlying deficit in the field, namely the scarcity of research on certain outcome measures, such as spatial principles.

I. Summary

In this section, we investigated whether different spatial outcomes show different gains

with training. We compared the effect sizes for the treatment groups of each of the five outcome

measure categories used in our analysis: spatial perception, perspective taking,

assembly/transformation, spatial principles, and mental rotation. By analyzing only the treatment

group effect sizes, we avoided the problem of confounding effect size with control group

improvement.

Once the effect sizes for treatment groups were considered separately from those of

control groups, we found that spatial perception was, in fact, highly malleable to training; its low Ec effect size was the result of its control groups also showing large gains. Studies of spatial principles, on the other hand, showed large Ec effect sizes, largely because their control groups showed extremely small gains. Studies of spatial perception most often presented control groups with an alternative treatment in place of the training intervention, whereas control groups in studies of spatial principles tended to receive no treatment. This may help to explain why the control groups for spatial perception tended to show relatively large improvements.



In addition to the presence of control groups, certain study characteristics also had a

significant impact on treatment effect size. Mental rotation benefited from multiple sessions of

training (compared to one single session); for all other outcome measures, however, treatment

effect sizes were similar for multi-session and single-session studies. Providing feedback during

training produced mixed results: For assembly/transformation and spatial principles, providing

feedback was associated with higher treatment effect sizes while for spatial perception, it was

associated with lower effect sizes. Type of training also moderated effect sizes for different

outcome measures. We found that videogames that entailed mental rotation produced larger

training effects than nonrotation games, but all types of spatial outcome measures appeared to

benefit to an equal extent from videogame training.

Training in the form of a full-length course compared to an alternative course (e.g., drafting vs. water purification) led to larger effect sizes than shorter-term course enhancements (e.g., the addition of a 3-D module to an existing course). This result held across all three types of study designs (i.e., pretest-posttest with control, treatment vs. control, pretest-posttest only). Full-length courses included the same number of hours of training as enhanced courses but distributed those hours over a longer period and divided them into a greater number of sessions. This difference may explain the consistently higher effect sizes associated with full-length courses, which used distributed training, compared to short-term enhanced courses, which used massed training. Finally, we found that training effects were similar across studies for repeated practice and for training that required a degree of transfer to a different task. However, training effects were significantly smaller for tasks requiring far transfer, whereas training effects for near transfer were, on average, large.

Taken together, these results provide some evidence that the effects of training do extend

beyond mere practice on a task. Evidence for far transfer suggests that improvements in

untrained tasks accompany the improvements that result directly from training. In the next

section, we will decompose these effects of practice to specify the extent to which, and the limits within which, repeated practice improves spatial performance.

II. What is the magnitude of the test-retest effect in spatial training and what factors are

associated with larger retesting effects?

In the previous section, we showed that when the influence of control groups is removed,

both repeated practice and training using highly similar spatial tasks led to improvements in spatial skills. Because we wanted to separate the influence of training from the gains experienced by control groups, the previous analysis included only those studies that provided separate information about the performance of both the treatment and control groups; that is, it focused on the performance of the treatment groups within studies in which it was possible to calculate separate effect sizes for the treatment and control groups.

This aspect of the analysis limits the conclusions that can be drawn in a few ways. First,

it reduced the number of effect sizes that could be included in the analysis. Second, it examined

in isolation the use of practice as a treatment manipulation when, in reality, repeated practice is

often given in combination with other training methods. For example, many studies used an additive methodology, in which the control group received repeated practice and the treatment group received the same practice plus an additional treatment (e.g., Terlecki, Newcombe, & Little, 2008). Thus, the linking of individual

treatment effect sizes to a particular aspect of training may have ignored the fact that studies may

present participants with a collection of tests and that the unique constellation of tests, as well as

the characteristics of the individual tests themselves, may have a modulating influence on effect

size.

In this section we consider factors that influenced learning in the control groups. Our

working hypothesis is that the type of experience subjects had in the control group strongly

influenced the magnitude of their improvement. In addition, we test the hypothesis that

improvements in the control group represent something more than simple test-retest effects.

We also considered that the type of filler task separating the

administration of the test and retest might also affect the size of the retesting effect. For example,

the magnitude of test-retest effects might depend on whether filler tasks were spatial in nature or

nonspatial, as in cases where subjects completed a regular (unenhanced) version of a course or

left the testing site and returned at a designated time to complete the retest. Of the 203 control

group effect sizes included in this analysis, 88 used a spatial filler task and 115 used a nonspatial

filler task. There were 119 effect sizes derived from test-retest on a single measure and 84 effect

sizes from test-retest on multiple (i.e., more than one) measures. We compared these

improvements against cases in which individuals performed repeated practice on a single

measure (17 effect sizes).

Thus, to quantify and decompose the magnitude of test-retest effects, we focused in this section on the instances of testing and retesting among the control groups. We began by comparing test-retest effects among the following types of control groups: 1) those that practiced a single task repeatedly in

lieu of the training received by the treatment group; 2) those that received nothing or performed a

nonspatial filler task between a pretest and posttest on a single measure; 3) those that performed

a spatial filler task between a pretest and posttest on a single measure; 4) those that received

nothing or performed a nonspatial filler task between pretest and posttest administrations of

multiple measures; and 5) those that performed a spatial filler task between a pretest and posttest

administration of multiple measures.

We then compared the mean effect sizes of these 5 types of groups and found significant differences among these variants of test-retest control groups, Q (4, 223) = 18.32, p < .01. The means are summarized in Figure 5. A metaregression confirmed significant main effects of both number of measures (single vs. multiple, β = .39, p < .001) and type of filler task (spatial vs. nonspatial, β = .26, p < .001), as well as a significant Number x Filler interaction, β = -.40, p < .001.
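As an illustration of this kind of moderator analysis, the sketch below fits an inverse-variance-weighted regression of control group effect sizes on two dummy-coded moderators and their interaction. It is a generic fixed-effects metaregression sketch with simulated data, not the macro used for the analyses reported here:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data: one row per control group effect size.
# multiple = 1 if multiple measures; spatial = 1 if the filler task was spatial.
rng = np.random.default_rng(1)
n = 203
d = pd.DataFrame({
    "multiple": rng.integers(0, 2, n),
    "spatial": rng.integers(0, 2, n),
    "se": rng.uniform(0.05, 0.20, n),  # standard error of each effect size
})
d["g"] = (0.37 + 0.39 * d["multiple"] + 0.26 * d["spatial"]
          - 0.40 * d["multiple"] * d["spatial"] + rng.normal(0.0, d["se"]))

# Weight each effect size by its inverse variance, the usual meta-analytic weight.
X = sm.add_constant(d[["multiple", "spatial"]].assign(
    interaction=d["multiple"] * d["spatial"]))
fit = sm.WLS(d["g"], X, weights=1.0 / d["se"] ** 2).fit()
print(fit.params)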

These results suggest that the act of retesting on a single test (when a nonspatial filler task

is used) does have an effect on raising scores from pretest levels (g = .37, SE = .05). This is

similar in magnitude to the average test-retest effect reported in the literature (.28). This test-

retest effect is even larger, however, when a spatial filler task is used (g = .66, SE = .06). The

presence of a significant interaction indicated that the difference in mean effect size between

single and multiple measures was significant for studies using a nonspatial filler task (p < .001)

but not for studies using a spatial filler task (p = .26). In other words, control group subjects given nonspatial filler tasks improved more when their test-retest protocol included multiple measures than when it included only a single measure. In contrast, test-retest procedures that included either a single measure or multiple measures generated similar levels of improvement when the test and retest were separated by a spatial filler task. This

finding is to be expected if we consider that subjects given a spatial filler task may still be

learning something. These improvements were statistically similar to the gains observed when

subjects performed repeated testing on a single task (g = .61, SE = .09).

The total number of tests that control groups completed was also related to the size of their improvement, Q (2, 223) = 12.03, p < .01. As shown in Figure 6, control groups that completed a

test-retest procedure on a single test (M = .49, SE = .04) improved significantly less than those

that completed five or more tests (M = .78, SE = .08, p < .01) and those that completed 2-4 tests

(M = .59, SE = .04), although the latter difference was significant only at p < .05. Because the

two categories of multiple tests did not differ, the main distinction appeared to be between

control groups receiving a single test and those receiving multiple tests, M = .64, SE = .04, Q (1,

223) = 7.66, p < .01. Gains did not increase appreciably between control groups that received 4 tests and those that received more than 4 tests in a test-retest design. In contrast, there was a large and significant difference in control group improvement between those completing a pretest and posttest on only one test (M = .49, SE = .04) and those receiving two tests, M = .70, SE = .06, Q (1, 166) = 9.09, p < .01. Thus,

the inclusion of at least two different measures in a test-retest design appears to provide a level of

“training” to control groups.
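For reference, the Q statistics reported throughout are standard weighted heterogeneity tests; for a comparison of J group mean effect sizes, the between-groups form (the textbook definition; our computations follow the macros used in our analyses) is

$$Q_{between} = \sum_{j=1}^{J} w_j\,(\bar{g}_j - \bar{g})^2, \qquad w_j = \frac{1}{SE_j^{2}},$$

which is referred to a chi-square distribution with J - 1 degrees of freedom; the single- versus multiple-test comparison above, for example, is such a test with 1 degree of freedom.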

A second finding of interest here is that control groups that merely practiced a single task

repeatedly improved to a similar degree as control groups that took a pre- and posttest on

multiple measures. Although the two groups may learn different things with repeated testing,

multiple (more than 2) repetitions on a single test seem to yield similar gains as two (pretest and

posttest) administrations of multiple tests (defined as more than one test).

Finally, it is also interesting to note that, on average, control groups that received a spatial

filler task improved to a similar degree as those that received a nonspatial filler task or no filler
task at all. This is consistent with our earlier analysis of control groups, which revealed a significant difference between studies that had no control group and those that had one, but no significant differences among the different types of control groups. However, control groups that received a nonspatial filler task showed larger improvements as more measures were included, whereas those that received a spatial filler task performed similarly whether a single measure or multiple measures were included in the test-retest design.

II. Summary

The typical improvement that is expected to result from retesting on a single test is .29. Some aspects of our results were consistent with this pattern observed in past research: control groups that received a test-retest regimen on a single measure (with no spatial filler) improved by about .38. However, we also found that the number of tests that accompany a measure, as well as the nature of the filler task, affected the size of the test-retest improvement observed.

In general, control group participants who were tested and retested on multiple measures

improved significantly more than those who received only a single measure. This

tendency was particularly true when the intervening time between the tests was spent doing

either nothing or completing a nonspatial filler task. We also found a similar degree of

improvement in control groups that practiced one task repeatedly and control groups that

completed only a pretest and posttest but on multiple measures. This might also help to explain

the high gains observed earlier when battery was considered as an outcome measure, since

studies in that category typically administered a large number of tests at pretest and posttest.

We noted earlier that the success of interventions is typically judged by looking at the Ec effect sizes, which summarize the gains shown by the treatment group relative to the gains shown by the control group. In the previous section, we showed that spatial skills improve

regardless of whether individuals receive repeated practice on a task of interest or are trained on related spatial skills, with larger training effects observed when there is a higher degree of similarity between the training and outcome measures. However, in this section we showed that not only is the degree of similarity between training and outcome important; the number of tests included within a training regimen also affects the size of training effects. Specifically, control groups that received multiple tests in a test-retest design improved more than control groups that received only a single test, and improved to the same degree as control groups that engaged in repeated practice on a single test.
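For concreteness, one common way to express an effect size that nets out control group improvement in a pretest-posttest-control design (we offer this as an illustrative formulation; the exact computation follows the procedures used in our analyses) is

$$E_c = \frac{(M_{post,T} - M_{pre,T}) - (M_{post,C} - M_{pre,C})}{SD_{pooled}},$$

where T and C index the treatment and control groups. Under any formulation of this kind, a large control group gain directly shrinks Ec even when the treatment group improves substantially, which is why the number of tests given to control groups matters when judging interventions.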

This methodological issue, namely that the number of different tests administered is itself effectively training individuals, has implications for how the effectiveness of training interventions is evaluated. Because, within a single study, a large improvement by the control group attenuates the effect size for the intervention, a study that administers multiple measures may report a different effect size than one that administers a single measure. This issue also suggests that the concept of what constitutes training may need to be reframed. Training is not limited to the content of the material; it also consists of increasing familiarity with procedures and with taking tests. Thus, control groups, which do not receive the same material content as treatment groups, still show sizable

improvements when they are enrolled in a study. Furthermore, control groups that complete a

test-retest on multiple tests have even more opportunities for being “trained” and, as such, show

larger improvements compared to control group individuals who take a single test.

III. Are the effects of spatial training durable, and how long do they last?

The majority of studies tested only the immediate effects of training. However, we found

that pretest to posttest improvements did not differ significantly among posttests given

immediately, 2 weeks after, or more than 2 weeks after the end of training (Figure 7). The last

category includes posttests that were given up to 3 months after the end of training.

Because studies that administered a delayed posttest did not test long-term retention more

than once, the previous analysis was based on the comparison of durability across different

studies. What about studies that employed both an immediate and a delayed posttest? As shown in Figure 8, we found the same to be true within studies that administered both immediate and delayed posttests: the effect size from pretest to immediate posttest (g = .64) was not significantly different from the effect size from pretest to delayed posttest (g = .65). Thus, the effects of training are durable; the gains in spatial skill due to training do not decline significantly after the end of training.

We also tested whether the effects of training on transfer to reference tests were durable. Far transfer effects appeared, if anything, to be more resistant to delays than near transfer effects (Figure 9), although there was no significant interaction between Delay type and Transfer type. This result could suggest that training designed to achieve far transfer may also be more likely to yield training effects that are durable. Very few studies tested

the effects of long delays after the conclusion of training, but these results do suggest that

transfer is durable, although certainly more research is needed that examines the long-term

retention of training and transfer effects.



III. Summary

Some researchers have speculated that spatial training effects are neither long-lasting nor

generalizable to tasks beyond those directly trained. Contrary to these assumptions, we found that the effects of training are durable. There were no significant losses in pretest-posttest gains resulting from training, even in studies that retested participants 3 months after the end of training. We also found that training generalizes: in the studies surveyed here, participants showed improved performance on a variety of reference tests that were not trained directly. The gains on these tests of transfer also appeared to be durable over time.

Why haven’t past attempts at spatial training led to stronger effects? First, we have shown that the extent to which control groups improve is important to consider, and control groups in our sample improved substantially (g = .56). This gain is very large given that the test-retest effect alone is .26. Furthermore, the training provided in many studies may not have been long enough: when long periods of training are used (e.g., Terlecki, Newcombe & Little, 2008), durable training effects and far transfer are observed.

IV. How do individuals’ pre-existing levels of performance modulate the size of training-

related gains in spatial performance?

In this section, we address whether high- and low-performing individuals, whether they be males and females or adults and children, benefit from training to the same degree. Our goal was to shed light on some of the sources of individual differences in receptivity to training and to ascertain the extent to which methodological factors (such as differences in improvement by control groups) could account for some of these apparent differences in malleability. Determining whether the effects of training and experience on spatial ability are stronger or weaker within different populations is of critical importance for determining who stands to gain the most from different types of spatial training.

Some of the largest improvements in spatial performance have been observed in populations with limited exposure to spatial tasks (e.g., Saunderson, 1973; Seddon & Shubbar, 1984; Seddon, Eniaiyeju & Jusoh, 1984; Seddon & Shubber, 1985; Shubbar, 1990). Higher versus lower degrees of prior experience with spatial tasks also have a modulating influence on the size of training effects. For example, Gagnon (1985) found that female, but not male, participants showed significant improvements in spatial skills after playing a spatially demanding video game, probably because the females reported lower levels of previous game-playing experience. However, when Dorval and Pepin (1986) specifically recruited males and females with low levels of gaming experience, both sexes showed significant improvements in spatial skills after playing a spatial video game. Thus, it is of great theoretical and practical interest to test the hypothesis that differing amounts of spatial experience and activities lead to differences (or different increases) in spatial ability. This hypothesis has received substantial attention in the literature as the experiential hypothesis (Baenninger & Newcombe, 1989), also termed Sherman’s hypothesis or the bent twig model (Casey, 1996).

The experiential hypothesis predicts that less spatially-experienced individuals will show larger training effects than more spatially-experienced individuals. As such, training effects should be larger for studies conducted in less-industrialized versus more-industrialized countries, larger for females than males due to their differing degrees of spatial experience (Baenninger & Newcombe, 1989), and larger for children than adults. Our earlier results are consistent with the first prediction: We excluded 6 studies from nonindustrialized countries (g = 1.67, SE = .07) from the main analysis because their mean effect size was significantly higher than that of the remaining studies, which were from more industrialized countries, g = .68, SE = .02, Q (1, 633) = 167.61, p < .001. Even after these extreme cases were excluded, there remained a significant, negative relationship between HDI ranking and treatment group effect size, r (236) = -.21, p < .01. The same negative relationship was found when all Ec effect sizes were considered, r (508) = -.12, p < .01. Figure 10 shows the relationship between winsorized effect size and HDI ranking.
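As an illustration of this step, the sketch below winsorizes a set of effect sizes and correlates the result with HDI rankings; the 5% trimming limits and the simulated data are hypothetical stand-ins rather than the values from our analysis:

import numpy as np
from scipy.stats import pearsonr
from scipy.stats.mstats import winsorize

# Hypothetical data: one treatment effect size and one HDI ranking per study.
rng = np.random.default_rng(2)
hdi_rank = rng.integers(1, 180, size=238).astype(float)
g = 1.1 - 0.003 * hdi_rank + rng.normal(0.0, 0.4, size=238)

# Winsorize: clamp the most extreme 5% in each tail to limit outlier influence.
g_w = np.asarray(winsorize(g, limits=[0.05, 0.05]))

r, p = pearsonr(hdi_rank, g_w)
print(f"r({len(g_w) - 2}) = {r:.2f}, p = {p:.4f}")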

Within the studies retained in our sample (i.e., after the exclusion of outliers), we were interested in whether training had the same impact on males and females and on children and adults.

We begin this section with a discussion of how spatial training might play a role in modifying

differences in spatial skills that have been observed as a function of sex. We also examine

whether there is support in our data for an age difference in malleability to training.

Sex differences. Mirroring the gender gap in the STEM disciplines, spatial skills are one

cognitive domain in which sex differences have been reliably and systematically found (Linn &

Peterson, 1985). Men consistently score higher than women on most standardized measures of

spatial skills, with the notable exception of object location memory (Voyer, Voyer & Postma,

2003). The most popular explanation of this discrepancy, first advanced by Fennema and

Sherman (1977), attributes the difference in scores to the cultural sex-typing of spatial activities

such that males are more likely to engage in them. This difference, in turn, leads to greater and

richer levels of spatial experience among males than females. Indeed, research has shown that

males are much more likely to participate in spatial activities such as sports and construction

play while females are traditionally much more likely to engage in play with dolls, cooking and

art (Baenninger & Newcombe, 1995; Voyer, Nolan, & Voyer, 2000). These spatial experiences

and activities can, in a sense, be considered to train (and likely enhance) spatial skills.

There have been many individual studies of the effects of training on sex differences in spatial cognition, but to our knowledge, the last meta-analysis of this topic was conducted almost twenty years ago (Baenninger & Newcombe, 1989). Baenninger and Newcombe tested the experiential hypothesis by examining the relation between spatial activity participation or experience and scores on psychometric tests of spatial ability. They found a weak but significant relation between spatial activity participation and spatial ability for both males and females, supporting the notion that males may have an advantage in spatial ability due to their greater amount of spatial experience. However, they also found that spatial ability test performance improved equally with training for females and males. In other words, their meta-analysis did not find the Sex x Training interaction that would have been predicted by the hypothesis that spatial experience is the key to enhanced spatial skills.

Recent work by Levine et al. (2005) provides additional support for the experiential hypothesis. They found that the emergence of sex differences on spatial tasks depended on SES level: low-SES boys and girls performed the same, but mid- and high-SES boys performed better than their female counterparts. Levine et al. speculated that low-SES children do not have as much access as higher-SES children to stimulating materials that have been shown to enhance spatial ability, such as video games, puzzles, and Legos. If

the experiences that tend to enhance spatial abilities are absent from the lives of low-SES

children and less prevalent for girls of all SES levels, then we would expect that they would do

worse on spatial tests but might improve substantially if provided with adequate or compensating

experience or training.

Although sex differences were not the primary focus of our analysis, we were able to test for them in all studies in which the authors provided separate means for males and females. We computed the mean effect sizes of the control and treatment groups for both sexes and found no significant sex differences for either the control groups (p > .56) or the treatment groups (p > .18). As expected, the treatment groups improved significantly more than the control groups did (p < .001), but this replicates analyses reported above. Using the metaregression macro, we tested the Condition x Sex interaction and found it to be nonsignificant (p > .54). This is consistent with previous research (e.g., Baenninger & Newcombe, 1989), which found that males and females both benefit from training. Our results indicate that training leads to comparable gains in males and females, in both control and treatment groups. Contrary to the experiential hypothesis, we did not find that women improved more than men did. The group means are summarized in Table 7.

The results presented thus far indicate that training leads to improvement of similar magnitude in males and females. We next addressed whether males and females began at the same or at different levels of performance, because these results do not address whether there are pre-existing differences across studies and whether such differences are reduced or even eliminated with training. Despite the high volume of research on sex differences, no consensus has been reached on how training modifies pre-existing levels of performance in males and females. Figure 11 depicts two competing scenarios for the start and end points of males and females in spatial skill. The first (a) is a remediation scenario, in which the mean performance of males is higher at pretest but females catch up with practice. The second (b) is a parallel improvement scenario, in which males perform better than females at pretest, both sexes respond comparably to training, and the male advantage is therefore maintained both before and after training.

To determine which of the two hypothetical scenarios provides the better summary of the effects of spatial training on the performance of males and females, we computed the magnitude of the sex difference at pretest and at posttest for every study for which this was possible. To be included in this analysis, a study must have provided mean pretest and posttest scores separately for male and female participants; 35 studies met these criteria. Following the procedure described by Voyer, Voyer, and Postma (2007), we calculated the size of the sex difference in pretest scores and in posttest scores for each of the 35 studies. The size of the sex difference is summarized by the Hedges’ g statistic, which in this case reflects the size of the sex difference favoring males: a larger, more positive g reflects a larger sex difference favoring males, a negative g would indicate a sex difference favoring females, and a g close to zero would suggest no sex difference. Using the SPSS macros, we tested the significance of sex differences at pretest and again at posttest.
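For reference, the Hedges’ g for such a male-female contrast is the mean difference scaled by the pooled standard deviation and multiplied by a small-sample correction (this is the standard formula; our computations used the SPSS macros noted above):

$$g = J \cdot \frac{M_{m} - M_{f}}{SD_{pooled}}, \qquad SD_{pooled} = \sqrt{\frac{(n_m - 1)SD_m^2 + (n_f - 1)SD_f^2}{n_m + n_f - 2}}, \qquad J = 1 - \frac{3}{4(n_m + n_f - 2) - 1},$$

so that, for example, two equal-sized groups whose means differ by half a pooled standard deviation yield a g of approximately .50, the size of the pretest difference we observed.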

Across the 35 studies, there was a significant sex difference favoring males at both pretest (g = .50, SE = .04, k = 48) and posttest (g = .44, SE = .04, k = 48), and the size of this difference did not change significantly as a result of participating in training, Q (1, 95) = .86, n.s. To explore the possibility that the male advantage was reduced for some outcome measures but not others, we repeated this analysis separately for the five outcome measure categories and found the same pattern of results: sex differences favoring males were statistically unchanged from pretest to posttest for all measures. The results are summarized in Table 8.

Our findings are consistent with the parallel improvement scenario. On average, males perform better than females at pretest; training leads to similar improvement in males and females, and males therefore maintain their initial advantage. In other words, the consensus of the studies surveyed here is that there is a male advantage in spatial skills and that it persists despite the fact that both males and females improve with training. Training does not seem to reduce the gender gap in spatial skills, but it does help both men and women perform at substantially higher levels.

Age differences. We also tested whether children (who have more limited spatial experience) would show consistently larger training effects than adults did. We

found that the overall Ec effect size was higher for children (g = .75, SE = .04) than for adults, g =

.61, SE = .03, Q (1, 538) = 8.69, p < .01, and that this difference was driven by an age difference

in the extent to which the control groups improved during testing. As shown in Table 9, when the

control and treatment groups were considered separately, a significant effect of age was found for

the control groups only, with adult control groups improving significantly more than children’s,

p < .001. There was no significant age difference, however, in the size of improvement by the

treatment groups (p > .11) nor was there a significant Age x Condition interaction (p > .16). For

both children and adults, treatment groups yielded significantly higher effect sizes than control

groups did, ps < .001. Thus, both children and adults improved to an equal extent with training

and, despite differences in their pre-existing levels of performance, there was no evidence to

suggest that children showed larger gains with training than adults did.

We also tested whether the nonsignificance of the overall sex difference could be

attributed to the existence of a significant Age x Sex interaction. In other words, if a sex

difference existed but was more pronounced in either children or adults, this interaction would

obscure an overall main effect of sex. Because we found a significant difference in control group

effect size for children and adults, we tested for an Age x Sex interaction on the treatment groups

only. A metaregression of effect size on Age, Sex and Age x Sex interaction revealed no

significant effects, ps > .08. The only outcome that approached significance was a main effect of

Sex (p = .08). Otherwise, there was no evidence of a sex difference in the size of training effects, either among children or among adults.

Thus far, we have focused on age comparisons between children and adults. Yet, the

experiential hypothesis would also predict that training effects would be stronger for younger

children than older children. To test this prediction, we compared effect sizes (from treatment

groups only) of individuals under 13 years, from 13 – 18 years, and older than 18 years. We

found no significant differences in treatment group effect size, p > .72. We also compared the

effect sizes from control groups to determine whether younger and older children’s control

groups perform differently in training studies. We reported earlier that children’s control groups

showed significantly smaller gains than adults’ control groups did. When younger and older

children were considered separately, we found a significant effect of age on control group effect

size, with control groups for the youngest children improving significantly less than those of

older children and adults (Table 10). Thus, our results suggest again that there is no age

difference in effect size for treatment groups, only age differences in the extent to which control

groups improve.

We investigated a possible explanation for why the youngest control group participants showed significantly smaller improvements than older participants by analyzing the proportion of each control group type used at each age. If certain types of control groups were used more often with the youngest participants than with the other two age groups, and these types were more inert (i.e., less reactive and less likely to produce gains), this might explain the low levels of improvement in the youngest control groups. We analyzed the proportion of studies that used each

type of control group and compared these for each age group. The data for the Age x Type of

control group analysis are summarized in Table 11, both for the separable studies and for the

entire sample (separable and unseparable studies).

The analysis of Age by Control group type was not significant when only the separable studies were considered (χ2 = 6.61, p > .36) but was significant when the entire sample was included, χ2 = 14.33, p < .05. Both analyses converged on the same result: for the youngest

age group, control groups were more likely to receive no treatment than one of the other control

group manipulations. In contrast, control groups for the middle age group (13 – 18 years) were

given treatment as usual most often while control groups for the oldest age group (over 18 years)

received no treatment and alternative treatments equally often.

Was the type of control group used with the youngest participants responsible for the

low performance of the Under 13 control groups? Specifically, was there one type of control

group associated with low effect sizes and was this type overrepresented in the youngest group of

participants? To answer this question, we compared the mean effect sizes for each type of control

group for each age group (Figure 12). Recall that our earlier analysis (when all age groups were

considered together) had indicated that mean effect size did not differ by type of control group.

In contrast, when this analysis was broken down by age, the type of control group was

significantly related to control group effect size (p < .01 for the oldest group, p < .05 for the

middle age group, and p = .08 for the youngest group). Thus, although type of control group had

a significant impact on control group improvement, the types of control groups that were

associated with the highest and lowest effect sizes were not the same for each age group.

We also tested whether the lowest-performing control group for the youngest age group

(i.e., diluted treatment) was also the type of control group used most frequently by the youngest

group. This turned out not to be the case: the lowest-performing control group was not the most

frequently occurring. Instead, control groups that received nothing were both the most commonly used and the type that yielded the highest effect size.

Taken together, this analysis reveals three main findings: 1) The control groups for the

youngest participants improved the least of all three age groups; 2) The types of control group

that were used most frequently varied significantly by age; and 3) The nature of what the control

group received had a significant impact on effect size. However, there was no evidence that the choice of control groups accounted for the low gains observed for the youngest control groups, since the most frequently used type of control group (receiving nothing) was associated with the highest mean effect size. Thus, control groups in studies of young children tend to show small gains, but this cannot be explained by their receiving a higher proportion of ineffectual control group manipulations.



Low vs. High-performers. Finally, we also tested whether the process of screening out

high-performing individuals at the start of training has a significant impact on the size of training

effects. For example, past work has shown that differences in spatial skill among low- and high-

frequency video gamers can be reduced or eliminated if the low-gamers are given additional

video game playing experience (Gagnon, 1985; Dorval & Pepin, 1986; Okagaki & Frensch,

1994). Do training studies that incorporate a screening procedure yield higher effect sizes

compared to those that enroll all participants, regardless of performance level?

In all, 11 out of 101 studies (107 effect sizes) tested only individuals who were defined

during a screening procedure as low scorers. An ANOVA on Ec effect sizes revealed that the mean effect size of the 11 studies that tested only low scorers (M = .74, SE = .05) was significantly higher than that of the remaining 90 studies, M = .61, SE = .03, Q (1, 507) = 4.90, p < .05. Of the 55

separable studies, there was no difference in the mean effect size for the control groups of the 7

studies that prescreened and the 48 that did not prescreen participants, p > .61. A significant

difference in mean effect size was found for the treatment groups, however, with treatment group

effect size being significantly larger for studies employing prescreening (M = .90, SE = .07, k =

40) compared to those that did not, M = .72, SE = .03, k = 206, Q (1, 245) = 5.31, p < .05. These

results are also consistent with the experiential hypothesis: Studies that focus on training low-

performing individuals report significantly larger effect sizes compared to studies that train

individuals representing a wider range of skill levels.

IV. Summary

Testing individuals who vary widely in spatial ability is important to our investigation of spatial training effects because it addresses the extent to which it is possible to improve the spatial skills of low-performing groups and helps to identify the situations in which the gap between low- and high-performers can be closed or even eliminated.

We first tested whether the typical male advantage in specific spatial skills was reduced, or even eliminated, when females were given additional training or experience. Our results were most consistent with a parallel improvement pattern, indicating that both males and females improved with spatial training but that the male advantage present at pretest remained at posttest. Thus, although a number of individual studies have reported that pretest

sex differences favoring males were erased after females showed larger gains with training

compared to males (e.g., Gittler & Gluck, 1998; Kass, Ahlers & Dugger, 1998; Larson et al.,

1999; Lohman & Nichols, 1990; Parameswaran, 1996; Vasta, Knott & Gaze, 1996), the

consensus across studies is most consistent with males and females showing equal gains with

training. After combining results across studies, we conclude that although males and females

both respond to training, spatial training does not eliminate the sex differences in spatial skills

that favor males.

Comparisons of children and adults likewise did not reveal significant differences in the

size of training effects obtained for each group. It is sometimes assumed that children will show

greater effects of training compared to adults. However, once we accounted for the influence of

control groups, we found no evidence to support this difference. We suggest that one reason why

training studies of children may appear to yield larger effect sizes than studies of adults is that

children’s control groups, on average, improve significantly less than adults’ do. This does not

appear to be due to a tendency to use ineffectual control group manipulations with younger

children; the highest-performing type of control group was also the most frequently used type

among the youngest group. Likewise, the lowest-performing type of control group was the least

frequently used. Thus, the low performance of the youngest control groups may be accounted for

by more general factors, such as lack of familiarity with testing or possible failure to develop

strategies spontaneously, factors that may be less likely to limit performance in adult control

groups.

We did find that studies focusing on low-performing individuals (as defined by a

prescreening procedure) tended to report higher effect sizes than those that tested participants

regardless of skill level. This is consistent with the pattern we observed earlier for studies

focusing on low-SES populations likely to have limited spatial experiences. Like the current

group of studies that focuses on low-performers, these studies also reported larger than average

training effects. Overall, our results in this section provide qualified support for the experiential hypothesis: both sexes benefited equally from training, and training improved spatial skills to a similar degree across age groups, but training had a larger effect on individuals who were lower than average in spatial skill.

General Discussion

A number of studies suggest that spatial skills can be influenced by experience or

training, yet there is considerable variability in the magnitude of training effects that have been

reported. Despite the large number of studies that have found positive effects of training on

spatial performance, other studies have found minimal or even negative effects of receiving interventions (e.g., Faubion, Cleveland & Harrel, 1942; Smedslund, 1963; McGillicuddy-DeLisi, DeLisi & Youniss, 1978; Gagnon, 1985; Johnson, 1991; Larson, 1996; Vasta, Knott & Gaze, 1996; Kass, Ahlers & Dugger, 1998; Simmons, 1998; Smith, 1998; Kirby & Boulter, 1999).



We suggest that the mixed results of past research on training can be attributed, in part, to

overlooking the size of improvements made by control groups and failing to consider how

incidental aspects of study design, such as the number of measures included within a battery of

tests, might on their own produce pretest-posttest gains.

Overall, our results clearly show that spatial training yields substantial, durable gains in spatial skills that generalize to other tasks. Both men and women, and both children and adults, benefit from training. The differences in malleability among the types of outcome measures and training methods were somewhat limited and did not seem to be as important as the differences resulting from methodological characteristics shared by groups of studies. Specifically, in nearly all cases, the size of the training-related improvements was heavily dependent on whether studies included a control group and, if so, on the size of the gains observed within their control

groups.

We also argue for a broader conception of what constitutes training. We suggest that a full

characterization of spatial training entails not only examining the content of courses or training

regimens but also examining the nature of the practice effect that results from being enrolled in a

training study and being tested multiple times. We found that control group participants who

were otherwise “untrained” improved more when they received multiple tests than when they received a single test, suggesting that even untrained participants in training studies learn something important. This learning may involve the act of taking a test and becoming familiar with spatial measures, both of which are enhanced when individuals take multiple tests and have the opportunity to compare items across tests and to learn by making contrasts (Gentner & Markman, 1994, 1997). Alignable differences are highlighted when similar entities are

compared, so the act of taking multiple tests permits comparisons to be made across different

spatial tests and can potentially highlight important similarities and differences in test content

and strategy.

When we isolated the size of the treatment effects and compared them across outcome

measures, we found that there were relatively few differences in treatment effect sizes but large

differences in the extent to which control groups improved. We also found that it was not the

total number of hours spent training that made a difference but rather how it was distributed:

Full-length and enhanced courses used a comparable number of hours of training, but full-length

courses spread out this training over a longer training period, resulting in a less-intensive but

more distributed pattern of training.

Repeated practice on a task of interest yielded improvements that were similar in

magnitude to cases of “near” transfer, where the practiced task and the outcome measure were

highly similar. Thus, training and outcome do not need to be identical in order for training-

related gains to be observed. However, repeated practice and near transfer both led to

significantly higher gains than “far” transfer did. This suggests that, while it is efficacious to train spatial skills, there are limits to how these training effects will generalize to tasks that are

dissimilar to those trained. It is worth noting, however, that far transfer effects were more durable

than near transfer effects. We also found that taking multiple pretests and posttests yields larger

improvements on average compared to taking a single pretest and posttest, and that the effect of

giving multiple measures was similar to the improvement generated from practicing repeatedly

on a single task.

In our sample, the majority of studies used immediate posttesting after the conclusion of training.

However, among the studies that delayed the administration of the posttest, there was strong

evidence for the successful maintenance of training effects. The magnitude of training effects

was statistically similar for posttests given immediately, 2 weeks after, and even more than 2

weeks after the end of training.

With regard to Sherman’s hypothesis, we found qualified support for the role of pre-

existing levels of spatial experience on the size of spatial training effects. The size of spatial

training effects depended on a country’s HDI ranking, with lower ranking countries reporting

much larger training effects than countries that ranked higher in HDI. Likewise, studies that

focused on remediating low-performing individuals reported significantly higher effect sizes than

studies that tested all ability levels. This suggests that whether spatial interventions are deemed

to be effective depends heavily on the populations selected for study. Studies focusing on low-

SES or low-performing individuals will tend to report higher effect sizes than studies testing

higher-SES or typically-performing individuals (e.g., college students).

On the other hand, we found that neither sex nor age was a consistent

predictor of training effects. Specifically, the overall pattern for males and females was

consistent with a scenario of parallel improvement, with both sexes improving to an equal extent

with training but males maintaining their advantage over females across all different types of

spatial tasks and training. Comparisons of children and adults likewise did not reveal significant

differences in the size of training effects obtained for each group. Once the effect of control

group improvement was removed, a direct comparison of treatment group effect sizes revealed

no significant differences between children and adults. There was also no evidence that this

effect was modified by sex; it appears that children and adults improve with training to a similar

extent regardless of sex. Thus, these results suggest that when comparing training effects, it is at

least as important to consider the methodological characteristics of studies (i.e., the nature and

improvement of control groups) as it is to consider demographic or individual difference

variables.

One intriguing corollary of Sherman’s hypothesis is that spatial skills in which

individuals are relatively untrained (i.e., receive little practice on or have little experience in)

should show particularly large gains with training. Although humans process and navigate a rich

array of spatial information on a daily basis, it is possible that not all spatial skills receive equal

training in naturalistic settings and that an increased reliance on technological aids (e.g., online

maps, GIS) has removed many opportunities to receive natural practice on spatial skills. Certain

professions might yield specific types of spatial experiences that are relevant to spatial test-

taking (e.g., dress making and the SR-DAT, Workman, Caldwell & Kallal, 1999; working in a

restaurant and the Water level task, Vasta, Rosenberg, Knott & Gaze, 1997). For most

individuals, however, opportunities for naturalistic spatial training are less available. Thus,

secular trends describing the interaction between technological devices and changes in spatial

ability represent one potentially fruitful avenue for future research.

Conclusion

Ultimately, the goal of research on spatial skills is to translate the results of individual

training studies into the development of best-practice guidelines for spatial interventions. Reports

of success for individual training regimens on isolated spatial tasks are important in that they

attest to the efficacy of training these skills. However, success in the STEM disciplines depends

on improving more molar measures, such as grades in school and the performance in situated

contexts of tasks requiring spatial skills. Thus, success in improving component skills such as

mental rotation or spatial perception is not noteworthy unless it can be shown that these

improvements translate into the skills that are relevant to success in STEM. This attitude is also

important when evaluating the implications of some of our other results, namely that across

studies, spatial training was not successful in closing the gap between male and female

performance levels. On one hand, one might interpret this result negatively in that it could imply

a level of inevitability about the male advantage in spatial skills or the impossibility of female

students catching up. However, it is important to note, first, that acknowledging the existence of

a gender gap in basic spatial skills is not the same as conceding that males should outperform

females in all of the molar measures relevant to STEM success (e.g., grades, job performance).

In other words, the goal of future research is not to focus on remediation in order to close the

gender gap in basic spatial skills but to close the gap in STEM success.

Directions for future research. This meta-analysis was limited to studies that included at

least one spatial outcome measure. One important direction for future work is to analyze the

relationship between basic spatial skills and measures that are directly relevant to STEM success,

such as classroom behaviors, grades, and occupational success. Our analysis identified some key

characteristics of successful training studies, including maintaining at least a moderate degree of

similarity between training task and outcome measure, providing feedback during training, and

implementing longer, distributed methods of training as opposed to shorter-term, intensive

training regimens. Across all studies, 81% tested the effects of training immediately after the

conclusion of training, indicating the need for more research that includes delayed posttesting.

Related to the issue of control groups, we found that control group improvement varies

widely but that the magnitude of gains experienced by a control group can heavily influence

whether a training intervention is judged to be effective. For example, studies that fail to include

a control group typically report significantly higher effect sizes than those that include some type

of control group. When control groups are included but improve a great deal, this also masks the

effectiveness of a training intervention. Furthermore, within certain populations, such as young

children, control groups tend to perform very poorly, which also can potentially lead to inflated

estimates of the effectiveness of training. Thus, training effect sizes must be interpreted cautiously, and the specification of the control group is an important parameter to consider when judging the effectiveness of training.
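To make the logic of this separation concrete, the adjustment can be sketched with a pretest-posttest-control effect size of the general form discussed by Morris (in press); the expression below is an illustrative sketch of that family of estimators, not the exact formula used in our analyses:

$$ g_{ppc} = c \times \frac{(\bar{Y}_{post,T} - \bar{Y}_{pre,T}) - (\bar{Y}_{post,C} - \bar{Y}_{pre,C})}{SD_{pre}} $$

where T and C index the treatment and control groups, $SD_{pre}$ is the pooled pretest standard deviation, and c is a small-sample bias correction. A control group that improves substantially shrinks the numerator, which is precisely how large control-group gains can mask an otherwise effective intervention.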

In this meta-analysis, we also found not only that test-retest effects are sizable but that they increase with the number of separate tests included in a training study.

together, these findings suggest that a significant component of training, one that improves with

age, is the act of learning to take tests and the development of spontaneous strategies for

approaching spatial tests. On one hand, designing training interventions that surpass the sizable

improvements attained through repeated practice is a challenge. On the other hand, it also

suggests a potentially important mechanism for raising spatial skills to a minimum level of

performance. Our finding that the act of taking multiple separate tests provides “training”

suggests the importance of training test literacy as well as focusing on component spatial skills.

References

* Denotes studies included in the meta-analysis

* Arond, N. (1966). Acceleration in the acquisition of perspective. Unpublished doctoral

dissertation, University of Minnesota.

Baenninger, M., & Newcombe, N. S. (1989). The role of experience in spatial test performance:

A meta-analysis. Sex Roles, 20, 327-344.

Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn? A taxonomy

for far transfer. Psychological Bulletin, 128, 612-637.

* Barsky, R. D., & Lachman, M. E. (1986). Understanding of horizontality in college women:

Effects of two training procedures. International Journal of Behavioral Development, 9,

31-43.

* Batey, A. H. (1986). The effects of training specificity on sex differences in spatial ability.

Unpublished doctoral dissertation, University of Maryland.

* Battista, M. T., Wheatley, G. H., & Talsma, G. (1982). The importance of spatial visualization

and cognitive development for geometry learning in preservice elementary teachers.

Journal for Research in Mathematics Education, 13, 332-340.

* Beilin, H., Kagan, J., & Rabinowitz, R. (1966). Effects of verbal and perceptual training on

water level representation. Child Development, 37, 317-329.

* Ben-Chaim, D., Lappan, G., & Houang, R. T. (1988). The effect of instruction on spatial

visualization skills of middle school boys and girls. American Educational Research

Journal, 25, 51-71.



* Blade, M. F., & Watson, W. S. (1955). Increase in spatial visualization test scores during

engineering study. Psychological Monographs: General and Applied, 69 (12, Whole No.

397), 1-13.

* Blatter, P. (1983). Training in spatial ability: A test of Sherman's hypothesis. Perceptual &

Motor Skills, 57, 987-992.

Borenstein, M., Hedges, L., Higgins, J., Rothstein, H. (2005). Comprehensive Meta-analysis Ver-

sion 2, Biostat, Inc., Englewood NJ.

Bornstein, M. H. (1989). Sensitive periods in development: Structural characteristics and causal

interpretations. Psychological Bulletin, 105, 179-197.

* Braukmann, J. (1991). A comparison of two methods of teaching visualization skills to college

students. Unpublished doctoral dissertation, University of Idaho.

* Brinkmann, E. H. (1966). Programmed instruction as a technique for improving spatial

visualization. Journal of Applied Psychology, 50, 179-184.

* Brown, F. R. (1954). The effect of an experimental course in geometry on ability to visualize in

three dimensions. Unpublished doctoral dissertation, University of Illinois.

Capozzoli, M. V., McSweeney, L., & Sinha, D. (1999). Beyond kappa: A review of interrater

agreement measures. Canadian Journal of Statistics, 27, 3-23.

* Carpenter, F., Brinkmann, E. H., & Lirones, D. S. (1965). Educability of students in the

visualization of objects in space (Cooperative Research Project No. 1474). Ann Arbor,

MI: The University of Michigan.

* Chance, J. E., & Goldstein, A. G. (1971). Internal-external control of reinforcement and

embedded-figures performance. Perception & Psychophysics, 9, 33-34.



* Chatters, L. B. (1984). An assessment of the effects of video game practice on the visual motor

perceptual skills of sixth grade children. Unpublished doctoral dissertation, University of

Toledo.

* Churchill, R. D., Curtis, J. M., Coombs, C. H., & Harrell, T. W. (1942). Effect of engineer

school training on the surface development test. Educational & Psychological

Measurement, 2, 279-280.

* Ciganko, R. A. (1973). The effect of spatial information training and drawing practice upon

spatial visualization ability and representational drawings of ninth grade students.

Unpublished doctoral dissertation, Illinois State University.

* Clements, D. H., Battista, M. T., Sarama, J., & Swaminathan, S. (1997). Development of

students' spatial thinking in a unit on geometric motions and area. The Elementary School

Journal, 98, 171-186.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:

Lawrence Erlbaum Associates.

* Connor, J. M., Schackman, M., & Serbin, L. A. (1978). Sex-related differences in response to

practice on a visual-spatial test and generalization to a related test. Child Development,

49, 24-29.

* Day, J. D., Engelhardt, J. L., Maxwell, S. E., & Bolig, E. E. (1997). Comparison of static and

dynamic assessment procedures and their relation to independent performance. Journal

of Educational Psychology, 89, 358-368.

* De Lisi, R., & Cammarano, D. M. (1996). Computer experience and gender differences in

undergraduate mental rotation performance. Computers in Human Behavior, 12, 351-

361.

* De Lisi, R. & Wolford, J. L. (2002). Improving children's mental rotation accuracy with

computer game playing. Journal of Genetic Psychology, 163, 272-282.

* Deratzou, S. (2007). A qualitative inquiry into the effects of visualization on high school

chemistry students’ learning process of molecular structure. Unpublished doctoral

dissertation, Drexel University.

Donovan, J. J. & Radosevich, D. R. (1999). A meta-analytic review of the distribution of practice

effect: Now you see it, now you don’t. Journal of Applied Psychology, 84, 795-805.

* Dorval, M. & Pepin, M. (1986). Effect of playing a video game on a measure of spatial

visualization. Perceptual & Motor Skills, 62, 159-162.

* Drauden, G. M. (1980). Training in spatial ability. Unpublished doctoral dissertation,

University of Minnesota.

* Duesbury, R. T., & O'Neil, H. F. (1996). Effect of type of practice in a computer-aided design

environment in visualizing three-dimensional objects from two-dimensional orthographic

projections. Journal of Applied Psychology, 81, 249-260.

* Eliot, J. (1966). The effects of age and training upon children's conceptualization of space.

Unpublished doctoral dissertation, Stanford University.

* Embretson, S. E. (1987). Improving the measurement of spatial aptitude by dynamic testing.

Intelligence, 11, 333-358.

* Emler, N. & Valiant, G. L. (1982). Social interaction and cognitive conflict in the

development of spatial coordination skills. British Journal of Psychology, 73, 295-303.

* Faubion, R. W., Cleveland, E. A., & Harrell, T. W. (1942). The influence of training on

mechanical aptitude test scores. Educational and Psychological Measurement, 2, 91-94.



* Feng, J. (2006). Cognitive training using action video games: A new approach to close the

gender gap. Unpublished doctoral dissertation, University of Toronto.

* Feng, J., Spence, I., & Pratt, J. (2007). Playing an action video game reduces gender

differences in spatial cognition. Psychological Science, 18, 850-855.

* Ferrini-Mundy, J. (1987). Spatial training for calculus students: Sex differences in

achievement and in visualization ability. Journal for Research in Mathematics Education,

18, 126-140.

* Gagnon, D. (1985). Videogames and spatial skills: An exploratory study. Educational

Communication and Technology, 33, 263-275.

Gentner, D., & Markman, A. B. (1994). Structural alignment in comparison: No difference

without similarity. Psychological Science, 5(3), 152-158.

Gentner, D., & Markman, A. B. (1997). Structure mapping in analogy and similarity. American

Psychologist, 52, 45-56.

* Gerson, H. B. P., Sorby, S. A., Wysocki, A., & Baartmans, B. J. (2001). The development and

assessment of multimedia software for improving 3-D spatial visualization skills.

Computer Applications in Engineering Education, 9, 105-113.

* Geva, E., & Cohen, R. (1987). Transfer of Spatial Concepts from Logo to Map-Reading

(Ontario Department of Education Technical Report No. 143). Toronto: Ontario

Department of Education.

* Gittler, G., & Gluck, J. (1998). Differential transfer of learning: Effects of instruction in

descriptive geometry on spatial test performance. Journal for Geometry and Graphics, 2,

71-84.

* Golbeck, S. L. (1998). Peer collaboration and children's representation of the horizontal

surface of liquid. Journal of Applied Developmental Psychology, 19, 571-592.

Halpern, D. F., Benbow, C. P., Geary, D. C., Gur, R. C., Hyde, J. S., & Gernsbacher, M. A.

(2007). The science of sex differences in science and mathematics. Psychological Science in the Public Interest, 8, 1-51.

Hausknecht, J. P., Halpert, J. A., Di Paolo, N. T., & Gerrard, M. O. M. (in press). Retesting in

selection: A meta-analysis of practice effects for tests of cognitive ability. Journal of

Applied Psychology.

Heckman, J. J., & Masterov, D. V. (2007). The productivity argument for investing in young

children. Review of Agricultural Economics, 29, 446-493.

Hedges, L. V., & Chung, V. (in preparation). Does spatial ability predict STEM college major

and employment? An examination of two longitudinal studies.

* Heil, M., Rossler, F., Link, M., & Bajric, J. (1998). What is improved if a mental rotation task

is repeated--the efficiency of memory access, or the speed of a transformation routine?

Psychological Research, 61, 99-106.

* Hsi, S., Linn, M. C., & Bell, J. E. (1997). The role of spatial reasoning in engineering and the

design of spatial instruction. Journal of Engineering Education, 86, 151-158.

Humphreys, L. G., Lubinski, D., & Yao, G. (1993). Utility of predicting group membership and

the role of spatial visualization in becoming an engineer, physical scientist, or artist.

Journal of Applied Psychology, 78, 250-261.

* Johnson, J. E. (1991). Can spatial visualization skills be improved through training that

utilizes computer-generated visual aids? Unpublished doctoral dissertation, University

of Minnesota.

* Johnson, S., Flinn, J. M., & Tyer, Z. E. (1979). Effect of practice and training in spatial skills

on embedded figures scores of males and females. Perceptual & Motor Skills, 48, 975-

984.

Johnson, M. H., Munakata, Y., & Gilmore, R. O. (Eds.) (2002). Brain Development and

Cognition: A Reader (2nd Edition). Oxford: Blackwell Publishers.

Kanaya, T., Scullin, M. H., & Ceci, S. J. (2003). The Flynn effect and U.S. policies: The

impact of rising IQ scores on American society via mental retardation diagnoses.

American Psychologist, 58, 778-790.

* Kaplan, B. J., & Weisberg, F. B. (1987). Sex differences and practice effects on two visual-

spatial tasks. Perceptual & Motor Skills, 64, 139-142.

* Kass, S. J., Ahlers, R. H., & Dugger, M. (1998). Eliminating gender differences through

practice in an applied visual spatial task. Human Performance, 11, 337-349.

* Kastens, K. A., & Liben, L. S. (2007). Eliciting self-explanations improves children’s

performance on a field-based map skills task. Cognition and Instruction, 25, 45–74.

* Kastens, K. A., Kaplan, D., & Christie-Blick, K. (2001). Development and evaluation of

“Where are We?” map skills software and curriculum. Journal of Geoscience Education,

49, 249-266.

* Kidder, F. R. (1973). An investigation of nine-year-old, eleven-year-old, and thirteen-year-old

children's comprehension of Euclidean transformations. Unpublished doctoral

dissertation, University of Georgia.

* Kirby, J. R., & Boulter, D. R. (1998). Spatial abilities and transformational geometry.

Zeitschrift für Pädagogische Psychologie (German Journal of Educational Psychology),

12, 146-155.

* Kozhevnikov, M., & Thornton, R. (2006). Real-time data display, spatial visualization ability,

and learning force and motion concepts. Journal of Science Education and Technology,

15, 111-132.

* Kwon, O. N., Kim, S. H., & Kim, Y. (2002). Enhancing spatial visualization through Virtual

Reality (VR) on the web: Software design and impact analysis. Journal of Computers in

Mathematics and Science Teaching, 21, 17-31.

* Kwon, O. N. (2003). Fostering spatial visualization ability through web-based virtual-reality

program and paper-based program. In C. W. Chung, C. K. Kim, W. Kim, T. W. Ling, &

K. H. Song (Eds.), Web and Communication Technologies and Internet-related Social

Issues (pp. 701-706). Springer Berlin/Heidelberg.

Landis, J. R. & Koch, G. G. (1977). The measurement of observer agreement for categorical

data. Biometrics, 33, 159-174.

* Larson, K. L. (1996). Persistence of training effects on the coordination of perspective tasks.

Unpublished doctoral dissertation, Biola University.

* Larson, P. et al. (1999). Gender issues in the use of virtual environments. CyberPsychology &

Behavior, 2, 113-123.

* Lee, K. O. (1995). The effects of computer programming training on the cognitive

development of 7-8 year-old children. Korean Journal of Child Studies, 16, 79-88.

* Leino, V., & Willemsen, E. (1976). Use of a perceptually based apparatus to train adult

women's performance on a Piagetian measure of the horizontality concept. Perceptual &

Motor Skills, 42, 363-369.

* Lizarraga, M. L. S. & Garcia Ganuza, J. M. (2003). Improvement of mental rotation in girls

and boys. Sex Roles, 49, 277-286.



* Lohman, D. F., & Nichols, P. D. (1990). Training spatial abilities: Effects of practice on

rotation and synthesis tasks. Learning and Individual Differences, 2, 67-93.

* Longstreth, L. E., & Alcorn, M. B. (1990). Susceptibility of Wechsler spatial ability to

experience with related games. Educational & Psychological Measurement, 50, 1-6.

* Lord, T. R. (1985). Enhancing the visuo-spatial aptitude of students. Journal of Research in

Science Teaching, 22, 395-405.

* Luursema, J., Verwey, W. B., Kommers, P. A. M., Geelkerken, R. H., & Vos, H. J. (2006).

Optimal conditions for computer-assisted anatomical learning. Interacting with

Computers, 18, 1123-1138.

* McClurg, P. A., & Chaille, C. (1987). Computer games: Environments for developing spatial

cognition? Journal of Educational Computing Research, 3, 95-111.

* McCuiston, P. J. (1991). Static vs. Dynamic Visuals in Computer-Assisted Instruction.

Engineering Design Graphics Journal, 55, 25-33.

* McGee, M. G. (1978). Effects of training and practice on sex differences in mental rotation

test scores. The Journal of Psychology, 100, 87-90.

* McGillicuddy-De Lisi, A. V., De Lisi, R., & Youniss, J. (1978). Representation of the

horizontal coordinate with and without liquid. Merrill-Palmer Quarterly, 24, 199-208.

* Mendicino, L. (1958). Mechanical reasoning and space perception: Native capacity or

experience. Personnel and Guidance Journal, 36, 335-38.

* Miller, G. G., & Kapel, D. E. (1985). Can non-verbal, puzzle type microcomputer software

affect spatial discrimination and sequential thinking skills of 7th and 8th graders?

Education, 106, 160-167.



* Miller, J. W., Boismier, J. D., & Hooks, J. (1969). Training in spatial conceptualization:

Teacher directed activity, automated and combination programs. The Journal of

Experimental Education, 38, 87-93.

Morris, S. B. (in press). Estimating effect sizes from the pretest-postest-control design.

Organizational Research Methods.

* Moses, B. (1969, April). The effects of spatial instruction on mathematical problem-solving

performance. Paper presented at the annual meeting of the American Educational

Research Association, San Francisco, CA.

* Mullin, L. N. (2006). Interactive navigational learning in a virtual environment: Cognitive,

physical, attentional, and visual components. Unpublished doctoral dissertation, The

Catholic University of America.

National Research Council (2006). Learning to Think Spatially. Washington D. C.: The

National Academies Press.

* Okagaki, L., & Frensch, P. A. (1994). Effects of video game playing on measures of spatial

performance: Gender effects in late adolescence. Journal of Applied Developmental

Psychology, 15, 33-58.

* Parameswaran, G. (2003). Age, gender and training in children's performance of Piaget's

horizontality task. Educational Studies, 29, 307-319.

* Parameswaran, G. & De Lisi, R. (1996). Improvements in horizontality performance as a

function of type of training. Perceptual & Motor Skills, 82, 595-603.

* Pepin, M., & Dorval, M. (1986, April). Effect of playing a video game on adults’ and

adolescents’ spatial visualization. Paper presented at the annual meeting of the American

Educational Research Association, San Francisco, CA.



* Piburn, M. D., Reynolds, S. J., McAuliffe, C., Leedy, D. E., Birk, J. P., & Johnson, J. K.

(2005). The role of visualization in learning from computer-based images. International

Journal of Science Education, 27, 513-520.

* Priddle, R. E., & Rubin, K. H. (1977). A comparison of two methods for the training of spatial

cognition. Merrill-Palmer Quarterly, 23, 57-65.

* Ranucci, E. R. (1952). Effect of the study of solid geometry on certain aspects of space

perception abilities. Unpublished doctoral dissertation, Columbia University.

Rayner, K., Foorman, B. R., Perfetti, C. A., Pesetsky, D., & Seidenberg, M. S. (2001). How

psychological science informs the teaching of reading. Psychological Science in the

Public Interest, 2, 31-74.

Rosenthal, R. (1979). The “file drawer problem” and tolerance for null results. Psychological

Bulletin, 86, 638-641.

* Saunderson, A. (1973). The effect of a special training programme on spatial ability test

performance. New Guinea Psychologist, 5, 15-23.

* Savage, R. (2006). Training wayfinding: Natural movement in mixed reality. Unpublished

doctoral dissertation, University of Central Florida.

* Schaeffer, P. D., & Thomas, J. (1998). Difficulty of a spatial task and sex difference in gains

from practice. Perceptual & Motor Skills, 87, 56-58.

* Schaie, K. & Willis, S. L. (1986). Can decline in adult intellectual functioning be reversed?

Developmental Psychology, 22, 223-232.

* Schmitzer-Torbert, N. (2007). Place and response learning in human virtual navigation:

Behavioral measures and gender differences. Behavioral Neuroscience, 121, 277-290.



* Schofield, N. J., & Kirby, J. R. (1994). Position location on topographical maps: Effects of

task factors, training and strategies. Cognition and Instruction, 12, 35-60.

* Seddon, G. M., & Shubber, K. E. (1984). The effects of presentation mode and colour in

teaching the visualization of rotation in diagrams of molecular structures. Research in

Science in Technological Education, 2, 167-176.

* Seddon, G. M., & Shubber, K. E. (1985a). The effects of colour in teaching the visualization

of rotation in diagrams of three-dimensional structures. British Educational Research

Journal, 11, 227-239.

* Seddon, G. M., & Shubber, K. E. (1985b). Learning the visualization of three-dimensional

spatial relationships in diagrams at different ages in Bahrain. Research in Science and

Technological Education, 3, 97-108.

* Seddon, G. M., Eniaiyeju, P. A., & Jusoh, I. (1984). The visualization of rotation in diagrams

of three-dimensional structures. American Educational Research Journal, 21, 25-38.

* Shavalier, M. (2004). The effects of CAD-like software on the spatial ability of middle school

students. Journal of Educational Computing Research, 31, 37-49.

Shea, D. L., Lubinski, D., & Benbow, C. P. (2001). Importance of assessing spatial ability in

intellectually talented young adolescents: a 20-year longitudinal study. Journal of

Educational Psychology, 93, 604-614.

Sherman, J. A. (1967). Problem of sex differences in space perception and aspects of intellectual

functioning. Psychological Review, 74, 290-299.

Shonkoff, J., & Phillips, D. A (Eds.). (2000). From neurons to neighborhoods. Washington, DC:

National Academy of Sciences.



* Shubbar, K. E. (1990). Learning the visualisation of rotations in diagrams of three dimensional

structures. Research in Science and Technological Education, 8, 145–154.

* Simmons, N. A. (1998). The effect of orthographic projection instruction on the cognitive style

of field dependence-independence in human resource development graduate students.

Unpublished doctoral dissertation, Clemson University, 1990.

* Sims, V. K. & Mayer, R. E. (2002). Domain specificity of spatial expertise: The case of video

game players. Applied Cognitive Psychology, 16, 97-115.

* Smedslund, J. (1963). The effect of observation on children's representation of the spatial

orientation of a water surface. The Journal of Genetic Psychology, 102, 195-201.

* Smith, G. G. (1998). Computers, computer games, active control and spatial visualization

strategy. Unpublished doctoral dissertation, Arizona State University.

* Smith, W. S., & Litman, C. I. (1979). Early adolescent girls' and boys' learning of a spatial

visualization skill. Science Education, 63, 671-676.

* Smith, W. S., & Schroeder, C. K. (1979). Instruction of fourth grade girls and boys on spatial

visualization. Science Education, 63, 61-66.

* Sorby, S. A., & Baartmans, B. J. (1996). A course for the development of 3-D spatial

visualization skills. Engineering Design Graphics Journal, 60, 13-20.

* Sorby, S. A. (2008). Applied Educational Research in Developing 3-D spatial skills for

engineering students. Unpublished manuscript.

Spelke, E. S. (2005). Sex differences in intrinsic aptitude for mathematics and science?: A

critical review. American Psychologist, 60, 950–958.

* Stericker, A., & LeVesconte, S. (1982). Effect of brief training on sex-related differences in

visual-spatial skill. Journal of Personality & Social Psychology, 43, 1018-1029.



* Stringer, P. (1975). Drawing training and spatial ability. Ergonomics, 18, 101-108.

* Subrahmanyam, K. & Greenfield, P. M. (1994). Effect of video game practice on spatial skills

in girls and boys. Journal of Applied Developmental Psychology, 15, 13-32.

Terlecki, M. S., & Newcombe, N. S. (2005). How important is the digital divide? The relation

of computer and videogame usage to gender differences in mental rotation ability. Sex

Roles, 53, 433-441.

* Terlecki, M. S., Newcombe, N. S., & Little, M. (2007). Durable and generalized effects of

spatial experience on mental rotation: Gender differences in growth patterns. Applied

Cognitive Psychology. Retrieved online 3/19/2008 from www.interscience.wiley.com.

DOI: 10.1002/acp.1420.

* Thomas, D. A. (1996). Enhancing spatial three-dimensional visualization and rotational

ability with three-dimensional computer graphics. Unpublished doctoral dissertation,

United States International University, San Diego.

* Trethewey, S. O. (1990). Effects of computer-based two and three dimensional visualization

training for male and female engineering graphics students with low, middle, and high

levels of visualization skill as measured by mental rotation and hidden figures tasks.

Unpublished doctoral dissertation, Ohio State University.

* Turner, G. F. W. (1997). The effects of stimulus complexity, training, and gender on mental

rotation performance: A model-based approach. Unpublished doctoral dissertation, The

Pennsylvania State University.

U. S. Department of Education (2008). The Final Report of the National Mathematics Advisory

Panel. Retrieved online on June 24, 2008 from

http://www.ed.gov/about/bdscomm/list/mathpanel/report/final-report.pdf

* Van Voorhis, W. R. (1941). The improvement of space perception ability by training.

Unpublished doctoral dissertation, The Pennsylvania State University.

* Vasta, R., Knott, J. A., & Gaze, C. E. (1996). Can spatial training erase the gender differences

on the water-level task? Psychology of Women Quarterly, 20, 549-567.

Vasta, R., Rosenberg, D., Knott, J. A., & Gaze, C. E. (1997). Experience and the water-level task

revisited: Does expertise exact a price? Psychological Science, 8, 336-339.

* Wang, C. H., Chang, C. Y., & Li, T. Y. (2006). The comparative efficacy of 2D-versus 3D-

based media design for influencing spatial visual skills. Computers in Human Behavior,

23, 1943-1957.

* Weidenbauer, G., Schmid, J., & Jansen-Osmann, P. (2006). Manual training of mental rotation.

European Journal of Cognitive Psychology, 19, 17-36.

* Wilmshurst, L. A. & Rubin, K. H. (1974, January). Training Preschoolers to Understand Left-

Right and Spatial Relations. Research in Education.

Wilson, D. B. (2002). SPSS macros for meta-analysis. Downloaded with permission from

http://mason.gmu.edu/~dwilsonb/ma.html

* Workman, J. E., Caldwell, L. F., & Kallal, M. J. (1999). Development of a test to measure

spatial abilities associated with apparel design and product development. Clothing &

Textiles Research Journal, 17, 128-133.

* Workman, J. E. & Lee, S. H. (2004). A cross-cultural comparison of the apparel spatial

visualization test and paper folding test. Clothing & Textiles Research Journal, 22, 22-

30.

* Workman, J. E. & Zhang, L. (1999). Relationship of general and apparel spatial visualization

ability. Clothing & Textiles Research Journal, 17, 169-175.



* Worsencroft, R. R. (1955). The effect of training on the spatial visualizing ability of

engineering students. Journal of Engineering Drawing, 19, 7-12.

* Wright, R., Thompson, W. L., Ganis, G., Newcombe, N. S., & Kosslyn, S. M. (2008). Training

generalized spatial skills. Unpublished manuscript.

* Yates, L. G. (1986). Effect of visualization training on spatial ability test scores. Journal of

Mental Imagery, 10, 81-91.



Author Note

Linda L. Liu, David H. Uttal, Loren M. Marulis, Christopher M. Warren, and Alison R.

Lewis, Department of Psychology, Northwestern University.

Nora S. Newcombe, Department of Psychology, Temple University.

We thank Kate O’Doherty, Bridget O’Brien, Maggie Carlin, Melissa Sifuentes, Bonnie

Vu, and Eleanor Tushman for their assistance in coding the studies included in this meta-analysis.

Correspondence concerning this article should be addressed to Linda L. Liu (linda-

liu@northwestern.edu), Department of Psychology, Northwestern University, 2029 Sheridan

Road, Evanston, IL 60208-2710.



Footnotes
1. We excluded cases in which the training outcome was a single summary score from an entire psychometric test (e.g., the WPPSI-R or the Kit of Factor Referenced Tests) with no breakdown by subtest. Because standardized test batteries have high internal consistency, their summary scores can appear highly malleable; such gains, however, reflect the consistency of the test items rather than the training procedures themselves.
2. The Q statistic is the total homogeneity statistic. When an analysis analogous to ANOVA is performed on effect sizes, Q is partitioned into the portion accounted for by the grouping variable and the within-groups residual. It follows a chi-square distribution (Lipsey & Wilson, 2001).
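As an illustrative sketch of the standard computation (these are the conventional meta-analytic formulas, not anything specific to our coding): with k effect sizes $g_i$, estimated variances $v_i$, weights $w_i = 1/v_i$, and weighted mean $\bar{g} = \sum_i w_i g_i / \sum_i w_i$,

$$ Q = \sum_{i=1}^{k} w_i (g_i - \bar{g})^2, \qquad Q_{total} = Q_{between} + Q_{within}, $$

and under the null hypothesis that all studies share a single population effect, Q follows a chi-square distribution with k - 1 degrees of freedom.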



Table captions

Table 1. Defining characteristics of the outcome measure categories and their correspondence to

categories used in past research.

Table 2. Study characteristics for the 101 studies remaining in the meta-analysis after the

exclusion of outliers.

Table 3. Mean effect sizes for control groups, treatment groups, and Ec summarized by outcome

measure category.

Table 4. Treatment group effect sizes by outcome measure and study variables.

Table 5. Treatment group effect sizes by outcome measure and type of training.

Table 6. Number of effect sizes organized by quartile as a function of outcome measure.

Table 7. Mean effect sizes for control groups and treatment groups by sex.

Table 8. Mean effect size that summarizes the size of the sex difference favoring males over

females. Averaged over all outcome measures and also presented for each outcome measure.

Table 9. Mean effect sizes for control groups and treatment groups for children vs. adults.

Table 10. Average treatment and control group effect size for younger and older children and

adults.

Table 11. Proportion of each type of control group used by age.



Figure Captions

Figure 1. Flowchart used for classifying effect sizes by method of training.

Figure 2. Distribution of raw and winsorized effect sizes.

Figure 3. Comparison of treatment vs. control group improvement for spatial perception and

spatial principles.

Figure 4. Relative proportion of control group effect size for each type of control group.

Figure 5. Mean effect sizes for types of test-retest control groups.

Figure 6. Number of separate tests included in training regimen (control group effect sizes).

Figure 7. Mean effect size as a function of time between end of training and posttest.

Figure 8. Comparison of effect sizes for immediate posttest vs. delayed posttest among studies

that included both types.

Figure 9. Mean effect size for near and far transfer after immediate and delayed administration

of posttest.

Figure 10. Scatterplot of mean effect size g vs. HDI ranking of country.

Figure 11. Three hypothetical scenarios for the effect of training on sex differences in spatial

skill.

Figure 12. Mean effect size by type of control group and age.

Table 1

Outcome category | Description | Examples of measures | Linn and Petersen (1985) | Carroll (1993)

Spatial perception | Perceiving objects, paths, or spatial configurations amidst distracting background information. | Embedded Figures Task, Flexibility of Closure, Mazes, Gottschaldt Hidden Figures, Purdue Perceptual Screening Test | Spatial Visualization | Visuospatial Perceptual Speed

Assembly/Transformation | Piecing together objects into more complex configurations, or visualizing and mentally transforming objects, often from 2D to 3D or vice versa. | Form Board, Block Design, Paper Folding, Mental Cutting, SR-DAT-visualization, Surface Development Test | Spatial Visualization | Spatial Visualization

Rotation | Rotating 2D or 3D objects. | Vandenberg Mental Rotation, Cube Comparison, Wheatley Spatial Test, Guilford-Zimmermann Clock task, Card Rotation, Alphanumeric Rotation, Thurstone's Cards and Figures, Purdue Spatial Visualization Test | Mental Rotation | Spatial Relations/Speeded Rotation

Spatial principles | Understanding abstract spatial principles, such as horizontal invariance or verticality. | Water-level task, Water-clock, Plumb-line, Cross-bar, Rod and Frame task | Spatial Perception | Not included

Perspective taking | Visualizing an environment in its entirety from a different position. | Piaget's Three Mountains Task, Guilford-Zimmerman spatial orientation | Not included | Not included

Table 2

Variable n (studies) % of studies


Subject characteristics
Gender composition
All males 11 10.89
Both males and females 35 34.65
All females 5 4.95
Not specified a 50 49.50
Age of participants (months) b
60 months or younger or preschool 6 5.94
61 through 138, or elementary school 27 26.73
139 through 179, or junior high 12 11.88
180 through 215, or high school 7 6.93
216 and older AND college student 53 52.48
216 and older AND non-college student 10 9.90
Age range of group (years)
Younger than 13 30 29.70
13 to 18 16 15.84
18 and older 58 57.43
Study methods and procedures
Performed random assignment of subjects 48 47.52
Prescreened to include only low-scorers 11 10.89
Took place in a classroom 41 40.59
Provided feedback during training 29 28.71
Study design
Pretest-posttest with control 72 71.29
Treatment vs. Control 19 18.81
Pretest-posttest only 14 13.86
Nature of control group b
No control group 17 16.83
Received no treatment or “waitlist” 31 30.69
Business as usual 23 22.77
Diluted version of treatment 15 14.85
Alternative treatment 20 19.80
Length of training period days (out of 96 studies)
1 (one-time session) 31 32.29
2 – 30 days 35 36.46
31 – 60 days 10 10.42
61 – 112 days (about 1 quarter or semester) 16 16.67
113 – 266 days (about 1 school year) 5 5.21
More than 266 days 2 2.08
Total hours of training (out of 64 studies)
Less than 1 hour 11 17.19
1 – 2 hours 12 18.75
3 – 8 hours 21 32.81

9 – 20 hours 16 25.00
21 – 40 hours 5 7.81
41 – 100 hours 2 3.13
More than 100 hours 2 3.13
Total number of sessions (out of 77 studies)
1 (one-time session) 33 42.86
2 7 9.09
3–7 13 16.88
8 – 14 21 27.27
15 – 21 8 10.39
Frequency of training (out of 78 studies)
One-time session 37 47.44
One session per 1-2 weeks 14 17.95
2-3 sessions per week 14 17.95
4-5 sessions per week or “daily” 14 17.95
Days from end of training to posttest (out of 98)
None (immediate posttest) 84 85.71
1–6 7 7.14
7 – 31 11 11.22
61-90 1 1.02
More than 90 1 1.02
Categories of outcome measures b
Spatial principles c 11 10.89
Spatial perception 21 20.79
Perspective taking 9 8.91
Mental rotation 48 47.52
Assembly/Transformation 50 49.50
Training categories
Videogames 18 17.82
Courses
Course alone was treatment 15 14.85
Enhanced course 27 26.73
Spatial training
Tested effects of repeated practice 36 35.64
Tested for transfer to untrained tasks 83 82.18
Study characteristics
Published (out of 113 studies) 76 75.25
Publication year (out of 113 studies)
Through 1970s 27 26.73
1980s 20 19.80
1990s 31 30.69
2000s 24 23.76
Location of study b
Australia 1 1%
Austria 1 1%
Canada 6 7%

China 1 1%
Germany 2 2%
Greece 1 1%
Korea 3 3%
Norway 1 1%
Spain 1 1%
The Netherlands 1 1%
United Kingdom 2 2%
United States 84 83%
a Data were not reported in a way that allowed separate effect sizes to be obtained for each sex.
b Percentages do not sum to 100% because some studies tested multiple age groups, used more than one type of control group, included outcome measures from multiple categories, or tested participants from more than one country.
c Same as Linn and Petersen's (1985) category of Spatial Perception.

Table 3
Outcome category | Control g (SE), N | Treatment g (SE), N | C vs. T sig. | Ec g (SE), n
Spatial perception .65 (.10) 11 .96 (.11) 11 p < .05 .52 (.12) 11
Perspective taking .46 (.12) 5 .89 (.10) 5 p < .01 .89 (.18) 5
Assembly/transform. .71 (.05) 25 .78 (.05) 25 n. s. .54 (.07) 25
Spatial principles .18 (.09) 7 .75 (.07) 7 p < .001 .89 (.11) 7
Mental rotation .51 (.04) 31 .67 (.04) 31 p < .01 .61 (.06) 31
AVERAGE .56 (.03) 55 .75 (.03) 55 p < .001 .62 (.04) 55

† Homogeneity achieved. a,b Groups labeled with different superscripts are significantly different. * Age x Control group type χ2 significant, p < .05.

Table 4
Columns: Spatial perception | Perspective taking | Assembly/transformation | Spatial principles | Mental rotation | AVERAGE of Trt groups
Training length (days) 1 (97) 10 (70) 21 (111) 1 (69) 21 (195) 21 (195)
Hours of training 1 (6) 2.67 (7) 9.58 (400) .75 (5) 6.13 (67) 6.13 (400)
Retention length (days) 0 (9) 0 (14) 0 (90) 0 (0) 0 (140) 0 (140)
Number of items/trials & 34 (12) 7 (2) 56 (27) 10 (5) 214 (108) 104 (122)

Effect size by Age p > .30 p > .26 p > .25 p > .63 p > .33 p > .52
Under 13 .69 (.31), 4 .99 (.14), 13 .52 (.21), 4 .79 (.11), 22 .55 (.10), 13 .73 (.06), 56
13 – 18 -- -- .88 (.10), 17 -- .75 (.10), 14 .83 (.07), 31
Over 18 1.04 (.15), 19 .53 (.39), 1 .76 (.07), 39 .71 (.13), 10 .67 (.04), 82 .74 (.03), 151

Effect size by training frequency n. s. n. s. n. s. n. s. p < .001 p > .18
One-time session 1.03 (.17), 15 1.03 (.15), 11 .46 (.24), 3 .78 (.08), 31 .38 (.08), 19 .72 (.05), 79
Once per 1-2 weeks .69 (.27), 5 .48 (.36), 1 .80 (.09), 20 -- .93 (.07), 27 .85 (.06), 53
2-3 times a week -- 1.41 (.51), 1 .64 (.16), 7 .28 (.40), 1 .69 (.08), 18 .68 (.08), 27
4-5/ week; “daily” -- -- .89 (.11), 16 -- .59 (.08), 18 .73 (.07), 34
Avg. multi-session a .67 (.24), 6 .68 (.21), 3 † .83 (.06), 47 .28 (.40), 1 .81 (.04), 75 .81 (.04), 132

Feedback provided after each trial? p < .05 n. s. p < .01 p < .05 n. s. n. s.
Yes .48 (.25), 5 .81 (.31), 2 † .99 (.09), 17 .89 (.10), 21 .63 (.06), 33 .77 (.05), 78
No 1.13 (.14), 18 .97 (.14), 12 .70 (.06), 43 .51 (.13), 11 † .67 (.04), 75 .73 (.03), 159

Study published? p > .37 p > .26 p > .56 -- p > .82 p > .69
Yes 1.01 (.14), 21 .99 (.14), 13 .76 (.07), 38 .76 (.60), 32 .66 (.04), 75 .76 (.03), 179
No .61 (.43), 2 .53 (.39), 1 .82 (.09), 22 -- .68 (.06), 34 .73 (.05), 59

Random assignment? p > .22 p > .75 p < .05 p > .48 p < .01 p < .001
Yes .90 (.14), 19 .85 (.32), 2 .64 (.08), 27 .80 (.11), 20 .57 (.05), 57 .66 (.04), 125

No 1.31 (.31), 4 .96 (.14), 12 .90 (.07), 33 .68 (.14), 12 .78 (.05), 52 .85 (.04), 113

Classroom? p > .58 p > .19 p > .10 p > .55 p > .11 p > .39
Took place in classroom .65 (.61), 1 .84 (.15), 8 .89 (.08), 22 .85 (.15), 12 .60 (.06), 41 .73 (.04), 84
Outside of classroom .99 (.14), 22 1.22 (.25), 6 .72 (.07), 38 .74 (.10), 19 .72 (.05), 68 .77 (.03), 153
† Homogeneous (p > .005). Groups are significantly different: * p < .05, + p < .01, & p < .001. a Average of all multi-session groups.

Table 5
Columns: Spatial perception | Perspective taking | Assembly/transformation | Spatial principles | Mental rotation | AVERAGE of Trt groups
Courses (Trt groups only) 1.14 (.47), 3 -- .91 (.08), 29 -- .50 (.05), 35 .70 (.05), 67
Course served as treatment 1.65 (.66), 2 -- 1.21 (.15), 9 * -- .76 (.15), 4 1.11 (.10), 15 &
Enhanced course .23 (.90), 1 -- .77 (.10), 20 * -- .46 (.06), 31 .58 (.05), 52 &

Pretest-posttest design -- -- .53 (.08), 12 -- 1.28 (.14), 17 .93 (.09), 29


Course served as treatment -- -- 1.10 (.18), 2 & -- 1.63 (.19), 11 + 1.32 (.14), 15 &
Enhanced course -- -- .39 (.09), 10 & -- .75 (.24), 6 + .56 (.13), 14 &

Trt vs. Control design (Ec) -- -- .70 (.16), 17 -- -- .53 (.10), 31


Course served as treatment -- -- 1.39 (.33), 5 * -- -- 1.21 (.24), 7 *
Enhanced course -- -- .48 (.19), 12 * -- -- .37 (.12), 24 *

Videogames .56 (.22), 3 .53 (.37), 1 .63 (.07), 12 .28 (.38), 1 .75 (.06), 40 .71 (.05), 57
Non-MR videogame -- -- .35 (.09), 4 & -- .72 (.08), 27 .63 (.06), 36 *
MR videogame -- -- .91 (.09), 8 & -- .85 (.12), 13 .87 (.09), 21 *

Spatial training p > .93 p > .31 p < .05 p > .55 p > .24 p > .49
Repeated practice 1.01 (.18), 4 .90 (.15), 7 1.10 (.21), 3 * .82 (.10), 17 .53 (.09), 20 .93 (.07), 34
Transfer from training on a
.94 (.12), 14 1.17 (.24), 6 .65 (.07), 32 * .72 (.13), 14 .62 (.06), 41 .65 (.04), 131
different spatial task
"Near" transfer 1.66 (.15), 8 & | -- | .65 (.19), 3 | .73 (.45), 14 | 1.03 (.08), 11 & | 1.01 (.07), 36 &
"Far" transfer .29 (.14), 6 & | 1.38 (.36), 6 | .64 (.06), 29 | -- | .44 (.05), 30 & | .56 (.04), 79 &

-- No cases found
† Homogeneity achieved (p > .005)
Groups are significantly different: * p < .05, + p < .01, & p < .001

Table 6
Outcome measure*: Spatial perception | Perspective taking | Assembly/transformation | Spatial principles | Mental rotation

Q1 4 0 13 11 32

Q2 6 5 22 5 25

Q3 4 1 19 7 28

Q4 9 8 14 9 24

† Homogeneity achieved. a,b Groups labeled with different superscripts are significantly different. * Outcome measure x Quartile χ2 significant, p < .05.

Table 7

Control a Experimental b

Females .53 (.05), 66 .81 (.05), 69

Males .49 (.05), 63 .71 (.05), 63

† Homogeneity achieved. a,b Groups labeled with different superscripts are significantly different.

Table 8

Sex difference at pretest | Sex difference at posttest | Pretest-posttest significance

Overall .50 (.04), 48 .44 (.04), 48 p > .35

Spatial perception .47 (.11), 9 .36 (.11), 9 p > .46

Perspective taking -- -- --

Assembly/transformation .47 (.13), 6 .48 (.13), 6 p > .96

Spatial principles .32 (.10), 2 .24 (.09), 2 p > .54

Mental rotation .52 (.06), 31 .47 (.06), 31 p > .51

† Homogeneity achieved. a,b Groups labeled with different superscripts are significantly different.

Table 9

Control a Experimental b

Children .41 (.05), 62 a .69 (.05), 89

Adults .61 (.03), 170 b .77 (.03), 167

† Homogeneity achieved. a,b Groups labeled with different superscripts are significantly different.

Table 10

Control Experimental

Younger than 13 .34 (.07), 32 a .73 (.06), 56

13 – 18 years .69 (.06), 43 b .79 (.07), 39

Older than 18 years .57 (.03), 149 b .74 (.03), 151

† Homogeneity achieved. a,b Groups labeled with different superscripts are significantly different.

Table 11
Columns, Separable studies: No trt | Trt as usual | Diluted trt | Alternative trt; All studies*: No trt | Trt as usual | Diluted trt | Alternative trt
Younger than 13: 6 | 4 | 2 | 2 || 14 | 5 | 5 | 3
13 – 18 years: 3 | 6 | 2 | 1 || 4 | 8 | 2 | 1
Older than 18 years: 11 | 7 | 3 | 11 || 14 | 10 | 8 | 16

† Homogeneity achieved. a,b Groups labeled with different superscripts are significantly different. * Age x Control group type χ2 significant, p < .05.

Figure 1

[Flowchart recovered as a decision sequence: Start by reading about the training for the outcome measure in the study (code each line of the study separately). Does training consist of playing video or computer games? If yes, stop and enter VIDEOGAMES. If no: Were Ss enrolled in a course where the training occurred? If yes: Did training match the outcome measure? If yes, stop and enter SPECIFIC; if no, stop and enter COURSES. If no course: Were Ss trained on the same task as the outcome measure? If yes, stop and enter SPECIFIC; the terminal for the remaining branch is not recoverable from the source.]

Figure 2

[Figure residue removed; the plot compares the distributions of unadjusted and winsorized effect sizes g, on a y-axis from 0.0 to 6.0.]

Figure 3

[Bar chart, mean effect size (g) for control vs. treatment groups: spatial principles, control 0.18, treatment 0.76; spatial perception, control 0.64, treatment 0.95.]

Figure 4

[Bar chart residue removed; the figure plots the proportion of control-group effect sizes contributed by each type of control group (no control group, receives nothing, treatment as usual, diluted treatment, alternative treatment) within each outcome category.]

Figure 5

[Bar chart: mean effect size (g) for test-retest control groups by type of filler task (nonspatial vs. spatial), shown separately for single vs. multiple measures; bar values are 0.37, 0.54, 0.65, and 0.66.]

Figure 6

[Bar chart: mean control-group effect size (g) by number of test-retest measures per study: one measure 0.49, 2 - 4 measures 0.59, 5 or more measures 0.78.]

Figure 7

[Bar chart: mean effect size (g) by time between end of training and posttest: immediate test 0.76, up to 2 weeks 0.59, more than 2 weeks 0.57.]

Figure 8

[Bar chart: among studies including both, effect size (g) was 0.64 at immediate posttest and 0.65 at delayed posttest.]

Figure 9

[Bar chart: mean effect size (g) for near vs. far transfer at immediate vs. delayed posttest; bar values are 1.05, 0.72, 0.58, and 0.37, with near transfer exceeding far transfer.]

Figure 10

Figure 11

[Schematic residue removed; panels (a) and (b) sketch hypothetical male (M) and female (F) performance trajectories under training; the remaining panel detail is not recoverable from the source.]

Figure 12

[Bar chart: average effect size (g), from -0.2 to 1.2, by age group (younger than 13, 13 - 18 years, older than 18) and type of control group (receives nothing, treatment as usual, diluted treatment, alternative treatment); within-age contrasts marked p < .05, p < .01, and p = .08.]

Appendix

Mean effect sizes and key characteristics of studies included in the meta-analysis.

Columns: Authors | Training description | Training category a | Outcome measure | Outcome category b | Sex c | Age d | g | k
Arond 1966 step-by-step training vs. perspective taking
confrontation training 1 task (author 2 3 2 .524 3
vs. control (no training) created)
Barsky & Lachman 1986 - Not separated Physical knowledge vs.
Reference system vs. RFT (Rod and
1, 2 3 1 1 .445 2
control (observe and Frame Task)
think only)
Barsky & Lachman 1986 - Control Water level task,
Barsky & Lachman 1986 - Exp Physical knowledge vs. Plumb line task, .151 6
Reference system vs. 1, 2 Embedded Figures 3, 4, 5 1 1
control (observe and Test, Primary
think only) Mental Abilities- .666 8
Space Relations
Batey 1986 Highly-specific training
vs. non-specific training SR-DAT,
(instruction in 1, 2 Horizontality test, 1, 3, 4, 5 1, 2 2 .504 16
orthographic projection) VMRT, GEFT
vs. control (no training)
Battista 1982 Effect of geometry
course on those high
vs. low in cognitive 3 PSVT-R 4 1 1 .495 1
development (by
Sheehan’s Longeot).
Beilin, Kagan & Rabinowitz 1966 - Control Perceptual training on
WLT, with or without .061 2
motor response, vs. Water level task
Beilin, Kagan & Rabinowitz 1966 - Exp 2 3 3 2
Verbal program vs. jar-
covered or jar-exposed .848 12
control
Ben-Chaim, Lappan & Houang 1988 5th, 6th, 7th, and 8th Middle Grades
graders from inner city, Mathematics
rural, and suburban Project (Spatial
schools trained in 2 Visualizations Test) 7 1, 2 2 1.08 14
spatial visualization
(concrete activities,
building, drawing solids)
Blade & Watson 1955 Cooper Union, West 3 College Entrance 7 2 1 .717 9

Point, University of Examination Board


Wisconsin Engineering test (Spatial
students after 1 year of Relations Test
engineering coursework Form VAC-1)
vs. control (non-
engineering courses)
Literature students with .42 2
Blatter 1983 - Control spatial relations
instruction (2 and 3D 2 SR-DAT 1 1, 2 2
visualization, block .53 2
Blatter 1983 - Exp counting, graphing) vs.
normal literature course
Braukmann 1991 - Control Along with lecture on
orthographic projection, TOPS, SMT cube .315 3
Braukmann 1991 - Exp 2 1, 4 3 1
3D CAD vs control test
(traditional 2D manual .636 2
drafting) training
Brinkmann 1966 8th graders: 505-item
program to improve
geometry visualization 2 SR-DAT 1 3 2 1.151 1
vs. control (regular
math class)
Brown 1954 - Control At 3 schools, 2 year vs.
1 year vs. 6 months 1.171 10
plane, advanced and
3 1 3 1
solid geometry vs. SR-DAT
Brown 1954 - Exp control group of math
enthusiasts (who later 1.533 5
took solid geometry)
Carpenter, Brinkmann & Lirones 1965 In 4 school districts,
space relations training
(develop flat patterns 2 SR-DAT 1 3 2
1.396 1
into solids using tactile
training, films) vs.
control (no training)
Chance & Goldstein 1971 Tested effect of Embedded Figures
repeated practice (no 1 Task 5 1, 2 1 1.116 2
control group)
Chatters 1984 Groups comparable in
visual-motor perceptual
skill given video-game WISC - Block
4 1, 5 3 2 .557 2
training (Space Design, Mazes
Invaders) vs. control
(no training)

Churchill, Curtis, Coombs & Harrell 1942 - Drafting course vs. Surface 1.255 1
Control control (Water Development Test
3 1 2 1
Churchill, Curtis, Coombs & Harrell 1942 - Purification course) 1.391 1
Exp
2D and 3D visualization Multiple Aptitudes .678 2
Ciganko 1973 - Control practice vs. control (2D Test of 2-D Spatial
2 1 3 2
and 3D observational Relations .797 2
Ciganko 1973 - Exp practice)
Clements et al. 1997 Geometry training in
slides, flips, turns etc. Wheatley Spatial
4 4 1, 2 2 1.191 2
using video game Test (MRT)
Tumbling Tetronimoes
Folding Blocks
Connor, Schackman & Serbin 1978 – Unsep Task (adapted from 1 1, 2 2 .259 2
SR-DAT)
Connor, Schackman & Serbin 1978 – Control Training in visuospatial .618 2
1, 2
disembedding vs.
control (no training) Children's 5 1, 2 2
.969 2
Connor, Schackman & Serbin 1978 – Exp Embedded Figures
Test

Day 1997 Block design training 1 Block Design 1 3 2 1.533 2


(WPPSI subtest)
DeLisi & Cammarano 1996 - Control Video game training .284 2
with Blockout vs. VMRT
4 4 1, 2 1
DeLisi & Cammarano 1996 - Exp control (solitaire) .566 2

DeLisi & Wolford 2002 - Control Video game training French Kit Card .341 2
with Tetris vs. control 4 Rotation Test 4 1, 2 2
DeLisi & Wolford 2002 - Exp (Carmen Sandiego) .597 2
Deratzou 2006 Visualization training Card rotation, cube
with problems sets, comparison, Form
journals, videos, lab 3 Board, Paper 4 1, 2 2 .583 10
experiments, computers Folding, Surface
Development Test
.265 2
Dorval & Pepin 1986 - Control Zaxxon video game
playing vs. control (no Embedded Figures
game play) 4 Test 5 1, 2 1
.540 2
Dorval & Pepin 1986 - Exp

Drauden 1980 - Control Spatial orientation


training focusing on MR Form Board, 1.006 12
using Cards and Vandenberg-Kuse
2 1, 4 1, 2 1
Drauden 1980 - Exp Figures, Cubes Test, MRT
and Guilford’s Clocks vs .956 14
control (no training)
Duesbury & O'Neil 1996 Wireframe CAD training Flanagan Industrial
on orthographic Tests Assembly,
projection, line-feature SR-DAT, Surface
matching, 2 and 3D 2 Development Test, 1 2 1 .690 4
visualization vs. control Paper Folding Test
(traditional blueprint
reading course)
Eliot 1966 K, 1st and 3rd graders
given non-reversible or Mapboard task
reversible and simple or (based on Three
2 2 3 2 .239 12
complex training on 3 Mountains task)
Mountains Task vs.
control (went to library)
Embretson 1987 Paper folding training
vs. control (clerical 2 SR-DAT 1 3 1 .686 3
training)
Emler 1982 - Control Reconstruct model Construction task:
village with Combined Make model of .450 3
conflict vs. Inter- village from
Emler 1982 - Exp individual conflict vs. 1 different 2 3 2
Intra-individual conflict perspectives
vs. collective conflict vs. .896 7
control (direct practice)
Faubion, Cleveland & Harrel 1942 Mechanical course
training vs. control (also Surface
soldiers, matched in 3 Development Test 1 2 1 .056 1
mental test scores, no
mechanical training)
Feng, Spence & Pratt 2007 - Control Training using action
vs. control (nonaction VMRT .330 4
videogame) 4 4 1, 2 1
Feng, Spence & Pratt 2007 - Exp
1.279 4
Feng 2006 - Control Training using action .274 2
vs. control (nonaction VMRT
4 4 1, 2 1
Feng 2006 - Exp videogame) 1.106 2

Ferrini-Mundy 1987 - Unseparated Audiovisual spatial 3 1 3 1 .519 2


visualization training, SR-DAT
with or without tactual

practice, vs. Control1


(posttest only group)

Ferrini-Mundy 1987 - Control Audiovisual spatial


visualization training, .655 1
with or without tactual SR-DAT
3 1 3 1
Ferrini-Mundy 1987 1 - Exp practice vs. Control2
(calculus course as .620 2
usual)
Gagnon 1985 - Control Playing 2D Targ and 3D Guilford-
Battlezone video games Zimmermann .388 3
vs. control (no video Spatial Orientation
Gagnon 1985 - Exp game playing) 4 and Visualization; 2, 4, 5 3 1
Employee Aptitude
Survey: Visual .535 3
Pursuit Test
Gerson et al. 2001 - Control Engineering course with SR-DAT, Mental .841 5
lecture and spatial Cutting Test, MRT,
Gerson et al. 2001 - Exp modules vs. control 3 3DC (cube test), 1, 4 3 1
(course and modules PSVT-R 1.067 5
without lecture)
Geva 1987 - Control 2nd vs. 4th graders - 7 Map reading task
months of LOGO .249 6
Geva 1987 - Exp instruction vs control 4 4 3 2
(regular computer use .368 6
in school)
Gittler & Gluck 1998 - Control Training in Descriptive 3DC (Three-
Geometry vs. control dimensional cube .296 2
Gittler & Gluck 1998 - Exp (no course) 3 test) 4 1, 2 2
.622 2

Golbeck 1998 - Control 4th vs. 6th grade,


matched ability vs. .304 2
unmatched-high vs. Water level task
Golbeck 1998 - Exp 1 3 3 2
unmatched-low vs.
control (worked alone) .528 6

Heil et al. 1998 Practice group with RT mental


additional, specific rotations-familiar
practice vs. control (3 objects in familiar
1 4 3 1 .562 2
sessions of MR practice orientations
without additional
specific practice)
Hsi, Linn & Bell, 1997 Pretest vs. posttest 2 Paper folding, cube 1, 4, 7 1, 2 1 .517 4
after strategy instruction counting, matching

using Block Stacking rotated objects,


and Display Object spatial battery of
software modules with orthographic,
isometric vs. isometric views
orthographic items
Johnson 1991 Isometric drawing aid
vs. 3D rendered model SR-DAT
vs. animated wireframe
2 1 3 1 .016 4
vs. control (no aid,
practice with drawings
only)
Johnson et al.1979 - Control Liberal arts vs. drafting .887 4
vs. math course vs. Embedded Figures
Johnson et al.1979 - Exp control (no training, 3 Test 5 1, 2 1
1.464 2
math or liberal art
major)
Kaplan & Weisberg 1987 Pretest vs. posttest for Purdue Perceptual
3rd vs. 5th graders vs. Screening Test
1 5 3 2 .430 2
control (no feedback) (embedded and
successive figures)
Kass, Ahlers & Dugger 1998 Practice Angle on the
Bow task with feedback Angle on the Bow
and read instruction 1, 2 measure 5 1, 2 1 .501 8
manual vs. control
(read manual only)
Kastens & Liben 2007 Explaining condition vs. Sticker Map Task
control (did not explain (representational
1 1 3 2 .791 3
sticker placement) correspondence,
errors, offset)
Kastens, Kaplan & Christie-Blick 2001 Training using Where Reality-to-Map
are We? videogame vs. (Flag-Sticker) Test
4 1 1, 2 2 .320 2
control (completed task
without assistance)
Kidder 1973 Training to form and Math Invariance
draw mental image of Test, Achievement
Euclidean test battery
transformation vs. 2 4, 7 3 2 .748 2
control (learning
geometric terms for the
transformations)
Kirby & Boulter 1999 - Control Training in object Factor referenced
manipulation and visual tests (Hidden .055 1
Kirby & Boulter 1999 - Exp imagery vs. paper- 2 patterns, card 7 3 2
pencil instruction vs. rotations, surface .123 2
control (test-only) development)

Kozhevnikov & Thornton 2006 - Control Added Interactive Paper Folding Test,
Lecture Demonstrations MRT .399 9
(ILDs) to physics
Kozhevnikov & Thornton 2006 - Exp instruction for Dickinson
vs. Tufts science and 2, 3 1, 4 3 1
nonscience majors and
.424 9
middle-school and high
school science teachers

Kwon 2002 - Control Visualization software Middle Grades .521 1


using Virtual Reality vs. Mathematics
Kwon 2002 - Exp 2, 4 7 1 2
control (standard 2D Project Spatial .703 1
text and software) Visualization Test
Kwon 2003 - Control Spatial visualization Middle Grades
instructional program Mathematics .150 1
Kwon 2003 - Exp using Virtual Reality vs. 2 Project Spatial 7 3 2
Paper-based instruction Visualization Test .915 2
vs. control (no training)
Larson 1996 Commentary and View point task
movement vs. control (based on 3
1 2 3 2 1.423 1
(no commentary and no Mountains)
movement)
Larson et al.1999 Repeated Virtual VMRT
Reality Spatial Rotation
1 4 1, 2 1 .303 2
training vs. control (filler
task)
Lee 1995 - Control LOGO training vs. .210 1
control (no LOGO 4 Water level task 3 3 2
Lee 1995 - Exp .280 1
training) for 2nd graders
Leino & Willemsen - Control Victor box graduated .250 1
training of horizontality 1 Water level task 3 1 1
Leino & Willemsen - Exp vs. control (no training) .939 1
Lizarraga & Ganuza 2003 - Control Mental rotation training SR-DAT – .256 2
worksheet vs. control visualization and
Lizarraga & Ganuza 2003 - Exp 1, 2 1, 4 3 2
(regular math course) mental rotation .791 2
Lohman & Nichols 1990 - Control Train with repeated Form Board, Paper
practice on 3D MRT - Folding Test, Card 1.01 5
test on speeded Rotations,
Lohman & Nichols 1990 - Exp 1, 2 1, 4 3 1
rotation task vs. control Thurstone’s
(test-retest, without the Figures, MRT .894 4
repeated practice)
Training of spatial ability 110

Longstreth_&_Alcorn_1990 1 - Control Play with blocks WPSSI - Block


different vs. same in Design, Mazes .328 2
Longstreth_&_Alcorn_1990 1 - Exp color as those used in 2 1, 5 3 2
WPSSI vs. control (play .618 4
with non-block toys)
Lord_1985 1 - Control Imagining planes Planes of
cutting through solid Reference, Factor .057 4
training vs. control Referenced Tests-
(regular biology class 1, 2 Spatial Orientation 1, 4, 5, 7 3 1
Lord_1985 1 - Exp with lecture, seminar and Visualization,
and lab) Flexibility of .795 4
Closure
Luursema et al. 2006 Study 3D stereoptic and Identification of
2D anatomy stills vs. anatomical
control (study only structures and
2 1, 5 3 1 .465 2
typical 2D biocular localization of
stills) cross-sections in
frontal view
McClurg & Chaile 1987 - Control 5th vs. 7th vs. 9th grade:
Factory themed vs. Mental Rotations .339 3
McClurg & Chaile 1987 - Exp Stellar 7 mission video 4 Test 4 3 2
games vs. control (no .796 6
video game play)
McCuiston 1991 Computer assisted
descriptive geometry
lesson with animation VMRT
2 4 3 1 .503 1
and 3D views vs.
control (static lessons
with text, no animation)
McGee 1978 Effect of static vs. Shepard & Metzler-
dynamic presentation of inspired MRT
2 4 1, 2 1 .535 2
graphic images in
improving MR
McGillicuddy-DeLisi, DeLisi & Youniss 1978 - 1st, 3rd, 5th grade vs. .278 4
Control college students trained Crossbar task,
1 3 3 1, 2
McGillicuddy-DeLisi, DeLisi & Youniss 1978 - with crossbar task vs. water level task .848 4
Exp control (water level)
Mendicino 1958 - Control Year-long 10th grade
.474 1
vocational machine
Mendicino 1958 - Exp shop class with related 3 SR-DAT 1 2 2
mechanical drawing vs. .444 1
non-engineering class
Training of spatial ability 111

Miller & Kapel 1985 - Control 7th vs. 8th grade Gifted
vs. control (Average Wheatly Spatial .750 4
Miller & Kapel 1985 - Exp ability) students trained 4 Test (MRT 4 3 2
with problem solving .939 4
video game
Miller, Boismier & Hooks 1969 Teacher-directed
training vs. automated Perspective Ability
training in sighting vs. 2 Test (PAT) 2 1, 2 2 .711 2
combination vs. control
(no training)
Moses 1979 Spatial mathematics Form Board,
course including Punched Holes,
lessons on 3D and 2D Card Rotations
objects vs. control from Kit of
3 1, 4 3 2 .516 4
(math class as usual) Reference Tests for
Cognitive Factors,
Gulliksen’s
Identical Blocks
Mullin 2006 Physical vs. cognitive Wayfinding to
vs. no physical control target, pointing to
over navigation, with 1 target, recalling 5 3 1 .392 32
attention vs. distracted object locations
during wayfinding
Okagaki & Frensch 1994 - Control Tetris video game Form Board, .239 6
training vs control (no Card Rotation,
Okagaki & Frensch 1994 - Exp 4 1, 4 1, 2 1
video game) Cube Comparison, .420 6
(from French kit)
Parameswaran 2003 Ages 5, 6, 7, 8, 9: Water level task,
Graduated training vs. Verticality task
Demonstration training 1, 2 3 1, 2 2 .870 40
vs. control (completed
task with no feedback)
Parameswaren 1996 - unseparated Tutor guided direct Van verticality test,
Parameswaren 1996 - Control instruction in principle Water-clock and
1, 2 3 1, 2 1 .703 16
Parameswaren 1996 - Exp vs. Learner guided self- cross-bar tests of
discovery vs. control horizontality
(no feedback) Water level task .103 2
1 3 1, 2 1
.525 4
Pepin & Dorval 1986 - Control Zaxxon video game .157 4
training vs. control (no SR-DAT
4 1 1, 2 1, 2
Pepin & Dorval 1986 - Exp training) .332 4

Piburn et al. 2005 Computer enhanced 2 Surface 1, 4 3 1 .539 3


geology module vs. Development Test-
control (regular geology Visualization and
Training of spatial ability 112

course with standard Orientation (Cube


written manuals) Rotation Test )
Priddle & Rubin 1977 - Control Training left-right Spatial
relations using Verbal- perspectives task 1.173 2
visual vs. Verbal-motor 2 (consider object’s 2 3 2
Priddle & Rubin 1977 - Exp
vs. control (nonspatial appearance from 1.682 4
activity) other views)
Ranucci 1952 - Control Solid geometry training Form Board, SR- .651 3
vs. control (taking non- DAT, Thurstone's
3 1, 4 3 2
Ranucci 1952 - Exp solid geometry math Lozenge Test .590 3
course)
Saunderson 1973 Spatial training with Compilation of 4
manipulatory materials spatial tests: SR-
vs. control (regular DAT, Concealed
2 7 3 1 .551 2
nonspatial, math figures, Fitting
course) shapes, Inverse
drawing
Savage 2006 Repeated practice Time required to
using virtual reality to 1 traverse each tile 5 3 1 .723 6
traverse a maze of maze
Schaeffer & Thomas 1998 Repeated practice on Gottschadlt Hidden
rotated Embedded 1 Figures, EFT 5 1, 2 1 .828 2
Figures Task
Schaie et al. 1986 Spatial training vs. Alphanumeric
control (inductive rotation, Object
1 4 3 1 .417 3
reasoning training) rotation, PMA-
Spatial Orientation
Schmitzer-Torbert 2007 - Control Place vs. Response Percent correct
.882 8
learning of Transfer and Route stability
1, 2 5 1, 2 1
Schmitzer-Torbert 2007 - Exp target vs. control on maze learning
1.645 8
(Training target) for first vs. last trial
Schofield & Kirby 1994 Area (restricted search Location time to
space) vs. Area and mark placement on
Orientation provided, map, Surface
vs. Spatial training Development, Card
(identify features and 1, 2 Rotations (S-1 1, 4 2 1 .491 12
visualize contour map) Ekstrom kit of
vs. Verbal training factor referenced
(verbalize features) vs. cognitive tests)
control (no instructions)
Seddon & Shubber 1984 All vs. half colored 1 Rotations test 4 2 2 .758 9
slides vs. monochrome (author created)
slides shown
simultaneously vs.
cumulatively vs.
Training of spatial ability 113

individually vs. control


Seddon & Shubber 1985a 13-14, 15-16, vs. 17-18 Mental rotations
year-olds, with 0, 6, test (author
9,15 or 18 colored created)
1 4 2 2 1.886 36
structures, with and
without 3, 6, or 9
diagrams
Seddon & Shubber 1985b 13-14, 15-16 vs. 17-18 Framework test,
year-olds Cues test—
Overlap, Angles,
1 Relative size, 1, 4 2 2 .995 18
Foreshortening;
Mental rotation
(author created)
Seddon, Eniaiyeju & Jusoh 1984 D vs. SMD vs. MD Mental rotations
training for those failing test (author
1. 2. 3 or 4 cue tests. created)
Compared 10° vs. 60,°
abrupt vs. dissolving, 1 4 2 1 1.742 24
diagram change for
children remediated in
Stage 1 vs. control (no
remediation)
Shavalier 2004 - Control Trained with Virtus Paper folding test, .435 3
WalkthroughPro Eliot Price Test
Shavalier 2004 - Exp 2 1, 2, 4 3 2
software vs. Control (adaptation of 3 .491 3
group (no treatment) mountains), VMRT
Shubbar 1990 3 vs. 6 vs. 30 second Mental Rotations
rotation speed, with or 1 test (author 4 2 2 2.260 6
without shadow created)
Simmons 1998 - Unseparable Took pretest-posttest on Visualization test,
both Visualization and GEFT
GEFT vs. only
2 1, 5 3 1 .359 3
Visualization vs. no
Visualization (GEFT
posttest only)
Simmons 1998 - Control Self-paced instruction Visualization test,
booklet in orthographic GEFT .565 1
projection vs. control
Simmons 1998 - Exp 2 1, 5 3 1
(professor-led
discussion of .646 1
professional issues)
Training of spatial ability 114

Sims & Mayer 2002 - Control Tetris players vs. non- Paper folding test,
Tetris players vs. Form Board and 1.111 9
control (no video game MRT (with tetris vs.
Sims & Mayer 2002 - Exp 4 1, 4 1 1
play) nontetris shapes or
letters), Card 1.193 9
Rotations
Smedslund 1963
Ink horizontality training Water-level task
1 3 3 2 .184 1

Smith 1998 - Control Active (used computer) visualization .220 1


vs. Passive participants puzzles,
Smith 1998 - Exp 2 1 2 2
(watched actives use polynomial -.254 1
the computer) assembly
Smith & Litman 1979 Tangram training vs. SVAT (Spatial
control (placebo test 2 Visualization 1 1, 2 2 .569 2
given in next room) Abilities Test)
Smith & Schroeder 1979 Tangram training vs. SVAT (Spatial
control (took SVAT 2 Visualization 1 1, 2 2 .947 2
before training only) Abilities Test)
Sorby & Baartmans 1996 Freshman engineering Purdue Spatial
students (male and Visualization Test
female) (PSVT-R) - Score -
3 4 3 1 .926 1
Identify 3D
irregular solid in a
different orientation
Sorby 2007 pretest vs posttest Mental Cutting Test
scores for those in (MCT), DAT-SR,
initial spatial skills Purdue Spatial
course (1 quarter) or 3 Visualization Test 1, 4 3 1 1.718 9
those in multimedia (PSVT-R)
software course (1
semester)
Stericker & LeVesconte - Control spatial visualization SR-DAT, Primary
training vs control (no Mental Abilities- .437 8
Stericker & LeVesconte - Exp training) 1 Space Relations, 1, 4, 5 1, 2 1
VMRT, Group .974 8
Embedded Figures
Stringer 1975 - Control Drawing course training SR-DAT, Form
vs. control (typical Board, Paper .894 5
course assignment) 2 Folding, 2D Card 1, 4 2 1
Stringer 1975 - Exp
Rotations, 3D .850 5
Cube Comparisons
Subrahmanyam & Greenfield 1994 Playing spatial video Computer based
4 5 1, 2 2 2.176 2
game Marble Madness test of dynamic
Training of spatial ability 115

vs. control (quiz show spatial skills


game Conjecture)
Terlecki, Newcombe & Little 2007 - Control Playing tetris along with Paper Folding Test,
repeated practice vs. Surface .629 6
Terlecki, Newcombe & Little 2007 - Exp control (repeated Development Test,
practice only) Guilford-
1, 2, 4 1, 4 1, 2 1
Zimmerman Clock
task, MRT .852 6

Thomas 1996 - Control 3D CADD instruction Cube rotation .745 2


vs. control (2D CADD 3 (author created) 4 2 1
Thomas 1996 - Exp instruction) 1.074 2
Trethewey 1990 - Control Paired with partner vs. MRT, Flexibility of
control (worked alone), Closure .604 4
Trethewey 1990 - Exp High vs. mid vs. low 2 4, 5 3 1
scorers on PFT .591 3
placement test
Turner 1997 - Control CAD mental rotation
training using same or VMRT .117 8
different, old or new
item types, for Cooper
Turner 1997 - Exp 2 4 1, 2 1
Union vs. Penn State
engineering students .219 8
vs. control (standard
wireframe CAD)
Van Voorhis 1941 - Control Extended space .956 1
perception training vs. Cards & Figures
Van Voorhis 1941 – Exp 3 4 2 1
control (usual class in (Thurstone PMA) 1.597 1
descriptive geometry)
Vasta, Knott & Gaze 1996 Self-discovery
(problems ranked in Water level task,
difficulty and competing Plumb line task
1, 2 3 1, 2 1 .214 4
cues) vs. control (equal
practice with nonranked
problem set)
Wang, Chang & Li 2006 - Control 3D media presenting Purdue
interactive visualization Visualization of -.211 1
exercises showing Rotation Test
Wang, Chang & Li 2006 - Exp different perspectives, 2 4 3 1
manipulation and
.080 1
animation of objects vs.
control (2D media)
Training of spatial ability 116

Weidenbauer, Schmid & Jansen-Osmann Virtual Manual MRT


2007 - Control training with joystick vs. .245 16
control (play computer VMRT – RT, Errors
1, 2 4 1, 2 1
Weidenbauer, Schmid & Jansen-Osmann quiz show game).
2007 - Exp Compared rotations of .373 16
22.5, 67.5, 112.5, 157.5
Wilmhurst 1974 - Control Relational skill training Spatial
vs. control (no training) Egocentricity and .082 .043
Wilmhurst 1974 - Exp 2 Relations Test 2 3 2
(author created) 1.410 .134

Workman, Caldwell & Kallal 1999 Training in clothing Apparel Spatial


construction and Visualization Test
3 1 1 1 1.015 2
pattern making vs. (author created),
control (no training) SR-DAT
Workman & Lee 2004 Flat pattern apparel Apparel Spatial
training Visualization Test
3 1 3 1 .373 2
author created),
Paper Folding Test
Workman & Zhang 1999 Computer-Aided Design Apparel Spatial
vs. Manual Pattern Visualization Test
making vs. control 3 (author created). 1 3 1 1.937 4
(course in CAD instead Surface
of Pattern making) Development Test
Worsencroft 1955 - Control First-year engineering Spatial Relations .357 2
coursework vs. First- Test VAC1 form
Worsencroft 1955 - Exp 3 7 3 1
year Liberal Arts YCU .960 2
coursework
Wright, Thompson, Ganis, Newcombe & 1.201 12
Kosslyn 2008 - Control Practice vs. transfer on Mental Paper
MRT or Paper Folding Folding Test and
1, 2 1 3 1
Wright, Thompson, Ganis, Newcombe & Task MRT - RT, slope, .581 12
Kosslyn 2008 - Exp intercept , errors

Yates 1986 spatial visualization Paper Folding and


training vs. control (no Cube Comparison-
training) 2 Kit of Reference 1, 4 2 1 .619 2
Tests for Cognitive
Factors
Note. Column codes: (a) Training type: 1 = Specific training; 2 = General spatial training; 3 = Course; 4 = Video game. (b) Outcome category: 1 = Assembly/transformation; 2 = Perspective taking; 3 = Spatial principles; 4 = Mental rotation; 5 = Spatial perception. (c) Sex: 1 = Females; 2 = Males; 3 = Not specified. (d) Age: 1 = Adults; 2 = Children.
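Because each row reports a Hedges' g and a count of effect sizes, a short worked example may help readers reproduce the arithmetic behind entries like these. The Python sketch below is a minimal illustration of the textbook formulas only: Hedges' g for a single two-group comparison, and a fixed-effect (inverse-variance) weighted mean across studies. It is not the analysis code used for this meta-analysis, which separated treatment-group from control-group improvement and examined moderators; the function names and the sample sizes in the example are hypothetical.

    # A minimal sketch of standard meta-analytic computations; illustrative only.
    import math


    def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
        """Bias-corrected standardized mean difference (Hedges' g)."""
        # Pooled within-group standard deviation.
        sd_pooled = math.sqrt(
            ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
        )
        d = (mean_t - mean_c) / sd_pooled  # Cohen's d
        # Hedges' small-sample correction factor J.
        j = 1.0 - 3.0 / (4.0 * (n_t + n_c) - 9.0)
        return j * d


    def fixed_effect_mean(gs, ns_t, ns_c):
        """Inverse-variance weighted mean of g values and its standard error."""
        sum_w = 0.0
        sum_wg = 0.0
        for g, n_t, n_c in zip(gs, ns_t, ns_c):
            # Large-sample variance of g for a two-group design.
            var = (n_t + n_c) / (n_t * n_c) + g ** 2 / (2.0 * (n_t + n_c))
            w = 1.0 / var
            sum_w += w
            sum_wg += w * g
        return sum_wg / sum_w, math.sqrt(1.0 / sum_w)


    if __name__ == "__main__":
        # Hypothetical example: pool two tabled effects, assuming n = 30 per group.
        mean_g, se = fixed_effect_mean([0.296, 0.622], [30, 30], [30, 30])
        print(f"pooled g = {mean_g:.2f}, SE = {se:.2f}")

The inverse-variance weights mean that larger studies, whose g estimates are more precise, count more toward the pooled mean; a random-effects analysis would add a between-study variance component to each weight.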
