Test Review: Test of Written Language--Fourth Edition

Article  in  Journal of Psychoeducational Assessment · November 2011


DOI: 10.1177/0734282911406646

Journal of Psychoeducational Assessment
29(6) 592–596
Test Review
© 2011 SAGE Publications
Reprints and permission: http://www.sagepub.com/journalsPermissions.nav
http://jpa.sagepub.com

D. D. Hammill & S. C. Larsen


Test of Written Language—Fourth Edition. (TOWL-4).
Austin, TX: PRO-ED, 2009.

Reviewed by: Adam W. McCrimmon & Emma A. Climie, University of Calgary, Calgary, Alberta, Canada
DOI: 10.1177/0734282911406646

Test Description
The Test of Written Language—Fourth Edition (TOWL-4), published by PRO-ED, is a newly
updated individual or group-based measure of written language for students aged 9 years,
0 months through 17 years, 11 months. The stated purposes of the measure are to identify students
in need of support or intervention in the area of written language, identify strengths and
weaknesses in students’ writing abilities, document progress resulting from written language
interventions, and provide measurement in written language research.
The TOWL-4 is classified as a Level B measure, and may be administered by psychologists
and nonpsychologists who have undergone formal training in standardized psychoeducational
assessment. The stated administration and scoring time is approximately 60 to 90 min. The
TOWL-4 consists of seven subtests that combine to form two composites (Contrived Writing and
Spontaneous Writing) and an Overall Writing score. Contrived Writing tasks focus on discrete
aspects of written discourse (e.g., spelling, punctuation, word usage) whereas Spontaneous
Writing tasks examine an individual’s functional writing ability (i.e., quality). Scaled scores and
percentiles are provided for subtests and composites, and scoring parameters for each subtest are
provided in the Examiner’s Manual.
The TOWL-4 kit consists of an Examiner’s Manual, a Supplemental Practice Scoring Booklet,
two Record/Story Scoring Forms (Form A and Form B), two Student Response Booklets (Form
A and Form B), and three colored Picture Cards (a sample card and one card each for Form A and
Form B). The Examiner’s Manual is effectively laid out, beginning with a discussion
of the history of the previous editions of the measure and a brief overview of written language
assessment. Following this, the Examiner’s Manual provides sections on administration
instructions, recording and interpretation, a description of the normative sample, and presentation
of the psychometric properties of the measure, including a small section on controlling for
test bias.

Subtest Description and Scoring


The test begins with a standardized introduction to the measure. Following this, examinees are
provided the corresponding Picture Card (A or B) and informed that they have 5 min to prepare
and 15 min to write a story about the event and activities depicted in the picture. This story is
scored after the TOWL-4 administration is completed, and counts as subtests 6 (Contextual
Conventions) and 7 (Story Composition) through the use of specified criteria found in the
Record/Story Scoring Form. Scores for these two subtests are attached to elements pertaining
to the mechanics of writing (e.g., capitalization, punctuation, spelling, grammar). Although
the scoring parameters differ across elements, with some receiving a score of 0 or 1 and others
receiving a score of 0, 1, or 2, the scoring is primarily based on yes/no qualifications (i.e., the
element is present or is not) or quantifiable data (e.g., 0 = not evident, 1 = 1-2 items evident,
2 = 3+ items evident). The Supplemental Practice Scoring Booklet contains 10 sample stories
and the corresponding scoring for each and is intended to provide examples of scoring for the
benefit of examiners who desire additional practice in scoring these subtests.
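The two kinds of element scoring described above can be sketched in a few lines of Python. This is only an illustration of the rules as quoted in this review: the function names are hypothetical, the thresholds come from the single example given in the text (0 = not evident, 1 = 1-2 items, 2 = 3+ items), and the actual elements and cutoffs are those printed in the Record/Story Scoring Form.

```python
def score_presence(is_present: bool) -> int:
    """Yes/no element: 1 if the element is present in the story, otherwise 0."""
    return 1 if is_present else 0


def score_count(n_items: int) -> int:
    """Count-based element, using the thresholds quoted in the review's example.

    0 = not evident, 1 = 1-2 items evident, 2 = 3 or more items evident.
    Real elements may use different cutoffs (assumption for illustration).
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if n_items == 0:
        return 0
    if n_items <= 2:
        return 1
    return 2


# Hypothetical story: one yes/no element present, one element counted 3 times.
total = score_presence(True) + score_count(3)
print(total)
```

A subtest raw score would then be the sum of such element scores, which is why the criteria lend themselves to relatively objective scoring.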
For all subtests, there is an initial sample item followed by a number of test items of increasing
difficulty. Vocabulary is the second task in the administration sequence although it is listed as
Subtest 1. This task requires the examinee to write a sentence containing a specified word. Each
item is scored as either 1 (correct) or 0 (incorrect). For this subtest, as with all subsequent subtests,
examples of each score, as well as rationale, are provided in the Examiner’s Manual.
Subtests 2 (Spelling) and 3 (Punctuation) are administered simultaneously. The examiner reads
a sentence to the examinee, who then writes the sentence in their Response Booklet. Each sentence
is then scored either 1 (correct) or 0 (incorrect) for accuracy in spelling and punctuation.
Logical Sentences, Subtest 4, presents the examinee with 22 sentences in the Response Booklet,
each with an incorrect element of logic, such as use of a homonym (e.g., “here” rather than “hear”)
or incongruence of terms (e.g., The shoes were hungry to run).
Subtest 5, Sentence Combining, requires examinees to view several sentences and combine
them into one coherent sentence. Initial items contain two sentences that can be combined
simply by adding the word “and” between the statements, whereas subsequent items are made more
difficult through the addition of more sentences and the complexity of language required to
effectively combine them.

Technical Adequacy
Development and Standardization

The TOWL-4 norming sample was collected throughout 2006-2007 and consisted of 2,205
children and adolescents ranging in age from 9 years to 17 years, 11 months across 17 American
states. The sample had a relatively consistent number of participants within each age group
(N = 201-294). Demographics were based on data from the 2007 U.S. census (U.S. Bureau of the
Census, 2007) and sample characteristics took into consideration gender, race, ethnicity, household
income, educational level of parents, geographic region, and exceptionalities of the
child (e.g., learning disabilities, attention-deficit/hyperactivity disorder, hearing impairment,
speech–language disorder, emotional disturbance, blindness/partial sight, physical impairment,
gifted/talented status, and those on the autism spectrum). It should be noted that no Canadian norms
have been created for this measure.

Reliability
Internal consistency. Internal consistency was measured through the use of Cronbach’s coefficient
alpha (examining both Forms A and B) and through administration of alternative forms.
Coefficient alpha scores (across ages) for the subtests were generally acceptable, ranging from
.74 to .92. The three composites (Contrived Writing, Spontaneous Writing, and Overall Writing)
yielded good to excellent coefficient alpha scores of .84 to .96.
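For readers less familiar with the statistic, coefficient alpha for k items is k/(k - 1) multiplied by one minus the ratio of summed item variances to the variance of total scores. The sketch below is a generic illustration of that formula, not the procedure used by the test authors; the function name and the small 0/1 data set are invented for the example and are not TOWL-4 data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's coefficient alpha.

    item_scores: list of rows, one row per examinee, each row holding
    that examinee's scores on the same k items.
    """
    n_examinees = len(item_scores)
    k = len(item_scores[0])
    if n_examinees < 2 or k < 2:
        raise ValueError("need at least 2 examinees and 2 items")

    # Variance of each item (column) across examinees.
    columns = list(zip(*item_scores))
    sum_item_vars = sum(variance(col) for col in columns)

    # Variance of each examinee's total score.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum_item_vars / total_var)


# Invented responses for 4 examinees on 3 items scored 0/1.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Values in the .80s and .90s, such as those reported for the composites, indicate that items within a scale rank examinees in a largely consistent way.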
The TOWL-4 provides alternate-forms analyses for both immediate and delayed administration.
In general, students who completed both forms of the TOWL-4 in immediate succession demonstrated
relatively consistent results, with mean correlations ranging from .74 to .86 on subtest scores
and .82 to .94 on composite scores. With the delayed administration, students were administered
both forms and then readministered both forms 2 weeks later. As with the immediate timeframe,
alternate forms testing was found to be acceptable, with 9 of 10 subtests and composite scores
correlating at .80 or higher.
Test-retest reliability. Test-retest reliability was evaluated using a subsample of 84 students in
Texas (aged 9 years-17 years, 11 months). Both forms of the TOWL-4 were administered approximately
2 weeks apart. The standard scores for each of these subtests were correlated, with a
majority (93%) having correlations rounding to .80 or higher and 54% rounding to or exceeding
.90, indicating that there is acceptable test-retest reliability between Forms A and B.
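The summary above rests on two simple computations: a Pearson correlation between first and second administrations, and the share of resulting coefficients that round to a given cutoff. A minimal sketch follows, assuming invented scores and coefficients; the function names are hypothetical and none of the numbers are taken from the TOWL-4 manual.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary")
    return sxy / sqrt(sxx * syy)


def share_rounding_to(coefficients, cutoff):
    """Proportion of coefficients that reach the cutoff after rounding to 2 decimals."""
    hits = sum(1 for r in coefficients if round(r, 2) >= cutoff)
    return hits / len(coefficients)


# Invented standard scores for five students tested twice, 2 weeks apart.
time_1 = [8, 10, 12, 9, 14]
time_2 = [9, 10, 13, 8, 14]
r = pearson_r(time_1, time_2)

# Invented set of coefficients summarized against the .80 benchmark.
share = share_rounding_to([0.796, 0.85, 0.70, 0.91], 0.80)
print(round(r, 2), share)
```

Note that "rounding to .80" is slightly more lenient than "at least .80": a coefficient of .796 counts toward the first criterion but not the second.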
Scorer differences. In the case of tests such as the TOWL-4, clear scoring criteria are important
to reduce the subjectivity of scoring. To examine interrater scoring, 41 protocols were scored
independently by two individuals highly familiar with the scoring criteria. A majority of the
subtest and composite scores resulted in correlations at or above .90 in magnitude, indicating strong
interrater agreement. The two subtests that did not meet this cutoff were part of the Spontaneous
Writing composite, but both coefficients were still within the acceptable range (.80s).

Validity
Content validity. The TOWL-4 manual provides specific details on the rationale for including
each subtest, including consideration of individual items and composite domains. As well, careful
consideration was given to selecting stimulus pictures to ensure that pictures were appropriate,
child-friendly, and recognizable. A substantial review process was undertaken and included the
solicitation of feedback from teachers, university staff, school assessment personnel, and creators
of other assessment measures.
Test creators also took into consideration the possibility of item bias and conducted statistical
analyses to determine whether there may be a bias in item responses based on gender, race, or
ethnic background. Results of the analyses led authors to conclude that the TOWL-4 is within the
acceptable limits regarding item bias.
Criterion-prediction validity. A subset of participants were given both the TOWL-4 and
another standardized measure designed to measure a similar construct (e.g., reading or writing)
to determine the ability of the TOWL-4 to effectively predict an individual’s writing and reading
performance. Test scores on the TOWL-4 were correlated with performance on the Written
Language Observation Scale (WLOS; Hammill & Larsen, 2009), the Reading Observation
Scale (ROS; Wiederholt, Hammill, & Brown, 2009), and the Test of Reading Comprehension—
4th edition (TORC-4; Brown, Wiederholt, & Hammill, 2009). Analyses revealed no significant
differences between mean scores on the TOWL-4 composite scores and those on the WLOS,
ROS, and TORC-4, indicating consistent performance of the participants between these tests.
Construct validity. A three-step process was undertaken to examine the TOWL-4’s ability to
accurately measure an individual’s writing ability. First, the authors identified several constructs
presumed to underlie writing performance. Second, using these constructs, a number of hypotheses
were created. Finally, these hypotheses were tested empirically. The first hypothesis,
that writing ability should be related to age, was supported through correlational
analyses. As well, consistent with the second hypothesis, moderate correlations (.31 to .70) were
found between subtests, indicating that the subtests were correlated but not so closely related that
they were redundant. The third hypothesis predicted that writing ability should correlate with
intelligence. Using the Wechsler Intelligence Scale for Children—4th Edition (WISC-IV; Wechsler,
2003) and the Comprehensive Test of Nonverbal Intelligence (CTONI; Hammill, Pearson, &
Wiederholt, 1996), analyses found a moderately strong relationship between the
TOWL-4 and WISC-IV (.53 to .75) and an acceptable correlation between the TOWL-4 and the
CTONI (.36 to .58). Finally, it was predicted that there would be differences in performance between
those of average writing ability and those who were known to be poor or adept at writing. Indeed,
atypical populations scored significantly lower (e.g., students with learning disabilities) or higher
(e.g., gifted/talented students) than typically developing students.

Commentary and Recommendations


The TOWL-4 is a well-designed measure of written language in children and adolescents. The
evaluation of both contrived and spontaneous writing is particularly commendable as this inclusion
allows for a broader understanding of students’ written expression abilities. Moreover, the
test developers undertook the revision process by evaluating each of the previous versions and
accounting for reviewers’ concerns and recommendations for improvement. The administration
and scoring procedures are relatively straightforward. The Student Response Booklets are efficiently
designed and allow for quick and easy scoring of each item, and the Picture Cards are
likely to be appealing to children as each depicts a “busy” scene conducive to commenting and
description. As well, the measure itself is relatively inexpensive, an aspect that will appeal to
many practicing professionals.
In regard to the technical properties of the TOWL-4, this measure appears to
demonstrate appropriate standardization and adequate psychometric properties. The standardization
sample was created based on recent census data and efforts were made to recruit participants
from a number of sites across the United States. One minor criticism, however, results from the fact
that the authors did not create comparable Canadian norms. Given the prevalence of the use of the
TOWL-4 in Canada, creation of Canadian norms would have been beneficial. Moreover, the fact
that the sample for the test-retest reliability evaluation all came from one U.S. state could indicate
a limitation in the accuracy and representativeness of this value.
Regarding the TOWL-4’s reliability and validity estimates, the authors point out that it is often
difficult to create clear and objective scoring for tests that are inherently subjective in scoring. As
such, the authors of the TOWL-4 should be commended for their significant attempt to ensure that
scoring procedures are as objective as possible. As well, reliability estimates for this measure are
consistently within the acceptable range and the authors took special care to ensure that the validity
of this measure is also acceptable. It is apparent that the TOWL-4 meets acceptable standards
regarding test construction.
Despite the numerous positive qualities of the TOWL-4, there are some limitations that bear
mention. First, the subtests on the TOWL-4 bear striking resemblance to those from other
measures of academic achievement such as the Woodcock-Johnson III Tests of Achievement
(Woodcock, McGrew, & Mather, 2001) and the Wechsler Individual Achievement Test—Second
Edition (WIAT-II; Wechsler, 2001). As such, some professionals may find that the TOWL-4
provides little information beyond these measures in the process of a comprehensive assessment.
As well, at a reported 60 to 90 min administration time, many students are likely to experience
fatigue during the administration, particularly those who struggle with written expression.
Computer scoring software is not offered, resulting in a minor inconvenience to some professionals.
Finally, the Examiner’s Manual surprisingly lacks a section pertaining to strategies or
recommended activities that could be suggested to teachers or other classroom personnel based
on a student’s profile or pattern of personal strengths and weaknesses.
In summary, the TOWL-4 is an effective stand-alone measure of written expression in children
and adolescents. The evaluation of both the contrived and spontaneous aspects of writing in one
cohesive measure is particularly praiseworthy. However, the lack of Canadian norms may be a
limiting factor for use in Canada. Overall, the authors have produced a commendable tool
with appropriate and adequate psychometric properties that will likely prove beneficial to school
psychologists and other classroom personnel.

References
Brown, V. L., Wiederholt, J. L., & Hammill, D. D. (2009). Test of Reading Comprehension (4th ed.). Austin, TX:
Hammill Institute on Disabilities.
Hammill, D. D., & Larsen, S. C. (2009). Written Language Observation Scale. Austin, TX: Hammill Institute
on Disabilities.
Hammill, D. D., Pearson, N., & Wiederholt, J. L. (1996). Comprehensive Test of Nonverbal Intelligence.
Austin, TX: PRO-ED.
U.S. Bureau of the Census. (2007). Statistical Abstract of the United States (126th ed.). Washington, DC:
Author.
Wechsler, D. (2001). Wechsler Individual Achievement Test (2nd ed.). San Antonio, TX: Psychological
Corporation.
Wechsler, D. (2003). Wechsler Intelligence Scale for Children (4th ed.). San Antonio, TX: Psychological
Corporation.
Wiederholt, J. L., Hammill, D. D., & Brown, V. L. (2009). Reading Observation Scale. Austin, TX: Hammill
Institute on Disabilities.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III Tests of Achievement.
Itasca, IL: Riverside.

Downloaded from jpa.sagepub.com at UNIV CALGARY LIBRARY on November 15, 2011

