
SCIENTIFIC STUDIES OF READING, 10(3), 301–322

Copyright © 2006, Lawrence Erlbaum Associates, Inc.

Measures of Reading Comprehension:
A Latent Variable Analysis of the
Diagnostic Assessment of Reading
Comprehension
David J. Francis
University of Houston

Catherine E. Snow
Harvard University

Diane August
Center for Applied Linguistics
Washington, DC

Coleen D. Carlson
University of Houston

Jon Miller
University of Wisconsin-Madison

Aquiles Iglesias
Temple University

This study compares 2 measures of reading comprehension: (a) the Woodcock–Johnson
Passage Comprehension test, a standard in reading research, and (b) the Diagnos-
tic Assessment of Reading Comprehension (DARC), an innovative measure. Data
from 192 Grade 3 Spanish-speaking English language learners (ELLs) were used to
fit a series of latent variable analyses designed to explicitly test the discriminant va-
lidity and differential determinants of the 2 measures. Findings indicated that the 2
measures are related (r = .61) but distinct, and influenced by different factors. The
DARC is less strongly related to word-level skills and more strongly related to mea-
sures of narrative language production and memory. Both tests are equally influ-
enced by measures of nonverbal reasoning. These differential patterns of relations,
which cannot be explained on the basis of differential reliabilities, reflect true differ-
ences in the processing demands of the tests for 3rd-grade ELLs.

Correspondence should be sent to David J. Francis, Texas Institute for Measurement, Evaluation,
and Statistics, 100 TLCC Annex, University of Houston, Houston, TX 77204–6022. E-mail:
dfrancis@uh.edu

Assessing reading comprehension is challenging, because it is a complex and mul-
tiply determined outcome (RAND Reading Study Group, 2002). Thus, students’
success in comprehending a text may be disrupted by difficulties with any of sev-
eral precursor skill domains: print skills, reflected in measures of phonological
awareness, word reading and/or nonword reading accuracy, and word reading effi-
ciency (Adams, 1990; Gough & Tunmer, 1986; Perfetti, 1985; Vellutino, 1979,
1987); oral language skills, reflected in assessments of vocabulary, linguistic
memory, and language processing (Bradley & Bryant, 1983; Gathercole &
Pickering, 2000; Hulme, Muter, Snowling, & Stevenson, 2004); and extended dis-
course skills, reflected in measures of narrative production (Tabors, Snow, &
Dickinson, 2001). In addition, students may have difficulties with “pure” compre-
hension skills—retaining information from the text, accessing relevant informa-
tion in memory, making inferences that incorporate both those sources of informa-
tion, and developing adjusted knowledge schemas that take them into account
(Dixon, LeFevre, & Twilley, 1988; Engle, Nations, & Cantor, 1990; Haenggi &
Perfetti, 1994; Palmer et al., 1985).
Distinguishing among these different sources of failure in reading is crucial if
we are to tailor instruction and intervention appropriately. It makes little sense to
focus instruction exclusively on strategies for comprehension with students whose
word reading skills are deficient or who have inadequate knowledge of the mean-
ing of the words used in the text. Alternately, it makes little sense to focus time and
instructional attention on comprehension strategies with students who are already
strategic readers but whose comprehension is hampered by failures of fluency or
word knowledge. In assessing the reading skills of English language learners
(ELLs), it is particularly important to pinpoint sources of difficulty, because ELLs
can have extremely uneven profiles of skills, for example, good word reading skills
but very limited English vocabulary (Lesaux & Siegel, 2003) or good comprehen-
sion strategies but limited relevant background knowledge.
The most widely used comprehension assessments fail to distinguish among the
various sources of poor comprehension—which is appropriate, as they are de-
signed to identify level of functioning rather than to provide diagnostic informa-
tion. Our ultimate goal is to develop a reading comprehension assessment that, in
conjunction with other targeted measures of precursor skills, could inform instruc-
tion by identifying students’ particular profiles of weakness and strength.
In this article, we take a first step toward a deeper understanding of reading
comprehension assessments by exploring the relationship of precursor skills to
comprehension on two distinct types of comprehension tasks: (a) one widely used,
reliable, standardized portmanteau measure, the Woodcock Language
Proficiency Battery–Revised (WLPB–R) Passage Comprehension subtest (PC),
and (b) one experimental, innovative, analytic measure, the Diagnostic Assess-
ment of Reading Comprehension (DARC; August, Francis, Hsu, & Snow, in
press).

THE DARC

The DARC was designed on the basis of previous test-development work by Potts
and Peterson (1985) and Hannon and Daneman (2001). Potts and Peterson’s test
isolated four processes hypothesized to occur during successful reading compre-
hension (Dixon et al., 1988; Engle et al., 1990; Haenggi & Perfetti, 1994; Palmer et
al., 1985): (a) recalling from memory new information presented in the text, which
we call text memory; (b) making novel inferences based on information provided
in the text, called text inferencing; (c) accessing relevant prior knowledge from
long-term memory, called knowledge access; and (d) integrating accessed prior
knowledge with new text information, called knowledge integration. Potts and Pe-
terson validated their test by showing predictive relationships from total scores to
performance on a general measure of reading comprehension and from scores re-
flecting the four components to other, independent tests of those components.
The Potts and Peterson (1985) assessment used reading passages consisting of
three sentences that described relations among a set of real and artificial terms, for
example, “A JAL is larger than a TOC,” “A TOC is larger than a PONY,” and “A
BEAVER is larger than a CAZ.” Combining the information in the text with world
knowledge would in principle allow the construction of a five-item linear ordering
(JAL > TOC > PONY > BEAVER > CAZ). Participants read and studied the para-
graph and then responded to true–false statements of four types. Text memory
statements (e.g., “A JAL is larger than a TOC”) tested information explicitly men-
tioned in the paragraph. Text inferencing statements (e.g., “A JAL is larger than a
PONY”) required integrating information across propositions in the text (i.e., “A
JAL is larger than a TOC”; “A TOC is larger than a PONY”); no prior knowledge
was required. Knowledge access statements (e.g., “A PONY is larger than a
BEAVER”) could be answered by accessing prior knowledge; no information
from the text was required. Knowledge integration statements (e.g., “A TOC is
larger than a BEAVER”) required integrating prior knowledge (ponies are larger
than beavers) with a text-based fact (i.e., “A TOC is larger than a PONY”).
Potts and Peterson (1985) found that knowledge integration correlated with the
two text-based constructs—text memory and text inferencing—as well as with
knowledge access. However, knowledge access was not strongly correlated with the
text-based constructs, suggesting that the ability to remember new information and
the tendency to use world knowledge are separable. Hannon and Daneman (2001)
confirmed the conclusions from Potts and Peterson’s work, using a version of the test
that had more complex texts for use with university students. Both the correlations of
total score with a global, standardized test of reading comprehension ability (the
Nelson–Denny test of reading comprehension) and the correlation of individual
construct scores with specific tests of those constructs proved reliable.
August et al. (in press) built on these studies by piloting the DARC, a test de-
signed specifically to minimize the impact of word reading accuracy or speed and
vocabulary on comprehension. Their goal was to evaluate among ELLs the feasi-
bility and utility of a reading assessment in which the passages used simple, regu-
lar, high-frequency words and in which the impact of variation in background
knowledge was minimized by limiting topics to very familiar ones (e.g., pets, bicy-
cles) and by introducing nonce words for novel concepts. Items were constructed
as true–false statements referring to information presented in familiar narra-
tive-style passages, like the following:

Nan has four pets. One pet is a cat. Nan’s cat is fast. Nan has a pet culp. Nan’s
pet culp is like her cat. But Nan’s pet culp is faster than her cat.

August et al. (in press) demonstrated with three different sets of pilot partici-
pants that the DARC is feasible for use with children as young as kindergarteners,
that simple yes–no responses were adequate to reflect children’s comprehension
processing, and that different aspects of the comprehension process (text memory,
text inferencing, background knowledge, and knowledge integration) could be
measured independently. Crucially, the pilot results showed wide variation in per-
formance among ELLs who all scored low on a general comprehension measure.
Some children who scored poorly on measures with a higher vocabulary load and
greater syntactic complexity, such as the Stanford–9 or the WLPB–PC measures,
performed well on the DARC.
In this article, we explore further the differential functioning of the DARC and
the more widely used WLPB–PC, by contrasting how print-related, language-re-
lated, and narrative skills predict outcomes on these two measures within one
group of Latino third-grade ELLs. The WLPB–PC is typically strongly affected by
print-related skills; we ask whether the DARC reflects its design by showing a
weaker relation. On the other hand, the relation of both WLPB–PC and DARC per-
formance to oral language measures might be expected to be strong (Hulme et al.,
2004), given the central importance of language processing in reading comprehen-
sion. Skill in producing narratives has been shown to relate to word reading in La-
tino ELLs (Miller et al., 2006), but the relation of narrative production skill to com-
prehension measures remains open to speculation. It is possible that the task
demands posed by the DARC will privilege verbal-processing and verbal-reason-
ing skills, whereas the WLPB–PC will be more affected by traditional lan-
guage-proficiency measures, such as vocabulary. We have a number of central re-
search questions: (a) Can the DARC and the WLPB be differentiated as measures
of reading comprehension? If so, (b) how well do print skills predict performance
on the WLPB–PC measure versus the DARC? (c) Do other factors, such as partici-
pants’ oral language skills and narrative production, differentially relate to the
WLPB–PC and the DARC? (d) More generally, is there evidence that the DARC
operates in a way that is distinctively different from the WLPB–PC measure, thus
confirming its promise as a novel, informative measure of reading comprehension?
In the process of addressing these questions, we further hope to provide a template
to other reading researchers for empirically investigating measures of reading
comprehension.

METHOD

Participants
The sample comprised 192 third-grade Spanish-speaking ELLs in 33 transitional
bilingual education classrooms in nine schools in two different Texas school dis-
tricts. The two districts were demographically distinct: a large, densely populated
metropolitan area in southeastern Texas and a semiurban area in the Rio Grande
Valley. Approximately 65% (n = 125) of the sample came from the latter site. Cri-
teria for including schools in the sample were that more than 40% of the school
population were Latino, that at least 30% of the kindergarteners were considered
limited English proficient, that the schools were performing adequately on their
state accountability assessments, and that they were implementing a transitional
bilingual education model.
The final sample was evenly divided between boys (n = 94) and girls (n = 92),
with 6 cases missing information on gender. This sample is derived from a larger
study of kindergarten to Grade 3 students focused on developing and validating as-
sessments for use with Spanish-speaking ELLs (Francis, Carlson, et al., 2005).
The total sample consisted of 1,644 students across kindergarten to Grade 3, of
which a random sample participated in testing with the DARC and the oral narra-
tive procedure. All students included in the analyses described here were in Grade 3 and
had completed the DARC in English. Of the 401 students in Grade 3 in the larger
sample, 214 were given the opportunity to complete the DARC in English and in
Spanish, and 198 completed it in English. Most of these students (n = 192) also
completed an oral narrative production task in English (Miller et al., 2006). The re-
sulting sample of 192 students performed better as a group on standardized mea-
sures of English language than the remainder of the Grade 3 sample. However, as
can be seen in Table 1, they tended to perform well below normative expectations
as a group, with means on standardized language measures ranging from 70 to 81.

Measures
Students were administered a battery of language and literacy assessments in both
Spanish and English in sessions separated by about 2 weeks. We focus here only on
the English assessments. The total testing time was about 3 hr. If students were un-
able to complete testing in a single sitting, additional sessions were allowed. The
battery was designed to measure key skills related to the development of literacy and
oral language proficiency: decoding accuracy and fluency, phonological awareness,
vocabulary, syntax, listening comprehension, and reading comprehension.
All tests were administered using standard administration procedures as pre-
scribed by the test developer/publisher where such procedures were available, with
the following exceptions. To increase students’ chances for completing assess-
ments in English, we first provided instructions and example items in English ac-
cording to standard administration guidelines. However, if students were unable to
complete the practice items in English, the examiner administered instructions in
Spanish and then repeated the practice items. Students still unable to complete the
practice items in English then were given them in Spanish. If the student was still
unable to complete the practice items, then testing was discontinued for that
subtest. However, if the student was able to complete the practice items in Spanish,
the English practice items were readministered. If the student was successful on
the English practice items, testing continued in English. If the student did not com-
plete the practice items in English, then testing was discontinued for that subtest.
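The fallback rules above amount to a small decision procedure, sketched below for clarity. The function and its `passes_practice` callable are hypothetical stand-ins of our own, not part of any published protocol materials:

```python
def testing_decision(passes_practice):
    """Paraphrase of the administration fallback described above.
    `passes_practice(instructions, items)` is a hypothetical callable that
    administers the practice items with instructions in one language and
    items in another, and reports whether the student completed them."""
    if passes_practice("English", "English"):
        return "continue subtest in English"
    if passes_practice("Spanish", "English"):      # re-instruct in Spanish, retry
        return "continue subtest in English"
    if not passes_practice("Spanish", "Spanish"):  # practice items in Spanish
        return "discontinue subtest"
    if passes_practice("Spanish", "English"):      # readminister English practice
        return "continue subtest in English"
    return "discontinue subtest"

# Example: a student who succeeds only after Spanish instructions.
print(testing_decision(lambda instr, items: instr == "Spanish" and items == "English"))
```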

Phonological awareness. The Comprehensive Test of Phonological Pro-
cessing (CTOPP; Wagner, Torgesen, & Rashotte, 1999), considered the gold stan-
dard for assessing phonological awareness skills, was administered. CTOPP
subtests measure phonological awareness, phonological memory, and rapid nam-
ing. For a complete description of the individual subtests, the reader is referred to
Wagner et al. (1999) or Schatschneider et al. (1999). In this study, we used the
subtests measuring phonological awareness, including First Sound Comparison
(Cronbach’s α = .67), Final Sound Comparison (α = .73), Blending Phonemes into
Words (α = .85), Blending Phonemes into Non-Words (α = .87), Segmenting
Words into Phonemes (α = .93), Segmenting Non-Words into Phonemes (α = .93),
and Phoneme Elision (α = .91). All reported alphas are based on the sample of
third-grade students in this study. The somewhat lower alpha for First Sound Com-
parison is attributable to ceiling effects. Total scores for the phonological subtests
were summed to form a composite measure of phonological awareness (α = .85)
based on prior research indicating that these tasks are unidimensional
(Schatschneider et al., 1999). Intrasubtest correlations in this sample ranged from
.14 to .79 (Mdn r = .44), with the smallest correlations involving the First Sound
Comparison (M = 9.5, with a maximum of 10).
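For readers who wish to reproduce this kind of composite reliability estimate, a minimal sketch of Cronbach's alpha follows, using made-up scores rather than the study's data:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an examinees-by-subtests score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Toy data: 6 examinees x 3 subtests (illustrative scores only).
demo = [[8, 7, 9], [5, 6, 5], [9, 9, 10], [4, 5, 4], [7, 6, 8], [6, 7, 6]]
print(round(cronbach_alpha(demo), 2))
```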

Decoding accuracy and fluency. To measure decoding accuracy, we used
two subtests from the WLPB (Woodcock, 1991). The Letter Word Identification
subtest measures real-word decoding by presenting individual words that the stu-
dent reads out loud. The Word Attack subtest uses pseudoword decoding to assess
the examinee’s knowledge of the rules for decoding words phonetically in English.
The two subtests consist of 57 and 30 items, respectively. To limit overtesting, both
subtests use a ceiling rule: Testing continues until the examinee misses the six
highest numbered items on a page. Internal consistency reliability estimates in the
current sample were .89 for each subtest, whose scale scores also correlate .89 with
one another.
To measure fluency of decontextualized word reading, we used the Test of
Word Reading Efficiency (TOWRE; Torgesen, Wagner, & Rashotte, 1999). The
TOWRE requires the examinee to read aloud as quickly as possible a list of words
ordered by difficulty. The score is the number of words read correctly in 45 sec. We
converted this into number of words read correctly per minute of reading time and
used the raw score, because all students were in the same grade. Students were ran-
domly assigned to receive either Form A or Form B of the TOWRE real-word read-
ing test. In addition to the TOWRE, we created an experimental word reading effi-
ciency form for use in the project. This form used only words taken from Grade 1
texts; thus the words were not graded in difficulty. The experimental form corre-
lated .82 with Form A and .85 with Form B of the TOWRE. For the sake of analy-
sis, we combined the two scores for each student into a single estimate of
decontextualized word reading efficiency.
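The timed-score conversion is straightforward, as the sketch below illustrates with hypothetical scores; simple averaging is shown as one plausible combination rule, since the exact rule is not specified above. (Note also that this passage describes words per minute while the later Table 1 reports the composite in words per second; the two differ only by a factor of 60.)

```python
def words_per_minute(words_correct, seconds=45):
    """Convert a timed word-reading count to words correct per minute."""
    return words_correct * 60.0 / seconds

# Hypothetical scores: 54 words correct in 45 sec on the TOWRE form, 66 on
# the experimental Grade 1 list. Averaging is one plausible way to combine
# the two forms into a single efficiency estimate.
towre = words_per_minute(54)         # 72.0
experimental = words_per_minute(66)  # 88.0
print((towre + experimental) / 2)    # 80.0
```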

Oral language proficiency—Standardized. To measure oral language
proficiency, we used several subtests of the WLPB in English (Woodcock, 1991;
Woodcock & Muñoz-Sandoval, 1995). The WLPB is a highly regarded battery of
tests with high internal consistency and test–retest reliability values as well as ex-
tensive validity data (Woodcock, 1991). The test development, scaling, and
norming process for the assessment is described in detail in the WLPB manual
(Woodcock, 1991). All subtests use ceiling rules to terminate assessment follow-
ing a specified number of errors. The WLPB allows for various scale score metrics.
In this study, we used the age-based standard scores.
The Listening Comprehension subtest has 38 test items in which the examinee
listens to a brief passage that omits one word. The examinee completes the state-
ment by providing a single word that is consistent with the preceding information.
In this sample, the Listening Comprehension subtest had an internal consistency
reliability of .88.
The Memory for Sentences subtest targets semantics and syntax. The examinee
is asked to repeat precisely what is said by the examiner (Items 1–15) or presented
on audiotape (Items 16–32); items go up to sentences of roughly 20 words and
multiple clauses. Single-word items receive 1 point if correct, and multiword items
are scored 0, 1, or 2 points (2 points for exact reproduction). In the current sample,
internal consistency reliability was estimated at α = .74.
The Picture Vocabulary subtest of the WLPB begins with multiple-choice items
on which the student points to the picture that matches a vocabulary word provided
orally by the examiner. At Item 8, the test becomes a confrontation naming test,
that is, one where children are shown a picture and asked to provide a word that de-
scribes the picture or a targeted subpart of the picture.
On the Verbal Analogies subtest of the WLPB, the examinee is required to com-
plete items of the form “A is to B, as C is to … .” Internal consistency reliability
was estimated at α = .81 for both the Verbal Analogies and Picture Vocabulary
subtests in the current sample.

Oral language proficiency—Narrative procedure. In addition to the stan-
dardized language measures, we collected an oral narrative to reflect more natural
use of language. Students retold a story based on one of the wordless picture books
of Mercer Mayer. The exact procedure is described in detail in Miller et al. (2006).
In brief, the examinee looked through the pictures as the examiner told the story
following a flexible script. The student then retold the story while still looking at
the book. The examiner, who sat opposite the student, offered reminders that only
the student could see the pictures to reduce opportunities for the student to point to
the pictures and use nonspecific referents to elements in the story. The retelling
was recorded on digital minidisk and transcribed into computer text files for subse-
quent analysis using the Systematic Analysis of Language Transcripts (SALT; Miller
& Iglesias, 2003). SALT provides a variety of measures, including vocabulary
diversity, fluency, and syntax. In addition,
the narratives were scored by hand for narrative structure. To assess reliability, a
random sample of 20 narratives was scored by multiple raters for protocol accu-
racy (98%–100%), for transcription accuracy (90%–98%), for the narrative struc-
ture score (Krippendorff’s α = .74), and for the subordination index (α = .96;
Krippendorff, 1980).
Measures of language proficiency taken from the narrative included Mean
Length of Utterance in Words (MLUW), Number of Different Words (NDW),
Number of Total Words (NTW), Subordination Index (SI), Words Per Minute
(WPM), and Narrative Structure Score (NSS). The rules for segmenting speech
into utterances are given in Miller et al. (in press) and Loban (1976). NDW is cal-
culated as the number of different word roots without inflections in the child’s re-
telling. NTW gives the total transcript length in number of words and is related to
vocabulary. WPM (obtained by dividing NTW by total seconds in the retell, then
multiplying by 60) provides a measure of verbal fluency. SI is the average number
of dependent clauses in an utterance, a measure of syntactic complexity. Finally,
the NSS, a measure of coherent story structure, was obtained by scoring the narra-
tive holistically on a 6-point scale (0–5) on each of seven story grammar elements:
Introduction, Character Development, Mental States, Character Referencing, Con-
flict/Resolution, Cohesion, and Conclusion. These are discussed in greater detail
in Miller et al. (2006).
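Several of these transcript measures are simple functions of the utterance-segmented transcript. The sketch below is our simplified approximation (plain tokenization rather than SALT's morphological coding, so NDW is only approximate, and SI and NSS are omitted because they require clause-level and story-grammar coding):

```python
import re

def narrative_measures(utterances, retell_seconds):
    """Compute transcript measures described above from utterance strings.
    A simplified sketch: NDW (word roots) is approximated here by distinct
    surface tokens."""
    tokens_per_utt = [re.findall(r"[a-zA-Z']+", u.lower()) for u in utterances]
    all_tokens = [t for toks in tokens_per_utt for t in toks]
    ntw = len(all_tokens)                # Number of Total Words
    ndw = len(set(all_tokens))           # Number of Different Words (approx.)
    mluw = ntw / len(utterances)         # Mean Length of Utterance in Words
    wpm = ntw / retell_seconds * 60      # Words Per Minute (verbal fluency)
    return {"NTW": ntw, "NDW": ndw, "MLUW": round(mluw, 2), "WPM": round(wpm, 1)}

demo = ["the boy had a frog", "the frog jumped out of the jar",
        "he looked for it everywhere"]
print(narrative_measures(demo, retell_seconds=20))
# {'NTW': 17, 'NDW': 14, 'MLUW': 5.67, 'WPM': 51.0}
```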

Reading comprehension. The study used two measures of reading com-
prehension: (a) the PC subtest from the WLPB (Woodcock, 1991) and (b) the total
score and subtest scores from the DARC (August et al., in press). The WLPB–PC
uses a cloze procedure to examine the ability to understand information read si-
lently. The examinee reads a sentence or short passage from which individual
words have been omitted, then provides the most appropriate word to fill in the
blank given the meaning of the sentence or passage. The PC subtest is used exten-
sively in reading research because of its high reliability and validity. In this sample,
reliability was estimated to be .81 based on internal consistency.
The DARC requires children to read a passage and answer 30 true–false questions
about the information provided in the story. The questions are designed to assess stu-
dents’ background knowledge, memory for the text, ability to form inferences based
on information provided in the text, and ability to form inferences that require inte-
gration of information presented in the text with information known from back-
ground knowledge. When these data were collected, two stories had been developed,
each in English and adapted into Spanish using a back-translation method (i.e.,
translation into Spanish and then back into English by independent translators).
Each student read one story in English and one story in Spanish, with the pairing of
language and story determined at random. We report here on the total correct score
on the story read in English and on scores measuring Text Memory, Text Inferencing,
Knowledge Integration, and Background Knowledge. For Story 1, internal consis-
tency for the total score was estimated at .75, whereas for Story 2 it was estimated at
.68. Reliability for the subtests was generally in the .5 to .6 range. For the purposes of
the current analyses, subtest scores were used in factor analytic models that take into
account their respective reliabilities. In all cases, scores from the two stories were
standardized to a common mean and standard deviation, so that performance for stu-
dents reading Story 1 was “equated” to performance for students reading Story 2 in
computing the subtest scores and total scores.
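The equating step amounts to linear standardization of each story's scores to a common mean and standard deviation; a minimal sketch (ours, and the operational procedure may have differed in detail) follows:

```python
import numpy as np

def equate(story1_scores, story2_scores, target_mean=0.0, target_sd=1.0):
    """Linearly standardize each story's scores to a common mean and SD so
    students who read different stories land on the same scale."""
    def rescale(x):
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / x.std(ddof=1) * target_sd + target_mean
    return rescale(story1_scores), rescale(story2_scores)

s1, s2 = equate([20, 23, 27, 25], [18, 22, 26, 21])
print(round(s1.mean(), 6), round(s2.mean(), 6))  # both at the common mean
```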

RESULTS

Means, standard deviations, and minima and maxima for each measure are pro-
vided in Table 1. The table presents the reading comprehension measures first, fol-
lowed by WLPB measures of word reading, the narrative language measures, the
WLPB language measures, phonological awareness, word reading efficiency,
memory, and nonverbal reasoning (Raven’s Colored Progressive Matrices; Raven,
Raven, & Court, 1998).

TABLE 1
Descriptive Statistics for English Assessments With Grade 3
English Language Learner Students

Measure                                              N      M       SD     Min     Max

DARC Total                                          192   23.33    3.92   10.00   30.00
DARC Text Memory                                    192   14.82    2.58    6.10   18.68
DARC Text Inferencing                               192   15.30    3.02    7.44   19.00
DARC Knowledge Integration                          192   15.51    2.82    7.46   20.47
DARC Background Knowledge                           191   15.58    2.91    3.02   18.50
WLPB Passage Comprehension                          183   97.32   16.87   50.00  144.00
WLPB Letter Word Scale Score                        183  112.10   28.38   34.00  198.00
WLPB Word Attack Scale Score                        182  113.00   25.74   57.00  161.00
Narrative Mean Length of Utterance                  192    7.24    1.05    4.48   10.71
Narrative Subordination Index                       192    1.15    0.12    0.67    1.48
Narrative Number of Different Words                 192  101.50   24.98   25.00  159.00
Narrative Fluency (words per minute)                192   98.36   26.02   24.22  158.80
WLPB Verbal Analogies                               183   92.64   16.07   47.00  162.00
WLPB Picture Vocabulary                             183   71.51   24.23   12.00  132.00
WLPB Listening Comprehension                        180   72.73   19.73    9.00  141.00
WLPB Memory for Sentences                           183   81.69   19.43   34.00  131.00
CTOPP Phonological Awareness                        192   73.79   21.06   10.00  117.00
Word Reading Efficiency (words per second)          192    1.46    0.33    0.66    2.43
CTOPP Memory for Digits                             192   10.81    2.51    5.00   20.00
Raven’s Colored Progressive Matrices (no. correct)  192   27.23    4.15   12.00   36.00

Note. Min = minima; Max = maxima; DARC = Diagnostic Assessment of Reading Comprehen-
sion; WLPB = Woodcock Language Proficiency Battery–Revised; CTOPP = Comprehensive Test of
Phonological Processing.

Table 1 shows that 178 cases (93%) have complete data. We standardized each
of the subtest scores for the two DARC forms to perform common analyses. Be-
cause the DARC is an experimental measure, scores reported in Table 1 have lim-
ited value in indicating the level of sample performance on comprehension. How-
ever, the means in Table 1 show relatively good performance on word reading
skills and the WLPB reading comprehension measure. The standardized language
measures (Picture Vocabulary, Verbal Analogies, Listening Comprehension, and
Memory for Sentences) from the WLPB show, however, that these students are
scoring substantially more poorly on language proficiency than they are on the
WLPB measures of reading. In fact, performance on the reading measures is al-
most 1 standard deviation above average, whereas the measures of language profi-
ciency range from 0.5 to almost 2 standard deviations below average. The mea-
sures of narrative production paint a somewhat less bleak picture than the WLPB
language proficiency measures, but caution is warranted given the limited normative
information on this task.
To investigate the discriminant validity of the DARC and WLPB–PC measures,
we fit a series of latent variable models to the data. These models were developed
specifically to investigate the relations among the reading comprehension mea-
sures when examined alone and together with the other measures in Table 1 that are
demonstrated precursors of reading comprehension. Table 2 presents correlations
between each of the predictors and each of the measures of reading comprehen-
sion, including the WLPB–PC, the four subtest scores from the DARC, and the to-
tal DARC score. These bivariate correlations show small to moderate correlations
between the DARC and WLPB–PC. The DARC tends to correlate more highly
with measures of language from the WLPB than with measures of word-level read-
ing skills. However, the WLPB–PC shows a similar pattern in this sample.
TABLE 2
Correlations of Language and Literacy Measures to Woodcock Language
Proficiency Battery–Revised (WLPB) Passage Comprehension and
Diagnostic Assessment of Reading Comprehension (DARC)
Measures of Reading Comprehension

                                                  DARC
Measure                                WLPB–PC   Total    TM     TI     KI     BK

WLPB Passage Comprehension                        .46    .24    .29    .48    .30
WLPB Letter Word                          .66     .34    .11    .28    .35    .22
WLPB Word Attack                          .64     .28    .05    .31    .30    .14
Narrative Mean Length of Utterance        .40     .22    .15    .13    .23    .17
Narrative Number of Different Words       .47     .49    .32    .28    .34    .43
Narrative Fluency                         .35     .31    .26    .13    .32    .22
Narrative Subordination Index             .36     .28    .23    .14    .24    .23
WLPB Verbal Analogies                     .78     .45    .29    .29    .44    .25
WLPB Picture Vocabulary                   .76     .57    .33    .37    .55    .37
WLPB Listening Comprehension              .72     .51    .30    .36    .45    .36
WLPB Memory for Sentences                 .75     .54    .25    .39    .54    .35
CTOPP Phonological Awareness              .60     .40    .19    .36    .32    .28
Word Reading Efficiency                   .56     .31    .13    .16    .34    .26
CTOPP Memory for Digits                   .34     .33    .09    .26    .31    .26
Raven’s Colored Progressive Matrices      .44     .36    .19    .27    .25    .27

Note. | r | > .15 is statistically significant at p < .05. | r | > .24 is statistically significant at p < .001.
PC = Passage Comprehension; TM = Text Memory; TI = Text Inferencing; KI = Knowledge Integra-
tion; BK = Background Knowledge; CTOPP = Comprehensive Test of Phonological Processing.

To investigate the discriminant validity of the DARC and PC, we used confir-
matory factor analysis to estimate and test a series of latent variable models de-
signed to test explicit hypotheses about the two sets of measures. This approach to
testing models of discriminant validity was described in greater detail in Francis,
Fletcher, Catts, and Tomblin (2005). In the present context, we fit a series of four
latent variable models:

• Model 1–RC considers only the measures of reading comprehension and
their relations to one another.
• Model 2–PR considers only the predictors and their relations to one another.
• Model 3–RCPR simply combines the results of Model 1 and Model 2 to ex-
plore relations between predictors and comprehension.
• Model 4–2RCPR tests explicitly the discriminant validity of PC and the
DARC measures by introducing separate factors into Model 3.

Fit statistics for the four models are presented in Table 3. Fit statistics for Models 1
and 2 cannot be compared statistically to one another, or to Models 3 and 4, as
these models are not nested. Model 3 is, however, nested in Model 4, and thus these
models can be explicitly compared using the information in Table 3. All models
were fit using data from the subset of 178 cases with complete data.

TABLE 3
Fit Statistics for Latent Variable Models of Comprehension

Model            χ2      df   RMSEA   SRMSR   GFI    AGFI

1–RC            5.62      5    .024    .033    .99    .96
2–PR          113.62     57    .076    .049    .92    .84
3–RCPR        246.26    125    .076    .060    .87    .80
4–2RCPR       202.47    118    .062    .050    .89    .83
5–M4R         202.47    118    .062    .050    .89    .83
5–Restricted  206.20    124    .060    .051    .89    .83

Note. Model 1–RC: reading comprehension measures only; single-factor model with WLPB–PC
and four DARC indicators (TM, TI, KI, and BK). Model 2–PR: predictors only; factors are Decoding
(DE), Narrative Language (NL), Standardized Language (SL), Phonological Awareness (PA), Fluency
(FL), Memory (ME), and Nonverbal IQ (NV). PA, FL, and NV are measured by single indicators in all
models. ME is measured by WLPB MS and CTOPP MD. DE is measured by LW and WA from the
WLPB; NL is measured by all narrative measures (MLUW, NDW, SI, WPM); SL is measured by all
WLPB language measures (PV, VA, LC, and MS). These relations do not change across subsequent
models. Model 3–RCPR: combines Models RC and PR with a single Reading Comprehension factor;
all factors are allowed to correlate freely. Model 4–2RCPR: identical to Model 3 but splits the Reading
Comprehension factor into two factors, one measured only by WLPB–PC and the other measured by
the four DARC measures (TM, TI, KI, and BK). Models 3 and 4 are nested and can be compared statis-
tically. Model 5–M4R: reparameterization of Model 4 to account for relations of the Reading Compre-
hension and predictor factors through factor-on-factor regressions; Model 5 is equivalent to Model 4.
Model 5–Restricted: constrains to 0.0 nonsignificant regressions between the Reading Comprehension
factors and the predictor factors. PA did not contribute uniquely to either Reading Comprehension fac-
tor. Regression of the Reading Comprehension factors onto the restricted set of predictor factors (see
Table 5) fully accounted for the correlation between the two Reading Comprehension factors. Model
5–Restricted is nested in Model 5. WLPB = Woodcock Language Proficiency Battery–Revised; PC =
Passage Comprehension; DARC = Diagnostic Assessment of Reading Comprehension; TM = Text
Memory; TI = Text Inferencing; KI = Knowledge Integration; BK = Background Knowledge; MS =
Memory for Sentences; CTOPP = Comprehensive Test of Phonological Processing; MD = Memory for
Digits; LW = Letter Word Scale Score; WA = Word Attack Scale Score; MLUW = Mean Length of Ut-
terance in Words; NDW = Number of Different Words; SI = Subordination Index; WPM = Words Per
Minute; PV = Picture Vocabulary; VA = Verbal Analogies; LC = Listening Comprehension; RMSEA =
root mean square error of approximation; SRMSR = standardized root mean square residual; GFI =
goodness-of-fit index; AGFI = adjusted goodness-of-fit index.

We first fit a single-factor model (Model 1–RC) to the five reading comprehen-
sion measures (PC, Text Memory, Text Inferencing, Knowledge Integration, and
Background Knowledge) without regard for word-level reading skills (accuracy and
fluency), language proficiency, phonological awareness, memory, or verbal reason-
ing. As shown in Table 3, this model provided an exceptionally good fit to the data.
The chi-square test was not statistically significant, χ2(5, N = 178) = 5.62, p < .35.
Descriptive indices of fit were also strong, including the root mean square error of
approximation of .024 and the standardized root mean square residual (SRMSR) of
.033. Both of these measures indicate a well-fitting model when they fall below .05.
Thus, on balance, the information in Table 3 suggests that the reading comprehen-
sion measures intercorrelate in a way consistent with a single underlying dimension.
We would conclude from Model 1 that PC and the four DARC reading measures re-
flect a single factor of Reading Comprehension.
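For reference, a common form of the RMSEA used by SEM software is shown below; with Model 1's values it gives roughly .026, in the neighborhood of the reported .024 (packages differ slightly in the N convention and in how they estimate the noncentrality):

```latex
\mathrm{RMSEA} \;=\; \sqrt{\max\!\left(\frac{\chi^{2} - df}{df\,(N-1)},\; 0\right)}
\;=\; \sqrt{\frac{5.62 - 5}{5 \times 177}} \;\approx\; .026
```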
However, the test of unidimensionality (i.e., single-factoredness) afforded by
Model 1 is relatively low powered because of the limited number of measures in-
cluded in the model. Expanding the set of measures in the model increases the
power of the model to discriminate between PC and the measures of the DARC. To
introduce the other measures of Table 1 into the model in a meaningful way, we
first fit a series of models that examined only those measures. We began with a
model that included factors for Decoding (Letter Word scale score and Word At-
tack scale score from the WLPB), Narrative Language Production (MLUW, SI,
NDW, and WPM from the retelling), Standardized Language Proficiency (Verbal
Analogies, Listening Comprehension, Picture Vocabulary, and Memory for Sen-
tences from the WLPB), Phonological Awareness (PA; the CTOPP total), Fluency
(Word Reading Efficiency), Memory (Memory for Digits from the CTOPP), and
Nonverbal Intelligence (Raven’s Colored Progressive Matrices). We subsequently
revised the model based on information about lack of fit, in particular relying on
the modification indices to introduce two changes. The first change was to allow
for a test-specific correlation between the two narrative production measures SI
and MLUW; the second change was to allow Memory for Sentences to load on the
Memory factor along with Memory for Digits. The final model, the fit statistics for
which are presented in Table 3 under Model 2–PR, yielded a reasonable fit to the
data for the 14 predictor measures in Table 1. The overall chi-square for Model 2 is
statistically significant, χ2(57, N = 178) = 113.62, p < .001, which suggests a lack
of fit of the model to the data, but the other information in Table 3 suggests a rea-
sonably good-fitting model. Other information, not presented in Table 3, also sug-
gests a good-fitting model. Specifically, the expected cross-validation index
(ECVI) for the model of 1.19 was equal to the ECVI for a saturated model, and the
model AIC (Akaike’s Information Criterion) and saturated model AIC were virtu-
ally identical (210.77 and 210.00, respectively), whereas the model CAIC (consis-
tent AIC) of 411.49 was smaller than the saturated model CAIC of 640.09. Thus,
the model appears to do a reasonably good job of describing the relations among
the 14 predictors. It should be noted that the correlation between the two language
factors in the final version of Model 2 was estimated to be .72. Thus, although the
two language factors are highly correlated, the correlation is different from 1.0, in-
dicating that the narrative and standardized measures are tapping somewhat differ-
ent aspects of language functioning.
Model 3–RCPR combined the single Reading Comprehension factor of Model
1 with the seven predictor factors of Model 2. Information about model fit can be
found in Table 3. The goodness-of-fit index has dropped below .90, and the
SRMSR has increased to above .05. In both of the two models that were combined
to produce Model 3, the SRMSR was below .05. The increase in SRMSR is due to
the combination of the comprehension measures and the predictors in the same
model and suggests that the lack of fit is due to the model’s inability to reproduce
the correlations among the comprehension measures and the predictor measures.
In addition, the ECVI for Model 3 was 2.17, just slightly larger than the ECVI for a
saturated model (ECVI = 2.15). AIC for Model 3 was 383.89, compared with
380.00 for a saturated model, whereas CAIC was 655.71 for Model 3, compared
with 1174.54 for a saturated model. Thus, the fit, although not terrible, has deterio-
rated somewhat relative to that of Models 1 and 2.
Model 4–2RCPR explicitly tests the extent to which the lack of fit in Model 3 is
attributable to the fact that the measures of reading comprehension are not
unidimensional. In particular, Model 4 splits the Reading Comprehension factor of
Model 1 into two factors: one for WLPB–PC and one for the four measures of the
DARC. As mentioned previously, Model 3 is nested in Model 4, and thus we can
explicitly test whether Model 4 offers a statistically significant improvement over
Model 3. As can be seen in Table 3, a substantial portion of the lack of fit in Model
3 is attributed to the mis-specification on the Reading Comprehension factor. By
treating the two sets of reading comprehension measures as separate factors, the
chi-square statistic drops from 246.26 to 202.47 on df = 7. This difference of 43.79
is statistically significant at p < .0001 and represents roughly 18% of the lack of fit
in Model 3, which, it must be recalled, was created by joining two reasonably
well-fitting models. Splitting the reading comprehension measures into two fac-
tors yields an ECVI of 1.93, an AIC of 342.43, and a CAIC of 643.52. Statistics for
the saturated model are unchanged as these are dependent only on the data.
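The nested-model comparison just described can be verified directly from the tabled fit statistics; a short check (using only the chi-square difference and degrees of freedom, not the raw data) follows:

```python
from scipy.stats import chi2

# Nested comparison of Model 3 (one Reading Comprehension factor) against
# Model 4 (separate WLPB-PC and DARC factors), using the fit statistics
# reported in Table 3.
chisq_diff = 246.26 - 202.47   # 43.79
df_diff = 125 - 118            # 7
p = chi2.sf(chisq_diff, df_diff)
print(f"delta chi-square = {chisq_diff:.2f}, df = {df_diff}, p = {p:.1e}")
# p is far below .0001, consistent with the significance reported above.
```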
TABLE 4
Estimated Factor Correlations From Models 3 and 4

Factor                                      1     2     3     4     5     6     7     8     9    10

1.  Reading Comprehension (Model 3)                    .73   .66   .99   .66   .61   .48   .49
2.  WLPB–Passage Comprehension (Model 4)
3.  DARC Reading Comprehension (Model 4)   .61
4.  Decoding                               .68   .43         .35   .60   .54   .64   .33   .39
5.  Narrative Language                     .54   .64   .35         .72   .46   .51   .38   .24
6.  Standardized Language                  .86   .77   .60   .72         .66   .49   .54   .38
7.  Phonological Awareness                 .58   .48   .54   .46   .66         .43   .59   .41
8.  Fluency                                .55   .41   .64   .51   .49   .43         .31   .33
9.  Memory                                 .38   .52   .33   .38   .55   .59   .31         .35
10. Nonverbal IQ                           .42   .40   .39   .24   .38   .41   .33   .35

Note. Correlations from Model 3 are given above the diagonal; correlations from Model 4 are given
below the diagonal. Model 4 splits the Reading Comprehension factor of Model 3 into two factors: one
for the Woodcock Language Proficiency Battery–Revised (WLPB) Passage Comprehension measure
and one for the four measures from the Diagnostic Assessment of Reading Comprehension (DARC).
Note that correlations among the seven predictor factors are unchanged.

Table 4 presents factor correlations from Models 3 (above the diagonal) and 4
(below the diagonal). The correlations among the factors measured by the predic-
tors are unchanged in the two models. The correlations in Table 4 show that the
Reading Comprehension factor of Model 3 is indistinguishable from the Standard-
ized Language factor (r = .99) and is highly correlated with Decoding (r = .73; see
row 1 of Table 4). In contrast, when the Reading Comprehension factor is split into
two factors in Model 4, the two factors show somewhat different relations to the set
of predictors (see the columns labeled 2 and 3 in Table 4). In particular, the PC fac-
tor is more highly correlated than the DARC factor with Decoding (r = .68 vs. .43),
Phonological Awareness (r = .58 vs. .48), Fluency (r = .55 vs. .41), and the Stan-
dardized Language factor (r = .86 vs. .77), whereas the DARC factor is more highly
correlated than the PC factor with the Narrative Language Production factor (r =
.64 vs. .54) and the Memory factor (r = .52 vs. .38). Both factors correlate about
equally with Nonverbal IQ. This differential pattern of correlations cannot be ex-
plained as a simple difference in the reliabilities of the PC and DARC factors. If
one factor were simply measured more reliably than the other, then we would ob-
serve the same pattern of correlations but a difference in magnitude. Instead, this
differential pattern of relations indicates that the factors potentially tap somewhat
different aspects of the Reading Comprehension construct.
To understand how these predictor factors might account for variation in PC and
DARC performance, we reparameterized Model 4 so that the predictor factors
were allowed to freely intercorrelate, but the relations between the seven predictor
factors and the two reading comprehension factors were accounted for through re-
gression of the two Reading Comprehension factors on the seven predictor factors.
Allowing the two Reading Comprehension factors to have correlated disturbances
yields a model with the same fit, albeit with different parameters. See the line for
Model 5–M4R in Table 3. In this alternate parameterization of Model 4, it is possi-
ble to examine the unique contribution of the seven predictor factors to each of the
two Reading Comprehension factors. In reducing Model 5, we constrained to 0.0
the direct regression of the Reading Comprehension factors on those predictor fac-
tors that did not make a statistically significant contribution to them. The final re-
duced set of factor-on-factor regression coefficients is presented in Table 5. The fit
statistics in Table 3 show that this reduced set of factor-on-factor regressions fits
the data almost as well as Model 5 (i.e., Model 4), which provides the upper limit
on how well the reduced model can fit the data. The difference in chi-square statis-
tics between the two models is not statistically significant, and all remaining fit sta-
tistics are essentially equivalent between the two models. The coefficients in Table
5 show that the Decoding and Fluency factors contribute only to the Reading Com-
prehension factor measured by WLPB–PC and do not contribute to the DARC fac-
tor once the two language and nonverbal reasoning factors have been accounted
for. Furthermore, the two language factors (viz., the Standardized Language factor
and the Narrative Language Production factor) make relatively equal contributions
in predicting the DARC factor, but these two language factors have opposite signs
in predicting the PC factor. Similarly, the Memory factor has a negatively signed
coefficient in predicting the PC factor. Insofar as none of the correlations among
the factors were negative, these sign reversals in the coefficients should be inter-
preted with some caution given the relatively small sample size available for this
study (n = 178).

TABLE 5
Estimated Regression Coefficients for Predicting Reading Comprehension Factors
From Predictors in Model 4

Comprehension                        Narrative   Standardized                          Nonverbal
Factor      Statistic   Decoding   Language     Language      PA   Fluency   Memory      IQ

WLPB–PCa    β             0.15      –0.31         1.09              0.13     –0.20      0.08
            SE            0.07       0.16         0.15              0.06      0.10      0.05
            t             2.09      –1.98         7.49              2.16     –1.93      1.66
DARCb       β                        0.17         0.29                                  0.06
            SE                       0.10         0.08                                  0.04
            t                        1.61         3.45                                  1.69

Note. Coefficients left blank are constrained to be 0 in the model. PA = Phonological Awareness; WLPB–PC
= Woodcock Language Proficiency Battery–Passage Comprehension; DARC = Diagnostic Assessment of Read-
ing Comprehension.
aR2 = .81. bR2 = .62.
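Sign reversals of this kind are the classic signature of statistical suppression, which can arise when correlated predictors divide shared and unique variance in a regression. The toy simulation below (our illustration, unrelated to the study's data) shows a predictor with a clearly positive bivariate correlation with the outcome receiving a negative regression weight once a correlated predictor is included:

```python
import numpy as np

# Toy demonstration of suppression (not the study's data): two highly
# correlated predictors that mix outcome-relevant variance (u) and
# outcome-irrelevant variance (v) in different proportions.
rng = np.random.default_rng(0)
n = 178
u = rng.standard_normal(n)          # variance shared with the outcome
v = rng.standard_normal(n)          # variance irrelevant to the outcome

x1 = u + v
x2 = u + 2 * v
y = 2 * u + 0.1 * rng.standard_normal(n)

# x2 correlates positively with y at the bivariate level ...
print(round(np.corrcoef(x2, y)[0, 1], 2))

# ... yet receives a negative weight once x1 is in the regression,
# because x2 mostly serves to subtract x1's irrelevant variance.
X = np.column_stack([np.ones(n), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(b1, 2), round(b2, 2))   # b1 near +4, b2 near -2
```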
As a final examination of the role of decoding in the PC and DARC factors, we
provide a scatter plot of the DARC total scores (standardized to a mean of 100 and
standard deviation of 15) and the WLPB–PC (see Figure 1). The plotting symbol is
designed to be proportional to the decoding score, which was calculated by averag-
ing the Letter Word scale score and Word Attack scale score from the WLPB. The
plot in Figure 1 shows fairly clearly that higher decoding scores tend to coincide
with higher comprehension scores on both tests, but this pattern is somewhat more
marked for PC. Note the preponderance of larger circles toward the right side of
the figure (e.g., above a score on the horizontal axis of 105 or 110). Contrast that
with the number of smaller circles in the left-hand side of the figure that tend to
range from the bottom of the figure to the top of the figure, that is, they run
throughout the extent of the score range of the DARC. It is certainly the case that
higher decoding scores lead to better performance on the DARC, just as they do on
all measures of reading comprehension, but this tendency is less pronounced as
both Figure 1 and the correlations in Table 4 indicate.

FIGURE 1 Scatterplot of Woodcock Language Proficiency Battery–Revised (WLPB–R)
Passage Comprehension and Diagnostic Assessment of Reading Comprehension (DARC) Total
Reading scores. The plotted symbol is proportional in size to the score on WLPB Decoding,
formed by averaging standard scores for WLPB Letter Word and WLPB Word Attack. DARC
Total Reading has been standardized to a mean of 100 and standard deviation of 15 prior to
plotting.

DISCUSSION

The results of these analyses show striking differences in the relations of various
predictors to outcomes on the two criterion measures of reading comprehension.
First, WLPB–PC is much more strongly related to print skills than is DARC per-
formance—confirming that we have achieved some degree of success in design-
ing the DARC to minimize the effects of variation in word reading ability on
variation in DARC scores. Particularly striking is that print-related skills of de-
coding and fluency make significant unique contributions to the prediction of
WLPB–PC but not to the DARC once contributions from language and reason-
ing factors have been accounted for. That the factor regressions of the restricted
version of Model 5 fully account for the relation between the two Reading Com-
prehension factors suggests that the basis of their relation is in language and not
in print-level skills. Note that these findings do not mean that the DARC is unre-
lated to print-level skills. Indeed, the correlations between the DARC factor and the
print-level skills factors of Model 4 are moderately large, namely, .43 and .41 for
the Decoding and Fluency factors, respectively. However, these correlations do
indicate that we have achieved a measure of success in reducing the role of print
skills in the DARC measure of comprehension while increasing somewhat the
role of verbal processing.
Second, oral language skills are relatively much more important in explaining
variance in the DARC than in the WLPB–PC outcomes. Although the absolute
level of variance explained on the DARC (R2 = .62) is much lower (R2 = .81 for the
WLPB–PC), reflecting its lower reliability (which places an upper limit on pre-
dictability), it is clear that DARC performance is not so overdetermined by print
skills that the visible contribution of other domains is restricted.
Third, the narrative production measures show a possibly negative relation to
performance on the WLPB–PC assessment in the restricted factor regression
model (reduced version of Model 5) but a significant and positive relation to
DARC performance. In particular, overall narrative skill reflected in the Narrative
Language factor (the SI, MLUW, and NDW used in producing the oral narrative;
measures of its length and sophistication; and the WPM, a measure of oral fluency)
was uniquely related to DARC performance.
These results help to establish the value of the DARC as a measure of reading
comprehension on which performance is determined by abilities we might think of
as central to comprehension itself—memory for the text read, integration of new
information with information stored in memory, making connections across those
sources of knowledge, and structuring the integrated information into narrative
forms that might promote longer term retention. In short, the DARC, like the narra-
tive production task, places a premium on verbal processing of information and not
simply on verbal knowledge as reflected in measures of vocabulary. Although
these aspects of verbal skill are certainly related, they are distinct, and it stands to
reason that their contributions to the processing of text can be differentiated with
an appropriately crafted measure of comprehension. We certainly do not underes-
timate the importance of word-reading skills as a determinant of comprehension
success; we do suggest, though, that comprehension measures that reflect other
domains are important in guiding instruction and in informing our understanding
about the complexity of achieving successful comprehension of a text.
The analyses reported here made use of latent variable models to explicate pos-
sible psychometric properties, such as convergent and discriminant validity, of two
sets of reading comprehension measures, one a popular and heavily researched test
and the other a novel test with roots in experimental psychology. That preliminary
models showed the two sets of measures to converge on a single construct high-
lights the importance of explicit testing of psychometric models in both simple and
complex contexts. Other models certainly could have been considered and tested
as potential explanations of the relations among the 19 measures studied here. The
extent to which these findings will hold up on replication remains to be deter-
mined. Although there is reason for optimism given the consistency of these find-
ings with other research on the WLPB–PC (Francis, Fletcher, et al., 2005) and with
the theory underlying the development of the DARC, the sample size used in this
study is small by latent variable modeling standards, and the models are relatively
complex. Thus, caution in interpreting the results is advised, especially those in-
volving the relative contributions of different predictors to the two comprehension
factors.

Limitations
Of course, the study reported here is subject to many limitations. First, the version
of the DARC used was a preliminary one, and the test itself falls short of desired
levels of reliability. This limits the amount of variance that can be explained and
contrasts with the much more reliable subtests of the WLPB.
Second, we report data on a particular group of third-grade ELLs, all of whom
received literacy instruction in a pair of school districts in Texas. It is thus impossi-
ble to estimate the degree to which the results reported here would generalize to a
more heterogeneous sample of readers.
Third, although the DARC was designed to minimize the impact on perfor-
mance of vocabulary knowledge, we have evidently not fully achieved this goal,
despite the very limited lexical range in the DARC passages. Performance on the
DARC shows a significant correlation to language factors, and future versions of
the DARC will have to be refined to control and/or manipulate this relationship
more precisely. Of course, whether the relations with language currently shared by
the DARC reflect knowledge of semantics, such as vocabulary knowledge, or ver-
bal reasoning skills that can be differentiated from knowledge of semantics, awaits
further research. This research will likely require greater refinement of the lan-
guage constructs in our models as well as the ability to more precisely measure the
specific processing demands of the comprehension measures. We neither included
models with a general language factor nor attempted to isolate possible methods
factors in the models investigated here. Models with general language factors have
been found in prior research using standardized language measures (Fletcher et al.,
1996) but have not been applied to narrative production and standardized language
measures in ELL populations, to our knowledge. Similarly, Mehta, Foorman,
Branum-Martin, and Taylor (2005) found a single latent factor to account for vari-
ability across language and literacy measures, although their study used only a
very limited set of language measures. It is clear from the current analyses that a
single language factor could not account for the covariances among the language
measures in this sample, as evidenced by the magnitude of the correlation between
the two language factors in all of the models (2–5). However, it is possible that a
general language factor with specific factors designed to capture method variance,
or possibly more specific aspects of language processing, could provide a better fit
to the data than the current models. The extent to which the conclusions reached
about the DARC and PC Reading Comprehension factors would hold up under
competing models for the language factors and other predictors must await future
research.
Fourth, although we would like to argue for the value of assessing reading com-
prehension using measures that minimize the impact of print skills and of vocabu-
lary knowledge for all students, in fact we have so far tested the DARC only with
second-language speakers of English, and its value with monolingual English
readers remains undemonstrated. In making this claim, it is important to keep in
mind what is meant by minimizing print skills. Although we do not deny the im-
portance of print skills in comprehension generally, in assessing comprehension
there is some value to isolating the relative roles of print skills from the various
forms of cognitive processing that take place during comprehension of complex
text so that assessment can effectively be used to guide instruction. So long as per-
formance on comprehension assessments is determined by many factors whose
relative contributions are undifferentiated in the scores obtained on the assess-
ments, the goal of using assessment to guide instruction will remain elusive. We
have proposed here one approach to isolating the contributions of various impor-
tant components to reading comprehension. Specifically, we have shown that it
may be possible to constrain the decoding and vocabulary demands of the text
while increasing the processing demands of understanding the text. At the same
time, we have shown how latent variable models can play an important role in eval-
uating the success of our efforts to isolate and measure these important processes.
Although reading comprehension in its natural state is dependent on many skills
and abilities, its assessment may be better served by measures like the DARC that
attempt to isolate these processes from one another for the purpose of diagnosis
and guiding instruction.

ACKNOWLEDGMENTS

This research was supported in part by grants HD39521, “Oracy/Literacy Devel-
opment of Spanish-speaking Children” and R305U010001, “Biological and Be-
havioral Variation in the Language Development of Spanish-speaking Children”,
both of which were jointly funded by the National Institute of Child Health and
Human Development and the Institute of Education Sciences. The findings and
conclusions reported herein are those of the authors and do not necessarily reflect
the views of the agencies or the federal government, either express or implied.

REFERENCES

Adams, M. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT Press.
August, D., Francis, D., Hsu, H.-Y., & Snow, C. (in press). Assessing reading comprehension in
bilinguals. Elementary School Journal.
Bradley, L., & Bryant, P. E. (1983). Categorizing sounds and learning to read—A causal connection.
Nature, 301, 419–421.
Dixon, P., LeFevre, J. A., & Twilley, L. C. (1988). Word knowledge and working memory as predictors
of reading skill. Journal of Educational Psychology, 80, 465–472.
Engle, R. W., Nations, J. K., & Cantor, J. (1990). Is “working memory capacity” just another name for
word knowledge? Journal of Educational Psychology, 82, 799–804.
Fletcher, J. M., Stuebing, K. K., Shaywitz, B. A., Brandt, M. E., Francis, D. J., & Shaywitz, S. E.
(1996). Measurement issues in the interpretation of behavior–brain relationships. In R. Thatcher, N.
Krasnegor, & G. R. Lyon (Eds.), Developmental neuroimaging: Mapping the development of brain
and behavior (pp. 255–262). New York: Academic.
Francis, D., Carlson, C., Fletcher, J., Foorman, B., Goldenberg, C., Vaughn, S., et al. (2005). Oracy/lit-
eracy development of Spanish-speaking children: A multi-level program of research on language mi-
nority children and the instruction, school and community contexts, and interventions that influence
their academic outcomes. Perspectives, pp. 8–12.
Francis, D. J., Fletcher, J. M., Catts, H., & Tomblin, B. (2005). Dimensions affecting the assessment of
reading comprehension. In S. G. Paris & S. A. Stahl (Eds.), Children’s reading comprehension and
assessment (pp. 369–394). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Gathercole, S. E., & Pickering, S. J. (2000). Working memory deficits in children with low achieve-
ments in the National Curriculum at seven years. British Journal of Educational Psychology, 70,
177–194.
Gough, P. B., & Tunmer, W. E. (1986). Decoding, reading, and reading disability. Remedial and Special
Education, 7(1), 6–10.
Haenggi, D., & Perfetti, C. A. (1994). Processing components of college-level reading comprehen-
sion. Discourse Processes, 17, 83–104.
Hannon, B., & Daneman, M. (2001). A new tool for understanding individual differences in the compo-
nent processes of reading comprehension. Journal of Educational Psychology, 93, 103–128.
Hulme, C., Muter, V., Snowling, M., & Stevenson, J. (2004). Phonemes, rimes, vocabulary, and gram-
matical skills as foundations of early reading development: Evidence from a longitudinal study. De-
velopmental Psychology, 40, 665–681.
Krippendorff, K. (1980). Content analysis: An introduction to its methodology. Beverly Hills, CA:
Sage.
Lesaux, N., & Siegel, L. (2003). The development of reading in children who speak English as a second
language. Developmental Psychology, 39, 1005–1019.
Loban, W. (1976). Language development: Kindergarten through grade twelve (Research Rep. No.
18). Urbana, IL: National Council of Teachers of English.
Mehta, P. D., Foorman, B. R., Branum-Martin, L., & Taylor, P. W. (2005). Literacy as a unidimensional
multilevel construct: Validation, sources of influence, and implications in a longitudinal study in
grades 1 to 4. Scientific Studies of Reading, 9, 85–116.
Miller, J., & Iglesias, A. (2003). Systematic analysis of English and Spanish language transcripts.
Madison, WI.
Miller, J. F., Iglesias, A., Heilmann, J., Fabiano, L., Nockerts, A., & Francis, D. (2006). Oral language
and reading in bilingual children. Learning Disabilities Research & Practice, 21, 30–43.
Palmer, J., MacLeod, C. M., Hunt, E., & Davidson, J. E. (1985). Information processing correlates of
reading. Journal of Memory and Language, 24, 59–88.
Perfetti, C. A. (1985). Reading ability. New York: Oxford University Press.
Potts, G. R., & Peterson, S. B. (1985). Incorporation versus compartmentalization in memory for dis-
course. Journal of Memory and Language, 24, 107–118.
RAND Reading Study Group. (2002). Reading for understanding: Toward an R&D program in reading
comprehension. Washington, DC: RAND Education.
Raven, J., Raven, J. C., & Court, J. H. (1998). Coloured Progressive Matrices 1998 edition. Oxford,
England: Oxford Psychologists Press.
Schatschneider, C., Francis, D. J., Foorman, B. R., Fletcher, J. M., & Mehta, P. (1999). The
dimensionality of phonological awareness: An application of item response theory. Journal of Edu-
cational Psychology, 91, 439–449.
Tabors, P. O., Snow, C. E., & Dickinson, D. K. (2001). Homes and schools together: Supporting lan-
guage and literacy development. In D. K. Dickinson & P. O. Tabors (Eds.), Beginning literacy with
language (pp. 313–334). Baltimore: Brookes.
Torgesen, J. K., Wagner, R. K., & Rashotte, C. A. (1999). Test of Word Reading Efficiency. Austin, TX:
PRO-ED.
Vellutino, F. R. (1979). Dyslexia: Theory and research. Cambridge, MA: MIT Press.
Vellutino, F. R. (1987, March). Dyslexia. Scientific American, 34–41.
Wagner, R., Torgesen, J., & Rashotte, C. (1999). Comprehensive Test of Phonological Processing. Aus-
tin, TX: PRO-ED.
Woodcock, R. W. (1991). Woodcock Language Proficiency Battery–Revised (English form). Chicago:
Riverside.
Woodcock, R. W., & Muñoz-Sandoval, A. F. (1995). Woodcock Language Proficiency Battery–Revised
(Spanish form). Chicago: Riverside.
