Catherine E. Snow
Harvard University
Diane August
Center for Applied Linguistics
Washington, DC
Coleen D. Carlson
University of Houston
Jon Miller
University of Wisconsin-Madison
Aquiles Iglesias
Temple University
Correspondence should be sent to David J. Francis, Texas Institute for Measurement, Evaluation,
and Statistics, 100 TLCC Annex, University of Houston, Houston, TX 77204–6022. E-mail:
dfrancis@uh.edu
302 FRANCIS ET AL.
lidity and differential determinants of the 2 measures. Findings indicated that the 2
measures are related (r = .61) but distinct, and influenced by different factors. The
DARC is less strongly related to word-level skills and more strongly related to mea-
sures of narrative language production and memory. Both tests are equally influ-
enced by measures of nonverbal reasoning. These differential patterns of relations,
which cannot be explained on the basis of differential reliabilities, reflect true differ-
ences in the processing demands of the tests for 3rd-grade ELLs.
THE DARC
The DARC was designed on the basis of previous test-development work by Potts
and Peterson (1985) and Hannon and Daneman (2001). Potts and Peterson’s test
isolated four processes hypothesized to occur during successful reading
comprehension (Dixon et al., 1988; Engle et al., 1990; Haenggi & Perfetti, 1994; Palmer et
al., 1985): (a) recalling from memory new information presented in the text, which
we call text memory; (b) making novel inferences based on information provided
in the text, called text inferencing; (c) accessing relevant prior knowledge from
long-term memory, called knowledge access; and (d) integrating accessed prior
knowledge with new text information, called knowledge integration. Potts and Pe-
terson validated their test by showing predictive relationships from total scores to
performance on a general measure of reading comprehension and from scores re-
flecting the four components to other, independent tests of those components.
The Potts and Peterson (1985) assessment used reading passages consisting of
three sentences that described relations among a set of real and artificial terms, for
example, “A JAL is larger than a TOC,” “A TOC is larger than a PONY,” and “A
BEAVER is larger than a CAZ.” Combining the information in the text with world
knowledge would in principle allow the construction of a five-item linear ordering
(JAL > TOC > PONY > BEAVER > CAZ). Participants read and studied the para-
graph and then responded to true–false statements of four types. Text memory
statements (e.g., “A JAL is larger than a TOC”) tested information explicitly men-
tioned in the paragraph. Text inferencing statements (e.g., “A JAL is larger than a
PONY”) required integrating information across propositions in the text (i.e., “A
JAL is larger than a TOC”; “A TOC is larger than a PONY”); no prior knowledge
was required. Knowledge access statements (e.g., “A PONY is larger than a
BEAVER”) could be answered by accessing prior knowledge; no information
from the text was required. Knowledge integration statements (e.g., “A TOC is
larger than a BEAVER”) required integrating prior knowledge (ponies are larger
than beavers) with a text-based fact (i.e., “A TOC is larger than a PONY”).
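The four statement types can be illustrated with a small sketch. It assumes the three text sentences and the single world-knowledge fact from the example above; the function names and the graph-search approach are illustrative only and are not part of the Potts and Peterson procedure. It handles true statements only.

```python
from collections import defaultdict

# "X is larger than Y" relations stated in the example passage.
TEXT_FACTS = [("JAL", "TOC"), ("TOC", "PONY"), ("BEAVER", "CAZ")]
# Prior knowledge about real animals (not stated in the passage).
WORLD_FACTS = [("PONY", "BEAVER")]


def larger_than(facts, a, b):
    """Return True if 'a is larger than b' follows from facts by transitivity."""
    graph = defaultdict(set)
    for big, small in facts:
        graph[big].add(small)
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def classify(statement):
    """Label a true 'a is larger than b' statement with its item type."""
    a, b = statement
    if statement in TEXT_FACTS:
        return "text memory"            # stated explicitly in the text
    if larger_than(TEXT_FACTS, a, b):
        return "text inferencing"       # derivable from the text alone
    if larger_than(WORLD_FACTS, a, b):
        return "knowledge access"       # derivable from prior knowledge alone
    if larger_than(TEXT_FACTS + WORLD_FACTS, a, b):
        return "knowledge integration"  # needs text and prior knowledge
    return "not derivable"


for item in [("JAL", "TOC"), ("JAL", "PONY"), ("PONY", "BEAVER"), ("TOC", "BEAVER")]:
    print(item, "->", classify(item))
```

The order of the checks matters: a statement is assigned to the least demanding source of information that suffices to verify it.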
Potts and Peterson (1985) found that knowledge integration correlated with the
two text-based constructs—text memory and text inferencing—as well as with
knowledge access. However, knowledge access was not strongly correlated with the
text-based constructs, suggesting that the ability to remember new information and
the tendency to use world knowledge are separable. Hannon and Daneman (2001)
confirmed the conclusions from Potts and Peterson’s work, using a version of the test
that had more complex texts for use with university students. Both the correlations of
total score with a global, standardized test of reading comprehension ability (the
Nelson–Denny test of reading comprehension) and the correlation of individual
construct scores with specific tests of those constructs proved reliable.
August et al. (in press) built on these studies by piloting the DARC, a test de-
signed specifically to minimize the impact of word reading accuracy or speed and
vocabulary on comprehension. Their goal was to evaluate among ELLs the feasi-
bility and utility of a reading assessment in which the passages used simple, regu-
lar, high-frequency words and in which the impact of variation in background
knowledge was minimized by limiting topics to very familiar ones (e.g., pets, bicy-
cles) and by introducing nonce words for novel concepts. Items were constructed
as true–false statements referring to information presented in familiar narra-
tive-style passages, like the following:
Nan has four pets. One pet is a cat. Nan’s cat is fast. Nan has a pet culp. Nan’s
pet culp is like her cat. But Nan’s pet culp is faster than her cat.
August et al. (in press) demonstrated with three different sets of pilot partici-
pants that the DARC is feasible for use with children as young as kindergarteners,
that simple yes–no responses were adequate to reflect children’s comprehension
processing, and that different aspects of the comprehension process (text memory,
text inferencing, background knowledge, and knowledge integration) could be
measured independently. Crucially, the pilot results showed wide variation in per-
formance among ELLs who all scored low on a general comprehension measure.
Some children who scored poorly on measures with a higher vocabulary load and
greater syntactic complexity, such as the Stanford–9 or the WLPB–PC measures,
performed well on the DARC.
In this article, we explore further the differential functioning of the DARC and
the more widely used WLPB–PC, by contrasting how print-related, language-re-
lated, and narrative skills predict outcomes on these two measures within one
group of Latino third-grade ELLs. The WLPB–PC is typically strongly affected by
print-related skills; we ask whether the DARC reflects its design by showing a
weaker relation. On the other hand, the relation of both WLPB–PC and DARC per-
formance to oral language measures might be expected to be strong (Hulme et al.,
2004), given the central importance of language processing in reading comprehen-
sion. Skill in producing narratives has been shown to relate to word reading in La-
tino ELLs (Miller et al., 2006), but the relation of narrative production skill to com-
prehension measures remains open to speculation. It is possible that the task
MEASURES OF READING COMPREHENSION 305
METHOD
Participants
The sample comprised 192 third-grade Spanish-speaking ELLs in 33 transitional
bilingual education classrooms in nine schools in two different Texas school dis-
tricts. The two districts were demographically distinct: a large, densely populated
metropolitan area in southeastern Texas and a semiurban area in the Rio Grande
Valley. Approximately 65% (n = 125) of the sample came from the latter site. Cri-
teria for including schools in the sample were that more than 40% of the school
population were Latino, that at least 30% of the kindergarteners were considered
limited English proficient, that the schools were performing adequately on their
state accountability assessments, and that they were implementing a transitional
bilingual education model.
The final sample was evenly divided between boys (n = 94) and girls (n = 92),
with 6 cases missing information on gender. This sample is derived from a larger
study of kindergarten to Grade 3 students focused on developing and validating as-
sessments for use with Spanish-speaking ELLs (Francis, Carlson, et al., 2005).
The total sample consisted of 1,644 students across kindergarten to Grade 3, of
which a random sample participated in testing with the DARC and the oral narra-
tive procedure. All students included in analyses described were in Grade 3 and
had completed the DARC in English. Of the 401 students in Grade 3 in the larger
sample, 214 were given the opportunity to complete the DARC in English and in
Spanish, and 198 completed it in English. Most of these students (n = 192) also
completed an oral narrative production task in English (Miller et al., 2006). The re-
sulting sample of 192 students performed better as a group on standardized mea-
sures of English language than the remainder of the Grade 3 sample. However, as
can be seen in Table 1, they tended to perform well below normative expectations
as a group, with means on standardized language measures ranging from 70 to 81.
Measures
Students were administered a battery of language and literacy assessments in both
Spanish and English in sessions separated by about 2 weeks. We focus here only on
the English assessments. The total testing time was about 3 hr. If students were
unable to complete testing in a single sitting, additional sessions were allowed. The
battery was designed to measure key skills related to the development of literacy and
oral language proficiency: decoding accuracy and fluency, phonological awareness,
vocabulary, syntax, listening comprehension, and reading comprehension.
All tests were administered using standard administration procedures as pre-
scribed by the test developer/publisher where such procedures were available, with
the following exceptions. To increase students’ chances for completing assess-
ments in English, we first provided instructions and example items in English ac-
cording to standard administration guidelines. However, if students were unable to
complete the practice items in English, the examiner administered instructions in
Spanish and then repeated the practice items. Students still unable to complete the
practice items in English then were given them in Spanish. If the student was still
unable to complete the practice items, then testing was discontinued for that
subtest. However, if the student was able to complete the practice items in Spanish,
the English practice items were readministered. If the student was successful on
the English practice items, testing continued in English. If the student did not com-
plete the practice items in English, then testing was discontinued for that subtest.
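The practice-item procedure described above can be summarized as a short decision function. This is a sketch of the sequence as we read it from the description; the function name, the return labels, and the representation of each attempt as a Boolean are hypothetical conveniences.

```python
def practice_decision(outcomes):
    """Return the decision for one subtest.

    outcomes lists, in order, whether the student completed each practice
    attempt that was actually given (True or False).
    """
    attempts = iter(outcomes)
    # Attempt 1: English instructions, English practice items.
    if next(attempts):
        return "test in English"
    # Attempt 2: Spanish instructions, English practice items repeated.
    if next(attempts):
        return "test in English"
    # Attempt 3: practice items given in Spanish.
    if not next(attempts):
        return "discontinue subtest"
    # Attempt 4: English practice items readministered.
    if next(attempts):
        return "test in English"
    return "discontinue subtest"


print(practice_decision([False, False, True, True]))
```

In every branch, testing proceeds only after the student has succeeded on the English practice items; Spanish is used solely to clarify the task.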
.14 to .79 (Mdn r = .44), with the smallest correlations involving the First Sound
Comparison (M = 9.5, with a maximum of 10).
The Memory for Sentences subtest targets semantics and syntax. The examinee
is asked to repeat precisely what is said by the examiner (Items 1–15) or presented
on audiotape (Items 16–32); items go up to sentences of roughly 20 words and
multiple clauses. Single-word items receive 1 point if correct, and multiword items
are scored 0, 1, or 2 points (2 points for exact reproduction). In the current sample,
internal consistency reliability was estimated at α = .74.
The Picture Vocabulary subtest of the WLPB begins with multiple-choice items
on which the student points to the picture that matches a vocabulary word provided
orally by the examiner. At Item 8, the test becomes a confrontation naming test,
that is, one where children are shown a picture and asked to provide a word that
describes the picture or a targeted subpart of the picture.
On the Verbal Analogies subtest of the WLPB, the examinee is required to com-
plete items of the form “A is to B, as C is to … .” Internal consistency reliability
was estimated at α = .81 for both the Verbal Analogies and Picture Vocabulary
subtests in the current sample.
RESULTS
Means, standard deviations, and minima and maxima for each measure are pro-
vided in Table 1. The table presents the reading comprehension measures first, fol-
TABLE 1
Descriptive Statistics for English Assessments With Grade 3
English Language Learner Students
Note. Min = minimum; Max = maximum; DARC = Diagnostic Assessment of Reading
Comprehension; WLPB = Woodcock Language Proficiency Battery–Revised; CTOPP = Comprehensive Test of
Phonological Processing.
lowed by WLPB measures of word reading, the narrative language measures, the
WLPB language measures, phonological awareness, word reading efficiency,
memory, and nonverbal reasoning (Raven’s Colored Progressive Matrices; Raven,
Raven, & Court, 1998).
Table 1 shows that 178 cases (93%) have complete data. We standardized each
of the subtest scores for the two DARC forms to perform common analyses. Be-
cause the DARC is an experimental measure, scores reported in Table 1 have lim-
ited value in indicating the level of sample performance on comprehension. How-
ever, the means in Table 1 show relatively good performance on word reading
skills and the WLPB reading comprehension measure. The standardized language
measures (Picture Vocabulary, Verbal Analogies, Listening Comprehension, and
Memory for Sentences) from the WLPB show, however, that these students are
scoring substantially more poorly on language proficiency than they are on the
WLPB measures of reading. In fact, performance on the reading measures is al-
most 1 standard deviation above average, whereas the measures of language profi-
ciency range from 0.5 to almost 2 standard deviations below average. The mea-
sures of narrative production paint a somewhat less bleak picture than the WLPB
language proficiency measures, but caution is imposed by the limited normative
information on this task.
To investigate the discriminant validity of the DARC and WLPB–PC measures,
we fit a series of latent variable models to the data. These models were developed
specifically to investigate the relations among the reading comprehension
measures when examined alone and when examined together with the other measures in Table 1 that are
demonstrated precursors of reading comprehension. Table 2 presents correlations
between each of the predictors and each of the measures of reading comprehen-
sion, including the WLPB–PC, the four subtest scores from the DARC, and the to-
tal DARC score. These bivariate correlations show small to moderate correlations
between the DARC and WLPB–PC. The DARC tends to correlate more highly
with measures of language from the WLPB than with measures of word-level read-
ing skills. However, WLPB–PC (PC) shows a similar pattern in this sample.
To investigate the discriminant validity of the DARC and PC, we used confir-
matory factor analysis to estimate and test a series of latent variable models de-
TABLE 2
Correlations of Language and Literacy Measures to Woodcock Language
Proficiency Battery–Revised (WLPB) Passage Comprehension and
Diagnostic Assessment of Reading Comprehension (DARC)
Measures of Reading Comprehension
Measure    WLPB–PC    DARC: Total    TM    TI    KI    BK
Note. | r | > .15 is statistically significant at p < .05. | r | > .24 is statistically significant at p < .001.
PC = Passage Comprehension; TM = Text Memory; TI = Text Inferencing; KI = Knowledge Integra-
tion; BK = Background Knowledge; CTOPP = Comprehensive Test of Phonological Processing.
signed to test explicit hypotheses about the two sets of measures. This approach to
testing models of discriminant validity was described in greater detail in Francis,
Fletcher, Catts, and Tomblin (2005). In the present context, we fit a series of four
latent variable models:
Fit statistics for the four models are presented in Table 3. Fit statistics for Models 1
and 2 cannot be compared statistically to one another, or to Models 3 and 4, as
these models are not nested. Model 3 is, however, nested in Model 4, and thus these
models can be explicitly compared using the information in Table 3. All models
were fit using data from the subset of 178 cases with complete data.
We first fit a single-factor model (Model 1–RC) to the five reading comprehen-
sion measures (PC, Text Memory, Text Inferencing, Knowledge Integration, and
Background Knowledge) without regard for word-level reading skills (accuracy and
fluency), language proficiency, phonological awareness, memory, or verbal reason-
ing. As shown in Table 3, this model provided an exceptionally good fit to the data.
The chi-square test was not statistically significant, χ²(5, N = 178) = 5.62, p = .35.
Descriptive indices of fit were also strong, including the root mean square error of
approximation of .024 and the standardized root mean square residual (SRMSR) of
.033. Both of these measures indicate a well-fitting model when they fall below .05.
Thus, on balance, the information in Table 3 suggests that the reading comprehen-
sion measures intercorrelate in a way consistent with a single underlying dimension.
We would conclude from Model 1 that PC and the four DARC reading measures re-
flect a single factor of Reading Comprehension.
However, the test of unidimensionality (i.e., single-factoredness) afforded by
Model 1 is relatively low powered because of the limited number of measures in-
cluded in the model. Expanding the set of measures in the model increases the
power of the model to discriminate between PC and the measures of the DARC. To
introduce the other measures of Table 1 into the model in a meaningful way, we
first fit a series of models that examined only those measures. We began with a
model that included factors for Decoding (Letter Word scale score and Word At-
tack scale score from the WLPB), Narrative Language Production (MLUW, SI,
NDW, and WPM from the retelling), Standardized Language Proficiency (Verbal
Analogies, Listening Comprehension, Picture Vocabulary, and Memory for Sen-
tences from the WLPB), Phonological Awareness (PA; the CTOPP total), Fluency
TABLE 3
Fit Statistics for Latent Variable Models of Comprehension
Note. Model 1–RC: reading comprehension measures only; single-factor model with WLPB–PC
and four DARC indicators (TM, TI, KI, and BK). Model 2–PR: predictors only; factors are Decoding
(DE), Narrative Language (NL), Standardized Language (SL), Phonological Awareness (PA), Fluency
(FL), Memory (ME), and Nonverbal IQ (NV). PA, FL, and NV are measured by single indicators in all
models. ME is measured by WLPB MS and CTOPP MD. DE is measured by LW and WA from the
WLPB; NL is measured by all narrative measures (MLUW, NDW, SI, WPM), SL is measured by all
WLPB language measures (PV, VA, LC, and MS). These relations do not change across subsequent
models. Model 3–RCPR: combines models RC and PR; single Reading Comprehension factor. All
factors are allowed to correlate freely. Model 4–2RCPR: identical to Model 3 but splits the Reading
Comprehension factor into two factors, one measured only by WLPB PC and the other measured by the
four DARC measures (TM, TI, KI, and BK). Models 3 and 4 are nested and can be compared statisti-
cally. Model 5–M4R: reparameterization of Model 4 to account for relations of Reading Comprehen-
sion and Predictor Factors through Factor on Factor regressions. Model 5 is equivalent to Model 4.
Model 5–Restricted: constrains to 0.0 the nonsignificant regressions between the Reading
Comprehension factors and the Predictor Factors. PA did not contribute uniquely to either Reading
Comprehension factor. Regression of the Reading Comprehension factors onto the restricted set of Predictor
Factors (see Table 5) fully accounted for the correlation between the two Reading Comprehension factors.
Model 5 Restricted is nested in Model 5. WLPB = Woodcock Language Proficiency Battery–Revised;
PC = Passage Comprehension; DARC = Diagnostic Assessment of Reading Comprehension; TM =
Text Memory; TI = Text Inferencing; KI = Knowledge Integration; BK = Background Knowledge; MS
= Memory for Sentences; CTOPP = Comprehensive Test of Phonological Processing; MD = Memory
for Digits; LW = Letter Word Scale Score; WA = Word Attack Scale Score; MLUW = Mean Length of
Utterance in Words; NDW = Number of Different Words; SI = Subordination Index; WPM = Words Per
Minute; PV = Picture Vocabulary; VA = Verbal Analogies; LC = Listening Comprehension;
RMSEA = root mean square error of approximation; GFI = goodness-of-fit index; AGFI =
adjusted goodness-of-fit index.
(Word Reading Efficiency), Memory (Memory for Digits from the CTOPP), and
Nonverbal Intelligence (Raven’s Colored Progressive Matrices). We subsequently
revised the model based on information about lack of fit, in particular relying on
the modification indices to introduce two changes. The first change was to allow
for a test-specific correlation between the two narrative production measures SI
and MLUW; the second change was to allow Memory for Sentences to load on the
Memory factor along with Memory for Digits. The final model, the fit statistics for
which are presented in Table 3 under Model 2–PR, yielded a reasonable fit to the
data for the 14 predictor measures in Table 1. The overall chi-square for Model 2 is
statistically significant, χ²(57, N = 178) = 113.62, p < .001, which suggests a lack
of fit of the model to the data, but the other information in Table 3 suggests a rea-
sonably good-fitting model. Other information, not presented in Table 3, also sug-
gests a good-fitting model. Specifically, the expected cross-validation index
(ECVI) for the model of 1.19 was equal to the ECVI for a saturated model, and the
model AIC (Akaike’s Information Criterion) and saturated model AIC were virtu-
ally identical (210.77 and 210.00, respectively), whereas the model CAIC (consis-
tent AIC) of 411.49 was smaller than the saturated model CAIC of 640.09. Thus,
the model appears to do a reasonably good job of describing the relations among
the 14 predictors. It should be noted that the correlation between the two language
factors in the final version of Model 2 was estimated to be .72. Thus, although the
two language factors are highly correlated, the correlation is different from 1.0, in-
dicating that the narrative and standardized measures are tapping somewhat differ-
ent aspects of language functioning.
Model 3–RCPR combined the single Reading Comprehension factor of Model
1 with the seven predictor factors of Model 2. Information about model fit can be
found in Table 3. The goodness-of-fit index has dropped below .90, and the
SRMSR has increased to above .05. In both of the two models that were combined
to produce Model 3, the SRMSR was below .05. The increase in SRMSR is due to
the combination of the comprehension measures and the predictors in the same
model and suggests that the lack of fit is due to the model’s inability to reproduce
the correlations among the comprehension measures and the predictor measures.
In addition, the ECVI for Model 3 was 2.17, just slightly larger than the ECVI for a
saturated model (ECVI = 2.15). AIC for Model 3 was 383.89, compared with
380.00 for a saturated model, whereas CAIC was 655.71 for Model 3, compared
with 1174.54 for a saturated model. Thus, the fit, although not terrible, has deterio-
rated somewhat relative to that of Models 1 and 2.
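The saturated-model benchmarks quoted above can be approximated from the number of observed measures and the sample size alone. The sketch below assumes the commonly used definitions AIC = χ² + 2t, CAIC = χ² + (1 + ln N)t, and ECVI = AIC / (N − 1), where t is the number of free parameters and χ² = 0 for a saturated model. These definitions are our assumption for illustration; the text does not state which formulas the software used. The function names are hypothetical.

```python
import math


def saturated_parameters(n_measures):
    """A saturated covariance model estimates every variance and covariance."""
    return n_measures * (n_measures + 1) // 2


def saturated_aic(n_measures):
    """Assumed definition: AIC = chi-square + 2t, with chi-square = 0."""
    return 2 * saturated_parameters(n_measures)


def saturated_caic(n_measures, n_cases):
    """Assumed definition: CAIC = chi-square + (1 + ln N) * t, with chi-square = 0."""
    return (1 + math.log(n_cases)) * saturated_parameters(n_measures)


def ecvi(aic, n_cases):
    """Assumed definition: ECVI = AIC / (N - 1)."""
    return aic / (n_cases - 1)


# Model 2 uses the 14 predictor measures; Model 3 adds the 5 comprehension measures.
print(saturated_aic(14), saturated_aic(19))
print(round(saturated_caic(19, 178), 2))
print(round(ecvi(210.77, 178), 2), round(ecvi(383.89, 178), 2))
```

Under these assumed definitions, the saturated AIC values for 14 and 19 measures, the saturated CAIC for Model 3, and the ECVI values for Models 2 and 3 agree with the figures reported in the text.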
Model 4–2RCPR explicitly tests the extent to which the lack of fit in Model 3 is
attributable to the fact that the measures of reading comprehension are not
unidimensional. In particular, Model 4 splits the Reading Comprehension factor of
Model 1 into two factors: one for WLPB–PC and one for the four measures of the
DARC. As mentioned previously, Model 3 is nested in Model 4, and thus we can
explicitly test whether Model 4 offers a statistically significant improvement over
Model 3. As can be seen in Table 3, a substantial portion of the lack of fit in Model
3 is attributable to the misspecification of the Reading Comprehension factor. By
treating the two sets of reading comprehension measures as separate factors, the
chi-square statistic drops from 246.26 to 202.47 on df = 7. This difference of 43.79
is statistically significant at p < .0001 and represents roughly 18% of the lack of fit
in Model 3, which, it must be recalled, was created by joining two reasonably
well-fitting models. Splitting the reading comprehension measures into two fac-
tors yields an ECVI of 1.93, an AIC of 342.43, and a CAIC of 643.52. Statistics for
the saturated model are unchanged as these are dependent only on the data.
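The nested-model comparison reported above can be checked with a short calculation. The sketch below uses only the chi-square values and degrees of freedom given in the text; the helper `chi2_sf` is a hypothetical name for a closed-form upper-tail probability that is valid for integer degrees of freedom, written here so that no statistical library is required.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum of x^j / (2j+1)!!
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


chi2_model3, chi2_model4, df_diff = 246.26, 202.47, 7
delta = chi2_model3 - chi2_model4      # improvement in fit from splitting the factor
p_value = chi2_sf(delta, df_diff)      # significance of the improvement
share = delta / chi2_model3            # proportion of Model 3's lack of fit removed

print(f"delta chi-square = {delta:.2f} on {df_diff} df")
print(f"p = {p_value:.1e}, share of Model 3 chi-square = {share:.1%}")
```

The difference test is appropriate here only because Model 3 is nested in Model 4; the same subtraction would not be meaningful for Models 1 and 2.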
TABLE 4
Estimated Factor Correlations From Models 3 and 4
Factor                                       1    2    3    4    5    6    7    8    9   10
1. Reading Comprehension (Model 3)           -    -    -  .73  .66  .99  .66  .61  .48  .49
2. WLPB–Passage Comprehension (Model 4)      -    -    -    -    -    -    -    -    -    -
3. DARC Reading Comprehension (Model 4)      -  .61    -    -    -    -    -    -    -    -
4. Decoding                                  -  .68  .43    -  .35  .60  .54  .64  .33  .39
5. Narrative Language                        -  .54  .64  .35    -  .72  .46  .51  .38  .24
6. Standardized Language                     -  .86  .77  .60  .72    -  .66  .49  .54  .38
7. Phonological Awareness                    -  .58  .48  .54  .46  .66    -  .43  .59  .41
8. Fluency                                   -  .55  .41  .64  .51  .49  .43    -  .31  .33
9. Memory                                    -  .38  .52  .33  .38  .55  .59  .31    -  .35
10. Nonverbal IQ                             -  .42  .40  .39  .24  .38  .41  .33  .35    -
Note. Correlations from Model 3 are given above the diagonal; correlations from Model 4 are given
below the diagonal. Model 4 splits the Reading Comprehension factor of Model 3 into two factors: one
for the Woodcock Language Proficiency Battery–Revised (WLPB) Passage Comprehension
measure and one for the four measures from the Diagnostic Assessment of Reading Comprehension
(DARC). Note that correlations among the seven predictor factors are unchanged.
Table 4 presents factor correlations from Models 3 (above the diagonal) and 4
(below the diagonal). The correlations among the factors measured by the predic-
tors are unchanged in the two models. The correlations in Table 4 show that the
Reading Comprehension factor of Model 3 is indistinguishable from the Standard-
ized Language factor (r = .99) and is highly correlated with Decoding (r = .73; see
row 1 of Table 4). In contrast, when the Reading Comprehension factor is split into
two factors in Model 4, the two factors show somewhat different relations to the set
of predictors (see the columns labeled 2 and 3 in Table 4). In particular, the PC fac-
tor is more highly correlated than the DARC factor with Decoding (r = .68 vs. .43),
Phonological Awareness (r = .58 vs. .48), Fluency (r = .55 vs. .41), and the Stan-
dardized Language factor (r = .86 vs .77), whereas the DARC factor is more highly
correlated than the PC factor with the Narrative Language Production factor (r =
.64 vs. .54) and the Memory factor (r = .52 vs. .38). Both factors correlate about
equally with Nonverbal IQ. This differential pattern of correlations cannot be ex-
plained as a simple difference in the reliabilities of the PC and DARC factors. If
one factor were simply measured more reliably than the other, then we would ob-
serve the same pattern of correlations but a difference in magnitude. Instead, this
differential pattern of relations indicates that the factors potentially tap somewhat
different aspects of the Reading Comprehension construct.
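The reliability argument can be made concrete. Under the classical attenuation formula, an observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities, so a pure reliability difference between PC and DARC would shrink all of one factor's correlations by the same constant. The sketch below, which uses the Model 4 correlations from Table 4, computes the DARC-to-PC ratio for each predictor; the variable names are illustrative.

```python
# Model 4 factor correlations from Table 4: predictor -> (PC factor, DARC factor)
correlations = {
    "Decoding": (.68, .43),
    "Narrative Language": (.54, .64),
    "Standardized Language": (.86, .77),
    "Phonological Awareness": (.58, .48),
    "Fluency": (.55, .41),
    "Memory": (.38, .52),
    "Nonverbal IQ": (.42, .40),
}

# A pure reliability difference would make this ratio the same for every predictor.
ratios = {name: darc / pc for name, (pc, darc) in correlations.items()}

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:24s} DARC/PC = {ratio:.2f}")
```

The ratios fall below 1 for the print-related factors and above 1 for Narrative Language and Memory, which is the pattern a constant scaling cannot produce.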
To understand how these predictor factors might account for variation in PC and
DARC performance, we reparameterized Model 4 so that the predictor factors
were allowed to freely intercorrelate, but the relations between the seven predictor
factors and the two reading comprehension factors were accounted for through re-
gression of the two Reading Comprehension factors on the seven predictor factors.
Allowing the two Reading Comprehension factors to have correlated disturbances
yields a model with the same fit, albeit with different parameters. See the line for
Model 5–M4R in Table 3. In this alternate parameterization of Model 4, it is possi-
ble to examine the unique contribution of the seven predictor factors to each of the
two Reading Comprehension factors. In reducing Model 5, we constrained to 0.0
the direct regression of the Reading Comprehension factors on those predictor fac-
tors that did not make a statistically significant contribution to them. The final re-
duced set of factor-on-factor regression coefficients is presented in Table 5. The fit
statistics in Table 3 show that this reduced set of factor-on-factor regressions fits
the data almost as well as Model 5 (i.e., Model 4), which provides the upper limit
on how well the reduced model can fit the data. The difference in chi-square statis-
tics between the two models is not statistically significant, and all remaining fit sta-
tistics are essentially equivalent between the two models. The coefficients in Table
5 show that the Decoding and Fluency factors contribute only to the Reading Com-
prehension factor measured by WLPB–PC and do not contribute to the DARC fac-
tor once the two language and nonverbal reasoning factors have been accounted
for. Furthermore, the two language factors (viz., the Standardized Language factor
and the Narrative Language Production factor) make relatively equal contributions
in predicting the DARC factor, but these two language factors have opposite signs
in predicting the PC factor. Similarly, the Memory factor has a negatively signed
coefficient in predicting the PC factor. Insofar as none of the correlations among
TABLE 5
Estimated Regression Coefficients for Predicting Reading Comprehension Factors
From Predictors in Model 4
Predictor
Note. Coefficients left blank are constrained to be 0 in the model. PA = Phonological Awareness; WLPB–PC
= Woodcock Language Proficiency Battery–Passage Comprehension; DARC = Diagnostic Assessment of Read-
ing Comprehension.
ᵃR² = .81. ᵇR² = .62.
the factors were negative, these sign reversals in the coefficients should be inter-
preted with some caution given the relatively small sample size available for this
study (n = 178).
As a final examination of the role of decoding in the PC and DARC factors, we
provide a scatter plot of the DARC total scores (standardized to a mean of 100 and
standard deviation of 15) and the WLPB–PC (see Figure 1). The plotting symbol is
designed to be proportional to the decoding score, which was calculated by averag-
ing the Letter Word scale score and Word Attack scale score from the WLPB. The
plot in Figure 1 shows fairly clearly that higher decoding scores tend to coincide
with higher comprehension scores on both tests, but this pattern is somewhat more
marked for PC. Note the preponderance of larger circles toward the right side of
the figure (e.g., above a score on the horizontal axis of 105 or 110). Contrast that
with the number of smaller circles in the left-hand side of the figure that tend to
range from the bottom of the figure to the top of the figure, that is, they run
throughout the extent of the score range of the DARC. It is certainly the case that
higher decoding scores lead to better performance on the DARC, just as they do on
all measures of reading comprehension, but this tendency is less pronounced for the DARC, as
both Figure 1 and the correlations in Table 4 indicate.
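The two score transformations used for the scatter plot are simple to state in code. The sketch below rescales a set of total scores to a mean of 100 and a standard deviation of 15 and averages the two WLPB decoding scale scores. The raw scores shown are hypothetical, and the use of the sample standard deviation is an assumption, because the text does not say which estimator was used.

```python
from statistics import mean, stdev


def rescale(scores, new_mean=100.0, new_sd=15.0):
    """Linearly rescale scores to the requested mean and standard deviation."""
    m, s = mean(scores), stdev(scores)  # sample SD: an assumption
    return [new_mean + new_sd * (x - m) / s for x in scores]


def decoding_composite(letter_word, word_attack):
    """Average of the Letter Word and Word Attack scale scores."""
    return (letter_word + word_attack) / 2


raw_darc_totals = [12, 18, 9, 22, 15]  # hypothetical raw totals, for illustration only
scaled = rescale(raw_darc_totals)
print([round(x, 1) for x in scaled])
print(decoding_composite(110, 100))
```

Putting the DARC total on the same metric as the WLPB standard scores makes the two axes of the plot directly comparable.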
DISCUSSION
The results of these analyses show striking differences in the relations of various
predictors to outcomes on the two criterion measures of reading comprehension.
First, WLPB–PC is much more strongly related to print skills than is DARC per-
formance—confirming that we have achieved some degree of success in design-
ing the DARC to minimize the effects of variation in word reading ability on
variation in DARC scores. Particularly striking is that print-related skills of de-
coding and fluency make significant unique contributions to the prediction of
WLPB–PC but not to the DARC once contributions from language and reason-
ing factors have been accounted for. That the factor regressions of the restricted
version of Model 5 fully account for the relation between the two Reading Com-
prehension factors suggests that the basis of their relation is in language and not
in print-level skills. Note that these findings do not mean that the DARC is
unrelated to print-level skills. Indeed, the correlations between the DARC and
print-level skills factors of Model 4 are moderately large, namely, .43 and .41 for
the Decoding and Fluency factors, respectively. However, these correlations do
indicate that we have achieved a measure of success in reducing the role of print
skills in the DARC measure of comprehension while increasing somewhat the
role of verbal processing.
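The distinction between a predictor's simple correlation with an outcome and its unique contribution can be illustrated with the standard two-predictor formula for R². The following is a minimal Python sketch; the function names and all correlation values are hypothetical, chosen only for illustration, and are not estimates from the models reported here.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 for an outcome regressed on two standardized predictors.

    r_y1, r_y2: correlation of each predictor with the outcome.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly collinear")
    return (r_y1**2 + r_y2**2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12**2)


def unique_contribution(r_y1, r_y2, r_12):
    """Increase in R^2 when predictor 2 is entered after predictor 1."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1**2


# Hypothetical values: predictor 1 (say, a language factor) correlates .60
# with the outcome, predictor 2 (say, a print factor) correlates .30 with
# the outcome, and the two predictors correlate .50 with each other.
delta = unique_contribution(0.60, 0.30, 0.50)
print(f"unique contribution of predictor 2: {abs(delta):.3f}")
```

With these illustrative values the second predictor adds nothing to R² (up to floating-point rounding) even though its simple correlation with the outcome is .30, which is the general pattern described above: a nonzero correlation with a predictor does not imply a unique contribution once correlated predictors are in the model.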
Second, oral language skills are relatively much more important in explaining
variance in the DARC than in the WLPB–PC outcomes. Although the absolute
level of variance explained is much lower for the DARC (R² = .62) than for the
WLPB–PC (R² = .81), reflecting the lower reliability of the DARC (which places an
upper limit on predictability), it is clear that DARC performance is not so
overdetermined by print skills that the visible contribution of other domains is restricted.
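The point that reliability places an upper limit on predictability follows from the classical attenuation relation. The Python sketch below assumes classical test theory (measurement errors uncorrelated with true scores and with each other); the function names and the reliability values are hypothetical and are not the reliabilities of the DARC or the WLPB–PC.

```python
import math


def attenuated_correlation(true_r, rel_x, rel_y):
    """Observed correlation implied by a true-score correlation and two reliabilities."""
    for name, value in (("rel_x", rel_x), ("rel_y", rel_y)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return true_r * math.sqrt(rel_x * rel_y)


def r_squared_ceiling(criterion_reliability):
    """Largest R^2 attainable when error-free predictors perfectly predict the true score."""
    return attenuated_correlation(1.0, 1.0, criterion_reliability) ** 2


# Hypothetical reliabilities, for illustration only.
for rel in (0.70, 0.90):
    print(f"reliability {rel:.2f} -> R^2 ceiling {r_squared_ceiling(rel):.2f}")
```

Under these assumptions the ceiling on R² equals the reliability of the criterion, so a less reliable test will show a lower proportion of explained variance even when its true scores are predicted equally well.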
Third, the narrative production measures show a possibly negative relation to
performance on the WLPB–PC assessment in the restricted factor regression
model (reduced version of Model 5) but a significant and positive relation to
DARC performance. In particular, overall narrative skill reflected in the Narrative
Language factor (the SI, MLUW, and NDW used in producing the oral narrative;
measures of its length and sophistication; and the WPM, a measure of oral fluency)
was uniquely related to DARC performance.
These results help to establish the value of the DARC as a measure of reading
comprehension on which performance is determined by abilities we might think of
as central to comprehension itself—memory for the text read, integration of new
information with information stored in memory, making connections across those
Limitations
Of course, the study reported here is subject to many limitations. First, the version
of the DARC used was a preliminary one, and the test itself falls short of desired
levels of reliability. This limits the amount of variance that can be explained and
contrasts with the much more reliable subtests of the WLPB.
Second, we report data on a particular group of third-grade ELLs, all of whom
received literacy instruction in a pair of school districts in Texas. It is thus impossi-
ble to estimate the degree to which the results reported here would generalize to a
more heterogeneous sample of readers.
Third, although the DARC was designed to minimize the impact on perfor-
mance of vocabulary knowledge, we have evidently not fully achieved this goal,
despite the very limited lexical range in the DARC passages. Performance on the
DARC shows a significant correlation with language factors, and future versions of
the DARC will have to be refined to control and/or manipulate this relationship
more precisely. Of course, whether the relations with language currently shown by
the DARC reflect knowledge of semantics, such as vocabulary knowledge, or verbal
reasoning skills that can be differentiated from knowledge of semantics awaits
further research. This research will likely require greater refinement of the lan-
guage constructs in our models as well as the ability to more precisely measure the
specific processing demands of the comprehension measures. We neither included
models with a general language factor nor attempted to isolate possible methods
factors in the models investigated here. Models with general language factors have
been found in prior research using standardized language measures (Fletcher et al.,
1996) but have not been applied to narrative production and standardized language
measures in ELL populations, to our knowledge. Similarly, Mehta, Foorman,
Branum-Martin, and Taylor (2005) found a single latent factor to account for vari-
ability across language and literacy measures, although their study used only a
very limited set of language measures. It is clear from the current analyses that a
single language factor could not account for the covariances among the language
measures in this sample, as evidenced by the magnitude of the correlation between
the two language factors in all of the models (2–5). However, it is possible that a
general language factor with specific factors designed to capture method variance,
or possibly more specific aspects of language processing, could provide a better fit
to the data than the current models. The extent to which the conclusions reached
about the DARC and PC Reading Comprehension factors would hold up under
competing models for the language factors and other predictors must await future
research.
Fourth, although we would like to argue for the value of assessing reading com-
prehension using measures that minimize the impact of print skills and of vocabu-
lary knowledge for all students, in fact we have so far tested the DARC only with
second-language speakers of English, and its value with monolingual English
readers remains undemonstrated. In making this claim, it is important to keep in
mind what is meant by minimizing print skills. Although we do not deny the im-
portance of print skills in comprehension generally, in assessing comprehension
there is some value to isolating the relative roles of print skills from the various
forms of cognitive processing that take place during comprehension of complex
text so that assessment can effectively be used to guide instruction. So long as per-
formance on comprehension assessments is determined by many factors whose
relative contributions are undifferentiated in the scores obtained on the assess-
ments, the goal of using assessment to guide instruction will remain elusive. We
have proposed here one approach to isolating the contributions of various impor-
tant components to reading comprehension. Specifically, we have shown that it
may be possible to constrain the decoding and vocabulary demands of the text
while increasing the processing demands of understanding the text. At the same
time, we have shown how latent variable models can play an important role in evaluating
the success of our efforts to isolate and measure these important processes.
Although reading comprehension in its natural state is dependent on many skills
and abilities, its assessment may be better served by measures like the DARC that
attempt to isolate these processes from one another for the purpose of diagnosis
and guiding instruction.
ACKNOWLEDGMENTS
REFERENCES
Adams, M. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT Press.
August, D., Francis, D., Hsu, H.-Y., & Snow, C. (in press). Assessing reading comprehension in
bilinguals. Elementary School Journal.
Bradley, L., & Bryant, P. E. (1983). Categorizing sounds and learning to read—A causal connection.
Nature, 301, 419–421.
Dixon, P., LeFevre, J. A., & Twilley, L. C. (1988). Word knowledge and working memory as predictors
of reading skill. Journal of Educational Psychology, 80, 465–472.
Engle, R. W., Nations, J. K., & Cantor, J. (1990). Is “working memory capacity” just another name for
word knowledge? Journal of Educational Psychology, 82, 799–804.
Fletcher, J. M., Stuebing, K. K., Shaywitz, B. A., Brandt, M. E., Francis, D. J., & Shaywitz, S. E.
(1996). Measurement issues in the interpretation of behavior–brain relationships. In R. Thatcher, N.
Krasnegor, & G. R. Lyon (Eds.), Developmental neuroimaging: Mapping the development of brain
and behavior (pp. 255–262). New York: Academic.
Francis, D., Carlson, C., Fletcher, J., Foorman, B., Goldenberg, C., Vaughn, S., et al. (2005). Oracy/lit-
eracy development of Spanish-speaking children: A multi-level program of research on language mi-
nority children and the instruction, school and community contexts, and interventions that influence
their academic outcomes. Perspectives, pp. 8–12.
Francis, D. J., Fletcher, J. M., Catts, H., & Tomblin, B. (2005). Dimensions affecting the assessment of
reading comprehension. In S. G. Paris & S. A. Stahl (Eds.), Children’s reading comprehension and
assessment (pp. 369–394). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Gathercole, S. E., & Pickering, S. J. (2000). Working memory deficits in children with low achieve-
ments in the National Curriculum at seven years. British Journal of Educational Psychology, 70,
177–194.
Gough, P. B., & Tunmer, W. E. (1986). Decoding, reading, and reading disability. Remedial and Special
Education, 7(1), 6–10.
Haenggi, D., & Perfetti, C. A. (1994). Processing components of college-level reading comprehension.
Discourse Processes, 17, 83–104.
Hannon, B., & Daneman, M. (2001). A new tool for understanding individual differences in the compo-
nent processes of reading comprehension. Journal of Educational Psychology, 93, 103–128.
Hulme, C., Muter, V., Snowling, M., & Stevenson, J. (2004). Phonemes, rimes, vocabulary, and gram-
matical skills as foundations of early reading development: Evidence from a longitudinal study. De-
velopmental Psychology, 40, 665–681.
Krippendorff, K. (1980). Content analysis: An introduction to its methodology. Beverly Hills, CA:
Sage.
Lesaux, N., & Siegel, L. (2003). The development of reading in children who speak English as a second
language. Developmental Psychology, 39, 1005–1019.
Loban, W. (1976). Language development: Kindergarten through grade twelve (Research Rep. No.
18). Urbana, IL: National Council of Teachers of English.
Mehta, P. D., Foorman, B. R., Branum-Martin, L., & Taylor, P. W. (2005). Literacy as a unidimensional
multilevel construct: Validation, sources of influence, and implications in a longitudinal study in
grades 1 to 4. Scientific Studies of Reading, 9, 85–116.
Miller, J., & Iglesias, A. (2003). Systematic analysis of English and Spanish language transcripts.
Madison, WI.
Miller, J. F., Iglesias, A., Heilmann, J., Fabiano, L., Nockerts, A., & Francis, D. (2006). Oral language
and reading in bilingual children. Learning Disabilities Research & Practice, 21, 30–43.
Palmer, J., MacLeod, C. M., Hunt, E., & Davidson, J. E. (1985). Information processing correlates of
reading. Journal of Memory and Language, 24, 59–88.
Perfetti, C. A. (1985). Reading ability. New York: Oxford University Press.
Potts, G. R., & Peterson, S. B. (1985). Incorporation versus compartmentalization in memory for dis-
course. Journal of Memory and Language, 24, 107–118.
RAND Reading Study Group. (2002). Reading for understanding: Toward an R&D program in reading
comprehension. Washington, DC: RAND Education.
Raven, J., Raven, J. C., & Court, J. H. (1998). Coloured Progressive Matrices 1998 edition. Oxford,
England: Oxford Psychologists Press.
Schatschneider, C., Francis, D. J., Foorman, B. R., Fletcher, J. M., & Mehta, P. (1998). The
dimensionality of phonological awareness: An application of item response theory. Journal of Edu-
cational Psychology, 91, 439–449.
Tabors, P. O., Snow, C. E., & Dickinson, D. K. (2001). Homes and schools together: Supporting lan-
guage and literacy development. In D. K. Dickinson & P. O. Tabors (Eds.), Beginning literacy with
language (pp. 313–334). Baltimore: Brookes.
Torgesen, J. K., Wagner, R. K., & Rashotte, C. A. (1999). Test of Word Reading Efficiency. Austin, TX:
PRO-ED.
Vellutino, F. R. (1979). Dyslexia: Theory and research. Cambridge, MA: MIT Press.
Vellutino, F. R. (1987, March). Dyslexia. Scientific American, 34–41.
Wagner, R., Torgesen, J., & Rashotte, C. (1999). Comprehensive Test of Phonological Processing. Aus-
tin, TX: PRO-ED.
Woodcock, R. W. (1991). Woodcock Language Proficiency Battery–Revised (English form). Chicago:
Riverside.
Woodcock, R. W., & Muñoz-Sandoval, A. F. (1995). Woodcock Language Proficiency Battery–Revised
(Spanish form). Chicago: Riverside.