
INTERPRETING EVALUATIVE EVIDENCE

We have pointed out that measures of the outcomes of an instructional program—that is, measures of learned intellectual skills, cognitive strategies,
information, attitudes, and motor skills—are influenced by a number of variables
in the educational situation besides the program itself. Process variables in the
operation of the instructional program may directly affect learning and, thus, also
affect its outcomes. Support variables in the school or in the home determine the
opportunities for learning and, thus, influence the outcomes of learning that are
observed. And most prominently of all, the learning aptitude of students strongly
influences the outcomes measured in an evaluation study.
If the effectiveness of the designed instruction is to be evaluated, certain
controls must be instituted over process, support, and aptitude variables to ensure
that the "net effect" of the instruction is revealed. Procedures for accomplishing this
control are described in this section. Again, it should be noted that only the basic logic of these procedures can be presented here. However, such logic is of critical importance in the design of evaluation studies.
Controlling for Aptitude Effects
The assessment of outcomes of instruction in terms of question 1 (To what
extent have objectives been met?) needs to take account of the effects of aptitude
variables. In the context of this question, it is mainly desirable to state the level of intelligence of the students being instructed. This may be done most simply
by giving the average score and some measure of dispersion of the distribution of
scores (such as the standard deviation) on a standard test of intelligence. However,
correlated measures such as socioeconomic status (SES) are frequently used for this purpose. Supposing
that 117 out of 130 objectives of a designed course are found to have been met, it
is of some importance to know whether the average IQ of the students is 115 (as
might be true in a suburban school) or 102 (as might occur in some sections of a
city or in a rural area). It is possible that, in the former setting, the number of
objectives achieved might be 117 out of 130, whereas in the latter, this might drop
to 98 out of 130. The aims of evaluation may best be accomplished by trying out
the instructional entity in several different schools, each having a somewhat
different range of student learning aptitude.
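A brief Python sketch may make this reporting concrete. The figures below are hypothetical, invented for illustration rather than drawn from any actual study; the sketch simply computes the mean and standard deviation of an intelligence measure for each tryout school and reports them alongside the count of objectives met.

import statistics

# Hypothetical tryout data (invented for illustration): IQ scores for a
# sample of students in each school and the number of the 130 course
# objectives judged to have been met there.
TOTAL_OBJECTIVES = 130
tryouts = {
    "Suburban school": {"iq": [108, 112, 115, 118, 121, 124], "met": 117},
    "Urban school": {"iq": [95, 99, 101, 103, 105, 109], "met": 98},
}

for school, data in tryouts.items():
    mean_iq = statistics.mean(data["iq"])
    sd_iq = statistics.stdev(data["iq"])  # dispersion of the aptitude distribution
    share = data["met"] / TOTAL_OBJECTIVES
    print(f"{school}: mean IQ {mean_iq:.1f} (SD {sd_iq:.1f}); "
          f"{data['met']}/{TOTAL_OBJECTIVES} objectives met ({share:.0%})")

Reporting both the average and the dispersion, school by school, allows a reader of the evaluation to judge the range of aptitude over which the stated objectives were or were not attained.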
When the purposes of question 2 (To what degree is it better?) are being
served in evaluation, one must go beyond simply reporting the nature and amount
of the aptitude variable. In this case, the concern is to show whether any difference
exists between the new instructional program and some other—in other words, to
make a comparison. Simply stated, making a comparison requires the
demonstration that the two groups of students were equivalent to begin with.
Equivalence of students in aptitude is most likely to occur when successive classes
of students in the same school, coming from the same neighborhood, are employed
as comparison groups. This is the case when a newly designed course is introduced
in a classroom or school and is to be compared with a different course given the
previous year.
Other methods of establishing equivalence of initial aptitudes are often
employed. Sometimes, it is possible to assign students randomly to different
classrooms within a single school, half of which receive the newly designed
instruction and half of which do not. When such a design is used, definite
administrative arrangements must be made to ensure randomness—it cannot be
assumed. Another procedure is to select a set of schools that are "matched," insofar
as possible, in the aptitudes of their students and to try out the new instruction in
half of these, making a comparison with the outcomes obtained in those schools not
receiving the new instruction. All of these methods contain certain complexities of
design that necessitate careful management if valid comparisons are to be made.
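The matched-schools procedure can be pictured with the following Python sketch. The school names and mean aptitude scores are invented, and an actual study would match on fuller aptitude information than a single mean score; the point is only that schools are paired by similarity of aptitude and that, within each pair, chance decides which member tries out the new instruction.

import random

# Hypothetical schools with mean scores on a standard aptitude test.
schools = [("School A", 104), ("School B", 110), ("School C", 98),
           ("School D", 99), ("School E", 103), ("School F", 111)]

# Order the schools by mean aptitude and form matched pairs of neighbors.
schools.sort(key=lambda s: s[1])
pairs = [schools[i:i + 2] for i in range(0, len(schools), 2)]

# Within each matched pair, one school is chosen at random to try out the
# new instruction; its partner serves as the comparison school.
for pair in pairs:
    new_school, comparison_school = random.sample(pair, 2)
    print(f"New instruction: {new_school[0]} | Comparison: {comparison_school[0]}")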
There are also statistical methods of control for aptitude variables—methods
that "partial out" the effects of aptitude variables and, thus, reveal the net effect of
the instruction. In general, these methods follow this logic: If the measured outcome
is produced by A and I, where A is aptitude and I is instruction, what would be the
effect of I alone if A were assumed to have a constant value rather than a variable
one? Such methods are of considerable value in revealing instructional
effectiveness, bearing in mind particularly the prominent influence the A variable
is likely to have.
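This "partialing out" logic can be illustrated with a regression adjustment of the kind used in analysis of covariance. In the Python sketch below, with all numbers invented for illustration, the outcome is regressed on the aptitude measure A and an indicator I for the new instruction; the coefficient on I estimates the effect of the instruction with aptitude held at a constant value.

import numpy as np

# Hypothetical data: aptitude scores (A), an indicator for the new
# instruction (I = 1) versus the comparison instruction (I = 0), and an
# outcome score for each student.
A = np.array([95, 100, 105, 110, 115, 120, 96, 101, 106, 111, 116, 121], dtype=float)
I = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=float)
y = np.array([60, 64, 69, 73, 78, 82, 66, 70, 75, 79, 84, 88], dtype=float)

# Regress the outcome on aptitude and instruction together; the
# coefficient on I is the estimated effect of the instruction with the
# aptitude variable held constant ("partialed out").
X = np.column_stack([np.ones_like(A), A, I])
coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"Estimated net effect of the instruction: {coefficients[2]:.1f} points")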
Whatever particular procedure is employed, it should be clear that any valid
comparison of the effectiveness of instruction in two or more groups of students
requires that equivalence of initial aptitudes be established. Measures of
intelligence, or other correlated measures, may be employed in the comparison.
Students may be randomly assigned to the different groups or their aptitudes may
be compared when assignment has been made on other grounds (such as school
location). Statistical means may be employed to make possible the assumption of
equivalence. Any or all of these means are aimed at making a convincing case for
equivalence of learning aptitudes among groups of students whose capabilities
following instruction are being compared. No study evaluating learning outcomes
can provide valid evidence of instructional effectiveness without having a way of
controlling this important variable.
Controlling for the Effects of Support Variables
For many purposes of evaluation, support variables may be treated as input
variables and, thus, controlled in ways similar to those used for learning aptitude.
Thus, when interest is centered upon the attainment of objectives (question 1), the
measures made of support variables can be reported along with outcome measures
so that they can be considered in interpreting the outcomes. Here again, a useful
procedure is to try out the instruction in a variety of schools displaying different
characteristics (or different amounts) of support.
Similarly, the comparisons implied by question 2 and part of question 3
require the demonstration of equivalence among the classes or schools whose
learning outcomes are being compared. Suppose that outcome measures are
obtained from two different aptitude-equivalent groups of students in a school, one
of which has been trying out a newly designed course in English composition, while
the other continues with a different course. Assume that, despite differences in the
instruction, the objectives of the two courses are largely the same and that
assessment of outcomes is based on these common objectives. Class M is found to show significantly better performance, on the average, than does class N. Before
the evidence that the new instruction is "better" can be truly convincing, it must be
shown that no differences exist in support variables. Since the school is the same,
many variables of this sort can be shown to be equivalent, such as the library, the
kinds of materials available, and others of this nature. Where might differences in
support variables be found? One possibility is the climate of the two classrooms—
one may be more encouraging to achievement than the other. Two different teachers
are involved—one may be disliked, the other liked. Student attitudes may be
different—more students in one class may seek new opportunities for learning than
do students in the other. Variables of this sort that affect opportunities for learning
may accordingly affect outcomes. Therefore, it is quite essential that equivalence
of groups with respect to these variables be demonstrated or taken into account by
statistical means.
Controlling for the Effects of Process Variables
The assessment and control of process variables are of particular concern in
seeking evidence bearing on the attainment of stated objectives (question 1). Quite
evidently, an instructional entity may work either better or worse depending upon
how the operations it specifies are carried out. Suppose, for example, that a new
course in elementary science assumes that teachers will leave the direction of students' activities almost entirely to the students themselves, guided by an exercise booklet. Under these circumstances, teachers find that the
students tend to raise questions to which they (the teachers) don't always know the
answers. One teacher may deal with this circumstance by encouraging students to
see if they can invent a way of finding the answer. Another teacher may require that
students do only what their exercise book describes. Thus, the same instructional
program may lead to quite different operations. The process variable differs
markedly in these two instances, and equally marked effects may show up in
measures of outcome. If the evaluation is of the formative type, the designer may
interpret such evidence as showing the need for additional teacher instructions or
training. If summative evaluation is being conducted, results from the two groups
of students must be treated separately to disclose the effects of the process variable.
In comparison studies (question 2), process variables are equally important.
As in the case of aptitude or support variables, they must be controlled in one way
or another in order for valid evidence of the effectiveness of instruction to be
obtained. Equivalence of groups in terms of process variables must be shown, either
by exercising direct control over them by a randomizing approach, or by statistical
means. It may be noted that process variables are more amenable to direct control
than are either support or aptitude variables. If a school or class is conducted in a
noisy environment (a support variable), the means of changing the noise level may
not be readily at hand. If, however, a formative evaluation study shows that some
teachers have failed to use the operations specified by the new instructional
program (a process variable), instruction of these teachers can be undertaken so that
the next trial starts off with a desirable set of process variables.
Unanticipated outcomes (question 3) are equally likely to be influenced by
process variables and accordingly require similar control procedures. A set of
positive attitudes on the part of students in a newly designed program could result
from the human modeling of a particular teacher and, thus, contrast with less
favorable attitudes in another group of students who have otherwise had the same
instruction. It is necessary in this case, also, to demonstrate equivalence of process
variables before drawing conclusions about effects of the instructional entity.
Controlling Variables by Randomization
It is generally agreed that the best possible way to control variables in an
evaluation study is to ensure that their effects occur in a random fashion. This is the
case when students can be assigned to control and experimental groups in a truly
random manner or when an entire set of classes or schools can be divided into such
groups randomly. In the simplest case, if the outcomes of group A (the new
instructional entity) are compared with those of group B (the previously employed
instruction), and students drawn from a given population have been assigned to
these groups in equal numbers at random, the comparison of the outcomes may be
assumed to be equally influenced by aptitude variables. Similar reasoning applies
to the effects of randomizing the assignment of classrooms, teachers, and schools
to experimental and control groups in order to equalize process and support
variables.
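In the simplest case just described, random assignment in equal numbers can be carried out as in the Python sketch below (the roster is hypothetical): the entire roster is shuffled and then split in half, so that chance alone determines which students receive which form of instruction.

import random

# Hypothetical roster of students drawn from the population of interest.
students = [f"Student {n:02d}" for n in range(1, 41)]

random.shuffle(students)      # put the roster in a random order
half = len(students) // 2
group_a = students[:half]     # group A: the new instructional entity
group_b = students[half:]     # group B: the previously employed instruction

print(f"Group A (new instruction): {len(group_a)} students")
print(f"Group B (previous instruction): {len(group_b)} students")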
Randomization has the effect of controlling not only the specific variables
that have been identified, but also other variables that may not have been singled
out for measurement because their potential influence is unknown. Although ideal
for purposes of control, in practice, randomizing procedures are usually difficult to
arrange. Schools do not customarily draw their students randomly from a
community or assign them randomly to classes or teachers. Accordingly, the
identification and measurement of aptitude, support, and process variables must
usually be undertaken as described in the preceding sections. When random
assignment of students, teachers, or classes is possible, evaluation studies achieve
a degree of elegance they do not otherwise possess.
