
ZDM Mathematics Education (2011) 43:723–736

DOI 10.1007/s11858-011-0347-0

ORIGINAL ARTICLE

Developing conceptions of statistics by designing measures of distribution
Richard Lehrer • Min-Joung Kim • Ryan Seth Jones

Accepted: 17 June 2011 / Published online: 2 July 2011


© FIZ Karlsruhe 2011

Abstract Students often learn procedures for measuring, but rarely do they grapple with the foundational conceptual problem of generating and validating coordination between a measure and the phenomenon being measured. Coordinating measures with phenomena involves developing an appreciation of the objects and relations in each as well as establishing their mutual correspondence. We supported students’ developing conceptions of statistics by positioning them to design measures of center and of variability for distributions that they had generated through repeated measure of a length. After students invented and explored the viability of their measures individually, they participated in a public (whole-class conversation) forum featuring justification and reflection about the viability of their designed measures. We illustrate how individual invention enticed students to attend to, and to make explicit, characteristics of distribution not initially noticed or known only tacitly. Conceptions of statistics and of relevant characteristics of distribution were further expanded as students justified and argued about the utility and prospective generalization of particular inventions. Teachers supported student learning by highlighting prospective relations between characteristics of measures and characteristics of distribution as they emerged during the course of activity in each setting.

The opinions expressed are those of the authors and do not represent views of the U.S. Department of Education or of the U.S. National Science Foundation.

R. Lehrer (✉) · M.-J. Kim · R. S. Jones
Vanderbilt University, Nashville, USA
e-mail: rich.lehrer@vanderbilt.edu

1 Introduction

Measurement is often conceived as a mundane activity, and in school it typically arrives pre-formed. Students often learn measurement procedures (Lee & Smith, 2011), but rarely do they grapple with the foundational conceptual problem of how to establish coordination between the objects and relations of a particular phenomenon, its structure, and with the corresponding symbolic, and often material, objects and relations of its measure. This “black box” quality of measure is often true of workplaces as well, where the very success of the tools employed tends to obscure the origins of the correspondences established between measure and phenomena (Bakker, Wijers, Jonker & Akkerman, 2011). Yet research in mathematics education suggests that even young students can be supported to investigate and understand relations between characteristics of space, such as length and area, and aspects of their measure, such as unit and scale (e.g., Clements & Bright, 2003). We sought to leverage this emerging tradition of involving students in the conceptual foundations of measurement in the less explored domain of statistical reasoning. Statistics measure characteristics of distribution, but as with other forms of measure, students often treat statistics as matters of procedure (Zawojewski & Shaughnessy, 2000).

Accordingly, we positioned 10- and 11-year-old students to design measures of characteristics of distributed data, such as their center and variability. Distribution is a key that unlocks much of statistical reasoning (Cobb, 1999). Like other measures, a statistic (a measure of a characteristic of a distribution) represents a commitment to certain aspects of a distribution at the expense of others. For example, the mean statistic partitions the distribution into fair (equal) shares, so that center is measured by the magnitude of a fair share. In

724 R. Lehrer et al.

contrast, the median attends to the order of the values and ignores their magnitude, with the exception of the magnitude of the median value to represent the center. In previous research, we found that when students invented statistics, they typically engaged in a dialectic process of closer examination of the characteristics of distribution (e.g., the order of case values, the distance between a case value and the center), coupled with struggles to develop measures sensitive to these emerging characteristics (Lehrer & Kim, 2009; Lehrer, Kim & Schauble, 2007; Petrosino, Lehrer, & Schauble, 2003). In this report, we continue to examine the learning potential of inventing measure by considering both individual generation and collective reflection about these individually invented measures, with an eye toward characterizing the affordances of each setting for learning. We first articulate how measure and phenomena are intertwined in any setting. We then situate design of statistical measures within a perspective of statistics education as learning to model data. We describe three illustrative cases of statistical invention, one in an individual setting, albeit with teacher scaffolding, and the others in a collective setting featuring presentation and reflection about the coordination between the invented measure and characteristics of distributions. The cases reveal the affordances of these distinct forms of engagement for student learning. In both contexts, we were especially interested in how teachers guided student design and reflection.

1.1 History of measure: the problem of coordination

Throughout the historical development of knowledge that is now codified by disciplinary boundaries, such as mathematics, sciences, engineering and technologies, measure and theory were interlocking, so that measurement was one of the principal ways in which knowledge was generated (Crosby, 1997). Van Fraassen (2008) succinctly captures the historic interpenetration of measure and of phenomena: “What counts as a measurement of (physical quantity) X? and What is (that physical quantity) X? cannot be answered independently of each other” (p. 116). He suggests temperature as emblematic of this tangled relation. The measure of temperature is now so ordinary that the only novelty worth mentioning is when the reading exceeds or falls below some threshold. But viewed historically, this ordinary result represents the terminus of a long and fractured span where both measure and phenomena were contested (Chang, 2004). What was temperature? What was the meaning of zero on a measurement scale? These contests were eventually resolved by advances both in theory, where the kinetic approach to ideal gases specified relations among particles, velocity and temperature, and in measure material (e.g., indicators based on different thermometric fluids).

During formation, before the relation between theory and measure has stabilized, measure is anything but ordinary. It is this extraordinary quality, summarized by Fig. 1, that we seek to employ to support learning. Grappling with the measure of a phenomenon positions learners to explore the objects and relations that constitute a phenomenon and simultaneously to generate potential indicators, and relations among these indicators, to constitute a measure. A constant consideration is the correspondence between indicators and relations in the measure space and their counterparts in the space of phenomena. The arrows displayed in Fig. 1 are intended to suggest that establishing coordination between measure and phenomena is interactive. This accords with the common sense notion that as one attempts to measure something, one’s ideas about the nature of the system being measured may change as well, and transitions in understanding of a system may similarly provoke transitions in how it is measured. Consider, for example, a fundamental quality of a space such as area. As one develops a measure of area, one learns to appreciate additional qualities of area, such as its invariance under dissection and re-arrangement, and the implication thereof for the additivity of units of area. Viewing units of area measure as additive supports reasoning about other possible configurations of the area bounded by a particular figure that conserve area (Lehrer, Jacobson et al., 1998). Ford (2010) further suggests that issues of coordination between measures and phenomena are sustained in communities where there are clear advantages for rendering measure and phenomena in ways that make them reproducible and hence discursive objects for the community (see also, Gooding, 1990). This suggests a need for collective as well as individual forums for developing measure.

Fig. 1 Progressive elaboration of structure of measure and structure of phenomena


1.2 Situating statistical measure: statistics education as modeling

The larger orientation toward data in which students participated was one we term data modeling (Lehrer & Romberg, 1996; Lehrer & Schauble, 2004; Konold & Lehrer, 2008), illustrated in Fig. 2. As suggested by inspection of Fig. 2, data modeling integrates inquiry, the generation of data, chance and inference. The upper triangle conveys the design phase of statistical investigation, which begins with a question and includes determination of which attributes of a phenomenon are best suited for measures that may shed light on the question. The design process involves a first-hand sense of measure, in which the coordination between measure and phenomenon is often contested, especially when designing a measure calls into question the very nature of the attribute being measured. Assuming that this first-hand sense of measure has been resolved well enough for the purposes of a particular inquiry (e.g., appropriate measures have been either invented or appropriated), the lower triangle of Fig. 2 depicts steps toward generating inference about the question in light of uncertainty arising from chance variation. Chance variation produces a distribution of measurement values, which we commonly refer to as data. Statistics are second-order measures of the distribution of the resulting data. For example, the sample mean is a measure of the distribution’s center, and the sample standard deviation is a measure of the variability of the distribution.

Fig. 2 Data modeling integrates data, chance and inference (diagram labels: Posing Questions; Generating & Selecting Attributes; Constructing Measures; Structuring; Representing)

Revisiting Fig. 1, characteristics of distributions of data play the role of phenomena and statistics play the role of measures of these characteristics. A statistic is composed of symbolic objects and relations that together function to indicate aspects of the structure of a distribution. Consider, for example, the range statistic. This statistic indicates the variability of a distribution as the difference between the highest and lowest case values. Although this correspondence and rationale may appear transparent, it will have to be established by learners. And it is by establishing and perhaps reflecting upon such correspondences that learners are positioned to better appreciate the rationale for conventions established by the field: the particular coordination between statistic and characteristics of distribution represented by that statistic. For example, students may find that the range statistic itself may vary considerably from sample-to-sample. Hence, other measures of variability may be preferred. Careful examination of the affordances and constraints of measures in light of characteristics of distribution is central to our instructional design.

1.3 The instructional trajectory

In the instances of invented measure we report, students in fifth- (age 10) and sixth-grade (age 11) classrooms each measured the length of their teacher’s outstretched arms. (See Lehrer & Kim, 2009 for a fuller description of the instruction.) Our intention was to engage students in the production of variability through measurement error as an initial step for initiating statistical reasoning and as a means for answering a simple question about the true length of their teacher’s arm-span. Because they were measure agents, students were positioned to think simultaneously about individual (i.e., a particular value) and collective levels (i.e., the batch) of data. These levels parallel the case and aggregate levels of all batches of data but provide ready reference for the distinction in the context of a process that is intelligible to students. Students each repeated the measure with different tools, a change in process that affected the variability of the resulting collection. For example, measurements of a person’s arm-span from a 15 cm ruler and from a meter stick typically form distributions with near-identical centers, but with clear distinctions in variability. Students’ independent measures, recorded on sticky notes or index cards, were displayed on a wall for the more variable measurements.

Small groups of students next invented paper-and-pencil displays on chart paper that captured their sense of a


“pattern” or a “trend” that they noticed in the batch of measurement data. The intention was to engage students in structuring data and in the construction of the shape of the data. That is, we anticipated that the configurations of the data highlighted by each display would vary, thus providing teachers with an opportunity to engage students in a conversation about how the choices made by designers affected how the displays looked. Students did generate a variety of displays, most of which bore little resemblance to conventional displays, despite several years of exposure to conventional representations in their schooling (Lehrer et al., 2007). Students also considered how the shape of the data might change if the process were repeated. This image of repeated process is critical for longer term development of statistical reasoning (Cobb & Moore, 1997; Thompson, Liu, & Saldanha, 2007). Students often suggested that the outliers and other “mistakes” were unlikely to re-occur, but because a person or an object has a definitive length, the middle clump of the measurements, visible in only some types of display, was apt to be repeated.

With this preamble, we challenged students to design two different measures. We employ the notion of design to foreground the construction of a method that would result in a quantity indicating one or more characteristics of a distribution. The first was a measure of the “best guess of the real measurement.” This approach invited students to consider a statistic as a measure of the signal of the batch of measurement outcomes, here the true length of the teacher’s arm-span (Konold & Pollatsek, 2002; Petrosino, Lehrer, & Schauble, 2003). The second challenge was to design a measure of the “precision” of the batch of measurements, so that students were positioned to develop an indicator of variability. Student design was assisted by a computer tool, TinkerPlots (Konold & Miller, 2005; Konold, 2007), which facilitated the structuring and display of the original measurements of arm-span.

Although the instructional trajectory went on to engage students in the construction of models of chance to account for the observed variability of measurements, our analysis focuses on this point of the trajectory, which typically spanned 2 h of instruction. Our aim is to illustrate the affordances of the individual and collective settings for learning by considering how students resolved the problem of coordinating measure-statistic and phenomena-distribution in each. We illustrate this process of coordination by describing three cases. The first illuminates individual invention as a student developed a measure of the true length of her teacher’s arm-span, during which initial intuitions about the importance of the center clump of the data guided invention of a measure of center. The second and third cases illuminate how the process of collective negotiation about the uses and limitations of an invented measure augmented students’ conceptions of both measure and distribution. In all three cases, teachers supported student invention and contest. Hence, in each case, we describe forms of teacher support that appeared to bootstrap productive learning in each setting.

2 Methods

2.1 Participants

Students were in one fifth- and two sixth-grade classrooms, corresponding to ages 10 and 11, respectively. The fifth-grade classroom was located in an urban school in the southeastern region of the United States. These students were ethnically diverse (e.g., in one sixth-grade classroom students included recent immigrants from Africa, Kurdistan, and Mexico, as well as African-American, Asian-American, European-American, Hispanic-American, and Native-American students). The first author served as the teacher for these students, and the case of individual invention is drawn from this classroom. Other students were sixth-graders (n = 95: female = 45, male = 50), who attended a middle school (grades 6–8) in the Midwestern region of the United States where nearly all students were of European descent. Their teacher was not a participant in the research reported elsewhere. The teacher had 10 years of teaching experience at this grade level, and a total of 26 years in the profession. He agreed to participate in our efforts to expand the reach of the work beyond its initial context of development. The teacher taught four classes, and the cases of collective measure are drawn from two of these classes.

2.2 Data sources

2.2.1 Case one: inventing a statistic

From a catalogue of video recordings of invented statistics developed in our previous research (e.g., Lehrer et al., 2007; Lehrer & Kim, 2009), we selected an episode of invention that occurred during a relatively brief span of time, yet that clearly illuminated a student’s attempts to grapple with the coordination of the structure of the measure (an invented median) with observed, tangible characteristics of distribution. The teacher’s role was also visible in this episode, which we viewed as important for clarifying that invention is not intended as a synonym for discovery.

2.2.2 Cases two and three: reflecting upon a statistic

We identified 10 episodes of measure reviews of student-invented statistics of precision in the classrooms of the


sixth-grade teacher who had not participated in the previous studies. Within each of these episodes, a student shared his or her invented statistic, and the teacher moderated a whole-class discussion. The average length of these discussions was 12 min, with the longest lasting about 20 min and shortest, about 1 min (at the end of a class). Of these, we selected two episodes that represented common, yet distinct approaches seen in students’ invented measures of variability: range and deviation from center. Student–student and student–teacher interactions during the measure reviews were video recorded with a single camera located at the back of the room.

2.3 Analysis

Video records of classroom interactions for each case were transcribed following Jeffersonian transcription notation (Jefferson, 1984), and were linked to video so that we could also view participants’ gaze and gestures, to the extent feasible by the placement of the camera. The Jeffersonian conventions followed included use of underline to indicate emphasis in speech, = to indicate the break and subsequent continuation of interrupted speech, (# of seconds) to indicate pauses in speech with (.) indicating a pause of less than 0.5 s, ((italic text)) to indicate gestures or descriptive commentary, [ ] to indicate the start and end points of overlapping speech, < > to indicate more slowly delivered speech, > < to indicate more rapidly delivered speech, ° ° to indicate quiet speech, ::: to indicate prolongation of an utterance, . or ↓ to indicate falling pitch, ? or ↑ to indicate rising pitch, - to indicate an abrupt halt or interruption in utterance, and (hhh) to indicate audible exhalation.

For each conversational turn, we inferred the nature of knowledge that appeared to be revealed by participant talk and gesture. We were oriented toward sequences of turns, or exchanges, that either established or that indicated uncertainty about the coordination between an invented statistic and one or more characteristics of the distribution. We tracked shifts in the conceptual grounds of these exchanges, using these shifts to mark the termination of one exchange and the beginning of another. To make the trajectory of conceptual change more visible, we rendered the transcripts of these conversational exchanges as narrative and tabular summaries. These summaries represented the topical shifts in disciplinary knowledge about distribution and/or statistic. To illustrate this process of summarization, we apply it to a portion of the transcript from the beginning of the second case presented.

The exchanges represented in Table 1 were grounded in a shared visual reference generated by the teacher who displayed, via computer projection, graphs of the measurements taken with more and less precise tools. By exchange, we refer to one or more turns of conversation that shared a common goal or topic. Here, in the first turn, we observed a coordination between reading the display, which for John indicated agreement among measurers, and the measure-statistic, the range. This constituted the first exchange. We also noted the coordination between an imagined state of display (one column or stack of values) and the implication that the resulting measure would be zero (turns 3–11). This constituted the second exchange. Accompanying the narrative description of the turns was a tabular form of representation for each exchange, later displayed in Table 3, to help readers coordinate visually the role of the teacher, the nature of the measure proposed,

Table 1 Example of a transcript: initial exchanges in a measure review

Turn | Speaker | Utterances
01 | John | So um my my method was to first find the range of your data and um to find out how much um how ho:::w much the graph or the people agree is uh the range is the key. Um. If the range would be like one like stack= ((moving his right hand up and down))
02 | Teacher | That.
03 | John | =it would be the range would be zero and like zero is the best? Sort of like grading of it? And then.
04 | Teacher | So zero is good. I am guessing then lower number is good.
05 | John | [Lower number is really good.]
06 | Teacher | [Does anybody have question ] on what he just said so far? I will tell you why you need to question him because this gonna be part of your assignment today cause you are gonna apply one of these things. If you don’t understand what anybody up there says you need to a:::sk. Okay.
07 | John | So it’s like zero is the best grading or best range and then infinity?
08 | Teacher | ((laughing))
09 | John | Infinity is like the worst.
10 | Teacher | Ya. Infinity would be horrible. Basically you are telling me that the lower the number the better it is.
11 | John | Exactly.
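
John's proposal in Table 1 can be illustrated with a minimal sketch. It assumes that his “range” is the usual difference between the highest and lowest values. The function name `precision_by_range` and every measurement value below are hypothetical; they are not data from the classroom.

```python
def precision_by_range(measurements):
    """Range as a precision score: highest value minus lowest value.

    In John's scheme a lower score is better, and 0 means that every
    measurer obtained the same value (a single stack on the display).
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    return max(measurements) - min(measurements)


# Hypothetical arm-span measurements in cm (illustrative only).
one_stack = [157, 157, 157, 157, 157]          # all values in one column
meter_stick = [154, 156, 157, 157, 158, 160]   # assumed: more precise tool
short_ruler = [139, 148, 155, 157, 163, 176]   # assumed: less precise tool

for label, batch in [("one stack", one_stack),
                     ("meter stick", meter_stick),
                     ("15 cm ruler", short_ruler)]:
    print(label, precision_by_range(batch))
```

The one-stack batch scores 0, the best grade in John's scheme. Because the score depends only on the two extreme values, a single stray measurement can change it sharply.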


and the characteristics of the distribution during each exchange. Both the narrative and tabular forms of summary omitted other qualities of the interaction that were available in the fuller transcription, especially the emotional valences of the participants (e.g., jokes, laughter) and indicators of commitment and involvement in the interactions (e.g., “exactly!”).

3 Results

The results of the summaries of interaction in each setting are described, first in the setting of inventing measure and then in the setting of the measure review.

3.1 Inventing a measure of center: the half spot

Recall that prior to invention of statistics, students invented displays of the batch of measurements of their teacher’s arm-span. Figure 3 is a facsimile of the display constructed by the student featured in this case study, Alina. The nature of the display constituted an important resource for Alina’s efforts to invent a measure of the best guess of her teacher’s arm-span. Moreover, in whole-class conversation preceding this episode of invention, Alina suggested that the outcomes on the ends of the distribution would be less likely to appear again, if measurers were more careful in their measuring procedure. Hence, both the display that she constructed and her sense of the relative reliability of particular case values likely informed Alina’s construction.

Fig. 3 Alina’s visually guided intuition about the location of the true measure

The episode of invention (see Table 2 for a summary of the episode) was initiated as her teacher asked: “What is your best sense about what it (her teacher’s arm-span) really is?” Alina’s response was linked to the display: She emphatically enclosed with her hands the center clump of values in the 150s cm interval displayed in Fig. 3. Alina’s gesture indicated that the aspect of the distribution that she attended to was the center clump. Her initial “measure” was that of literal enclosure of the region with her hands.

In the next exchange (2), the teacher protested that Alina had indicated a region, not a particular value, a reminder that the measure should result in a quantity. Alina responded by shrinking the span enclosed by her hands to converge on the middle of the neighborhood of values, suggesting that it must be 155 (cm). Hence, the measure was transformed from a neighborhood of values, the center clump, into a quantity representing the mid-point of that clump.

In the exchange (3) that followed, the teacher invoked disciplinary criteria of communication and generalization for measure. He asked Alina to create an algorithm (“rules”) for “someone else” so that they would arrive at the same value (155 cm), and he further set a goal that the algorithm should “work for any batch of data (to get a middle number). Like if we measured again, it would work

Table 2 The case of Alina: emerging coordination between measure and characteristics of distribution

Exchange | Teacher role | Nature of measure | Focal characteristics of distribution
1 | Set a goal of finding a best guess of a teacher’s arm-span | Region of values that were enclosed by her hands | Center clump
2 | Conveyed a conventional view of a measure: a measure is a quantity | Coordinated gestural sweep from top and bottom of center clump with intersection at middle of center clump | Mid-point of ordered list of the values contained in the center clump
3 | Set criteria for measure: clear communication and generalization | Algorithm employed order and interval-group to generate a middle number | Considered entire distribution with properties of ordered values within intervals of size 10
4 | Sought clarification of result of algorithm | Imagined generalization of algorithm to other batches of data | Existing data was one instantiation of a repeated process
5 | No explicit support but prior history of instruction about symmetry | Interval, order, and count of intervals to locate the mid-point of the center clump | The center clump was coordinated with its measure by appeal to symmetry. First explication of shape of distribution as symmetric
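
One way to read the algorithm summarized in exchanges 3 and 5 of Table 2 is sketched below. This is an interpretation, not a transcription of Alina's directions: the function names, the treatment of an even number of intervals, and all measurement values are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_interval(values, width=10):
    """Partition values into intervals of `width`, ordered within and across groups."""
    groups = defaultdict(list)
    for value in values:
        start = int(value // width) * width  # e.g. 157 falls in the 150s
        groups[start].append(value)
    return {start: sorted(groups[start]) for start in sorted(groups)}


def half_spot(values, width=10):
    """Mid-point of the interval with equally many intervals on each side.

    Intervals are counted from the one holding the lowest value to the one
    holding the highest value, including any empty intervals in between.
    Assumption: with an even number of intervals there is no single middle
    interval, so the boundary between the two middle intervals is returned.
    """
    if not values:
        raise ValueError("need at least one measurement")
    lowest = int(min(values) // width)
    highest = int(max(values) // width)
    n_intervals = highest - lowest + 1
    if n_intervals % 2 == 1:
        middle = lowest + n_intervals // 2
        return middle * width + width / 2
    return float((lowest + n_intervals // 2) * width)


# Hypothetical batch of 24 arm-span measurements in cm (not the study's data).
batch = [112, 125, 138, 139, 144, 147, 148, 151, 152, 153, 155, 155,
         156, 157, 158, 159, 161, 163, 166, 172, 175, 181, 184, 196]

print(group_by_interval(batch)[150])
print(half_spot(batch))
```

For this hypothetical batch the intervals run from the 110s to the 190s, leaving four intervals on each side of the 150s, so the sketch returns 155. The count here acts on ordered intervals, not on ordered cases, which is the feature that distinguishes this measure from a median.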


on that batch too. So what do we do?” Alina’s response was to write “directions” in a manner that she characterized as “like Star Logo,” a computer programming language. These directions included organizing the data to partition it into groups of 10, and to order the data from least to greatest within each group. This exchange with the teacher resulted in two forms of transformation, one pertaining to measure and the other to distribution. In the measure world, Alina’s reliance on her own agency was supplanted by an algorithmic third-person voice by analogy to a computer program. Historically, this has been an important step in the development of mathematics (Berlinski, 2000; Rotman, 2000). The result transformed the measure space to include explicit symbolization of the interval and order characteristics of the display, so that the foundations of the measure were no longer dependent on indexing by hand and sweep of hand to locate the center clump. This transition in the objects and relations in the measure world was accompanied by a renewed focus on the entire batch of data, enlarging the scope of distribution.

Alina suggested (exchange 4) that the next step in the directions would be to find the “half spot,” which the teacher assumed to mean a reference to her previous indicator of the mid-point of the center clump. So the teacher asked if that meant that the directions would always locate the group with the most values (the highest relative frequency). Instead of agreeing with the teacher, Alina responded that this might not occur, mentioning that if the measurements were repeated, there might be “a few less values in the 150’s and a few more in the 160’s.” That is, she was “seeing” the existing sample as one of many that could be generated by the process of measuring the teacher’s arm-span, perhaps cued by the earlier classroom conversation about the potential outcomes of repeating the process by which the measurements were generated.

In the next exchange (5), Alina sought to reconcile her initial sense of mid-point of the highest frequency clump with her emerging algorithm for measuring the half spot. Alina’s innovation was to use counts of intervals to the left and right of the interval with the highest relative frequency of cases: “to find our half spot, we counted on each side the number we had” illustrating that there were four groups to the left of the 150s and four to the right. She justified this use of counting by appeal to reflection symmetry, literally folding the display to demonstrate that the highest frequency interval constituted the line of reflection and that the distribution was (nearly) symmetric about this interval. Hence, the original mid-point indexed by gesture could now be indicated by a replicable process of grouping, ordering, and counting.

Table 2 summarizes the transitions and coordinations between the measure constructed by Alina and the aspects of distribution attended to during each state of measure. During these five exchanges, Alina sought to render into measure her intuition that the real length of her teacher’s arm-span would be best estimated by considering the center clump of the distribution. As she constructed her measure, she considered how the properties of interval and order that were constituents of her display could be exploited to develop a mechanical (algorithmic) process that others could also employ to find a middle value, or half spot. Alina’s consideration of the imagined results of a repetition of the process by which the batch of data of measurements was generated grounded her anticipations of how the measure might change in light of sample-to-sample variation. The role of her teacher included reminding Alina of the need to create a quantity for purposes of measure. He also supported the disciplinary virtues of clear communication and generalization as desirable qualities of measure.

Although Alina’s invention was not as general or as easily replicated as conventional measures of center, it contained the seeds of more powerful conventional statistics by virtue of its reliance on count and order. The visual display highlighted the symmetry of the distribution, which supported the press made by the teacher to generate a quantity to measure a location (here a value along the axis of reflection). Subsequent to this exchange, Alina revised the scope of count to act on ordered cases instead of ordered intervals, but it is likely that this further innovation was borrowed from other classmates who invented the median. Yet Alina’s prior efforts to invent a measure positioned her to understand the virtues of counting ordered cases rather than ordered intervals. Parenthetically, the invention of median precipitated another measurement quandary in the classroom: The middle number (157 cm), the median, no longer corresponded to any particular measured value (there were 24 cases). This was an important step in the development of measure, because it suggested that a measure need not copy to adequately represent a characteristic of a distribution. When pressed to justify the use of an in-between value as an indicator of the true length, the student inventors appealed to the consistency of the in-between value with neighboring values, so even if no measurer obtained that measure, it could serve as a plausible estimate of true length. This justification was reminiscent of Alina’s reliance on neighborhoods of values, center clumps, and suggested to us that displays that make this characteristic of distribution visible are apt to afford opportunities to reason about center statistics as “middle numbers.” It should be noted that the dependence of this sense of middle on order was not transparent to all students in the class. When given an unordered list of values, some students simply found the middle of the list and suggested that this was the median. However, these moments proved to be productive failures (Kapur, 2008)

that provided pedagogical opportunity to revisit the structure of both the characteristic at hand (center), and the corresponding measure (median), and to ground the necessary coordination between order and count in this correspondence.

3.2 Reflections on inventions: range

As previously mentioned, after inventing a statistic, students participated in a whole-group conversation, here termed a measure review, during which the inventor explained and justified her or his invention. Other students were free to ask questions about, or even challenge, the inventor’s explanations or justifications, but they were also accountable for understanding how the inventor’s measure functioned, both conceptually and procedurally. We turn now toward illustrating how this process of measure review functioned to further elaborate students’ conceptions of statistics and of distribution. In this instance, a sixth-grade student, John, presented his invention of range as an indicator of the precision of measure. (The precision of measure is conventionally the reciprocal of its variability, but this distinction was not made by either the teacher or the participating students.) The teacher staged this particular measure review using TinkerPlots to display two distributions of the measures of the arm-span of a teacher, illustrated in Fig. 4. The upper panel represents measurements obtained with a meter stick and the lower panel, measurements obtained with a 15 cm ruler. The teacher initiated the review by reminding the students that the goal was to talk about their inventions, “kinda like the best guess methods, but we call this the best precision methods.”

In the first exchange (1), John immediately established a coordination between reading the display as showing “how much the graph or the people agree” and his proposed measure, “the range of your data.” He went on in this turn and the one following (2) to suggest minimum (zero) and maximum (infinity) values of the scale of measure, and he justified these values by performing, with gesture and word, imagined transformations of the data. He declared that a measure of zero corresponded to complete agreement, indicated by a single column of values. A scaled value of infinity, “the worst,” would occur as the extreme values of the data moved away from the center with infinite extension. Other than stage setting, the teacher reminded the class of norms of collective responsibility for asking questions whenever the method or its grounds were not clear.

The next exchange (3) was dominated by explorations of the procedure to generate indicators of precision. Several students tried out the statistic and quickly reached consensus that values of the statistic appropriately corresponded with the degree of agreement evident by visual inspection of the displays in Fig. 4. We anticipated that this would conclude this particular measure review, but Ethan soon challenged the premise of coordination between the range and precision during the next exchange (4). He proposed, plausibly in light of students’ own efforts to measure their teacher’s arm-span, a transformation of the distribution in the upper panel of Fig. 4 to generate a “two-way” outlier, meaning that both displays would have similar extreme cases. John’s immediate reaction was: “It doesn’t matter.” To support Ethan’s proposed transformation, the teacher used TinkerPlots to animate it, resulting in the display shown in Fig. 5.

The teacher then asked a question to motivate further contest (exchange 5). He asked whether or not the “brown graph (upper panel) still tends to agree more?” Looking at the display, Alex responded that the graph in the upper panel indicated greater agreement “cause its smaller,” a

Fig. 4 TinkerPlots display of the measurements of a teacher’s arm-span employing a meter stick (top panel) and a 15 cm ruler (bottom panel)

Fig. 5 “What if” we move these two values?

reference to the extent of spread that he saw. But in the next turn of conversation, Ethan objected to Alex’s visually guided interpretation by exclaiming: “Not now! Not by his (John’s) way.”

The next few turns of the conversation revolved around the contradiction between the range measure and the qualities of the distribution evident by visual inspection. Clear differences in the variability of the data were now not reflected by a measure that relied on only two values.

In the next exchange (6), like any good inventor, John proposed a patch to the measure, one of trimming an outlier in each tail of the distribution of values. Some students agreed, but Ethan again objected, pointing out that perhaps it was not appropriate to remove outliers: “(they) aren’t outliers by some people I mean.” Ethan’s objection seemed to suggest that what might be one person’s outlier was another person’s valid measurement. Other students wondered about the effects of multiple outliers.

The teacher decided that the conversational exchange presented an opportunity for further development of disciplinary values of coherence and communicative clarity. Accordingly, he continued the conversation (exchange 7) by animating an unrestricted truncation (“keep chopping it down”), by removing cases symmetrically until he had achieved an absurd result of reaching the median (which was identical) for each distribution. This demonstrated the need to clarify and codify the truncation repair. But, as the teacher concluded, “there is nothing wrong with the range,” as he emphasized that the purpose was to examine the circumstances in which it would serve as a good measure.

Table 3 summarizes the trajectory of the conversation, which began with an almost ideal grasp of measure by its author: John clearly related properties of measure to properties of distribution associated with its variability, and he further established values indicating minimum and maximum precision, along with the entailments of each for imagined displays of data. Classroom reflections extended the conception of the validity of the range statistic by considering circumstances that might diminish its correspondence with characteristics of the distribution (its clumpiness) that were constituents of variability. The teacher’s role was one of animating some of the transformations imagined by students with the display technologies of TinkerPlots™. In addition, at critical moments he supported dissent, allowing for the development of the space of imagined transformations. The teacher pursued disciplinary values of communicative clarity and coherence by pursuing the paradox raised by measures whose values did not correspond to perceptually clear differences in variability.

3.3 Reflections on inventions: deviation

The second measure review, conducted in another class taught by the same sixth-grade teacher, featured exploration of another student-invented measure, the sum of deviations of measured values from the sample median. The duration of this review was much longer, so we have selected only exchanges involving explicit coordination of the statistic with qualities of variability. The teacher again staged the conversation with the display depicted in Fig. 4, and he reminded the class of norms of sense making (“And you guys are listening for does this make sense?”) and generalization (“Could I do this on other data?”).

In the first exchange, the student designer, Adam, proposed a deviation-based approach to precision that relied on the distance between each case and the median and represented the precision as the sum of these deviations:

Table 3 Summary of the measure review of John’s method

| Exchange | Teacher role | Nature of measure | Attended characteristics of distribution |
| --- | --- | --- | --- |
| 1 | Set the stage by sharing visual reference and reminding of norms governing measure review | Range | How much people agreed was indicated by clumps in the data |
| 2 | Teacher adopted John’s voice to highlight the measure of zero and its correspondence to complete agreement/no variability | Minimum, maximum values of scale | Complete agreement indicated by one stack of data on display, corresponding to a measure of zero. Complete disagreement indicated by imagined smear of data, infinite measure |
| 3 | Solicited other participants to test the claims | Calculated values for empirical samples | Checked correspondence between obtained values and observed clumps of data |
| 4 | Animated transformation of data | Correspondence between range and qualities of distribution challenged | An imagined distribution with extreme values but high amount of clumping |
| 5 | Created tension by highlighting the contrast between “seeing” and measuring | Explored generalization of measure | The imagined transformation created a tension: students “saw” that the distributions were comparatively more or less variable, but the statistic was now no longer sensitive to these differences |
| 6 | No explicit teacher support | Truncated range | Trimmed values were removed from the distribution |
| 7 | Privileged value of algorithm that included clear guidance about trimming | Revised measure must include stopping rule for truncation. Range may not account for clumps or regions of values | Distribution recharacterized by regions of values, some of which may be more important for purposes of measuring precision |
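
As a numerical companion to Table 3, the following minimal sketch replays the range review on made-up data. The measurement values and the names `value_range` and `trimmed_range` are assumptions introduced here for illustration; they are not the classroom data or the students' own notation.

```python
# Hypothetical arm-span measurements in cm (illustrative values only,
# not the data collected in the study).
meter_stick = [150, 154, 155, 156, 156, 157, 157, 158, 159, 163]
short_ruler = [138, 144, 149, 153, 156, 158, 162, 167, 171, 178]


def value_range(values):
    """Range as a precision measure: largest value minus smallest value."""
    return max(values) - min(values)


def trimmed_range(values, k=1):
    """The proposed patch: drop the k most extreme values in each tail first.

    k is a hypothetical parameter; the review showed that the method needs
    an explicit rule for choosing it.
    """
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return max(kept) - min(kept)


# The "two-way" outlier: move the two extreme meter-stick values outward
# so that both samples share the same endpoints.
transformed = sorted(meter_stick)
transformed[0], transformed[-1] = min(short_ruler), max(short_ruler)

print(value_range(meter_stick), value_range(short_ruler))      # 13 40
print(value_range(transformed), value_range(short_ruler))      # 40 40
print(trimmed_range(transformed), trimmed_range(short_ruler))  # 5 27
```

After the transformation the two ranges are identical even though one sample is still visibly more clumped, which is the contradiction raised in exchanges 4 and 5. Trimming one value per tail restores the difference in this example, but only because `k` was chosen by hand.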

“So were gonna find for our thing were gonna find for each data point ((pointing with index finger at a case)) were gonna find how far away each data point is from the median ((gestures with hand)). Like how far each individual one is. And then we add that up for each thing (each deviation) and that shows it (precision) for ours.”

Other members of the class did not understand Adam’s explanation, so the teacher created data consisting of a few cases and a median. Adam demonstrated his method with this small batch (exchange 2), and most students indicated that they now understood Adam’s method. The teacher asked about the direction of the difference (negative vs. positive) and Adam clarified that he was interested in the absolute value of the deviations, not their direction.

In the next exchange (3), Adam further clarified his measure by aligning the resulting sums of deviations as measures of precision with reference to an ideal, imagined state (“zero is most precise”) of complete alignment of all values with one location.

When the teacher invited questions (exchange 4), many students responded that they liked the measure, because it clearly coordinated the measured quantity with the shared sense of which display exhibited (“you can see”) greater precision. During this exchange, Adam again reminded the class of the anchor of the scale (“the precisest one is zero”), which he again coordinated with an imagined location (“can’t be anywhere away from the median”).

Following several other exchanges in this spirit, the teacher invited (exchange 5) students to disrupt the coordination between statistic and variability previously established (“Can anyone think of any time that this might become a problem?”). A pair of students proposed another imagined distribution, identical to the extant data but with one difference (“one outlier that is like way far away and then everything else is clumped”). Acting on this imagined distribution, Jeff illustrated how Adam’s method would convey the misleading impression that the measurements were “not as precise.”

The teacher (exchange 6) elaborated on Jeff’s proposed scenario by aligning himself and the class with it: “We pick a ridiculous number like a thousand.” By choosing an extreme case, the teacher illustrated Jeff’s concern with a common heuristic employed in mathematical reasoning, extreme case. The conversation turned toward debates about the role of the outlier. Some students proposed eliminating it because it is so distinctive, while others argued that a measure of spread should indicate how much “all of the measurements” tend to agree, thus justifying the outlier’s effect on the measure. This debate so engaged the students that as the teacher received a call on the telephone from the office some of the students continued to negotiate

Table 4 Summary of the measure review of Adam’s method

| Exchange | Teacher role | Nature of measure | Characteristics of distribution |
| --- | --- | --- | --- |
| 1 | Staged distributions. Set norms | Sum of deviations from center | Variability as distance |
| 2 | On-the-spot creation of example distribution. Highlighted absolute value | Method to calculate demonstrated | Distance coordinated center and case, absolute value, not signed differences |
| 3 | No explicit teacher support | Minimum value | Imagined distribution of all values at same location |
| 4 | Invites questions | Communicative clarity. Minimum value | Measure aligns with visual perception of extant distributions. Imagined distribution of all values at the median |
| 5 | Invites problematic | Outlier may result in lack of correspondence between measure and precision | Imagined copy of distribution but with an outlier |
| 6 | Poses extreme case | Outlier’s effect on measure is more evident | Imagined copy of distribution but with a “ridiculous” outlier |
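
As a numerical companion to Table 4, the sketch below computes the sum of absolute deviations from the median on made-up samples. All data values and the name `sum_of_deviations` are assumptions chosen for illustration, not the classroom data; the last two samples are a hypothetical version of a large tight sample set against a small spread-out one.

```python
from statistics import median


def sum_of_deviations(values):
    """Total absolute distance of each case from the sample median."""
    center = median(values)
    return sum(abs(v - center) for v in values)


# Hypothetical measurements in cm (illustrative values only).
identical = [157] * 9                      # imagined state: all values at one location
clumped = [155, 156, 156, 157, 157, 157, 158, 158, 159]
with_outlier = clumped[:-1] + [1000]       # a "ridiculous" extreme case
many_tight = [156, 157, 158] * 40          # many cases, little spread
few_wide = [140, 150, 157, 165, 175]       # few cases, wide spread

print(sum_of_deviations(identical))     # 0
print(sum_of_deviations(clumped))       # 8
print(sum_of_deviations(with_outlier))  # 849
print(sum_of_deviations(many_tight) > sum_of_deviations(few_wide))  # True
```

A sum of zero matches the anchor "zero is most precise." A single extreme case dominates the sum, and a large tight sample can outscore a small wide one, so in both situations the statistic parts ways with the spread that is visible in a display.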

the measure’s value in light of the effect of the outlier. Table 4 summarizes the trajectory of this measure review.

As in the previous measure review, students were sensitive to the need to establish the quality of the precision statistic by coordinating measure values with observed and with imagined states of variability. The imagined states of variability included plausible transformations as well as those generated for the sake of the argument: the “ridiculous” number suggested by the teacher. The contest around outliers and their effect on the sum of deviations was extended the next day by the teacher, who suggested a further problematic by again posing two different distributions, one with a large number of cases but with little variability and the other with a small number of cases and a comparatively large degree of variability. In this imagined scenario, the sum measure did not correspond to the spread of data visible in the displays. We mention this to highlight that the measure reviews produced multiple opportunities for students to reconcile disrupted correspondence between characteristics of distribution and measures of the characteristics, many more than could be represented here.

4 Discussion

Our aim in the research program reported here was to design instruction that positioned students to participate in the dialectic of measure (the emerging relation between the structure of a measure and the structure of the phenomena being measured) by posing the challenge of designing statistics of their own invention. By participating in this dialectical process, we anticipated that students would explore much of the conceptual terrain encompassed by the discipline’s treatment of statistic and distribution. In this, we were not disappointed. Students traversed much of the terrain as they grappled with the tension between generating a quantity and understanding whether or not that quantity was an adequate measure of the desired characteristic of the distribution, or even whether the characteristic of the distribution indicated by the measure was in fact the most desirable. For example, in the case of individual invention, the student’s initial sense of measure was one of gestural enclosure of a region of case values. The particular region selected was guided by the display the student had previously developed, which structured distribution into decade intervals and ordered the values within each decade. Hence, the student began with a clear intuition that the referent of the measure, the real length of her teacher’s arm-span, was apt to be located in the middle region of the graph. Yet, this gesture was not a quantity, and the student’s next step was to locate, again by hand, the middle value within the region. Reminded that this process was not reproducible by others, the student constructed an algorithm that would allow other members of the class to locate the same value as representing the “1/2 spot.” As she constructed the algorithm, the structure of the distribution that was previously tacit was made explicit, as she employed count of ordered intervals to identify the critical region. Somewhat unexpectedly, she also anticipated the effects on the measure of other samples drawn from the same process, indicating that her sense of the structure of the distribution was such that she expected some variability in the center if the process were repeated. She justified her algorithm by appealing to the symmetry of the distribution, literally demonstrating reflection symmetry around the line of reflection defined by her choice of center. The process of invention was dialectical in the sense that her initial measure fell short of disciplinary demands of measurement, and this insufficiency appeared to generate closer attention to the characteristics of distribution. These characteristics
were then exploited to generate a method of measure that could be reproduced and communicated. As we indicated, the use of interval bootstrapped the development of the measure, and in subsequent revision (the next day) was eliminated by relying on counts of cases rather than counts of intervals to find the 1/2 spot. This revision was not trivial, especially because it identified a different value as the middle, a value that did not exist in the data. As with other acts of invention, this quandary provided pedagogical opportunities for the class. During this act of invention, the teacher was not a passive observer but instead engaged in disciplinary dialogue during which he represented values of communicative clarity, reproducibility and generalization, and the role of algorithm as a means to encapsulate these values. We believe that it is important to provide students with enough timely support and sufficient leeway to generate their own measures so that the essential tension between statistic and distribution is maintained. A signature of sufficient tension is successive waves of student revision, as demonstrated so clearly by this case of invention.

The cases described illuminate a repeated finding across several studies: As students design statistical measures, their resolutions to the challenges of quantification have many parallels in statistical convention. For example, as we reported, students’ invented measures of precision included notions of the distance between endpoints of the distribution and of distance between case and center. In the discipline, these conceptions of variability find their counterparts in the range and in average deviation, although the latter is defined by the mean rather than by the students’ choice of median deviation. Inventing measures also supported the development of what we term meta-measure competency (related to diSessa’s (2004) construct of meta-representational competency), meaning that students appeared increasingly to generate epistemic criteria for what counted as a good measure. Some of these criteria were very general, and we suspect that students might be disposed to invoke them in other systems of measure. For example, students often demanded that other students define and clarify the structure of a statistic in a manner that could be readily communicated and replicated by people other than its inventor. Moreover, although students often struggled with the notion of creating a quantity to represent a distribution of quantities, there were multiple occasions where students claimed that a measure was adequate, because there was a correspondence between the quantity generated by the measure and the quality of the distribution being measured. Students often demanded too that measures be general and not tied to particular circumstances, as seen in the use of imagined distributions to contest the measures.

4.1 The promise of measure reviews as classroom activity structures

As Ford (2010) notes, systems of measure do not develop in soliloquy, but instead gain traction through their participation in a wider collective where they are challenged and refined. Holland, Lachicotte, Skinner & Cain (1998) note that such forms of collective, figured worlds, take shape within and grant shape to the co-production of many forms of social activity. But in realms of measurement, the collective focus has an additional impetus of reproduction: to participate in a measure discourse, one must be able to frame a measure as reproducible by its participants. Such explicitness then allows for opposition, as members envision circumstances or visions of phenomena not captured by the measure. In each case of measure review, we found that student-invented statistics included clear coordination between quantity and variability. In each, the inventors clarified the scale of measure, invoking in both reviews an imagined distribution of values corresponding to zero variability, and contrasted this value to both empirical and imagined distributions with much greater variability and correspondingly greater measured quantities. These initial orientations toward measure suggest student appropriation of the measurement “game.” As further evidence of appropriation by students of the need to engage the problematic of measure, the very quality of valid measure established by the presenter in each case was employed to undermine it by other members of the class. The primary vehicle for doing so was to imagine other plausible transformations of the empirical data, so that reflections about the viability and generalization of the measure were carried out primarily in imagined, not empirical realms. The resulting exchanges often corresponded to important issues in professional practice, such as tensions between accounting for all data and trimming unruly data. Hence, although the nature of student interactions varied in each case, the activity structure of the measure review provided a public space where students made visible the grounds of argument for a measure’s validity and where students suggested possible worlds where these grounds might prove insufficient. The latter is critical to mathematical generalization.

4.2 The teacher’s role in promoting disciplinary values and practices

During the process of initial design, the teacher invoked the need for communicative clarity and reproducibility to help the student initiate a transformation from graphical region to quantity. The teacher also stood in for the disciplinary value of generalization, although in this, the student inventor was his willing accomplice (it was she who
mentioned to the teacher the need to think about what would happen if the process were repeated). During the measure reviews, the teacher promoted a focus on relations between a statistic and the characteristic of the distribution being measured by that statistic. Some of the teacher moves were generic, such as staging the discussion by providing a projection of two distributions simultaneously. This staging meant that students had a common object to view and transform as needed. But other teacher moves were less generic. For example, the teacher often assisted student exploration of the viability of particular statistic-distribution relations by creating paradoxes, such as the exchange during the measure review of the range statistic. In this exchange, student perception of the density of case values in the display suggested one conclusion about the comparative variability of the two sample distributions, but the value of the statistic demanded a different conclusion. The paradox between seeing and measuring helped students understand the contingencies of a particular measure and communicated more general lessons about the epistemic nature of measure. In both cases, the teacher also ratified disciplinary values such as the extreme case heuristic and the generation of possible, rather than sole reliance on empirical, scenarios for considering the validity of a measure.

In closing, we suggest that involving students in the design of measures is a practical pedagogy by which understanding of phenomena and of measure co-originate. As students invent measures, they encounter and attempt to resolve the often problematic and occasionally contentious relation between phenomena and measure. Initial measures can be fruitfully expanded and refined by participation in activity structures such as measure reviews, where different perspectives about the same measure provide rich opportunities for productive conceptual expansions of the meaning of a measure and of the characteristics of the phenomenon indicated by that measure. Teachers play a vital role in supporting students’ reach for complex notions, such as statistic and variability, so that their reach comes to exceed their initial grasp. Such experience and associated mathematical aesthetic may prove to be of enduring value as students participate in ever widening circles of civic discourses that are governed by measures and models.

Acknowledgments The research reported here was supported by the U.S. National Science Foundation, REC-0337675 and by the Institute of Education Sciences, U.S. Department of Education, through Grant R305K060091 to Vanderbilt University.

References

Bakker, A., Wijers, M., Joinker, V., & Akkerman, S. (2011). The use, nature and purposes of measurement in intermediate-level occupations. ZDM.
Berlinski, D. (2000). The advent of the algorithm. New York: Harcourt.
Chang, H. (2004). Inventing temperature. Measurement and scientific progress. Oxford: Oxford University Press.
Clements, D. H., & Bright, G. (2003). Learning and teaching measurement. 2003 Yearbook. Washington, D.C.: National Council of Teachers of Mathematics.
Cobb, P. (1999). Individual and collective mathematical learning: The case of statistical data analysis. Mathematical Thinking and Learning, 1, 5–44.
Cobb, G. W., & Moore, D. S. (1997). Mathematics, statistics, and teaching. American Mathematical Monthly, 104(9), 801–823.
Crosby, A. (1997). The measure of reality: Quantification and western society 1250–1600. New York: Cambridge University Press.
diSessa, A. (2004). Metarepresentation. Native competence and targets for instruction. Cognition and Instruction, 22(3), 293–331.
Ford, M. J. (2010). Critique in academic disciplines and active learning of academic content. Cambridge Journal of Education, 40(3), 265–280.
Gooding, D. (1990). Experiment and the making of meaning. Human agency in scientific observation and experiment. Dordrecht: Kluwer Academic Publishers.
Holland, D., Lachicotte, W., Jr., Skinner, D., & Cain, C. (1998). Agency and identity in cultural worlds. Cambridge, MA: Harvard University Press.
Jefferson, G. (1984). Transcription notation. In J. Maxwell & J. Heritage (Eds.), Structures of social action (pp. ix–xvi). New York: Cambridge University Press.
Kapur, M. (2008). Productive failure. Cognition and Instruction, 26(3), 379–424.
Konold, C. (2007). Designing a data analysis tool for learners. In M. C. Lovett & P. Shah (Eds.), Thinking with data (pp. 267–291). New York: Lawrence Erlbaum Associates.
Konold, C., & Lehrer, R. (2008). Technology and mathematics education. In L. D. English (Ed.), Handbook of international research in mathematics education (pp. 49–69). New York: Routledge.
Konold, C., & Miller, C. D. (2005). TinkerPlots: Dynamic data exploration [Computer software]. Emeryville, CA: Key Curriculum Press.
Konold, C., & Pollatsek, A. (2002). Data analysis as the search for signals in noisy processes. Journal of Research in Mathematics Education, 33(4), 259–289.
Lee, K., & Smith III, J. P. (2011). What’s different across an ocean? How Singapore and U.S. elementary mathematics curricula introduce and develop length measurement (in press).
Lehrer, R., Jacobson, C., Thoyre, G., Kemeny, V., Strom, D., Horvath, J., et al. (1998). Developing understanding of geometry and space. In R. Lehrer & D. Chazan (Eds.), Designing learning environments for developing understanding of geometry and space (pp. 169–200). Lawrence Erlbaum Associates: Mahwah, NJ.
Lehrer, R., & Kim, M.-J. (2009). Structuring variability by negotiating its measure. Mathematics Education Research Journal, 21(2), 116–133.
Lehrer, R., Kim, M.-J., & Schauble, L. (2007). Supporting the development of conceptions of statistics by engaging students in measuring and modeling variability. International Journal of Computers for Mathematical Learning, 12(3), 195–216.
Lehrer, R., & Romberg, T. (1996). Exploring children’s data modeling. Cognition & Instruction, 14(1), 69–108.
Lehrer, R., & Schauble, L. (2004). Modeling natural variation through distribution. American Educational Research Journal, 41(3), 635–679.

Petrosino, A. J., Lehrer, R., & Schauble, L. (2003). Structuring error and experimental variation as distribution in the fourth grade. Mathematical Thinking and Learning, 5(2&3), 131–156.
Rotman, B. (2000). Mathematics as sign. Stanford: Stanford University Press.
Thompson, P. W., Liu, Y., & Saldanha, L. A. (2007). Intricacies of statistical inference and teachers’ understanding of them. In M. C. Lovett & P. Shah (Eds.), Thinking with data (pp. 207–231). New York: Lawrence Erlbaum.
Van Fraassen, B. C. (2008). Scientific representation: Paradoxes of perspective. Oxford: Oxford University Press.
Zawojewski, J. S., & Shaughnessy, J. M. (2000). Mean and median: Are they really so easy? Mathematics Teaching in the Middle School, 5, 436–440.
