
Evaluation of the Target 2021 Program

After Five Semesters of Implementation

By Tim R. Sass, Ph.D.

January 14, 2019


Executive Summary
In an effort to improve outcomes for students potentially impacted by test manipulation on the
2009 CRCT exams, the Atlanta Public Schools created the “Target 2021” initiative. The program’s goals
included improving four outcomes: attendance, course failure, reading achievement, and graduation.
Students enrolled in APS in 2014/15 were initially identified as eligible for the program
based on having high numbers of answers changed from wrong to right on their 2009 CRCT exams. In fall
2015, families of eligible students were invited to participate, and over 99 percent agreed to do so. A total
of 3,075 students were on the initial list to be offered services and were still enrolled in APS on the first day of the
spring 2016 semester. By January 2016, individual support plans were created for each participating
student and services began to be provided soon thereafter. These services included academic monitoring,
individual counseling, attendance incentives, ACT/SAT prep, tutoring, behavioral interventions,
college/career fairs and parent workshops. The available information is not sufficient to determine
exactly why students received different services, the precise nature of the services each student received,
or how the intensity of those services differed across students. Therefore, the analysis is based on the
average effect of the program over all participants during the first five semesters of implementation
from spring 2016 to spring 2018.

This report is the third evaluation of the impact of the Target 2021 program. The first report
analyzed the effects of the first semester of implementation, spring 2016. The second report, issued in
November 2017, provided estimates of impacts after three semesters, spring 2016 through the end of the
2016/17 school year. In the current report the analyses are extended to the first five semesters of
implementation, spring 2016 through the end of the 2017/18 school year. With the exception of high
school graduation, outcomes for students who participated in the program in spring 2016, the 2016/17
school year and the 2017/18 school year are compared to the outcomes for students who had been in
classrooms flagged for high levels of wrong-to-right erasures in spring 2009, but who individually had
few erasures and were thus not offered Target 2021 services (and remained in APS through the 2017/18
school year). For graduation, the available data track students who move to other districts in Georgia, and
thus it is possible to compare students who participated in the first semester of Target 2021 (spring 2016)
with students who were also in flagged classrooms but had too few wrong-to-right erasures on the
2009 CRCT to qualify for Target 2021 services, whether or not they remained in APS after the 2015/16
school year.

Evaluation of the impacts of the Target 2021 program has become more difficult over time for
a number of reasons. First, virtually all participants are now in high school, where there are no end-of-
grade exams to assess academic progress in relation to participants’ scores prior to the start of Target
2021. Second, the CAAS exam, which was used in previous reports, is no longer administered. Third,
there has been a drop in the number of students taking the 9th Grade Literature end-of-course exam
in several APS high schools, making its use as an outcome measure problematic. As a partial
substitute, two outcome measures have been added to this report: scores on the PSAT administered
to 10th graders and college enrollment rates one year after graduation.

Like previous reports, the analyses in this report utilize two different statistical techniques: a
“difference-in-differences” approach, which compares the change over time in outcomes for Target 2021
students with the change over time in outcomes for the comparison group, and a “regression
discontinuity” approach, which compares outcomes for Target 2021 students who had just enough wrong-
to-right erasures on their 2009 exams to be designated as having their test scores manipulated (and thus
be eligible for services) with students in the comparison group who had slightly fewer than the requisite
number of wrong-to-right erasures (and thus were not
offered Target 2021 services).

Over the first five semesters of implementation the results indicate:

• No effect or perhaps a slight negative impact on attendance, depending on the level of confidence desired.
• No effect on grade point averages in core subjects.
• No effect on the number of courses failed.
• No impact on 9th Grade Literature end-of-course exam scores.
• Either a modest reduction or no impact on PSAT reading scores, depending on the analytical technique.
• No impact on the likelihood of graduation.
• Either no impact or a negative effect on the likelihood of attending college within one year of graduation, with results depending on the level of confidence and whether or not statistical controls are included in the model.

Background
Allegations of widespread manipulation of student test scores by Atlanta Public Schools (APS)
teachers and school administrators first became public in 2009. It was alleged that scores on the spring
2009 administration of the Criterion-Referenced Competency Test (CRCT), given to students in grades 1-8,
had been falsified by changing wrong answers to right answers after the exam was given. In early 2010
the Governor’s Office of Student Achievement (GOSA) conducted a statewide analysis of erasures on the
CRCT. Classes were “flagged” based on high numbers of wrong-to-right (WTR) erasures and schools were
categorized based on the proportion of flagged classrooms in the school.1 Nearly 60 percent of
elementary and middle schools in APS were identified as having 20 percent or more of their classrooms
flagged. Results of the erasure analysis were used by the Georgia Bureau of Investigation (GBI) to select
schools for detailed investigation, which included interviews with school personnel. In over half of these
schools, educators confessed to manipulating test scores. Investigators concluded that systemic
misconduct occurred in over three-fourths of the schools that were investigated in detail. The
investigation also revealed that test manipulation had been going on for some time, perhaps as far back
as 2001 in some schools (Office of the Governor, 2011).

In May 2015 researchers from Georgia State University presented a report to APS (Sass, Apperson
and Bueno, 2015) that analyzed the impacts of test manipulation on subsequent outcomes for students.
Based on the number of WTR erasures on individual exams, relative to the average in a typical year when
manipulation did not occur, the report found that approximately 60 percent of students in flagged
classrooms in 2008/09 likely had their test answers manipulated in one or more subjects on the spring
2009 CRCT exam. Controlling for observable student characteristics, the study compared outcomes for
students whose scores were likely manipulated in flagged classrooms to outcomes for other students in
flagged classrooms who do not appear to have had their answers changed ex-post. The report concluded that
manipulation of students’ test answers had negative consequences for later student performance in
reading and English Language Arts (ELA), but not in math. The losses were in the range of 0.06 to 0.14
standard deviations of student achievement, roughly equivalent to one-fourth to one-half of typical
annual learning gains for students. Additional analyses did not uncover any appreciable effects on either
student attendance or the number of student disciplinary incidents.

Following the presentation of the Georgia State research report, APS began to formulate a plan
for assisting students who may have been negatively affected by test score manipulation in 2009 and prior
years. The resulting program was dubbed “Target 2021.” As stated on the APS web site, “The purpose of
the CRCT Remediation and Enrichment Academic Program (Target 2021) is to provide the students who
were impacted by the CRCT score anomalies targeted supports delivered via the development and
implementation of individual learning plans designed to position them towards proficiency for graduation
and equip them with post‐secondary options.”

1 Classrooms were flagged when the number of WTR erasures was greater than three standard deviations above the
state mean. An adjustment was made for class size by dividing the standard deviation by the square root of the class
size. The state investigation refers to “flagged classrooms,” though they were in fact groups of students who were
administered a given test by a single proctor. The test administrator was not necessarily the classroom teacher
for the tested subject.
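To make the flagging rule in footnote 1 concrete, the sketch below restates it as a small calculation. This is an illustration of the rule as described, not a reproduction of GOSA's actual computation, and the example numbers are invented.

```python
import math

def is_flagged(class_wtr_mean: float, state_mean: float,
               state_sd: float, class_size: int) -> bool:
    # Flag when the class's WTR erasure count exceeds the state mean by
    # more than three standard deviations, with the standard deviation
    # scaled down by the square root of the class size.
    threshold = state_mean + 3.0 * state_sd / math.sqrt(class_size)
    return class_wtr_mean > threshold

# Invented numbers: a class of 25 averaging 4.0 WTR erasures per exam,
# against a hypothetical state mean of 1.5 and standard deviation of 2.0.
print(is_flagged(4.0, state_mean=1.5, state_sd=2.0, class_size=25))  # True
```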

The Target 2021 program focuses on four student outcomes: attendance, grades, reading
achievement and ultimately graduation. Corresponding to these areas, the program has four specific
goals:

• The overall attendance rate for students served in Target 2021 will increase when compared to the attendance rate of the same students one year prior.
• The percentage of students failing one or more courses will decrease when compared to the year prior.
• The reading levels of students will increase at a rate that makes up for the negative effect identified in the original study.
• The graduation rate of the Target 2021 students eligible to graduate in 2016 will be greater than that of their academically similar peers.

Based on prior enrollment in a flagged classroom in 2009 and individual WTR erasure counts on
the 2009 CRCT exams, APS created an initial list of students potentially affected by test manipulation and
enrolled in APS during the 2014/15 school year.2 Of these students, 3,075 were enrolled in APS as of the
first day of the spring 2016 semester (January 6, 2016). Over 99 percent of all students who were offered
Target 2021 services participated in the program; only about 30 students and their families opted out of
the program.

In consultation with parents, the district created individual support plans and established goals
for each participant by January 2016. A variety of services and incentives were subsequently provided to
participants during the spring 2016 semester. These services included academic monitoring, individual
counseling, attendance incentives, ACT/SAT prep, tutoring, behavioral interventions, college/career fairs
and parent workshops. It is not possible to determine exactly which services each student received or
the intensity and timing of those services. Therefore, the following analysis can only gauge the average
effect of the program on participants.

2 APS used a threshold of five or more WTR erasures on either the math, reading or ELA 2009 CRCT exams to
determine eligibility for the Target 2021 program. This is a slightly more lenient threshold than that used to
determine “cheated” students in Sass, Apperson and Bueno (2015). In the Sass, Apperson and Bueno analysis a
student was designated as having been cheated in 2009 if the number of WTR erasures on a given exam exceeded
the number of WTR erasures corresponding to the 95th percentile of the WTR erasure distribution in 2013 (when by
all accounts no test manipulation occurred). The corresponding thresholds were five or more WTR erasures in
reading, five or more WTR erasures in ELA and six or more WTR erasures in math.
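The two eligibility rules described in footnote 2 can be summarized in a short sketch. The function names are illustrative only; the thresholds themselves are taken from the footnote.

```python
def eligible_target_2021(math_wtr: int, reading_wtr: int, ela_wtr: int) -> bool:
    # APS rule: five or more WTR erasures on any of the three 2009 CRCT exams.
    return max(math_wtr, reading_wtr, ela_wtr) >= 5

def cheated_sass_apperson_bueno(math_wtr: int, reading_wtr: int, ela_wtr: int) -> bool:
    # Sass, Apperson and Bueno (2015) thresholds, based on the 95th percentile
    # of the 2013 WTR erasure distribution: 5 (reading), 5 (ELA), 6 (math).
    return reading_wtr >= 5 or ela_wtr >= 5 or math_wtr >= 6

# A student with exactly five WTR erasures in math qualifies under the more
# lenient APS rule but not under the Sass, Apperson and Bueno designation.
print(eligible_target_2021(5, 0, 0))          # True
print(cheated_sass_apperson_bueno(5, 0, 0))   # False
```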

Sample and Methodology
In order to evaluate the impact of the Target 2021 program on student outcomes, it is important
to compare outcomes for Target 2021 participants to our best estimate of how they would have
performed had they not participated in the program.3 Schools that received detailed investigations from
the GBI (due to having significant proportions of their classes being flagged for high WTR erasure counts)
tended to be schools that served large proportions of disadvantaged students and that showed low levels
of overall student achievement. Consequently, the analysis is limited to students who were enrolled in
flagged classrooms within investigated schools in 2009. Given that the Target 2021 program did not begin
until spring 2016, it is necessary to further limit the sample to students who were enrolled in APS in both
the fall and spring semesters of the 2015/16 school year. Since we want to measure the full impact of
five semesters of the intervention (except for graduation), the sample is further constrained to students
who were also enrolled in an APS school in 2016/17 and 2017/18. This results in an analysis sample of
2,456 students. As shown in Table 1, students in the analysis sample were primarily enrolled in grades 9-
12 since test manipulation occurred in grades 1-8 in 2008/09. The small proportion of students in grade
eight are students who repeated a grade sometime between 2008/09 and 2017/18.

Table 1: Number of Students in Analysis Sample by Grade Level in 2017/18

Grade Level | Total Number of Students | Number of Target 2021 Participants | Number of Comparison Students
8 | 7 | 3 | 4
9 | 351 | 210 | 141
10 | 719 | 399 | 320
11 | 642 | 372 | 270
12 | 737 | 509 | 228
Total | 2,456 | 1,493 | 963

As reported by Sass, Apperson and Bueno, students who had few WTR erasures in 2009 tended
to be higher-achieving students (as evidenced by their disproportionate representation in the top quintile of the
achievement distribution the following year). Consequently, a simple comparison of outcomes for
students receiving Target 2021 services to outcomes for students in flagged classrooms not receiving
Target 2021 services may not provide an accurate measure of the program’s effectiveness. We therefore
rely on two strategies that allow us to more meaningfully measure the program’s impact: difference-in-
differences and regression discontinuity.

3 Throughout the analysis we focus on students who actually participated in the program, i.e. “treated” students.
This could be problematic if there were significant self-selection into the program. To avoid potential bias from self-
selection one could analyze effects on all eligible students, including both actual participants and those who chose to
opt out of the program. Such an “intent-to-treat” analysis is superfluous in the present instance, since less than one
percent of eligible students chose not to participate. To verify this, we replicated the analyses presented in this
report using eligible students rather than participants, and the results were nearly identical in all cases.

Difference-in-Differences
A simple approach to analyzing the program’s impact would be to compare the change or
“difference” in outcomes for participants before and after receiving Target 2021 services. For example,
consider the potential impact of Target 2021 services on student attendance. One could compare
attendance rates for participants in fall 2015 (before receiving any services) to attendance rates for
the same students in spring 2018 (when they had been receiving intervention services for five
semesters). The advantage of this approach is that one is comparing the outcomes for the same students
at different points in time, thereby avoiding potential bias from making comparisons to other students
who may differ from the treated students in ways that are not observable. The problem with a
simple comparison of outcomes over time is that other things may have been occurring in the district
in spring 2018 that could have boosted attendance for all students, whether or not they participated in
Target 2021.
To avoid falsely attributing changes in attendance to participation in Target 2021, we compare
the difference in attendance between fall 2015 and spring 2018 for Target 2021 participants to the same
difference in attendance over the same time period for students in the comparison group. This sort of
comparison is known as a “difference-in-differences” approach. If changes in attendance were due to
factors unrelated to Target 2021 participation (e.g. students show up less often when the weather is nice),
then attendance would fall in the spring semester for all students, but the difference in Fall-2015-to-
Spring-2018 changes in attendance between participants and non-participants would be zero.

While the difference-in-differences approach mitigates potential bias by analyzing changes over
time in student outcomes (rather than levels) across treated and comparison students, estimates of the
impact of Target 2021 could still be biased if the characteristics of participants are associated with changes
over time in outcomes. For example, suppose that students from low-income households tend to have
bigger drop-offs in attendance during spring than students from more affluent families. Further,
suppose that Target 2021 participants are more likely to be from low-income households than comparison
students. Under this scenario, the true impact of Target 2021 would be understated in the difference-in-
differences analysis.

One important characteristic that affects attendance is the grade a student is enrolled in. For
example, absenteeism tends to be higher in middle school than in elementary school, and high school
freshmen tend to have higher absenteeism than students in grades 10-12. To account for differences in
typical attendance rates across grade levels, we estimate models that include controls for the grade in
which a student is enrolled, as well as controls for demographic characteristics.

To minimize potential bias resulting from student/family characteristics that are associated both
with Target 2021 participation and with changes over time in student outcomes, we also include a
set of statistical controls: student gender, race/ethnicity, free/reduced-price lunch status,
Limited English Proficiency status and disability status.
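As a concrete illustration, the sketch below shows one way the difference-in-differences model with these controls could be specified in Python with statsmodels. The report does not state what software or exact specification was used; the variable names, input file and choice to cluster standard errors by student are all assumptions.

```python
import pandas as pd
import statsmodels.formula.api as smf

# df: one row per student per semester (fall 2015 and spring 2018), with
# attendance (percent of days attended), target2021 (participation dummy),
# post (1 for spring 2018), grade, demographic indicators, and a
# student_id -- all hypothetical field names.
df = pd.read_csv("attendance_panel.csv")

# The coefficient on target2021:post is the difference-in-differences
# estimate of the program's effect on attendance.
model = smf.ols(
    "attendance ~ target2021 * post + C(grade) + female + black"
    " + hispanic + frl + lep + swd",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["student_id"]})
print(model.summary())
```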

Regression Discontinuity
Often students are assigned to educational programs based on how they score on some particular
metric. Some states, for instance, base eligibility for gifted education on IQ scores. Similarly, a student’s
grade point average or performance on a standardized exam may provide a basis for summer school
offerings. The regression discontinuity technique exploits the fact that students who fall just below the
cutoff for program eligibility are going to be nearly identical to those who just exceed the threshold for
participation. For example, if program participation is based on a test score, then, for students near the
test score threshold, participation may depend simply on whether a student happened to guess correctly
on a couple of questions on an exam. If guessing right is purely by chance, the assignment of students
near the cutoff to treatment and control groups would be equivalent to a randomized experiment.

In the present context, the offer of Target 2021 services depended on the number of WTR
erasures on a student’s 2009 CRCT exams. If a student was in a flagged classroom and had five or more
WTR erasures on any of the three subject-area exams (math, reading and English Language Arts), they were deemed
eligible to receive Target 2021 services.4 The actual number of WTR erasures on a student’s exam
depended on many factors, including the student’s ability and prior education (which determine how
many questions they initially answered correctly), the choice of a student to change an answer on their
own, the likelihood an educator would select their exam for manipulation ex-post and the questions the
educator chose to correct. While a student with 15 WTR erasures on an exam would likely have had worse
outcomes in the absence of test manipulation than a student with zero or one WTR erasure, a student who had
five WTR erasures in reading and four WTR erasures in math and ELA would likely be no different, on
average, from a student who had four WTR erasures in each of the three subject areas.

Rather than a simple comparison of means for students above and below the cutoff value of WTR
erasures for Target 2021 eligibility, the regression discontinuity approach allows for trends in the outcome
as one moves away from the threshold. Given the relatively few possible WTR values, we simply allow for
a linear trend in the outcome with respect to the number of WTR erasures. We also incorporate
demographic controls in the analysis to further ensure the comparability of students just above and just
below the eligibility cutoff.
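A minimal sketch of this specification, assuming a sharp cutoff for simplicity (the fuzzy variant is sketched after footnote 4 below), might look as follows. Variable names and the input file are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("rd_sample.csv")  # hypothetical input file
CUTOFF = 5  # five or more WTR erasures => eligible for Target 2021

df["above"] = (df["max_wtr"] >= CUTOFF).astype(int)
df["dist"] = df["max_wtr"] - CUTOFF  # running variable, centered at the cutoff

# Linear trend in WTR erasures on each side of the cutoff, plus demographic
# controls; the coefficient on `above` is the estimated jump at the threshold.
model = smf.ols(
    "attendance ~ above + dist + above:dist + female + black + hispanic"
    " + frl + swd",
    data=df,
).fit()
print(model.params["above"])
```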

The advantage of the regression discontinuity approach, relative to difference-in-differences, is a
smaller chance the results will be biased. By comparing students who are very near a cutoff, the regression
discontinuity approach is more like a true experiment in which individuals are randomly assigned to the
treatment and control groups and are thus no different on average. This potential gain comes at a cost,
however. Since the regression discontinuity approach only compares students near the WTR threshold for
receipt of Target 2021 services, it effectively employs a smaller sample and yields estimates that are less
precise. Put differently, under the regression discontinuity approach it is more likely that one would fail
to reject the null hypothesis that the Target 2021 program had no effect.

4 The list of Target 2021 students in the analysis sample was compared to WTR erasure data used in Sass, Apperson
and Bueno (2015). All 2,819 Target 2021 participants had five or more WTR erasures on one or more of the 2009
CRCT exams according to the data used by Sass, Apperson and Bueno. However, approximately eight percent of the
comparison group (who did not receive Target 2021 services) had five or more WTR erasures on one or more of the
2009 CRCT exams. We therefore employ a “fuzzy” regression discontinuity design.
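Because roughly eight percent of comparison students also crossed the five-erasure threshold (footnote 4), eligibility does not perfectly determine treatment. A common way to estimate such a “fuzzy” design is to use the cutoff indicator as an instrument for actual participation, as in the two-stage sketch below. This is illustrative only; variable names are hypothetical, and a full two-stage least squares routine would also correct the second-stage standard errors.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("rd_sample.csv")  # hypothetical input file
df["above"] = (df["max_wtr"] >= 5).astype(int)
df["dist"] = df["max_wtr"] - 5

# First stage: crossing the cutoff predicts actual Target 2021 participation.
first = smf.ols("target2021 ~ above + dist", data=df).fit()
df["t_hat"] = first.fittedvalues

# Second stage: instrumented participation predicts the outcome. The
# coefficient on t_hat is the fuzzy-RD estimate of the program effect.
second = smf.ols("attendance ~ t_hat + dist", data=df).fit()
print(second.params["t_hat"])
```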

Results

Descriptive Statistics
Table 2 presents summary statistics on the characteristics of students in the Target 2021 and
comparison groups. The observable characteristics of the students in the two groups appear to be quite
similar. The only noticeable difference is a somewhat larger proportion of special education students in
the comparison group. The fact that the demographic characteristics of the two groups are similar is not
surprising, given that students in both groups were enrolled in schools investigated for test manipulation
in 2009.

Table 2. Student Characteristics (Proportions) for Target 2021 and Comparison Groups, 2017/18

Description | Target 2021 Students | Comparison Students
Female | 0.498 | 0.514
Black | 0.969 | 0.963
Hispanic | 0.027 | 0.028
Other Races | 0.003 | 0.007
Free/Reduced-Price Lunch | 0.991 | 0.978
Limited English Proficiency | 0.000 | 0.000
Special Education | 0.114 | 0.145

Attendance
Figure 1 illustrates the attendance rates by semester for students participating in Target 2021 and
the comparison group of students. Attendance drops off for both groups in spring 2018, the fifth semester
of Target 2021 implementation. The average reduction in the attendance rate is slightly
larger for the Target 2021 recipients (-1.03 percentage points), but the difference is not statistically
significant at a 95 percent confidence level.
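As a check on the figure, the raw difference-in-differences computation from the plotted means is (80.6 − 92.8) − (81.7 − 92.9) = −12.2 − (−11.2) = −1.0 percentage points; the small gap relative to the −1.03 estimate reflects rounding of the means shown in the figure.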

Figure 1: Mean Attendance Percentage in Fall 2015 and Spring 2018 by Semester and Target
2021 Status

[Bar chart: percentage of days attended, by group and semester. Not Target 2021: 92.9 (fall 2015) to 81.7 (spring 2018); Target 2021: 92.8 (fall 2015) to 80.6 (spring 2018).]

Figure 2 presents a visual representation of the regression discontinuity analysis of attendance


outcomes. The dark blue dots represent the average attendance percentage for students with a given
level of WTR erasures on their 2009 exam and the red bars represent a 95 percent confidence interval
around those sample means. The black lines are simply linear trends across WTR erasure levels above
and below the threshold for Target 2021 eligibility. From the figure, it is clear that students above and
below the five-erasure threshold (in either ELA, math or reading on the 2009 CRCT exam)
have similar attendance rates in spring 2018.

Figure 2: Regression Discontinuity Analysis of Attendance Rates in Spring 2018

[Scatter plot: mean attendance percentage in spring 2018 at each level of maximum WTR erasures (0-10), with 95 percent confidence intervals and separate linear fits below and above the five-erasure cutoff.]

Numerical results from the regression discontinuity model, as well as from the alternative
specifications of the difference-in-differences model (with and without controls), are presented in Table 3.
The estimated impacts range from -1.18 to 1.89, and in all but one case (difference-in-differences with
controls, at a 90% confidence level) we cannot reject the null hypothesis that the effect of Target 2021
participation on attendance is zero.

Table 3. Estimated Effects of Target 2021 Participation on Attendance Percentage

Model | Estimated Effect | Different from Zero at a 90% Confidence Level? | Different from Zero at a 95% Confidence Level?
Difference-in-Differences without Controls | -1.033 | No | No
Difference-in-Differences with Controls | -1.178 | Yes | No
Regression Discontinuity without Controls | 1.886 | No | No

Grade Point Average
The available data on course grades are limited to high school students. Only “core” classes, those
that are in the subject areas used by the University System of Georgia to calculate high school grade point
averages for college admission purposes, were included in the analysis. Grades are measured on a 100-
point scale. As illustrated in Figure 3, high-school grade point averages for both the Target 2021
participants and the comparison group increased slightly from fall 2015 to spring 2018. The difference in
GPA changes is not statistically significant.

Figure 3: Mean Core GPA in Fall 2015 and Spring 2018 by Semester and Target 2021 Status

[Bar chart: mean core GPA (100-point scale), by group and semester. Not Target 2021: 76.9 (fall 2015) to 77.7 (spring 2018); Target 2021: 76.2 (fall 2015) to 78.3 (spring 2018).]

The regression discontinuity analysis produces similar results. As depicted in Figure 4, the grade
point averages for students just above the WTR erasure threshold are similar to the grade point averages
for students just below the cutoff (who did not receive Target 2021 services). As shown in the summary
of results presented in Table 4, controlling for student characteristics does not significantly alter the
difference-in-differences results. In all cases we cannot reject the null hypothesis that Target 2021
participation in spring 2016, the 2016/17 school year and the 2017/18 school year had no effect on students’
grade point averages in core academic subjects.

Figure 4: Regression Discontinuity Analysis of HS Core GPA in Spring 2018

[Scatter plot: mean HS core GPA in spring 2018 at each level of maximum WTR erasures (0-10), with 95 percent confidence intervals and separate linear fits below and above the five-erasure cutoff.]

Table 4. Estimated Effects of Target 2021 Participation on HS Core GPA (100-point scale)
Model | Point Estimate | Different from Zero at a 90% Confidence Level? | Different from Zero at a 95% Confidence Level?
Difference-in-Differences without Controls | 1.296 | No | No
Difference-in-Differences with all Controls | 0.690 | No | No
Regression Discontinuity without Controls | 1.997 | No | No

Number of Courses Failed


For the sub-sample of high school students, a student was deemed to have failed a course based
on their 100-point-scale numeric semester grade; a numeric grade below 70 is considered failing. Core
courses include math, ELA, science, social studies and foreign languages.

Figure 5 depicts the mean number of core courses failed in fall 2015 and spring 2018 for Target
2021 recipients and for the comparison group. The number of course failures for both groups is
substantially lower after five semesters of Target 2021 implementation, and the reduction is slightly larger for
Target 2021 students. The difference in the change over time in the number of course failures, -0.02, is
not statistically different from zero, however.

Figure 5: Mean Number of “Core” Courses Failed in Fall 2015 and Spring 2018 by Semester
and Target 2021 Status

[Bar chart: mean number of core courses failed, by group and semester. Not Target 2021: 0.73 (fall 2015) to 0.46 (spring 2018); Target 2021: 0.73 (fall 2015) to 0.44 (spring 2018).]

Results from the regression discontinuity analysis of course failure are depicted in Figure 6. The
trend line for Target 2021 participants (those with five or more erasures) nearly meets the trend line for
the comparison group, indicating that receipt of Target 2021 services did not significantly affect the
number of courses failed for students near the cutoff for Target 2021 participation.

Figure 6: Regression Discontinuity Analysis of Number of “Core” Courses Failed in Spring
2018

[Scatter plot: mean number of core courses failed in spring 2018 at each level of maximum WTR erasures (0-10), with 95 percent confidence intervals and separate linear fits below and above the five-erasure cutoff.]

Numerical results from the difference-in-differences and regression discontinuity models of course failure
are summarized in Table 5. In no case do the estimates of the impact of Target 2021 participation differ
from zero at even a 90 percent confidence level, indicating that Target 2021 did not have a significant
impact on course failure in spring 2018.

Table 5. Estimated Effects of Target 2021 Participation on Number of “Core” Courses Failed

Model | Estimated Effect | Different from Zero at a 90% Confidence Level? | Different from Zero at a 95% Confidence Level?
Difference-in-Differences without Controls | -0.021 | No | No
Difference-in-Differences with all Controls | 0.048 | No | No
Regression Discontinuity without Controls | -0.178 | No | No

Reading Test Scores


Since one of the Target 2021 goals was to improve reading achievement, measures of student
performance in reading were sought out. During the era of test manipulation, the statewide achievement
exam (the Criterion-Referenced Competency Test or CRCT) covered five subject areas, including reading.
When the state moved to the Milestones assessment in 2014/15, reading was no longer tested as a subject
area separate from English Language Arts. Thus, we cannot directly track the impact of Target 2021 on
reading achievement using statewide end-of-grade exams. For a time APS also administered a
computer adaptive assessment (CAAS) in reading (other subject areas were tested as well), but the CAAS exam
is no longer administered. We can, however, utilize two indirect measures: performance on the 9th grade
literature end-of-course exam and scores on the reading component of the PSAT, which is administered
districtwide to APS students in 10th grade.

End-of-Course 9th Grade Literature Score


Given that most Target 2021 students are now in high school and end-of-grade assessments do
not go past 8th grade, the closest we can get to a state assessment of reading performance are scores on
the statewide end-of-course test (EOCT) for 9th grade Literature. Clearly, this is an imperfect measure at
best, since reading is not the focus of the literature course and test scores reflect competencies in areas
other than reading. Another limiting factor is that the test is only given once (almost exclusively in 9th
grade) and therefore covers a small segment of the overall analysis sample. In addition, there is no
directly comparable EOCT score prior to 9th grade. Therefore, it is necessary to use scores on the
end-of-grade (EOG) English Language Arts exam as a baseline for the difference-in-differences analysis of 9th grade
Literature exam scores. In order to avoid including grade repeaters, we limit the sample to students who took an
ELA end-of-grade exam the prior year in 8th grade.5

5 These sample restrictions produce an estimation sample of only 98 students. Even without the prior-year
end-of-grade-exam restriction, sample sizes for 9th grade Literature fall dramatically from 2017 to 2018. Analysis of GOSA
school-level data indicates that some APS high schools (particularly those in high-poverty areas) had significantly
fewer test takers and lower scores in 2018 (e.g. Carver dropped from 223 test takers in 2017 to 161 in
2018; Douglas dropped from 346 test takers in 2017 to 45 in 2018). As a result, the mean state
percentile scores for both the Target 2021 and comparison groups are much lower than in prior years.
The difference-in-differences analysis results are depicted in Figure 7. Scores on the 9th grade
Literature exam (expressed as percentiles of the statewide test score distribution) fell between spring
2015 and spring 2018. The drop was slightly larger for the Target 2021 participants (by 0.46), but the
difference between changes in test scores over time for Target 2021 participants and comparison students
is not significantly different from zero.

Figure 7: Mean 9th Grade Literature Percentile Score in 2018 and Mean EOG ELA Percentile
Score in 2015 by Target 2021 Status

[Bar chart: mean state percentile score, by group. Not Target 2021: 15.41 (2015 EOG ELA) to 12.40 (2018 9th Grade Literature EOCT); Target 2021: 16.44 (2015 EOG ELA) to 12.98 (2018 9th Grade Literature EOCT).]

Results from the regression discontinuity analysis, shown in Figure 8, do not reveal any significant
impact of Target 2021 participation on 9th grade Literature EOCT scores. The trend lines are close to one
another near the threshold, and the difference is much smaller than the confidence bands for the groups
just above and below the cutoff, indicating no statistically significant differences.


Figure 8: Regression Discontinuity Analysis of 9th Grade Literature EOCT Percentile

[Scatter plot: mean 9th Grade Literature percentile score in spring 2018 at each level of maximum WTR erasures (0-10), with 95 percent confidence intervals and separate linear fits below and above the five-erasure cutoff.]

Results from all of the estimated models of 9th grade Literature EOCT scores are summarized in
Table 6. Both difference-in-differences specifications show a larger drop in 9th grade Literature scores for
Target 2021 participants than for students not receiving services, but in neither case is the difference
statistically distinguishable from zero, even at a 90 percent confidence level. The regression
discontinuity specification likewise provides no evidence that participation in Target 2021 during the five
semesters of implementation had a significant impact on student performance on
the 9th grade Literature exam.

Table 6. Estimated Effects of Target 2021 Participation on 9th Grade Literature Percentile

Model | Estimated Effect | Different from Zero at a 90% Confidence Level? | Different from Zero at a 95% Confidence Level?
Difference-in-Differences without Controls | -0.464 | No | No
Difference-in-Differences with all Controls | -0.956 | No | No
Regression Discontinuity without Controls | 1.891 | No | No

10th Grade PSAT Reading Score
One other measure of reading performance is the score on the reading component of the PSAT
test that is administered in the fall to virtually all 10th graders in APS. The scores for the fall 2018 exam
are not yet available, so it was only possible to assess performance on the fall 2016 and fall 2017 exams.
The fall 2016 exam was administered about one and one-half semesters after the initiation of Target 2021
and the fall 2017 exam was administered three and one-half semesters after the start of Target 2021.
Given the PSAT is administered districtwide to 10th graders, the fall 2016 analysis includes students who
were in 9th grade at the start of Target 2021 in spring 2016 while the fall 2017 analysis primarily covers
students who were in 8th grade when the Target 2021 program began.

PSAT reading scores for Target 2021 participants and the comparison group of students on the
fall 2016 exam are depicted in Figure 9 and scores for the fall 2017 exam appear in Figure 10. For both
time periods, scores for Target 2021 participants are lower than for the comparison group. Given the
exam is only administered to all students in 10th grade, a difference-in-differences analysis is not possible.

Figure 9: Mean PSAT Reading Score for 10th Grade Students in Fall 2016 by Target 2021 Status

[Bar chart: mean PSAT reading score, 10th graders, fall 2016. Not Target 2021: 408.58; Target 2021: 391.71.]

Figure 10: Mean PSAT Reading Score for 10th Grade Students in Fall 2017 by Target 2021
Status

[Bar chart: mean PSAT reading score, 10th graders, fall 2017. Not Target 2021: 405.52; Target 2021: 384.32.]

Results from the regression discontinuity analysis, shown in Figures 11 and 12, do not reveal any
significant impact of Target 2021 participation on 10th grade PSAT scores. PSAT reading scores generally
decrease with the number of WTR erasures, both below and above the WTR threshold for Target 2021
eligibility. The trend lines are close to one another near the threshold, and the difference is much smaller
than the confidence bands for the groups just above and below the cutoff.

Figure 11: Regression Discontinuity Analysis of PSAT Reading Scores of 10th Grade Students in
Fall 2016

[Scatter plot: mean PSAT reading score for 10th graders in fall 2016 at each level of maximum WTR erasures (0-10), with 95 percent confidence intervals and separate linear fits below and above the five-erasure cutoff.]

Figure 12: Regression Discontinuity Analysis of PSAT Reading Scores of 10th Grade Students in
Fall 2017

[Scatter plot: mean PSAT reading score for 10th graders in fall 2017 at each level of maximum WTR erasures (0-10), with 95 percent confidence intervals and separate linear fits below and above the five-erasure cutoff.]

Results from all of the estimated models of 10th grade PSAT scores are summarized in Tables 7 and 8. For
each of the simple comparisons of PSAT scores, the scores for Target 2021 participants are below those
of students in the comparison group. These differences are significant at the 95 percent confidence level.
In contrast, the regression discontinuity analysis does not provide any evidence that participation in
Target 2021 had a significant impact on student performance on the 10th grade PSAT reading exam.

Table 7. Estimated Effects of Target 2021 Participation on PSAT Reading Scores for 10th Grade
Students, Fall 2016

Model | Estimated Effect | Different from Zero at a 90% Confidence Level? | Different from Zero at a 95% Confidence Level?
Simple Difference, without Controls | -16.873 | Yes | Yes
Simple Difference, with all Controls | -17.610 | Yes | Yes
Regression Discontinuity without Controls | 7.102 | No | No

Table 8. Estimated Effects of Target 2021 Participation on PSAT Reading Scores for 10th Grade
Students, Fall 2017

Model | Estimated Effect | Different from Zero at a 90% Confidence Level? | Different from Zero at a 95% Confidence Level?
Simple Difference, without Controls | -21.195 | Yes | Yes
Simple Difference, with all Controls | -18.326 | Yes | Yes
Regression Discontinuity without Controls | 7.776 | No | No

Graduation
The analysis of the impact of Target 2021 participation on the likelihood of graduation is
limited to students who would have been on track for graduation at the end of spring 2018, i.e. students
who were enrolled in 10th grade when the Target 2021 program began in the 2015/16 school year. As
noted in Table 9, this includes 945 students who were sophomores in 2015/16 (653 Target 2021
students and 292 students from the comparison group). As with the PSAT analysis, since there is
only a single outcome (graduation by the end of the 2017/18 school year), a difference-in-
differences analysis cannot be conducted; only the simple difference in graduation between Target
2021 students and the comparison group can be analyzed. Instead, past performance is taken into
account by including prior exam scores in the prediction of eventual graduation. Graduation was defined
as receipt of a regular high school diploma. Students receiving certificates of completion or special
education diplomas were treated as not graduating.
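As an illustration, the probit specification described above might be set up as in the sketch below. The report does not document its actual model, so everything other than the general structure (a binary probit of graduation on participation plus prior exam scores and demographics) is an assumption, including all variable names and the input file.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("graduation_sample.csv")  # hypothetical input file

# graduated: 1 if a regular HS diploma was received by the end of 2017/18
# target2021: 1 if the student participated in Target 2021
# prior_math, prior_ela: hypothetical prior exam scores
model = smf.probit(
    "graduated ~ target2021 + prior_math + prior_ela + female + black"
    " + hispanic + frl + swd",
    data=df,
).fit()

# Average marginal effect of participation, expressed in probability terms.
print(model.get_margeff().summary())
```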

As shown in Figure 13, the graduation rates of Target 2021 and comparison students were nearly
identical. It is important to note that the graduation rates are conditional on being enrolled in 10th grade
in 2015/16. They are not the four-year cohort graduation rates of high school freshmen that are typically
reported.

Table 9. Number of Students in Graduation Analysis Sample by Treatment Status

Total Number of Students | Target 2021 Participants | Comparison Students
945 | 653 | 292

Figure 13: Mean Percentage of Students Who Were in Grade 10 in 2015-16 and Received a
Regular High School Diploma by the end of the 2017-18 School Year by Target 2021 Status

[Bar chart: percentage of students receiving a regular HS diploma by the end of 2017/18. Not Target 2021: 72.60; Target 2021: 72.43.]

Table 10 summarizes the results from the graduation models.6 In the simple-difference
specifications the estimated impacts range from -0.17 to 0.80 percentage points. These small differences
are indistinguishable from zero. Likewise, the regression discontinuity analysis finds no significant
difference in graduation between the Target 2021 participants and the comparison group.

6 Although it is possible to generate numerical estimates from the regression discontinuity model, the statistical
program failed to generate the relevant graph.

Table 10. Estimated Effects of Target 2021 on Probability of Receipt of Standard HS Diploma (in
Percentage Points)

Model | Estimated Effect | Different from Zero at a 90% Confidence Level? | Different from Zero at a 95% Confidence Level?
Simple Difference (Binary Probit) without Controls | -0.168 | No | No
Simple Difference (Binary Probit) with Controls | 0.796 | No | No
Regression Discontinuity without Controls | -2.968 | No | No

College Enrollment
As an additional measure of long-run impacts, an analysis of college enrollment was performed.
This is an imperfect measure of student success, since it does not account for students who acquire
marketable skills in high school and successfully transition into the workforce immediately after
graduation. College enrollment is defined as being enrolled in a post-secondary institution in either the
fall or spring of the 12-month period following high school graduation. The sample therefore only includes
students who earned a high school diploma; unlike the graduation statistics, the college attendance
rates are contingent on continued enrollment in APS (and hence graduation from an APS high school), so
dropouts and students who transferred out of APS are excluded from both groups. Given that the most
recent data for post-secondary enrollment are for the 2017/18 academic year, the analysis sample is
comprised of students who were enrolled in grade 11 in the first year of Target 2021 implementation in
2015/16.

Both the simple-difference analysis, illustrated in Figure 14, and the regression discontinuity
analysis, depicted in Figure 15, suggest college enrollment rates were lower for Target 2021 participants.
However, the numerical results presented in Table 11 indicate these differences are not significant at a
95 percent confidence level.

Figure 14. Estimated Effects of Target 2021 on Probability of College Enrollment Within One
Year of Receipt of HS Graduation (in Percentage Points) – Students in Grade 11 in 2015/16

[Bar chart: percentage of graduates enrolled in college within one year. Not Target 2021: 60.48; Target 2021: 52.21.]

Figure 15. Regression Discontinuity Analysis of Estimated Effects of Target 2021 on College
Enrollment Within One Year of Receipt of HS Graduation (in Percentage Points) – Students in
Grade 11 in 2015/16

[Scatter plot: percent of students enrolled in a post-secondary institution within one year of graduation at each level of maximum WTR erasures (0-10), with 95 percent confidence intervals and separate linear fits below and above the five-erasure cutoff.]

Table 11. Estimated Effects of Target 2021 on Probability of College Enrollment Within One
Year of HS Graduation (in Percentage Points)

Model | Estimated Effect | Different from Zero at a 90% Confidence Level? | Different from Zero at a 95% Confidence Level?
Simple Difference (Binary Probit) without Controls | -8.268 | Yes | No
Simple Difference (Binary Probit) with all Controls | -4.776 | No | No
Regression Discontinuity without Controls | -20.493 | Yes | No

Summary and Recommendations


The Target 2021 program provided individualized support plans and services for students who may have
been affected by test manipulation on the 2009 CRCT exams. A variety of services were provided from
February 2016 through the end of the 2017/18 school year. This analysis compared outcomes for Target
2021 participants with a comparison group of students who had been in schools investigated for test
manipulation and in classrooms with unusually high levels of WTR erasures, but who individually had
relatively few WTR erasures on their own 2009 CRCT exams.

Two primary methods were employed: a difference-in-differences approach that compared
changes in performance over time for Target 2021 participants with changes over time for students in the
comparison group, and a regression discontinuity approach in which outcomes for students just above
the WTR threshold for participation were compared to outcomes for students just below the WTR cutoff
for receiving Target 2021 services. In most cases, there were no significant differences in outcomes
between the two groups, indicating that participation in the Target 2021 program had little measurable
impact on student outcomes.

When interpreting the findings, there are several important factors to consider. First, by now
nearly all the students who may have been affected by test score manipulation in 2009 are in high school,
where it is hard for even the best designed and implemented interventions to have substantial effects.
Second, some of the outcomes targeted by the initiative could not be measured well. In particular, while
raising reading achievement was a stated goal, reading-specific exams are no longer administered
statewide in Georgia.

Moving forward, due to the dwindling number of Target 2021 participants and their enrollment
in high school grades, the value of continued annual evaluations is low. It is recommended that further
analysis of program impacts be limited to a final evaluation at the end of the 2020/21 school year.

References
Office of the Governor (2011). “Special Investigators’ Report to the Governor,” unpublished report.
Sass, Tim R., Jarod Apperson and Carycruz Bueno (2015). “The Long-Run Effects of Teacher Cheating on
Student Outcomes,” unpublished manuscript.
