
Classification of Schools by Academic Achievement Measures


Kyle N. Payne, Group 3

Stat 448 Final Project

INTRODUCTION

In many applications, it makes logical and practical sense to dichotomize continuous variables. In educational policy, for example, academic performance can be described practically as either high or low academic achievement. While dichotomizing a continuous variable causes a considerable loss of information (Cohen, 1983), a dichotomy is also considerably easier to interpret, which could help lawmakers, policy specialists, and others develop suitable educational policy. From an applied perspective, it is also logical to investigate the extent to which demographic variables predict the classification of schools in terms of academic achievement; that is the subject of the following analysis.

The data set under study consists of math and reading scores from standardized tests administered annually to 3rd and 5th graders in the state of Illinois, as well as several demographic and economic variables. The standardized test in question, the Illinois Standard Achievement Test (ISAT), is intended to assess individual student achievement relative to the Illinois Learning Standards. The data set contains data for cohorts of students measured at both 3rd and 5th grade from 1999 to 2011. Measurements are at the school level, with averages taken across students. The entire data set consists of 69466 observations across 109 variables, of which 10 were created over the course of the analysis. These created variables consist of coding variables and averages of other variables across similar groups (such as 3rd and 5th grade). The cohort 1 data (training set) consists of 1783 observations across 109 variables, as does the cohort 2 data (test set). The data were compiled by faculty and staff at the University of Illinois Department of Labor and Employment Relations.
Note that some analyses are placed in the appendix for ease of reading.

METHODS

For my analysis, I chose a quadratic discriminant function analysis to model the membership of Illinois elementary schools in two classes: schools that obtain High Academic Achievement (HAA) and schools that obtain Low Academic Achievement (LAA). The criterion for each class is decided in advance: for cohort 1, a school is coded 1 (HAA) if the proportion of its students that exceeded expectations on the ISAT (averaged across math and reading and across grades) is at or above 15%, and 0 (LAA) if that proportion is below 15%. The scales for each grade and test subject were equal, which allowed for easy averaging across grades 3 and 5 for each school, as well as across the two test types. The test scores are standardized, meaning that all schools are assessed in the same manner and that the test scores are relative to an Illinois state standard.

The discriminant analysis was performed on the SAS 9.2 and SAS 9.3 platforms with the stepdisc and discrim procedures. I treated cohort 1 as the training set and used a stepwise model selection procedure to select an appropriate model from the space of possible models. The candidate predictors are general demographic variables of interest, including the average number of low-income students per school, the student-teacher ratio, etc. For fitting the discriminant function, the variable on which the classification depends is academ_achieve, the proportion of students that exceed expectations on the ISAT, averaged across math and reading and across grades 3 and 5. The coding variable AA is of the form

$$AA = \begin{cases} 0, & \text{academ\_achieve} < 0.15 \\ 1, & \text{academ\_achieve} \geq 0.15 \end{cases}$$

Here academ_achieve is a measure of the average school-wise score on the ISAT. While each class is not multivariate normally distributed, the quadratic discriminant function is relatively robust to non-normality. However, to assess the performance of the discriminant analysis relative to other methods, I also used a logistic regression to model the probability of schools being assigned to the two classes. This secondary analysis was done on the SAS 9.2 platform with the logistic procedure.

RESULTS

Section 1

The stepdisc procedure was initially run with the following predictors:

avg_stud_lowincome - The average number of low-income students per school
chronic_truant_rate - The average proportion of chronic truancy per school
avg_dist_tch_salary - The average teacher salary per district
avg_perc_dist_tch_badegree - The average percent of teachers with bachelor's degrees per district
avg_perc_dist_tch_madegree - The average percent of teachers with master's degrees per district
bamaxpay_sched - The bachelor's degree maximum pay schedule per school
mamaxpay_sched - The master's degree maximum pay schedule per school

The procedure was carried out with a 0.05 selection level and a 0.05 significance level. Table 1.1 below shows the first part of the analysis, in which the predictors are evaluated for entry into the model based upon their significance.

Statistics for Entry, DF = 1, 1708

Variable                      R-Square   F Value   Pr > F   Tolerance
avg_stud_lowincome              0.5345   1961.05   <.0001      1.0000
chronic_truant_rate             0.1279    250.52   <.0001      1.0000
avg_dist_tch_salary             0.0001      0.11   0.7412      1.0000
avg_perc_dist_tch_badegree      0.0137     23.70   <.0001      1.0000
avg_perc_dist_tch_madegree      0.0151     26.16   <.0001      1.0000
bamaxpay_sched                  0.1009    191.67   <.0001      1.0000
mamaxpay_sched                  0.0013      2.26   0.1326      1.0000

Table 1.1

In Table 1.1, avg_stud_lowincome, chronic_truant_rate, avg_perc_dist_tch_badegree, avg_perc_dist_tch_madegree, and bamaxpay_sched are statistically significant at the 0.05 level. The variable that accounts for the vast majority of the variance explained is avg_stud_lowincome. It is significant when entered into the model, and the multivariate statistics below (Table 1.2) indicate improvement over the null model.
Multivariate Statistics

Statistic                                  Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda                           0.465515   1961.05        1     1708   <.0001
Pillai's Trace                          0.534485   1961.05        1     1708   <.0001
Average Squared Canonical Correlation   0.534485

Table 1.2
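With only two classes, the entries of Table 1.2 are tied together by simple identities: Wilks' lambda is one minus the squared canonical correlation, and the exact F statistic follows from lambda and the degrees of freedom. The following is a minimal sketch in Python (not the SAS used for the analysis) that checks this against the reported values; the function name is illustrative only.

```python
def f_from_wilks(wilks_lambda, num_df, den_df):
    """Exact F statistic for a two-class problem, computed from Wilks' lambda."""
    return (1.0 - wilks_lambda) / wilks_lambda * den_df / num_df


# Values reported in Table 1.2 (step 1: avg_stud_lowincome only).
avg_sq_canonical_corr = 0.534485
wilks_lambda = 1.0 - avg_sq_canonical_corr
f_value = f_from_wilks(wilks_lambda, num_df=1, den_df=1708)

print(f"Wilks' lambda = {wilks_lambda:.6f}")
print(f"F = {f_value:.2f}")
```

The same function applied to the step 3 values in the appendix (lambda 0.456281 with 3 and 1706 degrees of freedom) should land close to the F value reported there.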

However, Table 1.3 shows that at the second step of the stepwise selection process, all other terms have dropped below any practical significance in terms of partial $R^2$:

Statistics for Entry, DF = 1, 1707

Variable                      Partial R-Square   F Value   Pr > F   Tolerance
chronic_truant_rate                     0.0020      3.41   0.0652      0.7956
avg_dist_tch_salary                     0.0142     24.60   <.0001      0.9853
avg_perc_dist_tch_badegree              0.0072     12.31   0.0005      0.9934
avg_perc_dist_tch_madegree              0.0069     11.87   0.0006      0.9918
bamaxpay_sched                          0.0035      5.96   0.0147      0.7670
mamaxpay_sched                          0.0009      1.60   0.2065      0.9939

Table 1.3

Therefore, while the stepwise process finishes after 4 steps with the significant predictors listed in Table 1.4 below, we can call into question the practical significance of the predictors other than avg_stud_lowincome, given their very small partial $R^2$ values.

Stepwise Selection Summary

Step   Number In   Entered                      Removed   Partial R-Square   F Value   Pr > F   Wilks' Lambda   Pr < Lambda
1      1           avg_stud_lowincome                               0.5345   1961.05   <.0001      0.46551497        <.0001
2      2           avg_dist_tch_salary                              0.0142     24.60   <.0001      0.45890121        <.0001
3      3           avg_perc_dist_tch_badegree                       0.0057      9.80   0.0018      0.45628123        <.0001
4      4           avg_perc_dist_tch_madegree                       0.0068     11.70   0.0006      0.45317070        <.0001

Table 1.4


Thus, I fit the discriminant function with only the avg_stud_lowincome variable as a predictor. The discrim procedure was utilized, with the classification performed on the coded variable AA = {0 for LAA, 1 for HAA}.
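As an illustration of what a one-predictor quadratic discriminant rule does, here is a minimal Python sketch; it is not the SAS discrim procedure itself. It codes AA from academ_achieve, estimates a separate normal density for each class, and assigns a school to the class with the largest prior-weighted density, which is equivalent to comparing quadratic discriminant scores. The school values are hypothetical, the function names are my own, and treating a proportion of exactly 0.15 as HAA is an assumption.

```python
from statistics import NormalDist

THRESHOLD = 0.15  # cutoff on academ_achieve; using ">=" at the boundary is an assumption


def code_aa(academ_achieve):
    """Return 1 (HAA) or 0 (LAA) for a school's proportion exceeding expectations."""
    return 1 if academ_achieve >= THRESHOLD else 0


def fit_univariate_qda(x_values, classes):
    """Per class: its sample proportion and a normal density with its own variance."""
    groups = {}
    for x, label in zip(x_values, classes):
        groups.setdefault(label, []).append(x)
    n = len(x_values)
    return {label: (len(xs) / n, NormalDist.from_samples(xs))
            for label, xs in groups.items()}


def classify(model, x, priors=None):
    """Assign x to the class with the largest prior-weighted density."""
    if priors is None:  # default: priors proportional to the class sizes
        priors = {label: prop for label, (prop, _) in model.items()}
    return max(model, key=lambda label: priors[label] * model[label][1].pdf(x))


# Hypothetical schools: (avg_stud_lowincome, academ_achieve). Not the ISAT data.
schools = [(20, 0.31), (45, 0.24), (60, 0.19), (35, 0.28), (70, 0.16), (10, 0.35),
           (180, 0.09), (220, 0.06), (250, 0.04), (205, 0.11), (270, 0.03), (150, 0.12)]
low_income = [x for x, _ in schools]
aa = [code_aa(score) for _, score in schools]

model = fit_univariate_qda(low_income, aa)
equal_priors = {0: 0.5, 1: 0.5}
print(classify(model, 40, equal_priors), classify(model, 230, equal_priors))
```

Because each class keeps its own variance, the decision boundary is quadratic in the predictor rather than linear, which is what distinguishes this rule from a linear discriminant function.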

Class Level Information

AA   Variable Name   Frequency     Weight   Proportion   Prior Probability
0    _0                    842   842.0000     0.472238            0.500000
1    _1                    941   941.0000     0.527762            0.500000

Table 1.5

The observed classes split the data nearly 50/50, with roughly 47% of the schools in the LAA class and 53% in the HAA class (Table 1.5). As seen in Table 1.7, the overall classification error rate is 0.1611 (16.11%), which consists of a misclassification rate of 0.2138 for the LAA class and 0.1084 for the HAA class.

Number of Observations and Percent Classified into AA

From AA        LAA       HAA     Total
LAA            662       180       842
             78.62     21.38    100.00
HAA            102       839       941
             10.84     89.16    100.00
Total          764      1019      1783
             42.85     57.15    100.00
Priors         0.5       0.5

Table 1.6

Error Count Estimates for AA

             LAA      HAA     Total
Rate      0.2138   0.1084    0.1611
Priors    0.5000   0.5000

Table 1.7
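The error count estimates follow directly from the classification counts: each class rate is the share of that class assigned to the other class, and the total is the prior-weighted average of the class rates. A short Python sketch of that arithmetic, using the counts reported in Tables 1.6 and 1.10 (the function name and data layout are illustrative only):

```python
def error_rates(counts, priors=None):
    """counts[observed][predicted] -> (per-class error rates, prior-weighted total)."""
    class_totals = {obs: sum(row.values()) for obs, row in counts.items()}
    n = sum(class_totals.values())
    if priors is None:  # proportional priors: the observed class shares
        priors = {obs: total / n for obs, total in class_totals.items()}
    rates = {obs: 1.0 - counts[obs][obs] / class_totals[obs] for obs in counts}
    total = sum(priors[obs] * rates[obs] for obs in counts)
    return rates, total


# Rows are the observed class, inner keys the class assigned by the model.
table_1_6 = {"LAA": {"LAA": 662, "HAA": 180}, "HAA": {"LAA": 102, "HAA": 839}}
table_1_10 = {"LAA": {"LAA": 652, "HAA": 190}, "HAA": {"LAA": 99, "HAA": 842}}

rates_equal, total_equal = error_rates(table_1_6, priors={"LAA": 0.5, "HAA": 0.5})
rates_prop, total_prop = error_rates(table_1_10)  # proportional priors

print({k: round(v, 4) for k, v in rates_equal.items()}, round(total_equal, 4))
print({k: round(v, 4) for k, v in rates_prop.items()}, round(total_prop, 4))
```

With proportional priors the total reduces to the plain share of misclassified schools, which is why it differs slightly from the equal-prior total even when the class rates are similar.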

Refitting the model with proportional priors, I again obtained evidence of non-homogeneous variance between the two groups (Table 1.8), and therefore the quadratic discriminant function analysis was used. The MANOVA results are similar to those of the analysis with non-proportional (equal) priors (Table 1.9).
Chi-Square    DF   Pr > ChiSq
177.132299     1       <.0001

Table 1.8

Multivariate Statistics and Exact F Statistics
S=1  M=-0.5  N=889.5

Statistic                     Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.47971133   1931.65        1     1781   <.0001
Pillai's Trace           0.52028867   1931.65        1     1781   <.0001
Hotelling-Lawley Trace   1.08458699   1931.65        1     1781   <.0001
Roy's Greatest Root      1.08458699   1931.65        1     1781   <.0001

Table 1.9

The use of proportional priors increased the misclassification rate for the LAA class (from 0.2138 to 0.2257) and decreased the misclassification rate for the HAA class (from 0.1084 to 0.1052); see Tables 1.10 and 1.11. However, these changes were very slight. Overall, the analysis with proportional priors resulted in a very slight increase in the misclassification rate, to 0.1621 (Table 1.11).

Number of Observations and Percent Classified into AA

From AA        LAA       HAA     Total
LAA            652       190       842
             77.43     22.57    100.00
HAA             99       842       941
             10.52     89.48    100.00
Total          751      1032      1783
             42.12     57.88    100.00
Priors     0.47224   0.52776

Table 1.10


Error Count Estimates for AA

             LAA      HAA     Total
Rate      0.2257   0.1052    0.1621
Priors    0.4722   0.5278

Table 1.11

The cross-validated error rate estimates (Table 1.12) are slightly higher than the resubstitution estimates, which are typically the less accurate of the two because they are computed on the same data used to fit the model.
Cross Validated Error Count Estimates for AA

             LAA      HAA     Total
Rate      0.2257   0.1063    0.1626
Priors    0.4722   0.5278

Table 1.12
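To make the distinction between the two kinds of estimate concrete, the sketch below contrasts resubstitution with leave-one-out cross-validation for a one-predictor quadratic discriminant rule. It is a Python illustration on hypothetical values, not a reproduction of the SAS computation; the data, the function names, and the use of priors re-estimated from each training fold are all assumptions.

```python
from statistics import NormalDist


def fit(xs, ys):
    """Per class: its proportion and a normal density with its own mean and variance."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: (len(g) / len(xs), NormalDist.from_samples(g)) for y, g in groups.items()}


def predict(model, x):
    """Class with the largest prior-weighted density at x."""
    return max(model, key=lambda y: model[y][0] * model[y][1].pdf(x))


def resubstitution_error(xs, ys):
    """Error rate when the rule is scored on the data used to fit it."""
    model = fit(xs, ys)
    return sum(predict(model, x) != y for x, y in zip(xs, ys)) / len(xs)


def loo_cv_error(xs, ys):
    """Error rate when each school is classified by a rule fit without it."""
    errors = 0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors += predict(model, xs[i]) != ys[i]
    return errors / len(xs)


# Hypothetical avg_stud_lowincome values with some overlap between the classes.
xs = [10, 20, 35, 45, 60, 70, 120, 100, 150, 180, 205, 220, 250, 270]
ys = [1] * 7 + [0] * 7  # 1 = HAA, 0 = LAA

print("resubstitution:", resubstitution_error(xs, ys))
print("leave-one-out: ", loo_cv_error(xs, ys))
```

The cross-validated estimate never scores a school with a rule that has already seen it, which is why it is usually preferred as an estimate of the error on future data.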

Because the purpose of the discriminant analysis is to use the training data to classify future data, I treated the cohort 1 data as a training set and the cohort 2 data as a test set. While neither data set is a completely random sample, we can view cohort 2 as a test set under the assumption that there is no systematic (non-random) difference between the cohorts in the number of low-income students or in ISAT test scores. Therefore, using the cohort 1 data as the training set with proportional priors, the result of the classification of cohort 2 is shown in Table 1.13 below. About 56.88% of cohort 2 is classified into the HAA class, close to the 57.88% of cohort 1 classified into HAA under the same priors (Table 1.10).
Number of Observations and Percent Classified into AA

               LAA       HAA     Total
Total          762      1005      1767
             43.12     56.88    100.00
Priors     0.47224   0.52776

Table 1.13

Due to the univariate nature of the discriminant analysis, we can also view the classification visually. Figure 1.1 describes the predicted probability of being classified into the HAA group as a function of the average number of low-income students per school. The blue represents the HAA class, and red represents the LAA class.
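A curve of this kind can be approximated from summary statistics alone. The Python sketch below computes the posterior probability of the HAA class from two normal densities, using the class means and standard deviations listed in Appendix A1 and the proportional priors from Table 1.10. It assumes the fitted rule is exactly these two normal densities weighted by those priors, so it is an approximation of the figure rather than a reproduction of the SAS output.

```python
from statistics import NormalDist

# Class moments of avg_stud_lowincome from Appendix A1 (mean, standard deviation).
LAA = NormalDist(205.1981, 84.2103863)
HAA = NormalDist(59.7202976, 53.6670837)
PRIOR_LAA, PRIOR_HAA = 0.47224, 0.52776  # proportional priors, Table 1.10


def posterior_haa(avg_stud_lowincome):
    """Posterior probability of HAA given the predictor, by Bayes' rule."""
    haa_part = PRIOR_HAA * HAA.pdf(avg_stud_lowincome)
    laa_part = PRIOR_LAA * LAA.pdf(avg_stud_lowincome)
    return haa_part / (haa_part + laa_part)


for x in (0, 50, 100, 150, 200, 250, 300):
    print(x, round(posterior_haa(x), 3))
```

A school is assigned to HAA wherever this posterior exceeds 0.5, so the point where the printed values cross 0.5 marks the classification boundary on the low-income scale.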

Figure 1.1

Reviewing the assumptions for quadratic discriminant analysis, it is clear that there are several violations in this particular analysis. The distributions of the average number of low-income students for the LAA and HAA classes are both highly non-normal (Figure 1.2), which is a consequence of splitting the data into the two classes. However, I proceeded despite this because not all violations of assumptions are equally detrimental: while some make an analysis completely invalid, others only affect the precision and accuracy of the analysis to a degree. The robustness of LDA and QDA to violations of normality has been investigated by Sever, Lajovic & Rajer (2005). Their results indicate that the largest effect of non-normality on the discriminant analysis is an increased bias of the error count estimates. Skewness in the distribution appears to have little to no effect on the discriminant analysis using LDA or QDA.
[Histograms of avg_stud_lowincome by class: panels AA=0 and AA=1; vertical axis: Percent]

Figure 1.2

Section 2

Because the classification scheme under study involves classifying data into two classes, I also fit a logistic regression of the log odds of a school's AA class on the average number of low-income students per school. Logistic regression is competitive with discriminant analysis for classification because of its relatively small set of assumptions; in particular, the non-normality of the classes is not a violation. The generalized logit link function was used, as suggested in Der & Everitt (2002), due to the ordinal nature of the scale of the response. The test of the global null hypothesis (Table 2.1) and the maximum likelihood estimates (Table 2.2) are all significant. The asymptotic Wald chi-square value should be precise due to the large sample size.
Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio    1134.8846    1       <.0001
Score                927.6747    1       <.0001
Wald                 509.1861    1       <.0001

Table 2.1

Analysis of Maximum Likelihood Estimates

Parameter            AA   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept             1    1     2.9648           0.1345          485.7581       <.0001
avg_stud_lowincome    1    1    -0.0234          0.00104          509.1861       <.0001

Table 2.2

The odds ratio estimate for the average number of low-income students on HAA is 0.977 (Table 2.3). That is, each one-unit increase in avg_stud_lowincome multiplies the estimated odds of a school being in the HAA class by 0.977, so schools with more low-income students are more likely to be in the LAA class.
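The odds ratio and its Wald confidence limits are the exponentiated slope and the exponentiated slope plus or minus about 1.96 standard errors. The short Python check below uses the slope and standard error reported in Table 2.2; the 1.96 multiplier is the usual normal approximation and the variable names are illustrative.

```python
import math

beta, std_err = -0.0234, 0.00104  # slope and standard error from Table 2.2
z = 1.96  # approximate 97.5th percentile of the standard normal distribution

odds_ratio = math.exp(beta)
ci_low = math.exp(beta - z * std_err)
ci_high = math.exp(beta + z * std_err)
odds_ratio_10 = math.exp(10 * beta)  # effect of ten additional low-income students

print(f"{odds_ratio:.3f} ({ci_low:.3f}, {ci_high:.3f})")
print(f"{odds_ratio_10:.2f}")
```

Because the per-unit odds ratio is close to 1, the effect is easier to read over a larger change: ten additional low-income students multiply the estimated odds of HAA by roughly 0.79.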
Odds Ratio Estimates

Effect               AA   Point Estimate   95% Wald Confidence Limits
avg_stud_lowincome    1            0.977        0.975           0.979

Table 2.3

Viewing the diagnostics (Figures 2.1 and 2.2), there are no obvious violations of the assumption of homogeneity of residual variance. However, we do see that the classes are completely separated in their residuals, which is likely due to the artificial nature of the classification scheme.

Figure 2.1


Figure 2.2

Due to the univariate nature of the analysis, we can also view the logistic regression in terms of the effect of the average number of low-income students on the probability of a school being classified as an HAA school. Figure 2.3 shows the predicted probability of a school being classified into the HAA class as a function of the average number of low-income students per school.

Figure 2.3
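The predicted probabilities behind a curve like Figure 2.3 can be sketched from the fitted coefficients. The Python snippet below assumes that the estimates in Table 2.2 describe the log odds of AA = 1 (HAA) against AA = 0 (LAA), which is how the table labels them; the function name is illustrative.

```python
import math

INTERCEPT, SLOPE = 2.9648, -0.0234  # estimates from Table 2.2


def prob_haa(avg_stud_lowincome):
    """Predicted probability of HAA from the fitted logit."""
    logit = INTERCEPT + SLOPE * avg_stud_lowincome
    return 1.0 / (1.0 + math.exp(-logit))


# Value of the predictor at which the predicted probability equals 0.5.
crossover = -INTERCEPT / SLOPE

for x in (0, 50, 100, 150, 200, 250, 300):
    print(x, round(prob_haa(x), 3))
print("crossover:", round(crossover, 1))
```

Under this reading of the coefficients, the predicted probability of HAA falls below one half once the average number of low-income students passes roughly 127.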

We can also view measures of the association between the predicted probabilities and the observed responses. The percent concordant is the percentage of pairs of schools with different observed classes in which the school observed in the HAA class has the higher predicted probability of HAA. The c statistic is an ROC-based measure that ranges from 0.5 to 1, where 0.5 reflects a model that predicts the response at random and 1 reflects a model that classifies the response perfectly (Table 2.4). It appears that the classification is relatively accurate.
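These association measures all come from comparing every HAA school with every LAA school; with 842 LAA and 941 HAA schools that gives 842 x 941 = 792322 pairs. The Python sketch below counts concordant, discordant, and tied pairs on a handful of hypothetical predicted probabilities and derives c and Somers' D from them. It compares raw probabilities directly, whereas statistical packages may group probabilities before counting ties, so small differences from packaged output are possible.

```python
def association(probs, labels):
    """Pairwise association between predicted probabilities and observed classes."""
    events = [p for p, y in zip(probs, labels) if y == 1]
    nonevents = [p for p, y in zip(probs, labels) if y == 0]
    concordant = discordant = tied = 0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1
            elif p_event < p_nonevent:
                discordant += 1
            else:
                tied += 1
    pairs = concordant + discordant + tied
    return {
        "pairs": pairs,
        "concordant": concordant,
        "discordant": discordant,
        "tied": tied,
        "c": (concordant + 0.5 * tied) / pairs,
        "somers_d": (concordant - discordant) / pairs,
    }


# Hypothetical predicted probabilities of HAA and observed classes (1 = HAA).
probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
result = association(probs, labels)
print(result)
```

In this toy example eight of the nine pairs are concordant, so c is 8/9; a value of 0.5 would mean the probabilities order the pairs no better than chance.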
Association of Predicted Probabilities and Observed Responses

Percent Concordant      90.8   Somers' D   0.818
Percent Discordant       9.1   Gamma       0.819
Percent Tied             0.1   Tau-a       0.408
Pairs                 792322   c           0.909

Table 2.4

Section 3

In comparing the two models, it appears that the discriminant analysis may give relatively biased predictions when compared to the logistic regression. This reflects the possible bias of the discriminant model due to the violations of normality. While the two models do deviate from each other in their predicted probabilities of being classified into the HAA class, they are roughly similar (Figure 3.1).

Figure 3.1


Conclusion

From the two analyses, we can paint a very convincing picture: a higher average number of low-income students per school is associated with a lower probability of that school being classified into the High Academic Achievement class. Both models predict that schools with a high number of low-income students have a high probability of being classified as LAA, and therefore that those schools have a lower proportion of students that exceed expectations on the ISAT. Not only did the average number of low-income students per school classify schools well, it did so better than any other demographic predictor. The model selection process described in Section 1 of the results is evidence for this point, as avg_stud_lowincome had a partial $R^2$ of 0.5345. This could provide a useful perspective for budgetary decisions, as the average number of low-income students explained much more variance than the average teacher salary per district (although this is a messy comparison, as there is variance in average teacher salary within a district). While this effect size may seem relatively small, it is actually quite high relative to the effect sizes commonly expected in social science. This also speaks to the general noisiness of the data.

Further analysis could look at the relative performance of the discriminant model across each of the cohorts, or use a more sophisticated multivariate regression model in which ISAT scores for math and reading are multiple responses. Other types of classification schemes could also be applied to the data, such as K-means clustering, non-parametric discriminant analyses, etc.

References

Cohen, J. (1983). Cost of dichotomization. Applied Psychological Measurement, 7(3), 249-250.

Der, G., & Everitt, B. S. (2002). A handbook of statistical analyses using SAS (2nd ed., p. 292). Boca Raton, FL: Chapman & Hall/CRC.

Sever, M., Lajovic, J., & Rajer, B. (2005). Robustness of the Fisher's discriminant . Metodološki zvezki, 2(2), 239-242.

Appendix:

A1. Some univariate results for avg_stud_lowincome:

LAA:

Moments

N                        842   Sum Weights               842
Mean                205.1981   Sum Observations     172776.8
Std Deviation     84.2103863   Variance           7091.38915
Skewness          -0.6552029   Kurtosis           -0.8303315
Uncorrected SS    41417329.4   Corrected SS       5963858.28
Coeff Variation   41.0385799   Std Error Mean     2.90208156

Basic Statistical Measures

Location               Variability
Mean     205.1981      Std Deviation          84.21039
Median   231.7000      Variance                   7091
Mode     279.2000      Range                 300.00000
                       Interquartile Range   140.90000

Goodness-of-Fit Tests for Normal Distribution

Test                 Statistic           p Value
Kolmogorov-Smirnov   D       0.1670009   Pr > D      <0.010
Cramer-von Mises     W-Sq    5.4448388   Pr > W-Sq   <0.005
Anderson-Darling     A-Sq   33.6363182   Pr > A-Sq   <0.005


[Normal probability plot, AA=0: avg_stud_lowincome (vertical axis) against Normal Percentiles (horizontal axis)]

HAA:

Moments

N                        941   Sum Weights               941
Mean              59.7202976   Sum Observations      56196.8
Std Deviation     53.6670837   Variance           2880.15587
Skewness          1.18972537   Kurtosis            1.4619666
Uncorrected SS    6063436.14   Corrected SS       2707346.52
Coeff Variation   89.8640595   Std Error Mean     1.74949693

Basic Statistical Measures

Location               Variability
Mean     59.72030      Std Deviation          53.66708
Median   46.40000      Variance                   2880
Mode      0.00000      Range                 282.30000
                       Interquartile Range    74.10000

Goodness-of-Fit Tests for Normal Distribution

Test                 Statistic           p Value
Kolmogorov-Smirnov   D       0.1328989   Pr > D      <0.010
Cramer-von Mises     W-Sq    3.7635999   Pr > W-Sq   <0.005
Anderson-Darling     A-Sq   24.7062620   Pr > A-Sq   <0.005

[Normal probability plot, AA=1: avg_stud_lowincome (vertical axis) against Normal Percentiles (horizontal axis)]

A3. Step three of the stepdisc procedure:

Statistics for Removal, DF = 1, 1707

Variable              Partial R-Square   F Value   Pr > F
avg_stud_lowincome              0.5411   2012.52   <.0001
avg_dist_tch_salary             0.0142     24.60   <.0001

No variables can be removed.

Statistics for Entry, DF = 1, 1706

Variable                      Partial R-Square   F Value   Pr > F   Tolerance
chronic_truant_rate                     0.0018      3.02   0.0826      0.7843
avg_perc_dist_tch_badegree              0.0057      9.80   0.0018      0.9771
avg_perc_dist_tch_madegree              0.0055      9.38   0.0022      0.9753
bamaxpay_sched                          0.0036      6.19   0.0129      0.7578
mamaxpay_sched                          0.0011      1.89   0.1690      0.9789

Variable avg_perc_dist_tch_badegree will be entered.

Variable(s) That Have Been Entered
avg_stud_lowincome   avg_dist_tch_salary   avg_perc_dist_tch_badegree

Multivariate Statistics

Statistic                                  Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda                           0.456281    677.64        3     1706   <.0001
Pillai's Trace                          0.543719    677.64        3     1706   <.0001
Average Squared Canonical Correlation   0.543719