
# Classification

## Classification of Schools By Academic Achievement Measures

Kyle N. Payne, Group 3

Predictors selected are general demographic variables of interest, including the average number of low-income students per school, the student-teacher ratio, etc. For fitting the discriminant function, the variable on which the classification depends is academ_achieve, the proportion of students who exceed expectations on the ISAT, averaged across math and reading and across grades 3 and 5. The coding variable AA is of the form

$$AA = \begin{cases} 0, & \text{academ\_achieve} < 0.15 \\ 1, & \text{academ\_achieve} \geq 0.15 \end{cases}$$

This is a measure of the average school-wise score on the ISAT. Although the classes are not multivariate normally distributed, the quadratic discriminant function is relatively robust to non-normality. However, to compare the performance of the discriminant analysis with other methods, I also used logistic regression to model the probability of a school being assigned to each of the two classes. This secondary analysis was done on the SAS 9.2 platform with the logistic procedure.

## Results

### Section 1

The stepdisc procedure was initially utilized for the following predictors:

- avg_stud_lowincome - The average number of low-income students per school
- chronic_truant_rate - The average proportion of chronic truancy per school
- avg_dist_tch_salary - The average teacher salary per district
- avg_perc_dist_tch_badegree - The average percent of teachers with bachelor's degrees per district
- avg_perc_dist_tch_madegree - The average percent of teachers with master's degrees per district
- bamaxpay_sched - The bachelor's degree maximum pay schedule per school
- mamaxpay_sched - The master's degree maximum pay schedule per school

The procedure was carried out with a 0.05 selection level and a 0.05 significance level. Table 1.1 below shows the first part of the analysis, in which the predictors are entered into the model based on their significance.


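Statistics for Entry, DF = 1, 1708

| Variable | R-Square | F Value | Pr > F |
|---|---|---|---|
| avg_stud_lowincome | 0.5345 | 1961.05 | <.0001 |
| chronic_truant_rate | 0.1279 | 250.52 | <.0001 |
| avg_dist_tch_salary | 0.0001 | 0.11 | 0.7412 |
| avg_perc_dist_tch_badegree | 0.0137 | 23.70 | <.0001 |
| avg_perc_dist_tch_madegree | 0.0151 | 26.16 | <.0001 |
| bamaxpay_sched | 0.1009 | 191.67 | <.0001 |
| mamaxpay_sched | 0.0013 | 2.26 | 0.1326 |

Table 1.1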

avg_stud_lowincome, chronic_truant_rate, avg_perc_dist_tch_badegree, and avg_perc_dist_tch_madegree are statistically significant at the 0.05 level. The variable that accounts for the vast majority of the variance explained in the model is avg_stud_lowincome. It is significant when entered into the model, and the multivariate statistics below also indicate improvement over the null model.

Multivariate Statistics

| Statistic | Value | F Value | Num DF | Den DF | Pr > F |
|---|---|---|---|---|---|
| Wilks' Lambda | | | 1 | 1708 | <.0001 |
| Pillai's Trace | | | 1 | 1708 | <.0001 |
| Average Squared Canonical Correlation | | | | | |

Table 1.2

However, Table 1.3 shows that at the second step of the stepwise selection process, all other terms have dropped below any practical significance in partial R-square:


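Statistics for Entry, DF = 1, 1707

| Variable | Partial R-Square | F Value | Pr > F | Tolerance |
|---|---|---|---|---|
| chronic_truant_rate | 0.0020 | 3.41 | 0.0652 | 0.7956 |
| avg_dist_tch_salary | 0.0142 | 24.60 | <.0001 | 0.9853 |
| avg_perc_dist_tch_badegree | 0.0072 | 12.31 | 0.0005 | 0.9934 |
| avg_perc_dist_tch_madegree | 0.0069 | 11.87 | 0.0006 | 0.9918 |
| bamaxpay_sched | 0.0035 | 5.96 | 0.0147 | 0.7670 |
| mamaxpay_sched | 0.0009 | 1.60 | 0.2065 | 0.9939 |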

Table 1.3

Therefore, while the stepwise process finishes after 4 steps with the significant predictors shown below in Table 1.4, we can call into question the practical significance of the other predictors, given their very small partial R-square values.

Stepwise Selection Summary

| Step | Number In | Entered | Removed | Partial R-Square | F Value | Pr > F | Wilks' Lambda | Pr < Lambda |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | avg_stud_lowincome | | 0.5345 | 1961.05 | <.0001 | 0.46551497 | <.0001 |
| 2 | 2 | avg_dist_tch_salary | | 0.0142 | 24.60 | <.0001 | 0.45890121 | <.0001 |
| 3 | 3 | avg_perc_dist_tch_badegree | | 0.0057 | 9.80 | 0.0018 | 0.45628123 | <.0001 |
| 4 | 4 | | | 0.0068 | 11.70 | 0.0006 | 0.45317070 | <.0001 |

Table 1.4
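The small partial R-square values in Table 1.4 can be tied directly to the Wilks' Lambda column. The sketch below is an illustrative Python check, not the original SAS code. It assumes the standard stepwise relationship in which Wilks' lambda after a variable enters equals the previous lambda multiplied by one minus that variable's partial R-square. The function names are hypothetical, and the inputs are the rounded values printed in Table 1.4, so agreement is only expected to about four decimal places.

```python
# Illustrative check of Table 1.4; this is not the original SAS code.
# Assumption: Wilks' lambda after a variable enters equals the previous
# lambda multiplied by (1 - partial R-square) of the entering variable.

def next_wilks_lambda(previous_lambda: float, partial_r_square: float) -> float:
    """Return Wilks' lambda after one more variable enters the model."""
    return previous_lambda * (1.0 - partial_r_square)


def chained_lambdas(partial_r_squares):
    """Chain the update from the null model, where lambda equals 1."""
    current = 1.0
    chained = []
    for partial in partial_r_squares:
        current = next_wilks_lambda(current, partial)
        chained.append(current)
    return chained


# Rounded values transcribed from Table 1.4 (steps 1 to 4).
PARTIAL_R_SQUARES = [0.5345, 0.0142, 0.0057, 0.0068]
REPORTED_LAMBDAS = [0.46551497, 0.45890121, 0.45628123, 0.45317070]

for step, (computed, reported) in enumerate(
    zip(chained_lambdas(PARTIAL_R_SQUARES), REPORTED_LAMBDAS), start=1
):
    print(f"step {step}: computed {computed:.4f}, reported {reported:.4f}")
```

Under this assumption, steps 2 through 4 together lower Wilks' lambda only from about 0.466 to about 0.453, which is consistent with the argument that the later predictors add little beyond avg_stud_lowincome.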

Thus, I fit the discriminant function with only the avg_stud_lowincome variable as a predictor. The discrim procedure was utilized, with the classification performed on the coded variable AA = {0 for LAA, 1 for HAA}.


Class Level Information

| AA | Variable Name | Frequency | Weight | Proportion | Prior Probability |
|---|---|---|---|---|---|
| 0 | _0 | 842 | 842.0000 | 0.472238 | 0.500000 |
| 1 | _1 | 941 | 941.0000 | 0.527762 | 0.500000 |

Table 1.5

The data are split nearly 50/50 between the two classes, with roughly 47% of the schools in the LAA category and 53% in the HAA category. As seen in Table 1.7, the overall classification error rate is 16.11%, which consists of a 0.2138 misclassification rate for the LAA class and a 0.1084 misclassification rate for the HAA class.

Number of Observations and Percent Classified into AA

| From AA | LAA | HAA | Total |
|---|---|---|---|
| LAA | 662 (78.62) | 180 (21.38) | 842 (100.00) |
| HAA | 102 (10.84) | 839 (89.16) | 941 (100.00) |
| Total | 764 (42.85) | 1019 (57.15) | 1783 (100.00) |
| Priors | 0.5 | 0.5 | |

Table 1.6
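The error rates quoted above can be recomputed from the counts in Table 1.6. The following is a minimal Python sketch rather than the SAS discrim output itself. It assumes the overall rate is the prior-weighted sum of the per-class misclassification rates, and the function and variable names are hypothetical.

```python
# Counts transcribed from Table 1.6: outer key is the true class,
# inner key is the class the school was assigned to.
CONFUSION = {
    "LAA": {"LAA": 662, "HAA": 180},
    "HAA": {"LAA": 102, "HAA": 839},
}
EQUAL_PRIORS = {"LAA": 0.5, "HAA": 0.5}


def class_error_rates(confusion):
    """Share of each true class that was assigned to the other class."""
    rates = {}
    for true_class, row in confusion.items():
        total = sum(row.values())
        misclassified = total - row[true_class]
        rates[true_class] = misclassified / total
    return rates


def overall_error(rates, priors):
    """Assumed form of the total: prior-weighted sum of class error rates."""
    return sum(priors[name] * rates[name] for name in rates)


rates = class_error_rates(CONFUSION)
total_error = overall_error(rates, EQUAL_PRIORS)
print({name: round(rate, 4) for name, rate in rates.items()}, round(total_error, 4))
```

With equal priors the total is simply the average of the two class rates, so the larger HAA class does not pull the overall figure toward its own lower error rate.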

Refitting the model with proportional priors, I obtained the same result of non-homogeneous variance between the two groups, and therefore the quadratic discriminant function analysis was used, as seen in Table 1.8. The MANOVA results are similar to those of the non-proportional prior analysis (Table 1.9).

| Chi-Square | DF | Pr > ChiSq |
|---|---|---|
| 177.132299 | 1 | <.0001 |

Table 1.8

Multivariate Statistics and Exact F Statistics

S=1 M=-0.5 N=889.5

| Statistic | Value | F Value | Num DF | Den DF | Pr > F |
|---|---|---|---|---|---|
| Wilks' Lambda | 0.47971133 | 1931.65 | 1 | 1781 | <.0001 |
| Pillai's Trace | 0.52028867 | 1931.65 | 1 | 1781 | <.0001 |
| Hotelling-Lawley Trace | 1.08458699 | 1931.65 | 1 | 1781 | <.0001 |
| Roy's Greatest Root | 1.08458699 | 1931.65 | 1 | 1781 | <.0001 |

Table 1.9
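The four statistics in Table 1.9 carry the same information here, because there is one predictor and two groups (S=1). As a hedged illustration in Python, not part of the original SAS analysis, the sketch below recovers the other three values and the shared F statistic from Wilks' lambda alone. It assumes the single-root identities named in the comments, and the function name is hypothetical.

```python
# Values transcribed from Table 1.9.
WILKS_LAMBDA = 0.47971133
NUM_DF = 1
DEN_DF = 1781


def single_root_statistics(wilks_lambda, num_df, den_df):
    """Derive the other MANOVA statistics when there is a single root (S=1).

    Assumed identities for one root:
      Pillai's trace         = 1 - lambda
      Hotelling-Lawley trace = (1 - lambda) / lambda  (also Roy's greatest root)
      exact F                = Hotelling-Lawley trace * den_df / num_df
    """
    pillai = 1.0 - wilks_lambda
    hotelling_lawley = pillai / wilks_lambda
    f_value = hotelling_lawley * den_df / num_df
    return pillai, hotelling_lawley, f_value


pillai, hotelling_lawley, f_value = single_root_statistics(WILKS_LAMBDA, NUM_DF, DEN_DF)
print(round(pillai, 8), round(hotelling_lawley, 6), round(f_value, 2))
```

Since every row reduces to the same F value of about 1931.65 on 1 and 1781 degrees of freedom, the table amounts to a single test of whether mean avg_stud_lowincome differs between the LAA and HAA classes.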

The use of proportional priors increased the misclassification rate for the LAA class and decreased the misclassification rate for the HAA class. However, these changes were very slight. The analysis with proportional priors resulted in a very slight increase in the overall misclassification rate, to 0.1621 (Table 1.11).


Number of Observations and Percent Classified into AA

| From AA | LAA | HAA | Total |
|---|---|---|---|
| LAA | | | |
| HAA | | | |
| Total | | | |
| Priors | 0.47224 | 0.52776 | |

Table 1.10

Error Count Estimates for AA

| | LAA | HAA | Total |
|---|---|---|---|
| Rate | 0.2257 | 0.1052 | 0.1621 |
| Priors | 0.4722 | 0.5278 | |

Table 1.11
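As a worked check, the total in Table 1.11 appears to be the prior-weighted average of the two class error rates. Writing $\pi_k$ for the prior and $e_k$ for the error rate of class $k$ (symbols introduced here for illustration only), the rounded table entries give

$$e_{\text{Total}} = \pi_{LAA}\, e_{LAA} + \pi_{HAA}\, e_{HAA} = (0.4722)(0.2257) + (0.5278)(0.1052) \approx 0.1066 + 0.0555 = 0.1621 .$$

Because the proportional priors put slightly more weight on the HAA class, the total sits closer to the HAA rate than a simple average of the two rates would.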

The cross-validated error rate estimates (Table 1.12) are slightly higher than the resubstitution estimates, which are typically the less accurate of the two.

Cross Validated Error Count Estimates for AA

| | LAA | HAA | Total |
|---|---|---|---|
| Rate | 0.2257 | 0.1063 | 0.1626 |
| Priors | 0.4722 | 0.5278 | |

Table 1.12

Because the purpose of the discriminant analysis is to use training data to classify future data, I treated the cohort 1 data as a training set and the cohort 2 data as a test set. While neither data set is completely randomly sampled, we can view cohort 2 as a test set for classification under the assumption that there is no systematic (non-stochastic) difference between the cohorts in the number of low-income students or in ISAT test scores. Therefore, using the cohort 1 data as the training set with proportional priors, the result of the classification of cohort 2 is shown in Table 1.13 below. A larger proportion of cohort 2 is classified into the HAA class compared with cohort 1.

Number of Observations and Percent Classified into AA

| | LAA | HAA | Total |
|---|---|---|---|
| Total | | | |
| Priors | 0.47224 | 0.52776 | |

Due to the univariate nature of the discriminant analysis, we can also view the classification visually. Figure 1.1 describes the predicted probability of being classified into the HAA group as a function of the average number of low-income students per school. The blue represents the HAA class, and red represents the LAA class.

Figure 1.1

Reviewing the assumptions for quadratic discriminant analysis, it is clear that there are several violations in this particular analysis. The distributions of the average number of low-income students for the LAA and HAA classes are both highly non-normal (Figure 1.2), which is a consequence of splitting the data into the two classes. However, I proceeded in the face of this because not all violations of assumptions are equally detrimental: some make an analysis completely invalid, while others only affect its precision and accuracy to a degree. The robustness of LDA and QDA to violations of normality has been investigated by Sever, Lajovic & Rajer (2005). Their results indicate that the largest effect of non-normality on the discriminant analysis is the increased bias of error count estimates. Skewness in the distribution appears to have little to no effect on discriminant analysis using LDA or QDA.
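To make concrete what the normality assumption does in a one-predictor quadratic discriminant rule, the sketch below computes the posterior probability of the HAA class from two normal densities with class-specific variances. This is a simplified Python illustration, not the SAS discrim procedure used in the analysis. It assumes normal class densities, takes the class means and variances from the univariate results in Appendix A1 (computed on avg_stud_lowincome), and uses the class proportions from Table 1.5 as proportional priors. All function names are hypothetical.

```python
import math

# Class means and variances from Appendix A1; priors are the class
# proportions from Table 1.5 (an assumption standing in for proportional priors).
CLASS_STATS = {
    "LAA": {"mean": 205.1981, "variance": 7091.38915, "prior": 0.472238},
    "HAA": {"mean": 59.7202976, "variance": 2880.15587, "prior": 0.527762},
}


def normal_density(x, mean, variance):
    """Normal density with the given mean and variance, evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def posterior_haa(avg_stud_lowincome):
    """Posterior probability of HAA under unequal class variances (QDA)."""
    weighted = {
        name: stats["prior"]
        * normal_density(avg_stud_lowincome, stats["mean"], stats["variance"])
        for name, stats in CLASS_STATS.items()
    }
    return weighted["HAA"] / (weighted["LAA"] + weighted["HAA"])


def classify(avg_stud_lowincome):
    """Assign the class with the larger posterior probability."""
    return "HAA" if posterior_haa(avg_stud_lowincome) >= 0.5 else "LAA"


for value in (0, 60, 120, 205, 300):
    print(value, round(posterior_haa(value), 3), classify(value))
```

The densities enter only through their ratio, so departures from normality matter to the extent that they distort that ratio near the point where the two classes are about equally likely.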

[Histogram panels: AA=0 and AA=1; vertical axis: Percent; horizontal axis: avg_stud_lowincome]

Figure 1.2

### Section 2

Because the classification scheme under study involves classifying data into dichotomous classes, I also used logistic regression, regressing the log odds of a school being classified into either of the AA classes on the average number of low-income students per school. Logistic regression is competitive with discriminant analysis for classification because of its relatively small set of assumptions, and thus the non-normality of the classes is not a violation. The generalized logit link function was utilized, as suggested in Der & Everitt (2002), due to the ordinal nature of the scale of the response. The test of the global null hypothesis (Table 2.1) and the MLE estimates (Table 2.2) are all significant. The asymptotic Wald chi-square value should be precise due to the large sample size.

Testing Global Null Hypothesis: BETA=0

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 1134.8846 | 1 | |
| Score | 927.6747 | 1 | |
| Wald | | | |


Analysis of Maximum Likelihood Estimates

| Parameter | AA | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| Intercept | 1 | 1 | 2.9648 | 0.1345 | 485.7581 | <.0001 |
| avg_stud_lowincome | 1 | 1 | -0.0234 | 0.00104 | 509.1861 | <.0001 |

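Table 2.2

The odds ratio estimate for the average number of low-income students on HAA is equal to 0.977 (Table 2.3). This implies that each one-unit increase in the average number of low-income students multiplies the odds of a school being in the HAA class by about 0.977, so schools with more low-income students are more likely to fall in the LAA class.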
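The odds ratio and its Wald limits follow from the slope and standard error in Table 2.2. The sketch below is an illustrative Python calculation, not the output of the logistic procedure. It assumes the usual construction in which the estimate and the endpoints of a normal-theory interval are exponentiated, with 1.96 as the 95% normal quantile. The function name is hypothetical.

```python
import math

# Slope and standard error for avg_stud_lowincome, from Table 2.2.
SLOPE = -0.0234
SLOPE_SE = 0.00104
Z_95 = 1.96  # assumed normal quantile for a 95% Wald interval


def odds_ratio_with_wald_limits(estimate, standard_error, z=Z_95):
    """Exponentiate the estimate and the endpoints of its Wald interval."""
    point = math.exp(estimate)
    lower = math.exp(estimate - z * standard_error)
    upper = math.exp(estimate + z * standard_error)
    return point, lower, upper


point, lower, upper = odds_ratio_with_wald_limits(SLOPE, SLOPE_SE)
print(round(point, 3), round(lower, 3), round(upper, 3))
```

An odds ratio this close to 1 describes a single-unit change. Over a 10-unit increase in avg_stud_lowincome, the same slope implies the odds are multiplied by exp(10 × -0.0234), or about 0.79.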

Odds Ratio Estimates

| Effect | AA | Point Estimate | 95% Wald Confidence Limits (Lower) | 95% Wald Confidence Limits (Upper) |
|---|---|---|---|---|
| avg_stud_lowincome | 1 | 0.977 | 0.975 | 0.979 |

Table 2.3

Viewing the diagnostics (Figures 2.1 and 2.2), there are no obvious violations of the assumption of homogeneity of residual variance. However, the classes are completely separated in their residuals, which is likely due to the artificial nature of the classification scheme.

Figure 2.1


Figure 2.2

Due to the univariate nature of the analysis, we can also view the logistic regression in terms of the effect of the average number of low-income students on the probability of a school being classified as an HAA school. Figure 2.3 shows the predicted probability of a school being classified into the HAA class as a function of the average number of low-income students per school.

Figure 2.3
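A curve like the one in Figure 2.3 can be traced from the two coefficients in Table 2.2. The sketch below is a minimal Python illustration, not the SAS code behind the figure. It assumes the fitted model gives the log odds of HAA (AA=1) relative to LAA as intercept plus slope times avg_stud_lowincome, and the function names are hypothetical.

```python
import math

# Coefficients from Table 2.2, assumed to model the log odds of AA=1 (HAA).
INTERCEPT = 2.9648
SLOPE = -0.0234


def probability_haa(avg_stud_lowincome):
    """Predicted probability of HAA from the inverse logit of the linear predictor."""
    linear_predictor = INTERCEPT + SLOPE * avg_stud_lowincome
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def crossover_point():
    """Value of avg_stud_lowincome at which the predicted probability is 0.5."""
    return -INTERCEPT / SLOPE


for value in (0, 50, 100, 150, 200, 250, 300):
    print(value, round(probability_haa(value), 3))
print("crossover:", round(crossover_point(), 1))
```

Under these assumptions the predicted probability falls below one half at an average of roughly 127 low-income students per school, which is the point where the fitted model switches its prediction from HAA to LAA.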

We can also view measures of the association between the predicted probabilities and the observed responses (Table 2.4). The percent concordant is the percentage of pairs of schools from different observed classes in which the school observed in the HAA class also has the higher predicted probability of HAA. The c-c measure is an adjustment on the ROC c measure. It ranges from 0.5 to 1, where 0.5 reflects a model randomly predicting the response and 1 reflects a model perfectly classifying the response. The classification appears to be relatively accurate.

Association of Predicted Probabilities and Observed Responses

| Measure | Value | Statistic | Value |
|---|---|---|---|
| Percent Concordant | 90.8 | Somers' D | |
| Percent Discordant | 9.1 | Gamma | |
| Percent Tied | 0.1 | Tau-a | |
| Pairs | 792322 | c-c | |

Table 2.4

### Section 3

In comparing the two models, it appears that the discriminant analysis may give relatively biased predictions when compared to the logistic regression. This reflects the possible bias of the model due to the violations of normality. While the two models do deviate from each other in their predictions of the probability of being classified into the HAA class, they are roughly similar (Figure 3.1).

Figure 3.1


## Conclusion

From the two analyses, we can paint a very convincing picture: the average number of low-income students per school is associated with a decrease in the probability of that school being classified into the High Academic Achievement class. Both models predict that schools with a high number of low-income students have a high probability of being classified as LAA, and therefore that those schools have fewer students who exceed expectations on the ISAT.

Not only did the average number of low-income students per school classify schools well, it did so better than any other demographic predictor. The model selection process described in Section 1 of the results is evidence for this point, as avg_stud_lowincome had a partial R-square of 0.5345. This could provide a useful perspective on budgetary decisions, as the average number of low-income students explained much more variance than the average teacher salary per district (although this is a messy comparison, as there is variance in average teacher salary within a district). While this effect size may seem relatively small, it is actually quite high relative to the effect sizes commonly expected in social science. This also speaks to the general noisiness of the data.

Further analysis could look at the relative performance of the discriminant model across each of the cohorts, or use a more sophisticated multivariate regression model in which ISAT scores for math and reading are multiple responses. Other types of classification schemes could also be applied to the data, such as K-means clustering, non-parametric discriminant analyses, etc.

## References

Cohen, J. (1983). Cost of dichotomization. Applied Psychological Measurement, 7(3), 249-250.

Der, G., & Everitt, B. S. (2002). A handbook of statistical analyses using SAS (2nd ed., p. 292). Boca Raton, FL: Chapman & Hall/CRC.

Sever, M., Lajovic, J., & Rajer, B. (2005). Robustness of the Fisher's discriminant . Metodološki zvezki, 2(2), 239-242.

## Appendix

A1. Some univariate results for avg_stud_lowincome

LAA:

Moments

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| N | 842 | Sum Weights | 842 |
| Mean | 205.1981 | Sum Observations | 172776.8 |
| Std Deviation | 84.2103863 | Variance | 7091.38915 |
| Skewness | -0.6552029 | Kurtosis | -0.8303315 |
| Uncorrected SS | 41417329.4 | Corrected SS | 5963858.28 |
| Coeff Variation | 41.0385799 | Std Error Mean | 2.90208156 |

Basic Statistical Measures

| Location | Value | Variability | Value |
|---|---|---|---|
| Mean | 205.1981 | Std Deviation | 84.21039 |
| Median | 231.7000 | Variance | 7091 |
| Mode | 279.2000 | Range | 300.00000 |
| | | Interquartile Range | 140.90000 |

Goodness-of-Fit Tests for Normal Distribution

| Test | Statistic | Value | p Value | |
|---|---|---|---|---|
| Kolmogorov-Smirnov | D | 0.1670009 | Pr > D | <0.010 |
| Cramer-von Mises | W-Sq | 5.4448388 | Pr > W-Sq | <0.005 |
| Anderson-Darling | A-Sq | 33.6363182 | Pr > A-Sq | <0.005 |

[Normal probability plot, AA=0; vertical axis: avg_stud_lowincome; horizontal axis: Normal Percentiles]

HAA:

Moments

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| N | 941 | Sum Weights | 941 |
| Mean | 59.7202976 | Sum Observations | 56196.8 |
| Std Deviation | 53.6670837 | Variance | 2880.15587 |
| Skewness | 1.18972537 | Kurtosis | 1.4619666 |
| Uncorrected SS | 6063436.14 | Corrected SS | 2707346.52 |
| Coeff Variation | 89.8640595 | Std Error Mean | 1.74949693 |

Basic Statistical Measures

| Location | Value | Variability | Value |
|---|---|---|---|
| Mean | 59.72030 | Std Deviation | 53.66708 |
| Median | 46.40000 | Variance | 2880 |
| Mode | 0.00000 | Range | 282.30000 |
| | | Interquartile Range | 74.10000 |

Goodness-of-Fit Tests for Normal Distribution

| Test | Statistic | Value | p Value | |
|---|---|---|---|---|
| Kolmogorov-Smirnov | D | 0.1328989 | Pr > D | <0.010 |
| Cramer-von Mises | W-Sq | 3.7635999 | Pr > W-Sq | <0.005 |
| Anderson-Darling | A-Sq | 24.7062620 | Pr > A-Sq | <0.005 |

[Normal probability plot, AA=1; vertical axis: avg_stud_lowincome; horizontal axis: Normal Percentiles]


Statistics for Removal, DF = 1, 1707

| Variable | Partial R-Square | F Value | Pr > F |
|---|---|---|---|
| avg_stud_lowincome | 0.5411 | 2012.52 | <.0001 |
| avg_dist_tch_salary | 0.0142 | 24.60 | <.0001 |

No variables can be removed.

Statistics for Entry, DF = 1, 1706

| Variable | Partial R-Square | F Value | Pr > F | Tolerance |
|---|---|---|---|---|
| chronic_truant_rate | 0.0018 | 3.02 | 0.0826 | 0.7843 |
| avg_perc_dist_tch_badegree | 0.0057 | 9.80 | 0.0018 | 0.9771 |
| avg_perc_dist_tch_madegree | 0.0055 | 9.38 | 0.0022 | 0.9753 |
| bamaxpay_sched | 0.0036 | 6.19 | 0.0129 | 0.7578 |
| mamaxpay_sched | 0.0011 | 1.89 | 0.1690 | 0.9789 |

Variable avg_perc_dist_tch_badegree will be entered.

Variable(s) That Have Been Entered: avg_stud_lowincome, avg_dist_tch_salary, avg_perc_dist_tch_badegree

Multivariate Statistics

| Statistic | Value | F Value | Num DF | Den DF | Pr > F |
|---|---|---|---|---|---|
| Wilks' Lambda | 0.456281 | 677.64 | 3 | 1706 | <.0001 |
| Pillai's Trace | 0.543719 | 677.64 | 3 | 1706 | <.0001 |
| Average Squared Canonical Correlation | 0.543719 | | | | |
