A Comparison of Missing Value Imputation Methods for Classifying
Patient Outcome Following Trauma Injury
Kay I Penny
School of Accounting, Economics and Statistics, Napier University, Craiglockhart Campus, Edinburgh, EH14 1DJ
k.penny@napier.ac.uk

Thomas Chesney
Nottingham University Business School, Jubilee Campus, Wollaton Road, Nottingham, NG8 1BB
Thomas.Chesney@nottingham.ac.uk

Abstract. A study is designed to compare several missing value imputation methods to enable classification of patient outcome following trauma injury. The Glasgow coma score is a measure of head injury severity, and is known to be important in determining patient outcome. The Glasgow coma scores are missing for 12% of the dataset, and in order to classify patient outcome for these patients, the missing values are first imputed. The first part of the study is designed to compare the performance of several missing value imputation methods, and errors between imputed values and known values of Glasgow coma scores are calculated. The second part of the study involves analysing the imputed data sets using logistic regression to classify whether patients live or die. Accuracy of results is compared in terms of sensitivity, specificity, positive predictive value and negative predictive value.

Keywords. Missing value imputation, Logistic regression, Trauma injury.

1. Introduction

Logistic regression is frequently used in many areas of medicine and healthcare to classify a binary outcome [4]. Often, the presence of missing values is not a problem when analysing a data set, unless at least one of the variables which are important in the analysis contains a substantial proportion of missing values. In this paper, logistic regression for classifying whether a patient lives or dies following trauma injury is considered.

Trauma injury is a leading cause of loss of life [5], and in 1991 a trauma system was put in place at the North Staffordshire Hospital (NSH) in Stoke-on-Trent in the U.K. The NSH is a major trauma centre and receives patient referrals from surrounding hospitals in the area.
Injury details recorded include Injury Severity Score (ISS) [2], Abbreviated Injury Scores (AIS) [1], the Glasgow Coma Score (GCS) [7], the patient's gender and age, management and subsequent management interventions, and the outcome of the treatment, including whether the patient lived or died during their hospital stay.

The aim of this study is to investigate the impact of missing value imputation techniques on the accuracy of classifying patient death following trauma injury.

2. Methods

The study involves trauma audit data from patients treated at the North Staffordshire Hospital from 1993 to 1999 and from 2001 to 2004. The gap was due to lack of resources which affected data collection during this period. Only the most severely injured patients, i.e. patients with an ISS greater than 15, are included in this study, resulting in a total of 1658 patients in the dataset. Factors considered for inclusion in the analysis (see Table 1) include patient age and gender, mechanism of injury, whether the injury is blunt or penetrating, whether the patient is referred from another hospital, and the year, month, day of the week, and time of day of injury. There are several injury severity scores which are considered in the analysis; these include twelve abbreviated injury scores (AIS) which relate to different parts of the body, and the total Glasgow Coma Score (GCS) which measures the severity of head injury.

Proceedings of the ITI 2008 30th Int. Conf. on Information Technology Interfaces, June 23-26, 2008, Cavtat, Croatia

Table 1. Factors considered for inclusion in the analyses

Sex (male or female)
Age group (years): 0-15; 16-25; 26-35; 36-50; 51-70; over 70
Year of admission (1992-8; 2001-5)
Month of admission (Jan-Dec)
Day of admission (Mon-Sun)
Time of admission (0000-0359; 0400-0759; 0800-1159; 1200-1559; 1600-1959; 2000-2359)
Referred from another hospital (yes or no)
Mechanism of injury group: Motor vehicle crash; Fall greater than 2m; Fall less than 2m; Assault; Other
Type of trauma: blunt (yes or no); penetrating (yes or no)
Abbreviated injury scores (AIS): Head; Face; Lower limb; Neck; Chest; External; Abdomen; Cervical-spine; Upper limb; Thoracic-spine; Spine; Lumbar-spine
Total Glasgow coma score (GCS): 3-15

There are two parts to the data analysis: the first step involves assessing the error in imputing the missing values, and the second step involves assessing the accuracy of modelling patient death. The data are split into two parts; one half (comprising 830 patients) is used to assess the errors involved using the imputation techniques, and the other half of the data set (comprising 828 patients) is used to classify patient death.

2.1. Dealing with missing values

Previous work [3] compared the results of four different artificial neural network models as well as logistic regression modelling to predict patient death during hospital stay following injury. The GCS score was found to have high importance in the artificial neural networks, and GCS was statistically significant in the logistic regression model. In order for GCS to be included in these models, 12% of the sample, i.e. patients whose GCS scores were not recorded, were excluded from the analysis. Hence missing value imputation is considered here in order that all patients can be included in the modelling process.

To calculate the GCS score, scores for eye response, motor response and verbal response are each recorded on an ordinal scale. The total GCS is calculated by summing these three measures, resulting in a score ranging from 3 to 15.
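The scoring rule just described can be illustrated with a minimal sketch. The component ranges used here (eye 1 to 4, verbal 1 to 5, motor 1 to 6) are the conventional Glasgow Coma Scale ranges and are an assumption, since the paper states only that the total ranges from 3 to 15. The function name `gcs_total` and the choice to return `None` when any component is unrecorded are likewise hypothetical.

```python
# Conventional component ranges for the Glasgow Coma Scale (assumed here;
# the paper only states that the total ranges from 3 to 15).
GCS_RANGES = {"eye": (1, 4), "verbal": (1, 5), "motor": (1, 6)}


def gcs_total(eye, verbal, motor):
    """Return the total GCS, or None if any component is not recorded."""
    components = {"eye": eye, "verbal": verbal, "motor": motor}
    if any(value is None for value in components.values()):
        # Assumption: a total cannot be formed, so the case needs imputation.
        return None
    for name, value in components.items():
        low, high = GCS_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} response {value} is outside {low}-{high}")
    return sum(components.values())


print(gcs_total(4, 5, 6))     # best response on every component
print(gcs_total(1, 1, 1))     # worst response on every component
print(gcs_total(None, 5, 6))  # unrecorded component
```

Under these assumed ranges the smallest and largest possible totals are 3 and 15, which agrees with the range given in the text.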
A low score relates to a severe head injury, and a high score implies a mild head injury.

Four imputation techniques are considered in this study: mean imputation, group mean imputation, predictive mean imputation, and hot-deck imputation.

Mean imputation involves replacing the missing GCS scores with the overall mean for the observed data. The mean is then rounded to the nearest integer, as GCS is an ordinal measure. The missing GCS scores are imputed with a score of 11, which is the rounded mean of the observed scores.

Group mean imputation replaces the missing score with the group mean calculated for the subset of patients with the same AIS head score.

Predictive mean imputation uses multiple linear regression to predict GCS, where AIS scores for head, abdomen, cervical spine and lumbar spine are the independent variables.

Hot-deck imputation [6] involves substituting individual values drawn from patients with observed data who are 'similar' to the patient with the missing value. In terms of the GCS scores, this would involve imputing a GCS score drawn from a subset of patients who are 'similar' to the patient with the missing GCS score. In order to impute a particular GCS score, this method sorts patients, both those with observed values and those with missing values for this score, into a number of subsets according to a set of covariates which are associated with the GCS scores. In this application, the imputation subsets comprise patients with the same values of mechanism of injury and the injury severity scores: AIS head, AIS abdomen, AIS lumbar spine and AIS cervical spine. Patients with missing GCS scores will then have their missing values replaced with observed values selected at random, with replacement, from patients in the same subset, i.e. patients who are similar with respect to these covariates.
If there are no observed values in the corresponding subset of patients, then the subset is collapsed by one level, and this process is repeated until an observed value can be found.

In the first part of the study, which assesses imputation error, only cases with observed GCS scores are used. Effectively, one third of the observed GCS scores are deleted, and then estimated using each of the four imputation methods in turn. Since the true GCS scores are known, the mean error (ME), mean absolute error (MAE) and mean square error (MSE) are calculated for each imputation method.

2.2. Patient classification

The second part of this study involves the classification of patient outcome, i.e. whether the patient survives or dies during their hospital stay. Logistic regression modelling is used to classify this binary outcome. In medical applications it is often the case that a logistic regression model is developed using the complete data set, and the model is then tested on the same set of data used to build it. However, it is not ideal to test the model with the same data used to build it, hence two thirds of the data are used to train the model and the other third is used to test it. The logistic regression models are developed to be parsimonious, with good predictive ability while remaining as simple to interpret as possible.

In order to include cases with missing GCS scores in the modelling, each of the imputation methods is applied to both the training and test datasets prior to performing the classification of patient outcome. The modelling process is repeated four times, once for each imputation method, and results are compared according to sensitivity (SENS), specificity (SPEC), positive predictive value (PPV) and negative predictive value (NPV). A cut-point of 0.5 is used in the logistic regression modelling to allow comparability between the models.

3. Results

Table 2 contains error measures for each of the four imputation methods.
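The mean, group mean and hot-deck methods of Section 2.1, together with the three error measures, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The record field names (`gcs`, `ais_head`, `mechanism`), the function names, rounding the group mean, the fallback to the overall mean for an empty group, the order in which covariates are dropped when a hot-deck subset is collapsed, and the sign convention for the error (imputed minus true) are all assumptions. Predictive mean imputation is omitted for brevity, and the toy records are invented for illustration and are not study data.

```python
import math
import random
from statistics import mean


def round_half_up(x):
    # round() sends ties to the nearest even integer, so round half up explicitly.
    return math.floor(x + 0.5)


def mean_impute(donors):
    """Overall mean of the observed GCS scores, rounded to an integer."""
    return round_half_up(mean(d["gcs"] for d in donors))


def group_mean_impute(case, donors, key="ais_head"):
    """Rounded mean GCS among donors sharing the case's value of `key`."""
    group = [d["gcs"] for d in donors if d[key] == case[key]]
    # Assumption: fall back to the overall mean when the group is empty.
    return round_half_up(mean(group)) if group else mean_impute(donors)


def hot_deck_impute(case, donors, covariates, rng):
    """Draw a GCS score at random from donors matching the case.

    If no donor matches on all covariates, the last covariate is dropped and
    the search is repeated (one reading of collapsing by one level).
    """
    keys = list(covariates)
    while True:
        pool = [d["gcs"] for d in donors if all(d[k] == case[k] for k in keys)]
        if pool:
            return rng.choice(pool)  # independent draws, so with replacement
        if not keys:
            raise ValueError("no observed scores available to draw from")
        keys.pop()


def error_measures(imputed, true):
    """Mean error, mean absolute error and mean square error."""
    errors = [i - t for i, t in zip(imputed, true)]  # assumed sign convention
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": sum(e * e for e in errors) / n,
    }


# Toy records, invented for illustration only.
toy_donors = [
    {"gcs": 15, "ais_head": 0, "mechanism": "fall"},
    {"gcs": 14, "ais_head": 0, "mechanism": "crash"},
    {"gcs": 6, "ais_head": 5, "mechanism": "crash"},
    {"gcs": 3, "ais_head": 5, "mechanism": "crash"},
]
toy_case = {"ais_head": 5, "mechanism": "fall"}

print(mean_impute(toy_donors))
print(group_mean_impute(toy_case, toy_donors))
print(hot_deck_impute(toy_case, toy_donors, ["ais_head", "mechanism"], random.Random(0)))
print(error_measures([10, 5], [12, 4]))
```

Because the MSE squares each error, a method that occasionally draws a score far from the truth, as a random hot-deck draw can, is penalised more heavily under MSE than under MAE.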
The mean error is low for each of the four imputation methods, which indicates there is little evidence of systematic error in the imputations. The mean absolute error is greatest for the mean imputation method, and lowest for the group mean imputation. The mean square error is greatest for the hot-deck imputation, which implies that there may be some rather large errors compared to the other methods, leading to the inflation of the mean square error.

Table 2. Imputation errors

Imputation method   Mean error (ME)   Mean absolute error (MAE)   Mean square error (MSE)
Mean                -0.42             4.18                        21.80
Group mean           0.39             3.24                        16.91
Predictive mean      0.24             3.38                        17.03
Hot-deck            -0.08             3.62                        27.05

The accuracy results from the logistic regression are presented in Table 3. The mean, group mean and predictive mean imputation methods perform equally well, whereas the hot-deck imputation method performs less well according to all four accuracy criteria.

Table 3. Evaluations of classification

Imputation method   SENS (%)   SPEC (%)   PPV    NPV
Mean                48         93         0.64   0.87
Group mean          48         93         0.64   0.87
Predictive mean     48         93         0.64   0.87
Hot-deck            45         92         0.59   0.86

Table 4 contains a listing of the factors included in the training models. All four training models, one for each of the imputation methods, contain the same six factors in the final model. The coefficients in the four training models do differ between the models, as do their corresponding odds ratios associated with patient death. A typical logistic regression model shows increased odds of death for patients involved in a motor vehicle crash, for older patients, for patients with a more severe injury according to the AIS scores for head, external and abdomen, and for patients with a more severe head injury as measured by the GCS score.

Table 4. Factors included in the logistic regression modelling

Factors included:
Age group
Mechanism of injury
GCS score
AIS head
AIS abdomen
AIS external

4. Conclusions

The results show no distinction between the mean, group mean and predictive mean imputation methods in terms of sensitivity, specificity, PPV and NPV, although the mean imputation method had the greatest MAE. The hot-deck imputation method gives slightly lower accuracy in predicting patient death, and this imputation method also gave the greatest MSE. These results are quite surprising, in particular that the mean imputation method performs as well as the group mean and predictive mean methods, which both incorporate additional information about the cases with missing values into the estimates.

Although none of these methods led to highly accurate classification of patient death following trauma injury, they do allow classification of patients whose Glasgow coma scores are missing. These patients would not have been included in either building or testing the models in a complete-case analysis. In other words, it would not have been possible to make any prediction for a patient with missing GCS values, whereas using imputation allows a prediction to be made.

Further work on a larger data set would be beneficial. One approach would be to carry out a simulation study using the complete-case data only, where a subset of GCS scores is deleted to mimic the pattern of missingness in the observed data. This would allow the assessment of the different imputation techniques on a much larger scale, and the results may be more stable, giving more insight into differences in performance between the imputation methods. Also, similar techniques could then be applied to the whole trauma injury dataset, which includes patients with all levels of injury severity, not only those most severely injured with ISS > 15.

5. References

[1] Association for the Advancement of Automotive Medicine, 'The abbreviated injury scale, 1990 revision', Des Plaines, IL: Association for the Advancement of Automotive Medicine, 1990.
[2] Baker S.P., O'Neill B., Haddon Jr. W., and Long W.B., 'The injury severity score: a method for describing patients with multiple injuries and evaluating patient care', Journal of Trauma, vol. 14, pp. 187-196, 1974.
[3] Chesney T., Penny K.I., Oakley P., Davies S., Chesney D., Maffulli N., and Templeton J., 'Data mining medical information: Should artificial neural networks be used to analyse trauma audit data?', Int J of Healthcare Information Systems and Informatics, vol. 1(2), pp. 51-64, 2006.
[4] Hosmer D.W. and Lemeshow S., Applied Logistic Regression, 2nd edition. New York: Wiley, 2000.
[5] Joshipura M., Mock C., Goosen J., and Peden M., 'Essential Trauma Care: strengthening trauma systems around the world', Injury, vol. 35, pp. 841-845, 2004.
[6] Little R.J.A. and Rubin D.B., Statistical Analysis with Missing Data. New Jersey: John Wiley & Sons, 2002.
[7] Teasdale G. and Jennett B., 'Assessment of coma and impaired consciousness. A practical scale', Lancet, vol. 2, pp. 81-84, 1974.