ERRATA SHEET

ERRATA IN “COMPARISON OF IMPUTATION PROCEDURES: THE CASE OF THE
FAMILY INCOME EXPENDITURE SURVEY”

1. On page vii, second paragraph, fourth line from the bottom, the sentence should have
been “(b) stochastic regression imputation method was selected the best imputation
method” instead of “(b) when predicting an actual value from the set of nonresponse
data, the regression model must have a minimum coefficient of determination of 80%
for better prediction of the observations.”

2. On page 5, last paragraph, fifth line from the bottom, the sentence should have been:
will help statisticians (instead of helps) to provide a method in handling
nonresponse….

3. On page 6, second paragraph, last sentence, the sentence should have been: If the
pattern of nonresponse…

4. On the same page 6, third paragraph, for DR read DRI, for SR read SRI, and for HD
read HDI.

5. On page 18, last line, the sentence should have been: … is close to the respondents,
yr

6. On page 25, second paragraph, the sentence should have been: Problems might arise
if the imputation classes are not formed with caution.

7. On pages 25, 28, and 62, the citation should have been (Cheng & Sy, 1999).

8. On page 26, the formula should have been:

The m should be replaced by r.

9. On page 27, first paragraph, last line, the sentence should have been: Many related
literatures stated that this is the least effective and it is highly discouraged to use this
method.

10. In Figure 1 on page 29, under Person 4, for HS2 = Y read HS2 = N.

11. On page 31, first paragraph, last sentence, the sentence should have been: This
technique is seen as the generalization of the group mean imputation (GMI), another
name for mean imputation which has been discussed previously.

12. On page 32, fourth line, the m should be replaced by r.

13. On page 41, the same correction as in (8) applies.

14. On page 45, section 4.5 should have been:

4.5 Comparison of Imputation Methods

4.5.1. Bias and Variance of the Sample Mean

The primary objective of using imputation methods is to be able to generate
statistically reliable estimates. To check whether the imputation methods produced reliable
estimates and to determine the effect of the varying nonresponse rates on the performance
of the imputation techniques, one of the three criteria, the bias and the variance of the
sample mean, was measured.

To compute the bias of the mean of the imputed data, the following
procedures were implemented:

a. The mean of the imputed data, y', was computed. For hot deck and stochastic
regression imputation, the average of the means of the 1000 simulated data sets was
computed.

b. The mean of the actual data, y, was computed.

c. The resulting bias of the mean of the imputed data was computed by getting the
difference between (a) and (b).

For the overall mean and deterministic regression imputation, the variance is
zero. On the other hand, for hot deck and stochastic regression imputation, the variance is
given by:

and

where y'i is the mean of the imputed data for the i-th set, and y' is the mean of the
1000 means of the imputed data sets.

The results of this section will be presented in the next chapter.
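
As an illustration only, the bias and variance computation described in this section can be sketched in Python. The function name, the array layout (one row per simulated imputed data set), and the divisor used for the variance (n - 1, set through `ddof=1`) are assumptions made for this sketch; the divisor in the thesis formula may differ.

```python
import numpy as np

def bias_and_variance_of_mean(actual, imputed_sets, ddof=1):
    """Bias and variance of the sample mean across simulated imputed data sets.

    actual       : 1-D array of the actual (complete) data
    imputed_sets : 2-D array, one row per simulated imputed data set
    ddof         : ASSUMED divisor correction (1 gives the usual sample variance)
    """
    actual = np.asarray(actual, dtype=float)
    imputed_sets = np.atleast_2d(np.asarray(imputed_sets, dtype=float))

    set_means = imputed_sets.mean(axis=1)  # y'_i, mean of the i-th imputed set
    grand_mean = set_means.mean()          # y', mean of the set means
    bias = grand_mean - actual.mean()      # step (c): difference of the two means

    # With a single imputed set (overall mean, deterministic regression) there
    # is no between-set spread, so the variance is reported as zero.
    variance = set_means.var(ddof=ddof) if len(set_means) > 1 else 0.0
    return bias, variance
```

For hot deck or stochastic regression imputation, `imputed_sets` would hold the 1000 simulated data sets; for the overall mean or deterministic regression imputation, a single row is passed and the variance comes back as zero.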

15. In section 4.5.2 on pages 46-47, after the discussion of the Kolmogorov-Smirnov test,
there should have been a procedure like this:

To provide additional information on the distribution of the imputed vs. actual data,
the frequency distribution of the actual (deleted) values was compared with that of the
imputed values. This was done in order to show the effect of the imputed values
on the distribution of the imputed data set.

In performing the test, the following steps were made:

a. Income and Expenditure deciles were created. The deciles that were used in the
previous test were the same deciles used here.

b. The obtained deciles were used as upper bounds of the frequency classes.

c. A Frequency Distribution Table (FDT) for both the imputed values and actual
values was generated.

d. For the hot deck and stochastic regression which had 1000 sets, the relative
frequencies (RF) for each frequency class were averaged over 1000 RFs.
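
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the deciles are taken as the nine percentile cut points (10th to 90th) of whatever reference data is supplied, and a final open class is added for values above the highest cut point. None of these details is confirmed by the thesis.

```python
import numpy as np

def decile_upper_bounds(values):
    """Nine decile cut points (10th..90th percentiles) of the reference data."""
    return np.percentile(np.asarray(values, dtype=float), np.arange(10, 100, 10))

def relative_frequencies(values, upper_bounds):
    """Relative frequency per class; class k holds values <= upper_bounds[k],
    and the last class holds everything above the highest bound."""
    values = np.asarray(values, dtype=float)
    # side="left" places a value equal to a bound in that bound's class
    classes = np.searchsorted(upper_bounds, values, side="left")
    counts = np.bincount(classes, minlength=len(upper_bounds) + 1)
    return counts / counts.sum()

def averaged_relative_frequencies(imputed_sets, upper_bounds):
    """Step (d): average the class-wise relative frequencies over all sets."""
    per_set = [relative_frequencies(s, upper_bounds) for s in imputed_sets]
    return np.mean(per_set, axis=0)
```

The relative frequencies of the actual (deleted) values come from a single call to `relative_frequencies`, and can then be compared class by class with the averaged relative frequencies of the imputed sets.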

16. On page 48, first sentence, the sentence should have been: The Mean Absolute
Deviation (MAD) is a criterion for measuring the closeness with which the deleted
values are reconstructed.
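
A minimal Python sketch of this criterion is given here for illustration. It assumes the usual definition, the average absolute difference between each imputed value and the actual value it replaces; the function name and argument names are hypothetical, and the exact formula in the thesis is not reproduced in this sheet.

```python
import numpy as np

def mean_absolute_deviation(actual_deleted, imputed):
    """Average absolute gap between imputed values and the deleted actual values.

    ASSUMED definition: mean of |imputed_i - actual_i| over the deleted cases.
    """
    actual_deleted = np.asarray(actual_deleted, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    if actual_deleted.shape != imputed.shape:
        raise ValueError("actual and imputed values must align one to one")
    return float(np.mean(np.abs(imputed - actual_deleted)))
```

A smaller value indicates that the imputed values sit closer to the deleted values they stand in for.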

17. On the same page 48, last paragraph, third line from the bottom, the sentence should
have been: … Absolute Deviation and Root Mean Square Deviation and were saved
… (deleting the doubled phrase “were saved”)

18. On page 49, the sentence on item #1 should have been: In each criterion mentioned in
the previous sections, the imputation methods were ranked ….

19. On page 59, first paragraph, the phrases “coefficient of variation” should have been
“coefficient of determination”

20. On pages 59-68, ŷ r should be replaced by y r

21. On page 86, MAR should be replaced by MCAR.

22. On the same page 86, second paragraph, first line, the sentence should have been:
Regarding the variance estimation, further studies should implement the use of proper
variance estimation method like the Jackknife Variance Estimator.
