MDA Output Interpretation


Checklist for MDA output (and for judging whether the pattern of the functions at the group centroids is evaluated and correctly interpreted, Yes/No):

1. Ratio of valid cases to independent variables: 20:1 (ideal), 5:1 (acceptable). In our case 154:4.
2. Minimum cases (at least 20) in the smallest group > number of independent variables. In our case: 43 cases and 4 variables.
3. Wilks' Lambda (a relative measure) = WSS/TSS: values near 0 mean the group means differ; values near 1 mean the group means are the same. In the case of 1 discriminant function (DF), look at its significance.
4. Number of discriminant functions = number of groups - 1.
5. Eigenvalue = BSS/WSS (the higher, the better): depicts the relative discriminatory power of the DF.
6. Box's M. H0: homogeneity of the variance-covariance matrix of the independent variables.

After converting the data to SPSS format, click on Classify, then click Discriminant.

A box titled "Discriminant Analysis" will come up. Click on Grouping Variable. This is the non-metric dependent variable. Define its range. Then enter the independent variables. There are two methods for MDA: the Enter method and the Stepwise method. We will start with the Enter method.

Next click on Statistics. There are 3 headings here: Descriptives (click all boxes), Function Coefficients (click none) and Matrices (click none). Under Descriptives we have Means, Univariate ANOVAs, and Box's M.

Means: We use the group means for interpretation as in the HATB example.

Univariate ANOVAs: Pursuing these tests suggests which variables might be useful

discriminants.

Box's M: A test of equality of group variance-covariance matrices. For sufficiently large samples, a high p-value signifies that there is insufficient evidence that the matrices differ.

H0: the variance/covariance matrices of independent variables across groups are the same;

H1: the variance/covariance matrices across groups are different.

Click continue and go to Classify.

Prior Probabilities: compute from group sizes: This incorporates the sizes of the groups as defined by

the dependent variable into the classification of the cases using the discriminant functions.

Display: Casewise results: This will give you the classification details for each case in the output.

Display: Summary table: This will include summary tables comparing actual and predicted

classification.

Display: Leave-one-out classification: This is to ask SPSS to include a cross-validation classification

in the output. This option produces a less biased estimate of classification accuracy by sequentially

holding each case out of the calculations for the discriminant functions, and using the derived

functions to classify the case held out.

Use Covariance Matrix: Within-groups: The covariance matrices measure the dispersion in the groups defined by the dependent variable. If we fail the homogeneity of group variances test (Box's M), our option is to use Separate-groups covariance in classification. Hence, it is good if the null hypothesis is not rejected.

Plots: Combined-groups: This will help to obtain a visual plot of the relationship between functions

and groups defined by the dependent variable.
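The Enter-method workflow above can be sketched outside SPSS as well. Below is a minimal Python sketch using scikit-learn's LinearDiscriminantAnalysis; the variable layout (age, education, gender, income, and a two-level grouping variable) mirrors the tutorial's example, but the data are simulated stand-ins, not the actual survey file.

```python
# Minimal sketch of an Enter-method two-group discriminant analysis.
# All variables are simulated; the column order mirrors the SPSS example.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 154
X = np.column_stack([
    rng.normal(42, 15, n),      # age
    rng.normal(13.5, 2.9, n),   # highest year of school completed
    rng.integers(1, 3, n),      # gender (1=male, 2=female)
    rng.normal(16, 5, n),       # total family income (category codes)
])
y = rng.integers(0, 2, n)       # seen thriller movie last year (0=no, 1=yes)

# SPSS "Prior Probabilities: compute from group sizes" corresponds to the
# default priors=None, which estimates priors from the class proportions.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
scores = lda.transform(X)       # discriminant-function scores
pred = lda.predict(X)           # classification of each case
print(scores.shape, (pred == y).mean())
```

With two groups only one discriminant function exists, so `transform` returns a single column of scores per case.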

Discriminant

Introduction: A discriminant analysis using the simultaneous (Enter) method for including variables was run with age [age], highest year of school completed [educ], gender [gender], and total family income [incom98] as predictors for distinguishing between groups defined by the dependent variable seen thriller movie in last year [tmovie].

A discriminant function differentiated survey respondents who had not seen a thriller movie in the last year from survey respondents who had seen a thriller movie in the last year.

Analysis Case Processing Summary

Unweighted Cases                                                     N    Percent
Valid                                                              154       57.0
Excluded  Missing or out-of-range group codes                       74       27.4
          At least one missing discriminating variable              31       11.5
          Both missing or out-of-range group codes and
          at least one missing discriminating variable              11        4.1
          Total                                                    116       43.0
Total                                                              270      100.0

Interpretation: The minimum ratio of valid cases to independent variables for discriminant analysis

is 5 to 1, with a preferred ratio of 20 to 1. In this analysis, there are 154 valid cases and 4

independent variables. The ratio of cases to independent variables is 38.5 to 1, which satisfies the

minimum requirement. In addition, the ratio of 38.5 to 1 satisfies the preferred ratio of 20 to 1. Now,

let us go to Prior Probabilities for Groups.

Group Statistics

SEEN THRILLER MOVIE                                    Std.      Valid N (listwise)
IN LAST YEAR                                Mean   Deviation  Unweighted   Weighted
NO     AGE OF RESPONDENT                   44.36       15.67         111    111.000
       HIGHEST YEAR OF SCHOOL COMPLETED    13.42        2.93         111    111.000
       RESPONDENTS GENDER                   1.67         .47         111    111.000
       TOTAL FAMILY INCOME                 15.73        5.42         111    111.000
YES    AGE OF RESPONDENT                   39.19       14.86          43     43.000
       HIGHEST YEAR OF SCHOOL COMPLETED    13.58        2.84          43     43.000
       RESPONDENTS GENDER                   1.33         .47          43     43.000
       TOTAL FAMILY INCOME                 16.60        4.44          43     43.000
Total  AGE OF RESPONDENT                               15.73         154    154.000
       HIGHEST YEAR OF SCHOOL COMPLETED    13.47        2.90         154    154.000
       RESPONDENTS GENDER                   1.57         .50         154    154.000
       TOTAL FAMILY INCOME                 15.97        5.16         154    154.000


Interpretation: (i) The average "age" for survey respondents who had not seen a thriller movie in the last year (mean=44.36) was higher than the average "age" for survey respondents who had seen a thriller movie in the last year (mean=39.19).

So, survey respondents who had not seen a thriller movie in the last year were older than survey respondents who had seen a thriller movie in the last year.

(ii) Since "gender" is a dichotomous variable, the mean is not directly interpretable. Its interpretation must take into account the coding by which 1 corresponds to male and 2 corresponds to female. The higher mean for survey respondents who had not seen a thriller movie in the last year (mean=1.67), compared with the mean for survey respondents who had seen a thriller movie in the last year (mean=1.33), implies that the first group contained fewer male and more female survey respondents.

Survey respondents who had not seen a thriller movie in the last year were more likely to be female than survey respondents who had seen a thriller movie in the last year. Let us now go to the next table, Tests of Equality of Group Means.

Tests of Equality of Group Means

                                  Wilks' Lambda       F   df1   df2   Sig.
AGE OF RESPONDENT                          .958   6.684     1   152   .011
HIGHEST YEAR OF SCHOOL COMPLETED           .999    .091     1   152   .763
RESPONDENTS GENDER                         .904  16.069     1   152   .000
TOTAL FAMILY INCOME                        .994    .889     1   152   .347

Interpretation: As we know, Wilks' Lambda tests the extent of equality of group means and their statistical significance for the independent variables. In this case, we notice that gender and age have better (lower) values of the Wilks' Lambda statistic, with a probability of p<0.05 as depicted by the level of significance. Hence they should be considered good discriminatory independent variables.

However, this observation should be cross-validated against the structure matrix as well. If the structure matrix and this table are along similar lines, we can infer the discriminatory power of the independent variables.
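Each row of this table is simply a one-way ANOVA of one predictor across the two groups. A sketch with SciPy on simulated data (group sizes and moments are taken from the Group Statistics table above; simulated values will not reproduce the published .958 / 6.684 exactly):

```python
# One-way ANOVA per predictor, as in the "Tests of Equality of Group Means"
# table. Data are simulated stand-ins for the two movie groups.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
age_no = rng.normal(44.4, 15.7, 111)   # "not seen" group
age_yes = rng.normal(39.2, 14.9, 43)   # "seen" group

F, p = f_oneway(age_no, age_yes)
# For a single predictor, Wilks' lambda = WSS/TSS can be recovered from F:
# lambda = df2 / (df2 + df1 * F)
df1, df2 = 1, len(age_no) + len(age_yes) - 2
wilks = df2 / (df2 + df1 * F)
print(round(F, 3), round(p, 3), round(wilks, 3))
```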

Analysis 1

Box's Test of Equality of Covariance Matrices

Log Determinants
SEEN THRILLER MOVIE IN LAST YEAR   Rank   Log Determinant
NO                                    4             9.176
YES                                   4             8.573
Pooled within-groups                  4             9.077
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Test Results
Box's M             10.220
F      Approx.        .983
       df1              10
       df2       30348.624
       Sig.           .455
Tests null hypothesis of equal population covariance matrices.

Tests null hypothesis of equal population covariance matrices.

Interpretation: H0: the variance-covariance matrices of the two groups are the same in the population. This is a test of variability, but it is an overall judgement of all the independent variables taken together.

Box's M = 10.220
Approx. F = 0.983

Conclusion: F-tab = FINV(0.05,10,30349) = 1.83. As the calculated F is much lower than the tabular F, the null hypothesis is not rejected. This is also confirmed by the p-value (.455). Therefore, we can conclude that group homogeneity is present.
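The Excel-style FINV call in the conclusion maps directly onto SciPy's F quantile function: FINV takes an upper-tail probability, so FINV(0.05, d1, d2) equals f.ppf(0.95, d1, d2).

```python
# Reproducing the tabular F used above: FINV(0.05, 10, 30349).
from scipy.stats import f

f_crit = f.ppf(0.95, 10, 30349)
print(round(f_crit, 2))   # ~1.83, the value quoted in the text
```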

Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1                .169           100.0          100.0                    .380
a. First 1 canonical discriminant functions were used in the analysis.

Interpretation: There exists one Eigenvalue for each discriminant function. It depicts the relative discriminatory power of the discriminant functions. For two groups it hardly makes any sense, but when there are more than two groups it allows us to understand which function is better.

In the present example there are two groups, i.e., seen thriller movie last year = 1 and not seen thriller movie last year = 0. As we know, when there are two groups only one discriminant function can be extracted from the data, and its Eigenvalue (λ) is interpreted as follows:

In simple language, in the two-group case, we can define the Between Sum of Squares (i.e., the Sum of Squares Across Groups) and the Within Sum of Squares of the discriminant scores z as:

    BSS = n_0 (z̄_0 - z̄)² + n_1 (z̄_1 - z̄)² = Σ_j n_j (z̄_j - z̄)²

    WSS = Σ_i (z_i0 - z̄_0)² + Σ_i (z_i1 - z̄_1)² = Σ_j Σ_i (z_ij - z̄_j)²

where z̄_j is the mean score in group j, z̄ is the grand mean, and n_j is the size of group j. The Eigenvalue is defined as:

    λ = BSS / WSS

So, if λ = 0.00, the model has no discriminatory power (as BSS = 0). The larger the value of λ, the greater the discriminatory power of the model.

Two groups can be separated by one discriminant function. Three groups require two discriminant functions. The required number of functions is usually one less than the number of groups.

With 4 independent variables and 2 groups defined by the dependent variable, the maximum possible number of discriminant functions was 1, which accounts for 100% of the variation by itself. (Cross-check this with the 3-group case.)

The significance of the maximum possible number of discriminant functions supports the interpretation of a solution using 1 discriminant function. Now let us go to Functions at Group Centroids.
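For a single discriminant function, the canonical correlation reported alongside the eigenvalue satisfies r_c = sqrt(λ / (1 + λ)), so the two entries of the Eigenvalues table can be cross-checked against each other:

```python
# Cross-checking the Eigenvalues table: canonical correlation from lambda.
import math

eig = 0.169                        # eigenvalue from the table
r_c = math.sqrt(eig / (1 + eig))   # ~0.380, the reported canonical correlation
print(round(r_c, 3))
```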

(III) Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                              .855       23.440    4   .000

Interpretation: The overall relationship in discriminant analysis rests on the existence of statistically significant discriminant functions sufficient to separate all of the groups defined by the dependent variable. As we see, the observed Chi-square (23.440) falls in the critical region, since it lies outside the interval bounded by =CHIINV(0.975,4) = 0.484 and =CHIINV(0.025,4) = 11.1433. The probability of p<0.001 [p(calculated)] is also less than the level of significance of 0.05 [p(critical)].

Wilks' Lambda is given as Λ = WSS / TSS, which is nothing but the within-group variation with respect to the total variation. Hence, a lower value implies greater homogeneity within groups, which we will be happy to get. But, again, this is the overall Wilks' Lambda, considering all the independent variables together. Now, let us go to Group Centroids.
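The CHIINV calls above are upper-tail chi-square quantiles (chi2.ppf(1 - p, df) in SciPy), and with a single function the overall Wilks' Lambda is tied to the eigenvalue by Λ = 1/(1 + λ). Both facts can be checked numerically:

```python
# Reproducing CHIINV(0.025,4) and CHIINV(0.975,4), and the Wilks' lambda
# implied by the eigenvalue of .169 from the previous table.
from scipy.stats import chi2

upper = chi2.ppf(1 - 0.025, 4)   # CHIINV(0.025, 4) ~ 11.1433
lower = chi2.ppf(1 - 0.975, 4)   # CHIINV(0.975, 4) ~ 0.484
wilks = 1 / (1 + 0.169)          # ~ .855, matching the Wilks' Lambda table
print(round(upper, 4), round(lower, 3), round(wilks, 3))
```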

(IV) Standardized Canonical Discriminant Function Coefficients

                                   Function 1
AGE OF RESPONDENT                        .610
HIGHEST YEAR OF SCHOOL COMPLETED         .138
RESPONDENTS GENDER                       .849
TOTAL FAMILY INCOME                     -.139

Interpretation: The function of these coefficients is to compare the relative importance of the independent variables. So, we find that AGE and GENDER have relatively higher importance compared to EDUC and INCOME98.

As the coefficients are standardized, the constant term is absorbed into the model, and hence we don't have one.

(V) Structure Matrix

                                   Function 1
RESPONDENTS GENDER                       .791
AGE OF RESPONDENT                        .510
TOTAL FAMILY INCOME                     -.186
HIGHEST YEAR OF SCHOOL COMPLETED        -.060

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.

Interpretation: (i) We are interested in the role of the independent variable in predicting group

membership, i.e. are higher or lower scores on the independent variable associated with membership

in one group rather than the other?

This relationship can be stated as a comparison of the means of the groups defined by the dependent

variable.

In direct entry discriminant analysis, there is not a statistical test for each individual independent

variable. The interpretation that a variable is contributing to the discrimination of the groups defined

by the dependent variable is based on the loadings in the structure matrix.

We will use the rule of thumb that contributing variables have a loading of +/-0.30 or higher on the discriminant function. (This is very important, as it acts as a benchmark. We did not discuss this in class, so please make a note of it.)

If a variable has loadings higher than 0.30 on more than one function, we interpret the variable in relation to the function with the highest loading.

Based on the structure matrix, the independent variable age has a high enough loading (r=0.510) to

warrant interpretation as distinguishing between the groups differentiated by discriminant function,

i.e. between the group who had not seen a thriller movie and the group who had seen a thriller movie

in the last year. Let us now go to group statistics.

(ii) The largest loadings for "highest year of school completed" [educ] and total family income [income98] in the structure matrix are less than 0.30. These variables are not interpreted because they are not contributing to the discrimination of the groups. Let us now go back to Prior Probabilities for Groups.

(iii) Based on the structure matrix, the independent variable gender has a high enough loading

(r=0.791) to warrant interpretation as distinguishing between the groups differentiated by

discriminant function, i.e., between the group who had not seen a thriller movie and the group who

had seen a thriller movie in the last year. Let us now go back to Group Statistics.

Though gender has a loading of 0.791, it is not considered as it is a binary variable. Had there been

nothing better, we would have considered it.

Functions at Group Centroids

SEEN THRILLER MOVIE IN LAST YEAR   Function 1
NO                                       .254
YES                                     -.656

Unstandardized canonical discriminant functions evaluated at group means

Interpretation: Before we interpret the relationship between the independent variables and the

dependent variable, we need to identify which groups defined by the dependent variable are

differentiated by which discriminant function.

In a problem with only two groups, the solution is obvious, but we will see how to derive the answer

for more complicated groupings. You can compare this with the 3-group case (page: 22)

In order to specify the role that each independent variable plays in predicting group membership on

the dependent variable, we must link together the relationship between the discriminant functions

and the groups defined by the dependent variable, the role of the significant independent variables in

the discriminant functions, and the differences in group means for each of the variables.

Each function divides the groups into two subgroups by assigning negative values to one subgroup

and positive values to the other subgroup. Function 1 separates survey respondents who had seen a

thriller movie in the last year (-.656) from survey respondents who had not seen a thriller movie in

the last year (.254). Let us now go to the structure matrix.

Classification Statistics

Classification Processing Summary
Processed                                            270
Excluded   Missing or out-of-range group codes         0
           At least one missing discriminating
           variable                                   42
Used in Output                                       228

(II) Prior Probabilities for Groups

SEEN THRILLER MOVIE
IN LAST YEAR            Prior   Unweighted   Weighted
NO                       .721          111    111.000
YES                      .279           43     43.000
Total                   1.000          154    154.000

Interpretation: (i) In addition to the requirement for the ratio of cases to independent variables, discriminant analysis requires a minimum number of cases in the smallest group defined by the dependent variable. The number of cases in the smallest group must be larger than the number of independent variables, and preferably 20 or more.

The number of cases in the smallest group in this problem is 43, which is larger than the number of

independent variables (4), satisfying the minimum requirement. In addition, the number of cases in

the smallest group satisfies the preferred minimum of 20 cases.

(ii) The independent variables could be characterized as useful predictors of membership in the

groups defined by the dependent variable if the cross-validated classification accuracy rate was

significantly higher than the accuracy attainable by chance alone.

Operationally, the cross-validated classification accuracy rate should be at least 25% higher than the proportional by-chance accuracy rate.

(iii) The proportional by-chance accuracy rate was computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups (0.279² + 0.721² = 0.598).

The criterion (rule of thumb) for a useful model is an accuracy rate 25% greater than the by-chance accuracy rate (1.25 x 59.8% = 74.75%).

If the sample size did not initially satisfy the minimum requirements, discriminant analysis is not

appropriate. Let us now go to Classification Results.
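The by-chance benchmark described in (ii) and (iii) is a two-line computation from the prior probabilities table:

```python
# Proportional by-chance accuracy: square and sum the group proportions,
# then require accuracy 25% above that rate.
priors = [0.721, 0.279]                      # from Prior Probabilities for Groups
by_chance = sum(p ** 2 for p in priors)      # ~0.598
criterion = 1.25 * by_chance                 # ~0.747, i.e. ~74.7%
print(round(by_chance, 3), round(criterion, 4))
```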

Casewise Statistics

[Table omitted: the per-case output produced by the Casewise results option. For each case number it lists the actual group, the predicted (highest) group, P(D>d | G=g) with its df, P(G=g | D=d), and the squared Mahalanobis distance to the centroid, plus the same quantities for the second-highest group, for both the original and cross-validated solutions; ungrouped cases are flagged.]

For the original data, squared Mahalanobis distance is based on canonical functions. For the cross-validated

data, squared Mahalanobis distance is based on observations.

** Misclassified case

a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the

functions derived from all cases other than that case.

Classification Results(b,c)

                                        Predicted Group Membership
                SEEN THRILLER MOVIE          NO      YES     Total
Original  Count NO                           99       12       111
                YES                          29       14        43
                Ungrouped cases              60       14        74
          %     NO                         89.2     10.8     100.0
                YES                        67.4     32.6     100.0
                Ungrouped cases            81.1     18.9     100.0
Cross-    Count NO                           99       12       111
validated       YES                          30       13        43
          %     NO                         89.2     10.8     100.0
                YES                        69.8     30.2     100.0

a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b. 73.4% of original grouped cases correctly classified.
c. 72.7% of cross-validated grouped cases correctly classified.

Interpretation: The proportional by-chance accuracy rate was computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups (0.279² + 0.721² = 0.598).

The criterion (rule of thumb) for a useful model is an accuracy rate 25% greater than the by-chance rate (1.25 x 59.8% = 74.75%). The cross-validated accuracy should ideally be greater than or equal to this proportional by-chance accuracy criterion. Here, we find it to be 72.7%, which falls marginally short but is quite close. So, we can conclude that the criterion for classification accuracy is approximately satisfied.

Conclusions: The final question is a summary of the findings of the analysis: overall relationship, individual

relationships, and usefulness of the model.

Cautions are added, if needed, for sample size and level of measurement issues.

Age and gender were the two independent variables we identified as strong contributors to distinguishing

between the groups defined by the dependent variable.

The analysis identified the following specific relationships. Survey respondents who had not seen a thriller movie in the last year were older than survey respondents who had seen a thriller movie in the last year. Moreover, survey respondents who had not seen a thriller movie in the last year were more likely to be female than survey respondents who had seen a thriller movie in the last year.


Stepwise case:

When the stepwise method is used to enter predictor variables, we can use the Wilks' Lambda criterion for entry. Stepwise MDA identifies the independent variable with the lowest significant Wilks' Lambda and enters it into the model. At each step it is important to check whether the independent variables that entered in the previous steps still have a significant Λ.
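The entry criterion can be sketched directly: compute the overall Wilks' lambda |W|/|T| for each candidate variable set and, at each step, enter the variable that lowers it the most. The sketch below is a minimal illustration on synthetic data (the wilks_lambda helper and the selection loop are my own stripped-down version, not SPSS's routine, and it omits the F-to-enter/remove significance checks):

```python
# Stepwise entry by the Wilks' lambda criterion, minimal illustration.
import numpy as np

def wilks_lambda(X, y):
    """Overall Wilks' lambda |W|/|T| for predictors X and grouping y."""
    groups = [X[y == g] for g in np.unique(y)]
    grand = X.mean(axis=0)
    T = sum((x - grand).reshape(-1, 1) @ (x - grand).reshape(1, -1) for x in X)
    W = sum(
        sum((x - Xg.mean(axis=0)).reshape(-1, 1) @ (x - Xg.mean(axis=0)).reshape(1, -1)
            for x in Xg)
        for Xg in groups
    )
    return np.linalg.det(W) / np.linalg.det(T)

rng = np.random.default_rng(2)
y = rng.integers(0, 3, 120)          # three groups, as in the welfare example
X = rng.normal(size=(120, 4))
X[:, 0] += y                          # make variable 0 a strong discriminator

selected, remaining = [], list(range(X.shape[1]))
for _ in range(2):                    # two entry steps, for illustration
    best = min(remaining, key=lambda j: wilks_lambda(X[:, selected + [j]], y))
    selected.append(best)
    remaining.remove(best)
print(selected)                       # variable 0 should enter first
```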

Appendix

What is an Eigenvalue?

In matrix algebra, an eigenvalue is a constant which, if subtracted from the diagonal elements of a matrix, results in a new matrix whose determinant equals zero.

An example. Given the matrix:

    A = | 4   2 |
        | 1   5 |

set the determinant of (A - xI) to zero:

    | 4 - x     2   |
    |   1     5 - x |

    (4 - x)(5 - x) - (2)(1) = 0.0
    (20 - 4x - 5x + x² - 2) = 0.0
    (x² - 9x + 18) = 0.0

This quadratic equation has two solutions, or eigenvalues: +6 and +3.
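The hand computation can be confirmed numerically with NumPy:

```python
# Eigenvalues of the appendix matrix: roots of x^2 - 9x + 18 = 0, i.e. 3 and 6.
import numpy as np

A = np.array([[4.0, 2.0], [1.0, 5.0]])
vals = sorted(np.linalg.eigvals(A).real)
print(vals)
```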

We will use discriminant analysis to evaluate the relationship between the dependent variable opinion about spending on welfare [natfare] and the independent variables number of hours worked in the past week [hrs1]; self-employment [wrkslf]; highest year of school completed [educ]; and income [rincom98]. Like the previous method, stepwise discriminant analysis also requires the dependent variable to be non-metric and the independent variables to be metric or dichotomous. The dependent variable has three levels: 1=too little; 2=about right; 3=too much.

The procedure of ticking boxes/options remains the same except for clicking on Use stepwise method instead of the earlier Enter independents together.

Discriminant analysis rests on the existence of statistically significant discriminant functions sufficient to separate the groups defined by the dependent variable. In this case, we have 3 groups defined by opinion about spending on welfare and 4 independent variables, so the maximum number of discriminant functions is 2.


In order to specify the role that each independent variable plays in predicting group membership on

the dependent variable, we must link together the relationship between the discriminant functions

and the groups defined by the dependent variable, the role of the significant independent variables in

the discriminant functions, and the differences in group means for each of the variables

Discriminant

Analysis Case Processing Summary

Unweighted Cases                                                     N    Percent
Valid                                                              138       51.1
Excluded  Missing or out-of-range group codes                        7        2.6
          At least one missing discriminating variable             115       42.6
          Both missing or out-of-range group codes and
          at least one missing discriminating variable              10        3.7
          Total                                                    132       48.9
Total                                                              270      100.0

Interpretation: Preferred ratio of valid cases to independent variables is 20:1; in this case it is

138:4=34.5.

Group Statistics

                                                       Std.      Valid N (listwise)
WELFARE                                     Mean   Deviation  Unweighted   Weighted
TOO     NUMBER OF HOURS WORKED LAST WEEK   43.96       13.24          56     56.000
LITTLE  R SELF-EMP OR WORKS FOR SOMEBODY    1.93         .26          56     56.000
        HIGHEST YEAR OF SCHOOL COMPLETED   13.73        2.40          56     56.000
        RESPONDENTS INCOME                 13.70        5.03          56     56.000
ABOUT   NUMBER OF HOURS WORKED LAST WEEK   37.90       13.23          50     50.000
RIGHT   R SELF-EMP OR WORKS FOR SOMEBODY    1.90         .30          50     50.000
        HIGHEST YEAR OF SCHOOL COMPLETED   14.78        2.56          50     50.000
        RESPONDENTS INCOME                 14.00        5.50          50     50.000
TOO     NUMBER OF HOURS WORKED LAST WEEK   42.03       10.46          32     32.000
MUCH    R SELF-EMP OR WORKS FOR SOMEBODY    1.75         .44          32     32.000
        HIGHEST YEAR OF SCHOOL COMPLETED   13.38        2.52          32     32.000
        RESPONDENTS INCOME                 14.75        5.30          32     32.000
Total   NUMBER OF HOURS WORKED LAST WEEK   41.32       12.85         138    138.000
        R SELF-EMP OR WORKS FOR SOMEBODY    1.88         .33         138    138.000
        HIGHEST YEAR OF SCHOOL COMPLETED   14.03        2.54         138    138.000
        RESPONDENTS INCOME                 14.05        5.25         138    138.000

Interpretation: The average "number of hours worked in the past week" [HRS1] for survey respondents

who thought we spend about the right amount of money on welfare (mean=37.90) was lower than the

average HRS1 for survey respondents who thought we spend too little money on welfare

(mean=43.96) and survey respondents who thought we spend too much money on welfare

(mean=42.03).


This supports the relationship that survey respondents who thought we spend about the right

amount of money on welfare worked fewer hours in the past week than survey respondents who

thought we spend too little or too much money on welfare.

The second variable "self-employment" [wrkslf] is a dichotomous variable; hence, the mean is not directly interpretable. Its interpretation must take into account the coding by which 1 corresponds to self-employed and 2 corresponds to working for someone else. The higher mean for survey respondents who thought we spend too little money on welfare (mean=1.93), compared with the mean for survey respondents who thought we spend too much money on welfare (mean=1.75), implies that the first group contained fewer survey respondents who were self-employed and more survey respondents who were working for someone else.

The average "highest year of school completed" [educ] for survey respondents who thought we spend

about the right amount of money on welfare (mean=14.78) was higher than that of survey

respondents who thought we spend too little money on welfare (mean=13.73) and survey respondents

who thought we spend too much money on welfare (mean=13.38).

Tests of Equality of Group Means

                                  Wilks' Lambda       F   df1   df2   Sig.
NUMBER OF HOURS WORKED LAST WEEK           .956   3.100     2   135   .048
R SELF-EMP OR WORKS FOR SOMEBODY           .954   3.284     2   135   .041
HIGHEST YEAR OF SCHOOL COMPLETED           .947   3.785     2   135   .025
RESPONDENTS INCOME                         .994    .411     2   135   .664

Interpretation: Note that =FINV(0.05,2,135) = 3.063. The first 3 independent variables each have an F-observed value higher than this; hence the null hypothesis is rejected for them, and they make a significant contribution to discriminating the dependent variable. Using the lowest Wilks' Lambda for entry in stepwise MDA, we find that HIGHEST YEAR OF SCHOOL COMPLETED has the lowest value, followed by R SELF-EMP OR WORKS FOR SOMEBODY and NUMBER OF HOURS WORKED LAST WEEK. At this stage, we find that RESPONDENTS INCOME has the highest Wilks' Lambda and is insignificant as well.

Analysis 1

Box's Test of Equality of Covariance Matrices

Log Determinants
WELFARE                Rank   Log Determinant
TOO LITTLE                3             4.145
ABOUT RIGHT               3             4.629
TOO MUCH                  3             4.748
Pooled within-groups      3             4.603
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Test Results
Box's M             19.386
F      Approx.       1.560
       df1              12
       df2       53206.694
       Sig.           .096
Tests null hypothesis of equal population covariance matrices.

Interpretation: As the F-observed (1.560) lies below =FINV(0.05,12,53207) = 1.75, the null hypothesis is not rejected, which is a requirement by assumption.

Stepwise Statistics

Variables Entered/Removed(a,b,c,d)

                                           Min. D Squared                       Exact F
Step  Entered                          Statistic  Between Groups    Statistic  df1      df2        Sig.
1     NUMBER OF HOURS WORKED               .023   TOO LITTLE and         .475    1  135.000        .492
      LAST WEEK                                   TOO MUCH
2     R SELF-EMP OR WORKS FOR              .251   TOO LITTLE and        3.289    2  134.000   4.031E-02
      SOMEBODY                                    ABOUT RIGHT
3     HIGHEST YEAR OF SCHOOL               .364   TOO LITTLE and        2.433    3  133.000   6.783E-02
      COMPLETED                                   TOO MUCH

At each step, the variable that maximizes the Mahalanobis distance between the two closest groups is entered.
a. Maximum number of steps is 8.
b. Maximum significance of F to enter is .05.
c. Minimum significance of F to remove is .10.
d. F level, tolerance, or VIN insufficient for further computation.

Interpretation: As mentioned, income [rincom98] has not been included in the list of the best subset of predictors in this question.

We are interested in whether higher or lower scores on the independent variables are associated with membership in one group rather than the other. This can be taken up as a comparison of the means of the groups defined by the dependent variable.

In step 1, the variable that enters is HRS1 (number of hours worked in past week). From here, let us go to the Structure Matrix.

In step 2, the variable that enters is self-employment [wrkslf]. So, this can be called the second best predictor. Now, let us go to the Structure Matrix.

"Highest year of school completed" [educ] was added to the discriminant analysis in step 3, and can be characterized as the third best predictor. Let us now go to the Structure Matrix.

Variables in the Analysis

                                                 Sig. of F    Min. D
Step                                  Tolerance  to Remove   Squared  Between Groups
1   NUMBER OF HOURS WORKED LAST WEEK      1.000       .048
2   NUMBER OF HOURS WORKED LAST WEEK       .986       .044
    R SELF-EMP OR WORKS FOR SOMEBODY       .986       .037      .023  TOO LITTLE and TOO MUCH
3   NUMBER OF HOURS WORKED LAST WEEK       .957       .017
    R SELF-EMP OR WORKS FOR SOMEBODY       .986       .040      .037  TOO LITTLE and TOO MUCH
    HIGHEST YEAR OF SCHOOL COMPLETED       .970       .010      .251  TOO LITTLE and ABOUT RIGHT

Interpretation: Tolerance, if you remember, is the extent of variability of the selected independent variable NOT explained by the other independent variables. So, the higher the tolerance the better, in the sense that there is a lower chance of problems like multicollinearity. But, as evident, as more and more independent variables enter under the stepwise method, tolerance is bound to fall. As long as the variables remain significant, this is fine. As you will notice, the stringency of Sig. of F to Remove is gradually relaxed as we move from step 1 to step 3.
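Tolerance can be computed directly as 1 - R² from regressing each predictor on the remaining predictors. A sketch with NumPy on synthetic data (the tolerance() helper is my own illustration, not an SPSS routine):

```python
# Tolerance of predictor j = 1 - R^2 from regressing column j on the others.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
X[:, 2] = 0.8 * X[:, 0] + rng.normal(scale=0.6, size=100)  # correlated column

def tolerance(X, j):
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])   # add an intercept
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    return 1 - r2

print([round(tolerance(X, j), 3) for j in range(3)])
```

Column 1 is independent of the others, so its tolerance stays near 1; columns 0 and 2 are correlated, so their tolerances drop well below it.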

Variables Not in the Analysis

                                                    Min.    Sig. of F   Min. D
Step                                  Tolerance  Tolerance   to Enter  Squared  Between Groups
0   NUMBER OF HOURS WORKED LAST WEEK      1.000      1.000       .048     .023  TOO LITTLE and TOO MUCH
    R SELF-EMP OR WORKS FOR SOMEBODY      1.000      1.000       .041     .008  TOO LITTLE and ABOUT RIGHT
    HIGHEST YEAR OF SCHOOL COMPLETED      1.000      1.000       .025     .021  TOO LITTLE and TOO MUCH
    RESPONDENTS INCOME                    1.000      1.000       .664     .003  TOO LITTLE and ABOUT RIGHT
1   R SELF-EMP OR WORKS FOR SOMEBODY       .986       .986       .037     .251  TOO LITTLE and ABOUT RIGHT
    HIGHEST YEAR OF SCHOOL COMPLETED       .970       .970       .009     .037  TOO LITTLE and TOO MUCH
    RESPONDENTS INCOME                     .862       .862       .332     .100  TOO LITTLE and TOO MUCH
2   HIGHEST YEAR OF SCHOOL COMPLETED       .970       .957       .010     .364  TOO LITTLE and TOO MUCH
    RESPONDENTS INCOME                     .840       .837       .200     .297  ABOUT RIGHT and TOO MUCH
3   RESPONDENTS INCOME                     .705       .705       .132     .521  TOO LITTLE and ABOUT RIGHT

Wilks' Lambda

Step  Number of  Lambda  df1  df2  df3      Exact F
      Variables                         Statistic  df1  df2      Sig.
1     1          .956    1    2    135    3.100    2    135.000  .048
2     2          .910    2    2    135    3.223    4    268.000  .013
3     3          .850    3    2    135    3.767    6    266.000  .001

Interpretation: This is the Wilks' Lambda table for variables (not functions). It shows that at every

step the variables entered have a p-value of less than 0.05 (Step 1: p = 0.048; Step 2: p = 0.013;

Step 3: p = 0.001).

Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         .117        68.3           68.3          .323
2         .054        31.7           100.0         .227

a First 2 canonical discriminant functions were used in the analysis.

Interpretation: As we know, the higher the eigenvalue the better. The first function has a higher eigenvalue than

the second. The percentage of variance explained between groups relative to within groups is also higher for

the first function.

Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1 through 2          .850           21.853      6   .001
2                    .949            7.074      2   .029

Interpretation: Wilks' Lambda, which tests the statistical significance of the discriminant functions,

identified 2 discriminant functions, both with p-values less than 0.05.
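As a cross-check (a sketch in plain Python), the Wilks' Lambda values for the functions can be recovered from the eigenvalues reported earlier, since Lambda for "functions k through the last" is the product of 1/(1 + eigenvalue) over those functions:

```python
# Eigenvalues taken from the Eigenvalues table above.
eigenvalues = [0.117, 0.054]

def wilks_lambda(eigs, start=0):
    """Lambda for functions start+1 through the last: product of 1/(1 + eigenvalue)."""
    lam = 1.0
    for e in eigs[start:]:
        lam *= 1.0 / (1.0 + e)
    return lam

print(round(wilks_lambda(eigenvalues, 0), 2))  # functions 1 through 2: ~0.85
print(round(wilks_lambda(eigenvalues, 1), 3))  # function 2 alone: ~0.949
```

These agree with the table's .850 and .949 up to the rounding of the published eigenvalues.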

Standardized Canonical Discriminant Function Coefficients

                                     Function
                                     1       2
NUMBER OF HOURS WORKED LAST WEEK     -.704   .444
R SELF-EMP OR WORKS FOR SOMEBODY      .149   .942
HIGHEST YEAR OF SCHOOL COMPLETED      .810   .070

Structure Matrix

                                     Function
                                     1       2
HIGHEST YEAR OF SCHOOL COMPLETED      .687    .136
NUMBER OF HOURS WORKED LAST WEEK     -.582    .345
R SELF-EMP OR WORKS FOR SOMEBODY      .223    .889
RESPONDENTS INCOME                    .101    .292

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.

* Largest absolute correlation between each variable and any discriminant function

a This variable not used in the analysis.

Interpretation: Here we see that HRS1 has the largest loading on discriminant function 1, which differentiates

survey respondents who thought we spend about the right amount of money on welfare from those who

thought we spend too little or too much money on welfare. From here let us go to Group Statistics.

For the entry of the second variable, i.e., workslf, the largest loading was 0.889 on discriminant

function 2. This discriminates survey respondents who thought we spend too little money on welfare

from those who thought we spend too much money on welfare. Now, let us again go to Group Statistics.

In the structure matrix, the largest loading for the variable "highest year of school completed" [educ]

was .687 on discriminant function 1, which differentiates survey respondents who thought we spend

about the right amount of money on welfare from those who thought we spend too little or too much money

on welfare. Let us now go to Group Statistics.

Functions at Group Centroids

               Function
WELFARE        1       2
TOO LITTLE     -.220    .235
ABOUT RIGHT     .446   -.032
TOO MUCH       -.311   -.362

Unstandardized canonical discriminant functions evaluated at group means

Interpretation: The values at group centroids for the first discriminant function were positive for the

group who thought we spend about the right amount of money on welfare (.446) and negative for

groups who thought we spend too little (-.220) or too much (-.311) money on welfare. This pattern

clearly distinguishes survey respondents who thought we spend about the right amount of money on

welfare from the other survey respondents.

The values at group centroids for the second discriminant function were positive for the group who

thought we spend too little money on welfare (.235) and negative for the group who thought we spend

too much money on welfare (-.362). This pattern distinguishes survey respondents who thought we

spend too little money on welfare from survey respondents who thought we spend too much money

on welfare.
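To make the centroid logic concrete, here is a hedged sketch in plain Python of assigning a case's pair of discriminant scores to the nearest group centroid. Euclidean distance is used purely for illustration; SPSS actually classifies via squared Mahalanobis distances and posterior probabilities:

```python
import math

# Group centroids from the Functions at Group Centroids table above.
centroids = {
    "TOO LITTLE":  (-0.220,  0.235),
    "ABOUT RIGHT": ( 0.446, -0.032),
    "TOO MUCH":    (-0.311, -0.362),
}

def nearest_group(score1, score2):
    """Assign a (function 1, function 2) score pair to the closest centroid."""
    return min(centroids, key=lambda g: math.dist((score1, score2), centroids[g]))

print(nearest_group(0.5, 0.0))   # a case near the ABOUT RIGHT centroid
```

A case with a strongly positive score on function 1, for example, lands near the ABOUT RIGHT centroid, mirroring the pattern described in the interpretation.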

Classification Statistics

Classification Processing Summary

Processed                                               270
Excluded   Missing or out-of-range group codes            0
           At least one missing discriminating variable  100
Used in Output                                          170

Prior Probabilities for Groups

                       Cases Used in Analysis
WELFARE       Prior    Unweighted  Weighted
TOO LITTLE     .406     56          56.000
ABOUT RIGHT    .362     50          50.000
TOO MUCH       .232     32          32.000
Total         1.000    138         138.000

Interpretation: The minimum-cases requirement per group is fulfilled, the smallest group

having 32 cases.

The independent variables can be characterized as useful predictors of membership in the groups

defined by the dependent variable if the cross-validated classification accuracy rate is significantly

higher than the accuracy attainable by chance alone.

Operationally, the cross-validated classification accuracy rate should be 25% or more higher than the

proportional by chance accuracy rate.

The proportional by chance accuracy rate is computed by squaring and summing the proportion of

cases in each group from the table of prior probabilities for groups (0.406² + 0.362² + 0.232² = 0.350,

or 35.0%).

The proportional by chance accuracy criterion is therefore 43.7% (1.25 x 35.0% = 43.7%). Now, let us go to

Classification Results.
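The computation above can be reproduced in a couple of lines of plain Python (the priors come from the Prior Probabilities for Groups table):

```python
# Proportional-by-chance accuracy: square and sum the prior proportions,
# then apply the 25% improvement criterion (1.25 x chance rate).
priors = [0.406, 0.362, 0.232]           # prior probabilities for the three groups
chance = sum(p ** 2 for p in priors)     # 0.406^2 + 0.362^2 + 0.232^2
criterion = 1.25 * chance
print(round(chance, 3), round(criterion, 3))   # 0.35 0.437
```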


Casewise Statistics

Original data:

Case  Actual     Highest Group                                Second Highest Group            Discriminant Scores
      Group      Predicted  P(D>d|G=g)   P(G=g|  Sq. Mahal.   Group  P(G=g|  Sq. Mahal.       Function 1  Function 2
                 Group      p       df   D=d)    to Centroid         D=d)    to Centroid
1     2          2          .662    2    .484      .826       1      .393     1.473              .867        .774
2     2          2          .233    2    .691     2.917       1      .232     5.331             2.076        .479
5     1          2**        .738    2    .535      .607       1      .336     1.764             1.098        .395
6     1          1          .280    2    .636     2.549       3      .224     3.514            -1.645        .956
8     2          2          .950    2    .445      .103       1      .387      .606              .558        .269
12    3          1**        .953    2    .490      .097       2      .273     1.036             -.531        .254
13    1          1          .680    2    .559      .772       3      .235     1.388            -1.032        .570
14    1          1          .953    2    .490      .097       2      .273     1.036             -.531        .254
42    1          1          .941    2    .437      .122       2      .371      .223              .120        .311
43    3          1**        .272    2    .660     2.603       3      .187     4.007            -1.438       1.293
44    2          1**        .970    2    .472      .061       2      .333      .533             -.102        .451
45    1          1          .447    2    .602     1.611       3      .230     2.418            -1.366        .781
47    2          1**        .999    2    .465      .003       2      .320      .523             -.205        .283
48    2          1**        .987    2    .449      .025       2      .316      .500             -.253        .079
49    1          1          .738    2    .565      .607       2      .239     2.100             -.715        .837
50    ungrouped  3          .000    2    .818    16.603       1      .127    21.444            -2.531      -3.778
51    3          2**        .165    2    .482     3.598       3      .315     3.556              .782      -1.898
53    1          2**        .535    2    .539     1.253       1      .355     2.319             1.193        .802

For the original data, squared Mahalanobis distance is based on canonical functions. For the cross-validated data, squared Mahalanobis distance is based on observations.

** Misclassified case

a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.

[Plot: canonical discriminant functions, Function 1 vs. Function 2, showing the individual cases, ungrouped cases, and the group centroids for TOO LITTLE, ABOUT RIGHT, and TOO MUCH.]

Classification Results

                                        Predicted Group Membership
                       WELFARE          TOO LITTLE  ABOUT RIGHT  TOO MUCH   Total
Original         Count TOO LITTLE            43         15          6         64
                       ABOUT RIGHT           26         30          6         62
                       TOO MUCH              17         10          9         36
                       Ungrouped cases        3          3          2          8
                 %     TOO LITTLE          67.2       23.4        9.4      100.0
                       ABOUT RIGHT         41.9       48.4        9.7      100.0
                       TOO MUCH            47.2       27.8       25.0      100.0
                       Ungrouped cases     37.5       37.5       25.0      100.0
Cross-validated  Count TOO LITTLE            43         15          6         64
                       ABOUT RIGHT           26         30          6         62
                       TOO MUCH              17         11          8         36
                 %     TOO LITTLE          67.2       23.4        9.4      100.0
                       ABOUT RIGHT         41.9       48.4        9.7      100.0
                       TOO MUCH            47.2       30.6       22.2      100.0

a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by

the functions derived from all cases other than that case.

b 50.6% of original grouped cases correctly classified.

c 50.0% of cross-validated grouped cases correctly classified.

Interpretation: The cross-validated accuracy rate computed by SPSS was 50.0%, which is greater

than the proportional by chance accuracy criterion of 43.7% (1.25 x 35.0% = 43.7%). The

criterion for classification accuracy is satisfied.
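The usefulness check reduces to a single comparison (plain Python; 0.500 is the cross-validated accuracy reported by SPSS and the priors come from the Prior Probabilities for Groups table):

```python
# Is cross-validated accuracy at least 25% better than chance?
chance = 0.406 ** 2 + 0.362 ** 2 + 0.232 ** 2   # proportional by chance rate, ~0.350
cv_accuracy = 0.500                              # 50.0% from the classification table
print(cv_accuracy >= 1.25 * chance)              # True -> model judged useful
```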

Conclusion: Hours worked, self-employment, and education were the three independent variables we

identified as strong contributors to distinguishing between the groups defined by the dependent

variable.

The model was characterized as useful because it met the by-chance accuracy criterion.

The summary correctly states the specific relationships between the dependent variable groups and

the independent variables we interpreted.

Survey respondents who thought we spend about the right amount of money on welfare worked fewer

hours in the past week than survey respondents who thought we spend too little or too much money

on welfare. Survey respondents who thought we spend too little money on welfare were less likely to

be self-employed than survey respondents who feel we spend too much money on welfare. Survey

respondents who thought we spend about the right amount of money on welfare had completed more

years of school than survey respondents who thought we spend too little or too much money on

welfare.

[Decision flowcharts: a series of yes/no checks for evaluating the discriminant analysis. They cover level of measurement (non-metric dependent variable, metric or dichotomous independent variables), sample size (cases-to-IV ratio of at least 5:1, preferably 20:1; smallest group larger than the number of IVs, preferably 20 or more cases), the presence of sufficient statistically significant functions to distinguish the DV groups, correct identification of the best subset of predictors under stepwise entry, correct interpretation of the relationships between individual IVs and the DV groups, and cross-validated accuracy at least 25% higher than the proportional by chance accuracy rate. Each check leads to a verdict of True, True with caution, False, or Inappropriate application of a statistic.]

Tests of Equality of Group Means

                                  Wilks' Lambda  F       df1  df2  Sig.
AGE OF RESPONDENT                 .958            6.684  1    152  .011
HIGHEST YEAR OF SCHOOL COMPLETED  .999             .091  1    152  .763
RESPONDENTS GENDER                .904           16.069  1    152  .000
TOTAL FAMILY INCOME               .994             .889  1    152  .347

Analysis 1

Box's Test of Equality of Covariance Matrices

Log Determinants

SEEN THRILLER MOVIE IN LAST YEAR  Rank  Log Determinant
NO                                4     9.176
YES                               4     8.573
Pooled within-groups              4     9.077

The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Test Results

Box's M           10.220
F     Approx.       .983
      df1         10
      df2         30348.624
      Sig.          .455

Tests null hypothesis of equal population covariance matrices.

Interpretation: This is a test of variability, as mentioned above, but it is an overall judgement of all

the independent variables taken together.

Box's M = 10.220

Approx. F = 0.983

Conclusion: F-tab = FINV(0.05, 10, 30349) = 1.83. As the calculated F is much lower than the tabular F, the

null hypothesis is accepted. This is also confirmed by the p-value (.455). Therefore, we can conclude that

homogeneity of the group variance-covariance matrices is present.
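The Excel FINV lookup used above can be reproduced with scipy (an assumption: scipy is available; any statistical library or F-table gives the same critical value):

```python
from scipy.stats import f

# Upper 5% critical value of F with df1 = 10 and df2 = 30349, ~1.83.
f_crit = f.ppf(0.95, dfn=10, dfd=30349)
f_calc = 0.983                            # Approx. F from the Box's M table
print(round(f_crit, 2), f_calc < f_crit)  # calculated F below critical value
```

Since the calculated F falls below the critical value, we fail to reject the null hypothesis of equal covariance matrices, as stated above.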

Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         .169        100.0          100.0         .380

a First 1 canonical discriminant functions were used in the analysis.

Interpretation: There is one eigenvalue for each discriminant function. It depicts the relative discriminatory

power of the discriminant functions. For two groups it hardly matters, but when there are more than

two groups it allows us to see which function is better.

In the present example there are two groups, i.e., seen thriller movie last year = 1 and not seen

thriller movie last year = 0. As we know, when there are two groups only one discriminant function can

be extracted from the data, and its eigenvalue (λ) is interpreted as follows:

In simple language, in the two-group case, we can define the Between Sum of Squares (i.e., sum of

squares across groups) and the Within Sum of Squares as follows:

BSS = (z̄0 − z̄)² + (z̄1 − z̄)² + … + (z̄j − z̄)²

WSS = Σi (zi0 − z̄0)² + Σi (zi1 − z̄1)² + … + Σi (zij − z̄j)²

where z̄j is the mean discriminant score of group j and z̄ is the overall mean. The eigenvalue is defined as:

λ = BSS / WSS

So, if λ = 0.00, the model has no discriminatory power (as BSS = 0). The larger the value of λ, the greater

the discriminatory power of the model.

Two groups can be separated by one discriminant function. Three groups require two discriminant

functions. The required number of functions is usually one less than the number of groups.

With 4 independent variables and 2 groups defined by the dependent variable, the maximum

possible number of discriminant functions was 1, which by itself accounts for 100% of the variation.

(Cross-check this with the 3-group case.)
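The rule for the number of functions can be stated in one line (a trivial sketch, but it makes the cross-check with the 3-group welfare example above easy):

```python
# Maximum number of discriminant functions: one less than the number of groups,
# capped by the number of independent variables.
def max_functions(n_groups, n_predictors):
    return min(n_groups - 1, n_predictors)

print(max_functions(2, 4))  # two groups, 4 IVs -> 1 function (thriller example)
print(max_functions(3, 4))  # three groups, 4 IVs -> 2 functions (welfare example)
```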


The significance of the maximum possible number of discriminant functions supports the

interpretation of a solution using 1 discriminant function. Now let us go to Functions at Group

Centriods.

(III) Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    .855           23.440      4   .000

Interpretation: The overall relationship in discriminant analysis is based on the existence of sufficient

statistically significant discriminant functions to separate all of the groups defined by the dependent

variable. As we see, the observed Chi-square (23.440) falls in the critical region, lying well beyond the

critical values (=CHIINV(0.025,4) = 11.143 and =CHIINV(0.975,4) = 0.484). The calculated probability,

p < 0.001, is also less than the 0.05 level of significance.

Wilks' Lambda is given as follows:

Λ = WSS / TSS

which is nothing but the within-group variation relative to the total variation. Hence, a lower value implies

greater homogeneity within groups, which is what we want. But, again, this is the overall Wilks' Lambda,

considering all the independent variables together. Now, let us go to Group Centroids.

                                  Function 1
AGE OF RESPONDENT                  .610
HIGHEST YEAR OF SCHOOL COMPLETED   .138
RESPONDENTS GENDER                 .849
TOTAL FAMILY INCOME               -.139

(V) Structure Matrix

                                  Function 1
RESPONDENTS GENDER                 .791
AGE OF RESPONDENT                  .510
TOTAL FAMILY INCOME               -.186
HIGHEST YEAR OF SCHOOL COMPLETED  -.060

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.

(IV) Functions at Group Centroids

SEEN THRILLER MOVIE IN LAST YEAR  Function 1
NO                                 .254
YES                               -.656

Unstandardized canonical discriminant functions evaluated at group means

Classification Statistics

Classification Processing Summary

Processed                                               270
Excluded   Missing or out-of-range group codes            0
           At least one missing discriminating variable   42
Used in Output                                          228

(II) Prior Probabilities for Groups

SEEN THRILLER                 Cases Used in Analysis
MOVIE IN LAST YEAR   Prior    Unweighted  Weighted
NO                    .721    111         111.000
YES                   .279     43          43.000
Total                1.000    154         154.000

Interpretation: (i) In addition to the requirement for the ratio of cases to independent variables,

discriminant analysis requires a minimum number of cases in the smallest group

defined by the dependent variable. The number of cases in the smallest group must be larger than

the number of independent variables, and should preferably be 20 or more.

The number of cases in the smallest group in this problem is 43, which is larger than the number of

independent variables (4), satisfying the minimum requirement. In addition, the number of cases in

the smallest group satisfies the preferred minimum of 20 cases. Now we go to Wilks' Lambda.


[Casewise Statistics table (original data): for each case, SPSS lists the actual group, the highest and second-highest predicted groups with P(D>d | G=g), P(G=g | D=d), and the squared Mahalanobis distance to each group centroid, together with the discriminant score; misclassified cases are flagged with **. For the original data, squared Mahalanobis distance is based on canonical functions; for the cross-validated data, it is based on observations. Cross validation is done only for those cases in the analysis; in cross validation, each case is classified by the functions derived from all cases other than that case.]


Classification Results

                                          Predicted Group Membership
                       SEEN THRILLER      NO      YES     Total
                       MOVIE IN LAST YEAR
Original         Count NO                  99      12      111
                       YES                 29      14       43
                       Ungrouped cases     60      14       74
                 %     NO                89.2    10.8     100.0
                       YES               67.4    32.6     100.0
                       Ungrouped cases   81.1    18.9     100.0
Cross-validated  Count NO                  99      12      111
                       YES                 30      13       43
                 %     NO                89.2    10.8     100.0
                       YES               69.8    30.2     100.0

a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the

functions derived from all cases other than that case.

b 73.4% of original grouped cases correctly classified.

c 72.7% of cross-validated grouped cases correctly classified.

[Decision flowcharts repeated for this analysis: the same checks as above, covering level of measurement, the cases-to-IV ratio, smallest-group size, sufficient statistically significant functions, correct identification of the predictors, correct interpretation of the IV-DV relationships, and cross-validated accuracy at least 25% higher than the proportional by chance accuracy rate, each leading to a verdict of True, True with caution, False, or Inappropriate application of a statistic.]
