
Guidelines in Scale Development


Psychological and Psychometric Testing

Scale Construction

Session 6
Prof. Swati Dhir

Step 1: Determine clearly what it is you want to measure
Step 2: Generate an item pool
Step 3: Determine the format for measurement
Step 4: Have the initial item pool reviewed by experts
Step 5: Consider inclusion of validation items
Step 6: Administer items to a development sample
Step 7: Evaluate the items
Step 8: Optimize scale length

Constructs and Measurement

Purpose
- To design a questionnaire that provides a quantitative measurement of an abstract theoretical variable
- Figuring out how to measure what you want to measure
- Not all surveys are scales; decide whether what you need really is a scale
- Should the scale be based in theory, or should you strike out in a new intellectual direction?
- Should some aspects of the phenomenon be emphasized more than others?
- Good scales possess both validity and reliability

Step 1: Determine clearly what it is you want to measure
- Theory as an aid to clarity
- Specificity as an aid to clarity
- Construct development: a construct is a hypothetical variable composed of different elements that are thought to be related (e.g., five questions tapping job satisfaction)
- Boundaries of the phenomenon must be recognized so that the content of the scale does not drift into unintended domains
- Example: locus of control (LOC) is a widely used construct that concerns who or what influences important outcomes in people's lives

Creating Items
- Writing good items for a scale is definitely an art rather than a science
- Think creatively about the construct you seek to measure
- Make the questions simple, specific, and straightforward
- Avoid biased language (emotional words, emphasized text)
- Multidimensional LOC: locus of control spans oneself, powerful others, and chance or fate; which level of locus the questions address depends largely on the purpose of the measure
- What to include in a measure: items that cross over into a related construct can be problematic
- Avoid double-barreled questions, e.g., "Do you think that the technical service department is prompt and helpful?"
- Avoid non-monotonic questions, e.g., "Only people in the military should be allowed to personally own assault rifles." (respondents at both extremes of the underlying attitude may disagree with such an item)



Creating Items (continued)
- Redundancy: reliability is a function of the number of items, e.g., "I will do almost anything to ensure my child's success" and "No sacrifice is too great if it helps my child achieve success"
- Begin with an item pool roughly twice as large as the final scale (a 2:1 ratio of items)
- Avoid exceptionally lengthy items, and watch the reading difficulty level
- Use reverse coding on a number of your items; a reversed response is scored as (highest value + lowest value) - selected response (see the sketch below)
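A minimal sketch of the reverse-coding rule just described, assuming a 1-to-7 response format; the item names and responses are made up for illustration.

def reverse_score(response: int, low: int = 1, high: int = 7) -> int:
    """Reverse-code a single Likert response: (highest + lowest) - selected response."""
    return high + low - response

responses = {"item1": 6, "item2_reversed": 2, "item3": 5}
scored = {k: (reverse_score(v) if k.endswith("_reversed") else v)
          for k, v in responses.items()}
print(scored)  # item2_reversed: 2 -> 6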

Three Components of Attitudes
- Cognitive component: how a person thinks about an attitude object (product, issue, candidate, idea)
- Affective component: how a person feels about an attitude object
- Behavioral component: a person's behavioral predisposition to respond to an attitude object in a certain way

Items should share a common structure, be self-contained, and have no dependency between items.


On the Importance of Attitudes

Measurement
- The term questionnaire item is used to denote a single question on a survey, corresponding to a single column in a dataset.
- Scales typically denote sets of questions which become mathematical combinations of survey items.
- Example item: "I believe both candidates bring strengths to the table."


Measurement/Scaling Properties
- Assignment: you can assign objects to categories
- Order (magnitude): you can order objects in terms of having more or less of some quality
- Distance (equal intervals): the distance between adjacent points on the scale is identical
- Origin (absolute zero point): zero means something (the absence of a given quality)

Types of Scales
- Nominal: has assignment only (e.g., What is your gender?)
- Ordinal: has assignment and order (e.g., education; What is your income? 5-10k; 11-15k; 16-20k; 20-25k; 25-30k)
- Interval: has assignment, order, and equal intervals (e.g., temperature)
- Hybrid ordinally-interval: like an ordinal scale, but the researcher treats it as an interval scale (e.g., assumes a 1-to-7 scale is interval); commonly used in questionnaires
- Ratio: has assignment, order, equal intervals, and an absolute zero (e.g., number of cars, weight)


Formats for Measurement
- Thurstone scaling: different intensities of the attribute, spaced to represent equal intervals; could be formatted with agree-disagree response options
- Guttman scaling: a series of items tapping progressively higher levels of an attribute, e.g., Do you smoke? Do you smoke more than 10 cigarettes a day? Do you smoke more than a pack?
- Semantic differential: a list of adjective pairs, either unipolar or bipolar, e.g., friendly/not friendly; friendly/hostile
- Likert scale: the item is prepared as a declarative sentence, followed by response options indicating varying degrees of agreement; widely used in measuring opinions, beliefs, and attitudes

Issues in Designing Verbal Rating Scales
Many measures taken by researchers are verbal ratings. What do we need to consider when we develop verbal rating scales?
- Number of categories
- Forced vs. unforced scale
- Balanced vs. unbalanced scale
- Extent of verbal description
- Whether response categories should be numbered
- Comparative vs. noncomparative scale
- Scale direction

Number of Response Categories?
- "To what extent are you satisfied with your current laptop?"
- Most researchers suggest between 5 and 7 categories, for example: Extremely Dissatisfied / Dissatisfied / Somewhat Dissatisfied / Neither / Somewhat Satisfied / Satisfied / Extremely Satisfied
- Too few categories do not give you enough information; too many make it hard for people to discriminate between the options (e.g., a 100-point scale)

Forced vs. Unforced Scale?
- "How likely would you be to buy a car manufactured in Brazil?"
- Forced scale (an even number of options forces the respondent to lean one way or the other): Very Unlikely / Unlikely / Somewhat Unlikely / Somewhat Likely / Likely / Very Likely
- Unforced scale (gives people a neutral option): 1 Very Unlikely / 2 Unlikely / 3 Somewhat Unlikely / 4 Neither / 5 Somewhat Likely / 6 Likely / 7 Very Likely

Balanced vs. Unbalanced Scale?
- "How satisfied are you with your current hair stylist?"
- Balanced scale (same number of positive and negative options): 1 Extremely Dissatisfied / 2 Dissatisfied / 3 Somewhat Dissatisfied / 4 Neither / 5 Somewhat Satisfied / 6 Satisfied / 7 Extremely Satisfied
- Unbalanced scale (here all options are positive): 1 Somewhat Satisfied ... 7 Very Satisfied
- An unbalanced scale can give biased results; unless the distribution is naturally skewed to one side of the scale, use a balanced scale

Extent of Verbal Description?
- "India should invest in infrastructure."
- Label endpoints only: 1 Strongly Disagree ... 7 Strongly Agree
- Or label all options: 1 Strongly Disagree / 2 Moderately Disagree / 3 Slightly Disagree / 4 Neither Agree nor Disagree / 5 Slightly Agree / 6 Moderately Agree / 7 Strongly Agree
- Labeling all options can aid in interpretation


Should Categories Be Numbered?
- "Toyota is an environment-friendly company." Strongly Disagree ... Strongly Agree, numbered either 1 to 7 or -3 to +3
- Numbers can help respondents understand the scale
- A 1-to-7 scale is quite common, but -3 to +3 can help interpretation of the scale (disagree is negative, agree is positive); however, it may overemphasize negativity

Comparative vs. Noncomparative?
- Noncomparative question: "How would you evaluate Pepsodent toothpaste?"
- Comparative question: "Compared to your current brand, how would you evaluate Pepsodent toothpaste?"
- Comparative questions establish the referent and can be useful if you need to know how your product compares to a specific competitor or the customer's current brand
- Noncomparative questions have the advantage of allowing the respondent to create their own referent, which can potentially improve accuracy
- It is a judgment call; pretesting both versions could help identify problems

Direction of Scale?
- Typical direction (lower values and the negative connotation on the left): Strongly Disagree 1 ... Strongly Agree 7
- Some scales are not valenced, so be careful about positioning. For a semantic differential scale with amusing positioning:
  Unpleasant  -2  -1  0  1  2  Pleasant
  Flimsy      -2  -1  0  1  2  Sturdy
  Male        -2  -1  0  1  2  Female
- This arrangement suggests that males are to be evaluated negatively; be careful in designing scales so as not to bias results

Are single items adequate for measurement?
- Suppose an instructor gave single-question exams?
- Suppose the CAT (or GMAT) had only 5 possible scores (similar to A, B, C, D, F grades)?

Composite, or Multiple-Item Scales
- Capture sensitivity to the continuous nature of many subtle differences among consumers
- Simultaneously address concerns of accuracy and consistency

Formative and Reflective Items
- Formative items: can be combined to measure the multiple aspects of a construct, though it is not necessary that respondents answer each item similarly
- Reflective items: measure a single trait, and respondents should answer each item similarly
- Both relate to the larger issue of measurement error: items within a scale are typically interchangeable for reflective items but not for formative items


Formative Scale Items: Satisfaction (airline "JA")
- Timeliness: "My last flight on JA departed on time." / "An airline could always be on time if they made that their priority."
- Pricing: "JA has competitive fares." / "It upsets me to know others on the same flight have paid a lower price for their seat."
- Staff: "JA ticketing personnel are polite." / "JA has friendly reservation operators."
- Service: "I know it's not the airline's fault when a flight is cancelled." / "The two-item restriction on carry-on luggage is insensitive to the needs of today's passengers."
- Travelling comfort: "JA has ample leg-room for me in coach seating." / "JA did not lose my luggage on my last trip." / "I have not been bumped from a JA flight in the last two years."

Reflective Items: Materialism
- "I admire people who own expensive homes, cars, and clothes."
- "I don't place much emphasis on the amount of material objects people own as a sign of success."*
- "Some of the most important achievements in life include acquiring material possessions."
- "The things I own say a lot about how well I'm doing in life."
- "I don't pay much attention to the material objects other people own."*
(* Reverse coded)

Reviewed by Experts
- Ask a panel of experts to rate how relevant they think each item is to what you intend to measure
- Provide the experts with the working definition of the construct
- Experts can evaluate each item's clarity and conciseness (e.g., by rating relevance as high, moderate, or low)
- Experts can also point out ways of tapping the phenomenon that you have failed to include

Consider Inclusion of Validation Items
- Social desirability scale (Strahan and Gerbasi, 1972)
- For detecting undesirable response tendencies we can use the MMPI (Minnesota Multiphasic Personality Inventory), with which response biases can be detected

Validity and Reliability
- Reliability: test-retest method, alternate-forms method, and split-halves method
- Internal validity (no confounds); external validity (generalizes to your target population)
- Content-related evidence: face validity
- Criterion-related evidence: predictive validity, concurrent validity
- Construct-related evidence: convergent validity, discriminant validity

Administer Items to a Development Sample
- Administer the pool of new items (along with any validation items) to a sample of subjects
- The subject sample should be large enough to eliminate subject variance as a significant concern
- If a single scale is to be extracted from a pool of about 20 items, fewer than 300 subjects may suffice
- Entering the data: using computer software, e.g., www.surveymonkey.com, http://www.qualtrics.com


Evaluate the Items

Why a large sample?
- In a small sample, patterns of covariation among the items may not be stable
- The development sample may not represent the population for which the scale is intended
- The level of the attribute present in the sample may differ from that in the intended population
- The sample may be qualitatively rather than quantitatively different from the target population (the relationships among items or constructs may differ from the population)

Item evaluation
- An item should have a high correlation with the true score of the latent variable
- Inspect the correlation matrix: the higher the correlations among items, the higher the individual item reliabilities
- Reverse scoring
- Item-scale correlation: although an uncorrected item-total correlation makes good conceptual sense, in reality the item's inclusion in the scale can inflate the correlation coefficient

Evaluate the Items (continued)
- Item variance is a valuable attribute for a scale: items with relatively high variance are preferred
- Item means close to the center of the range of possible scores are also desirable; otherwise the item might fail to detect certain values of the construct
- Coefficient alpha is an indication of the proportion of variance in the scale scores that is attributable to the true score
- A non-central mean, poor variability, negative correlations among items, low item-scale correlations, and weak inter-item correlations will all tend to reduce alpha (see the sketch below)
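A minimal sketch of how coefficient alpha can be computed from an item-score matrix, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the data are made up.

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents x k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the scale totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 5 respondents x 4 items on a 1-5 scale (made-up numbers)
scores = np.array([[4, 5, 4, 4],
                   [2, 2, 3, 2],
                   [5, 4, 5, 5],
                   [3, 3, 2, 3],
                   [1, 2, 1, 2]])
print(round(cronbach_alpha(scores), 3))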


Optimize Scale Length
- Effect of scale length on reliability: scale alpha depends on the covariation among the items and the number of items; if a scale's reliability is too low, brevity is of no value
- Effects of dropping bad items: if an item has a sufficiently lower-than-average correlation with the other items, dropping it will raise alpha
- Tinkering with scale length: the item whose omission has the least negative (or most positive) effect on alpha is the best one to drop first
- Split samples: if the development sample is sufficiently large, split it into two subsamples; one can serve as the primary development sample and the other can be used to cross-validate the findings. Splitting provides valuable information about scale stability.

Psychological and Psychometric Testing
Session 8: Item Analysis
Prof. Swati Dhir
swati.d@iimranchi.ac.in


Item Analysis: Outline

"In constructing a new test (or shortening or lengthening an existing one), the final set of items is usually identified through a process known as item analysis." (Linda Crocker)

Both the validity and the reliability of any test depend ultimately on the characteristics of its items.

1. Types of test items (selected-response items; constructed-response items)
2. Parts of test items
3. Guidelines for writing test items
4. Item analysis (distracter measures; item difficulty measures; item discrimination measures)

1. Types of Test Items
- Selected response: multiple choice, Likert scale, Q-sort
- Constructed response: free response, fill-in-the-blank, essay tests, portfolios, in-basket technique

A. Selected response
- Multiple choice or forced choice: the task is to choose between set answers. Advantage: ease of scoring, and scoring requires little skill. Disadvantage: may test memory rather than comprehension. The correct response must be distinct, and distracters should not be obvious or ambiguous.
- Likert format: the test-taker chooses a point on a scale that expresses their attitude or belief; the data lend themselves to factor analysis.
- Q-sort: a large set of cards, each with a statement referring to a target; the test-taker sorts the cards into piles (generally 9) in terms of how accurate the statements are as a description of the target.


B. Constructed response items
- Free response: the test-taker responds without constraint and describes what is important to him/her; used to test knowledge or to find out about beliefs and attitudes.
- Fill-in-the-blank.
- Essay tests: preferred when you want to assess the test-taker's ability to think analytically, integrate ideas, and express himself/herself.
- Portfolios: not really a test; collections of things the person being evaluated has produced.
- In-basket technique: used in business; a job candidate gets a set of everyday problems and says how he or she would deal with them; requires expert raters to grade the responses.

Strengths: assess higher-order skills; give more useful feedback to the test-taker; positive influence on study habits; easier to create items.
Weaknesses: time-consuming to use; possible subjectivity in scoring.

2. Parts of Test Items
- Stimulus or item stem: what the subject responds to
- Response format or method: typically multiple choice, Likert, or constructed response
- Conditions governing the response: time limits; allowing probes for ambiguous responses; how the response is recorded
- Procedures for scoring the response: particularly important for constructed-response items

3. Guidelines for Writing Test Items
A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
E. Avoid long items
F. Break any response set

4. Item Analysis
Four tools: multiple-choice distracter analysis, the item difficulty measure P, the discrimination index D, and the item-total correlation.

A. Multiple-choice distracter measures
- How many people choose each distracter?
- Distracters should be equally attractive
- The correct choice should be based on knowledge; where knowledge is lacking, choices should be random


B. Item Difficulty Measure (P)
The item difficulty for item i, p_i, is defined as the proportion of examinees who get that item correct.

Estimation methods: a method for dichotomously scored items, a method for polytomously scored items, and the grouping method.

Method for dichotomously scored items (difficulty factor):
P = R / N
where P is the difficulty of the item, R is the number of examinees who get the item correct, and N is the total number of examinees.
- P ranges from 0 to 1; the optimal level is about .5
- Although the proportion of examinees passing an item has traditionally been called the item difficulty, it logically should be called item easiness, because the proportion increases as the item becomes easier.
- The higher the difficulty factor, the easier the question: a value of 1 means all students got the question correct, and the item may be too easy.
- If you want the subjects to master the topic area, high difficulty values should be expected (see the sketch below).
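A minimal sketch of the dichotomous difficulty factor P = R/N; the responses are made up for illustration.

import numpy as np

def item_difficulty(item_responses: np.ndarray) -> float:
    """P = R / N for a dichotomously scored item (1 = correct, 0 = incorrect)."""
    return item_responses.mean()

# Made-up responses from 10 examinees on one item
item = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
print(item_difficulty(item))  # 0.7, a fairly easy item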

Guided Practice
What is P for items 1-3? (The table of students' raw scores and responses to items 1-5 is not reproduced here.)

Example 1
There are 80 high school students attending a science achievement test; 61 students pass item 1 and 32 students pass item 10. Calculate the difficulty of items 1 and 10 separately.
P1 = 61/80 = 0.76; P10 = 32/80 = 0.40


Difficulty Factor: what does it mean?
- Item 1: P = .8 (may be too easy)
- Item 2: P = .6 (good)
- Item 3: P = .4 (may be slightly difficult)
- Item 4: P = .5 (optimum)
- Item 5: P = .6 (good)

Method for polytomously scored items:
P = X-bar / X_max
where X-bar is the mean score of all examinees on the item and X_max is the perfect score on that item.

Example 2
The perfect score on an open-ended item is 20 points, and the average score of all examinees on this item is 11 points. What is the item difficulty?
P = 11/20 = .55

Grouping Method (use of extreme groups; T. L. Kelley, 1939)
Upper (U) and lower (L) criterion groups are selected from the extremes of the distribution of test scores or job ratings.
P = (P_U + P_L) / 2
where P_U is the proportion of examinees in the upper group who get the item correct, and P_L is the proportion of examinees in the lower group who get the item correct.

Example 3
There are 371 examinees attending a language test. 64 examinees of the 27% upper extreme group pass item 5, and 33 examinees of the 27% lower extreme group pass the same item. Compute the difficulty of item 5.
(27% of 371 is about 100 examinees per group, so P_U = 64/100 = 0.64, P_L = 33/100 = 0.33, and P = (0.64 + 0.33)/2 is roughly 0.49.)
Key: 0.49 (see the sketch below)
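A minimal sketch of the extreme-groups difficulty computation, reusing the figures from Example 3 (the 27% group size is rounded to 100 examinees).

def grouped_difficulty(correct_upper: int, correct_lower: int, group_size: int) -> float:
    """P = (P_U + P_L) / 2 using Kelley's 27% extreme groups."""
    p_upper = correct_upper / group_size
    p_lower = correct_lower / group_size
    return (p_upper + p_lower) / 2

p = grouped_difficulty(64, 33, group_size=100)  # 27% of 371 is about 100 examinees per group
print(p)  # 0.485, i.e. roughly the 0.49 reported in Example 3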

Correcting Item Difficulty for Chance on Multiple-Choice Items
The difficulty of one five-choice item is .50, and the difficulty of another four-choice item is .53. Which item is more difficult?
CP = (K * P - 1) / (K - 1)
where CP is the corrected item difficulty, P is the observed item difficulty, and K is the number of choices for the item.

Answer:
CP1 = (5 * 0.50 - 1) / (5 - 1) = 1.5 / 4 = 0.38 (rounded)
CP2 = (4 * 0.53 - 1) / (4 - 1) = 1.12 / 3 = 0.37 (rounded)
So the four-choice item is more difficult (see the sketch below).
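A minimal sketch of the correction-for-chance formula, reproducing the two worked values above.

def corrected_difficulty(p: float, k: int) -> float:
    """CP = (K*P - 1) / (K - 1): item difficulty corrected for guessing on a K-choice item."""
    return (k * p - 1) / (k - 1)

print(round(corrected_difficulty(0.50, 5), 2))  # 0.38 (five-choice item)
print(round(corrected_difficulty(0.53, 4), 2))  # 0.37 (four-choice item)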


C. Item Discrimination Measures
Item discrimination refers to the degree to which an item differentiates correctly among test takers in the behavior that the test is designed to measure. Two common measures are the discrimination index D and the item-total correlation.

Discrimination Index D (used for dichotomously scored items; extreme-groups method)
D = U/n_U - L/n_L
where U is the number of examinees in the top group who get the item correct, L is the number in the bottom group who get it correct, n_U is the number of examinees in the top group, and n_L is the number in the bottom group.
Note: to be able to discriminate between different levels of achievement, the difficulty factor should be between .3 and .7.

Example 1
There are 141 students attending a world history test.
(1) If we use a ratio of 27% to determine the upper and lower groups, how many examinees are there in each group? 141 * 0.27 is approximately 38.
(2) If 18 examinees in the upper group answer item 5 correctly and 6 examinees in the lower group answer it correctly, calculate the discrimination index for item 5. D = 18/38 - 6/38 = 0.315 (approximately).
Answer: 38; 0.315 (see the sketch below)

Values of D may range from -1.00 to +1.00.
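A minimal sketch of the discrimination index using 27% extreme groups formed on total test score; the function and data layout are illustrative, and the final line simply re-checks the Example 1 arithmetic.

import numpy as np

def discrimination_index(total_scores: np.ndarray, item: np.ndarray, ratio: float = 0.27) -> float:
    """D = U/n_U - L/n_L, with groups formed from the extremes of the total-score distribution."""
    n_group = int(round(len(total_scores) * ratio))
    order = np.argsort(total_scores)              # ascending by total score
    lower, upper = order[:n_group], order[-n_group:]
    return item[upper].mean() - item[lower].mean()  # proportions correct in each group

print(round(18 / 38 - 6 / 38, 3))  # 0.316, matching the ~0.315 reported in Example 1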

Item-Total Correlation
- Good item (high correlation): people who get the item correct have a high score on the test, and people who get the item wrong have a low score on the test.
- Poor item (low correlation): look at the wording; the item may be testing reading skill.
(A computational sketch follows the D guidelines below.)

Guidelines for Interpretation of the D Value
- D ≥ .40: the item is functioning quite satisfactorily
- .30 ≤ D ≤ .39: little or no revision is required
- .20 ≤ D ≤ .29: the item is marginal and needs revision
- D ≤ .19: the item should be eliminated or completely revised
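A minimal sketch of a corrected item-total correlation (each item correlated with the total of the remaining items, as discussed in the item-evaluation slides); the data are made up.

import numpy as np

def corrected_item_total_correlation(items: np.ndarray, i: int) -> float:
    """Correlation of item i with the total of the remaining items (corrected item-total r)."""
    rest_total = np.delete(items, i, axis=1).sum(axis=1)
    return np.corrcoef(items[:, i], rest_total)[0, 1]

# Toy 0/1 data: 6 examinees x 3 items (made-up numbers)
data = np.array([[1, 1, 1],
                 [1, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0],
                 [0, 0, 1],
                 [0, 0, 0]])
print(round(corrected_item_total_correlation(data, 0), 2))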


Choice Analysis
- Do more examinees choose the correct choice than the wrong choices?
- Do a lot of examinees choose the wrong choices?
- Do more examinees in the upper group than in the lower group choose the correct choice?
- Do more examinees in the upper group than in the lower group choose a wrong choice?
- Is there any item that quite a number of examinees leave unanswered?

Psychological and Psychometric Testing
Sessions 8 & 9
Prof. Swati Dhir

Literature Review (homework)

Excel Add-ins
- Use the Analysis ToolPak to perform complex data analysis
- If the Data Analysis command is not available: File > Options > Add-Ins > Manage > select Analysis ToolPak (check the box and click OK)

Research Methodology (scale-development workflow)
- Item generation
- Content validation
- Adding some criterion-related constructs
- Context of the study
- Inter-item analysis
- Exploratory factor analysis
- Construct validity (convergent and divergent)
- External validity
- Sampling adequacy
- Reliability
- Criterion validity (predictive and concurrent)

Content Validity
- Rating by experts; 80% consensus
- Drop an item if the ratings are not consistent; items may be reworded
- Command: Analyze > Descriptive Statistics > Crosstabs; select rater 1 as row and rater 2 as column; click Statistics, select Kappa, then Continue

Data Entry
- Export files; check the Variable View
- Missing values: Analyze > Missing Value Analysis
- Descriptive statistics (DS): Frequencies (Analyze > Descriptive Statistics > Frequencies)
- Data cleaning

Content Validity: Example
Kappa might be interpreted as follows (Landis & Koch, 1977):
- Kappa < 0: poor agreement
- 0.00 to 0.20: slight agreement
- 0.21 to 0.40: fair agreement
- 0.41 to 0.60: moderate agreement
- 0.61 to 0.80: substantial agreement
- 0.81 to 1.00: almost perfect agreement
(A computational sketch of kappa follows.)
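A minimal sketch of the kappa computation behind the Crosstabs procedure above; the use of scikit-learn's cohen_kappa_score and the ratings themselves are assumptions for illustration.

from sklearn.metrics import cohen_kappa_score

rater1 = [1, 1, 0, 1, 0, 1, 1, 0]   # 1 = relevant, 0 = not relevant (made-up ratings)
rater2 = [1, 1, 0, 1, 1, 1, 0, 0]

kappa = cohen_kappa_score(rater1, rater2)
print(round(kappa, 2))  # interpret against the Landis & Koch table above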

Inter-item Analysis
- Select closely associated items, thereby increasing the reliability of the scale
- Examine the mean, standard deviation, and intercorrelations of the items
- There is no definite cutoff score for adequate variability; however, an SD of about 1 represents an adequate amount of variability for an item to be useful
- Any item that correlates at less than 0.40 with all other items should be dropped
- A mean that is too high for a particular item may point to outliers
- Command: Analyze > Correlate > Bivariate (see the sketch below)
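A minimal sketch of the inter-item screening rule above (the SD check and the 0.40 correlation cutoff), assuming responses sit in a pandas DataFrame; column names and data are made up.

import pandas as pd

def items_to_drop(df: pd.DataFrame, cutoff: float = 0.40) -> list:
    corr = df.corr()
    drop = []
    for item in corr.columns:
        others = corr[item].drop(item)          # correlations with the remaining items
        if (others < cutoff).all():             # below the cutoff with all other items
            drop.append(item)
    return drop

df = pd.DataFrame({"q1": [4, 2, 5, 3, 1], "q2": [4, 3, 5, 3, 2], "q3": [1, 5, 2, 2, 4]})
print(df.describe().loc[["mean", "std"]])       # means and SDs for the variability check
print(items_to_drop(df))                        # q3 would be flagged in this toy example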

Exploratory Factor Analysis
- Validity coefficient: the relationship between a test and a criterion is usually expressed as a correlation called a validity coefficient
- Principal-axis factor analysis with varimax rotation
- Retain items with factor loadings > 0.5
- The square of a factor loading is the proportion of variation in the criterion that we can know from the test scores
- Command: Analyze > Dimension Reduction > Factor


Eigenvalues ≥ 1
- For any matrix of correlations, it is possible to compute a set of numerical values called eigenvalues.
- They reflect the variance accounted for by the principal components, with the first value reflecting the variance explained by the strongest component, the second value the variance explained by the second strongest component, and so on.
- The eigenvalue ≥ 1 rule is the most widely used of all factor-number rules (see the sketch below).

Scree Test
- Involves constructing a graph in which the eigenvalues from the matrix are plotted in descending order.
- The graph is then examined to determine the number of eigenvalues that precede the last major drop.
- (Example scree plot not reproduced.)
- Limitations: there is no clear definition of what constitutes a major drop, and sometimes the data produce a gradually decreasing slope with no major break points. The scree test has been found to function reasonably well in cases where strong principal components are present.
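A minimal sketch of the eigenvalue-greater-than-one rule applied to the correlation matrix of simulated items driven by a single latent factor; the data are simulated purely for illustration, and the sorted eigenvalues are exactly what a scree plot would display.

import numpy as np

rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
items = factor + rng.normal(scale=0.8, size=(200, 6))   # six items driven by one latent factor

corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]    # descending order, as in a scree plot

print(np.round(eigenvalues, 2))
print("factors retained by the eigenvalue >= 1 rule:", int((eigenvalues >= 1).sum()))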

External Validity
- Means and medians should not be very different
- Skewness: a measure of symmetry, or more precisely the lack of symmetry (acceptable if < 2)
- Kurtosis: a measure of whether the data are peaked or flat relative to a normal distribution (acceptable if < 5)
- Command: Analyze > Descriptive Statistics > Descriptives > Options > under Distribution, select Kurtosis and Skewness (see the sketch below)
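A minimal sketch of the skewness and kurtosis screen using scipy.stats; the thresholds are the ones on the slide and the data are simulated.

import numpy as np
from scipy.stats import skew, kurtosis

scale_scores = np.random.default_rng(1).normal(loc=4, scale=0.9, size=300)

print("mean vs. median:", round(scale_scores.mean(), 2), round(np.median(scale_scores), 2))
print("skewness:", round(skew(scale_scores), 2))        # check against the < 2 guideline
print("kurtosis:", round(kurtosis(scale_scores), 2))    # excess kurtosis; check against the < 5 guideline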

Construct-Related Evidence
- Convergent validity: all items load significantly on their respective factors; average loading > 0.7
- Discriminant validity: no cross-loadings; correlations among factors should be low; the variance extracted for a construct should exceed its correlations with other constructs

Sampling Adequacy
- Kaiser-Meyer-Olkin (KMO) measure: checks the case-to-variable ratio for the analysis; range 0 to 1; acceptance limit > 0.6
- Bartlett's test of sphericity: relates to the significance of the study and thereby shows the validity and suitability of the responses collected; should be significant at 0.05 (95% confidence level)
- Command: Analyze > Dimension Reduction > Factor


Internal Consistency: Reliability
- Command: Analyze > Scale > Reliability Analysis (model: Alpha)

Criterion-Related Evidence
- Predictive validity: examine the R-square, beta values, and significance levels, along with the intercorrelations among all the factors
- Command: Analyze > Regression > Linear (specify the DV and IVs) (see the sketch below)
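A minimal sketch of the predictive-validity regression above (R-square, beta, significance), using statsmodels OLS rather than SPSS; the variable names and data are made up.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
scale_score = rng.normal(size=100)                       # IV: scores on the new scale
criterion = 0.6 * scale_score + rng.normal(size=100)     # DV: an external criterion measure

X = sm.add_constant(scale_score)
model = sm.OLS(criterion, X).fit()

print("R-square:", round(model.rsquared, 3))
print("beta:", round(model.params[1], 3), "p-value:", round(model.pvalues[1], 4))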
