
Guidelines in Scale Development


Psychological and Psychometric Testing

Scale Construction

Session 6
Prof. Swati Dhir

Step 1: Determine clearly what it is you want to measure
Step 2: Generate an item pool
Step 3: Determine the format for measurement
Step 4: Have the initial item pool reviewed by experts
Step 5: Consider inclusion of validation items
Step 6: Administer items to a development sample
Step 7: Evaluate the items
Step 8: Optimize scale length

Constructs and Measurement

Purpose
- To design a questionnaire that provides a quantitative measurement of an abstract theoretical variable
- Figuring out how to measure what you want to measure
- Not all surveys are scales; decide whether what you need really is a scale
- Should the scale be based in theory, or should you strike out in a new intellectual direction?
- Should some aspects of the phenomenon be emphasized more than others?
- Good scales possess both validity and reliability

Step 1: Determine clearly what it is you want to measure
- Theory as an aid to clarity
- Specificity as an aid to clarity
- Construct development: a construct is a hypothetical variable composed of different elements that are thought to be related (e.g., five questions tapping job satisfaction)
- Boundaries of the phenomenon must be recognized so that the content of the scale does not drift into unintended domains
- Example: locus of control (LOC) is a widely used construct that concerns who or what influences important outcomes in people's lives

Creating Items
- Writing good items for a scale is definitely an art rather than a science
- Think creatively about the construct you seek to measure
- Make the questions simple, specific, and straightforward
- Avoid biased language (emotional words, emphasized text)
- Multidimensional LOC: locus of control spans oneself, powerful others, and chance or fate; which level of locus the questions address depends largely on the purpose of the measure
- What to include in a measure: items that cross over into a related construct can be problematic
- Avoid double-barreled questions, e.g., "Do you think that the technical service department is prompt and helpful?"
- Avoid non-monotonic questions, e.g., "Only people in the military should be allowed to personally own assault rifles." (respondents at both extremes of the underlying attitude may disagree with such an item)



Creating Items (continued)
- Redundancy: reliability is a function of the number of items, e.g., "I will do almost anything to ensure my child's success" and "No sacrifice is too great if it helps my child achieve success"
- Begin with an item pool roughly twice as large as the final scale (a 2:1 ratio of items)
- Avoid exceptionally lengthy items, and watch the reading difficulty level
- Use reverse coding on a number of your items; a reversed response is scored as (highest value + lowest value) - selected response (see the sketch below)
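A minimal sketch of the reverse-coding rule just described, assuming a 1-to-7 response format; the item names and responses are made up for illustration.

def reverse_score(response: int, low: int = 1, high: int = 7) -> int:
    """Reverse-code a single Likert response: (highest + lowest) - selected response."""
    return high + low - response

responses = {"item1": 6, "item2_reversed": 2, "item3": 5}
scored = {k: (reverse_score(v) if k.endswith("_reversed") else v)
          for k, v in responses.items()}
print(scored)  # item2_reversed: 2 -> 6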

Three Components of Attitudes
- Cognitive component: how a person thinks about an attitude object (product, issue, candidate, idea)
- Affective component: how a person feels about an attitude object
- Behavioral component: a person's behavioral predisposition to respond to an attitude object in a certain way

Items should share a common structure, be self-contained, and have no dependency between items.


On the Importance of Attitudes

Measurement
- The term questionnaire item is used to denote a single question on a survey, corresponding to a single column in a dataset.
- Scales typically denote sets of questions which become mathematical combinations of survey items.
- Example item: "I believe both candidates bring strengths to the table."


Measurement/Scaling Properties
- Assignment: you can assign objects to categories
- Order (magnitude): you can order objects in terms of having more or less of some quality
- Distance (equal intervals): the distance between adjacent points on the scale is identical
- Origin (absolute zero point): zero means something (the absence of a given quality)

Types of Scales
- Nominal: has assignment only (e.g., What is your gender?)
- Ordinal: has assignment and order (e.g., education; What is your income? 5-10k; 11-15k; 16-20k; 20-25k; 25-30k)
- Interval: has assignment, order, and equal intervals (e.g., temperature)
- Hybrid ordinally-interval: like an ordinal scale, but the researcher treats it as an interval scale (e.g., assumes a 1-to-7 scale is interval); commonly used in questionnaires
- Ratio: has assignment, order, equal intervals, and an absolute zero (e.g., number of cars, weight)


Formats for Measurement
- Thurstone scaling: different intensities of the attribute, spaced to represent equal intervals; could be formatted with agree-disagree response options
- Guttman scaling: a series of items tapping progressively higher levels of an attribute, e.g., Do you smoke? Do you smoke more than 10 cigarettes a day? Do you smoke more than a pack?
- Semantic differential: a list of adjective pairs, either unipolar or bipolar, e.g., friendly/not friendly; friendly/hostile
- Likert scale: the item is prepared as a declarative sentence, followed by response options indicating varying degrees of agreement; widely used in measuring opinions, beliefs, and attitudes

Issues in Designing Verbal Rating Scales
Many measures taken by researchers are verbal ratings. What do we need to consider when we develop verbal rating scales?
- Number of categories
- Forced vs. unforced scale
- Balanced vs. unbalanced scale
- Extent of verbal description
- Whether response categories should be numbered
- Comparative vs. noncomparative scale
- Scale direction

Number of Response Categories?
- "To what extent are you satisfied with your current laptop?"
- Most researchers suggest between 5 and 7 categories, for example: Extremely Dissatisfied / Dissatisfied / Somewhat Dissatisfied / Neither / Somewhat Satisfied / Satisfied / Extremely Satisfied
- Too few categories do not give you enough information; too many make it hard for people to discriminate between the options (e.g., a 100-point scale)

Forced vs. Unforced Scale?
- "How likely would you be to buy a car manufactured in Brazil?"
- Forced scale (an even number of options forces the respondent to lean one way or the other): Very Unlikely / Unlikely / Somewhat Unlikely / Somewhat Likely / Likely / Very Likely
- Unforced scale (gives people a neutral option): 1 Very Unlikely / 2 Unlikely / 3 Somewhat Unlikely / 4 Neither / 5 Somewhat Likely / 6 Likely / 7 Very Likely

Balanced vs. Unbalanced Scale?
- "How satisfied are you with your current hair stylist?"
- Balanced scale (same number of positive and negative options): 1 Extremely Dissatisfied / 2 Dissatisfied / 3 Somewhat Dissatisfied / 4 Neither / 5 Somewhat Satisfied / 6 Satisfied / 7 Extremely Satisfied
- Unbalanced scale (here all options are positive): 1 Somewhat Satisfied ... 7 Very Satisfied
- An unbalanced scale can give biased results; unless the distribution is naturally skewed to one side of the scale, use a balanced scale

Extent of Verbal Description?
- "India should invest in infrastructure."
- Label endpoints only: 1 Strongly Disagree ... 7 Strongly Agree
- Or label all options: 1 Strongly Disagree / 2 Moderately Disagree / 3 Slightly Disagree / 4 Neither Agree nor Disagree / 5 Slightly Agree / 6 Moderately Agree / 7 Strongly Agree
- Labeling all options can aid in interpretation


Should Categories Be Numbered?
- "Toyota is an environment-friendly company." Strongly Disagree ... Strongly Agree, numbered either 1 to 7 or -3 to +3
- Numbers can help respondents understand the scale
- A 1-to-7 scale is quite common, but -3 to +3 can help interpretation of the scale (disagree is negative, agree is positive); however, it may overemphasize negativity

Comparative vs. Noncomparative?
- Noncomparative question: "How would you evaluate Pepsodent toothpaste?"
- Comparative question: "Compared to your current brand, how would you evaluate Pepsodent toothpaste?"
- Comparative questions establish the referent and can be useful if you need to know how your product compares to a specific competitor or the customer's current brand
- Noncomparative questions have the advantage of allowing the respondent to create their own referent, which can potentially improve accuracy
- It is a judgment call; pretesting both versions could help identify problems

Direction of Scale?
- Typical direction (lower values and the negative connotation on the left): Strongly Disagree 1 ... Strongly Agree 7
- Some scales are not valenced, so be careful about positioning. For a semantic differential scale with amusing positioning:
  Unpleasant  -2  -1  0  1  2  Pleasant
  Flimsy      -2  -1  0  1  2  Sturdy
  Male        -2  -1  0  1  2  Female
- This arrangement suggests that males are to be evaluated negatively; be careful in designing scales so as not to bias results

Are single items adequate for measurement?
- Suppose an instructor gave single-question exams?
- Suppose the CAT (or GMAT) had only 5 possible scores (similar to A, B, C, D, F grades)?

Composite, or Multiple-Item Scales
- Capture sensitivity to the continuous nature of many subtle differences among consumers
- Simultaneously address concerns of accuracy and consistency

Formative and Reflective Items
- Formative items: can be combined to measure the multiple aspects of a construct, though it is not necessary that respondents answer each item similarly
- Reflective items: measure a single trait, and respondents should answer each item similarly
- Both relate to the larger issue of measurement error: items within a scale are typically interchangeable for reflective items but not for formative items


Formative Scale Items: Satisfaction (airline "JA")
- Timeliness: "My last flight on JA departed on time." / "An airline could always be on time if they made that their priority."
- Pricing: "JA has competitive fares." / "It upsets me to know others on the same flight have paid a lower price for their seat."
- Staff: "JA ticketing personnel are polite." / "JA has friendly reservation operators."
- Service: "I know it's not the airline's fault when a flight is cancelled." / "The two-item restriction on carry-on luggage is insensitive to the needs of today's passengers."
- Travelling comfort: "JA has ample leg-room for me in coach seating." / "JA did not lose my luggage on my last trip." / "I have not been bumped from a JA flight in the last two years."

Reflective Items: Materialism
- "I admire people who own expensive homes, cars, and clothes."
- "I don't place much emphasis on the amount of material objects people own as a sign of success."*
- "Some of the most important achievements in life include acquiring material possessions."
- "The things I own say a lot about how well I'm doing in life."
- "I don't pay much attention to the material objects other people own."*
(* Reverse coded)

Reviewed by Experts
- Ask a panel of experts to rate how relevant they think each item is to what you intend to measure
- Provide the experts with the working definition of the construct
- Experts can evaluate each item's clarity and conciseness (e.g., by rating relevance as high, moderate, or low)
- Experts can also point out ways of tapping the phenomenon that you have failed to include

Consider Inclusion of Validation Items
- Social desirability scale (Strahan and Gerbasi, 1972)
- For detecting undesirable response tendencies we can use the MMPI (Minnesota Multiphasic Personality Inventory), with which response biases can be detected

Validity and Reliability
- Reliability: test-retest method, alternate-forms method, and split-halves method
- Internal validity (no confounds); external validity (generalizes to your target population)
- Content-related evidence: face validity
- Criterion-related evidence: predictive validity, concurrent validity
- Construct-related evidence: convergent validity, discriminant validity

Administer Items to a Development Sample
- Administer the pool of new items (along with any validation items) to a sample of subjects
- The subject sample should be large enough to eliminate subject variance as a significant concern
- If a single scale is to be extracted from a pool of about 20 items, fewer than 300 subjects may suffice
- Entering the data: using computer software, e.g., www.surveymonkey.com, http://www.qualtrics.com


Evaluate the Items

Why a large sample?
- In a small sample, patterns of covariation among the items may not be stable
- The development sample may not represent the population for which the scale is intended
- The level of the attribute present in the sample may differ from that in the intended population
- The sample may be qualitatively rather than quantitatively different from the target population (the relationships among items or constructs may differ from the population)

Item evaluation
- An item should have a high correlation with the true score of the latent variable
- Inspect the correlation matrix: the higher the correlations among items, the higher the individual item reliabilities
- Reverse scoring
- Item-scale correlation: although an uncorrected item-total correlation makes good conceptual sense, in reality the item's inclusion in the scale can inflate the correlation coefficient

Evaluate the Items (continued)
- Item variance is a valuable attribute for a scale: items with relatively high variance are preferred
- Item means close to the center of the range of possible scores are also desirable; otherwise the item might fail to detect certain values of the construct
- Coefficient alpha is an indication of the proportion of variance in the scale scores that is attributable to the true score
- A non-central mean, poor variability, negative correlations among items, low item-scale correlations, and weak inter-item correlations will all tend to reduce alpha (see the sketch below)
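A minimal sketch of how coefficient alpha can be computed from an item-score matrix, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the data are made up.

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents x k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the scale totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 5 respondents x 4 items on a 1-5 scale (made-up numbers)
scores = np.array([[4, 5, 4, 4],
                   [2, 2, 3, 2],
                   [5, 4, 5, 5],
                   [3, 3, 2, 3],
                   [1, 2, 1, 2]])
print(round(cronbach_alpha(scores), 3))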


Optimize Scale Length
- Effect of scale length on reliability: scale alpha depends on the covariation among the items and the number of items; if a scale's reliability is too low, brevity is of no value
- Effects of dropping bad items: if an item has a sufficiently lower-than-average correlation with the other items, dropping it will raise alpha
- Tinkering with scale length: the item whose omission has the least negative (or most positive) effect on alpha is the best one to drop first
- Split samples: if the development sample is sufficiently large, split it into two subsamples; one can serve as the primary development sample and the other can be used to cross-validate the findings. Splitting provides valuable information about scale stability.

Psychological and Psychometric Testing
Session 8: Item Analysis
Prof. Swati Dhir
swati.d@iimranchi.ac.in


Item Analysis: Outline

"In constructing a new test (or shortening or lengthening an existing one), the final set of items is usually identified through a process known as item analysis." (Linda Crocker)

Both the validity and the reliability of any test depend ultimately on the characteristics of its items.

1. Types of test items (selected-response items; constructed-response items)
2. Parts of test items
3. Guidelines for writing test items
4. Item analysis (distracter measures; item difficulty measures; item discrimination measures)

1. Types of Test Items
- Selected response: multiple choice, Likert scale, Q-sort
- Constructed response: free response, fill-in-the-blank, essay tests, portfolios, in-basket technique

A. Selected response
- Multiple choice or forced choice: the task is to choose between set answers. Advantage: ease of scoring, and scoring requires little skill. Disadvantage: may test memory rather than comprehension. The correct response must be distinct, and distracters should not be obvious or ambiguous.
- Likert format: the test-taker chooses a point on a scale that expresses their attitude or belief; the data lend themselves to factor analysis.
- Q-sort: a large set of cards, each with a statement referring to a target; the test-taker sorts the cards into piles (generally 9) in terms of how accurate the statements are as a description of the target.


B. Constructed response items
- Free response: the test-taker responds without constraint and describes what is important to him/her; used to test knowledge or to find out about beliefs and attitudes.
- Fill-in-the-blank.
- Essay tests: preferred when you want to assess the test-taker's ability to think analytically, integrate ideas, and express himself/herself.
- Portfolios: not really a test; collections of things the person being evaluated has produced.
- In-basket technique: used in business; a job candidate gets a set of everyday problems and says how he or she would deal with them; requires expert raters to grade the responses.

Strengths: assess higher-order skills; give more useful feedback to the test-taker; positive influence on study habits; easier to create items.
Weaknesses: time-consuming to use; possible subjectivity in scoring.

2. Parts of Test Items
- Stimulus or item stem: what the subject responds to
- Response format or method: typically multiple choice, Likert, or constructed response
- Conditions governing the response: time limits; allowing probes for ambiguous responses; how the response is recorded
- Procedures for scoring the response: particularly important for constructed-response items

3. Guidelines for Writing Test Items
A. Define clearly
B. Generate a pool of potential items
C. Monitor reading level
D. Use unitary items
E. Avoid long items
F. Break any response set

4. Item Analysis
Four tools: multiple-choice distracter analysis, the item difficulty measure P, the discrimination index D, and the item-total correlation.

A. Multiple-choice distracter measures
- How many people choose each distracter?
- Distracters should be equally attractive
- The correct choice should be based on knowledge; where knowledge is lacking, choices should be random


B. Item Difficulty Measure (P)
The item difficulty for item i, p_i, is defined as the proportion of examinees who get that item correct.

Estimation methods: a method for dichotomously scored items, a method for polytomously scored items, and the grouping method.

Method for dichotomously scored items (difficulty factor):
P = R / N
where P is the difficulty of the item, R is the number of examinees who get the item correct, and N is the total number of examinees.
- P ranges from 0 to 1; the optimal level is about .5
- Although the proportion of examinees passing an item has traditionally been called the item difficulty, it logically should be called item easiness, because the proportion increases as the item becomes easier.
- The higher the difficulty factor, the easier the question: a value of 1 means all students got the question correct, and the item may be too easy.
- If you want the subjects to master the topic area, high difficulty values should be expected (see the sketch below).
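A minimal sketch of the dichotomous difficulty factor P = R/N; the responses are made up for illustration.

import numpy as np

def item_difficulty(item_responses: np.ndarray) -> float:
    """P = R / N for a dichotomously scored item (1 = correct, 0 = incorrect)."""
    return item_responses.mean()

# Made-up responses from 10 examinees on one item
item = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
print(item_difficulty(item))  # 0.7, a fairly easy item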

Guided Practice
What is P for items 1-3? (The table of students' raw scores and responses to items 1-5 is not reproduced here.)

Example 1
There are 80 high school students attending a science achievement test; 61 students pass item 1 and 32 students pass item 10. Calculate the difficulty of items 1 and 10 separately.
P1 = 61/80 = 0.76; P10 = 32/80 = 0.40


Difficulty Factor: what does it mean?
- Item 1: P = .8 (may be too easy)
- Item 2: P = .6 (good)
- Item 3: P = .4 (may be slightly difficult)
- Item 4: P = .5 (optimum)
- Item 5: P = .6 (good)

Method for polytomously scored items:
P = X-bar / X_max
where X-bar is the mean score of all examinees on the item and X_max is the perfect score on that item.

Example 2
The perfect score on an open-ended item is 20 points, and the average score of all examinees on this item is 11 points. What is the item difficulty?
P = 11/20 = .55

Grouping Method (use of extreme groups; T. L. Kelley, 1939)
Upper (U) and lower (L) criterion groups are selected from the extremes of the distribution of test scores or job ratings.
P = (P_U + P_L) / 2
where P_U is the proportion of examinees in the upper group who get the item correct, and P_L is the proportion of examinees in the lower group who get the item correct.

Example 3
There are 371 examinees attending a language test. 64 examinees of the 27% upper extreme group pass item 5, and 33 examinees of the 27% lower extreme group pass the same item. Compute the difficulty of item 5.
(27% of 371 is about 100 examinees per group, so P_U = 64/100 = 0.64, P_L = 33/100 = 0.33, and P = (0.64 + 0.33)/2 is roughly 0.49.)
Key: 0.49 (see the sketch below)
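A minimal sketch of the extreme-groups difficulty computation, reusing the figures from Example 3 (the 27% group size is rounded to 100 examinees).

def grouped_difficulty(correct_upper: int, correct_lower: int, group_size: int) -> float:
    """P = (P_U + P_L) / 2 using Kelley's 27% extreme groups."""
    p_upper = correct_upper / group_size
    p_lower = correct_lower / group_size
    return (p_upper + p_lower) / 2

p = grouped_difficulty(64, 33, group_size=100)  # 27% of 371 is about 100 examinees per group
print(p)  # 0.485, i.e. roughly the 0.49 reported in Example 3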

Correcting Item Difficulty for Chance on Multiple-Choice Items
The difficulty of one five-choice item is .50, and the difficulty of another four-choice item is .53. Which item is more difficult?
CP = (K * P - 1) / (K - 1)
where CP is the corrected item difficulty, P is the observed item difficulty, and K is the number of choices for the item.

Answer:
CP1 = (5 * 0.50 - 1) / (5 - 1) = 1.5 / 4 = 0.38 (rounded)
CP2 = (4 * 0.53 - 1) / (4 - 1) = 1.12 / 3 = 0.37 (rounded)
So the four-choice item is more difficult (see the sketch below).
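A minimal sketch of the correction-for-chance formula, reproducing the two worked values above.

def corrected_difficulty(p: float, k: int) -> float:
    """CP = (K*P - 1) / (K - 1): item difficulty corrected for guessing on a K-choice item."""
    return (k * p - 1) / (k - 1)

print(round(corrected_difficulty(0.50, 5), 2))  # 0.38 (five-choice item)
print(round(corrected_difficulty(0.53, 4), 2))  # 0.37 (four-choice item)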


C. Item Discrimination Measures
Item discrimination refers to the degree to which an item differentiates correctly among test takers in the behavior that the test is designed to measure. Two common measures are the discrimination index D and the item-total correlation.

Discrimination Index D (used for dichotomously scored items; extreme-groups method)
D = U/n_U - L/n_L
where U is the number of examinees in the top group who get the item correct, L is the number in the bottom group who get it correct, n_U is the number of examinees in the top group, and n_L is the number in the bottom group.
Note: to be able to discriminate between different levels of achievement, the difficulty factor should be between .3 and .7.

Example 1
There are 141 students attending a world history test.
(1) If we use a ratio of 27% to determine the upper and lower groups, how many examinees are there in each group? 141 * 0.27 is approximately 38.
(2) If 18 examinees in the upper group answer item 5 correctly and 6 examinees in the lower group answer it correctly, calculate the discrimination index for item 5. D = 18/38 - 6/38 = 0.315 (approximately).
Answer: 38; 0.315 (see the sketch below)

Values of D may range from -1.00 to +1.00.
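A minimal sketch of the discrimination index using 27% extreme groups formed on total test score; the function and data layout are illustrative, and the final line simply re-checks the Example 1 arithmetic.

import numpy as np

def discrimination_index(total_scores: np.ndarray, item: np.ndarray, ratio: float = 0.27) -> float:
    """D = U/n_U - L/n_L, with groups formed from the extremes of the total-score distribution."""
    n_group = int(round(len(total_scores) * ratio))
    order = np.argsort(total_scores)              # ascending by total score
    lower, upper = order[:n_group], order[-n_group:]
    return item[upper].mean() - item[lower].mean()  # proportions correct in each group

print(round(18 / 38 - 6 / 38, 3))  # 0.316, matching the ~0.315 reported in Example 1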

Item-Total Correlation
- Good item (high correlation): people who get the item correct have a high score on the test, and people who get the item wrong have a low score on the test.
- Poor item (low correlation): look at the wording; the item may be testing reading skill.
(A computational sketch follows the D guidelines below.)

Guidelines for Interpretation of the D Value
- D ≥ .40: the item is functioning quite satisfactorily
- .30 ≤ D ≤ .39: little or no revision is required
- .20 ≤ D ≤ .29: the item is marginal and needs revision
- D ≤ .19: the item should be eliminated or completely revised
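A minimal sketch of a corrected item-total correlation (each item correlated with the total of the remaining items, as discussed in the item-evaluation slides); the data are made up.

import numpy as np

def corrected_item_total_correlation(items: np.ndarray, i: int) -> float:
    """Correlation of item i with the total of the remaining items (corrected item-total r)."""
    rest_total = np.delete(items, i, axis=1).sum(axis=1)
    return np.corrcoef(items[:, i], rest_total)[0, 1]

# Toy 0/1 data: 6 examinees x 3 items (made-up numbers)
data = np.array([[1, 1, 1],
                 [1, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0],
                 [0, 0, 1],
                 [0, 0, 0]])
print(round(corrected_item_total_correlation(data, 0), 2))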


Choice Analysis
- Do more examinees choose the correct choice than the wrong choices?
- Do a lot of examinees choose the wrong choices?
- Do more examinees in the upper group than in the lower group choose the correct choice?
- Do more examinees in the upper group than in the lower group choose a wrong choice?
- Is there any item that quite a number of examinees leave unanswered?

Psychological and Psychometric Testing
Sessions 8 & 9
Prof. Swati Dhir

Literature Review (homework)

Excel Add-ins
- Use the Analysis ToolPak to perform complex data analysis
- If the Data Analysis command is not available: File > Options > Add-Ins > Manage > select Analysis ToolPak (check the box and click OK)

Research Methodology (scale-development workflow)
- Item generation
- Content validation
- Adding some criterion-related constructs
- Context of the study
- Inter-item analysis
- Exploratory factor analysis
- Construct validity (convergent and divergent)
- External validity
- Sampling adequacy
- Reliability
- Criterion validity (predictive and concurrent)

Content Validity
- Rating by experts; 80% consensus
- Drop an item if the ratings are not consistent; items may be reworded
- Command: Analyze > Descriptive Statistics > Crosstabs; select rater 1 as row and rater 2 as column; click Statistics, select Kappa, then Continue

Data Entry
- Export files; check the Variable View
- Missing values: Analyze > Missing Value Analysis
- Descriptive statistics (DS): Frequencies (Analyze > Descriptive Statistics > Frequencies)
- Data cleaning

Content Validity: Example
Kappa might be interpreted as follows (Landis & Koch, 1977):
- Kappa < 0: poor agreement
- 0.00 to 0.20: slight agreement
- 0.21 to 0.40: fair agreement
- 0.41 to 0.60: moderate agreement
- 0.61 to 0.80: substantial agreement
- 0.81 to 1.00: almost perfect agreement
(A computational sketch of kappa follows.)
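A minimal sketch of the kappa computation behind the Crosstabs procedure above; the use of scikit-learn's cohen_kappa_score and the ratings themselves are assumptions for illustration.

from sklearn.metrics import cohen_kappa_score

rater1 = [1, 1, 0, 1, 0, 1, 1, 0]   # 1 = relevant, 0 = not relevant (made-up ratings)
rater2 = [1, 1, 0, 1, 1, 1, 0, 0]

kappa = cohen_kappa_score(rater1, rater2)
print(round(kappa, 2))  # interpret against the Landis & Koch table above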

Inter-item Analysis
- Select closely associated items, thereby increasing the reliability of the scale
- Examine the mean, standard deviation, and intercorrelations of the items
- There is no definite cutoff score for adequate variability; however, an SD of about 1 represents an adequate amount of variability for an item to be useful
- Any item that correlates at less than 0.40 with all other items should be dropped
- A mean that is too high for a particular item may point to outliers
- Command: Analyze > Correlate > Bivariate (see the sketch below)
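A minimal sketch of the inter-item screening rule above (the SD check and the 0.40 correlation cutoff), assuming responses sit in a pandas DataFrame; column names and data are made up.

import pandas as pd

def items_to_drop(df: pd.DataFrame, cutoff: float = 0.40) -> list:
    corr = df.corr()
    drop = []
    for item in corr.columns:
        others = corr[item].drop(item)          # correlations with the remaining items
        if (others < cutoff).all():             # below the cutoff with all other items
            drop.append(item)
    return drop

df = pd.DataFrame({"q1": [4, 2, 5, 3, 1], "q2": [4, 3, 5, 3, 2], "q3": [1, 5, 2, 2, 4]})
print(df.describe().loc[["mean", "std"]])       # means and SDs for the variability check
print(items_to_drop(df))                        # q3 would be flagged in this toy example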

Exploratory Factor Analysis
- Validity coefficient: the relationship between a test and a criterion is usually expressed as a correlation called a validity coefficient
- Principal-axis factor analysis with varimax rotation
- Retain items with factor loadings > 0.5
- The square of a factor loading is the proportion of variation in the criterion that we can know from the test scores
- Command: Analyze > Dimension Reduction > Factor


Eigenvalues ≥ 1
- For any matrix of correlations, it is possible to compute a set of numerical values called eigenvalues.
- They reflect the variance accounted for by the principal components, with the first value reflecting the variance explained by the strongest component, the second value the variance explained by the second strongest component, and so on.
- The eigenvalue ≥ 1 rule is the most widely used of all factor-number rules (see the sketch below).

Scree Test
- Involves constructing a graph in which the eigenvalues from the matrix are plotted in descending order.
- The graph is then examined to determine the number of eigenvalues that precede the last major drop.
- (Example scree plot not reproduced.)
- Limitations: there is no clear definition of what constitutes a major drop, and sometimes the data produce a gradually decreasing slope with no major break points. The scree test has been found to function reasonably well in cases where strong principal components are present.
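A minimal sketch of the eigenvalue-greater-than-one rule applied to the correlation matrix of simulated items driven by a single latent factor; the data are simulated purely for illustration, and the sorted eigenvalues are exactly what a scree plot would display.

import numpy as np

rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
items = factor + rng.normal(scale=0.8, size=(200, 6))   # six items driven by one latent factor

corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]    # descending order, as in a scree plot

print(np.round(eigenvalues, 2))
print("factors retained by the eigenvalue >= 1 rule:", int((eigenvalues >= 1).sum()))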

External Validity
- Means and medians should not be very different
- Skewness: a measure of symmetry, or more precisely the lack of symmetry (acceptable if < 2)
- Kurtosis: a measure of whether the data are peaked or flat relative to a normal distribution (acceptable if < 5)
- Command: Analyze > Descriptive Statistics > Descriptives > Options > under Distribution, select Kurtosis and Skewness (see the sketch below)
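A minimal sketch of the skewness and kurtosis screen using scipy.stats; the thresholds are the ones on the slide and the data are simulated.

import numpy as np
from scipy.stats import skew, kurtosis

scale_scores = np.random.default_rng(1).normal(loc=4, scale=0.9, size=300)

print("mean vs. median:", round(scale_scores.mean(), 2), round(np.median(scale_scores), 2))
print("skewness:", round(skew(scale_scores), 2))        # check against the < 2 guideline
print("kurtosis:", round(kurtosis(scale_scores), 2))    # excess kurtosis; check against the < 5 guideline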

Construct-Related Evidence
- Convergent validity: all items load significantly on their respective factors; average loading > 0.7
- Discriminant validity: no cross-loadings; correlations among factors should be low; the variance extracted for a construct should exceed its correlations with other constructs

Sampling Adequacy
- Kaiser-Meyer-Olkin (KMO) measure: checks the case-to-variable ratio for the analysis; range 0 to 1; acceptance limit > 0.6
- Bartlett's test of sphericity: relates to the significance of the study and thereby shows the validity and suitability of the responses collected; should be significant at 0.05 (95% confidence level)
- Command: Analyze > Dimension Reduction > Factor


Internal Consistency: Reliability
- Command: Analyze > Scale > Reliability Analysis (model: Alpha)

Criterion-Related Evidence
- Predictive validity: examine the R-square, beta values, and significance levels, along with the intercorrelations among all the factors
- Command: Analyze > Regression > Linear (specify the DV and IVs) (see the sketch below)
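A minimal sketch of the predictive-validity regression above (R-square, beta, significance), using statsmodels OLS rather than SPSS; the variable names and data are made up.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
scale_score = rng.normal(size=100)                       # IV: scores on the new scale
criterion = 0.6 * scale_score + rng.normal(size=100)     # DV: an external criterion measure

X = sm.add_constant(scale_score)
model = sm.OLS(criterion, X).fit()

print("R-square:", round(model.rsquared, 3))
print("beta:", round(model.params[1], 3), "p-value:", round(model.pvalues[1], 4))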
