Reliability/Validity
Test-retest: How stable a test is over time; obtained by administering the same test twice to the same
group of individuals.
Parallel forms: Obtained by administering different versions of an assessment tool (both versions must
contain items that probe the same construct, skill, knowledge base, etc.) to the same group of
individuals.
Internal consistency: Used to evaluate the degree to which different test items produce similar results.
Interrater reliability: The degree to which different judges or raters agree in their assessment
decisions. Inter-rater reliability is useful because human observers will not necessarily interpret
answers the same way.
2. Validity – Whether a tool measures what it says it measures; established by comparing it to tools already known to work.
Types of Validity:
Content
- Most often used
- Established by peer review
Criterion
- Predictive: how well scores forecast a future outcome
- Established by comparing to other measures that are already valid
Construct
- Constructs are theoretical notions (e.g., personality)
- Established by correlating scores with some theorized outcome
3. Cronbach's Alpha – A special measure of internal consistency; the higher the value, the
more confidence you can have that the test measures one thing.
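Cronbach's alpha can be computed directly from item scores. The sketch below uses the standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), with a small made-up data set for illustration:

```python
# Cronbach's alpha from raw item scores (illustrative, made-up data).
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
from statistics import pvariance

# rows = respondents, columns = test items
scores = [
    [3, 4, 3, 4],
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
]

k = len(scores[0])                                   # number of items
items = list(zip(*scores))                           # one tuple per item
item_vars = sum(pvariance(col) for col in items)     # sum of item variances
total_var = pvariance([sum(row) for row in scores])  # variance of total scores
alpha = k / (k - 1) * (1 - item_vars / total_var)
print(round(alpha, 3))
```

Because the hypothetical items rise and fall together across respondents, alpha comes out high, which is exactly the "measures one thing" interpretation above.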
Reliability: Repeated trials with the same instrument give consistent results (stability)
Validity: Compare the results of the blood pressure machine to a machine proven to be accurate
X = T + E
X = Observed score
T = True score, the score that you would receive if the test contained no error
E = Error score
Tests can be used to measure something; test interpretation is used to see whether those scores are reliable.
The Normal Distribution
Central Limit Theorem: As the sample size increases, the shape of the sampling distribution will
become more normally distributed.
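The Central Limit Theorem can be seen in a quick simulation. Below is a sketch (with arbitrary sample sizes and trial counts) that draws sample means from a uniform, i.e. non-normal, population; as n grows, the spread of the sampling distribution shrinks:

```python
# Central Limit Theorem sketch: sample means from a uniform (non-normal)
# population; the sampling distribution tightens as sample size n grows.
import random
from statistics import mean, pstdev

random.seed(42)

def sampling_distribution(n, trials=2000):
    """Draw `trials` samples of size n from Uniform(0, 1); return sample means."""
    return [mean(random.random() for _ in range(n)) for _ in range(trials)]

for n in (2, 30):
    means = sampling_distribution(n)
    # mean of the sample means stays near 0.5; spread shrinks roughly as 1/sqrt(n)
    print(n, round(mean(means), 3), round(pstdev(means), 3))
```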
2. z-score
1. Concept: Measures the # of standard deviation that an actual value lies from the mean
2. How to calculate: z = (raw − mean) / standard deviation
-1 to +1 = 68%
-2 to +2= 95%
-3 to +3 = 99.7%
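The formula above is a one-liner. A minimal sketch, with hypothetical numbers (a test score of 85 on a test with mean 70 and standard deviation 10):

```python
# z-score: how many standard deviations a raw value lies from the mean.
def z_score(raw, mean, sd):
    return (raw - mean) / sd

z = z_score(85, 70, 10)
print(z)  # 1.5 -> the score lies 1.5 standard deviations above the mean
```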
Classical Test Theory (CTT)
Definition: assumes that every person has a true score on an item or a scale, the score we would
obtain if we could measure it directly, without error
How it relates to error in measurement
CTT analysis assumes that a person's test score is composed of their "true" score plus some
measurement error.
Error:
Is normally distributed
Uncorrelated with true score
Has a mean of Zero
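The X = T + E model and the error properties above can be simulated. A sketch with a made-up true score of 80 and normally distributed error (mean 0, SD 3):

```python
# Classical test theory sketch: observed = true + error, where error is
# normally distributed with mean zero.
import random

random.seed(1)
true_score = 80                                       # the error-free score T
errors = [random.gauss(0, 3) for _ in range(1000)]    # E ~ Normal(0, 3)
observed = [true_score + e for e in errors]           # X = T + E

# Because error has mean zero, the average of many observed scores
# approaches the true score.
print(round(sum(observed) / len(observed), 1))
```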
5. Confidence interval
1. Definition: a range of population values, derived from the sample value, within which
the parameter lies with a specified probability.
2. How to interpret
If you repeatedly drew samples and computed the interval each time, 95% of
those intervals would contain the population mean
You are accepting a 5% chance that your measurement might not be
representative of the population
3. How to calculate a CI: sample mean ± 1.96 × standard error (for a 95% CI)
4. How to interpret a CI
The larger the sample size, the smaller the standard error
Shows how accurate and precise the sample is as an estimate of the population
parameter.
6. How to differentiate between two different CIs
Multiply the standard error by 1.96 to obtain an estimate of where 95% of the
sample means are expected to fall in the theoretical sampling distribution.
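The mean ± 1.96 × SE recipe can be sketched directly. The data below are made up, and the 1.96 critical value assumes a sample large enough for the normal approximation:

```python
# 95% confidence interval for a mean: sample mean +/- 1.96 * standard error.
from math import sqrt
from statistics import mean, stdev

data = [72, 68, 75, 70, 74, 69, 71, 73, 70, 72]   # illustrative measurements
m = mean(data)
se = stdev(data) / sqrt(len(data))                 # standard error of the mean
lower, upper = m - 1.96 * se, m + 1.96 * se
print(round(lower, 2), round(upper, 2))
```

Note how the standard error, and therefore the width of the interval, shrinks as the sample size grows, which is the point made above.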
Statistical Significance
The risk set by the researcher for rejecting a null hypothesis when it is true
Case II: Answers the question: are two groups different from one another?
1. Null:
Denoted by H0; usually the hypothesis that sample observations result purely
from chance, so any difference or significance you see in a set of data is
due to chance alone.
2. Alternative:
Denoted by H1; the hypothesis that sample observations are influenced by some
non-random cause.
3. Decisions:
1. Reject the null hypothesis (p-value <= alpha) and conclude that the alternative
hypothesis is true at the 95% confidence level (or whatever level you've selected).
2. Fail to reject the null hypothesis (p-value > alpha) and conclude that not enough
evidence is available to suggest the null is false at the 95% confidence level.
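The two decisions above reduce to comparing a p-value to alpha. A minimal sketch (the p-values passed in are hypothetical):

```python
# Hypothesis-test decision rule: compare the p-value to the alpha level.
def decide(p_value, alpha=0.05):
    """Return the decision for a given p-value and significance level."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.03))   # 0.03 <= 0.05: significant at the 95% confidence level
print(decide(0.20))   # 0.20 > 0.05: not enough evidence against the null
```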
4. Error in measurement
1. Type I error: rejecting the null hypothesis when it is true (false positive)
2. Type II error: failing to reject the null hypothesis when it is false (false negative)
3. Understand the decision matrix and the dynamics between managing error and
correct decisions
4. Significance levels
The risk set by the researcher for rejecting a null when it is true
2. Alpha level