
Motivational Research

By Jerry W. Thomas, Decision Analyst


The global leader in analytical research systems
604 Avenue H East
Arlington, TX 76011-3100, USA
(1) 817.640.6166 or 1.800. ANALYSIS
http://www.decisionanalyst.com
© 1998 Decision Analyst, Inc.

Motivational research is a type of marketing research that attempts to explain why consumers behave as they do. It seeks to discover and comprehend what consumers do not fully understand about themselves. Implicitly, motivational research assumes the existence of underlying or unconscious motives that influence consumer behavior, and it attempts to identify forces and influences that consumers may not be aware of (e.g., cultural factors, sociological forces). Typically, these unconscious motives (or beyond-awareness reasons) are intertwined with and complicated by conscious motives, cultural biases, economic variables, and fashion trends (broadly defined). Motivational research attempts to sift through all of these influences and factors to unravel the mystery of consumer behavior as it relates to a specific product or service, so that the marketer better understands the target audience and how to influence that audience.

Motivational research is most valuable when powerful underlying motives are suspected of exerting influence upon consumer behavior. Products and services that relate, or might relate, to attraction of the opposite sex, to personal adornment, to status or self-esteem, to power, to death, to fears, or to social taboos are all likely candidates for motivational research. For example, why do women tend to increase their expenditures on clothing and personal adornment products as they approach the age of 50 to 55? The reasons relate to the loss of youth's beauty and the loss of fertility, and to related fears of losing their husbands' love. It is also a time of life when discretionary incomes are rising (the children are leaving the nest). Other motives are at work as well, but a standard marketing research survey would never reveal these motives, because most women are not really aware of why their interest in expensive adornments increases at this particular point in their lives.

Even benign, or low-involvement, product categories can often benefit from the insights provided by motivational research. Typically, in low-involvement product categories, perception variables and cultural influences are most important. Our culture is a system of rules and "regulations" that simplify and optimize our existence. Cultural rules govern how we squeeze a tube of toothpaste, how we open packages, how we use a bath towel, who does what work, etc. Most of us are relatively unaware of these cultural rules. Understanding how these cultural rules influence a particular product can be extremely valuable information for the marketer.

The Major Techniques

The three major motivational research techniques are observation, focus groups, and depth interviews.

Observation can be a fruitful method of deriving hypotheses about human motives. Anthropologists have pioneered the development of this technique. All of us are familiar with anthropologists living with the "natives" to understand their behavior. This same systematic observation can produce equally insightful results about consumer behavior. Observation can be accomplished in person or sometimes through the convenience of video. Usually, personal observation is simply too expensive, and most consumers don't want an anthropologist living in their household for a month or two.

It is easier to observe consumers in buying situations than in their homes, and here the observation can be in person or by video cameras. Generally, video cameras are less intrusive than an in-person observer. Finding a representative set of cooperative stores, however, is not an easy task, and the installation and maintenance of video cameras is not without its difficulties. In-store observers can be used as well, so long as they have some "cover" that makes their presence less obvious. But observation by video or human eye cannot answer every question. Generally, observation must be supplemented by focus groups or depth interviews to fully understand why consumers are doing what they do.

The Focus Group

The focus group in the hands of a skilled moderator can be a valuable motivational research technique. To reach its full motivational potential, the group interview must be largely nondirective in style, and the group must achieve spontaneous interaction. It is the mutual reinforcement within the group (the group excitement and spontaneity) that produces the revelations and behaviors that reveal underlying motives. A focus group dominated or actively led by the moderator, with much direct questioning of respondents, will rarely yield motivational insights. But the focus group is a legitimate motivational technique.

The Depth Interview

The heart and soul of motivational research is the depth interview: a lengthy (one to two hours), one-on-one, personal interview, conducted directly by the motivational researcher. Much of the power of the depth interview depends upon the insight, sensitivity, and skill of the motivational researcher. The interviewing task cannot be delegated to traditional marketing research interviewers, who have no training in motivational techniques.

During the personal interview, the motivational researcher strives to create an empathic relationship with each respondent: a feeling of rapport, mutual trust, and understanding. The researcher creates a climate in which the respondent feels free to express his feelings and his thoughts, without fear of embarrassment or rejection. The researcher conveys a feeling that the respondent and his opinions are important and worthwhile, no matter what those opinions are. The motivational researcher is accepting, nonthreatening, and supportive. The emotional empathy between motivational researcher and respondent is the single most important determinant of an effective interview.

The motivational researcher relies heavily upon nondirective interviewing techniques. Her goal is to get the respondent to talk, and keep talking. The researcher tends to introduce general topics, rather than ask direct questions. She probes by raising her eyebrows, by a questioning look upon her face, by paraphrasing what the respondent has said, or by reflecting the respondent's own words back to the respondent in a questioning tone. Nondirective techniques are the least threatening (and the least biasing) to the respondent.

Projective techniques can play an important role in motivational research. Sometimes a respondent can see in others what he cannot see, or will not admit, about himself. The motivational researcher often asks the respondent to tell a story, play a role, draw a picture, complete a sentence, or associate words with a stimulus. Photographs, product samples, packages, and advertisements can also be used as stimuli to evoke additional feelings, imagery, and comment.

During the interview, the researcher watches for clues that might indicate that a "sensitive nerve" has been touched. Long pauses by the respondent, slips of the tongue, fidgeting, variations in voice pitch, strong emotions, facial expressions, eye movements, avoidance of a question, fixation on an issue, and body language are some of the clues the motivational researcher keys on. These "sensitive" topics and issues are then the focus of additional inquiry and exploration later in the interview.

Each interview is tape-recorded and transcribed. A typical motivational study, consisting of 30 to 50 depth interviews, yields 1,000 to 2,000 pages of typed verbatim dialogue. During the interview, the motivational researcher makes notes about the respondent's behavior, mannerisms, physical appearance, personality characteristics, and nonverbal communication. These notes become a road map to help the researcher understand and interpret the verbatim transcript of the interview.

The Analysis

The motivational researcher reads and rereads the hundreds of pages of verbatim respondent dialogue. As she reads, the researcher looks for systematic patterns of response. She identifies logical inconsistencies or apparent contradictions. She compares direct responses against projective responses. She notes the consistent use of unusual words or phrases. She studies the explicit content of the interview and contemplates its meaning in relation to the implicit content. She searches for what is not said as diligently as she does for what is said. Like a detective, she sifts through the clues and the evidence to deduce the forces and motives influencing consumer behavior. No one clue or piece of evidence is treated as being very important. It is the convergence of evidence and facts that leads to significant conclusions. In the scientific tradition, empiricism and logic must come together and make sense.

The analysis begins at the cultural level. Cultural values and influences are the ocean in which we all swim and of which most of us are completely unaware. What we eat, the way we eat, how we dress, what we think and feel, and the language we speak are dimensions of our culture. These taken-for-granted cultural dimensions are the basic building blocks that begin the motivational researcher's analysis. The culture is the context that must be understood before the behavior of individuals within the context can be understood. Every product has cultural values and rules that influence its perception and its usage.

Once the cultural context is reasonably well understood, the next analytic step is the exploration of the unique motivations that relate to the product category. What psychological needs does the product fulfill? Does the product have any social overtones or anthropological significance? Does the product relate to one's status aspirations, to competitive drives, to feelings of self-esteem, to security needs? Are masochistic motives involved? Does the product have deep symbolic significance? And so on. Some of these motives must be inferred, since respondents are often unaware of why they do what they do. But the analysis is not complete.

The last major dimension that must be understood is the business environment, including competitive forces, brand perceptions and images, relative market shares, the role of advertising in the category, and trends in the marketplace. Only part of this business environment knowledge can come from the respondent, of course, but understanding the business context is crucial to the interpretation of consumer motives in a way that will lead to useful results. Understanding the consumer's motives is worthless unless somehow that knowledge can be translated into actionable marketing and advertising recommendations.

Sometimes a motivational study is followed by quantitative surveys to confirm the motivational hypotheses as well as to measure the relative extent of those motives in the general population. But many times motivational studies cannot be proved or disproved by survey research, especially when completely unconscious motives are involved. In these cases, the final evaluation of the hypothesized motives is by the testing of concepts (or advertising alternatives) that address the different motives, or by other types of contrived experiments.

One final note is relevant to the successful conduct of motivational research. It is critically important that the motivational researcher not be overly theoretical. An eclectic, wide-ranging, and open-minded philosophical perspective is best. The researcher should not formulate any "cast in stone" hypotheses before she conducts the motivational study. Strongly held hypotheses, or rigid adherence to theory, will doom a motivational study to failure. Too often we see what we set out to see, or find that for which we search, whether it exists or not. An objective, open, unfettered mind is the motivational researcher's greatest asset.

About the Author
Jerry W. Thomas (jthomas@decisionanalyst.com) is the President/CEO of Decision Analyst.
Decision Analyst is a leading international marketing research and marketing consulting firm.
The company specializes in advertising testing, strategy research, new product development,
and advanced modeling for marketing decision optimization. The author may be reached at
800.262.5974 or 1.817.640.6166.

Chi-Square Test

Statistical method to test whether two (or more) variables are: (1) independent or (2) homogeneous. The chi-square test for independence examines whether knowing the value of one variable helps to estimate the value of another variable. The chi-square test for homogeneity examines whether two populations have the same proportion of observations with a common characteristic. Though the formula is the same for both tests, the underlying logic and sampling procedures vary.

Encyclopedia of Public Health:


Chi-Square Test

Studies often collect data on categorical variables that can be summarized as a series of counts. These counts are
commonly arranged in a tabular format known as a contingency table. For example, a study designed to determine
whether or not there is an association between cigarette smoking and asthma might collect data that could be
assembled into a 2×2 table. In this case, the two columns could be defined by whether the subject smoked or not,
while the rows could represent whether or not the subject experienced symptoms of asthma. The cells of the table
would contain the number of observations or patients as defined by these two variables.

The chi-square test statistic can be used to evaluate whether there is an association between the rows and columns
in a contingency table. More specifically, this statistic can be used to determine whether there is any difference
between the study groups in the proportions of the risk factor of interest. Returning to our example, the chi-square
statistic could be used to test whether the proportion of individuals who smoke differs by asthmatic status.

The chi-square test statistic is designed to test the null hypothesis that there is no association between the rows and columns of a contingency table. This statistic is calculated by first obtaining, for each cell in the table, the expected number of events that will occur if the null hypothesis is true. When the observed number of events deviates significantly from the expected counts, then it is unlikely that the null hypothesis is true, and it is likely that there is a row-column association. Conversely, a small chi-square value indicates that the observed values are similar to the expected values, leading us to conclude that the null hypothesis is plausible. The general formula used to calculate the chi-square (χ²) test statistic is as follows (Equation 1):

    χ² = Σ (O − E)² / E,   with df = (r − 1) × (c − 1),

where O = observed count in a category; E = expected count in the category under the null hypothesis; df = degrees of freedom; and c, r represent the number of columns and rows in the contingency table.

Table 1
Observed values for data presented in a two-by-two table
SOURCE: Courtesy of author.

                            Variable 1
    Variable 2          Yes       No       Total
    Yes                  a         b        a+b
    No                   c         d        c+d
    Total               a+c       b+d        n

The value of the chi-square statistic cannot be negative and can assume values from zero to infinity. The p-value for this test statistic is based on the chi-square probability distribution and is generally extracted from published tables or estimated using computer software programs. The p-value represents the probability that the chi-square test statistic is as extreme as or more extreme than observed if the null hypothesis were true. As with the t and F distributions, there is a different chi-square distribution for each possible value of degrees of freedom. Chi-square distributions with a small number of degrees of freedom are highly skewed; however, this skewness is attenuated as the number of degrees of freedom increases. In general, the degrees of freedom for tests of hypothesis that involve an r×c contingency table is equal to (r−1)×(c−1); thus, for any 2×2 table, the degrees of freedom is equal to one. A chi-square distribution with one degree of freedom is the distribution of the square of a standard normal variable, and, consequently, either the chi-square or the standard normal table can be used to determine the corresponding p-value.

Table 2
Expected values for data presented in a two-by-two table
SOURCE: Courtesy of author.

                               Variable 1
    Variable 2          Yes                 No               Total
    Yes            (a+b)(a+c)/n        (a+b)(b+d)/n           a+b
    No             (c+d)(a+c)/n        (c+d)(b+d)/n           c+d
    Total               a+c                 b+d                n

The chi-square test is most widely used to conduct tests of hypothesis that involve data that can be presented in a
2×2 table. Indeed, this tabular format is a feature of the case-control study design that is commonly used in public
health research. Within this contingency table, we could denote the observed counts as shown in Table 1. Under the null hypothesis of no association between the two variables, the expected number in each cell is calculated from the observed values using the formula outlined in Table 2.

The use of the chi-square test can be illustrated by using hypothetical data from a study investigating the
association between smoking and asthma among adults observed in a community health clinic. The results
obtained from classifying 150 individuals are shown in Table 3. As Table 3 shows, among asthmatics the proportion
of smokers was 40 percent (20/50), while the corresponding proportion among asymptomatic individuals was 22
percent (22/100). By applying the formula presented in Table 2, for the observed cell counts of 20, 30, 22, and 78
(Table 3) the corresponding expected counts are 14, 36, 28, and 72. The observed and expected counts can then
be used to calculate the chi-square test statistic as outlined in Equation 1. The resulting value of the chi-square

Table 3
Hypothetical data showing chi-square test
SOURCE: Courtesy of author.

                              Ever smoke cigarettes
    Symptoms of asthma        Yes       No       Total
    Yes                        20       30         50
    No                         22       78        100
    Total                      42      108        150

test statistic is approximately 5.36, and the associated p-value for this chi-square distribution that has one degree
of freedom is 0.02. Therefore, if there was truly no association between smoking and asthma, there is a 2 out of
100 probability of observing a difference in proportions that is at least as large as 18 percent (40%–22%) by
chance alone. We would therefore conclude that the observed difference in the proportions is unlikely to be
explained by chance alone, and consider this result statistically significant.
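
As a quick check of the arithmetic above, the same result can be reproduced in a few lines of Python. This is a minimal sketch (assuming SciPy is installed) using scipy.stats.chi2_contingency with the continuity correction turned off so that it matches the uncorrected formula in Equation 1; the counts are the hypothetical data from Table 3.

    # Sketch: chi-square test of independence for the Table 3 data (assumes SciPy is installed).
    import numpy as np
    from scipy.stats import chi2_contingency

    # Observed counts: rows = asthma symptoms (yes, no), columns = ever smoked (yes, no).
    observed = np.array([[20, 30],
                         [22, 78]])

    chi2, p, dof, expected = chi2_contingency(observed, correction=False)

    print(expected)                  # approximately 14, 36, 28, 72, as in the text
    print(round(chi2, 2))            # approximately 5.36
    print(dof)                       # 1
    print(round(p, 3))               # approximately 0.02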

Because the construction of the chi-square test makes use of discrete data to estimate a continuous distribution, some authors will apply a continuity correction when calculating this statistic. Specifically,

    χ² = Σ (|Oᵢ − Eᵢ| − 0.5)² / Eᵢ,

where |Oᵢ − Eᵢ| is the absolute value of the difference between Oᵢ and Eᵢ, and the term 0.5 in the numerator is often referred to as the Yates correction factor. This correction factor serves to reduce the chi-square value and, therefore, increases the resulting p-value. It has been suggested that this correction yields an overly conservative test that may fail to reject a false null hypothesis. However, as long as the sample size is large, the effect of the correction factor is negligible.
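
For comparison, the same SciPy call sketched above applies Yates' correction when correction=True (its default for 2×2 tables); the effect on the Table 3 data is visible in the smaller statistic and larger p-value.

    # Sketch: Yates' continuity correction applied to the Table 3 data (assumes SciPy is installed).
    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[20, 30],
                         [22, 78]])

    chi2_corrected, p_corrected, _, _ = chi2_contingency(observed, correction=True)
    print(round(chi2_corrected, 2))   # approximately 4.50, smaller than the uncorrected 5.36
    print(round(p_corrected, 3))      # approximately 0.034, larger than the uncorrected 0.02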

When there is a small number of counts in the table, the use of the chi-square test statistic may not be appropriate. Specifically, it has been recommended that this test not be used if any cell in the table has an expected count of less than one, or if more than 20 percent of the cells have an expected count of less than five. Under this scenario, Fisher's exact test is recommended for conducting tests of hypothesis.

(SEE ALSO: Normal Distributions; Probability Model; Sampling; Statistics for Public Health; T-Test)


— PAUL J. VILLENEUVE

Wikipedia:
Chi-square test

"Chi-square test" also known as Pearson's chi-square test.

A chi-square test (also chi-squared or χ² test) is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true, or any in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-square distribution as closely as desired by making the sample size large enough.

Some examples of chi-squared tests where the chi-square distribution is only approximately valid:

• Pearson's chi-square test, also known as the chi-square goodness-of-fit test or chi-square test for independence. When mentioned without any modifiers or without other precluding context, this test is usually understood (for an exact test used in place of χ², see Fisher's exact test).
• Yates' chi-square test, also known as Yates' correction for continuity.
• Mantel–Haenszel chi-square test.
• Linear-by-linear association chi-square test.
• The portmanteau test in time-series analysis, testing for the presence of autocorrelation.
• Likelihood-ratio tests in general statistical modelling, for testing whether there is evidence of the need to move from a simple model to a more complicated one (where the simple model is nested within the complicated one).

One case where the distribution of the test statistic is an exact chi-square distribution is the test that the variance of a normally distributed population has a given value based on a sample variance. Such a test is uncommon in practice because values of variances to test against are seldom known exactly.


Chi-square test for variance in a normal population

If a sample of size n is taken from a population having a normal distribution, then there is a well-known result (see the distribution of the sample variance) which allows a test to be made of whether the variance of the population has a pre-determined value. For example, a manufacturing process might have been in stable condition for a long period, allowing a value for the variance to be determined essentially without error. Suppose that a variant of the process is being tested, giving rise to a small sample of product items whose variation is to be tested. The test statistic T in this instance could be set to be the sum of squares about the sample mean, divided by the nominal value for the variance (i.e., the value to be tested as holding). Then T has a chi-square distribution with n−1 degrees of freedom. For example, if the sample size is 21, the acceptance region for T for a significance level of 5% is the interval 9.59 to 34.17.
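
The acceptance region quoted above can be recomputed directly from the chi-square quantiles. This is a minimal sketch assuming SciPy is available; the numbers (n = 21, 5% two-sided significance) come from the example in the paragraph.

    # Sketch: acceptance region for the chi-square variance test described above (assumes SciPy).
    from scipy.stats import chi2

    n = 21                               # sample size from the example
    df = n - 1                           # degrees of freedom
    alpha = 0.05                         # two-sided significance level

    lower = chi2.ppf(alpha / 2, df)      # approximately 9.59
    upper = chi2.ppf(1 - alpha / 2, df)  # approximately 34.17
    print(lower, upper)

    # Usage: with sample data x and a nominal variance sigma0_sq, the test statistic is
    #   T = sum((x - mean(x))**2) / sigma0_sq,
    # and the null hypothesis is not rejected when lower <= T <= upper.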

See also

• Chi-squared test nomogram
• G-test
• Likelihood-ratio tests are approximately chi-square tests
• McNemar's test, related to a chi-square test
• Pearson's chi-square test for a more detailed explanation
• T-test
• Wald test can be evaluated against a chi-square distribution

External links

• Chi-Square Calculator from GraphPad
• Vassar College's 2×2 Chi-Square with Expected Values




Wikipedia:
Statistical hypothesis testing

A statistical hypothesis test is a method of making decisions using experimental data. In statistics, a result is called statistically significant if it is unlikely to have occurred by chance. The phrase "test of significance" was coined by Ronald Fisher: "Critical tests of this kind may be called tests of significance, and when such tests are available we may discover whether a second sample is or is not significantly different from the first."[1]

Hypothesis testing is sometimes called confirmatory data analysis, in contrast to exploratory data analysis. In frequency probability, these decisions are almost always made using null-hypothesis tests (i.e., tests that answer the question: Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?)[2] One use of hypothesis testing is deciding whether experimental results contain enough information to cast doubt on conventional wisdom.

Statistical hypothesis testing is a key technique of frequentist statistical inference, and is widely used, but also much criticized. While controversial,[3] the Bayesian approach to hypothesis testing is to base rejection of the hypothesis on the posterior probability.[4] Other approaches to reaching a decision based on data are available via decision theory and optimal decisions.

The critical region of a hypothesis test is the set of all outcomes which, if they occur, will lead us to decide that there is a difference; that is, cause the null hypothesis to be rejected in favor of the alternative hypothesis. The critical region is usually denoted by C.


Examples

The following examples should solidify these ideas.

Example 1 - Court Room Trial

A statistical test procedure is comparable to a trial; a defendant is considered innocent as long as his guilt is not proven. The prosecutor tries to prove the guilt of the defendant. Only when there is enough incriminating evidence is the defendant convicted.

At the start of the procedure, there are two hypotheses, H0: "the defendant is innocent" and H1: "the defendant is guilty". The first one is called the null hypothesis and is for the time being accepted. The second one is called the alternative (hypothesis). It is the hypothesis one tries to prove.

The hypothesis of innocence is only rejected when an error is very unlikely, because one doesn't want to condemn an innocent defendant. Such an error is called an error of the first kind (i.e., the condemnation of an innocent person), and the occurrence of this error is controlled to be rare. As a consequence of this asymmetric behaviour, the error of the second kind (setting free a guilty person) is often rather large.

                                 Null hypothesis (H0) is true      Alternative hypothesis (H1) is true
                                 (he truly is innocent)            (he truly is guilty)
    Accept null hypothesis       Right decision                    Wrong decision (Type II error)
    Reject null hypothesis       Wrong decision (Type I error)     Right decision

Example 2 - Clairvoyant Card Game

A person (the subject) is tested for clairvoyance. He is shown the reverse of a randomly chosen playing card 25 times and asked which of the four suits it belongs to. The number of hits, or correct answers, is called X.

As we try to find evidence of his clairvoyance, for the time being the null hypothesis is that the person is not clairvoyant. The alternative is, of course: the person is (more or less) clairvoyant.

If the null hypothesis is valid, the only thing the test person can do is guess. For every card, the probability (relative frequency) of guessing correctly is 1/4. If the alternative is valid, the test subject will predict the suit correctly with probability greater than 1/4. We will call the probability of guessing correctly p. The hypotheses, then, are:

    null hypothesis:        H0: p = 1/4   (just guessing)

and

    alternative hypothesis: H1: p > 1/4   (true clairvoyant).

When the test subject correctly predicts all 25 cards, we will consider him clairvoyant, and reject the null hypothesis. Thus also with 24 or 23 hits. With only 5 or 6 hits, on the other hand, there is no cause to consider him so. But what about 12 hits, or 17 hits? What is the critical number, c, of hits, at which point we consider the subject to be clairvoyant? How do we determine the critical value c? It is obvious that with the choice c = 25 (i.e., we only accept clairvoyance when all cards are predicted correctly) we're more critical than with c = 10. In the first case almost no test subjects will be recognized as clairvoyant; in the second case, some number more will pass the test. In practice, one decides how critical one will be. That is, one decides how often one accepts an error of the first kind - a false positive, or Type I error. With c = 25 the probability of such an error is:

    P(X ≥ 25 | p = 1/4) = (1/4)^25 ≈ 10^−15,

and hence, very small. The probability of a false positive is the probability of randomly guessing correctly all 25 times.

Being less critical, with c = 10, gives:

    P(X ≥ 10 | p = 1/4) ≈ 0.07.

Thus, c = 10 yields a much greater probability of a false positive.

Before the test is actually performed, the desired probability of a Type I error is determined. Typically, values in the range of 1% to 5% are selected. Depending on this desired Type I error rate, the critical value c is calculated. For example, if we select an error rate of 1%, c is calculated thus:

    P(X ≥ c | p = 1/4) ≤ 0.01.

From all the numbers c with this property, we choose the smallest, in order to minimize the probability of a Type II error, a false negative. For the above example, we select: c = 13.

But what if the subject did not guess any cards at all? Having zero correct answers is clearly an oddity too. The probability of guessing incorrectly once is equal to p' = (1 − p) = 3/4. Using the same approach, we can calculate that the probability of randomly calling all 25 cards wrong is:

    (3/4)^25 ≈ 0.00075.

This is highly unlikely (less than a 1 in 1,000 chance). While the subject can't guess the cards correctly, dismissing H0 in favour of H1 would be an error. In fact, the result would suggest a trait on the subject's part of avoiding calling the correct card. A test of this could be formulated: for a selected 1% error rate the subject would have to answer correctly at least twice, for us to believe that card calling is based purely on guessing.
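
The probabilities in this example are easy to reproduce. Below is a minimal sketch, assuming SciPy is available, that computes the false-positive probability for a given critical value under the Binomial(25, 1/4) null distribution and finds the smallest c with a Type I error rate of at most 1%.

    # Sketch: Type I error probabilities for the clairvoyant card game, Binomial(25, 1/4) under H0.
    from scipy.stats import binom

    n, p0 = 25, 0.25

    def false_positive_rate(c):
        """P(X >= c | H0): the chance that a pure guesser reaches the critical number of hits."""
        return binom.sf(c - 1, n, p0)

    print(false_positive_rate(25))   # approximately 1e-15
    print(false_positive_rate(10))   # approximately 0.07

    # Smallest critical value c with a Type I error rate of at most 1%.
    c = next(c for c in range(n + 1) if false_positive_rate(c) <= 0.01)
    print(c)                         # 13

    # Probability that a pure guesser gets every one of the 25 cards wrong.
    print(binom.pmf(0, n, p0))       # approximately 0.00075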

Example 3 - Radioactive Suitcase

As an example, consider determining whether a suitcase contains some radioactive material. Placed under a Geiger counter, it produces 10 counts per minute. The null hypothesis is that no radioactive material is in the suitcase and that all measured counts are due to ambient radioactivity typical of the surrounding air and harmless objects. We can then calculate how likely it is that we would observe 10 counts per minute if the null hypothesis were true. If the null hypothesis predicts (say) on average 9 counts per minute and a standard deviation of 1 count per minute, then we say that the suitcase is compatible with the null hypothesis (this does not guarantee that there is no radioactive material, just that we don't have enough evidence to suggest there is). On the other hand, if the null hypothesis predicts 3 counts per minute and a standard deviation of 1 count per minute, then the suitcase is not compatible with the null hypothesis, and there are likely other factors responsible for producing the measurements.
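
The contrast between the two null hypotheses can be made concrete with a quick calculation. The example only gives a mean and a standard deviation, so a normal approximation is an assumption made here for illustration; a minimal sketch using SciPy:

    # Sketch: how surprising is an observation of 10 counts per minute under each null hypothesis?
    # A normal approximation is assumed here; the example gives only a mean and a standard deviation.
    from scipy.stats import norm

    observed = 10.0

    for mean, sd in [(9.0, 1.0), (3.0, 1.0)]:
        z = (observed - mean) / sd
        p = norm.sf(z)   # one-sided p-value: probability of a count at least this high under H0
        print(f"H0 mean {mean}: z = {z:.1f}, p = {p:.2g}")

    # With mean 9 the observation is unremarkable (z = 1, p of roughly 0.16);
    # with mean 3 it is wildly improbable (z = 7, p of roughly 1e-12), so H0 would be rejected.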

The test described here is more fully the null-hypothesis statistical significance test. The null hypothesis represents what we would believe by default, before seeing any evidence. Statistical significance is a possible finding of the test, declared when the observed sample is unlikely to have occurred by chance if the null hypothesis were true. The name of the test describes its formulation and its possible outcome. One characteristic of the test is its crisp decision: to reject or not reject the null hypothesis. A calculated value is compared to a threshold, which is determined from the tolerable risk of error.

Again, the designer of a statistical test wants to maximize the good probabilities and minimize the bad probabilities.

Example 4 - Lady Tasting Tea

The following example is summarized from Fisher, and is known as the Lady tasting tea example.[5] Fisher thoroughly explained his method in a proposed experiment to test a Lady's claimed ability to determine the means of tea preparation by taste. The article is less than 10 pages in length and is notable for its simplicity and completeness regarding terminology, calculations and design of the experiment. The example is loosely based on an event in Fisher's life. The Lady proved him wrong.[6]

1. The null hypothesis was that the Lady had no such ability.
2. The test statistic was a simple count of the number of successes in 8 trials.
3. The distribution associated with the null hypothesis was the binomial distribution familiar from coin flipping experiments.
4. The critical region was the single case of 8 successes in 8 trials based on a conventional probability criterion (< 5%).
5. Fisher asserted that no alternative hypothesis was (ever) required.

If and only if the 8 trials produced 8 successes was Fisher willing to reject the null hypothesis - effectively acknowledging the Lady's ability with > 98% confidence (but without quantifying her ability). Fisher later discussed the benefits of more trials and repeated tests.
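
The chance probability of a perfect score can be computed two ways, depending on how the design is read. Point 2 above suggests a binomial model; Fisher's design, as usually described (an assumption here, since the summary above does not spell it out), had 8 cups with 4 of each preparation and the Lady knowing this, which makes a perfect score a 1-in-70 event. Either way the probability falls below the 5% criterion; a minimal sketch:

    # Sketch: chance probability of a perfect score (8 correct out of 8) under two readings of the design.
    from math import comb

    # Binomial model (as in point 3 above): each cup guessed independently with probability 1/2.
    p_binomial = 0.5 ** 8
    print(round(p_binomial, 4))   # 0.0039, about 0.4%

    # Fisher's usual design (an assumption here): 8 cups, 4 of each kind, and the Lady knows this,
    # so under H0 all C(8, 4) = 70 ways of picking the four "milk-first" cups are equally likely.
    p_fisher = 1 / comb(8, 4)
    print(round(p_fisher, 4))     # 0.0143, about 1.4%, consistent with "> 98% confidence"

    # Both are below the 5% criterion, so a perfect score falls in the critical region.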

The Testing Process

Hypothesis testing is defined by the following general procedure (a short worked sketch follows the list):

1. The first step in any hypothesis testing is to state the relevant null and alternative hypotheses to be tested. This is important, as mis-stating the hypotheses will muddy the rest of the process.
2. The second step is to consider the statistical assumptions being made about the sample in doing the test; for example, assumptions about the statistical independence or about the form of the distributions of the observations. This is equally important, as invalid assumptions will mean that the results of the test are invalid.
3. Decide which test is appropriate, and state the relevant test statistic T.
4. Derive the distribution of the test statistic under the null hypothesis from the assumptions. In standard cases this will be a well-known result. For example, the test statistic may follow a Student's t distribution or a normal distribution.
5. The distribution of the test statistic partitions the possible values of T into those for which the null hypothesis is rejected, the so-called critical region, and those for which it is not.
6. Compute from the observations the observed value t_obs of the test statistic T.
7. Decide either to fail to reject the null hypothesis or to reject it in favor of the alternative. The decision rule is to reject the null hypothesis H0 if the observed value t_obs is in the critical region, and to accept or "fail to reject" the hypothesis otherwise.
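
The sketch below walks the seven steps for a hypothetical one-sample test of a normal mean with known standard deviation; the hypotheses, sigma, and data are made up purely for illustration, and SciPy is assumed to be available.

    # Sketch: the seven-step procedure applied to a hypothetical one-sample z-test
    # (normal data with known standard deviation; all numbers are invented for illustration).
    import numpy as np
    from scipy.stats import norm

    # Steps 1-2: H0: mu = 100 vs H1: mu != 100, assuming independent, normally
    # distributed observations with known sigma = 15.
    mu0, sigma, alpha = 100.0, 15.0, 0.05

    # Hypothetical sample.
    x = np.array([108.0, 112.0, 96.0, 104.0, 110.0, 101.0, 99.0, 115.0])
    n = len(x)

    # Steps 3-4: the statistic T = (mean - mu0) / (sigma / sqrt(n)) follows N(0, 1) under H0.
    t_obs = (x.mean() - mu0) / (sigma / np.sqrt(n))

    # Step 5: two-sided critical region |T| > z_(1 - alpha/2).
    critical_value = norm.ppf(1 - alpha / 2)   # approximately 1.96

    # Steps 6-7: compute the observed value and decide.
    reject = abs(t_obs) > critical_value
    print(f"t_obs = {t_obs:.2f}, critical value = {critical_value:.2f}, reject H0: {reject}")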

It is important to note the philosophical difference between accepting the null hypothesis and simply failing to reject it. The "fail to reject" terminology highlights the fact that the null hypothesis is assumed to be true from the start of the test; if there is a lack of evidence against it, it simply continues to be assumed true. The phrase "accept the null hypothesis" may suggest it has been proved simply because it has not been disproved, a logical fallacy known as the argument from ignorance. Unless a test with particularly high power is used, the idea of "accepting" the null hypothesis may be dangerous. Nonetheless the terminology is prevalent throughout statistics, where its meaning is well understood.

Definition of terms

The following definitions are mainly based on the exposition in the book by Lehmann and Romano:[7]

Simple hypothesis: Any hypothesis which specifies the population distribution completely.

Composite hypothesis: Any hypothesis which does not specify the population distribution completely.

Statistical test: A decision function that takes its values in the set of hypotheses.

Region of acceptance: The set of values for which we fail to reject the null hypothesis.

Region of rejection / Critical region: The set of values of the test statistic for which the null hypothesis is rejected.

Power of a test (1 − β): The test's probability of correctly rejecting the null hypothesis; the complement of the false negative rate, β.

Size / Significance level of a test (α): For simple hypotheses, this is the test's probability of incorrectly rejecting the null hypothesis: the false positive rate. For composite hypotheses, this is the upper bound of the probability of rejecting the null hypothesis over all cases covered by the null hypothesis.

Most powerful test: For a given size or significance level, the test with the greatest power.

Uniformly most powerful test (UMP): A test with the greatest power for all values of the parameter being tested.

Consistent test: When considering the properties of a test as the sample size grows, a test is said to be consistent if, for a fixed size of test, the power against any fixed alternative approaches 1 in the limit.[8]

Unbiased test: For a specific alternative hypothesis, a test is said to be unbiased when the probability of rejecting the null hypothesis is not less than the significance level when the alternative is true, and is less than or equal to the significance level when the null hypothesis is true.

Conservative test: A test is conservative if, when constructed for a given nominal significance level, the true probability of incorrectly rejecting the null hypothesis is never greater than the nominal level.

Steps in Hypothesis Testing (1 of 5)

The basic logic of hypothesis testing has been presented somewhat informally in the
sections on "Ruling out chance as an explanation" and the "Null hypothesis." In this
section the logic will be presented in more detail and more formally.

1. The first step in hypothesis testing is to specify the null hypothesis (H0) and the alternative hypothesis (H1). If the research concerns whether one method of presenting pictorial stimuli leads to better recognition than another, the null hypothesis would most likely be that there is no difference between methods (H0: μ1 − μ2 = 0). The alternative hypothesis would be H1: μ1 ≠ μ2. If the research concerned the correlation between grades and SAT scores, the null hypothesis would most likely be that there is no correlation (H0: ρ = 0). The alternative hypothesis would be H1: ρ ≠ 0.

2. The next step is to select a significance level. Typically the 0.05 or the 0.01 level is used.

3. The third step is to calculate a statistic analogous to the parameter specified by the null hypothesis. If the null hypothesis were defined by the parameter μ1 − μ2, then the statistic M1 − M2 would be computed.

4. The fourth step is to calculate the probability value (often called the p value). The p value is the probability of obtaining a statistic as different from or more different from the parameter specified in the null hypothesis as the statistic computed from the data. The calculations are made assuming that the null hypothesis is true.

5. The probability value computed in Step 4 is compared with the significance level chosen in Step 2. If the probability is less than or equal to the significance level, then the null hypothesis is rejected; if the probability is greater than the significance level, then the null hypothesis is not rejected. When the null hypothesis is rejected, the outcome is said to be "statistically significant"; when the null hypothesis is not rejected, the outcome is said to be "not statistically significant."

6. If the outcome is statistically significant, then the null hypothesis is rejected in favor of the alternative hypothesis. If the rejected null hypothesis were that μ1 − μ2 = 0, then the alternative hypothesis would be that μ1 ≠ μ2. If M1 were greater than M2, then the researcher would naturally conclude that μ1 ≥ μ2.

7. The final step is to describe the result and the statistical conclusion in an understandable way. Be sure to present the descriptive statistics as well as whether the effect was significant or not. For example, a significant difference between a group that received a drug and a control group might be described as follows:

Subjects in the drug group scored significantly higher (M = 23) than did subjects in the control group (M = 17), t(18) = 2.4, p = 0.027.

The statement that "t(18) = 2.4" has to do with how the probability value (p) was calculated. A small minority of researchers might object to two aspects of this wording. First, some believe that the significance level rather than the probability level should be reported. The argument for reporting the probability value is presented in another section. Second, since the alternative hypothesis was stated as μ1 ≠ μ2, some might argue that it can only be concluded that the population means differ and not that the population mean for the drug group is higher than the population mean for the control group.
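
The reported p-values can be reproduced from the t statistics alone. The 18 degrees of freedom imply two groups of 10 under the usual equal-variance two-sample t-test (an inference; the group sizes are not stated in the text). A minimal sketch, assuming SciPy:

    # Sketch: two-tailed p-values for the t statistics quoted in this and the following passage.
    # df = 18 implies two groups of 10 under the standard equal-variance two-sample t-test
    # (an inference; the group sizes are not given in the text).
    from scipy.stats import t

    df = 18
    for t_obs in (2.4, 1.4):
        p_two_tailed = 2 * t.sf(t_obs, df)
        print(f"t({df}) = {t_obs}: p = {p_two_tailed:.3f}")

    # Output: p = 0.027 for t = 2.4 (significant at the 0.05 level) and p = 0.179 for t = 1.4
    # (not significant), matching the two worked results in this section.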

Steps in Hypothesis Testing (4 of 5)

This argument is misguided. Intuitively, there are strong reasons for inferring that the direction of the difference in the population is the same as the direction of the difference in the sample. There is also a more formal argument. A nonsignificant effect might be described as follows:

Although subjects in the drug group scored higher (M = 23) than did subjects in the control group (M = 20), the difference between means was not significant, t(18) = 1.4, p = 0.179.

It would not have been correct to say that there was no difference between the performance of the two groups. There was a difference. It is just that the difference was not large enough to rule out chance as an explanation of the difference. It would also have been incorrect to imply that there is no difference in the population. Be sure not to accept the null hypothesis.
