
Choosing which statistical test to use.

Search the link below on google

Three major reasons for testing:

I. Means (test for means, difference of two means/paired, difference of two means/independent sample)
II. Proportions (test for proportion, difference of two proportions)
III. Relationships (chi-square, regression, ...)

NB

- The mean should be used when the data set does not have outliers.

- When outliers are present, the median describes the data set better than the mean.

- The mode is mostly used when the responses are not numbers, e.g. strongly disagree,
disagree, not sure, agree, strongly agree. The response with the highest frequency is the mode.

I.e. the mode is used when the data is not numerical.
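
The NB above can be checked with a few lines of Python. This is only a sketch: the income figures and survey responses are made-up illustration data, not taken from any study.

```python
from statistics import mean, median, mode

# Hypothetical incomes; the last value is an outlier.
incomes = [30, 32, 35, 36, 38, 400]

income_mean = mean(incomes)      # pulled far upward by the outlier
income_median = median(incomes)  # stays close to the typical value

# Hypothetical non-numerical responses: only the mode makes sense here.
responses = ["agree", "disagree", "agree", "not sure", "agree"]
most_common = mode(responses)

print(income_mean, income_median, most_common)
```

The mean lands far above every typical income while the median stays among them, which is why the median is preferred when outliers are present.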

Mr. MULI K. EMMANUEL


PhD Candidate, UoN
When trying to figure out which statistical test to use, ask yourself the following
questions

1. What level of measurement was used for the data we are analyzing?

Is the data nominal or interval/ratio?

Nominal data is also called categorical or qualitative data (and is sometimes loosely
described as nonparametric data).
It is usually summarized as frequencies, proportions and percentages.

Frequencies, e.g. 30 people were present while 5 were absent (frequencies can add up
to any total, depending on the number of participants).
Proportions, e.g. 0.92 were okay while 0.08 were defective (proportions add up to 1).
Percentages add up to 100.

Tests that involve nominal data are:

- Test for proportion
- Difference of two proportions
- Chi-square test for independence

NB: Did you know that an age group (e.g. 15-24 years) is categorical?
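
As an illustration of the first test in the list, here is a minimal sketch of a one-sample test for a proportion using the normal (z) approximation. The function name, the counts (16 defective out of 200) and the hypothesized proportion of 0.05 are all assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z(successes, n, p0):
    """Return (z, two-sided p-value) for H0: population proportion == p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error under the null hypothesis
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 16 defective items in a sample of 200, tested against 5%.
z, p = one_sample_proportion_z(16, 200, 0.05)
print(round(z, 3), round(p, 3))
```

The normal approximation is only reasonable when the sample is fairly large; for small samples an exact binomial test would be the safer choice.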

Interval/ratio data is also called quantitative data. Examples:

- Daily sales

- Temperature

- Age

The most common summary value for interval/ratio data is the mean, e.g. the mean
age, the mean amount spent, the mean bars per week.

Tests that involve interval/ratio data are:

1. Test for means
2. Difference of two means/paired
3. Difference of two means/independent sample
4. Regression

Ordinal data can be classified with either nominal or interval/ratio data,
depending on the circumstances.

2. How many samples do we have?

Is it one sample, in which we test the relevant statistic against a
hypothesized value, or are there two samples being compared with one another?

Or is it one sample in which each observation has a measure or score on more than
one variable, for example where the same sample is measured twice?

3. What is the purpose of our analysis?

We can be:

- Testing against a hypothesized value
- Comparing two statistics
- Looking for a relationship (regression and chi-square)

Regression and chi-square serve the same purpose, but:

- Chi-square uses nominal data and the results are presented in table form.

- Regression uses interval/ratio data and the results are presented in a scatter plot.
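
The three questions can be summarized as a small lookup table. This is a sketch of these notes only: the key labels ("one", "two", "paired", "hypothesized", "compare", "relationship") and the name suggest_test are invented for illustration, and the table covers only the tests listed above.

```python
# (level of measurement, samples, purpose) -> test named in these notes
TESTS = {
    ("nominal", "one", "hypothesized"): "test for proportion",
    ("nominal", "two", "compare"): "difference of two proportions",
    ("nominal", "one", "relationship"): "chi-square test for independence",
    ("interval/ratio", "one", "hypothesized"): "test for means",
    ("interval/ratio", "paired", "compare"): "difference of two means (paired)",
    ("interval/ratio", "two", "compare"): "difference of two means (independent samples)",
    ("interval/ratio", "one", "relationship"): "regression",
}

def suggest_test(level, samples, purpose):
    """Look up a test; combinations not covered by the notes return a fallback string."""
    return TESTS.get((level, samples, purpose), "not covered in these notes")

print(suggest_test("nominal", "one", "relationship"))
print(suggest_test("interval/ratio", "paired", "compare"))
```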

For more examples, see the link below:

https://www.youtube.com/results?search_query=choosing+which+statistical+test+to+use

Hypothesis Testing

Null hypothesis.

1. When the p-value is less than the alpha value, you reject the null hypothesis. This
means there is statistical significance in the observed count, hence the
alternative hypothesis is selected.

This means that there is a difference between variables.

p < alpha = REJECT NULL

2. When the p-value is greater than the alpha value, you accept (or, more precisely,
fail to reject) the null hypothesis. This means there is no statistical significance
in the observed count.

This means that no difference between variables was detected.

p > alpha = ACCEPT (FAIL TO REJECT) NULL
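
The decision rule fits in a few lines. In this sketch the function name decide and the default alpha of 0.05 are assumptions; the strict "less than" comparison follows the rule as stated in these notes.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule from the notes: reject the null only when p < alpha."""
    if p_value < alpha:
        return "reject null"
    return "fail to reject null"

print(decide(0.03))  # p below alpha
print(decide(0.20))  # p above alpha
```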

According to John W. Creswell, we do not accept the null hypothesis; we fail to
reject it.

We don't prove the null hypothesis; knowledge is conjectural.

NB: In manual chi-square tests, the chi-square value is calculated, then the
critical value is read from the table (the first figures along the horizontal are
the DF, the vertical is the significance level). If the critical value is less than
the chi-square value, the chi-square value lies in the rejection region. This means
there is a difference and it is significant.

The critical value acts like the alpha value.

When the chi-square value > critical value: reject the null hypothesis, meaning
there is a difference and it is statistically significant. This is because the
chi-square value falls in the rejection region.

When the chi-square value < critical value: we fail to reject the null hypothesis,
meaning there is no statistically significant difference.
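
The manual procedure can be sketched as follows. The 2x2 table of observed counts is made up for illustration; the critical value 3.841 is the standard table value for 1 degree of freedom at the 0.05 significance level.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a table (list of rows) of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - expected) ** 2 / expected
    return chi2

# Hypothetical 2x2 table of observed counts.
observed = [[30, 10],
            [20, 40]]
chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1) * (columns - 1)
critical_value = 3.841  # table value for df = 1 at the 0.05 level

reject_null = chi2 > critical_value
print(round(chi2, 3), df, reject_null)
```

Here the calculated value is far above the critical value, so it falls in the rejection region.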

Type I error = the researcher incorrectly rejects the null hypothesis.

Type II error = the researcher incorrectly fails to reject (accepts) the null hypothesis.

- The probability of committing a Type I error is the alpha value (a false positive).

- The probability of committing a Type II error is beta (a false negative).

Alpha is mostly set at 5%, that is, the probability of committing a Type I error is 5%.

Beta is more complex. We do not set the beta value directly; rather, the sample size,
significance level and effect size influence the beta value and, similarly, they
influence the power: power = 1 - beta.

- Power is the probability that we will detect a difference that is actually
there, i.e. detect a true difference. Beta is the probability of a Type II error, which
is failing to reject the null when it should have been rejected.

To determine the Type II error rate, the sample size, the effect size and the
significance level (alpha) should be known.

When the power reduces, beta increases.
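
To make the link between sample size, significance level, effect size and power concrete, here is a sketch for the simplest case: a one-sided, one-sample z-test with known standard deviation. The function name and the example values (effect size 0.5, n = 25) are assumptions, and other tests have different power formulas.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(effect_size, n, alpha=0.05):
    """Power of a one-sided one-sample z-test.

    effect_size is the true difference in standard deviation units,
    (true mean - hypothesized mean) / sigma.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)  # cut-off of the rejection region
    return 1 - NormalDist().cdf(z_crit - effect_size * sqrt(n))

power = power_one_sided_z(0.5, 25)
beta = 1 - power  # probability of a Type II error
print(round(power, 3), round(beta, 3))

# A larger sample gives more power (and therefore a smaller beta).
print(power_one_sided_z(0.5, 100) > power)
```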

MEASUREMENT SCALE
Operational definitions of variables yield the information or data that is to be analyzed. The
choice of statistical procedure for the analysis is determined by the type of measurement scale.

Nominal:

- This is the lowest level of measurement.

- It merely groups subjects or cases from a sample into categories that share the
same characteristics, e.g. sex, race, marital status, employment status,
language, religion.

Nominal scales are used for labeling variables, without any quantitative
value. Nominal scales could simply be called labels, e.g. gender, color, or any
dichotomous variable (i.e. one with only two possible values, such as YES/NO or AM/PM).

Examples:
MEAL PREFERENCE: Breakfast, Lunch, Dinner

RELIGIOUS PREFERENCE: 1 = Buddhist, 2 = Muslim, 3 = Christian, 4 = Jewish, 5 = Other

POLITICAL ORIENTATION: Republican, Democratic, Libertarian, Green, ODM, NASA, JUBILEE

Ordinal

Ordinal refers to order in measurement. An ordinal scale indicates direction, in addition to
providing nominal information. Low/Medium/High; or Faster/Slower are examples of ordinal
levels of measurement. Ranking an experience as a "nine" on a scale of 1 to 10 tells us that it was
higher than an experience ranked as a "six." Many psychological scales or inventories are at the
ordinal level of measurement.

Examples:
RANK: 1st place, 2nd place, ... last place
LEVEL OF AGREEMENT: No, Maybe, Yes
POLITICAL ORIENTATION: Left, Center, Right

With ordinal scales, it is the order of the values that is important and
significant, but the differences between them are not really known.
Ordinal scales are typically measures of non-numeric concepts like
satisfaction, happiness and discomfort.

Interval

An example of an interval scale is temperature, measured on either the Fahrenheit or the Celsius scale.

Examples:
TIME OF DAY on a 12-hour clock
POLITICAL ORIENTATION: Score on a standardized scale of political orientation
OTHER scales constructed so as to possess equal intervals

Interval: time of day has equal intervals. On an analog (12-hr.) clock, the difference
between 1 and 2 pm is the same as the difference between 11 and 12 am.

Interval scales are numeric scales in which we know not only the order, but also
the exact differences between the values. The classic example of an interval scale
is Celsius temperature because the difference between each value is the same. For
example, the difference between 60 and 50 degrees is a measurable 10 degrees, as
is the difference between 80 and 70 degrees. Time is another good example of an
interval scale in which the increments are known, consistent, and measurable.

Ratio

Ratio scales are the ultimate nirvana when it comes to measurement scales because
they tell us about the order, they tell us the exact value between units, AND they
also have an absolute zero, which allows for a wide range of both descriptive and
inferential statistics to be applied. At the risk of repeating myself, everything above
about interval data applies to ratio scales, plus ratio scales have a clear definition of
zero. Good examples of ratio variables include height and weight.

Ratio scales provide a wealth of possibilities when it comes to statistical
analysis. These variables can be meaningfully added, subtracted, multiplied and divided
(ratios). Central tendency can be measured by mode, median, or mean; measures of
dispersion, such as standard deviation and coefficient of variation, can also be
calculated from ratio scales.

In addition to possessing the qualities of nominal, ordinal, and interval scales, a
ratio scale has an absolute zero (a point where none of the quality being measured
exists). Using a ratio scale permits comparisons such as being twice as high, or
one-half as much. Reaction time (how long it takes to respond to a signal of some
sort) uses a ratio scale of measurement: time. Although an individual's reaction
time is always greater than zero, we conceptualize a zero point in time, and can
state that a response of 24 milliseconds is twice as fast as a response time of 48
milliseconds.
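
A short sketch of why the absolute zero matters. The temperature values are made up, and the Kelvin conversion is brought in here only as an illustration of a temperature scale that does have a true zero; it is not part of the notes above.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return c + 273.15

# Ratio scale: reaction time has a true zero, so ratios are meaningful.
time_ratio = 48 / 24  # 48 ms really is twice as long as 24 ms

# Interval scale: 0 degrees Celsius is an arbitrary zero, so 20 C is NOT
# "twice as hot" as 10 C even though 20 / 10 == 2.
naive_ratio = 20 / 10
true_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(time_ratio, naive_ratio, round(true_ratio, 3))
```

Differences are still fine on an interval scale: 20 C minus 10 C is the same 10-degree gap on both scales.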

Examples:
RULER: inches or centimeters
YEARS of work experience
INCOME: money earned last year
NUMBER of children
GPA: grade point average
Ratio - 24-hr. time has an absolute 0 (midnight); 14 o'clock is twice as long from
midnight as 7 o'clock

Types of Statistical Data: Numerical, Categorical, and Ordinal

When working with statistics, it's important to recognize the different types of data:
numerical (discrete and continuous), categorical, and ordinal.

A. Numerical data.
These data have meaning as a measurement, such as a person's height, weight, IQ,
or blood pressure; or they're a count, such as the number of stock shares a person
owns or how many teeth a dog has.
1. Discrete data represent items that can be counted; they take on possible
values that can be listed out. The list of possible values may be fixed (also
called finite), or it may go from 0, 1, 2, on to infinity.

2. Continuous data represent measurements; their possible values cannot be
counted and can only be described using intervals on the real number line.
For example, the exact amount of gas purchased at the pump for cars with
20-gallon tanks would be continuous data from 0 gallons to 20 gallons,
represented by the interval [0, 20], inclusive. You might pump 8.40 gallons,
or 8.41, or 8.414863 gallons, or any possible number from 0 to 20.

B. Categorical data:

NB:
Dichotomous data has only two possible answers, e.g. gender:
male/female.

Dummy variables take only two possible values, 0 and 1. They signify
conceptual opposites: war vs. peace, fixed exchange rate vs. floating
exchange rate, etc.
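
Dummy coding can be sketched in a couple of lines. The function name dummy_encode and the list of exchange-rate regimes are made up for illustration.

```python
def dummy_encode(values, one_label):
    """Code each value as 1 if it equals one_label, otherwise 0."""
    return [1 if v == one_label else 0 for v in values]

# Hypothetical observations of a dichotomous variable.
regimes = ["fixed", "floating", "fixed", "fixed"]
fixed_dummy = dummy_encode(regimes, "fixed")

# The mean of a 0/1 dummy is the proportion of cases coded 1.
proportion_fixed = sum(fixed_dummy) / len(fixed_dummy)
print(fixed_dummy, proportion_fixed)
```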

Categorical data represent characteristics such as a person's gender, marital
status, hometown, or the types of movies they like. Categorical data can take on
numerical values (such as 1 indicating male and 2 indicating female), but
those numbers don't have mathematical meaning. You couldn't add them
together, for example. (Other names for categorical data are qualitative data, or
Yes/No data.)

C. Ordinal data

Ordinal data mixes numerical and categorical data. The data fall into categories, but the
numbers placed on the categories have meaning. For example, rating a
restaurant on a scale from 0 (lowest) to 4 (highest) stars gives ordinal data.
Ordinal data are often treated as categorical, where the groups are ordered when
graphs and charts are made. However, unlike categorical data, the numbers do
have mathematical meaning. For example, if you survey 100 people and ask
them to rate a restaurant on a scale from 0 to 4, taking the average of the 100
responses will have meaning. This would not be the case with categorical data.

Normal vs. Binomial:

NORMAL (z) DISTRIBUTION

The normal (z) distribution is a continuous distribution that arises in many
natural processes. "Continuous" means that between any two data values we
could (at least in theory) find another data value. For example, men's heights
vary continuously and are the result of so many tiny random influences that the
overall distribution of men's heights in America is very close to normal.
Another example is the data values that we would get if we repeatedly
measured the mass of a reference object on a pan balance: the readings would
differ slightly because of random errors, and the readings taken as a whole
would have a normal distribution.
We use skewness and kurtosis to assess the normality of data.
Skewness is a measure of the symmetry of a distribution. A normal distribution is
symmetric and has 0 skewness.

- 0 means the data is perfectly symmetric.
- When skewness is less than -1 or greater than +1, we say that the data is skewed.
- Values between -1 and +1 mean the data is approximately symmetric.

Kurtosis
Is the peakedness or flatness of the distribution.
- Leptokurtic: a high peak, with a kurtosis value of > 3
- Mesokurtic (normal): has a kurtosis of 3
- Platykurtic: high flatness, no sharp peak; has a kurtosis value of < 3
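
Both measures can be computed from the moments of the data. The sketch below uses the simple population (moment) formulas, and the two small data sets are made up. Statistical packages often apply sample-size corrections and many report excess kurtosis (kurtosis minus 3, so that a normal distribution gives 0), so their output may differ from these values.

```python
def skewness_and_kurtosis(data):
    """Moment-based skewness and kurtosis (normal data gives roughly 0 and 3)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # variance (population form)
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

symmetric = [1, 2, 3, 4, 5]       # hypothetical symmetric data
right_skewed = [1, 1, 1, 2, 10]   # hypothetical data with a long right tail

skew_sym, kurt_sym = skewness_and_kurtosis(symmetric)
skew_right, kurt_right = skewness_and_kurtosis(right_skewed)
print(skew_sym, round(kurt_sym, 2))
print(round(skew_right, 2), round(kurt_right, 2))
```

The symmetric data has skewness 0 and a kurtosis below 3 (flat, platykurtic), while the second data set has skewness above +1 and would be called skewed.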

The bell-shaped normal curve has probabilities that are found as the area between
any two z values.
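
The area between two z values can be found with Python's standard library instead of a z table. The function name area_between is an assumption made for this sketch.

```python
from statistics import NormalDist

def area_between(z1, z2):
    """Probability that a standard normal value falls between z1 and z2."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(z2) - standard_normal.cdf(z1)

middle_95 = area_between(-1.96, 1.96)
within_one_sd = area_between(-1, 1)
print(round(middle_95, 3), round(within_one_sd, 3))
```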

BINOMIAL DISTRIBUTION

A binomial distribution is very different from a normal distribution, and yet if
the sample size is large enough, the shapes will be quite similar.

The key difference is that a binomial distribution is discrete, not continuous. In
other words, it is NOT possible to find a data value between any two data
values.
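
The similarity for large samples can be checked numerically. In this sketch n = 100 and p = 0.5 are arbitrary choices, and the half-unit interval around 50 (a continuity correction) is a standard device brought in here to compare a discrete value with a continuous curve.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Exact probability of k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 100, 0.5
normal_approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

exact = binomial_pmf(50, n, p)
# Normal area over [49.5, 50.5] stands in for the single discrete value 50.
approx = normal_approx.cdf(50.5) - normal_approx.cdf(49.5)
print(round(exact, 4), round(approx, 4))
```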

PARAMETRIC VS NON-PARAMETRIC

PARAMETRIC: normal data, continuous data; in theory there is a value between any
two data values. Statistics are measures computed from a sample to estimate
parameters.

NON-PARAMETRIC / RANK SCORE: non-normal distribution, discrete data (no value
between two data values). The data is in the form of rankings and not scores.

Non-normal data can be converted to normal data to avoid performing
nonparametric tests, which are less powerful. After data transformation,
parametric tests are conducted.
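
As one possible illustration of data transformation, the sketch below applies a logarithm to a made-up right-skewed data set and compares the skewness before and after. The log transform is an assumed example; the notes do not say which transformation to use, and the right choice depends on the data.

```python
from math import log2

def skewness(data):
    """Moment-based skewness; 0 means perfectly symmetric."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 4, 8, 16, 32, 64]        # hypothetical, strongly right-skewed
transformed = [log2(x) for x in raw]  # becomes 0, 1, 2, ..., 6

skew_raw = skewness(raw)
skew_log = skewness(transformed)
print(round(skew_raw, 2), round(skew_log, 2))
```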

Alternative parametric tests


When a choice exists between using a parametric or a nonparametric procedure, and you are
relatively certain that the assumptions for the parametric procedure are satisfied, then use the
parametric procedure.

The following is a list of the nonparametric tests, and their parametric alternatives.

Nonparametric test        Alternative parametric test

1-sample sign test        1-sample Z-test, 1-sample t-test
1-sample Wilcoxon test    1-sample Z-test, 1-sample t-test
Mann-Whitney test         2-sample t-test
Kruskal-Wallis test       One-way ANOVA
Mood's Median test        One-way ANOVA
Friedman test             Two-way ANOVA

NB/ Nonparametric tests often require you to modify the hypotheses. For example, most
nonparametric tests about the population center are tests about the median instead of the
mean. The test does not answer the same question as the corresponding parametric
procedure.
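
The 1-sample sign test shows how the hypothesis shifts to the median. The sketch below is a minimal exact version; the function name, the data and the hypothesized median of 10 are made up for illustration.

```python
from math import comb

def sign_test(data, hypothesized_median):
    """Two-sided 1-sample sign test; returns (n_above, n_below, p_value)."""
    above = sum(1 for x in data if x > hypothesized_median)
    below = sum(1 for x in data if x < hypothesized_median)
    n = above + below  # values equal to the hypothesized median are dropped
    k = min(above, below)
    # Under H0 each value is above or below the median with probability 0.5,
    # so the smaller count follows Binomial(n, 0.5); double the tail for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return above, below, min(1.0, 2 * tail)

data = [12, 15, 11, 18, 14, 16, 13, 17, 19, 9]  # hypothetical scores
above, below, p = sign_test(data, 10)
print(above, below, round(p, 4))
```

Only the signs of the differences are used, not their sizes, which is why the test makes no assumption of normality and also why it is less powerful than a t-test when the t-test's assumptions hold.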

Multiple regression
Standardization of the coefficients is usually done to answer the question of which of the
independent variables has a greater effect on the dependent variable in a multiple regression
analysis, when the variables are measured in different units of measurement (for example,
income measured in dollars and family size measured in number of individuals).
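
A standardized coefficient is the unstandardized coefficient multiplied by the standard deviation of its predictor and divided by the standard deviation of the dependent variable. In the sketch below the data are made up, and the unstandardized coefficients (0.01 per dollar of income, 50 per family member) are assumed values chosen so that they reproduce the made-up spending figures exactly; no regression is actually fitted here.

```python
from statistics import stdev

def standardized_coefficient(b, x, y):
    """Standardized (beta) coefficient: b * sd(x) / sd(y)."""
    return b * stdev(x) / stdev(y)

# Hypothetical data for five households.
income = [20000, 30000, 40000, 50000, 60000]  # dollars
family_size = [2, 5, 3, 6, 4]                 # individuals
spending = [300, 550, 550, 800, 800]          # = 0.01 * income + 50 * family_size

b_income, b_family = 0.01, 50  # assumed unstandardized coefficients

beta_income = standardized_coefficient(b_income, income, spending)
beta_family = standardized_coefficient(b_family, family_size, spending)
print(round(beta_income, 3), round(beta_family, 3))
```

The raw coefficient for family size (50) looks far larger than the one for income (0.01) only because of the units; after standardization, income turns out to have the greater effect in this made-up example.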
