Types of Data5

Choosing which statistical test to use.
Search the link below on google
Three major reasons for testing:
I. Means (test for means, difference of two means/paired, difference
of two means/independent samples)
II. Proportions (test for proportion, difference of two proportions)
III. Relationships (chi-square, regression, ...)
NB
-The mean should be used when the data set does not have outliers
-When outliers are present, the median describes the data set better than the mean
-The mode is mostly used when the responses are not numbers, e.g. strongly disagree, disagree,
not sure, agree, strongly agree. The response with the highest frequency is the mode.
I.e. the mode is used when data is not numerical
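The NB above can be checked with a short sketch. The income figures and survey responses below are made up for illustration; only the Python standard library is assumed.

```python
import statistics

# Hypothetical incomes; the last value is an outlier.
incomes = [30, 32, 35, 31, 34, 500]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # barely affected by the outlier

# Hypothetical survey responses: not numerical, so the mode is the summary to use.
responses = ["agree", "disagree", "agree", "not sure", "strongly agree", "agree"]
most_common = statistics.mode(responses)    # the most frequent response

print(mean_income, median_income, most_common)
```

Here the mean lands far above every typical value, while the median stays among them, which is why the median is preferred when outliers are present.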
Mr. MULI K. EMMANUEL

PhD Candidate, UoN
When trying to figure out which statistical test to use, ask yourself the following
questions:
1. What level of measurement was used for the data we are analyzing?
Is the data nominal or interval/ratio?

Nominal data is also called categorical or qualitative data (it is often loosely labelled nonparametric)
Examples: frequencies, proportions and percentages
Frequencies, e.g. 30 people were present while 5 were absent (this can add up
to any total depending on the number of participants)
Proportions, e.g. 0.92 was okay while 0.08 was defective (these add up to
1)
Percentages add up to 100
Tests that involve nominal data are:
-Test for proportion
-Difference in two proportions
-Chi-square test for independence
NB/ Did you know that an age group (e.g. 15-24 years) is categorical?
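As an illustration of a test for a proportion, here is a minimal sketch of a one-sample z-test. The function name, the counts and the hypothesized proportion of 0.80 are assumptions made for this example, and the normal approximation it uses is only reasonable for fairly large samples.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z(successes, n, p0):
    """Return (z, two-sided p-value) for H0: population proportion == p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 92 of 100 items were okay; H0 says 80% are okay.
z, p_value = one_sample_proportion_z(92, 100, 0.80)
print(round(z, 2), round(p_value, 4))
```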

Interval/ratio data is also called quantitative data. Examples:
-Daily sales
-Temperature
-Age
The most common summary value for interval/ratio data is the mean, e.g. the mean
age, the mean amount spent, the mean bars per week
Tests that involve interval/ratio data are:
1. Test for means
2. Difference of two means/paired
3. Difference of two means/independent samples
4. Regression
Ordinal data can be classified with either nominal or interval/ratio data,
depending on the circumstances
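The difference between the paired and the independent-samples versions of the test can be sketched as follows. The scores are hypothetical, and the independent version shown is Welch's t statistic, which is one common choice rather than the only one. Turning either statistic into a p-value needs the t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(before, after):
    """t statistic for the mean of paired differences (same subjects measured twice)."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def independent_t(x, y):
    """Welch's t statistic for two independent samples."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se

# Hypothetical scores for five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 16]

print(round(paired_t(before, after), 3))       # treats the data as paired
print(round(independent_t(after, before), 3))  # treats the data as two separate groups
```

The same numbers give a much larger statistic when the pairing is used, because subject-to-subject variation cancels out in the differences.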
2. How many samples do we have?

Is it one sample in which we are testing the relevant statistic against a
hypothesized value, or are there two samples being compared with one another?
Or is it one sample in which each observation is a measure or score on more than
one variable, or in which the same sample is measured twice (paired data)?
3. What is the purpose of our analysis?
We can be testing against a hypothesized value,
comparing two statistics,
or looking for a relationship (regression & chi-square)
Regression and chi-square both look for a relationship, but:
-Chi-square: uses nominal data and the results are presented in table form
-Regression: uses interval/ratio data and the results are presented in a scatter plot
For more examples, see the link below
https://www.youtube.com/results?search_query=choosing+which+statistical+test+to+use
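The three questions can be sketched as a small lookup function. The function name and the string labels are hypothetical, chosen for this example; the returned test names are the ones listed in these notes.

```python
def suggest_test(data_level, samples, purpose):
    """Map the three questions to a test named in these notes.

    data_level: "nominal" or "interval/ratio"
    samples:    "one", "two independent" or "paired"
    purpose:    "hypothesized value", "compare" or "relationship"
    """
    if purpose == "relationship":
        if data_level == "nominal":
            return "chi-square test for independence"
        return "regression"
    if data_level == "nominal":
        if purpose == "hypothesized value":
            return "test for proportion"
        return "difference of two proportions"
    if purpose == "hypothesized value":
        return "test for means"
    if samples == "paired":
        return "difference of two means (paired)"
    return "difference of two means (independent samples)"

print(suggest_test("interval/ratio", "paired", "compare"))
```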

Hypothesis Testing
Null hypothesis.
1. When the p-value is less than the alpha value, you reject the null hypothesis. It
means the observed result is statistically significant. Hence the
alternative hypothesis is supported.
This means that there is a difference between variables
p < alpha = REJECT NULL
2. When the p-value is greater than the alpha value, you accept (or rather, fail to reject)
the null hypothesis. It means the observed result is not statistically
significant.
This means that no difference between variables was detected
p > alpha = ACCEPT (FAIL TO REJECT) NULL

According to John W. Creswell, we do not accept the null hypothesis; we fail to
reject it.
We don't prove the null hypothesis; knowledge is conjectural.
NB/ In manual chi-square tests, the chi-square value is calculated, then the
critical value is read from the table using the degrees of freedom (DF) and the
significance level. If the critical value is less than the chi-square value, then the
chi-square value lies in the rejection region. This means there is a difference; it
is significant.
The critical value plays the same role for the test statistic that alpha plays for the p-value.
When the chi-square value > critical value: reject the null hypothesis, meaning
there is a difference and it is statistically significant. This is because the chi-square
value falls in the rejection region.
When the chi-square value < critical value: we fail to reject the null hypothesis,
meaning no difference was detected; the result is not statistically significant.
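The manual chi-square procedure can be sketched in a few lines. The 2x2 table of counts is hypothetical; 3.841 is the standard table value for 1 degree of freedom at the 0.05 significance level.

```python
# Hypothetical observed counts (rows: group A / group B, columns: yes / no).
observed = [[30, 10],
            [20, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-square = sum over all cells of (observed - expected)^2 / expected.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1) * (columns - 1)
critical_value = 3.841  # table value for df = 1 at the 0.05 level

reject_null = chi_square > critical_value
print(round(chi_square, 3), df, reject_null)
```

With these counts the chi-square value exceeds the critical value, so it falls in the rejection region and the null hypothesis of independence is rejected.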
Type I error = the researcher incorrectly rejects a true null hypothesis
Type II error = the researcher incorrectly fails to reject a false null hypothesis

-The probability of committing a Type I error is the alpha value (a false positive)
-The probability of committing a Type II error is the beta value (a false negative)
Alpha is mostly set at 5%, that is, the probability of committing a Type I error is 5%
Beta is more complex. We do not set the beta value directly; rather, the sample size,
significance level and effect size influence the beta value and, similarly, they
influence the power: power = 1 - beta
- Power is the probability that we will detect a difference that is actually
there (detect a true difference). Beta is the probability of a Type II error, which is
failing to reject the null when it should have been rejected.
To determine the Type II error rate, the sample size, effect size and significance
level (alpha) should be known
When the power value reduces, the beta value increases
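How sample size, significance level and effect size drive beta and power can be illustrated with a sketch for a two-sided one-sample z-test. This test is chosen only because its power has a simple closed form; the function name is hypothetical, and the effect size is assumed to mean the true difference divided by the standard deviation.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (known standard deviation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Probability that the test statistic lands in either rejection region.
    upper = 1 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower

for n in (10, 30, 100):
    power = z_test_power(0.5, n)
    beta = 1 - power
    print(n, round(power, 3), round(beta, 3))
```

Power rises and beta falls as the sample size grows. With an effect size of zero the function returns alpha, the Type I error rate.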
MEASUREMENT SCALE
Operational definitions of variables yield the information or data that is to be analyzed. The
choice of the statistical procedure of analysis is determined by the type of measurement scale
Nominal:
-Is the lowest level of measurement
-Merely groups subjects or cases from the sample into categories that share the
same characteristics, e.g. sex, race, marital status, employment status,
language, religion
Nominal scales are used for labeling variables, without any quantitative
value. Nominal scales could simply be called labels, e.g. gender
(dichotomous, i.e. only two possible values, as with YES/NO or AM/PM), color
MEAL PREFERENCE: Breakfast, Lunch, Dinner
RELIGIOUS PREFERENCE: 1 = Buddhist, 2 = Muslim, 3 = Christian, 4 = Jewish, 5 = Other
POLITICAL ORIENTATION: Republican, Democratic, Libertarian, Green, ODM, NASA, JUBILEE
Ordinal
Ordinal refers to order in measurement. An ordinal scale indicates direction, in addition to
providing nominal information. Low/Medium/High; or Faster/Slower are examples of ordinal
levels of measurement. Ranking an experience as a "nine" on a scale of 1 to 10 tells us that it was
higher than an experience ranked as a "six." Many psychological scales or inventories are at the
ordinal level of measurement.
Examples:
RANK: 1st place, 2nd place, ... last place
LEVEL OF AGREEMENT: No, Maybe, Yes
POLITICAL ORIENTATION: Left, Center, Right

With ordinal scales, it is the order of the values that is important and
significant, but the differences between them are not really known.
Ordinal scales are typically measures of non-numeric concepts like
satisfaction, happiness and discomfort.
Interval
An example of an interval scale is temperature, either measured on a Fahrenheit or Celsius scale
Examples:
TIME OF DAY on a 12-hour clock
POLITICAL ORIENTATION: Score on standardized scale of political
orientation
OTHER scales constructed so as to possess equal intervals
Interval: time of day has equal intervals. On an analog (12-hr.) clock, the difference between 1 and
2 pm is the same as the difference between 11 am and 12 noon
Interval scales are numeric scales in which we know not only the order, but also
the exact differences between the values. The classic example of an interval scale
is Celsius temperature because the difference between each value is the same. For
example, the difference between 60 and 50 degrees is a measurable 10 degrees, as
is the difference between 80 and 70 degrees. Time is another good example of an
interval scale in which the increments are known, consistent, and measurable.
Ratio
Ratio scales are the ultimate nirvana when it comes to measurement scales because
they tell us about the order, they tell us the exact value between units, AND they
also have an absolute zero, which allows for a wide range of both descriptive and
inferential statistics to be applied. At the risk of repeating myself, everything above
about interval data applies to ratio scales + ratio scales have a clear definition of
zero. Good examples of ratio variables include height and weight.
Ratio scales provide a wealth of possibilities when it comes to statistical
analysis. These variables can be meaningfully added, subtracted, multiplied, divided
(ratios). Central tendency can be measured by mode, median, or mean; measures of
dispersion, such as standard deviation and coefficient of variation can also be
calculated from ratio scales.
In addition to possessing the qualities of nominal, ordinal, and interval scales, a
ratio scale has an absolute zero (a point where none of the quality being measured
exists). Using a ratio scale permits comparisons such as being twice as high, or
one-half as much. Reaction time (how long it takes to respond to a signal of some
sort) uses a ratio scale of measurement -- time. Although an individual's reaction
time is always greater than zero, we conceptualize a zero point in time, and can
state that a response of 24 milliseconds is twice as fast as a response time of 48
milliseconds.
Examples:
RULER: inches or centimeters
YEARS of work experience
INCOME: money earned last year
NUMBER of children
GPA: grade point average
Ratio: 24-hr. time has an absolute 0 (midnight); 14 o'clock is twice as long from
midnight as 7 o'clock
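The practical difference between interval and ratio scales can be seen by changing units. In this sketch (temperatures and weights are illustrative values), differences on an interval scale stay consistent, but "twice as much" only survives a unit change on a ratio scale.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval scale: equal differences stay equal after converting units ...
diff_low = c_to_f(60) - c_to_f(50)
diff_high = c_to_f(80) - c_to_f(70)

# ... but ratios do not: 20 C looks like "twice" 10 C only in Celsius.
ratio_c = 20.0 / 10.0
ratio_f = c_to_f(20.0) / c_to_f(10.0)

# Ratio scale: weight has an absolute zero, so the ratio survives a unit change.
KG_TO_LB = 2.20462
ratio_kg = 80.0 / 40.0
ratio_lb = (80.0 * KG_TO_LB) / (40.0 * KG_TO_LB)

print(diff_low, diff_high)
print(ratio_c, round(ratio_f, 2))
print(ratio_kg, round(ratio_lb, 2))
```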

Types of Statistical Data: Numerical, Categorical, and Ordinal
When working with statistics, it's important to recognize the different types of data:
numerical (discrete and continuous), categorical, and ordinal.
A. Numerical data.
These data have meaning as a measurement, such as a person's height, weight, IQ,
or blood pressure; or they're a count, such as the number of stock shares a person
owns or how many teeth a dog has.
1. Discrete data represent items that can be counted; they take on possible
values that can be listed out. The list of possible values may be fixed (also
called finite); or it may go from 0, 1, 2, on to infinity
2. Continuous data represent measurements; their possible values cannot be
counted and can only be described using intervals on the real number line.
For example, the exact amount of gas purchased at the pump for cars with
20-gallon tanks would be continuous data from 0 gallons to 20 gallons,
represented by the interval [0, 20], inclusive. You might pump 8.40 gallons,
or 8.41, or 8.414863 gallons, or any possible number from 0 to 20.
B. Categorical data:
NB/
Dichotomous data has only two possible answers, e.g. gender:
male/female
Dummy variables take only two possible values, 0 and 1. They signify
conceptual opposites: war vs. peace, fixed exchange rate vs. floating
exchange rate, etc.
Categorical data represent characteristics such as a person's gender, marital
status, hometown, or the types of movies they like. Categorical data can take on
numerical values (such as "1" indicating male and "2" indicating female), but
those numbers don't have mathematical meaning. You couldn't add them
together, for example. (Other names for categorical data are qualitative data, or
Yes/No data.)
C. Ordinal data
mixes numerical and categorical data. The data fall into categories, but the
numbers placed on the categories have meaning. For example, rating a
restaurant on a scale from 0 (lowest) to 4 (highest) stars gives ordinal data.
Ordinal data are often treated as categorical, where the groups are ordered when
graphs and charts are made. However, unlike categorical data, the numbers do
have mathematical meaning. For example, if you survey 100 people and ask
them to rate a restaurant on a scale from 0 to 4, taking the average of the 100
responses will have meaning. This would not be the case with categorical data.
Normal vs. Binomial:
NORMAL (z) DISTRIBUTION
The normal (z) distribution is a continuous distribution that arises in many
natural processes. "Continuous" means that between any two data values we
could (at least in theory) find another data value. For example, men's heights
vary continuously and are the result of so many tiny random influences that the
overall distribution of men's heights in America is very close to normal.
Another example is the data values that we would get if we repeatedly
measured the mass of a reference object on a pan balance; the readings would
differ slightly because of random errors, and the readings taken as a whole
would have a normal distribution.
We use skewness and kurtosis to assess the normality of data.
Skewness is a measure of the symmetry of a distribution. A normal distribution is
symmetric and has 0 skewness
A skewness of 0 means the data is perfectly symmetric (not skewed)
When the skewness is less than -1 or greater than +1 we say that the data is skewed
A skewness between -1 and +1 means the distribution is approximately symmetric
Kurtosis
Is the peakedness or flatness of the distribution
-Leptokurtic: a high peak, with a kurtosis value > 3
-Mesokurtic (normal): has a kurtosis of 3
-Platykurtic: flat, with a low peak; has a kurtosis value < 3
The bell-shaped normal curve has probabilities that are found as the area under the curve between
any two z values.
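Skewness and kurtosis can be computed from the central moments of the data. The sketch below uses the simple population (moment-based) formulas, under which a normal distribution has kurtosis 3; statistical packages often apply small-sample corrections or report excess kurtosis (kurtosis minus 3), so their output may differ. The data sets are made up.

```python
from statistics import mean

def central_moment(data, k):
    """k-th central moment: the average of (x - mean)^k."""
    m = mean(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    """Moment-based skewness: m3 / m2^1.5 (0 for symmetric data)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Moment-based kurtosis: m4 / m2^2 (3 for a normal distribution)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]           # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]    # hypothetical data with a long right tail

print(skewness(symmetric), kurtosis(symmetric))
print(round(skewness(right_skewed), 2), round(kurtosis(right_skewed), 2))
```

The symmetric data set has a skewness of 0 and a kurtosis below 3 (platykurtic), while the data set with the long right tail has a skewness above +1.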
BINOMIAL DISTRIBUTION
A binomial distribution is very different from a normal distribution, and yet if
the sample size is large enough, the shapes will be quite similar.
The key difference is that a binomial distribution is discrete, not continuous. In
other words, it is NOT possible to find a data value between two adjacent data
values (for example, there is no count between 3 and 4 successes).
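The similarity of shape for large samples can be checked numerically. This sketch compares exact binomial probabilities with the matching area under a normal curve having the same mean and standard deviation; n = 100 and p = 0.5 are illustrative choices, and the half-unit adjustment is the usual continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Exact probability of k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 100, 0.5
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

for k in (45, 50, 55):
    exact = binomial_pmf(k, n, p)
    # Continuity correction: the discrete value k maps to the interval k - 0.5 to k + 0.5.
    normal_area = approx.cdf(k + 0.5) - approx.cdf(k - 0.5)
    print(k, round(exact, 4), round(normal_area, 4))
```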
PARAMETRIC VS NON-PARAMETRIC
PARAMETRIC: normally distributed, continuous data; in theory there is a value
between any two data values. Statistics are measures computed from a sample to
estimate population parameters
NON-PARAMETRIC/ RANK SCORE: non-normal distribution, or
discrete data with no value between two adjacent data values. Data are in the form of ranks and not
scores
Non-normal data can sometimes be transformed to approximately normal data to avoid performing
nonparametric tests, which are less powerful. After data transformation,
parametric tests are conducted.
Alternative parametric tests

When a choice exists between using a parametric or a nonparametric procedure, and you are
relatively certain that the assumptions for the parametric procedure are satisfied, then use the
parametric procedure.
The following is a list of the nonparametric tests, and their parametric alternatives.
Nonparametric test        Alternative parametric test
1-sample sign test        1-sample Z-test, 1-sample t-test
1-sample Wilcoxon test    1-sample Z-test, 1-sample t-test
Mann-Whitney test         2-sample t-test
Kruskal-Wallis test       One-way ANOVA
Mood's Median test        One-way ANOVA
Friedman test             Two-way ANOVA
NB/ Nonparametric tests often require you to modify the hypotheses. For example, most
nonparametric tests about the population center are tests about the median instead of the
mean. The test does not answer the same question as the corresponding parametric
procedure.
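To show what a test about the median looks like, here is a minimal sketch of the 1-sample sign test. The function name, the data and the hypothesized median of 12.5 are made up for illustration. The test only counts how many observations fall above and below the hypothesized median, so it makes no assumption of normality.

```python
from math import comb

def sign_test_p_value(data, hypothesized_median):
    """Two-sided 1-sample sign test p-value for H0: median == hypothesized_median."""
    above = sum(1 for x in data if x > hypothesized_median)
    below = sum(1 for x in data if x < hypothesized_median)
    n = above + below        # observations equal to the hypothesized median are dropped
    k = min(above, below)
    # Under H0 each observation is above or below with probability 0.5,
    # so the smaller count follows a Binomial(n, 0.5) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical sample of 10 observations.
sample = [12, 15, 11, 18, 20, 14, 17, 19, 16, 13]
p_value = sign_test_p_value(sample, 12.5)
print(p_value)
```

With 8 values above and 2 below the hypothesized median, the p-value is greater than 0.05, so the null hypothesis about the median is not rejected at the 5% level.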
Multiple regression
Standardization of the coefficient is usually done to answer the question of which of the
independent variables have a greater effect on the dependent variable in a multiple regression
analysis, when the variables are measured in different units of measurement (for example,
income measured in dollars and family size measured in number of individuals).
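Standardization can be sketched for a regression with two predictors. The data are hypothetical, and the helper below solves the least-squares equations for exactly two predictors; real analyses would normally use a statistics package. A standardized coefficient is the unstandardized slope multiplied by the standard deviation of its predictor and divided by the standard deviation of the dependent variable.

```python
from statistics import mean, stdev

def ols_two_predictors(x1, x2, y):
    """Least-squares slopes (b1, b2) for y = b0 + b1*x1 + b2*x2."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Hypothetical data: income (dollars), family size (persons), spending (dollars).
income = [30000, 45000, 52000, 61000, 75000, 88000]
family_size = [2, 4, 3, 5, 4, 6]
spending = [1200, 1900, 1850, 2500, 2600, 3300]

b_income, b_family = ols_two_predictors(income, family_size, spending)

# Standardized coefficient = unstandardized slope * sd(predictor) / sd(dependent variable).
beta_income = b_income * stdev(income) / stdev(spending)
beta_family = b_family * stdev(family_size) / stdev(spending)

print(b_income, b_family)        # in original units, not directly comparable
print(beta_income, beta_family)  # unit-free, comparable with each other
```

The raw slopes depend on the units (dollars versus persons), while the standardized coefficients are expressed in standard deviations and can be compared directly.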
