NB
-Mean should be used when the data set does not have outliers
-When outliers are present, the median describes the data set better than the mean
-Mode is mostly used when the data are not numeric, e.g. strongly disagree, disagree,
not sure, agree, strongly agree. The response with the highest frequency is the mode.
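A minimal sketch of the three points above, using Python's standard `statistics` module. The sales figures and survey responses are made-up illustrations, not data from these notes.

```python
from statistics import mean, median, mode

# Hypothetical daily sales; the last value is an outlier.
sales = [20, 22, 23, 25, 300]

mean_sales = mean(sales)      # pulled upward by the outlier
median_sales = median(sales)  # middle value, not affected by the outlier

# Hypothetical survey responses: no numbers, so the mode is the summary to use.
responses = ["agree", "disagree", "agree", "not sure", "agree", "strongly agree"]
most_common = mode(responses)  # the response with the highest frequency

print(mean_sales, median_sales, most_common)
```

With the outlier present the mean lands far above most of the values, while the median stays in the middle of the typical values.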
1. What level of measurement was used in the type of data we are analyzing?
Frequencies, e.g. 30 people were present while 5 were absent (these can add up
to any total, depending on the number of participants)
Proportions, e.g. 0.92 were okay while 0.08 were defective (these add up to
1)
Percentages always add up to 100
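The relationship between the three summaries can be sketched as follows. The attendance list is a hypothetical reconstruction of the 30 present / 5 absent example.

```python
from collections import Counter

# Hypothetical attendance records: 30 present, 5 absent.
attendance = ["present"] * 30 + ["absent"] * 5

frequencies = Counter(attendance)        # counts; total depends on participants
total = sum(frequencies.values())
proportions = {k: v / total for k, v in frequencies.items()}    # add up to 1
percentages = {k: 100 * p for k, p in proportions.items()}      # add up to 100

print(frequencies, proportions, percentages)
```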
NB/ did you know that age group (15-24 years) is categorical?
-Daily sales
-Temperature
-Age
The most common summary value for interval/ratio data is the mean, e.g. the mean
age, mean amount spent, the mean bars per week
Ordinal data can be classified with either nominal or interval/ratio data,
depending on the circumstances
Or is it one sample in which each observation is a measure or score for more than
one variable, e.g. where the same sample is measured twice?
-chi-square: uses nominal data and the results are presented in table form
https://www.youtube.com/results?search_query=choosing+which+statistical+test+to+use
Null hypothesis.
1. When the p-value is less than the alpha value, you reject the null hypothesis. It
means there is a statistically significant difference in the observed counts. Hence
the alternative hypothesis is accepted.
2. When the p-value is greater than the alpha value, you fail to reject the null
hypothesis. It means there is no statistically significant difference in the
observed counts.
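The decision rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names are hypothetical, and the p-value formula used here is valid only for a chi-square statistic with 1 degree of freedom (where the upper-tail probability equals erfc(sqrt(x / 2))).

```python
import math

def chi2_p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with DF = 1 only."""
    return math.erfc(math.sqrt(chi_square / 2))

def decide(p_value, alpha=0.05):
    """Apply the rule: p < alpha rejects the null, otherwise fail to reject."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical chi-square statistics, both with DF = 1.
for stat in (5.0, 1.0):
    p = chi2_p_value_df1(stat)
    print(stat, round(p, 4), decide(p))
```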
NB/ In manual chi-square tests, the value for chi-square is calculated, then the
critical value is determined from the table (the rows are the DF, the columns
are the sig. level). If the critical value is less than the chi-square value, then
the chi-square value lies in the rejection region. This means there is a difference,
and it is significant.
When the chi-square value > critical value: reject the null hypothesis, meaning
there is a difference and it is statistically significant. This is because the
chi-square value falls in the rejection region.
When the chi-square value < critical value: we fail to reject the null hypothesis,
meaning there is no statistically significant difference.
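A worked sketch of the manual procedure, assuming a hypothetical 2x2 table of observed counts. The critical value 3.841 is the standard table entry for DF = 1 at the 0.05 significance level.

```python
# Hypothetical 2x2 table of observed counts (rows: groups, columns: outcomes).
observed = [[30, 10],
            [20, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square = sum over all cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

# DF = (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Critical value read from a chi-square table for DF = 1, sig. level 0.05.
CRITICAL_VALUE = 3.841

decision = "reject H0" if chi_square > CRITICAL_VALUE else "fail to reject H0"
print(round(chi_square, 3), df, decision)
```

Here the calculated value (about 5.33) is greater than the critical value, so it falls in the rejection region.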
Beta is more complex. We do not set the Beta value directly; rather, the sample size,
significance level and effect size influence the Beta value and, similarly, they
influence the power: power = 1 - Beta
- Power is the probability that we will detect a difference that is actually
there, i.e. detect a true difference. Beta is the probability of a type II error,
which is failing to reject the null hypothesis when it should have been rejected.
To determine the type II error rate, the sample size, effect size and significance
level (alpha) should be known
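To illustrate how sample size, significance level and effect size drive Beta and power, here is a sketch for one specific case chosen for simplicity: a one-sided, one-sample z-test with known standard deviation. The function name and the numbers are assumptions for illustration, and the effect size is taken as a standardized difference (difference in means divided by the standard deviation).

```python
import math
from statistics import NormalDist

def power_one_sided_z(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test (an assumed, simple setting)."""
    z = NormalDist()                    # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha)       # cut-off set by the significance level
    return z.cdf(effect_size * math.sqrt(n) - z_crit)

power = power_one_sided_z(effect_size=0.5, n=25, alpha=0.05)
beta = 1 - power                        # probability of a type II error

print(round(power, 3), round(beta, 3))
```

Increasing the sample size or the effect size, or using a larger alpha, raises the power and lowers Beta.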
MEASUREMENT SCALE
Mr. MULI K. EMMANUEL
PhD Candidate, UoN
Operational definitions of variables yield information or data to be analyzed. The
choice of statistical procedure for the analysis is determined by the type of measurement scale
Nominal:
-merely groups subjects or cases from a sample into categories that share the
same characteristics, e.g. sex, race, marital status, employment status,
language, religion
Nominal scales are used for labeling variables, without any quantitative
value. Nominal scales could simply be called labels, e.g. gender
(dichotomous, i.e. only two possible values, like YES/NO or AM/PM), color,
MEAL PREFERENCE: Breakfast, Lunch, Dinner
Ordinal
Examples:
RANK: 1st place, 2nd place, ... last place
LEVEL OF AGREEMENT: No, Maybe, Yes
POLITICAL ORIENTATION: Left, Center, Right
Interval
Examples:
TIME OF DAY on a 12-hour clock
POLITICAL ORIENTATION: Score on standardized scale of political
orientation
OTHER scales constructed so as to possess equal intervals
Interval - time of day has equal intervals; on an analog (12-hr.) clock, the difference
between 1 and 2 pm is the same as the difference between 11 and 12 am
Interval scales are numeric scales in which we know not only the order, but also
the exact differences between the values. The classic example of an interval scale
is Celsius temperature because the difference between each value is the same. For
example, the difference between 60 and 50 degrees is a measurable 10 degrees.
Ratio
Ratio scales are the ultimate nirvana when it comes to measurement scales because
they tell us about the order, they tell us the exact value between units, AND they
also have an absolute zero, which allows for a wide range of both descriptive and
inferential statistics to be applied. At the risk of repeating myself, everything above
about interval data applies to ratio scales, plus ratio scales have a clear definition of
zero. Good examples of ratio variables include height and weight.
Examples:
RULER: inches or centimeters
YEARS of work experience
INCOME: money earned last year
NUMBER of children
GPA: grade point average
Ratio - 24-hr. time has an absolute 0 (midnight); 14 o'clock is twice as long from
midnight as 7 o'clock
When working with statistics, it's important to recognize the different types of data:
numerical (discrete and continuous), categorical, and ordinal.
A. Numerical data.
These data have meaning as a measurement, such as a person's height, weight, IQ,
or blood pressure; or they're a count, such as the number of stock shares a person
owns or how many teeth a dog has.
1. Discrete data represent items that can be counted; they take on possible
values that can be listed out. The list of possible values may be fixed (also
called finite); or it may go from 0, 1, 2, on to infinity
B. Categorical data:
NB/
Dichotomous data have only two possible answers, e.g. gender:
male/female
Dummy variables take only two possible values, 0 and 1. They signify
conceptual opposites: war vs. peace, fixed exchange rate vs. floating
exchange rate, etc.
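A small sketch of dummy coding, using the exchange-rate example. The list of observations and the choice of which category gets the value 1 are arbitrary assumptions for illustration.

```python
# Hypothetical observations of a dichotomous variable.
regimes = ["fixed", "floating", "floating", "fixed"]

# Dummy variable: 1 = fixed exchange rate, 0 = floating exchange rate.
fixed_rate_dummy = [1 if regime == "fixed" else 0 for regime in regimes]

print(fixed_rate_dummy)
```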
C. Ordinal data
mixes numerical and categorical data. The data fall into categories, but the
numbers placed on the categories have meaning. For example, rating a
restaurant on a scale from 0 (lowest) to 4 (highest) stars gives ordinal data.
Ordinal data are often treated as categorical, where the groups are ordered when
graphs and charts are made. However, unlike categorical data, the numbers do
have mathematical meaning. For example, if you survey 100 people and ask
them to rate a restaurant on a scale from 0 to 4, taking the average of the 100
responses will have meaning. This would not be the case with categorical data.
Kurtosis
Is the peakedness or flatness of the distribution
-Leptokurtic: has a high peak, with a kurtosis value of >3
-Mesokurtic: (Normal) has kurtosis of 3
-Platykurtic: has high flatness with a low peak, and a kurtosis value of <3
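The classification above can be sketched with the moment-based (population) kurtosis, i.e. the fourth central moment divided by the squared variance, for which a normal distribution gives 3. The function names, the tolerance and the two data sets are illustrative assumptions. Statistical packages often report excess kurtosis (kurtosis minus 3) or apply a sample-size correction, so their output may differ.

```python
import math

def kurtosis(data):
    """Population kurtosis: fourth central moment / (second central moment)^2."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2

def classify(k, tol=0.1):
    """Label a kurtosis value relative to the normal value of 3."""
    if math.isclose(k, 3, abs_tol=tol):
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"

flat = [1, 2, 3, 4, 5]                        # evenly spread, no peak
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]    # sharp peak with heavy tails

print(kurtosis(flat), classify(kurtosis(flat)))
print(kurtosis(peaked), classify(kurtosis(peaked)))
```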
The bell-shaped normal curve has probabilities that are found as the area between
any two z values.
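As a sketch, the area between two z values can be computed from the standard normal cumulative distribution function using Python's `statistics.NormalDist`. The helper name `area_between` is hypothetical.

```python
from statistics import NormalDist

def area_between(z_low, z_high):
    """Probability that a standard normal value lies between two z values."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(z_high) - z.cdf(z_low)

print(round(area_between(-1.0, 1.0), 4))
print(round(area_between(-1.96, 1.96), 4))
```

The second call gives roughly 0.95, which is why z = 1.96 is used for a two-sided 0.05 significance level.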
BINOMIAL DISTRIBUTION
PARAMETRIC VS NON-PARAMETRIC
The following is a list of the nonparametric tests, and their parametric alternatives.
NB/ Nonparametric tests often require you to modify the hypotheses. For example, most
nonparametric tests about the population center are tests about the median instead of the
mean. The test does not answer the same question as the corresponding parametric
procedure.
Multiple regression
Standardization of the coefficients is usually done to answer the question of which of the
independent variables has a greater effect on the dependent variable in a multiple regression
analysis, when the variables are measured in different units of measurement (for example,
income measured in dollars and family size measured in number of individuals).
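A minimal sketch of the idea, assuming the usual conversion: standardized coefficient = unstandardized coefficient * (standard deviation of the independent variable / standard deviation of the dependent variable). All coefficients and standard deviations below are hypothetical numbers, not results from any real analysis.

```python
# Hypothetical unstandardized coefficients from a regression of monthly
# spending on income (dollars) and family size (number of individuals).
b = {"income": 0.02, "family_size": 150.0}

# Hypothetical sample standard deviations of each variable.
sd_x = {"income": 15000.0, "family_size": 1.5}
sd_y = 500.0  # standard deviation of the dependent variable

# Standardized coefficient: effect in SD units of y per one SD change in x.
beta = {name: b[name] * sd_x[name] / sd_y for name in b}

strongest = max(beta, key=lambda name: abs(beta[name]))
print(beta, strongest)
```

The raw coefficient for income looks tiny only because income is measured in dollars; once standardized, the two variables can be compared directly.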