Cross-sectional survey: a different group of respondents is interviewed each time
Panel study: the same people are asked the same questions at various points in time
Coding: the process of translating information into numbers
Frequency table: a table that shows the result of totaling the number of cases in each category of your variable
Identify the unit of analysis, conceptually define your variable, operationally define your variable, then
evaluate the validity and reliability of your variable
CHAPTER 3: Measures of central tendency
Mean, median, mode
Mean: the arithmetic average; divide the sum of the values of the variable by the number of cases
Median: the value of the middle case (in ordered raw data, the case at position (N+1)/2; in tabular data,
the first category whose cumulative percent exceeds 50%)
Mode: the category with the most cases
Outliers
Raw vs. tabular data and calculating the measures of central tendency
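The three measures can be sketched for both raw and tabular data as follows. This is a minimal illustration using Python's standard library; the data values and the helper name `tabular_median` are hypothetical and chosen only to show how an outlier pulls the mean away from the median.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical raw data: one value per case; 40 is an outlier.
raw = [2, 3, 3, 4, 5, 5, 5, 7, 40]

print("mean:", mean(raw))         # pulled upward by the outlier
print("median:", median(raw))     # middle case, position (N + 1) / 2 once sorted
print("mode:", multimode(raw))    # category (or categories) with the most cases


def tabular_median(freq):
    """Return the first category whose cumulative percent exceeds 50%."""
    total = sum(freq.values())
    cumulative = 0
    for category in sorted(freq):
        cumulative += freq[category]
        if 100 * cumulative / total > 50:
            return category


# Tabular data: a frequency table mapping each category to its number of cases.
table = Counter(raw)
print("median from table:", tabular_median(table))
```

Both routes to the median agree here, while the mean sits well above it because of the single outlier.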
CHAPTER 4: Measures of dispersion
Range: highest value minus lowest value (one single number)
Interquartile range: the range within which the middle two-fourths of the population lie; value (75th
percentile) minus value (25th percentile)
Box-and-whiskers plot: a visual presentation of five points {Q0, Q1, Q2, Q3, Q4} = {minimum, 25th percentile,
median, 75th percentile, maximum}
Variance: the average squared distance from the mean
Standard deviation: the square root of the variance; compares the value of each case with the mean
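A short sketch of these dispersion measures, assuming hypothetical data and the population formulas (dividing by N, which matches "average squared distance from the mean"). Percentile conventions differ between textbooks; the `method="inclusive"` choice below is an assumption, not necessarily the one used in the course.

```python
from statistics import pstdev, pvariance, quantiles

# Hypothetical values of one interval-level variable.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)            # highest minus lowest

# Quartile cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1                                      # 75th minus 25th percentile

# The five points drawn in a box-and-whiskers plot.
five_points = (min(values), q1, q2, q3, max(values))

variance = pvariance(values)                       # average squared distance from the mean
std_dev = pstdev(values)                           # square root of the variance

print(value_range, iqr, five_points, variance, std_dev)
```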
Dichotomous variables: variables that may be nominal but have only two categories
Standard deviation for a dichotomous variable: the square root of p(1-p), where p is the proportion of
cases in one category (a percentage expressed as a decimal)
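The shortcut can be checked against the ordinary standard deviation formula. The sketch below assumes a hypothetical variable coded 0/1 and the population formula (dividing by N).

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical dichotomous variable coded 1 = yes, 0 = no.
answers = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

p = mean(answers)                 # proportion of cases coded 1
shortcut = sqrt(p * (1 - p))      # square root of p(1-p)
direct = pstdev(answers)          # standard deviation computed case by case

print(p, shortcut, direct)
```

The two results agree up to floating-point rounding, which is why the shortcut is safe to use once the variable is coded 0 and 1.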
CHAPTER 5: Continuous probability
Normal (bell) curve: symmetrical (no outliers to skew the mean away from the median) and unimodal (has a
single mode or peak); the peak of the curve has to be in the center of the distribution
Skewness: to the right or to the left
Z-score: the distance from the mean in terms of standard deviations, i.e. how many standard deviations a
case is from the mean
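As a small illustration, a z-score is the case's value minus the mean, divided by the standard deviation. The scores below are hypothetical and the function name `z_score` is chosen for this sketch.

```python
from statistics import mean, pstdev


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


# Hypothetical test scores.
scores = [60, 70, 80, 90, 100]
mu = mean(scores)
sigma = pstdev(scores)

z_high = z_score(100, mu, sigma)   # positive: above the mean
z_low = z_score(60, mu, sigma)     # negative: below the mean, same distance

print(z_high, z_low)
```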
A z-table
- the probability (always between 0 and 1) of being within a z-score below the mean is identical to that of
being within the same z-score above the mean (the negative sign does not matter in this case; it only
identifies below or above the mean)
- it gives the probability of being between the mean and the z-score
One-tailed vs. two-tailed probabilities: the probability of being in a tail vs. the probability of being in a
range that crosses the mean but does not include the two tails on either side of it
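Instead of a printed z-table, the same probabilities can be read off the standard normal curve with Python's standard library. This is only a sketch: the helper names are made up for illustration, and `mean_to_z` mirrors the table layout described above (area between the mean and the z-score).

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def mean_to_z(z):
    """Probability of lying between the mean and z (sign of z is irrelevant)."""
    return abs(standard_normal.cdf(z) - 0.5)


def one_tail(z):
    """Probability of lying in the single tail beyond z."""
    return 0.5 - mean_to_z(z)


def two_tails(z):
    """Probability of lying in either tail beyond -z or +z."""
    return 2 * one_tail(z)


def central_range(z):
    """Probability of lying in the range from -z to +z, which crosses the mean."""
    return 2 * mean_to_z(z)


for z in (-1.96, 1.96):
    print(z, round(mean_to_z(z), 4), round(one_tail(z), 4),
          round(two_tails(z), 4), round(central_range(z), 4))
```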
CHAPTER 6: Means testing
Type I error: a false positive finding; detects an effect that is not actually present
Type II error: a false negative finding; fails to detect an effect that is actually present
Means testing: comparing the distribution of a variable for an entire population with the distribution of the
same variable for a subset of cases
- is the mean for the subset (sample) different from that for the population, and is that due to chance?
- general standard: we want to be 95% certain, i.e. to have only a 5% chance of a Type I error
- we must find not only that the mean of the subgroup is different from that of the population but also that
its distribution is narrow enough to exclude the difference being caused by chance
- a population standard deviation tends to be bigger than the standard deviation of a sample drawn from it,
although an individual sample can show the opposite
Standard error of the mean
- it depends on the standard deviation of the values of X and the size of the sample
- the standard error is much tighter around the mean than is the standard deviation
- the standard error of the mean is the same thing as the standard deviation of estimates of the mean around
the actual mean
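The dependence on both quantities can be made concrete with the usual formula, standard deviation divided by the square root of the sample size. The numbers below are hypothetical.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of the mean: standard deviation over the square root of n."""
    return sd / sqrt(n)


# Same spread in X, two hypothetical sample sizes.
print(standard_error(10, 25))    # 2.0
print(standard_error(10, 100))   # 1.0
```

Quadrupling the sample size halves the standard error, and in both cases it is much tighter than the standard deviation of 10.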
T-score: compares a sample mean with a population mean to see how many standard errors away from the
population mean the sample mean is
- the number of standard errors a sample mean is from a population mean
T-distribution: its shape also depends on the sample size
D.f. = degrees of freedom: one less than the sample size n (n-1)
T-table
- in the bottom row (infinity sign) the t-scores actually correspond with z-scores because of the large
sample size
- the t-table gives the probability of being in the tail, whereas the z-table gives the probability
of being between the mean and the value of X
- if our t-score is greater than the value under p=0.05, we say that the difference between the population
and sample means is statistically significant at the 0.05 level
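A minimal sketch of the comparison, assuming hypothetical numbers and a two-tailed test. Python's standard library has no t-distribution, so the critical values are typed in by hand as they would be read from a printed t-table (2.064 for 24 degrees of freedom, 1.960 in the infinity row).

```python
from math import sqrt


def t_score(sample_mean, population_mean, sample_sd, n):
    """Number of standard errors the sample mean lies from the population mean."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - population_mean) / standard_error


# Hypothetical sample of 25 cases.
n = 25
t = t_score(sample_mean=54, population_mean=50, sample_sd=10, n=n)
df = n - 1

# Assumed table lookups: two-tailed p = 0.05 column.
critical_df24 = 2.064      # row for 24 degrees of freedom
critical_infinity = 1.960  # bottom row, identical to the z-score

significant = abs(t) > critical_df24
print(t, df, significant)
```

With these numbers the t-score of 2.0 would clear the bottom-row value of 1.960 but not the value for 24 degrees of freedom, which shows why the sample size matters when reading the table.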
Dummy variable: a dichotomous variable that has been coded with 0 and 1 (nominal-level data)
Confidence intervals around an estimated mean
Margin of error: if we sandwich the mean in a range that extends a distance of the margin of error on each
side, we are 95% confident that the actual mean falls in that range
- the two tails total a probability of 0.05 and they are symmetrical
- as long as we have the mean and either the standard error or the standard deviation and sample size, we
can calculate the margin of error around our estimate of the mean
- we want to use the t-score for which 95% of the distribution is symmetrical around the mean and 5% of
the distribution is in the two tails
- because sample size determines the margin of error, a target margin of error tells us how many people to
survey
- bigger sample size, smaller margin of error
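The points above can be sketched as follows, under stated assumptions: the margin of error is taken as the critical t-score times the standard error, the critical values are read by hand from a t-table (two-tailed, p = 0.05), and the sample-size helper uses the large-sample value 1.96. All function names and numbers are hypothetical.

```python
from math import ceil, sqrt


def margin_of_error(sd, n, t_crit):
    """Critical t-score times the standard error of the mean."""
    return t_crit * sd / sqrt(n)


def confidence_interval(estimate, sd, n, t_crit):
    """Range extending one margin of error on each side of the estimated mean."""
    moe = margin_of_error(sd, n, t_crit)
    return estimate - moe, estimate + moe


def sample_size_for(target_moe, sd, crit=1.96):
    """Smallest n whose margin of error does not exceed target_moe (large-sample value)."""
    return ceil((crit * sd / target_moe) ** 2)


# Hypothetical sample: mean 54, standard deviation 10, n = 25 (24 d.f., t = 2.064).
low, high = confidence_interval(54, 10, 25, t_crit=2.064)
print(low, high)

# Bigger sample, smaller margin of error (99 d.f., t is about 1.984).
print(margin_of_error(10, 25, 2.064), margin_of_error(10, 100, 1.984))

# How many people to survey for a margin of error of 2 when sd = 10.
print(sample_size_for(2, 10))
```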
CHAPTER 8: Describing the pattern
Figures (information in some kind of picture, e.g. graphs for interval-level data) and tables (numbers and words
presented in rows and columns, e.g. contingency tables for categorical data)
Graphs
- a linear or curvilinear relationship between two interval-level variables
- positive/negative relationship between the dependent variable (y-axis) and independent variable (x-axis)
Contingency tables
- the independent variable is in the columns and the dependent variable is in the rows
- collapse the data, as we do not want more than five categories (if collapsing interval-level data, keep the
size of the ranges equal; the lower end of each range is included, but the upper end only goes up to, and
does not include, the end point, e.g. $0-20,000 and $20,000-40,000)
- name: dependent variable by independent variable (e.g. Partisanship by Income)
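A small sketch of these conventions, using invented respondents: incomes are collapsed into equal-width ranges that include the lower end but not the upper end, the independent variable (income) goes in the columns, and the dependent variable (partisanship) goes in the rows. The bin width and all names are assumptions for illustration.

```python
from collections import Counter

BIN_WIDTH = 20_000  # equal-sized ranges


def income_bin(income):
    """Lower bound of the range: lower end included, upper end excluded."""
    return (income // BIN_WIDTH) * BIN_WIDTH


# Hypothetical respondents: (income, partisanship).
respondents = [
    (5_000, "Democrat"),
    (19_999, "Democrat"),
    (20_000, "Republican"),   # exactly 20,000 falls in the second range
    (35_000, "Democrat"),
    (39_999, "Republican"),
]

counts = Counter((party, income_bin(income)) for income, party in respondents)
columns = sorted({lower for _, lower in counts})   # independent variable
rows = sorted({party for party, _ in counts})      # dependent variable

print("Partisanship by Income")
print("".ljust(12) + "".join(f"${c:,}-{c + BIN_WIDTH:,}".rjust(18) for c in columns))
for party in rows:
    print(party.ljust(12) + "".join(str(counts[(party, c)]).rjust(18) for c in columns))
```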
It is a fallacy to take aggregate-level data and draw individual-level conclusions: this is the ecological
fallacy, or aggregation bias
- why is there a contradiction between the state-level data and the individual-level data? The difference is in
the unit of analysis