
CHAPTER 1: Political use of numbers: Lies and statistics

Scientific method - how does the world work?


- research question
- independent variable (cause) vs. dependent variable (effect)
- empirical tests confirm or disconfirm the hypothesis
- experiments
- scientific observation
- public records - any information gathered and maintained by a governmental body that is openly
available
- surveys, questionnaires
- qualitative vs. quantitative data (statistical analysis)
Descriptive vs. inferential statistics - the entire population is being studied vs. only a sample is being studied
Univariate statistics - statistics that analyze a single variable (e.g. measures of central tendency and measures of
dispersion)
Bivariate statistics - statistics that analyze two variables
A statistician says a relationship is significant if she is confident that a relationship exists, however weak it
might be
CHAPTER 2: Measurement
Conceptual definition - the dictionary definition of a concept; what do we visualize when we use a term?
Operational definition - the process by which we translate our observations of reality into a measurement
Reifying concepts like democracy - discussing them as if they are real when they are actually abstractions
Indicators - characteristics reflecting different aspects of a concept
Levels of measurement:
- classify, order, scale, locate absolute zero
- nominal, ordinal (categorical) and interval, ratio (numerical)
Nominal
- gender, race, religion, hair color, astrological sign - no natural order, only categories
Ordinal
- class, educational degrees, socioeconomic status - order from less to more but no precise distance
Interval
- change in income, balance of payments - can have a negative value, a uniform amount between units
Ratio
- unemployment rate, age, income - only a positive value, starts from zero, two ratio values can be
compared proportionally, a uniform amount
- locate absolute zero (e.g. 0 years is absence of any years)
- a percentage or physical measure
Be careful: 2 options + an unsure/no opinion/don't know option is nominal, not ordinal, because you do not know
where the third option stands
Validity - it makes logical sense to measure our concept in this way
Face validity - a particular measure is valid because it makes sense
Consensual validity - a measure has widespread use
Associational validity - our measure is correlated with other measures connected to the concept in which we
are interested
Predictive validity - a measure is a good predictor of an effect we are trying to explain by our concept
Reliability - the measurement is consistent; the results do not change with time or researcher

Cross-sectional survey - a different group of respondents is interviewed every time
Panel study - the same people are asked the same questions at various points in time
Coding - the process of translating information into numbers
Frequency - a table that shows the result of totaling the number of cases in each category of your variable
Identify the unit of analysis → conceptually define your variable → operationally define your variable →
evaluate the validity and reliability of your variable
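A minimal sketch of the coding and frequency steps, using hypothetical party-identification responses (the labels, numeric codes, and counts below are invented for illustration):

```python
from collections import Counter

# Hypothetical survey responses for a nominal variable ("party identification").
responses = ["Democrat", "Republican", "Independent", "Democrat",
             "Republican", "Democrat", "Independent", "Democrat"]

# Coding: translate the labels into numbers (the codes themselves are arbitrary).
codes = {"Democrat": 1, "Republican": 2, "Independent": 3}
coded = [codes[r] for r in responses]

# Frequency table: total the number of cases in each category of the variable.
counts = Counter(responses)
n = len(responses)
for category, count in counts.items():
    print(f"{category:12s} {count:3d}  {100 * count / n:5.1f}%")
```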
CHAPTER 3: Measures of central tendency
Mean, median, mode
Mean - arithmetic average; divide the sum of the values of our variable by the number of cases
Median - the value of the middle case (once you order raw data, it is case [(N+1)/2]; or you find cumulative percent > 50%
in tabular data)
Mode - the category with the most cases
Outliers
Raw vs. tabular data and calculating the measures of central tendency
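A small worked example of the three measures on hypothetical raw data (the ages below are invented), showing how an outlier pulls the mean away from the median:

```python
import statistics

# Hypothetical raw data: ages of eight respondents (a ratio-level variable).
ages = [19, 22, 22, 25, 31, 34, 40, 67]

mean = statistics.mean(ages)      # sum of the values divided by the number of cases
median = statistics.median(ages)  # value of the middle case(s) after ordering
mode = statistics.mode(ages)      # category (value) with the most cases

# The outlier (67) pulls the mean above the median.
print(mean, median, mode)
```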
CHAPTER 4: Measures of dispersion
Range - highest value minus lowest value (one single number)
Interquartile range - the range within which the middle two-fourths of the population lie: value at the 75th
percentile minus value at the 25th percentile
Box-and-whiskers plot - a visual presentation of five points {Q0, Q1, Q2, Q3, Q4} = {minimum, 25th percentile,
median, 75th percentile, maximum}
Variance - the average squared distance from the mean
Standard deviation - the square root of the variance; compares the value of each case with the mean
Dichotomous variables - variables that may be nominal but have only two categories
Standard deviation for a dichotomous variable is the square root of p(1-p) [p being the proportion in one category]
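A sketch of the Chapter 4 measures on the same hypothetical ages, plus the dichotomous-variable formula (the proportion p = 0.6 is assumed for illustration):

```python
import statistics
from math import sqrt

ages = [19, 22, 22, 25, 31, 34, 40, 67]  # hypothetical raw data from Chapter 3

data_range = max(ages) - min(ages)            # highest value minus lowest value
q1, q2, q3 = statistics.quantiles(ages, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1                                 # interquartile range

variance = statistics.pvariance(ages)         # average squared distance from the mean
std_dev = statistics.pstdev(ages)             # square root of the variance

# Dichotomous variable: with proportion p in one category, the standard deviation
# is sqrt(p(1-p)); p = 0.6 is an assumed value.
p = 0.6
dichotomous_sd = sqrt(p * (1 - p))

print(data_range, iqr, variance, std_dev, dichotomous_sd)
```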
CHAPTER 5: Continuous probability
Normal (bell) curve - symmetrical (no outliers to skew the mean away from the median) and unimodal (has a
single mode or peak); the peak of the curve has to be in the center of the distribution
Skewness - to the right or left
Z-score - distance from the mean in terms of standard deviations; how many standard deviations a case is from
the mean
A z-table
- the probability (always between 0 and 1) of being within a z-score below the mean is identical to being
within a z-score above the mean (negative sign does not matter in this case, it only identifies below or
above the mean)
- what is the probability of being between the mean and the z-score
One-tailed vs. two-tailed probabilities - what is the probability of being in a tail vs. in a range that crosses the
mean but does not include the two tails on either side of it
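A short illustration of z-scores and the probabilities a z-table gives you, using Python's standard normal distribution; the population mean, standard deviation, and case value are assumed numbers:

```python
from statistics import NormalDist

# Hypothetical example: population mean 100, standard deviation 15, case value 120.
mu, sigma, x = 100, 15, 120

z = (x - mu) / sigma  # distance from the mean in standard deviations

dist = NormalDist()                  # standard normal curve
p_between = dist.cdf(abs(z)) - 0.5   # probability of being between the mean and the z-score
p_one_tail = 1 - dist.cdf(abs(z))    # one-tailed probability (one tail only)
p_two_tails = 2 * p_one_tail         # two-tailed probability (both tails)

print(z, p_between, p_one_tail, p_two_tails)
```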
CHAPTER 6: Means testing
Type I error - a false positive finding; detects an effect that is not actually present
Type II error - a false negative finding; fails to detect an effect that is actually present
Means testing - comparing the distribution of a variable for an entire population with the distribution of the
same variable for a subset of cases
- is the mean for the subset (sample) different from that for the population, and is that difference due to chance?
- general standard: we want to be 95% certain or to have only a 5% chance of a Type I error

- find not only that the mean of the subgroup is different from the population mean but also that its distribution
is narrow enough to exclude the difference being caused by chance
- a population standard deviation is always bigger than a sample standard deviation
Standard error of the mean
- it is dependent on the standard deviation of the values of X and the size of the sample
- the standard error is much tighter around the mean than is the standard deviation
- the standard error of the mean is the same thing as the standard deviation of estimates of the mean around
the actual mean
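A minimal sketch of the standard error calculation on a hypothetical sample (the income values are invented):

```python
import statistics
from math import sqrt

# Hypothetical sample of incomes (in thousands).
sample = [32, 41, 28, 55, 47, 39, 44, 36, 50, 42]

n = len(sample)
s = statistics.stdev(sample)   # sample standard deviation (n - 1 in the denominator)
se = s / sqrt(n)               # standard error of the mean: s / sqrt(n)

# The standard error is much smaller than the standard deviation.
print(statistics.mean(sample), s, se)
```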
T-score - comparing a sample mean with a population mean to see how many standard errors away from the
mean the sample is
- the number of standard errors a sample mean is from a population mean
T-distribution is also dependent on the sample size
D.f. = degrees of freedom - one less than n, the sample size (n-1)
T-table
- in the bottom row (infinity sign) the t-scores actually correspond with z-scores because of the large
sample size
- the t-table is giving you the probability of being in the tail, whereas the z-table is giving you the probability
of being between the mean and the value of X
- if our t-score is greater than the value under p=0.05, we say that the difference between the population
and sample means is statistically significant at the 0.05 level
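A sketch of a one-sample t-score using the same hypothetical sample and an assumed population mean of 40; the critical value 2.262 is the standard two-tailed t-table value for p = 0.05 at 9 degrees of freedom:

```python
import statistics
from math import sqrt

# Hypothetical question: is this sample's mean different from a population mean of 40?
population_mean = 40
sample = [32, 41, 28, 55, 47, 39, 44, 36, 50, 42]

n = len(sample)
sample_mean = statistics.mean(sample)
se = statistics.stdev(sample) / sqrt(n)      # standard error of the mean

t = (sample_mean - population_mean) / se     # number of standard errors from the population mean
df = n - 1                                   # degrees of freedom

# Compare |t| with the t-table value for p = 0.05 at df = 9 (about 2.262, two-tailed);
# if |t| is larger, the difference is statistically significant at the 0.05 level.
print(t, df)
```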
Dummy variable - a dichotomous variable that has been coded with 0 and 1 (nominal level data)
Confidence intervals around an estimated mean
Margin of error - if we sandwich the mean in a range that extends a distance of the margin of error on each
side, we are 95% confident that the actual mean falls in that range
- the two tails total a probability of 0.05 and they are symmetrical
- as long as we have the mean and either the standard error or the standard deviation and sample size, we
can calculate the margin of error around our estimate of the mean
- we want to use the t-score for which 95% of the distribution is symmetrical around the mean and 5% of
the distribution is in the two tails
- the role of sample size in determining the margin of error helps us decide how many people to survey
- bigger sample size, smaller margin of error
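A small worked example of the margin of error and a 95% confidence interval; the sample mean, standard deviation, and sample size are assumed numbers, and 1.96 is the large-sample (z-score) critical value:

```python
from math import sqrt

# Hypothetical large sample: for big n the 95% t-score approaches the z-score 1.96.
sample_mean = 42.5
s = 8.0      # sample standard deviation
n = 400      # sample size

se = s / sqrt(n)
margin_of_error = 1.96 * se   # 5% of the distribution split between the two tails

lower, upper = sample_mean - margin_of_error, sample_mean + margin_of_error
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")

# Bigger sample size -> smaller standard error -> smaller margin of error.
```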
CHAPTER 8: Describing the pattern
Figures (information in some kind of picture, e.g. graphs for interval-level data) and tables (numbers and words
presented in rows and columns, e.g. contingency tables for categorical data)
Graphs
- a linear or curvilinear relationship between two interval-level variables
- positive/negative relationship between the dependent variable (y-axis) and independent variable (x-axis)
Contingency tables
- the independent variable is in the columns and the dependent variable is in the rows
- collapsing data, as we do not want more than five categories (if collapsing interval-level data, keep the
size of the ranges equal; the lower end of a range is included but the upper end only goes up to, but does not
include, the end point, e.g. $0-20,000 and $20,000-40,000)
- name: dependent variable by independent variable (e.g. Partisanship by Income)
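A minimal sketch of a contingency table ("Partisanship by Income") built with pandas, assuming pandas is available; the individual-level data below are invented for illustration:

```python
import pandas as pd  # assumes pandas is installed

# Hypothetical individual-level data: income category (independent variable)
# and partisanship (dependent variable).
df = pd.DataFrame({
    "income":       ["low", "low", "middle", "middle", "high", "high", "low", "high"],
    "partisanship": ["Dem", "Dem", "Dem",    "Rep",    "Rep",  "Rep",  "Rep", "Dem"],
})

# Dependent variable in the rows, independent variable in the columns,
# with percentages computed within each column.
table = pd.crosstab(df["partisanship"], df["income"], normalize="columns") * 100
print(table.round(1))
```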
It is a fallacy to take aggregate-level data and draw individual-level conclusions - the ecological fallacy or
aggregation bias
- why is there this contradiction between the state-level data and the individual-level data? The difference is in
the unit of analysis
