Social Statistics

1) Basic Terminology
-Data (plural): measurements or observations (aka scores)

-Variable: A characteristic or condition that has different values for different individuals (ex. height,
test scores, gender)
---Independent Variable (IV): The variable that is manipulated by the experimenter.
-----Quasi-Independent Variable (Q-IV): A variable that can't be manipulated but is used to define
groups (ex. height, hair color, age, gender, etc.)
---Dependent Variable (DV): The variable that is allowed to vary and is observed in relation to the IV
(it depends on the independent variable).
-Statistics: A set of calculations used to organize, summarize and interpret information.
---Descriptive Statistics: Used to organize, simplify and summarize data.
---Inferential Statistics: Use sample statistics to make generalizations about the population the sample came from.
-Population: ALL of the individuals you wish to study (ex. all students in the US)
---Parameter: A value used to describe a population.
-Sample: ONLY SOME of the individuals/objects from a population that you wish to study (ex. 1000
students from New York)
---Statistic: A value used to describe a sample.
---Sampling Error: The discrepancy that occurs between a sample statistic and the matching population parameter.
-Control Condition: Individuals in this condition are given no experimental treatment or
are given a placebo. (This condition provides a baseline for comparison with the experimental
condition.)
-Experimental Condition: Individuals in this condition do receive the treatment being
tested.
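The population, sample, parameter, statistic and sampling error terms above can be tied together with a small sketch. The population here is made-up data (10,000 simulated test scores), and the sample size of 100 is an arbitrary choice for illustration, not something from these notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same numbers every run

# Hypothetical population: 10,000 simulated test scores (made-up data).
population = [random.gauss(70, 10) for _ in range(10_000)]

# Parameter: a value that describes the whole population.
population_mean = statistics.mean(population)

# Sample: only some of the individuals, drawn from the population.
sample = random.sample(population, 100)

# Statistic: a value that describes the sample.
sample_mean = statistics.mean(sample)

# Sampling error: the discrepancy between the statistic and the parameter.
sampling_error = sample_mean - population_mean

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
print(f"sampling error:              {sampling_error:.2f}")
```

Drawing a different sample (change the seed) gives a different statistic and a different sampling error, while the parameter stays the same.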
2) Basic Symbols
What do all these symbols mean?
Symbols are used in many formulas that are needed to calculate different statistics.
These are some of the basics:
Σ = Sum
X = each individual score
SS = Sum of squared deviations
sqrt = Square root
df = degrees of freedom
Symbols used to describe a Population
-- μ = mean
-- σ = standard deviation
-- σ² = variance
-- N = total number of population scores
Symbols used to describe a Sample
-- M = mean
-- s = standard deviation
-- s² = variance
-- n = total number of sample scores
3) Distribution: Tables and Graphs

Frequency Distribution: A list of the scores from a study together with a count of how often each
score occurs (its frequency). This information can be used to construct tables and graphs.
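As a rough sketch of the idea, a frequency distribution table can be built by counting how often each score occurs. The quiz scores below are hypothetical, and the row of asterisks is only a crude stand-in for a graph.

```python
from collections import Counter

# Hypothetical quiz scores (made-up data for illustration).
scores = [3, 5, 4, 3, 2, 5, 3, 4, 1, 3]

# Count how many times each score occurs.
frequency = Counter(scores)

# Print the table from the highest score to the lowest,
# with one asterisk per occurrence as a simple text histogram.
print("score  f")
for score in sorted(frequency, reverse=True):
    count = frequency[score]
    print(f"{score:>5}  {count}  {'*' * count}")
```

The frequencies always add up to the total number of scores, which is a quick way to check a hand-built table.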
Variability: A quantitative measurement of the degree to which the scores in a distribution are spread
out or clustered together.
Normal Distribution
This type of distribution is seen when the scores are clustered around the center, with frequencies gradually decreasing on
either side of the distribution.
Many calculations assume that the population distribution is normal. I will
discuss this type of distribution later in further detail.
It is also called a Gaussian Curve or Bell Curve.
Negative Skew
A negative skew is when most of the scores in a distribution are clustered together but a few low outliers
stretch the distribution toward the low end. (The tail of the graph points to the negative end)
Outliers: These are scores that fall outside the normal trend for the distribution.
(ex. Let's say the variable being graphed is shoe size and most of the data falls within sizes 7 to
10. If a few individuals had a shoe size of 4, that would skew the distribution negatively.)
tip: greater than 50% of the scores are above the mean
Positive Skew
A positive skew is when most of the scores in a distribution are clustered together but a few high outliers
stretch the distribution toward the high end. (The tail of the graph points to the positive end)
(ex. Given the same information as the previous example, the outliers would have a shoe
size of 13 instead of 4, making the distribution positively skewed.)
tip: greater than 50% of the scores are below the mean
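The two tips can be checked with a small sketch based on the shoe size example. The specific sizes below are invented to match the description (most scores between 7 and 10, plus a couple of outliers), so treat the numbers as an illustration only.

```python
import statistics

# Hypothetical shoe sizes: most fall between 7 and 10.
typical = [7, 8, 8, 9, 9, 9, 10, 10]

negative_skew = typical + [4, 4]    # a few unusually small sizes (low outliers)
positive_skew = typical + [13, 13]  # a few unusually large sizes (high outliers)


def describe(data):
    """Return the mean, the median, and the share of scores above the mean."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    share_above = sum(1 for x in data if x > mean) / len(data)
    return mean, median, share_above


neg_mean, neg_median, neg_above = describe(negative_skew)
pos_mean, pos_median, pos_above = describe(positive_skew)

print(f"negative skew: mean={neg_mean}, median={neg_median}, above mean={neg_above:.0%}")
print(f"positive skew: mean={pos_mean}, median={pos_median}, above mean={pos_above:.0%}")
```

In this made-up data the outliers pull the mean toward the tail, so the mean falls below the median for the negative skew and above it for the positive skew.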
4) Central Tendency: The Mean, Median and Mode
Central Tendency: A measurement that uses a single value to describe the center of a distribution of scores.
These are a few ways to measure central tendency:
-Mean (μ or M): The average (sum of scores / # of scores)
-------Ex. (5,4,3) 5+4+3 = 12, 12/3 = 4, Mean = 4
-Median: The score that divides the scores in half when they are put in order.
------Ex. (10,4,3,2,1) Median = 3
-Mode: The score or scores that occur most often in a set.
------Ex. (5,4,3,3,2) Mode = 3
*For statistics, the mean is most often used in calculations of central tendency.
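The three worked examples above can be reproduced with Python's standard `statistics` module. This is only one convenient way to check the arithmetic; `multimode` is used for the mode because a set can have more than one mode.

```python
import statistics

# Mean: sum of scores divided by the number of scores.
mean = statistics.mean([5, 4, 3])

# Median: the middle score once the scores are put in order.
median = statistics.median([10, 4, 3, 2, 1])

# Mode: the score or scores that occur most often (returned as a list).
modes = statistics.multimode([5, 4, 3, 3, 2])

print("Mean =", mean)      # Mean = 4
print("Median =", median)  # Median = 3
print("Mode =", modes)     # Mode = [3]
```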
5) More on Variables and Scales

Variables can be IV, DV, and Q-IV. (see basic terminology above)
They can also be either: Discrete or Continuous
-Discrete: No values can exist between pre-determined categories.
----(ex. categories can be Male/Female or ratings on a scale from 1-5)
-Continuous: Variables that have an infinite number of possible values between any two points, usually numerical.
----(ex. temperature could be 98.5F or 97.6F. There are continuous values for temp.)
N O I R Scales for Variables
N = Nominal: A discrete set of categories with different names.
-----(ex. Pop categories: Coke, Sprite, Dr. Pepper)
O = Ordinal: A set of categories ordered by sequence.
----- (ex. best, better, fair, worse, worst)
I = Interval: Ordered categories with equal distances between categories. NO real zero.
----- (ex. Temp with same spacing: 10, 20, 30, 40, 50...) (0 degrees does not mean "no temperature", so there is NO real zero
value)
R = Ratio: A numerical scale with equal distances between values and a true zero.
----- (ex. $0.00 really is equal to no money, while 0 degrees F does not mean no temperature)
*Later on this information will be important to determine how to collect data and what type of scale is
best for the given situation.*
6) Basic Calculations using Basic Symbols

Order of Operations:
All are done from left to right
1) Parentheses (calculate within them first)
2) Exponents (squared, etc...)
3) Multiply and Divide (* and /)
4) Summation (Σ = sum)
5) Any other Addition or Subtraction (+ and -)
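The order of operations matters most when a summation is combined with squaring or subtraction. The sketch below uses three made-up scores to show that the sum of squared scores is not the same as the squared sum, and that subtracting inside the parentheses differs from subtracting after the summation.

```python
scores = [1, 2, 3]  # hypothetical scores

# Sum of X squared: square each score first (exponents), then sum.
sum_of_squares = sum(x ** 2 for x in scores)  # 1 + 4 + 9 = 14

# (Sum of X) squared: sum inside the parentheses first, then square.
square_of_sum = sum(scores) ** 2  # 6 squared = 36

# Sum of (X - 1): subtract inside the parentheses first, then sum.
sum_of_shifted = sum(x - 1 for x in scores)  # 0 + 1 + 2 = 3

# Sum of X, minus 1: summation comes before the leftover subtraction.
sum_then_subtract = sum(scores) - 1  # 6 - 1 = 5

print(sum_of_squares, square_of_sum, sum_of_shifted, sum_then_subtract)
```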
Calculating Mean:
M = (ΣX)/n
Calculating Sum of squared deviations:
SS = Σ(X - μ)² or
SS = ΣX² - (ΣX)²/N
* both will result in the same answer*
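A quick way to see that both SS formulas agree is to compute each one on the same scores. The four scores below are hypothetical and are treated as a whole population, so the mean used in the first formula is the population mean.

```python
scores = [2, 4, 6, 8]  # hypothetical population of scores
N = len(scores)
mean = sum(scores) / N  # 20 / 4 = 5.0

# Definitional formula: sum of the squared deviations from the mean.
ss_definitional = sum((x - mean) ** 2 for x in scores)  # 9 + 1 + 1 + 9 = 20

# Computational formula: sum of squared scores minus (squared sum) / N.
sum_x = sum(scores)                         # 20
sum_x_squared = sum(x ** 2 for x in scores)  # 4 + 16 + 36 + 64 = 120
ss_computational = sum_x_squared - (sum_x ** 2) / N  # 120 - 400/4 = 20

print(ss_definitional, ss_computational)
```

The computational formula avoids working with deviations directly, which helps when the mean is not a whole number.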
Calculating Variability (Variance):
σ² = SS/N
s² = SS/(n-1)
Calculating Standard Error:
σM = σ/(sqrt of n)
sM = s/(sqrt of n)
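The variance and standard error formulas can be sketched as follows. The scores are hypothetical, and the same four scores are treated first as a whole population and then as a sample purely to show how the two divisors differ. The standard error lines assume the number under the square root is the number of scores in the sample.

```python
import math
import statistics

scores = [2, 4, 6, 8]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations = 20.0

# Variance: divide SS by N for a population, by n - 1 for a sample.
population_variance = ss / n        # 20 / 4 = 5.0
sample_variance = ss / (n - 1)      # 20 / 3, about 6.67

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# Standard error of the mean: standard deviation over the square root of n.
sigma_m = population_sd / math.sqrt(n)
s_m = sample_sd / math.sqrt(n)

# Cross-check the hand formulas against the standard library.
print(population_variance, statistics.pvariance(scores))
print(round(sample_variance, 3), round(statistics.variance(scores), 3))
print(round(sigma_m, 3), round(s_m, 3))
```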
7) Understanding Variance
Variability: A measurement that shows the degree to which the scores or data are spread out or
clustered in a distribution.
---A good way to describe a distribution is in terms of distance. (ex. Let's say that most adults are within
a foot (12") of 5'5" tall. Variability would capture that distance and would represent
the heights someone is most likely to fall into if they are part of that population. There would be
people who are much taller, such as basketball players, and people who are much shorter, but they are
much rarer and would be seen in this distribution as outlying values.)
In some distributions the values are much closer to the mean, and in others much farther from it.
---If you look at weight versus height you may find that the range of values for weight is much
larger than the range of values for height. An adult could weigh between *90 lbs and 500 lbs* while
the height of a person is much more limited: *3'5" to 7'6"*
* These numbers are only used to make a point, not to accurately represent the actual
range of weights or heights. *
8) Z test
So what is a Z test?
A "Z test" is a way to standardize each score in a distribution so that it can be related to
all the other scores. (A way to know how a certain score compares to the other scores)
You can use a Z test when:
-- estimating a population parameter
-- and there is only one sample group
-- and there is only one score per subject in the group
-- and σ is given or can be calculated
Once Z scores are calculated then:
nearly all scores are between -4.00 and +4.00
μ = 0
σ = 1
A Z score:
1) replaces the original scores, mean and variance (the new mean is 0 and the new variance is 1).
2) does not change the shape of the distribution (a skewed distribution stays skewed)
3) is + or - (+ is above the mean and - is below the mean)
4) is a number that represents distance from the mean in standard deviation units (Z score mean = 0)
Calculating Z scores:
z = (X - μ)/σ
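As an illustration, the sketch below converts a small set of hypothetical scores into Z scores and checks that the standardized scores have a mean of 0 and a standard deviation of 1. The scores are treated as a whole population, so the population standard deviation is used.

```python
import statistics

scores = [2, 4, 6, 8]  # hypothetical population of scores

mu = statistics.mean(scores)       # population mean = 5
sigma = statistics.pstdev(scores)  # population standard deviation, sqrt(5)

# Z score: distance of each score from the mean, in standard deviation units.
z_scores = [(x - mu) / sigma for x in scores]

for x, z in zip(scores, z_scores):
    side = "above" if z > 0 else "below"
    print(f"X = {x}: z = {z:+.2f} ({side} the mean)")

# After standardizing, the mean is 0 and the standard deviation is 1.
print(round(statistics.mean(z_scores), 10), round(statistics.pstdev(z_scores), 10))
```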
9) T Test
What is a T Test?
A "T Test" is very much like a "Z test" in that it standardizes scores in a distribution; however, in a "T
Test" instead of using the population variance (σ²) we use the sample variance (s²).
You can use a T test when:
-- estimating a population parameter
-- and there is only one sample group
-- and there is only one score per subject in the group
-- and σ is NOT given or CANNOT be calculated
Once a T test is calculated then:
μ = 0
σ is close to 1 (slightly larger for small samples)
A T score is:
1) + or - (+ is above the mean and - is below the mean)
2) a number that represents distance from the mean, which is equal to 0 (typically -4.00 to +4.00)
Calculating T scores:
t = (M - μ) / sM
sM = s/(sqrt of n)
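The T score calculation can be walked through with a small sketch. Both the sample scores and the population mean of 10 being compared against are hypothetical values chosen for illustration, and the number under the square root is assumed to be the number of scores in the sample.

```python
import math
import statistics

sample = [12, 15, 11, 14, 13]  # hypothetical sample scores
mu = 10                        # hypothetical population mean to compare against

n = len(sample)
M = statistics.mean(sample)   # sample mean = 13
s = statistics.stdev(sample)  # sample standard deviation, uses n - 1

# Estimated standard error of the mean.
s_M = s / math.sqrt(n)

# T score: distance between the sample mean and mu, in standard error units.
t = (M - mu) / s_M
df = n - 1  # degrees of freedom for a single sample

print(f"M = {M}, s = {s:.3f}, sM = {s_M:.3f}, t = {t:.3f}, df = {df}")
```

Here the sample variance is 10/4 = 2.5, so sM is the square root of 0.5 and t works out to about 4.243.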
