July 2014
Slide 2
Exercises
- Self-study on boxplots
- Data transformation
- Checking the dataset
Slide 3
Slide 4
Exercises
- Regression Analysis
- Analysis of Variance (ANOVA)
Table of Contents

Some notes about …
  Types of Scales
    Nominal scale
    Ordinal scale
    Metric scales (interval and ratio scales)
    Hierarchy of scales
    Properties of scales
    Summary: Types of scales
  Exercise in class: Scales
  Distributions
    Measure of the shape of a distribution
  Transformation of data
    Why transform data?
    Type of transformation
    Linear transformation
    Logarithmic transformation
    Summary: Data transformation
  Data trimming
    Finding outliers and extremes
    Boxplot
    Boxplot and error bars
    Q-Q plot
    Example

Linear Regression
  Example
  General purpose of regression
  Key Steps in Regression Analysis
  Regression model
    Mathematical model
    Stochastic model
  Gauss-Markov Theorem, Independence and Normal Distribution
  Regression analysis with SPSS: Some examples
    Simple example (EXAMPLE02)
    Step 1: Formulation of the model
    Step 2: Estimation of the model
    Step 3: Verification of the model
    Step 3: Verification of the model: t-tests
    Step 6: Interpretation of the model
    Back to Step 3: Verification of the model
    Step 5: Testing of assumptions
    Violation of the homoscedasticity assumption
  Multiple regression
    Many similarities with simple Regression Analysis from above
    What is new?
  Multicollinearity
    Outline
    How to identify multicollinearity
  Multiple regression analysis with SPSS: Some detailed examples
    Example of multiple regression (EXAMPLE04)
    Step 1: Formulation of the model
    Step 3: Verification of the model (without dummy for gender)
    SPSS Output regression analysis (EXAMPLE04)
    Dummy coding of categorical variables
    Gender as dummy variable
    Step 1: Formulation of the model (with dummy for gender)
    Step 3: Verification of the model (with dummy for gender)
    SPSS Output regression analysis (EXAMPLE04)
    Example of multicollinearity
    Step 1: Formulation of the model
    SPSS Output regression analysis (Example of multicollinearity) I

Exercises 03: ANOVA
Types of scales: Categorical (SPSS: Ordinal, Nominal) vs. Metric (SPSS: Scale)

Stevens, S.S. (1946): On the Theory of Scales of Measurement. Science, 103(2684), 677-680.
Slide 10
Nominal scale
Consists of "names" (categories). Names have no specific order.
Must be measured with an unique (statistical) procedure.
Each category is assigned a number (code can be arbitrary but must be unique).
Ordinal scale
Consists of a series of values
Each category is associated with a number which represents the category's order.
The Likert scale (rating scale) is a special kind of ordinal scale.
Slide 12
[Table: hierarchy of scales. Categorical scales (nominal, ordinal) vs. metric scales (interval, ratio); the nominal level supports only statements of (in)equality (=, ≠).]
Slide 14
Properties of scales
Statistical analysis assumes that the variables have specific levels of measurement.
Variables measured on a nominal or ordinal scale are also called categorical variables.
Exact measurements on a metric scale are statistically preferable.
Slide 16
Distributions
Get a visual impression. Source: http://en.wikipedia.org (date of access: July 2014)
Normal: widely used in statistics (statistical inference).
Poisson: law of rare events (origin 1898: number of soldiers killed by horse-kicks each year).
Exponential: queuing model (e.g. average time spent in a queue).
Pareto: allocation of wealth among individuals of a society ("80-20 rule").
Slide 18
Analyze → Descriptive Statistics → Frequencies...
Measure of the shape of a distribution
Example
Dataset "Data_07.sav" (Tschernobyl fallout of radioactivity, measured in becquerel)
BQ LNBQ
N Valid 23 23
Missing 0 0
Skewness 2.588 .224
Std. Error of Skewness .481 .481
Kurtosis 7.552 -.778
Std. Error of Kurtosis .935 .935
Logarithmic transformation
COMPUTE lnbq = LN(bq).
FREQUENCIES bq lnbq.
Slide 20
Transformation of data
Why transform data?
1. Many statistical models require that the variables (in fact: the errors) are
approximately normally distributed.
2. Linear least squares regression assumes that the relationship between two variables is linear.
Often we can "straighten" a non-linear relationship by transforming the variables.
3. In some cases it can help you better examine a distribution.
Type of transformation
Linear Transformation
Does not change shape of distribution.
Non-linear Transformation
Changes shape of distribution.
Analyze → Descriptive Statistics → Descriptives...
Slide 21
Linear transformation
Transformation rule (z-standardization):

$$z_i = \frac{x_i - \bar{x}}{s}$$

($\bar{x}$ = mean of sample, $s$ = standard deviation of sample)
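In SPSS, such z-scores can be saved directly; a minimal sketch, assuming a metric variable income as in the later boxplot example:

DESCRIPTIVES VARIABLES=income
  /SAVE.
* /SAVE appends the standardized variable Zincome to the active dataset.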
Slide 22
Logarithmic transformation
Works for data that are skewed right.
Works for data where residuals get bigger for bigger values of the dependent variable.
Such trends in the residuals often occur if the error in the value of an outcome variable is a percentage of the value rather than an absolute amount. For the same percentage error, a bigger value of the variable means a bigger absolute error, so the residuals are bigger too.
Taking logs "pulls in" the residuals for the bigger values.
log(Y*error) = log(Y) + log(error)
Example: Body size against weight

Transformation rule:
$f(x) = \log(x)$, $x \ge 1$
$f(x) = \log(x + 1)$, $x \ge 0$

[Scatter plot: weight (in kg, 40 to 100) against body size (150 to 200)]
Logarithmic transformation I

Symmetry: A logarithmic transformation reduces positive skewness because it compresses the upper tail of the distribution while stretching out the lower tail. This is because the distances between 0.1 and 1, 1 and 10, 10 and 100, and 100 and 1000 are all the same on the logarithmic scale.

This is illustrated by the histogram of data simulated with salary (hourly wages) in a sample of nurses*. In the original scale, the data are long-tailed to the right, but after a logarithmic transformation is applied, the distribution is symmetric. The lines between the two histograms connect original values with their logarithms to demonstrate the compression of the upper tail and the stretching of the lower tail.

[Figure: histogram of original data and histogram of transformed data, with lines connecting original values to their logarithms]
Slide 24
Logarithmic transformation II

[Figure: histogram of original data (skewed right), transformation y = log10(x), histogram of transformed data]
Slide 25
Other transformations

Root functions: $f(x) = x^{1/2}, x^{1/3}$; $x \ge 0$ (usable for right-skewed distributions)

Hyperbola function: $f(x) = x^{-1}$; $x \ge 1$ (usable for right-skewed distributions)

Box-Cox transformation: $f(x) = x^{\lambda}$; $\lambda > 1$ (usable for left-skewed distributions)

Probit & logit functions (cf. logistic regression): $f(p) = \ln\left(\frac{p}{1-p}\right)$; $p \in [0,1]$ (usable for proportions and percentages)
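These transformations can be produced with COMPUTE; a minimal sketch, with placeholder variable names x and p:

* Square root (x >= 0), reciprocal (x >= 1) and logit (0 < p < 1) transformations.
COMPUTE sqrtx = SQRT(x).
COMPUTE recx = 1/x.
COMPUTE logitp = LN(p/(1 - p)).
EXECUTE.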
Slide 26
Data trimming
Methods?
Use basic statistics: <Analyze> with <Frequencies>, <Explore> and <Descriptives>.
Outliers: e.g. z-scores above/below 2 standard deviations; extremes: above/below 3 standard deviations.
Use graphical techniques: histogram, boxplot, Q-Q plot, ...
Outliers: e.g. as indicated in the boxplot.
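A minimal SPSS sketch for flagging such cases, assuming the income variable from the example below:

DESCRIPTIVES VARIABLES=income
  /SAVE.
* Zincome now holds the z-scores; flag outliers and extremes.
COMPUTE outlier = (ABS(Zincome) > 2).
COMPUTE extreme = (ABS(Zincome) > 3).
EXECUTE.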
Slide 27
Boxplot
A Boxplot displays the center (median), spread and outliers of a distribution.
See exercise for more details about whiskers, outliers etc.
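Boxplots by group can be requested via Analyze → Descriptive Statistics → Explore, or in syntax; a minimal sketch using the income and educ variables of this example:

EXAMINE VARIABLES=income BY educ
  /PLOT=BOXPLOT
  /STATISTICS=NONE.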
[Boxplots: income by education (educ, levels 2 to 7), with whisker and outlier case 92 marked]
Slide 28
[Figures: boxplot of income by educ with outlier and extreme cases marked (65, 83, 88, 92, 168, 190, 191, 196), and plot of the 95% CI of income by educ]
Slide 29
Q-Q plot
The quantile-quantile (Q-Q) plot is a graphical technique for deciding whether two samples come from populations with the same distribution.
Quantile: the fraction (or percentage) of data points below a given value.
For example, the 0.5 (or 50%) quantile is the position below which 50% of the data fall and above which 50% fall. In fact, the 50% quantile is the median.
Slide 30
In the Q-Q plot, the quantiles of the first sample are plotted against the quantiles of the second sample. If the two sets come from populations with the same distribution, the points should fall approximately along a 45-degree reference line. The greater the displacement from this reference line, the greater the evidence that the two data sets come from populations with different distributions.
A Q-Q plot is better suited for assessing the goodness of fit in the tails of the distributions: the normal Q-Q plot is more sensitive to deviations from normality in the tails of the distribution, whereas the normal P-P plot is more sensitive to deviations near the mean of the distribution.
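A normal Q-Q plot can also be produced in syntax; a minimal sketch, assuming the bq variable from the Chernobyl example (further PPLOT subcommands left at their defaults):

PPLOT /VARIABLES=bq
  /TYPE=Q-Q
  /DIST=NORMAL.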
Slide 31
Quantiles of the first sample are set against the quantiles of the second sample.
Slide 32
[Figures: histograms of two test distributions (SPSS) with their normal Q-Q plots; German axis labels: Häufigkeit = frequency, Erwarteter Wert von Normal = expected value of normal, Beobachteter Wert = observed value]
Example
Dataset "Data_07.sav" (Tschernobyl fallout of radioactivity)
Distribution of original data Distribution of log transformed data
Slide 34
Linear Regression
Example

Medical research: dependence of systolic blood pressure on age

[Scatter plot: systolic blood pressure (140 to 180) against age (35 to 90 years)]

Typical questions
- Is there a linear relation between age and systolic blood pressure?
- What is the predicted mean blood pressure for men aged 67?
Slide 36
The questions
Question in everyday language:
Is there a linear relation between age and systolic blood pressure?
Research question:
What is the relation between age and systolic blood pressure?
What kind of model is best for showing the relation? Is regression analysis the right model?
Statistical question:
Forming hypotheses
H0: "No model" (= no overall model and no significant coefficients)
HA: "Model" (= overall model and significant coefficients)
Can we reject H0?
The solution

Linear regression equation of systolic blood pressure on age:

$$pressure = \beta_0 + \beta_1 \cdot age + u$$

pressure = dependent variable
age = independent variable
$\beta_0$, $\beta_1$ = coefficients
u = error term
Slide 37
"How-to" in SPSS
Scales
240
Dependent variable: metric
220
SPSS
Analyze Regression Linear... 210
200
Result
190
Significant linear model
180
Significant coefficient
170
pressure = 135.2 + 0.956 age
160
Age [years]
Typical statistical statement in a paper:
There is a linear relation between age and systolic blood pressure.
(Regression: F = 102.763, R2 = .93, p = .000).
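The same model in syntax; a minimal sketch, assuming the variables pressure and age:

REGRESSION
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT pressure
  /METHOD=ENTER age.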
Slide 38
Impact analysis
Assess the impact of the independent variable on the dependent variable.
Example
If age increases, blood pressure also increases:
How strong is the impact? By how much will pressure increase with each additional year?
Prediction
Predict the values of a dependent variable using new values for the independent variable.
Example
What is the predicted mean systolic blood pressure of men aged 67?
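Using the estimated equation from the previous slide, the prediction is:

$$\widehat{pressure} = 135.2 + 0.956 \cdot 67 \approx 199.3$$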
Slide 39
Text in italics: only important in the case of multiple regression (see next chapter).
Slide 40
Regression model
More details about mathematics
in Christof Luchsinger's part
Mathematical model
The linear model describes y as a function of x:

$$y = \beta_0 + \beta_1 \cdot x$$

$\beta_0$ = intercept (constant)
$\beta_1$ = regression coefficient: the increase in the dependent variable per unit change in the independent variable (also known as "the rise over the run", i.e. the slope)
Stochastic model
$$y = \beta_0 + \beta_1 \cdot x + u$$
The error term u comprises all factors (other than x) that affect y.
These factors are treated as being unobservable.
u stands for "unobserved"
Slide 41
Slide 42
1. Linear in coefficients: $y = \beta_0 + \beta_1 x + u$
Slide 44
Empirical F-value and the appropriate p-value ("Sig.") are computed by SPSS.
In the example, we can reject H0 in favor of HA (Sig. < 0.05).
The overall model is significant (F(1,97) = 116.530, p = .000).
The estimated model is not only a theoretical construct but one that exists in a statistical sense.
Slide 46
The t statistic for the height variable ($\beta_1$) is associated with a p-value of .000 ("Sig.").
This indicates that the null hypothesis can be rejected.
Thus, the coefficient is significantly different from zero.
This also holds for the constant ($\beta_0$) with Sig. = .000.
Slide 47
$$\widehat{weight}_i = \beta_0 + \beta_1 \cdot height_i$$
Slide 48
[Figure: for each data point $y_i$, the total gap to the sample mean $\bar{y}$ splits into a regression part ($\hat{y}_i - \bar{y}$) and an error part ($y_i - \hat{y}_i$)]

$y_i$ = data point
$\hat{y}_i$ = estimation (model)
$\bar{y}$ = sample mean

$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

$$R^2 = \frac{SS_{Regression}}{SS_{Total}}, \qquad 0 \le R^2 \le 1$$
Slide 50
3. Zero conditional mean: The mean values of the residuals do not differ visibly from 0 across the range of standardized estimated values. OK
5. Homoscedasticity: The residual plot is trumpet-shaped; the residuals do not have constant variance. This Gauss-Markov requirement is violated: there is heteroscedasticity.
Independence: There is no obvious pattern indicating that the residuals influence one another (for example, a "wavelike" pattern). OK
Slide 52
Corrections
Transformation of the variable:
A possible correction in the case of this example is a log transformation of the variable weight.
Use of robust standard errors (not implemented in SPSS)
Use of Generalized Least Squares (GLS):
The estimator is provided with information about the variance and covariance of the errors.
(The last two options are not pursued further in this course.)
Slide 54
Multiple regression
Many similarities with simple Regression Analysis from above
All concepts from simple regression analysis also apply to multiple regression analysis.
What is new?
Concept of multicollinearity
Concept of stepwise regression analysis
Dummy coding of categorical variables
Standardized regression coefficients
Adjustment of the coefficient of determination ("Adjusted R Square")
Slide 55
Multicollinearity
Outline
Multicollinearity means there is a strong correlation between the independent variables.
Perfect collinearity means a variable is a linear combination of other variables.
=> A unique estimate of the coefficients is not possible because of the infinite number of combinations.
Perfect collinearity is rare in real-life data (unless you have made a mistake).
However, correlations, or even strong correlations, between variables are unavoidable.
Symptoms of multicollinearity
When correlation is strong, the standard errors of the parameters become large, and thus t-tests and confidence intervals become inaccurate.
The probability is increased that a good predictor will be found non-significant and rejected.
In stepwise regression, coefficient estimation is subject to large changes.
There might be coefficients with a sign opposite of that expected.
Multicollinearity is ...
... a severe problem when the research purpose includes causal modelling.
... less important where the research purpose is prediction, since the predicted values of the dependent variable remain stable relative to each other.
Slide 56
In addition, SPSS reports the Variance Inflation Factor (VIF), which is simply the inverse of the Tolerance (1/Tolerance). The VIF ranges from 1 to infinity.
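Both measures are based on the $R_j^2$ from regressing predictor j on all other predictors; stated here as the standard definitions for reference:

$$Tolerance_j = 1 - R_j^2, \qquad VIF_j = \frac{1}{Tolerance_j}$$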
Slide 57
Slide 58
The unstandardized B coefficients show the absolute change of the dependent variable weight
if the respective independent variable, height or age, changes by one unit.
The Beta coefficients are the standardized regression coefficients.
Their magnitudes reflect their relative importance in predicting weight.
Beta coefficients are only comparable within a model, not between. Moreover, they are highly
influenced by misspecification of the model.
Adding or leaving out variables in the equation will affect the size of the beta coefficients.
Slide 59
$$\text{Adjusted } R^2 = R^2 - \frac{m\,(1 - R^2)}{n - m - 1}$$

n = number of observations
m = number of independent variables
n − m − 1 = degrees of freedom (df)
Slide 60
For example, seasonal effects may be captured by creating dummy variables for each of the
seasons. Also gender effects may be treated with dummy coding.
The number of dummy variables is always one less than the number of categories.
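A minimal SPSS sketch for such a dummy, assuming gender is coded 1 = male and 2 = female:

* Create dummy variable female (0 = male, 1 = female).
RECODE gender (1=0) (2=1) INTO female.
VALUE LABELS female 0 'male' 1 'female'.
EXECUTE.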
Slide 62
Switching from male (female = 0) to female (female = 1) lowers weight by 8.345 kg.
Model fits better (Adjusted R square .894 vs. .832) due to "separation" of gender.
Slide 63
Example of multicollinearity
Human resources research in hospitals: Survey of nurse satisfaction and commitment
Slide 64
Slide 66
Data (EXAMPLE05.sav)
Subsample of n = 96 nurses
Among other variables: work experience (3 levels), salary (hourly wage in CHF/h)
Typical questions
Does experience have an effect on the level of salary?
Are the results only due to chance?
What is the relation between work experience and salary?
Slide 68
Boxplot
- - - grand mean
The boxplot indicates that salary may differ significantly depending on levels of experience.
Slide 69
Questions
Question in everyday language:
Does work experience have an effect on salary?
Research question:
Is there a relation between work experience and salary?
What kind of model is suitable for the relation?
Is analysis of variance the right model?
Statistical question:
Forming hypotheses
H0: "No model" (= no significant factors)
HA: "Model" (= significant factors)
Can we reject H0?
Solution
Linear model with salary as the dependent variable ($y_{gk}$ = wage of nurse k in group g):

$$y_{gk} = \bar{y} + \alpha_g + \varepsilon_{gk}$$

$\bar{y}$ = grand mean
$\alpha_g$ = effect of group g
$\varepsilon_{gk}$ = random term
Slide 70
"How-to" in SPSS
Scales
Dependent Variable: metric
Independent Variable(s): categorical, part of them metric (called covariates)
SPSS
Analyze General Linear Model Univariate...
Results
Overall model significant ("Corrected Model": F(2, 93) = 46.193, p = .000).
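The same analysis in syntax; a minimal sketch using the salary and experien variables of this example:

UNIANOVA salary BY experien
  /METHOD=SSTYPE(3)
  /PRINT=ETASQ DESCRIPTIVE
  /DESIGN=experien.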
1. Design of experiments
ANOVA is typically used for analyzing the findings of experiments:
One-way ANOVA, repeated-measures ANOVA,
multi-factorial ANOVA (analysis of variance with two or more factors), mixed ANOVA
2. Calculating differences and sum of squares
Differences between group means, individual values and the grand mean are squared and summed up. This leads to the fundamental equation of ANOVA.
The test statistic for the significance test is calculated from the means of the sums of squares.
3. Prerequisites
Independent data
Normally distributed variables
Homogeneity of variance between groups
4. Verification of the model and the factors
Is the overall model significant (F-test)? Are the factors significant?
Are prerequisites met?
5. Checking measures
Adjusted R squared / partial Eta squared
Slide 72
Designs of ANOVA
One-way ANOVA: one factor analysis of variance
1 dependent variable and 1 independent factor
Multi-factorial ANOVA: two or more factor analysis of variance
1 dependent variable and 2 or more independent factors
MANOVA: multivariate analysis of variance
Extension of ANOVA used to include more than one dependent variable
Slide 74
If the group means $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ differ, then $SS_{between}$ is large relative to $SS_{within}$.
Basic idea of ANOVA
The total sum of squared differences $SS_{total}$ is separated into two parts
(SS is short for Sum of Squares):
$SS_{between}$: part of the sum of squared differences due to groups ("between groups", treatments)
(here: between levels of experience)
$SS_{within}$: part of the sum of squared differences due to randomness ("within groups", also $SS_{error}$)
(here: within each experience group)
$$\sum_{g=1}^{G}\sum_{k=1}^{K_g}(y_{gk} - \bar{y})^2 = \sum_{g=1}^{G} K_g\,(\bar{y}_g - \bar{y})^2 + \sum_{g=1}^{G}\sum_{k=1}^{K_g}(y_{gk} - \bar{y}_g)^2$$

$$MS_t = \frac{SS_t}{K_{total} - 1} \quad \text{(mean of } SS_{total}\text{)}$$

$$MS_b = \frac{SS_b}{G - 1} \quad \text{(mean of } SS_{between}\text{)}$$

$$MS_w = \frac{SS_w}{K_{total} - G} \quad \text{(mean of } SS_{within}\text{)}$$

Calculating the test statistic F and significance testing for the global model:

$$F = \frac{MS_b}{MS_w}$$

F follows an F-distribution with (G − 1) and (K_total − G) degrees of freedom.

The F-test verifies the hypothesis that the group means are equal:

$$H_0: \bar{y}_1 = \bar{y}_2 = \bar{y}_3 \qquad H_A: \bar{y}_i \neq \bar{y}_j \text{ for at least one pair } i \neq j$$
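In the running example there are G = 3 experience groups and K_total = 96 nurses, so the F statistic has (2, 93) degrees of freedom, matching the SPSS result reported above (F(2, 93) = 46.193, p = .000).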
Slide 76
Slide 78
"Grand mean"
SSbetween
SSwithin (= SSerror)
SStotal
Partial eta squared compares the amount of variation explained by a particular factor (all other variables fixed) to the amount of variation that is not explained by any other factor in the model. This means we are only considering variation that is not explained by other variables in the model. Partial $\eta^2$ indicates what percentage of this variation is explained by a variable.
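As a formula (the standard definition, added here for reference):

$$\eta^2_{partial} = \frac{SS_{effect}}{SS_{effect} + SS_{error}}$$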
Slide 80
Two-Way ANOVA
Research in human resource management: Survey of nurse salary
[Table: salary by position (rows) and level of experience (columns: 1, 2, 3, all)]
Typical questions
Do work position and experience have an effect on salary? (→ main effects)
What "interaction" exists between work position and experience? (→ interaction effects)
Slide 81
Main effects
The direct effect of an independent variable on the dependent variable is called a main effect.
In the example:
The main effect of experien reveals that the nurses' salaries depend on their level of professional experience.
The main effect of position reveals that the nurses' salaries depend on whether they work in the office or the hospital.
Profile plots are used as visualization:

[Profile plots: main effect of experien (mean salary across experience levels 1, 2, 3) and main effect of position (mean salary for office vs. hospital)]
If the profile plot shows a (nearly) horizontal line, the main effect in question is presumably not significant. (Attention: SPSS cuts off the lower area of the graph; the Y-axis often does not start at 0!)
Slide 82
Interaction effects
An interaction between experience and position means there is a dependency between the two variables.
The independent variables have a complex influence on the dependent variable.
The factors do not just function additively but act together in a different manner.
An interaction means that the effect of one factor depends on the value of another factor.
[Diagram: experience (factor A) and position (factor B) each influence salary directly; in addition, the interaction (factor A x B) influences salary]
Slide 83
Interaction effects
In the example: The interaction between experien and position means ...
that the effect of work experience on salary is not the same for nurses who work in offices
and for nurses who work in the hospital.
that the difference in salary between nurses working in the hospital and nurses working in
the office depends on the level of experience.
Profile plots:

[Profile plots: salary by experien with separate lines for position (office, hospital), and salary by position with separate lines for experien (1, 2, 3)]
Slide 84
Slide 86
Interaction I

Do different levels of experience influence the impact of different levels of position differently?
Yes: if experience has the value 2 or 3, the influence of position is raised.

[Profile plot: salary by experience with separate lines for office and hospital]
Slide 88
More on interaction
[Profile plots: three scenarios of salary contrasting the presence and absence of a main effect of experien, a main effect of position, and an interaction]
Requirements of ANOVA
0. Robustness
ANOVA is relatively robust against violations of its prerequisites.
1. Sampling
Random sample, no treatment effects
A well-designed study avoids violation of this assumption
2. Distribution of residuals
Residuals (= errors) are normally distributed
Correction: transformation
3. Homogeneity of variances
Residuals (= errors) have constant variance
Correction: weight variances
4. Balanced design
Same sample size in all groups
Correction: weight means
SPSS automatically corrects unbalanced designs by Sum of Squares "Type III"
Syntax: /METHOD = SSTYPE(3)
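For the two-way example, the syntax could look as follows; a minimal sketch, assuming the variables salary, experien and position:

UNIANOVA salary BY experien position
  /METHOD=SSTYPE(3)
  /PRINT=ETASQ
  /DESIGN=experien position experien*position.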
Slide 90
Independent Variable (IV) → Dependent Variable(s) (DV), examples:
Price of product → Employee satisfaction, Customer satisfaction
Quality of products → Customer satisfaction
Quality of customer service → Motivation of employee
Slide 92
Choice of Method

[Decision tree: Data Analysis splits into Descriptive and Inductive; Inductive splits into Dependence and Interdependence methods, chosen according to metric vs. categorical scales]
Slide 94
LDA is closely related to ANOVA and logistic regression analysis, which also attempt to express
one dependent variable as a linear combination of other variables.
Data: body measurements of pumas from two subspecies (Species 1 = North America, 2 = South America; x1 = body length from nose to top of tail, in cm; x2 = body length from nose to root of tail, in cm):

Species  x1   x2
1        191  131
1        185  134
1        200  137
1        173  127
1        171  118
1        160  118
1        188  134
1        186  129
1        174  131
1        163  115
2        186  107
2        211  122
2        201  114
2        242  131
2        184  108
2        211  118
2        217  122
2        223  127
2        208  125
2        199  124

Other names for the puma: cougar, mountain lion, catamount, panther.

[Scatter plot: x2 [cm] against x1 [cm] for both species]
Slide 96
Sketch of LDA
Very short introduction to linear discriminant analysis
Goal
Discrimination between groups
Puma's example: discrimination between two subspecies
$Y_i$ = discriminant variable
$\alpha$, $\beta_1$, $\beta_2$ = coefficients
$x_{i,1}$, $x_{i,2}$ = measurements of body length
$u_i$ = error term
Slide 97
[Two scatter plots: x2 [cm] against x1 [cm] for the puma data]
Slide 98
SPSS-Example of linear discriminant analysis (EXAMPLE07)
* LDA of the puma data: grouping variable species (groups 1 and 2),
* predictors x1 and x2, prior probabilities proportional to group sizes.
DISCRIMINANT
  /GROUPS=species(1 2)
  /VARIABLES=x1 x2
  /ANALYSIS ALL
  /PRIORS SIZE
  /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF RAW TABLE
  /CLASSIFY=NONMISSING POOLED MEANSUB.
Slide 99
$$Y_i = \alpha + \beta_1 x_{i,1} + \beta_2 x_{i,2} + u_i$$
Slide 100
"Found" two pumas A and B:

Puma  x1   x2
A     175  120
B     200  110

What subspecies are they?

Use the discriminant variable Y to determine their subspecies.

[Plot: discriminant variable Y for the animals of subspecies 1 and 2, with pumas A and B marked]