Introduction
Basic regression concepts (key words)
Scatterplot
Statistical Modelling
Linear regression
(Selected at random)
A parameter is any numerical measure calculated from the
population elements.
A statistic is any numerical measure calculated from the
sample elements.
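The distinction can be illustrated with a minimal sketch. The population below is invented for illustration (hypothetical ages), and the sample size of 50 is an arbitrary assumption.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people (values invented for illustration).
random.seed(0)
population = [random.randint(18, 90) for _ in range(1000)]

# Parameter: calculated from every element of the population.
population_mean = statistics.mean(population)

# Statistic: calculated from a sample selected at random from the population.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The sample mean changes from sample to sample, while the population mean is fixed; inferential statistics uses the former to draw conclusions about the latter.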
Inferential Statistics
Sample
Population
Types of data
Categorical:
Nominal: Order is not important (no ranking)
◼ Examples: gender, marital status, race, ...
Ordinal: Order is important
◼ Examples: education level, disease severity, level of satisfaction, ...
Numerical:
Discrete: Countable (how many)
◼ Examples: number of children, length of stay at the hospital, ...
Continuous: Uncountable (how much)
◼ Examples: age, weight, height, blood sugar, ...
Three Skills for data analysis
Graphically
▪ Scatter plot
Numerically
▪ Correlation coefficient
Modelling (equation)
▪ Regression
GRAPHICAL PRESENTATION
Scatterplot
1. Linear
2. Positive
3. Strong
4. There is an outlier
2. Draw the scatterplot between “funding” and “visits” and
comment on the graph.
1. Linear
2. Positive
3. Strong
4. No outliers
NUMERICAL MEASURE
Correlation Coefficient
r is independent of units.
-1 ≤ r ≤ 1
Example scatterplots: r = 0.98, r = -0.99, r = -0.13, r = 0.51, r = -0.44
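As a rough illustration of the two properties above (r lies between -1 and 1 and does not depend on units), the sketch below computes r for the same data in two unit systems. The height and weight values are hypothetical, and `pearson_r` is a helper name introduced here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r = SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    ss_yy = sum((b - mean_y) ** 2 for b in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical data, invented for illustration.
height_cm = [150, 160, 165, 172, 180, 185]
weight_kg = [52, 58, 63, 70, 77, 85]

# The same measurements expressed in inches and pounds.
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]

r_metric = pearson_r(height_cm, weight_kg)
r_imperial = pearson_r(height_in, weight_lb)

print(r_metric, r_imperial)  # the two values agree up to floating-point error
```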
Testing the significance of the correlation (𝜌)
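A common way to test H0: ρ = 0 is the statistic t = r√(n − 2) / √(1 − r²), compared with a t distribution on n − 2 degrees of freedom. The sketch below only computes the statistic. The value r = 0.737 is borrowed from the standardized coefficient in the SPSS output of the funding example, and n = 50 is an assumption made for illustration, not a figure given in the slides.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.737   # standardized coefficient from the funding example
n = 50      # ASSUMED sample size, not stated in the source
t_stat = correlation_t_statistic(r, n)
df = n - 2

print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The p-value is then read from a t table or from statistical software such as SPSS.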
(predictor)
β0 and β1 are called regression parameters
$$ b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{SS_{xy}}{SS_{xx}} $$
$$ b_0 = \bar{y} - b_1 \bar{x} $$
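The least-squares formulas for b1 and b0 can be sketched directly in code. The data below are hypothetical and `least_squares_line` is a helper name introduced here.

```python
def least_squares_line(x, y):
    """Return (b0, b1) using b1 = SS_xy / SS_xx and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)
print(f"estimated line: y_hat = {b0:.1f} + {b1:.1f} x")
```

For these values SS_xy = 6 and SS_xx = 10, so b1 = 0.6 and b0 = 4 − 0.6 × 3 = 2.2.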
Linear regression in SPSS
Coefficients (a)
                                        Unstandardized          Standardized
                                        Coefficients            Coefficients
Model                                   B         Std. Error    Beta            t        Sig.
1  (Constant)                           91.625    11.191                        8.187    0.000
   Reported diseases (rate per 10,000)  0.479     0.063         0.737           7.556    0.000
a. Dependent Variable: Health care funding (amount per 100)
b0 = 91.625 and b1 = 0.479, so Ŷ = 91.625 + 0.479 X
Predicted funding = 91.625 + 0.479 × diseases
1. Find the estimated regression line.
Ŷ = 91.6 + 0.48 X
Interpretation of Regression Coefficients
2. Interpret the regression coefficients.
If we are satisfied with how well the model fits the data, we
can use it to make predictions for y.
Predicted funding = 91.625 + 0.479 × diseases
Predicted funding = 91.625 + 0.479 × 200 = 187.425
On average, a city with 200 reported diseases (rate per 10,000)
is expected to receive funding of about $187 (amount per 100).
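The same point prediction can be checked with a short sketch. The coefficients are the ones reported in the SPSS output; `predict_funding` is a helper name introduced here.

```python
# Coefficients taken from the SPSS output of the funding example.
B0 = 91.625   # intercept (Constant)
B1 = 0.479    # slope for reported diseases (rate per 10,000)

def predict_funding(diseases):
    """Point prediction from the estimated line: funding = B0 + B1 * diseases."""
    return B0 + B1 * diseases

predicted = predict_funding(200)
print(round(predicted, 3))
```

Predictions are most trustworthy for values of x inside the range of the observed data; whether 200 lies in that range cannot be confirmed from the slides alone.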
Prediction in SPSS
of x which is 200
6. Predict the “funding” of a city with 200 reported diseases (rate per 10,000).
7. Find a 95% prediction interval for the “funding” of a city with 200 reported
diseases (rate per 10,000).
8. Find a 95% confidence interval for the average “funding” of all cities with 200
reported diseases (rate per 10,000).
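For reference, the two intervals in questions 7 and 8 have the standard textbook forms below for simple linear regression. The symbols are assumptions introduced here, not notation from the slides: x_0 is the given value of x (here 200), n is the sample size, and s is the residual standard error.

```latex
\hat{y}_0 = b_0 + b_1 x_0, \qquad s = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}}

% 95% prediction interval for a single new observation at x_0
\hat{y}_0 \pm t_{0.025,\, n-2} \; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}}

% 95% confidence interval for the mean response at x_0
\hat{y}_0 \pm t_{0.025,\, n-2} \; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}}
```

The extra 1 under the square root makes the prediction interval wider than the confidence interval for the mean at the same x_0.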
STATISTICAL MODELLING
Multiple linear regression
Multiple linear regression model
ε is an error term
[SPSS Coefficients table for the multiple regression model; columns: Unstandardized Coefficients, Standardized Coefficients, t, Sig.; estimates labelled b0, b1, b2]
Predicted funding = 180.22
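How b0, b1 and b2 are estimated with two predictors can be sketched by solving the least-squares normal equations (XᵀX)b = Xᵀy. This is a minimal illustration, not SPSS's procedure: the data are hypothetical and generated without error from y = 1 + 2·x1 + 3·x2, so the fit should recover those coefficients. The function names are introduced here.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_multiple_regression(x1, x2, y):
    """Return [b0, b1, b2] minimising the sum of squared residuals."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Hypothetical, error-free data generated from y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_multiple_regression(x1, x2, y)
print(f"y_hat = {b0:.3f} + {b1:.3f} x1 + {b2:.3f} x2")
```

A prediction is then obtained, as in the simple case, by substituting the given values of x1 and x2 into the estimated equation.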