Objective slide
After completing this course, you will be able to:
Regression Analysis
Regression analysis focuses on finding a relationship between a dependent variable and one or more independent variables.
It predicts the value of a dependent variable based on the value of at least one independent variable.
It explains the impact of changes in an independent variable on the dependent variable.
Y = f(X, β)
where Y is the dependent variable,
X is the independent variable,
β is the unknown coefficient.
Widely used in prediction and forecasting
[Figure: classification of regression models. Labels: univariate and multivariate; linear and nonlinear; simple and multiple.]
Copyright 2014, Simplilearn, All rights reserved.
Linear Regression
It is a common technique for determining how one variable of interest is affected by another.
It is used for three main purposes:
To describe the linear dependence of one variable on the other.
To predict values of one variable from values of the other, for which more data are available.
To correct for the linear dependence of one variable on the other.
A line is fitted through the group of plotted data.
Y = α + βX + ε
α = intercept coefficient
β = slope coefficient
ε = residuals
The residual value is a discrepancy between the actual and the predicted value.
The distance of the plotted points from the line gives the residual value.
The procedure to find the best fit is called the least-squares method.
Slope: β = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
Intercept: α = ȳ − β x̄
where x̄ and ȳ are the means of the xi and yi values.
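The least-squares fit of a straight line can be sketched in a few lines of Python. This is a minimal illustration, not the course's own implementation: the function names `fit_line` and `residuals` and the sample data are assumptions made for this example. It uses the standard closed-form estimates for the intercept and slope of a line with one independent variable.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Return actual minus predicted value for every observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
intercept, slope = fit_line(xs, ys)
print("intercept:", intercept, "slope:", slope)
print("residuals:", residuals(xs, ys, intercept, slope))
```

The residuals returned by the second function are the vertical distances of the plotted points from the fitted line. For a line fitted with an intercept, they sum to approximately zero.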
Coefficient of determination R²:
A measure of goodness of fit: how well the model fits the data.
R² = 0: no linear relationship
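The coefficient of determination can be computed from the actual and predicted values. The sketch below uses the common definition, one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` is an assumption made for this example.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared gaps between actual and predicted values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared gaps between actual values and their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R squared is undefined")
    return 1 - ss_res / ss_tot
```

A model that reproduces every observation gives a value of 1, while a model that only ever predicts the mean of the observations gives 0.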
Logistic Regression
It is a statistical method for analyzing a dataset in which one or more independent variables determine the outcome.
The dependent variable is binary (true or false).
The goal is to find the best-fitting model to describe the relationship between the dichotomous characteristic and a set of independent variables.
Logistic regression generates the coefficients of a formula to predict a logit transformation of the probability of presence of the characteristic of interest:
logit(p) = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + … + βₙxₙ
where p is the probability of presence of the characteristic of interest.
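The logit transformation and its inverse can be sketched as follows. This is a minimal illustration that assumes the coefficients are already known; it does not estimate them from data. The function names `logit` and `predict_probability` are assumptions made for this example.

```python
import math


def logit(p):
    """Return the log-odds log(p / (1 - p)) of a probability strictly between 0 and 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1 - p))


def predict_probability(intercept, coefficients, xs):
    """Invert the logit: turn the linear predictor into a probability."""
    # Linear predictor: intercept plus the sum of coefficient * variable.
    z = intercept + sum(b * x for b, x in zip(coefficients, xs))
    # Logistic function, the inverse of logit. Very large negative z would
    # overflow math.exp; this sketch does not guard against that.
    return 1 / (1 + math.exp(-z))
```

Applying `logit` to the result of `predict_probability` returns the linear predictor, which is why the model is described as linear on the logit scale.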
[Figure: logistic regression model]
Cluster Analysis
Cluster analysis is the process of forming groups of related variables in order to draw important conclusions based on the similarities within each group.
The greater the similarity within a group and the greater the difference between groups, the more distinct the clustering.
Types of Clusters
Well separated: The distance between any two points in different groups is greater than the distance between any two points within a group. The clusters need not be globular.
Prototype based: The prototype of a cluster is often a centroid for data with continuous attributes. Such clusters tend to be globular.
Graph based: The data is represented as a graph in which nodes are the objects and links represent connections among the objects. Such clusters tend to be globular.
Density based: This method is employed when the clusters are irregular and when noise and outliers are present.
Shared property: Also known as conceptual clustering, it is the process of identifying the pattern in the clusters to successfully segregate the data into groups of clusters.
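The well-separated criterion can be checked directly from its definition. The sketch below compares the largest distance within any group against the smallest distance between points of different groups. The function name `is_well_separated`, the use of Euclidean distance, and the sample points are assumptions made for this example.

```python
import math
from itertools import combinations


def is_well_separated(clusters):
    """clusters is a list of groups; each group is a list of coordinate tuples."""
    # Largest distance between two points that share a group.
    max_within = 0.0
    for group in clusters:
        for p, q in combinations(group, 2):
            max_within = max(max_within, math.dist(p, q))
    # Smallest distance between two points from different groups.
    min_between = math.inf
    for group_a, group_b in combinations(clusters, 2):
        for p in group_a:
            for q in group_b:
                min_between = min(min_between, math.dist(p, q))
    return min_between > max_within


# Made-up points for illustration only.
tight = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
loose = [[(0, 0), (5, 0)], [(6, 0), (7, 0)]]
print(is_well_separated(tight), is_well_separated(loose))
```

The check compares every pair of points, so it is suitable only for small illustrative datasets.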
DBSCAN: It is a density-based clustering algorithm that produces a partitioned clustering, in which the number of clusters is automatically determined by the algorithm.
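A compact sketch of the DBSCAN idea is given here to show how the number of clusters emerges from the data rather than being supplied. The parameters `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a dense point) are the usual DBSCAN parameters but are not named in the slides; the function name, the `NOISE` label, and the sample points are assumptions made for this example. It is a brute-force illustration, not a production implementation.

```python
import math

NOISE = -1  # label given to points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # point i is dense, so it starts a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # dense point: keep growing the cluster
    return labels


# Made-up points for illustration only: two dense groups and one outlier.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

The caller never states how many clusters to find; the count follows from the density parameters and the data.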
Time Series
Time series data is an ordered sequence of observations of a quantitative variable measured at equally spaced time intervals.
Time series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, communications engineering, and other fields.
Time series analysis is used for:
Analyzing time series data.
Forecasting the future value of the variable under consideration.
In time series analysis, it is assumed that the data consist of a set of identifiable components and random errors, the latter of which usually make the pattern difficult to identify.
E.g., sales of quilts and blankets in a store over a period of five years.
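One common way to separate a series into identifiable components and a random remainder is a classical additive decomposition. The sketch below assumes an additive model (observed = trend + seasonal + random) and a known seasonal period, neither of which is stated in the slides. The function names and the quarterly sample series are made up for this example.

```python
def centered_moving_average(series, period):
    """Trend estimate; None where the averaging window does not fit."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: the two end points get half weight each.
            trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend


def decompose_additive(series, period):
    """Return (trend, seasonal, random) lists of the same length as series."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    trend = centered_moving_average(series, period)
    detrended = [None if tr is None else y - tr for y, tr in zip(series, trend)]
    # Average the detrended values at each position in the seasonal cycle.
    means = []
    for k in range(period):
        vals = [d for i, d in enumerate(detrended) if d is not None and i % period == k]
        means.append(sum(vals) / len(vals))
    offset = sum(means) / period  # centre the seasonal effects on zero
    seasonal = [means[i % period] - offset for i in range(len(series))]
    random_part = [
        None if tr is None else y - tr - s
        for y, tr, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, random_part


# Made-up quarterly sales over five years: linear growth plus a seasonal pattern.
pattern = [5.0, -3.0, -4.0, 2.0]
sales = [10.0 + 2.0 * t + pattern[t % 4] for t in range(20)]
trend, seasonal, random_part = decompose_additive(sales, period=4)
```

Because the sample series contains no noise, the random component is zero wherever the trend is defined. With real data it would hold whatever the trend and seasonal estimates do not explain.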
[Figure: time series decomposition with panels Observed, Trend, Seasonal, and Random, plotted against Time]
Moving Average
A moving average is a widely used indicator in technical analysis that helps smooth out price action by filtering out the noise, i.e. the residuals from random fluctuations.
A moving average is also called a trend follower or lagging indicator because it always depends on historical data.
Commonly used moving averages are the simple moving average (SMA) and the exponential moving average (EMA).
Select the Labels in First Row option if the extracted data has column names in its first row.
Select Chart Output and click OK.
By default, the results are written to a new worksheet ply.
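The same smoothing can be sketched in Python for readers who want to see the arithmetic. The simple moving average gives equal weight to every value in the window, while the exponential moving average gives more weight to recent values, with the weights of older values decaying exponentially. The function names, the smoothing factor `alpha`, and the choice to seed the exponential average with the first observation are assumptions made for this example.

```python
def simple_moving_average(values, window):
    """Equal-weight average of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def exponential_moving_average(values, alpha):
    """Weighted average where older values decay by a factor of (1 - alpha)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not values:
        return []
    ema = [values[0]]  # assumption: seed with the first observation
    for v in values[1:]:
        ema.append(alpha * v + (1 - alpha) * ema[-1])
    return ema


# Made-up data for illustration only.
print(simple_moving_average([1, 2, 3, 4, 5], window=3))
print(exponential_moving_average([1, 2, 3], alpha=0.5))
```

A larger window or a smaller `alpha` smooths more strongly but makes the average lag further behind the data.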
Summary
Here is a quick recap of what we have learned in this lesson:
Regression Analysis
Regression Models
Quiz
QUIZ 1
a. Prediction
b. Collection
c. Validation
d. Tabulation
Answer: a.
Explanation: Regression analysis is used for prediction.
QUIZ 2
Simple linear regression is not used for which of the following purposes?
a.
b. To predict values of one variable from values of the other, for which more data are available.
c.
d.
Answer: c.
Explanation: Simple linear regression does not determine the distance between two variables.
QUIZ 3
a.
d.
Answer: b.
Explanation: The residual value is the discrepancy between the actual and the predicted value.
QUIZ 4
What is the procedure to find the best fit for linear regression?
a.
d.
Answer: d.
Explanation: The procedure to find the best fit for linear regression is the least-squares method.
QUIZ 5
a. K-means
b. DBSCAN
c.
d. Collective clustering
Answer: d.
Explanation: Collective clustering is not a method for clustering.
QUIZ 6
a. Hierarchical
b. Fuzzy
c. Complete
d. Graph
Answer: d.
Explanation: Graph is a type of cluster.
QUIZ 7
a.
d.
Answer: d.
Explanation: The variables that are assumed to be the cause are called predictors, and the variables that are assumed to be the effect are called response or target variables.
QUIZ 8
a. θ - theta
b. β - beta
c. α - alpha
d. ε - epsilon
Answer: d.
Explanation: The error term is represented as epsilon.
QUIZ 9
a.
d.
Answer: b.
Explanation: They assign each object to a single cluster.
QUIZ 10
a.
d.
Answer: a.
Explanation: The simple moving average assigns equal weights to all values for smoothing.
QUIZ 11
a.
d.
Answer: b.
Explanation: The exponential moving average assigns more weight to recent values, and the weight decreases exponentially for older values.
QUIZ 12
a. Linear regression
b. Clustered regression
c. Logistic regression
d.
Answer: c.
Explanation: In logistic regression, the dependent variable is binary and the independent variables may be continuous or dichotomous.
QUIZ 13
a. ODS graphics
b. ODS plot
c. MSN graphics
d. ODS diagram
Answer: a.
Explanation: ODS graphics helps in displaying the graphical output.
QUIZ 14
a.
d.
Answer: c.
Explanation: The simple moving average gives equal weight to each value in a window of previous data, not exponentially decreasing weights.
QUIZ 15
a.
b. Holt-Winters
c. ARIMA
d. Holt's method
Answer: a.
Explanation: Simple moving average forecasting can be done in Excel.
Thank You