
Session 6

Data Analysis
Session Speaker
K.M. Sharath Kumar
M. S. Ramaiah University of Applied Sciences

Session Objectives
- To explain the relevance of data analysis for carrying out research
- To explore different types of data analysis techniques for effective interpretation
- To critique and recommend appropriate exploratory data analysis techniques for a problem

Session Outline

Sampling Design
Data Collection Methods
Quantitative and Qualitative Data Analysis
Stages in Data Analysis
Review of Techniques
Error Analysis


One Variant
6,200 Distinct Parts
Imported from 17 Countries
From 240 Suppliers
Assembled in 1 Plant
Within a few minutes
Exported to 34 Countries
Same day
Without becoming inventory!

Suzuki Grand Vitara


The secret of success is to know something nobody else knows
- Aristotle Onassis

Turn Data into Insight
Insight into Action
Action into Tangible Results
- Accenture

Data Analysis (1/2)

- Explore relationships among the variables
- Partition the total variability (by statement / variance component analysis)
- Handle noisy data appropriately

Questions to be answered:
- Is the process stable?
- Is the process capable of meeting specifications?
- What are the major sources of variation (noise, etc.)?

Listen to what the data is saying

Data Analysis (2/2)


Data analysis is carried out in two distinct environments:
- Result of a special study or experiment
- By-product of some operations, or observational

Experimental Studies
Here we compare various conditions and try to determine which condition is better. We have a finite amount of data and carry out a one-time analysis.

Observational Studies
Here we get data from a steady-state process and try to find out whether any unplanned change has occurred. Generally we perform a ...

Classification of Data Analysis


Quantitative vs. Qualitative

Quantitative:
- Explanation through numbers
- Objective
- Deductive reasoning
- Predefined variables and measurement
- Data collection before analysis
- Cause and effect relationships

Qualitative:
- Explanation through words
- Subjective
- Inductive reasoning
- Creativity, extraneous variables
- Data collection and analysis intertwined
- Description, meaning

Ambushed Everywhere

Data analysis should be:


- Supported by data
- Shown in graphical and statistical format
- Not based on intuition
- Sensible from an engineering standpoint

Data and Hard Evidence!!

Key Components of a Data Analysis Plan
- Purpose of the evaluation
- Questions
- What you hope to learn from the question
- Analysis technique
- How data will be presented

Types of Data

Continuous Data

Discrete Data


Continuous Data
Data generated by
- Physically measuring the characteristic
- Generally using an instrument
- Assigning a unique value to each item
Examples:
- Time to receive a shipment, time spent per page, time to activate, CPU speed, Total Minutes per Incident (TMPI), etc.
- Hardness, strength, weight, diameter, etc.

Discrete Data
Data generated by
- Classifying the items into different groups based on some criteria
- No physical measurement is involved

Examples:
- Sex, shade variation, surface defects, etc.
- % of visitors signing in for AOL messenger per day, number of recharges per month, number of operating systems, % escalations, etc.

Continuous Data: Example (time spent per page visit, in minutes)

SL No.   Data     SL No.   Data
1        0.98     11       1.02
2        1.03     12       0.98
3        1.00     13       1.01
4        1.00     14       1.01
5        0.99     15       0.99
6        1.01     16       1.00
7        0.97     17       1.01
8        1.02     18       0.99
9        1.00     19       1.00
10       0.99     20       1.02

Continuous Data: Example (time spent per visit, in minutes)
Graphical Representation

Random Variables
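[Figure: the random variable X maps each outcome in the sample space (four births, B = boy, G = girl) to a point on the real line, the number of girls]
X = 0: BBBB
X = 1: BGBB, GBBB, BBBG, BBGB
X = 2: GGBB, GBBG, BGBG, BGGB, GBGB, BBGG
X = 3: BGGG, GBGG, GGGB, GGBG
X = 4: GGGG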

Random Variables (Continued)

Suppose the random variable X = 3 when any of the four outcomes BGGG, GBGG, GGBG, or GGGB occurs:
P(X = 3) = P(BGGG) + P(GBGG) + P(GGBG) + P(GGGB) = 4/16
The probability distribution of a random variable is a table that lists the possible values of the random variable and their associated probabilities.

x        P(x)
0        1/16
1        4/16
2        6/16
3        4/16
4        1/16
Total    16/16 = 1

The graphical display for this probability distribution is shown on the next slide.
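The table for the number of girls in four births can be checked by enumerating the sample space. The following is a minimal sketch that assumes, as the slide does, that all 16 outcomes are equally likely; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Enumerate the 16 equally likely outcomes of four births (B = boy, G = girl).
outcomes = ["".join(o) for o in product("BG", repeat=4)]

# X = number of girls in an outcome; accumulate P(X = x) over the outcomes.
distribution = {}
for outcome in outcomes:
    x = outcome.count("G")
    distribution[x] = distribution.get(x, Fraction(0)) + Fraction(1, len(outcomes))

# Fractions are printed in lowest terms, so 4/16 appears as 1/4.
for x in sorted(distribution):
    print(x, distribution[x])
```

Exact fractions are used instead of floats so that the probabilities sum to exactly 1.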

Random Variables (Continued)


Probability Distribution of the Number of Girls in Four Births
[Bar chart: Probability, P(X), against Number of Girls, X; bar heights 1/16, 4/16, 6/16, 4/16, 1/16 for X = 0, 1, 2, 3, 4]

Example

Consider the experiment of tossing two six-sided dice. There are 36 possible outcomes. Let the random variable X represent the sum of the numbers on the two dice:

Outcomes (first die, second die):
1,1   1,2   1,3   1,4   1,5   1,6
2,1   2,2   2,3   2,4   2,5   2,6
3,1   3,2   3,3   3,4   3,5   3,6
4,1   4,2   4,3   4,4   4,5   4,6
5,1   5,2   5,3   5,4   5,5   5,6
6,1   6,2   6,3   6,4   6,5   6,6

x     P(x)
2     1/36
3     2/36
4     3/36
5     4/36
6     5/36
7     6/36
8     5/36
9     4/36
10    3/36
11    2/36
12    1/36

[Bar chart: Probability Distribution of Sum of Two Dice; p(x) against x = 2 to 12]
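The distribution of the sum of two dice can be reproduced by counting outcomes. This is a small sketch assuming fair dice, so that each of the 36 outcomes has probability 1/36; the names `counts` and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 (first die, second die) outcomes give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
total = sum(counts.values())

# Probability of each sum as an exact fraction.
pmf = {x: Fraction(counts[x], total) for x in sorted(counts)}

# Print in the slide's "k/36" form.
for x in pmf:
    print(f"{x:2d}  {counts[x]}/{total}")
```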

NORMAL DISTRIBUTION


Generic Causes Of Variation


- Machines
- Materials
- Methods
- Measurements
- Mother Nature
- People

[Diagram: all six causes feed into the PROCESS]

THE NORMAL CURVE

[Figure: histogram with a smooth curve interconnecting the center of each bar; horizontal axis: units of measure]

Normal Distribution
If the frequency distribution of a set of values is such that:
- 68.27% of the values lie within $\pm 1\sigma$ of the mean, AND
- 95.45% of the values lie within $\pm 2\sigma$ of the mean, AND
- 99.73% of the values lie within $\pm 3\sigma$ of the mean,

then the distribution is normal.

A normal distribution is characterised by a bell-shaped curve.
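The three coverage percentages follow from the normal distribution itself and can be computed with the error function. The sketch below is illustrative; the helper name `prob_within` is not from the slides.

```python
import math

def prob_within(k):
    """P(|Z| <= k) for a standard normal variable Z, via the error function."""
    return math.erf(k / math.sqrt(2))

# Share of values within 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * prob_within(k):.2f}%")
```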


Standard Normal Distribution


- Different normal variables have different units of measurement; the standard normal distribution handles this by putting them on a common scale
- Standard normal variable: $Z = (x - \mu) / \sigma$
- First convert the original problem into Z. The probability table for Z will be available.
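Converting to Z and looking up a probability can be sketched as follows. The mean, standard deviation and value of x are hypothetical numbers chosen for illustration, and Python's `statistics.NormalDist` stands in for the printed probability table.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standard normal variable Z = (x - mu) / sigma."""
    return (x - mu) / sigma

# Hypothetical example: times with mean 1.00 min and standard deviation 0.02 min.
z = z_score(1.03, mu=1.00, sigma=0.02)

# Cumulative probability P(Z <= z), the value a Z table would give.
p_below = NormalDist().cdf(z)
print(f"Z = {z:.2f}, P(X <= 1.03) = {p_below:.4f}")
```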

Sampling Design


Samples and Populations

Population
(N)

Sample
(n)


Sampling Design within the Research Process
[Flowchart boxes: Question hierarchy; Define relevant population; Sample type (Probability / Non-Probability); Sampling technique; Identify existing sampling frame; Evaluate sampling frame; if not accepted, Modify sampling frame; Select sampling frame; Draw sample]

Types of Sampling

Probability Sampling:
- Simple Random Sampling
- Stratified Random Sampling
- Systematic Sampling
- Cluster Sampling

Non-Probability Sampling:
- Convenience Sampling
- Expert Sampling
- Quota Sampling

Stratified Random Sampling

In stratified random sampling, we assume that the population of $N$ units may be divided into $m$ groups with $N_i$ units in each group $i = 1, 2, \dots, m$. The $m$ strata are nonoverlapping and together they make up the total population: $N_1 + N_2 + \dots + N_m = N$.

[Figure: Population divided into Stratum 1 ($N_1$), Stratum 2 ($N_2$), ..., Stratum m ($N_m$). The m strata are non-overlapping, and $\sum_{i=1}^{m} N_i = N$.]

Systematic Random Sampling


Units are drawn from the population at clearly defined regular intervals.
Steps
- Compute k = N/n and take the integer value; k is called the sampling interval
- Select a random number between 1 and k
- Starting with this number, select every kth unit until all n units are selected

Example
Suppose in a market survey, you have to select 5 households out of 50 households in a block.
- Number of units in the population N = 50
- Number of units in the sample n = 5
- Sampling interval k = N/n = 50/5 = 10
- Select a random number between 1 and 10
Suppose the selected random number is 5. Starting with 5, select every 10th unit.
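The systematic sampling steps can be sketched in a few lines. The function name and the optional `start` argument are illustrative assumptions; passing `start=5` reproduces the household example, while omitting it draws the random start between 1 and k.

```python
import random

def systematic_sample(N, n, start=None):
    """Return n unit numbers (1-based) chosen at a regular interval k = N // n."""
    k = N // n                        # sampling interval (integer part of N/n)
    if start is None:
        start = random.randint(1, k)  # random start between 1 and k, inclusive
    return [start + i * k for i in range(n)]

# Household example: N = 50, n = 5, random start assumed to be 5.
print(systematic_sample(50, 5, start=5))
```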

Example Contd.
 1   2   3   4   5   6   7   8   9  10
11  12  13  14  15  16  17  18  19  20
21  22  23  24  25  26  27  28  29  30
31  32  33  34  35  36  37  38  39  40
41  42  43  44  45  46  47  48  49  50

Selected units: 5, 15, 25, 35, 45

Cluster Sampling

[Figure: population distribution and sample distribution by group]

In stratified sampling a random sample ($n_i$) is chosen from each segment of the population ($N_i$).

In cluster sampling observations are drawn from $m$ out of $M$ areas or clusters of the population.

Caution

Results from non-probability sampling should not be generalised to the population

Sampling Distribution
- A conceptual framework


Sampling Distribution of the Mean from Normal Population
If $X_1, X_2, \dots, X_n$ are $n$ independent random samples drawn from a normal population with mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of $\bar{X}$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$, where

$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Standard deviation of the sample mean = standard error = $\sigma / \sqrt{n}$

Sample Size and Standard Error

The sample size determines the bound of a statistic, since the standard error of a statistic shrinks as the sample size increases:

[Figure: two sampling distributions of a statistic; with sample size 2n the standard error of the statistic is smaller than with sample size n]
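How much the standard error shrinks when the sample size doubles can be illustrated numerically. This is a sketch for the sample mean only, with a hypothetical population standard deviation and sample size; other statistics have their own standard error formulas.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0   # hypothetical population standard deviation
n = 25         # hypothetical sample size

se_n = standard_error(sigma, n)
se_2n = standard_error(sigma, 2 * n)

# Doubling the sample size divides the standard error by sqrt(2), not by 2.
print(f"n = {n}: SE = {se_n:.3f}")
print(f"n = {2 * n}: SE = {se_2n:.3f}")
```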

Determining Sample Size


Determining Sample Size using Confidence Interval

If we know the precision (sampling error), the confidence level, and the standard deviation of the original population, the sample size can be determined.

Sample Size Determination: Population Mean

The sampling error is $E = \bar{X} - \mu = Z \sigma / \sqrt{n}$. Squaring both sides and solving for $n$, we get

$n = \left( \frac{Z \sigma}{E} \right)^2$

where $Z$ is the value corresponding to an area of $(1 - \alpha)/2$ from the mean of the standard normal distribution.

Example
A marketing manager of a fast food restaurant in a city wishes to estimate the average yearly amount that families spend on fast food restaurants. He wants the estimate to be within ± Rs. 100 with a confidence level of 99%. It is known from an earlier pilot study that the standard deviation of the family expenditure on fast food restaurants is Rs. 500. How many families must be chosen for this problem?

Solution

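Applying the formula $n = \left( \frac{Z \sigma}{E} \right)^2$ with $Z$ = 2.58 (99% confidence), $\sigma$ = Rs. 500 and $E$ = Rs. 100:

n = ((2.58^2) * (500^2)) / (100^2) = 166.41
= 167 (rounded up, since 166 families would not quite meet the required precision)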
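The sample size calculation for a mean can be wrapped in a small helper. This is a sketch; the function name is illustrative, and rounding up with `math.ceil` is a conventional choice so that the stated precision is not exceeded.

```python
import math

def sample_size_for_mean(z, sigma, error):
    """Smallest n satisfying n >= (z * sigma / error)^2."""
    return math.ceil((z * sigma / error) ** 2)

# Fast food example: Z = 2.58 (99% confidence), sigma = Rs. 500, E = Rs. 100.
n = sample_size_for_mean(z=2.58, sigma=500, error=100)
print(n)
```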

Sample Size Determination: Population Proportion

We know

$Z = \frac{p - P}{\sqrt{\frac{p (1 - p)}{n}}}$

Sampling error $E = (p - P)$; squaring both sides and simplifying, we get:

$n = \frac{Z^2 \, p (1 - p)}{E^2}$

where $Z$ is the value corresponding to an area of $(1 - \alpha)/2$ from the mean of the standard normal distribution.

Example
A company manufacturing sports goods wants to estimate the proportion of cricket players among high school students in India. The company wants the estimate to be within ± 0.03 with a confidence level of 99%. A pilot study done earlier reveals that out of 80 high school students, 36 students play cricket. What should be the sample size?

Solution
p = 36/80 = 0.45
Applying the formula
n = ((2.58^2) * (0.45 * (1 - 0.45))) / (0.03^2) = 1830.51
n = 1831 (rounded up)
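The same calculation for a proportion, as a minimal sketch. The function name is illustrative; the inputs are the pilot-study figures from the cricket example.

```python
import math

def sample_size_for_proportion(z, p, error):
    """Smallest n satisfying n >= z^2 * p * (1 - p) / error^2."""
    return math.ceil(z ** 2 * p * (1 - p) / error ** 2)

# Cricket example: pilot proportion p = 36/80, Z = 2.58, E = 0.03.
p = 36 / 80
n = sample_size_for_proportion(z=2.58, p=p, error=0.03)
print(p, n)
```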

Data Collection Methods


Primary Data Collection
Secondary Data Collection


Data Collection Methods


Primary Data
Observation method
Interview method
Questionnaires
Warranty cards
Mechanical devices
Secondary Data
Agency
Published material etc.

Scales of Measurement
Nominal Scale - groups or classes
  Gender

Ordinal Scale - order matters
  Ranks (top ten videos)

Interval Scale - difference or distance matters; has an arbitrary zero value
  Temperatures (°F, °C)

Ratio Scale - ratio matters; has a natural zero value
  Salaries

Likert Scale

Sample Rating Scales


Simple category scale: (data: nominal)

Ex:
I plan to purchase a laptop in the next twelve months
Yes
No

Sample Rating Scales


Multiple choice Single response scale (data:
nominal)
Ex:
What newspaper do you read most often?
TOI
DH
The Hindu
Mint
Others (specify:_________)

Sample Rating Scales


Likert Scale (data: interval)
Ex:
The internet is superior to traditional libraries for
comprehensive searches
Strongly Agree    Agree    Neutral    Disagree    Strongly Disagree

Sample Rating Scales


Semantic Differential Scale (data: interval)
Ex:
Lands' End catalog
Fast ___: ___ : ___ : ___ : ___ : ___ : ___ : Slow

Sample Rating Scales


Numerical Scale (data: ordinal or interval)
Ex:
Extremely Favourable    3    2    1    Extremely Unfavourable

Employees' cooperation in teams ___
Employees' knowledge of task ___
Employees' planning effectiveness ___

Sample Rating Scales


Multiple rating list scale (data: interval)
Ex:
Please indicate how important or unimportant each service characteristic is:

                                   Important            Unimportant
Fast reliable repair               7   6   5   4   3   2   1
Service at my location             7   6   5   4   3   2   1
Maintenance by manufacturer        7   6   5   4   3   2   1
Knowledgeable technicians          7   6   5   4   3   2   1
Service contract after warranty    7   6   5   4   3   2   1

Sample Rating Scales


Constant-Sum Scale (data: ratio)
Ex:
Taking all the supplier characteristics we have just discussed and now considering cost, what is their relative importance to you (dividing 100 units between them)?

Being one of the lowest cost suppliers       ___
All other aspects of supplier performance    ___
Sum                                          100

Sample Rating Scales


Stapel Scale (data: ordinal or interval)
Ex:
Company Name

+3                   +3                   +3
+2                   +2                   +2
+1                   +1                   +1
Technology Leader    Existing Products    Reputation
-1                   -1                   -1
-2                   -2                   -2
-3                   -3                   -3

Sample Rating Scales


Graphic rating scale (data: ordinal or interval or
ratio)
Ex:
How likely are you to recommend complete care to
others?
Very Likely
Very Unlikely


Data Analysis


The cure for boredom is curiosity.
There is no cure for curiosity.
- Dorothy Parker

Blind men and an elephant
- Indian fable

Things aren't always what we think!

Six blind men go to observe an elephant. One feels the side and thinks the elephant is like a wall. One feels the tusk and thinks the elephant is like a spear. One touches the squirming trunk and thinks the elephant is like a snake. One feels the knee and thinks the elephant is like a tree. One touches the ear, and thinks the elephant is like a fan. One grasps the tail and thinks it is like a rope. They argue long and loud and though each was partly in the right, all were in the wrong.

For a detailed version of this fable see:
http://www.wordinfo.info/words/index/info/view_unit/1/?letter=B&spage=3

Stages in Data Analysis
[Flowchart: Editing -> Coding -> Data entry (keyboarding) -> Error checking and verification -> Data analysis (descriptive, univariate, bivariate, multivariate) -> Interpretation]

Descriptive Analysis Techniques

Count (frequencies)
Percentage
Mean
Mode
Median
Range
Standard deviation
Variance
Ranking
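Most of the descriptive measures listed here are available in Python's standard library. The sketch below applies them to the 20 page-visit times from the earlier continuous data example; the variable names are illustrative.

```python
import statistics
from collections import Counter

# Time spent per page visit (minutes), SL No. 1 to 20 from the continuous data example.
times = [0.98, 1.03, 1.00, 1.00, 0.99, 1.01, 0.97, 1.02, 1.00, 0.99,
         1.02, 0.98, 1.01, 1.01, 0.99, 1.00, 1.01, 0.99, 1.00, 1.02]

counts = Counter(times)                    # count (frequencies)
mean = statistics.mean(times)
median = statistics.median(times)
mode = statistics.mode(times)
data_range = max(times) - min(times)
std_dev = statistics.stdev(times)          # sample standard deviation (n - 1 divisor)
variance = statistics.variance(times)      # sample variance (n - 1 divisor)

print(f"mean={mean:.3f} median={median:.2f} mode={mode:.2f}")
print(f"range={data_range:.2f} std dev={std_dev:.4f} variance={variance:.6f}")
```

`statistics.pstdev` and `statistics.pvariance` give the population versions if the data are treated as the whole population rather than a sample.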


Frequency Distributions

To what extent did you increase your skills in putting together a household budget?

Women (N=30)
A lot    Some    A little    Not at all
14

Uni-variate Analysis: The analysis of a single variable, for purposes of description (examples: frequency distribution, averages, and measures of dispersion)

Percentage Distributions

To what extent did you increase your skills in putting together a household budget?

Women (N=30)
A lot    Some    A little    Not at all
46%      30%     17%         7%

Graphing Frequency Data


How did you first hear about the web-site?

Court Referral
Social Worker
Friend or Acquaintan
Librarian
Web Search Engine
Newspaper Story
Other


Means and Medians

Subject    Score
Math       98
History    95
English    96
Biology    93
Music      94
Latin      92
Gym        40

Mean = 87
Median = 94
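The gap between the mean and the median comes from the single low score. A short sketch using the seven scores from the slide shows the effect; recomputing without the lowest score is an added illustration, not part of the slide.

```python
import statistics

# The seven scores from the slide; 40 is the outlier.
scores = [98, 95, 96, 93, 94, 92, 40]

mean = statistics.mean(scores)
median = statistics.median(scores)
print(f"mean = {mean:.1f}, median = {median}")

# Dropping the outlier moves the mean much more than the median.
without_outlier = [s for s in scores if s != min(scores)]
print(f"mean without outlier = {statistics.mean(without_outlier):.1f}")
print(f"median without outlier = {statistics.median(without_outlier)}")
```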

Note

40    50    55    94    100    100    100    Mean = 81
40    92    93    94    95     96     98     Mean = 87

Histograms


Cross Tabulations


Graphing comparisons
[Bar chart: Satisfaction with Services; Satisfaction Score (0 to 40) by Clinic Name]

[Bar chart: Satisfaction with Services; Satisfaction Score (0 to 16) by Clinic, with series Staff, Advice, Facility]

[Bar chart: Satisfaction with Services; Satisfaction Score (0 to 16) by Satisfaction Component (Staff, Advice, Facility), with series for clinics A, B, C, D, E]


Bi-variate Analysis

The analysis of two variables simultaneously for determining the empirical relationship between them

Y = f (X)

Few Techniques Available

Correlation
Regression
Chi-square Test and Cramér's V
Hypothesis Test for two population means/proportions
Paired t-tests comparing two groups

Measure of Correlation: Coefficient of Correlation
Symbol: r
Range: -1 to 1
Sign: Type of correlation
Value: Degree of correlation
Examples:
r = 0.6: moderate positive correlation
r = -0.82: strong negative correlation
r = 0: no linear correlation
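The coefficient of correlation can be computed directly from its definition. The sketch below uses the six (x, y) pairs from the error analysis slides later in this session; the function name is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [65, 8, 89, 88, 50, 73]
y = [69, 78, 8, 21, 24, 72]

r = pearson_r(x, y)
print(f"r = {r:.4f}")   # the sign gives the type, the magnitude the degree
```

The magnitude of r matches the "Multiple R" value reported in the regression statistics for the same data; the negative sign reflects the negative slope.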

Regression
Regression helps
To identify the exact form of the relationship
To model output in terms of input or process variables

y=a+bx
Examples:
Yield = 5 + 3 x Time
Y = 2 - 5x


Coefficient of Determination
Measure of degree of relationship
Symbol: R2
Range of R2: 0 to 1

If R2 > 0.6, the model is reasonably good

Error or Residual Analysis

Root Mean Square Error for Prediction (MSEP)

x     y
65    69
8     78
89    8
88    21
50    24
73    72

Regression Statistics
Multiple R           0.594159006
R Square             0.353024925
Adjusted R Square    0.191281156
Standard Error       27.80337004
Observations         6

             Coefficients
Intercept    83.00449781
x            -0.605970474

Root Mean Square Error:

x     y     Predicted y    Error     Error Square
65    69    43.62          25.38     644.33
8     78    78.16          -0.16     0.02
89    8     29.07          -21.07    444.08
88    21    29.68          -8.68     75.33
50    24    52.71          -28.71    824.03
73    72    38.77          33.23     1104.32
                           Sum       3092.11

Predicted y = 83.0045 - 0.6060 x
Error = y - predicted y
Mean Square Error = 3092.11 / 6 = 515.35
Root Mean Square Error = 22.70
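The regression coefficients, R Square and the root mean square error can be reproduced from the six data points with an ordinary least-squares fit. This is a sketch written out by hand instead of calling a statistics package; the variable names are illustrative.

```python
import math

x = [65, 8, 89, 88, 50, 73]
y = [69, 78, 8, 21, 24, 72]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Error = y - predicted y, then the error measures used on the slide.
predicted = [intercept + slope * xi for xi in x]
errors = [yi - pi for yi, pi in zip(y, predicted)]
sse = sum(e ** 2 for e in errors)      # sum of squared errors
mse = sse / n                          # the slide divides by n = 6
rmse = math.sqrt(mse)
r_squared = 1 - sse / syy

print(f"y = {intercept:.4f} + ({slope:.4f}) x")
print(f"R^2 = {r_squared:.4f}, MSE = {mse:.2f}, RMSE = {rmse:.2f}")
```

The "Standard Error" of 27.80 in the regression statistics divides the same sum of squared errors by n - 2 = 4 instead of n, which is why it is larger than the RMSE of 22.70.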

Difference between observed values $Y_i$ and model-predicted values $f(X_i)$ for n data sets

Decomposition of MSEP has been carried out using mean bias ($U_M$), slope bias ($U_R$) and random error ($U_D$):

$U_M = \left( \overline{f(X)} - \bar{Y} \right)^2 / \mathrm{MSEP}$

$U_R = S_X^2 \, (1 - b)^2 / \mathrm{MSEP}$

$U_D = (1 - r^2) \, S_Y^2 / \mathrm{MSEP}$

Logistic Regression
Objective
To develop a mathematical model for an attribute or response metric
(Y) in terms of other available attributes (Xs).
When to Use
Xs : Continuous
Y : Discrete binary


Hypothesis Test for Difference between Two Means

Objective
To test hypotheses that compare the population mean of interest for two separate populations (independent samples)

Test Statistic (Large Sample)

$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Test Statistic (Small Sample)

$t = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

where $S_p$ is the pooled sample standard deviation

Chi-Square Test
Objective:
To test whether two variables which have frequency data are related or not
Usage:
When both the variables (X & Y) are categorical (grouped)
Cramér's V: To quantify the strength of the relationship between X & Y


Multivariate Analysis
- The analysis of the simultaneous relationships among several variables
- Analyse the data covariance structure to understand it or to reduce the data dimension
- Assign observations to groups
- Explore relationships among categorical variables

Few Techniques Available

Multiple Linear Regression
Cluster Analysis
Factor Analysis
ANOVA
MANOVA
Conjoint Analysis
Optimisation Techniques

Multiple Regression
To model output variable y in terms of two or more variables
General Form:
Y = a + b1X1 + b2X2 + ... + bkXk
Two variable case:
Y = a + b1X1 + b2X2
Adjusted R2
If Adj R2 > 0.6, then the model is reasonably good
P value from coefficient table
If p value < 0.05, the corresponding term has a statistically significant relationship with the output

Residual Plots: Error Analysis


Y = 44+0.19X1-2.55X2


Main Effects Plot - Data Means for Impurity
[Main effects plot: mean Impurity (axis from 0.018 to 0.038) against Day, Shift and Time]

Evidence of a strong shift-to-shift effect

Validation Tests for Model Adequacy

Mean Square Error (MSE) for checking model precision:

$\mathrm{MSE} = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}$

Mean Bias (MB) for checking model accuracy:

$\mathrm{MB} = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - f(X_i) \right)$

where $f(X_i)$ is the $i$th model prediction
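Both validation measures are simple to compute once observed and predicted values are available. The sketch below reuses the observed and predicted y values from the root mean square error table; the n - 2 divisor for MSE and the 1/n factor for MB are assumptions about the slide's formulas, whose layout did not survive extraction.

```python
# Observed y and model-predicted y from the root mean square error table.
observed = [69, 78, 8, 21, 24, 72]
predicted = [43.62, 78.16, 29.07, 29.68, 52.71, 38.77]
n = len(observed)

residuals = [y - f for y, f in zip(observed, predicted)]

# MSE with an n - 2 divisor (two fitted parameters): model precision.
mse = sum(e ** 2 for e in residuals) / (n - 2)

# Mean bias: average signed residual; close to zero for a least-squares fit.
mean_bias = sum(residuals) / n

print(f"MSE = {mse:.2f}, MB = {mean_bias:.4f}")
```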

Factor Analysis
Loading Plot of Pop, ..., Home
[Loading plot: Second Factor (-0.50 to 0.75) against First Factor (-0.4 to 1.0); variables plotted: Pop, School, Employ, Health, Home]

Explain the presence of each variable with the sign (+ or -). This way we can reduce the number of variables

Predictors Selection


P = 0.001


Classification Methods
Example:
Attribute 1: x1
Attribute 2: x2
Label: y, with values y1 (Red) and y2 (Blue)

[Scatter plot: x2 (20 to 40) against x1 (10.00 to 20.00)]
[Decision tree: x2 > 35 -> y1; x2 < 28 -> y2; otherwise split on x1: x1 < 15.5 -> y2, x1 > 15.5 -> y1]

CLASSIFICATION METHODS

Example: Rules
Attribute 1: x1
Attribute 2: x2
Label: y, with values y1 (Red) and y2 (Blue)

If x2 > 35, then y = y1
If x2 < 28, then y = y2
If 28 < x2 < 35 & x1 > 15.5, then y = y1
If 28 < x2 < 35 & x1 < 15.5, then y = y2

[Decision tree: x2 > 35 -> y1; x2 < 28 -> y2; otherwise split on x1: x1 < 15.5 -> y2, x1 > 15.5 -> y1]
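The four rules translate directly into a small function. This is a sketch: the slides do not say how values exactly on a threshold (x2 = 28, x2 = 35 or x1 = 15.5) are labelled, so the handling of those boundary cases below is an assumption.

```python
def classify(x1, x2):
    """Apply the slide's rules to attributes x1 and x2; returns "y1" or "y2"."""
    if x2 > 35:
        return "y1"
    if x2 < 28:
        return "y2"
    # Here 28 <= x2 <= 35. Treating the end points as part of this band, and
    # x1 == 15.5 as y2, is an assumption not stated on the slide.
    return "y1" if x1 > 15.5 else "y2"

# Hypothetical records as (x1, x2) pairs.
for record in [(12.0, 36), (18.0, 25), (17.0, 30), (12.0, 30)]:
    print(record, classify(*record))
```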

Cluster Analysis
Objective
To classify the records or items into a smaller number of groups based
on the values of available attributes.
When to Use
When there is no Y attribute
All attributes are considered as Xs only


K-Nearest Neighbors Cluster Analysis
[Scatter plots: Acceleration in m/s2 against Weight in kg]

ANOVA or Experimental Design


Sometimes, an investigator would like to compare more than two population means in a problem situation
ANOVA decomposes the total variation into components of variation

[Figure: distributions of Population 1, Population 2 and Population 3]

MANOVA and Conjoint Analysis

MANOVA is similar to ANOVA, with the added ability to handle several dependent variables
The most common applications of conjoint analysis are market research and product development for making trade-offs

Optimisation Methods
Objective
To identify the best values of a set of variables
(Xs) which will optimize an objective function
satisfying a given set of constraints
For n variables in m constraints:

$\text{Max / Min } Z = C_1 x_1 + C_2 x_2 + \dots + C_n x_n$

Subject to

$a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n \; (\le, =, \ge) \; b_1$
$a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n \; (\le, =, \ge) \; b_2$
$\vdots$
$a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n \; (\le, =, \ge) \; b_m$

and $x_i \ge 0$, $i = 1, 2, \dots, n$

You never know what is enough unless you know what is more than enough
- William Blake

Session Summary (1/2)


Statistical Techniques and Tools:
Completely dependent on the type of data used (continuous or discrete)
Normal Distribution:
Describes many natural phenomena and industrial and scientific situations. A normal curve is a graphical representation of the normal distribution
Data analysis is carried out in two distinct environments:
- Result of a special study or experiment
- By-product of some operations, or observational

Session Summary (2/2)


Uni-variate Analysis:
The analysis of a single variable, for purposes of description (examples: frequency distribution, averages, and measures of dispersion)
Bi-variate Analysis:
The analysis of two variables simultaneously for determining the empirical relationship between independent and dependent variables
Multi-variate Analysis:
The analysis of the simultaneous relationships among several variables
