Program
Data Analysis using SPSS
PRESENTER
MR VENKAT
SPSS
• Statistical
• Package for
• Social
• Sciences
VERSIONS OF SPSS
• SPSS Ver-1 to Ver-5: DOS versions
• SPSS Ver-6 to Ver-15: Windows versions
• SPSS-X: for mainframes (on various operating system platforms)
• SPSS-LAN: for LANs
• Web site: http://www.spss.com
BASIC APPLICATIONS
• Creating data as a spreadsheet
• Generating Reports as
Tables
• Statistical Analysis of Data
• Graphic Presentations
MAIN STEPS IN USING SPSS
• Creating data or Getting data
• Defining data
• Modifying data
• Processing data
– generating tables
– statistical analysis
– generating graphs
Structure of SPSS data file
• Variables (Fields) in columns
• Cases (Respondents) in rows
• A case contains several
variables
Data Definition
• Variable Name
• Variable Type
• Field Width
• Decimal Positions
• Variable Label
• Value Labels
• Missing Values
• Column Width
• Alignment
• Scale
Variable Name
• Maximum 8 characters (up to Ver 10)
• First character must be a letter
• Arithmetic operators, special
symbols and blank spaces not
permitted
• Two variables cannot have the
same name in one data file
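The naming rules above can be sketched as a small check. This is only an illustration of the rules as stated on the slide, not SPSS's own validation logic: the function name `is_valid_name` is hypothetical, and allowing digits and underscores after the first letter, as well as ignoring case when looking for duplicates, are assumptions.

```python
import re

# Assumed rule set, based on the slide: at most 8 characters, starts with a
# letter, and contains only letters, digits, or underscores (so no blanks,
# arithmetic operators, or other special symbols).
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")


def is_valid_name(name, existing=()):
    """Return True if `name` follows the slide's rules and is not a duplicate."""
    if not _NAME_PATTERN.fullmatch(name):
        return False
    # Assumption: names that differ only in case count as the same name.
    taken = {n.lower() for n in existing}
    return name.lower() not in taken
```

For example, `is_valid_name("age")` passes, while `"1age"`, `"mar status"`, `"a+b"` and the nine-character `"household"` are rejected.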
Variable Label
• It helps in reading outputs.
• No restriction on characters.
Variable Type
• Numeric (Floating point)
• String (Character / Text)
• Date
• Currency
Value Labels
• It helps in reading tables and other
outputs.
• For example, the variable “Marital Status”
has five values (codes):
– value 1 means “Never Married”
– value 2 means “Currently Married”
– value 3 means “Widow/Widower”
– value 4 means “Divorced”
– value 5 means “Separated”
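The idea of value labels can be sketched outside SPSS as a simple lookup from code to label. The codes and labels below are the ones on the slide; the function name `label_values` and the sample responses are hypothetical.

```python
# Codes and labels for "Marital Status", as listed on the slide.
MARITAL_STATUS_LABELS = {
    1: "Never Married",
    2: "Currently Married",
    3: "Widow/Widower",
    4: "Divorced",
    5: "Separated",
}


def label_values(codes, labels):
    """Replace each code with its label; codes without a label are kept as-is."""
    return [labels.get(code, code) for code in codes]


responses = [2, 1, 2, 5]  # hypothetical coded responses
labelled = label_values(responses, MARITAL_STATUS_LABELS)
```

The data file still stores the numeric codes; the labels are only applied when tables and other outputs are produced.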
Missing Values
• These are values indicating “No
Response” or “Not Applicable” in any
variable.
• Declaring missing values tells the SPSS
package to ignore the cases containing
these values during analysis.
• A blank in an Excel or dBase/FoxPro file is
treated as a missing value.
• In an SPSS data file, blanks appear as
dots (.) denoting that these are
missing values.
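The effect of declaring missing values can be sketched as follows. This is a minimal illustration, not how SPSS works internally: a blank is represented by `None`, the code 99 for "No Response" is an assumed convention, and the data are hypothetical.

```python
def mean_excluding_missing(values, missing_codes=()):
    """Mean of the valid values; blanks (None) and declared codes are skipped."""
    valid = [v for v in values if v is not None and v not in missing_codes]
    if not valid:
        return None  # nothing left to analyse
    return sum(valid) / len(valid)


ages = [25, 30, 99, None, 35]  # hypothetical; 99 stands for "No Response"

declared = mean_excluding_missing(ages, missing_codes={99})  # 99 is ignored
undeclared = mean_excluding_missing(ages)                    # 99 counted as an age
```

With 99 declared as missing the mean is 30.0; without the declaration the code is treated as a real age and the mean rises to 47.25.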
Creating Data directly in SPSS
• Exponential Smoothing.
• Autoregression.
• Auto Regressive Integrated Moving
Averages (ARIMA).
• X11ARIMA.
• Seasonal Decomposition.
MULTIPLE RESPONSE ANALYSIS
• Defining sets.
• Frequencies
• Crosstabulation.
CHARTS
• Bar, Line, Area, Pie, Hi-Low
• Pareto Charts, Control Charts (X-bar, R, p, c)
• Box Plot, Error Bar
• Scatter Plot, Histogram, P-P Plot, Q-Q
Plot, Sequence Charts
• ROC Curve (Receiver Operating
Characteristic)
• Time Series: Autocorrelations, Spectral
Plots, Cross-correlations
Types of Data
• Nominal: A variable can be treated as
nominal when its values represent
categories with no intrinsic ranking; for
example, the department of the company
in which an employee works.
• Examples of nominal variables include
• region
• zip code
• religious affiliation etc.
Ordinal Data
• A variable can be treated as ordinal when
its values represent categories with some
intrinsic ranking; for example, levels of
service satisfaction from highly dissatisfied
to highly satisfied. Examples of ordinal
variables include attitude scores
representing degree of satisfaction or
confidence and preference rating scores.
Scale Data
• A variable can be treated as scale when
its values represent ordered categories
with a meaningful metric, so that distance
comparisons between values are
appropriate. Examples of scale variables
include age in years and income in
thousands of dollars.
Data Analysis
• Simple Tabulation and Cross Tabulation
• Univariate and Bivariate Analysis
• Dependent and Independent variables
• First Stage Analysis- Simple Tabulation
• Second Stage Analysis- Cross Tabulation
• The Chi-square test for cross tabulation
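A cross tabulation and its chi-square statistic can be sketched in a few lines. This is a minimal illustration of the standard Pearson chi-square computation, not SPSS output; the function names and the sample data (gender against a yes/no purchase answer) are hypothetical.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count cases for each (row category, column category) pair."""
    counts = Counter(zip(rows, cols))
    row_keys = sorted(set(rows))
    col_keys = sorted(set(cols))
    return [[counts[(r, c)] for c in col_keys] for r in row_keys]


def chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


gender = ["M", "M", "F", "F", "M", "F"]          # hypothetical cases
buys = ["yes", "no", "yes", "yes", "no", "no"]
table = crosstab(gender, buys)
stat, dof = chi_square(table)
```

The statistic is then compared with the chi-square distribution for `dof` degrees of freedom to judge whether the two variables are associated.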
ANOVA and the Design of Experiments
• The analysis of variance technique is used
when the independent variables are of
nominal scale (categorical) and the
dependent variable is metric.
• The independent variable could be
different price levels, different pack
sizes, or different product colors, and the
dependent variable could be sales of the
product.
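A one-way ANOVA for a single categorical factor can be sketched as below. This is a minimal illustration of the usual F ratio (between-group mean square divided by within-group mean square); the function name and the sales figures for three pack sizes are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)                      # number of factor levels
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical sales for three pack sizes (the nominal independent variable).
sales_by_pack_size = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova(sales_by_pack_size)
```

A large F value relative to the F distribution with (`df_between`, `df_within`) degrees of freedom suggests that mean sales differ across the pack sizes.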
Experimental Designs
• Completely Randomized design in a one
way ANOVA (single Factor)
• Randomized Block Design (single blocking
factor)
• Latin Square Design (two blocking factors)
• Factorial design with two or more factors.
Correlation and Regression
• Correlation Analysis- to measure the
degree of association between two sets of
quantitative data e.g. how are sales of
product A correlated with sales of product
B etc.
• Regression Analysis- to explain the
variation in one variable based on the
variation in one or more variables.
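The degree of association between two quantitative series can be sketched with the Pearson correlation coefficient. This is a minimal illustration; the function name and the sales figures for products A and B are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


sales_a = [1, 2, 3, 4]   # hypothetical sales of product A
sales_b = [2, 4, 6, 8]   # hypothetical sales of product B
r = pearson_r(sales_a, sales_b)
```

Values near +1 or -1 indicate a strong linear association, and values near 0 indicate a weak one.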
Regression
• Basically two approaches:
• 1. Trial-and-error approach (stepwise regression),
used in exploratory research
• 2. A preconceived approach
• The output consists of the beta coefficients for all the
independent variables in the model. The output also
gives the result of a t-test for the significance of each
variable in the model, and the result of an F-test for the model
as a whole.
• The coefficient of determination R² is the proportion of the
total variance in y explained by all independent variables in the
regression equation.
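The meaning of R² can be sketched with a least-squares fit. For brevity this illustration uses a single independent variable rather than the several variables discussed above; the function name and the data are hypothetical.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope (regression coefficient)
    a = mean_y - b * mean_x       # intercept
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # R squared: share of the total variation in y explained by the model.
    return a, b, 1 - ss_resid / ss_total


x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable
a, b, r_squared = fit_simple_regression(x, y)
```

Here the fitted line is y = 2.2 + 0.6x and R² is 0.6, meaning 60% of the variation in y is explained by x.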
Problem
• A manufacturer and marketer of electric motors
would like to build a regression model consisting
of 5 or 6 independent variables, to predict sales.
Past data has been collected for 15 sales
territories, on sales and 6 independent variables.
Build a regression model and recommend
whether or not it should be used by the company.
Dependent variable
Y= Sales in Rs. Lakh in the territory
Independent Variable
X1= Market potential in the territory
X2= No. of dealers of the company in the
territory
X3= No. of sales people in the territory
X4= Index of Competitor activity on a 5 point
scale
(1= low, 5= high)
X5= No. of service people in the territory
X6= No. of existing customers in the territory
Factor Analysis
• For Data reduction
• There are two stages in Factor analysis
• Factor Extraction process
• Rotation of principal components
ANY QUESTIONS, PLEASE?
THANK YOU