Original Title: A Lesson 1 Introduction to Statistics & SPSS

Lesson 1

Introduction to Statistics

What is statistics?

Why is statistics needed?

Population and sample

Variable

Measurement

Data

Introduction to SPSS Windows

Basic steps in data analysis using SPSS

Statistical Analysis using SPSS Lesson 1

What is Statistics?

Statistics is the science of learning from data. It includes everything from planning the collection of data and the subsequent data management to end-of-the-line activities such as drawing conclusions from numerical facts and presenting results. Statistics combines mathematical theory with knowledge from different areas of science, advancing broader scientific understanding. Today, statistics has become an important tool in many disciplines, such as medicine, psychology, education, sociology, engineering and physics, to name a few.

A famous statistician once remarked, "Statistics is concerned with one of the most basic human needs: the need to know about the world and how it operates in the face of variation and uncertainty." We can say that the whole field of statistics revolves around this concept of variation.

Variation is a concept used to quantify the differences in a characteristic of interest between different members of a population. For example, the birth weights of full-term newborn babies are never all the same. Some will be heavier than others and some lighter, but the majority will weigh around 3.5 kg. In this example, the characteristic of interest, the weight of newborn babies, is a variable, and the quantification of the differences in birth weight is known as variation. A researcher may want to know why newborns differ in weight from one another. Are the differences due to differences in the mother's age, the mother's weight, habits, education, etc.? These are some of the questions that need to be answered. In this case, statistical methods can be used to design the study and to collect and analyze the observed data. Statistical principles can then be used to understand more about the differences in birth weights.
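To make "quantifying variation" concrete, here is a minimal Python sketch using the standard-library statistics module. The ten birth weights are invented for illustration only; they are not data from any study. The standard deviation and the range are two common ways of putting a single number on how much the weights differ from one another.

```python
import statistics

# Hypothetical birth weights (kg) of ten full-term newborns, invented for illustration.
birth_weights_kg = [3.1, 3.6, 3.4, 3.9, 2.9, 3.5, 3.7, 3.3, 3.6, 3.0]

mean_kg = statistics.mean(birth_weights_kg)   # the typical (central) value
sd_kg = statistics.stdev(birth_weights_kg)    # sample standard deviation: spread around the mean
range_kg = max(birth_weights_kg) - min(birth_weights_kg)  # simplest measure of spread

print(f"mean = {mean_kg:.2f} kg, SD = {sd_kg:.2f} kg, range = {range_kg:.2f} kg")
```

A larger standard deviation would mean the babies' weights are more spread out around the mean, that is, more variation.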

Why is Statistics Needed?

Data can be captured and stored in many ways. However, applying proper statistical procedures during planning and data collection is vital to ensure that the data collected reflect the population at large. Once the data are collected, we study them. It is important to remember that data are just raw facts, not knowledge. When the data are explored and the numbers are crunched, we obtain information. This division or area of statistics is known as descriptive statistics. Many procedures are available to help describe the data, and the deeper we explore the data, the more we understand about them. Care must be taken to understand the type of data being analyzed so that appropriate descriptive procedures can be used.

The information gained by performing descriptive analyses on a particular variable may shed some light on the characteristic of interest. Note that this is only a sample characteristic, also known as a statistic (singular, without the final s of statistics).

But the main purpose of a study is not just to describe the sample; it is to make inferences about the population at large. The term parameter refers to the characteristic of interest in the population. Again, information obtained from a single sample or at a single time point may or may not be a true reflection of the population. Some logical link must be made between the sample statistic and the population parameter. This division or area of statistics is known as inferential statistics, which is discussed in chapter 5. It is a wide area and involves the application of probability and statistical theory, especially the underlying statistical distributions.
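The distinction between a parameter and a statistic can be illustrated with a small simulation. The sketch below assumes a hypothetical population of 1,000 simulated birth weights, drawn from a normal distribution with mean 3.5 kg and standard deviation 0.5 kg (values chosen purely for illustration); in a real study the population mean would be unknown. Each simple random sample yields a slightly different statistic, which is why inferential methods are needed to link the statistic to the parameter.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: 1,000 simulated birth weights (kg); not real data.
population = [random.gauss(3.5, 0.5) for _ in range(1000)]
parameter = statistics.mean(population)  # population mean: the parameter

# Draw three simple random samples of size 30 and compute the statistic for each.
sample_means = [statistics.mean(random.sample(population, 30)) for _ in range(3)]

print(f"parameter (population mean): {parameter:.3f}")
for i, statistic in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {statistic:.3f}")
```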

Information that is tested and retested with different samples and at different time points becomes fact if the data consistently support it. Finally, facts become knowledge when they are used to successfully complete the decision process. The whole sequence is known as the data-driven decision-making process. The figure below shows that the level of statistical methods or procedures needed for a study depends on the desired level of improvement in decision making.

Figure: Data-driven decision-making process (levels: Data, Information, Facts, Knowledge; axes: level of statistical methods and level of improvement in decision making)

Population and Sample

Population: The entire set of things or objects in which we are interested at a particular time.

Examples: Workers at a factory, students in a college, in-patients at a hospital.

Sample: A subset of the population

Examples: A group of workers at the factory, a selection of students from the college.

Several sampling methods can be used to obtain a sample from the population. They can be classified into two broad categories: probability sampling and non-probability sampling. Probability sampling requires a sampling frame, that is, a list of all the items or objects in the population. There are several types of probability sampling; the most basic is simple random sampling. The standard analyses in SPSS assume that the data were collected by simple random sampling.
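Simple random sampling from a sampling frame can be sketched in a few lines of Python. This is an illustration of the idea, not how SPSS draws samples: the frame of worker IDs and the function name simple_random_sample are hypothetical. Python's random.sample draws without replacement, so every subset of n units is equally likely to be chosen.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from a sampling frame (hypothetical helper)."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)   # independent generator; a seed makes the draw reproducible
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical sampling frame: IDs of 200 workers at a factory.
frame = [f"W{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Without a complete frame, such as a list of every worker, this kind of probability sample cannot be drawn, which is why the frame is a prerequisite.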

Variable

A variable is a characteristic of things or objects that can take different values for different items observed. The opposite of a variable is a constant.

For example, the weight of newborn babies varies from one baby to another, so weight is a variable. The gender of the babies also differs from one baby to another, so gender is a variable too. The whole field of statistics revolves around the concept of a variable, and the hundreds of statistical tools available are concerned with describing variables and finding associations between them.

There are two types of variables:

Qualitative variable: a characteristic that cannot be measured or counted, only categorized, such as race, sex, colour, exam grade and blood group.

Quantitative variable: a characteristic that can be measured, such as height, weight, age and exam marks, or counted, such as the number of passes, the number of students and the number of accidents.

Measurement

Measurement is the assignment of numbers to represent a characteristic. It is useful to be clear about what is being measured and what that measurement indicates. For example, a clinical thermometer measures body temperature. But what does body temperature indicate? An elevated body temperature may be an indicator of a bacterial or viral infection.

The units of measurement are equally important for computation and inference. For example, consider an increase in body temperature. An increase of 3 °F may not be a cause for concern, but an increase of 3 °C may be critical.
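The arithmetic behind this example is worth spelling out. A temperature change converts between scales using only the factor 9/5 (or 5/9); the +32 offset applies to readings, not to differences. The helper function names below are hypothetical, chosen for this minimal sketch. They show that a rise of 3 °C equals a rise of 5.4 °F, whereas a rise of 3 °F is only about 1.67 °C.

```python
def delta_celsius_to_fahrenheit(delta_c):
    """Convert a temperature change: only the 9/5 scale factor applies, not the +32 offset."""
    return delta_c * 9 / 5

def delta_fahrenheit_to_celsius(delta_f):
    """Convert a temperature change from Fahrenheit degrees to Celsius degrees."""
    return delta_f * 5 / 9

def celsius_to_fahrenheit(temp_c):
    """Convert a temperature reading: both the scale factor and the offset apply."""
    return temp_c * 9 / 5 + 32

print(delta_celsius_to_fahrenheit(3))             # 5.4
print(round(delta_fahrenheit_to_celsius(3), 2))   # 1.67
print(round(celsius_to_fahrenheit(37), 1))        # 98.6
```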

Concepts and Indicators

A concept is what we hope to capture, and indicators are what we use to capture it. Suppose a doctor wants to establish the health status of a group of workers. Health status is a concept. Since health status varies from person to person, it can be considered a variable too. However, health status is not directly measurable; it is an unobserved measure, often called a latent variable. How, then, do we measure the unobserved? First, we need to identify reliable indicators of health status. In healthcare, variables such as weight, blood pressure, cholesterol and blood sugar levels are often used as indicators of health status. These are measurable variables, each with its own unit of measurement. If a person's blood pressure is consistently high, that person is said to be in poor health. Blood pressure is also known to be strongly associated with cholesterol level, blood sugar level and weight. A person whose values are in the normal range for all of these measures is said to be healthy.
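The rule in the last sentence, that a person is considered healthy only if every indicator lies in its normal range, can be written as a short function. This is a sketch under loud assumptions: the indicator names and numeric ranges below are placeholders chosen only so the example runs, not clinical reference values, and weight is left out because its normal range depends on height.

```python
# Placeholder "normal" ranges (low, high) per indicator. These numbers are
# illustrative assumptions only, NOT clinical reference values.
NORMAL_RANGES = {
    "systolic_bp_mmHg": (90, 120),
    "cholesterol_mmol_L": (0.0, 5.2),
    "fasting_glucose_mmol_L": (3.9, 5.5),
}

def out_of_range(indicators, ranges=NORMAL_RANGES):
    """Return the names of indicators whose values fall outside their range."""
    return [name for name, (low, high) in ranges.items()
            if not (low <= indicators[name] <= high)]

def is_healthy(indicators, ranges=NORMAL_RANGES):
    """The text's rule: healthy only if every indicator is within its normal range."""
    return not out_of_range(indicators, ranges)

# Hypothetical workers, invented for illustration.
worker_a = {"systolic_bp_mmHg": 115, "cholesterol_mmol_L": 4.8, "fasting_glucose_mmol_L": 5.0}
worker_b = {"systolic_bp_mmHg": 145, "cholesterol_mmol_L": 6.1, "fasting_glucose_mmol_L": 5.2}
print(is_healthy(worker_a), out_of_range(worker_a))  # True []
print(is_healthy(worker_b), out_of_range(worker_b))  # False ['systolic_bp_mmHg', 'cholesterol_mmol_L']
```

The latent concept (health status) is never observed directly; the function only combines the observed indicators according to a stated rule.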

Data

Data can be considered the raw material of statistics. The information gathered, the facts tested and, ultimately, the knowledge gained depend heavily on the quality of the data collected. Therefore, considerable attention must be paid to the data collection stage.

Data can be obtained from either primary or secondary sources. Data compiled from sources such as records, journals and archives are called secondary data, while data collected by the researcher, primarily through designed experiments or surveys, are called primary data.

Types of data

1. Qualitative data can be classified further into nominal data and ordinal data.

Nominal data are categorical characteristics that you can name.

Examples: Gender: Male or female based on physical traits.

Blood group: A, B, AB or O based on allele types.

There is no natural order here: group A is not better than group B. The groups are just names given based on particular characteristics.

Ordinal data are categorical characteristics that you can name and rank as well.

Examples: Socio-economic status: Low, middle or high.

Exam grades: A, B, C, D or E based on level of achievement.

Here there is a natural order: grade A is better than grade B, and so on.

2. Quantitative data can be classified into discrete data and continuous data.

Discrete data are numerical characteristics that are countable (whole numbers).

Examples: Number of males and number of females

Number of patients waiting for surgery

Number of students sitting for an exam

Continuous data are numerical characteristics that are measurable.

Examples: Marks obtained by students

Body mass index (BMI) of patients

Time taken by athletes to complete a road race

Because continuous data are measured rather than counted, they can take decimal (fractional) values.

It is very important to understand the different types of data so that they can be described and presented appropriately. For example, it does not make sense to compute an average for a variable such as gender in a group of males and females; in this case the information is best stated in the form of percentages. Variables like weight and height are best described using averages and percentiles. For visual presentation, bar charts should be used for qualitative data and histograms for quantitative data. The underlying distributions also differ for different data types, and in making inferences, the choice of statistical test depends on the type of data.
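The following Python sketch shows how the data type drives the choice of summary: percentages for a nominal variable, a median category for an ordinal variable (possible because the categories can be ranked), and the mean and quartiles for a continuous variable. All records are invented for illustration, and taking the median of the grade ranks is one reasonable convention, not the only one.

```python
from collections import Counter
import statistics

# Hypothetical records, invented for illustration.
sex = ["M", "F", "F", "M", "F", "F", "M", "F"]                  # nominal
grade = ["B", "A", "C", "B", "B", "A", "D", "C"]                # ordinal
weight_kg = [61.2, 55.4, 58.0, 72.5, 49.9, 63.1, 80.3, 57.6]    # continuous

# Nominal: counts turned into percentages (an "average sex" is meaningless).
counts = Counter(sex)
percentages = {k: 100 * v / len(sex) for k, v in counts.items()}

# Ordinal: categories can be ranked, so a median category is meaningful.
GRADE_ORDER = ["A", "B", "C", "D", "E"]  # best to worst
ranks = [GRADE_ORDER.index(g) for g in grade]
median_grade = GRADE_ORDER[statistics.median_low(ranks)]  # median_low returns an actual rank

# Continuous: mean and percentiles (here the quartiles).
mean_weight = statistics.mean(weight_kg)
q1, q2, q3 = statistics.quantiles(weight_kg, n=4)

print(percentages, median_grade, round(mean_weight, 2), (q1, q2, q3))
```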

Introduction to SPSS Windows

Statistical Package for the Social Sciences (SPSS) for Windows provides a powerful statistical analysis and data management system in a graphical environment, using descriptive menus and simple dialog boxes to do most of the work for you. Most tasks can be accomplished simply by pointing and clicking the mouse.

In addition to the simple point-and-click interface for statistical analysis, SPSS for Windows provides:

Data Editor. A versatile spreadsheet-like system for defining, entering, editing and displaying data.

Viewer. The Viewer makes it easy to browse your results, selectively show and hide output, change the display order of results, and move presentation-quality tables and charts between SPSS and other applications.

Multidimensional pivot tables. Results come alive with multidimensional pivot tables. Explore your tables by rearranging rows, columns, and layers. Uncover important findings that can get lost in standard reports. Compare groups easily by splitting your table so that only one group is displayed at a time.

High-resolution graphics. High-resolution, full-color pie charts, bar charts, histograms, scatterplots, 3-D graphics, and more are included as standard features in SPSS.

Database access. Retrieve information from databases by using the Database Wizard instead of complicated SQL queries.

Data transformations. Transformation features help get your data ready for analysis. You can easily subset data, combine categories, add, aggregate, merge, split, and transpose files, and more.

Electronic distribution. Send e-mail reports to others with the click of a button, or export tables and charts in HTML format for Internet and Intranet distribution.

Online Help. Detailed tutorials provide a comprehensive overview; context-sensitive Help topics in dialog boxes guide you through specific tasks; pop-up definitions in pivot table results explain statistical terms; the Statistics Coach helps you find the procedures that you need; and Case Studies provide hands-on examples of how to use statistical procedures and interpret the results.

Command language. Although most tasks can be accomplished with simple point-and-click gestures, SPSS also provides a powerful command language that allows you to save and automate many common tasks. The command language also provides some functionality not found in the menus and dialog boxes.

SPSS Windows

The main windows in SPSS for Windows are:

SPSS Data Editor. A versatile spreadsheet-like system for defining, entering, editing and displaying data.

SPSS Viewer. The Viewer, also called the Output Navigator, makes it easy to browse your results, selectively show and hide output, change the display order of results, and move presentation-quality tables and charts between SPSS and other applications.

SPSS Chart Editor. Helps you edit charts. You can change the pattern, color, style and labels of the graphs. You can also modify, rotate or swap the axes.

SPSS Syntax Editor. Used to write, view, modify and save command syntax.

Help. A comprehensive overview of SPSS basics is available in the online tutorial under the Help menu. The meanings of statistical terms can also be obtained by double-clicking on the terms themselves.

Basic steps in Statistical Data Analysis Using SPSS

The four basic steps of data analysis in SPSS are summarized below.

Step 1: Bring your data into SPSS

Get your data into the SPSS Data Editor. This can be done by entering the data directly in the Data Editor, opening a previously saved SPSS file, or reading a spreadsheet or text data file.

Step 2: Select a procedure from the menu

The choice of procedure depends on the objective of the study: use a procedure from the Graphs menu to create a chart, or from the Analyze menu to perform statistical analysis.

Step 3: Select variable(s) for the analysis

All the variables in the data file are displayed in a dialog box. Highlight the variable(s) and click them into the respective boxes in the dialog. Make sure the procedure is appropriate for the variable.

Step 4: Run and examine the results

Run the procedure by clicking OK. The results are displayed in the Output Viewer; they can be a chart, descriptive statistics or inferential statistics. Based on the output, draw conclusions accordingly.

The four steps in data analysis
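For readers who also work outside SPSS, the four steps map naturally onto a small script. The sketch below is a plain-Python analogy, not SPSS itself and not SPSS command syntax: the inline CSV data, the PROCEDURES dictionary and the variable name weight_kg are all hypothetical, standing in for a data file, a menu of procedures and a variable chosen in a dialog box.

```python
import csv
import io
import statistics

# Step 1 (bring data in): a small hypothetical CSV stands in for a data file.
RAW = """id,sex,weight_kg
1,M,61.2
2,F,55.4
3,F,58.0
4,M,72.5
"""
rows = list(csv.DictReader(io.StringIO(RAW)))

# Step 2 (select a procedure): hypothetical procedure names mapped to functions.
PROCEDURES = {
    "descriptives": lambda xs: {"n": len(xs),
                                "mean": statistics.mean(xs),
                                "sd": statistics.stdev(xs)},
}
procedure = PROCEDURES["descriptives"]

# Step 3 (select variables): float() fails on a qualitative column such as "sex",
# a crude check that the procedure suits the variable.
variable = "weight_kg"
values = [float(r[variable]) for r in rows]

# Step 4 (run and examine): print the output and draw conclusions from it.
result = procedure(values)
print(result)
```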

