
5123 Business Statistics

Data Collection
Case Study Diamonds
Diamonds have been treasured as gemstones since their use as
religious icons in ancient India. Their usage in engraving tools
also dates to early human history. The popularity of diamonds
has risen since the 19th century because of increased supply,
improved cutting and polishing techniques, growth in the world
economy, and innovative and successful advertising campaigns.

The price of a jewellery diamond depends on the four Cs:


Carat weight
Clarity
Cut
Colour

How should a potential diamond buyer collect information about
diamonds in order to make the correct decisions when making a
purchase?
Case Study Diamonds
The Rapaport Report is the jewellery industry standard
for the pricing of diamonds. The report is published weekly
and given to jewellers and diamond merchants to set prices
for consumers. The report is issued in the form of a table
and prices diamonds based on the four Cs. The Rapaport List
is copyrighted and available only to subscribers to its
magazine.
Diamond Colours are assigned letters from D, for colourless
stones, through to K, for increasingly more yellow stones.
Diamond Cut is graded as Ideal, Good, Very Good or
Excellent. The grade depends on the depth percentage of the
diamond, which is the ratio of the diamond's depth to its
diameter.
Case Study Diamonds
Diamond Clarity is assessed on the following scale:

IF: Internally Flawless - diamonds have no inclusions visible under 10x magnification,
only small blemishes on the diamond surface.

SI1, SI2: Slightly Included - diamonds have noticeable inclusions that are easy to very easy for
a trained grader to see when viewed under 10x magnification. The SI category is divided
into two grades; SI1 denotes a higher clarity grade than SI2.

VS1, VS2: Very Slightly Included - diamonds have minor inclusions that are difficult to somewhat
easy for a trained grader to see when viewed under 10x magnification. The VS category
is divided into two grades; VS1 denotes a higher clarity grade than VS2. Typically the
inclusions in VS diamonds are invisible without magnification.

VVS1, VVS2: Very Very Slightly Included - diamonds have minute inclusions that are difficult for a
skilled grader to see under 10x magnification. The VVS category is divided into two
grades; VVS1 denotes a higher clarity grade than VVS2.

By inclusions the diamond trade means imperfections.


Case Study Diamonds
The buyer wishes to determine the average carat weight,
the most common colour, the overall clarity and the
quality of the cut of diamonds displayed by a major
jewellery retailer.

What is the population of interest?

Due to the large number of diamonds supplied by the
retailer, the buyer decides to collect a sample of 2,690
diamonds.

What is the sample of interest?


Case Study Diamonds
The data are described by the following variables:
Carat weight
Cut
Clarity
Colour
Price

Which type of data is the buyer using?

Key messages: this week you will learn...
What is Statistics
What we need for Statistical Analysis
The nature of Statistical Data
The distinction between Population and Sample

The distinction between Parameter and Statistic

The 2 types of Statistics

The 2 approaches to collecting data

The distinction between Sample Surveys and Censuses

Sources of errors in quantitative data collection

What we mean by Sample Design

The methods of Random Sampling


What is Statistics?
In various aspects of life, we come across many
questions whose answers are not immediately and
accurately available.

For example, we may ask ourselves many questions:


Shall we have enough rainfall this summer?
What is the level of unemployment in Australia?
Are our industries able to compete on the world market?

Statistics is that branch of knowledge which provides us
with tools/techniques to answer, at least to some extent,
the above questions and many more such questions.
What we need for Statistical Analysis?
A minimum level of knowledge (i.e. understanding)

Information/data already available. If information/data are not
available, then the first step is to collect them.

Once data are collected, there is the need to organise and present
them in a manner that facilitates their interpretation.

We then have to analyse the data to uncover, with precision,
patterns that exist in the data set and previously unknown
relationships.

Finally, conclusions and recommendations are made so that, in
turn, the relevant authorities may take appropriate policy decisions.
Diagrammatic Definition of Statistics

[Diagram with the labels: Collection of Data; Organisation & Presentation of Data; Analysis of Data; Conclusion & Recommendation; People; Nature]
Nature of Statistical Data
Data are categorised in two ways: primary data and
secondary data

Primary data are data which have been collected for a
specific purpose and are being used for that purpose.
E.g. data collected by means of a sample survey

Secondary data are data which have been collected for a
specific purpose but which are being used for various other
studies.
E.g. data collected for administrative reasons

Beware of secondary data!




Population and Sample
Population: the entire collection of objects/subjects about which we
are interested in collecting information/data
E.g. population of students, population of cattle, population of
buildings etc.
Sample: part of the population, selected randomly if we wish to
generalise
E.g. sample of students, sample of cattle, sample of buildings etc.

[Diagram: a sample drawn from the population by random selection]
Lecture Example 1: Diamonds
The buyer wishes to determine the average carat weight, the
most common colour, the overall clarity and the quality of
the cut of diamonds displayed by a major jewellery retailer.

What is the population of interest?
All diamonds used in jewellery by the retailer

Due to the large number of diamonds supplied by the retailer,
the buyer decides to collect a sample of 2,690 diamonds.

What is the sample of interest?
The 2,690 diamonds sampled
Lecture Example 1: Diamonds
The data are described by the following variables:
Carat weight
Cut
Clarity
Colour
Price

Which type of data is the buyer using?
These are primary data, as the buyer is collecting them for his own
information (he is not getting them from someone else).
Lecture Exercise 1: Your Turn (Stop and Consider)
Fifty per cent of teens sext by mobile phone
(August 2, 2015, by Cosima Marriner, Sun-Herald senior writer)

According to a new survey, teenage girls are using their phones
to send sexual images of themselves because they think it's fun
and sexy, rather than because they feel pressured to.
One in two teenage boys and girls have used a mobile phone to
send a sexually explicit image of themselves, according to the
biggest sexting survey undertaken in Australia.
Teenage girls are using their mobiles to send sexual images of
themselves because they think it's fun and sexy, rather than
because they feel pressured by boys, the new research from the
Australian Institute of Criminology found.

Lecture Exercise 1 (ctd)
Most of the 1200 teens
surveyed who had sexted
said they sent the image to
a person with whom they
had a relationship.
Forty per cent had sent a
sext to more than one
person in the past year.
Only six per cent of sexters
reported sending an image
on to a third party for whom
the picture wasn't originally
intended.

Lecture Exercise 1 (ctd)
The criminology researchers who conducted the survey said the results
underscored the mismatch between sexting laws, which classify the
practice as child abuse or child pornography and do not distinguish
between consensual and non-consensual sexting, and the reality of teen
sexting.
"Lots of kids are doing it, but not very often, and not with many people,"
Murray Lee, associate professor in criminology at Sydney University, said.
"For the most part sexting is an exploration of their sexuality. Sometimes
it can move into the field of bullying but that's very rare."
But school cybersafety lecturers said boys were regularly badgering girls
to send them sexual images. "Not a day goes by that I don't deal with
girls around Australia under pressure to send these photos," said Susan
McLean, a leading cyber safety expert, and former police officer.

What is the population of interest?

What is the sample of interest?

Which type of data is being used?

Parameter and Statistic
Parameter: a numerical characteristic of the population
E.g. average weight of all students of the University of Canberra
Statistic: a numerical characteristic of the sample corresponding to
the parameter that needs estimating
E.g. average weight of a sample of, say, 200 students

[Diagram: a sample is drawn from the population by random selection; the sample statistic is used to make an inference about the population parameter]

Types of Statistics
2 types of Statistics: descriptive statistics and inferential
statistics

Descriptive statistics: Methods of collecting, presenting and
summarising data. Using data gathered about a group to
draw conclusions about that same group.

Inferential statistics: Using data gathered from a sample to
draw conclusions (that is, making inferences) about a
population.

Fundamental problem of Statistics: How reliable are figures
calculated from samples as estimates of population values?


How do we collect data?
2 broad approaches: qualitative approach and quantitative
approach

In this unit we shall deal only with the quantitative approach.

2 major types of quantitative approach:

Census, i.e. collecting data in respect of every member of the
population of interest (e.g. the Australian Census, first held in 1911
and conducted every 5 years since 1961, now by the ABS. Census 2016
was on 9 August 2016.)
Sample survey, i.e. collecting data in respect of only some of the
target population but with the purpose of learning about the
whole of that population (e.g. the Morgan Poll by Roy Morgan
Research, which uses sample surveys to ask public opinion on a particular
topic at a point in time, such as: What party will you vote for at the next
election? Who is your preferred prime minister?)
Sample Surveys vs. Censuses
The sample survey has a number of advantages over the
census.
Sample surveys require fewer resources and are far less costly
than censuses.
Sample surveys are less time-consuming and hence results are
more timely.
In a sample survey, because only a small portion of the
population is involved, that portion can be studied intensively.
In certain contexts, data collection may involve destruction of
the individual from whom the data are collected, in which case
a census is out of the question. Can you think of a situation where
this can happen in practice?

Sources of errors in Censuses and
Sample Surveys
Sampling Errors
Suppose we want to find out the average weight of all students of
the University of Canberra.
We may do this by selecting a sample of say 200 students and
finding the average weight of our sample of students. What we get
is an estimate.
We may expect this estimate to be close to the true average weight
of all students. Can we expect this estimate to be exactly equal?
Differences between the estimates based on samples (statistics) and
the true population values (parameters) are called sampling
errors.
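
A small simulation can make the idea of sampling error concrete. The sketch below is illustrative only: it assumes a hypothetical population of 10,000 student weights generated at random (invented numbers, not real University of Canberra data), draws one sample of 200, and compares the sample mean (statistic) with the population mean (parameter). The function name `sampling_error` is an assumption made for this example.

```python
import random
import statistics


def sampling_error(population, sample_size, seed=None):
    """Return (parameter, statistic, error) for the mean of one random sample."""
    rng = random.Random(seed)
    # Draw a simple random sample without replacement.
    sample = rng.sample(population, sample_size)
    parameter = statistics.mean(population)  # true population mean
    statistic = statistics.mean(sample)      # estimate from the sample
    return parameter, statistic, statistic - parameter


# Hypothetical population: 10,000 invented weights in kg.
rng = random.Random(1)
population = [rng.gauss(70, 12) for _ in range(10_000)]

parameter, statistic, error = sampling_error(population, 200, seed=2)
print(f"parameter={parameter:.2f}  statistic={statistic:.2f}  sampling error={error:.2f}")

# A census "samples" every member, so the sampling error vanishes.
_, _, census_error = sampling_error(population, len(population), seed=3)
print(f"census sampling error={census_error:.2f}")
```

Re-running with different seeds gives a different estimate each time, which is exactly why the estimate cannot be expected to equal the true value.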

Censuses are not subject to sampling errors because they involve
complete enumeration.

Sources of errors in Censuses and
Sample Surveys (ctd)
Non-Sampling Errors
Both the census and the sample survey are subject to other errors
called non-sampling errors.

Important types of non-sampling errors are:
Errors of omission: omission occurs when individuals who belong
to the target population are forgotten or somehow not reached.
Non-response: non-response occurs when people contacted are
not at home or refuse to participate.
Interviewer bias: interviewer bias occurs when the responses
obtained are influenced by the interviewers.
Coder bias: coder bias may occur when answers to questions
which have been recorded verbatim by the interviewer are coded
in the office for the purposes of analysis.


Sample Design
Requirements of a good sample
Randomness
Representativeness

Definition of Random Selection
Random (or probability) sampling is a method of sampling where
every member of the target population has a known, non-zero
probability of selection.

Representativeness
The sample should capture the variation in the target population
in respect of the characteristic or characteristics under
study.
Sample Design (ctd)
Sampling Frames
A list from which we select a sample is called a sampling frame.

Methods of Random Sampling
Simple Random Sampling (SRS)
Systematic Sampling
Stratified Random Sampling
Cluster Sampling

Sample Design: Simple Random Sampling (SRS)

Definition
SRS is a method of random selection where every subset of the
chosen sample size has the same chance of selection.

Application of SRS
We need a list of all members of the target population (sampling
frame).
This list is numbered serially (e.g. if the population consists of
1,000 individuals, we number them 1, 2, 3 etc., up to 1,000).
Suppose we need a sample of size 100. We select 100 random
numbers (lying between 1 and 1,000) using Excel.

Using Excel: Data > Data Analysis > Sampling
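
The slides use Excel for this step. As an alternative sketch (not the unit's prescribed method), the same selection can be done in Python with the standard library. The function name `simple_random_sample` and the seed are assumptions for illustration; `random.sample` draws without replacement, so no serial number is selected twice.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Select serial numbers 1..population_size by SRS, without replacement."""
    rng = random.Random(seed)
    serial_numbers = range(1, population_size + 1)  # the numbered sampling frame
    return sorted(rng.sample(serial_numbers, sample_size))


# Sample of size 100 from a frame numbered 1 to 1,000.
selected = simple_random_sample(1000, 100, seed=42)
print(selected[:10])  # first few selected serial numbers
```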


Sample Design: Systematic Sampling

Given a sampling frame, it is possible to select a sample as
follows:
Suppose we have a list of N people and we wish to select a sample
of n from it.
Regard the N individuals as arranged in a circle.
Let k be the nearest integer to N/n.
Using Excel, we first select a random number between 1 and N.
This identifies our first selection.
We then select every kth individual after that first selection going
round the circle until n individuals have been selected.

Example
Suppose we have a population of 1,000 and we need a sample of 75.
Then k=13
Sample Design: Systematic Sampling (ctd)

Example
Suppose we have a population of 1,000 and we need a sample of
75.
Then k=13

We select a random number (using Excel) between 1 and 1,000.
Suppose this number is 36.
Then the first selection into our sample is the 36th individual from
the start of our (original) list.
The subsequent selections are 49, 62, 75, 88, ..., 998.
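
The circular procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `circular_systematic_sample` is hypothetical, the random start is passed in explicitly so the worked example (start 36) can be reproduced, and Python's `round` is used for "nearest integer" (it rounds exact halves to the nearest even number, which does not arise here since 1000/75 is about 13.33).

```python
def circular_systematic_sample(N, n, start):
    """Select n serial numbers from 1..N, taking every k-th individual
    after `start` and wrapping round the list as if it were a circle."""
    k = round(N / n)  # sampling interval: nearest integer to N/n
    # Work with 0-based positions for the wrap-around, then shift back to 1..N.
    return [((start - 1 + i * k) % N) + 1 for i in range(n)]


sample = circular_systematic_sample(N=1000, n=75, start=36)
print(sample[:5], "...", sample[-1])
```

With a start of 36 the 75th selection is 36 + 74 x 13 = 998, so no wrap-around is needed in this example; a later start (say 990) would continue from the top of the list.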

Sample Design: Stratified Random Sampling

Stratified random sampling consists of dividing the target population
into subgroups called strata and taking a separate sample within each
stratum. The sample within each stratum is usually drawn by SRS.

Stratification, i.e. the division of the target population into groups, must
be based on a criterion or criteria relevant to the survey topic.
E.g. if we are stratifying by education level, we need to know every individual's
education level.

The data prerequisites for implementing stratified random
sampling are, however, more demanding than for SRS.

We need not only a list of every member of the target population
(sampling frame) but also, in respect of each such member,
information on the stratification criterion.
Sample Design: Stratified Random Sampling (ctd)

How do we allocate the total sample to the various strata, i.e.
how many to sample from each stratum?
The simplest option is to share the total sample among the various
strata in proportion with their sizes.
E.g., a stratum which has 40% of the population gets 40% of the
sample, another which has 25% of the population gets 25% of the
sample and so on.
This is known as stratified random sampling with proportionate
allocation.
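
Proportionate allocation is simple arithmetic, sketched below. The stratum names and sizes are invented for illustration, and the function name `proportionate_allocation` is an assumption. When the proportional shares are not whole numbers, the sketch rounds down and hands the leftover units to the strata with the largest fractional parts; that tie-breaking rule is one reasonable choice, not something the slides prescribe.

```python
import math


def proportionate_allocation(stratum_sizes, n):
    """Share a total sample of n among strata in proportion to their sizes."""
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    allocation = {name: math.floor(share) for name, share in exact.items()}
    # Distribute any units lost to rounding down (largest fractional part first).
    leftover = n - sum(allocation.values())
    by_fraction = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_fraction[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical strata holding 40%, 25% and 35% of a population of 1,000.
strata = {"Stratum A": 400, "Stratum B": 250, "Stratum C": 350}
print(proportionate_allocation(strata, 100))
```

Within each stratum, the allocated number of individuals would then be drawn by SRS.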

Sample Design: Cluster Sampling

Sometimes populations occur or can be conveniently
divided into groups or clusters.
E.g. population of school children located in schools

This fact provides an alternative method of sampling,
which consists of drawing up a list of clusters that together
comprise the whole population and then selecting a sample
of clusters. The latter can be done by SRS.
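
A minimal sketch of the idea follows, using the school example. The school names and pupil identifiers are invented, and the function name `cluster_sample` is hypothetical. The sketch assumes that every member of each selected cluster is included in the sample; the slides only state that a sample of clusters is selected, so treat that inclusion rule as an assumption.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select n_clusters clusters by SRS and return them with all their members."""
    rng = random.Random(seed)
    cluster_list = sorted(clusters)                 # the list (frame) of clusters
    chosen = rng.sample(cluster_list, n_clusters)   # SRS of clusters
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members


# Hypothetical population of school children grouped by school.
schools = {
    "School A": ["A1", "A2", "A3"],
    "School B": ["B1", "B2"],
    "School C": ["C1", "C2", "C3", "C4"],
    "School D": ["D1", "D2", "D3"],
}
chosen, children = cluster_sample(schools, 2, seed=7)
print(chosen, children)
```

Note that only a list of clusters is needed to draw the sample, not a list of every individual in the population.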



Further Reading

Basic Business Statistics 4, by Berenson et al.


Chapters 1 & 7

Please note:
The book is a reference only; it's the lecture content
which dictates what you read in the book.

Next Week
Presenting data: Tables and Graphs

