
Business Statistics


[Figure: Business Process diagram: inputs pass through Value Addition to produce an Output]

Data-Driven Decisions


Google is a company in which fact-based decision-making is part of the DNA
and where Googlers (as Google calls its employees) speak the
language of data as part of their culture. At Google, the aim is for all
decisions to be based on data, analytics and scientific experimentation.

Mission

Organize the world's information and make it universally accessible and useful

Fact-based Decision-Making at Google
What Data to Use?
HR case: Do managers actually matter?
Existing data: performance reviews (top-down review of managers) and employee surveys (bottom-up review of managers).

Review of Managers
[Figure: chart of manager review scores]
Analytics
Top-quartile vs. bottom-quartile managers, compared on team productivity, employee happiness and employee turnover
[Figure: bar chart comparing the two groups]
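The top-versus-bottom-quartile comparison above can be sketched in Python. The scores below are hypothetical placeholders, not Google's data, and the sketch assumes each manager has one review score and one team outcome (for example, an employee-happiness score); quartile_comparison is an illustrative helper name.

```python
import statistics

# Hypothetical data: (manager review score, team outcome score) pairs.
# These numbers are illustrative placeholders, not real Google data.
managers = [
    (4.8, 4.5), (4.6, 4.4), (4.5, 4.1), (4.2, 4.0),
    (3.9, 3.8), (3.7, 3.6), (3.5, 3.7), (3.3, 3.4),
    (3.0, 3.2), (2.8, 3.0), (2.5, 2.9), (2.1, 2.6),
]

def quartile_comparison(pairs):
    """Return the mean outcome for managers in the top and bottom review-score quartiles."""
    scores = [score for score, _ in pairs]
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(scores, n=4)
    top = [outcome for score, outcome in pairs if score >= q3]
    bottom = [outcome for score, outcome in pairs if score <= q1]
    return statistics.mean(top), statistics.mean(bottom)

top_mean, bottom_mean = quartile_comparison(managers)
print(f"top quartile: {top_mean:.2f}, bottom quartile: {bottom_mean:.2f}")
```

With real data, the same split would be repeated for each outcome (team productivity, employee happiness, employee turnover).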

New Data Collection
Great Manager Award nominations
Interviews with managers

Insights
Top 8 behaviors of a high-scoring manager:
Is a good coach
Empowers the team and does not micromanage
Expresses interest in / concern for team members' success and personal well-being
Is productive and results-oriented
Is a good communicator: listens and shares information
Helps with career development
Has a clear vision / strategy for the team
Has important technical skills that help him / her advise the team

Insights
In addition to the eight behaviors identified for a good manager,
Google also narrowed down the top 3 causes of managers struggling
in their role:
Has a tough transition (e.g., suddenly promoted, or hired from outside with little
training)
Lacks a consistent philosophy / approach to performance management and
career development
Spends too little time on managing and communicating

Using the Insights
Google started to measure managers against these behaviors. For that purpose,
it introduced a new twice-yearly feedback survey.
Google decided to continue with the Great Manager Award.
Google revised its management training.

Good Decision-Making
Good Data and Facts
1. Defining the objectives and information needs: Do managers matter?
What makes a good manager within Google?
2. Collecting the right data: using existing data from performance reviews
and employee surveys, and creating new data sets from the award
nominations and manager interviews.
3. Analyzing the data and turning it into insights: simple plots of the
results, regression analysis and text analysis.
4. Presenting the information: new communications to the managers.
5. Making evidence-based decisions: revising the training, measuring
performance in line with the findings, and introducing new feedback
mechanisms.

5 Big Ways Companies Could Use Big Data to Grow Business

Understanding customers

Optimize processes

Create opportunities

Customer Relationship Management (CRM)

Improve security

Introduction to Statistical Science
Statistics is the science that relates data to specific questions of interest. This includes:
devising methods to gather data relevant to the question
methods to summarize and display the data to shed light on the question
methods that enable us to draw answers to the question that are supported by the data

Data
Data almost always contain uncertainty.
This uncertainty may arise from the selection of the items to be measured.
It may also arise from variability in the measurement process.

Statistical Inferences
Methods and tools for making decisions despite the uncertainty in the data.
They are the basis for increasing knowledge about the world.
They are the basis for all rational scientific inquiry.

Showing a Causal Relationship from Data
[Figure: diagram relating variables X and Z]
Our first goal is to determine which of the possible reasons for the association
holds. If we conclude that it is due to a causal effect, then our next goal is to
determine the size of the effect. If we conclude that the association is due to a
causal effect confounded with the effect of a lurking variable, then our next
goal becomes determining the sizes of both effects.
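A small simulation can illustrate how a lurking variable creates an association and how, when it has been measured, both effect sizes can be estimated. The model is a hypothetical assumption: Z influences both X and Y, and the true effects on Y are 2 for X and 3 for Z. The slope from regressing Y on X alone mixes the two effects; a least-squares fit on X and Z together, solved here through the two normal equations, recovers estimates close to both. The helper names are illustrative.

```python
import random
import statistics

def simulate(n=5000, seed=1):
    """Simulate a lurking variable Z that drives both X and Y (hypothetical model)."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                       # lurking variable
        x = z + rng.gauss(0, 1)                   # X is partly driven by Z
        y = 2.0 * x + 3.0 * z + rng.gauss(0, 1)   # true effects: 2 for X, 3 for Z
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs

def centered_sum(a, b):
    """Sum of products of deviations from the respective means."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

def fit_two_predictors(xs, zs, ys):
    """Least-squares coefficients of Y on X and Z via the 2x2 normal equations."""
    sxx, szz, sxz = centered_sum(xs, xs), centered_sum(zs, zs), centered_sum(xs, zs)
    sxy, szy = centered_sum(xs, ys), centered_sum(zs, ys)
    det = sxx * szz - sxz ** 2
    b_x = (szz * sxy - sxz * szy) / det
    b_z = (sxx * szy - sxz * sxy) / det
    return b_x, b_z

xs, ys, zs = simulate()
naive_slope = centered_sum(xs, ys) / centered_sum(xs, xs)  # ignores Z entirely
b_x, b_z = fit_two_predictors(xs, zs, ys)
print(f"naive slope of Y on X: {naive_slope:.2f}")
print(f"adjusted effects: X {b_x:.2f}, Z {b_z:.2f}")
```

This adjustment only works when the lurking variable has actually been measured; an unmeasured lurking variable cannot be separated out this way.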

Turning data into actionable intelligence means developing the
capability to look forward, inform decision-making and optimize it.
Data and reporting allow the business to view past and current results. Adding advanced analytics
applies foresight to future decision-making, returning more value to the business.
[Figure: Data → Information → Insight → Business Value, supported by data harmonization & standardization, visualization tools, reports, data analysis tools, appropriate information delivery and application of business drivers, leading to empowered decision-making and constant optimization. BW/Operational Reporting looks back (What happened? Understand, analyze, present, slice & dice; Key Performance Indicators, KPIs). Business Intelligence / Business Analytics looks forward (What will happen? Discover & simulate, predict, optimize; Key Performance Predictors, KPPs).]

Business Goals
Your competitive edge is a data-driven culture.

A Key Performance Indicator (KPI) is a measurable value that demonstrates how effectively a company is achieving key business
objectives.
Organizations use KPIs to evaluate their success at reaching targets.
Business Metrics
A Business Metric is a quantifiable measure that is used to track and assess the status of a
specific business process. Every area of business has specific metrics that should be
monitored: marketers track campaign and program statistics, sales teams monitor new
opportunities and leads, and executives look at big-picture financial metrics.

Sales KPIs
Sales Growth: Analyze the pace at which your sales revenue is increasing or decreasing.
Sales Opportunities: Organize prospects based on opportunity value and probability.
Product Performance: Rank products based on revenue performance.
Average Profit Margin
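Several of the sales KPIs above reduce to simple arithmetic. The product figures are hypothetical, and the formulas are common definitions assumed here rather than taken from the source: sales growth as (current - previous) / previous, product performance as a ranking by revenue, and average profit margin as the unweighted mean of profit / revenue across products. Product and the function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    revenue: float
    profit: float

def sales_growth(previous: float, current: float) -> float:
    """Period-over-period sales growth as a fraction (0.10 means +10%)."""
    return (current - previous) / previous

def rank_by_revenue(products):
    """Products sorted from highest to lowest revenue."""
    return sorted(products, key=lambda p: p.revenue, reverse=True)

def average_profit_margin(products):
    """Unweighted mean of each product's profit margin (profit / revenue)."""
    margins = [p.profit / p.revenue for p in products]
    return sum(margins) / len(margins)

# Hypothetical figures for illustration only.
catalog = [
    Product("A", revenue=120_000, profit=30_000),
    Product("B", revenue=80_000, profit=8_000),
    Product("C", revenue=200_000, profit=40_000),
]
print(f"growth: {sales_growth(400_000, 440_000):.1%}")
print("ranking:", [p.name for p in rank_by_revenue(catalog)])
print(f"average margin: {average_profit_margin(catalog):.1%}")
```

An alternative design is a revenue-weighted margin (total profit divided by total revenue), which gives large products more influence; which one to report is a business choice.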


Types of Data Analysis Techniques
Descriptive
Exploratory
Inferential
Predictive
Causal
Mechanistic

Descriptive (least amount of effort): The discipline of quantitatively
describing the main features of a collection of data. In essence, it
describes a set of data.
Typically the first kind of data analysis performed on a data set
Commonly applied to large volumes of data, such as census data
The description and interpretation processes are separate steps
Univariate and bivariate analyses are two types of descriptive statistical analysis
Type of data set applied to: Census data set (a whole population)
Example: Census Data

Univariate Analysis
Univariate analysis involves the examination across cases of one
variable at a time. There are three major characteristics of a single
variable that we tend to look at:
the distribution
the central tendency
the dispersion

The Distribution
Table 1. Frequency distribution table.
Table 2. Frequency distribution bar chart. This type of graph is often
referred to as a histogram or bar chart.
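A frequency distribution table and its bar chart can be produced with a few lines of Python. This sketch reuses the eight-value data set from the Descriptive Statistics example in this section and draws the chart as text rather than as a graphic; frequency_table and text_bar_chart are illustrative helper names.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count) pairs sorted by value."""
    return sorted(Counter(values).items())

def text_bar_chart(table, symbol="#"):
    """Render a frequency table as a simple horizontal text bar chart."""
    return "\n".join(f"{value:>4} | {symbol * count} ({count})" for value, count in table)

scores = [15, 20, 21, 20, 36, 15, 25, 15]
table = frequency_table(scores)
print(text_bar_chart(table))
```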

Central Tendency
The central tendency of a distribution is an estimate of the "center" of a
distribution of values. There are three major types of estimates of
central tendency:
Mean
Median
Mode
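Python's standard statistics module computes all three estimates. The values below are hypothetical, chosen so that a single large value makes the estimates differ: the mean is pulled toward the outlier, while the median and mode are not.

```python
import statistics

# Hypothetical right-skewed sample: one large value pulls the mean upward.
values = [30, 32, 32, 35, 38, 40, 150]

mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequent value
print(mean, median, mode)
```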

Dispersion
Dispersion refers to the spread of the values around the central tendency. There
are two common measures of dispersion, the range and the standard deviation.
The range is the highest value minus the lowest value.
The standard deviation is a more accurate and detailed estimate of dispersion,
because an outlier can greatly exaggerate the range.
If the distribution is approximately normal (bell-shaped):
approximately 68% of the scores in the sample fall within one standard deviation
of the mean
approximately 95% of the scores in the sample fall within two standard deviations
of the mean
approximately 99.7% of the scores in the sample fall within three standard
deviations of the mean
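The percentages above come from the normal distribution, so they describe bell-shaped data only approximately. The sketch below uses Python's statistics.NormalDist to compute the exact normal shares and compares them with the observed shares in a hypothetical simulated sample; normal_share_within and sample_share_within are illustrative helper names.

```python
import random
import statistics
from statistics import NormalDist

def normal_share_within(k):
    """Exact fraction of a normal distribution lying within k standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

def sample_share_within(values, k):
    """Observed fraction of values within k sample standard deviations of the sample mean."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return sum(abs(v - m) <= k * s for v in values) / len(values)

# Hypothetical bell-shaped sample: 10,000 draws from a normal distribution (mean 100, SD 15).
rng = random.Random(42)
sample = [rng.gauss(100, 15) for _ in range(10_000)]

for k in (1, 2, 3):
    theory = normal_share_within(k)
    observed = sample_share_within(sample, k)
    print(f"within {k} SD: normal {theory:.1%}, sample {observed:.1%}")
```

For clearly skewed data (such as the tips discussed later), the observed shares can differ noticeably from these normal values.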

Descriptive Statistics
Data: 15, 20, 21, 20, 36, 15, 25, 15

Statistic        Value
Mean             20.8750
Median           20.0000
Mode             15.00
Std. Deviation    7.0799
Variance         50.1250
Range            21.00
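The values in the Descriptive Statistics table can be checked with Python's statistics module. One assumption is worth stating: the table's Std. Deviation and Variance match the sample versions (dividing by n - 1); the population versions, statistics.pstdev and statistics.pvariance, would give smaller numbers.

```python
import statistics

data = [15, 20, 21, 20, 36, 15, 25, 15]

summary = {
    "Mean": statistics.mean(data),
    "Median": statistics.median(data),
    "Mode": statistics.mode(data),
    "Std. Deviation": statistics.stdev(data),  # sample SD (divides by n - 1)
    "Variance": statistics.variance(data),     # sample variance (divides by n - 1)
    "Range": max(data) - min(data),            # highest minus lowest value
}
for name, value in summary.items():
    print(f"{name:<15}{value:>10.4f}")
```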

Bivariate Analysis
Correlation
The correlation is one of the most common and most useful statistics. A correlation is
a single number that describes the degree of relationship between two variables.

Person  Height  Self Esteem
1       68      4.1
2       71      4.6
3       62      3.8
4       75      4.4
5       58      3.2
6       60      3.1
7       67      3.8
8       68      4.1
9       71      4.3
10      69      3.7
11      68      3.5
12      67      3.2
13      63      3.7
14      62      3.3
15      60      3.4
16      63      4.0
17      65      4.1
18      67      3.8

[Figures: two histograms and a two-variable plot of the data above]
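The correlation for the height and self-esteem data above can be computed directly from the definition of Pearson's r: the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. pearson_r is an illustrative helper, and the lists hold the 18 rows of the table.

```python
import math

height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69, 68, 67, 63, 62, 60, 63, 65, 67]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7, 3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8]

def pearson_r(xs, ys):
    """Pearson correlation: covariation of xs and ys scaled by their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(f"r = {pearson_r(height, self_esteem):.2f}")
```

For these 18 people the result is a fairly strong positive r of roughly 0.7: taller people in this sample tend to report higher self esteem, which by itself says nothing about cause.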

Exploratory: An approach to analyzing data sets to find
previously unknown relationships.
Exploratory models are good for discovering new connections
They are also useful for defining future studies/questions
Exploratory analyses are usually not the definitive answer to the question at hand,
but only the start
Exploratory analyses alone should not be used for generalizing and/or predicting
Remember: correlation does not imply causation
Type of data set applied to: Census and convenience sample data sets (typically non-uniform); a random sample with many variables measured

Findings from exploratory data analysis (EDA) are often
orthogonal to the primary analysis task.
Example
The analysis task is to find the variables that best predict the tip
that a dining party will give to the waiter. The variables available are:
tip
total bill
gender
smoking status
time of day
day of the week
size of the party

Histogram of tips given by customers with $1-wide bins. The distribution
of values is skewed right and unimodal, which says that there are few
high tips but many low tips.

Histogram of tips given by customers with 10-cent bins. An interesting
phenomenon is visible: peaks in the counts at the full- and half-dollar
amounts. This corresponds to customers rounding tips.
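The effect of bin width described above can be reproduced in miniature. The tip values here are hypothetical (not the original data set), and bin_counts is an illustrative helper, not part of any library. Coarse $1 bins show the overall shape, while 10-cent bins expose clusters at whole- and half-dollar amounts.

```python
from collections import Counter

def bin_counts(values, width):
    """Count values per bin of the given width; keys are the bins' lower edges."""
    counts = Counter()
    width_cents = round(width * 100)
    for v in values:
        # Work in whole cents to avoid floating-point edge effects at bin boundaries.
        cents = round(v * 100)
        counts[cents // width_cents * width_cents / 100] += 1
    return dict(sorted(counts.items()))

# Hypothetical tips in dollars.
tips = [1.00, 1.50, 2.00, 2.00, 2.50, 3.00, 3.00, 3.18, 3.50, 2.71, 4.00, 5.00, 1.96, 6.50]

print(bin_counts(tips, 1.00))  # $1 bins: coarse overall shape
print(bin_counts(tips, 0.10))  # 10-cent bins: clusters at whole and half dollars
```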

Scatterplot of tips vs. bill. We would expect to see a tight positive linear
association, but instead see a lot more variation. In particular, there are more
points in the lower right than in the upper left. Points in the lower right correspond to
tips that are lower than expected, and it is clear that more customers are cheap
rather than generous.

smoking party.
Smoking parties have a lot more variability in the tips that they give. Males tend
to pay the (few) higher bills, and female nonsmokers tend to be very consistent
tippers (with the exception of three women).

Inferential: Inferential statistics are techniques that allow
us to use samples to make generalizations about the
populations from which the samples were drawn. That is, we
use a relatively small sample of data to say something about
a bigger population.
It is important that the sample accurately represents the population.
The process of achieving this is called sampling.
Inferential statistics arise because sampling naturally incurs
sampling error, so a sample is not expected to represent
the population perfectly.
The methods of inferential statistics are:
estimation of parameters
testing of statistical hypotheses
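Both inferential methods can be sketched with Python's standard library. The population here is simulated and hypothetical (in practice it is unknown), and the helper names are illustrative. The interval and the test use the large-sample normal approximation, an assumption that a t-distribution would refine for small samples.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical population of 100,000 values; in practice the population is unknown.
rng = random.Random(7)
population = [rng.gauss(50, 12) for _ in range(100_000)]

def mean_confidence_interval(sample, level=0.95):
    """Point estimate and normal-approximation confidence interval for the population mean."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)               # about 1.96 for 95%
    return mean, (mean - z * se, mean + z * se)

def z_test_p_value(sample, hypothesized_mean):
    """Two-sided p-value for H0: population mean == hypothesized_mean (large-sample z-test)."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    z = (statistics.fmean(sample) - hypothesized_mean) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

sample = rng.sample(population, 200)  # simple random sample without replacement
estimate, (low, high) = mean_confidence_interval(sample)
print(f"estimate {estimate:.2f}, 95% CI ({low:.2f}, {high:.2f})")
print(f"p-value for H0: mean = 50 -> {z_test_p_value(sample, 50):.3f}")
```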

Predictive: Analyze current and historical facts to make
predictions about future events. In essence, use
the data on some objects to predict values for
another object.
A model may predict well, but that does not mean that the independent variables cause the outcome
Accurate prediction depends heavily on measuring the right variables
Although there are better and worse prediction models, more data and a simple
model often work really well
Prediction is very hard, especially about the future
Type of data set applied to: Prediction study data set (a training and a test data set
from the same population)
Example: Predictive Analysis
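A minimal predictive workflow fits a model on a training set and judges it on a test set drawn from the same population. The data are simulated and hypothetical (an assumed link between advertising spend and sales), and the model is a simple least-squares line, chosen to echo the point that a simple model with enough data often works well; fit_line is an illustrative helper.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Hypothetical population: advertising spend (x) and sales (y) with random noise.
rng = random.Random(3)
pairs = []
for _ in range(300):
    x = rng.uniform(0, 100)
    pairs.append((x, 5.0 + 0.8 * x + rng.gauss(0, 5)))

# Split the same population into a training set and a held-out test set.
rng.shuffle(pairs)
train, test = pairs[:200], pairs[200:]

# Fit on the training data only.
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# Evaluate on data the model has not seen: root-mean-square prediction error.
rmse = (sum((y - (intercept + slope * x)) ** 2 for x, y in test) / len(test)) ** 0.5
print(f"slope {slope:.2f}, intercept {intercept:.2f}, test RMSE {rmse:.2f}")
```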

Causal: Find out what happens to one variable
when you change another.
Implementation usually requires randomized studies
There are approaches to inferring causation in non-randomized studies
Causal models are said to be the gold standard for data analysis
Type of data set applied to: Randomized trial data set (data from a randomized
study)
Example: Causal Analysis
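The logic of a randomized study can be simulated. Because treatment is assigned by chance, the treated and control groups differ on average only in the treatment, so the difference in their mean outcomes estimates the causal effect. The baseline distribution and the true effect of 3.0 are hypothetical assumptions, and run_experiment is an illustrative helper.

```python
import random
import statistics

def run_experiment(n=2000, true_effect=3.0, seed=11):
    """Simulate a randomized experiment and return the estimated treatment effect.

    Hypothetical setup: each unit has a baseline outcome; treatment adds true_effect.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(20, 4)
        if rng.random() < 0.5:  # random assignment, independent of the baseline
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    # Randomization balances other factors on average, so the difference
    # in group means estimates the causal effect of the treatment.
    return statistics.fmean(treated) - statistics.fmean(control)

print(f"estimated effect: {run_experiment():.2f}")
```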

Mechanistic (most amount of effort): Understand the exact
changes in variables that lead to changes in other variables for
individual objects.
Incredibly hard to infer, except in simple situations
Usually modeled by a deterministic set of equations (physical/engineering science)
Generally the random component of the data is measurement error
If the equations are known but the parameters are not, they may be inferred with
data analysis
Type of data set applied to: Randomized trial data set (data about all components
of the system)
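As an illustration of inferring an unknown parameter when the equation is known, the sketch assumes the free-fall equation d = 0.5 * g * t**2 with g treated as unknown, and simulated distance measurements whose only random component is measurement error. The physics example and all numbers are hypothetical; least squares for a line through the origin gives the estimate, and estimate_g is an illustrative helper.

```python
import random

def estimate_g(times, distances):
    """Least-squares g in d = 0.5 * g * t**2 (a line through the origin in u = 0.5 * t**2)."""
    us = [0.5 * t ** 2 for t in times]
    return sum(u * d for u, d in zip(us, distances)) / sum(u * u for u in us)

# Hypothetical measurements: the equation is known, g is treated as unknown,
# and the only random component is measurement error on the distances.
rng = random.Random(5)
true_g = 9.81
times = [0.1 * i for i in range(1, 21)]  # 0.1 s to 2.0 s
distances = [0.5 * true_g * t ** 2 + rng.gauss(0, 0.02) for t in times]

print(f"estimated g: {estimate_g(times, distances):.3f} m/s^2")
```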
