
Introduction to Statistical Inference

C. Cordeiro

Chapter 3: Introduction to Statistical Inference

Previous chapter

Previously...

Data analysis is the art of describing data using graphs and
numerical summaries.
From the previous chapter: pie charts and bar graphs for categorical
variables, histograms and scatterplots for quantitative variables; also
numerical tools for describing the center and variability of the
distribution of one variable.
The purpose of exploratory data analysis is to help understand the
most important characteristics of the data, that is, to search for
interesting characteristics in the data.
Conclusions are informal, based on what we see in the data!

Introduction

An important role of statistics is to provide information on a
population based on data obtained from a sample.

This is what is commonly known as statistical inference!

Notation

By convention, Greek letters are used to denote population parameters (parâmetros), and sample statistics (estatísticas) are denoted with
the equivalent lowercase Roman letters.

                     Population      Sample
                     (parameters)    (statistics)
Mean                 $\mu$           $\bar{x}$
Variance             $\sigma^2$      $s^2$
Standard deviation   $\sigma$        $s$

Introduction

To make statistical inferences on samples, we usually make some
assumptions about the probability distribution underlying the data.

The conclusions of inference are stated in terms of probability, the
mathematics of chance.
Understanding probability is key to being able to analyse data to yield
meaningful and scientifically valid conclusions.

Probability and Distributions

Probability

The concepts of randomness and probability are important in statistics.
Democritus: "Everything existing in the universe is the fruit of chance."

Probability (Probabilidade) focuses on the description and
quantification of chance or random events.
Probability is a mathematical language that allows us to make
statistical statements and analyze data.
Viewing data as coming from a probability distribution is vital
to understanding statistical methods.

Standard probability distributions

Introduction
To make statistical inferences on samples, we usually make some
assumptions about the probability distribution underlying the data.

Example: The bell shape in the above graph has a precise
mathematical description.
The mathematical theory underlying probability distributions requires a
distinction to be made between discrete and continuous random variables.

Random variables

A random variable is a variable whose value is a numerical outcome of
a random phenomenon.
The probability distribution of a random variable X tells us what
values X can take and how to assign probabilities to those values.
Notation: random variables are denoted by capital letters, such as X or Y.
The two main types of variables correspond to the two types of
probability models: discrete and continuous.

A discrete random variable has a finite or countably infinite list of possible outcomes.
A continuous random variable can take any value in an interval, with
probabilities given as areas under the density curve.

Discrete distributions

Univariate discrete distributions are standard probability models that use a
discrete random variable to define the outcomes of an experiment.
The probabilities of all possible (mutually exclusive) outcomes sum to one.

Presented here are two models frequently used in analyzing biological data:
Binomial
Poisson

Binomial
A Bernoulli random variable arises in an experiment where there are only
two outcomes, generally referred to as success (sucesso) and failure (insucesso).
For the success outcome the random variable is assigned the
value 1, and for the failure outcome it is assigned the value 0.
The probability of success is a value $p$, a proportion between 0 and 1.
The probability of failure is $1 - p$.
Considering $n$ independent trials, each with the same success probability $p$, the probability mass function (função massa de probabilidade)
is defined as

$$P(k \text{ successes in } n \text{ trials}) = P(X = k) = C^n_k \, p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n,$$

where $C^n_k = \binom{n}{k}$ is the number of ways of choosing $k$ trials out of $n$.

X has a Binomial distribution, that is, $X \sim Bi(n, p)$, with

$$\mu = n p, \qquad \sigma^2 = n p (1 - p).$$

Example

Let's consider the case of having a child and use a Bernoulli random variable
to represent whether the child has blue eyes. Assume that the probability of
the child having blue eyes is 0.16 and this is the success outcome.
Consider that you have 10 children and you want to know the probability
a) that 0 out of the 10 have blue eyes?
b) that 3 have blue eyes?
c) that fewer than 3 have blue eyes?
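As a rough check on the blue-eyes example, the three probabilities can be computed directly from the binomial probability function with n = 10 and p = 0.16. This is a minimal sketch in Python using only the standard library; the helper name `binom_pmf` and the variable names are illustrative assumptions, not part of the course material.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bi(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.16  # number of children and P(blue eyes), taken from the example

p_none = binom_pmf(0, n, p)                                      # a) P(X = 0)
p_three = binom_pmf(3, n, p)                                     # b) P(X = 3)
p_fewer_than_three = sum(binom_pmf(k, n, p) for k in range(3))   # c) P(X < 3)

print(f"a) P(X = 0) = {p_none:.4f}")
print(f"b) P(X = 3) = {p_three:.4f}")
print(f"c) P(X < 3) = {p_fewer_than_three:.4f}")
```

Under these assumptions the answers come out at roughly 0.175, 0.145 and 0.794. Part c) sums the probabilities for k = 0, 1, 2 because "fewer than 3" excludes 3 itself.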

Poisson
The Poisson distribution is used to model the counts of events occurring randomly in
space or time.
Examples: number of typing errors on a page, number of cells in a
microscope field of view, number of seeds taken by a bird per minute, number
of hurricanes in one year, etc.
X has a Poisson distribution, that is, $X \sim P(\lambda)$, with

$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \dots$$

and

$$\mu = \lambda, \qquad \sigma^2 = \lambda.$$

A Poisson variable can take any non-negative integer value (0, 1, 2, ...), with no upper limit,
because the number of trials is not fixed.
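The Poisson probability function and the claim that the mean and the variance both equal the rate parameter can be checked numerically. The sketch below uses only the Python standard library; the helper name `poisson_pmf`, the rate value 2 and the truncation of the infinite sum at 60 terms are assumptions made for illustration.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ P(lam): exp(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)


lam = 2.0  # hypothetical rate, chosen only for illustration

# The support is infinite; for lam = 2 the mass beyond x = 59 is negligible,
# so the sums are truncated there (an assumption of this sketch).
probs = [poisson_pmf(x, lam) for x in range(60)]

total = sum(probs)
mean = sum(x * q for x, q in enumerate(probs))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(probs))

print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.6f}, variance = {variance:.6f}")
```

Both the mean and the variance should print as values very close to the chosen rate, which is the property that distinguishes the Poisson from the binomial, where the variance n p (1 - p) is always smaller than the mean n p.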

Let's work!

1. It is known that 10% of the population of a city has a certain disease.
   Consider a random sample of 5 people from that city.
   a) Plot the probability mass function.
   b) What is the probability that 4 of the 5 people, chosen at random from that city,
      have the disease?
   c) What is the probability that at most 3 have the disease?
   d) What is the probability that at least 3 have the disease?
   e) Calculate P(0 < X < 4).
   f) What is the mean? And the variance?

2. Let X be a variable with a Poisson distribution with $\sigma^2 = 4$. Determine:
   a) P(X = 3)
   b) P(X < 5)

3. The number of whale sightings off the Azores over a period of 1 year is, on average, 30.
   a) Which probability distribution is appropriate for the situation described?
   b) Define the random variable and its distribution.
   c) What is the probability of recording more than 40 sightings in a year?
   d) What is the probability of recording at most 20 sightings in 6 months?

4. Given a Bi(10, 0.4) distribution:
   a) Plot the probability function. What is the most likely number of successes?
   b) Calculate P(X 8)
   c) Calculate P(2 < X ≤ 5)
   d) Calculate P(X 7)

5. A region of the United States is affected, on average, by 6 tornadoes
   per year. Find the probability that in a given year this area
   is affected by:
   a) Two tornadoes.
   b) Fewer than 4 tornadoes.
   c) At least 4 tornadoes.
   d) Between 6 and 8 tornadoes.
   e) Plot the probability function for the variable under study.

6. A company off the Azores runs daily tourist trips
   to watch whales/dolphins. In its brochure the company gives
   an 85% guarantee that whales or dolphins will be sighted.
   Consider the random variable X representing the number of trips, out of 20,
   on which whales/dolphins were sighted.
   a) Identify the probability distribution of the random variable X.
   b) What is the probability that whales/dolphins were sighted on 15
      of the trips run by the company?
   c) What is the probability that whales/dolphins were sighted on at
      least 15 trips?
   d) What is the probability that the number of trips on which
      whales/dolphins were sighted is at least 15 and at most 20?

7. A company off the Azores runs daily tourist trips
   to watch whales. In its brochure the company states that, on average,
   5 cetaceans are sighted per day.
   a) Identify the probability distribution of the random variable X.
   b) What is the probability that 10 of these cetaceans are sighted?
   c) What is the probability that more than 5 whales were sighted in a day?
   d) Determine P(8 ≤ X < 14).

Continuous distributions

Some data arise from measurements on an essentially continuous scale, for
instance temperature, concentrations, etc.
In practice they will be recorded to a finite precision but have a component
of random variation, which makes them less than perfectly reproducible.

Normal
The normal distribution (also known as the Gaussian distribution) has the
probability density function (função densidade):

$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right),$$

depending on its mean $\mu$ and standard deviation $\sigma$.

X has a normal distribution, that is, $X \sim N(\mu, \sigma)$.

Modifying $\mu$ simply translates the distribution, and modifying $\sigma$ widens or narrows it.

The standard deviation is the square root of the variance. It indicates how
close the data are to the mean.
Assuming a normal distribution:
about 68% of the values are within 1 sd of the mean (68% corresponds to approximately 0.99 sd)
about 95% are within 2 sd (exactly 95% within 1.96 sd)
about 99.7% are within 3 sd (exactly 99% within 2.58 sd).

Example: Consider the amount of time students spend playing games. Assume
that the data are normally distributed with a mean of 14.25 hours per week
and a standard deviation of 2.1 hours.
Based on the information above, we can quickly determine that approximately
68% of the students spend between 12.15 hours and 16.35 hours per week
playing games.
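The coverage percentages for a normal distribution can be verified with the cumulative distribution function. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `prob_within` is an assumption introduced for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_within(k: float) -> float:
    """P(-k < Z < k) for a standard normal Z, i.e. coverage of +/- k sd."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3, 1.96, 2.58):
    print(f"within {k} sd: {prob_within(k):.4f}")

# Games example from the slide: mean 14.25 h/week, sd 2.1 h/week.
games = NormalDist(mu=14.25, sigma=2.1)
p_games = games.cdf(16.35) - games.cdf(12.15)
print(f"P(12.15 < X < 16.35) = {p_games:.4f}")
```

The interval from 12.15 to 16.35 hours is exactly the mean plus or minus one standard deviation, so its probability matches the 1 sd coverage.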

Examples

Ex1 How much area under the curve is above the Z value of 1.44?
Ex2 How much area under the curve is below the Z value of -2.13?
Ex3 How much area under the curve is between the Z values of -1.96
and 1.96?
Ex4 The pulse rates for a certain population follow a normal
distribution with a mean of 70 per minute and s.d. 5. What
percentage of this distribution lies between 60 and 80 per
minute?
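The four examples can be worked through as areas under the normal curve. This sketch uses `statistics.NormalDist` from the Python standard library in place of a Z table; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

ex1 = 1 - z.cdf(1.44)                  # area above Z = 1.44 (upper tail)
ex2 = z.cdf(-2.13)                     # area below Z = -2.13 (lower tail)
ex3 = z.cdf(1.96) - z.cdf(-1.96)       # area between -1.96 and 1.96

pulse = NormalDist(mu=70, sigma=5)     # pulse rates from Ex4
ex4 = pulse.cdf(80) - pulse.cdf(60)    # share of the population in [60, 80]

print(f"Ex1: {ex1:.4f}")
print(f"Ex2: {ex2:.4f}")
print(f"Ex3: {ex3:.4f}")
print(f"Ex4: {100 * ex4:.2f}%")
```

Ex4 covers the mean plus or minus two standard deviations (70 ± 2 × 5), so the answer is close to 95%.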

Let's work!

8. Let $Z \sim N(0, 1)$; determine:
   a) P(Z 2.2)
   b) P(1 < Z ≤ 2)
   c) P(Z > 2.5)

9. In the brown bear population, the weights of males are
   approximately N(350 kg, 75 kg).
   When selecting a bear at random, what is the probability that it weighs more than
   450 kg?

10. A study showed that the foot size of Japanese women follows
    a normal distribution with mean 24.9 cm and standard deviation
    1.05 cm.
    What is the probability that a randomly selected foot is smaller than 26
    cm?

11. The weight (in grams) of a cereal box has an
    N(120, 5) distribution. What is the probability that a cereal box, chosen at
    random, weighs at most 107 grams?

12. Consider a normal distribution with mean 3 and variance 9.
    Calculate:
    a) P(2 ≤ X ≤ 5)
    b) P(X 3)
    c) P(X 2)

13. In the polar bear population, the weight of males follows an
    approximately normal distribution with mean weight 350 kg and
    variance $75^2$ kg$^2$. When selecting a bear at random, what is the
    probability that it weighs between 425 kg and 450 kg?

14. The fruit fly (Drosophila melanogaster) is widely used in
    genetics studies because it is small, grows quickly and reproduces
    rapidly. The thorax length in a population of flies
    has an approximately Normal distribution with
    mean 0.8 millimetres (mm) and standard deviation 0.078 mm.
    a) What is the probability that a fly has a thorax shorter
       than 0.6 mm?
    b) What is the probability that a fly has a thorax of
       at least 0.9 mm?
    c) What is the probability that a fly has a thorax length
       between 0.5 and 0.8 mm?

Transformation: standard normal

One of the tricks with the normal distribution is that it is easily standardized
to a standard scale.
If X is a normally distributed random variable with mean $\mu$ and standard deviation $\sigma$, it
can be standardized by transforming X to Z, where Z is a normally distributed
variable with mean 0 and standard deviation 1 (Z-scores):

$$Z = \frac{X - \mu}{\sigma}.$$

Note that $Z \sim N(0, 1)$.
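Standardization can be illustrated with the games example used earlier in the chapter (mean 14.25 hours, standard deviation 2.1 hours). The sketch below, in standard-library Python, shows that a probability computed on the original scale equals the probability computed for the Z-score on the standard normal scale; the function name `z_score` is an assumption for illustration.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: number of standard deviations above the mean."""
    return (x - mu) / sigma


mu, sigma = 14.25, 2.1   # games example: hours per week
x = 16.35                # upper end of the 68% interval on the slide

z = z_score(x, mu, sigma)
p_direct = NormalDist(mu, sigma).cdf(x)   # P(X <= x) on the original scale
p_standard = NormalDist().cdf(z)          # P(Z <= z) on the standard scale

print(f"z = {z:.2f}")
print(f"P(X <= {x}) = {p_direct:.4f}")
print(f"P(Z <= z)   = {p_standard:.4f}")
```

Because the two probabilities agree, a single table (or function) for N(0, 1) is enough to answer questions about any normal distribution.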

Normal approximation
The normal is a very important distribution in statistical models, where it is commonly
used to describe error variation.
It also comes up as an approximating distribution in several contexts; for
instance, the binomial distribution for large sample sizes can be well approximated by a suitably scaled normal distribution.

[Figure: Binomial to Normal approximation; four panels showing Prob for p = 0.05, p = 0.2, p = 0.5 and p = 0.8.]
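One rough way to see when the normal approximation to the binomial works is to compare the binomial probabilities with a normal density that has the same mean n p and standard deviation sqrt(n p (1 - p)). The sketch below uses only the Python standard library; the helper names, the choice n = 10 for the small case and n = 100 for the large case are assumptions made for illustration, and the largest pointwise gap is just one possible measure of closeness.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_abs_gap(n: int, p: float) -> float:
    """Largest |binomial probability - matching normal density| over k = 0..n."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return max(abs(binom_pmf(k, n, p) - approx.pdf(k)) for k in range(n + 1))


# (n, p) pairs: a skewed small sample, a symmetric small sample, a large sample.
for n, p in [(10, 0.05), (10, 0.5), (100, 0.5)]:
    print(f"n = {n:3d}, p = {p:.2f}: max gap = {max_abs_gap(n, p):.4f}")
```

The gap is largest when p is close to 0 (or 1) with few trials, where the binomial is strongly skewed, and shrinks as n grows.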

[Figure: Poisson to Normal approximation; four panels showing Prob for Lambda = 0.5, Lambda = 1, Lambda = 2 and Lambda = 5.]

Normal fit?

Normal inspection

To check whether data are normally distributed, a plot called a Q-Q plot
is used.
The Q-Q plot compares the sample distribution against a normal
distribution.
If all of the points on this plot are within the dotted line boundaries, the
sample data set can be assumed to be from a normal distribution.
Otherwise, one can assume the set does not follow a normal distribution.

[Figure: normal Q-Q plot of the variable waiting against norm quantiles.]

Example

Suppose you are measuring survival times of an enzyme in a solution (as
measured by some kind of assay for enzyme activity) and you get the following
data in hours: 4.75, 3.4, 1.8, 2.9, 2.2, 2.4, 5.8, 2.6, 2.4, and 5.25.
How could you decide on a probability model to model the probability of the
enzyme surviving in solution?
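One possible first step for the enzyme data is to build the points of a normal Q-Q plot by hand: sort the sample and pair each value with the matching quantile of a standard normal. The sketch below is written in standard-library Python and is only an illustration; the plotting positions (i - 0.5) / n are one common convention among several, and the correlation of the Q-Q points is used here as an informal numerical summary, not as a formal test.

```python
from math import sqrt
from statistics import NormalDist, mean

# Survival times in hours, copied from the example.
survival_hours = [4.75, 3.4, 1.8, 2.9, 2.2, 2.4, 5.8, 2.6, 2.4, 5.25]


def normal_qq_points(sample):
    """Pairs (theoretical normal quantile, ordered sample value)."""
    n = len(sample)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n: an assumed convention for this sketch.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(sample)))


def pearson_r(pairs):
    """Correlation between the two coordinates of the Q-Q points."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


points = normal_qq_points(survival_hours)
r = pearson_r(points)

for q, value in points:
    print(f"{q:7.3f}  {value:5.2f}")
print(f"correlation of Q-Q points: {r:.3f}")
```

Points that fall close to a straight line (correlation near 1) are compatible with a normal model; visible curvature suggests another model may fit better. With only 10 observations, any such conclusion is tentative.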

Let's work!

15. For the variable sat.v in the stud.recs data set (UsingR), check whether the data
    follow a normal distribution.

Processing the data...

A random variable will have an
associated probability distribution:
discrete variables: binomial and Poisson
continuous variables: normal
