C.Cordeiro
Previous chapter
Previously...
Data analysis is the art of describing data using graphs and numerical summaries.
From the previous chapter: pie charts and bar graphs for categorical variables, histograms and scatterplots for quantitative variables; also numerical tools for describing the center and variability of the distribution of one variable.
The purpose of exploratory data analysis is to help understand the most important characteristics of the data, that is, to search for interesting features in the data.
Conclusions are informal, based on what we see in the data!
Introduction
An important role of statistics is to provide information on a population based on data obtained from a sample.
Statistical inference!
Notation
By convention, Greek letters are used to denote population parameters (parâmetros), and the corresponding sample statistics (estatísticas) are denoted with the equivalent lowercase Roman letters.

                     Population      Sample
                     (parameters)    (statistics)
Mean                 μ               x̄
Variance             σ²              s²
Standard deviation   σ               s
To make statistical inferences from samples, we usually make some assumptions about the probability distribution underlying the data.
Probability
The concepts of randomness and probability
Democritus:
Example: the bell shape in the graph above has a precise mathematical description.
The mathematical theory underlying probability distributions requires a distinction to be made between discrete and continuous random variables.
Random variables
… tells us what …
Notation: … or Y.
Random variables are either discrete or continuous.
A discrete random variable has a finite (or countably infinite) list of possible outcomes.
A continuous random variable can take any value in an interval, with probabilities given as areas under the density curve.
Univariate discrete distributions are standard probability models that use a discrete random variable to define the outcomes of an experiment.
The probabilities of all possible outcomes, which are mutually exclusive, sum to one.
Two models frequently used in analyzing biological data are presented here:
Binomial
Poisson
Binomial
A Bernoulli random variable arises in an experiment where there are only two outcomes, generally referred to as success (sucesso) and failure (insucesso).
For the success outcome the random variable is assigned the value 1, and for the failure outcome it is assigned the value 0.
The probability of success is a value p; the probability of failure is 1 − p.
Considering X ~ Bi(n, p), the number of successes in n independent Bernoulli trials with the same success probability p:
$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \dots, n$
$\mu = np, \qquad \sigma^2 = np(1-p)$
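As a numerical check of μ = np and σ² = np(1 − p), the sketch below builds the binomial probability function directly and computes the mean and variance as sums over all outcomes; it also confirms that the probabilities sum to one. The helper name `binom_pmf` and the values n = 10, p = 0.16 are illustrative choices, not part of the course material.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bi(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.16            # illustrative values
support = range(n + 1)     # X takes the values 0, 1, ..., n
probs = [binom_pmf(x, n, p) for x in support]

total = sum(probs)                                                 # should be 1
mean = sum(x * px for x, px in zip(support, probs))                # E[X]
var = sum((x - mean) ** 2 * px for x, px in zip(support, probs))   # Var[X]

print(f"sum of probabilities = {total:.6f}")
print(f"mean     = {mean:.6f}  vs  n*p       = {n * p:.6f}")
print(f"variance = {var:.6f}  vs  n*p*(1-p) = {n * p * (1 - p):.6f}")
```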
Example
Let's consider the case of having a child and use a Bernoulli random variable to represent whether the child has blue eyes. Assume that the probability of the child having blue eyes is 0.16 and that this is the success outcome.
Consider that you have 10 children. What is the probability
a) that 0 out of the 10 have blue eyes?
b) that 3 have blue eyes?
c) that fewer than 3 have blue eyes?
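One way to check answers to this example: with X ~ Bi(10, 0.16) counting the blue-eyed children, each part is a single probability or a short sum of probabilities. A minimal sketch using only the standard library (the helper name `binom_pmf` is ours, for illustration):

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bi(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.16   # 10 children, P(blue eyes) = 0.16

p_a = binom_pmf(0, n, p)                          # a) P(X = 0)
p_b = binom_pmf(3, n, p)                          # b) P(X = 3)
p_c = sum(binom_pmf(x, n, p) for x in range(3))   # c) P(X < 3) = P(0) + P(1) + P(2)

print(f"a) P(X = 0) = {p_a:.4f}")
print(f"b) P(X = 3) = {p_b:.4f}")
print(f"c) P(X < 3) = {p_c:.4f}")
```

Note that "fewer than 3" excludes x = 3, so the sum in c) stops at x = 2.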
Poisson
The Poisson distribution is used to model the counts of events occurring randomly in space or time.
Examples: counts in a microscope field of view, number of seeds taken by a bird per minute, number of hurricanes in one year, etc.
X has a Poisson distribution, that is X ~ P(λ), if
$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \dots$
and
$\mu = \lambda, \qquad \sigma^2 = \lambda$
A Poisson variable can take any non-negative integer value (0, 1, 2, ...) because, unlike the binomial, the number of trials is not fixed.
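The property μ = σ² = λ can be checked numerically. The sketch below evaluates the Poisson probability function and sums over the support; since the support is infinite, it is truncated at x = 99, which is an assumption that only works because the tail is negligible for the illustrative rate λ = 4 used here.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-lam) * lam^x / x! for X ~ P(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 4.0          # illustrative rate
xs = range(100)    # truncated support 0..99; the tail beyond is negligible for lam = 4
probs = [poisson_pmf(x, lam) for x in xs]

total = sum(probs)
mean = sum(x * px for x, px in zip(xs, probs))
var = sum((x - mean) ** 2 * px for x, px in zip(xs, probs))

print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {var:.6f}")
```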
Let's work!
1. … σ² = 4.
Determine:
a) P(X = 3)
b) P(X < 5)
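The statement of exercise 1 did not survive extraction. Assuming it concerns a Poisson variable with the given σ² = 4, the relation σ² = λ fixes λ = 4, and both requested probabilities follow from the probability function. This is a sketch under that assumption, not the course's worked solution.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ P(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 4.0  # ASSUMPTION: Poisson variable, so lambda = sigma^2 = 4

p_a = poisson_pmf(3, lam)                           # a) P(X = 3)
p_b = sum(poisson_pmf(x, lam) for x in range(5))    # b) P(X < 5) = P(0) + ... + P(4)

print(f"a) P(X = 3) = {p_a:.4f}")
print(f"b) P(X < 5) = {p_b:.4f}")
```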
Let's work!
3. …
a)
b)
c)
d)
4. …
Let's work!
a) Two tornadoes.
b) Fewer than 4 tornadoes.
c) At least 4 tornadoes.
d) Between 6 and 8 tornadoes.
e) Plot the probability function of the variable under study.
Let's work!
Let's work!
a)
b)
c)
d)
Continuous distributions
Normal
The normal distribution (also known as the Gaussian distribution) has a probability density function (função densidade)
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty,$
depending on its mean μ and standard deviation σ. Notation: X ~ N(μ, σ).
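The density formula translates directly into code. The sketch below implements f(x) by hand and compares it with `statistics.NormalDist.pdf` from the Python standard library; the parameter values μ = 70, σ = 5 are illustrative only. Like the notation N(μ, σ) above, `NormalDist` takes the standard deviation (not the variance) as its second parameter.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of X ~ N(mu, sigma), with sigma the standard deviation."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu, sigma = 70.0, 5.0  # illustrative values
ref = NormalDist(mu, sigma)
for x in (60.0, 70.0, 80.0):
    print(f"f({x}) = {normal_pdf(x, mu, sigma):.6f}  (NormalDist.pdf: {ref.pdf(x):.6f})")
```

The curve peaks at x = μ with height 1/(σ√(2π)) and is symmetric about μ.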
Modifying μ and σ
The standard deviation is the square root of the variance. It indicates how close the data are to the mean.
Assuming a normal distribution:
68% of the values are within about 1 sd of the mean (more precisely, 0.99 sd);
95% within about 2 sd (more precisely, 1.96 sd);
99% within 2.58 sd, and about 99.7% within 3 sd.
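These percentages can be reproduced from the standard normal distribution: the proportion within k sd of the mean is P(−k < Z < k) = 2Φ(k) − 1, and the multiplier capturing a central proportion c is Φ⁻¹((1 + c)/2). A short sketch with `statistics.NormalDist` (the helper names `within` and `multiplier` are ours):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1

def within(k: float) -> float:
    """Proportion of a normal distribution within k sd of the mean: 2*Phi(k) - 1."""
    return 2 * Z.cdf(k) - 1

def multiplier(c: float) -> float:
    """Number of sd that captures the central proportion c: Phi^{-1}((1 + c) / 2)."""
    return Z.inv_cdf((1 + c) / 2)

for k in (1, 2, 3):
    print(f"within {k} sd: {within(k):.4f}")
for c in (0.68, 0.95, 0.99):
    print(f"{c:.0%} of values within {multiplier(c):.3f} sd")
```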
Examples
Ex1 How much area under the curve is above the Z value of 1.44?
Ex2 How much area under the curve is below the Z value of −2.13?
Ex3 How much area under the curve is between the Z values of −1.96 and 1.96?
Ex4 The pulse rates for a certain population follow a normal distribution with a mean of 70 per minute and s.d. 5. What percent of this distribution lies between 60 and 80 per minute?
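Each example is an area under a normal curve, so it reduces to one or two calls of the cumulative distribution function Φ. A sketch using `statistics.NormalDist`, with the numbers taken from the examples (variable names are ours):

```python
from statistics import NormalDist

Z = NormalDist()                       # standard normal N(0, 1)

ex1 = 1 - Z.cdf(1.44)                  # area above z = 1.44
ex2 = Z.cdf(-2.13)                     # area below z = -2.13
ex3 = Z.cdf(1.96) - Z.cdf(-1.96)       # area between -1.96 and 1.96

pulse = NormalDist(mu=70, sigma=5)     # pulse rates, mean 70 per minute, s.d. 5
ex4 = pulse.cdf(80) - pulse.cdf(60)    # P(60 < X < 80)

print(f"Ex1: {ex1:.4f}  Ex2: {ex2:.4f}  Ex3: {ex3:.4f}  Ex4: {ex4:.2%}")
```

In Ex4, 60 and 80 are exactly 2 sd below and above the mean, so the answer matches the area between z = −2 and z = 2.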
Let's work!
8. Let Z ~ N(0, 1). Determine:
a) P(Z 2.2)
b) P(1 < Z 2)
c) P(Z > 2.5)
10. …
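A sketch of the calculations for exercise 8. The comparison signs in a) and b) were lost in extraction, so the code computes Φ(2.2), from which either P(Z ≤ 2.2) = Φ(2.2) or P(Z ≥ 2.2) = 1 − Φ(2.2) follows. For a continuous Z, strict and non-strict inequalities give the same probability, so b) does not depend on the missing sign.

```python
from statistics import NormalDist

Z = NormalDist()  # Z ~ N(0, 1)

phi_22 = Z.cdf(2.2)           # Phi(2.2) = P(Z <= 2.2); P(Z >= 2.2) = 1 - phi_22
p_b = Z.cdf(2) - Z.cdf(1)     # P(1 < Z <= 2), equal to P(1 < Z < 2) for continuous Z
p_c = 1 - Z.cdf(2.5)          # P(Z > 2.5)

print(f"Phi(2.2) = {phi_22:.4f}, 1 - Phi(2.2) = {1 - phi_22:.4f}")
print(f"b) {p_b:.4f}   c) {p_c:.4f}")
```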
Let's work!
11. …
12. …
a) P(2 X 5)
b) P(X 3)
c) P(X 2)
13. …
Let's work!
14. …
Transformation: standard normal
One convenient property of the normal distribution is that it is easily standardized to a common scale.
If X is a continuous random variable with mean μ and standard deviation σ, the standardized variable is
$Z = \frac{X - \mu}{\sigma}$
Note that if X ~ N(μ, σ), then Z ~ N(0, 1).
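Standardization means that any normal probability can be read from the standard normal: P(X ≤ x) = P(Z ≤ (x − μ)/σ). The sketch below checks this with `statistics.NormalDist`, reusing the pulse-rate values μ = 70, σ = 5 purely as an illustration; `standardize` is a hypothetical helper name.

```python
from statistics import NormalDist

def standardize(x: float, mu: float, sigma: float) -> float:
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma

mu, sigma = 70.0, 5.0          # illustrative values
X = NormalDist(mu, sigma)      # X ~ N(mu, sigma)
Z = NormalDist()               # Z ~ N(0, 1)

for x in (60.0, 72.5, 80.0):
    z = standardize(x, mu, sigma)
    print(f"x = {x}: z = {z:+.2f}, P(X <= x) = {X.cdf(x):.4f}, P(Z <= z) = {Z.cdf(z):.4f}")
```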
Normal approximation
The normal distribution is very important in statistical models, where it is commonly used to describe error variation.
It also comes up as an approximating distribution in several contexts; for instance, the binomial distribution for large sample sizes can be well approximated by a suitably scaled normal distribution.
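A minimal sketch of the binomial-to-normal approximation: X ~ Bi(n, p) is compared with a normal distribution having the same mean np and standard deviation √(np(1 − p)). The values n = 100, p = 0.5, k = 55 are illustrative, and the +0.5 continuity correction is a standard refinement that the slides do not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Bi(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

n, p, k = 100, 0.5, 55   # illustrative values

# Normal with the binomial's mean and standard deviation: N(np, sqrt(np(1 - p)))
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

exact = binom_cdf(k, n, p)
approx = approx_dist.cdf(k + 0.5)   # +0.5: continuity correction

print(f"exact P(X <= {k}) = {exact:.4f}, normal approximation = {approx:.4f}")
```

The agreement improves as n grows and worsens when p is close to 0 or 1, which is consistent with the skewed binomial shapes for small or large p.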
[Figure: binomial probability functions (vertical axis: Prob) for p = 0.05, p = 0.2, p = 0.5 and p = 0.8]
[Figure: Poisson probability functions (vertical axis: Prob) for λ = 0.5, 1, 2 and 5]
Normal fit?
[Figure: normal Q-Q plot of the variable waiting against norm quantiles]
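The points of a normal Q-Q plot pair each ordered observation with a theoretical N(0, 1) quantile; if the data are roughly normal, the points lie close to a straight line. The sketch below computes those pairs using the plotting positions (i − 0.5)/n, which is one common convention among several. The data are simulated and hypothetical (not the waiting variable), and `qq_points` is a hypothetical helper name.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Return (theoretical N(0, 1) quantile, ordered sample value) pairs.

    Uses the plotting positions (i - 0.5) / n, one common convention.
    """
    data = sorted(sample)
    n = len(data)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(data, start=1)]

# Hypothetical data for illustration only (NOT the waiting variable from the plot)
random.seed(1)
sample = [random.gauss(70, 10) for _ in range(50)]

points = qq_points(sample)
m, s = mean(sample), stdev(sample)
# For roughly normal data, the ordered values track the line m + s * q
for q, x in points[:3] + points[-3:]:
    print(f"q = {q:+.3f}   sample = {x:6.2f}   line m + s*q = {m + s * q:6.2f}")
```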
Example
Let's work!
15. For the variable sat.v of the stud.recs data: …