Pavol ORANSKÝ
Contents

1 Probability theory
  1.1 Random event
    1.1.1 Algebraic operations with random events
  1.2 Classical definition of probability
  1.3 Kolmogorov definition of probability
  1.4 Probability of the union of random events
  1.5 Probability of the opposite event
  1.6 Conditional probability
  1.7 Probability of the intersection of random events
  1.8 Full probability formula
  1.9 Bayes' formula
  1.10 Bernoulli's formula
2 Random variable
  2.1 Discrete probability distribution
  2.2 Distribution function of a random variable
  2.3 Density distribution
    2.3.1 Basic features of the density distribution
  2.4 Numerical characteristics of a random variable
    2.4.1 Mean value
    2.4.2 Variance (dispersion) and standard deviation
3 Significant continuous distributions of a random variable
4 Descriptive statistics
5 Estimates of parameters
  5.1 Point estimate
  5.2 Interval estimation of parameters
    5.2.1 100(1 − α)% bilateral confidence interval for the mean value μ
    5.2.2 100(1 − α)% bilateral confidence interval for the dispersion σ²
Preface

This text arose from the needs of my students in the Statistics course at the Faculty of Management, University of Prešov, for a compact overview of the statistics contained in their basic course. Most existing texts were either too extensive, and thus discouraging to the reader, or, conversely, covered only a secondary part of the curriculum. The demands placed on university students have changed considerably in recent years, and a gap has therefore opened in the literature: there are few books covering the intermediate stage between secondary-school texts and strictly university-level works, i.e. books that place less stringent requirements on the reader than in the past. I have tried to patch this gap with a "quick fix": rewriting my lectures from the Statistics course into an acceptable form. In addition, I have enriched the text with solved and unsolved examples, which I drew from my own resources or took from the book [1].¹

This text does not replace the, in my opinion, excellent publications of other authors, among which I once again mention Chajdiak [1]. It does, however, try in some way to reach modern students, who — let us admit it — try to avoid mathematics as far as possible. It is deliberately an undemanding text for the reader, too lightweight for a deeper analysis of the subject.

Finally, I would like to thank my colleague and good friend Dr. Ing. Ján RYBÁRIK, PhD., for his support and the valuable advice of an experienced teacher, without whom this text would not have arisen. I also thank my students for their comments.

Pavol ORANSKÝ

¹ The results, arguments and examples used here are not based on real data, and therefore they have no real counterpart that a reader could look up.
Chapter 1

Probability theory

Probability theory is the foundation of statistics and an integral part of it. For a deeper understanding of statistics it is necessary to master at least this chapter. Probability theory describes random events and the probability of their occurrence. Statistics builds specific models of empirical events, and the statistical methods that describe these events are grounded in probability theory.
1.1 Random event

A random event is an event which, under a given system of conditions, may or may not occur. If in n realizations of an experiment the event A occurs n_A times, the relative frequency n_A/n stabilizes near a constant, n_A/n → konst. for n → ∞; this stability is what the notion of probability models.
1.1.1 Algebraic operations with random events

I. Equality of events
A = B: the event A is equal to the event B if A is a part of B and, at the same time, B is a part of A.
E.g.: event A … "In throwing a die we score the number six."
event B … "In throwing a die an even number divisible by three falls."
II. Union
A ∪ B is the random event that occurs if and only if at least one of the events A, B occurs.

III. Intersection
A ∩ B is the random event that occurs if and only if the event A occurs and at the same time the event B occurs, i.e. both events occur simultaneously.
E.g.: event A … "In throwing a die we score the number six."
event B … "In throwing a die we score the number five."
event A ∩ B … "In throwing a die we score the number five and the number six at the same time."
IV. Opposite event
The opposite event Ā to a random event A occurs if and only if the event A does not occur.
E.g.: event A … "In throwing a die we score the number six."
event Ā … "In throwing a die we do not score the number six, i.e. we score one of the numbers one, two, three, four or five."

V. Certain event
A certain event is a random event that always occurs.
Certain event Ω … "In throwing a die we score one of the numbers one, two, three, four, five or six."
It could conceivably happen that the die lands on an edge, but we regard that as impossible.
It always holds that A ∪ Ā = Ω.

VI. Impossible event
An impossible event is a random event that can never occur; we denote it ∅.
E.g. the event … "when throwing a die, the die does not fall at all" (or anything more imaginative).
Remark 3 E.g.: event A … "In the roll of a die we get the number six."
event B … "In throwing a die an even number falls."
Here the event A is a part of the event B, A ⊂ B.
1.2 Classical definition of probability

Probability should have properties similar to those of the relative frequency n_A/n, since it models it.

For further understanding it is necessary to introduce some concepts, which we try to explain as empirically as possible.

An event that cannot be broken down further is called an elementary event. For example, when we throw a die, the event of an even number falling decomposes into three elementary events: the fall of the number two, the fall of the number four and the fall of the number six; these can no longer be decomposed and are therefore elementary events.

Definition 3 A system 𝒜 of subsets of Ω is called an algebra of events if:
1. ∀A, B ∈ 𝒜: A ∪ B ∈ 𝒜;
2. ∀A, B ∈ 𝒜: A ∩ B ∈ 𝒜;
3. ∀A ∈ 𝒜: Ā ∈ 𝒜;
4. Ω ∈ 𝒜;
5. ∅ ∈ 𝒜.

Definition 4 A real function P(A) defined on the algebra 𝒜 of subsets of Ω will be called a probability if the following holds:
1. ∀A ∈ 𝒜 ⇒ P(A) ≥ 0;
and, in the classical definition,

P(A) = |A| / |Ω|,

where the symbol |A| means the number of elements of the set A and the symbol |Ω| means the number of elements of the set Ω.

Example 1 Solution:

P(A) = |A|/|Ω| = 4/6 ≐ 0.67 = 67%.
Example 2 We throw three dice at once. Let us calculate the probability that three equal numbers fall.

Solution: |A| = 6,

|Ω| = C'₃(6) = C(6 + 3 − 1, 3) = C(8, 3) = 56,

P(A) = |A|/|Ω| = 6/56 ≐ 0.107 = 10.7%.

A further example (selection of 5 items from 20, of which 7 are of one kind and 13 of another; the event A — exactly two of the first kind and the remaining three of the second):

|A| = C(7, 2) · C(13, 3) = 21 · 286 = 6006,

|Ω| = C₅(20) = C(20, 5) = 15 504,

P(A) = |A|/|Ω| = 6006/15 504 = 1001/2584 ≐ 0.387 38 = 38.738%.
1.3 Kolmogorov definition of probability

Definition 5 A nonempty system 𝒜 of subsets of Ω is called a σ-algebra if:
1. ∀A ∈ 𝒜 ⇒ Ā ∈ 𝒜;
2. ∀Aᵢ ∈ 𝒜, where i = 1, 2, … ⇒ ⋃_{i=1}^{∞} Aᵢ ∈ 𝒜.

For pairwise disjoint events Aⱼ ∈ 𝒜, j = 1, 2, …, from the σ-algebra we require σ-additivity:

P(⋃_{i=1}^{∞} Aᵢ) = Σ_{i=1}^{∞} P(Aᵢ).

Or otherwise:

¹ Andrei Nikolaevich Kolmogorov (* 25 April 1903, † 20 October 1987), Soviet mathematician, founder of modern probability theory and of the theory of algorithmic complexity. He also worked in the fields of topology, logic, Fourier series, turbulence and classical mechanics.
A probability is a real function P on the σ-algebra 𝒜 such that for all Aⱼ ∈ 𝒜, j = 1, 2, …:
1. P(A) ≥ 0;
2. P(Ω) = 1;
3. P(A₁ ∪ A₂ ∪ … ∪ Aₖ ∪ …) = P(A₁) + P(A₂) + … + P(Aₖ) + … for any (finite or infinite) sequence of pairwise disjoint random events A₁, A₂, …, Aₖ, ….

Remark 7 Probability takes values from zero to one inclusive, i.e. 0 ≤ P(A) ≤ 1.

Remark 8 The impossible event ∅ has zero probability, i.e. P(∅) = 0.

Remark 9 The probability of the opposite event is the complement to one of the probability of the original event, i.e. P(Ā) = 1 − P(A).

Remark 10 If the event A is a part of the event B (i.e. A is a subevent of the event B, A ⊂ B), then the probability of the event A is at most equal to the probability of the event B, i.e. P(A) ≤ P(B).

Remark 11 If the event A is a part of the event B (i.e. A ⊂ B), then the probability of the difference of events B − A is equal to the difference of the probabilities of these events: P(B − A) = P(B) − P(A).
1.4 Probability of the union of random events

For two random events A and B,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B);

if we consider n random events, the formula for the probability of their union takes the form

P(⋃_{i=1}^{n} Aᵢ) = Σ_{i=1}^{n} P(Aᵢ) − Σ_{1≤i₁<i₂≤n} P(A_{i₁} ∩ A_{i₂}) + Σ_{1≤i₁<i₂<i₃≤n} P(A_{i₁} ∩ A_{i₂} ∩ A_{i₃}) − … + (−1)^{n+1} P(A₁ ∩ A₂ ∩ … ∩ Aₙ).
In particular:

i) P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C);

ii) P(A ∪ B ∪ C ∪ D) = P(A) + P(B) + P(C) + P(D) − P(A ∩ B) − P(A ∩ C) − P(A ∩ D) − P(B ∩ C) − P(B ∩ D) − P(C ∩ D) + P(A ∩ B ∩ C) + P(A ∩ B ∩ D) + P(A ∩ C ∩ D) + P(B ∩ C ∩ D) − P(A ∩ B ∩ C ∩ D).

Consequence 1 Consider the preceding relationship for the union of disjoint events, i.e. Aᵢ ∩ Aⱼ = ∅ for i ≠ j; then

P(⋃_{i=1}^{n} Aᵢ) = Σ_{i=1}^{n} P(Aᵢ).
In a similar example (drawing 7 items from 20, of which 4 are distinguished; the events A₁, A₂, A₃ — exactly two, one or none of the distinguished items among the drawn ones):

P(A₁) = |A₁|/|Ω| = C(4, 2)·C(16, 5)/C(20, 7) = 546/1615 ≐ 0.338 08,

P(A₂) = |A₂|/|Ω| = C(4, 1)·C(16, 6)/C(20, 7) = 2002/4845 ≐ 0.413 21,

P(A₃) = |A₃|/|Ω| = C(4, 0)·C(16, 7)/C(20, 7) = 143/969 ≐ 0.147 57,

… = 546/1615 + 2002/4845 + 143/969 = 871/969 ≐ 0.898 86 ≐ 90%.
Example 5 The disused hockey stadium of the eastern metropolis can be equipped with up to three television cameras for a live television broadcast of a hockey game. The cameras film independently of one another. The first (central) camera films the action at any given moment with probability 60%; for the second and the third camera (covering the thirds of the home team and of the guests) this probability is 80%. Let us calculate the probability that at any given moment at least one of the cameras films what is happening on the ice surface.

Solution: A … "At least one of the cameras is filming."
A₁ … "The first camera is filming."
A₂ … "The second camera is filming."    (A₁, A₂, A₃ are not disjoint!)
A₃ … "The third camera is filming."

A = A₁ ∪ A₂ ∪ A₃,

P(A) = P(A₁) + P(A₂) + P(A₃) − P(A₁ ∩ A₂) − P(A₁ ∩ A₃) − P(A₂ ∩ A₃) + P(A₁ ∩ A₂ ∩ A₃) = …

P(A₁) = 0.6, P(A₂) = 0.8, P(A₃) = 0.8,
P(A₁ ∩ A₂) = P(A₁) P(A₂) = 0.6 · 0.8 = 0.48,    (the events are independent!²)
P(A₁ ∩ A₃) = P(A₁) P(A₃) = 0.6 · 0.8 = 0.48,
P(A₂ ∩ A₃) = P(A₂) P(A₃) = 0.8 · 0.8 = 0.64,
P(A₁ ∩ A₂ ∩ A₃) = 0.6 · 0.8 · 0.8 = 0.384,

… = 0.6 + 0.8 + 0.8 − 0.48 − 0.48 − 0.64 + 0.384 = 0.984 = 98.4%.

1.5 Probability of the opposite event

P(Ā) = 1 − P(A).

Proof:
Ω = A ∪ Ā,
P(Ω) = P(A ∪ Ā),
1 = P(A) + P(Ā),
P(Ā) = 1 − P(A).
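Example 5 can be verified numerically; this sketch (not part of the original text) evaluates both the inclusion–exclusion sum and, equivalently, the complement rule just proved:

```python
# Three independent cameras with filming probabilities 0.6, 0.8, 0.8.
p1, p2, p3 = 0.6, 0.8, 0.8

# Inclusion-exclusion for P(A1 ∪ A2 ∪ A3); independence gives the intersections.
p_union = (p1 + p2 + p3
           - p1 * p2 - p1 * p3 - p2 * p3
           + p1 * p2 * p3)
print(round(p_union, 3))  # 0.984

# Equivalent complement argument: 1 − P(no camera is filming).
p_complement = 1 - (1 - p1) * (1 - p2) * (1 - p3)
print(round(p_complement, 3))  # 0.984
```

The complement form is usually the shorter route for "at least one" events.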
Example 6 From a set of 32 cards we draw 4 cards. Let us calculate the probability that among the drawn cards there is at least one ace (i.e. 1 to 4 aces)!

Solution: Ā … "There is no ace among the four cards."

P(Ā) = |Ā|/|Ω| = C(28, 4)/C(32, 4) = 0.569 38,

P(A) = 1 − P(Ā) = 1 − 0.569 38 = 0.430 62.

[97.2%]

Example 8 In an urn there are 2 white, 3 black and 5 blue chips. We randomly select 3 chips. Calculate the probability that among them there are at least 2 chips of the same colour!
[75%]

² That the events are independent guarantees that for the calculation of the probability of the intersection we may use the above relationship. For a more detailed explanation see the chapter on the probability of the intersection of random events, or on the independence of random events.
1.6 Conditional probability

If no conditions are placed on the occurrence of the event A, the probability P(A) of the event A is called the unconditional probability.

Often, however, the occurrence of an event is contingent on the occurrence of another event, i.e. the event A can occur only if an event B with probability P(B) > 0 has occurred. In this case we speak of conditional probability. The conditional probability P(A | B) of two events is defined by the relationship

P(A | B) = P(A ∩ B) / P(B).

Example 9 Among 110 cables at a telephone exchange, 67 are red, and 45 of the red ones are connected. We randomly select a red wire. Calculate the probability that it is connected.

Solution: A … "The cable is connected."
B … "The cable is red."

P(A | B) = P(A ∩ B)/P(B) = (45/110)/(67/110) = 45/67 = 0.671 64 = 67.164%.
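A minimal sketch of Example 9 (not from the text), using exact fractions so the intermediate 1/110 factors cancel visibly:

```python
from fractions import Fraction

# 110 cables, 67 red; 45 cables are both red and connected.
p_b = Fraction(67, 110)        # P(B): the cable is red
p_a_and_b = Fraction(45, 110)  # P(A ∩ B): the cable is red and connected
p_a_given_b = p_a_and_b / p_b  # P(A | B) = P(A ∩ B) / P(B)
print(p_a_given_b, float(p_a_given_b))  # 45/67 ≈ 0.67164
```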
1.7 Probability of the intersection of random events

In the special case of the intersection of two events A and B, i.e. the case when these events occur simultaneously, the probability of the intersection equals the probability of one of the events times the conditional probability of the second event:

P(A ∩ B) = P(A) P(B | A) = P(B) P(A | B).

Similarly we can express the probability of the intersection of three events:

P(A₁ ∩ A₂ ∩ A₃) = P(A₁) P(A₂ | A₁) P(A₃ | A₁ ∩ A₂).

The events A and B are called independent if the occurrence of the event A does not depend on the occurrence of the event B and, at the same time, the occurrence of the event B does not depend on the occurrence of the event A.

From the previous it is clear that for independent events A and B it holds³ that

P(A | B) = P(A),
P(B | A) = P(B).
Example 12 In an urn there are 3 white, 5 red and 7 blue balls. We randomly draw four balls in a row (without replacement). What is the probability that the 1st ball is white, the 2nd red, the 3rd red and the 4th blue?

Solution:
A₁ … "The 1st ball is white."
A₂ … "The 2nd ball is red."
A₃ … "The 3rd ball is red."
A₄ … "The 4th ball is blue."

P(A₁) = 3/15 = 1/5,
P(A₂ | A₁) = 5/14,
P(A₃ | A₁ ∩ A₂) = 4/13,
P(A₄ | A₁ ∩ A₂ ∩ A₃) = 7/12,

… = (1/5) · (5/14) · (4/13) · (7/12) = 1/78.

³ Again, however, we stress that this applies only if the events are independent.
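The chain rule of Example 12 can be sketched directly (not part of the original text):

```python
from fractions import Fraction

# Urn with 3 white, 5 red, 7 blue balls; draw 4 without replacement.
# P(white, red, red, blue) via the chain rule for intersections.
p = (Fraction(3, 15)     # P(A1): first ball white
     * Fraction(5, 14)   # P(A2 | A1): second red
     * Fraction(4, 13)   # P(A3 | A1 ∩ A2): third red
     * Fraction(7, 12))  # P(A4 | A1 ∩ A2 ∩ A3): fourth blue
print(p)  # 1/78
```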
1.8 Full probability formula

If the events H₁, H₂, …, Hₙ form a complete system of disjoint events, then for any event A

P(A) = Σ_{i=1}^{n} P(Hᵢ) P(A | Hᵢ),

or, written out,

P(A) = P(H₁) P(A | H₁) + P(H₂) P(A | H₂) + … + P(Hₙ) P(A | Hₙ).

The formula follows directly from the definition of full probability: the event A is distributed over the complete system of events into disjoint parts whose union, since the system is complete, gives the whole event A.

Remark 13 The full probability formula remains true if we replace n by infinity, i.e. n → ∞.

Remark 14 The events H₁, H₂, …, Hₙ are commonly called hypotheses, and for them

P(H₁) + P(H₂) + … + P(Hₙ) = 1.
Example 16 Products of the same type come into a store from three production companies, represented in the ratio 2 : 3 : 4. The probability of a flawless product is 82% for the 1st company, 93% for the 2nd and 90% for the 3rd. What is the probability that we buy a flawless product?

Solution: A … "We buy a flawless product."
Hᵢ … "The product is from the i-th company."

P(A) = (2/9) · 0.82 + (3/9) · 0.93 + (4/9) · 0.90 ≐ 0.892 = 89.2%.

Example 17 Two consignments contain products. The first consignment contains 18 good and 3 bad products; the second consignment contains 9 good and 1 bad. From the second consignment we choose one product and put it into the first. Then we randomly select one product from the first consignment. Let us calculate the probability that it is good!

Solution: A … "The selected product is good."
H₁ … "The product moved from the second consignment is defective."
H₂ … "The product moved from the second consignment is good."

P(A) = P(H₁) P(A | H₁) + P(H₂) P(A | H₂) = (1/10) · (18/22) + (9/10) · (19/22) = 85.9%.
Example 18 Into an urn containing three balls we put one white ball. Assume that every number of white balls originally in the urn is equally likely. Then we draw one ball at random. How likely is it to be white?

Solution: A … "The selected ball is white."
H₁ … "Originally all three balls in the urn were white."
H₂ … "Originally two balls in the urn were white."
H₃ … "Originally one ball in the urn was white."
H₄ … "Originally no ball in the urn was white."

P(A) = P(H₁) P(A | H₁) + P(H₂) P(A | H₂) + P(H₃) P(A | H₃) + P(H₄) P(A | H₄)
     = (1/4) · 1 + (1/4) · (3/4) + (1/4) · (2/4) + (1/4) · (1/4) = 5/8 = 0.625 = 62.5%.
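The total-probability sum of Example 18 can be sketched as follows (not part of the original text):

```python
from fractions import Fraction

# Four equally likely hypotheses about the number of white balls originally
# in the urn (3, 2, 1 or 0); after adding one white ball the urn holds 4.
hypotheses = [Fraction(1, 4)] * 4                      # P(H1) = ... = P(H4)
conditionals = [Fraction(k, 4) for k in (4, 3, 2, 1)]  # P(A | Hi)
p_a = sum(ph * pa for ph, pa in zip(hypotheses, conditionals))
print(p_a)  # 5/8
```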
1.9 Bayes' formula

P(Hⱼ | A) = P(Hⱼ) P(A | Hⱼ) / P(A) = P(Hⱼ) P(A | Hⱼ) / Σ_{i=1}^{n} P(Hᵢ) P(A | Hᵢ).

For example:

P(A₃ | A) = P(A₃) P(A | A₃) / P(A) = (0.5 · 0.1) / (0.2 · 0.01 + 0.3 · 0.03 + 0.5 · 0.1) = 0.819 67 = 82%.
Example 22 A plant produces components that are checked by two quality control inspectors. The probability that a component is checked by the first inspector is 0.7, by the second 0.3. The probability that the first inspector deems a component acceptable is 0.92; for the second this probability is 0.98. At the exit clearance it was found that a component had been passed as satisfactory. Calculate the probability that it was checked by the first inspector!
[68.6%]

Example 23 Before a disease breaks out, its presence can be determined by a biological test whose outcome is not entirely reliable. For a sick person the probability of a positive test result is 0.999; for a healthy person, on the contrary, a positive test appears with probability 0.01. The condition may therefore go undetected or be falsely indicated. We assume that the disease affects approximately 10% of the population. A person XY has tested positive. Calculate the probability that this person actually has the disease!
[91.7%; i.e. the hope of being healthy is 8.3%]
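A sketch of Example 23 with Bayes' formula (not part of the original text):

```python
# Disease prevalence 10%; P(positive | ill) = 0.999; P(positive | healthy) = 0.01.
p_ill = 0.10
p_pos_given_ill = 0.999
p_pos_given_healthy = 0.01

# Full probability of a positive test, then Bayes' formula.
p_pos = p_ill * p_pos_given_ill + (1 - p_ill) * p_pos_given_healthy
p_ill_given_pos = p_ill * p_pos_given_ill / p_pos
print(round(p_ill_given_pos, 3))  # 0.917
```

The low false-positive rate is what keeps the posterior high despite the 10% prevalence.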
1.10 Bernoulli's formula

If an experiment with success probability p is repeated n times independently, the probability of exactly k successes is

Pₖ(n) = C(n, k) pᵏ (1 − p)^{n−k} = C(n, k) pᵏ q^{n−k},  where q = 1 − p.

Example 24 Let us calculate the probability that in a 5-fold attempt we get the number six exactly 2 times.

Solution: p = 1/6, k = 2, n = 5:

P₂(5) = C(5, 2) · (1/6)² · (1 − 1/6)^{5−2} = 10 · (1/36) · (125/216) = 1250/7776 ≐ 0.160 = 16%.
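Bernoulli's formula can be sketched as a small helper function (not part of the original text):

```python
from math import comb

def bernoulli(n: int, k: int, p: float) -> float:
    """P_k(n) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example 24: probability of exactly two sixes in five throws of a die.
print(round(bernoulli(5, 2, 1/6), 3))  # 0.161
```

Summing the function over k = 0, …, n returns 1, which is a convenient sanity check.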
Similarly, for n = 100 and p = 0.7 one computes, e.g.,

P₆₀(100) + P₆₁(100) + P₆₂(100) = C(100, 60)·0.7⁶⁰·0.3⁴⁰ + C(100, 61)·0.7⁶¹·0.3³⁹ + C(100, 62)·0.7⁶²·0.3³⁸.
Chapter 2

Random variable

Sometimes it is useful to describe a random phenomenon by one of its numerical characteristics (e.g. size, weight, number of features, rates, etc.), which we call a random variable.

Definition 11 The concept of a random variable denotes a variable whose value is determined by the result of a random experiment. When the experiment is repeated, the value of the random variable changes due to random influences and cannot be determined before the experiment is carried out; a random variable is therefore specified by its probability distribution.

We denote random variables by small letters of the Greek alphabet, e.g. ξ, η, ζ.

2.1 Discrete probability distribution

Definition 12 We say that a random variable ξ has a discrete probability distribution if there is a finite or countable set of real numbers {x₁, x₂, x₃, …} such that

Σ_{i=1}^{∞} P(ξ = xᵢ) = 1.
For illustration, let ξ be the number of white balls obtained when drawing 3 balls from an urn with 2 white and 3 black balls. Then

P(ξ = 0) = C(3, 3)/C(5, 3) = 1/10,

P(ξ = 1) = C(2, 1)·C(3, 2)/C(5, 3) = 6/10,

P(ξ = 2) = C(2, 2)·C(3, 1)/C(5, 3) = 3/10,

and the probability distribution can be presented in a table:

xᵢ:  0     1     2
pᵢ:  1/10  6/10  3/10

In the previous example we tried to explain in as much detail as possible what the probability distribution of a random variable is and how it can be presented.

Notice also that the sum of the probabilities over all values of the random variable equals the number 1, i.e. it equals the probability of a complete system of events:

Σ_{i=1}^{3} P(ξ = xᵢ) = P(ξ = 0) + P(ξ = 1) + P(ξ = 2) = 1/10 + 6/10 + 3/10 = 1.
2.2 Distribution function of random variable

Definition 13 The distribution function of a random variable ξ is the function

F(x) = P(ξ < x),  for x ∈ (−∞, ∞),

for which

lim_{x→−∞} F(x) = 0  and  lim_{x→∞} F(x) = 1.
As an example, consider a random variable ξ with P(ξ = k) = P₃(k) = C(3, k)·(1/2)ᵏ·(1/2)^{3−k} (e.g. the number of heads in three tosses of a fair coin):

P(ξ = 0) = P₃(0) = C(3, 0)·(1/2)⁰·(1/2)³ = 1/8,
P(ξ = 1) = P₃(1) = C(3, 1)·(1/2)¹·(1/2)² = 3/8,
P(ξ = 2) = P₃(2) = C(3, 2)·(1/2)²·(1/2)¹ = 3/8,
P(ξ = 3) = P₃(3) = C(3, 3)·(1/2)³·(1/2)⁰ = 1/8.

The distribution function F(x) = P(ξ < x) is built up case by case:

x ≤ 0:  F(x) = P(ξ < x) = 0;
0 < x ≤ 1:  F(x) = P(ξ = 0) = 1/8;
1 < x ≤ 2:  F(x) = P[(ξ = 0) ∪ (ξ = 1)] = P(ξ = 0) + P(ξ = 1) = 1/8 + 3/8 = 4/8;
2 < x ≤ 3:  F(x) = P[(ξ = 0) ∪ (ξ = 1) ∪ (ξ = 2)] = 1/8 + 3/8 + 3/8 = 7/8;
3 < x:  F(x) = P[(ξ = 0) ∪ (ξ = 1) ∪ (ξ = 2) ∪ (ξ = 3)] = 1/8 + 3/8 + 3/8 + 1/8 = 1.

The distribution function can thus be written as

F(x) = 0    for x ≤ 0,
       1/8  for 0 < x ≤ 1,
       4/8  for 1 < x ≤ 2,
       7/8  for 2 < x ≤ 3,
       1    for 3 < x.

The distribution function can be graphically illustrated as a step function with jumps at the points 0, 1, 2, 3 (graph: F(x)).
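The step distribution function above can be sketched in code (not part of the original text); note the strict inequality F(x) = P(ξ < x) used throughout this text:

```python
# Probability distribution of the example: number of heads in three coin tosses.
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

def F(x: float) -> float:
    # Distribution function with the text's convention F(x) = P(ξ < x).
    return sum((p for value, p in pmf.items() if value < x), 0.0)

for x in (0, 1, 2, 3, 3.5):
    print(x, F(x))  # 0 0.0, 1 0.125, 2 0.5, 3 0.875, 3.5 1.0
```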
2.3 Density distribution

As we defined the distribution function for the discrete distribution, in the case of a continuous distribution the equivalent concept is the so-called density distribution.

Definition 14 We say that a random variable ξ has a continuous distribution if there is a nonnegative function f(x) for which

F(x) = P(ξ < x) = ∫_{−∞}^{x} f(t) dt.
2.3.1 Basic features of the density distribution

F(b) − F(a) = ∫_{a}^{b} f(x) dx,

from which the important relationships for calculating the probability of a random variable with a continuous distribution directly follow:

P(a < ξ < b) = ∫_{a}^{b} f(x) dx = F(b) − F(a),
P(a ≤ ξ < b) = ∫_{a}^{b} f(x) dx = F(b) − F(a),
P(a < ξ ≤ b) = ∫_{a}^{b} f(x) dx = F(b) − F(a),
P(a ≤ ξ ≤ b) = ∫_{a}^{b} f(x) dx = F(b) − F(a).

In the previous formulas we see that the boundary points do not matter in the calculation of the definite integral, and therefore it does not matter whether we use strict or non-strict inequalities in these relations.

Further,

∫_{−∞}^{∞} f(x) dx = 1,

in analogy with Σ_{i=1}^{∞} P(ξ = xᵢ) = 1 for a discrete random variable. Graphically this means that the area between the x-axis and the density function is equal to 1 (graph: density distribution f(x)).

When calculating the probability that a random variable falls into, say, the interval ⟨−1, 2⟩, we compute the definite integral

P(−1 ≤ ξ ≤ 2) = ∫_{−1}^{2} f(x) dx,

whose result is the difference of the values of the distribution function, i.e.

P(−1 ≤ ξ ≤ 2) = ∫_{−1}^{2} f(x) dx = F(2) − F(−1)

(graph: distribution function F(x)).
Example 29 Determine the constant c so that the function f(x) is the density distribution of a random variable ξ, and consequently calculate the probability that the random variable takes values from the interval ⟨−1, 1⟩, i.e. calculate P(−1 ≤ ξ ≤ 1):

f(x) = c x² e^{−x³}  for x ≥ 0,
f(x) = 0             for x < 0.

Solution: The density must satisfy ∫_{−∞}^{∞} f(x) dx = 1:

∫_{−∞}^{∞} f(x) dx = ∫_{0}^{∞} c x² e^{−x³} dx = (c/3) ∫_{0}^{∞} 3x² e^{−x³} dx
  [substitution t = x³, dt = 3x² dx; 0 → 0, ∞ → ∞]
  = (c/3) ∫_{0}^{∞} e^{−t} dt = (c/3) [−e^{−t}]_{0}^{∞} = (c/3)(0 + e⁰) = c/3,

and therefore c/3 = 1, i.e. c = 3:

f(x) = 3x² e^{−x³}  for x ≥ 0,
f(x) = 0            for x < 0.

P(−1 ≤ ξ ≤ 1) we calculate as follows:

P(−1 ≤ ξ ≤ 1) = ∫_{−1}^{1} f(x) dx = 0 + ∫_{0}^{1} 3x² e^{−x³} dx = [−e^{−x³}]_{0}^{1} = −e^{−1} + e⁰ = 1 − 1/e ≐ 0.632 = 63.2%.
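Example 29 can be checked numerically (a sketch, not part of the original text); a simple midpoint-rule integration of the density should match the closed form 1 − e⁻¹:

```python
from math import exp

def f(x: float) -> float:
    # Density from Example 29 with the constant c = 3 found above.
    return 3 * x * x * exp(-x**3) if x >= 0 else 0.0

# Midpoint-rule approximation of ∫ f(x) dx over (0, 1);
# the density vanishes on (-1, 0), so this equals P(-1 ≤ ξ ≤ 1).
n = 100_000
h = 1.0 / n
integral = sum(f((i + 0.5) * h) * h for i in range(n))
print(round(integral, 4), round(1 - exp(-1), 4))  # both ≈ 0.6321
```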
2.4 Numerical characteristics of random variable

The distribution function, the probability distribution and the density distribution describe the probabilistic behaviour of a random variable completely. In practice, however, we often use certain numerical characteristics to describe a random variable.

2.4.1 Mean value

The mean value of a discrete random variable ξ is defined as

E(ξ) = Σ_{i=1}^{n} xᵢ pᵢ,

and of a continuous random variable as

E(ξ) = ∫_{−∞}^{∞} x f(x) dx.

Properties of the mean value:
i) E(c) = c, where c is a constant,
ii) E(aξ + b) = a E(ξ) + b,
iii) E(ξ₁ + ξ₂ + … + ξₙ) = E(ξ₁) + E(ξ₂) + … + E(ξₙ),
iv) E(ξ₁ · ξ₂) = E(ξ₁) · E(ξ₂), if ξ₁, ξ₂ are independent.

For the discrete random variable with the distribution

xᵢ:  −1   0    1    2    3
pᵢ:  0.1  0.2  0.2  0.4  0.1

we get E(ξ) = (−1)·0.1 + 0·0.2 + 1·0.2 + 2·0.4 + 3·0.1 = 1.2.

For a continuous random variable with density f(x) = (1/2) sin x on ⟨0, π⟩ (and 0 elsewhere):

E(ξ) = ∫_{−∞}^{∞} x f(x) dx = (1/2) ∫_{0}^{π} x sin x dx = … = π/2.
2.4.2 Variance (dispersion) and standard deviation

The variance (dispersion) of a random variable ξ is defined as

D(ξ) = E[ξ − E(ξ)]²,

which can be rewritten as

D(ξ) = E(ξ²) − [E(ξ)]².

The dispersion of a discrete random variable is

D(ξ) = Σ_{i=1}^{n} [xᵢ − E(ξ)]² pᵢ,

and of a continuous one

D(ξ) = ∫_{−∞}^{∞} [x − E(ξ)]² f(x) dx.

Properties of the dispersion:
i) D(c) = 0, where c is a constant,
ii) D(aξ + b) = a² D(ξ),
iii) D(ξ₁ + ξ₂ + … + ξₙ) = D(ξ₁) + D(ξ₂) + … + D(ξₙ), for independent ξ₁, …, ξₙ.

The standard deviation σ, which is defined as σ = √D(ξ), has the same units as ξ.

For the discrete random variable with the distribution

xᵢ:  −1   0    1    2    3
pᵢ:  0.1  0.2  0.2  0.4  0.1

(for which E(ξ) = 1.2):

D(ξ) = (−1 − 1.2)²·0.1 + (0 − 1.2)²·0.2 + (1 − 1.2)²·0.2 + (2 − 1.2)²·0.4 + (3 − 1.2)²·0.1 = 1.36,

σ = √D(ξ) = √1.36 ≐ 1.166.

For the continuous random variable with density f(x) = (1/2) sin x on ⟨0, π⟩ and a = E(ξ), the second moment about a is

∫_{−∞}^{∞} (x − a)² f(x) dx = (1/2) ∫_{0}^{π} (x − a)² sin x dx = … = a² − πa + π²/2 − 2.
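The mean value and dispersion of the tabulated discrete variable can be sketched in a few lines (not part of the original text):

```python
# Discrete distribution from the examples of Sections 2.4.1 and 2.4.2.
xs = [-1, 0, 1, 2, 3]
ps = [0.1, 0.2, 0.2, 0.4, 0.1]

mean = sum(x * p for x, p in zip(xs, ps))            # E(ξ)
second_moment = sum(x * x * p for x, p in zip(xs, ps))  # E(ξ²)
dispersion = second_moment - mean**2                 # D(ξ) = E(ξ²) − [E(ξ)]²
print(round(mean, 2), round(dispersion, 2))  # 1.2 1.36
```

Using D(ξ) = E(ξ²) − [E(ξ)]² avoids recomputing the deviations term by term.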
Chapter 3

Significant continuous distribution of random variable

3.1 Normal distribution

The normal distribution (Laplace–Gauss distribution) is used wherever the fluctuation of the random variable is caused by the sum of many small influences independent of each other; e.g. the dimensions of a manufactured product are affected by fluctuating raw-material quality, the uniformity of machine processing, the attention of different workers, etc.

parameters: μ, σ²
density distribution: f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}, for x ∈ (−∞, ∞)
distribution function: F(x) = (1/(σ√(2π))) ∫_{−∞}^{x} e^{−(t−μ)²/(2σ²)} dt, for t ∈ (−∞, ∞)
mean value: E(ξ) = μ
dispersion: D(ξ) = σ²

A random variable ξ with the normal probability distribution with parameters μ and σ² is denoted ξ ∼ N(μ, σ²); its standard deviation is σ = √D(ξ).
(graph: density of the normal distribution)
For the normal distribution, P(μ − 3σ < ξ < μ + 3σ) = 0.997 = 99.7% (the three-sigma rule).

3.2 Standardized normal distribution

The standardized normal distribution has the parameters μ = 0, σ = 1:

density: φ(x) = (1/√(2π)) e^{−x²/2}, for x ∈ (−∞, ∞),
distribution function: Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt, for t ∈ (−∞, ∞),
E(ξ) = μ = 0, D(ξ) = σ² = 1.

(graphs: density and distribution function of N(0, 1))

The integral Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt cannot be determined analytically (the integrand has no elementary antiderivative), so the values of Φ(x) are tabulated. The values of Φ(x) satisfy

Φ(x) = 1 − Φ(−x),  for x ∈ (−∞, ∞).

E.g. Φ(−1.18) = 1 − Φ(1.18) = 1 − 0.881 = 0.119.

3.2.1 Conversion to the standardized normal distribution
(x)
Since the values are tabulated only for a standard normal distribution with
normal distribution we need to obtain the quantiles used a conversion, which
we now derive
F (x)
1
p
2
Zx
(t
)2
2 2
dt =
u=
t
1
du =
1!
x
x!
dt
1
1
p
2
u2
2
du =
Therefore
F (x) =
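Instead of tables, Φ can be evaluated with the error function from Python's standard library; this sketch (not part of the original text) also implements the conversion just derived:

```python
from math import erf, sqrt

def phi(x: float) -> float:
    """Standard normal distribution function Φ(x), via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def F(x: float, mu: float, sigma: float) -> float:
    """Distribution function of N(mu, sigma²): F(x) = Φ((x − μ)/σ)."""
    return phi((x - mu) / sigma)

print(round(phi(1.18), 3))   # 0.881, the tabulated value used in the text
print(round(phi(-1.18), 3))  # 0.119 = 1 − Φ(1.18)
```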
Example 35 The random variable ξ has a normal distribution N(μ, σ²). Calculate:
a) P(2 ≤ ξ ≤ 10);
b) P(ξ ≥ 0).

Solution:
a) P(2 ≤ ξ ≤ 10) = F(10) − F(2) = Φ((10 − μ)/σ) − Φ((2 − μ)/σ);
b) P(ξ ≥ 0) = P(0 ≤ ξ < ∞) = 1 − F(0) = 1 − Φ((0 − μ)/σ) = Φ(μ/σ).

A further example: let ξ ∼ N(0.8; 2²).
a) P(ξ < 1) = F(1) = Φ((1 − 0.8)/2) = Φ(0.1) = 0.5398 ≐ 54%;
b) P(ξ ≤ −1.16) = P(−∞ < ξ ≤ −1.16) = F(−1.16) = Φ((−1.16 − 0.8)/2) = Φ(−0.98) = 1 − Φ(0.98) = 1 − 0.8365 = 0.1635.

A further example: let ξ ∼ N(10; 0.02²).
P(ξ < 9.97) = F(9.97) = Φ((9.97 − 10)/0.02) = Φ(−1.5) = 1 − Φ(1.5) = 1 − 0.9332 = 0.0668;
P(ξ ≥ 10.024) = 1 − F(10.024) = 1 − Φ((10.024 − 10)/0.02) = 1 − Φ(1.2) = 1 − 0.8849 = 0.1151.

A further example: let ξ ∼ N(90; 15²).
P(ξ < 60) = F(60) = Φ((60 − 90)/15) = Φ(−2) = 1 − Φ(2) = 1 − 0.9772 = 0.0228;
P(ξ ≥ 80) = P(80 ≤ ξ < ∞) = 1 − F(80) = 1 − Φ((80 − 90)/15) = 1 − Φ(−0.67) = Φ(0.67) ≐ 0.75 = 75%.
Chapter 4

Descriptive statistics

In this chapter we consider a statistical set of N units, on each of which we measure the value of the investigated random variable ξ in a statistical survey (random experiment); this means that we obtain values xᵢ (for i = 1, 2, …, N) of the examined variable ξ. Here xᵢ represents a particular value of the random variable.

For a value that occurs nᵢ times we define:
the relative frequency pᵢ = nᵢ/N,
the cumulative (absolute) frequency Nᵢ = n₁ + n₂ + … + nᵢ, where Σᵢ nᵢ = N,
and the cumulative relative frequency Mᵢ = Nᵢ/N.
xᵢ:  1     2     3     4     5     6     Σ = 30
nᵢ:  3     6     7     8     4     2
pᵢ:  3/30  6/30  7/30  8/30  4/30  2/30  Σ = 1
Nᵢ:  3     8     16    24    28    30
Mᵢ:  3/30  8/30  16/30 24/30 28/30 30/30 = 1
For grouped data we first compute the range

R = x_max − x_min = 116 − 34.8 = 81.2,

and with k = 9 classes the class width is

h = R/k = (range)/(number of classes) = 81.2/9 = 9.0222 ≐ 10.
xⱼ:  35    45    55    65    75    85    95    105   115   Σ = 30
nⱼ:  1     2     5     7     7     4     2     1     1
pⱼ:  1/30  2/30  5/30  7/30  7/30  4/30  2/30  1/30  1/30  Σ = 1
Nⱼ:  1     3     8     15    22    26    28    29    30
Mⱼ:  1/30  3/30  8/30  15/30 22/30 26/30 28/30 29/30 30/30 = 1
Basically the procedure was similar to the previous example, but note the index j, where j = 1, 2, …, k and k represents the number of classes.

For visual understanding it is best to represent the results graphically. For this purpose a histogram is commonly used. A histogram is essentially a column diagram in which the values of the random variable representing the classes are plotted on the x-axis and the corresponding absolute (or relative) frequencies on the y-axis.

(histogram of the class frequencies)
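The frequency columns of the first table in this chapter can be reproduced with a short sketch (not part of the original text):

```python
from collections import Counter

# Raw data reconstructed from the first frequency table:
# values 1..6 with absolute frequencies 3, 6, 7, 8, 4, 2.
data = [1]*3 + [2]*6 + [3]*7 + [4]*8 + [5]*4 + [6]*2
N = len(data)            # 30
counts = Counter(data)   # absolute frequencies n_i

cumulative = 0
for value in sorted(counts):
    n_i = counts[value]
    cumulative += n_i    # cumulative frequency N_i
    # value, n_i, p_i = n_i/N, N_i, M_i = N_i/N
    print(value, n_i, n_i / N, cumulative, cumulative / N)
```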
Chapter 5

Estimates of parameters

By an estimate we mean a statistical method by which unknown parameters of statistical sets are approximately determined (estimated).

Let ξ₁, ξ₂, …, ξₙ be a random selection from a distribution that depends on an unknown parameter θ; the parameter θ can take only certain values from the region Ω. Through estimation theory we try to construct a statistic T(ξ₁, ξ₂, …, ξₙ) whose distribution comes as close as possible to the parameter θ.

Estimates in which we seek a particular parameter are called parametric estimates. Nonparametric estimates are those for which no parametric specification of the type of the probability distribution is required.
5.1 Point estimate

A point estimate consists in replacing the unknown parameter value of the population, or a function of it, by the value of a selection characteristic.

On a point estimate we place requirements of consistency and unbiasedness.

A consistent point estimator is a statistic Tₙ = T(ξ₁, ξ₂, …, ξₙ) of the basic set that, for sufficiently large values of the index n, satisfies the condition

P(|Tₙ − θ| ≥ ε) < δ

for any ε > 0 and δ > 0; i.e. we require that the parameter θ belongs, with probability at least 1 − δ, to an interval whose radius is smaller than the arbitrarily small but positive ε, where δ is any positive number, usually chosen as close to zero as possible. In other words, a point estimate is consistent if it lies in the smallest possible interval with the greatest possible probability.

An unbiased point estimate of the parameter θ is a statistic Tₙ = T(ξ₁, ξ₂, …, ξₙ) of the basic set whose mean value satisfies E(Tₙ) = θ. Otherwise we speak of a biased estimate; the difference b(θ) = E(Tₙ) − θ is called the bias of the parameter estimate. With a growing sample size n, the bias of a reasonable estimator should vanish.
The best unbiased point estimate of the dispersion D(ξ) = σ² of the basic set is the sample variance

Sx² = (1/(n − 1)) Σ_{i=1}^{n} (xᵢ − x̄)².

The best unbiased point estimate of the standard deviation σ = √D(ξ) of the basic set is the sample standard deviation

Sx = √Sx² = √( Σ_{i=1}^{n} (xᵢ − x̄)² / (n − 1) ).

For data grouped into k classes these become

Sx² = (1/(N − 1)) Σ_{j=1}^{k} nⱼ (xⱼ − x̄)²,   Sx = √( Σ_{j=1}^{k} nⱼ (xⱼ − x̄)² / (N − 1) ).

In the following example we calculate x̄, Sx², Sx.
For the frequency table

xⱼ:  35  45  55  65  75  85  95  105  115
nⱼ:  1   2   5   7   7   4   2   1    1

x̄ = (1/30)(1·35 + 2·45 + 5·55 + 7·65 + 7·75 + 4·85 + 2·95 + 1·105 + 1·115) = 71,

Sx² = (1/(N − 1)) Σ_{j=1}^{9} nⱼ (xⱼ − x̄)²
    = (1/29)[1·(35 − 71)² + 2·(45 − 71)² + 5·(55 − 71)² + 7·(65 − 71)² + 7·(75 − 71)² + 4·(85 − 71)² + 2·(95 − 71)² + 1·(105 − 71)² + 1·(115 − 71)²]
    = 321.38.

The sample standard deviation is thus

Sx = √Sx² = √321.38 = 17.927.

For comparison, the value of the sample mean calculated from the original (ungrouped) values is

x̄ = (1/n) Σ_{i=1}^{n} xᵢ = 70.457.
For another data set:
a) of the I. kind: x̄ = (1/n) Σ xᵢ = 307.42;
b) of the I. kind: the mode, as the most frequently occurring value, is 310, which occurred 5 times;
c) of the I. kind: the median is also the value 310, because after arranging the values in ascending order it is the value in the middle of the arrangement.
a) of the II. kind: [x̄ = 301];
b) of the II. kind: [mode = 313, occurring 4 times];
c) of the II. kind: [median = 313].
5.2 Interval estimation of parameters

In the previous section we estimated the unknown parameter by a point, i.e. the unknown parameter was "replaced" by the particular value that best estimated it. Understandably, a larger sample gives more accurate results than a smaller one, but the point-estimate method disregards this fact.

Another possibility is to estimate the parameter by interval estimation: the unknown parameter is estimated by an interval, meaning that it lies between two values. The centre of the interval is a kind of mean-value estimate of the parameter, and the width of the interval represents the degree of dispersion of its values.

This interval is called the confidence interval (Tₐ, T_b); the unknown parameter is contained in this interval with probability 100(1 − α)%, where the number α is called the significance level. This level is chosen in advance and expresses the required "degree of accuracy" with which we look for the interval in which the parameter is located.

If the selection is repeated many times (given the tendency of stochastic processes towards stability), the unknown parameter θ "falls" into the confidence interval (Tₐ, T_b) in about 100(1 − α)% of the cases (i.e. the probability that the parameter lies within the interval (Tₐ, T_b) is equal to the number 1 − α).

We then speak of the so-called 100(1 − α)% confidence interval and write

P(Tₐ ≤ θ ≤ T_b) = 1 − α.

In the event that the boundaries Tₐ, T_b are finite, we speak of a bilateral confidence interval.
If Tₐ = −∞, i.e. (−∞, T_b) … a right-sided confidence interval.
If T_b = ∞, i.e. (Tₐ, ∞) … a left-sided confidence interval.

The boundaries depend on the estimated parameter, on the random selection and on the distribution. In what follows, however, we restrict ourselves to the normal distribution N(μ, σ²), for which we estimate the parameters μ and σ².
5.2.1 100(1 − α)% bilateral confidence interval for the mean value μ

The sample mean x̄ of a selection from N(μ, σ²) has mean value E(x̄) = μ and dispersion D(x̄) = σ²/n, and so x̄ ∼ N(μ, σ²/n). The standardized variable u = (x̄ − μ)/(σ/√n) ∼ N(0, 1), and

P(−u_{1−α/2} ≤ (x̄ − μ)/(σ/√n) ≤ u_{1−α/2}) = 1 − α,

from which

x̄ − (σ/√n) u_{1−α/2} ≤ μ ≤ x̄ + (σ/√n) u_{1−α/2}.
So far we have assumed that we know the value of σ. If we do not know it, for a sufficiently large statistical set (n ≥ 30) we can point-estimate the standard deviation by the sample standard deviation Sx.

In practice, however, whether for economic or purely practical reasons, sets often occur that are not too large (e.g. when determining quality the product is destroyed, or monitoring some phenomenon is too demanding in time or money). And if the set is small, using a point estimate would introduce significant errors.

In 1908 the English chemist W. S. Gosset², an employee of the Arthur Guinness & Son brewery, published under the pseudonym Student a work that dealt precisely with small samples. He derived the sample distribution of the statistic for small sets, i.e. n < 30. This distribution is called Student's t-distribution.

The parameter of this distribution is the number of degrees of freedom. If we have a random set with a range of n elements, then the Student t-distribution has (n − 1) degrees of freedom. The values t_{1−α/2}^{(n−1)} of Student's t-distribution for different degrees of freedom and significance levels α are tabulated and listed in the Appendix.
(graph: density of Student's t-distribution compared with N(0, 1))

The graph of the probability distribution is very similar to that of the standardized normal distribution, but the curve of the Student distribution is more "rounded" around the mean value. By comparing the probability densities of these two distributions it is not difficult to prove that

lim_{n→∞} t(n) = N(0, 1).
For large sets (n ≥ 30) we may thus use

x̄ − (Sx/√n) u_{1−α/2} ≤ μ ≤ x̄ + (Sx/√n) u_{1−α/2}.

For small sets (n < 30) we use for the confidence interval for the mean value μ at significance level α the values t_{1−α/2}^{(n−1)} of Student's t-distribution with (n − 1) degrees of freedom:

x̄ − (Sx/√n) t_{1−α/2}^{(n−1)} ≤ μ ≤ x̄ + (Sx/√n) t_{1−α/2}^{(n−1)}.
Example 43 An airline estimates the average number of passengers. Over 20 days, the average number of passengers was 112, with sample variance 25. Find a 95% bilateral confidence interval for the average number of passengers μ.

Solution: x̄ = 112, Sx² = 25 (i.e. Sx = √25 = 5), n = 20, α = 0.05.

Since the set is small (n = 20 < 30), we use Student's t-distribution, where

t_{1−α/2}^{(20−1)} = t_{0.975}^{(19)} = 2.1,

and the interval estimate is:

x̄ − (Sx/√n) t_{1−α/2}^{(n−1)} ≤ μ ≤ x̄ + (Sx/√n) t_{1−α/2}^{(n−1)},
112 − (5/√20)·2.1 ≤ μ ≤ 112 + (5/√20)·2.1,
112 − 2.35 ≤ μ ≤ 112 + 2.35,
109.65 ≤ μ ≤ 114.35.
Example 44 Solution: x̄ = 70.012, Sx² = 0.00723 (i.e. Sx = √0.00723 = 0.08503), n = 50, α = 0.01.

Since the set is large (n = 50 > 30), we use the standardized normal distribution, where u_{1−α/2} = u_{0.995} = 2.57, and the interval estimate is:

x̄ − (Sx/√n) u_{0.995} ≤ μ ≤ x̄ + (Sx/√n) u_{0.995},
70.012 − (0.08503/√50)·2.57 ≤ μ ≤ 70.012 + (0.08503/√50)·2.57,
70.012 − 0.0309 ≤ μ ≤ 70.012 + 0.0309,
69.981 ≤ μ ≤ 70.0429.
Example 45 Measuring the resistance of a cable, eight randomly selected samples gave the following values:
0.139, 0.144, 0.139, 0.140, 0.136, 0.143, 0.141, 0.136.
Assume that the measured values can be considered a random realization from a normal distribution with unknown mean and unknown variance. Find a 95% confidence interval for the mean value.
Solution: Since we know neither the mean value $E(X)$ nor the standard deviation $\sigma$, we use instead the sample mean $\bar{x}$ and the sample standard deviation $S_x$:
$$\bar{x} = \frac{1}{8}(0.139 + 0.144 + 0.139 + 0.140 + 0.136 + 0.143 + 0.141 + 0.136) = 0.13975,$$
$$S_x^2 = \frac{1}{7}\big[(0.139-0.13975)^2 + (0.144-0.13975)^2 + (0.139-0.13975)^2 + (0.140-0.13975)^2 + (0.136-0.13975)^2 + (0.143-0.13975)^2 + (0.141-0.13975)^2 + (0.136-0.13975)^2\big] = 8.5\cdot 10^{-6} = 0.0000085,$$
and so
$$S_x = \sqrt{S_x^2} = \sqrt{8.5\cdot 10^{-6}} = 2.9155\cdot 10^{-3} = 0.0029155.$$
The sample is small, therefore we use the t-distribution quantile $t^{(7)}_{0.975} = 2.365$. Thus for the confidence interval:
$$\bar{x} - \frac{S_x}{\sqrt{n}}\,t^{(7)}_{0.975} \le \mu \le \bar{x} + \frac{S_x}{\sqrt{n}}\,t^{(7)}_{0.975},$$
$$0.13975 - \frac{0.0029155}{\sqrt{8}}\cdot 2.365 \le \mu \le 0.13975 + \frac{0.0029155}{\sqrt{8}}\cdot 2.365,$$
$$0.13975 - 2.4378\cdot 10^{-3} \le \mu \le 0.13975 + 2.4378\cdot 10^{-3},$$
$$0.13731 \le \mu \le 0.14219.$$
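As a numerical cross-check of Example 45, a minimal Python sketch (the helper name is illustrative, not part of the text; the t quantile is taken from the table in the Appendix):

```python
import math

def t_confidence_interval(xbar, s, n, t_quantile):
    """Two-sided confidence interval for the mean of a small sample,
    given a tabulated Student t quantile t_{1-alpha/2}^{(n-1)}."""
    half = t_quantile * s / math.sqrt(n)
    return xbar - half, xbar + half

# Example 45: n = 8, t_{0.975}^{(7)} = 2.365
lo, hi = t_confidence_interval(0.13975, 0.0029155, 8, 2.365)
print(round(lo, 5), round(hi, 5))  # 0.13731 0.14219
```

The same helper reproduces Example 43 with `t_confidence_interval(112, 5, 20, 2.1)`.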
5.2.2 $100\cdot(1-\alpha)\%$ bilateral confidence interval for dispersion $\sigma^2$
In many cases it is important to monitor not only the reliability of the average value of a set, but also the degree of variability of the set. Here we have in mind particularly the effort to reduce deviations from the average. It is clear that a manufacturer of screws with a 5 cm standard would hardly succeed with half of the production at 5.5 cm and the other half at 4.5 cm.
To construct a confidence interval for the standard deviation we use another sampling distribution, called the chi-square distribution ($\chi^2$).
The statistic
$$\chi^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sigma^2} = \frac{(n-1)\,S_x^2}{\sigma^2}$$
has the $\chi^2$-distribution3 with $(n-1)$ degrees of freedom. From it we can, with confidence $100\cdot(1-\alpha)\%$, construct the interval
$$\frac{(n-1)\,S_x^2}{\chi^2_{1-\frac{\alpha}{2}}} \le \sigma^2 \le \frac{(n-1)\,S_x^2}{\chi^2_{\frac{\alpha}{2}}},$$
where $\chi^2_{\frac{\alpha}{2}}$ and $\chi^2_{1-\frac{\alpha}{2}}$ are the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the $\chi^2$-distribution.
[Figure: graph of the density $f(x)$ of the $\chi^2$-distribution.]
For the standard deviation $\sigma$ we therefore take square roots of the interval
$$\frac{(n-1)\,S_x^2}{\chi^2_{1-\frac{\alpha}{2}}} \le \sigma^2 \le \frac{(n-1)\,S_x^2}{\chi^2_{\frac{\alpha}{2}}},$$
which gives
$$\sqrt{\frac{(n-1)\,S_x^2}{\chi^2_{1-\frac{\alpha}{2}}}} \le \sigma \le \sqrt{\frac{(n-1)\,S_x^2}{\chi^2_{\frac{\alpha}{2}}}}.$$
Example 46 Let the systematic error of a measuring device be zero. Under the same conditions, ten independent measurements of one and the same value $\mu$ were carried out, where $\mu = 1000$ m. The specific values are given below:

i:        1    2    3    4   5   6    7    8   9    10
x_i [m]: 992 1010 1005  994 998 1000 1002 999 1000  997

Find a 90% confidence interval for the standard deviation $\sigma$.
Solution: We need to calculate the value of the sample variance $S_x^2$:
$$S_x^2 = \frac{1}{9}\big[(992-1000)^2 + (1010-1000)^2 + (1005-1000)^2 + (994-1000)^2 + (998-1000)^2 + (1000-1000)^2 + (1002-1000)^2 + (999-1000)^2 + (1000-1000)^2 + (997-1000)^2\big] = 27.$$
$$\sqrt{\frac{(10-1)\,S_x^2}{\chi^2_{0.95}}} \le \sigma \le \sqrt{\frac{(10-1)\,S_x^2}{\chi^2_{0.05}}},$$
$$\sqrt{\frac{9\cdot 27}{16.919}} \le \sigma \le \sqrt{\frac{9\cdot 27}{3.325}},$$
$$\sqrt{14.363} \le \sigma \le \sqrt{73.08},$$
$$3.7899 \le \sigma \le 8.549.$$
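The interval of Example 46 can be cross-checked with a short Python sketch (the helper name is illustrative; the $\chi^2$ quantiles are taken from the table in the Appendix):

```python
import math

def sigma_confidence_interval(s2, n, chi2_upper, chi2_lower):
    """Two-sided confidence interval for the standard deviation sigma,
    given tabulated quantiles chi2_{1-alpha/2;(n-1)} and chi2_{alpha/2;(n-1)}."""
    lo = math.sqrt((n - 1) * s2 / chi2_upper)
    hi = math.sqrt((n - 1) * s2 / chi2_lower)
    return lo, hi

# Example 46: n = 10, S_x^2 = 27, chi2_{0.95;(9)} = 16.919, chi2_{0.05;(9)} = 3.325
lo, hi = sigma_confidence_interval(27, 10, 16.919, 3.325)
print(round(lo, 3), round(hi, 3))  # approximately 3.79 and 8.549
```

Note that the interval for $\sigma$ is not symmetric around $S_x = \sqrt{27} \approx 5.2$, because the $\chi^2$-distribution itself is skewed.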
Chapter 6

Testing statistical hypotheses

By a statistical hypothesis we understand some claim about the distribution of the basic statistical population, or about its parameters (for parametric tests). Verification of the truth of such a claim on the basis of a random sample is called hypothesis testing.
In what follows we restrict ourselves to parametric testing. We test the parameters: the mean $\mu$ and the variance $\sigma^2$ (or the standard deviation $\sigma$). Generally we denote the tested parameter $\theta$.
6.1 Testing hypotheses about a single parameter

We test a null hypothesis $H_0\colon \theta = \theta_0$ against an alternative hypothesis $H_1$, which may be two-sided ($H_1\colon \theta \ne \theta_0$) or one-sided.
[Figures: density $f(X)$ with the acceptance region of $H_0$; the left-tail rejection region of $H_0$; the right-tail rejection region of $H_0$.]
When testing statistical hypotheses, we proceed as follows:
1. we determine the null hypothesis $H_0$ and the alternative hypothesis $H_1$,
2. we choose the test statistic,
3. we determine the significance level $\alpha$ and the corresponding rejection region,
4. we compute the value of the test statistic and decide: if it falls in the rejection region, we reject $H_0$ in favour of $H_1$; otherwise we do not reject $H_0$.
6.1.1 Testing parameter $\mu$ (small samples, $n < 30$)

Null hypothesis: $H_0\colon \mu = \mu_0$
Test statistic:
$$T = \frac{\bar{x}-\mu_0}{S_x}\sqrt{n}.$$
a) alternative hypothesis $H_1\colon \mu \ne \mu_0$,
   rejection region of $H_0$: $|T| \ge t^{(n-1)}_{1-\frac{\alpha}{2}}$;
b) alternative hypothesis $H_1\colon \mu < \mu_0$ (left-side test),
   rejection region of $H_0$: $T \le -t^{(n-1)}_{1-\alpha}$;
c) alternative hypothesis $H_1\colon \mu > \mu_0$ (right-side test),
   rejection region of $H_0$: $T \ge t^{(n-1)}_{1-\alpha}$.
For the example of the Tokyo land prices (the agency claimed a 49% average increase; the sample of $n = 18$ gave $\bar{x} = 38$ and $S_x = 14$, with $\alpha = 0.01$), the test statistic is
$$T = \frac{\bar{x}-\mu_0}{S_x}\sqrt{n} = \frac{38-49}{14}\sqrt{18} = -3.33,$$
and the quantile is
$$t^{(17)}_{1-\frac{0.01}{2}} = t^{(17)}_{0.995} = 2.898232.$$
The rejection region of $H_0$ is $|T| \ge t^{(n-1)}_{1-\frac{\alpha}{2}}$, and indeed
$$|-3.33| \ge 2.898232.$$
Therefore we reject the null hypothesis $H_0$, since the value $-3.33$ lies in the rejection region of $H_0$.
The agency's claim that the average price of land in the central part of Tokyo for the first six months of 1986 increased by 49% is thus rejected.
Example 49 A manufacturer states that the average lifetime of the reflectors it produces is 70 hours. A competing firm believes that it is in fact lower, so it decided to prove that the manufacturer's claim is not correct. It randomly selected 20 reflectors and found that their average life was 67 hours with a standard deviation of 5 hours. At significance level $\alpha = 0.05$ verify whether the manufacturer's claim is actually incorrect.
Solution: We determine the null hypothesis $H_0\colon \mu = 70$ and the alternative hypothesis $H_1\colon \mu < 70$. Now we calculate the test statistic
$$T = \frac{\bar{x}-\mu_0}{S_x}\sqrt{n} = \frac{67-70}{5}\sqrt{20} = -2.6833,$$
and the quantile
$$-t^{(19)}_{0.95} = -1.729131.$$
On this basis we reject the hypothesis $H_0$, since $T \le -t^{(19)}_{0.95}$, and so we accept $H_1$. Thus the presumption of the competing firm is confirmed.
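The decision in Example 49 can be replayed in a few lines of Python (the function name is illustrative; the t quantile comes from the table in the Appendix):

```python
import math

def one_sample_t_statistic(xbar, mu0, s, n):
    """Test statistic T = (xbar - mu0) / S_x * sqrt(n)."""
    return (xbar - mu0) / s * math.sqrt(n)

# Example 49: left-side test H1: mu < 70 at alpha = 0.05, t_{0.95}^{(19)} = 1.729131
T = one_sample_t_statistic(67, 70, 5, 20)
reject_h0 = T <= -1.729131
print(round(T, 4), reject_h0)  # -2.6833 True
```

The left-side rejection rule `T <= -t` mirrors case b) of Section 6.1.1.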
6.1.2 Testing parameter $\mu$ (large samples, $n \ge 30$)

Null hypothesis: $H_0\colon \mu = \mu_0$
Test statistic:
$$T = \frac{\bar{x}-\mu_0}{S_x}\sqrt{n}.$$
a) alternative hypothesis $H_1\colon \mu \ne \mu_0$,
   rejection region of $H_0$: $|T| \ge u_{1-\frac{\alpha}{2}}$;
b) alternative hypothesis $H_1\colon \mu < \mu_0$ (left-side test),
   rejection region of $H_0$: $T \le -u_{1-\alpha}$;
c) alternative hypothesis $H_1\colon \mu > \mu_0$ (right-side test),
   rejection region of $H_0$: $T \ge u_{1-\alpha}$.

In the example of the new algorithm, the quantile is
$$u_{1-\alpha} = u_{0.90} = 1.28.$$
We see that the test statistic falls outside the acceptance region of hypothesis $H_0$; it must therefore fall within the rejection region of $H_0$, and so at significance level 0.1 it can be argued that the average time of the new algorithm is faster.
6.1.3 Testing parameter $\sigma^2$

Null hypothesis: $H_0\colon \sigma^2 = \sigma_0^2$
Test statistic:
$$\chi^2 = \frac{(n-1)\,S_x^2}{\sigma_0^2}.$$
a) alternative hypothesis $H_1\colon \sigma^2 \ne \sigma_0^2$,
   rejection region of $H_0$: $\chi^2 \ge \chi^2_{1-\frac{\alpha}{2};(n-1)}$ or $\chi^2 \le \chi^2_{\frac{\alpha}{2};(n-1)}$;
b) alternative hypothesis $H_1\colon \sigma^2 < \sigma_0^2$ (left-side test),
   rejection region of $H_0$: $\chi^2 \le \chi^2_{\alpha;(n-1)}$;
c) alternative hypothesis $H_1\colon \sigma^2 > \sigma_0^2$ (right-side test),
   rejection region of $H_0$: $\chi^2 \ge \chi^2_{1-\alpha;(n-1)}$.

Example 51 The standard deviation of a particular substance in tablets manufactured by a pharmaceutical company must not exceed 0.45 milligrams. If this amount is exceeded, a correction must be made in the setting of the production line. An inspector randomly selected 25 tablets and found that the dispersion of the content of the substance being studied is 0.2383. What should be concluded, admitting a probability of error of the first kind of 0.05 (significance level $\alpha = 0.05$)?
Solution: $n = 25$; $\alpha = 0.05$; $\sigma_0 = 0.45$; $S_x^2 = 0.2383$;
$H_0\colon \sigma = 0.45$,
$H_1\colon \sigma > 0.45$.
Let us calculate the test statistic
$$\chi^2 = \frac{(n-1)\,S_x^2}{\sigma_0^2} = \frac{(25-1)\cdot 0.2383}{(0.45)^2} = 28.243.$$
The quantile is
$$\chi^2_{1-\alpha;(n-1)} = \chi^2_{0.95;(24)} = 36.415.$$
Since $28.243 < 36.415$, the test statistic does not fall in the rejection region, so we do not reject $H_0$: no correction of the production line is needed.
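The computation of Example 51 in a minimal Python sketch (the helper name is illustrative; the quantile is taken from the $\chi^2$ table in the Appendix):

```python
def chi2_test_statistic(s2, sigma0, n):
    """Test statistic chi^2 = (n - 1) * S_x^2 / sigma_0^2."""
    return (n - 1) * s2 / sigma0 ** 2

# Example 51: right-side test H1: sigma > 0.45, chi2_{0.95;(24)} = 36.415
stat = chi2_test_statistic(0.2383, 0.45, 25)
reject_h0 = stat >= 36.415
print(round(stat, 3), reject_h0)  # 28.243 False
```

`reject_h0` is False, matching the conclusion that the production line needs no correction.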
6.2 Comparing the parameters of two populations

6.2.1 Testing the hypothesis $\mu_1 = \mu_2$ (known variances)

Null hypothesis: $H_0\colon \mu_1 = \mu_2$
Test statistic:
$$U = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}}.$$
a) alternative hypothesis $H_1\colon \mu_1 \ne \mu_2$,
   rejection region of $H_0$: $|U| \ge u_{1-\frac{\alpha}{2}}$;
b) alternative hypothesis $H_1\colon \mu_1 < \mu_2$ (left-side test),
   rejection region of $H_0$: $U \le -u_{1-\alpha}$;
c) alternative hypothesis $H_1\colon \mu_1 > \mu_2$ (right-side test),
   rejection region of $H_0$: $U \ge u_{1-\alpha}$.
Example 52 Site selection for a new store depends on many factors. One is the level of household income in the area around the proposed site. Suppose that a large chain must decide whether to build its next store in town A or town B. Although construction costs are lower in town B, the company will decide to build in town A if the average monthly household income there is higher than in town B. A survey of 100 randomly selected households in each town found that the average monthly income is €4380 in town A and €4050 in town B. From other sources it is known that the standard deviation of monthly household income is €520 for the inhabitants of town A and €600 for the residents of town B.
Can one say at the 5% significance level that the average monthly household income in town A exceeds the average monthly household income in town B? Assume that incomes in both towns have a normal distribution.
Solution: Let us formulate the hypotheses
$H_0\colon \mu_1 = \mu_2$,
$H_1\colon \mu_1 > \mu_2$.
The test statistic is
$$U = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}} = \frac{4380-4050}{\sqrt{\dfrac{520^2}{100}+\dfrac{600^2}{100}}} = 4.1563.$$
Since $U = 4.1563 \ge u_{0.95} = 1.645$, the test statistic falls in the rejection region, so we reject $H_0$: the average monthly income in town A is significantly higher, and the store should be built in town A.
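A short Python sketch of the two-sample statistic of Example 52 (the function name is illustrative, not part of the text):

```python
import math

def two_sample_u_statistic(x1, x2, sd1, sd2, n1, n2):
    """U = (x1bar - x2bar) / sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    return (x1 - x2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Example 52: right-side test H1: mu1 > mu2 at alpha = 0.05, u_{0.95} = 1.645
U = two_sample_u_statistic(4380, 4050, 520, 600, 100, 100)
print(round(U, 4), U >= 1.645)  # 4.1563 True
```

The same helper applies whenever both population standard deviations are known.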
6.2.2 Testing the hypothesis $\mu_1 = \mu_2$ (unknown variances)

Null hypothesis: $H_0\colon \mu_1 = \mu_2$
Test statistic:
$$U = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{S_{x_1}^2}{n_1}+\dfrac{S_{x_2}^2}{n_2}}}.$$
For smaller samples the rejection regions use the quantiles of Student's t-distribution with $(n_1+n_2-2)$ degrees of freedom:
a) alternative hypothesis $H_1\colon \mu_1 \ne \mu_2$: $|U| \ge t^{(n_1+n_2-2)}_{1-\frac{\alpha}{2}}$;
b) alternative hypothesis $H_1\colon \mu_1 < \mu_2$ (left-side test): $U \le -t^{(n_1+n_2-2)}_{1-\alpha}$;
c) alternative hypothesis $H_1\colon \mu_1 > \mu_2$ (right-side test): $U \ge t^{(n_1+n_2-2)}_{1-\alpha}$.

6.2.3 Testing the hypothesis $\mu_1 = \mu_2$ (unknown variances, large samples)

For large samples the same test statistic is compared with the quantiles of the standardized normal distribution:
a) alternative hypothesis $H_1\colon \mu_1 \ne \mu_2$: $|U| \ge u_{1-\frac{\alpha}{2}}$;
b) alternative hypothesis $H_1\colon \mu_1 < \mu_2$ (left-side test): $U \le -u_{1-\alpha}$;
c) alternative hypothesis $H_1\colon \mu_1 > \mu_2$ (right-side test): $U \ge u_{1-\alpha}$.
Example 53 Several years ago, credit-card users fell into distinct segments. Generally, people with higher incomes and spending tended to hold an American Express card, while people with lower incomes and spending made more use of VISA cards. For this reason, Visa intensified its efforts to penetrate the higher-income groups as well, trying through ads in magazines and on television to make a stronger impression on these people. After some time it asked a consulting company to determine whether the average monthly payments made with the American Express Gold Card are roughly equal to the payments made with Preferred VISA cards. The company carried out a survey in which it randomly selected 1,200 Preferred Visa card holders and found that their average monthly payments were $452 with a sample standard deviation of $212. Independently of this, it randomly selected 800 Gold Card holders, whose average monthly payments amounted to $523 with a sample standard deviation of $185. Holders of both cards were excluded from the survey. The survey results suggested a difference between the average payments made with the Preferred VISA and the Gold Card; let us verify this hypothesis at significance level 0.01.
Solution: Let us formulate the hypotheses
$H_0\colon \mu_1 = \mu_2$,
$H_1\colon \mu_1 \ne \mu_2$.
The test statistic is
$$U = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{S_{x_1}^2}{n_1}+\dfrac{S_{x_2}^2}{n_2}}} = \frac{452-523}{\sqrt{\dfrac{212^2}{1200}+\dfrac{185^2}{800}}} = -7.9264.$$
For $\alpha = 0.01$ we have $u_{1-\frac{\alpha}{2}} = u_{0.995} = 2.57$; since $|-7.9264| \ge 2.57$, we reject $H_0$: the difference between the average payments made with the two cards is statistically significant.
6.2.4 Testing the hypothesis $\mu_1 = \mu_2$ (unknown equal variances, small samples)

If the standard deviations are unknown, comparing $\mu_1, \mu_2$ using independent random samples of small size requires, in addition to the independence of the samples and the normality of the underlying populations, the additional condition that the variances of both populations are equal ($\sigma_1 = \sigma_2$).
Denote this common variance of both populations $\sigma^2$. Its value, of course, we also do not know, and therefore we estimate it by the pooled sample variance $S_p^2$, computed from the two sample variances. The variance estimate from the first population has $(n_1-1)$ degrees of freedom and the variance estimate from the second population has $(n_2-1)$ degrees of freedom; the pooled estimate has the form:
$$S_p^2 = \frac{(n_1-1)\,S_{x_1}^2+(n_2-1)\,S_{x_2}^2}{n_1+n_2-2}.$$
The test statistic for the test of conformity of the averages of two populations, assuming equal variances, for small samples and the null hypothesis $H_0\colon \mu_1 = \mu_2$ is
$$U = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{S_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}.$$
a) alternative hypothesis $H_1\colon \mu_1 \ne \mu_2$,
   rejection region of $H_0$: $|U| \ge t^{(n_1+n_2-2)}_{1-\frac{\alpha}{2}}$;
b) alternative hypothesis $H_1\colon \mu_1 < \mu_2$ (left-side test),
   rejection region of $H_0$: $U \le -t^{(n_1+n_2-2)}_{1-\alpha}$;
c) alternative hypothesis $H_1\colon \mu_1 > \mu_2$ (right-side test),
   rejection region of $H_0$: $U \ge t^{(n_1+n_2-2)}_{1-\alpha}$.
For the CD-player sales example ($n_1 = 14$, $n_2 = 11$, $\bar{x}_1 = 39600$, $\bar{x}_2 = 41200$, $S_{x_1} = 5060$, $S_{x_2} = 4010$) the test statistic is
$$U = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{S_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}} = \frac{39600-41200}{\sqrt{\dfrac{13\cdot 5060^2+10\cdot 4010^2}{23}\left(\dfrac{1}{14}+\dfrac{1}{11}\right)}} = -0.85717.$$
For $\alpha = 0.05$ the quantile is
$$t^{(n_1+n_2-2)}_{1-\alpha} = t^{(14+11-2)}_{0.95} = t^{(23)}_{0.95} = 1.714.$$
The test statistic is in the acceptance region, so at significance level $\alpha = 0.05$ we say that the producer's proposed reduction of CD-player prices has not resulted in an increase in sales volume.
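The pooled-variance statistic above can be sketched in Python as follows (the helper name is illustrative, not part of the text):

```python
import math

def pooled_t_statistic(x1, x2, s1, s2, n1, n2):
    """Two-sample t statistic with pooled variance S_p^2."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# CD-player example: n1 = 14, n2 = 11, t_{0.95}^{(23)} = 1.714
U = pooled_t_statistic(39600, 41200, 5060, 4010, 14, 11)
print(round(U, 4))  # approximately -0.8572
```

Since $|U| < 1.714$, the statistic stays in the acceptance region, matching the text.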
Chapter 7

Correlation Analysis

By correlation we understand the mutual linear relationship (dependence) of two random variables X and Y1. This relationship may be direct, i.e. with increasing values of one variable the values of the second variable increase and vice versa, or indirect, i.e. with increasing values of one variable the values of the other decrease and vice versa.
7.1 Covariance

The mutual linear relationship of X and Y is captured by the covariance
$$\mathrm{cov}_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \overline{x\,y}-\bar{x}\,\bar{y},$$
the sample counterpart of $\mathrm{cov}(X,Y) = E(X\,Y)-E(X)\,E(Y)$. Note that the covariance of a variable with itself is its variance:
$$\mathrm{cov}_{xx} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x}) = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 = D(x) = \sigma_x^2.$$

1In this section, for clarity, we dispense with denoting random variables by the Greek letters $\xi_1, \xi_2, \ldots$ and will instead use the notation $X, Y, \ldots$.
7.2 Correlation coefficient

The strength of the linear relationship between two variables in the basic population is given by the correlation coefficient $r_{XY}$, which can take only values from the interval $[-1, 1]$. If the variables X and Y are linearly independent, the correlation coefficient is equal, or very close, to zero. Values close to −1 are interpreted as a high indirect linear correlation, and values close to 1 as a high direct linear relationship. Values close to ±0.5 are interpreted as a weak linear relationship.
However, if the values are close to zero, we cannot say that the variables X and Y are independent, but only that they are linearly uncorrelated; they may still be, for example, nonlinearly dependent.
Suppose we know n pairs of values $[x_i, y_i]$ of the variables X and Y obtained for a random sample $i = 1, 2, \ldots, n$ of statistical units. Then the strength of the mutual linear dependence of the variables X and Y is measured by the sample correlation coefficient $r_{XY}$, defined as
$$r_{XY} = \frac{\mathrm{cov}_{xy}}{\sigma_x\,\sigma_y};$$
substituting, we get
$$r_{XY} = \frac{\overline{x\,y}-\bar{x}\,\bar{y}}{\sqrt{\overline{x^2}-\bar{x}^2}\,\sqrt{\overline{y^2}-\bar{y}^2}};$$
after full expansion we obtain the relationship called the Pearson correlation coefficient, after Karl Pearson2:
$$r_{XY} = \frac{n\sum_{i=1}^{n}x_i y_i-\sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{\sqrt{n\sum_{i=1}^{n}x_i^2-\left(\sum_{i=1}^{n}x_i\right)^2}\,\sqrt{n\sum_{i=1}^{n}y_i^2-\left(\sum_{i=1}^{n}y_i\right)^2}}.$$
A relatively high correlation coefficient ($r \ge 0.7$) indicates a high mutual linear dependence between the variables X and Y, but it does not mean that there is a high causal dependence between them: there may be another variable, e.g. Z, on which the variable Y also linearly depends and which explains the variability of the variable Y better.
The degree of causal dependence of the variables X and Y is characterized by the coefficient of determination and the index of determination.
7.3 Coefficient of determination

The degree of causal dependence of the variable Y on the variable X is expressed by the coefficient of determination, defined as the square of the correlation coefficient r; in the sample it is denoted $r^2$.
The interpretation of the coefficient of determination is based on an analysis of the variance (dispersion) of the dependent variable Y: it states how much of the variability of Y is explained by the independent variable X, provided that Y depends linearly on X.
If, for example, $r = 0.7$, then $r^2 = 0.49$, which means that only 49% of the variability of the variable Y is explained by the linear relationship with the variable X (the regression line). Because 51% of the variability of Y remains unexplained by the linear relationship with X, it is clear that the model was chosen improperly (instead of a linear dependence, a nonlinear dependence should be considered).

2Karl Pearson (* 27. 3. 1857, † 27. 4. 1936) was an English mathematician and philosopher, a proponent of Machism.
The data give the age $x_i$ and the number of days of absence per year $y_i$ for ten workers:

x_i: 27 61 37 23 46 58 29 36 64 40
y_i: 15  6 10 18  9  7 14 11  5  8

Assuming that the number of days of absence and the worker's age are in a linear relationship, determine whether it is direct or indirect. Calculate the correlation coefficient and the coefficient of determination.
Solution:

 x_i | y_i | x_i^2 | y_i^2 | x_i y_i
  27 |  15 |   729 |   225 |   405
  61 |   6 |  3721 |    36 |   366
  37 |  10 |  1369 |   100 |   370
  23 |  18 |   529 |   324 |   414
  46 |   9 |  2116 |    81 |   414
  58 |   7 |  3364 |    49 |   406
  29 |  14 |   841 |   196 |   406
  36 |  11 |  1296 |   121 |   396
  64 |   5 |  4096 |    25 |   320
  40 |   8 |  1600 |    64 |   320
 421 | 103 | 19661 |  1221 |  3817

Thus $n = 10$, $\sum_{i=1}^{n}x_i = 421$, $\sum_{i=1}^{n}y_i = 103$, $\sum_{i=1}^{n}x_i^2 = 19661$, $\sum_{i=1}^{n}y_i^2 = 1221$, $\sum_{i=1}^{n}x_i y_i = 3817$.
$$\mathrm{cov}_{xy} = \overline{x\,y}-\bar{x}\,\bar{y} = \frac{1}{n}\sum_{i=1}^{n}x_i y_i-\frac{1}{n}\sum_{i=1}^{n}x_i\cdot\frac{1}{n}\sum_{i=1}^{n}y_i = \frac{3817}{10}-\frac{421}{10}\cdot\frac{103}{10} = -\frac{5193}{100} = -51.93,$$
$$r_{XY} = \frac{\mathrm{cov}_{xy}}{\sigma_x\,\sigma_y} = \frac{-51.93}{13.9174\cdot 4.0013} = -0.93254,$$
where $\sigma_x$ and $\sigma_y$ were calculated as
$$\sigma_x = \sqrt{\overline{x^2}-\bar{x}^2} = \sqrt{\frac{19661}{10}-\left(\frac{421}{10}\right)^2} = \sqrt{193.69} = 13.9174,$$
$$\sigma_y = \sqrt{\overline{y^2}-\bar{y}^2} = \sqrt{\frac{1221}{10}-\left(\frac{103}{10}\right)^2} = \sqrt{16.01} \doteq 4.0013.$$
Alternatively, by the Pearson formula:
$$r_{XY} = \frac{n\sum x_i y_i-\sum x_i\sum y_i}{\sqrt{n\sum x_i^2-\left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2-\left(\sum y_i\right)^2}} = \frac{10\cdot 3817-421\cdot 103}{\sqrt{(10\cdot 19661-421^2)(10\cdot 1221-103^2)}} = -0.93254.$$
The negative sign shows that the dependence is indirect: with increasing age the number of days of absence decreases.
The coefficient of determination $r^2 = (-0.93)^2 = 0.8649$ means that 86% of the variability in the number of days off in a year is explained by the influence of the age of the worker, and 14% of the variability in the number of days off in a year can be explained by causes other than the linear relationship between the variables X and Y.
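The Pearson formula above can be checked on this data with a short Python sketch (the function name is illustrative, not part of the text):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient via the expanded formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

ages = [27, 61, 37, 23, 46, 58, 29, 36, 64, 40]
days_off = [15, 6, 10, 18, 9, 7, 14, 11, 5, 8]
r = pearson_r(ages, days_off)
print(round(r, 5))  # -0.93254
```

The negative value confirms the indirect dependence, and `r * r` gives the coefficient of determination.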
Example 56 A group of 100 randomly selected couples was classified by the age of the wife (X) and the age of the husband (Y). Characterize the degree of dependence between the age of the husband and the age of the wife by the correlation coefficient.

 X \ Y | 15-25 | 25-35 | 35-45 | 45-55 | 55-65 | 65-75
 15-25 |   11  |    7  |       |       |       |
 25-35 |    1  |   17  |    8  |    1  |       |
 35-45 |       |    2  |   18  |    5  |    1  |
 45-55 |       |       |    2  |   13  |    3  |
 55-65 |       |       |       |    1  |    6  |    1
 65-75 |       |       |       |       |    1  |    2

Solution:
Using the class midpoints 20, 30, 40, 50, 60, 70 we extend the table with the marginal frequencies and the products needed for the computation:

 n_{j,Y} (column totals): 12, 26, 28, 20, 11, 3; in total N = 100
 n_{j,X} (row totals):    18, 27, 26, 18,  8, 3; in total N = 100
 Σ n_{j,X} x_j   = 18·20 + 27·30 + 26·40 + 18·50 + 8·60 + 3·70 = 3800
 Σ n_{j,X} x_j^2 = 18·20² + 27·30² + 26·40² + 18·50² + 8·60² + 3·70² = 161600
 Σ n_{j,Y} y_j   = 12·20 + 26·30 + 28·40 + 20·50 + 11·60 + 3·70 = 4010
 Σ n_{j,Y} y_j^2 = 12·20² + 26·30² + 28·40² + 20·50² + 11·60² + 3·70² = 177300

Hence
$$\bar{x} = \frac{1}{N}\sum_{j=1}^{k}(n_{j,X}\,x_j) = \frac{1}{100}\,3800 = 38.0,$$
$$\bar{y} = \frac{1}{N}\sum_{j=1}^{k}(n_{j,Y}\,y_j) = \frac{1}{100}\,4010 = 40.1,$$
$$\overline{x^2} = \frac{1}{N}\sum_{j=1}^{k}n_{j,X}\,x_j^2 = \frac{1}{100}\,161600 = 1616.0,$$
$$\overline{y^2} = \frac{1}{N}\sum_{j=1}^{k}n_{j,Y}\,y_j^2 = \frac{1}{100}\,177300 = 1773.0,$$
$$\overline{x\,y} = \frac{1}{N}\sum_{j=1}^{k}(n_j\,x_j\,y_j) = \frac{1}{100}(11\cdot 20\cdot 20+7\cdot 20\cdot 30+1\cdot 30\cdot 20+17\cdot 30\cdot 30+8\cdot 30\cdot 40+1\cdot 30\cdot 50+2\cdot 40\cdot 30+18\cdot 40\cdot 40+5\cdot 40\cdot 50+1\cdot 40\cdot 60+2\cdot 50\cdot 40+13\cdot 50\cdot 50+3\cdot 50\cdot 60+1\cdot 60\cdot 50+6\cdot 60\cdot 60+1\cdot 60\cdot 70+1\cdot 70\cdot 60+2\cdot 70\cdot 70) = 1675.0.$$
Then
$$\mathrm{cov}_{xy} = \overline{x\,y}-\bar{x}\,\bar{y} = 1675-38\cdot 40.1 = 151.2,$$
$$\sigma_x^2 = \overline{x^2}-\bar{x}^2 = 1616-38^2 = 172.0,$$
$$\sigma_y^2 = \overline{y^2}-\bar{y}^2 = 1773-40.1^2 = 164.99,$$
$$r_{xy} = \frac{\mathrm{cov}_{xy}}{\sigma_x\,\sigma_y} = \frac{151.2}{\sqrt{172.0\cdot 164.99}} = 0.89755.$$
The correlation coefficient indicates a strong direct linear relationship between the age of the wife and the age of the husband.
$$r_{xy}^2 = 0.89755^2 = 0.80560.$$
From the coefficient of determination we see that about 80% of the variability is explained by the linear dependence.
In the previous example the values were arranged otherwise than we were previously accustomed: they were given in a table in which each cell, representing a pair of values from the two sets, carries the corresponding frequency. Data arranged in this way are called a correlation table. Using a spreadsheet program such as Microsoft Excel or OpenOffice Calc can also greatly speed up the routine calculations.
Chapter 8

Regression Analysis

8.1 Regression line

Suppose that we study two variables X and Y, between which there is a linear dependence
$$Y = \beta_0 + \beta_1 X.$$
The parameters $\beta_0, \beta_1$ are unknown. Therefore we perform an experiment in which pairs of values $[x, y]$ are observed. The measurement of the x values is considered exact; it is often possible to set x to a predetermined level, while y is measured with error. Therefore we adopt the statistical model
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$
where
$y_i$ is the i-th value of the variable Y in the random sample,
$\beta_0$ is the value of Y when the variable X = 0,
$\beta_1$ is the regression coefficient in the basic population, which indicates by how much $y_i$ changes when $x_i$ is changed by one unit of measurement,
$x_i$ is the i-th value of the variable X in the random sample,
$\varepsilon_i$ is the random error of the variable Y for the i-th observation, with normal distribution $N(0, \sigma^2)$.
8.2 Estimating the parameters $\beta_0$ and $\beta_1$

The parameters are estimated by the method of least squares, which minimizes the sum of squared deviations between the measured and the theoretical values of Y:
$$Q(\beta_0, \beta_1) = \sum_{i=1}^{n}\left(Y_i-\beta_0-\beta_1 x_i\right)^2 \to \min.$$
Setting the partial derivatives of $Q$ with respect to $\beta_0$ and $\beta_1$ equal to zero leads to the system of normal equations
$$\beta_0\,n+\beta_1\sum_{i=1}^{n}x_i = \sum_{i=1}^{n}y_i,$$
$$\beta_0\sum_{i=1}^{n}x_i+\beta_1\sum_{i=1}^{n}x_i^2 = \sum_{i=1}^{n}x_i y_i.$$
By algebraic manipulation the coefficients $\beta_0, \beta_1$ of the line can also be calculated from other relations, which are particularly useful if the individual values $[x_i, y_i]$ of the sample are unknown but we know its characteristics, such as the averages of the variables X and Y, their standard deviations or variances, the covariance of X and Y, or the correlation coefficient. The coefficients of the regression line are then
$$\beta_1 = \frac{\mathrm{cov}_{xy}}{\sigma_x^2} = r_{XY}\,\frac{\sigma_y}{\sigma_x}, \qquad \beta_0 = \bar{y}-\beta_1\,\bar{x}. \tag{9.1}$$
Example 57 Based on the data from Example 55, let us create a point estimate of the regression line of the number of days of absence on the age of the worker, and then a point estimate of the number of days off of a 25-year-old employee.
Solution: From Example 55 we know that
$$n = 10,\quad \sum_{i=1}^{n}x_i = 421,\quad \sum_{i=1}^{n}y_i = 103,\quad \sum_{i=1}^{n}x_i^2 = 19661,\quad \sum_{i=1}^{n}y_i^2 = 1221,\quad \sum_{i=1}^{n}x_i y_i = 3817.$$
After substituting into the system of normal equations
$$\beta_0\,n+\beta_1\sum_{i=1}^{n}x_i = \sum_{i=1}^{n}y_i,\qquad \beta_0\sum_{i=1}^{n}x_i+\beta_1\sum_{i=1}^{n}x_i^2 = \sum_{i=1}^{n}x_i y_i,$$
we get
$$\beta_0\cdot 10+\beta_1\cdot 421 = 103,$$
$$\beta_0\cdot 421+\beta_1\cdot 19661 = 3817,$$
whose solution gives the regression line
$$y = 21.587-0.268\,x.$$
For $x = 25$ we obtain $y = 21.587-0.268\cdot 25 \doteq 14.9$, so one can expect that the average number of days off of a 25-year-old employee will be approximately 15 days per calendar year.
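Example 57 can be reproduced directly from formulas (9.1) in Python (the function name is illustrative, not part of the text):

```python
def fit_regression_line(xs, ys):
    """Least-squares estimates via (9.1): b1 = cov_xy / sigma_x^2, b0 = ybar - b1*xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - xbar * ybar
    var_x = sum(x * x for x in xs) / n - xbar ** 2
    b1 = cov_xy / var_x
    b0 = ybar - b1 * xbar
    return b0, b1

ages = [27, 61, 37, 23, 46, 58, 29, 36, 64, 40]
days_off = [15, 6, 10, 18, 9, 7, 14, 11, 5, 8]
b0, b1 = fit_regression_line(ages, days_off)
print(round(b0, 3), round(b1, 3))  # 21.587 -0.268
print(round(b0 + b1 * 25, 1))      # 14.9, i.e. about 15 days for a 25-year-old
```

The closed-form (9.1) route avoids solving the system of normal equations explicitly; both give the same line.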
Example 58 Estimate the parameters of the regression line from Example 56.
Solution: Using formulas (9.1) we get
$$\beta_1 = \frac{\mathrm{cov}_{xy}}{\sigma_x^2} = \frac{151.2}{172} = 0.87907,$$
$$\beta_0 = \bar{y}-\beta_1\,\bar{x} = 40.1-0.87907\cdot 38 = 6.6953,$$
so the regression line is $y = 6.6953+0.87907\,x$.
Chapter 9
Attachments
Quantiles of Student's t-distribution $t_p^{(n)}$:

 n  | t_{0.90}^{(n)} | t_{0.95}^{(n)} | t_{0.975}^{(n)} | t_{0.995}^{(n)}
  1 | 3.078 | 6.314 | 12.706 | 63.657
  2 | 1.886 | 2.920 |  4.307 |  9.925
  3 | 1.638 | 2.353 |  3.182 |  5.841
  4 | 1.533 | 2.132 |  2.776 |  4.604
  5 | 1.476 | 2.015 |  2.571 |  4.032
  6 | 1.440 | 1.943 |  2.447 |  3.707
  7 | 1.415 | 1.895 |  2.365 |  3.499
  8 | 1.397 | 1.860 |  2.306 |  3.355
  9 | 1.383 | 1.833 |  2.262 |  3.250
 10 | 1.372 | 1.813 |  2.228 |  3.169
 11 | 1.363 | 1.800 |  2.201 |  3.106
 12 | 1.356 | 1.782 |  2.179 |  3.055
 13 | 1.350 | 1.771 |  2.160 |  3.012
 14 | 1.345 | 1.761 |  2.145 |  2.977
 15 | 1.340 | 1.753 |  2.131 |  2.947
 16 | 1.337 | 1.746 |  2.120 |  2.921
 17 | 1.333 | 1.740 |  2.110 |  2.898
 18 | 1.330 | 1.734 |  2.101 |  2.878
 19 | 1.330 | 1.730 |  2.100 |  2.861
 20 | 1.325 | 1.725 |  2.086 |  2.845
 21 | 1.323 | 1.721 |  2.080 |  2.831
 22 | 1.321 | 1.717 |  2.074 |  2.819
 23 | 1.319 | 1.714 |  2.069 |  2.807
 24 | 1.318 | 1.711 |  2.064 |  2.797
 25 | 1.316 | 1.708 |  2.060 |  2.787
 26 | 1.315 | 1.706 |  2.056 |  2.779
 27 | 1.314 | 1.703 |  2.052 |  2.771
 28 | 1.313 | 1.701 |  2.048 |  2.763
 29 | 1.311 | 1.699 |  2.045 |  2.756
 30 | 1.310 | 1.697 |  2.042 |  2.750
Values of the distribution function $\Phi(u)$ of the standardized normal distribution $N(0,1)$; the row gives u to one decimal place, the column the second decimal place:

  u  | 0.00  0.01  0.02  0.03  0.04  0.05  0.06  0.07  0.08  0.09
 0.0 | 0.500 0.504 0.508 0.512 0.516 0.520 0.524 0.528 0.532 0.536
 0.1 | 0.540 0.544 0.548 0.552 0.556 0.560 0.564 0.567 0.571 0.575
 0.2 | 0.579 0.583 0.587 0.591 0.595 0.599 0.603 0.606 0.610 0.614
 0.3 | 0.618 0.622 0.626 0.629 0.633 0.637 0.641 0.644 0.648 0.652
 0.4 | 0.655 0.659 0.663 0.666 0.670 0.674 0.677 0.681 0.684 0.688
 0.5 | 0.691 0.695 0.698 0.702 0.705 0.709 0.712 0.716 0.719 0.722
 0.6 | 0.726 0.729 0.732 0.736 0.739 0.742 0.745 0.749 0.752 0.755
 0.7 | 0.758 0.761 0.764 0.767 0.770 0.773 0.776 0.779 0.782 0.785
 0.8 | 0.788 0.791 0.794 0.797 0.800 0.802 0.805 0.808 0.811 0.813
 0.9 | 0.816 0.819 0.821 0.824 0.826 0.829 0.831 0.834 0.836 0.839
 1.0 | 0.841 0.844 0.846 0.849 0.850 0.853 0.855 0.858 0.860 0.862
 1.1 | 0.864 0.867 0.869 0.871 0.873 0.875 0.877 0.879 0.881 0.883
 1.2 | 0.885 0.887 0.889 0.891 0.893 0.894 0.896 0.898 0.900 0.901
 1.3 | 0.903 0.905 0.907 0.908 0.910 0.911 0.913 0.915 0.916 0.918
 1.4 | 0.919 0.921 0.922 0.924 0.925 0.926 0.928 0.929 0.931 0.932
 1.5 | 0.933 0.934 0.936 0.937 0.938 0.939 0.941 0.942 0.943 0.944
 1.6 | 0.945 0.946 0.947 0.948 0.949 0.951 0.952 0.953 0.954 0.954
 1.7 | 0.955 0.956 0.957 0.958 0.959 0.960 0.961 0.962 0.962 0.963
 1.8 | 0.964 0.965 0.966 0.966 0.967 0.968 0.969 0.969 0.970 0.971
 1.9 | 0.971 0.972 0.973 0.973 0.974 0.974 0.975 0.976 0.976 0.977
 2.0 | 0.977 0.978 0.978 0.979 0.979 0.980 0.980 0.981 0.981 0.982
 2.1 | 0.982 0.983 0.983 0.983 0.984 0.984 0.985 0.985 0.985 0.986
 2.2 | 0.986 0.986 0.987 0.987 0.987 0.988 0.988 0.988 0.989 0.989
 2.3 | 0.989 0.990 0.990 0.990 0.990 0.991 0.991 0.991 0.991 0.992
 2.4 | 0.992 0.992 0.992 0.992 0.993 0.993 0.993 0.993 0.993 0.994
 2.5 | 0.994 0.994 0.994 0.994 0.994 0.995 0.995 0.995 0.995 0.995
 2.6 | 0.995 0.995 0.996 0.996 0.996 0.996 0.996 0.996 0.996 0.996
 2.7 | 0.997 0.997 0.997 0.997 0.997 0.997 0.997 0.997 0.997 0.997
 2.8 | 0.997 0.998 0.998 0.998 0.998 0.998 0.998 0.998 0.998 0.998
 2.9 | 0.998 0.998 0.998 0.998 0.998 0.998 0.998 0.999 0.999 0.999
Quantiles of the $\chi^2$ distribution $\chi^2_{p;(n)}$:

  n  | χ²_{0.99;(n)} | χ²_{0.975;(n)} | χ²_{0.95;(n)} | χ²_{0.05;(n)} | χ²_{0.025;(n)} | χ²_{0.01;(n)}
   1 |   6.635 |   5.024 |   3.841 |  0.004 |  0.001 |  0.000
   2 |   9.210 |   7.378 |   5.991 |  0.103 |  0.051 |  0.020
   3 |  11.345 |   9.348 |   7.815 |  0.352 |  0.216 |  0.115
   4 |  13.277 |  11.143 |   9.488 |  0.711 |  0.484 |  0.297
   5 |  15.086 |  12.833 |  11.070 |  1.145 |  0.831 |  0.554
   6 |  16.812 |  14.449 |  12.592 |  1.635 |  1.237 |  0.872
   7 |  18.475 |  16.013 |  14.067 |  2.167 |  1.690 |  1.239
   8 |  20.090 |  17.535 |  15.507 |  2.733 |  2.180 |  1.646
   9 |  21.666 |  19.023 |  16.919 |  3.325 |  2.700 |  2.088
  10 |  23.209 |  20.483 |  18.307 |  3.940 |  3.247 |  2.558
  11 |  24.725 |  21.920 |  19.675 |  4.575 |  3.816 |  3.053
  12 |  26.217 |  23.337 |  21.026 |  5.226 |  4.404 |  3.571
  13 |  27.688 |  24.736 |  22.362 |  5.892 |  5.009 |  4.107
  14 |  29.141 |  26.119 |  23.685 |  6.571 |  5.629 |  4.660
  15 |  30.578 |  27.488 |  24.996 |  7.261 |  6.262 |  5.229
  16 |  32.000 |  28.845 |  26.296 |  7.962 |  6.908 |  5.812
  17 |  33.409 |  30.191 |  27.587 |  8.672 |  7.564 |  6.408
  18 |  34.805 |  31.526 |  28.869 |  9.390 |  8.231 |  7.015
  19 |  36.191 |  32.852 |  30.144 | 10.117 |  8.907 |  7.633
  20 |  37.566 |  34.170 |  31.410 | 10.851 |  9.591 |  8.260
  21 |  38.932 |  35.479 |  32.671 | 11.591 | 10.283 |  8.897
  22 |  40.289 |  36.781 |  33.924 | 12.338 | 10.982 |  9.542
  23 |  41.638 |  38.076 |  35.172 | 13.091 | 11.689 | 10.196
  24 |  42.980 |  39.364 |  36.415 | 13.848 | 12.401 | 10.856
  25 |  44.314 |  40.646 |  37.652 | 14.611 | 13.120 | 11.524
  26 |  45.642 |  41.923 |  38.885 | 15.379 | 13.844 | 12.198
  27 |  46.963 |  43.195 |  40.113 | 16.151 | 14.573 | 12.879
  28 |  48.278 |  44.461 |  41.337 | 16.928 | 15.308 | 13.565
  29 |  49.588 |  45.722 |  42.557 | 17.708 | 16.047 | 14.256
  30 |  50.892 |  46.979 |  43.773 | 18.493 | 16.791 | 14.953
  40 |  63.691 |  59.342 |  55.758 | 26.509 | 24.433 | 22.164
  50 |  76.154 |  71.420 |  67.505 | 34.764 | 32.357 | 29.707
  60 |  88.379 |  83.298 |  79.082 | 43.188 | 40.482 | 37.485
  70 | 100.425 |  95.023 |  90.531 | 51.739 | 48.758 | 45.442
  80 | 112.329 | 106.629 | 101.879 | 60.391 | 57.153 | 53.540
 120 | 158.950 | 152.211 | 146.567 | 95.705 | 91.573 | 86.923