
MODULE 3.

PROBABILITY AND STATISTICS


MATHEMATICS III: REGULARITY AND REPETITION
TOPIC 11: DATA COLLECTION
INTRODUCTION

• In Mexico, the National Institute of Statistics, Geography and Informatics (INEGI) was created by presidential decree in 1983. Its creation modernized our country's rich tradition of gathering, processing and disseminating information about the nation, its population and its economy.

• Statistics is used for the compilation and analysis of data.

• “If there is no data, it is simply not possible to work with the tools Statistics provides” (Gutiérrez, 2010).
11.1 BASIC CONCEPTS OF STATISTICS

• Statistics is the branch of mathematics devoted to the compilation, organization, presentation, analysis and interpretation of data.
• Descriptive statistics is the part that collects (through polls), presents and describes (using tables or graphs) data from a sample.
• Inferential statistics is the part that interprets the resulting values, through estimates or predictions, in order to assist in decision making.

[Diagram: Statistics is divided into Descriptive Statistics and Inferential Statistics.]

[Diagram: Concepts: Population, Sample, Statistic, Parameter, Variable, Qualitative variable, Quantitative variable, Experiment, Data.]
CONCEPTS

Population
• It is the total of the elements of the study; also known as the universe.

Sample
• It is a subset or representative part of the population, selected by different methods, which are called sampling methods.
Variable
• It is a characteristic or attribute that the elements of the sample or population can have. There are qualitative and quantitative variables.

Qualitative variable
• It classifies or describes an element of the population; qualitative variables can be nominal or ordinal.

Quantitative variable
• It numerically describes an element of a population; quantitative variables are classified as discrete or continuous.
Data
 It is the value compiled for the variable of each
element of the population or sample.
Experiment
 It is the operation of observing results and through
which a set of data is obtained.

Parameter
 It is the numerical value that describes the data of a
population.

Statistic
 It is the numerical value that describes the data of a
sample.
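
The difference between a parameter and a statistic can be illustrated with a short Python sketch. The population below (the ages of a 12-member club) is hypothetical data invented for the illustration, and the sample size of 4 is an arbitrary choice.

```python
import random
import statistics

# Hypothetical population: ages of all 12 members of a small club.
population = [15, 16, 16, 17, 17, 17, 18, 18, 19, 20, 21, 22]

# Parameter: a numerical value that describes the data of the population.
population_mean = statistics.mean(population)

# Sample: a subset of the population (here 4 elements chosen at random).
rng = random.Random(0)  # fixed seed only so the run is repeatable
sample = rng.sample(population, 4)

# Statistic: a numerical value that describes the data of the sample.
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The parameter is fixed for a given population, while the statistic changes from one sample to another.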
11.2 SAMPLING METHODS

• Sampling is the set of techniques used to select the “best” possible sample (the one considered to best represent the population).
• To carry out sampling, it is necessary to identify appropriate sources (called “frames”), because if these are inadequate, the sample will be inadequate too and our estimates or predictions will be wrong.

• In the non-probability sample, the elements are taken without regard to probability; such samples are sometimes used even though they are not good for generalizing estimates, since there is no certainty that the sample is representative.
• The probability sample requires statistical work that precedes the selection of the elements, as these are chosen according to a probability of occurrence.
NON-PROBABILITY METHODS
PROBABILITY METHODS: SIMPLE RANDOM SAMPLING

• It is the simplest and best-known method of selection, and even though it is sometimes difficult to use because it requires a sampling frame, it is the basis of other sampling techniques. Each element of the population has the same probability of being selected as any other element.
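
A minimal Python sketch of simple random sampling follows. The sampling frame (enrollment numbers 1 to 50) and the sample size of 5 are assumptions made for the illustration; `random.Random.sample` draws distinct elements so that every element of the frame is equally likely to be chosen.

```python
import random

# Hypothetical sampling frame: enrollment numbers of 50 students.
frame = list(range(1, 51))

rng = random.Random(42)        # seeded only to make the sketch repeatable
sample = rng.sample(frame, 5)  # 5 distinct elements, each equally likely

print(sample)
```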
PROBABILITY METHODS: SYSTEMATIC RANDOM SAMPLING

• In this method, a first element is selected at random and the remaining elements are then selected at a fixed interval until the desired sample size is reached.
• To obtain the interval, divide the population size (P) by the number of desired elements (n) and round the result (k) to the nearest integer. The first element is selected randomly, and then every k-th element after it is selected.
• This method is NOT convenient when there is a pattern in the arrangement of the data (for example, in a list of students, which may be arranged alphabetically or by enrollment number, in ascending or descending order).
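
The procedure can be sketched in Python as follows. The function name `systematic_sample` and the population of 100 numbered elements are hypothetical; the sketch assumes the random start is taken among the first k elements.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select roughly n elements: a random start, then every k-th element."""
    rng = random.Random(seed)
    # k = P / n, rounded to the nearest integer (at least 1).
    k = max(1, round(len(population) / n))
    # Assumption: the first element is chosen at random among the first k.
    start = rng.randrange(k)
    # Take every k-th element from the start; keep at most n of them.
    # When k is rounded up, the result can contain slightly fewer than n.
    return population[start::k][:n]

# Hypothetical population of 100 numbered elements, desired sample of 10.
population = list(range(1, 101))
sample = systematic_sample(population, 10, seed=1)
print(sample)
```

With P = 100 and n = 10 the interval is k = 10, so consecutive selected elements are always 10 positions apart.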
PROBABILITY METHODS: STRATIFIED RANDOM SAMPLING

• In this technique it is necessary to divide the population into groups (called strata) formed according to certain characteristics.
• This method requires auxiliary information from the sampling frame, because the elements within each stratum must be homogeneous, while the strata must be heterogeneous with respect to one another.
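
A possible Python sketch of stratified sampling is shown below. It rests on two assumptions that the text does not state: that a simple random sample is drawn inside each stratum, and that every stratum contributes the same fraction of its elements (proportional allocation). The function name, the strata and the fraction are hypothetical.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        # At least one element per stratum, otherwise the rounded share.
        size = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical strata: students grouped by school year.
strata = {
    "first year": [f"A{i}" for i in range(40)],
    "second year": [f"B{i}" for i in range(30)],
    "third year": [f"C{i}" for i in range(30)],
}
chosen = stratified_sample(strata, fraction=0.1, seed=7)
print({name: len(members) for name, members in chosen.items()})
```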
PROBABILITY METHODS: CLUSTER SAMPLING

• It consists of dividing a population into smaller groups (or clusters), for example the inhabitants of a particular city, the students of a school, or even the items in a box of a certain product. Random elements are then selected until the sample is complete.
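
One way to sketch cluster sampling in Python is shown below. It assumes a two-stage selection (some clusters are chosen at random, then random elements are taken from each chosen cluster), which is only one of several possible readings of the description above. The function name, the clusters and the sizes are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Pick clusters at random, then random elements inside each one."""
    rng = random.Random(seed)
    # Stage 1 (assumption): choose which clusters take part.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: choose random elements inside every chosen cluster.
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical clusters: four schools with ten students each.
clusters = {
    "school A": [f"A{i}" for i in range(10)],
    "school B": [f"B{i}" for i in range(10)],
    "school C": [f"C{i}" for i in range(10)],
    "school D": [f"D{i}" for i in range(10)],
}
sample = cluster_sample(clusters, n_clusters=2, n_per_cluster=3, seed=3)
print(sample)
```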
CONCLUSION

• There is no better way to do a study than with 100% of the items, but this is often difficult and very expensive, which is why Statistics uses sampling techniques that save resources while still yielding reliable results.

• Media such as television news and print media (newspapers, magazines) cite statistics to show people's opinions. Because not everyone is familiar with the basic concepts, graphics were introduced many years ago so that the audience could better understand the data shown.

• One of the novel aspects was the use of statistical diagrams, intended to help readers understand the message.
QUIZ #11
1. The branch of mathematics devoted to the compilation, organization, presentation, analysis and interpretation of data is called…
2. Statistics is divided into 2 branches, which are:
3. The branch of statistics that interprets the values in order to assist in decision making is called…
4. The branch of statistics that collects information and describes the data from a sample is called…
5. Explain the difference between population and sample.
6. Explain the difference between parameter and statistic.
7. The variable that classifies or describes an element is called…
8. The variable that numerically describes an element is called…
9. It is a characteristic the sample or population can have…
10. It is the value compiled for the variable of each element…
11. It is the operation of observing results through which a set of data is obtained.
12. The 2 types of samples are:
13. Using the systematic random sampling formula: if you have a population of 550 students and you only want 20 students as your sample, what is the resulting number?
14. Give an example of ordinal, nominal, continuous and discrete variables (different from your exercise in class).
15. Briefly explain stratified random sampling.
TOPIC 12: MEASURES OF CENTRAL TENDENCY
INTRODUCTION

• There are values located at the center of a dataset that summarize a sample or a population.
• The measures of central tendency are the mean, the median and the mode, and they serve as a representative value for a dataset.
12.1 MEAN, MEDIAN AND MODE
• Measures of central tendency are calculated for two kinds of data: ungrouped data and grouped data.
• Ungrouped data: data that have not been summarized in any way.
• The measures of central tendency for ungrouped data are: mean, median, and mode.
• Grouped data: data that have been divided into classes, so that we only have the frequency of each class, that is, a frequency table. With a frequency table, the values obtained for the measures of central tendency are approximate.
• The measures of central tendency for grouped data are: approximate mean, approximate median, and approximate mode.
MEAN (UNGROUPED DATA)

• It is the most used measure of central tendency, and it is also known as the arithmetic mean or simply the average. It is very simple to obtain: add up all the data and divide the sum by the total number of data, $\bar{x} = \frac{\sum x_i}{n}$.
EXAMPLES

 Find the mean of these values: 3, 3, 4, 5, 6, 7, 8, 9, 9, 9
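
The calculation for this example can be checked with a few lines of Python. This is only a sketch of the add-and-divide procedure described above; the variable names are arbitrary.

```python
values = [3, 3, 4, 5, 6, 7, 8, 9, 9, 9]

total = sum(values)   # sum of all the data: 63
n = len(values)       # total number of data: 10
mean = total / n      # 63 / 10 = 6.3

print(mean)
```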


MEDIAN (UNGROUPED DATA)

• It is the numerical value found at the middle of the data, once these have been arranged in ascending order, that is, from smallest to largest.
• There are two cases we must take into account:
✓ If the number of data (n) is odd, the median is the number in the middle.
✓ If the number of data (n) is even, the median is the average of the 2 central numbers.
EXAMPLES

 Find the median among the following values: 8, 5, 7, 8, 10, 3, 5, 6, 8.

 Find the median among the following values: 6, 5, 10, 8, 10, 4, 7, 6, 9, 11.
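
The two cases can be sketched in Python as follows. The function name `median` is chosen for the illustration; the standard library offers `statistics.median`, which applies the same rule.

```python
def median(values):
    """Median of ungrouped data, following the odd/even rule."""
    ordered = sorted(values)      # arrange from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the number in the middle.
        return ordered[middle]
    # Even n: the average of the 2 central numbers.
    return (ordered[middle - 1] + ordered[middle]) / 2

first = median([8, 5, 7, 8, 10, 3, 5, 6, 8])        # n = 9 (odd)
second = median([6, 5, 10, 8, 10, 4, 7, 6, 9, 11])  # n = 10 (even)
print(first, second)
```

For the first list the ordered data are 3, 5, 5, 6, 7, 8, 8, 8, 10, so the median is 7; for the second list the two central numbers are 7 and 8, so the median is 7.5.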
MODE (UNGROUPED DATA)

• This is the value that appears most often in a data list. When there are two modes, the dataset is called bimodal; when there are three or more modes, it is called multimodal. Depending on the data, the mode might NOT exist.

• Four possibilities:
✓ Unimodal: 1 mode
✓ Bimodal: 2 modes
✓ Multimodal: 3 or more modes
✓ No mode
EXAMPLES
 Find the mode through the following information:
 In this set of data: 1, 3, 5, 7, 9, 11, 13, 15.

 In this set of data: 1, 2, 3, 3, 4, 5, 6, 7, 8

 In this set of data: 1, 2, 2, 3, 4, 5, 6, 6, 7.

 In this set of data: 1, 2, 2, 3, 4, 6, 8, 8, 8, 9, 9
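
The four datasets above can be checked with the following Python sketch. The function name `modes` is hypothetical, and the sketch assumes the convention that a dataset in which no value repeats has no mode (other conventions exist).

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Assumption: if no value repeats, the dataset has no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 3, 5, 7, 9, 11, 13, 15]))          # no mode
print(modes([1, 2, 3, 3, 4, 5, 6, 7, 8]))          # unimodal
print(modes([1, 2, 2, 3, 4, 5, 6, 6, 7]))          # bimodal
print(modes([1, 2, 2, 3, 4, 6, 8, 8, 8, 9, 9]))    # unimodal
```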


APPROXIMATE MEAN (GROUPED DATA)

• It is obtained by multiplying each class mark by its frequency, adding up these products, and dividing the sum by the total number of data:

$\bar{X} \approx \frac{\sum f_i X_i}{n}$

where $\bar{X}$ is the approximate mean, $X_i$ is the class mark of class i (the value at the middle of the class), $f_i$ is the absolute frequency of class i, and $n$ is the total number of data.


APPROXIMATE MEDIAN (GROUPED DATA)

• Just as with ungrouped data, the median place depends on whether the total number of data (n) is odd or even.

• When the total number of data (n) is odd: the approximate median is the value found in the median place $\frac{n+1}{2}$.

• When the total number of data (n) is even: the approximate median is the average of the values found in the places $\frac{n}{2}$ and $\frac{n}{2} + 1$.
APPROXIMATE MODE (GROUPED DATA)

• It corresponds to the class mark with the highest frequency.
• If there are two modes, the distribution is called bimodal; if there are more than two modes, it is called multimodal.

 Four possibilities:
 Unimodal: 1 mode
 Bimodal: 2 modes
 Multimodal: 3 or more modes
 No mode
MEASURES OF VARIABILITY OR DISPERSION (GROUPED DATA)

Variance
• It is the expectation of the squared deviation of a random variable from its mean.

Standard deviation
• It is a measure used to quantify the amount of variation or dispersion of a set of data values.
EXAMPLE

• Identify the central tendency measures in the following set of data:
✓ APPROX. MEAN
✓ APPROX. MEDIAN
✓ APPROX. MODE

• Calculate the approximate variance and the standard deviation from the following group of ordered pairs.

Xi    f     Cumulative f    f∙X     f∙X²
28    3     3               84
30    4     7               120
32    2     9               64
34    5     14              170
36    10    24              360
38    7     31              266
40    6     37              240
n = 37                      ∑ = 1304    ∑ =

• Approx. mean =
• Approx. median =
• Approx. mode =
• Approx. variance =
• Standard deviation =
EXAMPLE

• Identify the central tendency measures in the following set of data:
✓ APPROX. MEAN
✓ APPROX. MEDIAN
✓ APPROX. MODE

• Calculate the approximate variance and the standard deviation from the following group of ordered pairs.

Xi    f     Cumulative f    f∙X     f∙X²
1     2     2               2       2
2     3     5               6       12
3     3     8               9       27
4     6     14              24      96
5     5     19              25      125
6     4     23              24      144
7     1     24              7       49
n = 24                      ∑ = 97  ∑ = 455

• Approx. mean =
• Approx. median =
• Approx. mode =
• Approx. variance =
• Standard deviation =
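
The measures requested for this table can be computed with the Python sketch below, which reproduces the column totals 97 and 455. It assumes the sample form of the variance (divisor n − 1) in its computational version, $\frac{\sum f X^2 - (\sum f X)^2 / n}{n - 1}$, which is algebraically equivalent to averaging the squared deviations from the approximate mean; if the population form is intended, the divisor would be n instead.

```python
import math

# Class marks (Xi) and frequencies (f) copied from the table.
marks = [1, 2, 3, 4, 5, 6, 7]
freqs = [2, 3, 3, 6, 5, 4, 1]

n = sum(freqs)                                          # total number of data
sum_fx = sum(f * x for x, f in zip(marks, freqs))       # total of column f∙X
sum_fx2 = sum(f * x**2 for x, f in zip(marks, freqs))   # total of column f∙X²

approx_mean = sum_fx / n

# Assumption: sample variance, divisor n - 1.
approx_variance = (sum_fx2 - sum_fx**2 / n) / (n - 1)
approx_std = math.sqrt(approx_variance)

# Expand the table into individual values to find the median place(s).
expanded = [x for x, f in zip(marks, freqs) for _ in range(f)]
if n % 2 == 1:
    approx_median = expanded[n // 2]
else:
    approx_median = (expanded[n // 2 - 1] + expanded[n // 2]) / 2

# Class mark with the highest frequency.
approx_mode = marks[freqs.index(max(freqs))]

print(n, sum_fx, sum_fx2)
print(approx_mean, approx_median, approx_mode)
print(approx_variance, approx_std)
```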
CONCLUSION

• Once collected, data can be organized in tables of classes and frequencies, which helps to identify and analyze them more quickly. These data can also be summarized by the mean, the median and the mode, which helps us make a more complete statistical analysis.

• The measures of central tendency help us locate the center of a dataset, but they are not sufficient to form a picture of what the data actually indicate, since they do not describe how the data are distributed; it is therefore also convenient to study measures of variability.
QUIZ #12

Also include variance and standard deviation.

Interval    Class mark (X)    Frequency (f)    Cumulative f    f∙X    f∙X²

5-9 7 4
10-14 12 6
15-19 17 5
20-24 22 15
25-29 27 13
30-34 32 3
35-39 37 1
TOPIC 13: MEASURES OF VARIABILITY
INTRODUCTION

• Measures of variability are also known as measures of dispersion.
• The most common are range, variance and standard deviation.
• They represent the variation in the data in relation to their average.

[Diagram: Measures of variation: Range, Variance, Standard Deviation.]
13.1 RANGE, VARIANCE, AND STANDARD DEVIATION

• Range: It is considered the easiest measure of dispersion to obtain, since it is just a matter of subtracting the minimum value of the data from the maximum value.

• Variance: It is considered the most important measure of dispersion; it tells us how far from or near to the mean the data are.

• Standard deviation: It is the square root of the variance, and it is expressed in the same units as the data.
EXAMPLE

• Find the variance and standard deviation of the following sample of values: 3, 3, 4, 5, 6, 7, 8, 9, 9, 9.
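
A short Python sketch for this example, using the standard `statistics` module. Because the values are described as a sample, the sketch uses `statistics.variance` and `statistics.stdev`, which divide by n − 1; the population versions (`pvariance`, which divides by n) are shown only for comparison.

```python
import statistics

values = [3, 3, 4, 5, 6, 7, 8, 9, 9, 9]

# Range: maximum value minus minimum value.
data_range = max(values) - min(values)

# Sample variance and standard deviation (divisor n - 1).
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

# Population variance (divisor n), for comparison only.
population_variance = statistics.pvariance(values)

print(data_range)
print(sample_variance, sample_std)
print(population_variance)
```

The sum of the squared deviations from the mean (6.3) is 54.1, so the sample variance is 54.1 / 9 and the population variance is 54.1 / 10.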
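• Approximate variance (grouped data), when a sample of the population is analyzed (divisor n − 1):

$s^2 \approx \frac{\sum f_i (X_i - \bar{X})^2}{n - 1}$

where $X_i$ are the class marks, $f_i$ their frequencies, $\bar{X}$ the approximate mean, and $n$ the total number of data.

• Approximate standard deviation, when a sample of the population is analyzed (divisor n − 1):

$s \approx \sqrt{s^2}$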
EXAMPLE
• The time, in minutes, that a sample of high school students spends on Facebook throughout the day.

Time in minutes spent on Facebook (class interval)    Number of students (f)    Class mark (X)    f∙X    f∙X²
[0-60)        17
[60-120)      3
[120-180)     4
[180-240)     8
[240-300)     3
[300-360)     4
[360-420)     2
[420-480)     1
[480-540)     1
N=            N=
CONCLUSION

• At present there are several technological resources that help us expedite the calculation of measures of central tendency and dispersion, but it is advisable to know the procedure so that you can obtain these measures when you do not have these tools on hand.
• Data may come grouped or ungrouped, so it is very helpful to be familiar with the calculation of measures of central tendency and variability in both cases; these measures help us interpret and analyze a dataset and support decision making.
QUIZ #13

 RANGE, VARIANCE AND STANDARD DEVIATION


TOPIC 14: PROBABILITY
14.1 BASIC PRINCIPLES OF PROBABILITY

• Probability is the chance that something will happen (the proportion of favorable cases among the total cases).
• Probability values are always between zero and one, inclusive.
• Experiment: It is the procedure in which we observe the results under certain conditions. Experiments can be deterministic, when we always get the same result under the same conditions (F = ma), or random, when the results vary each time the experiment is conducted (heads/tails).
• Event: It is a set of one or more results of an experiment.
• Sample space: It is the set of all the possible results of an experiment.
• Sample point: It is each one of the elements in the sample space.
$\text{Probability of an event} = \frac{\text{favorable cases}}{\text{total cases}} = \frac{\text{event/s}}{\text{number of outcomes}}$
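
The ratio of favorable cases to total cases can be sketched in Python as follows. The function name `probability` and the six-sided die are hypothetical choices for the illustration, and the sketch assumes that every sample point is equally likely.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Favorable cases divided by total cases (equally likely outcomes)."""
    favorable = [point for point in sample_space if point in event]
    return Fraction(len(favorable), len(sample_space))

# Hypothetical random experiment: rolling a six-sided die once.
die = {1, 2, 3, 4, 5, 6}       # sample space
even = {2, 4, 6}               # event: the result is an even number

p_even = probability(even, die)
print(p_even)                  # 3 favorable cases out of 6
```

Using `Fraction` keeps the result exact, and the value always lies between zero and one.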
EXAMPLE 1

 Calculate the probability of tossing a coin and getting 2 tails in a row.


VENN DIAGRAM

• Set: a specific, clearly described collection; the objects or people that make up a set are called elements or members of the set.
• A set can be described by enumeration (making a list that includes the elements of the set), by understanding (providing a rule that identifies the elements of the set; also known as comprehension or set-builder notation) or by Venn diagram (the graphic method in which sets and their relations are represented).

• By enumeration: {January, February, March, April, May, June, July, August, September, October, November, December}
• By understanding: {the months of the year}
• By Venn diagram: [diagram not reproduced]
EXAMPLE 2

• Represent the seasons of the year for the set S:

• By enumeration:

• By understanding:

• By Venn diagram: [diagram: set S inside the universe U]
• Universal set: It is denoted by the letter U, and it includes all the elements under consideration.
• Subset: When all the elements of a set A belong to another set B, A is said to be a subset of B; this is represented by the symbol ⊆, for example A ⊆ B.
• Empty set: It is the set that has no elements, and it is represented by { } or by the symbol ∅.
• Venn diagrams are used to graphically show the grouping of elements in sets.
14.2 BASIC SET OPERATIONS

• There are four basic set operations: union (∪), intersection (∩), complement ($A^C$), and difference (–).

[Venn diagram: sets A and B inside the universe U]
Union
• The union of the sets A and B is the set of elements that are in A, in B, or in both; it is represented by the symbol ∪, for example A ∪ B.

Intersection
• The intersection of A and B is the set of elements that are in A and also in B; it is represented by the symbol ∩, for example A ∩ B.

Complement
• The complement of A, written $A^C$, is the set of elements of the universe U that do not belong to A; therefore $A^C$ = U – A.

Difference
• The difference of A and B is the set of elements that are in A but not in B; its symbol is A – B.
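
The four operations map directly onto Python's built-in `set` type, as the sketch below shows. The universe U (the integers 1 to 10) and the sets A and B are hypothetical values chosen for the illustration.

```python
# Hypothetical universe and sets.
U = set(range(1, 11))      # universe: 1, 2, ..., 10
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

union = A | B              # elements in A, in B, or in both
intersection = A & B       # elements in A and also in B
difference = A - B         # elements in A but not in B
complement_A = U - A       # elements of the universe that are not in A

print(union)
print(intersection)
print(difference)
print(complement_A)
```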
EXAMPLE 3
CONCLUSION

 One of the objectives of probability will be to calculate the odds that an event occurs, which will be of great help
for decision-making.
QUIZ #14

• What is the probability of drawing a King from a shuffled 52-card poker deck?

• What is the probability of drawing a number card from a shuffled 52-card poker deck?