TM - Dua - Statistika & Probabilitas (Ms181403) - Ganjil 2018-2019

STATISTICS AND PROBABILITY
(MS184303)
HASAN IQBAL NUR, ST, MT.
DIKA VIRGINIA DEVINTASARI, S.Si, M.Sc.
DEPARTMENT OF MARINE TRANSPORTATION ENGINEERING
FACULTY OF MARINE TECHNOLOGY
INSTITUT TEKNOLOGI SEPULUH NOPEMBER
Odd Semester 2018/2019

Topics
1. Introduction: Overview of Statistics and Probability
2. Descriptive Statistics
3. Probability
4. Probability Distributions: Random Variables (Discrete and Continuous)
5. Distributions of Discrete Random Variables
6. Distributions of Continuous Random Variables
7. Relationships Between Distributions
8. Parameter Estimation
9. Hypothesis Testing
10. Chi-Square Test
Statistika dan Probabilitas_Departemen Teknik Transportasi Laut_Ganjil 2018/2019 2

WEEK 2
Introduction: Descriptive Statistics

Types of Statistics
• Descriptive statistics: methods concerned with collecting and presenting data, i.e., the graphical and numerical presentation of observed data for the purpose of description.
• Inferential statistics: methods concerned with analyzing a sample in order to draw conclusions (inferences) about the characteristics of a population.

What is Descriptive Statistics? [Ronald E. Walpole]
There are times when a scientific practitioner wishes only to gain some sort of summary of a set of data represented in the sample. In other words, inferential statistics is not required. Rather, a set of single-number statistics or descriptive statistics is helpful.
These numbers give a sense of:
1. the center of the location of the data,
2. variability in the data, and
3. the general nature of the distribution of observations in the sample.

Continue ...
Types of descriptive statistics:
• Organize Data
◦ Tables
◦ Graphs
• Summarize Data
◦ Central Tendency (frequency distribution)
▪ Mean
▪ Median
▪ Mode
◦ Variation (Measuring Variability)
▪ Range
▪ Variance
▪ Standard Deviation
▪ Quartile

Key measures for describing data
Key distinction: Population vs. Sample Notation

Central Tendency > Mean (Average)
• Suppose that the observations in a sample are \( x_1, x_2, \ldots, x_n \). The sample mean is denoted by \( \bar{x} \):
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
• The sampling distribution of the sample mean is the probability distribution of all the sample means. Let’s say you had 1,000 people, and you sampled 5 people at a time and calculated their average height. If you kept on taking samples (i.e., you repeated the sampling a thousand times), eventually:
1. The mean of all of your sample means will approach the population mean, μ.
2. The distribution of the sample means will look approximately like a normal distribution curve.
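The sampling experiment described above can be sketched as a small simulation. The population here is hypothetical (1,000 heights generated from a normal distribution with an assumed mean of 170 cm and standard deviation of 10 cm), and the name `sample_mean` is only illustrative.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 1,000 people.
population = [random.gauss(170, 10) for _ in range(1000)]
population_mean = statistics.mean(population)

def sample_mean(values):
    """Sample mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

# Draw 1,000 samples of 5 people each and record every sample mean.
sample_means = [sample_mean(random.sample(population, 5)) for _ in range(1000)]
mean_of_sample_means = sample_mean(sample_means)

print(round(population_mean, 2), round(mean_of_sample_means, 2))
```

The two printed values should be close to each other; a histogram of `sample_means` would show the roughly bell-shaped pattern.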

Continue ...
• Mean can be badly affected by outliers (data points with extreme
values unlike the rest)
• Outliers can make the mean a bad measure of central tendency or
common experience

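Central Tendency > Median (Middle Value)
• The purpose: to reflect the central tendency of the sample in a way that is uninfluenced by extreme values or outliers.
• Given that the observations in a sample are \( x_1, x_2, \ldots, x_n \), arranged in increasing order of magnitude, the sample median is:
\[ \tilde{x} = x_{(n+1)/2} \ \text{if } n \text{ is odd}, \qquad \tilde{x} = \tfrac{1}{2}\left(x_{n/2} + x_{n/2+1}\right) \ \text{if } n \text{ is even} \]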

Continue ...
2. If the recorded values for a variable form a symmetric distribution,
the median and mean are identical.
3. In skewed data, the mean lies further toward the skew than the
median.
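A small sketch of how an outlier pulls the mean but barely moves the median. The car prices below are hypothetical values chosen for illustration only.

```python
import statistics

# Hypothetical car prices in dollars.
prices = [18_000, 19_000, 20_000, 21_000, 22_000]
# Same data plus one extreme value (an outlier).
prices_with_outlier = prices + [200_000]

mean_before = statistics.mean(prices)
median_before = statistics.median(prices)
mean_after = statistics.mean(prices_with_outlier)
median_after = statistics.median(prices_with_outlier)

print(mean_before, median_before)
print(mean_after, median_after)
```

With these numbers the mean moves from 20,000 to 50,000, while the median only moves from 20,000 to 20,500.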

Central Tendency > Mode
• The most common data point is called the mode.
• BTW, it is possible to have more than one mode!
The combined IQ scores for Classes A & B:
80 87 89 93 93 96 97 98 102 103 105 106 109 109 109 110 111 115 119 120

Continue ...
1. It may give you the most likely experience rather than the “typical” or “central” experience.
2. In symmetric, single-peaked distributions, the mean, median, and mode are the same.
3. In skewed data, the mean and median lie further toward the skew than the mode.
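As a quick check of the mode for the combined IQ scores of Classes A & B, here is a sketch using `statistics.multimode` from the Python standard library, which returns every value tied for most frequent. The `bimodal` list is a made-up example showing more than one mode.

```python
import statistics

# Combined IQ scores for Classes A & B, from the slide.
iq_scores = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103,
             105, 106, 109, 109, 109, 110, 111, 115, 119, 120]

modes = statistics.multimode(iq_scores)
print(modes)

# Hypothetical data with two modes.
bimodal = [1, 2, 2, 3, 3, 4]
bimodal_modes = statistics.multimode(bimodal)
print(bimodal_modes)
```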

Types of descriptive statistics:
• Organize Data
◦ Tables
◦ Graphs
• Summarize Data
◦ Central Tendency (frequency distribution)
▪ Mean
▪ Median
▪ Mode
◦ Variation (Measure of Dispersion)
▪ Range
▪ Variance
▪ Standard Deviation
▪ Quartile

Dispersion > Range
• The spread, or the distance, between the lowest and highest values of a variable.
• To get the range for a variable, you subtract its lowest value from its highest value:
\[ \text{Range} = X_{\max} - X_{\min} \]
• The range can be useful and is discussed at length in Statistical Quality Control.
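A minimal sketch of the range calculation, applied to the combined IQ scores for Classes A & B listed earlier in the deck.

```python
# Combined IQ scores for Classes A & B, from the mode slide.
iq_scores = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103,
             105, 106, 109, 109, 109, 110, 111, 115, 119, 120]

# Range = highest value minus lowest value.
data_range = max(iq_scores) - min(iq_scores)
print(data_range)
```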

Dispersion > Variance
1. A measure of the spread of the recorded values on a variable.
2. A measure of dispersion.
3. Large variability in a data set produces relatively large values of \( (x - \bar{x})^2 \) and thus a large sample variance.
The larger the variance, the further the individual cases are from the mean.
The smaller the variance, the closer the individual scores are to the mean.

Continue ...
Variance is extensively used in probability theory, where more general conclusions need to be drawn from a given smaller sample. This is because variance gives us an idea about the distribution of data around the mean, and from this distribution we can work out where we can expect an unknown data point. [smaller data set → data distribution → analysis]
1. Calculating variance starts with a “deviation.”
A deviation is the distance of a case’s score from the mean:
\[ (x - \bar{x}) \]
Example:
If the average person’s car costs $20,000 and mine costs $6,000, my deviation from the mean is −$14,000:
6,000 − 20,000 = −14,000

Question (?)
Class A: IQs of 13 Students
102 115 128 109 131 89 98 106 140 119 93 97 110
\( \bar{x}_A = 110.54 \)
1. The deviation of 102 from 110.54 is?
2. Deviation of 115?
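The two deviations asked for can be checked with a short sketch. The helper name `deviation` is only illustrative; the data are the 13 Class A IQ scores from the slide.

```python
# Class A IQ scores, from the slide.
class_a_iq = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

mean_a = sum(class_a_iq) / len(class_a_iq)

def deviation(x, mean):
    """Signed distance of a score from the mean."""
    return x - mean

dev_102 = deviation(102, mean_a)
dev_115 = deviation(115, mean_a)

print(round(mean_a, 2))
print(round(dev_102, 2), round(dev_115, 2))
```

A score below the mean gives a negative deviation and a score above the mean gives a positive one.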

Continue ...
• We want to add these to get the total deviation, but if we were to do that, we would get zero every time. Why?
The positive and negative deviations cancel exactly, because \( \sum (x_i - \bar{x}) = \sum x_i - n\bar{x} = 0 \).
• We need a way to eliminate negative signs. Why?
Since we are only interested in the size of the deviations and not whether the scores are above or below the mean, we can ignore the minus sign and take only the absolute value, giving us the absolute deviation.
2. Squaring the deviations will eliminate negative signs...

A deviation squared: \( (x - \bar{x})^2 \)
Deviation → Total Deviation → ...

Continue ...
3. If you were to add all the squared deviations together, you’d get what we call the “Sum of Squares.”
\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2 \]
Deviation → Total Deviation → Sum of Squares

4. The last step: take the approximate average of the sum of squares, dividing by \( n - 1 \), to get the sample variance.
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
Thus,
• all variances that are non-zero will be positive numbers.
• A large variance indicates that numbers in the set are far from the mean and each other, while a small variance indicates the opposite.
Deviation → Total Deviation → Sum of Squares → Variance
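The four steps (deviation, squared deviation, sum of squares, variance) can be sketched as follows for the Class A IQ scores. This assumes the usual sample-variance divisor of n − 1, which is also what `statistics.variance` uses.

```python
import math
import statistics

# Class A IQ scores, from the slide.
class_a_iq = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
n = len(class_a_iq)
mean_a = sum(class_a_iq) / n

deviations = [x - mean_a for x in class_a_iq]       # step 1: deviations
squared_deviations = [d ** 2 for d in deviations]   # step 2: square them
sum_of_squares = sum(squared_deviations)            # step 3: sum of squares
sample_variance = sum_of_squares / (n - 1)          # step 4: divide by n - 1

# The raw deviations cancel out (up to floating-point rounding).
print(abs(sum(deviations)) < 1e-9)
print(round(sample_variance, 2))
```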

Sample Standard Deviation
The standard deviation is a measure of the spread of scores within a set of data. The sample standard deviation, denoted by \( s \), is the positive square root of \( s^2 \), that is:
\[ s = \sqrt{s^2} \]
REVIEW:
Deviation → Deviation Squared → Sum of Squares → Variance → Standard Deviation
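A short sketch of the final step of the review chain, again using the Class A IQ scores: the standard deviation is the square root of the sample variance, so it is expressed in the same units as the data.

```python
import math
import statistics

# Class A IQ scores, from the slide.
class_a_iq = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

s_squared = statistics.variance(class_a_iq)  # sample variance (divisor n - 1)
s = math.sqrt(s_squared)                     # sample standard deviation

print(round(s_squared, 2), round(s, 2))
```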

Continue ...

Variance vs. Standard Deviation
Which measure of variability is more important?

The Second Quartile (Median)
What does it mean?

The First Quartile
What does it mean?

The Third Quartile
What does it mean?
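One way to see what the three quartiles mean is to compute them for the Class A IQ scores. This sketch uses `statistics.quantiles` with its default "exclusive" method; other textbooks and software use slightly different quartile conventions, so the first and third quartiles may differ a little elsewhere.

```python
import statistics

# Class A IQ scores, from the earlier slide.
class_a_iq = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

# n=4 splits the ordered data into four parts and returns the 3 cut points.
q1, q2, q3 = statistics.quantiles(class_a_iq, n=4)

print(q1, q2, q3)
```

Roughly a quarter of the scores fall below the first quartile, half fall below the second quartile (the median), and three quarters fall below the third quartile.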

Graphical Diagnostics
• Scatter Plot
• Stem-and-Leaf Plot
• Histogram
• Box-and-Whisker Plot or Box Plot

Scatter Plot
• Explanatory and Response Variables
Most statistical studies examine data on more than one variable. In
many of these settings, the two variables play different roles.
Definition:
A response variable measures an outcome of a study.
An explanatory variable may help explain or influence
changes in a response variable.
Note: In many studies, the goal is to show that changes in one or more explanatory variables actually cause changes in a response variable. However, other explanatory-response relationships don’t involve direct causation.

Displaying Relationships: Scatterplots
The most useful graph for displaying the relationship between two
quantitative variables is a scatterplot.
Definition:
A scatterplot shows the relationship between two quantitative
variables measured on the same individuals. The values of one
variable appear on the horizontal axis, and the values of the
other variable appear on the vertical axis. Each individual in
the data appears as a point on the graph.
1. Decide which variable should go on each axis.
• Remember, the eXplanatory variable goes on the X-axis!
2. Label and scale your axes.
3. Plot individual data values.

Displaying Relationships: Scatterplots
• Make a scatterplot of the relationship between body weight and pack weight.
• Since body weight is our eXplanatory variable, be sure to place it on the X-axis!

Interpreting Scatterplots
How to Examine a Scatterplot

As in any graph of data, look for the overall pattern and for striking
departures from that pattern.
• You can describe the overall pattern of a scatterplot by the direction, form, and strength of the relationship.
• An important kind of departure is an outlier, an individual value that falls outside the overall pattern of the relationship.

Interpreting Scatterplots
Outlier: There is one possible outlier; the hiker with a body weight of 187 pounds seems to be carrying relatively less weight than the other group members.
Strength, direction, form: There is a moderately strong, positive, linear relationship between body weight and pack weight.
It appears that lighter students are carrying lighter backpacks.
Stem-and-Leaf Plot
• Combined tabular and graphical display
How can we create a stem-and-leaf plot?
Use the data in the table to make a stem-and-leaf plot.
Step 1: Group the data by tens digits.
Step 2: Order the data from least to greatest.
75 79
83 84 86 86 88
91 94 99

Helpful Hint!
To write 42 in a stem-and-leaf plot, write each digit in a separate column.
Stem | Leaf
4 | 2

Test Scores
Stems | Leaves
7 | 5 9
8 | 3 4 6 6 8
9 | 1 4 9
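The two steps (group by tens digit, then order) can be sketched in code. The function name `stem_and_leaf` is illustrative, and the sketch assumes two-digit whole numbers as in the test-score example; the input order below is arbitrary.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by tens digit (stem); list the ones digits (leaves) in order."""
    groups = defaultdict(list)
    for v in sorted(values):            # order the data from least to greatest
        groups[v // 10].append(v % 10)  # stem = tens digit, leaf = ones digit
    return dict(sorted(groups.items()))

# Test scores from the slide, deliberately unordered.
test_scores = [86, 75, 94, 83, 99, 79, 88, 84, 91, 86]

plot = stem_and_leaf(test_scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```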

Find the least value, greatest value, mean, median, mode, and range of the data.
Stems | Leaves
4 | 0 0 1 5 7
5 | 1 1 2 4
6 | 3 3 3 5 9 9
7 | 0 4 4
8 | 3 6 7
9 | 1 4
Key: 4 | 0 means 40
The least stem and least leaf give the least value, 40.
The greatest stem and greatest leaf give the greatest value, 94.
Use the data values to find the mean: (40 + … + 94) ÷ 23 = 64.
The median is the middle value in the table, 63.
To find the mode, look for the number that occurs most often in a row of leaves. Then identify its stem. The mode is 63.
The range is the difference between the greatest and the least value: 94 – 40 = 54.
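The summary values read off this plot can be verified with a short sketch that rebuilds the 23 data values from the stems and leaves. The dictionary name `stem_leaves` is only illustrative.

```python
import statistics

# Stems and leaves copied from the plot (key: 4 | 0 means 40).
stem_leaves = {
    4: [0, 0, 1, 5, 7],
    5: [1, 1, 2, 4],
    6: [3, 3, 3, 5, 9, 9],
    7: [0, 4, 4],
    8: [3, 6, 7],
    9: [1, 4],
}

# Rebuild each data value as stem * 10 + leaf.
values = [stem * 10 + leaf
          for stem, leaves in stem_leaves.items()
          for leaf in leaves]

least, greatest = min(values), max(values)
mean_value = statistics.mean(values)
median_value = statistics.median(values)
mode_value = statistics.mode(values)
data_range = greatest - least

print(least, greatest, mean_value, median_value, mode_value, data_range)
```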

There is a case...
• The stem-and-leaf plot contains only four stems and consequently does not provide an adequate picture of the distribution.
• The smaller the number of data available, the smaller is our choice for the number of stems.
• Usually we choose 5 to 20 stems.

Histogram
Line Plot, Relative Frequency Distribution, Histogram

Continue...
Another way is through the use of a frequency distribution, where the data are grouped into different classes or intervals. It can be constructed by counting the leaves belonging to each stem and noting that each stem defines a class interval.
\[ \frac{b - a}{2} \]
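A minimal sketch of building a frequency distribution and relative frequencies by letting each stem define a class interval. The data are the 23 values from the earlier stem-and-leaf example; the class width of 10 is an assumption that matches one class per stem.

```python
from collections import Counter

# The 23 values from the stem-and-leaf example.
values = [40, 40, 41, 45, 47, 51, 51, 52, 54, 63, 63, 63,
          65, 69, 69, 70, 74, 74, 83, 86, 87, 91, 94]

class_width = 10  # assumed: one class per stem (40-49, 50-59, ...)

# Count how many values fall in each class, keyed by the class lower limit.
counts = Counter((v // class_width) * class_width for v in values)
relative = {lower: count / len(values) for lower, count in sorted(counts.items())}

# Text version of a relative frequency histogram.
for lower, rel in relative.items():
    upper = lower + class_width - 1
    print(f"{lower}-{upper}: {counts[lower]:2d}  {rel:.3f}  " + "#" * counts[lower])
```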

The Histogram

Box-and-Whisker Plot or Box Plot
• A box plot summarizes data using the median, upper and lower
quartiles, and the extreme (least and greatest) values. It allows you to
see important characteristics of the data at a glance.
• Interquartile range: the distance between the upper quartile (the 75th percentile) and the lower quartile (the 25th percentile).
• The five-number summary consists of:
1. The median ( 2nd quartile)
2. The 1st quartile
3. The 3rd quartile
4. The maximum value in a data set
5. The minimum value in a data set
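The five-number summary and interquartile range can be sketched as below, using the 23 values from the stem-and-leaf example. The function name `five_number_summary` is illustrative, and the quartiles use the default "exclusive" method of `statistics.quantiles`; other conventions can give slightly different quartile values.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

# The 23 values from the stem-and-leaf example.
values = [40, 40, 41, 45, 47, 51, 51, 52, 54, 63, 63, 63,
          65, 69, 69, 70, 74, 74, 83, 86, 87, 91, 94]

summary = five_number_summary(values)
iqr = summary[3] - summary[1]  # interquartile range = Q3 - Q1

print(summary, iqr)
```

These five numbers are exactly what a box plot draws: the box spans Q1 to Q3 with a line at the median, and the whiskers reach to the minimum and maximum.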

Importance
Why do we need to know how to display and analyze data in box-and-whisker plots?
* It helps you to interpret and represent data.
* It gives a visual representation of data.

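Box and Whisker Diagrams
Anatomy of a Box and Whisker Diagram (left to right): Lowest Value, Lower Quartile, Median, Upper Quartile, Highest Value. The box spans the lower quartile to the upper quartile, and a whisker extends from each side of the box to the lowest and highest values.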

Descriptive Statistics
Done!!!
Now you are qualified for this study. Any questions?

Exercise 1.21
The lengths of power failures, in minutes, are recorded in the following
table.
(a) Find the sample mean and sample median of the power-failure times.
(b) Find the sample standard deviation of the power failure times.

Exercise 1.18
The following scores represent the final examination grades for an elementary
statistics course:
23 60 79 32 57 74 52 70 82
36 80 77 81 95 41 65 92 85
55 76 52 10 64 75 78 25 80
98 81 67 41 71 83 54 64 72
88 62 74 43 60 78 89 76 84
48 84 90 15 79 34 67 17 82
69 74 63 80 85 61
(a) Construct a stem-and-leaf plot for the examination grades in which the
stems are 1, 2, 3, . . . , 9.
(b) Construct a relative frequency histogram, draw an estimate of the graph
of the distribution, and discuss the skewness of the distribution.
(c) Compute the sample mean and sample standard deviation.

Exercise 1.27
A study is done to determine the influence of the wear, y, of a bearing as a function of the load, x,
on the bearing. A designed experiment is used for this study. Three levels of load were used, 700 lb,
1000 lb, and 1300 lb. Four specimens were used at each level, and the sample means were,
respectively, 210, 325, and 375.
(a) Plot average wear against load.
(b) From the plot in (a), does it appear as if a relationship exists between wear and load?
(c) Suppose we look at the individual wear values for each of the four specimens at each load level
(see the data that follow). Plot the wear results for all specimens against the three load values.
(d) From your plot in (c), does it appear as if a clear relationship exists? If your answer is different
from that in (b), explain why.

