
Chapter One
Introduction
“Statistical thinking will one day be as necessary for efficient citizenship as the ability to
read and write.”
H. G. WELLS
In the modern world of computers and information technology, the importance of
statistics is well recognized across all disciplines. Statistics originated as a
science of statehood and gradually found applications in agriculture,
economics, commerce, biology, medicine, industry, planning, education and so on.
Today there is hardly any walk of life in which statistics cannot be
applied, and we are constantly bombarded with statistics and statistical
information.
1.1 History of Statistics

Statistics, the science of learning from data, is a relatively new discipline. Its
history can be divided into three periods, separated by the years 1900 and 1960.
• In the early days of statistics (before 1900), much of the statistical work was
devoted to data analysis, including the construction of graphical displays.
There was little work on inferential statistics, although the foundations of
Bayesian inference had been laid by Bayes and Laplace in the 18th
century.
• The foundations of statistical inference were developed in the period between
1900 and 1960. Karl Pearson developed the chi-square goodness-of-fit
procedure around the year 1900, and R.A. Fisher developed the notions of
sufficiency and maximum likelihood in this period. In this framework, statistical
procedures are evaluated in terms of their long-run behavior in repeated sampling;
for this reason, they are known as frequentist methods. Properties such as
unbiasedness and mean square error are used to evaluate procedures. Some
prominent Bayesians, such as Harold Jeffreys, Jimmie Savage, and I.J. Good,
made substantial contributions during this period, but frequentist
methods became the standard inferential methods in the statistician’s toolkit.
• In the last 50 years, there has been great development in new statistical
methods, especially computationally demanding methods such as the bootstrap
and nonparametric smoothing. Due to the recent availability of high-speed
computers together with new simulation-based fitting algorithms, Bayesian
methods have become increasingly popular. In contrast to the middle period
of statistics, when frequentist methods were dominant, we currently live in a
frequentist/Bayesian world where statisticians routinely use Bayesian
methods in situations where this inferential perspective has particular
advantages.
The words statistics and statistical are derived from the Latin word status, which means
a political state. Statistics has been defined differently by different authors over
time. In earlier times statistics was confined to state affairs, but today
it embraces almost every sphere of human activity. Therefore, a number of old
definitions, which were confined to a narrow field of enquiry, have been replaced by
definitions that are much more comprehensive and exhaustive.
1.2 Definition and classification of Statistics
First, let us see how different authors and dictionaries define statistics.
• The American Heritage Dictionary defines statistics as “The mathematics of
collection, organization and interpretation of numerical data, especially the
analysis of population characteristics by inference from sampling.”
• The Merriam-Webster’s Collegiate Dictionary defines statistics as “A branch of
mathematics dealing with the collection, analysis, interpretation, and
presentation of masses of numerical data.”
• The former American Statistical Association president Jon Kettenring defines
statistics as “… the science of learning from data … It presents exciting
opportunities for those who work as professional statisticians. Statistics is
essential for the proper running of government, central to decision making in
industry and a core component of modern educational curricula at all levels.”
Taking the above concepts into consideration, we can define statistics in two senses:
a. In the plural sense: statistics are the raw data themselves, such as statistics of births,
statistics of deaths, statistics of students, statistics of imports and exports, etc.
b. In the singular sense: statistics is the subject that deals with the collection, organization,
presentation, analysis and interpretation of numerical data.
Classifications:
Depending on how the data are used, statistics is commonly divided into two main areas or
branches.
Descriptive statistics: deals with the meaningful presentation of data such that its
characteristics can be effectively observed. Descriptive statistics consists of the collection,
organization, summarization, and presentation of data. It encompasses the tabular, graphical
or pictorial display of data, the condensation of large data sets into tables, and the preparation
of summary measures that give a concise description of complex information and exhibit
patterns that may be found in data sets.
Inferential statistics: deals with drawing inferences and
making decisions by studying a subset, or sample, of the population. That is, it generalizes
results from the sample to the population by performing estimation and hypothesis tests,
determining relationships among variables, and making predictions. For example, the average
income of all families (the population) in Ethiopia can be estimated from figures obtained
from a few hundred families (the sample). Inferential statistics is important because statistical
data usually arise from samples.
1.3 Stages in statistical investigation
There are five stages or steps in any statistical investigation.

1. Collection of Data: This is the first stage of a statistical investigation. It is the process of
measuring, gathering and assembling the raw data upon which the investigation is to be based.
There are two methods of data collection, primary and secondary. The primary method
refers to obtaining original, first-hand data, and the secondary method
involves obtaining data from other sources.
2. Organization of Data: This is the classification and description of the
properties of the data in summary form. Editing, coding and classification are the three steps in
the organization of data.
3. Presentation of Data: In this stage the collected and organized data are presented in
some systematic order to facilitate statistical analysis. The organized data are presented with
the help of tables, diagrams and graphs.
4. Analysis of Data: Analysis of data involves extracting relevant information from the
collected data (such as the mean, median, mode, range and variance) using mathematical and
statistical tools, mainly elementary mathematical operations.
5. Interpretation of Data: This stage involves drawing valid conclusions from the analyzed
data. That is, interpretation of data involves making inferences (drawing conclusions) based
on the analysis of data.
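The summary measures mentioned in the analysis stage can be computed with a few lines of code. The following is a minimal Python sketch using the standard `statistics` module; the five observations are hypothetical values chosen only for illustration and do not come from any data set in this text.

```python
import statistics

# Hypothetical observations, used only to illustrate the analysis stage.
data = [4, 7, 7, 9, 13]

mean = statistics.mean(data)          # arithmetic mean
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
data_range = max(data) - min(data)    # largest value minus smallest value
variance = statistics.variance(data)  # sample variance (divides by n - 1)

print(mean, median, mode, data_range, variance)
```

Note that `statistics.variance` computes the sample variance; `statistics.pvariance` would be the choice if the five values were regarded as a whole population.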
1.4 Definition of Some Basic terms
1. A (statistical) population: the complete set of possible measurements for which
inferences are to be made. The population represents the target of an investigation, and the
objective of the investigation is to draw conclusions about the population; hence we sometimes
call it the target population.
Examples:
Population of trees under specified climatic conditions
Population of animals fed a certain type of diet
Population of farms having a certain type of natural fertility
Population of households, etc.
The population could be finite or infinite (an imaginary collection of units).
There are two ways of investigating a population: census and sample survey.
2. Census: a complete enumeration of the population. In most real problems this cannot be
realized, hence we take a sample.
3. Sample: A sample from a population is the set of measurements that are actually collected
in the course of an investigation. It should be selected using some pre-defined sampling
technique in such a way that it represents the population well.
Examples:
Monthly production data of a certain factory in the past 10 years.
A small portion of a finite population.
In practice, a census is rarely conducted; a sample survey is conducted instead.
4. Parameter: A characteristic or measure obtained from a population.
5. Statistic: A characteristic or measure obtained from a sample.
6. Sampling: The process or method of selecting a sample from the population.
7. Sample size: The number of elements or observations to be included in the sample.
8. Variable: An item of interest that can take on many different values.
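The relationship between population, sample, parameter and statistic can be illustrated with a short simulation. The sketch below is in Python and rests on loud assumptions: the population of 1,000 household incomes is randomly generated (it is not real data), the sample size of 50 is arbitrary, and simple random sampling is only one of many possible sampling techniques.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical finite population: monthly incomes of 1,000 households.
population = [random.randint(1000, 9000) for _ in range(1000)]

sample_size = 50                                 # number of elements in the sample
sample = random.sample(population, sample_size)  # simple random sampling

parameter = statistics.mean(population)  # population mean: a parameter
statistic = statistics.mean(sample)      # sample mean: a statistic

print(f"parameter = {parameter:.1f}, statistic = {statistic:.1f}")
```

The statistic will generally differ from the parameter; inferential statistics is concerned with how far apart the two are likely to be.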
1.5 Applications, uses and limitations of Statistics
Applications of statistics:
• Statistics is applied in almost all fields of human endeavor.
• Almost all human beings encounter numerical facts in their daily lives,
e.g. about prices.
• It is applicable in processes such as the invention of certain drugs and the assessment of
the extent of environmental pollution.
• It is used in industry, especially in the area of quality control.
Uses of statistics:
The main function of statistics is to enlarge our knowledge of complex phenomena.
The following are some uses of statistics:
1. It presents facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variations in data.
4. Furnishing a technique of comparison.
5. Estimating unknown population characteristics.
6. Formulating and testing hypotheses.
7. Studying the relationship between two or more variables.
8. Forecasting future events.
Limitations of statistics
As a science, statistics has its own limitations. The following are some of them:
• It deals only with quantitative information.
• It deals only with aggregates of facts and not with individual data items.
• Statistical data are only approximately, not mathematically, correct.
• Statistics can easily be misused and therefore should be used by experts.
1.6 Types of Variables and Measurement Scales

A variable is a characteristic or attribute that can assume different values in different
persons, places, or things. Some examples of variables include diastolic blood
pressure, heart rate, the height of adult males, the weights of preschool children,
gender of statistics students, marital status of instructors at UoG, ethnic group of
patients, the age of patients seen in a dental clinic, the number of daily admissions to a
general hospital, and the number of decayed, missing or filled teeth per child in an
elementary school.
Data refers to a collection of facts, values, observations, or measurements that the
variables can assume. The raw material of statistics is data. A collection of data values
forms a data set. Each value in the data set is called a data value or a datum.
Variables can be classified as qualitative or quantitative.

Qualitative variables are variables that can be placed into distinct categories
according to some characteristic or attribute. They assume non-numerical values and
cannot be measured numerically.
Examples: Gender of patients, marital status of patients, ethnic group of patients and
state of birth.
Quantitative variables are variables which assume numerical values. A quantitative
variable is one that can be either measured or counted in the usual sense, for example
height, weight, or the number of daily admissions to a hospital.
Quantitative variables can be further classified into two groups: discrete and
continuous.
Discrete Variables: are variables which assume a finite or countable number of
possible values. They are usually obtained by counting. A discrete variable is
characterized by gaps or interruptions in the values that it can assume. These gaps or
interruptions indicate the absence of values between particular values that the
variable can assume.
Example:
• The number of daily admissions to a general hospital
• The number of first year statistics students
• The number of decayed, missing or filled teeth per child in an elementary
school.
Continuous Variables: are variables which can assume an infinite number of possible
values between any two specific values. They are usually obtained by measurement.
A continuous variable does not possess the gaps or interruptions characteristic of a
discrete variable.
Example:
• Weight, age, length, temperature, speed, salary and marks of students
Scales of measurement
We may generally refer to data as a collection of facts, values, observations, or
measurements. If our data consist of observations that can be classified, ordered, or
quantified, then at what level does the measurement take place? In other words, how are the
data classified, measured or counted? Here we are interested in the forms in which data are
found, or the scales on which data are measured. A measurement scale refers to the property
of the values assigned to the data, based on the properties of order, distance and fixed zero.
These scales, stated in order of increasing information content, are nominal, ordinal,
interval, and ratio.
Nominal Scales
The nominal scale is associated with the word name, since this scale identifies categories.
Observations on a nominal scale possess neither numerical values nor order. However,
observations on this type of scale can be given numerical codes such as “0 or 1” or
“1, 2, 3 . . .”. These numbers serve only as identifiers; the magnitude of the differences
between them is meaningless. Note that when dealing with a nominal scale, the categories
defined must be mutually exclusive (each item falls into one and only one category) and
collectively exhaustive (the list of categories is complete in that each item can be
classified). Classifying residents according to zip codes is an example of the nominal level
of measurement. Even though numbers are assigned as zip codes, there is no meaningful order
or ranking. The only valid operations for variables represented by a nominal scale are the
determination of “=” or “≠.”
In short, in nominal scales of measurement:
• No order or ranking can be imposed on the data.
• No arithmetic or ordering (<, >) operations can be applied to the data; only equality can
be checked.
Examples:
• Using numbers to distinguish among the various medical diagnoses
• Sex (male or female)
• Marital status (married, single, widowed, divorced)
• Country code
• Student identification number
• Regional differentiation of Ethiopia
Ordinal scales
The ordinal scale (think of the word order) includes all properties of the nominal scale with
the additional property that the observations can be ranked from the smallest to the largest or
from the least important to the most important. (Note that nominal measurements cannot be
ordered; all items are treated equally.) In this regard, the only valid operations for ordinally
scaled variables are “=, ≠, <, >.”
That means, in ordinal scales of measurement:
• There is an order or ranking among the data, but differences between the ranks are not
meaningful.
• Arithmetic operations are not applicable, but relational operations are applicable.
Example:
• Letter grades (A, B, C, D, F).
• Rating scales (Excellent, Very good, Good, Fair, Poor).
• Military status.
Note: Both the nominal and ordinal scales are termed nonnumeric scales since differences
among their values are meaningless.
Interval scales
The interval scale includes all the properties of the ordinal scale with the additional
property that the distance between observations is meaningful. Here the numbers assigned to
the observations indicate order and possess the property that the difference between any two
consecutive values is the same as the difference between any other two consecutive values
(the difference 10 − 9 = 1 has the same meaning as 3 − 2 = 1). It is important to note that
while an interval scale has a zero point, its location is arbitrary. Hence ratios of interval
scale values have no meaning.
For example, the Fahrenheit temperature scale, measured in degrees, is an interval scale, as is
the centigrade scale. The temperature difference between 50 and 60 degrees centigrade (10
degrees) equals the temperature difference between 80 and 90 degrees centigrade (10
degrees). Note that the 0 in each of these scales is arbitrarily placed, which makes the interval
scale different from the ratio scale. If the temperatures in Gondar and Bahirdar were 20 and 40
degrees centigrade respectively, we cannot say that Bahirdar is twice as hot as Gondar;
hence, ratios are meaningless on this scale of measurement. The valid operations for
variables measured on an interval scale are “=, ≠, >, <, +, −.”
In general,
• The interval scale is a level of measurement which classifies data
that can be ranked and for which differences are meaningful. However, there is no
meaningful or true zero, so ratios are meaningless.
Example:
• IQ
• Temperature
• SAT scores
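The Gondar and Bahirdar example can be checked numerically. The short Python sketch below converts the same temperatures to Fahrenheit with the standard conversion F = 9C/5 + 32; the function and variable names are illustrative only. It shows that the ratio changes with the choice of scale, while equal differences remain equal.

```python
def c_to_f(celsius):
    """Convert degrees centigrade to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

gondar_c, bahirdar_c = 20, 40
gondar_f, bahirdar_f = c_to_f(gondar_c), c_to_f(bahirdar_c)

# The ratio depends on where the arbitrary zero is placed.
ratio_c = bahirdar_c / gondar_c  # 40 / 20
ratio_f = bahirdar_f / gondar_f  # 104 / 68

# Equal differences in centigrade remain equal differences in Fahrenheit.
diff_c = (60 - 50, 90 - 80)
diff_f = (c_to_f(60) - c_to_f(50), c_to_f(90) - c_to_f(80))

print(ratio_c, round(ratio_f, 2))
print(diff_c, diff_f)
```

Because the "twice as hot" ratio of 2 in centigrade becomes roughly 1.53 in Fahrenheit, the ratio carries no physical meaning on an interval scale.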
Ratio scales
The ratio scale includes all the properties of the interval scale with the added property
that ratios of observations are meaningful. This is because absolute zero is uniquely defined.
For example, a gift measured in dollars is a ratio variable: $0 measures the absence of any
gift, and a gift of $2000 is twice as large as a gift of $1000 (the ratio is 2/1 = 2). Valid
operations for variables measured on a ratio scale are “=, ≠, >, <, +, −, ×, ÷.”
Generally,
• It is a level of measurement which classifies data that can be ranked,
for which differences are meaningful, and for which there is a true zero. True ratios exist
between the different units of measure.
• All arithmetic and relational operations are applicable.
Examples:
• Height, weight, time, salary, age and number of students in the class.
Note: Both the interval and the ratio scales are said to be metric scales (since differences
between values measured on these scales are meaningful), and variables measured on
these scales are said to be quantitative variables.
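Since each scale inherits the valid operations of the one before it, the four lists of operations given above can be summarized compactly. The following Python sketch is only an illustration: the operators are written in Python notation (`==`, `!=`, `*`, `/`), and the names `VALID_OPERATIONS` and `is_valid` are hypothetical helpers, not standard terminology.

```python
# Valid operations per scale, built cumulatively as described in the text.
NOMINAL = {"==", "!="}
ORDINAL = NOMINAL | {"<", ">"}
INTERVAL = ORDINAL | {"+", "-"}
RATIO = INTERVAL | {"*", "/"}

VALID_OPERATIONS = {
    "nominal": NOMINAL,
    "ordinal": ORDINAL,
    "interval": INTERVAL,
    "ratio": RATIO,
}

def is_valid(scale, operation):
    """Return True if the operation is meaningful on the given scale."""
    return operation in VALID_OPERATIONS[scale]

print(is_valid("ordinal", "<"))   # ranking letter grades is meaningful
print(is_valid("interval", "/"))  # a ratio of two temperatures is not
```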
The following is a list of different attributes and rules for assigning numbers to objects.
Try to classify each measurement system into one of the four types of scales.
(Exercise)
1. Your checking account number as a name for your account.
2. Your checking account balance as a measure of the amount of money you have in that
account.
3. The order in which you were eliminated in a spelling bee as a measure of your spelling
ability.
4. Your score on the first statistics test as a measure of your knowledge of
statistics.
5. Your score on an individual intelligence test as a measure of your intelligence.
6. The distance around your forehead measured with a tape measure as a measure of your
intelligence.
7. A response to the statement "Abortion is a woman's right" where "Strongly
Disagree" = 1, "Disagree" = 2, "No Opinion" = 3, "Agree" = 4, and "Strongly
Agree" = 5, as a measure of attitude toward abortion.
8. Times for swimmers to complete a 50-meter race.
9. Months of the year (Meskerem, Tikimt, …).
10. Socioeconomic status of a family when classified as low, middle and upper class.
11. Blood type of individuals: A, B, AB and O.
12. Pollen counts provided as numbers between 1 and 10, where 1 implies there is almost
no pollen and 10 that it is rampant, but for which the values do not represent actual
counts of grains of pollen.
13. Region numbers of Ethiopia (1, 2, 3, etc.).
14. The number of students in a college.
15. The net wages of a group of workers.
16. The heights of the men in a town.
Chapter Two: Data Collection, Presentation and Analysis
2 Methods of Data Collection and Presentation
2.1 Methods of data collection
Data: the raw material of statistics. Data can be obtained either by measurement or by counting.
2.1.1. Sources of Data:
Statistical data may be obtained from two sources, namely, primary and secondary.
1. Primary Data: data measured or collected by the investigator or the user directly
from the source. Primary sources are sources that can supply first-hand information
for immediate use.
- There are various methods of collecting primary data:
a. Direct observation: counting or observing the data of interest in person.
Drawback: it is not always possible to observe directly.
Example: Data on the number of cigarette smokers in UoG.
b. Personal interview: contacting the desired people in person and
asking them questions.
Example: to determine whether the salary of workers in a given factory is fair or
not, an investigator may contact each worker and ask his or her opinion.
Drawbacks: - It is time consuming.
- The cost of training interviewers is high.
- People may not be open in giving the information we need.
c. Telephone interview: contacting the desired people by
telephone.
Drawbacks: - Respondents may not be available to take telephone calls.
- Personal questions may not be answered.
d. Written questionnaires: written questionnaires are mailed to
individuals. This method is the most widely used because:
a) A large number of individuals may be contacted within a very short
period of time (i.e. it takes less time).
b) It reduces cost.
2. Secondary Data: When an investigator uses data which have already been collected
by others, such data are called secondary data. Such data are primary data for the
agency that collected them, and become secondary for someone else who uses them
for his or her own purposes. Data gathered or compiled from published and unpublished
sources or files are known as secondary data.
• When the source is secondary data, check:
o The type and objective of the situation.
o That the purpose for which the data were collected is well matched with the
present problem.
o That the nature and classification of the data are appropriate to our problem.
o That there are no biases or misreporting in the published data.
Note: Data which are primary for one purpose may be secondary for another.
2.1.2. Methods of Data Collection
A data collection instrument is a document used for gathering and recording data in a
survey. The questionnaire is the main data collection instrument in formal sample surveys.
Data Gathering Techniques
The objective of the survey, the nature of the items of information, the operational feasibility
and the cost will often determine the method of data collection. Of the various methods of
collecting data, just a few are outlined below.
1. Self-administered Questionnaire
The mail or self-administered questionnaire is a method of data collection in which researchers
give questionnaires with instructions directly to respondents, or mail them to respondents,
who read the instructions and questions, record their answers and then hand the questionnaire
back or return it by mail to the data collecting agency.
Advantages of this method
• It is the cheapest method and can be conducted by a single researcher.
• The researcher can send questionnaires to a wide geographical area.
Disadvantages
• A mail questionnaire is not suitable for an illiterate community.
• Researchers usually cannot observe the respondents’ reactions to questions.
• A low response rate is the biggest problem.
2. Direct investigation: measurement (observation) and interviewing (face-to-face,
telephone)
I. Measurements or observations
This includes all methods, from simple observation to the use of high-level machines and
sophisticated equipment or facilities, such as radiographic and X-ray machines,
microscopes, clinical examinations, etc.
II. Interviewing (face-to-face, telephone)
A face-to-face interview is the process in which the interviewer meets the respondents,
explains the purpose of the study, asks a set of questions and records the answers.
Advantages of this method
• It has the highest response rate and permits the longest questionnaires.
• Respondents are likely to answer all the questions themselves.
Disadvantages
• The cost is high: the training, travel, supervision, and personnel costs for interviewers.
• Interviewer bias is also high in this method.
• The appearance, tone of voice, question wording, and so forth of the interviewer may
affect the respondent.
The main advantages of the telephone interview are:
• Lower cost and faster completion, with a relatively high response rate.
• There may be less interviewer bias and less social desirability bias than with personal
interviews.
• It permits the survey to reach people who would not open their doors to an interviewer
but who might be willing to talk on the telephone.
The main disadvantages of the telephone interview are:
• There is less opportunity for establishing rapport with the respondent than in a
face-to-face situation.
• Households without telephones and those with unlisted numbers are automatically
excluded.
3. Extraction of data from records
It is usually possible to answer some of the questions a survey is intended to cover from
available data. For example, a mass of information about the populations studied by social
surveys is available in historical documents, statistical reports, records of institutions and
other sources.
2.2. Methods of Data Presentation
- Having collected and edited the data, the next important step is to organize it, that is,
to present it in a readily comprehensible, condensed form that helps in drawing
inferences from it.
- The presentation of data is broadly classified into the following two categories:
• Tabular Presentation
• Diagrammatic and Graphic Presentation
Definitions:
• Raw Data: Recorded information in its original collected form, whether counts
or measurements, is referred to as raw data.
• Frequency: the number of values in a specific class of the distribution.
• Frequency Distribution: the organization of raw data in table form using classes and
frequencies.
- There are three basic types of frequency distributions:
1. Categorical Frequency Distributions
2. Ungrouped Frequency Distributions
3. Grouped Frequency Distributions
- There are specific procedures for constructing each type of frequency distribution.
- Tables: the systematic arrangement of statistical data in columns and rows.
• When a single variable is used for classification, the table is called
a one-way table.
• When two variables are used for classification, the table is called
a two-way or contingency table.
• When more than two variables are used for classification, the table is called
a higher-order table.
1. Categorical Frequency Distributions:
- Used for data that can be placed in specific categories, such as nominal or ordinal data.
Example: Twenty-five army inductees were given a blood test to determine their blood type.
The data set is as follows.
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.

Solution:
Since the data are categorical, discrete classes can be used. There are four blood types,
namely, A, B, O, and AB. These types will be used as the classes for the distribution. We
follow the procedure below to construct the frequency distribution.
Step-1: Make a table as shown.
A          B          C             D
(Class)    (Tally)    (Frequency)   (Percent)
A
B
AB
O
Step-2: Tally the data and place the results in column B.
Step-3: Count the Tallies and place the result in column C.
Step-4: Find the percentage of values in each class by using the formula
\[ \% = \frac{f}{n} \times 100\% \]
where f = frequency of the class and n = total number of values.
• Percentages are not normally part of a frequency distribution, but they can be added
since they are used in certain types of graphical presentation, such as pie graphs.
Step-5: Find the totals for columns C and D.
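Steps 2 to 5 can also be carried out programmatically. The following is a minimal Python sketch, assuming the 25 blood types from the example are typed in as a whitespace-separated string; it uses `collections.Counter` for the tallying, and the variable names are illustrative rather than part of the procedure.

```python
from collections import Counter

# The 25 blood types from the example, row by row.
blood_types = """
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
""".split()

n = len(blood_types)              # total number of values
frequency = Counter(blood_types)  # tallies each class (Steps 2 and 3)

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for blood_class in ["A", "B", "AB", "O"]:
    f = frequency[blood_class]
    percent = f * 100 / n         # Step 4: % = f / n * 100
    print(f"{blood_class:<6}{f:>10}{percent:>9.0f}")
print(f"{'Total':<6}{n:>10}{100:>9}")  # Step 5: column totals
```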
Combining all the steps we can construct the following frequency distribution.
Class    Tally         Frequency    Percent
A        /////             5           20
B        ///// //          7           28
AB       ////              4           16
O        ///// ////        9           36
Total                     25          100
2. Ungrouped Frequency Distributions:
- This is a distribution in which the exact values of the variable are listed together with
their numbers of occurrences (frequencies).
- Used for numerical data when the range of the data is small.
- Each class is only one unit in width and each individual value is presented separately;
hence the name ungrouped frequency distribution.
Constructing Ungrouped Frequency Distribution:
• First find the smallest and largest raw scores in the collected data.
• Arrange the data in order of magnitude and count the frequency of each value.
• To facilitate counting, one may include a column of tallies.
Example: The following data represent the marks of 20 students.
80 76 90 85 80 76 80 70 60 62
70 85 70 85 65 60 63 74 75 70
Construct an ungrouped frequency distribution.
Solution:
Step-1: Find the range: Range = Maximum Value − Minimum Value = 90 − 60 = 30.
Step-2: Make a table as shown below.
Step-3: Tally the data.
Step-4: Complete the frequency column.
Mark     Tally    Frequency    Percent
60       //           2          10.0
62       /            1           5.0
63       /            1           5.0
65       /            1           5.0
70       ////         4          20.0
74       /            1           5.0
75       /            1           5.0
76       //           2          10.0
80       ///          3          15.0
85       ///          3          15.0
90       /            1           5.0
Total                20         100.0
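The same ungrouped frequency distribution can be reproduced in a few lines of Python. This is a sketch only, using `collections.Counter` on the 20 marks given in the example; the names `marks` and `table` are illustrative.

```python
from collections import Counter

# Marks of the 20 students from the example.
marks = [80, 76, 90, 85, 80, 76, 80, 70, 60, 62,
         70, 85, 70, 85, 65, 60, 63, 74, 75, 70]

n = len(marks)
frequency = Counter(marks)

# One row per distinct mark, in order of magnitude: (mark, frequency, percent).
table = [(mark, frequency[mark], frequency[mark] * 100 / n)
         for mark in sorted(frequency)]

print(f"{'Mark':<6}{'Frequency':>10}{'Percent':>9}")
for mark, f, percent in table:
    print(f"{mark:<6}{f:>10}{percent:>9.1f}")
print(f"{'Total':<6}{n:>10}{100.0:>9.1f}")
```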
3. Grouped Frequency Distributions:
• When the range of the data is large, the data must be grouped into classes that are
more than one unit in width.
Definitions:
• Grouped Frequency Distribution: a frequency distribution in which several values are
grouped into one class.
• Class limits: separate one class in a grouped frequency distribution from another. The
limits can actually appear in the data, and there are gaps between the upper limit of one
class and the lower limit of the next class.
• Unit of measurement (U): the distance between two possible consecutive measurements. It is
usually taken as 1, 0.1, 0.01, 0.001, ...
• Class boundaries: separate one class in a grouped frequency distribution from another.
The boundaries have one more decimal place than the raw data and therefore do not
appear in the data. There is no gap between the upper boundary of one class and the lower
boundary of the next class. The lower class boundary is found by subtracting U/2 from
the corresponding lower class limit, and the upper class boundary is found by adding U/2
to the corresponding upper class limit.
• Class width: the difference between the upper and lower class boundaries of any class. It is
also the difference between the lower (or upper) limits of any two consecutive classes, or the
difference between any two consecutive class marks.
• Class mark (midpoint): the average of the lower and upper class limits, or equivalently the
average of the lower and upper class boundaries.
• Cumulative frequency: the number of observations less than or equal to (or greater than or
equal to) a specific value.
• Cumulative frequency above (more than type): the total frequency of all values
greater than or equal to the lower class boundary of a given class.
• Cumulative frequency below (less than type): the total frequency of all values less
than or equal to the upper class boundary of a given class.
• Cumulative Frequency Distribution (CFD): the tabular arrangement of class intervals
together with their corresponding cumulative frequencies. It can be of the more than or the
less than type, depending on the type of cumulative frequency used.
• Relative frequency (rf): the frequency divided by the total frequency.
• Relative cumulative frequency (rcf): the cumulative frequency divided by the total
frequency.
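The cumulative and relative frequencies defined above are simple running sums and proportions. The Python sketch below illustrates them on the ungrouped marks distribution from the earlier example (11 distinct marks, 20 students), treating each mark as a class of width one; the variable names are illustrative.

```python
from itertools import accumulate

# Distinct marks and their frequencies from the 20-student example.
marks = [60, 62, 63, 65, 70, 74, 75, 76, 80, 85, 90]
frequencies = [2, 1, 1, 1, 4, 1, 1, 2, 3, 3, 1]
total = sum(frequencies)

# Less than type: running total from the smallest value upward.
less_than_cf = list(accumulate(frequencies))
# More than type: running total from the largest value downward.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

relative = [f / total for f in frequencies]          # rf
relative_cf = [cf / total for cf in less_than_cf]    # rcf (less than type)

for row in zip(marks, frequencies, less_than_cf, more_than_cf, relative, relative_cf):
    print(row)
```

For instance, the less than type cumulative frequency for a mark of 70 is 9, meaning nine students scored 70 or below.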
Guidelines for classes:
We must decide how many classes to use and the width of each class. For the construction of
a frequency distribution, the rules are as follows:
1. There should be between 5 and 20 classes. We rarely use fewer than 5 or more than 20
classes. The exact number we use depends on the number of observations we have.
2. The classes must be mutually exclusive. This means that no data value can fall into
two different classes.
3. The classes must be all inclusive or exhaustive. This means that all data values must
be included.
4. The classes must be continuous. There are no gaps in a frequency distribution. Classes
that have no values in them must be included (unless it is the first or last class, which is
dropped).
5. The classes must be equal in width. The exception here is the first or last class. It is
possible to have a "below ..." or "... and above" class. This is often used with ages.
Steps for Constructing Grouped Frequency Distribution
1. Find the largest and smallest values

2. Compute the Range = largest value – smallest value
3. Select the number of classes desired. This is usually between 5 and 20. Alternatively, use
Sturges' formula: $k = 1 + 3.322 \log_{10}(n)$
where: k = the number of classes desired;
n = the total number of observations in the given data
4. Find the class width by dividing the range by the number of classes and rounding up,
not off: $W = \frac{R}{k} = \frac{L - S}{k}$, where L = largest value and S = smallest value
5. Pick a suitable starting point less than or equal to the minimum value. The starting
point is the lower limit of the first class. Continue to add the class width to this lower
limit to get the rest of the lower limits. The starting point plus the number of classes
times the class width must be greater than the maximum value.
6. To find the upper limit of the first class, subtract U (one unit of measurement) from the
lower limit of the second class. Then continue to add the class width to this upper limit to
find the rest of the upper limits.
7. Find the boundaries by subtracting U/2 from the lower limits and adding U/2
to the upper limits. The boundaries are also half-way between the upper limit
of one class and the lower limit of the next class.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it
may not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies.
Example*: The blood glucose level for 50 patients is shown below. Construct a frequency
distribution for the following data.
44 50 79 63 66 54 56 70 56 63
60 87 60 70 59 60 62 88 71 53
56 65 74 80 51 83 69 77 69 50
58 42 43 85 43 75 55 60 58 49
72 67 55 77 48 45 61 47 44 61
Solution:
Step 1: Find the highest and the lowest value H=88, L=42
Step 2: Find the range; R=H-L=88-42=46.
Step 3: Select the number of classes desired using Sturges' formula:
k = 1 + 3.322 log10(50) = 6.64 ≈ 7 (rounding up)
Step 4: Find the class width: w = R/k = 46/7 = 6.57 ≈ 7 (rounding up)
Step 5: Select the starting observation as lowest class limit (this is usually the lowest
observation). Add the width to that observation to get the lower limit of the next class. Keep
adding until there are 7 classes.
 42, 49, 56, 63, 70, 77, 84 are the lower class limits.
Step 6: Find the upper class limits; e.g. the first upper class limit = 49 - U = 49 - 1 = 48
 48, 55, 62, 69, 76, 83, 90 are the upper class limits.
So combining step 5 and step 6, one can construct the following classes.
Class limits
42-48
49-55
56-62
63-69
70-76
77-83
84-90
Step 7: Find the class boundaries by subtracting U/2 = 0.5 from each lower class limit (LCL)
and adding U/2 = 0.5 to each upper class limit (UCL):
$LCB_i = LCL_i - U/2$ and $UCB_i = UCL_i + U/2$
Example: For class 1, $LCB_1 = 42 - 0.5 = 41.5$ and $UCB_1 = 48 + 0.5 = 48.5$
• Then continue adding W to both boundaries to obtain the remaining boundaries. Doing
so gives the following classes.
Class boundary
41.5 – 48.5
48.5 – 55.5
55.5 – 62.5
62.5 – 69.5
69.5 – 76.5
76.5 – 83.5
83.5 – 90.5
Step 8: Tally the data.
Step 9: Write the numeric values for the tallies in the frequency column.
Step 10: Find cumulative frequency.
Step 11: Find relative frequency and /or relative cumulative frequency.
The complete frequency distribution follows:
Class     Class          Class   Freq.   <CF   >CF   RF     <RCF   >RCF
limits    boundaries     mark
42-48     41.5 – 48.5    45      8       8     50    0.16   0.16   1.00
49-55     48.5 – 55.5    52      8       16    42    0.16   0.32   0.84
56-62     55.5 – 62.5    59      13      29    34    0.26   0.58   0.68
63-69     62.5 – 69.5    66      7       36    21    0.14   0.72   0.42
70-76     69.5 – 76.5    73      6       42    14    0.12   0.84   0.28
77-83     76.5 – 83.5    80      5       47    8     0.10   0.94   0.16
84-90     83.5 – 90.5    87      3       50    3     0.06   1.00   0.06
Total                            50                  1.00
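The eleven steps above can be sketched in Python. This is only a minimal sketch under stated assumptions: the function name `grouped_distribution`, the `unit` parameter (standing for U) and the dictionary keys are illustrative and do not come from these notes, and the first class is assumed to start at the smallest observation.

```python
import math
from itertools import accumulate

glucose = [
    44, 50, 79, 63, 66, 54, 56, 70, 56, 63,
    60, 87, 60, 70, 59, 60, 62, 88, 71, 53,
    56, 65, 74, 80, 51, 83, 69, 77, 69, 50,
    58, 42, 43, 85, 43, 75, 55, 60, 58, 49,
    72, 67, 55, 77, 48, 45, 61, 47, 44, 61,
]

def grouped_distribution(data, unit=1):
    """Build a grouped frequency table (unit is the measurement unit U)."""
    n = len(data)
    smallest, largest = min(data), max(data)
    # Step 3: Sturges' formula, rounded up to a whole number of classes
    k = math.ceil(1 + 3.322 * math.log10(n))
    # Step 4: class width = range / k, rounded up (not off)
    width = math.ceil((largest - smallest) / k / unit) * unit
    # Step 5 check: the last class must reach the maximum value
    if smallest + k * width - unit < largest:
        width += unit
    lower = [smallest + i * width for i in range(k)]   # lower class limits
    upper = [lo + width - unit for lo in lower]        # upper class limits
    freq = [sum(lo <= x <= up for x in data) for lo, up in zip(lower, upper)]
    less_cf = list(accumulate(freq))                   # less-than cumulative frequency
    more_cf = list(accumulate(freq[::-1]))[::-1]       # more-than cumulative frequency
    return [
        {
            "limits": (lower[i], upper[i]),
            "boundaries": (lower[i] - unit / 2, upper[i] + unit / 2),
            "mark": (lower[i] + upper[i]) / 2,
            "freq": freq[i],
            "less_cf": less_cf[i],
            "more_cf": more_cf[i],
            "rf": freq[i] / n,
        }
        for i in range(k)
    ]

table = grouped_distribution(glucose)
for row in table:
    print(row)
```

Dividing `less_cf` and `more_cf` by n gives the two relative cumulative frequency columns of the table.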
2.2.1 Diagrammatic and Graphic Presentation of Data

- These are techniques for presenting data in visual displays using geometric figures and pictures.
Importance:
 They have greater attraction
 They facilitate comparison
 They are easily understandable.
2.2.1.1 Diagrammatic Presentation of Data
- Diagrams are appropriate for presenting discrete as well as qualitative data.
- The three most commonly used diagrammatic presentations of data are:
 Pie charts
 Pictograms
 Bar charts
Pie Charts
-A pie chart is a circle that is partitioned into different sectors corresponding to the relative
frequency of the item of each category.
The angle of the sector is given by:
Angle of a sector = Rf × 360°, where Rf = relative frequency
Example: Draw a suitable diagram to represent the following Immunization status of
children.
Immunization Status Not immunized Partially immunized Fully immunized
Value 49 46 37
Solutions:
Step 1: Find the percentage.
Step 2: Find the angle of the sector for each class.
Step 3: Using a protractor and compass, draw each sector and label it with its name and
corresponding percentage.
Class                  Frequency   Percent   Degree
Not immunized          49          37        133.2
Partially immunized    46          35        126
Fully immunized        37          28        100.8
Sum                    132         100       360
Fig. Pie chart of the immunization status of children (Not immunized 37%, Partially immunized 35%, Fully immunized 28%)
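As a rough check of the sector angles, the rule "angle = relative frequency × 360°" can be applied directly to the counts. The helper name `sector_angles` is an illustrative assumption. Note that the table above rounds each percentage to a whole number before multiplying by 360, so the unrounded angles computed here differ slightly from the tabulated ones.

```python
status = {"Not immunized": 49, "Partially immunized": 46, "Fully immunized": 37}

def sector_angles(counts):
    """Return the pie-chart angle (in degrees) for each category."""
    total = sum(counts.values())
    # angle of a sector = relative frequency * 360
    return {name: count / total * 360 for name, count in counts.items()}

angles = sector_angles(status)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```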
2. Pictogram
-This is a diagrammatic representation of categorical data using small symbolic figures and
pictures to represent data.
-It can be drawn horizontally or vertically.
Note: pictograms (short for “picture diagrams”)
Example: Draw a pictogram to represent the following population of a town during the
years: 1989 to 1992.
Year 1989 1990 1991 1992
Population 2000 3000 5000 7000
3. Bar charts
 A set of bars (thick lines or narrow rectangles) used to represent and
compare the frequency distribution of discrete variables and attributes or
categorical series.
 In presenting data using bar diagram, all bars must have equal width and the
distance between bars must be equal.
 The height or length of each bar indicates the size (frequency) of the figure
represented.
 Bars can be drawn either horizontally or vertically.
 There are different types of bar charts. The most common are:
i. Simple bar chart
ii. Component bar chart (subdivided bar chart)
iii. Multiple bar chart
iv. Percentage bar chart
v. Broken bar chart
vi. Deviation or two way bar chart
i. Simple bar chart
 It is used to represent a single set of data (variable) classified into different
categories.
Example: Consider the immunization status of children
Fig. Immunization status of children (simple bar chart; horizontal axis: immunization status; vertical axis: number of children)
ii. Component Bar Chart

This is a diagram in which each bar represents the cumulative total of two or more categories
of data (variables), with one component of the bar for each category.
Examples: Consider data on immunization status of women by marital status
Marital status   Immunization status of women
                 Immunized   Non-immunized
Single           12          18
Married          24          21
Divorced         24          35
Widowed          14          16
Solution:
iii. Multiple bar chart

- It is a bar chart in which groups of bars are used to represent two or more interrelated
sets of data in each category.
Example: Consider data on immunization status of women by marital status
2.2.1.2 Graphical Presentation of data:

- The histogram, frequency polygon and cumulative frequency graph (ogive) are the most commonly
used graphical representations of continuous data.
Procedures for constructing statistical graphs:
 Draw and label the X and Y axes.
 Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y
axis.
 Represent the class boundaries for the histogram or ogive and the midpoints for the
frequency polygon on the X axis.
 Plot the points.
 Draw the bars or lines to connect the points.
Histogram
-A graph which places the class boundaries on the horizontal axis and the frequencies on the
vertical axis. Class marks and class limits are sometimes used as the quantity on the X axis.
-For each class in the distribution, a vertical rectangle is drawn with its base on the horizontal
axis, extending from one class boundary of the class to the other. There is
never any gap between the histogram rectangles.
- If all of the classes have equal width, then the histogram consists of a set of rectangles
having heights equal to the class frequencies and bases equal to the class width.
Example: Construct a histogram to represent the previous data (example *).
Fig. Histogram for blood glucose level in milligrams per deciliter, for 50 patients (horizontal axis: blood glucose level, class boundaries from 41.5 to 90.5; vertical axis: number of patients)
Frequency Polygon:
-It is a line graph of class frequencies on the vertical axis plotted against class marks on the
horizontal axis. It is customary to add the next higher and next lower class intervals, each with
a frequency of zero, to make it a complete polygon.
Remark: It can be obtained by connecting the midpoints of the tops of the rectangles in a
histogram.
Example: Consider example * and construct a frequency polygon
Fig: Frequency polygon for blood glucose level, in milligrams per deciliter, for 50 patients.
Cumulative frequency curve or Ogive
A graph showing the cumulative frequency (less than or more than type) plotted against upper
or lower class boundaries respectively. That is class boundaries are plotted along the
horizontal axis and the corresponding cumulative frequencies are plotted along the vertical
axis. The points are joined by a free hand curve.
To construct an ogive curve:
 Compute the cumulative frequency of the distribution.
 Prepare a graph with the cumulative frequency on the vertical axis and the true lower
class limits (class boundaries) of the interval scaled along the x-axis (horizontal axis).
- The true lower limit of the lowest class interval with lowest scores is included in the x-
axis scale. This is also the true lower limit of the next lower interval having a
cumulative frequency of 0.
Example: Consider example * and construct an ogive curve(less than type)
Fig: Ogive curve for blood glucose level of the 50 patients
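The points to plot for the less than type ogive can be listed as follows. This is a sketch only; the variable names are illustrative, and the boundaries and frequencies are taken from the frequency distribution of example *.

```python
from itertools import accumulate

boundaries = [41.5, 48.5, 55.5, 62.5, 69.5, 76.5, 83.5, 90.5]
freq = [8, 8, 13, 7, 6, 5, 3]

# The curve starts at (lowest boundary, 0); each later point pairs an
# upper class boundary with its less-than cumulative frequency.
less_than_points = list(zip(boundaries, [0] + list(accumulate(freq))))

for boundary, cf in less_than_points:
    print(boundary, cf)
```

Joining these points in order gives the less than type ogive; the last point reaches the total frequency of 50.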

Exercise: Consider example * and construct an ogive curve (more than type)
Exercises
1. The following data shows the number of experimental rats tested for their response to
a given drug in 30-day period. Construct a frequency distribution using appropriate
class size.
68 32 28 28 32 53 29
59 23 32 33 20 59 29
31 58 18 32 48 47 28
19 45 25 31 60 31 43
28 37
2. Construct a histogram, a frequency polygon, and a less than and/or more than type ogive for
the data in exercise 5.
3. In a certain frequency distribution having 50 observations, the smallest and the
highest observations are 27 and 57 respectively. The distribution has constant class
width with classes. Then find:
a. The class width
b. The class limits
c. All the class marks
d. The class boundaries
CHAPTER 3
MEASURES OF CENTRAL TENDENCY
INTRODUCTION
Measures of central tendency are measures of the location of the middle or the center value of
a distribution. The definition of "middle" or "center" is purposely left somewhat vague so that
the term "central tendency" can refer to a wide variety of measures. The tendency of
statistical data to get concentrated at certain values is called the “central tendency” and the
various methods of determining the actual value at which the data tend to concentrate are
called measures of central tendency or averages.
Properties of Measures of Central Tendency:
 It should be easy to understand and calculate
 It should be rigidly (well) defined, in the sense that it should have one and only one
interpretation so that the personal bias of the investigator does not affect the value or
its usefulness.
 It should be representative of the data
 It should be affected as little as possible by extreme observations.
 It should be capable of further algebraic treatment. For example, if we are given the
average of some groups, then we should be able to find the average of all the items
taken together.
 It should be affected as little as possible by fluctuations of sampling.
 It should be based on all observations under investigation.
The Summation Notation:
- Let $X_1, X_2, \ldots, X_n$ be a number of measurements, where n is the total number of
observations and $X_i$ is the i-th observation.
- The symbol $\sum_{i=1}^{n} X_i$ is used to denote the sum of all the $X_i$'s from i = 1 to i = n, i.e. by
definition:
$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$
- The symbol $\sum$ is the Greek capital letter sigma, denoting the sum.
- We shall denote the sum by $\sum X$, $\sum X_i$, or $\sum_i X_i$.
Properties of Summation:
1. $\sum_{i=1}^{n} C = nC$, where C is any constant number
2. $\sum_{i=1}^{n} C X_i = C \sum_{i=1}^{n} X_i$, where C is any constant number
3. $\sum_{i=1}^{n} (a + bX_i) = na + b\sum_{i=1}^{n} X_i$, where a and b are any constant numbers
4. $\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$
5. $\sum_{i=1}^{n} (X_i Y_i) \neq \sum_{i=1}^{n} X_i \cdot \sum_{i=1}^{n} Y_i$ in general
Examples:
a) $\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4$
b) $\sum_{i=1}^{4} 3X_i = 3(X_1 + X_2 + X_3 + X_4) = 3X_1 + 3X_2 + 3X_3 + 3X_4$
c) $\sum_{i=1}^{4} a = a + a + a + a = 4a$
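The summation properties can be checked numerically. The values of `X`, `Y`, `a`, `b` and `c` below are hypothetical and chosen only for illustration. The last two lines show that the sum of products is, in general, not the product of the sums.

```python
X = [2, 5, 3, 8]   # hypothetical measurements
Y = [1, 4, 6, 2]   # hypothetical measurements
a, b, c = 3, 2, 5  # arbitrary constants
n = len(X)

# Property 1: summing a constant n times gives n * c
assert sum(c for _ in X) == n * c
# Property 2: a constant factor can be taken outside the sum
assert sum(c * x for x in X) == c * sum(X)
# Property 3: sum of (a + b*X_i) equals n*a + b * sum of X_i
assert sum(a + b * x for x in X) == n * a + b * sum(X)
# Property 4: the sum of (X_i + Y_i) splits into two sums
assert sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)

# Sum of products versus product of sums
sum_of_products = sum(x * y for x, y in zip(X, Y))
product_of_sums = sum(X) * sum(Y)
print(sum_of_products, product_of_sums)
```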
Types of Measures of Central Tendency:

- The most commonly used averages are:
A. mean (Arithmetic, Geometric and Harmonic)
B. Mode
C. Median
- The choice of the averages depends on which best fits the property under discussion.
Arithmetic mean
Definition: The arithmetic mean is defined as the sum of all the values of the item divided by
the total number of items.
-The arithmetic mean of a sample (or simply the sample mean) of n observations
$X_1, X_2, \ldots, X_n$, denoted by $\bar{X}$, is computed as:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$
-The population mean, $\mu$ (mu), is defined as:
$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i = \frac{1}{N}(X_1 + X_2 + \cdots + X_N)$
 For grouped data, the arithmetic mean is computed as:

$\bar{X} = \frac{1}{n}\sum_{i=1}^{k} f_i X_i = \frac{1}{n}(f_1 X_1 + f_2 X_2 + \cdots + f_k X_k)$
where $X_i$ = the midpoint (class mark) of the i-th class,
$f_i$ = the frequency of the i-th class, $n = \sum_{i=1}^{k} f_i$, and
k = the number of classes.

Examples:
1. The ages in weeks of six kittens at an animal shelter are 3, 8, 5, 12, 14 and 12. Find
the arithmetic mean.
$\bar{X} = \frac{1}{6}\sum_{i=1}^{6} X_i = \frac{1}{6}(3 + 8 + 5 + 12 + 14 + 12) = \frac{54}{6} = 9$
2. The table below shows blood pressure levels of 50 first–year male medical
students. Calculate the arithmetic mean
Class limits   Frequency (fi)   Class mark (Xi)   fi Xi

42-48          8                45                360
49-55          8                52                416
56-62          13               59                767
63-69          7                66                462
70-76          6                73                438
77-83          5                80                400
84-90          3                87                261
Solution:
 First find the class marks
 Find the product of frequency and class marks
 Find the mean using the formula
$\bar{X} = \frac{1}{n}\sum_{i=1}^{k} f_i X_i = \frac{1}{50}(360 + 416 + \cdots + 261) = \frac{3104}{50} = 62.08$
Properties of Arithmetic mean:

o The sum of deviations of a set of items from their arithmetic mean is always 0, i.e.
$\sum_{i=1}^{k} f_i (X_i - \bar{X}) = 0$.
o Uniqueness: For a given set of data there is one and only one arithmetic mean.
o The sum of squares of deviations from the arithmetic mean is less than that
computed from any other point. Symbolically,
$\sum_{i=1}^{k} f_i (X_i - \bar{X})^2 < \sum_{i=1}^{k} f_i (X_i - A)^2$, where $A \neq \bar{X}$
o If $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ are the arithmetic means of $n_1, n_2, \ldots, n_k$ observations respectively, then

the combined mean is given by:
$\bar{X}_C = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{\sum_{i=1}^{k} n_i} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + \cdots + n_k \bar{X}_k}{n_1 + n_2 + \cdots + n_k}$
Example: In a class of 60 students, who have taken an exam, 50 are male with an average
mark of 45 and the average mark of females was 60. Find the average mark obtained by the
entire class.
Solutions:
Males: $n_m = 50$, $\bar{X}_m = 45$
Females: $n_f = 60 - 50 = 10$, $\bar{X}_f = 60$
$\bar{X}_C = \frac{n_m \bar{X}_m + n_f \bar{X}_f}{n_m + n_f} = \frac{50 \times 45 + 10 \times 60}{50 + 10} = \frac{2850}{60} = 47.5$
Therefore, the average mark of the entire class is 47.5.
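The combined mean is a size-weighted average of the group means, which can be sketched as follows (the function name `combined_mean` is illustrative):

```python
def combined_mean(sizes, means):
    """Combined mean of several groups from their sizes and individual means."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# 50 males with mean 45 and 10 females with mean 60
class_average = combined_mean([50, 10], [45, 60])
print(class_average)
```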

o If a wrong value has been used when calculating the arithmetic mean, then the correct
mean can be obtained without repeating the whole process using:
Correct $\bar{X}$ = incorrect $\bar{X}$ + $\frac{X_{corr} - X_{incorr}}{n}$
where: $X_{corr}$ = sum of correct items
$X_{incorr}$ = sum of incorrect items
n = the total number of observations

Example: The average mark of 100 students was found to be 53. Later it was found that a
mark of 90 had been misread as 60. Find the correct mean.
Solution:
Incorrect $\bar{X}$ = 53, n = 100, $X_{corr} = 90$, $X_{incorr} = 60$
Correct $\bar{X}$ = incorrect $\bar{X}$ + $\frac{1}{n}(X_{corr} - X_{incorr})$
= $53 + \frac{1}{100}(90 - 60) = 53 + 0.3 = 53.3$
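The correction rule can be sketched as a small function. The name `corrected_mean` and its parameter names are illustrative assumptions, not notation from these notes.

```python
def corrected_mean(incorrect_mean, n, correct_sum, incorrect_sum):
    """Adjust a mean computed with wrong values without recomputing the whole sum."""
    return incorrect_mean + (correct_sum - incorrect_sum) / n

# A mark of 90 was misread as 60 among 100 students with a reported mean of 53
fixed = corrected_mean(53, 100, 90, 60)
print(round(fixed, 2))
```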
o The effect of transforming the original series on the mean:
a. If a constant k is added to (or subtracted from) every observation, then the new mean
will be the old mean plus (or minus) k, respectively.
b. If every observation is multiplied by a constant k, then the new mean will be k times the
old mean.
Example-1: The mean of n variables $X_1, X_2, \ldots, X_n$ is known to be 12. A new set of
variables is obtained by the linear transformation $Y_i = 2X_i - 0.5$. What will be the
mean of the new set of variables?
Solution: $\bar{Y} = 2\bar{X} - 0.5 = 2 \times 12 - 0.5 = 23.5$.
Example-2: The mean of a set of numbers is 500.
a. If 10 is added to each of the numbers in the set, what will be the mean
of the new set?
b. If each of the numbers in the set is multiplied by -5, what will be the
mean of the new set?
Solution:
a. $\bar{X}_{new} = \bar{X}_{old} + 10 = 500 + 10 = 510$
b. $\bar{X}_{new} = -5 \times \bar{X}_{old} = -5 \times 500 = -2500$
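Both transformation rules follow from the third property of summation. A short derivation, where a and b denote arbitrary constants (symbols borrowed from the summation properties rather than from the examples):

```latex
\bar{Y} = \frac{1}{n}\sum_{i=1}^{n}\left(a + bX_i\right)
        = \frac{1}{n}\left(na + b\sum_{i=1}^{n}X_i\right)
        = a + b\bar{X}
```

Taking a = -0.5 and b = 2 gives Example-1, a = 10 and b = 1 gives part a of Example-2, and a = 0 and b = -5 gives part b.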

Weighted mean:
Let $X_1, X_2, \ldots, X_n$ be the values of the items of a series and $W_1, W_2, \ldots, W_n$ their
corresponding weights; then the weighted mean, denoted by $\bar{X}_W$, is given by:
$\bar{X}_W = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i}$
Example: A student obtained the following percentages in an examination: Statistics 60,
Biology 75, Mathematics 63, Physics 59, and Chemistry 55. Find the student's weighted
arithmetic mean if weights 1, 2, 1, 3, 3 respectively are allotted to the subjects.
Solution:
$\bar{X}_W = \frac{\sum_{i=1}^{5} X_i W_i}{\sum_{i=1}^{5} W_i} = \frac{60 \times 1 + 75 \times 2 + 63 \times 1 + 59 \times 3 + 55 \times 3}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$
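The weighted mean calculation can be sketched in Python (the function name `weighted_mean` is illustrative):

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum of X_i * W_i divided by the sum of W_i."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Statistics, Biology, Mathematics, Physics, Chemistry
marks = [60, 75, 63, 59, 55]
weights = [1, 2, 1, 3, 3]
student_mean = weighted_mean(marks, weights)
print(student_mean)
```

With all weights equal, the weighted mean reduces to the ordinary arithmetic mean.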
Advantage of Arithmetic Mean:

 It is the most widely used and most commonly understood of all averages.
 It is based on all observation
 The mean is used in computing other statistics, such as the variance.
 It is useful for performing statistical procedures such as comparing the means
from several data sets.
 It is rigidly defined.
 It is easy to calculate and simple to understand.
Disadvantages of Arithmetic Mean:
 Arithmetic mean cannot be calculated if an extreme class is open-ended, for
example, below 5 or above 50.
 It is affected by extreme values that are not representative of the data.
Example: For the data 12, 10, 8, 7, 10 and 16, the mean is 10.5. If we replace the largest
value, 16, by 54, the resulting mean is 16.83, which is not representative of the data.
 It is meaningless for nominal or qualitatively classified data.
Geometric Mean (G.M)
Definition: If all the given observations $X_1, X_2, \ldots, X_n$ are positive, their geometric mean is
simply the nth root of their product. Like the arithmetic mean, it also depends on all
observations. That is,
$GM = \left(\prod_{i=1}^{n} X_i\right)^{1/n} = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}$
- $\prod$ represents the product of the X values
 The geometric mean gives a better measure of central tendency than other means if the
values are measured as ratios, proportions or percentages.
 There is one great drawback with it, that it cannot be calculated if any one or more values
are zero or negative.
- In practice, GM can be computed by taking logarithms of the X's, that is,
$\log GM = \log\left(\prod_{i=1}^{n} X_i\right)^{1/n} = \frac{1}{n}\sum_{i=1}^{n} \log(X_i)$
In the case of a frequency distribution where each $X_i$ occurs $f_i$ times (i = 1, 2, . . ., k):

$\log GM = \frac{1}{n}\sum_{i=1}^{k} f_i \log_{10} X_i$, where $n = \sum_{i=1}^{k} f_i$ and $X_i$ = the mid-values of the
class intervals. Then, taking the antilog of both sides, we obtain GM.
Note: The geometric mean is less affected by extreme values than is the arithmetic mean and
is useful as a measure of central tendency for some positively skewed distributions.
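The logarithmic route to the geometric mean can be sketched as below. The function name `geometric_mean` and the sample values are hypothetical. Natural logarithms are used instead of base-10 logarithms, which gives the same result once the antilog is taken.

```python
import math

def geometric_mean(values, freqs=None):
    """Geometric mean via logarithms; freqs is optional for a frequency distribution."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean requires strictly positive values")
    n = sum(freqs)
    log_gm = sum(f * math.log(x) for x, f in zip(values, freqs)) / n
    return math.exp(log_gm)  # antilog

gm = geometric_mean([2, 4, 8])          # cube root of 64
gm_freq = geometric_mean([2, 8], [3, 1])  # same as the GM of 2, 2, 2, 8
print(gm, gm_freq)
```

Python's standard library also provides `statistics.geometric_mean` (Python 3.8 and later) for ungrouped data.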
Harmonic Mean (H.M)
H.M is the inverse of the arithmetic mean of the reciprocals of the observations of a set. It is
a suitable measure of central tendency when the data pertain to speeds, rates, and times.
Let $X_1, X_2, \ldots, X_n$ be n variate values in a set; then the harmonic mean is given as
$H = \frac{1}{\frac{1}{n}\sum_{i=1}^{n} \frac{1}{X_i}}$
If the data are arranged in the form of a frequency distribution in which an observation $X_i$ has
frequency $f_i$ (i = 1, 2, . . ., k), the harmonic mean is given by
$H = \frac{1}{\frac{1}{n}\sum_{i=1}^{k} \frac{f_i}{X_i}}$, where $n = \sum_{i=1}^{k} f_i$.
-It fulfils almost all the properties of a good measure of central tendency, except that it
cannot be calculated when any observation is zero. Its main advantage is that it gives more
weight to small values and less weight to large values.
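A minimal sketch of the harmonic mean follows. The function name `harmonic_mean` and the speeds used are hypothetical: a vehicle covers the same distance once at 40 km/h and once at 60 km/h, so the average speed over the whole trip is the harmonic mean of the two speeds.

```python
def harmonic_mean(values, freqs=None):
    """Harmonic mean; freqs is optional for a frequency distribution."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when an observation is zero")
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))

average_speed = harmonic_mean([40, 60])  # hypothetical speeds in km/h
print(round(average_speed, 2))
```

The result (about 48 km/h) is below the arithmetic mean of 50 km/h because more time is spent travelling at the lower speed.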
Example 3.6:
1) A man travels from A.A to Awasa by car and takes four hours to cover the whole
distance. In the first hour he maintains a speed of 50 km/h, in the second hour his speed
is 64 km/h, in the third 80 km/h, and in the fourth hour he travels at a speed of
55 km/h. Find the average speed of the motorist.
2) The price of a commodity increased by 5%, 8% and 77% in three consecutive years. What is
the average yearly price increase?
3) The arithmetic mean of two numbers is 13 and their geometric mean is 12. Find
a) The numbers
b) H.M
4) Prove the following theorems:
a) If x1 and x2 are two observed values, the geometric mean of their arithmetic mean and
harmonic mean is equal to the geometric mean of the numbers x1 and x2.
b) If A, G, and H stand for A.M, G.M and H.M respectively, the relation $A \geq G \geq H$
holds.
Mode ( X̂ ):
Definition: It is the value that occurs with the highest frequency among all
the observations in a sample. The mode may not exist, and even if it does exist, it may not be
unique.
Unimodal: a distribution having one mode.
Bimodal: a distribution with two modes.
Multimodal: a distribution with more than two modes.
-For individual series:
Mode = the value with the highest frequency.
Example: The modal age of the age distribution: 23, 28, 28, 31, 32, 34, 37, 42, 50, and 61 is
28, since it occurred twice while the other values occurred only once.
-For a grouped frequency distribution, the mode of the distribution is calculated by the
formula
$\hat{X} = L_{mod} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w$
where: $L_{mod}$ = lower class boundary of the modal class
$\Delta_1$ = difference between the frequency of the modal class and that of the pre-modal class
$\Delta_2$ = difference between the frequency of the modal class and that of the post-modal class
w = length of the interval of the modal class.

 Modal class: is class having the highest frequency in the distribution.
Example: Calculate the mode for grouped frequency distribution of blood pressure levels.
Class limits Freq.
42-48 8
49-55 8
56-62 13
63-69 7
70-76 6
77-83 5
84-90 3
Total 50
Solutions:
- Identify the modal class: the modal class is a class having the highest frequency in
the distribution.  56-62 is a modal class.
- Find the mode using the formula.
$\hat{X} = L_{mod} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w$
$= 55.5 + \left(\frac{13 - 8}{(13 - 8) + (13 - 7)}\right) \times 7$
$= 55.5 + 3.18$
$= 58.68$
Advantages of mode:
o It can be calculated for distribution with open end class.
o It is not affected by extreme values.
o We can change the size of the observations without changing the mode
o Easy to calculate and simple to understand.
o It can be used when the data is nominal such as gender, religious preference, or
political affiliation
Disadvantage of mode:
o It is not based on all values
o The mode is not always unique that is a data set can have more than one mode.
o The mode doesn’t always exist for a data set.
Median
Definition: It is the center value of ordered data. It is denoted by $\tilde{X}$.
- Before one can find the median, the data must be arranged in order. Then,
i. When the number of observations is odd, the median is the middle
value.
ii. When the number of observations is even, the median is the arithmetic
mean of the two middle values.
 Suppose there are n observations in a sample. If these observations are ordered from
the smallest to the largest, then the median ($\tilde{X}$) is:
$\tilde{X}$ = the $\left(\frac{n+1}{2}\right)$th value, if n is odd.
$\tilde{X}$ = the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th values, if n is even.
 For a grouped frequency distribution, the median of the distribution is calculated by

the formula:
$\tilde{X} = L_{Med} + \frac{w}{f_{Med}}\left(\frac{n}{2} - C\right)$
where:
$L_{Med}$ = lower class boundary of the median class
C = cumulative frequency preceding the median class

$f_{Med}$ = the frequency of the median class
w = width of the median class

n = sum of frequencies
 Median class: is the first class whose cumulative frequency is at least n/2.
Properties of the median:
- It is used when one must find the center or middle value of a data set.
- Uniqueness: There is only one median for a given set of data.
- Unlike the mean it is not affected by extreme values. Therefore, when there are
extreme values it is advisable to use the median instead of the mean, especially in
application.
Example-1: Consider the following data, which consists of white blood counts (in
thousands) taken on admission of all patients entering a small hospital on a given day:
7, 35, 5, 9, 8, 3, 10, 12, and 8.
Solution:
First order the sample as follows: 3, 5, 7, 8, 8, 9, 10, 12, and 35.
Since n is odd (n = 9), the median is given by the ((9 + 1)/2)th = 5th value, which is equal to
8.
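The two cases of the definition (odd and even n) can be sketched directly. The function name `median_value` is illustrative, and the even-sized list in the last line is a hypothetical illustration.

```python
def median_value(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the ((n + 1) / 2)-th value
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of the two middle values

white_blood_counts = [7, 35, 5, 9, 8, 3, 10, 12, 8]
wbc_median = median_value(white_blood_counts)
even_median = median_value([1, 2, 3, 4])  # hypothetical even-sized sample
print(wbc_median, even_median)
```

The extreme value 35 does not affect the result, which illustrates why the median is preferred when outliers are present.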
Example-2: Consider the grouped frequency distribution of blood pressure levels of
50 first-year male medical students. Calculate the median
Solution:
- First find the less than cumulative frequency
- Identify the median class: median class is the first class whose cumulative
frequency is at least n/2 = 50/2=25.  56-62 is a median class.
- Find the median using the formula.
Class limits   Freq.   <CF
42-48          8       8
49-55          8       16
56-62          13      29
63-69          7       36
70-76          6       42
77-83          5       47
84-90          3       50
Total          50
$\tilde{X} = L_{Med} + \frac{w}{f_{Med}}\left(\frac{n}{2} - C\right)$
$= 55.5 + \frac{7}{13}(25 - 16)$
$= 60.35$
Remark:
i. For nominal data (such as sex or race), the mode is the only valid measure.
ii. For ordinal data (such as salary categories), only the mode and median can be
used.
Measures of Relative Position (Quantiles)

Quantiles are the values that divide a set of numerical data arranged in increasing order into
a number of equal parts. Quartiles divide the numerical data arranged in increasing order into
four equal parts of 25% each. Thus there are 3 quartiles Q1, Q2 and Q3 respectively. Deciles
are values which divide the arranged data into ten equal parts of 10% each. Thus we have 9
deciles which divide the data in ten equal parts. Percentiles are the values that divide the
arranged data into hundred equal parts of 1% each. Thus there are 99 percentiles. The 50th
percentile, 5th decile and 2nd quartile are equal to median.
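 Quartiles divide a given set of data into four equal parts. For a grouped frequency
distribution, the kth quartile (k = 1, 2, 3) is
$Q_k = L + \frac{w}{f}\left(\frac{kn}{4} - C\right)$
where: L = the lower class boundary of the kth quartile class
w = the class width
f = the frequency of the kth quartile class
C = the cumulative frequency preceding the kth quartile class
n = total number of observations
Remark: The kth quartile class (the class containing $Q_k$) is the class with the smallest cumulative
frequency (less than type) greater than or equal to kn/4.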
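As a sketch, quartiles, deciles and percentiles of grouped data can all be computed with one function. This assumes that the interpolation used for the grouped median carries over with n/2 replaced by kn/4, kn/10 or kn/100; the function name `grouped_quantile` and the `parts` parameter are illustrative, and the frequency distribution is the one used in the earlier examples.

```python
from itertools import accumulate

def grouped_quantile(lower_boundaries, freqs, width, k, parts):
    """k-th quantile when the data are divided into `parts` equal parts (4, 10 or 100)."""
    n = sum(freqs)
    target = k * n / parts
    cum = list(accumulate(freqs))  # less-than cumulative frequencies
    # Quantile class: first class whose cumulative frequency reaches the target
    idx = next(i for i, c in enumerate(cum) if c >= target)
    before = cum[idx - 1] if idx > 0 else 0
    return lower_boundaries[idx] + (target - before) * width / freqs[idx]

lower_boundaries = [41.5, 48.5, 55.5, 62.5, 69.5, 76.5, 83.5]
freqs = [8, 8, 13, 7, 6, 5, 3]

q1 = grouped_quantile(lower_boundaries, freqs, 7, 1, 4)
q2 = grouped_quantile(lower_boundaries, freqs, 7, 2, 4)
q3 = grouped_quantile(lower_boundaries, freqs, 7, 3, 4)
d5 = grouped_quantile(lower_boundaries, freqs, 7, 5, 10)
p50 = grouped_quantile(lower_boundaries, freqs, 7, 50, 100)
print(q1, round(q2, 2), q3)
```

Under this assumption the second quartile, fifth decile and 50th percentile coincide with the grouped median.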
Note that:
 Decile divides a given set of data in to ten equal parts
 Percentiles divide a given set of data into a hundred equal parts
Note: