Statistics Term Paper

Introduction
Bangladesh is one of the most densely populated countries in the world, where most people live
below the poverty line. Driven by extreme poverty, people migrate to urban areas in search of
better socio-economic conditions. These migrants mostly find work in informal sectors such as
petty retail trade, transport, manufacturing, construction and domestic services. Among these
occupations, small-scale shopkeeping absorbs a significant number of migrants.
This article briefly discusses the socioeconomic and environmental conditions of the Vegetable
Sellers in Dhaka City, Bangladesh. These Vegetable Sellers are a middle-class and vulnerable
group of people who have migrated from various regions of Bangladesh where they had no source of
income. They are generally landless, floating people displaced by recurring natural disasters such
as drought, flood, river bank erosion and cyclones, but they are capable of hard work. Their
profession is often neglected, and they survive through hardship without any direct government
support, although the constitutional obligation of Bangladesh is to support the basic needs of all
people: food, shelter, clothing, education and medical facilities. Population growth, natural
disasters linked to global warming and climate change, food crises, unemployment and similar
pressures are serious threats to the survival of their businesses and their dependents, and they
lead to more vulnerable and inhumane socioeconomic and environmental conditions.
The current global programs on Sustainable Development (SD, initiated in 1992 at the Earth
Summit) and the Clean Development Mechanism (CDM) under the Kyoto Protocol (KP, signed in 1997)
prioritize poverty alleviation through the creation of alternative sources of income for the poor,
so that natural resources are not overexploited. In this respect, small-scale shopkeeping is an
important, environment-friendly profession that deserves attention on humanitarian grounds.
Controlling the degradation of socioeconomic and environmental conditions is essential for the
people of Bangladesh, a densely populated least developed country (LDC) in Asia.
Objectives of the Study
General Objectives
The general objective of this study is to identify factors affecting the average daily family
income and purchasing capacity, and factors nominally affecting the investment capacity, of
Vegetable Seller owners at Bashundhara residential area & Kuril in Dhaka City.
Specific Objectives
The specific objectives of this report are:

To understand the factors affecting the average daily family income of Vegetable Seller owners
at Bashundhara residential area & Kuril in Dhaka City.
To understand their investment capacity.
To understand their purchasing patterns.
To understand their overall socio-economic status.
Limitations of the Study
The study was limited by a number of factors:

Firstly, the research was limited to the Bashundhara residential area & Kuril in Dhaka City.
Secondly, the sample size was very small (only 20) to present the proposed scenario.
Thirdly, time constraints narrowed the outcomes.
Fourthly, the knowledge constraints of the researcher were another limitation of this study.
Finally, owners were unwilling to answer the personal questions in the questionnaire.
Research Methodology
PRIMARY RESEARCH
Primary research was carried out to gain a thorough understanding of the factors that might
contribute to the socio-economic condition of Vegetable Sellers in Bangladesh. Information was
collected through primary quantitative research, supplemented by secondary research. The
questionnaire was provided to us by our honorable faculty member.
SECONDARY RESEARCH
Since the Vegetable Sellers of Bangladesh have never been the subject of extensive research, there
is a lack of literature directly related to them. Secondary study therefore contributes little to
the core of this research. Before any analysis of this unexplored area could be made, the
available secondary data on small shops had to be taken into consideration. Reports, books and
websites were studied thoroughly, and newspaper articles from New Age and The Daily Star were also
collected and studied. From this wide range of sources we gained valuable insights, which helped
us set the parameters for our primary research.
Sample Plan
SAMPLE DESIGN
Target Population
The target population of the study is the Vegetable Sellers at Bashundhara residential area &
Kuril in Dhaka City.
Sample Frame
The sample frame selected is as follows:

Vegetable Sellers at Bashundhara residential area and Kuril.
Sample Element
1. Vegetable Sellers at Bashundhara residential area & Kuril in Dhaka City.
2. Sample Size: 20
3. Confidence Level: 95%
4. Allowable Error: 5%
5. Sampling Method: Convenience sampling
6. Data Processing: Data were processed using Microsoft Excel.
Statistical Analysis
About Statistics
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting
numerical data to assist in making more effective decisions. Statistical techniques are used
extensively in marketing, accounting and quality control, and by consumers, professional sports
people, hospital administrators, educators, politicians, physicians, etc. There are two types of
statistics:
Descriptive Statistics: Methods of organizing, summarizing, and presenting data in an
informative way.
Inferential Statistics: Methods of reaching a decision, estimate, prediction, or generalization
about a population, based on a sample.
Types of variables
The types of variables are as follows:
Qualitative or attribute variable - the characteristic being studied is nonnumeric.
Quantitative variable - information is reported numerically.
Quantitative variables can be classified as either discrete or continuous.
1. Discrete variable: can only assume certain values, and there are usually gaps between values.
2. Continuous variable: can assume any value within a specified range.
Level of Measurement
The four levels of measurement are as follows:
Nominal level: data that are classified into categories and cannot be arranged in any particular
order. Examples: eye color, gender, religious affiliation.
Ordinal level: involves data arranged in some order, but the differences between data values
cannot be determined or are meaningless. Example: During a taste test of 4 soft drinks, Mellow
Yellow was ranked number 1, Sprite number 2, Seven-up number 3, and Orange Crush number 4.
Interval level: similar to the ordinal level, with the additional property that meaningful
amounts of difference between data values can be determined. There is no natural zero point.
Example: Temperature on the Fahrenheit scale.
Ratio level: the interval level with an inherent zero starting point. Differences and ratios are
meaningful for this level of measurement. Examples: Monthly income of surgeons, or distance
traveled by manufacturers' representatives per month.
The data collected through the primary research are of interval level and ratio level.
Graphical presentation of data

Bar Chart
A bar chart or bar graph presents grouped data with rectangular bars whose lengths are
proportional to the values they represent. The bars can be plotted vertically or horizontally; a
vertical bar chart is sometimes called a column chart. One axis of the chart shows the categories
being compared, and the other axis represents a discrete value. Some bar graphs present bars
clustered in groups of more than one (grouped bar graphs), and others show the bars divided into
subparts to show cumulative effect (stacked bar graphs).
From the collected data, the monthly income of the 20 vegetable sellers at Bashundhara residential
area & Kuril has been taken and the following bar chart is produced.
[Figure: Bar chart "Seller's Monthly Income". Horizontal axis: Seller; vertical axis: Monthly
Income, with ticks from 10000 to 50000.]
Pie Chart
A pie chart (or circle chart) is a circular statistical graphic that is divided into slices to
illustrate numerical proportion. In a pie chart, the arc length of each slice is proportional to
the quantity it represents. While it is named for its resemblance to a pie which has been sliced,
there are variations on the way it can be presented.
From the collected data, the following pie chart is produced using the information on the
sellers' profiles before this business.
Seller's Profile                   Number of owners   Percentage
Owner of a different business             6               30%
Employee in the same sector               7               35%
Employee in a different sector            4               20%
Unemployed                                3               15%
Total                                    20              100%
[Figure: Pie chart "Seller's Profile". Slices: Owner of a different business (6), Employee in the
same sector (7), Employee in a different sector (4), Unemployed (3).]
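The percentages in the seller profile table, and the slice angles a pie chart would use for them, can be reproduced from the counts alone. The following is a minimal sketch in Python; the variable names are ours and are not part of the original Excel workbook.

```python
# Counts copied from the seller profile table above.
profile_counts = {
    "Owner of a different business": 6,
    "Employee in the same sector": 7,
    "Employee in a different sector": 4,
    "Unemployed": 3,
}

total = sum(profile_counts.values())

# Share of each category; a pie slice spans share * 360 degrees.
shares = {label: count / total for label, count in profile_counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.0%} of owners, slice of {share * 360:.0f} degrees")
```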
Frequency Distribution & Relative Class Frequencies
Frequency Distribution: A frequency distribution is a grouping of data into mutually exclusive
categories showing the number of observations in each class. A frequency table is a grouping of
qualitative data into mutually exclusive classes showing the number of observations in each class.
The table contains some elements, such as:
Class midpoint: A point that divides a class into two equal parts. This is the average of the
upper and lower class limits.
Class frequency: The number of observations in each class.
Class interval: The class interval is obtained by subtracting the lower limit of a class from the
lower limit of the next class.
Relative Class Frequencies

Class frequencies can be converted to relative class frequencies to show the fraction of the
total number of observations in each class.
A relative frequency captures the relationship between a class total and the total number of
observations.
The following is an example of a frequency table with relative class frequencies for the monthly
income of the surveyed Vegetable Seller owners, built from the collected data:
Class          Frequency, f   Cumulative   Relative    Midpoint, M       fM
                              Frequency    Frequency
0-10000             3              3          .15          5000         15000
10000-20000         4              7          .20         15000         60000
20000-30000         4             11          .20         25000        100000
30000-40000         7             18          .35         35000        245000
40000-50000         2             20          .10         45000         90000
Total             n=20                                            ΣfM = 510,000
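The columns of this table can be derived from the class limits and frequencies alone. The sketch below, written in Python rather than the Excel workbook used for the study, rebuilds the cumulative frequency, relative frequency, midpoint and fM columns; the variable names are ours.

```python
# Class limits and frequencies copied from the frequency table above.
classes = [(0, 10000), (10000, 20000), (20000, 30000), (30000, 40000), (40000, 50000)]
freqs = [3, 4, 4, 7, 2]

n = sum(freqs)
rows = []
cumulative = 0
for (lower, upper), f in zip(classes, freqs):
    cumulative += f
    midpoint = (lower + upper) / 2  # average of the lower and upper class limits
    rows.append({
        "class": f"{lower}-{upper}",
        "f": f,
        "cum_f": cumulative,
        "rel_f": f / n,  # fraction of all observations in this class
        "M": midpoint,
        "fM": f * midpoint,
    })

total_fM = sum(row["fM"] for row in rows)
grouped_mean = total_fM / n  # mean of grouped data = sum of fM over n

for row in rows:
    print(row)
print("n =", n, "| sum of fM =", total_fM, "| grouped mean =", grouped_mean)
```

The last line gives the same sum of fM and the same mean that are used in the calculations of central tendency.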
Graphical presentation of Frequency distribution

Histogram
A histogram for a frequency distribution based on quantitative data is very similar to the bar
chart showing the distribution of qualitative data. The classes are marked on the horizontal axis
and the class frequencies on the vertical axis. The class frequencies are represented by the
heights of the bars.
The following is an example of a histogram using the data on the monthly income of the surveyed
Vegetable Seller owners at Bashundhara residential area & Kuril.
[Figure: Histogram of seller's monthly income. Horizontal axis: Seller's monthly income; vertical
axis: Frequency, scaled from 0 to 8.]
Frequency Polygon
Here a frequency polygon is used to determine which class holds the highest share of the
observations. Our sample has 20 vegetable vendors distributed over five class intervals. As the
class frequencies differ considerably, the frequencies have been converted to relative
frequencies.
[Figure: Frequency polygon of seller's monthly income. Horizontal axis: Class Midpoint (5000,
15000, 25000, 35000, 45000); vertical axis: Relative Frequency, scaled from 0 to 0.4.]
From this chart, it can be estimated that 35% of the vegetable vendors have a monthly profit of
30000-40000 Tk.
Mean, Median, Mode

Arithmetic Mean or Mean
The arithmetic mean is the most widely used measure of location. It requires at least
interval-scale data. Its major characteristics are:
Every set of interval-level and ratio-level data has a mean.
All the values are included in computing the mean.
A set of data has a unique mean.
The mean is affected by unusually large or small data values.
The arithmetic mean is the only measure of central tendency where the sum
of the deviations of each value from the mean is zero.
Using the grouped data on the monthly income of the surveyed Vegetable Seller owners at
Bashundhara residential area & Kuril, the arithmetic mean is the sum of fM divided by n:
Mean = 510000/20 = 25500.
Median
The Median is the midpoint of the values after they have been ordered from the smallest to the
largest.
There are as many values above the median as below it in the data array.
For an even set of values, the median will be the arithmetic average of the two middle numbers.
Properties of the Median are as follows:

a) There is a unique median for each data set.
b) It is not affected by extremely large or small values and is therefore a valuable measure of
central tendency when such values occur.
c) It can be computed for ratio-level, interval-level, and ordinal-level data.
d) It can be computed for an open-ended frequency distribution if the median does not lie in an
open-ended class.
The median for the grouped data is calculated as follows:
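$$\text{Median} = L_m + \frac{n/2 - F_{pm}}{f_m} \times c$$
Where,
$L_m$ = The lower boundary of the median class
$n$ = The total frequency
$F_{pm}$ = The cumulative frequency before the median class
$f_m$ = The frequency of the median class
$c$ = The class width
Here n/2 = 10, so the median class is 20000-30000, the first class whose cumulative frequency
(11) reaches 10. Thus $L_m$ = 20000, $F_{pm}$ = 7, $f_m$ = 4 and $c$ = 10,000.
$$\text{Median} = 20000 + \frac{20/2 - 7}{4} \times 10{,}000 = 27500$$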

Mode
The mode is the value of the observation that appears most frequently.
The mode for the grouped data is calculated as follows:
$$\text{Mode} = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$$

Where,
$L_1$ = The lower boundary of the modal class
$f_1$ = The frequency of the modal class
$f_0$ = The frequency of the class before the modal class
$f_2$ = The frequency of the class after the modal class
$c$ = The class width.
$$\text{Mode} = 30000 + \frac{7 - 4}{2(7) - 4 - 2} \times 10000 = 33750$$
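The grouped-data mode calculation can be checked with a few lines of Python. This is a minimal sketch under the assumption that the modal class is simply the class with the highest frequency and that all classes share one width; the function name `grouped_mode` is ours.

```python
def grouped_mode(lower_bounds, freqs, width):
    """Mode of grouped data: L1 + (f1 - f0) / (2*f1 - f0 - f2) * c."""
    # Index of the modal class (the class with the highest frequency).
    i = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # class before the modal class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # class after the modal class
    return lower_bounds[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width


# Lower class boundaries and frequencies from the monthly income table.
lower_bounds = [0, 10000, 20000, 30000, 40000]
freqs = [3, 4, 4, 7, 2]

mode = grouped_mode(lower_bounds, freqs, 10000)
print(mode)
```

With the survey table, the modal class is 30000-40000, and the function returns the same value as the hand calculation.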
Measures of dispersion
Range
Range = 50000 - 0 = 50000
Standard Deviation
$$S = \sqrt{\frac{\sum f(M - \bar{x})^2}{n - 1}} = \sqrt{\frac{3095000000}{20 - 1}} = 12{,}763.02$$
Coefficient of Variation
$$CV = \frac{S}{\bar{x}} \times 100 = \frac{12{,}763.02}{25500} \times 100 = 50.05\%$$
This large value of the standard deviation indicates that the observations are widely scattered
around the mean, so the mean is not a reliable measure of location in this case.
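The standard deviation and coefficient of variation can be recomputed from the class midpoints and frequencies. The sketch below assumes the grouped-data sample formula, with the sum of f(M - mean)² divided by n - 1; the variable names are ours.

```python
import math

# Class midpoints and frequencies from the monthly income table.
midpoints = [5000, 15000, 25000, 35000, 45000]
freqs = [3, 4, 4, 7, 2]

n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Sum of squared deviations of the midpoints from the mean, weighted by frequency.
sum_sq = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))

sd = math.sqrt(sum_sq / (n - 1))  # sample standard deviation for grouped data
cv = sd / mean * 100              # coefficient of variation, in percent
value_range = 50000 - 0           # upper limit of last class minus lower limit of first

print(sum_sq, round(sd, 2), round(cv, 2), value_range)
```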
Inferential Statistics
Inferential statistics is the set of methods used to determine something about a population on the basis
of a sample. We use inferential statistics to infer from the sample data what the population might look
like, and to judge whether an observed difference between groups is dependable or might have happened by
chance in this study. Thus, it is used to reach conclusions that extend beyond the immediate data. The
following examples apply inferential statistics to our collected data:
Table A: Contingency table showing the number of vegetable vendors whose home district is in Comilla
or Chadpur division and whether their families live in or outside Dhaka.

Family lives in Dhaka,   Comilla,   Chadpur,   Total
Event Ai                    B1         B2
Yes, A1                      5          4         9
No, A2                       3          3         6
Total                        8          7        15
We can determine the probability of randomly selecting a vegetable vendor whose family lives in
Dhaka and whose home district belongs to Chadpur division from this contingency table using the
rules of addition and multiplication.
Here two events occur at the same time: the vendor is from Chadpur and has family living in
Dhaka.
The probability that event A1 will happen is P(A1) = 9/15.
The conditional probability that event B2 will happen given A1 is P(B2|A1) = 4/9.
Using the general rule of multiplication, P(A1 and B2) = P(A1) P(B2|A1) = 9/15 × 4/9 = 4/15 = 0.27.
So, the probability of selecting a vegetable vendor whose family lives in Dhaka and whose home
district belongs to Chadpur division is 0.27.
We use the general rule of addition to find the probability of selecting a vegetable vendor whose
family lives in Dhaka or whose home district belongs to Chadpur division.
The probability that event B2 will happen is P(B2) = 7/15.
The joint probability that both events A1 and B2 will happen is P(A1 and B2) = 4/15.
P(A1 or B2) = P(A1) + P(B2) - P(A1 and B2) = 9/15 + 7/15 - 4/15 = 12/15 = 0.80.
So, the probability of selecting a vegetable vendor whose family lives in Dhaka or whose home
district belongs to Chadpur division is 0.80.
Bayes Theorem
By using Bayes' Theorem, we can determine the probability that a vegetable vendor's family lives in
Dhaka given that his/her home district is in Chadpur division. A1 and A2 are two mutually exclusive and
collectively exhaustive events.
The prior probabilities are:
P(A1) = 9/15, the probability that the family lives in Dhaka.
P(A2) = 6/15, the probability that the family lives outside Dhaka.
The conditional probabilities are:
P(B2|A1) = 4/9, the probability that a vegetable vendor whose family lives in Dhaka is from Chadpur
division.
P(B2|A2) = 3/6, the probability that a vegetable vendor whose family lives outside Dhaka is from Chadpur
division.
Using Bayes' theorem,
$$P(A_1|B_2) = \frac{P(A_1) P(B_2|A_1)}{P(A_1) P(B_2|A_1) + P(A_2) P(B_2|A_2)} = \frac{4/15}{7/15} = \frac{4}{7}$$
It means that if a vegetable vendor is selected at random from the above sample of 15 people, the
probability that his/her family lives in Dhaka is 9/15 or 0.6. If the person's home district is in Chadpur
division, the probability that his/her family actually lives in Dhaka becomes 4/7 or 0.57.
The conditional probability table showing the data is given below:

Family in Dhaka,   Prior          Conditional    Joint           Posterior
Event Ai           Probability,   Probability,   Probability,    Probability,
                   P(Ai)          P(B2|Ai)       P(Ai and B2)    P(Ai|B2)
Yes, A1            9/15           4/9            4/15            4/7
No, A2             6/15           3/6            3/15            3/7
Total                                            7/15            1
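Each column of the conditional probability table follows mechanically from the four cell counts of Table A. As a minimal sketch, the Python below uses exact fractions so that the results can be compared with the table directly; the dictionary layout and names are our own choice.

```python
from fractions import Fraction

# Cell counts from Table A: (family-in-Dhaka event, home-division event) -> vendors.
counts = {("A1", "B1"): 5, ("A1", "B2"): 4, ("A2", "B1"): 3, ("A2", "B2"): 3}
total = sum(counts.values())
events = ("A1", "A2")

# Row totals: number of vendors for each Ai.
row_total = {a: sum(v for (ai, _), v in counts.items() if ai == a) for a in events}

prior = {a: Fraction(row_total[a], total) for a in events}                   # P(Ai)
conditional = {a: Fraction(counts[(a, "B2")], row_total[a]) for a in events}  # P(B2|Ai)
joint = {a: prior[a] * conditional[a] for a in events}                        # P(Ai and B2)
p_b2 = sum(joint.values())                                                    # P(B2)
posterior = {a: joint[a] / p_b2 for a in events}                              # P(Ai|B2)

for a in events:
    print(a, prior[a], conditional[a], joint[a], posterior[a])
```

Fractions are printed in lowest terms, so 9/15 appears as 3/5 and 3/6 as 1/2.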
One-Sample Tests of Hypothesis
The mean of the vendors' monthly income data is 26025. We can test the null hypothesis that the
population mean is 26025. The alternate hypothesis is that the mean is not 26025. These two
hypotheses are written:
$H_0: \mu = 26025$
$H_1: \mu \neq 26025$
$$t = \frac{26025 - 25500}{11513.74 / \sqrt{20}} = 0.20392 < t_{\alpha/2,\, n-1} = 2.093$$
We have failed to reject the null hypothesis. That means we do not have evidence in our dataset to
disprove the null hypothesis.
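The test statistic above can be recomputed as a difference of two means divided by the standard error s/√n. The sketch below enters the two means in the order they appear in the calculation; for a two-tailed test only the absolute value of t matters, so the order does not affect the decision. The critical value 2.093 is taken from the text rather than computed, and the function name is ours.

```python
import math


def one_sample_t(mean_a, mean_b, sample_sd, n):
    """t statistic: difference between two means over the standard error sd / sqrt(n)."""
    standard_error = sample_sd / math.sqrt(n)
    return (mean_a - mean_b) / standard_error


t_stat = one_sample_t(26025, 25500, 11513.74, 20)
t_critical = 2.093  # two-tailed critical value with n - 1 = 19 df, as quoted in the text

# Reject the null hypothesis only if |t| exceeds the critical value.
reject_null = abs(t_stat) > t_critical
print(round(t_stat, 5), reject_null)
```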
Correlation Analysis
Correlation Analysis is the study of the relationship between variables. It is also defined as group
of techniques to measure the association between two variables.
The Coefficient of Correlation (r) is a measure of the strength of the relationship between two
variables.
It requires interval or ratio-scaled data.
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect and strong correlation.
Values close to 0.0 indicate weak correlation.
Negative values indicate an inverse relationship and positive values indicate a direct relationship.
The correlation coefficient is calculated using the following equation, where X is yesterday's sales
and Y is monthly income (profit):
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, s_x s_y}$$
From the surveyed data, the correlation coefficient of monthly income (profit) and yesterday's sales
is calculated using the above equation.
$s_x = 6904.232$; $s_y = 11973.078$
$r = 0.9198$
The coefficient is positive, so there is a direct relationship between monthly income (profit) and
yesterday's sales. The value is 0.9198, so we conclude that the association is strong.
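The raw survey observations are not reproduced in this report, so the coefficient itself cannot be recomputed here. As an illustration only, the sketch below applies the standard sample formula, the sum of cross-deviations divided by (n - 1) times the two sample standard deviations, to a small set of hypothetical sales and income figures. The data and the function name `pearson_r` are assumptions, not survey values.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation: sum((x - mx)(y - my)) / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


# Hypothetical figures for five vendors (NOT the survey data).
yesterday_sales = [4000, 8000, 12000, 16000, 20000]
monthly_income = [15000, 20000, 28000, 33000, 41000]

r = pearson_r(yesterday_sales, monthly_income)
print(round(r, 4))
```

A value near +1 indicates a strong direct relationship, a value near -1 a strong inverse one.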
Regression Analysis
In regression analysis we use the independent variable (X) to estimate the dependent variable (Y).
The relationship between the variables is linear.
Both variables must be at least interval scale.
The least squares criterion is used to determine the equation.
Regression Equation: An equation that expresses the linear relationship between two variables.
Least Squares Principle: Determining a regression equation by minimizing the sum of the
squares of the vertical distances between the actual Y values and the predicted values of Y.
Using the data on monthly income (Y) and yesterday's sales (X), the following linear regression
equation is produced.
Here, the slope of the regression line is
$$b = r \frac{s_y}{s_x} = 0.9198 \times \frac{11973.08}{6904.232} = 1.5951$$
and the Y-intercept is
$$a = \bar{Y} - b\bar{X} = 26025 - 1.5951 \times 11200 = 8159.88$$
The regression equation is
$$\hat{y} = a + bX$$
$$\hat{y} = 8159.88 + 1.5951X$$
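The slope and intercept can be reproduced from the summary statistics quoted in this section. In the sketch below the slope is rounded to four decimals before the intercept is computed, which is an assumption made to mirror the hand calculation; Excel's unrounded fit gives a slightly different intercept. The variable and function names are ours.

```python
# Summary statistics quoted in the text.
r = 0.9198        # correlation between yesterday's sales (X) and monthly income (Y)
s_x = 6904.232    # standard deviation of X
s_y = 11973.08    # standard deviation of Y
mean_x = 11200    # mean of X
mean_y = 26025    # mean of Y

b = round(r * s_y / s_x, 4)  # slope, rounded as in the hand calculation
a = mean_y - b * mean_x      # intercept


def predict(x):
    """Estimated monthly income for a given value of yesterday's sales."""
    return a + b * x


print(b, round(a, 2), round(predict(20000), 2))
```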
[Figure: Scatter plot "Regression Equation" with linear trend line. Horizontal axis: Yesterday
Sales (0 to 30000); vertical axis: Monthly Income, Y (0 to 60000). Fitted line:
f(x) = 1.6x + 8160.28, R² = 0.85.]
This trend line shows that there appears to be a positive relationship between monthly income
and yesterday's sales. The strength and direction of this relationship are measured by the
coefficient of correlation (r). Using Microsoft Excel in this case, we get r = 0.9198. As it is
positive, there is a direct relationship between monthly profit and yesterday's total sales. The
value of 0.9198 is fairly close to 1.00, so the association is strong. The coefficient of
determination is r² = (0.9198)² = 0.846. It means 84.6% of the variation in monthly profit is
accounted for by the variation in yesterday's sales.
The Standard Error of Estimate

$$s_{y \cdot x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$
The standard error of estimate measures the scatter, or dispersion, of the observed values around
the line of regression.
$$s_{y \cdot x} = \sqrt{\frac{419427439.9}{20 - 2}} = 4827.165$$
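The standard error of estimate depends only on the sum of squared residuals and the sample size. The sketch below takes the sum of squared residuals quoted above as given, since the individual observations are not listed in this report; the helper names are ours.

```python
import math


def sum_squared_errors(ys, y_hats):
    """Sum of squared vertical distances between observed and predicted Y."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))


def standard_error_of_estimate(sse, n):
    """s_y.x = sqrt(SSE / (n - 2)); two degrees of freedom are used by a and b."""
    return math.sqrt(sse / (n - 2))


# Sum of squared residuals as quoted in the text, with n = 20 vendors.
s_yx = standard_error_of_estimate(419427439.9, 20)
print(round(s_yx, 3))
```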
Confidence and Prediction Interval

To determine a confidence interval for all vegetable vendors whose total daily sales are 20,000
Tk., and a prediction interval for Md. Bahar Bhuiya, a vegetable vendor from Bashundhara
residential area with the same daily sales, we perform the following calculations:
$$\hat{y} = 8159.88 + (1.5951 \times 20000) = 40{,}061.88 \text{ Tk.}$$
From the t table, the t statistic for the 95% confidence level is 2.101.
Standard error = 4827.165, n = 20
$$\text{Confidence interval} = 40{,}061.88 \pm \left(2.101 \times \frac{4827.165}{\sqrt{20}}\right) = 40{,}061.88 \pm 2267.792$$
Thus the 95% confidence interval for all vegetable vendors whose total daily sales are 20000 Tk.
is from 37794.088 up to 42329.672.
$$\text{Prediction interval} = 40{,}061.88 \pm (2.101 \times 4827.165) = 40{,}061.88 \pm 10141.87$$
Thus the prediction interval for Md. Bahar Bhuiya is from 29920.01 up to 50203.75.
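The two intervals can be recomputed from the regression equation, the standard error of estimate and the tabulated t value. The sketch below reproduces the simplified margins used in this report, t·s/√n for the confidence interval and t·s for the prediction interval. The full textbook intervals contain an additional term that depends on how far X lies from the mean of X, which would require the raw data. The variable names are ours.

```python
import math

a, b = 8159.88, 1.5951  # regression coefficients from the text
s_yx = 4827.165         # standard error of estimate
n = 20
t_value = 2.101         # t for 95% confidence with n - 2 = 18 df, from the text

x = 20000
y_hat = a + b * x       # estimated monthly income at daily sales of 20,000 Tk.

# Simplified margins as used in the report (no (X - mean of X) term).
ci_margin = t_value * s_yx / math.sqrt(n)
pi_margin = t_value * s_yx

confidence_interval = (y_hat - ci_margin, y_hat + ci_margin)
prediction_interval = (y_hat - pi_margin, y_hat + pi_margin)

print(round(y_hat, 2))
print([round(v, 3) for v in confidence_interval])
print([round(v, 2) for v in prediction_interval])
```

The prediction interval is wider because it describes a single vendor rather than the average of all vendors with the same sales.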
Conclusion
Most of the vendors whose monthly income exceeds 20,000 Tk. did not want to give me their full
information; some of them suspected that I was working with the government and would force them
to pay tax. A terrorism-related incident had recently happened in our area, and in one case the
shop owner suspected that I was working with the RAB. These are the problems I encountered in my
survey, but I tried my best to collect as much data as possible.
In this report, it is evident that descriptive statistics can only be used to describe the group
that is being studied. The results cannot be generalized to any larger group. Inferential
statistics, on the other hand, starts with a sample and then generalizes to a population. This
information about a population is not stated as a single number. Instead, the parameters are
expressed as a range of potential values, along with a degree of confidence. For this to work,
however, it is imperative that the sample is representative of the group to which it is being
generalized. To address this issue of generalization, tests of significance tell us the
probability that the results of the analysis could have occurred by chance when there is no
relationship at all between the variables we studied in the population. In the light of this
knowledge, it can be said that the issues discussed in this report, such as daily income, profit,
distribution of vendors according to administrative divisions, and literacy of female family
members, show the picture of our sample only. They do not generalize to the situation of
vegetable vendors across Bangladesh.
References
Lind, D. A., Marchal, W. G., and Wathen, S. A., 2005, Statistical Techniques in Business &
Economics, 12th edition, McGraw-Hill Irwin, New York.