Bangladesh is one of the most densely populated countries in the world, and most of its people
live below the poverty line. Driven by extreme poverty, people migrate to urban areas in
search of better socio-economic conditions. These migrants mostly find jobs in different
informal sectors such as petty retail trade, transport, manufacturing, construction and domestic
services. Among all these occupations, shop pulling absorbs a significant number of migrated
people.
Objectives of the Study
General Objectives
The general objective of this study is to identify some factors affecting the average daily family
income and purchase capacity, and some factors nominally affecting the investment capacity, of
Vegetable Seller owners at Bashundhara residential area & Kuril in Dhaka City.
Specific Objectives
Secondly, the sample size was too small (only 20) to represent the proposed scenario.
Fourthly, the knowledge constraint of the researcher was another limitation for this study.
Finally, owners were not willing to answer the personal questions in the questionnaire.
Research Methodology
PRIMARY RESEARCH
Primary research was done to gain an exhaustive understanding of the factors that
might contribute to the socio-economic condition of vegetable sellers in Bangladesh. Thorough
information was collected through primary quantitative research and further secondary research. One set of
questionnaires was provided to us by our honorable faculty member.
SECONDARY RESEARCH
Since the vegetable sellers of Bangladesh have never been the subject of extensive research, there is
a lack of literature directly related to them. Secondary study therefore contributes little to
the core of this research. Before any analysis of this
unexplored area could be made, the available secondary data regarding small shops had to be
taken into consideration. Reports, books and websites were thoroughly studied. Newspaper articles
from New Age and The Daily Star were also collected and studied. From this wide range of
sources, we were able to gain valuable insights. This information helped us set the
parameters for our primary research.
Sample Plan
SAMPLE DESIGN
Target Population
The target population of the study is the vegetable sellers at Bashundhara residential area &
Kuril in Dhaka City.
Sample Frame
Sample Element
Statistical Analysis
About Statistics
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting
numerical data to assist in making more effective decisions. Statistical techniques are used
extensively by marketing, accounting, quality control, consumers, professional sports people,
hospital administrators, educators, politicians, physicians, etc. There are two types of statistics.
They are:
Descriptive Statistics: Methods of organizing, summarizing, and presenting data in an
informative way.
Inferential Statistics: A decision, estimate, prediction, or generalization about a population,
based on a sample.
Types of variables
The types of variables are as follows:
1. Discrete variables: can only assume certain values and there are usually gaps between
values.
2. Continuous variables: can assume any value within a specified range.
Level of Measurement
The four levels of measurement are as follows:
Nominal level: data that is classified into categories and cannot be arranged in any particular
order. Examples: eye color, gender, religious affiliation.
Ordinal level: involves data arranged in some order, but the differences between data values
cannot be determined or are meaningless. Example: During a taste test of 4 soft drinks, Mellow
Yellow was ranked number 1, Sprite number 2, Seven-up number 3, and Orange Crush number
4.
Interval level: similar to the ordinal level, with the additional property that meaningful
amounts of differences between data values can be determined. There is no natural zero point.
Example: Temperature on the Fahrenheit scale.
Ratio level: the interval level with an inherent zero starting point. Differences and ratios are
meaningful for this level of measurement. Examples: monthly income of surgeons, or distance
traveled by manufacturers' representatives per month.
The data collected through the primary research are of interval level and ratio level.
[Bar chart: Seller's Monthly Income; x-axis: Seller, y-axis: Monthly Income]
From the collected data, the monthly income of 20 vegetable sellers at Bashundhara residential
area & Kuril has been taken and the following bar chart is produced.
Pie Chart
A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to
illustrate numerical proportion. In a pie chart, the arc length of each slice is proportional to the
quantity it represents. While it is named for its resemblance to a pie which has been sliced, there
are variations on the way it can be presented.
From the collected data, the following pie chart is produced using the information on the sellers'
profile before this business.
Sellers' Profile                    Number of owners    Percentage
Owner of a different business       6                   30%
Employee in the same sector         7                   35%
Unemployed                          3                   15%
Total                               20                  100%

[Pie chart: Seller's Profile]
Class midpoint: A point that divides a class into two equal parts. This is the average of the
upper and lower class limits.
Class frequency: The number of observations in each class.
Class interval: The class interval is obtained by subtracting the lower limit of a class from the
lower limit of the next class.
A histogram for a frequency distribution based on quantitative data is very similar to the bar chart
showing the distribution of qualitative data. The classes are marked on the horizontal axis and
the class frequencies on the vertical axis. The class frequencies are represented by the heights of
the bars.
The following is an example of a histogram using the data related to the monthly income of the
surveyed Vegetable Seller Owners at Bashundhara residential area & Kuril.
Frequency Polygon
Here a frequency polygon is used to determine which class has the highest percentage of frequencies. In
our sample we have 20 vegetable vendors, who are distributed across five class intervals. As the
difference between the class frequencies is quite large, the frequencies have been
converted to relative frequencies.
[Frequency polygon: Relative Frequency (y-axis) against Class Midpoint (x-axis); class midpoints 5000, 15000, 25000, 35000, 45000]
From this chart, it can be estimated that 35% of the vegetable vendors have a monthly profit of
30000-40000 Tk.
The arithmetic mean is the most widely used measure of location. It requires the interval scale.
Its major characteristics are:
Every set of interval-level and ratio-level data has a mean.
All the values are included in computing the mean.
A set of data has a unique mean.
The mean is affected by unusually large or small data values.
The arithmetic mean is the only measure of central tendency where the sum
of the deviations of each value from the mean is zero.
Using the data related to the monthly income of the surveyed Vegetable Seller Owners at
Bashundhara residential area & Kuril, the arithmetic mean is calculated as follows.
Mean = 510000/20 = 25500.
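The characteristics of the mean listed above can be illustrated with a short sketch. The five income values below are hypothetical (they are not the survey data) and were chosen only so that they average 25500; the script computes the mean and shows that the deviations from it sum to zero.

```python
# Hypothetical monthly incomes in Tk. (illustrative only, NOT the survey data).
incomes = [12000, 18000, 25000, 31000, 41500]

# Arithmetic mean: sum of all values divided by the number of values.
mean_income = sum(incomes) / len(incomes)

# Deviation of each value from the mean; these always sum to zero.
deviations = [x - mean_income for x in incomes]

# The reported survey mean uses the same formula: total income / sample size.
survey_mean = 510000 / 20

print("Mean of hypothetical data:", mean_income)
print("Sum of deviations:", sum(deviations))
print("Survey mean:", survey_mean)
```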
Median
The Median is the midpoint of the values after they have been ordered from the smallest to the
largest.
There are as many values above the median as below it in the data array.
For an even set of values, the median will be the arithmetic average of the two middle numbers.
$$\text{Median} = L_m + \left(\frac{n/2 - F_{pm}}{f_m}\right) \times c$$
Where,
Lm = The lower boundary of the median class
n = The total frequency
Fpm = The cumulative frequency before the median class
fm = The frequency of the median class
c = The class width
$$\text{Median} = 25000 + \left(\frac{20/2 - 11}{4}\right) \times 10{,}000 = 22500$$
Mode
The mode is the value of the observation that appears most frequently.
The mode for the grouped data is calculated as follows:
$$\text{Mode} = L_1 + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times c$$
Where,
L1 = The lower boundary of class mode
f 1 = the frequency of class mode and the
$$\text{Mode} = 30000 + \left(\frac{7 - 4}{2 \times 7 - 4 - 2}\right) \times 10000 = 33750$$
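The grouped-data formulas for the median and the mode can be checked with a small sketch. The function and parameter names below are illustrative assumptions; the numbers passed in are the ones substituted in the report's own calculations.

```python
def grouped_median(lower, n, cum_before, freq, width):
    """Median for grouped data: L_m + ((n/2 - F_pm) / f_m) * c."""
    return lower + ((n / 2 - cum_before) / freq) * width


def grouped_mode(lower, f1, f0, f2, width):
    """Mode for grouped data: L_1 + ((f1 - f0) / (2*f1 - f0 - f2)) * c."""
    return lower + ((f1 - f0) / (2 * f1 - f0 - f2)) * width


# Values as substituted in the report.
median_value = grouped_median(lower=25000, n=20, cum_before=11, freq=4, width=10000)
mode_value = grouped_mode(lower=30000, f1=7, f0=4, f2=2, width=10000)

print("Median:", median_value)
print("Mode:", mode_value)
```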
Measures of dispersion
Range
Standard Deviation
= 12,763.02
Coefficient of Variation
CV = (SD/Mean) X 100
= 50.05
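As a quick check of the coefficient of variation, the sketch below applies CV = (SD/Mean) × 100 to the standard deviation reported here and the mean of 25500 computed earlier. The variable names are illustrative.

```python
# Values reported in this study.
standard_deviation = 12763.02
mean_income = 25500

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = standard_deviation / mean_income * 100

print("CV (%):", round(cv, 2))
```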
This large value of standard deviation indicates that the observations are widely scattered around
the mean. So the mean is not a reliable measure of location in this case.
Inferential Statistics
Inferential statistics is the method used to determine something about a population on the basis of a
sample. We use inferential statistics to try to infer from the sample data about the population, that is, to make
judgments about the probability that an observed difference between groups is a dependable one or one that
might have happened by chance in this study. Thus, it is used to reach conclusions that extend beyond the
immediate data. The following applies different examples of inferential statistics to
our collected data:
Table A: Contingency table showing the number of vegetable vendors whose home district is in Comilla
or Chadpur division and whether their families live in or outside Dhaka.
We can determine the probability of randomly selecting a vegetable vendor whose family lives in
Dhaka and has home district belonging to Chadpur division from this contingency table using the
rules of addition and multiplication.
Here two events occur at the same time: the vendor has family living in Dhaka (event A1) and is
from Chadpur (event B2).
The probability that event A1 will happen is P(A1) = 9/15.
The conditional probability that event B2 will happen given A1 is P(B2|A1) = 4/9.
Using the general rule of multiplication, P(A1 and B2) = P(A1) P(B2|A1) = 9/15 × 4/9 = 4/15 = 0.27.
So, the probability of selecting a vegetable vendor whose family lives in Dhaka and has home
district belonging to Chadpur division is 0.27.
We use the general rule of addition to find the
probability of selecting a vegetable vendor whose family lives in Dhaka or has home district
belonging to Chadpur division. The probability that event B2 will happen is P(B2) = 7/15.
The joint probability that both events A1 and B2 will happen is P(A1 and B2) = 4/15.
P(A1 or B2) = P(A1) + P(B2) - P(A1 and B2) = 9/15 + 7/15 - 4/15 = 12/15 = 0.80. So, the probability of selecting a
vegetable vendor whose family lives in Dhaka or has home district belonging to Chadpur division is 0.80.
Bayes' Theorem
By using Bayes' Theorem, we can determine the probability of a vegetable vendor's family living in
Dhaka given that his/her home district is in Chadpur division. A1 (family lives in Dhaka) and A2 (family lives
outside Dhaka) are two mutually exclusive and collectively exhaustive events.
The prior probabilities are:
P(A1) = 9/15, the probability that the family lives in Dhaka.
P(A2) = 6/15, the probability that the family lives outside Dhaka.
The conditional probabilities are:
P(B2|A1) = 4/9, the probability that a vegetable vendor whose family lives in Dhaka is from Chadpur division.
P(B2|A2) = 3/6, the probability that a vegetable vendor whose family lives outside Dhaka is from Chadpur division.
Using Bayes' theorem,
$$P(A_1|B_2) = \frac{P(A_1)\,P(B_2|A_1)}{P(A_1)\,P(B_2|A_1) + P(A_2)\,P(B_2|A_2)} = \frac{(9/15)(4/9)}{(9/15)(4/9) + (6/15)(3/6)} = \frac{4}{7}$$
It means that if a vegetable vendor is selected at random from the above sample of 15 people, the probability
that his/her family lives in Dhaka is 9/15 or 0.6. If the person's home district is under Chadpur division,
the probability that his/her family actually lives in Dhaka becomes 4/7 or 0.57.
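The Bayes calculation can be reproduced with exact fractions. This is a minimal sketch: the variable names are illustrative, and the value used for P(B2|A1) is an assumption obtained by dividing the joint probability 4/15 stated earlier by the prior P(A1) = 9/15.

```python
from fractions import Fraction

# Prior probabilities (A1: family lives in Dhaka, A2: family lives outside Dhaka).
p_a1 = Fraction(9, 15)
p_a2 = 1 - p_a1  # the two events are mutually exclusive and collectively exhaustive

# Conditional probabilities of B2 (home district in Chadpur division).
# Assumption: P(B2|A1) is inferred as P(A1 and B2) / P(A1) = (4/15) / (9/15).
p_b2_given_a1 = Fraction(4, 15) / p_a1
p_b2_given_a2 = Fraction(3, 6)

# Bayes' theorem: P(A1|B2).
numerator = p_a1 * p_b2_given_a1
posterior = numerator / (numerator + p_a2 * p_b2_given_a2)

print("P(A1|B2) =", posterior, "=", round(float(posterior), 2))
```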
One-Sample Tests of Hypothesis
The mean of the vendors' monthly income data is 26025. We can test the null hypothesis that the
population mean is 26025. The alternate hypothesis is that the mean is not 26025. These two
hypotheses are written:
H0: μ = 26025
H1: μ ≠ 26025
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}, \quad \text{where } \frac{s}{\sqrt{n}} = \frac{11513.74}{\sqrt{20}}$$
We have failed to reject the null hypothesis. That means, we do not have evidence in our dataset to
disprove the null hypothesis.
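A minimal sketch of the one-sample test statistic follows. It assumes that 11513.74 is the sample standard deviation, that n = 20, and that the sample mean equals the hypothesized value of 26025 as stated above; the variable names are illustrative.

```python
import math

sample_mean = 26025        # mean of the sample, as stated in the report
hypothesized_mean = 26025  # value under the null hypothesis
sample_sd = 11513.74       # assumed to be the sample standard deviation
n = 20                     # sample size

# Standard error of the mean and the test statistic.
standard_error = sample_sd / math.sqrt(n)
t_statistic = (sample_mean - hypothesized_mean) / standard_error

print("Standard error:", round(standard_error, 2))
print("t statistic:", t_statistic)
```

Because the sample mean and the hypothesized mean coincide under these assumptions, the statistic is zero and cannot fall in any rejection region, which is consistent with failing to reject the null hypothesis.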
Correlation Analysis
Correlation analysis is the study of the relationship between variables. It is also defined as a group
of techniques to measure the association between two variables.
The Coefficient of Correlation (r) is a measure of the strength of the relationship between two
variables.
It requires interval or ratio-scaled data.
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect and strong correlation.
Values close to 0.0 indicate weak correlation.
Negative values indicate an inverse relationship and positive values indicate a direct relationship.
From the surveyed data, the correlation coefficient of monthly income (profit) and yesterday's sales
is calculated using the above-mentioned equation.
$$s_x = 6904.232; \quad s_y = 11973.078$$
r = 0.9198
This is positive, so we see there is a direct relationship between monthly
income (profit) and yesterday's sales. The value is 0.9198, so we conclude that the association is
strong.
Regression Analysis
In regression analysis we use the independent variable (X) to estimate the dependent variable (Y).
The relationship between the variables is linear.
Both variables must be at least interval scale.
The least squares criterion is used to determine the equation.
Regression Equation: An equation that expresses the linear relationship between two variables.
Least Squares Principle: Determining a regression equation by minimizing the sum of the
squares of the vertical distances between the actual Y values and the predicted values of Y.
Using the data related to monthly income and yesterday's sales, the following linear
regression equation is produced.
$$\hat{y} = a + bX$$
$$\hat{y} = 8159.88 + 1.5951X$$
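The fitted equation can be cross-checked against the correlation results. The sketch below assumes that 6904.232 and 11973.078 are the sample standard deviations of yesterday's sales (X) and monthly income (Y); under least squares the slope equals r × (s_y / s_x). It also evaluates the fitted line at X = 20000 Tk. The variable names are illustrative.

```python
r = 0.9198       # coefficient of correlation from the report
s_x = 6904.232   # assumed: standard deviation of yesterday's sales (X)
s_y = 11973.078  # assumed: standard deviation of monthly income (Y)

# Least-squares slope implied by r and the two standard deviations.
slope_from_r = r * s_y / s_x

# Coefficients of the fitted regression equation as reported.
a = 8159.88
b = 1.5951

# Predicted monthly income for daily sales of 20000 Tk.
predicted_income = a + b * 20000

print("Slope implied by r:", round(slope_from_r, 4))
print("Predicted income at X = 20000:", round(predicted_income, 2))
```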
[Scatter plot with trend line: Regression Equation; x-axis: Yesterday Sales]
This trend line shows that there appears to be a positive relationship between monthly income
and yesterday's sales. The strength and direction of this relationship are measured by determining
the coefficient of correlation (r). Using Microsoft Excel in this case, we get r = 0.9198. As it is
positive, there is a direct relationship between monthly profit and yesterday's total
sales. The value of 0.9198 is fairly close to 1.00, so the association is strong. The coefficient of
determination is r² = (0.9198)² = 0.846. It means 84.6% of the variation in monthly profit is
accounted for by the variation in yesterday's sales.
The standard error of estimate is
$$s_{y \cdot x} = 4827.165$$
So, the confidence interval = 40,061.88 ± (2.101 × 4827.165 / √20) = 40,061.88 ± 2267.792. Thus the
95% confidence interval for the mean monthly income of all vegetable vendors whose total daily sales is 20000 Tk.
is from 37794.088 up to 42329.672.
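The interval arithmetic above can be reproduced in a few lines. This sketch takes the report's own figures (t = 2.101, 4827.165 as the standard error of estimate, n = 20, and the predicted value 40,061.88) and follows the simplified margin used in the report, t × s / √n; the variable names are illustrative.

```python
import math

predicted = 40061.88         # fitted value at X = 20000 Tk.
t_value = 2.101              # t value used in the report for 95% confidence
std_error_estimate = 4827.165
n = 20

# Margin of error as computed in the report: t * s / sqrt(n).
margin = t_value * std_error_estimate / math.sqrt(n)

lower = predicted - margin
upper = predicted + margin

print("Margin:", round(margin, 3))
print("95% interval:", round(lower, 3), "to", round(upper, 3))
```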
Conclusion
Most of the vendors whose monthly income exceeds 20,000 Tk. did not want to give me their full
information; some of them suspected that I was working with the government and would force them to
pay tax. A terrorism-related incident had recently happened in our area, and in one case the shop
owner suspected that I was working with the RAB. These are the problems I encountered in my
survey, but I tried my best to collect as much data as possible.
In this report, it is evident that descriptive statistics can only be used to describe the group that is
being studied. The results cannot be generalized to any larger group. On the other hand,
inferential statistics starts with a sample and then generalizes to a population. This
information about a population is not stated as a single number. Instead, these parameters are expressed
as a range of potential numbers, along with a degree of confidence. In order to do this, however, it is
imperative that the sample is representative of the group to which it is being generalized. To
address this issue of generalization, tests of significance tell us the probability that the
results of the analysis could have occurred by chance when there is no relationship at all between
the variables we studied in the population. In light of this knowledge, it can be said that the
issues discussed in this report, such as daily income, profit, distribution of vendors according to
administrative divisions, and literacy of female family members, show the picture of our sample
only. They do not generalize to the situation of vegetable vendors across Bangladesh.