
Statistical survey

A survey is a process of collecting data from existing population units with no particular control over factors that may affect the population characteristics of interest in the study.

Sources of data
The sources of information may be either primary or secondary. When an investigator collects first-hand information for the study, such data are known as primary data. If he obtains the data from published or unpublished sources, such data constitute secondary data for him.

Methods of collecting primary data


- Direct personal interviews
- Indirect oral interviews
- Information from correspondents
- Mailed questionnaire method
- Schedules sent through enumerators

Techniques of data collection


There are two important techniques of data collection:
- Census method, or complete enumeration survey method
- Sampling method

Census method
Under the census method, data are obtained from each and every unit of the population. The results are more representative, accurate and reliable. It is appropriate for obtaining information on rare events, and its results can be widely used as a basis for various surveys. In spite of these advantages, it is not widely used because of the time, cost and effort involved.

Sampling
Sampling is simply the process of learning about the population on the basis of a sample drawn from it. Only a part of the universe is studied, and conclusions about the entire universe are drawn on that basis. A sample is a subset of population units. The process of sampling involves three elements:
- Selecting the sample
- Collecting the information
- Making an inference about the population

Statistics and Parameters


Measures such as the mean, median, mode and standard deviation are called statistics when they are used to describe the characteristics of a sample. When the same measures are used to describe the characteristics of a population, they are called parameters.
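A minimal sketch of the distinction, assuming a hypothetical population of 1,000 incomes generated at random: the same measures computed on the whole population are parameters, and computed on a sample drawn from it they are statistics. Note that the population standard deviation divides by N (`statistics.pstdev`), while the sample standard deviation conventionally divides by n - 1 (`statistics.stdev`).

```python
import random
import statistics

# Hypothetical population: 1,000 household incomes (illustrative data only).
rng = random.Random(1)
population = [rng.gauss(50_000, 12_000) for _ in range(1_000)]

# Parameters: measures describing the whole population.
population_mean = statistics.fmean(population)
population_sd = statistics.pstdev(population)

# Statistics: the same measures describing a sample of 50 units drawn from it.
sample = rng.sample(population, 50)
sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)

print(f"parameter mu    = {population_mean:,.0f}   statistic x-bar = {sample_mean:,.0f}")
print(f"parameter sigma = {population_sd:,.0f}   statistic s     = {sample_sd:,.0f}")
```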

Laws of sampling
There are two laws on which the theory of sampling is based:
- Law of statistical regularity: a moderately large number of items chosen at random from a large group are almost sure, on average, to possess the characteristics of the large group.
- Law of inertia of large numbers: other things being equal, the larger the size of the sample, the more accurate the results are likely to be, because large numbers are more stable than small ones.
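The law of inertia of large numbers can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is a hypothetical, right-skewed set of 10,000 values, and "accuracy" is measured as the average absolute gap between the sample mean and the population mean over repeated random samples. The gap should shrink as the sample size grows.

```python
import random
import statistics

def mean_abs_error(population, n, trials=500, seed=0):
    """Average |sample mean - population mean| over repeated random samples of size n."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    gaps = [abs(statistics.fmean(rng.sample(population, n)) - mu) for _ in range(trials)]
    return statistics.fmean(gaps)

# Hypothetical skewed population of 10,000 values (exponential, mean about 10).
rng = random.Random(42)
population = [rng.expovariate(1 / 10) for _ in range(10_000)]

errors = {n: mean_abs_error(population, n) for n in (10, 100, 1_000)}
for n, err in errors.items():
    print(f"n = {n:>5}: average error of the sample mean = {err:.3f}")
```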

Essentials of sampling
- Representativeness: a sample should be selected so that it truly represents the universe; otherwise the results may be misleading.
- Adequacy: the size of the sample should be adequate.
- Independence: all items should be selected independently, i.e. every item of the universe should have the same chance of being selected in the sample.
- Homogeneity: there should be no difference between the nature of the units of the population and that of the units of the sample.

Methods of sampling
Sampling methods fall into two groups:
- Non-probability sampling methods
  - Judgement sampling
  - Quota sampling
  - Convenience sampling
- Probability sampling methods
  - Unrestricted random sampling, or simple random sampling
  - Restricted random sampling
    - Stratified sampling
    - Systematic sampling
    - Cluster sampling

Judgment sampling
The selection of sample items depends exclusively on the judgment of the investigator. For example, if a sample of 10 students is to be selected from a class of 60 to analyse the spending habits of students, the investigator would select the 10 students who, in his opinion, are representative of the class.

Quota sampling
Quotas are set up according to specified characteristics, such as so many people in each of several income groups, so many in each age group, so many with certain political or religious affiliations, and so on. Each interviewer is then told to interview a certain number of persons, which constitutes his quota. Within the quota, the selection of sample items depends on personal judgement.
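The quota mechanism can be sketched as follows. This is a simplification, not a standard procedure: the interviewer's personal judgement is modelled simply as "take respondents in the order they are met", and the respondents, the `income` field and the quota sizes are all hypothetical. Selection stops once every quota is full.

```python
from collections import Counter

def quota_sample(respondents, quotas, key):
    """Take respondents in the order they are met until each group's quota is full.

    respondents: iterable of records, in the order the interviewer meets them
    quotas: mapping group -> number of interviews required in that group
    key: function returning the group a respondent belongs to
    """
    filled = Counter()
    chosen = []
    for person in respondents:
        group = key(person)
        if filled[group] < quotas.get(group, 0):
            chosen.append(person)
            filled[group] += 1
        if all(filled[g] >= q for g, q in quotas.items()):
            break  # every quota is full
    return chosen

# Hypothetical respondents, tagged by income group, in order of arrival.
groups = ["low", "high", "low", "mid", "low", "mid", "high", "mid"]
people = [{"name": f"P{i}", "income": g} for i, g in enumerate(groups)]

sample = quota_sample(people, {"low": 2, "mid": 2, "high": 1}, key=lambda p: p["income"])
print([p["name"] for p in sample])  # ['P0', 'P1', 'P2', 'P3', 'P5']
```

Because nothing in this rule is random, the resulting sample depends entirely on who happens to be met first, which is why quota sampling is classed as a non-probability method.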

Convenience sampling
The sample is obtained by selecting convenient population units. This method is also called the chunk method; a chunk is the fraction of the population that is investigated because it is conveniently available. An example is a sample obtained from readily available lists such as automobile registrations or telephone directories.

Simple random sampling


Simple random sampling selects samples by methods that give each possible sample an equal probability of being picked and each item in the entire population an equal chance of being included in the sample, e.g. the lottery method.
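The lottery method corresponds to drawing without replacement, which Python's `random.sample` does: every unit, and every possible subset of the requested size, is equally likely. A minimal sketch, assuming a hypothetical class of 60 students labelled S01 to S60:

```python
import random
from collections import Counter

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement (the 'lottery method')."""
    rng = random.Random(seed)
    return rng.sample(population, n)

students = [f"S{i:02d}" for i in range(1, 61)]  # hypothetical class of 60
print(simple_random_sample(students, 10, seed=7))

# Check: over many draws, each student is included about 10/60 of the time.
rng = random.Random(0)
counts = Counter(s for _ in range(6_000) for s in rng.sample(students, 10))
print(min(counts.values()), max(counts.values()))  # both close to 1,000
```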

Stratified sampling
The population is divided into relatively homogeneous groups called strata. Then one of two approaches is used:
- Select at random from each stratum a specified number of elements, corresponding to the proportion of that stratum in the population as a whole.
- Draw an equal number of elements from each stratum and weight the results according to each stratum's proportion of the total population.
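Both approaches can be sketched in a few lines, assuming a hypothetical population of household sizes split into three strata of 500, 300 and 200 units. The first function implements proportional allocation; the second draws the same number from every stratum and weights each stratum mean by its share of the population. The function and stratum names are illustrative only.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """Approach 1: draw from each stratum a number of units proportional to its share
    of the population, using simple random sampling within each stratum.

    strata: dict mapping stratum name -> list of unit values
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    # round() can make the allocations add up to slightly more or less than n
    return {name: rng.sample(units, round(n * len(units) / total))
            for name, units in strata.items()}

def equal_allocation_estimate(strata, per_stratum, seed=None):
    """Approach 2: draw the same number of units from every stratum, then weight each
    stratum's sample mean by that stratum's proportion of the total population."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    estimate = 0.0
    for units in strata.values():
        drawn = rng.sample(units, per_stratum)
        estimate += (sum(drawn) / len(drawn)) * len(units) / total
    return estimate

# Hypothetical population: household sizes in three strata.
rng = random.Random(0)
strata = {
    "urban": [rng.randint(1, 4) for _ in range(500)],
    "semi-urban": [rng.randint(2, 6) for _ in range(300)],
    "rural": [rng.randint(3, 9) for _ in range(200)],
}
sample = proportional_stratified_sample(strata, 50, seed=1)
print({name: len(units) for name, units in sample.items()})  # {'urban': 25, 'semi-urban': 15, 'rural': 10}
print(round(equal_allocation_estimate(strata, 20, seed=1), 2))
```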

Systematic sampling
Elements are selected from the population at a uniform interval measured in time, order or space. For example, to interview every 20th student on a college campus, we would choose a random starting point among the first 20 names in the student directory and then pick every 20th name thereafter.
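The student-directory example translates directly into a slice with a random start. A minimal sketch, assuming a hypothetical directory of 1,000 names stored in a list:

```python
import random

def systematic_sample(frame, k, seed=None):
    """Choose a random start among the first k units, then take every k-th unit after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # 0-based position within the first k units
    return frame[start::k]

directory = [f"student{i:04d}" for i in range(1, 1_001)]  # hypothetical directory
picked = systematic_sample(directory, 20, seed=3)
print(len(picked), picked[:3])  # 50 names, each 20 places apart
```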

Cluster sampling
We divide the population into clusters (groups) and then select a random sample of clusters. Each cluster should be representative of the population as a whole. For example, suppose a market research team is attempting to determine, by sampling, the average number of television sets per household in a large city. They could use a city map to divide the territory into blocks and then choose a certain number of blocks for interviewing. Every household in each chosen block would be interviewed. A well-designed cluster sampling plan can produce a more precise sample at considerably less cost than simple random sampling.
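The television-set example can be sketched as follows. The city is hypothetical: 100 blocks of 30 households each, with made-up counts of TV sets. The key point is that randomness operates on whole blocks, and every household in a chosen block enters the sample.

```python
import random
import statistics

def cluster_sample(clusters, m, seed=None):
    """Randomly choose m clusters and keep every unit in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {name: clusters[name] for name in chosen}

# Hypothetical city: 100 blocks, 30 households each; values are TV sets per household.
rng = random.Random(0)
blocks = {f"block{b:03d}": [rng.choice([0, 1, 1, 2, 2, 3]) for _ in range(30)]
          for b in range(100)}

selected = cluster_sample(blocks, 10, seed=1)
households = [tv for block in selected.values() for tv in block]
print(len(households), round(statistics.fmean(households), 2))
```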

The difference between stratified and cluster sampling is that stratified sampling is used when each group has small variation within itself but there is wide variation between the groups. Cluster sampling is used when there is considerable variation within each group but the groups are essentially similar to each other.

Sample size
Sample size is the number of sampling units selected from the population for investigation. It should be neither too small nor too large; it should be optimum. The following factors should be considered:
- Size of the universe: the larger the universe, the bigger the sample should be.
- Resources available: if the available resources are vast, a large sample can be taken.
- Degree of accuracy: the greater the degree of accuracy desired, the larger the sample should be.

- Homogeneity or heterogeneity: a small sample serves the purpose for a homogeneous universe, while a large sample is needed for a heterogeneous universe.
- Nature of study: for an intensive and continuous study, a small sample is inevitable. For studies that are not likely to be repeated and are quite extensive in nature, a large sample is suitable.
- Method of sampling
- Nature of respondents: if the respondents are not likely to cooperate, a large sample should be selected.
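Three of these factors (degree of accuracy, heterogeneity and size of universe) appear explicitly in a common textbook formula for the sample size needed to estimate a mean, which the slides do not state: n0 = (z σ / E)², where E is the acceptable margin of error, σ the population standard deviation (assumed known here) and z the normal value for the chosen confidence level. The optional finite population correction n0 / (1 + (n0 - 1) / N) adjusts for a universe of size N. The sketch below is one illustration under these assumptions, not the only way to size a sample.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95, population_size=None):
    """Smallest n for which the sample mean is within +/- margin of the population mean
    at the given confidence level (normal approximation, sigma assumed known).
    If population_size is given, apply the finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z * sigma / margin) ** 2
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)

print(sample_size_for_mean(sigma=10, margin=2))   # 97
print(sample_size_for_mean(sigma=10, margin=1))   # 385: more accuracy, larger sample
print(sample_size_for_mean(sigma=20, margin=1))   # 1537: more heterogeneity, larger sample
print(sample_size_for_mean(sigma=10, margin=1, population_size=1_000))  # 278
```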

Advantages and disadvantages (sampling)


Advantages:
- Less time consuming
- Less costly
- More reliable results
- More detailed information
- Sometimes it is the only method available
- It is often used to check the accuracy of information obtained by the census method

Disadvantages:
- It must be carefully planned and executed.
- It requires the services of experts.
- At times it becomes complicated.
- If information is required for each and every unit, a complete enumeration survey is necessary.

Sampling errors
The error arising from drawing inferences about the population on the basis of a few observations is termed sampling error. It is of two types:
- Biased errors: these errors arise from bias in selection, estimation, etc. For example, if deliberate (judgment) sampling is used in place of simple random sampling, the resulting errors are called biased sampling errors.

Causes of bias
- Faulty process of selection
- Faulty work during the collection
- Faulty methods of analysis

The second type of sampling error is the unbiased error: these errors arise from chance differences between the members of the population included in the sample and those not included. An error in a statistic is the difference between the value of the statistic and the value of the corresponding parameter.
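The contrast between biased and unbiased errors can be seen in a simulation. Assumptions: a hypothetical population of 5,000 turnover figures, and a deliberately exaggerated faulty selection process that only ever draws from the largest 20% of units. Each error is computed as defined above (statistic minus parameter). Errors from random samples average out near zero; errors from the faulty process do not.

```python
import random
import statistics

rng = random.Random(5)
# Hypothetical population of 5,000 shop turnovers (illustrative values only).
population = [rng.lognormvariate(3, 0.6) for _ in range(5_000)]
mu = statistics.fmean(population)  # the parameter

def sampling_error(sample):
    """Error in a statistic: value of the statistic minus the corresponding parameter."""
    return statistics.fmean(sample) - mu

# Unbiased errors: simple random samples; errors are due to chance and cancel out on average.
random_errors = [sampling_error(rng.sample(population, 50)) for _ in range(1_000)]

# Biased errors: a faulty selection process that favours large units
# (here, sampling only from the largest 20% of the population).
largest = sorted(population)[-1_000:]
biased_errors = [sampling_error(rng.sample(largest, 50)) for _ in range(1_000)]

print("average unbiased error:", round(statistics.fmean(random_errors), 3))
print("average biased error:  ", round(statistics.fmean(biased_errors), 3))
```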

Non-sampling errors


Even when a complete enumeration of units in the universe is made, it is difficult to avoid errors of observation or ascertainment and errors in the processing and tabulation of data. Such errors are called non-sampling errors.

Sampling distribution

Population | Sample | Sample statistic | Sampling distribution
--- | --- | --- | ---
Water in a river | 10-gallon containers of water | Mean number of parts of mercury per million parts of water | Sampling distribution of the mean
All professional basketball teams | Group of 5 players | Median height | Sampling distribution of the median
All parts produced by a manufacturing process | 50 parts | Proportion defective | Sampling distribution of the proportion
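The third row of the table can be simulated to show what a sampling distribution is: the distribution of a statistic over many repeated samples. This sketch assumes a hypothetical process with a 4% defect rate, inspects 50 parts per sample, and records the proportion defective in each of 10,000 samples.

```python
import random
import statistics
from collections import Counter

def sampling_distribution_of_proportion(p_defective, n, repetitions, seed=None):
    """Inspect n parts per sample from a process with defect rate p_defective and
    return the proportion defective observed in each of `repetitions` samples."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_defective for _ in range(n)) / n
            for _ in range(repetitions)]

# Hypothetical process: 4% of parts are defective; samples of 50 parts.
props = sampling_distribution_of_proportion(0.04, 50, 10_000, seed=11)

table = Counter(props)
for value in sorted(table)[:6]:
    print(f"proportion defective {value:.2f}: relative frequency {table[value] / len(props):.3f}")
print("mean of the sample proportions:", round(statistics.fmean(props), 4))
```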

Standard error
The standard deviation of the distribution of a sample statistic is known as the standard error of the statistic:
- Standard deviation of the distribution of sample means: standard error of the mean
- Standard deviation of the distribution of sample proportions: standard error of the proportion
- Standard deviation of the distribution of sample medians: standard error of the median
- Standard deviation of the distribution of sample ranges: standard error of the range
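The definition can be applied literally: draw many samples, compute the statistic for each, and take the standard deviation of those values. The sketch assumes a hypothetical normal population with mean 100 and σ = 15, and samples of n = 25. It compares the empirical standard error of the mean with the textbook value σ/√n (not stated in the slides), and shows that the median has a larger standard error than the mean for this population.

```python
import math
import random
import statistics

def standard_error(statistic, draw_sample, repetitions=5_000):
    """Empirical standard error: standard deviation of the statistic over repeated samples."""
    values = [statistic(draw_sample()) for _ in range(repetitions)]
    return statistics.stdev(values)

rng = random.Random(3)
sigma, n = 15.0, 25

def draw():
    """One sample of size n from a hypothetical normal population (mean 100, sd 15)."""
    return [rng.gauss(100, sigma) for _ in range(n)]

se_mean = standard_error(statistics.fmean, draw)
se_median = standard_error(statistics.median, draw)
print(round(se_mean, 2), "vs sigma/sqrt(n) =", sigma / math.sqrt(n))
print(round(se_median, 2), "(standard error of the median)")
```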

Central limit theorem


The mean of the sampling distribution of the mean will equal the population mean regardless of the sample size, even if the population is not normal. As the sample size increases, the sampling distribution of the mean approaches normality, regardless of the shape of the population distribution. This relationship between the shape of the population distribution and the shape of the sampling distribution of the mean is called the central limit theorem.
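Both parts of the theorem can be checked by simulation. The sketch assumes a hypothetical, strongly right-skewed population (exponential with mean 10, whose skewness is 2). For each sample size it reports the mean of the sample means, which should stay near 10, and their skewness, which should fall towards 0 (a symmetric, normal-like shape) as n grows.

```python
import random
import statistics

def skewness(xs):
    """Third standardized moment: near 0 for a symmetric, normal-looking distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

rng = random.Random(8)

def sample_means(n, repetitions=4_000):
    """Means of repeated samples of size n from an exponential population with mean 10."""
    return [statistics.fmean([rng.expovariate(0.1) for _ in range(n)])
            for _ in range(repetitions)]

results = {}
for n in (2, 10, 50):
    means = sample_means(n)
    results[n] = (statistics.fmean(means), skewness(means))
    print(f"n = {n:>2}: mean of sample means = {results[n][0]:.2f}, skewness = {results[n][1]:.2f}")
```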

Estimation
Probability theory forms the foundation for statistical inference, the branch of statistics concerned with using probability concepts to deal with uncertainty in decision making. Statistical inference is based on estimation and hypothesis testing; both use sample statistics to draw conclusions about population parameters.

There are two types of estimates about a population: a point estimate and an interval estimate. A point estimate is a single number used to estimate an unknown population parameter. An interval estimate is a range of values used to estimate a population parameter.

Estimator and Estimates


Any sample statistic that is used to estimate a population parameter is called an estimator. When we have observed a specific numerical value of our estimator, we call that value an estimate. In other words, an estimate is a specific observed value of a statistic.
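In code, the distinction is that between a function and the value it returns. A minimal sketch: the estimator is the rule "take the sample mean", and the estimate is what that rule produces for one particular sample. The five education figures below are hypothetical.

```python
import statistics

def mean_estimator(sample):
    """An estimator: a rule (here, the sample mean) that turns any sample into a number."""
    return statistics.fmean(sample)

# Hypothetical sample: years of formal education of five applicants.
observed = [16, 18, 17, 20, 18.5]
estimate = mean_estimator(observed)  # the estimate: the value for this particular sample
print(estimate)  # 17.9
```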

Example

Population in which we are interested | Population parameter we wish to estimate | Sample statistic we will use as an estimator | Estimate we make
--- | --- | --- | ---
Employees in a furniture factory | Mean turnover per year | Mean turnover for a period of 1 month | 8.9% turnover per year
Applicants for town manager of Chapel Hill | Mean formal education (years) | Mean formal education of every 5th applicant | 17.9 years of formal education
Teenagers in a given community | Proportion who have criminal records | Proportion of a sample of 50 teenagers who have criminal records | 0.02, or 2%, have criminal records

Properties of a good estimator


- Unbiasedness: an estimator is unbiased if the mean of its sampling distribution equals the population parameter being estimated. For example, the mean of the sampling distribution of sample means taken from a population is equal to the population mean itself.
- Efficiency: this refers to the size of the standard error of the statistic. If we compare two statistics from samples of the same size and try to decide which one is more efficient, we pick the statistic that has the smaller standard error.
- Consistency: a statistic is a consistent estimator of a population parameter if, as the sample size increases, it becomes almost certain that the value of the statistic comes very close to the value of the population parameter.
- Sufficiency: an estimator is sufficient if it makes so much use of the information in the sample that no other estimator could extract additional information about the population parameter from the sample.
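Unbiasedness is easy to check by simulation. A classic illustration, not taken from the slides, is the sample variance: dividing the sum of squared deviations by n gives a biased estimator of the population variance, while dividing by n - 1 gives an unbiased one. The sketch assumes a hypothetical normal population with variance 4 and samples of size 5; Python's `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1.

```python
import random
import statistics

rng = random.Random(21)
sigma2 = 4.0           # hypothetical population variance (normal population, sd = 2)
n, reps = 5, 10_000

divide_by_n, divide_by_n_minus_1 = [], []
for _ in range(reps):
    sample = [rng.gauss(0, 2) for _ in range(n)]
    divide_by_n.append(statistics.pvariance(sample))        # sum of squares / n
    divide_by_n_minus_1.append(statistics.variance(sample))  # sum of squares / (n - 1)

# Mean of each estimator's sampling distribution, compared with sigma^2 = 4.
print("divide by n:    ", round(statistics.fmean(divide_by_n), 2))          # near 4 * 4/5 = 3.2
print("divide by n - 1:", round(statistics.fmean(divide_by_n_minus_1), 2))  # near 4.0
```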

Confidence interval
In statistics, the probability that we associate with an interval estimate is called the confidence level. This probability indicates how confident we are that the interval estimate will include the population parameter. A higher probability means more confidence. Commonly used confidence levels are 90%, 95% and 99%. The confidence interval is the range of the estimate we are making.
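A minimal sketch of an interval estimate for a population mean, assuming σ is known so that the normal (z) interval x̄ ± z σ/√n applies; the slides do not give a formula, so this choice is an assumption. The simulation then checks the meaning of the confidence level: for a hypothetical population with mean 50 and sd 8, about 95% of the 95% intervals should contain the true mean.

```python
import math
import random
import statistics
from statistics import NormalDist

def z_interval(sample, sigma, confidence=0.95):
    """Interval estimate for the population mean, assuming sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = statistics.fmean(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return centre - half_width, centre + half_width

# Coverage check on a hypothetical population with mean 50 and sd 8, samples of 36.
rng = random.Random(4)
mu, sigma, n, trials = 50.0, 8.0, 36, 2_000
hits = 0
for _ in range(trials):
    lo, hi = z_interval([rng.gauss(mu, sigma) for _ in range(n)], sigma)
    hits += lo <= mu <= hi
print("share of intervals containing mu:", hits / trials)
```

A higher confidence level gives a wider interval for the same sample, which is the price paid for more confidence.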

Hypothesis testing
Hypothesis testing begins with an assumption, called a hypothesis, that we make about a population parameter. Then we collect sample data, produce sample statistics, and use this information to decide how likely it is that our hypothesized population parameter is correct.

In hypothesis testing we must state the assumed or hypothesized value of the population parameter before we begin sampling. The assumption we wish to test is called the null hypothesis and is symbolized as $H_0$. Suppose the null hypothesis is that the population mean is equal to 500 hours:
$H_0: \mu = 500$ hours
If our sample results fail to support the null hypothesis, we must conclude that something else is true. Whenever we reject the null hypothesis, the conclusion we do accept is called the alternative hypothesis and is symbolized as $H_1$.

There are three possible alternative hypotheses:
- $H_1: \mu \neq 500$: two-tailed test
- $H_1: \mu > 500$: right-tailed test (one-tailed test in the right direction)
- $H_1: \mu < 500$: left-tailed test (one-tailed test in the left direction)
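The three alternatives lead to three ways of computing the evidence against $H_0$. The sketch below is a one-sample z-test, which assumes the population standard deviation is known; the data (64 bulbs with mean life 490 hours and σ = 40 hours) are hypothetical. The test statistic is z = (x̄ - 500) / (σ/√n), and the p-value is taken from the tail(s) named by $H_1$.

```python
import math
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """One-sample z-test of H0: mu = mu0 with known sigma. Returns (z, p_value).

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0),
                 "less" (H1: mu < mu0)
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    phi = NormalDist().cdf
    if alternative == "two-sided":
        p = 2 * (1 - phi(abs(z)))
    elif alternative == "greater":
        p = 1 - phi(z)
    elif alternative == "less":
        p = phi(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p

# Hypothetical data: 64 bulbs, mean life 490 hours, known sigma 40 hours; H0: mu = 500 hours.
for alt in ("two-sided", "greater", "less"):
    z, p = z_test_mean(490, 500, 40, 64, alt)
    decision = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"{alt:>9}: z = {z:.2f}, p = {p:.4f}, {decision} at the 5% level")
```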

Level of significance ($\alpha$)
Having set up the hypotheses, the next step is to test the validity of $H_0$ against $H_1$ at a certain level of significance. The confidence with which an experimenter rejects or accepts a null hypothesis depends on the significance level adopted. The significance level is usually expressed as a percentage, such as 5%, and is the probability of rejecting the null hypothesis when it is true. It indicates the percentage of sample means that fall outside certain limits; in other words, it determines how large the difference between the sample statistic and the hypothesized population parameter must be before we reject $H_0$.
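For a z-test, choosing $\alpha$ fixes the limits beyond which $H_0$ is rejected. A minimal sketch using the standard normal inverse CDF: in a two-tailed test $\alpha$ is split between both tails, while in a one-tailed test it sits entirely in one tail. The function name is illustrative.

```python
from statistics import NormalDist

def critical_z(alpha, tail="two"):
    """Critical z value for significance level alpha.

    tail: "two"   -> reject H0 if |z| > c
          "right" -> reject H0 if z > c
          "left"  -> reject H0 if z < c (c is negative)
    """
    std = NormalDist()
    if tail == "two":
        return std.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return std.inv_cdf(1 - alpha)
    if tail == "left":
        return std.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: two-tailed {critical_z(alpha):.3f}, "
          f"right-tailed {critical_z(alpha, 'right'):.3f}")
```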

Type I and Type II errors


Rejecting a null hypothesis when it is true is called a Type I error, and its probability is denoted by $\alpha$. Accepting a null hypothesis when it is false is called a Type II error, and its probability is denoted by $\beta$.
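Both error rates can be estimated by repeating a test many times. The sketch assumes a two-tailed z-test of $H_0: \mu = 500$ at $\alpha$ = 0.05, a known σ of 40 hours, samples of 25, and, for the Type II case, a hypothetical true mean of 520 hours. When $H_0$ is true, the share of rejections estimates $\alpha$; when it is false, the share of non-rejections estimates $\beta$ (about 0.29 under these assumptions).

```python
import math
import random
from statistics import NormalDist, fmean

def rejects_h0(sample, mu0, sigma, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 with known sigma; True if H0 is rejected."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

rng = random.Random(9)
mu0, sigma, n, trials = 500.0, 40.0, 25, 4_000

# H0 true (mu = 500): every rejection is a Type I error; the rate should be near alpha.
type_1 = sum(rejects_h0([rng.gauss(500, sigma) for _ in range(n)], mu0, sigma)
             for _ in range(trials)) / trials

# H0 false (hypothetically mu = 520): every failure to reject is a Type II error.
type_2 = sum(not rejects_h0([rng.gauss(520, sigma) for _ in range(n)], mu0, sigma)
             for _ in range(trials)) / trials

print("estimated alpha (Type I error rate): ", round(type_1, 3))
print("estimated beta  (Type II error rate):", round(type_2, 3))
```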
