BASIC DEFINITIONS
STATISTICS
The word statistics, which comes from the Latin word status, meaning a political state, originally meant information useful to the state.
Statistics is defined as the discipline that includes the procedures and techniques used to collect, process and analyse numerical data in order to make inferences and reach decisions in the face of uncertainty. The word statistics also refers to numerical facts that are systematically arranged, for example statistics of prices, statistics of road accidents, statistics of births, statistics of deaths, etc. In all these examples the word statistics denotes a set of numerical data in the respective field.
DESCRIPTIVE STATISTICS
Descriptive statistics is the branch of statistics that deals with the collection of data, their graphical display, and the computation of numerical quantities that provide information about the data.
INFERENTIAL STATISTICS
Inferential statistics deals with the procedures for making inferences about a population from sample data. It includes the estimation of population parameters and the testing of hypotheses.
POPULATION
A population is the collection or set of all possible observations, whether finite or infinite, relevant to some characteristic of interest. A statistical population may be real, such as the heights of college students, or hypothetical, such as all the possible outcomes from the toss of a coin. The number of observations in a finite population is called the size of the population and is denoted by N.
SAMPLE
A sample is a part or subset of the population. Generally it consists of some of the observations, but in certain situations it may include the whole population.
The number of observations included in a sample is called the size of the sample and is denoted by the small letter n. The information derived from the sample data is used to draw conclusions about the population.
IMPORTANCE OF STATISTICS
Statistics assists in summarizing large sets of data in a form that is easily understandable.
Statistics assists in the efficient design of laboratory and field experiments as well as surveys.
Statistics assists in sound and effective planning in any field of inquiry.
Statistics assists in drawing general conclusions and in making predictions of how much of a thing will happen under given conditions.
Statistical techniques, being powerful tools for analysing numerical data, are used in almost every branch of learning.
Banks, insurance companies and governments all have their statistics departments.
A modern administrator, whether in the public or private sector, leans on statistical data to provide a factual basis for decisions.
A social scientist uses statistical methods in various areas of the socio-economic life of a nation.
INTRODUCTION
A data set can be summarized in a single value. Such a value, usually somewhere in the centre and representing the entire data set, is a value at which the data have a tendency to concentrate, and is called a measure of central tendency. Since a measure of central tendency indicates the location or general position of the data set in the range of observations, it is also known as a measure of location or position. A numerical value such as the mean, median or mode calculated from a population is known as a parameter, and a numerical value calculated from a sample is called a statistic.
Thus, mathematically, the mean of a set of n observations $x_1, x_2, x_3, \ldots, x_n$ is defined as
$\text{Mean} = \frac{\sum x_i}{n}$ (where $i = 1, 2, 3, \ldots, n$)
MEASURE OF CENTRAL TENDENCY
THE MEDIAN
Median (n is even) = (38 + 39)/2 = 77/2 = 38.5
For grouped data the median is calculated by the following formula:
$\text{Median} = L + \frac{h}{f}\left(\frac{n}{2} - C\right)$
where
L = lower class boundary of the median group
h = class interval of the median group
f = frequency of the median group
C = cumulative frequency of the preceding group
n/2 is used to locate the median group in the given data.
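The grouped-data median formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `grouped_median` and the frequency table are made up for the example, and the class boundaries are assumed to be sorted and contiguous.

```python
def grouped_median(classes):
    """Median of grouped data.

    classes: list of (lower_boundary, upper_boundary, frequency),
    sorted by lower boundary.
    """
    n = sum(f for _, _, f in classes)
    cumulative = 0  # C: cumulative frequency of the groups before the current one
    for lower, upper, f in classes:
        # The median group is the first one whose cumulative frequency reaches n/2.
        if f > 0 and cumulative + f >= n / 2:
            h = upper - lower  # class interval of the median group
            return lower + (h / f) * (n / 2 - cumulative)
        cumulative += f
    raise ValueError("the table contains no observations")


# Hypothetical frequency table (boundaries and frequencies are invented).
table = [(9.5, 19.5, 5), (19.5, 29.5, 8), (29.5, 39.5, 12), (39.5, 49.5, 5)]
median_value = grouped_median(table)
print(round(median_value, 2))
```

Here n = 30, so n/2 = 15 falls in the third group, where L = 29.5, h = 10, f = 12 and C = 13.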
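For grouped data the mode is calculated by the formula
$\text{Mode} = L + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
where
L = lower class boundary of the modal class
$f_m$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class following the modal class
h = class interval of the modal class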
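A small Python sketch may help to see how the quantities defined above combine. It assumes the usual grouped-data rule for the mode (written out in the code comment), and the function name `grouped_mode` and the frequency table are hypothetical.

```python
def grouped_mode(classes):
    """Mode of grouped data.

    Assumed rule: Mode = L + (fm - f1) / ((fm - f1) + (fm - f2)) * h
    classes: list of (lower_boundary, upper_boundary, frequency).
    """
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))            # position of the modal class
    lower, upper, fm = classes[m]
    f1 = freqs[m - 1] if m > 0 else 0      # frequency of the preceding class
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0  # frequency of the following class
    h = upper - lower                      # class interval of the modal class
    return lower + (fm - f1) / ((fm - f1) + (fm - f2)) * h


# Hypothetical frequency table (boundaries and frequencies are invented).
table = [(9.5, 19.5, 5), (19.5, 29.5, 8), (29.5, 39.5, 12), (39.5, 49.5, 5)]
mode_value = grouped_mode(table)
print(round(mode_value, 2))
```

The sketch takes the first class with the highest frequency as the modal class and does not handle the case where the modal class and both neighbours have equal frequencies.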
An absolute measure of dispersion is one that measures the dispersion in the same units as the data.
RELATIVE MEASURE
A relative measure of dispersion is one that is expressed in the form of a ratio or coefficient and is independent of the units of measurement.
DIFFERENT MEASURES OF DISPERSION
The Range and Coefficient of Range
Mean Deviation
The Variance
Standard Deviation
Coefficient of Variation (CV)
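The variance of the sum or difference of two independent variables is equal to the sum of their respective variances.
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
If k subgroups of data consisting of $n_1, n_2, n_3, \ldots, n_k$ observations have respective means $\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k$ and variances $S_1^2, S_2^2, S_3^2, \ldots, S_k^2$, then the combined variance is calculated by the following formula:
$S_c^2 = \frac{n_1[S_1^2 + (\bar{X}_1 - \bar{X}_c)^2] + n_2[S_2^2 + (\bar{X}_2 - \bar{X}_c)^2] + \cdots + n_k[S_k^2 + (\bar{X}_k - \bar{X}_c)^2]}{n_1 + n_2 + n_3 + \cdots + n_k}$
where $\bar{X}_c$ is the combined mean of all the observations.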
MEASURE OF DISPERSION
COEFFICIENT OF VARIATION
The variability of two or more sets of data is compared by using the measure of dispersion known as the coefficient of variation, abbreviated as CV. The CV is defined as the standard deviation expressed as a percentage of the arithmetic mean of the data set. Symbolically,
$CV = \frac{S}{\text{Mean}} \times 100$
where S is the standard deviation. CV is a pure number without units, and it is therefore used to compare the variation in two or more data sets measured in different units.
CV is also used to compare the performance of two candidates or two players, given their scores in various papers or games.
The smaller the coefficient of variation, the more consistent the performance of the player; the larger the coefficient of variation, the less consistent the performance. CV is therefore used as a criterion for the consistency of performance of candidates or players.
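As an illustration of using CV to judge consistency, the following sketch compares two players. The scores are invented, and the standard deviation is assumed to use divisor n (`statistics.pstdev`); with divisor n - 1 the numbers change slightly but the comparison works the same way.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """CV = standard deviation / mean * 100."""
    return pstdev(scores) / mean(scores) * 100


# Hypothetical scores of two players over five games (same mean of 50).
player_a = [40, 50, 60, 50, 50]
player_b = [10, 90, 50, 20, 80]

cv_a = coefficient_of_variation(player_a)
cv_b = coefficient_of_variation(player_b)
print(round(cv_a, 2), round(cv_b, 2))

more_consistent = "A" if cv_a < cv_b else "B"
print("More consistent player:", more_consistent)
```

Both players average 50, but player A has the smaller CV and is therefore the more consistent performer.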
The organization of a set of data into classes or groups, together with the number of observations in each class or group, is called a frequency distribution.
The number of observations falling in a particular class is referred to as the class frequency, or simply the frequency, and is denoted by f. Data presented in the form of a frequency distribution are also called grouped data, while data in the original (raw) form are referred to as ungrouped data.
A class mark is also called the midpoint, and it is obtained by dividing the sum of the two limits of a class by 2.
(5) Determine the remaining class limits by repeatedly adding the class interval to the lower class limit. First complete the lower class limits by adding the class interval, then the upper limits.
By scanning the data we find that the largest weight is 204 grams and the smallest weight is 68 grams, so the range is 204 - 68 = 136.
Suppose we decide to take 7 classes of equal size; then the size (interval) of the classes is 136/7 = 19.43, which we round up to 20.
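The class construction for this example can be sketched as follows. The starting point of the first class is not stated in the text; the sketch assumes it is the smallest observation (68) and that the data are recorded in whole grams, so each upper limit is one less than the next lower limit.

```python
smallest, largest, number_of_classes = 68, 204, 7

data_range = largest - smallest              # 204 - 68
raw_width = data_range / number_of_classes   # about 19.43
width = 20                                   # rounded up to a convenient value

# Lower limits: add the class interval repeatedly to the first lower limit.
lower_limits = [smallest + i * width for i in range(number_of_classes)]
# Upper limits: one unit below the next lower limit (assumes whole-gram data).
upper_limits = [low + width - 1 for low in lower_limits]

for low, high in zip(lower_limits, upper_limits):
    print(f"{low} - {high}")
```

The last class ends at 207, so the largest observation (204) is covered by the seven classes.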
QUARTILES
The three values that divide the distribution or data set into four equal parts are called the quartiles. These values are denoted by Q1, Q2 and Q3. Q1 is called the lower quartile, Q2 is the median, and Q3 is called the upper quartile.
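Quartiles can be computed with Python's standard library, as in the hedged sketch below. The data set is invented, and `statistics.quantiles` uses its default "exclusive" interpolation method, which is only one of several conventions, so hand calculations that follow a different rule may differ slightly.

```python
from statistics import median, quantiles

# Hypothetical data set, used only for illustration.
data = [1, 2, 3, 4, 5, 6, 7]

# With n=4 the function returns the three cut points Q1, Q2 and Q3.
q1, q2, q3 = quantiles(data, n=4)

print(q1, q2, q3)
print(q2 == median(data))   # Q2 coincides with the median
```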
An experiment that produces different results even though it is repeated a large number of times under similar conditions is called a random experiment.
The tossing of a fair coin, the throwing of a balanced die, and the drawing of a card from a well-shuffled deck of 52 cards are examples of random experiments.
PROPERTIES OF RANDOM EXPERIMENT
BASIC CONCEPTS OF PROBABILITY
EXHAUSTIVE EVENTS
Events are said to be collectively exhaustive when the union of the mutually exclusive events is the entire sample space S. In the tossing of a coin we have two mutually exclusive events, head and tail; the union of these events is equal to the sample space of the coin, so head and tail are also called exhaustive events.
VERBAL STATEMENTS                                SET NOTATION
Event A                                          $A \subset S$
Event A is impossible                            $A = \emptyset$
Event A is sure                                  $A = S$
Event A does not occur                           $\bar{A} = S - A$
Event A or Event B                               $A \cup B$
Event A and Event B                              $A \cap B$
Event A and Event B are mutually exclusive       $A \cap B = \emptyset$
Event A and Event B are exhaustive               $A \cup B = S$
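These statements map directly onto set operations. The sketch below uses Python's built-in `set` type; the sample space (one throw of a die) and the events A, B and C are chosen only for illustration.

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for one throw of a die
A = {2, 4, 6}            # an even number appears
B = {1, 3, 5}            # an odd number appears
C = {1, 2}               # a number less than 3 appears

is_event = A <= S                 # an event is a subset of S
not_A = S - A                     # A does not occur
A_or_C = A | C                    # A or C
A_and_C = A & C                   # A and C
mutually_exclusive = (A & B) == set()   # A and B cannot occur together
exhaustive = (A | B) == S               # A and B together cover S

print(not_A, A_or_C, A_and_C, mutually_exclusive, exhaustive)
```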
When the number of sample points in a sample space is very large, it becomes very difficult to list them all.
We then need some methods or rules that help us count the number of sample points without actually listing them. A few of the basic rules frequently used in counting are as under:
Rule of Multiplication
Rule of Permutation
Rule of Combination
For example, the compound experiment of tossing a coin and throwing a die together consists of two experiments.
The coin has two distinct outcomes (m), {H, T}, and the die has six distinct outcomes (n), {1, 2, 3, 4, 5, 6}. So with m = 2 and n = 6, the total number of outcomes is m × n = 2 × 6 = 12.
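The rule of multiplication for this coin-and-die example can be confirmed by listing the outcomes with `itertools.product`, as in this short sketch (the variable names are illustrative).

```python
from itertools import product

coin = ["H", "T"]            # m = 2 outcomes
die = [1, 2, 3, 4, 5, 6]     # n = 6 outcomes

# Every outcome of the compound experiment is a (coin, die) pair.
outcomes = list(product(coin, die))

print(len(outcomes))                    # 12
print(len(outcomes) == len(coin) * len(die))
print(outcomes[:3])
```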
RANDOM VARIABLES
INTRODUCTION
A random variable is also called a chance variable or simply a variate, and is abbreviated as r.v. Random variables are denoted by capital letters such as X, Y, Z, while the values taken by them are represented by small letters such as x, y, z. There are two types of random variables:
Discrete Random Variable
Continuous Random Variable
PROBABILITY DISTRIBUTION
The probability distribution of a random variable is expressed in tabular form by showing all the possible values of X with their respective probabilities. A probability distribution must satisfy the following two properties of probability.
1. $f(x_i) \ge 0$ for all i
2. $\sum f(x_i) = 1$
In other words, the probability of an outcome is greater than or equal to zero, and the sum of the probabilities of all outcomes is equal to one.
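The two properties are easy to check mechanically. In the sketch below the helper name `is_probability_distribution` is hypothetical, and the example distribution (number of heads in two tosses of a fair coin) is chosen for illustration; exact fractions are used so that the sum can be compared with 1 without rounding error.

```python
from fractions import Fraction


def is_probability_distribution(dist):
    """Check f(x) >= 0 for every x and that the probabilities sum to 1."""
    non_negative = all(p >= 0 for p in dist.values())
    sums_to_one = sum(dist.values()) == 1
    return non_negative and sums_to_one


# X = number of heads in two tosses of a fair coin.
heads = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
# A table that violates property 2 (the probabilities sum to 3/4).
not_a_distribution = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4)}

print(is_probability_distribution(heads))
print(is_probability_distribution(not_a_distribution))
```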
Examples of continuous random variables:
The height of a person.
The temperature at a place.
The amount of rainfall.
Time to failure for an electronic system.
The pressure in an automobile tire.
Width of a room.
CONTINUOUS RANDOM VARIABLE
The function f(x) for a continuous random variable is called the probability density function, abbreviated as p.d.f., or simply the density function. A p.d.f. has the following properties.
1. $f(x) \ge 0$ for all x
2. $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
To find probabilities for a continuous random variable we use integration, because the probability that X lies between two limits a and b is the area under f(x) between those limits: $P(a \le X \le b) = \int_a^b f(x)\,dx$.
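A numerical sketch can make the role of integration concrete. The density used here, f(x) = 2x on the interval from 0 to 1, is a made-up example, and the helper `integrate` is a simple midpoint-rule approximation written for this illustration rather than a library routine.

```python
def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def pdf(x):
    """Hypothetical density: f(x) = 2x for 0 <= x <= 1, and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


total_area = integrate(pdf, 0, 1)        # property 2: should be close to 1
prob = integrate(pdf, 0.25, 0.5)         # P(0.25 <= X <= 0.5)

print(round(total_area, 6))
print(round(prob, 6))
```

For this density the exact answer is 0.5² - 0.25² = 0.1875, which the approximation reproduces.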
Why? Because probability for a continuous random variable is measurable only over a given interval or limits.
The distribution of two or more random variables that are observed simultaneously when an experiment is performed is called their joint distribution.
The distribution of a single variable is called univariate, and distributions of two, three or more r.v.s are called bivariate, trivariate or multivariate respectively. The joint probability function of two variables X and Y is denoted by f(x, y).
MARGINAL PROBABILITY FUNCTION
From the joint probability function of (X, Y) we can obtain the individual probability functions of X and Y. Such individual probability functions are called marginal probability functions. Let f(x, y) be the joint probability function of two discrete r.v.s X and Y. Then the marginal probability functions of X and Y are defined as
$g(x_i) = \sum_j f(x_i, y_j)$
$h(y_j) = \sum_i f(x_i, y_j)$
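Computing marginals amounts to summing the joint table over the other variable. The sketch below uses an invented joint probability function for two discrete variables that each take the values 0 and 1; the names `joint`, `g` and `h` follow the notation in the text.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint probability function f(x, y).
joint = {
    (0, 0): Fraction(1, 8),
    (0, 1): Fraction(2, 8),
    (1, 0): Fraction(3, 8),
    (1, 1): Fraction(2, 8),
}

g = defaultdict(Fraction)   # marginal of X: sum over y
h = defaultdict(Fraction)   # marginal of Y: sum over x
for (x, y), p in joint.items():
    g[x] += p
    h[y] += p

print(dict(g))
print(dict(h))
```

Each marginal is itself a probability distribution, so its values add up to one.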
Let a discrete r.v. X have possible values $x_1, x_2, x_3, \ldots, x_n$ with respective probabilities $f(x_1), f(x_2), f(x_3), \ldots, f(x_n)$ such that $\sum f(x_i) = 1$. Then the mathematical expectation, or expected value, of X is denoted by E(X) and is defined by
$E(X) = x_1 f(x_1) + x_2 f(x_2) + x_3 f(x_3) + \cdots + x_n f(x_n) = \sum x_i f(x_i)$, where $i = 1, 2, 3, \ldots, n$
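As a worked instance of this definition, the sketch below computes the expected value for one throw of a fair die. The function name `expected_value` is illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def expected_value(distribution):
    """E(X) = sum of x * f(x) over all possible values x."""
    return sum(x * p for x, p in distribution.items())


# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
e_die = expected_value(die)

print(e_die)          # 7/2
print(float(e_die))   # 3.5
```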
In other words, expectation gives the mean value of X; that is, E(X) is also called the mean value of the r.v. X.
By using the rules of expectation you can find the expectation of any newly defined variable with respect to your original variable.
PROPERTIES OF EXPECTATION
If a is a constant, then E(a) = a.
If X is a discrete r.v. and a and b are constants, then E(aX + b) = a E(X) + b.
The expected value of the sum of any two random variables is equal to the sum of their expected values, i.e. E(X + Y) = E(X) + E(Y).
The expected value of the difference of any two random variables is equal to the difference of their expected values, i.e.
E(X - Y) = E(X) - E(Y)
The expected value of the product of two independent random variables is equal to the product of their expected values, i.e.
E(XY) = E(X) E(Y)
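These properties can be verified on a small example. The sketch below takes X and Y to be two independent fair dice (an assumption made for the illustration), and then shows that the product rule is not guaranteed once the variables are dependent, using the extreme case Y = X.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p_pair = Fraction(1, 36)   # independent fair dice: every (x, y) pair is equally likely


def E(g):
    """Expectation of g(X, Y) over the joint distribution of two independent dice."""
    return sum(g(x, y) * p_pair for x, y in product(faces, faces))


EX = E(lambda x, y: x)
EY = E(lambda x, y: y)

sum_rule = E(lambda x, y: x + y) == EX + EY
difference_rule = E(lambda x, y: x - y) == EX - EY
product_rule = E(lambda x, y: x * y) == EX * EY   # relies on independence

# Dependent case: Y is the same die as X, so E(XY) = E(X^2).
E_X_squared = sum(x * x * Fraction(1, 6) for x in faces)

print(sum_rule, difference_rule, product_rule)
print(E_X_squared, EX * EX)   # 91/6 versus 49/4
```

The sum and difference rules need no independence, whereas the product rule holds here only because the two dice do not influence each other.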
The outcome of each trial is classified into one of two categories, called success and failure.
The probability of success, denoted by p, changes on each trial.
When an experiment consists of independent trials with probability of success p, and the trials are repeated until the first success occurs, it is called a geometric experiment.
GEOMETRIC DISTRIBUTION
If X is the number of trials needed for the first success, then X is a geometric random variable (g.r.v.) and its probability distribution is called the geometric probability distribution.
Since a g.r.v. represents how long one has to wait for a success, it is also called the waiting-time random variable. It is interesting to note that the geometric distribution is a special case of the negative binomial distribution when k = 1. The formula for the geometric distribution is
$P(X = x) = p\,q^{x-1}$
where
p = probability of success
q = probability of failure (q = 1 - p)
x = numerical value of the random variable (x = 1, 2, 3, ...)
MEAN AND VARIANCE OF GEOMETRIC DISTRIBUTION
We can find the mean and variance of the geometric distribution directly from the parameter of the distribution (i.e. p). If X is a g.r.v. with geometric distribution g(x; p), then its mean and variance are
Mean = $1/p$
Variance = $q/p^2$
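The mean and variance can be checked numerically against the probability function P(X = x) = p q^(x-1). The sketch below uses an arbitrary value p = 0.25 and truncates the infinite sums at 2000 terms, which is an approximation whose remaining tail is negligible for this p.

```python
def geometric_pmf(x, p):
    """P(X = x) = p * q**(x - 1) for x = 1, 2, 3, ..."""
    q = 1 - p
    return p * q ** (x - 1)


p = 0.25              # hypothetical probability of success
q = 1 - p
xs = range(1, 2001)   # truncated support; the tail beyond 2000 is negligible

total = sum(geometric_pmf(x, p) for x in xs)
mean = sum(x * geometric_pmf(x, p) for x in xs)
variance = sum(x * x * geometric_pmf(x, p) for x in xs) - mean ** 2

print(round(total, 6))                       # probabilities sum to about 1
print(round(mean, 6), 1 / p)                 # compare with 1/p
print(round(variance, 6), q / p ** 2)        # compare with q/p^2
```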
Sampling is a technique used to collect information and, on the basis of this information, to draw inferences about the population.
Sampling is also defined as the process of selecting a sample from the population.
POPULATION
A population is defined as the aggregate or totality of all individual members of our variable of interest.
SAMPLING & SAMPLING DISTRIBUTION
PROBABILITY SAMPLING
The major types of probability sampling are:
Simple random sampling (SRS).
Stratified random sampling.
Systematic random sampling.
Cluster sampling.
Multistage sampling.
Multiphase sampling.
Non-probability sampling is a process in which personal judgement determines which units of the population are selected for the sample.
Non-probability sampling is also known as non-random sampling. The major types of non-random sampling are:
Purposive sampling.
Quota sampling.
When sampling is performed with replacement, a finite population can theoretically be considered an infinite population. Why?
PARAMETER
A numerical value such as the mean, median or mode that is calculated from the population is known as a parameter.
STATISTIC
A numerical value such as the mean, median or mode that is calculated from the sample is known as a statistic.
A sampling distribution is defined as the probability distribution of the values of a statistic, such as the mean or median, computed from all possible samples that might be selected, with or without replacement, from a population. A sampling distribution of a statistic is a probability distribution; therefore the sum of all probabilities in it is always equal to one.
There are many types of sampling distribution, but the most frequently used types in statistical inference are:
SAMPLING DISTRIBUTION
Binomial Distribution
Normal Distribution
t-Distribution
F-Distribution
Z Distribution
Chi-Square Distribution
STANDARD ERROR
The standard deviation of the sampling distribution of a sample statistic is called the standard error of the statistic and is denoted by S.E.
The standard error thus measures the dispersion of the values of the statistic.
SAMPLING DISTRIBUTION OF MEAN
The sampling distribution of the mean is the probability or relative frequency distribution of the means $\bar{X}$ of all possible samples drawn from the population.
The mean of this distribution is denoted by $\mu_{\bar{x}}$ and its standard deviation, which is called the standard error of the mean, by $\sigma_{\bar{x}}$.
PROPERTIES OF SAMPLING DISTRIBUTION OF MEAN
1. The mean of the sampling distribution is equal to the population mean regardless of whether sampling is done with or without replacement, i.e. $\mu_{\bar{x}} = \mu$
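2. The standard deviation of the sampling distribution of the mean (the standard error of the mean, $\sigma_{\bar{x}}$) is
With replacement: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Without replacement: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
where $\sigma$ is the population standard deviation, n is the sample size and N is the population size.
3. If the population sampled is normally distributed, then the sampling distribution of the mean will also be normally distributed.
4. If the population sampled is non-normal but the sample size is large, then the sampling distribution of the mean will be approximately normal.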
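The first two properties can be verified by enumerating every possible sample from a very small population. The population values below are invented, the sample size is fixed at n = 2, and ordered samples are listed so that every sample is equally likely; the standard error expressions used for comparison are the usual ones, σ/√n with replacement and σ/√n multiplied by √((N - n)/(N - 1)) without replacement.

```python
from itertools import permutations, product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]          # hypothetical population
N, n = len(population), 2
mu, sigma = mean(population), pstdev(population)

# Means of all possible ordered samples of size n.
means_with = [mean(s) for s in product(population, repeat=n)]     # with replacement
means_without = [mean(s) for s in permutations(population, n)]    # without replacement

se_with = pstdev(means_with)
se_without = pstdev(means_without)

print(mean(means_with), mean(means_without), mu)
print(round(se_with, 4), round(sigma / sqrt(n), 4))
print(round(se_without, 4), round(sigma / sqrt(n) * sqrt((N - n) / (N - 1)), 4))
```

In both schemes the mean of the sample means equals the population mean, while the standard error is smaller without replacement because of the finite population correction.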
The probability distribution of the differences of means can also be obtained; such a distribution is called the sampling distribution of differences of means.