Systematic sampling

From Wikipedia, the free encyclopedia

Systematic sampling is a statistical method involving the selection of elements from an ordered sampling frame. The most common form of systematic sampling is an equal-probability method, in which every kth element in the frame is selected, where k, the sampling interval (sometimes known as the skip), is calculated as:[1]

k = N / n

where n is the sample size, and N is the population size.

Using this procedure, each element in the population has a known and equal probability of selection. This makes systematic sampling functionally similar to simple random sampling. It is, however, much more efficient (if the variance within the systematic sample is more than the variance of the population). The researcher must ensure that the chosen sampling interval does not hide a pattern, since any pattern would threaten randomness. A random starting point must also be selected.

Systematic sampling should be applied only if the given population is logically homogeneous, because the units of a systematic sample are uniformly distributed over the population.

Example: Suppose a supermarket wants to study the buying habits of its customers. Using systematic sampling, it can choose every 10th or 15th customer entering the supermarket and conduct the study on this sample.

This is random sampling with a system. From the sampling frame, a starting point is chosen at random, and choices thereafter are at regular intervals. For example, suppose you want to sample 8 houses from a street of 120 houses. 120/8 = 15, so every 15th house is chosen after a random starting point between 1 and 15. If the random starting point is 11, then the houses selected are 11, 26, 41, 56, 71, 86, 101, and 116.

If, as is more frequently the case, the population is not evenly divisible (suppose you want to sample 8 houses out of 125, where 125/8 = 15.625), should you take every 15th house or every 16th house? If you take every 16th house, 8*16 = 128, so there is a risk that the last house chosen does not exist. On the other hand, if you take every 15th house, 8*15 = 120, so the last five houses will never be selected.
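
The street examples above can be checked with a short Python sketch. The helper name every_kth is hypothetical and chosen only for illustration; it lists the units picked with a whole-number interval, and the rest of the block enumerates every possible start to show both problems with 125 houses.

```python
def every_kth(start, k, n):
    """Return the 1-based unit numbers start, start + k, ... (n units)."""
    return [start + i * k for i in range(n)]


# Evenly divisible case: 8 houses from 120, interval 120 / 8 = 15.
print(every_kth(11, 15, 8))  # [11, 26, 41, 56, 71, 86, 101, 116]

# 125 houses, interval 15: houses reachable from any start between 1 and 15.
reachable_15 = {h for s in range(1, 16) for h in every_kth(s, 15, 8)}
never_selected = sorted(set(range(1, 126)) - reachable_15)
print(never_selected)  # [121, 122, 123, 124, 125]

# 125 houses, interval 16: starts whose last pick lies beyond house 125.
overshooting_starts = [s for s in range(1, 17) if every_kth(s, 16, 8)[-1] > 125]
print(overshooting_starts)  # [14, 15, 16]
```

With an interval of 15 the last five houses have zero chance of selection, and with an interval of 16 three of the sixteen possible starts run off the end of the street.
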
The random starting point should instead be selected as a noninteger between 0 and 15.625 (inclusive on one endpoint only) to ensure that every house has an equal chance of being selected; the interval should now be nonintegral (15.625); and each noninteger selected should be rounded up to the next integer. If the random starting point is 3.6, then the houses selected are 4, 20, 35, 51, 67, 82, 98, and 113, where there are 3 cyclic intervals of 15 and 5 intervals of 16.

To illustrate the danger of a systematic skip concealing a pattern, suppose we were to sample a planned neighbourhood where each street has ten houses on each block. This places houses #1, 10, 11, 20, 21, 30... on block corners; corner lots may be less valuable, since more of their area is taken up by street front etc. that is unavailable for building purposes. If we then sample every 10th household, our sample will either be made up only of corner houses (if we start at 1 or 10) or have no corner houses (any other start); either way, it will not be representative.

Systematic sampling may also be used with non-equal selection probabilities. In this case, rather than simply counting through elements of the population and selecting every kth unit, we allocate each element a space along a number line according to its selection probability. We then generate a random start from a uniform distribution between 0 and 1, and move along the number line in steps of 1.
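
Both procedures can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the sizes argument, and the convention that each element's space on the number line is sample_size * size / total (so the line has total length equal to the sample size) are assumptions made for the example.

```python
import math
import random


def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Equal-probability systematic sample of 1-based unit numbers.

    Uses the fractional interval k = N / n and rounds every position up.
    """
    k = population_size / sample_size
    if start is None:
        # rng.random() lies in [0, 1), so this start lies in (0, k].
        start = k - rng.random() * k
    picks = [math.ceil(start + i * k) for i in range(sample_size)]
    # Guard against floating-point rounding at the two ends of the frame.
    return [min(population_size, max(1, p)) for p in picks]


def pps_systematic_sample(sizes, sample_size, start=None, rng=random):
    """Unequal-probability systematic sample; returns 0-based indices.

    Element i occupies a segment of length sample_size * sizes[i] / total.
    The points start, start + 1, start + 2, ... select the segments they
    fall in. An element whose segment is longer than 1 can be selected
    more than once.
    """
    total = sum(sizes)
    if start is None:
        start = rng.random()  # uniform on [0, 1)
    chosen, upper, i = [], 0.0, 0
    for index, size in enumerate(sizes):
        upper += sample_size * size / total  # right end of this segment
        while i < sample_size and start + i < upper:
            chosen.append(index)
            i += 1
    return chosen


print(systematic_sample(120, 8, start=11))  # [11, 26, 41, 56, 71, 86, 101, 116]
print(systematic_sample(125, 8, start=3.6))
print(pps_systematic_sample([1, 3, 2, 2], 2, start=0.5))  # [1, 3]
```

In the last call the four segments end at 0.25, 1.0, 1.5 and 2.0, so the points 0.5 and 1.5 fall in the second and fourth segments.
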

Yule's Q test

To measure the correlation between two possibly related dichotomous events (E1 and E2), you can use Yule's Q, given by the formula:

Q = (ad - bc) / (ad + bc)

where, after running an experiment a certain number of times:

a = the number of times E1 happened and E2 happened
b = the number of times E1 did not happen and E2 happened
c = the number of times E1 happened and E2 did not happen
d = the number of times E1 did not happen and E2 did not happen

The result will be a real number between -1 and 1.

When Q = 1, there is a perfect positive correlation between the two events. In the ideal case (b = c = 0), what happens in E1 always happens in E2 and vice versa:

If E1 happens, E2 always happens.
If E2 happens, E1 always happens.
If E1 does not happen, E2 never happens.
If E2 does not happen, E1 never happens.

When Q = -1, there is a perfect negative correlation between the two events. In the ideal case (a = d = 0), the occurrence of one event invariably means the non-occurrence of the other (and vice versa):

If E1 happens, E2 never happens.
If E2 happens, E1 never happens.
If E1 does not happen, E2 always happens.
If E2 does not happen, E1 always happens.

When Q = 0, there is no correlation between the two events: one event happening or not happening tells you nothing about the other. The events are statistically independent.

Of course, those are just the ideal conditions. A Q value of 0.2, for instance, would indicate a relatively weak positive correlation between E1 and E2: E2 is somewhat more likely to happen when E1 happens than when it does not, but you probably wouldn't want to bet the farm on it.

It's also worth noting that Yule's Q is a symmetric measure: it couldn't care less whether E1 occurred first or second chronologically.
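
As a rough sketch, the formula and the cases above can be tried out in Python. The function names yules_q and tally, and the choice to raise an error when ad + bc = 0 (where the formula has no value), are assumptions made for this illustration.

```python
def yules_q(a, b, c, d):
    """Yule's Q = (ad - bc) / (ad + bc) for a 2x2 table of counts."""
    denominator = a * d + b * c
    if denominator == 0:
        raise ValueError("Yule's Q is undefined when ad + bc = 0")
    return (a * d - b * c) / denominator


def tally(observations):
    """Count a, b, c, d from (e1_happened, e2_happened) boolean pairs."""
    a = sum(1 for e1, e2 in observations if e1 and e2)
    b = sum(1 for e1, e2 in observations if not e1 and e2)
    c = sum(1 for e1, e2 in observations if e1 and not e2)
    d = sum(1 for e1, e2 in observations if not e1 and not e2)
    return a, b, c, d


print(yules_q(10, 0, 0, 10))    # 1.0  (perfect positive)
print(yules_q(0, 10, 10, 0))    # -1.0 (perfect negative)
print(yules_q(5, 5, 5, 5))      # 0.0  (no correlation)
print(yules_q(30, 20, 20, 20))  # 0.2  (weak positive)

# Symmetry: swapping the roles of E1 and E2 swaps b and c, and Q is unchanged.
print(yules_q(30, 5, 20, 20) == yules_q(30, 20, 5, 20))  # True

# Counting the table from raw outcomes of five hypothetical runs.
runs = [(True, True), (True, False), (False, True), (False, False), (True, True)]
print(tally(runs))  # (2, 1, 1, 1)
```
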
