Professional Documents
Culture Documents
Statistics
Calculating rates
Rates are the most common way of measuring disease frequency in a population and are
calculated as:
number of new cases of disease in population at risk
number of persons in population at risk
The numerator is new cases of disease (or deaths or other health events) during a specified
period; the denominator is the population at risk. Rates imply changes over time and the
period of time for which the rate has been calculated (e.g. month, year) must be specified.
Rates can be expressed per hundreds, per thousands or per millions as convenient.
Rates that are calculated with the total population in an area are known as crude rates. Crude
rates from different populations cannot be easily compared especially where there are striking
differences in, for example, age and sex between populations. Rates may also be calculated
using data from specific segments of the population: these are called specific rates (e.g. ageor sex-specific rates for certain age groups and for men or women, respectively).
An attack rate is defined as the proportion of those who became ill after a specified exposure.
For example, in an outbreak of gastroenteritis with 50 cases among a population at risk of
2500, the attack rate of disease is
50/2500 = 0.02, or
= 2/100, or
= 20/1000
Specific attack rates are calculated to identify persons in the population who are at a higher
risk of becoming ill than others. Examples of commonly used specific attack rates are attack
rates by age group, residence, sex or occupation. To identify the potential vehicle in a
foodborne disease outbreak, the food-specific attack rate is often calculated, which is the
attack rate for consumption of a specified food, calculated as
number of cases of disease among people who ate food X
number of persons who ate food X
To calculate a measure of association between food X and illness, a second attack rate must
be calculated for those who did not eat food X. The two attack rates can then be compared
with each other as a relative risk (division) or as a risk difference (subtraction).
Example
After a dinner attended by 100 people, 12 individuals become ill. All 100 people are
interviewed about their food consumption at the dinner. The interviews show that 8 of the 12
people who are ill and 25 of the 88 who are healthy ate fish.
127
Ill
Well
Total
Ate fish
25
33
24.2
63
67
6.0
12
88
100
Total
The relative risk for eating fish is 24.2/6.0 or 4. The risk difference is 24.2% 6% = 18.2%
Median
The median is the midpoint of a series of ordered values. It divides a set of values into two
equal parts. To identify the median from individual data:
Find the middle rank using the following formula: middle rank = (n + 1)/2.
If the number of values is odd, the middle rank falls on one observation.
If the number of values is even, the middle rank falls between two observations.
Example 1
To calculate the median for the following observations: 1, 20, 5, 3 and 9:
Example 2
To calculate the median for the following observations: 1, 20, 5, 3, 9, 21:
The median is the average of the value of the third and fourth observations, namely 5 and
9. Thus the median = (5+9)/2 = 7.
If the middle rank falls within a row, the median interval equals the value of the row. If
the middle rank falls between two rows, the median interval is the average of the values of
the two rows.
128
Example 3
The epidemic curve shows 58 cases. The middle rank is (58+1)/2 = 29.5. Case numbers 29
and 30 both occur between 18:00 and 24:00 hours on 22 August, which is the median
interval.
10
cases
27
17
26
36
50
16
25
35
49
15
24
34
48
14
23
33
13
22
32
41
46
4
3
47
12
21
31
40
45
11
20
30
39
44
10
19
29
38
43
52
55
57
18
28
37
42
51
54
56
53
58
00- 06- 12- 18- 00- 06- 12- 18- 00- 06- 12- 18- 00- 06- 12- 1821 August
22 August
23 August
24 August
Ill
Well
Total
43
11
54
79.6
18
21
14.3
46
29
75
61.3
To calculate statistical significance the chi-square (2) test can be used. The principles are
illustrated in the following 2x2 tables:
Ill
Well
Total
Exposed
O1 = a
O2 = b
n1
Non exposed
O3 = c
O4 = d
n2
n3
n4
Total
Observed
129
We can calculate the expected numbers of ill and well that would occur if exposure were not
related to becoming ill and the division into ill and well were by chance alone:
Exposed
Non exposed
Ill
Well
E1 = n1n3
N
E3 = n2n3
N
E2 = n1n4
N
E4 = n2n4
N
n3
n4
Total
Total
Expected
n1
n2
N
The chi-square tests compare the observed numbers with the expected numbers for each of
the four cells using the following formula:
(observed expected)2 = (Oi Ei)2
observed
Oi
2 = (Oi Ei)2
Oi
(1)
An easier way to calculate the 2 for a 2x2 table which leads to the same result can be
obtained with the following formula:
2 = N(ad bc)2
n1n2n3n4
(2)
If the expected number (Ei) inside any of the cells is less than 5, the X2 needs to be corrected
using the following formula:
2corrected = N[(ad bc) N/2]2
n1n2n3n4
(3)
The results for 2 are compared with theoretical values for the chi-square distribution (see
statistical textbooks for detailed tables). As a rough guide, if the calculated 2 value is:
10.83, the difference between the two groups is highly significant (p 0.001)
6.64, the difference between the two groups is strongly significant (p 0.01)
3.84, the difference between the two groups is significant (p 0.05).
If the calculated 2 value is <3.84, the difference between the two groups is considered to be
not statistically significant (p > 0.05).
Calculated example, using formula (2)
130
Ill
Well
Total
43
11
54
18
21
46
29
75
= 75(43x18 11x3)
54x21x46x29
= 27.2
Since the 2 value of 27.2 > 10.83, the p-value is <0.001. This means that the probability of
finding the distribution presented in this 2x2 table by chance alone is small less than
1/1000. The exact p-value as calculated by a computer is 0.0000002. In other words, it can be
assumed that vanilla ice cream is strongly associated with the risk of becoming ill.
131