
Chemometrics and Intelligent Laboratory Systems 74 (2004) 85–94
www.elsevier.com/locate/chemolab

Practical applications of sampling theory


Pentti Minkkinen *
Department of Chemical Technology, Lappeenranta University of Technology, P.O. Box 20, FIN-53851 Lappeenranta, Finland

Received 1 August 2003; received in revised form 1 January 2004; accepted 12 March 2004
Available online 28 July 2004

* Tel.: +358-5-621-2102 (office), +358-40-504-9413 (mobile); fax: +358-5-621-2199. E-mail address: Pentti.Minkkinen@lut.fi (P. Minkkinen).
doi:10.1016/j.chemolab.2004.03.013

Abstract

A large number of analyses are carried out, e.g., for process control, for product quality control to ensure consumer safety, and for environmental control purposes. The sampling theory developed by Pierre Gy, together with the theory of stratified sampling, can be used to audit and optimize analytical measurement protocols. Careful optimization of the sampling and measurement steps of the complete analytical procedure may result in considerable savings in costs or in improved reliability of the results.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Gy's sampling theory; Stratified sampling; Optimization of sampling

1. Introduction

In many textbooks of analytical chemistry, it is stated that the result is no better than the sample on which it is based. Very little is said, however, on how to ensure that the sample is good. It is still not widely known that a useful sampling theory has been developed for chemical analysis. The situation is, hopefully, slowly changing. Laboratories and consultants who carry out sampling as part of their business have started to accredit their sampling procedures, at least in Finland and probably elsewhere too. Basic requirements for accreditation are that the sampling equipment is correct, the uncertainties of the methods have been estimated, procedures are regularly audited, and the personnel have been adequately trained for their jobs. The most complete theory of sampling for chemical analysis, one that takes into account both the technical and statistical aspects of sampling, has been developed by Pierre Gy. Gy's theory is presented in his books [1–3] and the latest developments in papers of this issue. Pitard [4] has also written a book about Gy's sampling theory. A useful account covering the theory of stratified sampling and the optimization of sampling procedures has been written by Sommer [5]. The purpose of this paper is to elucidate how sampling theory can help in developing cost-optimal procedures.

The optimization procedures described in this paper are based on Sommer's work.

2. Design and audit of sampling procedures

The classification of sampling errors forms a logical framework for designing and auditing sampling procedures. The classification is shown in Fig. 1 (see Gy's papers in this issue for explanations of the different boxes of the figure). Auditing and designing sampling procedures normally involve the following steps:

Step 1. Check that all sampling equipment and procedures obey the rules of correct sampling, and replace incorrect equipment and procedures with correct ones. Correct sampling largely eliminates the materialization and preparation errors. A weighting error is made if the lot consists of sublots of different sizes, or if the flow rate varies during the sampling periods in process streams, and a simple average is calculated to estimate the lot mean. This error is eliminated if proportional cross-stream sampling can be carried out and the average is calculated as the mean weighted by the sample sizes (a minimal sketch of this weighted mean follows Fig. 1).

Step 2. Estimate the remaining errors (fundamental sampling error, grouping and segregation error, and point selection error) and their dependence on increment size and sampling frequency.



If the necessary data are not available, design pilot studies to obtain them.

Step 3. Define the acceptable overall uncertainty level or cost of the investigation and optimize the method, i.e., the increment sizes, the selection strategy (systematic or stratified), and the sampling frequency, so that the required uncertainty or cost level is achieved.

Step 1 is crucial. It is normally difficult and expensive to estimate the uncertainties of incorrect sampling. It is also futile, because sampling biases are never constant: stream segregation is a transient phenomenon that changes all the time. Therefore, sampling correctness must be implemented preventively.

Fig. 1. Gy's classification of sampling errors according to their origin. The errors fall into two main groups: those that originate from incorrect design or operation of the sampling equipment (materialization errors) and statistical errors.
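The weighted mean mentioned in Step 1 is simple to compute. A minimal sketch (Python; the numbers are invented for illustration, not from the paper):

```python
# Weighted vs. simple mean when sublot (increment) sizes differ.
masses = [120.0, 80.0, 200.0]   # increment sizes, e.g. kg
concs  = [0.50, 0.62, 0.44]     # analyte concentration of each increment, %

simple_mean   = sum(concs) / len(concs)
weighted_mean = sum(m * c for m, c in zip(masses, concs)) / sum(masses)

print(f"simple mean:   {simple_mean:.3f} %")    # 0.520 % (biased if sizes differ)
print(f"weighted mean: {weighted_mean:.3f} %")  # 0.494 % (unbiased lot estimate)
```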

3. Applications of the fundamental sampling error model

The fundamental sampling error is the minimum error of an ideal sampling procedure. Ultimately, it depends on the number of critical particles in the samples. For homogeneous gases and liquids it is very small, but for solids, powders, and particulate materials, especially at low concentrations of critical particles, the fundamental error can be very large.

If the lot to be sampled can be treated as a one-dimensional object, fundamental sampling error models can be used to estimate the uncertainty of the sampling. If the lot cannot be treated as a one-dimensional object, at least the point selection error has to be taken into account when the variance of primary samples is estimated. If sample preparation and size reduction by splitting are carried out correctly, fundamental sampling error models can be used to estimate the variance components generated by these steps. If the expected number of critical particles in the sample can be estimated easily as a function of sample size, the Poisson or binomial distribution can be used as the sampling model to estimate the uncertainty of the sample. In most cases, however, the fundamental sampling error model developed by Gy has to be used.

3.1. Estimation of the fundamental sampling error by using the Poisson distribution

The Poisson distribution describes the random distribution of rare events in a given time or space interval. If the average number of critical particles expected in the sample can be estimated, the standard deviation of the sample can be estimated as well. The Poisson distribution as a model for the sampling error has been treated, e.g., by Ingamells and Pitard [6]. An important property of the Poisson distribution is that the variance and the mean of the occurrences in the inspected interval are identical (here \lambda_n, the average number of critical particles in the sample). The standard deviation expressed as the number of particles is

\sigma_n = \sqrt{\lambda_n} \qquad (1)

The relative standard deviation is just as easy to estimate:

\sigma_r = \frac{1}{\sqrt{\lambda_n}} \qquad (2)

If \lambda_n is large (say, larger than 25), the confidence interval can be estimated by replacing the Poisson distribution with a normal distribution having the same mean and standard deviation. If \lambda_n is small, the confidence intervals have to be estimated from the Poisson distribution itself. Example 1 describes a typical situation where the Poisson distribution can be used as the model for sampling error estimation.

Example 1. Plant Manager: I am producing fine-ground limestone that is used in paper mills for coating printing paper. According to their specification, my product must not contain more than 5 particles per tonne larger than 5 μm. How should I sample my product?

Sampling Expert: That is a bit too general a question. Let's first define our goal. Would a 20% relative standard deviation for the coarse particles be sufficient?

Plant Manager: Yes.



Sampling Expert: Well, let's consider the problem. We can use the Poisson distribution to estimate the required sample size. The maximum relative standard deviation is s_r = 20% = 0.2. From Eq. (2) we can estimate how many coarse particles there should be in the sample to reach this standard deviation:

n = \frac{1}{s_r^2} = \frac{1}{0.2^2} = 25

If 1 tonne contains 5 coarse particles, this result means that the primary sample should be 5 tonnes. This is a good example of an impossible sampling problem. Although you could take a 5-tonne sample, there is no feasible technology for separating and counting the coarse particles from it. You should not try the traditional analytical approach to controlling the quality of your product. Instead, if the specification is really sensible, forget the particle size analyzers and maintain the quality of your product by process-technological means; that is, take care that all the equipment is regularly serviced and its high performance maintained, so that the product quality is always ensured.

Plant Manager: Thank you. In light of what you said, it seems that the expensive laser diffraction particle size analyzer recommended to us will not solve our problem.
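The expert's arithmetic generalizes to a short calculation. A minimal sketch (Python; the function names are mine, and scipy's poisson is used only for the small-\lambda_n confidence intervals mentioned above):

```python
from scipy.stats import poisson

def required_particles(rel_std: float) -> float:
    """Eq. (2) inverted: lambda_n = 1 / sigma_r**2."""
    return 1.0 / rel_std**2

def required_sample_mass_tonnes(rel_std: float, particles_per_tonne: float) -> float:
    """Sample mass that contains the required number of critical particles."""
    return required_particles(rel_std) / particles_per_tonne

# Example 1: 20% RSD allowed, 5 coarse particles per tonne expected.
print(required_particles(0.2))                # 25.0 particles
print(required_sample_mass_tonnes(0.2, 5.0))  # 5.0 tonnes

# For small lambda_n the normal approximation fails; use the Poisson
# distribution itself, e.g. a central ~95% interval for the count at lambda_n = 4:
print(poisson.interval(0.95, 4))              # (1.0, 8.0)
```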
3.2. Applications of Gy's fundamental sampling error equation for designing sample preparation procedures

If the material to be sampled consists of particles having different shapes and size distributions, it is difficult to estimate the number of critical particles in the sample. Gy has derived an equation that can be used also in this case to estimate the relative variance of the fundamental sampling error:

\sigma_r^2 = C d^3 \left( \frac{1}{M_S} - \frac{1}{M_L} \right) \qquad (3)

Here

\sigma_r = \frac{\sigma_a}{a_L} \qquad (4)

is the relative standard deviation of the fundamental sampling error, where \sigma_a is the absolute standard deviation (in concentration units) and a_L is the average concentration of the lot. d is the characteristic particle size (the 95% upper limit of the size distribution), M_S is the sample size, M_L is the lot size, and C is the sampling constant, which depends on the properties of the material sampled. C is the product of four parameters:

C = f g \beta c \qquad (5)

where f is the shape factor (see Fig. 2): the ratio of the volume of a particle having the characteristic dimension d to the volume of a cube with the same dimension. For spheroidal particles f ≈ 0.5, which can often be used as the default value for this parameter. g is the size distribution factor (g = 0.25 for a wide size distribution, and g = 1 for uniform particle sizes), and \beta is the liberation factor (see Fig. 2), an empirical correction for materials in which the critical particles are found as inclusions in the matrix particles. The liberation size L is defined as the size of the screen opening below which 95% of the material has to be crushed in order to liberate at least 85% of the critical particles; \beta_max = 1 (liberated materials and materials ground below the liberation size L) and \beta_min = 0.03 (materials where the critical particles are very small in comparison to d). Note that because \beta depends on the particle size d for a given material, the sampling constant C changes when the material is ground or crushed. Finally, c is the constitution factor, which can be estimated from Eq. (6) if the necessary material properties are available:

c = \frac{(1 - a_L/a)^2}{a_L/a}\,\rho_c + \left( 1 - \frac{a_L}{a} \right) \rho_m \qquad (6)

Here, a_L is the average concentration of the lot; a is the concentration of the analyte in the critical particles; \rho_c is the density of the critical particles; and \rho_m is the density of the matrix or diluent particles.

Fig. 2. Estimation of the particle shape factor and the liberation factor for unliberated and liberated critical particles. L is the particle size of the critical particles.

If the material properties are not available and are difficult to estimate, the sampling constant C can always be estimated experimentally. International reference materials (RMs), for example, are a special group of materials for which the sampling constant should always be estimated and reported. Unfortunately, this is seldom done. If the particle size distribution and the sampling constants were available to the user, the usefulness of these materials could be improved. The producers of RMs usually carry out homogeneity tests that provide data which could be reported in compressed form as sampling constants, but at the moment these data are not fully utilized.

Below, some examples are given of how Gy's fundamental sampling error model can be used in practice to design and audit analytical procedures. Some further examples can be found in Ref. [8].
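The chain of Eqs. (3)-(6) is straightforward to mechanize. A minimal sketch in Python (function and parameter names are mine; units follow the paper: densities in g/cm³, particle size in cm, masses in g):

```python
import math

def constitution_factor(a_L, a, rho_c, rho_m):
    """Eq. (6): constitution factor c (g/cm^3). a_L is the average
    concentration of the lot and a the analyte concentration in the
    critical particles, both as mass fractions."""
    r = a_L / a
    return (1 - r) ** 2 / r * rho_c + (1 - r) * rho_m

def sampling_constant(f, g, beta, c):
    """Eq. (5): C = f * g * beta * c."""
    return f * g * beta * c

def fse_rel_std(C, d, M_S, M_L):
    """Eq. (3): relative standard deviation of the fundamental sampling
    error for sample mass M_S taken from lot mass M_L."""
    return math.sqrt(C * d ** 3 * (1.0 / M_S - 1.0 / M_L))
```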


As mentioned above, the fundamental sampling error is the minimum theoretical error achievable in a sampling step. Fundamental sampling error calculations therefore give realistic estimates of the global sampling error only if the material is well mixed before sampling and all sampling and subsampling procedures are carried out with equipment and methods that follow the rules of sampling correctness defined in Gy's sampling theory. Consequently, if large lots are sampled, the uncertainty of the primary samples has to be estimated in a different way, e.g., by using Gy's variographic method.

Example 2. A certain cattle feed (density 0.67 g/cm³) contains on average 0.05% of an enzyme powder that has a density of 1.08 g/cm³. The size distribution of the enzyme was available, and from it the characteristic particle size d = 1.00 mm and the size distribution factor g = 0.5 were estimated. Estimate the fundamental sampling error of the following analytical procedure. The actual concentration of a 25-kg bag is estimated by first taking a 500-g sample from it. This material is ground to a particle size of 0.5 mm. The enzyme is then extracted from a 2-g subsample with a suitable solvent, and the concentration is determined by liquid chromatography. The relative standard deviation of the chromatographic measurement is 5%. To estimate the errors of the two sampling steps, we have the following material properties:

M_S1 = 500 g, M_L1 = 25,000 g        ...sample and lot size, primary sampling
M_S2 = 2.0 g, M_L2 = 500 g           ...sample and lot size, secondary sampling
d_1 = 0.1 cm, d_2 = 0.05 cm          ...particle sizes
g_1 = 0.5, g_2 = 0.25                ...estimated size distribution factors
a_L = 0.05%, a = 100%                ...concentrations in the lot and in the critical particles
\rho_c = 1.08 g/cm³, \rho_m = 0.67 g/cm³   ...densities of critical and matrix particles
f = 0.5                              ...default value for spheroidal particles
\beta = 1                            ...liberated particles

These values give for the constitution factor (Eq. (6)) the value c = 2160 g/cm³, and for the sampling constants (Eq. (5)) the values C_1 = 540 g/cm³ and C_2 = 270 g/cm³. Eq. (3) now gives the standard deviation estimates for the individual steps:

s_r1 = 0.033 = 3.3%   ...primary sample
s_r2 = 0.13 = 13%     ...secondary sample
s_r3 = 0.05 = 5%      ...analytical determination

The total relative standard deviation can now be estimated by applying the rule of propagation of errors:

s_t = \sqrt{\sum s_{ri}^2} = 0.143 = 14.3\%

The largest error is generated in preparing the 2-g sample for the extraction of the enzyme, so this step should be modified first to improve the overall precision. The recommendation from this exercise is that either a larger sample should be used for the extraction or the primary sample should be pulverized to a finer particle size before secondary sampling, whichever is more economical in practice.
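With the helper functions from the sketch after Eq. (6) assumed to be in scope, Example 2 can be reproduced numerically; Example 3 below proceeds the same way for each of its three steps:

```python
# Example 2, values from the text (fractions, g, cm, g/cm^3).
c = constitution_factor(a_L=0.0005, a=1.0, rho_c=1.08, rho_m=0.67)
print(round(c))                                        # ~2160 g/cm^3

C1 = sampling_constant(f=0.5, g=0.5,  beta=1.0, c=c)   # ~540 g/cm^3
C2 = sampling_constant(f=0.5, g=0.25, beta=1.0, c=c)   # ~270 g/cm^3

s_r1 = fse_rel_std(C1, d=0.1,  M_S=500.0, M_L=25000.0) # ~0.033
s_r2 = fse_rel_std(C2, d=0.05, M_S=2.0,   M_L=500.0)   # ~0.13
s_r3 = 0.05                       # analytical determination (given)

s_t = (s_r1**2 + s_r2**2 + s_r3**2) ** 0.5
print(f"{s_t:.3f}")               # ~0.143, i.e. 14.3 %
```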

Example 3. Evaluate the feasibility of the following procedure for calibrating an IR spectrometer for the determination of quartz in mineral mixtures. To prepare the calibration standards, pure minerals (d = 1 mm) were ground individually for 2 min in a swing mill. Then 30 mg to 2.95 g of each mineral was carefully weighed to obtain the designed composition, and the material was mixed for 3 min in a Retsch Spectro Mill. Next, 20 mg of the mineral mixture was carefully weighed into 4.98 g of KBr and mixed for 3 min in a Retsch Spectro Mill, and 200 mg of the mineral-KBr mixture was pressed into a tablet for the IR measurement. It was estimated that the IR beam covered 38% of the area of the sample tablet. The method was developed for quartz concentrations from 1% to 10%. The dilution factor, 0.02 g/5.0 g = 0.004, is needed to evaluate a_L, the concentration of quartz in the KBr tablets. The procedure has three error-generating sampling steps:

(1) taking the 20-mg mineral sample from the homogenized mineral mixture to be mixed into KBr: lot size M_L1 = 5 g, sample size M_S1 = 0.02 g;
(2) the calibration tablet preparation: lot size M_L2 = 5 g, sample size M_S2 = 0.2 g;
(3) the IR measurement: lot size M_L3 = 200 mg, sample size M_S3 = 38% of 0.2 g = 76 mg.

The following material properties were estimated:

d = 0.045 mm       ...particle size of quartz after grinding
g = 0.25           ...estimated size distribution factor
a_L = 1.0–10%      ...concentration of quartz in the mineral mixture before dilution with KBr
a = 100%
\rho_c = 2.65 g/cm³
\rho_m = 3.0 g/cm³ (in the mineral mixture); 3.2 g/cm³ (in the KBr mixture)
f = 0.5
\beta = 1          ...liberated particles


Results of the fundamental sampling error estimation are shown in Fig. 3 as a function of the quartz concentration in the original mineral mixture (without dilution with KBr), for each step separately and for the total three-step calibration procedure.

Fig. 3. Standard deviation estimates obtained in Example 3 for the three sampling steps (1-3) and for the total standard deviation (4) in the calibration of the IR spectrometer for quartz determination.

4. Optimization of sampling plans based on stratified sampling

The theory of stratified sampling can be used to optimize sampling plans. The subject has been treated, e.g., by Sommer [5] and Cochran [7]; Sommer's approach is followed in this presentation. Sometimes stratification is natural, e.g., if the lot to be investigated consists of bags, containers, wagon loads, etc. In sampling process streams, where no clear stratum borders can be found, the strata can be selected by the sampler. As Gy and Sommer [5] have shown, stratified sampling usually gives smaller uncertainties for the mean value and is at worst equal to random sampling.

4.1. Optimization of nested (hierarchical) sampling plans for lots consisting of strata of equal sizes

A nested sampling plan is described in Fig. 4. In nested sampling, the samples are taken at k levels (here k = 3). All levels contribute to the overall uncertainty of the mean of the lot, and at each level below the first sampling level, the sample of the upper level is treated as the lot of that level in the sampling chain. The quantities shown in Fig. 4 are discussed in the following subsections.

Fig. 4. Nested sampling plan.

4.1.1. Lot
The lot consists of N_1 strata (sublots) of equal sizes. Of these, n_1 strata are selected, and the primary samples are taken from them. \sigma_1 is the standard deviation between the means of the N_1 strata, and c_1 is the unit cost of selecting a stratum for sampling (usually c_1 is practically zero, because it involves only the decision of which strata the samples should be taken from).

4.1.2. Primary samples
From each selected stratum, n_2 primary samples are taken. N_2 is the size of a stratum expressed as the number of potential samples that could be taken from it, \sigma_2 is the standard deviation of the primary samples (the within-strata standard deviation), and c_2 is the unit cost of taking a primary sample.

4.1.3. Analytical samples
At this level, n_3 is the number of analytical samples prepared from each primary sample, N_3 is the size of the primary sample expressed as the number of potential analytical samples that could be prepared from it, \sigma_3 is the standard deviation of the preparation of the analytical samples (i.e., the standard deviation between analytical samples taken from one primary sample), and c_3 is the unit cost of preparing an analytical sample.

For optimization purposes, the unit costs c_i can be given either in currency units or as relative costs, e.g., as the time required to carry out the given sampling operation. Because the strata and the units within the strata are in most cases autocorrelated, especially in process analysis, and because the sampling variances depend on the sampling strategy (systematic, stratified, or random selection), the variances should in general be estimated by using Gy's variographic method. Analysis of variance based on the design shown in Fig. 4 is recommended in many statistical textbooks as the method for estimating the variance components; because it does not take autocorrelation into account, it should be used only if strictly random sample selection is used (not recommended) or if there is no autocorrelation between the sampling units.


Because the strata have equal sizes, the mean of the lot can be calculated as the unweighted mean of the analytical results:

\bar{x} = \frac{1}{n_1 n_2 n_3} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \sum_{k=1}^{n_3} x_{ijk} \qquad (7)

where the total number of samples analyzed is n_t = n_1 n_2 n_3. The variance of the lot mean is

\sigma^2(\bar{x}) = \frac{N_1 - n_1}{N_1 - 1} \frac{\sigma_1^2}{n_1} + \frac{N_2 - n_2}{N_2 - 1} \frac{\sigma_2^2}{n_1 n_2} + \frac{N_3 - n_3}{N_3 - 1} \frac{\sigma_3^2}{n_1 n_2 n_3} \qquad (8a)

Eq. (8a) shows that if a sample can be taken from every stratum (N_1 = n_1), the between-strata variance is completely eliminated from the variance of the mean. On the other hand, if at every level the sample is small in comparison to the lot from which it is taken, the equation simplifies to

\sigma^2(\bar{x}) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_1 n_2} + \frac{\sigma_3^2}{n_1 n_2 n_3}, \quad \text{if all } n_i \ll N_i \qquad (8b)

The total cost of the investigation is

c_t = n_1 c_1 + n_1 n_2 c_2 + n_1 n_2 n_3 c_3 \qquad (9)

This system can be optimized in two ways: either the maximum tolerable variance is specified first and the total cost is minimized, or the total cost is fixed and the variance of the mean is minimized. An exact closed-form solution of this optimization problem cannot be derived, because the numbers of samples can only be integers. The optimum can be found, however, either by checking all feasible solutions, which is relatively easy at the speed of modern computers, or by using the approximate mathematical solution given by Sommer [5]. The approximate solution is derived by assuming that all n_i are continuous instead of integers and that all N_i ≫ n_i; it is presented below, and a sketch of it in code follows Eq. (12b).

4.1.3.1. Maximum cost c_max fixed, variance of the mean minimized. For the levels below the first, the number of samples to be taken can be evaluated from

n_i = \frac{s_i}{s_{i-1}} \sqrt{\frac{c_{i-1}}{c_i}} \quad \text{for levels } i > 1, \text{ constrained to integers } 1 \le n_i \le N_i \qquad (10)

By substituting the values of n_i (i > 1) into Eq. (9), n_1 can then be solved:

n_1 = \frac{c_{max}}{c_1 + n_2 c_2 + n_2 n_3 c_3}, \quad \text{rounded down to an integer}, \; 1 \le n_1 \le N_1 \qquad (11)

4.1.3.2. Target value for the variance of the mean fixed, total cost minimized. If \sigma_T^2 is the target value for the variance of the mean, the protocol has to be designed so that the variance of the mean of the selected procedure satisfies \sigma^2(\bar{x}) \le \sigma_T^2. For levels i > 1, Eq. (10) also applies in this case. The number of sublots to be sampled at level 1 can then be solved either from Eq. (8a) (exact solution, Eq. (12a)) or from Eq. (8b) (approximate solution, Eq. (12b), valid when all N_i ≫ n_i):

n_1 = \left( \frac{N_1}{N_1 - 1} \sigma_1^2 + \frac{N_2 - n_2}{N_2 - 1} \frac{\sigma_2^2}{n_2} + \frac{N_3 - n_3}{N_3 - 1} \frac{\sigma_3^2}{n_2 n_3} \right) \Big/ \left( \sigma_T^2 + \frac{\sigma_1^2}{N_1 - 1} \right) \qquad (12a)

n_1 = \frac{\sigma_1^2 + \sigma_2^2/n_2 + \sigma_3^2/(n_2 n_3)}{\sigma_T^2} \qquad (12b)

These solutions have to be rounded to the nearest upper integer, 1 \le n_1 \le N_1.
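A minimal sketch of Sommer's approximate solution, Eqs. (10)-(12), in Python. The function names are mine; the exhaustive search over integer combinations mentioned above would replace this when the approximation is too coarse:

```python
import math

def n_lower_levels(s, c):
    """Eq. (10): sample numbers for levels i > 1 from standard
    deviations s = [s1, s2, ...] and unit costs c = [c1, c2, ...].
    Results are forced to integers >= 1."""
    n = [None]  # n_1 is solved separately, Eq. (11) or (12)
    for i in range(1, len(s)):
        ni = (s[i] / s[i - 1]) * math.sqrt(c[i - 1] / c[i])
        n.append(max(1, round(ni)))
    return n

def n1_for_target_sd(s, n, sd_target, N=None):
    """Eq. (12b), or the exact Eq. (12a) when stratum sizes N are given."""
    if N is None:  # approximate solution, all N_i >> n_i
        num = s[0]**2 + s[1]**2 / n[1] + s[2]**2 / (n[1] * n[2])
        return math.ceil(num / sd_target**2)
    num = (N[0] / (N[0] - 1)) * s[0]**2 \
        + (N[1] - n[1]) / (N[1] - 1) * s[1]**2 / n[1] \
        + (N[2] - n[2]) / (N[2] - 1) * s[2]**2 / (n[1] * n[2])
    return math.ceil(num / (sd_target**2 + s[0]**2 / (N[0] - 1)))
```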

Example 4 (Determination of cobalt in nickel cathodes). Assume that the size of a lot of cathode nickel is 25 tonnes and that, according to the specifications, the cobalt content must not exceed 150 g/tonne. The average weight of the cathode plates produced is 50 kg, and the plates are cut into approximately 50-g pieces before packing into barrels for shipment. For the analysis, these 50-g pieces are taken as primary samples, and from each of them a 1-g sample is dissolved for the cobalt determination. The cost of one analytical determination is 12 €, and that of taking one primary sample (a 50-g piece from a given plate) is 2 €. The standard deviations have been estimated as follows: the between-plates standard deviation s_1 = 35 g/tonne, the within-plate standard deviation (standard deviation of 50-g pieces taken from a single plate) s_2 = 15 g/tonne, and the standard deviation between 1-g samples taken from a single 50-g piece s_3 = 3.3 g/tonne. Optimize the analytical procedure so that the standard deviation of the lot mean does not exceed 5 g/tonne.

Sampling is carried out at three error-generating levels, with the following values:

Level 1: N_1 = 25,000 kg / 50 kg = 500, s_1 = 35 g/tonne, c_1 = 0 €, n_1 = ?
Level 2: N_2 = 50 kg / 50 g = 1000, s_2 = 15 g/tonne, c_2 = 2 €, n_2 = ?
Level 3: N_3 = 50 g / 1 g = 50, s_3 = 3.3 g/tonne, c_3 = 12 €, n_3 = ?


At level 1, the unit cost is practically zero: the only sampling operation at this level is the decision of which plates the 50-g primary samples should be taken from.

Solution. Eq. (10) gives n_3 = 0.09; applying the constraints, we have to select n_3 = 1. Similarly n_2 = 0, so we have to select n_2 = 1. For level 1, the following results are obtained: n_1 = 53.3 (Eq. (12a)) or n_1 = 58.4 (Eq. (12b)). The approximate solution slightly overestimates the required number of plates to be sampled, but as long as the size of the sample at any level is, say, 10% or less of the lot size, the difference is small. In this case, the following sampling protocol could be used. A 50-g piece is taken from every ninth cathode plate at the packing stage. This gives a total of 55 samples per 25-tonne lot. From each 50-g piece, one cobalt determination is made. The total cost of this inspection scheme is 55 × (0 € + 2 € + 12 €) = 770 €, and the expected standard deviation of the lot mean (from Eq. (8a)) is 4.9 g/tonne. Fig. 5 shows the operating characteristic curve of this inspection scheme, illustrating both the producer's and the buyer's risks.

Fig. 5. Operating characteristic of the optimized inspection protocol for cobalt determination. The x-axis shows the true mean value of the lot, and the y-axis the probability that the mean value of the inspection exceeds the specification of 150 g/tonne (producer's risk) or falls below it (buyer's risk).
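Applied to Example 4 (values from the text; the helper functions from the sketch after Eq. (12b) are assumed to be in scope):

```python
s = [35.0, 15.0, 3.3]   # g/tonne: between plates, within plate, between 1-g samples
c = [0.0, 2.0, 12.0]    # cost per operation (assumed euros)
N = [500, 1000, 50]

n = n_lower_levels(s, c)                   # -> [None, 1, 1], i.e. n2 = n3 = 1
n[0] = n1_for_target_sd(s, n, 5.0)         # Eq. (12b): ceil(58.4) = 59
n1_exact = n1_for_target_sd(s, n, 5.0, N)  # Eq. (12a): ceil(53.3) = 54

# The paper rounds to a practical protocol: every ninth plate, i.e. 55 samples,
# total cost 55 * (0 + 2 + 12) = 770, expected SD of the mean 4.9 g/tonne.
cost = n1_exact * (c[0] + n[1] * c[1] + n[1] * n[2] * c[2])
print(n1_exact, cost)                      # 54 756.0
```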

4.2. Optimization of the sampling plan when the lot consists of strata of different sizes and heterogeneities

Sometimes the lot to be investigated consists of well-defined strata of different sizes and heterogeneities, for example, when the batch to be processed is prepared by mixing raw materials of different qualities to achieve the required average composition. This kind of lot is shown in Fig. 6.

Fig. 6. Lot consisting of k strata of different sizes, and the quantities needed to optimize the sampling plan.

The lot consists of k different strata. The quantities needed to design a cost-optimal sampling plan are:

W_i = \frac{M_{Li}}{\sum M_{Li}} \qquad (14)

the relative size of stratum i, where M_Li are the sizes of the strata (e.g., as mass or volume; i = 1, 2, ..., k); N_i = M_Li/M_Si, the relative size of stratum i expressed as the number of potential samples that could be taken from it, M_Si being the size of the samples taken from stratum i; \sigma_i, the standard deviation of one sample taken from stratum i; c_i, the cost of one sample analyzed from stratum i; c_t, the total cost of the estimation of the grand mean of the lot; n_i, the number of samples taken and analyzed from stratum i; n_t = \sum n_i, the total number of samples analyzed; and x_ij, the analytical results on samples from stratum i (i = 1, 2, ..., k; j = 1, 2, ..., n_i). The mean of stratum i and the grand mean of the lot are

\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}, \qquad \bar{x} = \sum_{i=1}^{k} W_i \bar{x}_i

The variance of the lot mean is

\sigma^2(\bar{x}) = \sum_{i=1}^{k} W_i^2 \, \frac{N_i - n_i}{N_i - 1} \, \frac{\sigma_i^2}{n_i} \qquad (15a)

If the samples taken are small in comparison to the stratum sizes (as is usually the case), this equation simplifies to

\sigma^2(\bar{x}) \approx \sum_{i=1}^{k} W_i^2 \, \frac{\sigma_i^2}{n_i}, \quad \text{if in all strata } n_i \ll N_i \text{ and } N_i \gg 1 \qquad (15b)

The total cost of the investigation in the general case is

c_t = \sum_{i=1}^{k} n_i c_i \qquad (16a)

Usually in practice, the costs of sampling and analysis are independent of the strata from which the samples are taken, and this equation simplifies to

c_t = n_t c^*, \quad \text{if } c_1 = c_2 = \dots = c_k = c^* \qquad (16b)
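To make the bookkeeping concrete, a minimal sketch wiring Eqs. (14), (15b), and (16b) together (Python; the three-stratum data are invented for illustration):

```python
# Invented three-stratum lot, e.g. three raw-material qualities.
M_L  = [2000.0, 1500.0, 500.0]   # stratum sizes, kg
s    = [1.2, 0.8, 2.5]           # SD of one sample per stratum (conc. units)
n    = [4, 3, 5]                 # samples analyzed per stratum
xbar = [10.1, 9.6, 11.4]         # stratum means from those samples

W = [m / sum(M_L) for m in M_L]                       # Eq. (14)
grand_mean = sum(Wi * xi for Wi, xi in zip(W, xbar))  # weighted lot mean
var = sum(Wi**2 * si**2 / ni
          for Wi, si, ni in zip(W, s, n))             # Eq. (15b)
cost = sum(n) * 15.0                                  # Eq. (16b), c* = 15 per sample

print(round(grand_mean, 2), round(var**0.5, 2), cost)
```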

Optimization of the investigation involves allocating the total number of samples that can be analyzed among the strata in an optimal way.


The mathematical optimum can again be derived by assuming that the n_i are continuous. The results given below have been derived assuming that the investigation costs are independent of the strata (Eq. (16b) valid). The optimization strategy depends on how much information is available. If only the sizes of the strata are known and the total cost c_t of the investigation is fixed, the best strategy is to allocate the samples proportionally to the sizes of the strata:

n_i = W_i n_t \qquad (17)

n_t = \frac{c_t}{c^*} \qquad (18)

Both n_t and n_i have to be rounded to integers so that the total cost is not exceeded. If the unit costs and standard deviations are also available, even better plans can be designed. Laboratories usually follow their costs, and consequently good cost estimates are available. The standard deviations can be estimated either by using Gy's sampling theory, if the necessary material properties are available, or experimentally from a pilot study. If the quality control of the analytical laboratory is well planned, it also provides data that can be used for the optimization of sampling and analytical procedures. A cost-optimal plan for the investigation can be derived in two ways: either the total cost of the investigation is fixed and the variance of the grand mean of the lot is minimized, or a target value for the variance is given and the total cost is minimized. By assuming that Eqs. (15b) and (16b) are valid, that is, that in all strata the samples are small in comparison to the stratum sizes and the cost of the investigation is independent of the strata, the following results can be derived (a sketch in code follows Fig. 7 below).

4.2.1. Maximum value c_max given to the total cost, variance of the lot mean minimized

n_i = \frac{c_{max}}{c^*} \, \frac{W_i \sigma_i}{\sum_{j=1}^{k} W_j \sigma_j} \qquad (19)

Here, n_i has to be rounded to integers so that the target cost is not exceeded.

4.2.2. Target value \sigma_T given to the standard deviation of the lot mean, total cost minimized

n_i = \frac{W_i \sigma_i \sum_{j=1}^{k} W_j \sigma_j}{\sigma_T^2} \qquad (20)

Again, n_i has to be rounded to integers so that the required standard deviation of the lot mean is not exceeded, i.e., \sigma^2(\bar{x}) \le \sigma_T^2.

Example 5 (Optimal design for estimating the sulfur balance of a pulp mill). Estimation of the sulfur balance of a pulp mill is a difficult task (Fig. 7). Sulfur enters the mill in raw materials (water, wood) and in chemicals. The outflowing streams consist of products, wastewater, solid wastes, and atmospheric emissions. Initial calculations showed that the mean values of all streams other than the emissions into the atmosphere could be estimated reliably. The atmospheric emissions comprised about one quarter of the total sulfur. Estimating atmospheric sulfur emissions in an old pulp mill, like the one where this study was carried out, is difficult, and optimizing the emission measurement plan is therefore a challenging task: there is a large number of gaseous outlets into the atmosphere, these have different mass flows, the concentrations of the sulfur compounds are highly variable, and sulfur is found in dust and in many gaseous compounds [SO2, H2S, CH3SH, (CH3)2S, and (CH3)2S2]. In the optimization, all the different emission sources and sulfur-containing compounds analyzed had to be treated as separate strata.

4.2.2.1. Design requirements.
- A material balance for sulfur is required annually.
- A two-man team with portable instruments is available for the atmospheric emission control measurements.
- To avoid spending too much time transporting and setting up the instruments, a one-week measurement period is carried out at each location where the team works; that is, 52 periods per year are available and should be allocated optimally.
- For the optimization, the emission sources were grouped into six groups, each of which could be measured during one week from one station in the field.

4.2.2.2. Acquisition of basic data. Existing records, supplemented with new experiments, were used to estimate:
- the relative contribution (size) of each major emission source to the total emission of sulfur compounds; and
- variance estimates for the different emission sources (e.g., from a variogram or from analysis of variance).

Fig. 7. To estimate the uncertainty of the annual sulfur balance of a pulp mill, the standard deviations of the means of several material streams have to be estimated. The streams have different sizes and contain sulfur in many different compounds, and in many streams high variability is characteristic of the sulfur-containing compounds. This is especially the case for the atmospheric emissions.
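A sketch of the optimal allocation rule (the W_i σ_i weighting of Eqs. (19) and (20)) applied to the relative sizes and standard deviations that appear in Table 1 below, with n_t = 32 periods per week; rounding to integers, with at least one sample per stratum, remains subject to the cost constraint, so printed allocations can differ by one:

```python
def allocate(W, s, n_t):
    """Allocation n_i proportional to W_i * s_i for a fixed total n_t
    (the weighting common to Eqs. (19) and (20))."""
    denom = sum(Wi * si for Wi, si in zip(W, s))
    return [max(1, round(n_t * Wi * si / denom)) for Wi, si in zip(W, s)]

def lot_mean_rel_sd(W, s, n):
    """Relative SD of the lot mean from Eq. (15b)."""
    return sum(Wi**2 * si**2 / ni for Wi, si, ni in zip(W, s, n)) ** 0.5

W  = [0.143, 0.285, 0.002, 0.285, 0.285]   # relative stratum sizes (Table 1)
sr = [30.0, 25.0, 40.0, 25.0, 25.0]        # % RSD of one measurement period

n = allocate(W, sr, n_t=32)
print(n)                                    # [5, 9, 1, 9, 9]
print(round(lot_mean_rel_sd(W, sr, n), 1))  # ~4.5 (% RSD of the weekly mean)
```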


Table 1
Relative sizes, relative standard deviations of one sample/measurement period, allocation of the 32 samples (measurement periods) that can be made during one week, and relative standard deviations of the means, for emission group #6

Stratum no.   W_i     s_ri (%)   n_i   s_r(x̄_i) (%)
1             0.143   30         5     13.4
2             0.285   25         9     8.3
3             0.002   40         1     40
4             0.285   25         9     8.3
5             0.285   25         9     8.3
Total         1.000              32    s_r(x̄) = 4.5%

Fig. 8. Heterogeneity (relative variation about the mean) of SO2 in emissions from a soda recovery boiler during a measurement period.

As an example, Fig. 8 shows the SO2 emissions from the soda recovery boiler as the process heterogeneity (relative variation about the process mean). As can be seen, high variability is characteristic of emissions from this source; in the peaks, the SO2 concentration is three times the process average. From the heterogeneity values, a variogram (Fig. 9) can be calculated for the process. Fig. 9 also shows the relative standard deviation estimates for the sampling error in this process stream as a function of the sampling interval, for both systematic and stratified sample selection. The variogram and the standard deviation estimates were calculated by using Gy's method.

With the standard deviation estimates, the uncertainty of the process mean during the measurement period can be estimated. This measurement plan can be treated as a two-level hierarchical sampling plan in which one sample is analyzed from each of the N_1 strata. By substituting N_1 = n_1, n_2 = 1 (and N_2 ≫ n_2), and N_3 = n_3 = 0 into Eq. (8a), the standard deviation of the mean of the measurement period is obtained from

\sigma^2(\bar{x}) = \frac{N_1 - n_1}{N_1 - 1} \frac{\sigma_1^2}{n_1} + \frac{N_2 - n_2}{N_2 - 1} \frac{\sigma_2^2}{n_1 n_2} \approx \frac{s_2^2}{n_1}

where s_2 is the standard deviation estimate from Fig. 9 for the sampling frequency used, and n_1 is the number of samples analyzed.

4.2.2.3. Optimization procedure. Optimization was carried out at two levels:

(1) For each group having more than one source, the analytical resources were allocated optimally between the different sources within the one-week (5-day) measurement period (an example is given in Table 1).
(2) Of the 52 measurement weeks, two were allocated to the insignificant sources, where only occasional measurements need to be carried out, and 50 weeks were allocated to the six most important groups. The results of the final optimization are given in Table 2.

Fig. 9. Variogram calculated from the heterogeneities of Fig. 8 (upper part) and relative standard deviation estimates for systematic and stratified sample selection as a function of the sampling interval, calculated from the variogram (lower part).

Table 2
Relative sizes, total relative standard deviations of the mean of a one-week measurement period, and allocation of the 50 measurement weeks between the different emission source groups

Emission group no.   W_i    s_ri (%)   n_i
1                    0.14   14.6       10
2                    0.18   20.8       17
3                    0.04   14.8       3
4                    0.02   7.9        1
5                    0.02   10.9       1
6                    0.60   6.4        18

Standard deviation of the annual mean of the sulfur emission: s_r(x̄) = 1.5%
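The annual figure in Table 2 can be checked with the same Eq. (15b) combination (a quick verification sketch; the values are typed in from Table 2):

```python
W  = [0.14, 0.18, 0.04, 0.02, 0.02, 0.60]  # relative sizes of the groups
sr = [14.6, 20.8, 14.8, 7.9, 10.9, 6.4]    # % RSD of one measurement week
n  = [10, 17, 3, 1, 1, 18]                 # allocated weeks (50 in total)

var = sum(Wi**2 * si**2 / ni for Wi, si, ni in zip(W, sr, n))
print(round(var**0.5, 1))                  # ~1.5 (% RSD of the annual mean)
```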


Table 2 shows that, regardless of the high heterogeneity of sulfur in the gaseous emissions, the mean can be estimated fairly accurately at the annual level. When the emission measurements were carried out approximately according to the optimized plan, the unaccounted sulfur in the annual material balance was only about 1%, which is in good agreement with the theoretical calculation, considering the complexity of the process.

5. Conclusions

Sampling theory can, and should, be applied in all steps of analytical procedures, from planning to the analytical measurements. At the moment, it is largely neglected. A large amount of analytical resources is devoted to quality control and environmental emission estimation in industry, and at the national level, large programs are devoted, e.g., to monitoring the state of the environment and to guaranteeing the quality and safety of food. Only seldom are the data quality objectives adequately defined before the sampling campaigns, and consequently sampling theory is not utilized when these campaigns are designed. Often the sampling plans have simply evolved, and their performance has never been critically audited. This leaves a lot of room for optimization in this field.

Before designing a sampling and analytical plan, one should carefully consider what uncertainty level can be tolerated by the user of the results. If the ambition level is set too high, the investigation will be unnecessarily expensive. In general, the following cost-benefit rule applies: because the standard deviation of the mean is proportional to 1/√n while the cost is proportional to the number of samples n, the cost grows as the inverse square of the target standard deviation. If the total standard deviation is cut in half, the cost increases four times; if only one quarter of the original standard deviation can be tolerated, the cost will be 16 times higher, and so on. What is important is that this relationship holds only if the analytical procedure has been optimized. If the resources are not optimally allocated, the required uncertainty level may not be achieved at all. On the other hand, optimization of an existing procedure may make it cheaper and still give more reliable results than the original procedure.

An example: a pulp factory pumped pulp slurry directly to a paper factory through a pipeline. The pulp factory had installed a process analyzer in the pipeline, and the amount of pulp pumped to the paper factory, and hence the bill for the pulp sold, was based on this measurement. When a discrepancy between the mass balances of paper produced and pulp received was noticed, the sampling and analytical chain was carefully checked. It turned out that, partly due to sampling problems and partly due to calibration problems of the analyzer, the estimated amount of pulp could be 10% too high. As the paper factory consumed about 80,000 tonnes of pulp per year, the analytical error was really expensive. By improving the sampling and calibration procedures, the systematic errors could be removed, and the random errors were minimized to a level where their effect on the mean of the annual mass of pulp pumped was only a few tenths of a percent.

Acknowledgements

I want to thank my friends, Dr. Pierre Gy, for developing this important sampling theory, and Prof. Kim Esbensen, for encouraging me to write this paper. I am also greatly indebted to the numerous colleagues and coworkers with whom I have worked on sampling problems over the years.

References

[1] P.M. Gy, Sampling of Particulate Materials, Theory and Practice, Elsevier, Amsterdam, 1982.
[2] P.M. Gy, Sampling of Heterogeneous and Dynamic Material Systems, Elsevier, Amsterdam, 1992.
[3] P.M. Gy, Sampling for Analytical Purposes, Wiley, Chichester, 1998.
[4] F.F. Pitard, Pierre Gy's Sampling Theory and Sampling Practice, second ed., CRC Press, Boca Raton, 1993.
[5] K. Sommer, Probenahme von Pulvern und körnigen Massengütern, Springer, Berlin, 1979.
[6] C.O. Ingamells, F.F. Pitard, Applied Geochemical Analysis, Wiley, New York, 1986.
[7] W.G. Cochran, Sampling Techniques, third ed., Wiley, New York, 1977.
[8] P. Minkkinen, SAMPEX: a computer program for solving sampling problems, Chemometrics and Intelligent Laboratory Systems 7 (1989) 189–194.
