
Robust Techniques for Signal Processing: A Survey

In recent years there has been much interest in robustness issues in general and in robust signal processing schemes in particular. Robust schemes are useful in situations where imprecise a priori knowledge of input characteristics makes the sensitivity of performance to deviations from assumed conditions an important factor in the design of good signal processing schemes. In this survey we discuss the minimax approach for the design of robust methods for signal processing. This has proven to be a very useful approach because it leads to constructive procedures for designing robust schemes. Our emphasis is on the contributions which have been made in robust signal processing, although key results of other robust statistical procedures are also considered. Most of the results we survey have been obtained in the past fifteen years, although some interesting earlier ideas for minimax signal processing are also mentioned. This survey is organized into five main parts, which deal separately with robust linear filters for signal estimation, robust linear filters for signal detection and related applications, nonlinear methods for robust signal detection, nonlinear methods for robust estimation, and robust data quantization. The interrelationships among many of these results are also discussed in the survey.

I. INTRODUCTION

In recent years there has been a resurgence of interest in what are known as robust methods for statistical signal processing. Such methods are applicable whenever schemes are used to carry out functions such as signal detection, estimation, filtering, and coding, common examples being in radar and sonar signal processing, communication systems, pattern recognition, and speech and image processing. In the early days of development of the body of ideas we now possess for statistical signal processing, the emphasis
Manuscript received October 2, 1984. This work was supported by the U.S. Air Force Office of Scientific Research under Grant AFOSR 82-0022, the U.S. Army Research Office under Contract DAAG29-81-K-0062, the Joint Services Electronics Program (U.S. Army, U.S. Navy, U.S. Air Force) under Contract N00014-84-C-0149, the National Science Foundation under Grant ECS-82-12080, and the U.S. Office of Naval Research under Contracts N00014-80-K-0945 and N00014-81-K-0014. S. A. Kassam is with the Department of Electrical Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA. H. V. Poor is with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

was on the derivation of optimum schemes for use in specified signal and noise environments. A classic example of this is the matched filter, which is optimum for a particular signal and noise model. Because the signals and noise in signal processing applications are usually modeled as random processes, and performance measures therefore usually involve probabilistic quantities (such as mean squared error or probability of error), the theory of statistics has played a fundamental role in the development of optimum signal processing techniques. Suppose a signal processing scheme, say a detector for a signal with known waveform in additive noise, is designed to give optimum performance for noise possessing a specific statistical description. For example, one widespread model for noise is that it is a Gaussian process. An important question that arises is, how sensitive is the performance of such an optimum scheme to deviations in the signal and noise characteristics from those for which the scheme is designed? This is an important question because in practice one rarely has perfect knowledge of, say, the noise characteristics; the Gaussian or any other specific model is usually a nominal assumption which may at best be approximately valid most of the time. Unfortunately, it turns out that in many cases nominally optimum signal processing schemes can suffer a drastic degradation in performance even for apparently small deviations from nominal assumptions. It is this basic observation that motivates the search for robust signal processing techniques; that is, techniques with good performance under nominal conditions and acceptable performance for signal and noise conditions other than the nominal, which can range over the whole of allowable classes of possible characteristics.
Thus in seeking robust methods it is recognized at the outset that a single, precise characterization of signal and noise conditions is unrealistic, and so classes of possible signal and noise characterizations are constructed and considered in the design of such methods. To illustrate the above observation with a concrete example, consider further the detection problem mentioned above in discrete time. Thus suppose we have scalar observations X₁, X₂, …, X_n, forming a vector X, which are

0018-9219/85/0300-0433$01.00 © 1985 IEEE

PROCEEDINGS OF THE IEEE, VOL. 73, NO. 3, MARCH 1985


known either to be noise only or to be noise plus a known signal sequence s₁, s₂, …, s_n with positive amplitude θ. We express this situation as a choice between the two hypotheses

    H₀: Xᵢ = Nᵢ,    i = 1, 2, …, n        (1.1a)

and

    H₁: Xᵢ = θsᵢ + Nᵢ,    i = 1, 2, …, n        (1.1b)
where the noise components Nᵢ will be assumed to be independent and identically distributed with a common univariate probability density function (pdf) f. The likelihood ratio Λ(X) for the observation vector X is given in this case by

    Λ(X) = ∏_{i=1}^{n} f(Xᵢ − θsᵢ)/f(Xᵢ).        (1.2)

This ratio can be formed for any particular realization of X provided f is known. It is well known that a test for H₀ versus H₁ based on the comparison of Λ(X) to a threshold is optimum according to several criteria. For example, such a test is Neyman-Pearson optimum [1], yielding maximum detection power (i.e., minimum miss probability) subject to a constraint on the maximum value of the false-alarm probability. Similarly, the test minimizing the Bayes risk for a set of prior probabilities for H₀ and H₁, and the minimax test for a given loss function or payoff matrix with unknown priors, are also of this form. A test based on the comparison of Λ(X) with a threshold is equivalent to one based on a comparison of the logarithm of Λ(X) with the logarithm of the original threshold. Taking the logarithm on both sides of (1.2), we have

    log Λ(X) = ∑_{i=1}^{n} L(Xᵢ; sᵢ, θ)        (1.3)

where

    L(x; s, θ) = log [f(x − θs)/f(x)].        (1.4)

If |L(x; s, θ)| is unbounded as a function of x, the value of log Λ(X) can be influenced heavily by a single observation component Xᵢ for which |L(Xᵢ; sᵢ, θ)| is large. Such a component can therefore completely override the weight of a possibly large number of other components in the choice between H₀ and H₁. While this effect is certainly acceptable if the model for the noise density function is accurate, it may also be observed because of an occasional completely erroneous measurement which the pdf model f does not take into account. In general, the assumed pdf f describes only an approximate or nominal model. Thus while the actual value of |L(Xᵢ; sᵢ, θ)| at some observed value Xᵢ = xᵢ should not be large relative to that obtained at other observation components, for the assumed model this may happen. To illustrate this, suppose that f is assumed to be Gaussian, in which case L(x; s, θ) is linear in x and thus is unbounded. If the true density f has exponential, rather than Gaussian, tails, then the true L(x; s, θ) is a constant for x in the tails, and so is bounded. For a model specifying exponential tails, an increasing absolute value for an observation component indicates increasing likelihood of one hypothesis over another only up to a saturation value; beyond it, larger absolute values do not indicate larger relative likelihood. If the noise density were truly exponential (or some other long-tailed pdf), then the performance of the test that is optimum for Gaussian noise could be very poor because of the unexpected number of large noise values. It would appear, then, that to counter the undesirable sensitivity of the test based on Λ(X) one should implement a bounded modification L̃(x; s, θ) of the function L(x; s, θ) corresponding to the assumed nominal model. Thus we are led to consider L̃(x; s, θ) of the form

    L̃(x; s, θ) = b,             if L(x; s, θ) > b
                  L(x; s, θ),   if −a ≤ L(x; s, θ) ≤ b        (1.5)
                  −a,           if L(x; s, θ) < −a

where a and b are constants. One can expect that with a and b not too small, test performance should degrade only marginally when the assumed model is accurate. On the other hand, the boundedness of L̃(x; s, θ) builds in a robustness against the influence of a small number of spurious observations. The size of the interval [−a, b] clearly controls the tradeoff between degree of robustness and performance degradation under the assumed model. It is noteworthy that several analytical considerations of robust detection lead to detectors based on functions with the form (1.5), as will be discussed in Section IV. Often a class of allowable characteristics, say for a noise pdf, is constructed by starting with a nominal characteristic and then including in the class all other characteristics that are close, in some well-defined sense, to this nominal one. Then a signal processing scheme that is robust may have performance at the nominal which is not quite as good as the scheme that is optimum for the nominal case, but its overall performance with respect to the defined class of characteristics will be good or acceptable. This loose definition of robustness is perfectly reasonable, but it does not provide a systematic approach to obtaining robust schemes. In order to achieve this we must first specify a measure of overall performance of a scheme with respect to a class of allowable conditions at the input. One such measure that has been widely used, and which leads to interesting and useful results in many situations, is the worst case performance of a scheme over a class of input conditions. Clearly, if its worst case performance is good we may say that a given scheme is robust. On the other hand, to find such a robust scheme we can look for the scheme that optimizes worst case performance. This approach leads to what are known as minimax robust schemes.
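The clipping in (1.5) is straightforward to experiment with numerically. The following sketch is illustrative only: the Gaussian nominal pdf, the constant signal sequence, the contamination model, and all parameter values (θ, ε, a, b) are assumptions made for this example, not quantities from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

def score_gauss(x, s, theta, sigma2=1.0):
    # Nominal score L(x; s, theta) = log[f(x - theta*s)/f(x)] for Gaussian f:
    # it is linear, and hence unbounded, in x.
    return (theta * s / sigma2) * (x - theta * s / 2.0)

def score_clipped(x, s, theta, a, b):
    # Bounded modification of the form (1.5): clip the nominal score to [-a, b].
    return np.clip(score_gauss(x, s, theta), -a, b)

n, theta, eps = 100, 0.5, 0.1
s = np.ones(n)                     # assumed known signal sequence

def noise(m):
    # Nominally Gaussian samples, with an eps-fraction of heavy-tailed outliers.
    outlier = rng.random(m) < eps
    return np.where(outlier, rng.laplace(0.0, 5.0, m), rng.normal(0.0, 1.0, m))

trials = 2000
x = theta * s + noise(n * trials).reshape(trials, n)   # observations under H1
stat_lin = score_gauss(x, s, theta).sum(axis=1)        # log-likelihood statistic
stat_rob = score_clipped(x, s, theta, a=2.0, b=2.0).sum(axis=1)

# The clipped statistic fluctuates far less under contamination, since each
# observation's contribution is confined to [-a, b]:
print(np.std(stat_lin) > np.std(stat_rob))
```

Because each term of the clipped sum lies in [−a, b], no single spurious observation can override the rest, which is precisely the robustness that the bounded form (1.5) builds in.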
Implicit in our association of minimax schemes with robust schemes is the expectation that the worst case performance of a minimax scheme will be acceptably good, being the best that can be achieved. Another expectation one has in defining robust schemes in this way is that at any nominal operating point the performance of the minimax scheme will not be very far below that of the nominally optimum scheme, which on the other hand will have much poorer performance away from the nominal point. Fortunately, it does turn out that minimax schemes for the signal processing applications of interest usually have the above

A scheme that minimizes the maximum possible value of a loss function is minimax; if performance is measured by a gain function then a maximin scheme would be sought. We shall use the term minimax as a general description for such schemes in all cases.



characteristics. They may therefore be said to have a more stable performance than schemes lacking these characteristics (in the literature the terms robust and stable are sometimes used to mean the same thing). We should emphasize that the classes of allowable characteristics one deals with in robust signal processing are generally nonparametric function classes, such as the class of all power spectral density functions with specified total power (area under the function) and which lie between specified upper and lower bounding functions. For uncertainties expressed by parametric classes of allowable values for finite-dimensional parameters (such as the mean and variance of a Gaussian pdf) one can of course use minimax designs as well, although alternative parametric approaches of statistical theory can also be applied in such situations. In this paper we will concentrate on minimax robust schemes. There are useful formulations of robustness other than the minimax one, most notably the stability or qualitative robustness ideas introduced by Root [2] in the context of signal detection and by Hampel [3] in the context of parameter estimation. These formulations utilize the idea of robustness as a continuity property of some performance measure as a function of the underlying model, and some brief discussion of these ideas is included here. However, from the viewpoint of design, the minimax approach has had the most impact on robust signal processing schemes. Also, we will not survey adaptive procedures, which may be used as robust schemes when input conditions are not precisely known and may be time-varying. Adaptive procedures, which attempt to learn about input conditions and adjust their specific signal processing structure accordingly to maintain good performance, are generally more complex than fixed minimax schemes.
Adaptive schemes are more desirable when the a priori uncertainty is so large that the guaranteed level of performance of a minimax scheme would be too poor to be acceptable, and when adequate time and data for adapting are available. Conversely, minimax procedures would be more desirable under more constrained uncertainty classes, and especially as robust procedures to guard against excessive performance degradation of nominally optimum schemes for deviations from nominal assumptions. Minimax schemes may be used in conjunction with an adaptive approach, because the learning mechanism in adaptive schemes never can be expected to perform perfectly given any finite time for adaptation to take place. The application of minimax concepts to obtain robust versions of optimum adaptive procedures has also been considered [4]. While our primary concern here is with minimax robust schemes, we will mention other specific techniques whenever it is appropriate. Most of the recent investigations on robust signal processing techniques have been motivated by the works of the statistician Tukey [5] and, more so, by the seminal 1964-1965 results of the statistician Huber [6], [7] on minimax robust location-parameter estimation and hypothesis testing. There has generally been a tendency to overlook some rather interesting work on minimax procedures which was carried out for signal processing applications in the decade prior to the publication of Huber's results. In 1954, Zadeh [8] suggested that minimax solutions are the natural choices to use in filtering noisy signals under a priori uncertainties. In [9] Root describes the game-theoretic approach and its application to obtain minimax decision rules in some communication problems. (These results of Root were originally contained in a 1956 report [10].) Early considerations of minimax schemes for signal processing include those of Yovits and Jackson in 1955 [11] on signal estimation filters for imprecisely known power spectral density functions, and of Nilsson in 1959 [12] and Zetterberg in 1962 [13] on matched filters. We will mention their results again in the following sections. Other early contributions are the 1957 paper of Blachman [14], the 1959 work of Dobrushin [15], and the 1961 paper of Cadzhiev [16]. The pre-1964 investigations of minimax signal processing schemes tend to be characterized by two attributes. One is that they were generally not concerned directly with pdf variations but rather with power spectral density function or related variations. Secondly, minimax schemes were advocated simply as reasonable approaches when designing systems for operation under conditions at the inputs which could not be determined a priori. Thus the possible nonrobustness of optimum schemes for nominal assumptions on the input was not explicitly recognized as an issue. The term robust was first used in describing desirable statistical procedures by Box in 1953 [17]. As we have remarked, minimax robustness of estimation and hypothesis testing schemes was considered by Huber in [6] and [7], and since then a large number of results on minimax and alternative formulations of robustness have been generated in the statistics literature. In a recent paper [18] Huber has given a most interesting account of some early concerns about robustness of statistical procedures and specific schemes, some of which date back to the last century. Reviews of the more recent techniques of robust statistics have been given by Huber [19], [20], Hampel [21], Bickel [22], and Hogg [23], [24]. Ershov [25] also gives a survey of robust estimation schemes which is quite broad in its scope.
A monograph on robust estimation schemes by Andrews et al. [26] studies the properties of many robust estimates. A collection of chapters edited by Launer and Wilkinson [27] contains some useful expositions. A recent book [28] may be consulted for a more detailed treatment. This survey will focus specifically on minimax robust signal processing schemes, so that only a small part of the large body of the statistics literature will be mentioned explicitly. Most of the recent developments in robust signal processing have of course been influenced directly by the developments in robust statistics. However, signal processing problems impose their own distinct requirements which are not always standard in problems of statistics. Thus it turns out that some recent developments in robust signal processing have provided new results in robust statistics. Most of the results we survey here are of the post-1965 period. Two of the earliest papers in the signal processing area from this period are those of Wolff and Gastwirth [29] and Martin and Schwartz [30], and they have been responsible for driving much of the subsequent work in robust signal processing. Thus a considerable literature has arisen on robust signal processing just in the last ten to fifteen years. The statistical descriptions of input conditions in signal processing are usually stated in terms of power spectral density or correlation functions and pdfs. We shall discuss results which have been obtained on minimax robust linear

KASSAM AND POOR: ROBUST TECHNIQUES FOR SIGNAL PROCESSING


filtering for signal estimation (e.g., Wiener filtering) in Section II and for signal detection (e.g., matched filtering) in Section III. Here the uncertainty classes are for spectral density or correlation functions. In Section IV results on minimax robust nonlinear signal detection schemes for distributional uncertainties are surveyed. Nonlinear parameter estimation schemes are surveyed more briefly in Section V, since on this topic there is much already available in review form in the statistics literature. Also included in Section V is a brief survey of nonlinear modifications of the Kalman filter for robustness against non-Gaussian pdfs for the observation and process noise components. Section VI treats the problem of robust quantization of data with unknown statistics, and we close with some concluding remarks in Section VII. Although our survey begins with robust linear filtering, studies on this topic are of more recent vintage than those on nonlinear signal detection and estimation. We feel, however, that the very widespread use of schemes such as Wiener and matched filters in signal processing justifies our beginning with robust versions of such linear processing schemes. Before we begin, let us note some other review, tutorial, and survey articles which are available in the literature. A tutorial on this subject by the authors has been published recently [31]. VandeLinde has given a brief survey in [32]. Ershov [25] and Krasnenker [33] have surveyed nonlinear estimation and detection schemes, respectively. Poor [34] has recently given a more mathematically detailed survey of robust detection schemes. Kleiner, Martin, and Thomson [35] and Martin and Thomson [36] treat the robust estimation of power spectral density functions. Robust methods for time series analysis have been considered by Martin in [37], and robust methods for system identification have been described by Poljak and Tsypkin in [38].
As a final introductory comment we should note that the literature in the area of robust statistical methods is vast and broad. Thus although this survey touches on what we feel to be the major contributions in robust signal processing, it is by no means exhaustive. However, the many results and methods that are not discussed here are accessible to the reader through the references provided.

II. ROBUST FILTERS FOR SIGNAL ESTIMATION

One of the most common signal processing tasks arising in applications is that of estimating (e.g., filtering, predicting, or smoothing) a signal waveform from a noisy measurement. This task arises, for example, in radar and sonar tracking systems, in observers for automatic control systems, in demodulators for analog communication systems, and in medical imaging systems. Conventional design procedures for optimum signal estimation algorithms often require an exact knowledge of the statistical behavior both of the signal of interest and of the noise corrupting the measurement. For example, in the design of optimum linear estimation algorithms we must know the spectral or autocorrelation properties of the signal and noise in order to specify the optimum procedures, and (as we shall see below) procedures designed to be optimum for a given model can be undesirably sensitive to inaccuracies in the model. As noted in the Introduction, robust procedures can overcome problems arising due to modeling inaccuracy by incorporating modeling uncertainty into the design from the outset. In this section, we will discuss the design of robust estimation procedures primarily within the context of the stationary linear (i.e., Wiener-Kolmogorov) estimation problem. Several other signal estimation problems have been treated in the context of robust design, including recursive nonlinear filtering and identification. Results on these problems will be discussed briefly in Section V.

A. The Need for Robustness in Signal Estimation

Consider the observation model

    Y(t) = S(t) + N(t),    −∞ < t < ∞        (2.1)

where {S(t); −∞ < t < ∞} and {N(t); −∞ < t < ∞} are real, zero-mean, orthogonal, wide-sense-stationary (wss) random processes representing signal and noise, respectively. We assume that {S(t); −∞ < t < ∞} and {N(t); −∞ < t < ∞} have power spectral densities Φ_S and Φ_N, respectively. (Most of these assumptions can be relaxed, as is discussed below.) Given the observation process {Y(t); −∞ < t < ∞} we wish to form an estimate of S(t) of the form

    Ŝ(t) = ∫_{−∞}^{∞} h(t − τ) Y(τ) dτ        (2.2)

where h is the impulse response of a time-invariant linear filter. A common performance criterion for signal estimates is the mean squared error (MSE), which for estimates of the form of (2.2) is given straightforwardly by

    e(Φ_S, Φ_N; H) = (1/2π) ∫_{−∞}^{∞} [ |1 − H(ω)|² Φ_S(ω) + |H(ω)|² Φ_N(ω) ] dω        (2.3)

where H is the transfer function associated with h (i.e., H is the Fourier transform of h). If Φ_S and Φ_N are known, then the MSE of (2.3) can be minimized over H to find the optimum filter transfer function for linear minimum-MSE estimation. It is straightforward to show (see, e.g., Thomas [39]) that the minimizing solution is given by

    H(ω) = Φ_S(ω)/(Φ_S(ω) + Φ_N(ω))        (2.4)

and that the corresponding minimum value of the MSE is given by

    e_W(Φ_S, Φ_N) = (1/2π) ∫_{−∞}^{∞} Φ_S(ω)Φ_N(ω)/(Φ_S(ω) + Φ_N(ω)) dω.        (2.5)
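As a quick numerical check of (2.3)-(2.5), the sketch below evaluates the MSE functional on a finite frequency grid for an illustrative pair of rational spectra (the particular spectra, grid, and perturbations are assumptions made for this example) and confirms that the Wiener solution (2.4) attains the minimum value (2.5).

```python
import numpy as np

# Illustrative rational spectra on a finite grid (assumed for this check).
w = np.linspace(-200.0, 200.0, 400001)
dw = w[1] - w[0]
phi_s = 1.0 / (1.0 + w**2)
phi_n = 20.0 / (100.0 + w**2)

def integral(f):
    # Simple Riemann sum over the grid.
    return f.sum() * dw

def mse(H):
    # e(Phi_S, Phi_N; H) of (2.3), evaluated numerically.
    return integral(np.abs(1 - H)**2 * phi_s + np.abs(H)**2 * phi_n) / (2 * np.pi)

H_opt = phi_s / (phi_s + phi_n)                                # Wiener filter (2.4)
e_w = integral(phi_s * phi_n / (phi_s + phi_n)) / (2 * np.pi)  # minimum MSE (2.5)

assert abs(mse(H_opt) - e_w) < 1e-9        # (2.3) evaluated at (2.4) equals (2.5)
assert mse(H_opt) < mse(1.1 * H_opt)       # perturbing the filter increases MSE
assert mse(H_opt) < mse(np.ones_like(w))   # all-pass filtering does worse
print(mse(H_opt) < mse(0.5 * np.ones_like(w)))
```

Any departure from (2.4) over a set of frequencies of positive measure strictly increases (2.3), since the integrand is a pointwise quadratic in H(ω) minimized at Φ_S(ω)/(Φ_S(ω) + Φ_N(ω)).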

Suppose that we design a filter H₀ via (2.4) to be optimum for some nominal signal and noise spectral pair (Φ_{S,0}, Φ_{N,0}), but that the actual spectra Φ_S and Φ_N can range over some classes 𝒮 and 𝒩, respectively, of spectra neighboring Φ_{S,0} and Φ_{N,0}. An important question that arises in this situation is, what is the behavior of the MSE e(Φ_S, Φ_N; H₀) as Φ_S and Φ_N range over 𝒮 and 𝒩? For example, it is of interest to know how the quantity

    sup_{(Φ_S, Φ_N) ∈ 𝒮 × 𝒩} e(Φ_S, Φ_N; H₀)        (2.6)

compares with the quantity e_W(Φ_{S,0}, Φ_{N,0}). The first of these quantities represents the worst performance of the filter H₀ over the class of possible spectra, whereas the second quantity represents the predicted performance assuming the nominal model to be accurate. A situation in which (2.6) were considerably larger than e_W(Φ_{S,0}, Φ_{N,0}) would point to a possible inadequacy of the nominal design H₀. To illustrate the degree to which modeling uncertainty can affect performance, we consider the following example taken from Vastola and Poor [40]. Suppose we have assumed a nominal model (Φ_{S,0}, Φ_{N,0}) given by

    Φ_{S,0}(ω) = 1/(1 + ω²),    −∞ < ω < ∞        (2.7a)

and

    Φ_{N,0}(ω) = 20/(100 + ω²),    −∞ < ω < ∞.        (2.7b)

Note that these spectra represent first-order wide-sense Markov processes with 3-dB signal bandwidth equal to 1, 3-dB noise bandwidth equal to 10, signal power E{[S(t)]²} = ½, and noise power E{[N(t)]²} = 1. Suppose, however, that all we really know are the total signal power, the total noise power, and the fractional signal and noise powers in the frequency band |ω| ≤ 1. This knowledge corresponds to the uncertainty classes²

    𝒮 = {Φ_S | Φ_S has the same total power and the same power in |ω| ≤ 1 as Φ_{S,0}}        (2.8a)

and

    𝒩 = {Φ_N | Φ_N has the same total power and the same power in |ω| ≤ 1 as Φ_{N,0}}.        (2.8b)

For a given estimation filter H and spectral pair (Φ_S, Φ_N), the signal-to-noise ratio (SNR) at the output of H can be defined by

    Output SNR = 10 log₁₀ (½ / e(Φ_S, Φ_N; H))        (2.9)

since the output Ŝ(t) can be written as Ŝ(t) = S(t) + (Ŝ(t) − S(t)), and E{[S(t)]²} = ½ and E{(S(t) − Ŝ(t))²} = e(Φ_S, Φ_N; H). Also, the input SNR is given simply by

    Input SNR = 10 log₁₀ (E{[S(t)]²} / E{[N(t)]²}).

Using these definitions, Fig. 1 depicts the nominal and worst case performance of the filter H₀ designed to be optimum for the nominal spectral pair of (2.7).

Fig. 1. Worst case and nominal performance of nominal and trivial filtering for the example in Subsection II-A.

Note the considerable performance degradation throughout the given range of input SNRs. Also depicted in Fig. 1 is the performance of trivial filtering, which corresponds to all-pass filtering if the input SNR is positive and no-pass filtering if the input SNR is negative. Note that the worst case over (2.8) of the nominally designed filter is uniformly worse than trivial filtering. Thus the nominal filter can actually make the signal noisier than it originally was!

²Note that rational models (such as (2.7)) are often forced upon estimated power spectra, although the actual data only predict fractional powers (such as (2.8)) accurately (Marzetta and Lang [41]).

B. Minimax Design of Robust Filters

The above example illustrates the need for an alternative design philosophy for the stationary linear signal estimation problem for applications in which there is some uncertainty regarding the spectra of interest. In particular, in view of the methods described in the Introduction, we consider as a design philosophy the minimization over H of the worst case performance described by (2.6).³ That is, we consider the design criterion

    min_H  sup_{(Φ_S, Φ_N) ∈ 𝒮 × 𝒩}  e(Φ_S, Φ_N; H).        (2.10)

A solution to (2.10) can be considered to be a robust filter for the uncertainty classes 𝒮 and 𝒩. To solve this problem, we would like to find a saddle point for the minimax game of (2.10); i.e., we would like to find a spectral pair (Φ_{S,L}, Φ_{N,L}) ∈ 𝒮 × 𝒩 and a filter H_R satisfying

    sup_{(Φ_S, Φ_N) ∈ 𝒮 × 𝒩} e(Φ_S, Φ_N; H_R) = e(Φ_{S,L}, Φ_{N,L}; H_R) = min_H e(Φ_{S,L}, Φ_{N,L}; H).        (2.11)

Note that the right-hand equality in (2.11) implies that H_R is the optimum filter for the pair (Φ_{S,L}, Φ_{N,L}) [i.e., H_R(ω) = Φ_{S,L}(ω)/(Φ_{S,L}(ω) + Φ_{N,L}(ω))]; thus the determination of a saddle point involves finding a pair (Φ_{S,L}, Φ_{N,L}) which satisfies (2.11) with H_R = Φ_{S,L}/(Φ_{S,L} + Φ_{N,L}). The left-hand equality in (2.11) says that H_R achieves its worst performance at the pair of spectra (Φ_{S,L}, Φ_{N,L}) for which it is optimum. This worst performance, e(Φ_{S,L}, Φ_{N,L}; H_R), is the guaranteed level of performance of the filter H_R for the classes 𝒮 and 𝒩. The problem (2.10) was first posed⁴ in the context of robustness by Kassam and Lim in [44] wherein a saddle

³An approach to this problem in which the signal is taken to be deterministic with unknown parameters is described by Kurkin and Sidorov in [42].

⁴Although not in this specific context, the idea of designing filters for uncertain models by using least favorable conditions was proposed by Nahi and Weiss in [43].
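The saddle-point relation (2.11) can be illustrated numerically. The sketch below is an assumed discrete-frequency toy problem (the two bands and the per-band powers are arbitrary choices, not values from the text): the classes fix the signal and noise powers in each of two frequency bands, as in (2.8), flat spectra within each band play the role of the least favorable pair, and the filter that is optimum for them achieves its worst performance over the classes at that pair.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-band, discrete-frequency classes: band powers fixed, shapes free.
m = 200
bands = [np.arange(0, m // 2), np.arange(m // 2, m)]
ps, pn = [3.0, 1.0], [1.0, 2.0]          # assumed signal/noise power per band

def random_member(powers):
    # An arbitrary positive spectrum renormalized to the fixed band powers.
    phi = np.empty(m)
    for band, p in zip(bands, powers):
        shape = rng.random(band.size) + 0.1
        phi[band] = p * shape / shape.sum()
    return phi

def lf(powers):
    # Candidate least favorable member: flat within each band
    # (signal and noise "shapes" as close together as possible).
    phi = np.empty(m)
    for band, p in zip(bands, powers):
        phi[band] = p / band.size
    return phi

def e(phi_s, phi_n, H):
    # Discrete analog of the MSE functional (2.3), up to a constant factor.
    return np.sum((1 - H)**2 * phi_s + H**2 * phi_n)

phi_sL, phi_nL = lf(ps), lf(pn)
H_R = phi_sL / (phi_sL + phi_nL)          # optimum filter for the LF pair

worst = e(phi_sL, phi_nL, H_R)
ok = all(e(random_member(ps), random_member(pn), H_R) <= worst + 1e-9
         for _ in range(200))
print(ok)
```

Because H_R is constant on each band, its error depends only on the fixed band powers, so every member of the classes yields the same MSE: the guaranteed level e(Φ_{S,L}, Φ_{N,L}; H_R), exactly as the left-hand equality of (2.11) requires.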



point solution to (2.10) wasgiven for the situation in which the spectra are knownonlytoliewithin given spectral bands. The problem of (2.10) was considered for general spectral uncertainty classes by Poor [45], and it is shown in [45] that for convex Y a n d X , a spectral pair ( @ s , L , @ N , L ) E Y X N a n d its optimum filter H, = @s,L/(@s, @, form ,) a saddle point for (2.10) if, and only if, the pair ( @ s , L , Q N , L ) is least favorable for Y x X ; i.e., if and only if ( @ s , L , @ N , L ) solves

uncertainty (between 0 and 1) placed on the nominal model bythe designer. This type of model allows for a fairly general type of uncertainty in anominal spectral model, and it is used frequently to model uncertainty in several contexts. Example 2: Variational-NeighborhoodModels: Another useful model for spectral uncertainty arises from allowing all spectra that vary from a nominal spectrum by no more than some given amount. Using a standard measure of variational distance this model becomes

where e, is the minimum-MSE functional defined by (2.5). The term least favorable comes from the fact that ( @ s , L , @ N , L ) is the pair of spectra in Y X JV that correspond to the random processes that are hardest to separate by filtering. Thus a design procedure for finding a robust filterfor given uncertainty classes Y and X is to solve (2.12) and then to design the optimum filter for the maximizing spectral pair. Since the filter design problem is solved by (2.4), the only possible difficulty is in solving (2.12). This problem, however, is straightforward to solve for many uncertainty classes of interest. in particular, the functional ew(QS,QN) can be written as

and

/ @(
2n
-m

o) d o =

$1

(2.15)

where a0 and E play the same roles as inthe c-contaminated class and where d = (1/2n)j00,@0(w) do. Example 3: pPoint Models: The classes of (2.8) are particular examples of a more general type of spectral uncertainty class known as p p o i n t classes.Theseclassesare of the form

(2.16)
x). where C is the convex function C ( x ) = -(2n)-x/(I Thus maximizing e, is equivalent to minimizing the functional ]C(@s/@N)@N, which is a special case of a general class of divergences or distances between densities (see Ali and Silvey Csiszar [&I, [471). Inview of (2.13) least favorables can be interpreted as being the spectra in Y and N whose shapes are closest together. Because of this structure, the problem of solving for least favorable spectra for spectral uncertainty classes in which the total signal and noise powers are known and only the spectral shapes are uncertain can be accomplished by analogy with results in robust hypothesis testing. In particular, for a general type of classes with this property, the leastfavorablespectraare scaled versions of the least favorable probability densities for an analogous robust hypothesis testing problem posed by Huber [7]. This is a useful result because solutions to the robust hypothesis testing problem are known for many uncertainty models of interest. (SeePoor[45], [48] for further details.)

where, as before, Φ_0 is a nominal spectrum and Ω_1, Ω_2, …, Ω_n form a partition of the frequency domain. Note that a p-point class consists of all spectra that have a fixed amount of power on each of the spectral regions Ω_1, …, Ω_n. Such a class might arise, for example, when power measurements are taken in a number of frequency bands. Example 4: Band Models: The first spectral uncertainty model that was considered in the context of robust Wiener filtering consists of the class of those spectral densities (with a given amount of power) that lie in a band bounded above and below by two known functions. This class can be written as

𝒮 = { Φ | L(ω) ≤ Φ(ω) ≤ U(ω) for −∞ < ω < ∞,  and  (1/2π)∫_{−∞}^{∞} Φ(ω) dω = P }   (2.17)

C. Some Useful Models for Spectral Uncertainty

There are a number of useful models for spectral uncertainty for which solutions to the robust stationary linear filtering problem can be obtained straightforwardly. The following examples are typical: Example 1: ε-Contaminated Models: One very useful spectral uncertainty model is that given by

where L and U are known functions and where P is fixed. Note that a model such as (2.17) can be used to describe a confidence region around an estimated spectrum. Also note that the ε-contaminated model of Example 1 is a special case of (2.17) with L = (1 − ε)Φ_0 and U = ∞. Example 5: Generalized Moment-Constrained Models: As a final example, consider spectral uncertainty classes of the following type:

𝒮 = { Φ | ∫_{−∞}^{∞} f_k(ω) Φ(ω) dω = c_k,  k = 1, …, n }   (2.18)

𝒮 = { Φ | Φ(ω) = (1 − ε)Φ_0(ω) + εU(ω)  and  ∫_{−∞}^{∞} Φ(ω) dω = ∫_{−∞}^{∞} Φ_0(ω) dω }   (2.14)

where Φ_0 is a nominal spectrum, U is an arbitrary and unknown contaminating spectrum, and ε ∈ [0, 1) is the degree of contamination.

where f_1, …, f_n are known functions and c_1, …, c_n are constants. The quantities ∫f_kΦ, k = 1, …, n, are sometimes known as generalized moments of the spectrum Φ corresponding to the weightings f_1, …, f_n. Note that the p-point class of (2.16) is a special case of (2.18) with f_k(ω) = 1 for ω ∈ Ω_k and f_k(ω) = 0 for ω ∉ Ω_k. More generally, (2.18)


PROCEEDINGS OF THE IEEE, VOL. 73, NO. 3, MARCH 1985

might represent the set of spectra of all processes that yield output power c_k when input to a filter with transfer function [2πf_k(ω)]^{1/2} for k = 1, …, n. Thus a model such as (2.18) arises when the available information consists of power measurements taken at the outputs of a filter bank. If 𝒮 and 𝒩 are both of the ε-contaminated or variational neighborhood type, then it can be shown (see Kassam and Lim [44], Poor and Looze [49]) that the robust filter is of the form

H_R(ω) = { k',  if H_0(ω) < k';   H_0(ω),  if k' ≤ H_0(ω) ≤ k'';   k'',  if H_0(ω) > k'' }   (2.19)

Fig. 3. Typical robust filter characteristic for p-point models for spectral uncertainty.

where H_0 = Φ_S,0/(Φ_S,0 + Φ_N,0) is the nominal filter and where k' and k'' are two constants depending on the value of ε and on the particular model used. This robust transfer function is illustrated in Fig. 2. Note that the effect of

A typical filter of this type is depicted in Fig. 3. Note that this is a zonal filter that can only be implemented approximately for temporal signals; however, in optical filtering, where the variable t in (2.1) is interpreted as a spatial parameter and ω as a spatial frequency, this type of filter would be very simple to implement (see Cimini and Kassam [50], [52]). An interesting feature of this model is that the performance of the robust filter is constant over the classes 𝒮 and 𝒩 and is given by [50]

Fig. 2. Typical robust filter characteristic for ε-contaminated or variational neighborhood models for spectral uncertainty.

incorporating the uncertainty into the design is a limiting of the gain of the nominal filter both from above and from below. This solution has a nice intuitive interpretation if one considers the action of the nominal filter H_0. This filter is designed to have near-unity gain in spectral regions where the nominal signal-to-noise power density ratio, Φ_S,0(ω)/Φ_N,0(ω), is large, and to have near-zero gain at frequencies where this ratio is small. In other regions, the gain is chosen to balance the effects of signal distortion and noise throughput. The robust filter transfer function reflects similar characteristics but, also, because of the spectral uncertainty, it limits the gain from above to guard against a greater than nominal amount of noise power at the frequencies where Φ_S,0/Φ_N,0 >> 1, and it limits the gain from below to assure that unexpected signal power at frequencies where Φ_S,0/Φ_N,0 << 1 will not lead to undue distortion. If both signal and noise spectral uncertainty classes are of the p-point form of (2.16) with common spectral bands Ω_1, …, Ω_n, then the robust filter can be shown to be given by (see Cimini and Kassam [50], Vastola and Poor [51])

for all (Φ_S, Φ_N) ∈ 𝒮 × 𝒩. The robust solution for the band model of Example 4 is given by Kassam and Lim in [44] and its behavior is similar to that for the ε-contaminated model (which is a special case). Solutions for generalized moment classes have been given by Breiman in [53], and a particular case of this model will be discussed below. (Of course, the p-point model is also a special case.) Other models, including combinations of the above models (such as the bounded p-point model [54]) and more general models based on Choquet capacities [48], [55], have also been considered. To illustrate the potential effectiveness of the robust filter we return to the problem described by the nominals of (2.7) and the uncertainty model of (2.8). This is a p-point model with Ω_1 = [−1, 1], Ω_2 = Ω_1^c, p_S = 1/2, and p_N = 0.063. Fig. 4 superimposes the (constant) performance of the robust filter (2.20) for this case onto the nominal and worst case performance curves of the nominal filter from Fig. 1. Note that the performance of the robust filter over the entire

H_R(ω) = p_{S,k}/(p_{S,k} + p_{N,k}),  for ω ∈ Ω_k,  k = 1, …, n   (2.20)

where p_{S,k} and p_{N,k} denote the signal and noise powers in the band Ω_k.

Fig. 4. Performance curves depicting the favorability of the robust filter for the example of Subsection II-A.


uncertainty class is only slightly degraded from the nominal performance of the nominal filter and is remarkably improved over the worst case performance of the nominal filter. This example illustrates fairly dramatically the favorability of the minimax design for filtering in uncertain environments.
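The clipped structure of the ε-contaminated robust filter (2.19) is simple to implement. The sketch below uses hypothetical nominal spectra, and the constants k' and k'' are illustrative stand-ins for the values that would be determined by the contamination level; it simply limits the nominal Wiener gain from above and below.

```python
import numpy as np

def robust_filter(h0, k_lo, k_hi):
    # robust transfer function of (2.19): nominal gain limited to [k_lo, k_hi]
    return np.clip(h0, k_lo, k_hi)

omega = np.linspace(-10.0, 10.0, 2001)
phi_s0 = 4.0 / (1.0 + omega**2)            # hypothetical nominal signal spectrum
phi_n0 = np.ones_like(omega)               # hypothetical nominal noise spectrum
h0 = phi_s0 / (phi_s0 + phi_n0)            # nominal noncausal Wiener filter
h_r = robust_filter(h0, k_lo=0.1, k_hi=0.7)   # k', k'' are illustrative stand-ins
```

The clipping is active at both ends for this example: the nominal gain exceeds 0.7 near ω = 0 and falls below 0.1 in the tails.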
D. Robust Causal Filtering, Prediction, and Smoothing

minimizing the worst case error

sup_{(Φ_S, Φ_N) ∈ 𝒮 × 𝒩} e_λ(Φ_S, Φ_N; H)   (2.27)

The results discussed in the above subsections assume implicitly that the class of allowable estimation filters includes noncausal filters; i.e., h(t − τ) in (2.2) is not necessarily zero for τ > t. While this assumption is not restrictive for many applications such as those involving spatial filtering or enhancement of stored signals, there are also many applications in which causality of the estimation filter is desired for the purposes of real-time processing. To discuss the situation we consider again the observation model given by

over all causal transfer functions H. Although this problem is analytically more difficult than the corresponding noncausal problem, it can be shown for convex 𝒮 and 𝒩 that, within mild conditions, a saddle-point solution for this problem is given by the optimum causal filter H_L^λ corresponding to the least favorable spectral pair (Φ_S,L, Φ_N,L) for this causal problem, where (Φ_S,L, Φ_N,L) is defined via

(Φ_S,L, Φ_N,L) = arg max_{(Φ_S, Φ_N) ∈ 𝒮 × 𝒩} e^λ(Φ_S, Φ_N)   (2.28)

Y(t) = S(t) + N(t),   −∞ < t < ∞   (2.22)

where {S(t); −∞ < t < ∞} and {N(t); −∞ < t < ∞} satisfy the assumptions made below (2.1). We wish to estimate the signal at time t + λ for some fixed λ based on observations up to time t; i.e., we wish to consider estimates of the form

Ŝ(t + λ) = ∫_{−∞}^{t} h(t − τ) Y(τ) dτ.   (2.23)

Note that λ < 0 in (2.23) corresponds to fixed-lag smoothing of the signal, λ = 0 corresponds to causal filtering, and λ > 0 corresponds to signal prediction. The MSE associated with the estimate of (2.23) is given by

e_λ(Φ_S, Φ_N; H) = (1/2π)∫_{−∞}^{∞} { |e^{jωλ} − H(ω)|² Φ_S(ω) + |H(ω)|² Φ_N(ω) } dω   (2.24)

where H is the transfer function of the filter {h(t); t ≥ 0}. For fixed Φ_S and Φ_N such that the observation spectrum, (Φ_S + Φ_N), satisfies the Paley-Wiener condition, the MSE of (2.24) is minimized over all causal filters by the filter with transfer function
H_t^λ(ω) = [ e^{jωλ} Φ_S(ω) / (Φ_S + Φ_N)^−(ω) ]_+ / (Φ_S + Φ_N)^+(ω)   (2.25)
(see Poor [45], Vastola and Poor [55], [58], Franke [57], Franke and Poor [59]). Thus, conceptually, the causal robust signal estimation problem is no more difficult than the noncausal one, since one designs a robust filter by first maximizing e^λ(Φ_S, Φ_N) over 𝒮 × 𝒩 and then designing an optimum filter for the maximizing pair via (2.25). However, in the noncausal problem there is a tractable closed-form expression, namely (2.5), to be maximized to find least favorables, whereas no such general expression exists for the causal problem; i.e., there is no general closed-form expression for e^λ(Φ_S, Φ_N) of (2.26). On the other hand, there are many specific cases of practical interest for which e^λ(Φ_S, Φ_N) is known in closed form (see, e.g., Yao [60], Snyders [61], [62]) and, moreover, general (but tedious) methods for finding such expressions are available (e.g., [61]). Robustness in several causal filtering problems has been considered using these results. Generally speaking, the phenomena observed are more or less the same as for the noncausal case, although, for a given model, nominal causal filters appear to be somewhat less sensitive to uncertainty than nominal noncausal filters are, due to the relatively lower selectivity of causal filters (see, e.g., Vastola and Poor [58]). Example: Robust Prediction: An interesting example of an application in which the above results can be easily applied is the problem of discrete-time one-step pure prediction.⁵ In particular, suppose we observe a discrete-time signal directly up to some time t; i.e., we have
Y(k) = S(k),   k ∈ {…, t − 3, t − 2, t − 1, t}.   (2.29)

Suppose further that we wish to predict the value of S(k) at the next sampling instant k = t + 1 with a linear predictor

Ŝ(t + 1) = Σ_{k=−∞}^{t} h(t − k) S(k).   (2.30)
where the subscript + denotes the causal part in an additive spectral decomposition and the superscripts + and − denote causal and anticausal parts, respectively, in a multiplicative spectral decomposition (see, e.g., Wong [56]). The minimum MSE is then obtained by combining (2.24) and (2.25) as

This is the problem of (2.23) with λ = 1 and Φ_N(ω) = 0 for all ω. The minimum-MSE functional e¹(Φ_S, 0) for this problem is given by the well-known Kolmogorov-Szegő-Krein formula (Hoffman [63])

e¹(Φ_S, 0) = exp{ (1/2π)∫_{−π}^{π} log Φ_S(ω) dω }.   (2.31)
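The Kolmogorov-Szegő-Krein formula (2.31) is easy to check numerically. For the unit-power first-order Markov spectrum used later in this section (see (2.34)), the one-step prediction error is 1 − r² in closed form, and simple quadrature of the log-spectrum reproduces it:

```python
import numpy as np

def trap(y, x):
    # plain trapezoidal quadrature
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

r = 0.5
omega = np.linspace(-np.pi, np.pi, 20001)
# unit-power first-order Markov spectrum (cf. (2.34))
phi = (1.0 - r**2) / (1.0 - 2.0 * r * np.cos(omega) + r**2)

# Kolmogorov-Szego-Krein one-step prediction error (2.31)
ksk = np.exp(trap(np.log(phi), omega) / (2.0 * np.pi))
# the closed-form answer for this spectrum is 1 - r**2 = 0.75
```

The agreement is essentially exact because the trapezoidal rule converges very rapidly for smooth periodic integrands.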

e_λ(Φ_S, Φ_N; H_t^λ) ≜ e^λ(Φ_S, Φ_N).   (2.26)

As in the noncausal situation discussed above, it is commonly the case in practice that Φ_S and Φ_N are not known exactly but rather are known to lie in some uncertainty classes 𝒮 and 𝒩 of possible signal and noise spectra. In this case we may seek a robust filter for 𝒮 and 𝒩 by

Thus in order to design a predictor to be robust over an uncertainty class 𝒮 of signal spectra, we choose Φ_S,L via
⁵Note that the discrete-time problem is the special case of the continuous-time one in which the spectra are concentrated in the spectral band |ω| ≤ π. Thus the above discussion holds for both discrete and continuous time.


Φ_S,L = arg max_{Φ_S ∈ 𝒮} ∫_{−π}^{π} log Φ_S(ω) dω   (2.32)

and we then design the optimum predictor for this spectrum. This problem has been considered by Hosoya [64] for the particular case of an ε-contaminated spectrum and by Franke [57] and Vastola and Poor [55] for the general case.⁶ A related problem has been considered by Korobochkin [66]. It is interesting to note that the spectrum of (2.32) can be interpreted as being that member of 𝒮 which is "closest" to a uniform spectrum (see Poor [67]).⁷ This has a very nice intuitive interpretation since a uniform spectrum corresponds to white noise, which is the universal worst case type of signal to predict. (In other words, past and present data are useless in predicting future values of white noise.) It is also interesting to note that the quantity
(1/2π)∫_{−π}^{π} log Φ_S(ω) dω

is the spectral entropy of the signal process (see, e.g., [68]), and thus the least favorable spectrum is maxentropic in 𝒮. Also, since the entropy of a process is a measure of its indeterminism, the least favorable spectrum can be thought of as the most indeterministic, a term introduced by Franke

Fig. 5. Least favorable spectrum for predicting an ε-contaminated wide-sense Markov signal.

Another interesting example of robust prediction comes from consideration of the following signal spectral uncertainty classes (see Franke [57]):

𝒮 = { Φ | (1/2π)∫_{−π}^{π} e^{jωk} Φ(ω) dω = c_k,  |k| = 0, 1, …, m }   (2.36)

where c_0, …, c_m is a set of constants. Since

[57].

As a specific example, we consider the particular case of an ε-contaminated first-order wide-sense Markov signal; i.e., we have

(1/2π)∫_{−π}^{π} e^{jωk} Φ_S(ω) dω

𝒮 = { Φ | Φ(ω) = (1 − ε)Φ_S,0(ω) + εU(ω)  and  (1/2π)∫_{−π}^{π} Φ(ω) dω = 1 }   (2.33)

Φ_S,0(ω) = (1 − r²)/(1 − 2r cos(ω) + r²),   −π < ω < π   (2.34)

and where 0 ≤ r < 1. The least favorable spectrum for this class is given by (see [67])
is the kth-lag autocorrelation E{S(t)S(t + k)} of the signal, (2.36) corresponds to the set of all spectra whose first (m + 1) lag autocorrelations take the given values c_0, …, c_m. Such a model is applicable, for example, when we have measurements of a finite number of autocorrelations of the signal process. Note that (2.36) is an example of the generalized moment-constrained model from Example 5 of the preceding subsection. To find a robust predictor for the class 𝒮 of (2.36) we first look for the least favorable signal spectrum by solving the constrained optimization problem

max_{Φ_S ∈ 𝒮} (1/2π)∫_{−π}^{π} log Φ_S(ω) dω   subject to   (1/2π)∫_{−π}^{π} e^{jωk} Φ_S(ω) dω = c_k,  |k| = 0, …, m.   (2.37)

Equation (2.37) will be recognized as the well-known maximum entropy spectrum fitting problem, and its solution is given by (2.38) below. Returning to the ε-contaminated Markov example, the least favorable spectrum is

Φ_S,L(ω) = { (1 − ε)Φ_S,0(ω),  if (1 − ε)Φ_S,0(ω) > c';   c',  if (1 − ε)Φ_S,0(ω) ≤ c' }   (2.35)

where c' is chosen so that

(1/2π)∫_{−π}^{π} Φ_S,L(ω) dω = 1.

This spectrum is illustrated in Fig. 5 for the case r = 0.5 and ε = 0.25. Note that, as ε increases, the peak in the center of the frequency band "melts" into a uniform spectrum in the frequency tails.
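The constant c' in (2.35) is straightforward to compute numerically: the total power of max{(1 − ε)Φ_S,0, c} is continuous and increasing in c, so a bisection suffices. A sketch for the case r = 0.5, ε = 0.25 of Fig. 5:

```python
import numpy as np

def trap(y, x):
    # plain trapezoidal quadrature
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

r, eps = 0.5, 0.25
omega = np.linspace(-np.pi, np.pi, 20001)
phi0 = (1.0 - r**2) / (1.0 - 2.0 * r * np.cos(omega) + r**2)   # nominal (2.34)

def power(c):
    # total power (1/2pi) int max((1-eps)*phi0, c) domega of the clipped spectrum
    return trap(np.maximum((1.0 - eps) * phi0, c), omega) / (2.0 * np.pi)

lo, hi = 0.0, 2.0          # power(0) = 1 - eps < 1 and power(2) > 1
for _ in range(60):        # bisect on the monotone power constraint
    mid = 0.5 * (lo + hi)
    if power(mid) < 1.0:
        lo = mid
    else:
        hi = mid
c_prime = 0.5 * (lo + hi)
phi_L = np.maximum((1.0 - eps) * phi0, c_prime)   # least favorable spectrum (2.35)
```

As ε grows, c' rises and a larger portion of the band is flattened to the constant level, which is exactly the "melting" behavior described above.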
⁶A nonstatistical approach to the minimax prediction problem has been considered by several authors; a recent study by Golubev and Pinsker is found in [65].
⁷This follows since the negative of the spectral entropy of Φ_S can be written as (1/2π)∫_{−π}^{π} C(Φ_S(ω)) dω,

Φ_S,L(ω) = σ² / | Σ_{k=0}^{m} a_k e^{−jωk} |²   (2.38)

with a_0 = 1 and with a_1, …, a_m and σ² satisfying the Yule-Walker equations for the correlations c_0, …, c_m (see, e.g., [69]). The spectrum Φ_S,L of (2.38) is the spectrum of the mth-order autoregression

S(t) = −Σ_{k=1}^{m} a_k S(t − k) + σ ξ(t),   t = 0, ±1, ±2, …   (2.39)
where C is the convex function C(x) = −log x. Comparing with (2.13), one sees the interpretation of the negative of the spectral entropy of Φ_S as a measure of the distance of Φ_S from the uniform spectrum.

where {ξ(t)}_{t=−∞}^{∞} is a sequence of orthogonal, zero-mean, and unit-variance random variables. Thus the optimum one-step predictor for the least favorable spectrum (2.38) is


given by

Ŝ(t + 1) = −Σ_{k=1}^{m} a_k S(t + 1 − k)   (2.40)

and so the robust one-step predictor for (2.36) is a simple finite-length predictor with coefficients determined from the Yule-Walker equations. That this predictor uses only m past samples is an intuitively pleasing result, since we have no knowledge of the correlation structure beyond lags of length m. This result has been generalized to p-step prediction by Franke and Poor in [59].
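A sketch of this predictor design: given the known autocorrelations c_0, …, c_m, solve the Yule-Walker system and predict with the resulting finite-length filter. The check uses Markov correlations c_k = r^k, for which the one-step predictor is known to be Ŝ(t + 1) = r S(t).

```python
import numpy as np

def yule_walker_predictor(c):
    # c = [c_0, ..., c_m]: known autocorrelation lags of the signal.
    # Returns b with S_hat(t+1) = sum_{k=0}^{m-1} b[k] * S(t - k).
    m = len(c) - 1
    R = np.array([[c[abs(i - j)] for j in range(m)] for i in range(m)], dtype=float)
    rhs = np.array(c[1:m + 1], dtype=float)
    return np.linalg.solve(R, rhs)

# check against Markov correlations c_k = r**k, whose one-step
# predictor is known to be S_hat(t+1) = r * S(t)
r = 0.5
b = yule_walker_predictor([r**k for k in range(4)])   # m = 3 known lags
```

The extra predictor taps come out (numerically) zero, reflecting the fact that nothing beyond the first lag is needed for a Markov correlation structure.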

To illustrate the nature of solutions to this problem we suppose, for example, that the channel is known to have a linear phase characteristic and that the channel gain |K(ω)| is known only to lie between known upper and lower limits L_C(ω) and U_C(ω), respectively. That is, suppose we have
L_C(ω) ≤ |K(ω)| ≤ U_C(ω)   (2.44)

for all frequencies ω, but that |K(ω)| is otherwise unknown. Then, assuming U_C(ω) > 0 and Φ_N(ω) > 0, it can be shown [70] that the magnitude of the robust (minimax) equalization filter H_R(ω) is given by

(2.45)

E. Robust Equalization of an Uncertain Channel


In the observation model of (2.1) it is assumed that the signal we are interested in estimating is corrupted only by the additive orthogonal noise process {N(t); −∞ < t < ∞}. However, in many situations of practical interest we also have linear distortion or spreading of the signal by the observation channel. This situation can be described by an observation model of the form
Y(t) = ∫_{−∞}^{∞} k(t − τ) S(τ) dτ + N(t),   −∞ < t < ∞   (2.41)

where the noise and signal processes satisfy the assumptions made after (2.1) and where k(t) is a channel spread function. The problem of estimating S(t) from the observation of (2.41) is the general problem of channel equalization (or deconvolution) plus filtering, which arises in many applications such as communications, sonar, seismology, and image processing. If, as before, we consider signal estimates of the form

Note that the quantity Φ_N(ω)/[L_C²(ω)Φ_S(ω)] is a measure of the maximum possible noise-to-signal ratio at frequency ω, and Δ(ω) is a measure of the uncertainty in the knowledge about the channel. Thus (2.45) implies that if the maximum noise-to-signal ratio at a given frequency is larger than the uncertainty in the channel model, then we use the optimum gain for the lower channel, L_C(ω), at that frequency. Alternatively, if the opposite is true for a given frequency, then we simply ignore the noise at that frequency and use the gain prescribed by the inverse average channel. A similar result can be obtained for situations in which the phase of K(ω) is also unknown.
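For the known-channel case, the optimum equalization filter of (2.43) is straightforward to evaluate on a frequency grid. The sketch below uses a hypothetical first-order channel and spectra (illustrative choices, not from the text) and checks the intuitive limit: as the noise vanishes, the equalizer tends to the channel inverse 1/K.

```python
import numpy as np

def equalizer(K, phi_s, phi_n):
    # minimum-MSE equalization filter (2.43):
    # H = K* Phi_S / (|K|^2 Phi_S + Phi_N)
    return np.conj(K) * phi_s / (np.abs(K)**2 * phi_s + phi_n)

omega = np.linspace(-10.0, 10.0, 2001)
K = 1.0 / (1.0 + 1j * omega)          # hypothetical low-pass channel
phi_s = 4.0 / (1.0 + omega**2)        # hypothetical signal spectrum
phi_n = 0.1 * np.ones_like(omega)     # hypothetical flat noise spectrum

H = equalizer(K, phi_s, phi_n)
H_noiseless = equalizer(K, phi_s, 1e-12 * phi_n)   # noise -> 0
```

With nonzero noise the gain is always strictly below |1/K|, so the equalizer never blindly inverts the channel at frequencies where doing so would amplify noise.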
F. Robust Filtering of Signals in Correlated Noise

Ŝ(t) = ∫_{−∞}^{∞} h(t − τ) Y(τ) dτ   (2.42)

then with known Φ_S, Φ_N, and k, the optimum (minimum-MSE) equalization filter is given straightforwardly by the transfer function

H(ω) = K*(ω)Φ_S(ω) / [ |K(ω)|² Φ_S(ω) + Φ_N(ω) ]   (2.43)

where K is the transfer function of the channel and where K* denotes its complex conjugate. In practice, the transfer function of the channel is rarely known precisely. However, to design the optimum equalization filter of (2.43) one needs exact knowledge of this channel characteristic. Thus, as in the case in which the signal and noise spectra are uncertain, it is necessary to seek an alternative design objective for situations in which the channel characteristic is uncertain. In particular, if we can model the channel as having a transfer function which lies in an uncertainty class 𝒦, then an appropriate design criterion might be a minimax MSE formulation where the maximization is taken over all channels in the class 𝒦. This minimax formulation for equalization of uncertain channels has been proposed and investigated by Moustakides and Kassam in [70].

Another aspect of the assumptions made above on the model (2.1) that is sometimes violated in practice is that of no correlation between the signal and noise processes. Signal-dependent noise arises in many applications such as radar or sonar, for example, due to phenomena such as multipath and clutter. Often the correlation between signal and noise in such applications is not well modeled, so that robust techniques are useful. If we assume that the signal and noise processes of (2.1) are jointly wide-sense stationary, then their total correlation picture can be described by the spectral density matrix D given by

D(ω) = [ Φ_S(ω)   Φ_SN(ω) ;  Φ*_SN(ω)   Φ_N(ω) ]   (2.47)

where Φ_S and Φ_N are as before, Φ_SN is the cross spectrum between {S(t); −∞ < t < ∞} and {N(t); −∞ < t < ∞}, and where, as before, the asterisk denotes complex conjugation. For a signal estimate of the form

we have that the MSE is given straightforwardly in this case by


nothing is known about the cross spectrum, then all we can say is that

0 ≤ |Φ_SN(ω)| ≤ [Φ_S(ω)Φ_N(ω)]^{1/2},   −∞ < ω < ∞   (2.56)

where the right-hand inequality follows from the required nonnegative definiteness of D(ω). Since min{a, b} ≤ (ab)^{1/2} for all a ≥ 0 and b ≥ 0, it follows straightforwardly from (2.54) that in this case we have
H_W(ω) = [Φ_S(ω) + Φ_SN(ω)] / [Φ_S(ω) + 2 Re{Φ_SN(ω)} + Φ_N(ω)]   (2.50)

Φ_SN,L(ω) = −min{Φ_S(ω), Φ_N(ω)},   −∞ < ω < ∞.   (2.57)

and the corresponding minimum value of MSE is given by

e(Φ_S, Φ_N, Φ_SN; H_W) ≜ e_W(Φ_S, Φ_N, Φ_SN).   (2.51)

Equation (2.57), together with the optimum-filter expression of (2.50), gives that the robust filter for completely unknown cross correlation is given by the somewhat surprising result

H_R(ω) = { 1,  if Φ_S(ω) > Φ_N(ω);   0,  otherwise }   (2.58)

Other results for different bounded uncertainty classes are given in [71].
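The on-off character of this robust filter can be verified directly from (2.50) and (2.57): with Φ_SN = −min{Φ_S, Φ_N}, the numerator and denominator of (2.50) coincide wherever the signal dominates and the numerator vanishes elsewhere. A numerical sketch with hypothetical spectra:

```python
import numpy as np

omega = np.linspace(-10.0, 10.0, 2001)
phi_s = 4.0 / (1.0 + omega**2)     # hypothetical signal spectrum
phi_n = np.ones_like(omega)        # hypothetical noise spectrum

phi_sn_L = -np.minimum(phi_s, phi_n)      # least favorable cross spectrum (2.57)
num = phi_s + phi_sn_L                    # numerator of (2.50) at the saddle point
den = phi_s + 2.0 * phi_sn_L + phi_n      # denominator, equal to |phi_s - phi_n|

h_r = (phi_s > phi_n).astype(float)       # the resulting on-off robust filter
mask = den > 1e-6                         # avoid the crossover frequencies
```

Away from the frequencies where Φ_S = Φ_N, the ratio num/den takes only the values 0 and 1, matching the indicator-form filter.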

Of course, with Φ_SN(ω) identically zero the expressions (2.49)-(2.51) reduce to (2.3)-(2.5). If the correlation matrix D of (2.47) is not known precisely but, rather, is known only to lie in some class 𝒟 of spectral density matrices, then to seek a robust alternative to the optimum filter of (2.50) we can replace the objective function e(Φ_S, Φ_N, Φ_SN; H) with its supremum over 𝒟. As in the uncorrelated signal and noise case, this yields a minimax game for designing a robust filter H_R. The solution to this problem has been considered by Moustakides and Kassam [71], [72]. Within mild conditions it can be shown that, for convex classes 𝒟, a spectral density matrix D_L ∈ 𝒟 and its corresponding optimum filter from (2.50) (when it is uniquely defined) will be a saddle-point solution to this game if and only if D_L is least favorable; i.e., if and only if
D_L = arg max_{D ∈ 𝒟} e_W(Φ_S, Φ_N, Φ_SN)   (2.52)

G. Uncertainty Classes of Spectral Measures


In the above discussion, we have considered several aspects of the basic problem of robust linear filtering of stationary signals in additive stationary noise. In particular, we have discussed the basic robust filtering problem in a noncausal framework, and we have also discussed the treatment of filter causality, equalization, and cross correlation between signal and noise. One issue which has not been discussed is the treatment of stationary processes that do not necessarily have spectral densities but, rather, have associated spectral measures (or, equivalently, spectral distributions). This situation arises in practice primarily when there are pure harmonics in the signal and/or noise. For example, in a communications receiver, one might have processes that nominally have spectral densities but also contain pure-harmonic uncertainties caused by sources such as line hum or tone jamming. To treat the robustness problem in this more general context requires a measure-theoretic formulation of the filtering problem. This issue has been considered by Poor [48] for the case of noncausal filtering and by Vastola and Poor [55] and Franke and Poor [59] for the case of causal filtering. The results obtained for this situation are quite similar to those for the case in which all processes concerned have spectral densities, with the additional advantage that quite general results concerning the existence of least favorable spectral measures can be obtained. In particular, suppose we have the observation model of (2.1) in which the signal and noise processes are real, zero-mean, orthogonal, wide-sense stationary, and quadratic-mean continuous. Then for an estimate of the form of (2.2) we can always write the MSE as
e(m_S, m_N; H) = E(|Ŝ(t) − S(t)|²) = (1/2π)∫_{−∞}^{∞} |1 − H(ω)|² m_S(dω) + (1/2π)∫_{−∞}^{∞} |H(ω)|² m_N(dω)   (2.59)


To illustrate the possible structure of the least favorable spectral density matrix, it is interesting to consider the case in which the signal and noise spectra are known but the cross spectrum is not known precisely. In particular, suppose we can establish the fact that the cross spectrum satisfies the conditions
0 ≤ L(ω) ≤ |Φ_SN(ω)| ≤ U(ω),   −∞ < ω < ∞   (2.53)

where L and U are given functions. (Such a model might arise, for example, if a confidence band for the cross spectrum could be determined via spectrum estimation.) For this model it can be shown (see Moustakides and Kassam [72]) that the least favorable cross spectrum is given by

Φ_SN,L(ω) = { −L(ω),  if B(ω) ≤ L(ω);   −B(ω),  if L(ω) ≤ B(ω) ≤ U(ω);   −U(ω),  if U(ω) ≤ B(ω) }   (2.54)

where B is defined by

B(ω) ≜ min{Φ_S(ω), Φ_N(ω)}.   (2.55)

Thus at a given frequency, whether the worst case involves minimum cross-spectral density, maximum cross-spectral density, or something in between depends on the relationship among L, B, and U at that frequency. If, for example,

where m_S and m_N are the spectral measures associated (via Bochner's theorem [56]) with the processes {S(t); −∞ < t < ∞} and {N(t); −∞ < t < ∞}, respectively. The transfer function that minimizes (2.59) for fixed m_S and m_N is given by
H_W(ω) = [dm_S/d(m_S + m_N)](ω)   (2.60)

The mean-square interpolation error of the estimate (2.64) is given straightforwardly by

E(|Ŝ(0) − S(0)|²) = (1/2π)∫_{−π}^{π} |1 − H(ω)|² Φ_S(ω) dω ≜ e(Φ_S; H)   (2.65)

where the differentiation in (2.60) denotes the Radon-Nikodym derivative. Alternatively, if m_S and m_N are known only to lie in classes ℳ_S and ℳ_N, respectively, then we would like to solve the minimax problem
where H is the transfer function of the filter sequence …, h(−2), h(−1), 0, h(+1), h(+2), …, and where Φ_S is the spectrum of the signal. For fixed Φ_S, the MSE of (2.65) is minimized by the interpolator with transfer function
H_I(ω) = 1 − [1/Φ_S(ω)] / [(1/2π)∫_{−π}^{π} (1/Φ_S(ν)) dν]   (2.66)
min_H { sup_{(m_S, m_N) ∈ ℳ_S × ℳ_N} e(m_S, m_N; H) }.   (2.61)

(Note that the zeroth Fourier coefficient of H_I equals zero, as is necessary for H_I to be a valid interpolator.) The minimum possible value of the mean-square interpolation error for a given signal spectrum Φ_S is given by
e(Φ_S; H_I) = [ (1/2π)∫_{−π}^{π} (1/Φ_S(ω)) dω ]^{−1} ≜ e_I(Φ_S).   (2.67)

An interesting class of such problems arises when the total powers in the signal and noise processes are both known (i.e., when m_S((−∞, ∞)) ≜ 2πP_S and m_N((−∞, ∞)) ≜ 2πP_N are constant on ℳ_S and ℳ_N). In this case, solutions to (2.61) can be characterized for a general type of uncertainty class studied by Huber and Strassen in [73]. These classes are of the form

ℳ_v = { m ∈ ℳ | m(B) ≤ v(B) for all B ∈ ℬ, and m(R) = v(R) }   (2.62)

where R denotes the set of real numbers, ℬ denotes the σ-field of Borel sets in R, ℳ denotes the set of all spectral measures on (R, ℬ), and v is a 2-alternating (Choquet) capacity. A 2-alternating capacity in this context is a set function mapping ℬ to R with the following properties: v(∅) = 0, v(R) < ∞, A ⊂ B implies v(A) ≤ v(B), v is continuous from below and is continuous from above on closed sets, and v(A ∪ B) + v(A ∩ B) ≤ v(A) + v(B) for all A and B in ℬ. Examples of classes of the type of (2.62) are given in [73]-[76] and include such common uncertainty models as the ε-contaminated class, the variational neighborhood, the band model, and others. Classes of the form of (2.62) are useful for the problem of (2.61) because the solution to (2.61) for the situation in which ℳ_S = ℳ_{v_S} and ℳ_N = ℳ_{v_N} for two 2-alternating capacities v_S and v_N is given by (see Poor [48])

H_R(ω) = π(ω)/(1 + π(ω))   (2.63)

Suppose we design an interpolator based on an assumed signal spectrum Φ_S,0, but that the true spectrum is Φ_S. The resulting mean-square interpolation error is given by inserting the optimum interpolator (2.66) for Φ_S,0 into (2.65), in which case we have the mismatch error

e(Φ_S; H_I,0) = [ (1/2π)∫_{−π}^{π} Φ_S(ω)/Φ²_S,0(ω) dω ] / [ (1/2π)∫_{−π}^{π} (1/Φ_S,0(ω)) dω ]²   (2.68)

where π = dv_S/dv_N is a Radon-Nikodym derivative between v_S and v_N. (Note that H_W from (2.60) could be written as H_W(ω) = Λ(ω)/(Λ(ω) + 1) where Λ = dm_S/dm_N.) Thus for uncertainty classes generated by 2-alternating capacities the general solution to (2.61) is characterized. Similar results for causal problems are discussed in [55].

Note that the numerator integrand Φ_S/(Φ_S,0)² implies a high degree of sensitivity of the interpolation error to large spectral components in the actual spectrum that are not predicted by the nominal model. Thus, as with the other linear-estimation problems discussed in the preceding sections, it is desirable to robustify an interpolator against uncertainty in Φ_S. This can be done by embedding Φ_S in a spectral uncertainty class 𝒮 and replacing the minimization of e(Φ_S; H) with the minimization of its worst case value over 𝒮. This problem was posed and solved by Taniguchi in [77] for the case in which 𝒮 is an ε-contaminated model, and the solution to the general case was characterized by Kassam in [78]. In [78] it is shown that, within mild conditions on 𝒮, a spectrum Φ_S,L ∈ 𝒮 and its corresponding optimum interpolator from (2.66) will be minimax robust over 𝒮 if and only if Φ_S,L is least favorable; i.e., if and only if

H. Robust Interpolation

Φ_S,L = arg max_{Φ_S ∈ 𝒮} e_I(Φ_S)   (2.69)

A problem related to robust filtering is that of robust signal interpolation. In the signal interpolation problem we have a discrete-time signal S(k), k = 0, ±1, ±2, …, which we observe for all k ≠ 0. Our objective is to estimate S(0) from {S(k); k = ±1, ±2, …} with a linear estimate

where e_I is from (2.67). Moreover, note that Φ_S,L equivalently minimizes


(1/2π)∫_{−π}^{π} C(Φ_S(ω)) dω

Ŝ(0) = Σ_{k≠0} h(k) S(k).   (2.64)

The mean-square estimation error incurred by using (2.64) is given by (2.65).

where C is the convex function C(x) = 1/x. Thus, as with the filtering and prediction problems discussed above, least favorables for many normalized uncertainty classes can be found by applying analogous results from Huber's robust hypothesis-testing formulation (see Kassam [78] for details).
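The interpolation quantities are easy to compute numerically: e_I of (2.67) is the harmonic mean of the spectrum, and the interpolator (2.66) built from it automatically has a vanishing zeroth Fourier coefficient. For the unit-power Markov spectrum of (2.34), e_I = (1 − r²)/(1 + r²) in closed form, which the quadrature below reproduces:

```python
import numpy as np

def trap(y, x):
    # plain trapezoidal quadrature
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

r = 0.5
omega = np.linspace(-np.pi, np.pi, 20001)
phi = (1.0 - r**2) / (1.0 - 2.0 * r * np.cos(omega) + r**2)   # unit-power Markov spectrum

inv_mean = trap(1.0 / phi, omega) / (2.0 * np.pi)   # (1/2pi) int (1/phi) domega
e_I = 1.0 / inv_mean                                # minimum interpolation MSE (2.67)
H_I = 1.0 - e_I / phi                               # optimum interpolator (2.66)
h0 = trap(H_I, omega) / (2.0 * np.pi)               # zeroth Fourier coefficient of H_I
```

For r = 0.5 this gives e_I = 0.6, smaller than the one-step prediction error 1 − r² = 0.75, as expected since interpolation uses both past and future samples.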


I. Robust Linear Filtering for Vector Observations

Many problems of practical interest involve the vector version of the observation model (2.1). Although conceptually similar to the scalar problem, vector filtering problems often present practical difficulties that are usually overcome by the imposition of specific structural models such as a finite-state signal model. These difficulties are not alleviated in the robust versions of vector filtering problems, and so the types of uncertainty models for which minimax robust filters can be developed are usually more structured than in the single-variable case. Developments for the vector problem analogous to those for the scalar problem discussed in the preceding subsections are found in [79]-[83]. As an example of the type of uncertainty class that can be treated in this context, Chen and Kassam [81] consider an uncertainty set of spectral density matrices which share a common (constant) eigenstructure but whose eigenspectra lie in band models. Another set of structural assumptions that allow for the treatment not only of the vector case but also of time-varying situations is the usual state-space signal model

III. ROBUST FILTERS FOR SIGNAL DETECTION AND RELATED APPLICATIONS

dX(t)/dt = A(t)X(t) + v(t),   t > t_0
S(t) = C(t)X(t)   (2.70)

where E{v(t)v'(s)} = Q(t)δ(t − s), {v(t)} is independent of {N(t)}, and A(t) and C(t) are matrices of appropriate dimensions. We assume that the observation noise {N(t)} has correlation E{N(t)N'(s)} = R(t)δ(t − s). The best linear estimator of X(t) from {Y(τ); t_0 ≤ τ ≤ t} is given by the well-known Kalman-Bucy filter

dX̂(t)/dt = A(t)X̂(t) + K(t)[Y(t) − C(t)X̂(t)]   (2.71)

where K(t) is the Kalman gain matrix determined from the error covariance P(t) = cov(X(t) − X̂(t)). Although uncertainty can arise within the model (2.70) in several ways, there are two basic types of uncertainty that can affect the performance of linear estimators. One of these is uncertainty in the second-order statistical properties of the noise (and initial condition), such as the matrices Q and R and the assumptions that {v(t)} and {N(t)} are white and uncorrelated. The other type of uncertainty is uncertainty in the dynamical model itself; i.e., uncertainty in the A and C matrices. Problems of the first type have been studied by a number of investigators including D'Appolito and Hutchinson [79], VandeLinde [84], Morris [85], Poor and Looze [86], and Verdu and Poor [87]. The basic result for minimax design within this type of uncertainty is that the linear structure is preserved in the minimax solution and the corresponding gain matrix is chosen to be optimum for a least favorable set of second-order statistics. These results are thus of the same type as those discussed in the preceding subsections. Problems of the second type, however, have received less attention in this context, and have been treated only recently in a paper by Martin and Mintz [88]. As one might expect, the effects of uncertainties in the dynamical structure of the model (the A matrix) are quite different from those of uncertainty in the noise statistics. In particular, the minimax solutions for an uncertain A matrix are not pure strategies (i.e., they are not simply of the form (2.71)) but rather are mixed strategies, that is, randomizations among several filters of the type (2.71). Another approach to this problem has been proposed by Cusak [89].
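As a simplified discrete-time sketch of the first (least favorable second-order statistics) idea: for a scalar state-space model with observation-noise variance known only to lie in an interval, a minimax-flavored design computes the steady-state gain for the largest variance in the interval. Treating the upper endpoint as least favorable is the typical outcome for bounded-variance classes of this kind; it is used here for illustration rather than as a general theorem, and all numbers are hypothetical.

```python
import numpy as np

def steady_state_gain(a, q, R, iters=1000):
    # iterate the scalar discrete-time Riccati recursion to steady state for
    # x_{k+1} = a*x_k + w_k (Var w = q), y_k = x_k + v_k (Var v = R)
    P = q
    K = 0.0
    for _ in range(iters):
        P_pred = a * a * P + q            # prediction covariance
        K = P_pred / (P_pred + R)         # Kalman gain
        P = (1.0 - K) * P_pred            # filtered covariance
    return K

a, q = 0.9, 1.0
R_lo, R_hi = 0.5, 2.0                     # hypothetical bounds on Var(v)
K_minimax = steady_state_gain(a, q, R_hi) # gain for the least favorable (largest) R
K_nominal = steady_state_gain(a, q, R_lo)
```

The minimax design uses a smaller gain, i.e., it trusts the measurements less, which is the conservative behavior one expects when more observation noise than nominal is possible.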

One of the most pervasive functions that signal processing schemes are required to carry out is that of detecting a signal of a generally known type in noisy observations. Obvious examples of applications in which signal detection is required are provided by radar (detection of echo pulses) and sonar (detection of a random signal present in an array of hydrophones). Numerous other applications may be listed; for example, detection of specified two-level pulse-code sequences in communication systems and detection of abnormal patterns in medical imaging. In the classical theory of signal detection one starts with specific statistical or deterministic models for the signal and observation process, and proceeds to obtain a detector that has optimum performance under an appropriate detection performance measure such as detection probability or output SNR. In this section we will restrict attention to the class of linear detectors or filters and to the output signal-to-noise ratio (SNR) as the measure of a linear filter's detection performance. The design of optimum detectors under these two restrictions has been carried out for a large number of special applications. There are several reasons for the widespread acceptance of these design restrictions. One is that optimization of the linear filter design to maximize output SNR is usually a simple mathematical problem and leads to explicit solutions, and the implementation of a linear filter is usually straightforward. Another reason is that in many cases the statistical models for the noise and any random signal are stated in terms of only their correlation functions or power spectral densities (that is, their second-order statistical properties). In such cases, one simply does not have enough statistical information, such as the parametric form of the probability-density functions, to allow optimization to be attempted over a larger class of detectors.
If the noise process can be assumed to be Gaussian, then it is possible to show that the optimum detector maximizing the detection probability in the detection of a deterministic signal is indeed the optimum linear filter maximizing the output SNR; this of course is the ubiquitous matched filter of detection theory. In general, the matched-filter specification depends on the exact form of the deterministic signal for which it maximizes its output SNR, and on the exact noise autocorrelation function or power spectral density. Since these quantities are rarely known exactly, the need for applying robust techniques may arise naturally in the matched filtering problem. One interesting extreme case occurs when the nominal assumptions on signal and noise characteristics are such as to result in a singular detection problem. This means that under nominal conditions the output SNR is infinitely large, implying that perfect detection is possible. Consider, for example, an ideal low-pass nominal signal sinc(ω₀t) whose Fourier transform S(ω) is shown in Fig. 6, and let the noise have a nominal power spectral density Φ_N(ω) which is the triangular function of Fig. 7. The matched-filter frequency response S*(ω)/Φ_N(ω) is shown in Fig. 8; it increases without bound as ω approaches ±ω₀. The output SNR in this situation is therefore infinite.

KASSAM AND POOR: ROBUST TECHNIQUES FOR SIGNAL PROCESSING    445

Fig. 6. Ideal low-pass nominal signal characteristic S(ω).
The results and interpretations we survey in this section differ from those in the previous section on robust filters for signal estimation primarily because our criterion of performance here is the output SNR. This leads to mathematical approaches that are conveniently unified using Hilbert spaces. In addition, unlike the Wiener filtering problem, no direct correspondence exists between these results on minimax robust matched filters and results on robust hypothesis tests, although some indirect connections do exist. In Subsection III-B we will give some interesting special applications of the general results on robust matched filters in spatial array processing and time-delay estimation. We will close this section with a more mathematical discussion in Subsection III-C of the general Hilbert space approach for formulating robust linear detection problems; this approach provides a common framework for a variety of such problems.
A. Robust Matched Filters

Fig. 7. Triangular noise power spectral density Φ_N(ω).

Fig. 8. Matched filter frequency response for ideal low-pass signal and triangular noise power spectral density.

As an example of a system using a matched filter for signal detection, consider the structure of the receiver for a pulse train in which a given pulse shape p(t) occurs in the i-th position of the train with some amplitude a_i, i = 1, 2, ..., m. In a pulsed radar system these might represent the echo pulses from a target in some fixed range gate, with the amplitude factors a_i produced by the beam pattern of the receiving antenna as its main beam scans past the target position. The noisy received waveform may be described by the equation

$$Y(t) = \theta \sum_{i=1}^{m} a_i\, p_i(t) + N(t), \qquad 0 \le t \le mT \qquad (3.2)$$

But suppose that the signal deviates very slightly from the nominal and becomes sinc([ω₀ − ε]t), where ε is small and positive. The output SNR using the original matched filter now drops to zero, or −∞ decibels! In this situation, where the signal bandwidth may not be precisely ω₀, it would be better to design the matched filter for the smallest bandwidth that may be encountered. The resulting filter will then perform well for the minimum-bandwidth signal and will be fairly insensitive to deviations of the signal bandwidth from ω₀. This basic idea is expanded upon in Subsections III-A and III-C, where we consider minimax robust matched filters which maximize the worst case performance over a given pair of classes for the allowable signal and noise characteristics. A property of such robust matched filters in most cases is that their performance is relatively stable, or not too variable, over the allowable classes of characteristics. The issues of stability and singularity in certain matched filtering problems have been considered by several authors, most notably Root [2] and Kailath [90]. The minimax robust matched filter has also been described as an optimally stable matched filter. While the simple example we have considered above to illustrate the possibility of extreme sensitivity of performance of a nominal matched filter assumed that only the value of one parameter (the signal bandwidth) was variable, in general we deal with nonparametric classes of functions in modeling the uncertainties in signal and noise characteristics.
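The bandwidth-mismatch effect in this example can be checked numerically on a discretized frequency grid. This is a sketch under stated assumptions: the grid, the small deviation ε, and the unnormalized SNR functional are illustrative.

```python
import numpy as np

w0, eps = 1.0, 0.05
w = np.linspace(-w0, w0, 20001)[1:-1]      # open interval, avoids the zeros of Phi_N
dw = w[1] - w[0]
phi_n = 1.0 - np.abs(w) / w0               # triangular noise spectrum of Fig. 7

def snr(H, S):
    """Discretized output SNR: |integral(H*S)|^2 / integral(|H|^2 * Phi_N)."""
    num = (np.sum(H * S) * dw) ** 2
    den = np.sum(H**2 * phi_n) * dw
    return num / den

S_nom = np.ones_like(w)                         # flat on (-w0, w0), as in Fig. 6
S_true = (np.abs(w) <= w0 - eps).astype(float)  # slightly narrower actual signal

H_nom = S_nom / phi_n     # matched to the nominal (singular) model
H_rob = S_true / phi_n    # matched to the smallest allowed bandwidth

print(snr(H_nom, S_nom) > snr(H_nom, S_true))   # mismatch costs SNR
print(snr(H_rob, S_true) > snr(H_nom, S_true))  # min-bandwidth design does better
```

As the grid is refined toward the band edges the denominator of the mismatched design grows without bound, which is the discrete shadow of the SNR collapsing to zero in the continuous problem.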

where p_i(t) = p(t − (i − 1)T), the basic pulse p(t) delayed by (i − 1)T units of time, T is the pulse repetition period, θ is the overall pulse-train amplitude, and N(t) is random noise. Note that the above observation model can also describe the received signal in a binary signaling scheme. For example, we may take m = 1 and θ = θ₀, θ = θ₁ as being the two possible values of θ, with θ₀ = 0 and θ₁ ≠ 0 corresponding to the case of on-off keying; or, with m > 1, (3.2) may describe a coded waveform such as those used in direct-sequence spread-spectrum modulation, with p(t) being the "chip" waveform and a₁, a₂, ..., a_m being a pseudonoise code sequence. For the general problem of testing θ = θ₀ versus θ = θ₁ in the observation model (3.2) when N(t) is Gaussian with power spectral density Φ_N(ω), an optimum receiver structure is shown in Fig. 9. This receiver is an implementation of the likelihood-ratio test for θ = θ₀ versus θ = θ₁ in the case of Gaussian noise. More generally, the output of the matched filter at the end of each pulse interval has the maximum SNR obtainable from a linear filter. The frequency response H_M(ω) of the matched filter is

$$H_M(\omega) = \frac{P^*(\omega)\, e^{-j\omega T}}{\Phi_N(\omega)} \qquad (3.3)$$

Fig. 9. Structure of optimum receiver for pulse train in Gaussian noise.

446    PROCEEDINGS OF THE IEEE, VOL. 73, NO. 3, MARCH 1985

where P(ω) is the Fourier transform of the pulse p(t). We shall now focus on the design of the matched filter when P(ω) and Φ_N(ω) are not precisely known. In the next section we will survey techniques for modifying the correlator detector following the matched filter in Fig. 9 for situations where the noise at its input cannot be assumed to be Gaussian. The general expression for the SNR at time T at the output of a filter with frequency response H(ω) when the input is θp(t) + N(t) is

$$\mathrm{SNR} = \frac{\theta^2 \left| \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega) P(\omega) e^{j\omega T}\, d\omega \right|^2}{\frac{1}{2\pi} \int_{-\infty}^{\infty} |H(\omega)|^2\, \Phi_N(\omega)\, d\omega}. \qquad (3.4)$$

This SNR is maximized by H(ω) = H_M(ω) of (3.3). The synthesis of H_M(ω) in this case requires an exact knowledge of P(ω) and Φ_N(ω). Suppose, however, that the received pulse p(t) is a possibly distorted version of a nominal pulse p₀(t), as illustrated in Fig. 10.

The higher gain of the nominally optimum filter at low-noise frequencies makes it too sensitive to deviations in the pulse characteristics, the actual pulse possibly having lower energy at these frequencies. Thus the filter (3.7) is robust from an intuitive viewpoint. Another interpretation of (3.7) is that signal uncertainty is translated into an additional white-noise component in the noise spectrum. It is interesting to note that in the case of white noise with the above signal uncertainty the filter matched to the nominal pulse is itself the robust filter, since any absolute gain factor is irrelevant in matched filtering. This result agrees well with behavior observed in practice [93]. It is also interesting to note that the additional white-noise component which appears under signal uncertainty in this approach ensures that the detection is not singular, regardless of how artificial the nominal model is.

Alternatively, suppose we assume that the pulse shape p₀(t) and the total noise power σ_N² are known, but that the true noise spectrum is known only to lie in a class 𝒩 of spectra bounded by known upper and lower bounds U_N(ω) and L_N(ω), respectively. That is, suppose we assume that the spectral shape of the noise is constrained only by the band model

$$L_N(\omega) \le \Phi_N(\omega) \le U_N(\omega) \qquad (3.8a)$$

and

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} \Phi_N(\omega)\, d\omega = \sigma_N^2. \qquad (3.8b)$$

This class, which is the same as that considered as Example 4 in Subsection II-C, is illustrated in Fig. 11.

Fig. 10. Non-nominal pulse shapes p(t).

A good measure of the degree of pulse distortion is the integrated squared difference between p(t) and p₀(t) or, equivalently (via Parseval's relation), the integrated squared difference between their Fourier transforms, P(ω) and P₀(ω). Thus a useful signal uncertainty model for this matched filtering problem is to assume that the received pulse p(t) is known only to lie in the class 𝒞_s, which is the class of pulses p(t) with Fourier transforms P(ω) satisfying

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} \left| P(\omega) - P_0(\omega) \right|^2 d\omega \le \Delta. \qquad (3.5)$$

Fig. 11. Bounded noise power spectral density class of allowable noise spectra.

Here P₀(ω) is the Fourier transform of the nominal pulse, and Δ determines the degree of uncertainty or possible distortion in p(t). The solution H = H_R to the problem

$$\max_{H}\ \min_{P \in \mathcal{C}_s}\ \mathrm{SNR} \qquad (3.6)$$

where SNR is from (3.4), is given by Kuznetsov [91] and by Kassam, Lim, and Cimini [92] as

$$H_R(\omega) = \frac{P_0^*(\omega)\, e^{-j\omega T}}{\Phi_N(\omega) + \sigma_0} \qquad (3.7)$$

where the positive constant σ₀ depends on Δ and is an increasing function of Δ. Equations (3.3) and (3.7) imply that the robust filter for (3.5) is forced to have less gain than it would otherwise have at frequencies where the noise power is low. It is the higher gain of the optimum filter for P₀(ω) [that is, P₀*(ω)e^{−jωT}/Φ_N(ω)] at these frequencies which makes the nominal design overly sensitive to deviations in the pulse characteristics.

For the noise band model (3.8) the robust filter can again be given an interesting interpretation. It turns out that the robust filter H_R(ω) in this case is the matched filter for detecting p₀(t) in noise whose spectrum tends to be as close in its shape as possible to the shape of |P₀(ω)|. Specifically, the results in [91], [92] show that the robust filter solving

$$\max_{H}\ \min_{\Phi_N \in \mathcal{N}}\ \mathrm{SNR} \qquad (3.9)$$

is given by

$$H_R(\omega) = \frac{P_0^*(\omega)\, e^{-j\omega T}}{\Phi_{N,L}(\omega)}. \qquad (3.10)$$


Here Φ_{N,L}(ω) is the least favorable noise power spectral density

$$\Phi_{N,L}(\omega) = \max\left\{ L_N(\omega),\ \min\left[ \frac{|P_0(\omega)|}{k},\ U_N(\omega) \right] \right\} \qquad (3.11)$$

where k is a constant determined by the requirement that Φ_{N,L}(ω) must satisfy (3.8) and therefore must have total power σ_N², which is assumed to be known. It is clear that Φ_{N,L}(ω) tries to follow as closely as possible a scaled version of |P₀(ω)|. This phenomenon is illustrated in Fig. 12. An illustration of the magnitude of H_R(ω) for the example of Fig. 12 is shown in Fig. 13. This figure also shows what the optimum filter magnitude would be if a nominal

Fig. 12. Illustration of least favorable noise spectral density of (3.11).

Fig. 13. Illustration of |H_R(ω)|, robust matched filter amplitude response, for a bounded noise power spectral density class.

noise spectrum having the characteristic (c/2)[L_N(ω) + U_N(ω)] (that is, the normalized center of the band) is used in the design of the filter, where c is a normalizing constant chosen to get the correct power σ_N². The implication for the robust filter is that its magnitude characteristic is flatter than it would be with any other Φ_N(ω) in 𝒩. This means that extremes of gain and attenuation do not appear, making the filter less sensitive to the presence of additional noise power at relatively high-gain frequencies and to a reduction in noise power at relatively high-attenuation frequencies. The performance of the robust filter may thus be said to have been stabilized. Of course, the advantage gained in using a robust filter in any particular case depends on how extreme the frequency response of the nominally optimum filter is. By combining and extending the above results it is possible to obtain the robust filter for simultaneous uncertainties about the pulse shape and the noise spectrum. This has been done in [92]. The extension of this spectral-domain formulation of the matched-filtering problem to the case of discrete-time observation processes is quite straightforward. It is interesting to note that an explicit consideration of a game-theoretic approach in the design of a matched filter was first considered by Nilsson [12] and Zetterberg [13].
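The least favorable spectrum behind this flattening, which follows a scaled |P₀(ω)| wherever the band allows and sits on the bounds elsewhere while meeting the total-power constraint, can be computed by a one-parameter bisection. The pulse spectrum, the bounds, and the power target below are illustrative assumptions:

```python
import numpy as np

w = np.linspace(-3, 3, 6001)
dw = w[1] - w[0]
P0_mag = np.exp(-0.5 * w**2)          # |P0(w)|, illustrative nominal pulse spectrum
L = 0.10 * np.ones_like(w)            # lower band L_N(w)
U = 0.60 + 0.3 * np.exp(-w**2)        # upper band U_N(w)
target = np.sum(0.5 * (L + U)) * dw   # required total power (attainable by design)

def clipped(k):
    """Candidate least favorable spectrum: |P0|/k pushed inside the band."""
    return np.clip(P0_mag / k, L, U)

# The total power of clipped(k) decreases continuously in k, from integral(U)
# down to integral(L), so the power constraint can be met by bisection on k.
lo, hi = 1e-6, 1e6
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if np.sum(clipped(mid)) * dw > target:
        lo = mid
    else:
        hi = mid
phi_lf = clipped(0.5 * (lo + hi))

print(abs(np.sum(phi_lf) * dw - target) < 1e-6)               # power constraint met
print(np.all(phi_lf >= L - 1e-12) and np.all(phi_lf <= U + 1e-12))  # inside the band
```

The monotonicity of the clipped power in k is what makes a simple bisection sufficient; no multidimensional optimization is needed.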

In [13], in particular, Zetterberg considered uncertainty in the noise spectrum, which is assumed to have a known white-noise component and another component which has fixed total power but is otherwise unknown. He obtained a result which is similar to that of (3.11). Although Zetterberg did not allow uncertainty in the signal, it is noteworthy that he assumed a fixed white component for the noise spectrum. We have seen above that a white-noise component for the noise spectrum is prescribed for the L₂ signal uncertainty class of (3.5). Other related studies are also discussed in [13]. More recent work of this nature has also been reported by Rodionov [94], Cahn [95], and Turin [96]. The multi-input matched filtering problem under signal and noise uncertainty in the frequency domain has recently been considered by Chen and Kassam [97], [98]. An m-component signal vector s(t) may be modeled to belong to a generalized L₂ uncertainty class that is a neighborhood of some nominal signal vector s₀(t). In [97], [98] the class was defined in terms of a set of constants Δ₁, Δ₂, ..., Δ_q; these were now the bounds on the allowable values of q sums of component-wise integrated square differences between the vectors s(t) and s₀(t). The noise power spectral density matrix Φ_N(ω) was assumed to lie in a generalized bounded spectrum class. Specifically, from the decomposition Φ_N(ω) = P_N(ω)Λ_N(ω)P_N*(ω), where P_N(ω) is a unitary matrix and Λ_N(ω) is diagonal, one may obtain an uncertainty class by requiring the components of Λ_N(ω) to have known upper and lower bounds. In addition, a generalized noise power constraint was imposed which required that the sums of integrals of the components of Λ_N(ω) for r sets in some given partition of these components be equal to known values Q₁, Q₂, ..., Q_r. The results given in [98] for this multi-input matched filtering problem form a useful extension of the scalar results we have mentioned above.
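The white-noise prescription under L₂ signal uncertainty can be checked numerically: padding the noise spectrum with a constant improves the worst case over the signal ball. The spectra, the Δ value, and the grid search below are illustrative; the worst-case signal for a fixed filter follows from Cauchy-Schwarz.

```python
import numpy as np

w = np.linspace(-4, 4, 4001)
dw = w[1] - w[0]
P0 = np.exp(-w**2)                       # nominal pulse spectrum (illustrative)
phi_n = 0.2 + 0.8 * w**2 / (1 + w**2)    # noise spectrum, small at low frequencies
Delta = 0.2                              # size of the L2 uncertainty ball

def worst_case_snr(H):
    """min over {P : ||P - P0||^2 <= Delta} of <H,P>^2 / <H^2, phi_n>.
    By Cauchy-Schwarz the minimizing P is P0 - sqrt(Delta) * H / ||H||."""
    normH = np.sqrt(np.sum(H**2) * dw)
    num = max(np.sum(H * P0) * dw - np.sqrt(Delta) * normH, 0.0) ** 2
    den = np.sum(H**2 * phi_n) * dw
    return num / den

H_nom = P0 / phi_n                                 # nominal matched filter
sigmas = np.linspace(0.0, 2.0, 201)                # candidate white-noise paddings
wc = [worst_case_snr(P0 / (phi_n + s)) for s in sigmas]
s_star = sigmas[int(np.argmax(wc))]                # best padding level found

print(s_star > 0.0)                          # some padding strictly helps
print(max(wc) >= worst_case_snr(H_nom))      # padded filter beats the nominal design
```

The best padding level grows with Δ, matching the statement that σ₀ in (3.7) is an increasing function of the uncertainty size.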
The frequency-domain formulation of the matched filtering problem is useful for large (infinite) observation intervals. A direct time-domain approach in modeling the signal and noise uncertainties is desirable when the observation interval is some finite or semi-infinite interval [t₀, t₁] with one or both end-points finite. In this case, the matched filter weighting function (impulse response) is the solution to an integral equation. In the finite-interval discrete-time case the SNR functional and the equation for the optimum filter weighting function are given in terms of matrix and vector quantities. Thus for this situation the robust matched filtering problem is amenable to simpler mathematical analysis. By viewing the finite-length discrete-time matched filtering problem as a multi-input (time-domain) problem it is seen that the ideas for the models for signal and noise quantities in the multi-input frequency-domain case are directly applicable. This approach has been followed in [97], [98], where the signal vector s = (s₁, s₂, ..., s_n) of samples and the noise covariance matrix R_N are modeled as belonging to generalized ℓ₂ and bounded-eigenvalue uncertainty classes, respectively. A slightly different situation is considered by Kuznetsov in [99]. In [100] a more general approach has been taken by Verdu and Poor, and three signal vector uncertainty classes are considered (ℓ₁, ℓ₂, and ℓ_∞ classes defined as neighborhoods around a nominal signal vector). A general class of noise covariance matrices, defined in terms of matrix norms, has also been studied in [100]. The general approach in [100] is based on a Hilbert space formulation of the matched filtering problem, which


we shall discuss in Subsection III-C. There we will also see how the time-domain continuous-time version of the problem may be approached. In fact, the finite-time-interval continuous-time robust matched filtering problem was originally considered by Kuznetsov [101] even before his frequency-domain problem formulated in [91]. In [101] Kuznetsov used a signal uncertainty class in which the possible deviation δ(t) of a signal waveform s(t) from a nominal signal s₀(t) is bounded in its integrated squared error over the interval. Further aspects of this problem have been considered more recently by Burnashev [102]. A similar model was also used for the uncertain noise covariance function, and in addition uncertainty in the mean value of the noise was also taken into account. Subsequently Aleyner [103] modified the signal uncertainty class by imposing a signal-energy equality constraint, a result which was later extended to the envelope uncertainty case for bandpass signals in [104]. A variant of the robust matched filtering problem is that of optimal nominal-signal selection to obtain the best possible minimax performance. This situation has recently been considered by Verdu and Poor in [105] for the finite-length discrete-time situation. There the actual signal s is assumed to be a nominal signal s₀ with an additive vector δ which can belong to some class of possible vectors. For given s₀ the robust matched filter can be obtained and the minimax performance level can be determined. With s₀ allowed to be any nominal vector with fixed total energy, the choice of s₀ maximizing this guaranteed performance level is obtained in [105] for signal uncertainty δ in ℓ₁, ℓ₂, and ℓ_∞ neighborhoods of the zero vector.
While the result for the ℓ₂ uncertainty class indicates that s₀ should be the minimum-eigenvalue eigenvector of the noise covariance matrix (a similar result was obtained independently by Kuznetsov [%I]), this classical solution does not hold for the other uncertainty classes. Before considering two specific applications in the next subsection, we mention that recently a maximin sonar system design problem has been considered by Vastola, Farnbach, and Schwartz in [106], in which the signal and detector pair are picked to maximize the worst case performance of the system over a class of possible reverberation scattering functions.
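The ℓ₂-class signal-selection result quoted here reflects the fact that sᵀR⁻¹s ≤ 1/λ_min over unit-energy signals, with equality at the minimum-eigenvalue eigenvector. A quick check with an arbitrary positive definite covariance (illustrative, not from the survey):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
R = A @ A.T + 0.5 * np.eye(6)          # a positive definite noise covariance

evals, evecs = np.linalg.eigh(R)       # eigenvalues in ascending order
s_best = evecs[:, 0]                   # eigenvector of the smallest eigenvalue

def matched_snr(s):
    """Optimum matched-filter SNR s^T R^{-1} s for a unit-energy signal s."""
    return float(s @ np.linalg.solve(R, s))

# For an eigenvector v_k, s^T R^{-1} s = 1/lambda_k, so the minimum-eigenvalue
# eigenvector attains the largest achievable value 1/lambda_min.
others = [evecs[:, k] for k in range(1, 6)]
print(np.isclose(matched_snr(s_best), 1.0 / evals[0]))
print(all(matched_snr(s_best) >= matched_snr(v) for v in others))
```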
B. Two Examples of Robust Multisensor Systems

1) Robust Spatial Arrays for Signal Detection and Location: Consider a J-element narrow-band linear array, illustrated in Fig. 14. The signals received at the array elements from any source are time-delayed (or phase-shifted) versions of the source signal. Let x_i be the position of the

Fig. 14. Narrow-band spatial array for signal detection and location.

i-th array element, measured from an arbitrary origin. Let λ be the wavelength of the source signal, and suppose a far-field source is in some direction θ from the array broadside. To detect a signal from the direction θ, a phased-array system uses a set of complex weights or phase shifts h₀ᵢ(θ) = exp(−j2πxᵢ sin θ/λ) to "line up" in phase the signals received at each element. More generally, when sources of interference are present in specified directions and the observations at each element are noisy, one may design the weights hᵢ(θ) to maximize the SNR at the array output due to a signal in direction θ. Now it can be shown that the generalized J × J noise covariance matrix R_N for K interfering sources at locations θ₁, ..., θ_K is, element-wise,

$$R_N(i, l) = W_i\, \delta_{il} + \sum_{k=1}^{K} R_k\, e^{j2\pi(x_i - x_l)\sin\theta_k/\lambda} \qquad (3.12)$$

We will now describe briefly two specific applications of these results on robust matched filters. Our first application is a narrow-band spatial array system in which the individual sensor or antenna outputs are given complex (amplitude and phase) weights to detect signals arriving from any particular direction. While spatial matched filtering has previously been considered for this case [107], here we will be concerned with uncertainties in some of the signal and noise characteristics. It has been found that the use of minimax robust weights leads to another significant advantage in such a system. The second application we will consider is that of a system for estimating the time delay between the random signal arriving at two sensors, with the signal observed in independent additive noise at each sensor. While this is not directly a signal-detection problem involving matched filters, we find that in at least one approach to optimum system design the mathematical analysis is closely related to that performed in obtaining the matched filter in a deterministic signal-detection problem.

where W_i is the white-noise level at the i-th element, δ_{il} is the Kronecker delta, and R_k is the power of the k-th interfering source. For a source in direction θ the nominal signal complex envelope at the i-th element is

$$s_{0i}(\theta) = e^{j2\pi x_i \sin\theta/\lambda} \qquad (3.13)$$

assuming a normalization to unit signal amplitude at each array element. The matched-filter array weight vector maximizing output SNR is

$$h_0(\theta) = R_N^{-1}\, s_0(\theta) \qquad (3.14)$$

where s₀(θ) = [s₀₁(θ), s₀₂(θ), ..., s₀J(θ)]′. This weight vector becomes the phased-array weight vector with components h₀ᵢ(θ) = exp(−j2πxᵢ sin θ/λ) (with amplitudes normalized to unity) when there are no interfering sources present. For the general case of R_N given by (3.12), the spatial matched filter will have unequal amplitude weights


and a set of phase weights different from that for the phased-array case. Our interest is in the case where there is some uncertainty in the characterization (3.13) of the signal components. This may arise because of imperfect propagation characteristics, element position uncertainties, or element gain variations. To model such uncertainties we use the ℓ₂ class of possible signal vectors s(θ) which satisfy

$$\sum_{i=1}^{J} \left| s_i(\theta) - s_{0i}(\theta) \right|^2 \le \Delta. \qquad (3.15)$$

While the covariance matrix R_N may also be taken to be uncertain, we will assume here that it is known exactly. If the W_i are all equal to some common value W it can be shown that the eigenvectors of R_N are independent of W. Moreover, (J − K) eigenvalues of R_N are equal to W and the others are of the form W + u_i, where the u_i are independent of W. Such properties of R_N may be used in enlarging the class of allowable noise covariance matrices, although we shall not do so here. General results for finite-length discrete-time matched filters discussed below can be applied here to obtain the robust spatial matched-filter weights as

$$h_R(\theta) = \left[ R_N + \sigma_A I \right]^{-1} s_0(\theta) \qquad (3.16)$$

where σ_A is a constant depending on Δ which effectively adds uncorrelated noise components to the array elements, and s_L(θ) is the least favorable signal vector.

Table 1 shows numerical values for the output SNRs using the nominally optimum weights and the minimax robust weights for Δ = 2 and Δ = 4 for the following system:

signal direction θ = 0
number of array elements J = 12
element spacing = λ/2
number of interferers K = 1
power of interference R₁ = 1.0
direction of interference θ₁: sin θ₁ = −0.6
white-noise levels W_i: 0.25 for i = 1, 2, 8, 12; 2.0 for i = 3, 4, 9, 10, 11; 0.1 for i = 5; … for i = 6, 7

Fig. 15. Beam patterns of nominal (Δ = 0) and robust (Δ = 2) matched filter spatial arrays.

The performance shown in Table 1 does not show a dramatic advantage (or disadvantage), in terms of SNR performance, obtained by using the robust spatial filter. However, there is an important aspect of the performance of such an array system which is not captured by examining the output SNR in the correct look direction θ only. As the array scans for targets by adjusting its weights for different look angles θ_L, the output SNR becomes a function of θ_L for any fixed source direction θ = θ₀. Alternatively, a beam pattern for the array may be defined as

$$B(\theta, \theta_L) = \left| h^*(\theta_L)'\, s_0(\theta) \right| \qquad (3.17)$$

the array output signal magnitude for a nominal signal in direction θ with the array weighted to look in the direction θ_L. Ideally B(θ, θ_L) should have only a narrow peak at θ_L = θ as a function of θ_L for fixed θ. Figs. 15 and 16 show (for the numerical example giving Table 1) that the robust spatial matched-filter weights produce normalized beam patterns that have a distinct

Table 1 Signal-to-Noise Ratio of Array Output Using Nominal and Robust Matched-Filter Weights

Fig. 16. Beam patterns of nominal (Δ = 0) and robust (Δ = 4) matched filter spatial arrays.

As expected, the robust weights h_R(θ) give a worst case performance (for s_L(θ)) which is better than the corresponding performance of the nominally optimum weights h₀(θ) = R_N⁻¹s₀(θ). Note that the SNR is a function of the filter weights, the signal vector, and the noise covariance matrix R_N, and that for Δ = 0 the vector h_R(θ) is the nominally optimum vector h₀(θ).
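This weight comparison can be sketched numerically. The element layout, interferer, white-noise levels, and padding constant σ_A below are illustrative stand-ins for the tabulated example:

```python
import numpy as np

J = 12
x = 0.5 * np.arange(J)            # element positions in wavelengths (lambda/2 spacing)
sin_look = 0.0                    # look direction (broadside)
sin_intf, R1 = -0.6, 1.0          # one interferer (illustrative values)
W = 0.5 * np.ones(J)              # white-noise levels (illustrative)

def steer(s):
    return np.exp(1j * 2 * np.pi * x * s)

s0 = steer(sin_look)
# Eq. (3.12): diagonal white-noise terms plus a rank-one interferer term.
RN = np.diag(W) + R1 * np.outer(steer(sin_intf), steer(sin_intf).conj())

h_nom = np.linalg.solve(RN, s0)                      # h0 = RN^{-1} s0, eq. (3.14)
sigA = 0.8                                           # robustness padding (assumed)
h_rob = np.linalg.solve(RN + sigA * np.eye(J), s0)   # robust weights, RN + sigA*I

def out_snr(h, s, R):
    """|h^H s|^2 / h^H R h, the array output SNR for signal vector s."""
    return abs(np.vdot(h, s))**2 / np.real(np.vdot(h, R @ h))

# With sigA = 0 the robust weights reduce to the nominal ones, and the nominal
# weights are optimal under nominal conditions.
h_check = np.linalg.solve(RN + 0.0 * np.eye(J), s0)
print(np.allclose(h_check, h_nom))
print(out_snr(h_nom, s0, RN) >= out_snr(h_rob, s0, RN))
```

Sweeping sigA and plotting |hᴴ(θ_L)s₀(θ)| over sin θ would reproduce the narrower-mainlobe, lower-sidelobe behavior of Figs. 15 and 16.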

advantage over the pattern of the nominally optimum array. The main beams of the robust array weights are narrower and their close-in sidelobes are appreciably below those of the nominally optimum array, a factor important in array design. Thus the use of a robust weighting scheme produces a side benefit which probably exceeds in importance its original justification of maintaining the output


SNR under uncertainties. The robust weights differ from the nominally optimum weights as a result of the addition of the matrix σ_A I to R_N, and this suggests that as a general procedure a consideration of different values of σ_A in array weight design can lead to optimum tradeoffs between output SNR and beam characteristics under nominal conditions. Notice that the use of Δ values such as Δ = 2 implies that allowance is made for up to two dead elements in the array. As an extension of the signal class modeled by (3.15), one can consider a generalized class defined in terms of several tolerances Δ₁, Δ₂, ..., Δ_q for q groups of sensors. Further aspects of the performance of a robust spatial array system are discussed in [97], [98]. Before leaving this subject we note that recently Ahmed and Evans [108] have considered robust narrow-band array processing under uncertainties, although their definition of robustness is based on the notion of an acceptable set of performances rather than on optimizing worst case performance.

2) Robust Eckart Filters for Time-Delay Estimation: The estimation of time delay between signals arriving at two spatially separated sensors is of interest in applications such as seismology and sonar. In such systems, the time-delay measurement gives information about the direction of arrival of a wide-band source. Let Y₁(t) and Y₂(t) be the outputs of two sensors, these signals being described by

$$Y_1(t) = S(t) + N_1(t) \qquad (3.18a)$$

$$Y_2(t) = S(t - D) + N_2(t). \qquad (3.18b)$$
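The delay D in the model (3.18) can be estimated by cross-correlating the two sensor outputs and picking the lag of the maximum. A sketch with sampled data (signal, noise level, and delay are illustrative; periodic shifts stand in for true delays):

```python
import numpy as np

rng = np.random.default_rng(2)
n, D = 4000, 25                        # samples and true delay (illustrative)
s = rng.standard_normal(n)             # wide-band random source signal S(t)
y1 = s + 0.3 * rng.standard_normal(n)              # Y1(t) = S(t) + N1(t)
y2 = np.roll(s, D) + 0.3 * rng.standard_normal(n)  # Y2(t) = S(t - D) + N2(t)

# Basic estimator: the lag maximizing the cross-correlation of Y1 and Y2.
lags = np.arange(-100, 101)
xcorr = np.array([np.sum(y1 * np.roll(y2, -lag)) for lag in lags])
D_hat = lags[int(np.argmax(xcorr))]
print(D_hat == D)
```

Weighting the cross-spectrum before the inverse transform, as discussed next, sharpens this peak in colored noise.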

Here S(t) is a random source signal and N₁(t), N₂(t) are uncorrelated additive-noise processes at the sensors. The basic technique for estimating the unknown relative delay D is to cross-correlate Y₁(t) and Y₂(t) and to use as an estimate of D that time argument which gives the maximum value of the cross-correlation function. To improve the estimation process one can use a filter to weight the cross-spectrum estimate. Various criteria for the optimum choice of the filter have been proposed [109]. One particular performance measure is the output SNR at the correct time delay; for weak signals (low input SNR) this output SNR for long averaging time is proportional to [109]

$$\mathrm{SNR} = \frac{\left[ \int_{-\infty}^{\infty} W(\omega)\, \Phi_S(\omega)\, d\omega \right]^2}{\int_{-\infty}^{\infty} W^2(\omega)\, Q(\omega)\, d\omega} \qquad (3.19)$$

where W(ω) is the real filter frequency response, Φ_S(ω) is the power spectral density of the zero-mean and stationary signal process, and Q(ω) = Φ_N²(ω), the square of the noise power spectral density at each sensor. The noise processes are here assumed to be zero-mean and stationary with identical power spectral densities, and the signal and noise processes are assumed to be uncorrelated. The filter function W_E(ω) maximizing the above SNR is given by

$$W_E(\omega) = \frac{\Phi_S(\omega)}{Q(\omega)} = \frac{\Phi_S(\omega)}{\Phi_N^2(\omega)} \qquad (3.20)$$

which is the Eckart filter. Comparison of (3.3) and (3.4) with (3.20) and (3.19) shows the correspondence between the matched filter maximizing output SNR in the detection of a deterministic signal and the Eckart filter maximizing output SNR at the correct time delay for long observation time under weak signal conditions in time-delay estimation. The main difference lies in the fact that Φ_S(ω) is a nonnegative power spectral density whereas P(ω) in matched filtering was a Fourier transform of a finite-energy signal. The implementation of an Eckart filter requires knowledge of Φ_S(ω) and Q(ω). These are often not precisely known but may be modeled as belonging to uncertainty classes such as those considered for power spectral densities in Section II. As an example, consider the p-point classes for both Φ_S(ω) and Q(ω), which specify the fractions of the total powers of Φ_S(ω) and Q(ω) in specified intervals partitioning the entire frequency spectrum. Such information may be obtained from measurements at the output of a simple cross-correlator when signal is present, and from output power measurements in the frequency bands of interest under noise-only conditions. Since our earlier results on the minimax robust matched filter were obtained specifically for the L₂ signal uncertainty class, these results are not directly applicable here. However, it is quite simple to show [110] that for the time-delay estimation problem where the SNR of (3.19) is the performance measure of interest, the minimax robust Eckart filter for p-point classes for Φ_S(ω) and Q(ω) has a piecewise-constant frequency response. The values of the constant levels of the robust Eckart filter frequency response W_R(ω) are determined simply as the ratios of the given fractional powers in Φ_S(ω) and Q(ω) in the common frequency bands defining the p-point classes (as in Example 3 of Subsection II-C). As an example, assume that the nominal signal power spectral density Φ_{S,0}(ω) and the nominal noise power spectral density Φ_{N,0}(ω) are given by (3.21).

For A = 1 (input SNR = 0 dB) the nominally optimum Eckart filter results in an output SNR of 2π, or 8 dB. Now Φ_{S,0}(ω) and Φ_{N,0}(ω) with α = 1/3 are members of particular p-point classes with the two frequency bands [0, 1.61] and [1.61, ∞). (The value 1.61 of the frequency-band edge is the one which maximizes the performance of a two-level piecewise-constant filter under nominal conditions.) Suppose the noise power spectral density actually is a constant c on [0, ω_c] and zero outside. We may pick the values of c and ω_c to make Φ_N²(ω) a member of the p-point class to which Φ_{N,0}²(ω) belongs. In this case we find (with A = 1, as before) that the input SNR is now −3 dB. The robust Eckart filter for this situation is the optimum two-level piecewise-constant filter for the nominal situation. Its SNR in the nominal case is 2.76, or 4.4 dB, and it maintains this value of the SNR for all Q(ω) in the p-point class defined above. The nominally optimum Eckart filter's output SNR drops to the value of 0.01, or −20 dB, when the noise has the above ideal low-pass spectrum in the p-point class. Simulation results [110] confirm that the robust two-level piecewise-constant Eckart filter works quite effectively to

KASSAM AND POOR: ROBUST TECHNIQUES FOR SIGNAL PROCESSING


obtain a good estimate of D for all Q(ω) in the defined p-point class, while the nominally optimum filter is quite sensitive to deviations from the nominal assumption. More general results for other convex classes of total-power-constrained allowable power spectral densities Φ_S(ω) and Q(ω) = Φ_N²(ω) have been discussed in [110]. Although the SNR performance criterion of (3.19) is very similar to the SNR measure of (3.4) for deterministic-signal detection, the minimax robust Eckart filter is the optimum filter for a least favorable pair Φ_{S,L}(ω) and Q_L(ω), which is found exactly as in the case of robust Wiener filtering in Section II. This follows from the general observation made in Section II concerning least favorable pairs whenever the performance measure is of a type expressible as a "distance" measure between Φ_S(ω) and Q(ω) for convex classes of power spectral density or probability density functions [111]. Before leaving this particular application let us note one further interesting feature of the robust Eckart filter. As in the case of the robust spatial matched filter we have considered above, there is another aspect of the performance of a time-delay estimation system that we have not considered explicitly. This is the variance of the time-delay estimate. It turns out that the robust Eckart filter has the additional advantage of generally producing time-delay estimates with considerably lower estimation variances than the nominally optimum Eckart filter under nonnominal conditions, with comparable performance for both under nominal conditions. This phenomenon is discussed further in [110].

of complex-valued functions f satisfying

(1/2π) ∫_{−∞}^{∞} |f(ω)|² dω < ∞

and 𝒩 corresponds to a set of positive, symmetric, real-valued functions. The signal quantity s is identified with the shifted-pulse Fourier transform P(ω)e^{jωT}; the filter quantity h is identified with H*(ω), the complex conjugate of the filter transfer function; the noise quantity n corresponds to the noise spectrum Φ_N(ω); and, for an element n ∈ 𝒩 and any f ∈ 𝒳, the operation nf is defined by (nf)(ω) = n(ω)f(ω). Note that the inner product on L_2(−∞, ∞) is defined by

⟨f, g⟩ = (1/2π) ∫_{−∞}^{∞} f*(ω) g(ω) dω.
Examples of other matched filtering problems that fit this general Hilbert-space formulation are discussed below. Within this general Hilbert-space formulation, for fixed signal s and noise quantity n, the matched filter (i.e., the element h_o ∈ 𝒳 that maximizes ρ) is given by any solution to the equation n h_o = s. If n is invertible, then h_o is n^{−1}s and the maximum value of ρ is given by ⟨s, n^{−1}s⟩. If, on the other hand, s and n are known only to within classes 𝒮 and 𝒩 of signal and noise quantities, respectively, then we can consider the alternative design criterion (as in (3.6) and (3.9))

max_{h ∈ 𝒳} ( inf_{(s,n) ∈ 𝒮×𝒩} ρ(h; s, n) ).   (3.24)
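The fixed-(s, n) statement above can be checked directly in a small finite-dimensional instance of the formulation (𝒳 = ℝ³ here, with an illustrative covariance and signal): h_o = n^{−1}s attains ρ = ⟨s, n^{−1}s⟩, and no other filter does better.

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting (A: list of rows, b: list)
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def rho(h, s, R):
    # the SNR functional rho(h; s, n) = <h,s>^2 / <h, n h>
    nh = [inner(row, h) for row in R]
    return inner(h, s) ** 2 / inner(h, nh)

R = [[2.0, 0.5, 0.0],
     [0.5, 1.0, 0.2],
     [0.0, 0.2, 1.5]]      # positive-definite noise covariance (illustrative)
s = [1.0, -2.0, 0.5]

h_opt = solve(R, s)        # matched filter h_o = n^{-1} s
best = rho(h_opt, s, R)
assert abs(best - inner(s, h_opt)) < 1e-9          # max value is <s, n^{-1}s>
for h in ([1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.3, -0.9, 2.0]):
    assert rho(h, s, R) <= best + 1e-9             # no filter beats h_o
```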

C. General Formulation of Robust Matched Filtering Problems in Hilbert Space


The results we have discussed so far in this section serve to illustrate the ideas behind robust matched filtering and their possible applications, and they can be extended in several ways, as we have indicated. In this subsection we will consider a general formulation of the robust matched filtering problem which has recently been developed in [112], using a Hilbert-space framework. Most matched-filtering situations (e.g., continuous-time/discrete-time, one-dimensional/multidimensional, time-domain/frequency-domain) can be fit into a single general Hilbert-space setting which is convenient for studying robustness in all of these problems simultaneously. In particular, suppose 𝒳 is a separable Hilbert space (e.g., L_2 or ℝⁿ) with inner product ⟨·,·⟩, and let 𝒩 denote a set of bounded nonnegative linear operators (e.g., integral operators, matrices, or spectral operators) mapping 𝒳 to itself. A matched-filtering problem on 𝒳 involves three quantities: a signal quantity s ∈ 𝒳 (e.g., a signal spectrum or waveform); a noise quantity n ∈ 𝒩 (e.g., a noise spectrum, autocorrelation function, or covariance matrix); and a filter quantity h ∈ 𝒳 (e.g., a filter transfer function or impulse response). The design criterion for the matched filtering problem is based on a functional ρ: 𝒳 × 𝒳 × 𝒩 → ℝ defined by

ρ(h; s, n) = |⟨s, h⟩|² / ⟨h, nh⟩   (3.23)

It can be shown (see [112, Lemma 1]) that if 𝒮 and 𝒩 are convex, then a pair (s_L, n_L) ∈ 𝒮 × 𝒩 and its optimum filter h_R (satisfying n_L h_R = s_L) is a saddlepoint solution for (3.24) if and only if the following inequality holds for all (s, n) ∈ 𝒮 × 𝒩:

ρ(h_R; s, n) ≥ ρ(h_R; s_L, n_L).   (3.25)

Moreover, if n_L is invertible, then this occurs if and only if (s_L, n_L) is least favorable for matched filtering for 𝒮 and 𝒩, i.e., if and only if

⟨s_L, n_L^{−1} s_L⟩ = min_{(s,n) ∈ 𝒮×𝒩} ( sup_{h ∈ 𝒳} ρ(h; s, n) ).   (3.26)

Expression (3.25) provides a means for checking potential solutions to (3.24), and (3.26) provides a means for searching for such solutions, since the expression in brackets in (3.26) is known in closed form for many situations of interest. By using the above results, solutions to the general robust matched filtering problem of (3.24) have been obtained for generalizations of the uncertainty classes of (3.5) and (3.8a and b), as well as for other uncertainty classes of interest. For example, consider the situation in which the noise quantity is known to be some fixed n_0 ∈ 𝒩, but the signal quantity s is known only to be in the class 𝒮′ ⊂ 𝒳 defined by

𝒮′ = { s ∈ 𝒳 : ||s − s_0|| ≤ Δ }   (3.27)
where s_0 is a known nominal signal, Δ is a fixed positive number representing the degree of uncertainty in the signal, and ||x|| denotes the norm of x ∈ 𝒳 defined by ||x|| = [⟨x, x⟩]^{1/2}. For this problem it can be shown (see [112, Theorem 1]) that a saddlepoint solution to (3.24) is given by (h_R; s_L, n_0), where h_R is given by h_R = (n_0 + αI)^{−1} s_0

and representing an SNR. Note that, for properly defined 𝒳, most conventional matched filtering formulations fit this model (see, for example, Thomas [39]). The example described in Subsection III-A corresponds to the particular case in which 𝒳 is complex L_2(−∞, ∞), i.e., 𝒳 is the set

(3.28)


PROCEEDINGS OF THE IEEE, VOL. 73, NO. 3, MARCH 1985

with I being the identity mapping from 𝒳 to itself, and with α being the positive solution to

α ||h_R|| = Δ   (3.29)

and where

s_L = s_0 − α h_R.   (3.30)

Note that, for the specific spectral-domain example discussed in Subsection III-A, the identity operator is represented by the unit white-noise spectrum Φ_N(ω) ≡ 1, and so (3.28) corresponds to (3.7), with n_0 and s_0 being represented by Φ_N(ω) and P_0(ω)e^{jωT}, respectively. (Recall that, for this case, h is identified with H*(ω).) It is interesting to note that, as in the spectral case, the identity operator I generally describes a white-noise process, so that (3.28) indicates that the type of uncertainty described by 𝒮′ has an effect on the design equivalent to that of adding white noise of spectral height α. The result regarding the band model of (3.8a and b) can also be extended to general Hilbert spaces. In particular, consider the situation in which the signal is known, say s_0, and in which all of the noise operators n ∈ 𝒩 can be represented by spectral components {n(ω); ω ∈ Ω} for some set Ω. This situation arises in stationary models such as that yielding (3.8a and b), in which case n(ω) can be taken to be the power spectral density at the frequency ω and Ω is ℝ. Alternatively, for other models, this situation arises when all of the members of 𝒩 share a common eigenstructure and the set {n(ω); ω ∈ Ω} is the eigenspectrum of the operator n. For example, such a model is generated in L_2 if all the noise autocorrelation functions in 𝒩 have Karhunen-Loève expansions in the same eigenfunctions. In this latter case, Ω would be the set of positive integers, and n(ω) would be the ωth eigenvalue of n. In this general setting the band model of (3.8a and b) becomes
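A minimal numerical sketch of (3.28)-(3.30), assuming the finite-dimensional reading h_R = (n_0 + αI)^{−1}s_0 with α the positive solution of α‖h_R‖ = Δ. The covariance, nominal signal, and Δ below are illustrative; α is found by bisection, which works because α‖h_R(α)‖ is increasing in α:

```python
import math

def solve(A, b):
    # Gaussian elimination with partial pivoting
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def norm(v):
    return math.sqrt(sum(x * x for x in v))

R0 = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]]  # nominal noise n0
s0 = [1.0, -2.0, 0.5]                                     # nominal signal
Delta = 0.8                # uncertainty radius, must be < ||s0||

def h_R(alpha):            # h_R = (n0 + alpha*I)^{-1} s0, cf. (3.28)
    A = [[R0[i][j] + (alpha if i == j else 0.0) for j in range(3)]
         for i in range(3)]
    return solve(A, s0)

lo, hi = 1e-9, 1.0
while hi * norm(h_R(hi)) < Delta:   # grow until the root is bracketed
    hi *= 2.0
for _ in range(200):                # bisection on alpha*||h_R(alpha)|| = Delta
    mid = 0.5 * (lo + hi)
    if mid * norm(h_R(mid)) < Delta:
        lo = mid
    else:
        hi = mid
alpha = 0.5 * (lo + hi)
hR = h_R(alpha)
sL = [a - alpha * b for a, b in zip(s0, hR)]   # least favorable signal (3.30)
# the least favorable signal sits exactly on the boundary of the class (3.27)
assert abs(norm([a - b for a, b in zip(sL, s0)]) - Delta) < 1e-6
```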

identical to (3.11) with n_L = Φ_{N,L}, with the bounds n̲ and n̄ playing the roles of L and U, and with k playing the role of p_0. This result can be combined straightforwardly with (3.28)-(3.30) to give the robust solution for uncertainty in both signal and noise (see Kassam, Lim, and Cimini [92] and Poor [112]). Example: Binary Communication: To illustrate the generality of the above result, consider the problem of antipodal signaling in additive Gaussian noise as described by the following pair of statistical hypotheses:

H_0: Y(t) = N(t) − s(t),   0 ≤ t ≤ T

versus

H_1: Y(t) = N(t) + s(t),   0 ≤ t ≤ T   (3.33)

where {s(t); 0 ≤ t ≤ T} is a known (i.e., deterministic) square-integrable signal waveform and {N(t); 0 ≤ t ≤ T} is a zero-mean Gaussian noise process with autocorrelation function {R_N(t,u); 0 ≤ t ≤ T, 0 ≤ u ≤ T}. Assuming that H_0 and H_1 are equally likely, the Bayes optimum (i.e., minimum-probability-of-error) receiver for (3.33) is of the form (see, e.g., Helstrom [113])

ψ = sgn { ∫_0^T h(T,t) y(t) dt }   (3.34)

where sgn{·} denotes the algebraic sign of the argument; ψ = +1 and ψ = −1 denote the acceptance of hypotheses H_1 and H_0, respectively; y = {y(t); 0 ≤ t ≤ T} denotes the observed realization of the random process {Y(t); 0 ≤ t ≤ T}; and {h(T,t); 0 ≤ t ≤ T} denotes the impulse response of a linear filter. The probability of error associated with the receiver of (3.34) is given by

P_e = 1 − Φ([SNR]^{1/2})   (3.35)

where SNR denotes the signal-to-noise ratio at the output of the filter at time T and is given by

SNR = [ ∫_0^T h(T,t) s(t) dt ]² / ∫_0^T ∫_0^T h(T,t) R_N(t,u) h(T,u) dt du.   (3.36)

𝒩 = { n : n̲(ω) ≤ n(ω) ≤ n̄(ω) for all ω ∈ Ω, and tr{n} = c }   (3.31)
Here Φ denotes the standard (unit) Gaussian probability distribution function. The optimum receiver for (3.33) is given by (3.34) with h(T,t) being the solution to the integral equation

∫_0^T R_N(t,u) h(T,u) du = s(t),   0 ≤ t ≤ T.   (3.37)
where n̲ and n̄ are known functions and tr{·} denotes the trace operation. Note that for power spectral densities the trace is just

tr{n} = (1/2π) ∫_{−∞}^{∞} n(ω) dω

and for discrete eigenspectra the trace is

tr{n} = Σ_{i=1}^{∞} n(i).

(More generally, we can write tr{n} = ∫_Ω n(ω) μ(dω)

for some measure μ on Ω.) This problem fits within the above Hilbert-space formulation with 𝒳 being real L_2[0,T], s = {s(t); 0 ≤ t ≤ T}, h = {h(T,t); 0 ≤ t ≤ T}, and n = {R_N(t,u); (t,u) ∈ [0,T]²}. With a = {a(t); 0 ≤ t ≤ T} and b = {b(t); 0 ≤ t ≤ T} in 𝒳, we have

⟨a, b⟩ = ∫_0^T a(t) b(t) dt

and the operation na is defined by

(na)(t) = ∫_0^T R_N(t,u) a(u) du,   0 ≤ t ≤ T.

Then ρ(h; s, n) is given by (3.36), and (3.37) is the equation nh = s. Assuming that the signal s_0 also has the spectral representation {S_0(ω); ω ∈ Ω} in terms of the same eigenstructure as the members of 𝒩, it can be shown using (3.25) (see [112]) that the robust filter for 𝒩 of (3.31) is the optimum filter for s_0 and a least favorable noise operator whose spectrum is

n_L(ω) = max{ n̲(ω), min{ k|S_0(ω)|, n̄(ω) } }   (3.32)

where k is chosen so that tr{n_L} = c. Note that (3.32) is identical in form to (3.11). For this model, the signal uncertainty class 𝒮′ of (3.27) is given by

𝒮′ = { s ∈ 𝒳 : ∫_0^T [s(t) − s_0(t)]² dt ≤ Δ² }   (3.38)


where s_0 = {s_0(t); 0 ≤ t ≤ T} is the known transmitted signal. This model represents a general model for types of receiver and channel distortion that are difficult to model parametrically. Further justification for this model is found in Slepian's notion of "indistinguishable signals" [114]. From (3.28) and (3.29) it follows that the robust receiver of the form (3.34) for uncertain signal distortion described by (3.38) is given by the impulse response solving the equation

with

λ_{L,i} = { λ̄_i,        if λ̄_i ≤ k^{−1}p_i
            k^{−1}p_i,   if λ̲_i < k^{−1}p_i < λ̄_i   (3.43)
            λ̲_i,        if k^{−1}p_i ≤ λ̲_i

where p_i = |sᵀv_i| and where k is chosen so that Σ_{i=1}^m λ_{L,i} = c.
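The trace-normalized eigenvalue clipping can be sketched numerically. In the following, the eigenvalue bounds, the projections p_i, and the trace budget c are illustrative values, and k is found by bisection (the clipped trace is nonincreasing in k):

```python
# Sketch of a clipped eigenvalue rule of the form (3.43):
# lam_L_i = max(lo_i, min(p_i / k, up_i)), with k set by sum(lam_L) = c.
p  = [3.0, 1.5, 0.8, 0.1]      # p_i = |s^T v_i|, projections on eigenvectors
lo = [0.2, 0.2, 0.1, 0.1]      # lower eigenvalue bounds
up = [2.0, 1.5, 1.0, 0.5]      # upper eigenvalue bounds
c  = 3.0                       # trace budget, must lie in [sum(lo), sum(up)]

def lam_L(k):
    return [max(l, min(pi / k, u)) for pi, l, u in zip(p, lo, up)]

klo, khi = 1e-6, 1e6           # sum(lam_L(k)) decreases from sum(up) to sum(lo)
for _ in range(200):
    kmid = 0.5 * (klo + khi)
    if sum(lam_L(kmid)) > c:
        klo = kmid
    else:
        khi = kmid
lamL = lam_L(0.5 * (klo + khi))

# trace constraint met, and every eigenvalue respects its band
assert abs(sum(lamL) - c) < 1e-6
assert all(l - 1e-9 <= x <= u + 1e-9 for x, l, u in zip(lamL, lo, up))
```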

∫_0^T R_N(t,u) h_R(T,u) du + α h_R(T,t) = s_0(t),   0 ≤ t ≤ T   (3.39)

which is a Fredholm equation of the second kind. The constant α is specified by (3.39) and the condition

α [ ∫_0^T |h_R(T,t)|² dt ]^{1/2} = Δ.

The performance of specific solutions to (3.39) is discussed in [112]. It is interesting to note again that the effect of the uncertainty modeled by (3.38) is to introduce a white-noise floor of height α. This device is used in standard treatments (see, e.g., Van Trees [1]) in order to circumvent singularity problems arising in the solution to (3.37) for the case of continuous R_N. That this phenomenon arises naturally here gives additional justification for considering minimax design within (3.38). Equation (3.39) and the condition following it for the robust filter were originally derived in [101]. A modified condition for a signal class with an energy equality constraint is given in [103]. To further illustrate the general Hilbert-space results, consider the discrete-time observation models under the two hypotheses

H_0: Y_i = N_i − s_i,   i = 1, 2, ..., m
Note that (3.42) must be nonsingular, and thus the robust filter is described by h_R = R_{N,L}^{−1} s. This result is a special case of more general results found in Chen and Kassam [97], [98]. The discrete-time model of (3.40) can also be used to illustrate least favorable signals and noise operators for uncertainty models of interest other than those of (3.27) and (3.31). This problem has been treated in [100] in some detail. For example, suppose the noise covariance is known to be given by diag{σ_1², σ_2², ..., σ_m²} (corresponding to uncorrelated samples), and the signal is known only to differ in total absolute distortion by no more than Δ from some nominal signal s_0, i.e., we assume the signal lies in the class

𝒮 = { s ∈ ℝ^m : Σ_{j=1}^m |s_j − s_{0,j}| ≤ Δ }.   (3.44)

Then it can be shown [100] using (3.25) that the robust filter h_R is given by

h_{R,i} = sgn(h_{0,i}) min{ |h_{0,i}|, c },   i = 1, 2, ..., m   (3.45)

where h_0 is the nominal filter and where c > 0 satisfies the equation

Σ_{i=1}^m σ_i² max{ 0, |h_{0,i}| − c } = Δ

and

H_1: Y_i = N_i + s_i,   i = 1, 2, ..., m.   (3.40)
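The soft-clipping construction just described can be sketched numerically. All values below are illustrative; the nominal filter is h_{0,i} = s_{0,i}/σ_i² for the diagonal-covariance case, and the level c is found by bisection on the distortion budget:

```python
# Sketch: robust filter as a clipped version of the nominal filter, with the
# clipping level c chosen so that sum_i sigma_i^2 * max(0, |h0_i| - c) = Delta.
s0    = [2.0, -1.0, 0.5, 3.0]         # nominal signal (illustrative)
sig2  = [1.0, 2.0, 0.5, 1.0]          # noise variances sigma_i^2
Delta = 1.5                           # total absolute distortion budget

h0 = [si / v for si, v in zip(s0, sig2)]   # nominal filter h0_i = s0_i/sigma_i^2

def slack(c):
    return sum(v * max(0.0, abs(h) - c) for h, v in zip(h0, sig2))

clo, chi = 0.0, max(abs(h) for h in h0)    # slack(c) decreases from its max to 0
for _ in range(100):
    c = 0.5 * (clo + chi)
    if slack(c) > Delta:
        clo = c
    else:
        chi = c
c = 0.5 * (clo + chi)

hR = [(1 if h > 0 else -1) * min(abs(h), c) for h in h0]   # clipped filter
assert abs(slack(c) - Delta) < 1e-6        # budget met
assert max(abs(h) for h in hR) <= c + 1e-9 # all components limited at c
```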

where N = (N_1, ..., N_m) is a random vector having zero mean and nonsingular covariance matrix R_N, and where s = (s_1, ..., s_m) is a known signal vector. Here, the output at time m of a linear filter with impulse response {h̃_i, i = 0, 1, ..., m−1} is given by hᵀY, where h_i = h̃_{m−i}, and the output SNR is (hᵀs)²/hᵀR_N h. The matched filter for known R_N and s is given by R_N^{−1} s. Of course, this fits the above Hilbert-space formulation with 𝒳 = ℝ^m (i.e., ⟨a, b⟩ = aᵀb), s = s, h = h, and with the operator n represented by the matrix R_N. A band-model class 𝒩 of operators such as (3.31) occurs here in the case in which all members of 𝒩 have the same orthonormal eigenvectors v_1, ..., v_m, in which case (3.31) is given by
Thus, in this case the robust matched filter is a clipped version of the nominal filter. Similarly, if R_N is diagonal and s lies in a class

𝒮 = { s ∈ ℝ^m : |s_i − s_{0,i}| ≤ Δ,   i = 1, ..., m }   (3.46)

then the robust filter becomes (see [100])

h_{R,i} = { (s_{0,i} − Δ)/σ_i²,   if s_{0,i} > Δ
            0,                    if −Δ ≤ s_{0,i} ≤ Δ   (3.47)
            (s_{0,i} + Δ)/σ_i²,   if s_{0,i} < −Δ.

𝒩 = { R_N : R_N = Σ_{i=1}^m λ_i v_i v_iᵀ,   λ̲_i ≤ λ_i ≤ λ̄_i for all i = 1, ..., m, and Σ_{i=1}^m λ_i = c }.   (3.41)
Thus, applying (3.32), the least favorable covariance matrix is given (using its spectral representation) by

R_{N,L} = Σ_{i=1}^m λ_{L,i} v_i v_iᵀ   (3.42)

This filter is the optimum filter for a least favorable signal that is as near zero as possible (in terms of the norm max_{i=1,...,m} |s_i|). Extensions of these results can be found in [100]. The above treatment makes clear the rather general applicability of the Hilbert-space framework for formulating and solving robust linear detection problems. Recently it has been shown [115] that a similar Hilbert-space framework can also be used for robust linear estimation problems (which we surveyed in Section II); for details the reader is referred to [115]. We shall now proceed, in the next section, to consider robust nonlinear signal detection schemes primarily designed to protect against uncertainties in the amplitude statistics of the noise.


IV. NONLINEAR METHODS FOR ROBUST SIGNAL DETECTION

In the previous two sections we focused on the design of robust linear filters for signal estimation and detection, for situations in which there was uncertainty about the spectral densities or correlation functions of the signal and noise. The performance measures considered there were the MSE and the SNR, which did not involve the exact functional forms of the signal and noise pdfs. In this section we will survey results on robust signal detection which pertain specifically to robustness when the detection performance is measured by characteristics directly related to the probability of detection or error probabilities instead of SNRs. Although in most situations an assumption that the noise is Gaussian gives a direct relationship between the SNR and such detection performance measures, this is not true in the case of non-Gaussian noise. In almost all such cases it is possible to obtain explicit results on the structures of minimax robust detectors (generally nonlinear) only when the noise processes are sequences of independent random variables, so that only univariate pdf uncertainties need be considered. In the first two sections we were able to avoid considerations of pdf uncertainties because the performance measures (MSE and SNR) depended only on second-moment characteristics. For the detection performance measures we use here, consideration of the correlation functions, and more generally of pdfs beyond those of first order, is avoided by the assumption of independence. For the pulse-train detection problem discussed at the beginning of Section III, this means that a robust detection scheme can be arrived at by decoupling the treatments of spectral density and probability density uncertainties. That this approach is necessary is not surprising, considering the well-recognized difficulties of dealing in general with non-Gaussian random processes. There are available, nonetheless, some recent results on robust detection in correlated noise, under certain constraints, and we will discuss these later in this section.

For the most part, then, we will be considering here various nonparametric classes of univariate noise pdfs expressing different types and extents of uncertainty about the exact noise pdf. We will consider in particular the canonical detection problems of known low-pass signals in additive noise, deterministic bandpass signals in bandpass additive noise, and random signals in additive noise. Even with the simplification due to the white-noise assumption, explicit solutions for minimax robust detectors are generally possible only under a further restriction to consideration of the local or weak-signal case. We will begin, however, with a description of the results of Huber, and other related results, for a general hypothesis testing problem. As we have mentioned in the Introduction, the 1964 and 1965 results of Huber [6], [7] have greatly influenced and motivated much of the subsequent work on robust signal processing schemes.

A. Robust Hypothesis Testing

In 1965 Huber [7] published an explicit solution for the robust test for a binary hypothesis testing problem related to the signal-detection problem we discussed in the Introduction. The significance of this result lies not so much in the solution it provided for the hypothesis testing problem considered by Huber as in the mathematical justification it provided of the use of detectors based on bounded functions such as l(x; s, θ) of (1.5).

Let X = (X_1, X_2, ..., X_n) be a vector of independent and identically distributed (iid) observation components. Under a hypothesis H_0 (the null hypothesis) let the common pdf of the X_i be f_0, and under an alternative hypothesis H_1 let the common pdf be f_1. The requirement is to construct a test for H_0 versus H_1 based on the observations X, when f_0 and f_1 are not specified completely. The approach taken by Huber in [7] was first to define classes of allowable pdfs under the null and alternative hypotheses. The classes considered in [7] were obtained as neighborhood classes containing in each case a nominal density function and density functions in its vicinity. One such pair of neighborhood classes which is popular in robustness studies is the pair of ε-contamination classes, for which, under H_j, j = 0, 1, the class of allowable density functions is¹

ℱ_j(f_j⁰, ε_j) = { f | f = (1 − ε_j) f_j⁰ + ε_j h_j },   j = 0, 1.   (4.1)

Here f_j⁰ is the nominal density function under hypothesis H_j, the quantity ε_j in [0,1) is the maximum degree of contamination for f_j⁰, and h_j is any density function. These classes are, of course, the same (aside from normalization) as the spectral uncertainty classes of Example 1 in Subsection II-C. The hypothesis testing problem is therefore that of choosing between the two hypotheses

H_0: the pdf of the X_i is any pdf f_0 in ℱ_0,   (4.2a)

versus

H_1: the pdf of the X_i is any pdf f_1 in ℱ_1.   (4.2b)

Huber then sought the least favorable pair of probability densities in ℱ_0 × ℱ_1, which is defined on the basis of a risk function R(f, φ). In the risk function, φ denotes the test for H_0 versus H_1 which rejects H_j in favor of H_{1−j} with probability φ^j(X) when X = (X_1, X_2, ..., X_n) is observed,² the X_i being iid with density function f in ℱ_0 or ℱ_1. Consider, for example, the minimum-probability-of-error criterion, for which we would have

R(f, φ) = E_f{ φ^j(X) },   if f ∈ ℱ_j.   (4.3)

Then for (f_0, f_1) ∈ ℱ_0 × ℱ_1 the probability of error (assuming equally likely priors) is

P_e(f_0, f_1; φ) = (1/2)[ R(f_0, φ) + R(f_1, φ) ]   (4.4)

and this is minimized when φ is a test based on a comparison of the likelihood ratio

Λ(x) = Π_{i=1}^n f_1(x_i)/f_0(x_i)

with a threshold value of unity. The least favorable pair (q_0, q_1) in ℱ_0 × ℱ_1 for this problem is that pair for which the corresponding test φ_q based on its likelihood ratio satisfies, for all (f_0, f_1) ∈ ℱ_0 × ℱ_1,

P_e(f_0, f_1; φ_q) ≤ P_e(q_0, q_1; φ_q).   (4.5)

¹The arguments of ℱ_0, ℱ_1 will be dropped when no confusion can result.
²Note that φ⁰(X) + φ¹(X) = 1.

One of the major results in [7] is that for the ε-contamination classes a least favorable pair (q_0, q_1) exists satisfying

R(f, φ_q) ≤ R(q_j, φ_q),   for f ∈ ℱ_j,  j = 0, 1   (4.6)

where φ_q is any test for q_0 versus q_1 based on a comparison of the likelihood ratio with a threshold (a "probability ratio test"). This clearly implies that (4.5) is true, and it also gives similar results for other risk-based criteria. The pair (q_0, q_1) and φ_q for minimum error probability form a saddlepoint for the error-probability functional; that is,

P_e(f_0, f_1; φ_q) ≤ P_e(q_0, q_1; φ_q) ≤ P_e(q_0, q_1; φ)   (4.7)

for any (f_0, f_1) ∈ ℱ_0 × ℱ_1 and any test φ. The test φ_q is called a robust test for f ∈ ℱ_0 versus f ∈ ℱ_1; it minimizes over all tests the supremum (least upper bound) of the error probability over all pairs in ℱ_0 × ℱ_1. Indeed, φ_q is robust, in view of (4.6), for other risk-based criteria such as the Neyman-Pearson criterion. For this criterion we can define the threshold for φ_q in such a way that for a design value α of the false-alarm probability

R(f, φ_q) ≤ R(q_0, φ_q) = α,   f ∈ ℱ_0   (4.8)

with R(f, φ) as in (4.3).

For the ε-contamination classes, Huber's solution for the least favorable pair turns out to be
bounded by 2-alternating capacities, which are generalized notions of measure as discussed in Subsection II-G. Other specific uncertainty models have been considered by Khalfina and Khalfin [116], Rieder [74], and Bednarski [75]. The ε-contaminated classes of nominal densities have remained the most widely used uncertainty models, partly because of their earlier introduction, but primarily because it is possible to justify their use, in many cases, from physical considerations (see, for example, Trunk [117]). In modeling impulsive noise, for example, the presence of a small proportion (whose maximum value is ε) of impulsive components in a background of, say, Gaussian noise can be modeled by an ε-contaminated nominal Gaussian pdf. Any uncertainty about the exact pdf of the impulsive components then leads directly to use of an ε-contaminated uncertainty class, perhaps with a side constraint on the nature of the contaminating pdf h. Models of this type have been studied by Miller and Thomas [118]. One other specific uncertainty class of pdfs that has been used more recently is that analogous to the spectral band model discussed in Subsection II-C; this model is justifiable from physical considerations in some applications. It may be viewed as being more general than the ε-contamination class, and it allows the results obtained for the ε-contamination classes to be extended. Specifically, the bounded classes ℱ_j(f_{jL}, f_{jU}) considered in [119] are defined by

ℱ_j(f_{jL}, f_{jU}) = { f | f_{jL} ≤ f ≤ f_{jU} },   j = 0, 1   (4.12)

q_0(x) = { (1 − ε_0) f_0⁰(x),        if Λ_0(x) < c″
           (1 − ε_0) f_1⁰(x)/c″,     otherwise   (4.9a)

q_1(x) = { (1 − ε_1) f_1⁰(x),        if Λ_0(x) > c′
           c′ (1 − ε_1) f_0⁰(x),     otherwise   (4.9b)

where c′ < c″ are nonnegative numbers such that q_0 and q_1 are pdfs. The proof of the existence of such a pair in ℱ_0 × ℱ_1 when ℱ_0 and ℱ_1 are disjoint (the case of interest, since otherwise q_0 = q_1), and the proof that (4.6) holds, can be found in [7]. Note that the likelihood ratio Λ_q(x) = q_1(x)/q_0(x) for a single observation for the least favorable pair is a "censored" version of the nominal one,

Λ_q(x) = { b c′,       Λ_0(x) ≤ c′
           b Λ_0(x),   c′ < Λ_0(x) < c″   (4.10)
           b c″,       Λ_0(x) ≥ c″

where b = (1 − ε_1)/(1 − ε_0) and Λ_0(x) = f_1⁰(x)/f_0⁰(x). In [7] the least favorable pair of density functions was also obtained for another uncertainty model for the density functions under H_0 and H_1. This was defined by the total-variation classes of probability densities, which may be expressed as

ℱ_j(f_j⁰, ε_j) = { f : ∫ |f(x) − f_j⁰(x)| dx ≤ ε_j },   j = 0, 1.   (4.11)
This is analogous to Example 2 of Subsection II-C. In a later paper [73] it was shown by Huber and Strassen that least favorable pairs of probability measures can be found for probability measure classes which are defined as being

so that the allowable density functions in ℱ_j are those bounded by the given nonnegative functions f_{jL} and f_{jU}, which are assumed to make the ℱ_j non-empty. (These are band models analogous to those defined in Example 4 of Subsection II-C.) These classes reduce to the ε-contamination classes of (4.1) for f_{jL} = (1 − ε_j)f_j⁰ and f_{jU} = ∞. It is not true, however, that the classes of (4.12) are obtained from (4.1) by imposing an upper bound on the contamination densities h_j. This is clear from the fact that even if we set f_{jL} = (1 − ε_j)f_j⁰, the resulting "nominal" density f_j⁰ may not belong to ℱ_j(f_{jL}, f_{jU}). In [119] the least favorable pair in the above classes has been found, and the likelihood ratio for the least favorable pair is again essentially a censored version of a nominal ratio formed from the bounding functions. The bounded classes of pdfs defined by the band model in (4.12) can be viewed as arising naturally in situations where the pdfs under the hypotheses are estimated from training data, in which case the bands (f_{jL}, f_{jU}) are confidence bands. It should be clear that in the various uncertainty models considered above for the densities f of each independent component X_i of the observation sequence X, there was no necessary restriction that X_i be one-dimensional and that f be univariate. It is possible, for instance, to treat an observation sequence of n one-dimensional components as a single n-dimensional observation, and to characterize its multivariate density by one of the above classes. Essentially, this approach was taken in [120] and [121] by Kuznetsov, who developed results for the characteristics of the robust test for hypotheses described by the bounded classes of (4.12). Since the likelihood ratio in a threshold comparison test may be replaced by any function which yields the same critical region and threshold equality region, simplifications may be made to results of the type in (4.10).
In (4.10), for example, for a single-observation test with the threshold τ between bc′ and bc″, one gets an equivalent test if simply bΛ_0(x) is compared to τ. Note that bΛ_0(x) =



f_{1L}(x)/f_{0L}(x) when the ε-contamination class is viewed as a special case of the bounded class. In [120] and [121] Kuznetsov shows that the robust tests are obtained by threshold comparisons using one of the functions f_{1U}/f_{0U}, f_{1L}/f_{0L}, or f_{1L}/f_{0U}. As an interesting application, consider the detection problem (1.1) where θs = θ(s_1, s_2, ..., s_n) is not precisely known but is known to belong to a neighborhood of a nominal sequence θ_0 s⁰. Since s is in ℝⁿ, one might define the neighborhood as a spheroid of radius Δ centered on θ_0 s⁰, as was done in Section III. Let the N_i be iid zero-mean, unit-variance Gaussian random variables. A useful bounded class of density functions for X under H_1 is then defined in terms of the maximum-likelihood and minimum-likelihood estimates of θs. The robust detector is based on a combination of square-law and linear processing of X; details are given in [121]. The results in [120] and [121] on binary hypotheses have recently been extended to multiple-hypothesis testing problems in [122]. We have remarked that it is possible in general to solve for least favorable pairs of probability measures for other pairs of probability measure classes which are defined to contain measures bounded by 2-alternating capacities [73]. Specific examples of other pairs of classes for which explicit least favorable pairs are available are the p-point classes (discussed in Subsection II-C in defining spectral density classes; these become classes of pdfs when the power is fixed at unity) and the bounded p-point classes [54]. Let us also reiterate the interesting connection between the robust binary hypothesis testing problems for such classes of pdfs and the corresponding robust linear filtering problems with MSEs as performance criteria.
For many of the robust linear filtering problems with fixed-power spectral density uncertainty classes, the least favorable pairs of pdfs for the robust hypothesis testing problems defined on corresponding unit-power normalized spectral-density classes produce directly the least favorable spectra [55]. In general, one may interpret the least favorable pair of pdfs in ℱ_0 × ℱ_1 as being that pair which has the minimum "distance" between its components. Indeed, strong connections exist between distance minimization and least favorable pdf pairs [111], [119] under some general restrictions. It is this fact that results in the close relationship between the solutions for the least favorable pairs in the robust linear filtering and corresponding hypothesis testing problems. Similar considerations arise in robustness problems formulated in terms of Chernoff bounds [123], [124] and for point-process observations [125]. The problem of robust sequential hypothesis testing was also considered by Huber in his first paper on robust hypothesis testing [7]. More recently this work was extended by Schultheiss and Wolcin in [126], which also dealt with the ε-contamination model for noise pdf classes under the null and alternative hypotheses. Numerical performance results from simulation experiments and computations are also given in [126].


described by H_1 in (1.1b) may be considered to arise as a result of sampling a continuous-time additive mixture of a low-pass or baseband signal waveform θs(t) and a stationary noise process. If the noise bandwidth is large relative to that of the signal in such a situation, it becomes reasonable to assume a sampling rate which results in the noise components N_i, i = 1, 2, ..., n, under H_0 or H_1 being statistically independent random variables with some common univariate pdf f. Another common situation in which the above observation model is appropriate is that arising in the detection of a pulse train, as described in Section III. In the absence of precise knowledge about the noise pdf f, one can now attempt to obtain a robust detection scheme for a class of possible noise pdfs generated by a model such as the ε-contamination model used by Huber. But it should now be apparent that Huber's solution, which we discussed in Subsection IV-A above, does not directly apply to our known-signal-in-additive-noise detection problem. The least favorable pair of pdfs given by (4.9a) and (4.9b) was obtained under the assumption that the pdfs f_0 ∈ ℱ_0 and f_1 ∈ ℱ_1 are chosen independently; this means that the contaminating pdfs h_0 and h_1 in the ε-contamination models were not constrained to be related to each other in any particular way. In the known-signal detection problem we should require h_1 to be a translated or shifted version of h_0. On the other hand, Huber's approach, in which independent contaminating densities are allowed under the two hypotheses, can be used to obtain a conservative solution to the robust detection problem. One of the first attempts to adapt Huber's approach and results on robust hypothesis testing for signal-detection problems was reported in 1971 by Martin and Schwartz [30]. Martin and Schwartz were interested in the signal detection problem described by (1.1), in which the observations X_i under H_1 are independent but not necessarily identically distributed.
They first showed that Hubers result for the -contaminated classes extends directly to the time-varying problem, where for each component X , of X the nominal densities f:, f t and contamination degrees rOi,rl, may be different. For thedetection of a known signal in nearly Gaussian noise they set 6;. = 9, the zero-mean Gaussian density with unit-variance, and f $ ( x ) = + ( x - 6 5 , ) . They took E, = c to be sufficiently small for a given 6 so that the resultingr-contamination classes F and 3 are disjoint , , for each i, which yielded a symmetric censoring or limiter characteristic. The structure of the resulting correlatorlimiter detector is shown in Fig. 17. The quantities ai in Fig. 17 are related to the degree of limiting and can be solved for from an implicit equation.Although this detector requires knowledge of the value of 6 for implementation, a lower bound is shown in [30] for the detector power function when it is designed for a specific set of values of the parameters.
e,

6. Robust Detection of Known LowPass Signals

Consider again the signal detection problem described by the hypothesis testingsituation of (1.1). Here the signal sequence is known to within an overall amplitude factor, that is, the s,, i = 1,2,.. ., n are known. The observations X ,

Fig. 17. Correlator-limiter robust detector for a known signal i n <-contaminated Gaussian noise.

KASSAM AND POOR ROBUST TECHNIQUES FOR SIGNAL PROCESSING

45 7
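The correlator-limiter structure of Fig. 17 is easy to sketch numerically. The fragment below is an illustrative sketch only, not the detector of [30]: the limiting breakpoints a_i of Fig. 17 are replaced by a single fixed constant a rather than being solved from the implicit equation mentioned above, and the contaminated noise model is an arbitrary Gaussian mixture chosen for the example.

```python
import numpy as np

def correlator_limiter_statistic(x, s, a):
    """Censor each observation at +/- a (symmetric limiter characteristic),
    then correlate with the known signal sequence s."""
    return np.dot(s, np.clip(x, -a, a))

rng = np.random.default_rng(0)
n = 200
s = np.cos(0.3 * np.arange(n))       # known signal shape s_i (illustrative)
theta = 0.5                          # signal amplitude under H1 (assumed)
eps = 0.1                            # contamination fraction (assumed)
# epsilon-contaminated Gaussian noise: nominal N(0,1), contaminant N(0,9)
scale = np.where(rng.random(n) < eps, 3.0, 1.0)
noise = scale * rng.standard_normal(n)

t_h0 = correlator_limiter_statistic(noise, s, a=1.5)              # noise only
t_h1 = correlator_limiter_statistic(theta * s + noise, s, a=1.5)  # signal present
```

Because the limiter is monotone, s_i·[clip(θs_i + N_i) − clip(N_i)] ≥ 0 term by term, so for any noise realization the statistic under H₁ is at least its H₀ value; a threshold placed between the typical H₀ and H₁ levels completes the decision rule of Fig. 17.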

In addition to the fact that there is only one independent class of pdfs, the noise pdfs, in the known-signal detection problem, a further limitation of the above approach to obtaining a robust detector is that the signal amplitude θ needs to be known. In Huber's approach, the uncertainty classes of pdfs are usually defined as expansions of corresponding single nominal pdfs, so that the resulting robust test does not generally give a test which is uniformly robust in the strict minimax sense for, say, a range of nominal alternative hypotheses. As a specific example, consider the likelihood ratio of the robust test given by (4.10). Here the constants c′ and c″ depend on the nominal densities. Thus in [30], where these results were applied to the known-signal detection problem, the nominal alternative densities f_{1i} were defined for a particular signal strength θ. While for this problem it is possible to find a lower bound for the robust detector power function when it is designed for a particular θ, only for that θ is the detector implementing a "minimax" test for the detection problem. Finally, the robust tests and least favorable pairs were necessarily obtainable only when the uncertainty classes were not only disjoint but had a finite "separation"; thus the case of vanishing signal strength in weak-signal detection cannot be considered. These considerations point to the desirability of another formulation for robust hypothesis testing problems of signal detection. One such alternative approach which has been quite fruitful considers the asymptotic case of weak signals and large sample sizes (θ → 0, n → ∞). It is interesting to note that the basis for this alternative approach has been the theory of robust estimation of a location parameter, which will be discussed briefly in Section V below.

Asymptotically Robust Detection: For the known-signal detection problem of (1.1) consider the log-likelihood ratio defined by (1.3) and (1.4).
This generally depends on θ, but for the locally optimum (LO) detector which maximizes the slope of the power (detection probability) function at θ = 0, subject to a false-alarm probability bound, it is well known that the test statistic for given noise pdf f should be [127], [128]

T_LO(X) = Σ_{i=1}^n s_i [−f′(X_i)/f(X_i)].    (4.13)

This LO test statistic is a special case of the generalized correlator (GC) statistic

T_GC(X) = Σ_{i=1}^n a_i ℓ(X_i)    (4.14)

where a₁, ..., a_n is a correlation sequence and ℓ is a nonlinearity. The class of GC statistics with a_i = s_i is then a natural class of candidate test statistics to which attention may be restricted in obtaining a robust detector for known s_i but for noise pdf f not precisely known. For a class ℱ of noise pdfs one would like to be able to obtain the minimax robust detector from amongst the class of GC detectors (with a_i = s_i), with slope of the detector power function as the performance criterion. For fixed finite sample size n this is, unfortunately, extremely difficult in general. However, in the asymptotic case n → ∞ (and θ → 0), under mild regularity conditions yielding asymptotically normal distributions for the test statistics, the problem reduces to a consideration of a more tractable performance measure called the efficacy. The efficacy B of a detector based on a test statistic T(X) may be defined as [127]

B = lim_{n→∞} (1/n) { [∂E{T(X)|θ}/∂θ]_{θ=0} }² / var{T(X)|θ = 0}.    (4.15)

It also turns out that maximizing the efficacy with T(X) of the form

Σ_{i=1}^n ℓ(X_i; s_i)

obtained by dropping the undesirable θ-dependence in (1.3), leads one to the LO statistic of (4.13), because the slope of the power function can be related directly to the efficacy. Without the limit (n → ∞) in (4.15) the quantity is sometimes called the differential or incremental SNR. For the test statistic of (4.14) with a_i = s_i the efficacy becomes

B_GC = [lim_{n→∞} (1/n) Σ_{i=1}^n s_i²] · [∫ ℓ′(x) f(x) dx]² / ∫ ℓ²(x) f(x) dx    (4.16)

within mild regularity conditions on f and ℓ, and with the assumption that ℓ(X_i) has zero mean value under the noise-only hypothesis. We may also assume without loss of generality that

lim_{n→∞} (1/n) Σ_{i=1}^n s_i² = 1

that is, that the signal has unit average power when its amplitude is unity. A significant observation about the efficacy of (4.16) with this normalization is that the reciprocal of this normalized efficacy is exactly the asymptotic variance of an M-estimate for the signal amplitude θ of a constant signal (s_i = 1). An M-estimate θ̂ of θ is in general that quantity solving

Σ_{i=1}^n ℓ(X_i − θ̂) = 0.    (4.17)

(Note that θ̂ is the sample mean when ℓ(x) = x.) The asymptotic variance of an M-estimate using nonlinearity ℓ is the reciprocal of the normalized efficacy of (4.16) under some regularity conditions. A brief discussion of robust M-estimation of a location parameter when the pdf f of the additive noise is not precisely known is given in Section V. Here we shall note only that in 1964 Huber [6] found the least favorable pdf in the class of ε-contaminated noise pdfs for the robust M-estimation problem, and hence obtained the corresponding optimum M-estimate as the robust M-estimate. It should now be clear (since minimizing the asymptotic variance in M-estimation is equivalent to maximizing the efficacy for our detection problem) that the least favorable pdf found by Huber will also make its sequence of LO detectors (for n = 1, 2, ...) an asymptotically robust sequence of detectors, providing a saddlepoint value for the game in which the efficacy is the performance function. In fact, one can get a stronger result in which the false-alarm probability can simultaneously be bounded. These results for asymptotically robust detection were first extended from Huber's estimation results by Martin and Schwartz [30], and later were expanded upon by Kassam and Thomas [129].
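The relation between the normalized efficacy of (4.16) and the M-estimate of (4.17) can be checked numerically. The following is an illustrative sketch, not a computation from the references: the contaminated pdf, the soft-limiter breakpoint a = 1.14, and the sample sizes are all arbitrary assumptions. It evaluates [∫ℓ′f]²/∫ℓ²f by simple quadrature for the linear nonlinearity and for a soft limiter, and then confirms by Monte Carlo that n·var(θ̂) for the limiter M-estimate is close to the reciprocal of its normalized efficacy.

```python
import numpy as np

# epsilon-contaminated Gaussian noise pdf: f = (1-eps) N(0,1) + eps N(0,9)
eps = 0.1
x = np.linspace(-30.0, 30.0, 200001)
dx = x[1] - x[0]
phi = lambda t, s: np.exp(-0.5 * (t / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
f = (1 - eps) * phi(x, 1.0) + eps * phi(x, 3.0)

def efficacy(ell, ell_prime):
    """Normalized efficacy [int ell' f]^2 / [int ell^2 f], as in (4.16)."""
    num = (ell_prime(x) * f * dx).sum() ** 2
    den = ((ell(x) ** 2) * f * dx).sum()
    return num / den

a = 1.14  # soft-limiter breakpoint (an arbitrary illustrative choice)
eff_linear = efficacy(lambda t: t, lambda t: np.ones_like(t))
eff_limiter = efficacy(lambda t: np.clip(t, -a, a),
                       lambda t: (np.abs(t) < a).astype(float))

# Monte Carlo: theta_hat solves sum clip(X_i - theta, -a, a) = 0, as in (4.17);
# asymptotically n * var(theta_hat) ~ 1 / (normalized efficacy)
rng = np.random.default_rng(1)
n, trials = 400, 2000
est = np.empty(trials)
for k in range(trials):
    z = np.where(rng.random(n) < eps, 3.0, 1.0) * rng.standard_normal(n)
    th = 0.0
    for _ in range(60):                      # damped fixed-point iteration
        th += np.clip(z - th, -a, a).mean()
    est[k] = th
mc_nvar = n * est.var()
```

Under this contaminated pdf the limiter's normalized efficacy exceeds the linear correlator's (which is just the reciprocal of the noise variance, 1/1.8), the efficacy-based robustness argument of the text in miniature.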


PROCEEDINGS OF THE IEEE, VOL. 73, NO. 3, MARCH 1985
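The LO nonlinearity −f′(x)/f(x) of (4.13) reduces to familiar detector front ends for standard noise models. The sketch below is illustrative only (the pdfs are handled through their logarithms, up to additive constants, and differentiated numerically): it recovers the linear characteristic for Gaussian noise and the hard-limiter (sign) characteristic for double-exponential (Laplace) noise, for which the sign detector is locally optimum.

```python
import numpy as np

def lo_nonlinearity(logpdf, x, h=1e-5):
    """-f'(x)/f(x) = -(d/dx) log f(x), by central differences."""
    return -(logpdf(x + h) - logpdf(x - h)) / (2.0 * h)

gauss_logpdf = lambda t: -0.5 * t ** 2      # log of N(0,1), up to a constant
laplace_logpdf = lambda t: -np.abs(t)       # log of Laplace(1), up to a constant

pts = np.array([-2.0, -0.5, 0.5, 2.0])
g_lo = lo_nonlinearity(gauss_logpdf, pts)    # ~ x: linear correlator weighting
l_lo = lo_nonlinearity(laplace_logpdf, pts)  # ~ sgn(x): hard limiter
```

Intermediate pdfs (e.g., the ε-contaminated Gaussian) give nonlinearities between these two extremes, which is why limiter-type characteristics arise naturally in the robust solutions that follow.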

The general result in [30] and [129] may be summarized as follows. Let 𝒟 be the class of all detectors of asymptotic size (i.e., false-alarm probability) α for our hypothesis testing problem with f ∈ ℱ₀(g, ε) as defined in (4.1). Now g is assumed to be a symmetric nominal density function that is strongly unimodal (i.e., −log g is a convex function) and twice differentiable, and the contamination h is symmetric and bounded but otherwise arbitrary. Let B_D(θ|f) be the power function of a detector D with f ∈ ℱ₀(g, ε) the noise density function. Our false-alarm probability constraint is that for each D ∈ 𝒟

lim_{n→∞} B_D(0|f) ≤ α,    all f ∈ ℱ₀(g, ε).    (4.18)

The asymptotically most robust detector D_R ∈ 𝒟 is then defined as the locally optimum detector for the least favorable f_R ∈ ℱ₀(g, ε) such that

lim_{n→∞} B_{D_R}(θ|f) ≥ lim_{n→∞} B_{D_R}(θ|f_R),    all f ∈ ℱ₀(g, ε)    (4.19)

in addition to (4.18), and

lim_{n→∞} B_{D_R}(θ|f_R) ≥ lim_{n→∞} B_D(θ|f_R),    all D ∈ 𝒟.    (4.20)

The D_R satisfying (4.18)-(4.20) exists for α > α(g, ε), a lower bound depending on g and ε, and is the locally optimum detector of asymptotic size α for f_R ∈ ℱ₀(g, ε) given by Huber's exponential-tailed density

f_R(x) = (1 − ε)g(x),    |x| ≤ a
       = (1 − ε)g(a) exp[−b(|x| − a)],    |x| > a    (4.21)

where a, b satisfy the equation

∫_{−a}^{a} g(x) dx + 2g(a)/b = (1 − ε)^{−1}    (4.22)

and

b = −g′(a)/g(a).    (4.23)

The lower bound on α is given explicitly as a function of g and ε in [129].¹¹ For the unit-variance Gaussian nominal density this bound is no less than 0.158, which is obtained when ε → 0. Note that the robust detector is based on the test statistic T_GC of (4.14) with ℓ = ℓ_R = −f_R′/f_R, which is given by

ℓ_R(x) = −g′(x)/g(x),    |x| ≤ a
       = b sgn(x),    |x| > a.    (4.24)

The threshold for the robust detector can be set by considering the normalized test statistic n^{−1/2} T_GC(X) and basing the computation of false-alarm probability on the asymptotically normal distribution of this finite-variance statistic. The resulting generalized correlator detector may be described as a limiter-correlator detector, and this is one of the canonical structures of robust detection theory. Note that when g is the zero-mean, unit-variance Gaussian density we have −g′(x)/g(x) = x, and ℓ_R becomes the amplifier-limiter or soft-limiter nonlinearity. The structure of the limiter-correlator detector which is asymptotically robust for known-signal detection in ε-contaminated Gaussian noise is shown in Fig. 18.

¹¹More explicitly, each D ∈ 𝒟 is an infinite sequence (D₁, D₂, ...) of detectors, one for each sample size n. The functional dependence of the test statistics on X is the same for all members of the sequence. A method for relaxing this restriction is discussed in [130].

Fig. 18. Limiter-correlator robust detector for a weak signal in ε-contaminated Gaussian noise.

Note that this particular GC detector can be used in place of the linear correlator detector acting on the matched filter outputs in Fig. 9, especially when the matched filter output noise components can be modeled as being ε-contaminated Gaussian random variables with unknown contaminating pdfs. Further details of the above results are given in [30] and [129], including comparison with other detectors on the basis of asymptotic relative efficiency (ARE), and a discussion of the robustness property of the simple sign detector, which does an extreme form of limiting on the observations. A numerical study of the performance characteristics of various limiter-correlator detector nonlinearities for the additive known-signal detection problem has been made by Miller and Thomas in [118]. The noise density in [118] was taken to be the ε-contaminated Gaussian nominal with contaminating densities of the impulsive-noise type modeled by exponential (Laplace) and generalized Cauchy functions. The nonlinearities considered were various multilevel approximations of the canonical amplifier-limiter, including the hard-limiter and the noise-blanker. The performance was characterized in terms of asymptotic relative efficiencies with respect to the conventional linear-correlator detector. The growing interest in detection systems using quantized data led to a consideration in [130] of the above known-signal in additive-noise problem with the requirement that the detector characteristic ℓ in (4.14) be an m-level piecewise-constant quantizer characteristic. It has been shown in [130] that the most robust quantizer-detector in the sense of (4.18)-(4.20), for the ε-contaminated noise density class ℱ₀(g, ε) of (4.1) and for the detection problem (1.1), is again the locally optimum quantizer-detector for the least favorable density of (4.21).

Robust Detectors Based on M-Estimators: The detectors we have considered so far in this section have all had the general structure of Neyman-Pearson optimum detectors, which are based on a comparison of the likelihood ratio to a threshold. The test statistics of our detectors have been of the form

Σ_{i=1}^n s_i ℓ(X_i)

which is that of the statistic of (1.3). A different approach for robust detection of additive known signals was proposed by El-Sawy and VandeLinde in [131]. In their approach, the test statistic is simply an M-estimate of the signal strength parameter θ, and their robust detector test statistic is the robust M-estimate of θ for the same class of noise densities. To understand the motivation for this approach, its difference from the previous one, and the possible advantages of the resulting detectors, let us reexamine briefly the above results on asymptotically robust known-signal detection. It is clear that, as for any consistent test, the robust detector based on a test statistic of the form of (4.14) has a power function B_D(θ|f) which approaches unity as n approaches ∞ for each θ > 0, assuming that

lim_{n→∞} (1/n) Σ_{i=1}^n s_i²

approaches a positive value and a_i = s_i. Thus the practical interpretation of (4.19) is that for large enough n, for which use of the central limit theorem is reasonably well justified, the slope of the power function at θ = 0 for D_R may be considered to be minimized by f_R in ℱ₀(g, ε). In [131] it is pointed out that this condition does not guarantee that for each f ∈ ℱ₀(g, ε) we will have B_{D_R}(θ|f) > B_{D_R}(θ|f_R) in some interval (0, θ₀). A simple example of a density in ℱ₀(g, ε) for which B_{D_R}(θ|f) < B_{D_R}(θ|f_R) when θ ≠ 0 proves this. The lack of a strict inequality in (4.19), which allows this to happen, led to the alternative approach in [131]. For the hypothesis testing problem of (1.1), note that for a finite-power signal (for which

lim_{n→∞} (1/n) Σ_{i=1}^n s_i² = C,

a finite positive value) the signal energy as n → ∞ becomes infinite under the alternative hypothesis H₁. If the hypothesis testing problem statement is slightly modified by replacing the amplitude θ with v/n^{1/2} for some constant v, the total signal energy in any sample of size n remains finite, and the limiting energy is v², assuming C = 1 without loss of generality. In a practical sense, suppose an observation of length N is given for which the known signal sequence (s₁, s₂, ..., s_N) has amplitude θ, and for which

(1/N) Σ_{i=1}^N s_i² = C_N.

For N large enough, the results to follow can be applied with v = (N C_N)^{1/2} θ. The minimax robust detector for this known-signal detection problem was obtained from amongst a class of M-detectors in [131]. This is a class of detectors for which the test statistic is an M-estimate (as in (4.17)) of the signal strength θ. Let ℒ be the class of M-estimator functions L which satisfy convexity, symmetry, monotonicity, and differentiability requirements, in addition to mild requirements on the moments of the associated random variables when the noise density function f is a member of a general class ℱ of symmetric densities. The M-estimator θ̂ of θ, based on n observations and a function L, is the value of θ minimizing

Σ_{i=1}^n L(X_i − θs_i)

so that

θ̂ = arg min_θ Σ_{i=1}^n L(X_i − θs_i).    (4.25)

This is a generalization for nonconstant signals of Huber's definition. Note that (4.17) for s_i = 1, all i, is obtained from this with ℓ(x) = dL(x)/dx. Let the asymptotic detection power as a function of v for an M-detector based on a function L be denoted as B_L(v|f). This power function obviously depends on the noise density function f ∈ ℱ. The analysis in [131] establishes the following results. Let U²(f, L) be the asymptotic variance of n^{1/2}(θ̂ − θ), which is asymptotically normally distributed with mean zero for f ∈ ℱ and L ∈ ℒ. Suppose there exists a density f_R ∈ ℱ such that L_R = −log f_R is in ℒ, and U²(f, L_R) ≤ U²(f_R, L_R) for all f ∈ ℱ. Let the M-detector based on L_R have a threshold γ. Then for v ≥ γ, we have

B_{L_R}(v|f) ≥ B_{L_R}(v|f_R),    all f ∈ ℱ    (4.26)

and

B_{L_R}(v|f_R) ≥ B_L(v|f_R),    all L ∈ ℒ.    (4.27)

In fact, since the M-detector based on −log f_R is asymptotically equivalent in performance to the detector based on the likelihood ratio for f_R when f_R is the noise density function, the maximum over ℒ in (4.27) can be replaced by the supremum over all detectors with size equal to that of the M-detector based on L_R. The main requirement in the above is that v not be less than γ. This implies that for a given v, the design false-alarm probability cannot be too small. The requirement v ≥ γ can be met by making the sample size n sufficiently large in any given situation. Note, however, that the saddlepoint conditions (4.26) and (4.27) do not hold together, for a given sample size and false-alarm probability constraint, in an interval around v = 0. It has been shown in [131] that for such cases the robustness criterion based on asymptotic slope of the power function (as in (4.19) and (4.20)) also makes the M-detector based on L_R the most robust detector, but again subject to the same restrictions on the minimum value for the false-alarm probability. If the saddlepoint solution (f_R, L_R) exists it may be obtained by first minimizing the Fisher information function for location, I(f), over all f ∈ ℱ. The Fisher information function for location is the function

I(f) = ∫_{−∞}^{∞} [f′(x)]²/f(x) dx.    (4.28)

It is no accident that I(f) is the maximum value of the normalized efficacy in (4.16), obtained for ℓ(x) = −f′(x)/f(x), the LO nonlinearity. The minimizing density is f_R, and the function L_R = −log f_R in ℒ gives the saddlepoint pair (f_R, L_R). In [131] the family of symmetric density functions with a known probability p assigned to an interval (−a, a) was explicitly considered as an example. For this p-point class of densities the least favorable density f_R was again shown to have exponential behavior outside (−a, a), and the robust M-detector characteristic L_R had derivative ℓ_R which was a constant outside (−a, a). Some numerical comparisons of asymptotic performance and finite-sample simulation results are also given in [131] for a particular example. The significance of the function dL(x)/dx = ℓ(x) in M-detection is that under appropriate regularity conditions,



the test statistic θ̂ must satisfy

Σ_{i=1}^n s_i ℓ(X_i − θ̂s_i) = 0.    (4.29)

(This is the same as (4.17) for s_i = 1, all i.) The variance of the statistic θ̂ is the reciprocal of the efficacy of the generalized correlator detector based on the test statistic of (4.14), with a_i = s_i. Note that the mean of θ̂ is θ. Since ℓ is a monotonic increasing function of its argument for L in the class ℒ, we have a simple way of implementing such a detector when the s_i all have the same sign. In this case, the sign of

Σ_{i=1}^n s_i ℓ(X_i − γs_i)

directly indicates if θ̂ is above or below the threshold γ.

The robust detection of known signals using an M-estimate has also been extended in [132] to the sequential binary signaling problem. Here the two hypotheses H₀ and H₁ are defined by

H_j:  X_i = θ_j s_i + N_i,    i = 1, 2, ...,    j = 0, 1    (4.30)

where the density function f of the noise components N_i is a member of some class ℱ of symmetric densities. Under the same mild restrictions on the class ℒ of allowable M-detector characteristics L as in the nonsequential case, a robustness property is established in [132] for the sequential M-detector (MS-detector) based on the sequence of robust M-estimates θ̂ defined by the same characteristic L_R as in the robust nonsequential case. Thus when f = f_R for this robust scheme, the probabilities of error are upper bounds on the probabilities of error for arbitrary f ∈ ℱ, and the same holds for normalized expected sample sizes, in the limiting case when |θ₁ − θ₀| → 0 and sample sizes are large, for which Gaussian approximations to distributions for θ̂ can be used. In addition, it has been shown that the pair (f_R, L_R) also forms a saddlepoint for performance measured as a risk function which is a linear combination of the error probabilities and the normalized expected sample sizes, for one set of weighting or cost coefficients. The normalization of the expected sample sizes is required to obtain well-defined quantities, since the results are valid asymptotically when actual sample sizes become infinitely large.

An alternative scheme for robust sequential testing in the situation described by (4.30) was also considered in [132]. This scheme is the robust stochastic approximation sequential (SAS) receiver. Again the test statistic is directly based on the sequence of robust fixed-sample stochastic approximation estimates of the location parameter θ = θ₀ or θ₁; this estimation scheme will be discussed briefly in Section V. It has been shown that the robust MS- and SAS-detectors have identical asymptotic characteristics. The SAS-detector does not store past observations, as it uses a recursive algorithm for updating the estimate of θ as new observations arrive. On the other hand, the MS-detector converges to its asymptotic performance faster and does not need an initial estimate to obtain θ̂.

Another class of robust estimates of mathematical statistics is the class of L-estimates, which are estimates formed as linear combinations of order statistics. One example is the median, which also happens to be an M-estimate. The α-trimmed mean is another example of an L-estimate and will be discussed in Section V. The robustness of these L-estimates in estimation has prompted investigation of their use in signal detection; notably, Trunk and George [117], [133], [134] have investigated the use of some simple L-estimates in signal detection problems. The results we have discussed in this subsection show quite clearly the central role in robust signal detection of robust estimation theory, particularly of the original results of Huber, in addition to that of Huber's results on robust hypothesis testing. A different formulation of the additive-signal robust detection problem in bounded-amplitude noise was considered by Morris [135]-[137]; this work is related to the early work of Root [9] mentioned in the Introduction. We now go on to consider, in somewhat less detail, the other two canonical signal detection problems of random signals and of bandpass signals in additive noise.

C. Robust Detection of Random Signals

In many applications involving the detection of random signals the observations are obtained simultaneously from a number of sensors forming an array. The detection problem then often becomes that of detecting a random signal common to each sensor in the presence of noise processes uncorrelated with each other and with the signal. An example of such an application is a hydrophone array in a passive sonar system used to detect the presence of, and to locate, sources of random signals. For each search direction, relative time delays are imparted to the sensor outputs to equalize the propagation delays to each sensor for a potential signal from that direction. This is followed by a detector for a common random signal. A special case of interest is that in which the sensor array is composed of two elements. In this situation, the sampled observation vectors X₁ = (X₁₁, X₁₂, ..., X₁ₙ) and X₂ = (X₂₁, X₂₂, ..., X₂ₙ) can be described by

X_ji = θS_i + N_ji,    i = 1, 2, ..., n,    j = 1, 2    (4.31)

where the signal amplitude θ is zero under the noise-only null hypothesis and θ ≠ 0 under the alternative hypothesis. We assume here that the N_j = (N_j1, N_j2, ..., N_jn) are independent sequences of zero-mean iid noise components with common pdf f, and S = (S₁, S₂, ..., S_n) is an independent signal sequence of iid zero-mean components, whose variance may be taken to be unity without loss of generality. The presence of a signal in the above model (θ ≠ 0) causes the output pairs of observations (X₁ᵢ, X₂ᵢ) to be positively correlated, in addition to increasing the power level received at each sensor. The increase in the correlation value from zero is in particular due to the presence of the common random signal. Thus it is quite reasonable to restrict attention to the class of generalized cross-correlation (GCC) detectors based on test statistics of the type

T_GCC(X₁, X₂) = Σ_{i=1}^n ℓ(X₁ᵢ) ℓ(X₂ᵢ)    (4.32)

where ℓ is the detector weighting function. For detection of weak signals (θ² → 0) it can be shown easily that detection efficacy is maximized with ℓ(x) = −f′(x)/f(x), which is also the LO nonlinearity for known-signal detection. In one of the earliest published studies of robust detection of random signals, Wolff and Gastwirth [29] examined several specific simple nonlinearities ℓ for robustness of



performance when f is not known. A finite class of pdfs comprised of the Gaussian, the logistic, and the pdf for a particular t-distribution was considered in [29]. The nonlinearities ℓ considered in [29] were the simple three- and four-level symmetric quantizers and the amplifier-limiter or soft-limiter, a continuous function which is linear in an interval including the origin but constant outside that interval. The criterion of performance used in [29] was effectively an asymptotic relative efficiency: it was the ratio of the efficacy for noise density f obtained with weighting function ℓ to the optimum efficacy for noise density f. In the definition of efficacy for this problem, θ is now the signal strength parameter.

In [138] the detection of a random signal common to an array of receivers was considered as an extension of the two-input case. Again, no attempt was made to include a bound on the false-alarm probability as part of the performance criterion, so that the assumption was that detector thresholds could be adapted to get the correct false-alarm probability for any noise density. The performance criterion was the detection efficacy, which as we have remarked before for the known-signal case is directly related to the slope of the power function at θ = 0. For the correlator-array structure (generalization of (4.32) by a second sum over all pairs of sensors) an interesting saddlepoint robustness result was shown. This was that for the ε-contaminated class ℱ₀(g, ε) describing the independent noise component at each receiver across the array, with the same restrictions on it as in the known-signal-in-additive-noise case, the minimax robust cross-correlator nonlinearity is exactly the same as ℓ_R of (4.24) for the known-signal case. Of course, the result here was only possible in terms of the weaker efficacy criterion above. Fig. 19 shows the structure of a two-element random-signal detector using an amplifier-limiter GCC nonlinearity ℓ.

Fig. 19. Generalized cross-correlation robust detector for a weak random signal.

In [138] other variations of the ε-contamination class for this multi-input problem were also considered. Modifications of locally optimum detectors (rather than cross-correlators) for common random-signal robust detection were also investigated, although explicit minimax robustness results could be deduced only for the special case of a contaminated double-exponential noise density, for which a hard-limiter based polarity-coincidence array detector [139] is most robust. The polarity coincidence array detector is based on use of the two-level sign function for ℓ, and is well known as a nonparametric detector with a fixed false-alarm probability for zero-median noise pdfs.

The class of detectors for a single-input additive random signal contains no counterpart of the cross-correlator structure of the multi-input problem. Development of robust detectors for this problem has paralleled that for the single-input known-signal problem. The first instance of this is due to Martin and McGath [140], in which the optimum quadratic detector for Gaussian statistics was modified, from intuitive considerations, into the limiter-quadratic detector. Numerical asymptotic relative efficiency computations for Gaussian-mixture noise pdfs (ε-contaminated nominal Gaussian with larger-variance Gaussian contaminating density) and finite-sample detection power and false-alarm probability computations verify the expected robustness of the limiter structure. The subsequent results in [138] extended such structures to the multi-input or array case.

Explicit minimax results for the single-input additive random-signal detection problem paralleling those for known-signal detection have been obtained more recently in [141]. Two statistical models for this problem have been investigated in [141]. In one model, the alternative hypothesis of signal presence is defined to produce additive signal and noise components in the observations (i.e., (4.31) for j = 1). In the second, scale-change, model the alternative hypothesis is defined by the condition that the pdf of the iid observation components X_i is f(x/σ)/σ, where f is the null-hypothesis observation (noise) pdf. It had been shown earlier [142] that for the additive noise model the locally optimum detector is based on the generalized energy (GE) detector test statistic

T_GE(X) = Σ_{i=1}^n f″(X_i)/f(X_i).    (4.33)

The LO statistic for the scale-change model is also given in [142]. From knowledge of the previous results on known-signal robust detection one might conjecture that a minimax result should be obtainable for the robust detector formed by introducing hard-limiting beyond some argument value a into the nonlinearity f″/f of (4.33). Indeed, consider the exponential-tails least favorable density of (4.21), with a, b defined as in (4.22) and a modified (4.23), namely

g″(a)/g(a) = b².    (4.34)

For this noise density the nonlinearity f_R″(x)/f_R(x) becomes

f_R″(x)/f_R(x) = g″(x)/g(x),    |x| ≤ a
              = b²,    |x| > a.    (4.35)

Consider the ε-contamination model ℱ₀(g, ε) for the noise density f, as described in our summary of known-signal results. In addition to requiring a further smoothness property for g (since here g″ is involved), a more restrictive


further condition on the allowable contamination h is imposed. This is that allowable contamination densities are zero in (−a, a). With these conditions, results exactly paralleling those for asymptotically robust detection of known signals in additive noise are obtained in [141]. An interesting feature of the solution, arising from the restriction on allowable h, is that (4.19) holds with equality over the allowable f, and the result is valid for all values of the size α. Numerical asymptotic relative efficiency comparisons for general limiter nonlinearities are also given in [141]. A very similar result is established in [141] for the scale-change model. Here again, for a similarly restricted version of the ε-contamination class, the density of (4.21) is again shown to be least favorable. A noteworthy point about this solution is that, as in the case of asymptotically robust known-signal detection, this scale-change robust solution is directly related to Huber's results on robust M-estimation of a scale parameter [6]. Thus we see once again a close relationship between robust estimation and asymptotically robust detection problems.
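For the Gaussian nominal, the exponential-tailed least favorable density of (4.21) and its defining conditions (4.22) and (4.23) are simple to evaluate. The sketch below is an illustrative construction (the grid, interval, and bisection depth are arbitrary choices): for g the unit-variance Gaussian, (4.23) gives b = a, the resulting single equation (4.22) is solved by bisection, and a quadrature check confirms that f_R integrates to one, which is exactly the mass balance that (4.22) enforces.

```python
import math

SQRT2PI = math.sqrt(2.0 * math.pi)
g = lambda t: math.exp(-0.5 * t * t) / SQRT2PI               # N(0,1) nominal
Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))   # its cdf

def huber_breakpoint(eps, lo=1e-6, hi=10.0):
    """Solve (4.22) for a, with b = a by (4.23) for the Gaussian nominal:
    [Phi(a) - Phi(-a)] + 2 g(a)/a = 1/(1 - eps)."""
    h = lambda t: (2.0 * Phi(t) - 1.0) + 2.0 * g(t) / t - 1.0 / (1.0 - eps)
    for _ in range(200):          # h decreases from +inf to a negative limit
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f_R(x, eps, a):
    """Least favorable density (4.21) for the Gaussian nominal, with b = a."""
    if abs(x) <= a:
        return (1.0 - eps) * g(x)
    return (1.0 - eps) * g(a) * math.exp(-a * (abs(x) - a))

eps = 0.1
a = huber_breakpoint(eps)
# quadrature check that f_R is a proper density
m = 200000
dx = 40.0 / m
total = sum(f_R(-20.0 + k * dx, eps, a) for k in range(m + 1)) * dx
```

The corresponding robust correlator nonlinearity ℓ_R of (4.24) is then simply x on (−a, a) and a·sgn(x) outside, i.e., the soft limiter of Fig. 18.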
D. Robust Detection of Bandpass Signals in Bandpass Noise

Bandpass signals are commonly encountered in applications such as radar and communication systems, and techniques for their detection in bandpass noise with imprecisely known statistical characterizations are therefore of practical interest. Let us first consider briefly one bandpass known-signal detection problem for which an asymptotically robust detector may be defined using ideas very similar to those used for low-pass known-signal and completely random-signal robust detection problems. An observed continuous-time waveform is now described by

X(t) = θ v(t) cos[ω₀t + φ(t)] + N(t)    (4.36)

where the low-pass signal amplitude and phase components v(t) and φ(t), respectively, and the frequency ω₀ are known, the overall signal amplitude θ being either 0 (noise only) or having some positive value (alternative hypothesis). The bandpass noise N(t) may also be expressed in terms of its in-phase and quadrature components N_I(t) and N_Q(t). For a detector operating on sampled values of the in-phase and quadrature components of the observation X(t), the input data may then be represented as a vector X̃ = X_I + jX_Q, where the components X_Ii and X_Qi, i = 1, 2, ..., n, of X_I and X_Q, respectively, are

X_Ii = θ s_Ii + N_Ii    (4.37a)
X_Qi = θ s_Qi + N_Qi.    (4.37b)

Here the s_Ii, s_Qi, N_Ii, and N_Qi are samples of s_I(t) = v(t) cos φ(t), s_Q(t) = −v(t) sin φ(t), N_I(t), and N_Q(t), respectively. We make the assumption that the X̃_i = X_Ii + jX_Qi are independent for i = 1, 2, ..., n, implying a restriction on the sampling rate. If the noise components N_Ii and N_Qi are restricted to be independent, then this problem is essentially the same as the low-pass known-signal detection problem. A more general assumption is that the joint pdf f_IQ of N_Ii and N_Qi has circular symmetry, so that

f_IQ(u, v) = c(√(u² + v²)).    (4.38)

For Gaussian bandpass noise both circular symmetry and independence of the in-phase and quadrature noise components are obtained. Under the circular symmetry assumption the LO detector uses a generalized narrow-band correlator (GNC) test statistic

T_GNC(X̃) = Σ_{i=1}^n ℓ'(R_i) Re{S̃_i X̃_i*}    (4.39)

where the R_i = |X̃_i| are the observation envelopes, with ℓ' the LO envelope weighting function [143]

ℓ'(r) = −c'(r)/(r c(r))    (4.40)

and where S̃_i = s_Ii + j s_Qi. Note that f(r) = 2πr c(r) is the pdf of the R_i. For Gaussian noise, f is the Rayleigh pdf and the LO function of (4.40) is a constant, resulting in the linear narrow-band correlator (LNC) detector. In general, if the pdf f is some known function 2πr g(r), the LO detector can be obtained. In [144] the ε-contamination model for f was considered, with conditions on the nominal function g similar to those for known-signal asymptotically robust detection. As in the case of random-signal detection, however, the class of contaminating pdfs 2πr h(r) was restricted to produce tail contamination only; that is, the h were zero in some interval at the origin. With these restrictions, the asymptotically robust detector was shown in [144] to be a GNC detector which is LO for a least favorable pdf f_R having an exponentially decaying tail. The weighting function ℓ' = ℓ'_R in the test statistic T_GNC(X̃) of this robust detector is

ℓ'_R(r) = −g'(r)/(r g(r)),   r < a
        = −g'(a)/(r g(a)),   r ≥ a    (4.41)

for a depending on ε and g. Notice that the function r ℓ'_R(r) is a constant for r ≥ a; this function may be interpreted as providing the weighting applied to the hard-limiter narrow-band correlator (HNC) terms Re{S̃_i(X̃_i*/R_i)}, the X̃_i/R_i having unit amplitude. For a nominal Gaussian case, the function −g'(r)/g(r) is linear in r. Fig. 20 shows the structure of the asymptotically robust detector for a known bandpass signal under the above assumptions.

Fig. 20. Generalized narrow-band correlator robust detector for a weak bandpass signal.

Note that the HNC detector is a special case of this structure and may be viewed as the robust detector for a nominal g function which is exponential, or as an extreme case which actually possesses a constant false-alarm probability and functions as a nonparametric detector [145]. Further interesting properties of the robust detector employing ℓ'_R of (4.41) are discussed in [144].
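Assuming the piecewise weighting of (4.41) with a Gaussian nominal, for which −g'(r)/(r g(r)) reduces to the constant 1/σ², the robust GNC statistic can be sketched as below; sigma and the breakpoint a are illustrative parameters, not values taken from [144].

```python
def gnc_statistic(x, s, sigma=1.0, a=1.5):
    """Robust GNC statistic: sum of w(R_i) * Re{ S_i * conj(X_i) }.

    x, s: complex baseband observation and signal samples.
    For a Gaussian nominal the LO envelope weight -g'(r)/(r g(r)) is the
    constant 1/sigma^2; the robust version keeps that weight below the
    breakpoint a and makes r*w(r) constant above it, so large-envelope
    samples are effectively hard-limited (HNC behavior).
    """
    def weight(r):
        if r < a:
            return 1.0 / sigma**2
        return a / (sigma**2 * r)  # r * weight(r) == a / sigma^2, a constant
    return sum(weight(abs(xi)) * (si * xi.conjugate()).real
               for xi, si in zip(x, s))
```

A sample with a very large envelope then contributes at most (a/σ²)|S̃_i| in magnitude, whereas the LNC contribution would grow without bound.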

KASSAM AND POOR: ROBUST TECHNIQUES FOR SIGNAL PROCESSING

So far we have been considering the case of a coherent bandpass signal. In the incoherent case there is an additional phase term Ψ, uniformly distributed on [0, 2π), so that the signal is v(t) cos[ω₀t + φ(t) + Ψ]. The asymptotically optimum detector test statistic now becomes

T(X̃) = | Σ_{i=1}^n ℓ'(R_i) S̃_i* X̃_i |²    (4.42)

where ℓ' is the optimum nonlinearity of (4.40). It turns out that the efficacy for such a square-law quadrature GNC detector is directly related to that of the GNC detector for coherent signals, and thus ℓ' = ℓ'_R of (4.41) also results in an asymptotically robust detector in the incoherent case. In [146] El-Sawy considered a special case of the incoherent signal detection problem in which v(t), the low-pass signal amplitude, was a constant, and φ(t), the low-pass signal phase term, was −π/2 (or, equivalently, zero). The in-phase and quadrature noise samples were assumed to be independent, and a robust detector was obtained which used the squares of robust M-detector outputs for the separate in-phase and quadrature observations. Specific results were obtained for the p-point classes of univariate noise pdfs. A more general problem involving signals with random parameters has been considered recently by Kelly in [147].

Another important variant of the observation model of (4.36) is obtained by allowing a random signal amplitude θ = A in addition to the random phase Ψ. The usual assumption for the signal is that A and Ψ are independent, with A being a Rayleigh random variable and Ψ being uniformly distributed on [0, 2π). In [148] the performance characteristics of a particular modification of the optimum detector for Gaussian noise are considered. The modification consists of replacing the squarers of the optimum detector with limiter-squarers for robust performance. Both the single bandpass-pulse detection situation and that of multiple independent bandpass pulses are considered. In the latter case, a binary integration or double-threshold detector is considered. The structure of this detector is shown in Fig. 21.

Fig. 21. Robust detector for independent random amplitude and phase bandpass pulses in additive noise.

The numerically computed performance characteristics indicate that, for the independent ε-contaminated in-phase and quadrature bandpass matched-filter output noise components that were assumed, the limiter-squarer structure is very effective in guarding against drastic performance deterioration in heavy-tailed non-Gaussian contamination. The structure considered in [148] was suggested by the many formal results which we have discussed that show the robustness of limiter-type structures in different situations. In fact, the problem considered in [148] may be regarded as a bandpass version of a random-signal detection problem, for which robust detectors have been considered in [140] and [141]. We should note finally that Martin and Schwartz also considered in their study of robust detection [30] the detection of multiple independent pulses. The robust structure they suggested used a limiter function on the sampled in-phase and quadrature observations inside a digital or discrete-time implementation of the quadrature single-pulse matched filters, followed by a square-law envelope detector.

We have described in these last three subsections the main results on asymptotically minimax robust detection for three canonical signal detection situations. A general theory for robust detectors in the asymptotic case (vanishing signal strengths, infinitely large sample sizes) is discussed by Wolff and Sullivan in [149]. Here asymptotic normality of statistics of the form of (4.14) is studied for four detection problems; namely, known signal in additive noise, two random-signal problems (one with a scale change in the noise density, another with an additive random signal, under the alternative hypothesis), and envelope detection of a narrow-band signal. In addition, the general characteristics of the robust solutions, which are not restricted only to the univariate case, are considered, and a few explicit uncertainty models and robust detector structures are discussed. This work essentially considers only a performance measure related to the efficacy in the known-signal case, and defines a general notion of Fisher's information; simultaneous consideration of a false-alarm probability constraint is not attempted in [149].

E. Extensions and Other Results

While we have surveyed the basic results of robust hypothesis testing and robust signal detection in the above subsections, several further results in this area deserve some comment. We have seen that explicit minimax robust structures can be derived under some rather specific and sometimes restrictive assumptions. In this subsection we will also mention some work that has been directed at easing some of these constraining assumptions.

Signal Uncertainty: All the results that we have discussed in this section were obtained for robust detection when only the noise pdfs are not precisely known. Consider, for example, the known-signal detection problem of Subsection IV-B. In our discussion of the asymptotically robust detector of the generalized correlator type with test statistic of the form (4.14), the choice a_i = s_i for the coefficients in the test statistic was the obvious one for known-signal detection. But it is also possible to consider here uncertainty about the exact values of the signal components s_i. Notice that this problem was considered in Section III for robust matched filters and SNR performance; here, however, the GC detector is not necessarily linear and we have been concerned with weak-signal detection performance. In [150] Kuznetsov considered this signal uncertainty problem, using the differential SNR (see the discussion following (4.15)) as a performance measure. For a known noise pdf and an ε-contamination signal uncertainty class, the result is quite similar to that for the robust linear matched filter of Section III, because the differential SNR is after all a performance measure derived from the SNR. The random-signal detection problem with signal covariance matrix uncertainty is also considered in [150].

PROCEEDINGS OF THE IEEE, VOL. 73, NO. 3, MARCH 1985

For joint uncertainty in signal characteristics and noise pdfs, solutions for the minimax robust structures are difficult to obtain in the general case. The special case where the noise is Gaussian with uncertain covariance matrices for the signal and noise has been treated in [150], with the detector restricted to employ a combination of linear and quadratic statistics. A similar problem involving deterministic signals and linear detectors has been considered by Kuznetsov in [99], as mentioned in Section III.

Serial Dependence and Asymmetry: Although a consideration of the asymptotic performance of detectors allows appealing minimax robust detector structures to be derived, a major limitation of the schemes we have described is that the asymptotic robustness property can be attributed to them only under the assumption of independence of the detector input samples. This assumption is generally difficult to get around, serial data dependence introducing considerable complications in the analysis. One of the problems, of course, is the definition of appropriate statistical models and uncertainty classes. Recently, however, some progress has been made in the specification of asymptotically minimax robust detectors for serially dependent data samples. In [151] Poor considers the case of weak serial dependence of data in a constant-signal M-detection problem. The dependence structure considered is that obtained from a moving-average model for the noise components. In particular, the noise model

N_i = Y_i + ρ Y_{i−1}    (4.43)

is considered, where the {Y_i} are iid with uncertain marginal pdfs. For ρ = 0 the approaches described previously are applicable; in [151] the case of weak nonzero correlation is considered. For the ε-contaminated pdf classes, it is shown there that the robust M-detector nonlinearity is the limiter function which is robust for ρ = 0, corrected by a linear term. Thus for the nominal Gaussian case a nonlinearity ℓ_R of the form illustrated in Fig. 22 is suggested; more detailed considerations modify this so that the nonlinearity remains bounded.

Fig. 22. M-detector nonlinearity for robust detection of a signal in ε-contaminated, weakly correlated, Gaussian noise.

These results are similar to those derived earlier by Portnoy for robust estimation with dependent data [152], [153]. More recently, Moustakides and Thomas [154] have considered a less structured dependence assumption for the known-signal detection problem. The additive noise sequence was assumed here to be φ-mixing, which includes the case of stationary Markov noise sequences. For the ε-contamination model for the univariate noise pdf, with attention confined to the class of generalized correlator detectors, and with efficacy as the performance criterion and an explicit requirement of false-alarm probability control, the form of the robust detector nonlinearity was derived. The very interesting conclusion was that, subject to some regularity conditions, the robust detector nonlinearity is a null-zone modification of the independent-data robust detector nonlinearity for the same class of univariate noise pdfs. For the nominal Gaussian noise pdf the form of this nonlinearity is shown in Fig. 23.

Fig. 23. Generalized correlator detector nonlinearity for robust detection of a signal in ε-contaminated correlated Gaussian noise.

It is interesting to note that a simple three-level nonlinearity which approximates this characteristic can be used to provide nonparametric performance for symmetric noise pdfs [155]. The result of Moustakides and Thomas represents a major breakthrough in the extension of asymptotically robust correlator-detector structures for dependent data, and similar results can now be sought for random- and bandpass-signal detection problems. Another aspect of robust detection in dependent data has been considered recently by Martin [156], in which autoregressive dependence and a regression signal model which includes the known-signal and random-phase bandpass-signal cases are assumed. He suggests that, in this situation, robust M-detection is best accomplished by first prewhitening the noise and then applying an M-estimate to obtain the test statistic.

The asymptotic theory of robust detection, fundamentally related as it is to Huber's results on robust parameter estimation, is limited by the same factors that have constrained the applicability of robust parameter estimation theory. In addition to serial independence of data, one other assumption which has been generally required, at least in the widely used ε-contamination model for uncertain pdfs, is symmetry of the pdfs of the noise. In estimation theory this results in unbiased estimates for which the variances can be written down as second moments. Recently, an attempt has been made in [157] to apply to robust detection of known signals ideas formulated for robust M-estimation for classes of density functions allowing tail asymmetry. Specifically, [157] considers both the generalized correlator structure of (4.14) and the M-estimate structure (4.29) for detection test statistics, and develops the form of the robust detection function ℓ_R in (4.14) and (4.29) for classes of ε-contaminated nominal densities with arbitrary behaviors outside a central interval.
The development is based on the work by Collins on the corresponding M-estimation problem [158], and shows the robustness of detector functions which redescend to zero and remain zero outside the interval of symmetry.

Two general limitations of minimax robust detection theory should by now be clear. These are that a considerable part of the theory that has been developed is asymptotic theory, and that it is currently unable to deal directly with continuous-time observation processes. It should be kept in mind that here our concern is with uncertainty in the noise pdfs, and not just with the power-spectral-density uncertainties and linear schemes which we discussed in Section III. Care has to be taken in applying asymptotic theory in practical situations involving a finite number of observations, since in some cases predicted asymptotic performance is approached rather slowly. The Gaussian approximation for test-statistic distributions in setting threshold values should be applied with care, even if the structure of the robust test statistic appears to have strong justification. In addition, another aspect of asymptotic formulations is that they are based on the assumption that signal strengths are approaching zero. Thus the effect of a nonzero signal and a finite sample size together could result in deviations in performance from theoretical predictions, even for large samples.* Since closed-form analytical results are rarely feasible, about the only alternative left is numerical computation and simulation to verify the actual performance of such robust schemes in general. Continuous-time results are very difficult to come by because of convergence issues and also because of the difficulty in specifying models to define simple classes of allowable random processes which are physically meaningful. Of course, if only second-moment characteristics are modeled and exact probability distribution functions are irrelevant, as in maximizing SNR at the output of a linear detector, robust filters such as the robust matched filter can be obtained. Although [160] introduces classes of mixture or contaminated random processes, there has been no application made of this in signal detection with continuous-time observations.
However, Kelly and Root [161], [162] have recently combined some of Root's original stability ideas with a minimax-type design philosophy to develop robust schemes for continuous-time processes.

Adaptive and Nonparametric Detectors: In a general sense, a detector can be said to be robust if it has good (close-to-optimal) detection performance under nominal conditions and if it also maintains an acceptable level of performance when the noise statistics deviate, within some allowable class, from the nominal. Our focus in this paper has been on fixed detectors designed to provide minimax detection schemes, which generally possess the above two characteristics. Another approach which can be taken when the noise environment's statistical characterization is imprecisely known, or is largely unknown, is to use adaptive procedures. In most adaptive procedures one generally begins with a specific test-statistic structure in which some parameters are free to be set and updated as functions of previous inputs, which might include separate training data. In addition, the threshold is also usually free to be set adaptively. We have mentioned earlier that even for fixed minimax robust detectors the thresholds may have to be set adaptively to maintain false-alarm probability requirements whenever the performance criterion does not explicitly include such requirements. Since our focus here has been on minimax robust fixed-test-statistic structures, we shall
*For a recent contribution on nonlocal asymptotic robustness, see Moustakides [159].

not try to give here an exhaustive survey of adaptive robust detectors, but will mention only some of the main recent contributions. One general structure in the known-signal detection problem is obtained by requiring the function ℓ in the generalized correlator test statistic of (4.14) to be an m-level quantizer characteristic. Such a quantizer-correlator detector may be viewed as partitioning the observations X_i into m subsets or intervals, to each of which a distinct level is assigned. For a given noise pdf the asymptotically optimum quantizer characteristic can be found for this and similar detection problems [163]. If, however, the noise pdf is not known, one may use estimated values of these optimum parameters. Alternatively, the quantizer breakpoints may be required to fall at the quantiles of the noise distribution, which can be estimated and updated. Adaptive m-interval partition detection schemes based on this idea have been described by Kurz in [164] and, for sequential detection, by Dwyer in [165]. Other studies of this nature can be found in [166]-[170].

Recognizing that contamination of a Gaussian noise density function by an impulsive-noise component, and heavy-tailed densities in general, are reasonable models for random noise and interference in several applications [118], [171], adaptive structures are considered by Modestino and Ningo in [143], [172] for good performance over such classes of noise. The simple structure considered in [172] for the detection of a known signal in additive noise is illustrated in Fig. 24.

Fig. 24. Adaptive detector for signals in a mixture of Gaussian and impulsive noise.

It is motivated by the fact that for Gaussian noise the linear-correlator test statistic is optimum, whereas for the heavy-tailed noise pdfs often used to model impulsive noise the sign-correlator test statistic performs very well. The mixture test statistic considered in [172] is T_GC(X) of (4.14) with a_i = s_i and

ℓ(x) = γx + (1 − γ) sgn(x)    (4.44)

where the free parameter γ is set adaptively. Modestino in [172] discusses a stochastic approximation technique maximizing the SNR for setting the value of γ. A similar robust detector structure, based on adaptively forming an optimum test-statistic combination, is described in [143] for bandpass signal detection. An alternative detector described by Milstein, Schilling, and Wolf in [173] uses extreme-value theory to obtain the proper threshold setting in an otherwise fixed matched-filter envelope detector structure.
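Taking the linear/sign mixture nonlinearity of (4.44) at face value (the exact parameterization used in [172] may differ), the mixture statistic interpolates between the two extreme correlators, as the following sketch shows; here γ is simply a fixed argument, whereas in the adaptive detector it would be tuned online by stochastic approximation.

```python
def mixture_statistic(x, s, gamma):
    """Mixture correlator: T(x) = sum_i s_i * (gamma*x_i + (1-gamma)*sgn(x_i)).

    gamma = 1 recovers the linear correlator (optimum for Gaussian noise);
    gamma = 0 recovers the sign correlator (effective in impulsive noise).
    """
    sgn = lambda v: (v > 0) - (v < 0)
    return sum(si * (gamma * xi + (1 - gamma) * sgn(xi))
               for xi, si in zip(x, s))
```

The statistic is linear in γ, so intermediate values of γ trade the efficiency of the linear correlator at the Gaussian nominal against the outlier resistance of the sign correlator.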


Although adaptive schemes can be useful in situations where the noise statistics are unknown or are nonstationary, the implementation of efficient adaptive schemes can add in a major way to the complexity of a detector. Furthermore, other considerations such as speed of convergence may limit their applicability. In nonparametric detection the main concern is that the probability of false alarm remain bounded by some design maximum value for broad classes of noise statistics, for example for all univariate noise pdfs which are symmetric about the origin. Once again it is possible to consider adaptive structures for nonparametric detection, which are mainly of the adaptive-threshold type, although fixed-structure and fixed-threshold nonparametric schemes are the most common. For a given class of noise pdfs there generally exist several possible nonparametric detectors, and in choosing between alternatives one usually considers detection performance at some nominal cases within the class. The most common nonparametric detection schemes are those based on signs and ranks of observations, including multilevel (quantization) versions of sign-based schemes. It often turns out that nonparametric detectors are robust, in the more general sense, in their detection performance over the classes of noise pdfs for which they are designed. Note, however, that the primary aim of nonparametric schemes is to keep the false-alarm probability bounded by any desired value, whereas robust schemes are required to additionally exhibit good detection performance for the whole class instead of at some nominal operating points only. In this regard the reader is referred to some comments by Huber in [20]. For survey papers, a bibliography, and details of nonparametric detection schemes we refer the reader to [174] and [175].

In conclusion, we point out a recent Russian survey paper by Krasnenker, available in English translation [33], which is on the subject of robust detection. Another similar survey by Ershov [25] is concerned with robust estimation, although it also covers some detection and hypothesis-testing results.
VandeLinde [32] gives a short survey of robust techniques in communications which covers robust detection. Finally, Poor has given a more recent survey, emphasizing some of the mathematical details, in [34].
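The distribution-free false-alarm control offered by the sign-based schemes mentioned above is easy to make concrete: under the noise-only hypothesis, for any noise pdf symmetric about zero, the number of positive observations is Binomial(n, 1/2), so a threshold taken from the binomial tail bounds the false-alarm probability over the whole class. A minimal sketch (the function names are ours, not from the cited references):

```python
from math import comb

def sign_detector_threshold(n, alpha):
    """Smallest k such that P(Binomial(n, 1/2) >= k) <= alpha.

    Under H0 (noise only, symmetric pdf) each X_i is positive with
    probability 1/2 independently, so the count of positive samples is
    Binomial(n, 1/2) regardless of the actual noise pdf; this is what
    makes the sign detector nonparametric.
    """
    tail = 1.0  # P(count >= 0)
    for k in range(n + 1):
        if tail <= alpha:
            return k
        tail -= comb(n, k) * 0.5**n  # P(count >= k) -> P(count >= k+1)
    return n + 1  # alpha smaller than P(count == n): never declare a detection

def sign_detector(x, k):
    """Decide 'signal present' iff at least k observations are positive."""
    return sum(1 for xi in x if xi > 0) >= k
```

For n = 10 and a design false-alarm probability of 0.05, the threshold is k = 9, since P(count >= 9) = 11/1024 while P(count >= 8) = 56/1024 already exceeds 0.05.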

V. NONLINEAR METHODS FOR ROBUST ESTIMATION

In Sections II and III we discussed robust linear methods for estimation and detection within uncertain second-order models. In Section IV we saw that, when the uncertainty is described in terms of the distributional model rather than the second-order model, nonlinear methods are called for to provide robustness in signal detection. Similar considerations arise in problems of estimation in uncertain distributional models, and in this section we discuss some of the main issues arising therein. Since many of these issues have been surveyed and unified elsewhere (see, e.g., the book by Huber [28], and the surveys by Martin [37], Ershov [25], and Poljak and Tsypkin [38]), we touch only briefly on the essential ideas of this area. The large majority of work in this area has been concerned with robust nonlinear parameter estimation rather than with robust nonlinear filtering. In Subsection V-A we outline basic methods for (minimax) robust parameter estimation by considering the important special case of robust estimation of signal amplitude. Some of these methods have already been mentioned briefly in Section IV, since they are closely related to robust detection. In Subsection V-B we consider robustness in other estimation contexts, including system identification and Kalman filtering. We also give brief mention to problems of robustness in other methods of time-series analysis.

A. Robust Estimation of Signal Amplitude

Many signal processing applications arising in practice fall within the category of parameter estimation. A common example is the estimation of the amplitude of a signal embedded in additive noise. A standard model for this particular situation is that we have observations X = (X_1, ..., X_n) given by

X_i = N_i + θ s_i,    i = 1, ..., n    (5.1)

where (N_1, ..., N_n) is an iid noise sequence with symmetric marginal pdf f, (s_1, ..., s_n) is a known signal waveform, and the signal amplitude θ is unknown and is to be estimated. The particular situation of (5.1) in which the signal is constant (s_1 = s_2 = ··· = s_n = 1) is the location estimation problem of statistical inference, and it was this problem that was studied in the seminal paper of Tukey [5] that demonstrated the lack of robustness of the classical estimators for this parameter, and in the pioneering 1964 paper of Huber [6] that established the importance of minimax methodology for robust estimation. In the following discussion of the basic results in this area, we will consider this constant-signal case unless otherwise noted. Modifications for time-varying signals will be mentioned where appropriate.

Three classical estimators for the location parameter of (5.1) with constant s_i are the sample mean

X̄ = (1/n) Σ_{i=1}^n X_i

the sample median, med{X_1, ..., X_n}, and the maximum-likelihood estimate (MLE), which maximizes Π_{i=1}^n f(X_i − θ) over θ.

The sample mean and the MLE coincide for the case in which f is a Gaussian density

f(x) = (1/(√(2π) σ)) e^{−x²/2σ²}

and the sample median and the MLE coincide when f is a Laplacian (double-exponential) density

f(x) = (1/(2b)) e^{−|x|/b}.

All three of these estimators are examples of a more general class of estimators, known as M-estimators, proposed by Huber in [6]. As discussed in Section IV, this class of estimators consists of those estimators of the form

θ̂_n(X) = arg min_θ Σ_{i=1}^n L(X_i − θ)    (5.3)

where L is a function determining the estimator. The sample mean corresponds to the choice L(x) = x², the sample median corresponds to L(x) = |x|, and the MLE corresponds to L(x) = −log f(x). Note that θ̂_n of (5.3) is the estimate of θ that best fits the data when the errors X_i − θ̂_n(X) are weighted with the function L.
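The three classical estimators, and M-estimates of the form (5.3) in general, can be sketched as follows. The Huber-type loss shown (quadratic center, linear tails) anticipates the soft-limiter influence curve derived below; its clipping point a = 1.5 is an illustrative choice, since in the minimax theory a is determined by the contamination model.

```python
def m_estimate(x, loss, lo=-100.0, hi=100.0, iters=200):
    """Location M-estimate: argmin over theta of sum_i loss(x_i - theta).

    Ternary search is adequate because the objective is convex in theta
    for convex losses (x^2, |x|, Huber).
    """
    def objective(theta):
        return sum(loss(xi - theta) for xi in x)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def huber_loss(a):
    """L(x) = x^2/2 for |x| <= a, a*|x| - a^2/2 beyond: mean-like center, median-like tails."""
    return lambda x: 0.5 * x * x if abs(x) <= a else a * abs(x) - 0.5 * a * a

data = [0.1, -0.4, 0.3, 0.2, 50.0]               # one gross outlier
mean_est = m_estimate(data, lambda x: x * x)      # sample mean: pulled far off by the outlier
med_est = m_estimate(data, abs)                   # sample median: ignores the outlier's size
hub_est = m_estimate(data, huber_loss(1.5))       # Huber M-estimate: bounded outlier influence
```

On this data the mean is dragged to about 10 by the single outlier, while the median and the Huber estimate stay near the bulk of the observations, illustrating the bounded-influence behavior discussed next.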


Assuming that L is convex, symmetric about the origin, and sufficiently regular, the M-estimate based on I is consistent and has the property that 6 ( 8 , ( X ) - 6 ) is asymptotically Gaussian with zero mean and variance V(d, f ) where /e2 f \ 7 (5.4) V (fd , A 7 )

To correct the possible nonrobustness of classical estimators oflocation, Huber proposed the design of location estimators using a minimax formulation, viz. min max v( d , f ) . c fF

(5.9)

Within someassumptions on 9, solution to (5.9) is the given by the influence curve dR = - fL/fR where f, is a least favorable density for location estimation defined by

with d = I (see [6] for details). Thus estimators of this type can be compared on the basis of their asymptotic variances V(d, f ) . For now we will call the function d the influence curve of 8, although this term will be given a slightly different meaningbelow. For fixed f = f,,, the Schwarz inequality implies that
V ( d , f , , ) 2 V(d0,fO) =I(
I

(5.10)
The pair ( d R ,f R ) is a saddlepoint solution for (5.9) provided -log f, is convex; i.e., under the convexity of - log f,, (e,, f R )satisfies

v(

f, d

v( / R ,

fR)

v( e, f R )

(5.11)

where do = and where I( f ) = /( f ) 2 / f is Fishers information for location. (Note that the inequality V ( t , f,) 2 I / / ( h) is a special case of the Cramer-Rao lower bound.) This do corresponds to Lo(x) = -log &(x); so, assuming - log 6 is convex, the most efficient M-estimate of location is the maximum-likelihood estimate, a well-known result. If, as in the signal-detection problems discussed in Section IV, f is not known precisely but rather is known only to be in some class 9of bounded symmetric pdfs, then it is possible thatthe performance of an improperly designed location estimator can be quite poor. For example, suppose we consider the r-contaminated Gaussian class
.F=

c/f,,,

f,,)

(5.5)

for all f E 9 and all d . Note that (5.11) implies that, not only does d, have minimax asymptotic variance but also its variance is upperbounded over 9 by its variance when f, is the true density. This means that the performance of the estimator will never be worse than V(d,, f,) over F. Also note that, since I( f ) is Fishers information for location, f, has the interpretation of being the density for which the observations are least informative about 6. For the particular case of -contaminated Gaussiandata (i.e., 9 from (5.6)), the least favorable density turns out to be given by (4.21) with g = as discussed in Section IV. In this case, dR is a soft-limiter

{ f l f = (1 - c ) +

+~

h , h

E X }

(5.6)

where

where a is from (4.22) and (4.23), and so !( ) ,x d a, and !( ) ;x = 1 for 1x1 < a and /:(x) = 0 for 1x1 > a. Thus the numerator of V(d,, f ) is

X is the class of all bounded symmetric pdfs, and E E [0, I]. The optimum estimator based on the nominal model is the samplemean, which corresponds to /(x) = /,,(x) x and so has asymptotic variance

ti(x) f( x)

dx = (1 -

)jm x) dx !;(x)+( + E j-m h( x) dx /;(x)


-m
m

V( dm,,, f ) =

Ox2f( x) dx /0
-m

= (1 - E )

+ 1x2h( x) dx.
(5.7)

d (1 - c ) j m d;(x)+(x)
-m

dx

+ Ea
(5.13)

-m

For c > 0 the asymptotic variance (5.7) can be arbitrarily large since h is arbitrary, and so the sample mean is a very nonrobust estimator of location for this type of uncertainty. The basic problem with the sample mean is that it has an unbounded influence curve so that too many large observations (or outliers) can destroy its efficiency. This could be corrected by employing the sample median, which has the influence curve x,! ( ) , A sgn(x). For the sample median the asymptotic variance(assuming f is continuous at the origin) is
V( dm,, f ) = 1/4f2(0) d

and in the denominator we have

I-r,/;(x)

f( x) dx = / a f( x) dx
-a =

(I - c)/:$J(

x) dx

h( x) dx
-a

j-

d;(

X)

fR( X) dx.

(5.14)

1 4(1

I I

- t)+(O)

2(1 - E)

So, the sample median is certainly more robust thanthe sample mean in this case; however, its efficiency relative to the sample mean is only 64 percent at the nominal model. Ideally, we would like an estimator that is almost as effimean at the nominal and has the cient as the sample robustness of the sample median away from the nominal. It turns out that these goals can be achieved as we see below.
468

From (5.13) and (5.14) the left-hand inequality of the saddlepoint condition (5.11) is readily seen to hold inthis case. (The right-hand inequality of (5.10) is just a restatement of ( 5 3 ) Similarly, for the r-contamination model with replaced by any density g (with - logg convex) a saddlepoint solutionto (5.9) is given by ( - fL/fR,f,) with f, from (4.21). Note that the influence curve (5.12) of the robust estimator for r-contaminated Gaussian noise combines features of the influence curves for the sample mean and the sample

PROCEEDINGS OF THE IEEE, VOL. 73, N O 3, M A R C H 1965

influence curves for robust estimation in these models are shown in Fig. 25(b) and (c), respectively. Appropriate modifications of the influence curve for robust estimation in dependent noise and for asymmetric noise have been considered by Portnoy [152], [153] and Collins [158], respectively. Also, the estimate of (5.3) is straightforwardly modified to account for a time-varying signal s₁, ..., s_n via

Σ_{i=1}^{n} ℓ(X_i − θ̂ s_i) = 0.

Assuming that lim_{n→∞} (1/n) Σ_{i=1}^{n} s_i² = C² > 0, this estimate has the same properties as in the constant-signal case except that the asymptotic variance is V(ℓ, f)/C². Thus the minimax robustness problem for the time-varying case is identical to that for the constant-signal case [131].¹³ From a practical viewpoint, the robust M-estimator of signal amplitude has the disadvantage of being a "batch" procedure; i.e., all data must be stored in order to solve

Σ_{i=1}^{n} ℓ(X_i − θ̂ s_i) = 0

iteratively. This disadvantage can be overcome by considering a class of recursive estimators of the stochastic approximation type. Robustness theory for this type of estimator was first considered by Martin [178] and these ideas were developed further by Martin and Masreliez [176] and by Price and VandeLinde [179]. To discuss these results, we first consider a generalization of the minimax problem of (5.9)
median (see Fig. 25(a)). It has the boundedness of ℓ_med, but it has the linear shape of ℓ_mean for |x| ≤ a. (This eliminates the sensitivity of the estimator to the value of f at zero.) For the case ε = 0.1, the limiting point a is approximately 1.1, and the efficiency of the robust estimator at the nominal model relative to the sample mean is 92 percent. Moreover, the worst case variance of ℓ_R over 𝓕 is V(ℓ_R, f_R) ≈ 1.5, compared with

min_{θ̂ ∈ Θ} sup_{f ∈ 𝓕} v(θ̂, f)    (5.15)

where Θ is the class of all asymptotically unbiased estimators of θ and where v(θ̂, f) is the asymptotic variance of θ̂. A sufficient condition for θ̂_R to solve (5.15) is that it, together with some f_R ∈ 𝓕, satisfy the saddlepoint condition

sup_{f ∈ 𝓕} V(ℓ_med, f) ≈ 1.9.

Thus we see that this estimator does indeed achieve the goals set forth for robust estimation. Robust M-estimators can be found for uncertainty classes other than the ε-contaminated class by minimizing Fisher's information over the uncertainty class and then choosing the estimator accordingly. It is not necessary that the uncertainty class contain only distributions with density functions, although the least favorable member will always correspond to a continuous distribution. A variational method for minimizing Fisher's information is discussed in [28, ch. 5]. Examples of useful uncertainty classes for which solutions are known are p-point classes, which consist of the set of all noise distributions that place a fixed amount of probability on a given interval [176], and the class of noise distribution functions F that satisfy

v(θ̂_R, f) ≤ v(θ̂_R, f_R) ≤ v(θ̂, f_R),  for all θ̂ ∈ Θ and f ∈ 𝓕.    (5.16)

It is possible that (5.9) and (5.15) have different solutions since, in (5.9), consideration is restricted to M-estimates. However, since the Cramér-Rao bound implies v(θ̂, f_R) ≥ 1/I(f_R) = V(ℓ_R, f_R) for all θ̂ ∈ Θ, we see that the minimax robust M-estimator also satisfies (5.16) and so is in fact a minimax robust estimator among all asymptotically unbiased estimators. However, there is also another saddlepoint (θ̂_R, f_R) for (5.15) in which θ̂_R is not the MLE for f_R but rather is a recursive estimator. In particular, suppose 𝓕, f_R, and ℓ_R are as in the robust M-estimation formulation, and let θ̂_R(X) = θ̂_n be defined for each n by

θ̂_n = θ̂_{n−1} + ℓ_R(X_n − θ̂_{n−1}) / [n I(f_R)],  n = 1, 2, ...    (5.17)

with θ̂_0 arbitrary. Note that θ̂_n is computed recursively (i.e., at the ith sampling instant θ̂_i is computed from θ̂_{i−1} and X_i). Then under regularity conditions the asymptotic
¹³The problem in which s₁, ..., s_n is also unknown and to be estimated has also been considered in several studies (see, for example, Tsybakov [177]).

sup_{−∞ < x < ∞} |F(x) − Φ(x)| ≤ ε

where Φ is the unit Gaussian distribution function [6].


variance of θ̂_R for a given noise density f is given by [179]

v(θ̂_R, f) = ∫ ℓ_R²(x) f(x) dx / {I(f_R) [2 ∫ ℓ′_R(x) f(x) dx − I(f_R)]}    (5.18)

where F_n is the sample cumulative distribution function (cdf)

F_n(x) = (1/n) #{i ≤ n: X_i ≤ x},  −∞ < x < ∞.    (5.22)
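The sample cdf of (5.22), and an estimate of the form T(F_n) built from it, can be sketched as follows (Python; the helper names are ours). Here T is taken to be the quantile functional F_n^{−1}(u) = inf{y: F_n(y) ≥ u}, whose value at u = 1/2 is a sample median.

```python
import bisect
import math

def empirical_cdf(samples):
    """Return F_n with F_n(x) = (1/n) #{i : X_i <= x}, as in (5.22)."""
    xs = sorted(samples)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n

def quantile_functional(samples, u):
    """T(F_n) = F_n^{-1}(u) = inf{y : F_n(y) >= u}, for 0 < u <= 1."""
    xs = sorted(samples)
    k = math.ceil(u * len(xs))  # smallest k with k/n >= u
    return xs[max(k, 1) - 1]

data = [3.0, 1.0, 4.0, 1.0, 5.0]
F_n = empirical_cdf(data)
```

F_n is a right-continuous step function jumping by 1/n at each (sorted) observation, which is exactly what the functional representations below operate on.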

Note that v(θ̂_R, f_R) = 1/I(f_R), so that θ̂_R is asymptotically optimum for f_R. Moreover, it can also be shown that

v(θ̂_R, f) ≤ v(θ̂_R, f_R)

and where T is some functional mapping the set of cdfs into ℝ. For example, if we define

T(F) = arg min_θ ∫_{−∞}^{∞} L(x − θ) dF(x)    (5.19)

for all f ∈ 𝓕, so that (θ̂_R, f_R) is also a saddlepoint solution to (5.15). Thus we have the interesting conclusion that the recursive estimator of (5.17) is minimax robust over 𝓕, just as is the M-estimator based on ℓ_R. However, although both estimators have the same worst case performance, 1/I(f_R), their performances (V(ℓ_R, f) and v(θ̂_R, f)) generally differ for given f ≠ f_R. For the ε-contaminated Gaussian case it is easily shown that v(θ̂_R, f) > V(ℓ_R, f) whenever

then the M-estimate θ̂_L is T(F_n), since

∫_{−∞}^{∞} L(x − θ) dF_n(x) = (1/n) Σ_{i=1}^{n} L(X_i − θ).    (5.23)


If we assume that L is convex and symmetric about zero and if F_θ is the common marginal cdf of X_i = N_i + θ, then for (5.23) we have T(F_θ) = θ. For a wide class of estimates of the form T(F_n) it turns out that, when F is the true data distribution, √n (T(F_n) − T(F)) is asymptotically normal with zero mean and variance

∫_{−a}^{a} h(x) dx > 0.
Thus, for the computational advantage of the recursive structure, one pays the penalty of some lost efficiency when worst case conditions are not present. However, for example at ε = 0.1 this lost efficiency is negligible (< 1 percent) at the nominal density φ. The appropriate modification of the algorithm of (5.17) for the time-varying case is
θ̂_i = θ̂_{i−1} + s_i ℓ_R(X_i − θ̂_{i−1} s_i) / [I(f_R) Σ_{j=1}^{i} s_j²],  i = 1, 2, ..., n.    (5.20)

A(F, T) = ∫_{−∞}^{∞} [IC(x, F, T)]² dF(x).    (5.24)

Here IC is the influence curve of the estimate (the reason for the earlier use of this term will become obvious below) defined by

IC(x, F, T) = lim_{ε→0} [T((1 − ε)F + εF_x) − T(F)] / ε    (5.25)

where F_x is the cdf of a random variable taking on only the value x.¹⁴ The influence curve for an M-estimate θ̂_L is

IC(x, F_θ, T) = ℓ(x − θ) / ∫_{−∞}^{∞} ℓ′(y − θ) dF_θ(y)    (5.26)

This estimator has not been studied analytically; however, simulations for the case of a sinusoidal signal in ε-contaminated Gaussian noise indicate that the performance of (5.20) is comparable to that for the constant-signal case (Martin [178]). Heuristically, in either this or the constant-signal case robustness against contamination is achieved by inserting the soft limiter ℓ_R into the feedback loop that incorporates the residual X_i − θ̂_{i−1} s_i into the updates of the amplitude estimate (see Fig. 26). If it were not for this limiter, then when θ̂_i gets close to the true θ, "too many" large noise values would tend to drive θ̂_i away from θ.
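A minimal sketch of the recursion (5.20) follows (Python; the signal shape, the contamination model, and the stand-in value used for I(f_R) are illustrative assumptions of ours). Taking s_i = 1 for all i recovers the constant-signal recursion (5.17).

```python
import math
import random

def soft_limiter(x, a):
    """The limiter influence curve: linear on [-a, a], clipped outside."""
    return max(-a, min(a, x))

def recursive_amplitude_estimate(xs, ss, a=1.1, fisher=1.0, theta0=0.0):
    """Stochastic-approximation amplitude estimate for X_i = theta*s_i + N_i,
    with the soft limiter in the feedback loop (a sketch of (5.20)).
    'fisher' stands in for I(f_R); its exact value affects efficiency,
    not the limit of the recursion."""
    theta = theta0
    energy = 0.0  # running sum of s_j ** 2
    for x, s in zip(xs, ss):
        energy += s * s
        residual = x - theta * s
        theta += s * soft_limiter(residual, a) / (fisher * energy)
    return theta

random.seed(7)
theta_true = 2.0
n = 4000
ss = [math.sin(0.05 * i) + 1.5 for i in range(n)]  # a time-varying signal
xs = [theta_true * s + (random.gauss(0, 8) if random.random() < 0.1
                        else random.gauss(0, 1)) for s in ss]
est = recursive_amplitude_estimate(xs, ss)
```

Only the current estimate and the running signal energy are stored, which is the practical advantage over the batch M-estimate.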

which gives the asymptotic variance V(ℓ, f) of (5.4). Other useful classes of estimates of the form T(F_n) are the so-called L-estimates, which estimate θ by linear combinations of the order statistics (i.e., the observation sequence put in numerically increasing order), and R-estimates, which are based on the ranks of the data. Explicit representations of these types of estimators in the form T(F) and the corresponding influence curves can be found in [28]. For any fixed noise density, an asymptotic variance equal to 1/I(f) can be achieved within each of these classes. Of particular interest in the class of L-estimates is the so-called α-trimmed mean, which estimates θ by first removing the [αn] largest and [αn] smallest samples (0 ≤ α < 1/2), and then computing the sample mean of the remaining samples. The α-trimmed mean can be written as T_α(F_n), where
T_α(F) = [1/(1 − 2α)] ∫_{α}^{1−α} F^{−1}(x) dx    (5.27)

Fig. 26. Robust recursive signal-amplitude estimator for ε-contaminated Gaussian errors.

where F^{−1} is defined by F^{−1}(x) = inf {y | F(y) ≥ x}. For a symmetric noise distribution F_N the influence curve of the α-trimmed mean is given by (5.28)
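The α-trimmed mean is simple to sketch directly on samples (Python; the helper names are ours). With [αn] interpreted as the integer part, the code drops that many extreme samples from each end and averages the rest, matching the description above; α = 0 recovers the plain sample mean.

```python
import math

def alpha_trimmed_mean(samples, alpha):
    """T_alpha(F_n): drop the [alpha*n] smallest and [alpha*n] largest
    samples (0 <= alpha < 1/2) and average the remainder."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must lie in [0, 1/2)")
    xs = sorted(samples)
    k = math.floor(alpha * len(xs))
    kept = xs[k:len(xs) - k] if k else xs
    return sum(kept) / len(kept)

data = [9.8, 10.1, 10.0, 9.9, 55.0, 10.2, -30.0, 10.0]
trimmed = alpha_trimmed_mean(data, 0.125)  # drops one sample from each end
```

Here the two gross outliers (55.0 and −30.0) are discarded before averaging, so the estimate stays near the bulk of the data.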

The robustness properties of types of estimators other than the M-estimators and stochastic approximation estimators can be studied in a general framework. In particular, for the constant-signal case, many estimators of practical interest can be written in the form
θ̂ = T(F_n)    (5.21)

where

¹⁴Note that, in addition to its role in the asymptotic variance formula (5.24), the influence curve characterizes the sensitivity of T(F_n) to the incorporation of a datum x into the estimate; i.e., it quantifies the influence of such a datum on the estimate. Hence the term influence curve.

Fig. 27. Application of robust amplitude estimation to image enhancement. (a) Original image. (b) Noisy image (Gaussian background noise with impulsive outliers). (c) Running-mean processing. (d) M-estimate processing.

(5.29)

Note the similarity of this influence curve to that of the robust M-estimator for ε-contaminated Gaussian noise. In fact, by choosing

α = ∫_{a}^{∞} f_R(x) dx

where f_R is the least favorable density from (4.21) and a is the limiter breakpoint from (4.22), (4.23), we see that IC(x, F_{R,θ}, T_α) coincides with the influence curve of the robust M-estimator ℓ_R, where

F_{R,θ}(x) = ∫_{−∞}^{x} f_R(y − θ) dy.

It follows that this α-trimmed mean is also a minimax robust location estimate for ε-contaminated Gaussian noise. Unfortunately, there is no known time-varying analog to the α-trimmed mean that carries its minimax property.

A different, intuitive, notion of robustness of an estimator is that small changes in the data set x₁, ..., x_n should not change the estimate much. (This notion is sometimes called resistance.) This property is assured for an estimate T(F_n) if the functional T is continuous. Thus one way of defining robustness of an estimator of the type T(F_n) at a given nominal F₀ is in terms of the continuity of T at F₀. A related robustness notion arises if we view an estimator θ̂(X) as a mapping from the marginal distribution of the observations X₁, ..., X_n to the distribution of the estimate θ̂(X). Robustness can be formulated in this context by requiring this mapping to be continuous in some way; i.e., a small change in the data distribution should cause only a small change in the distribution of the estimate. It turns out that, for estimates of the form T(F_n), these two continuity definitions of robustness are equivalent (within the proper definitions of continuity). The notion of robustness as a continuity property was introduced in the context of parameter estimation by Hampel [3], although similar ideas were set forth in the context of signal detection in Root's earlier study of stability in detection [2]. Robustness of this continuity type is usually known as qualitative robustness.

The robust estimators discussed above can be extended to the estimation of signal parameters other than amplitude (see, e.g., Huber [28], Kelly [147], and El-Sawy [146]). Robust estimators have found numerous applications in statistics and signal processing. For example, location M-estimates, α-trimmed means, and modifications thereof have been applied successfully to the problem of image enhancement by Bovik, Huang, and Munson [180] and by Lee and Kassam [181], [182]. This is a natural application for robust methods, since image noise typically consists of a Gaussian-like background with occasional impulsive or spiky components. For example, the image¹⁵ of Fig. 27(a) is shown corrupted by a combination of additive Gaussian and impulsive noises in Fig. 27(b). The effects of smoothing this image with a running mean (analogous to the sample mean) and with an M-estimate are shown in Fig. 27(c) and (d), respectively.
¹⁵The images in Fig. 27 were produced using the facilities of the GRASP Laboratory, Department of Computer and Information Sciences, University of Pennsylvania.

The beneficial effects of robustifying the estimate are quite dramatic in this case. Another interesting investigation of the use of robust estimators, in a digital FM receiver, has been carried out by Abend et al. [183].
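The image-smoothing comparison of Fig. 27 can be mimicked in one dimension (Python sketch; here a running median stands in for the M-estimate smoother of [180]-[182], which it only approximates, and the data are ours). The running mean spreads an impulsive outlier over the whole window, while the median rejects it.

```python
import statistics

def running_mean(xs, w=3):
    """Sliding-window mean, analogous to the running-mean smoother."""
    h = w // 2
    return [statistics.fmean(xs[max(0, i - h):i + h + 1])
            for i in range(len(xs))]

def running_median(xs, w=3):
    """Sliding-window median: a simple robust smoother that rejects
    isolated impulsive outliers instead of spreading them."""
    h = w // 2
    return [statistics.median(xs[max(0, i - h):i + h + 1])
            for i in range(len(xs))]

# a flat "image row" of gray level 10 hit by one impulsive outlier
row = [10.0, 10.0, 10.0, 100.0, 10.0, 10.0, 10.0]
smoothed_mean = running_mean(row)
smoothed_median = running_median(row)
```

The median output is flat at the background level, while the mean output carries the outlier into every window that touches it, which is the streaking effect visible in Fig. 27(c).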

B. Robust Nonlinear Recursive Filtering and Identification


One of the most commonly used signal processing algorithms is the discrete-time Kalman filter, which is based on the linear observation model

Y_n = H_n X_n + V_n,  n = 0, 1, 2, ...    (5.30)

where, for each n, Y_n is an r × 1 observation vector, H_n is the observation matrix, X_n is the m × 1 state vector, and V_n is the observation noise; and on the state model

X_{n+1} = F_n X_n + W_n,  n = 0, 1, ...    (5.31)

where F_n is the one-step state transition matrix and W_n is the state noise. Assuming that X_0, {W_n}, and {V_n} are all Gaussian, are independent of one another, and that {W_n} and {V_n} are sequences of independent vectors, the optimum (MMSE) estimators X̂_{n|n} and X̂_{n+1|n} of X_n and X_{n+1} given Y_0, ..., Y_n are given recursively by the well-known Kalman filtering algorithm

X̂_{n|n} = X̂_{n|n−1} + M_n H_n^T [H_n M_n H_n^T + R_n]^{−1} (Y_n − H_n X̂_{n|n−1})    (5.32)

and

X̂_{n+1|n} = F_n X̂_{n|n}    (5.33)

where Q_n = cov(W_n), R_n = cov(V_n), M_n = F_{n−1} P_{n−1} F_{n−1}^T + Q_{n−1}, and P_n = cov(X_n − X̂_{n|n}); P_n is found recursively from a standard formula [184]. The relations (5.32) and (5.33) are called the measurement update and time update, respectively. Of course, the linearity of (5.32) and (5.33) follows from the assumptions of Gaussian statistics for the state and the observation noise. If either of these quantities has a distribution that deviates from this nominal assumption in that it allows an unexpected number of large observations, then the Kalman estimator may perform poorly. This may be seen from (5.32). In particular, the prediction residual (Y_n − H_n X̂_{n|n−1}) will contain outliers if either of the Gaussian assumptions is violated towards a heavier tail behavior. Since this residual is fed directly into the estimate, a distorted value can cause severe degradation of the estimation performance; and, because of the dynamics, such errors will propagate. However, there is an additional dimension to this problem in that outliers in the state are of interest and should be tracked rather than limited, and so additional considerations arise here. A method for robustification of the Kalman filter against modeling uncertainty has been proposed by Masreliez and Martin in [185]. In view of the ideas described in the preceding subsection, a natural way to try to protect against the detrimental effects noted above is to somehow place a limiter or similar nonlinearity in the feedback loop of the filter (5.32), (5.33). This would limit the effects that any outlier could have on performance, and this, in fact, is the way robustness is achieved in this model if all uncertainty is in the observation noise distribution. However, because of the vector nature of the Kalman filtering problem, some scaling and transformation of the residual is necessary both in modeling the uncertainty and in processing the residuals for robust estimation. In particular, for p-point uncertainty, the robust version of (5.32), (5.33) when the prediction errors X_n − X̂_{n|n−1} are Gaussian but the innovations (Y_n − H_n X̂_{n|n−1}) have uncertain distribution (this case corresponds to uncertainty in the observation noise) is of the form

X̂_{n|n} = X̂_{n|n−1} + M_n H_n^T T_n^T ψ*(v_n)    (5.34)

with

X̂_{n+1|n} = F_n X̂_{n|n}    (5.35)

and

v_n = T_n (Y_n − H_n X̂_{n|n−1})    (5.36)

where T_n is a scaling transformation described in [185] and where ψ*(v_n) is the vector whose jth component is the jth component v_{n,j} of v_n replaced by ψ_p(v_{n,j}), ψ_p being the influence curve for robust location estimation in the p-point model illustrated in Fig. 25(b). The error covariance of this estimator for p-point uncertainty is always less than its value when the components of the transformed residual vector are iid with the least favorable marginal distribution. Moreover, this worst case covariance is the optimum covariance for the case in which the transformed residuals have this least favorable property. However, this estimate does not provide a saddlepoint for the mean-square estimation error, because of the constraints of the linear model, as is discussed in [185]. For the case in which the observation noise is Gaussian but the residual distribution is uncertain due to uncertainty in the state distribution, the Kalman filter is robustified differently than for the case of uncertain observation noise. In particular, without going into detail, the measurement update incorporates Y_n linearly into the estimate of X̂_{n|n}, but the transformed residual v_n is limited. The linear incorporation of Y_n is done because an outlier in Y_n indicates an outlier in X_n, which should be incorporated into the estimate of X_n. On the other hand, an outlier in v_n does not necessarily indicate a bad prediction X̂_{n|n−1}, and so should not be treated as such. For further details and simulation results for robust Kalman filtering, the interested reader is referred to [185]. Related results can be found in VandeLinde, Doraiswami, and Yurtseven [186], Ershov and Lipster [187], Ershov [188], Levin [189], and Tsai and Kurz [190]. A problem related to Kalman filtering is that of system parameter identification. In this problem we have a set of scalar observations Y₁, ..., Y_n and a set of system input vectors X₁, ..., X_n that are related through the equation
Y_i = s(c, X_i) + N_i,  i = 1, ..., n    (5.37)

where c is a vector parameter and N₁, ..., N_n is an iid noise sequence with marginal pdf f. The system identification problem is to estimate c from observation of Y₁, ..., Y_n and X₁, ..., X_n. Note that (5.37) is a generalization of the amplitude estimation problem of (5.1), in which X_i, θ, and s_i in (5.1) correspond, respectively, to Y_i, c, and X_i in (5.37), with s(c, X_i) = cX_i. As in the estimation of signal amplitude, the conventional estimators of c in (5.37) are of the form of M-estimators, namely,

ĉ = arg min_c Σ_{i=1}^{n} L(Y_i − s(c, X_i))    (5.38)

n cov(φ̂) → V(L, f) C_u^{−1} / σ_u²    (5.44)

with L an error-weighting function. We get least squares estimation with L(x) = x², least moduli estimation with L(x) = |x|, and maximum-likelihood estimation with L(x) = −log f(x). If the X_i are iid and L is convex, then within mild regularity conditions the estimate ĉ of (5.38) is strongly consistent. Moreover, if s(c, X_i) is linear (i.e., s(c, X_i) = c^T X_i), then √n (ĉ − c) is asymptotically Gaussian with zero mean and covariance matrix

where V(L, f) is the asymptotic variance expression for M-estimation of location from (5.4), σ_u² is the variance of the innovations, and C_u is the p × p covariance matrix of the {X_i} process when σ_u² = 1. Note from (5.43) that the choice L′_R = −f′_R/f_R

β^{−1} V(L, f)    (5.39)

where β = E{X_i X_i^T} and V(L, f) is from (5.4). Thus, just as in the location estimation case, the minimax robust estimate of c in (5.37) for a given noise uncertainty class 𝓕 is the optimum M-estimate (L′_R = −f′_R/f_R) for the least favorable noise density.

As noted in the preceding subsection, since the M-estimator is a batch procedure, we are often more interested in recursive estimators for on-line applications. In this case (with linear s(c, X_n)) the appropriate robust recursive identifier for a class 𝓕 is given by

ĉ_{n+1} = ĉ_n + β^{−1} X_n L′_R(Y_n − s(ĉ_n, X_n)) / [n I(f_R)]    (5.40)

which, together with f_R, also gives a saddlepoint for the minimax robustness problem. For further details on this and related topics the reader is referred to the interesting survey by Poljak and Tsypkin [38]. The problem of system identification is closely related to the problem of estimating the parameters of a linear time series. An excellent survey of methodology for robust estimation in this context is found in [37]. To illustrate the type of results that can be obtained, we mention briefly the problem of estimating the parameters of an autoregression. Specifically, suppose we have observations

gives a saddlepoint for the minimax robust estimation of θ when the innovations distribution is uncertain. However, this is not necessarily so for robust estimation of φ, because of the σ_u² term in (5.44). It is interesting to note from (5.44) that a heavy-tailed innovations distribution may actually be beneficial in estimating φ, since cov(φ̂) depends inversely on σ_u². This is not surprising if we note that outliers in the innovations should actually aid in the identification of φ, just as the insertion of an impulse into a system allows one to identify its impulse response. Thus we have here a situation in which impulsive phenomena are beneficial to inference. Extensions of the above ideas to problems of estimation in ARMA models, nonlinear models, models with additive observation noise (in which outliers are again detrimental), and models of unknown order are discussed in [37]. Similar ideas can also be applied to other problems of time-series analysis, such as forecasting [37] and spectrum estimation [35]. Connections between the problems of robust Kalman filtering and robust time-series regression have been noted in a recent paper by Boncelet and Dickinson [191].
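The following scalar sketch (Python) conveys the flavor of the limited-residual update (5.34)-(5.36); it is not the Masreliez-Martin scheme of [185] itself. The scalar scaling, the fixed limiter breakpoint, and the use of the nominal Gaussian covariance recursion are simplifying assumptions of ours.

```python
import random

def clip(x, a):
    return max(-a, min(a, x))

def robust_scalar_filter(ys, f=1.0, h=1.0, q=0.001, r=1.0, a=2.0):
    """Scalar Kalman-type recursion in which the scaled prediction
    residual is passed through a soft limiter before entering the
    measurement update, in the spirit of (5.34)-(5.36).  The gain and
    covariance recursions here are the nominal (Gaussian) ones."""
    x_pred, p_pred = 0.0, 1.0
    estimates = []
    for y in ys:
        s = h * h * p_pred + r               # nominal residual variance
        k = p_pred * h / s                   # nominal Kalman gain
        v = (y - h * x_pred) / s ** 0.5      # scaled residual
        x_filt = x_pred + k * s ** 0.5 * clip(v, a)   # limited update
        p_filt = (1.0 - k * h) * p_pred
        estimates.append(x_filt)
        x_pred, p_pred = f * x_filt, f * f * p_filt + q
    return estimates

random.seed(3)
true_x = 1.0   # constant state; 5 percent of observations are outliers
ys = [true_x + (random.gauss(0, 20) if random.random() < 0.05
                else random.gauss(0, 1)) for _ in range(500)]
est = robust_scalar_filter(ys)
```

Without the clip, a single outlier would be multiplied by the gain and injected directly into the state estimate, and the error would then propagate through the dynamics; the limiter bounds the influence of any one observation.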

Y_i = θ + X_i,  i = 1, 2, ..., n    (5.41)

where θ is a location parameter and {X_i} is a pth-order autoregression

X_i = φ₁ X_{i−1} + φ₂ X_{i−2} + ... + φ_p X_{i−p} + u_i    (5.42)

where φ = (φ₁, ..., φ_p)^T is a vector of constants and {u_i} is an iid innovations sequence with marginal pdf f. M-estimates of φ and θ can be computed by finding the pair (γ̂, φ̂) minimizing the M-estimation objective of (5.43), where γ = θ(1 − φ₁ − ... − φ_p) and σ̂ is an estimate of the innovations scale parameter (see [37]), and then setting θ̂ = γ̂ / (1 − φ̂₁ − ... − φ̂_p). These estimates of θ and φ are consistent and are asymptotically Gaussian and independent, with asymptotic covariances as in (5.44).

VI. ROBUST DATA QUANTIZATION

Data quantization is a necessary function of systems which digitally process or transmit signals. The optimum design of quantizers based on minimum-distortion criteria has been considered extensively over the past three decades, and a number of references on this subject can be found in surveys by Morris [192] and Gersho [193] and in a special issue of the IEEE TRANSACTIONS ON INFORMATION THEORY (Gray [194]). More recently, designs for quantizers which are optimum for signal detection or estimation purposes have also been developed [163], [195]-[197]. Optimum quantizer design is based primarily on statistical definitions of optimality, such as minimum mean distortion [198] or maximum divergence [196]. Thus the optimum designs usually require an accurate statistical model for the data to be quantized. Since such models are rarely exact, the study of quantizer design for inaccurate models is of practical interest. One approach to this problem is that of adaptive quantization (see, for example, [199], [200]), which is appropriate for very inaccurate models or for situations in which data statistics are changing significantly over moderate time periods. An alternative approach, which is primarily of interest when there are relatively small inaccuracies in the statistical model or when a fixed structure is preferred, is a game-theoretic one in which a quantizer with best worst case performance is sought. As demonstrated in the preceding sections, this general approach has also been applied successfully to many other problems of signal processing with inaccurate statistical models, and the


resulting designs are usually robust. In this section, we consider the problem of minimax design of quantizers for imprecisely modeled data. In particular, we survey several recent results pertaining to the problem of minimax distortion quantization. These results include both asymptotic (as the number of quantization levels becomes infinite) and nonasymptotic treatments of this problem.
A. Robust Quantization for a Small Number of Levels

An M-level quantizer Q can be represented by a set of M output levels q₁, q₂, ..., q_M and a set of (M + 1) input breakpoints t₀, t₁, ..., t_M, satisfying −∞ = t₀ < t₁ < ... < t_{M−1} < t_M = ∞, where the quantized value of a real input x is given by

Q(x) = q_k,  if x ∈ (t_{k−1}, t_k],  k = 1, ..., M    (6.1)

(see Fig. 28). Thus the design of an M-level quantizer is an optimization problem on ℝ^{2M−1}.

Fig. 28. Input/output characteristic of an M-level quantizer.

The most common quantizer design criterion for a random input X is to choose the quantizer parameters to minimize a mean-distortion quantity

E{D[X, Q(X)]}    (6.2)

where D[·,·] is some appropriate measure of distortion. Usually D[·,·] is a difference distortion measure (i.e., it depends only on the difference [X − Q(X)]), and the most useful of these are the pth-difference distortion measures given by D[a, b] = |a − b|^p. The study of minimum-distortion quantization is exemplified by the classical work of Max [198], in which design conditions for the t_k and q_k of minimum-distortion quantizers are derived. In particular, for a wide class of difference distortion measures (including the pth-difference ones), Max shows that the breakpoints should be chosen to satisfy

D[t_k, q_k] = D[t_k, q_{k+1}],  k = 1, ..., M − 1    (6.3)

and he also gives a second set of necessary conditions which, together with (6.3), give a set of nonlinear equations to be solved for the optimum quantizer parameters. For example, if we consider mean-square distortion (D[a, b] = |a − b|²) and if X has a pdf f, then the necessary conditions are given by t_k = (q_k + q_{k+1})/2, k = 1, ..., M − 1, and

q_k = ∫_{t_{k−1}}^{t_k} x f(x) dx / ∫_{t_{k−1}}^{t_k} f(x) dx,  k = 1, ..., M    (6.4)

that is, q_k is the centroid of (t_{k−1}, t_k] weighted with f. (An equivalent interpretation is that q_k = E{X | X ∈ (t_{k−1}, t_k]}.) Many important refinements and generalizations of this theory, as well as studies of performance, approximately optimal design, etc., can be found in the literature, and again the reader is referred to [192]-[194] for further discussion.

Of course, designs based on minimizing the quantity of (6.2) will depend on knowledge of the probability distribution of X, as can be seen, for example, in (6.4). Thus, as discussed in the Introduction, when this distribution is not known exactly it is necessary to seek an alternative design strategy to minimizing (6.2). If we assume that the probability distribution of X lies in some uncertainty class 𝓟, then a useful design objective is the minimization of the alternate quantity

sup_{F ∈ 𝓟} E_F{D[X, Q(X)]}.    (6.5)

Several recent studies have considered various aspects of the minimization of (6.5), and these are discussed in the following paragraphs. For the purposes of discussion, we will consider only the particular case of mean-square distortion, although all of the cited studies consider more general distortion measures as well. Thus we will be considering the problem

min_{Q ∈ 𝒬_M} sup_{F ∈ 𝓟} ∫_{−∞}^{∞} (x − Q(x))² dF(x)    (6.6)

where 𝒬_M denotes the class of all M-level quantizers. The first study to consider (6.6) in the context of minimax robustness is that of Morris and VandeLinde [201], in which 𝓟 is taken to be the class of all possible probability distributions on a fixed interval [−V, V]. In this case, it is shown in [201] that the minimax quantization problem (6.6) is solved by the M-level uniform quantizer on [−V, V], which is given by

t_k = −V + 2kV/M,  k = 1, ..., M − 1
q_k = t_k − V/M,  k = 1, ..., M − 1,  and  q_M = V − V/M.    (6.7)

The solution to (6.6) was considered next by Bath and VandeLinde in [202], in which the class 𝓟 is taken to be a unimodal generalized moment constrained (UGMC) set; i.e., 𝓟 is assumed to consist of the distribution functions which are unimodal (with mode zero) and which satisfy the generalized moment constraint

∫_{−∞}^{∞} ρ(x) dF(x) ≤ c    (6.8)

where the constraint function ρ is symmetric, continuous, strictly increasing on (0, ∞), and satisfies ρ(0) = 0 and ρ(x) → ∞ as x → ∞. In particular, it is shown in [202] via the Lagrange duality theorem that the minimax quantizer for this problem is given by the solution to a dual optimization problem (6.9), in which the constant c and the function ρ of (6.8) enter explicitly. An O(M) algorithm for solving (6.9) is also given in [202], and it is demonstrated numerically that the resulting minimax quantizer can perform much better in the worst case over 𝓟 than both the uniform quantizer and the quantizer which is optimum for Gaussian data.

B. Asymptotic Robust Quantization

Further work on the minimax-distortion quantization problem considers the asymptotic case as the number, M, of quantization levels becomes infinite. Assuming the data are confined to an interval [−V, V], it is convenient to represent (and to implement) a quantizer as an increasing invertible function G: [−V, V] → [−V, V], followed by a uniform quantizer, which is followed in turn by the inverse of G (see Fig. 29). The function G is termed a compressor and its inverse an expander, so that the whole scheme is termed companding. In general, the compressor G can be any invertible function; however, it is usually assumed to have a continuous derivative g. It is also common to assume, for simplicity, that the data distribution F has a density f. Under these and further mild regularity conditions it can be shown that the mean-squared error associated with the companding scheme is asymptotically of the form D(f, g) · M^{−2}, where the functional D(f, g) is given by

D(f, g) = (V²/3) ∫_{−V}^{V} f(x) [g(x)]^{−2} dx.    (6.11)

The function g describes the relative density of the quantization intervals within the range of the data. By way of (6.11), an asymptotically optimum compressor curve G can be chosen by minimizing over g. Straightforward minimization (see, e.g., Gersho [193]) yields that the minimizing compressor is given by

G₀(x) = −V + 2V ∫_{−V}^{x} f^{1/3}(t) dt / ∫_{−V}^{V} f^{1/3}(t) dt    (6.12)

which yields a value of (6.11) of

D(f, g₀) = (1/12) ( ∫_{−V}^{V} f^{1/3}(x) dx )³    (6.13)

where g₀ = G₀′. Since it is known that (6.13) gives the limiting distortion of the minimum mean-square-error quantizer (see, for example, Bucklew and Wise [203]), it follows that the companding structure causes no performance loss for optimum quantization in the asymptotic case.

Note that, as one would expect, the optimum compressor characteristic of (6.12) depends on an exact knowledge of the probability distribution of the data. Thus if, as above, F is known only to belong to some class 𝓟 of possible data distributions, then it is reasonable to replace the problem of minimizing (6.11) over g with that of minimizing over g the worst case over 𝓟 of (6.11). That is, it is of interest to consider the problem

min_{g ∈ 𝒢} sup_{f ∈ 𝓕} D(f, g)    (6.14)

where 𝓕 = {f | f = F′, F ∈ 𝓟}, and where 𝒢 is a set of admissible compressor curve derivatives. The problem in (6.14) was first considered by Bath and VandeLinde in [204], where the case in which 𝓟 is a UGMC set as in (6.8) is treated. It follows from [204] that the minimax compressor for this case is given by

G*(x) = −V + ∫_{−V}^{x} g*(t) dt    (6.15)

where

g*(x) = (V/3^{1/2}) [λ₁* + λ₂* ρ(x)]^{−1/2}    (6.16)

with the constants λ₁* and λ₂* chosen to solve the constraint equations (6.17), which guarantee that G* maps [−V, V] onto [−V, V] and that the moment constraint of (6.8) is met with equality.

Here, the constant c and the function ρ are from (6.8). It is noted in [204] that for the case ρ(x) = x², the solution to (6.17) involves only finding the root of a simple transcendental equation. It is also shown in [204] that the worst case performance over the UGMC set 𝓟 of the minimax compander is much better than that of the corresponding optimum companders for the Laplace, uniform, and Gaussian distributions. However, surprisingly, it is found that "robustified" versions of the classical A-law and μ-law companders (see, for example, Cattermole [205]) perform nearly as well in their worst case as does the minimax compander. A slightly different approach to solving the robust companding problem is proposed by Kazakos in [206] and [207]. In particular, [206] and [207] consider uncertainty classes 𝓟 for which saddlepoint solutions to (6.14) can easily be demonstrated. Note that D(f, g) is linear (and hence concave) in f and is convex in g. In [206] the following uncertainty class is considered (here V is taken to be equal to 1):
𝓟 = {F | F(x_{k+1}) − F(x_k) = p_k, k = 0, 1, ..., N − 1; F(x) + F(−x) = 1; 0 = x₀ < x₁ < ... < x_N = 1}    (6.18)

where N, the x_k, and the p_k are fixed and known with

Σ_{k=0}^{N−1} p_k = 1.

Fig. 29. Configuration for companding quantization.


This class is an example of the p-point class discussed in previous sections. Kazakos shows that the member of (6.18) which has the uniform density on each of the intervals (x_k, x_{k+1}] and its corresponding optimum compressor from (6.12) form a saddlepoint for (6.14) in this case. That is, with
f*(x) = p_k / (x_{k+1} − x_k),  x ∈ (x_k, x_{k+1}],  k = 0, 1, ..., N − 1    (6.19)

and g* from (6.12) with f = f*, we have

D(f, g*) ≤ D(f*, g*) ≤ D(f*, g)    (6.20)

Fig. 30. Nominal and robust output-level densities for ε-contaminated truncated Gaussian data (ε = 0.1).

for all g ∈ 𝒢 and all f ∈ 𝓕. Note that the compressor curve G* corresponding to this g* will be piecewise linear. Also note that, of course, the existence of (f*, g*) satisfying (6.20) is equivalent to the condition

sup_{f ∈ 𝓕} min_{g ∈ 𝒢} D(f, g) = min_{g ∈ 𝒢} sup_{f ∈ 𝓕} D(f, g).    (6.21)

In [207], the saddlepoint properties of (6.14) are considered in a more general setting. In particular, it is noted in [207] that, within regularity conditions on the classes of densities and compressors, the equality of (6.21) holds for general convex classes of densities. Thus, in view of (6.13), saddlepoint solutions to (6.14) can be sought by looking for solutions to the alternate problem

maximize J(f) over f in the uncertainty class          (6.22)

where

J(f) = ∫_{-V}^{V} f^{1/3}(x) dx.          (6.23)

That is, under mild conditions, a solution f* to (6.22) and its corresponding optimum compressor characteristic g* from (6.12) form a saddle point for (6.14). As was noted in [208], since J(f) is of the form ∫ C(f(x)) dx with C concave, solutions to (6.22) are those "closest" to the uniform density on [-V, V]. Thus solutions for several uncertainty classes are straightforward. For example, with the ε-contamination class

{f | f = (1 - ε)f_0 + εh}          (6.24)

where f_0 and ε are fixed and h is arbitrary, the density solving (6.22) is given by

f*(x) = max{(1 - ε)f_0(x), m}          (6.25)

where m is a constant chosen so that f* integrates to unity (compare with (2.35)).

Unfortunately, as pointed out in [209], the asymptotic formulation of [206] and [207] is weakened by the fact that, for some models (e.g., ε-contaminated data), the minimum of the maximum asymptotic error differs from the minimum of the asymptotic maximum error, due to the discrete nature of the finite-M problem. (This problem does not occur for the UGMC model of Bath and VandeLinde.) This problem is corrected in [209], and the solutions for ε-contaminated data are still of the form of (6.25) but with a higher degree of limiting. A typical solution is shown in Fig. 30. Note that, for this case, the levels of the robust quantizer are distributed more or less as those of the nominal quantizer near the center of the range, but they are spaced uniformly (and closer together than for the nominal quantizer) near the ends of the range. This latter property guards against a larger than expected number of data points near the ends of the range. Thus the robust quantizer is a mixture of the nominally optimum quantizer and the uniform quantizer (which is universally minimax). Numerical results illustrating the effectiveness of the robust quantizer are given in [209].

VII. CONCLUSIONS
In this paper we have considered minimax robustness in the context of the signal processing tasks of estimation, detection, and data quantization. We have seen that these robustness formulations take two basic forms: robustness with respect to uncertain second-order statistical properties (e.g., spectral properties) of signals or noise, and robustness with respect to uncertainty in the marginal distribution of the noise or signal process. In the first of these cases, the robust methods take the form of the linear procedures discussed in Sections II and III, whereas in the second case, nonlinear procedures are called for. In either case the typical robustification procedure has the effect of lowering the sensitivity of a nominally optimum procedure by tempering those characteristics that are accentuated by the nominal model. Thus in a nominally Gaussian noise model with a small fraction of "outliers," enough limiting is introduced into the optimum procedure to keep these outliers from destroying the action of the procedure. The frequency-domain robustness methods can be thought of in the same way: the gain of the appropriate filter is reduced in certain spectral regions to limit the effects of a larger than expected amount of energy in those regions (i.e., "spectral outliers"). The relationship of these two types of robustness has been discussed by Franke and Poor in [59] in the context of estimation. In [59] it is noted that, if one knows only spectral or other second-order properties, then linear estimation procedures are globally minimax over all estimation schemes. It is only when distributional information is provided that nonlinearity arises in the minimax solutions. Thus the robust filters of Section II are globally minimax over all filters (linear or nonlinear) and all random processes (Gaussian or otherwise) with the given spectral properties.
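The "limiting" idea summarized above can be sketched with a Huber-type soft limiter inside a correlation detector for a known signal; the constant signal, clipping point c = 2, and the single impulsive outlier below are hypothetical choices for illustration.

```python
import numpy as np

def soft_limiter(v, c):
    # Linear for |v| <= c, clipped beyond: each sample's influence is bounded.
    return np.clip(v, -c, c)

def correlator(x, s, psi=lambda v: v):
    # Detection statistic: sum_i s_i * psi(x_i).
    return float(np.sum(s * psi(x)))

rng = np.random.default_rng(0)
s = np.ones(100)                          # known (constant) signal, amplitude 1
noise = rng.standard_normal(100)          # nominal Gaussian noise
x = 0.5 * s + noise
x_outlier = x.copy()
x_outlier[0] += 50.0                      # one impulsive "outlier"

# Shift of each statistic caused by the single outlier.
shift_linear = correlator(x_outlier, s) - correlator(x, s)
shift_robust = (correlator(x_outlier, s, lambda v: soft_limiter(v, 2.0))
                - correlator(x, s, lambda v: soft_limiter(v, 2.0)))
```

The linear statistic absorbs the full outlier amplitude, while the limiter bounds any single sample's contribution by the clipping point, which is exactly the tempering of the nominal (linear) procedure described above.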
Note that the relevant observation or noise processes in Section V have very special spectral characteristics (they are usually white), and only their marginal distribution is allowed to vary. Very little progress has been made in dealing with simultaneous spectral and distributional uncertainty models (an exception is a recent paper by Moustakides and Thomas [154]), although this certainly gives rise to an interesting class of problems from a practical viewpoint.

PROCEEDINGS OF THE IEEE, VOL. 73, NO. 3, MARCH 1985
REFERENCES
[1] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: Wiley, 1968.
[2] W. L. Root, "Stability in signal detection problems," in Proc. Symp. in Applied Mathematics, vol. 16, American Math. Soc., 1964.
[3] F. R. Hampel, "A general qualitative definition of robustness," Ann. Math. Stat., vol. 42, pp. 1887-1896, 1971.
[4] Ya. Z. Tsypkin, "Adaptive optimization algorithms when there is a priori uncertainty," Automat. Remote Contr., vol. 40, no. 6, pp. 857-868, June 1979.
[5] J. W. Tukey, "A survey of sampling from contaminated distributions," in Contributions to Probability and Statistics, I. Olkin et al., Eds. Stanford, CA: Stanford Univ. Press, 1960.
[6] P. J. Huber, "Robust estimation of a location parameter," Ann. Math. Stat., vol. 35, pp. 73-104, 1964.
[7] —, "A robust version of the probability ratio test," Ann. Math. Stat., vol. 36, pp. 1753-1758, 1965.
[8] L. A. Zadeh, "General filters for separation of signals and noise," in Proc. Symp. on Information Networks (Poly. Inst. of Brooklyn, Apr. 1954), pp. 31-49.
[9] W. L. Root, "Communications through unspecified additive noise," Inform. Contr., vol. 4, pp. 15-29, Mar. 1961.
[10] —, "Some notes on jamming," MIT Lincoln Lab. Tech. Rep. 103, 1956.
[11] M. C. Yovits and J. L. Jackson, "Linear filter optimization with game theory considerations," in IRE Nat. Conv. Rec., pt. 4, pp. 193-199, 1955.
[12] N. J. Nilsson, "An application of the theory of games to radar reception problems," in IRE Nat. Conv. Rec., pt. 4, pp. 130-140, Mar. 1959.
[13] L.-H. Zetterberg, "Signal detection under noise interference in a game situation," IRE Trans. Inform. Theory, vol. IT-8, pp. 47-57, Sept. 1962.
[14] N. M. Blachman, "Communication as a game," in IRE Wescon Conv. Rec., pt. 2, pp. 61-66, Aug. 1957.
[15] R. L. Dobrushin, "Optimum information transmission through a channel with unknown parameters," Radio Eng. Electron. Phys., vol. 4, no. 12, pp. 1-8, 1959.
[16] M. Yu. Gadzhiev, "Determination of the optimum variation mode of the useful signal and noise carrier frequencies in detection problems based on the theory of games," Automat. Remote Contr., vol. 22, no. 1, pp. 31-39, 1961.
[17] G. E. P. Box, "Non-normality and tests on variances," Biometrika, vol. 40, pp. 318-335, 1953.
[18] P. J. Huber, "Episodes from the early history of robustness," in Abstracts of Papers: 1982 IEEE Int. Symp. on Information Theory (Les Arcs, France, June 1982), p. 72.
[19] —, "Robust statistics: A review," Ann. Math. Stat., vol. 43, pp. 1041-1067, 1972.
[20] —, Robust Statistical Procedures. Philadelphia, PA: SIAM, 1977.
[21] F. R. Hampel, "Robust estimation: A condensed partial survey," Z. Wahr. verw. Geb., vol. 27, pp. 87-104, 1973.
[22] P. J. Bickel, "Another look at robustness: A review of reviews and some new developments," Scand. J. Statist., vol. 3, pp. 145-168, 1976.
[23] R. V. Hogg, "Adaptive robust procedures: A partial review and some suggestions for future applications and theory," J. Amer. Stat. Assoc., vol. 69, pp. 909-927, 1974.
[24] —, "Statistical robustness: One view of its use in applications today," Amer. Statistician, vol. 33, pp. 108-115, Aug. 1979.
[25] A. A. Ershov, "Stable methods of estimating parameters (survey)," Automat. Remote Contr., vol. 39, no. 7, pt. 2, pp. 1152-1181, July 1978.

[26] D. F. Andrews, P. J. Bickel, F. R. Hampel, P. J. Huber, W. H. Rogers, and J. W. Tukey, Robust Estimates of Location: Survey and Advances. Princeton, NJ: Princeton Univ. Press, 1972.
[27] R. L. Launer and G. N. Wilkinson, Eds., Robustness in Statistics. New York: Academic Press, 1979.
[28] P. J. Huber, Robust Statistics. New York: Wiley, 1981.
[29] S. S. Wolff and J. L. Gastwirth, "Robust two-input correlators," J. Acoust. Soc. Amer., vol. 41, pp. 1212-1219, May 1967.
[30] R. D. Martin and S. C. Schwartz, "Robust detection of a known signal in nearly Gaussian noise," IEEE Trans. Inform. Theory, vol. IT-17, pp. 50-56, Jan. 1971.
[31] S. A. Kassam and H. V. Poor, "Robust signal processing for communication systems," IEEE Commun. Mag., vol. 21, pp. 20-28, Jan. 1983.
[32] V. D. VandeLinde, "Robust techniques in communication," in Robustness in Statistics, R. L. Launer and G. N. Wilkinson, Eds. New York: Academic Press, 1979, pp. 177-200.
[33] V. M. Krasnenker, "Stable (robust) detection methods for signals against a noise background (survey)," Automat. Remote Contr., vol. 41, no. 5, pt. 1, pp. 640-659, May 1980.
[34] H. V. Poor, "Robustness in signal detection," in Communications and Networks: A Survey of Recent Advances, I. F. Blake and H. V. Poor, Eds. New York: Springer-Verlag, 1985 (to appear).
[35] B. Kleiner, R. D. Martin, and D. J. Thomson, "Robust estimation of power spectra," J. Roy. Stat. Soc., Ser. B, vol. 41, no. 3, pp. 313-351, 1979.
[36] R. D. Martin and D. J. Thomson, "Robust-resistant spectrum estimation," Proc. IEEE, vol. 70, pp. 1097-1115, Sept. 1982.
[37] R. D. Martin, "Robust methods for time series," in Applied Time Series II, D. F. Findley, Ed. New York: Academic Press, 1981.
[38] B. T. Poljak and Ya. Z. Tsypkin, "Robust identification," Automatica, vol. 16, pp. 53-63, 1980.
[39] J. B. Thomas, An Introduction to Statistical Communication Theory. New York: Wiley, 1968.
[40] K. S. Vastola and H. V. Poor, "An analysis of the effects of spectral uncertainty on Wiener filtering," Automatica, vol. 28, pp. 289-293, 1983.
[41] T. L. Marzetta and S. W. Lang, "Power spectral density bounds," IEEE Trans. Inform. Theory, vol. IT-30, pp. 117-122, Jan. 1984.
[42] M. O. Kurkin and I. G. Sidorov, "Signal filtering in the presence of noise with incompletely known probability characteristics," Radio Eng. Electron. Phys., vol. 10, pp. 54-62, 1980.
[43] N. E. Nahi and I. M. Weiss, "Bounding filter: A simple solution to the lack of exact a priori statistics," Inform. Contr., vol. 39, pp. 212-224, Nov. 1978.
[44] S. A. Kassam and T. L. Lim, "Robust Wiener filters," J. Franklin Inst., vol. 304, pp. 171-185, 1977.
[45] H. V. Poor, "On robust Wiener filtering," IEEE Trans. Automat. Contr., vol. AC-25, pp. 531-536, June 1980.
[46] S. M. Ali and S. D. Silvey, "A general class of coefficients of divergence of one distribution from another," J. Roy. Stat. Soc., Ser. B, vol. 28, pp. 131-142, 1966.
[47] I. Csiszar, "Information-type measures of difference of probability distributions and indirect observations," Studia Scientiarum Mathematicarum Hungarica, vol. 2, pp. 299-318, 1967.
[48] H. V. Poor, "Minimax linear smoothing for capacities," Ann. Prob., vol. 10, pp. 504-507, 1982.
[49] H. V. Poor and D. P. Looze, "Minimax state estimation for linear stochastic systems with noise uncertainty," IEEE Trans. Automat. Contr., vol. AC-26, pp. 902-906, 1981.
[50] L. J. Cimini and S. A. Kassam, "Robust and quantized Wiener filters for p-point spectral classes," in Proc. 1980 Conf. on Inform. Sciences and Systems (Princeton Univ., Princeton, NJ, Mar. 1980), pp. 314-318.
[51] K. S. Vastola and H. V. Poor, "On generalized band models in robust detection and filtering," in Proc. 1980 Conf. on Inform. Sciences and Systems (Princeton Univ., Princeton, NJ, Mar. 1980), pp. 1-5.


[52] L. J. Cimini and S. A. Kassam, "Optimum piecewise constant Wiener filters," J. Opt. Soc. Amer., vol. 71, pp. 1162-1171, Oct. 1981.
[53] L. Brieman, "A note on minimax filtering," Ann. Prob., vol. 1, pp. 175-179, 1973.
[54] S. A. Kassam, "The bounded p-point classes in robust hypothesis testing and filtering," in Proc. 20th Annual Allerton Conf. on Comm., Control and Computing (Monticello, IL, Oct. 6-8, 1982), pp. 526-534.
[55] K. S. Vastola and H. V. Poor, "Robust Wiener-Kolmogorov theory," IEEE Trans. Inform. Theory, vol. IT-30, pp. 316-327, Mar. 1984.
[56] E. Wong, Stochastic Processes in Information and Dynamical Systems. New York: Wiley, 1971.
[57] J. Franke, "Minimax robust prediction of discrete time series," Z. Wahr. verw. Geb., vol. 68, pp. 337-364, 1985.
[58] K. S. Vastola, "On robust linear causal estimation of continuous-time signals," in Abstracts of Papers: 1982 IEEE Int. Symp. on Inform. Theory (Les Arcs, France, June 21-25, 1982), p. 124.
[59] J. Franke and H. V. Poor, "Minimax robust filtering and finite-length robust predictors," in Robust and Nonlinear Time Series Analysis, J. Franke, W. Hardle, and R. D. Martin, Eds. Heidelberg, Germany: Springer-Verlag, 1984.
[60] K. Yao, "An alternative approach to the linear causal least-squares filtering theory," IEEE Trans. Inform. Theory, vol. IT-17, pp. 232-240, 1971.
[61] J. Snyders, "Error formulae for optimal linear filtering, prediction and interpolation of stationary time series," Ann. Math. Statist., vol. 43, pp. 1935-1943, 1972.
[62] —, "Error expressions for optimal linear filtering of stationary processes," IEEE Trans. Inform. Theory, vol. IT-18, pp. 574-582, 1972.
[63] K. Hoffman, Banach Spaces of Analytic Functions. Englewood Cliffs, NJ: Prentice-Hall, 1962.
[64] Y. Hosoya, "Robust linear extrapolations of second-order stationary processes," Ann. Prob., vol. 6, pp. 574-584, 1978.
[65] G. K. Golubev and M. S. Pinsker, "Minimax extrapolation of functions," Prob. Inform. Transmission, vol. 20, pp. 99-11?, Oct. 1984.
[66] Yu. B. Korobochkin, "Minimax linear estimation of a stationary random sequence in the presence of a perturbation with limited variance," Radio Eng. Electron. Phys., vol. 28, no. 11, pp. 74-78, 1983.
[67] H. V. Poor, "The rate-distortion function on classes of sources determined by spectral capacities," IEEE Trans. Inform. Theory, vol. IT-28, pp. 19-26, 1982.
[68] M. S. Pinsker, Information and Information Stability of Random Variables and Processes. San Francisco, CA: Holden-Day, 1964.
[69] A. Papoulis, "Maximum entropy and spectrum estimation: A review," IEEE Trans. Acoust., Speech, and Signal Process., vol. ASSP-29, pp. 1176-1186, Dec. 1981.
[70] G. Moustakides and S. A. Kassam, "Minimax robust equalization for random signals," IEEE Trans. Commun., vol. COM-33, 1985 (to be published).
[71] —, "Robust Wiener filters for random signals in correlated noise," IEEE Trans. Inform. Theory, vol. IT-29, pp. 614-619, July 1983.
[72] —, "Robust Wiener filters for correlated signals and noise," in Proc. 1980 Conf. on Inform. Sciences and Systems (Princeton Univ., Princeton, NJ, Mar. 26-28, 1980), pp. 308-313.
[73] P. J. Huber and V. Strassen, "Minimax tests and the Neyman-Pearson lemma for capacities," Ann. Statist., vol. 1, pp. 251-263, 1973.
[74] H. Rieder, "Least favorable pairs for special capacities," Ann. Statist., vol. 5, pp. 909-921, 1977.
[75] T. Bednarski, "On solutions of minimax test problems for special capacities," Z. Wahr. verw. Geb., vol. 58, pp. 397-405, 1981.
[76] K. S. Vastola and H. V. Poor, "On the p-point uncertainty class," IEEE Trans. Inform. Theory, vol. IT-30, pp. 374-376, Mar. 1984.

[77] M. Taniguchi, "Robust regression and interpolation for time series," J. Time Ser. Analysis, vol. 2, pp. 53-62, 1981.
[78] S. A. Kassam, "Robust hypothesis testing and robust time series interpolation and regression," J. Time Ser. Analysis, vol. 3, pp. 185-194, 1982.
[79] J. A. D'Appolito and C. E. Hutchinson, "A minimax approach to the design of low sensitivity state estimators," Automatica, vol. 8, pp. 599-608, 1972.
[80] V. P. Perov, "Optimum linear filtering of random processes when the a priori information is limited," Radio Eng. Electron. Phys., vol. 21, no. 10, pp. 68-75, 1976.
[81] C. T. Chen and S. A. Kassam, "Robust Wiener filtering for multiple inputs with channel distortion," IEEE Trans. Inform. Theory, vol. IT-30, pp. 674-677, July 1984.
[82] J. C. Darragh and D. P. Looze, "Noncausal minimax linear state estimation for systems with uncertain second-order statistics," IEEE Trans. Automat. Contr., vol. AC-29, pp. 555-557, June 1984.
[83] H. Tsaknakis and P. Papantoni-Kazakos, "Robust linear filtering for multivariable stationary time series," in Proc. 1984 Conf. on Inform. Sciences and Systems (Princeton Univ., Princeton, NJ, Mar. 14-16, 1984), pp. 6-10.
[84] V. D. VandeLinde, "Robust properties of solutions to linear-quadratic estimation and control problems," IEEE Trans. Automat. Contr., vol. AC-22, pp. 138-139, Feb. 1977.
[85] J. M. Morris, "The Kalman filter: A robust estimator for some classes of linear quadratic problems," IEEE Trans. Inform. Theory, vol. IT-22, pp. 526-534, 1976.
[86] H. V. Poor and D. P. Looze, "Minimax state estimation for linear stochastic systems with noise uncertainty," IEEE Trans. Automat. Contr., vol. AC-26, pp. 902-906, 1981.
[87] S. Verdu and H. V. Poor, "Minimax linear observers and regulators for stochastic systems with uncertain second-order statistics," IEEE Trans. Automat. Contr., vol. AC-29, pp. 499-511, June 1984.
[88] C. Martin and M. Mintz, "Robust filtering and prediction for linear systems with uncertain dynamics: A game-theoretic approach," IEEE Trans. Automat. Contr., vol. AC-28, pp. 888-896, Sept. 1983.
[89] P. P. Gusak, "Upper bound for rms filtering performance criterion in quasilinear models with incomplete information," Automat. Remote Contr., vol. 42, no. 4, pp. 466-471, Apr. 1981.
[90] T. Kailath, "RKHS approach to detection and estimation problems - Part I: Deterministic signals in Gaussian noise," IEEE Trans. Inform. Theory, vol. IT-17, pp. 530-549, Sept. 1971.
[91] V. P. Kuznetsov, "Stable detection when the signal and spectrum of normal noise are inaccurately known," Telecomm. Radio Eng., vol. 30/31, pp. 58-64, Mar. 1976.
[92] S. A. Kassam, T. L. Lim, and L. J. Cimini, "Two-dimensional filters for signal processing under modeling uncertainties," IEEE Trans. Geosci. Remote Sensing, vol. GE-18, pp. 331-336, Oct. 1980.
[93] W. C. Knight, R. G. Pridham, and S. M. Kay, "Digital signal processing for sonar," Proc. IEEE, vol. 69, pp. 1451-1506, Nov. 1981.
[94] V. V. Rodionov, "A game theory approach to detection of radar signals in the presence of unknown interference," Radio Eng. Electron. Phys., vol. 27, no. 9, pp. 74-80, 1982.
[95] C. R. Cahn, "Performance of digital matched filter correlator with unknown interference," IEEE Trans. Commun., vol. COM-19, pp. 1163-1172, Dec. 1971.
[96] G. L. Turin, "Minimax strategies for matched-filter detection," IEEE Trans. Commun., vol. COM-23, pp. 1370-1371, Nov. 1975.
[97] C.-T. Chen, "Robust and quantized linear filtering for multiple input systems," Ph.D. dissertation, Dept. of Systems Eng., Univ. of Pennsylvania, Philadelphia, 1983.
[98] C.-T. Chen and S. A. Kassam, "Robust multiple-input matched filtering: Frequency and time-domain results," IEEE Trans. Inform. Theory (to appear).
[99] V. P. Kuznetsov, "Minimax linear Neyman-Pearson detectors


for an imprecisely known signal and noise," Radio Eng. Electron. Phys., vol. 19, no. 12, pp. 73-81, 1974.
[100] S. Verdu and H. V. Poor, "Minimax robust discrete-time matched filters," IEEE Trans. Commun., vol. COM-31, pp. 208-215, Feb. 1983.
[101] V. P. Kuznetsov, "Synthesis of linear detectors when the signal is inexactly given and the properties of the normal noise are incompletely known," Radio Eng. Electron. Phys., vol. 19, no. 12, pp. 65-73, 1974.
[102] M. V. Burnashev, "On the minimax detection of an inaccurately known signal in a white Gaussian noise background," Theory Prob. Appl., vol. 24, pp. 107-119, 1979.
[103] R. Sh. Aleyner, "Synthesis of stable linear detectors for an inaccurately known signal," Radio Eng. Electron. Phys., vol. 22, no. 1, pp. 142-145, 1977.
[104] —, "On the stable detection of a quasi-deterministic signal in the presence of multiplicative interference," Radio Eng. Electron. Phys., vol. 24, no. 10, pp. 60-66, 1979.
[105] S. Verdu and H. V. Poor, "Signal selection for robust matched filtering," IEEE Trans. Commun., vol. COM-31, pp. 667-670, May 1983.
[106] K. S. Vastola, J. S. Farnbach, and S. C. Schwartz, "Maximin sonar system design for detection," in Proc. 1983 IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Apr. 1983.
[107] R. A. Monzingo and T. W. Miller, Introduction to Adaptive Arrays. New York: Wiley, 1980.
[108] K. M. Ahmed and R. J. Evans, "Robust signal and array processing," Proc. Inst. Elec. Eng., vol. 129, pt. F, no. 4, pp. 297-302, Aug. 1982.
[109] C. H. Knapp and G. C. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-24, pp. 320-327, Aug. 1976.
[110] E. K. Al-Hussaini and S. A. Kassam, "Robust Eckart filters for time delay estimation," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, pp. 1052-1063, Oct. 1984.
[111] H. V. Poor, "Robust decision design using a distance criterion," IEEE Trans. Inform. Theory, vol. IT-26, pp. 575-587, Sept. 1980.
[112] —, "Robust matched filters," IEEE Trans. Inform. Theory (to be published).
[113] C. W. Helstrom, Statistical Theory of Signal Detection, 2nd ed. Oxford, England: Pergamon, 1968.
[114] D. Slepian, "On bandwidth," Proc. IEEE, vol. 64, pp. 292-300, Mar. 1976.
[115] S. Verdu and H. V. Poor, "On minimax robustness: A general approach and applications," IEEE Trans. Inform. Theory, vol. IT-30, pp. 328-340, Mar. 1984.
[116] N. M. Khalfina and L. A. Khalfin, "On a robust version of the likelihood ratio test," Theory Prob. Appl., vol. 20, pp. 199-202, 1975.
[117] G. V. Trunk, "Non-Rayleigh sea clutter: Properties and detection of targets," Naval Res. Lab. Rep. 7986, Washington, DC, June 1976.
[118] J. H. Miller and J. B. Thomas, "Robust detectors for signals in non-Gaussian noise," IEEE Trans. Commun., vol. COM-25, pp. 686-691, July 1977.
[119] S. A. Kassam, "Robust hypothesis testing for bounded classes of probability densities," IEEE Trans. Inform. Theory, vol. IT-27, pp. 242-247, Mar. 1981.
[120] V. P. Kuznetsov, "Stable methods of discriminating between two signals," Radio Eng. Electron. Phys., vol. 19, no. 10, pp. 42-49, 1974.
[121] —, "Stable detection of a signal that is not precisely known," Prob. Inform. Transmission, vol. 12, no. 2, pp. 119-129, 1976.
[122] —, "Stable rules for discrimination of hypotheses," Prob. Inform. Trans., vol. 18, no. 1, pp. 41-51, 1982.
[123] D. Kazakos, "Signal detection under mismatch," IEEE Trans. Inform. Theory, vol. IT-28, pp. 681-684, July 1982.
[124] E. A. Geraniotis, "Performance bounds for robust decision problems with uncertain statistics," in Proc. 22nd IEEE Conf.
Helstrom, Statistical Theory of Signal Detection, 2nd ed. Oxford, England: Pergamon, 1968. D . Slepian, O n bandwidth, Proc. IEEE, vol. 64, pp. 292-300, M a r . 1976. S. Verdu and H. V. Poor, On minimax robustness:A general approach and applications, I Trans. Inform. Theory, vol. IT-30, pp. 328-340, Mar. 1984. N . M. Khalfina and L. A. Khalfin, On a robust version of the likelihood ratio test, Theory Prob. Appl., vol. 20, pp. 199-202, 1975. Also in Proc. I, vol. 64, pp. 292-3133, Mar. 1976. G. V. Trunk, Non-Rayleigh sea clutter: Properties and detection of targets, Naval Res. Lab Rep. 7986, Washington, DC, June 1976. J. H. Miller and J. 6. Thomas, Robust detectors for signals in non-Gaussian noise, / E Trans. Commun., vol. COM-25, pp, 686-691, July 1977. S. A. Kassam, Robust hypothesis testing for bounded classes ofprobability densities, / E Trans. Inform. Theory, vol. IT-27, pp. 242-247, Mar. 1981, V. P. Kuznetsov, Stable methods of discriminating between two signals, Radio Eng. Nectron. Phys., vol. 19, no. IO, pp. 42-49, 1974. -, Stable detection of a signal that is not precisely known, Prob. Inform. Transmission, vol. 12, no. 2, pp. 119-1 29, 1976. -, Stable rules for discrimination of hypotheses, Prob. Inform. Trans., vol. 18, no. 1, pp. 41-51, 1982. D. Kazakos, Signal detection under mismatch, / E Trans. Inform. Theory, vol. IT-28, pp. 681-684, July 1982. E. A. Ceraniotis, Performance bounds for robust decision problems with uncertain statistics, in Proc. 2 n d I Conf.


on Decision and Control (San Antonio, TX, Dec. 14-16, 1983), pp. 298-303.
[125] E. A. Geraniotis and H. V. Poor, "Minimax discrimination for observed Poisson processes with uncertain rate functions," IEEE Trans. Inform. Theory, vol. IT-31, 1985 (to appear).
[126] P. M. Schultheiss and J. J. Wolcin, "Robust sequential probability ratio detectors," in EASCON '75 Conv. Rec., pp. 36-A-36-H, 1975.
[127] J. Capon, "On the asymptotic efficiency of locally optimum detectors," IRE Trans. Inform. Theory, vol. IT-7, pp. 67-71, Apr. 1961.
[128] J. H. Miller and J. B. Thomas, "Detectors for discrete-time signals in non-Gaussian noise," IEEE Trans. Inform. Theory, vol. IT-18, pp. 241-250, Mar. 1972.
[129] S. A. Kassam and J. B. Thomas, "Asymptotically robust detection of a known signal in contaminated non-Gaussian noise," IEEE Trans. Inform. Theory, vol. IT-22, pp. 22-26, Jan. 1976.
[130] H. V. Poor and J. B. Thomas, "Asymptotically robust quantization for detection," IEEE Trans. Inform. Theory, vol. IT-24, pp. 222-229, Mar. 1978.
[131] A. H. El-Sawy and V. D. VandeLinde, "Robust detection of known signals," IEEE Trans. Inform. Theory, vol. IT-23, pp. 722-727, Nov. 1977.
[132] —, "Robust sequential detection of signals in noise," IEEE Trans. Inform. Theory, vol. IT-25, pp. 346-353, May 1979.
[133] G. V. Trunk, "Small- and large-sample behavior of two detectors against envelope-detected sea clutter," IEEE Trans. Inform. Theory, vol. IT-16, pp. 95-99, Jan. 1970.
[134] G. V. Trunk and S. F. George, "Detection of targets in non-Gaussian sea clutter," IEEE Trans. Aerosp. Electron. Syst., vol. AES-6, pp. 620-628, Sept. 1970.
[135] J. M. Morris, "On single-sample robust detection of known signals in additive unknown-mean amplitude-bounded random interference," IEEE Trans. Inform. Theory, vol. IT-26, pp. 199-209, Mar. 1980.
[136] —, "On single-sample robust detection of known signals in additive unknown-mean amplitude-bounded random interference - Part II: The randomized decision rule solution," IEEE Trans. Inform. Theory, vol. IT-27, pp. 132-136, Jan. 1981.
[137] —, "Performance of a suboptimal multisample decision rule against known signals with additive unknown-mean amplitude-bounded random interference," in Proc. 21st IEEE Conf. on Decision and Control (Orlando, FL, Dec. 8-10, 1982), pp. 437-439.
[138] S. A. Kassam, "Locally robust array detectors for random signals," IEEE Trans. Inform. Theory, vol. IT-24, pp. 309-316, May 1978.
[139] M. Kanefsky, "Detection of weak signals with polarity coincidence arrays," IEEE Trans. Inform. Theory, vol. IT-12, pp. 260-268, Apr. 1966.
[140] R. D. Martin and C. P. McGath, "Robust detection of stochastic signals," IEEE Trans. Inform. Theory, vol. IT-20, pp. 537-541, July 1974.
[141] H. V. Poor, M. Mami, and J. B. Thomas, "On robust detection of discrete-time stochastic signals," J. Franklin Inst., vol. 309, pp. 29-53, Jan. 1980.
[142] H. V. Poor and J. B. Thomas, "Locally-optimum detection of discrete-time stochastic signals in non-Gaussian noise," J. Acoust. Soc. Amer., vol. 63, pp. 75-80, Jan. 1978.
[143] J. W. Modestino and A. Y. Ningo, "Detection of weak signals in narrowband non-Gaussian noise," IEEE Trans. Inform. Theory, vol. IT-25, pp. 592-600, Sept. 1979.
[144] S. A. Kassam, "Detection of narrowband signals: Asymptotic robustness and quantization," in Proc. 1984 Conf. on Inform. Sciences and Systems (Princeton Univ., Princeton, NJ, Mar. 1984), pp. 297-301.
[145] —, "Nonparametric detection of narrowband signals," in Proc. 27th Midwest Symp. on Circuits and Systems (Morgantown, WV, June 1984), pp. 518-521.
[146] A. H. El-Sawy, "Detection of signals with unknown phase," in Proc. 17th Ann. Allerton Conf. on Comm., Control and Computing (Monticello, IL, Oct. 1979), pp. 152-161.


[147] P. A. Kelly, "Robust estimation and detection of signals with arbitrary parameters," in Proc. 21st Annual Allerton Conf. on Commun., Control and Computing (Monticello, IL, Oct. 5-7, 1983), pp. 602-609.
[148] J. G. Shin and S. A. Kassam, "Robust detector for narrowband signals in non-Gaussian noise," J. Acoust. Soc. Amer., vol. 74, pp. 527-533, Aug. 1983.
[149] S. S. Wolff and F. J. M. Sullivan, "Robust many-input signal detectors," in Signal Processing: Proc. of NATO Advanced Study Inst. on Signal Processing, J. W. R. Griffiths et al., Eds. London, England: Academic Press, 1973.
[150] V. P. Kuznetsov, "Synthesis of stable, small signal detectors," Radio Eng. Electron. Phys., vol. 21, no. 2, pp. 26-34, 1976.
[151] H. V. Poor, "Signal detection in the presence of weakly dependent noise - Part II: Robust detection," IEEE Trans. Inform. Theory, vol. IT-28, pp. 744-752, 1982.
[152] S. L. Portnoy, "Robust estimation in dependent situations," Ann. Statist., vol. 5, pp. 22-43, 1977.
[153] —, "Further remarks on robust estimation in dependent situations," Ann. Statist., vol. 7, pp. 244-251, 1979.
[154] G. V. Moustakides and J. B. Thomas, "Min-max detection of weak signals in φ-mixing noise," IEEE Trans. Inform. Theory, vol. IT-30, pp. 529-537, May 1984.
[155] S. A. Kassam and J. B. Thomas, "Dead-zone limiter: An application of conditional tests in nonparametric detection," J. Acoust. Soc. Amer., vol. 60, pp. 857-862, 1976.
[156] R. D. Martin, "Robust estimation of signal parameters with dependent data," in Proc. 21st IEEE Conf. on Decision and Control (Orlando, FL, Dec. 8-10, 1982), pp. 433-436.
[157] S. A. Kassam, G. Moustakides, and J. G. Shin, "Robust detection of known signals in asymmetric noise," IEEE Trans. Inform. Theory, vol. IT-28, pp. 84-91, Jan. 1982.
[158] J. R. Collins, "Robust estimation of a location parameter in the presence of asymmetry," Ann. Statist., vol. 4, pp. 68-85, 1976.
[159] G. V. Moustakides, Robust Detection of Signals: A Large Deviations Approach, IRISA (CNRS) Publ. no. 243, Univ. de Rennes, France, Nov. 1984.
[160] R. D. Martin and S. C. Schwartz, "On mixture, quasi-mixture, and nearly normal random processes," Ann. Math. Stat., vol. 43, pp. 948-967, 1972.
[161] P. A. Kelly and W. L. Root, "Stable linear detectors with optimality constraints," in Proc. 22nd Ann. Allerton Conf. on Comm., Control, and Computing (Monticello, IL, Oct. 3-5, 1984), pp. 318-327.
[162] —, "Stability in detection of signals in noise," in Proc. 23rd IEEE Conf. on Decision and Control (Las Vegas, NV, Dec. 12-14, 1984), pp. 1436-1443.
[163] S. A. Kassam, "Optimum data quantization for signal detection," in Communications and Networks: A Survey of Recent Advances, I. F. Blake and H. V. Poor, Eds. New York: Springer-Verlag, 1985.
[164] L. Kurz, "Nonparametric detectors based on partition tests," in Nonparametric Methods in Communications, P. Papantoni-Kazakos and D. Kazakos, Eds. New York: Marcel-Dekker, 1977.
[165] R. F. Dwyer, "Robust sequential detection of weak signals in undefined noise using acoustical arrays," J. Acoust. Soc. Amer., vol. 67, pp. 833-841, 1980.
[166] Y. C. Ching and L. Kurz, "Nonparametric detectors based on m-interval partitioning," IEEE Trans. Inform. Theory, vol. IT-18, pp. 251-257, Mar. 1972.
[167] P. Kersten and L. Kurz, "Robustized vector Robbins-Monro algorithm with applications to m-interval detection," Inform. Sci., vol. 11, pp. 121-140, 1976.
[168] —, "Bivariate m-interval classifiers with application to edge detection," Inform. Contr., vol. 34, pp. 152-168, 1977.
[169] R. F. Dwyer, "Robust sequential detection of narrowband acoustic signals in noise," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 140-143, Apr. 1979.
[170] —, "A technique for improving detection and estimation of signals contaminated by under ice noise," J. Acoust. Soc. Amer., vol. 74, pp. 124-130, July 1983.
[171] J. H. Miller and J. B. Thomas, "The detection of signals in impulsive noise modeled as a mixture process," IEEE Trans. Commun., vol. COM-24, pp. 559-563, May 1976.

[172] J. W. Modestino, "Adaptive detection of signals in impulsive noise environments," IEEE Trans. Commun., vol. COM-25, pp. 1022-1027, Sept. 1977.
[173] L. B. Milstein, D. D. Schilling, and J. K. Wolf, "Robust detection using extreme-value theory," IEEE Trans. Inform. Theory, vol. IT-15, pp. 370-375, 1969.
[174] S. A. Kassam and J. B. Thomas, Eds., Nonparametric Detection: Theory and Applications. Stroudsburg, PA: Dowden, Hutchinson and Ross, 1980.
[175] S. A. Kassam, "A bibliography on nonparametric detection," IEEE Trans. Inform. Theory, vol. IT-26, pp. 595-602, Sept. 1980.
[176] R. D. Martin and C. J. Masreliez, "Robust estimation via stochastic approximation," IEEE Trans. Inform. Theory, vol. IT-21, pp. 263-271, May 1975.
[177] A. B. Tsybakov, "Robust estimation of a function," Prob. Inform. Transmission, vol. 18, no. 3, pp. 190-200, 1983.
[178] R. D. Martin, "Robust estimation of signal amplitude," IEEE Trans. Inform. Theory, vol. IT-18, pp. 596-606, 1972.
[179] E. L. Price and V. D. VandeLinde, "Robust estimation using the Robbins-Monro stochastic approximation algorithm," IEEE Trans. Inform. Theory, vol. IT-25, pp. 698-704, 1979.
[180] A. C. Bovik, T. S. Huang, and D. C. Munson, Jr., "A generalization of median filtering using linear combinations of order statistics," IEEE Trans. Acoust., Speech, and Signal Process., vol. ASSP-31, pp. 1342-1350, Dec. 1983.
[181] Y. H. Lee and S. A. Kassam, "Nonlinear edge preserving filtering techniques for image enhancement," in Proc. 27th Midwest Symp. on Circuits and Systems (Morgantown, WV, June 1984), pp. 554-557.
[182] —, "Applications of nonlinear adaptive filters for image enhancement," in Proc. 7th Int. Conf. Pattern Recogn. (Montreal, Que., Canada, Aug. 1984), pp. 930-932.
[183] K. Abend et al., "Advanced digital receiver techniques," Tech. Rep. RADC-TR-68-169, Rome Air Development Cen., Rome, NY, Sept. 1978.
[184] B. D. O. Anderson and J. B. Moore, Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[185] C. J. Masreliez and R. D. Martin, "Robust Bayesian estimation for the linear model and robustifying the Kalman filter," IEEE Trans. Automat. Contr., vol. AC-22, pp. 361-371, 1977.
[186] V. D. VandeLinde, R. Doraiswami, and H. O. Yurtseven, "Robust filtering for linear systems," in Proc. 11th IEEE Conf. on Decision and Control (New Orleans, LA, Dec. 13-15, 1972), pp. 652-656.
[187] A. A. Ershov and R. Sh. Lipster, "A robust Kalman filter in discrete time," Automat. Remote Contr., vol. 39, no. 3, 1978.
[188] A. A. Ershov, "Robust filtering algorithms," Automat. Remote Contr., vol. 39, no. 7, pt. 1, pp. 992-996, July 1978.
[189] I. K. Levin, "Accuracy analysis of a robust filter of a certain type by the method of convex hull," Automat. Remote Contr., no. 5, pp. 660-669, 1980.
[190] C. Tsai and L. Kurz, "An adaptive robustizing approach to Kalman filtering," Automatica, vol. 19, no. 3, pp. 279-288, 1983.
C. G. Boncelet, Jr., and B. W. Dickinson, "An approach to robust Kalman filtering," in Proc. 22nd IEEE Conf. Decision and Control (San Antonio, TX, Dec. 14-16, 1983), pp. 304-305.
J. M. Morris, "Quantization and source encoding with a fidelity criterion: A survey," Naval Res. Lab. Rep. 8082, Washington, DC, Mar. 25, 1977.
A. Gersho, "Principles of quantization," IEEE Trans. Circuits Syst., vol. CAS-25, pp. 427-436, July 1978.
R. M. Gray, Ed., IEEE Trans. Inform. Theory (Special Issue on Quantization), vol. IT-28, Mar. 1982.
B. Aazhang and H. V. Poor, "On optimum and nearly optimum data quantization for signal detection," IEEE Trans. Commun., vol. COM-32, pp. 745-751, July 1984.
H. V. Poor, "A companding approximation for the statistical divergence of quantized data," in Proc. 22nd IEEE Conf. Decision and Control (San Antonio, TX, Dec. 14-16, 1983), pp. 697-702.
[197] —, "The effects of data quantization on filtering of stationary Gaussian processes," in Proc. 23rd IEEE Conf. Decision and Control (Las Vegas, NV, Dec. 12-14, 1984), pp. 1430-1435.
PROCEEDINGS OF THE IEEE, VOL. 73, NO. 3, MARCH 1985

[198] J. Max, "Quantizing for minimum distortion," IRE Trans. Inform. Theory, vol. IT-6, pp. 7-12, Mar. 1960.
[199] D. J. Goodman and A. Gersho, "Theory of an adaptive quantizer," IEEE Trans. Commun., vol. COM-22, pp. 1037-1045, Aug. 1974.
[200] A. Gersho and D. J. Goodman, "A training mode adaptive quantizer," IEEE Trans. Inform. Theory, vol. IT-20, pp. 746-749, Nov. 1974.
[201] J. M. Morris and V. D. VandeLinde, "Robust quantization of discrete-time signals with independent samples," IEEE Trans. Commun. Technol., vol. COM-22, pp. 1897-1901, Dec. 1974.
[202] W. G. Bath and V. D. VandeLinde, "Robust memoryless quantization for minimum signal distortion," IEEE Trans. Inform. Theory, vol. IT-28, pp. 296-306, Mar. 1982.
[203] J. A. Bucklew and G. L. Wise, "Multidimensional asymptotic quantization theory with r-th power distortion measures," IEEE Trans. Inform. Theory, vol. IT-28, pp. 239-247, Mar. 1982.
[204] W. G. Bath and V. D. VandeLinde, "Robust quantizers designed using the companding approximation," in Proc. 18th IEEE Conf. Decision and Control (Ft. Lauderdale, FL, Dec. 1979), pp. 483-487.
[205] K. W. Cattermole, Principles of Pulse Code Modulation. London, England: Iliffe, 1969.
[206] D. Kazakos, "On the design of robust quantizers," in Conf. Rec., 1981 IEEE Telecommunications Conf. (New Orleans, LA, Dec. 1981), pp. F4.5.1-F4.5.4.
[207] —, "New results on robust quantization," IEEE Trans. Commun., vol. COM-31, pp. 965-974, Aug. 1983.
[208] H. V. Poor, "Some results on robust data quantization," in Proc. 21st IEEE Conf. Decision and Control (Orlando, FL, Dec. 8-10, 1982), pp. 440-445.
[209] —, "Robust quantization of ε-contaminated data," IEEE Trans. Commun., vol. COM-33, pp. 218-222, Mar. 1985.

KASSAM AND POOR: ROBUST TECHNIQUES FOR SIGNAL PROCESSING
