ELSEVIER Signal Processing 36 (1994) 287-314
Abstract

The independent component analysis (ICA) of a random vector consists of searching for a linear transformation that minimizes the statistical dependence between its components. In order to define suitable search criteria, the expansion of mutual information is utilized as a function of cumulants of increasing orders. An efficient algorithm is proposed, which allows the computation of the ICA of a data matrix within a polynomial time. The concept of ICA may actually be seen as an extension of the principal component analysis (PCA), which can only impose independence up to the second order and, consequently, defines directions that are orthogonal. Potential applications of ICA include data analysis and compression, Bayesian detection, localization of sources, and blind identification and deconvolution.
Key words: Statistical independence; Random variables; Mixture; Entropy; Information; High-order statistics; Source separation; Data analysis; Noise reduction; Blind identification; Array processing; Principal components; Independent components
independently proposed another approach at the same time that presented rather similar qualities and drawbacks.

Giannakis et al. [35] addressed the issue of identifiability of ICA in 1987 in a somewhat different framework, using third-order cumulants. However, the resulting algorithm required an exhaustive search. Lacoume and Ruiz [42] also sketched a mathematical approach to the problem using high-order statistics; in [29, 30], Gaeta and Lacoume proposed to estimate the mixing matrix M by a maximum likelihood approach, where an exhaustive search was also necessary to determine the absolute maximum of an objective function of many variables. Thus, from a practical point of view, this was realistic only in the two-dimensional case.

Cardoso focused on the algebraic properties of fourth-order cumulants, and interpreted them as linear operators acting on matrices. A simple case is the action on the identity, yielding a cumulant matrix whose diagonalization gives an estimate of ICA [8]. When the action is defined on a set of matrices, one obtains several cumulant matrices whose joint diagonalization provides more robust estimates [55]. This is equivalent to optimizing a cumulant-based criterion [55], and is thus similar in spirit to the approach presented herein. Other algebraic approaches, using only fourth-order cumulants, have also been investigated [9, 10].

In [36], Inouye proposed a solution for the separation of two sources, whereas at the same time Comon [13] proposed another solution for N ≥ 2. Together with Cardoso's solution, these were among the first direct (polynomial-time) solutions to the ICA problem. In [61], Inouye and his colleagues derived identifiability conditions for the ICA problem. Their Theorem 2 may be seen to have connections to earlier works [19]. On the other hand, our Theorem 11 (also presented in [15, 16]) only requires pairwise independence, which is generally weaker than the conditions required by [61].

In [24], Fety addressed the problem of identifying dynamic models of the form y(t) = Fz(t), where t is a time index. In general, the identification of these models can be completed with the help of second-order moments only, and will be called the signal separation problem. In some cases, identifiability conditions are not fulfilled, e.g. when the processes z_i(t) have spectra proportional to each other, so that the signal separation problem degenerates into the ICA problem, where the time coherency is ignored. Independently, the signal separation problem has also been addressed more recently by Tong et al. [60].

This paper is an extended version of the conference paper presented at the Chamrousse workshop in July 1991 [16]. Although most of the results were already tackled in [16], the proofs were much shortened, or not stated at all, for reasons of space. They are now detailed here, within the same framework. Furthermore, some new results are stated, and complementary simulations are presented. Among other things, it is sought to give a sound justification for the choice of the objective function to be maximized. The results obtained turn out to be consistent with what was heuristically proposed in [42]. Independently of [29, 30], the author proposed to approximate the probability density by its Edgeworth expansion [16], which has advantages over the Gram-Charlier expansion. Contrary to [29, 30], the hypothesis that odd cumulants vanish is not assumed here; Gaeta emphasized in [29] the consistency of the theoretical results obtained in both approaches when third-order cumulants are null.

It may be considered, however, that a key contribution of this paper consists of providing a practical algorithm that does not require an exhaustive search, but can be executed within polynomial time, even in the presence of non-Gaussian noise. The complexity aspect is of great interest when the dimension of y is large (at least greater than 2). On the other hand, the robustness in the presence of non-Gaussian noise has apparently not been previously investigated. Various implementations of this algorithm are discussed in [12, 14, 15]; the present improved version should, however, be preferred. A real-time version can also be implemented on a parallel architecture [14].

1.4. Applications

Let us spend a few lines on related applications. If x is a vector formed of N successive time samples
and whose components are 'the most independent possible', in the sense of the maximization of a given 'contrast function' that will be subsequently defined.

In this definition, the asterisk denotes transposition, and complex conjugation whenever the quantity is complex (mainly the real case will be considered in this paper). The purpose of the next section will be to define such contrast functions. Since multiplying random variables by non-zero scalar factors or changing their order does not affect their statistical independence, ICA actually defines an equivalence class of decompositions rather than a single one, so that the property below holds. The definition of contrast functions will thus have to take this indeterminacy into account.

PROPERTY 2. If a pair {F, Δ} is an ICA of a random vector y, then so is the pair {F', Δ'}, where Λ is a p × p invertible diagonal real positive scaling matrix, D is a p × p diagonal matrix with entries of unit modulus, and P is a p × p permutation.

For conciseness we shall sometimes refer to the matrix Λ̄ = ΛD as the diagonal invertible 'scale' matrix.

For the sake of clarity, let us also recall the definition of PCA below.

DEFINITION 3. The PCA of a random vector y of size N with finite covariance Vy is a pair {F, Δ} of matrices such that
(a) the covariance factorizes into

Vy = F Δ² F*,

where Δ is diagonal real positive and F is of full column rank p;
(b) F is an N × p matrix whose columns are orthogonal to each other (i.e. F*F is diagonal).

Note that two different pairs can be PCAs of the same random variable y, but they are also related by Property 2. Thus, PCA has exactly the same inherent indeterminations as ICA, so that we may assume the same additional arbitrary constraints [15]. In fact, for computational purposes, it is convenient to define a unique representative of these equivalence classes.

DEFINITION 4. The uniqueness of ICA, as well as of PCA, requires imposing three additional constraints. We have chosen to impose the following:
(c) the columns of F have unit norm;
(d) the entries of Δ are sorted in decreasing order;
(e) the entry of largest modulus in each column of F is positive real.

Each of the constraints in Definition 4 removes one of the degrees of freedom exhibited in Property 2. More precisely, (c), (d) and (e) determine Λ, P and D, respectively. With constraint (c), the matrix F defined in the PCA decomposition (Definition 3) is unitary. As a consequence, ICA (as well as PCA) defines a so-called p × 1 'source vector' z. As we shall see with Theorem 11, the uniqueness of ICA requires z to have at most one Gaussian component.

2. Statements related to statistical independence

In this section, an appropriate contrast criterion is proposed first. Then two results are stated. It is proved that ICA is uniquely defined if at most one component of x is Gaussian, and then it is shown why pairwise independence is a sufficient measure of statistical independence in our problem.

The results presented in this paper hold true either for real or complex variables. However, some derivations would become more complicated if derived in the complex case. Therefore, only real variables will be considered most of the time for the sake of clarity, and extensions to the complex case will mainly be addressed in the appendix. In the remainder, plain lowercase (resp. uppercase) letters denote, in general, scalar quantities (resp. tables with at least two indices, namely matrices or tensors), whereas boldface lowercase letters denote column vectors with values in ℝ^N.
DEFINITION 6. A contrast will be said to be discriminating over a set ℰ if the equality Ψ(p_Ax) = Ψ(p_x) holds only when A is of the form ΛP, as defined in Property 2, when x is a random vector of ℰ having independent components.

Now we are in a position to propose a contrast criterion.

THEOREM 7. The following mapping is a contrast over ℰ:

Ψ(p_x) = − I(p_x).

See the appendix for a proof. The criterion proposed in Theorem 7 is consequently admissible for ICA computation. This theoretical criterion, involving a generally unknown density, will be made usable by approximations in Section 3. Regarding computational loads, the calculation of ICA may still be very heavy even after approximations, and we now turn to a theorem that theoretically explains why the practical algorithm designed in Section 4, which proceeds pairwise, indeed works.

The following two lemmas are utilized in the proof of Theorem 10, and are reproduced below because of their interesting content. Refer to [18, 22] for details regarding Lemmas 8 and 9, respectively.

LEMMA 8 (Lemma of Marcinkiewicz-Dugué (1951)). The function φ(u) = e^{P(u)}, where P(u) is a polynomial of degree m, can be a characteristic function only if m ≤ 2.

LEMMA 9 (Cramér's lemma (1936)). If {x_i, 1 ≤ i ≤ N} are independent random variables and if X = Σ_{i=1}^{N} a_i x_i is Gaussian, then all the variables x_i for which a_i ≠ 0 are Gaussian.

Utilizing these lemmas, it is possible to obtain the following theorem, which can be seen to be a variant of Darmois's theorem (see Appendix A.7).

THEOREM 10. Let x and z be two random vectors such that z = Bx, B being a given rectangular matrix. Suppose additionally that x has independent components and that z has pairwise independent components. If B has two non-zero entries in the same column j, then x_j is either Gaussian or deterministic.

Now we are in a position to state a theorem, from which two important corollaries can be deduced (see Appendices A.7-A.9 for proofs).

THEOREM 11. Let x be a vector with independent components, of which at most one is Gaussian, and whose densities are not reduced to a point-like mass. Let C be an orthogonal N × N matrix and z the vector z = Cx. Then the following three properties are equivalent:
(i) the components z_i are pairwise independent;
(ii) the components z_i are mutually independent;
(iii) C = ΛP, with Λ diagonal and P a permutation.

COROLLARY 12. The contrast in Theorem 7 is discriminating over the set of random vectors having at most one Gaussian component, in the sense of Definition 6.

COROLLARY 13 (Identifiability). Let no noise be present in model (1.1), and define y = Mx and y = Fz, x being a random variable in some set ℰ satisfying the requirements of Theorem 11. Then if Ψ is discriminating over ℰ, Ψ(p_z) = Ψ(p_x) if and only if F = MΛP, where Λ is an invertible diagonal matrix and P a permutation.

This last corollary is actually stating identifiability conditions for the noiseless problem. It shows in particular that for discriminating contrasts, the indeterminacy is minimal, i.e. reduced to Property 2; and from Corollary 12, this applies to the contrast in Theorem 7. We recognize in Corollary 13 some statements already emphasized in blind deconvolution problems [21, 52, 64], namely that a non-Gaussian random process can be recovered after convolution by a linear time-invariant stable unknown filter only up to a constant delay and a constant phase shift, which may be seen as the analogues of our indeterminations D and P in Property 2, respectively. For processes with unit variance, we find indeed that M = FDP in the corollary above.
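The role of non-Gaussianity in Theorem 11 can be illustrated numerically. In the sketch below, the squared-value correlation used as a pairwise dependence measure is an illustrative choice of mine, not the paper's criterion; it vanishes for independent components but not for merely uncorrelated ones:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100000

# Two independent, non-Gaussian (standardized uniform) sources.
x = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, T))

def pair_dependence(z):
    # Fourth-order dependence measure: correlation of the squares.
    z = z - z.mean(axis=1, keepdims=True)
    return abs(np.corrcoef(z[0] ** 2, z[1] ** 2)[0, 1])

def rot(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s], [s, c]])

z_sep = rot(0.0) @ x          # C = identity (a trivial Lambda P)
z_mix = rot(np.pi / 5) @ x    # C orthogonal but NOT of the form Lambda P

print(pair_dependence(z_sep), pair_dependence(z_mix))
```

Both outputs concern uncorrelated pairs, yet only the trivially rotated one is (pairwise) independent, exactly as (i) ⇔ (iii) predicts for non-Gaussian sources.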
Yet, cumulants satisfy a multilinearity property [7], which allows them to be called tensors [43, 44]. Denote by Γ the family of cumulants of y, and by K those of z = Qy. Then this property can be written at orders 3 and 4 as

K_ijk = Σ_{pqr} Q_ip Q_jq Q_kr Γ_pqr,   (3.7)

K_ijkl = Σ_{pqrs} Q_ip Q_jq Q_kr Q_ls Γ_pqrs.   (3.8)

On the other hand, J(p_z) does not depend on Q, so that criterion (3.1) can be reduced, up to terms of order O(m⁻²), to the maximization of a functional

ψ(Q) = Σ_{i=1}^{N} 4K²_iii + K²_iiii + 7K⁴_iii − 6K²_iii K_iiii,   (3.9)

with respect to Q, bearing in mind that the tensors K depend on Q through relations (3.7) and (3.8). Even if the mutual information has been proved to be a contrast, we are not sure that expression (3.9) is a contrast itself, since it only approximates the mutual information. Other criteria that are contrasts are subsequently proposed, and consist of simplifications of (3.9).

If the K_iii are large enough, the expansion of J(p_z) can be performed only up to order O(m^(-3/2)), yielding ψ(Q) = 4 Σ K²_iii. On the other hand, if the K_iii are all null, (3.9) reduces to ψ(Q) = Σ K²_iiii. It turns out that these two particular forms of ψ(Q) are contrasts, as shown in the next section. Of course, they can be used even if the K_iii are neither all large nor all null, but then they would no longer approximate the contrast in Theorem 7.

LEMMA 15. Denote by Q^(r) the matrix obtained by raising each entry of an orthogonal matrix Q to the power r. Then we have

‖Q^(r) u‖ ≤ ‖u‖.

THEOREM 16. The functional

ψ(Q) = Σ_{i=1}^{N} K²_{ii...i},

where the K_{ii...i} are marginal standardized cumulants of order r, is a contrast for any r ≥ 2. Moreover, for any r ≥ 3, this contrast is discriminating over the set of random vectors having at most one null marginal cumulant of order r. Recall that the cumulants are functions of Q via the multilinearity relations, e.g. (3.7) and (3.8).

The proofs are given in the appendix (see also [15] for complementary details). These contrast functions are generally less discriminating than (3.1) (i.e. discriminating over a smaller subset of random vectors). In fact, if two components have a zero cumulant of order r, the contrast in Theorem 16 fails to separate them (this is the same behaviour as for Gaussian components). However, as in Theorem 11, at most one source component is allowed to have a null cumulant.

Now, it is pertinent to notice that the quantity

Ω_{2,r} ≜ Σ_{i1...ir} K²_{i1 i2 ... ir}   (3.10)
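The behaviour of the contrast of Theorem 16 can be checked empirically for r = 4: for two standardized uniform sources, the sum of squared marginal kurtoses is maximal at separating rotations. A hedged sketch (the sources, sample size and angle grid are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200000

# Independent standardized uniform sources (kurtosis -1.2 each).
x = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, T))

def kurt(v):
    # Marginal standardized cumulant of order 4.
    v = v - v.mean()
    return np.mean(v ** 4) / np.mean(v ** 2) ** 2 - 3.0

def psi(z):
    # Contrast of Theorem 16 with r = 4: sum of squared marginal
    # fourth-order cumulants of the components of z.
    return sum(kurt(z[i]) ** 2 for i in range(z.shape[0]))

def rot(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s], [s, c]])

angles = np.linspace(0.0, np.pi / 2, 91)
values = [psi(rot(a) @ x) for a in angles]
best = angles[int(np.argmax(values))]
print(best)  # maximized near a multiple of pi/2, i.e. a separating rotation
```

For equal source kurtoses the theoretical curve is proportional to (cos⁴a + sin⁴a)², minimal at a = π/4 (maximal mixing) and maximal at a = 0 or π/2.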
ζ = θ − 1/θ.   (4.1)

... polynomial (4.4) degenerates and we end up with the rooting of a polynomial of degree 2 only. However, it is wiser to resort to the latter, simpler algorithm only when N = 2 [15].
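Rather than rooting the paper's polynomial (4.4) in the variable ζ of (4.1), the optimal pair rotation can also be located by a brute-force angle search, which is enough to illustrate the principle. A sketch under that simplification (the 25-degree mixing angle and the kurtosis-squared pair contrast are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100000

# A mixed pair: two independent uniform sources rotated by 25 degrees.
x = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, T))
a_true = np.deg2rad(25.0)
G = np.array([[np.cos(a_true), -np.sin(a_true)],
              [np.sin(a_true),  np.cos(a_true)]])
y = G @ x

def kurt(v):
    v = v - v.mean()
    return np.mean(v ** 4) / np.mean(v ** 2) ** 2 - 3.0

def contrast(z):
    return kurt(z[0]) ** 2 + kurt(z[1]) ** 2

def rot(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s], [s, c]])

# Search the best plane rotation over ]-pi/4, pi/4].
angles = np.linspace(-np.pi / 4, np.pi / 4, 181)
best = max(angles, key=lambda a: contrast(rot(a) @ y))
print(np.rad2deg(best))
```

Within ]−45°, 45°] the separating angle is −25°, undoing the mixing rotation up to the usual permutation/sign indeterminacy.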
to usual complexities in O(N³) of most matrix factorizations.

The other algorithm at our disposal to compute the ICA within a polynomial time is the one proposed by Cardoso (refer to [10] and the references therein). The fastest version requires O(N⁵) flops, if all cumulants of the observations are available and if N sources are present with a high signal to noise ratio. On the other hand, there are O(N⁴/24) different cumulants in this 'supersymmetric' tensor. Since estimating one cumulant needs O(αT) flops, 1 < α ≤ 2, we have to take into account an additional complexity of O(αN⁴T/24). In this operator-based approach, the computation of the cumulant tensor is the most computationally heavy task. For T = O(N²) for instance, the global complexity of this approach is roughly O(N⁶). Even for T = O(N), which is too small, we get a complexity of O(N⁵). Note that this is always larger than (4.9). The reason for this is that only pairwise cumulants are utilized in our algorithm.

Lastly, one could think of using the multilinearity relation (3.8) to calculate the five cumulants required in step 4(a). This indeed requires little computational means, but needs all input cumulants to be known, exactly as in Cardoso's method above. The computational burden is then dominated by the computation of input cumulants and represents O(TN⁴) flops.

5. Simulation results

5.1. Performance criterion

In this section, the behaviour of ICA in the presence of non-Gaussian noise is investigated by means of simulations. We first need to define a distance between matrices modulo a multiplicative factor of the form ΛP, where Λ is diagonal invertible and P is a permutation. Let A and Â be two invertible matrices, and define the corresponding matrices with unit-norm columns. The gap ε(A, Â) is built from the matrix D = A⁻¹Â as

ε(A, Â) = Σ_i |Σ_j |D_ij| − 1| + Σ_j |Σ_i |D_ij| − 1| + Σ_i |Σ_j |D_ij|² − 1| + Σ_j |Σ_i |D_ij|² − 1|.   (5.1)

It is shown in the appendix that this measure of distance is indeed invariant by postmultiplication by a matrix of the form ΛP:

ε(AΛP, Â) = ε(A, ÂΛP) = ε(A, Â).   (5.2)

Moreover, ε(A, Â) = 0 if and only if A and Â can be deduced from each other by multiplication by a factor ΛP:

ε(A, Â) = 0 ⇔ Â = AΛP.   (5.3)

5.2. Behaviour in the presence of non-Gaussian noise when N = 2

Consider the observations in dimension N for realization t:

y(t) = (1 − μ) M x(t) + μ β w(t),   (5.4)

where 1 ≤ t ≤ T, x and w are zero-mean standardized random variables uniformly distributed in [−√3, √3] (and thus have a kurtosis of −1.2), μ is a positive real parameter in [0, 1], and β = ‖M‖/‖I‖, with ‖·‖ denoting the L₂ matrix norm (the largest singular value). Then when μ ranges from 0 to 1, the signal to noise kurtosis ratio decreases from infinity to zero. Since x and w have the same statistics, the signal to noise ratio can be indifferently defined at orders 2 or 4. One may assume the following as a definition of the signal to noise ratio (see Fig. 1):

SNR_dB = 10 log₁₀ [(1 − μ)/μ].
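The gap and its invariance properties (5.2)-(5.3) are easy to check numerically. In the sketch below the four-term sum implementing (5.1) is assumed from conditions (a)-(d) of (A.46), and the matrices are first normalized to unit-norm columns as the text prescribes; the specific Λ, P and test matrices are invented for illustration:

```python
import numpy as np

def unit_cols(M):
    # Normalize each column to unit L2 norm.
    return M / np.linalg.norm(M, axis=0)

def gap(A, Ahat):
    # Gap of Section 5.1, with D built from the unit-norm-column
    # versions of A and Ahat; four-term sum assumed from (A.46)(a)-(d).
    D = np.linalg.inv(unit_cols(A)) @ unit_cols(Ahat)
    aD = np.abs(D)
    return (np.abs(aD.sum(axis=1) - 1).sum()
            + np.abs(aD.sum(axis=0) - 1).sum()
            + np.abs((aD ** 2).sum(axis=1) - 1).sum()
            + np.abs((aD ** 2).sum(axis=0) - 1).sum())

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))

Lam = np.diag([2.0, -0.5, 1.5])   # invertible diagonal scaling
P = np.eye(3)[[2, 0, 1]]          # permutation matrix

# (5.3): the gap vanishes exactly on the equivalence class A * Lam * P.
print(gap(A, A @ Lam @ P), gap(A, A))
```

The column normalization is what makes the diagonal factor Λ invisible to the gap, so only the permutation/sign pattern of D survives.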
Fig. 1. Identification block diagram. We have in particular y = Fz = LQ*(P*Λ⁻¹D)z. The matrix (P*Λ⁻¹D) is optional and serves to determine uniquely a representative of the equivalence class of solutions.

Fig. 3. Average values of the gap ε as a function of the data length T and for various noise rates μ. The dimension is N = 2. [Plot residue: curves for T = 100 (ν = 240), T = 140 (ν = 171), T = 200 (ν = 120), T = 500 (ν = 48); vertical axis 0 to 0.45; horizontal axis 100 to 1000.]

[Companion panel: standard deviation of the gap, N = 2, same values of T and ν.]
Mx as a noise and the vector μβw as a source vector. Because w has independent components, it could be checked conversely that ε(I, F) tends to zero as μ tends to one, showing that the matrix F is approaching ΛP.
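The kurtosis of −1.2 quoted for the standardized uniform distribution on [−√3, √3] is easy to verify by simulation (the sample size is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(5)
# Standardized uniform: variance (2*sqrt(3))^2 / 12 = 1.
u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=1000000)
k = np.mean((u - u.mean()) ** 4) / np.var(u) ** 2 - 3.0
print(k)  # close to -1.2, the exact kurtosis of a standardized uniform
```

The exact value follows from E[u⁴] = 9/5 for unit variance, hence 9/5 − 3 = −1.2.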
5.3. Behaviour in the noiseless case when N > 2

[3 0 2 1 −1 1 0 1 −1 −2].

Fig. 8. Standard deviations of the gap ε obtained for integration length T and computed over ν trials, for the same data as for Fig. 7.

It may be observed in Fig. 7 that an integration length of T = 100 is insufficient when N = 10 (even in the noiseless case), whereas T = 500 is acceptable. For T = 100, the gap ε showed a very high variance, so that the top solid curve is subject to important errors (see the standard deviations in Fig. 8); the number ν of trials was indeed only 50, for reasons of computational complexity. As in the case of N = 2 and for larger values of T, the increase of ε is quite sudden when μ increases. With a logarithmic scale (in dB), the sudden increase would be even much steeper. After inspection, the ICA can be computed with Algorithm 18 up to signal to noise ratios of about +2 dB (μ = 0.35) for integration lengths of the order of 1000 samples. The continuous bottom solid curve in Fig. 7 shows the performances that would be obtained for infinite integration length. In order to compute the
Denote by Q' the last N × p block matrix. By construction, we consequently have C = P*A'Q'A, which can be written as C = AQA, with Q = P*Q' and Q'*Q' = I. □

7.A.2. Properties of (2.13)

Combining the two last equations shows that negentropy (2.13) may be written in another manner, as a Kullback divergence between p_x and φ_x; it is therefore non-negative, with equality iff φ_x = p_x almost everywhere. Again, as a Kullback divergence between two densities defined on the same space, negentropy is invariant under an invertible change of coordinates.

7.A.3. Proof of (2.14)

From (2.13) we start with

J(p_x) − Σ_i J(p_{x_i}) = S(φ_x) − S(p_x) − Σ_i S(φ_{x_i}) + Σ_i S(p_{x_i}),

and after substitution in (A.6) we obtain (A.8).

The second requirement of Definition 5 is satisfied because the random variable is standardized. Regarding the third one, note first that from (2.4) or (2.5) the mutual information can cancel only when
all components are mutually independent. Again, because of (2.4), the information is always positive. Thus, we indeed have Ψ(Ax) ≤ 0, with equality if and only if the components are independent.

At this stage a comment is relevant. In scalar blind deconvolution problems [21], in order to prove that the minimization of entropy was an acceptable criterion, it was necessary to resort to the property

S(Σ_i a_i x_i) − S(x) ≥ log(Σ_i a_i²)/2,   (A.9)

satisfied as soon as the random variables x_i are independently and identically distributed (i.i.d.) according to the same distribution as x. Directions for the proof of (A.9) may be found in [6]. In our framework, the proof was somewhat simpler and property (A.9) was not necessary. This stems from the fact that we are dealing with multivariate random variables, and thus multivariate entropy. In return, the components of x are not requested to be i.i.d. □

Lemma 8 was first proved by Marcinkiewicz in 1940, but Dugué gave a shorter proof in 1951 [22]. Extensions of these lemmas may be found in Kagan et al. [38], but have the inconvenience of being more difficult to prove. Lemma 9 is due to Cramér and may be found in his book of 1937 [18]. See also [38, p. 21] for extensions. Some general comments on these lemmas can also be found in [19]. □

where the x_i are independent random variables. Then if X₁ and X₂ are independent, all variables x_j for which a_j b_j ≠ 0 are Gaussian.

This theorem can also be found in [19; 49, p. 218; 38, p. 89]. Its proof needs Lemmas 8 and 9, and also the following lemma, published in 1953 by Darmois [19].

LEMMA 20 (Darmois' lemma). Let f₁, f₂, g₁ and g₂ be continuous functions in an open set U, and null at the origin. If

f₁(au + cv) + f₂(bu + dv) = g₁(u) + g₂(v), ∀u, v ∈ U,

where a, b, c are non-zero constants such that ad ≠ bc, then the f_j(x) are polynomials of degree ≤ 2.

The proof of the lemma makes use of successive finite differences [38; 19, p. 5]. More general results of this kind can be found in Kagan et al. [38].

Implications (iii) ⇒ (ii) and (ii) ⇒ (i) are quite obvious. We shall prove the last one, namely (i) ⇒ (iii). Assume z has pairwise independent components, and suppose C is not of the form ΛP. Since C is orthogonal, it necessarily has two non-zero entries in at least two different columns. Then by applying Theorem 10 twice, x has at least two Gaussian components, which is contrary to our hypothesis. □
kernel:

Again, because the entries of u, of Q^(2)u and of Q̃^(2)u are positive, this yields

(Q̃^(2) u)_i − (Q^(2) u)_i = 0, ∀i,   (A.22)

(Q̃^(2)_ij − Q^(2)_ij) u_j = 0, ∀i, j.   (A.23)

Consequently, for all values of i, and for at least p − 1 values of j, we have Q̃^(2)_ij = Q^(2)_ij. Since 0 ≤ Q̃_ij ≤ 1, Q̃^(2)_ij is necessarily either zero or one for those pairs (i, j). Now, a p × p doubly stochastic matrix Q̃^(2) containing only zeros or ones in a p × (p − 1) submatrix is a permutation. The matrix Q is thus equal to a permutation up to multiplication by numbers of unit modulus. This may be written as Q = DP, where D need only be diagonal with unit-modulus entries. This proves the second part of the theorem.

This result also holds in the complex case, as demonstrated in [15], provided the following definition is used in Theorem 16: ψ(Q) = Σ_i |K_{ii...i}|². □

which does not depend on the matrix Q. These results also hold in the complex case, as shown in [15].

Ω_{3,4} can be shown to be invariant in exactly the same way as the invariance of (3.10) was derived. In fact, replace the cumulants K by their expressions as functions of the cumulants Γ and the matrix Q, using the multilinearity relations (3.7) and (3.8). Note that if a row index of Q, say i, is present in a monomial of (3.11) then it appears twice. The sum over i can be pulled out and calculated first. Then the two terms, say Q_ia and Q_ib, disappear from (3.11) since, by orthogonality of Q,

Σ_i Q_ia Q_ib = δ_ab.   (A.26)

Then repeat the procedure as long as entries of Q are left in the expression of Ω_{3,4}. This finally gives the expression

Σ_{abc} Γ²_abc + Σ_{abcd} Γ²_abcd +
3 Σ_{abcdef} Γ_{abc} Γ_{abd} Γ_{·ef} Γ_{·ef},

which does not depend on Q.

Then the contrast (A.35) is of the form (4.2) if we denote the coefficients as follows:

b4 = a0² + a4²,
b3 = 2(a3 a4 − a0 a1),
b2 = 4a0² + 4a4² + a1² + a3² + 2a0a2 + 2a2a4,
b1 = 2(−3a0a1 + 3a3a4 + a1a4 − a0a3).   (A.38)

Taking the derivative of ψ(ζ) given in (4.2) with respect to ζ yields directly (4.4) if we denote

c4 = Γ1111 Γ1112 − Γ2222 Γ1222,
c3 = υ − 4(Γ²1112 + Γ²1222) − 3Γ1122(Γ1111 + Γ2222),
c2 = 3v,   (A.39)
c1 = 3υ − 2Γ1111Γ2222 − 32Γ1112Γ1222 − 36Γ²1122,
c0 = −4(v + 4c4),

with

v = (Γ1111 + Γ2222 − 6Γ1122)(Γ1222 − Γ1112),
υ = Γ²1111 + Γ²2222.

7.A.19. Extension to the complex case

In the complex case, the characteristic function can be defined in the standard way, i.e. φ_y(u) = E{exp[i Re(y*u)]}, but cumulants can take various forms. Here we use Γ_ijkl = cum{y_i, y_j*, y_k*, y_l} as a definition of cumulants. We consider plane rotations with real cosine and complex sine:

θ = s/c,  c² + |s|² = 1.   (A.40)

Output cumulants given in the real case in (A.36) now take the following form [15]:

… − 2{Γ1222 θ + Γ*1222 θ*} + Γ2222 … .

Then the contrast function, which is real by construction, can be calculated as a function of the variable ζ = θ − 1/θ*, since ψ is a rational function satisfying ψ(1/θ*) = ψ(θ). It can be shown that the precise form of ψ(ζ) is

ψ(ζ) = Σ_{i=−2}^{2} Σ_{j=−2}^{2}, 0 ≤ i+j ≤ 4, b_ij ζ^i ζ*^j / [ζζ* + 4]²,   (A.42)

where the coefficients b_ij are defined as follows. Denote for the sake of simplicity

a0 = Γ1111,  a1 = 4Γ1112,  a2 = Γ1122,   (A.43)
ã2 = 2Γ2112,  a3 = 4Γ1222,  a4 = Γ2222.

Then

b22 = a0² + a4²,
b21 = a4a3* − a0a1,  b12 = b*21,
b11 = 4(a0² + a4²) + (a1a3* + a3a1*)/2 + 2a2(a0 + a4),
b20 = (a1² + a3*²)/4 + ã2(a0 + a4),  b02 = b*20,
b10 = a2(a3* − a1) + ã2(a3 − a1*)/2 + 3(a4a3* − a0a1) + (a4a1 − a0a3*),
b01 = b*10,   (A.44)
b2,−1 = ã2(a3* − a1)/2,  b−1,2 = b*2,−1,
b1,−1 = 2a2ã2 + (a1² + a3*²)/2 + a1a3* + 2a2(a0 + a4),  b−1,1 = b*1,−1,
b2,−2 = ã2²/2,  b−2,2 = b*2,−2,
b00 = 2a0² + a1a1* + a2ã2 + 2a2² + a3a3* + 2a4² + 4a0a4 + a1a3 + a1*a3* + 4a2(a0 + a4).

The other coefficients are null. It may be checked that this expression of the contrast coincides with (A.38) when all the quantities involved are real. Next, stationary points of ψ can be calculated in various ways. One possibility is to resort to the equation

K1111 K1112 − K1222 K2222 = 0.   (A.45)

7.A.20. Proof of (5.2) and (5.3)

Assume Â = AΛP. Then by definition the gap is a function of the matrix D' = P*D if we denote D = A⁻¹Â (in fact, A⁻¹AΛP = ΛP). However, a glance at (5.1) suffices to see that the order in which the rows of D are set does not change the expression of the gap ε(A, Â).

Let us prove the converse implication. If ε(A, Â) = 0, then each term in (5.1) must vanish; this yields

(a) Σ_j |D_ij| = 1,  (b) Σ_i |D_ij| = 1,
(c) Σ_j |D_ij|² = 1,  (d) Σ_i |D_ij|² = 1.   (A.46)

The first two relations show that the matrix D is bistochastic. Yet, we have shown in Appendix A.13 that for any bistochastic matrix D, if there exists a vector ũ with real positive entries and at most one null component such that

‖Dũ‖ = ‖ũ‖,   (A.47)

then D is a permutation. Here the result we have to prove is weaker. In fact, choose as vector ũ the vector formed of ones. From (A.46) we have

ALGORITHM 21 (Validation).
(1) Compute the SVD of the matrix Ā = [(1 − μ)M  μβI] as Ā = VΣU*, where V is N × p, Σ is p × p and U* is p × 2N, and p denotes the rank of Ā. Set L = VΣ.
(2) Initialize F = L.
(3) Begin a loop on the sweeps: k = 1, 2, ..., kmax; kmax ≤ 1 + √p.
(4) Sweep the p(p − 1)/2 pairs (i, j), according to a fixed ordering. For each pair:
(a) Use the multilinearity relation (3.8) in order to compute the required cumulants of the pair $(i,j)$ as functions of the cumulants $\kappa_{pppp}$ and of the matrices $\Delta A$ and $F$.
(b) Find the angle $\alpha$ maximizing $\psi(Q^{(i,j)})$, where $Q^{(i,j)}$ is the plane rotation of angle $\alpha$, $\alpha \in \,]-\pi/4, \pi/4]$, in the plane defined by components $\{i,j\}$. Use the expressions given in Appendices A.17 and A.18 for this purpose.
(c) Accumulate $F := FQ^{(i,j)*}$.
(5) End the loop on $k$ if $k = k_{\max}$ or if all estimated angles are small (compared to machine precision).
(6) Compute the norms of the columns of $F$: $\Lambda_{ii} = \|F_{\cdot i}\|$.
(7) Sort the entries of $\Lambda$ in decreasing order: $\Lambda := P\Lambda P^*$ and $F := FP^*$.
(8) Normalize $F$ by the transform $F := F\Lambda^{-1}$.
(9) Fix the phase (sign) of each column of $F$ according to Definition 4(e). This yields $F := FD$.

Source MATLAB codes can be obtained from the author upon request. □

8. Acknowledgments

The author wishes to thank Albert Groenenboom, Serge Prosperi and Peter Bunn for their proofreading of the paper, which unquestionably improved its clarity. References [44], indicated by Ananthram Swami, and [6], indicated by Albert Benveniste some years ago, were also helpful. Lastly, the author is indebted to the reviewers, who detected a couple of mistakes in the original submission and permitted a definite improvement in the paper.

9. References

[1] M. Abramowitz and I.A. Stegun, Handbook of Mathematical Functions, Dover, New York, 1965, 1972.
[2] Y. Bar-Ness, "Bootstrapping adaptive interference cancelers: Some practical limitations", The Globecom Conf., Miami, November 1982, Paper F3.7, pp. 1251-1255.
[3] S. Bellini and R. Rocca, "Asymptotically efficient blind deconvolution", Signal Processing, Vol. 20, No. 3, July 1990, pp. 193-209.
[4] M. Basseville, "Distance measures for signal processing and pattern recognition", Signal Processing, Vol. 18, No. 4, December 1989, pp. 349-369.
[5] A. Benveniste, M. Goursat and G. Ruget, "Robust identification of a nonminimum phase system", IEEE Trans. Automat. Control, Vol. 25, No. 3, 1980, pp. 385-399.
[6] R.E. Blahut, Principles and Practice of Information Theory, Addison-Wesley, Reading, MA, 1987.
[7] D.R. Brillinger, Time Series: Data Analysis and Theory, Holden-Day, San Francisco, CA, 1981.
[8] J.F. Cardoso, "Sources separation using higher order moments", Proc. Internat. Conf. Acoust. Speech Signal Process., Glasgow, 1989, pp. 2109-2112.
[9] J.F. Cardoso, "Super-symmetric decomposition of the fourth-order cumulant tensor, blind identification of more sources than sensors", Proc. Internat. Conf. Acoust. Speech Signal Process. 91, 14-17 May 1991.
[10] J.F. Cardoso, "Iterative techniques for blind sources separation using only fourth order cumulants", Conf. EUSIPCO, 1992, pp. 739-742.
[11] T.F. Chan, "An improved algorithm for computing the singular value decomposition", ACM Trans. Math. Software, Vol. 8, March 1982, pp. 72-83.
[12] P. Comon, "Separation of sources using high-order cumulants", SPIE Conf. on Advanced Algorithms and Architectures for Signal Processing, Vol. Real-Time Signal Processing XII, San Diego, 8-10 August 1989, pp. 170-181.
[13] P. Comon, "Separation of stochastic processes", Proc. Workshop Higher-Order Spectral Analysis, Vail, Colorado, June 1989, pp. 174-179.
[14] P. Comon, "Process and device for real-time signals separation", Patent registered for Thomson-Sintra, No. 9000436, December 1989.
[15] P. Comon, "Analyse en Composantes Indépendantes et Identification Aveugle", Traitement du Signal, Vol. 7, No. 5, December 1990, pp. 435-450.
[16] P. Comon, "Independent component analysis", Internat. Signal Processing Workshop on High-Order Statistics, Chamrousse, France, 10-12 July 1991, pp. 111-120; republished in J.L. Lacoume, ed., Higher-Order Statistics, Elsevier, Amsterdam, 1992, pp. 29-38.
[17] P. Comon, "Kernel classification: The case of short samples and large dimensions", Thomson ASM Report, 93/S/EGS/NC/228.
[18] H. Cramér, Random Variables and Probability Distributions, Cambridge Univ. Press, Cambridge, 1937.
[19] G. Darmois, "Analyse Générale des Liaisons Stochastiques", Rev. Inst. Internat. Stat., Vol. 21, 1953, pp. 2-8.
[20] G. Desodt and D. Muller, "Complex independent component analysis applied to the separation of radar signals", in: Torres, Masgrau and Lagunas, eds., Proc. EUSIPCO Conf., Barcelona, Elsevier, Amsterdam, 1990, pp. 665-668.
[21] D.L. Donoho, "On minimum entropy deconvolution", Proc. 2nd Appl. Time Series Symp., Tulsa, 1980; reprinted in Applied Time Series Analysis II, Academic Press, New York, 1981, pp. 565-608.
[22] D. Dugué, "Analycité et convexité des fonctions caractéristiques", Annales de l'Institut Henri Poincaré, Vol. XII, 1951, pp. 45-56.
[23] P. Duvaut, "Principes de méthodes de séparation fondées sur les moments d'ordre supérieur", Traitement du Signal, Vol. 7, No. 5, December 1990, pp. 407-418.
[24] L. Fety, Méthodes de traitement d'antenne adaptées aux radiocommunications, Doctorate Thesis, ENST, 1988.
[25] P. Flandrin and O. Michel, "Chaotic signal analysis and higher order statistics", Conf. EUSIPCO, Brussels, 24-27 August 1992, pp. 179-182.
[26] P. Forster, "Bearing estimation in the bispectrum domain", IEEE Trans. Signal Processing, Vol. 39, No. 9, September 1991, pp. 1994-2006.
[27] J.C. Fort, "Stability of the JH sources separation algorithm", Traitement du Signal, Vol. 8, No. 1, 1991, pp. 35-42.
[28] B. Friedlander and B. Porat, "Asymptotically optimal estimation of MA and ARMA parameters of non-Gaussian processes from high-order moments", IEEE Trans. Automat. Control, Vol. 35, January 1990, pp. 27-35.
[29] M. Gaeta, Les Statistiques d'Ordre Supérieur Appliquées à la Séparation de Sources, Ph.D. Thesis, Université de Grenoble, July 1991.
[30] M. Gaeta and J.L. Lacoume, "Source separation without a priori knowledge: The maximum likelihood solution", in: Torres, Masgrau and Lagunas, eds., Proc. EUSIPCO Conf., Barcelona, Elsevier, Amsterdam, 1990, pp. 621-624.
[31] E. Gassiat, Déconvolution Aveugle, Ph.D. Thesis, Université de Paris Sud, January 1988.
[32] G. Giannakis and S. Shamsunder, "Modeling of non-Gaussian array data using cumulants: DOA estimation with less sensors than sources", Proc. 25th Ann. Conf. on Information Sciences and Systems, Baltimore, 20-22 March 1991, pp. 600-606.
[33] G. Giannakis and A. Swami, "On estimating noncausal nonminimum phase ARMA models of non-Gaussian processes", IEEE Trans. Acoust. Speech Signal Process., Vol. 38, No. 3, March 1990, pp. 478-495.
[34] G. Giannakis and M.K. Tsatsanis, "A unifying maximum-likelihood view of cumulant and polyspectral measures for non-Gaussian signal classification and estimation", IEEE Trans. Inform. Theory, Vol. 38, No. 2, March 1992, pp. 386-406.
[35] G. Giannakis, Y. Inouye and J.M. Mendel, "Cumulant-based identification of multichannel moving average models", IEEE Trans. Automat. Control, Vol. 34, July 1989, pp. 783-787.
[36] Y. Inouye and T. Matsui, "Cumulant based parameter estimation of linear systems", Proc. Workshop Higher-Order Spectral Analysis, Vail, Colorado, June 1989, pp. 180-185.
[37] C. Jutten and J. Herault, "Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture", Signal Processing, Vol. 24, No. 1, July 1991, pp. 1-10.
[38] A.M. Kagan, Y.V. Linnik and C.R. Rao, Characterization Problems in Mathematical Statistics, Wiley, New York, 1973.
[39] M. Kendall and A. Stuart, The Advanced Theory of Statistics, Vol. 1, New York, 1977.
[40] S. Kotz and N.L. Johnson, Encyclopedia of Statistical Sciences, Wiley, New York, 1982.
[41] J.L. Lacoume, "Tensor-based approach of random variables", Talk given at ENST Paris, 7 February 1991.
[42] J.L. Lacoume and P. Ruiz, "Extraction of independent components from correlated inputs, A solution based on cumulants", Proc. Workshop Higher-Order Spectral Analysis, Vail, Colorado, June 1989, pp. 146-151.
[43] P. McCullagh, "Tensor notation and cumulants", Biometrika, Vol. 71, 1984, pp. 461-476.
[44] P. McCullagh, Tensor Methods in Statistics, Chapman and Hall, London, 1987.
[45] C.L. Nikias, "ARMA bispectrum approach to nonminimum phase system identification", IEEE Trans. Acoust. Speech Signal Process., Vol. 36, April 1988, pp. 513-524.
[46] C.L. Nikias and M.R. Raghuveer, "Bispectrum estimation: A digital signal processing framework", Proc. IEEE, Vol. 75, No. 7, July 1987, pp. 869-891.
[47] B. Porat and B. Friedlander, "Direction finding algorithms based on higher-order statistics", IEEE Trans. Signal Processing, Vol. 39, No. 9, September 1991, pp. 2016-2024.
[48] J.G. Proakis and C.L. Nikias, "Blind equalization", in: Adaptive Signal Processing, Proc. SPIE, Vol. 1565, 1991, pp. 76-85.
[49] C.R. Rao, Linear Statistical Inference and its Applications, Wiley, New York, 1973.
[50] P.A. Regalia and P. Loubaton, "Rational subspace estimation using adaptive lossless filters", IEEE Trans. Acoust. Speech Signal Process., July 1991, submitted.
[51] G. Scarano, "Cumulant series expansion of hybrid nonlinear moments of complex random variables", IEEE Trans. Signal Processing, Vol. 39, No. 4, April 1991, pp. 1001-1003.
[52] O. Shalvi and E. Weinstein, "New criteria for blind deconvolution of nonminimum phase systems", IEEE Trans. Inform. Theory, Vol. 36, No. 2, March 1990, pp. 312-321.
[53] S. Shamsunder and G.B. Giannakis, "Modeling of non-Gaussian array data using cumulants: DOA estimation of more sources with less sensors", Signal Processing, Vol. 30, No. 3, February 1993, pp. 279-297.
[54] J.E. Shore and R.W. Johnson, "Axiomatic derivation of the principle of maximum entropy", IEEE Trans. Inform. Theory, Vol. 26, January 1980, pp. 26-37.
[55] A. Souloumiac and J.F. Cardoso, "Comparaisons de Méthodes de Séparation de Sources", XIIIth Coll. GRETSI, September 1991, pp. 661-664.
[56] A. Swami and J.M. Mendel, "ARMA systems excited by non-Gaussian processes are not always identifiable", IEEE Trans. Automat. Control, Vol. 34, No. 5, May 1989, pp. 572-573.
[57] A. Swami and J.M. Mendel, "ARMA parameter estimation using only output cumulants", IEEE Trans. Acoust. Speech Signal Process., Vol. 38, July 1990, pp. 1257-1265.
[58] A. Swami, G.B. Giannakis and S. Shamsunder, "A unified approach to modeling multichannel ARMA processes using cumulants", IEEE Trans. Signal Process., submitted.
[59] L. Swindlehurst and T. Kailath, "Detection and estimation using the third moment matrix", Internat. Conf. Acoust. Speech Signal Process., Glasgow, 1989, pp. 2325-2328.
[60] L. Tong, V.C. Soon and R. Liu, "AMUSE: A new blind identification algorithm", Proc. Internat. Conf. Acoust. Speech Signal Process., 1990, pp. 1783-1787.
[61] L. Tong et al., "A necessary and sufficient condition of blind identification", Internat. Signal Processing Workshop on High-Order Statistics, Chamrousse, France, 10-12 July 1991, pp. 261-264.
[62] J.K. Tugnait, "Identification of linear stochastic systems via second and fourth order cumulant matching", IEEE Trans. Inform. Theory, Vol. 33, May 1987, pp. 393-407.
[63] J.K. Tugnait, "Approaches to FIR system identification with noise data using higher-order statistics", Proc. Workshop Higher-Order Spectral Analysis, Vail, June 1989, pp. 13-18.
[64] J.K. Tugnait, "Comments on new criteria for blind deconvolution of nonminimum phase systems", IEEE Trans. Inform. Theory, Vol. 38, No. 1, January 1992, pp. 210-213.
[65] D.L. Wallace, "Asymptotic approximations to distributions", Ann. Math. Statist., Vol. 29, 1958, pp. 635-654.