

Exact Identification of an All-Pole System From Its Response to a Periodic Input
C.S. Ramalingam
Department of Electrical Engineering
Indian Institute of Technology Madras
Chennai 600 036
csr@iitm.ac.in
Abstract

We show that it is possible to identify exactly an all-pole system from its response to a periodic input. The problem is motivated by speech analysis, where voiced speech is modeled as the response of an all-pole filter to an impulse train. The autocorrelation method of linear predictive (LP) analysis estimates the inverse filter using a criterion that is equivalent to maximizing the residual signal's spectral flatness measure. This is not a satisfying criterion; a better choice is to require that the spectral envelope be flat, but this results in a nonlinear problem. However, if we constrain the spectrum to be constant at a discrete set of points, namely at multiples of the fundamental frequency, the problem not only becomes linear but also yields the exact inverse filter under certain conditions. The framework is general in that it can handle any periodic excitation, but it requires knowledge of the input spectrum at multiples of the fundamental frequency. We illustrate the effectiveness of our method using synthetic examples. The proposed method is sensitive to the starting point of the analysis window.

1. Introduction

A common approach for modeling short segments of voiced speech is to view them as the output of an all-pole filter excited by an impulse train [1]. In this simple but effective model, the filter models the vocal tract, and our goal is to estimate its coefficients from a short segment of the output. Fig. 1 shows the spectrum of a filter and of its output when excited by an impulse train with f0 = 100 Hz. Clearly, the spectral envelope of the output closely matches the filter response, except in the valley region. On the other hand, Fig. 2 shows what happens when f0 is increased to 1000 Hz. In this case, the harmonic peaks sample the filter's response only sparsely, and the filter's spectrum does not at all appear to be the envelope of the output signal. We will build on this example later.

Figure 1: Spectrum of an all-pole filter and its output when excited by an impulse train with f0 = 100 Hz. The envelope of the output and the filter spectrum match closely (except in the valley region).

Figure 2: The output spectrum when the impulse train's f0 is increased to 1000 Hz. The harmonic peaks sample the filter spectrum sparsely, and there seems to be a gross mismatch between the output's spectral envelope and the filter response.
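The sparse sampling seen in Fig. 2 is easy to quantify: below the 4 kHz Nyquist frequency there are 39 pitch harmonics when f0 = 100 Hz, but only three when f0 = 1000 Hz. A short sketch (our own illustration, not from the paper; the coefficient signs are restored from the resonances and bandwidths stated in Section 3):

```python
import numpy as np
from scipy.signal import freqz

# Filter of Figs. 1 and 2 (coefficient signs restored from the
# resonances/bandwidths stated in Section 3); fs = 8 kHz
a = np.array([1.0, -4.1780, 8.2209, -9.8011, 7.5151, -3.5005, 0.7717])
fs = 8000

for f0 in (100, 1000):
    harm = np.arange(f0, fs / 2, f0)          # pitch harmonics below Nyquist
    _, H = freqz([1.0], a, worN=harm, fs=fs)  # filter response at the harmonics
    print(f0, len(harm), np.round(20 * np.log10(np.abs(H)).max(), 1))
```

At f0 = 1000 Hz the output spectrum reveals the filter response at only three frequencies, which is why its envelope tells us so little about 1/A(z).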

The most popular method of estimating the model coefficients is based on the principles of linear predictive coding (LPC) [2, 3]. In many practical applications the autocorrelation method [1] is used because it always yields a stable filter. It is well known that the LPC approach suffers from drawbacks; e.g., for voiced speech segments the LP filter's spectral peaks are biased towards the pitch harmonics, a drawback that is inherent in the error criterion [2, 4]. Fig. 3 shows this bias very clearly: the LPC spectrum there was obtained by applying the autocorrelation method to the example of Fig. 2 (see Section 3 for more details).
It has been shown that the autocorrelation method is equivalent to maximizing the spectral flatness measure of the output of the inverse filter [5, 6]. If the excitation of the all-pole filter is an impulse train and we filter the output by the corresponding exact inverse filter, the resulting residual signal is the original impulse train. For this residual signal, what is flat is the spectral envelope, rather than the spectrum itself. Therefore, a more appropriate criterion would be to require that the spectral envelope of the residual signal be flat, rather than requiring this of the spectrum. However, constraining the envelope is a very difficult problem to formulate: while it may be easy to visualize the envelope of a spectrum, it is very difficult to translate this mental picture into concrete mathematical terms. Even if we could come up with a mathematical formulation, it would most likely result in a nonlinear problem that is difficult to solve.
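The distinction is easy to see numerically. A minimal sketch (our own, using the usual geometric-to-arithmetic-mean definition of spectral flatness from [5]): a single impulse has a perfectly flat spectrum, while a periodic impulse train — the ideal residual — has a comb spectrum whose flatness measure is essentially zero, even though its envelope is flat.

```python
import numpy as np

def sfm(x, eps=1e-12):
    """Spectral flatness measure: ratio of geometric to arithmetic
    mean of the power spectrum (1 = perfectly flat)."""
    P = np.abs(np.fft.rfft(x)) ** 2 + eps
    return np.exp(np.mean(np.log(P))) / np.mean(P)

N, P = 240, 8
delta = np.zeros(N); delta[0] = 1.0    # single impulse: flat spectrum
train = np.zeros(N); train[::P] = 1.0  # impulse train: comb spectrum

print(round(sfm(delta), 3), round(sfm(train), 3))  # → 1.0 0.0
```

Maximizing spectral flatness therefore rewards the wrong target when the excitation is periodic.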
Instead, we propose to constrain the residual signal's spectrum to be constant at a discrete set of points. An intuitively appealing choice of frequencies is kω₀, where ω₀ is the fundamental frequency of the periodic input. In the next section we show that seeking the inverse filter that minimizes the norm of the residual, subject to the constraint that its spectrum be constant at kω₀, leads to a linear minimization problem. In Section 3 we show through simulation examples that the solution gives the exact inverse filter. Constraining the residual spectrum to be constant at multiples of the pitch frequency is a special case of the general approach of constraining it to be G(e^{jkω₀}), where G(e^{jω}) is the spectrum of the input, which is assumed to be known at ω = kω₀.
The effectiveness of placing constraints at a discrete
set of frequencies, thereby converting a nonlinear problem into a linear one and yet obtaining an effective solution, has been demonstrated in a different context in
[7]. In that work, given a non-positive sequence, i.e., one
whose Fourier transform was not strictly non-negative,
the goal was to find a positive sequence that was closest
in the mean-square sense. The solution proposed in [8]
was nonlinear. Virtually the same solution was arrived at
by iteratively correcting the spectrum at the most negative
points, which resulted in a linear minimization problem
at each step [7].

2. Proposed Method
Let x[n], n = 0, 1, ..., N−1, be the output of a p-th order all-pole filter 1/A(z) = 1/(1 + a₁z⁻¹ + ··· + a_p z⁻ᵖ) in response to the periodic input g[n]. To motivate our idea, let the input be a periodic impulse train, although our framework is applicable to any periodic input. We assume that we know G(e^{jω}). Let B(z) = Σ_{k=0}^{p} b_k z⁻ᵏ be the inverse filter, which, when excited by x[n], produces the residual e[n]. Over the interval n = p, ..., N−1 this can be expressed in matrix form as follows:

    e = Xb                                         (1)
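In code, the matrix form (1) is just FIR filtering by B(z) with the first p start-up samples discarded; a quick check (a sketch of our own, with random data standing in for x[n] and b):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
N, p = 32, 6
x = rng.standard_normal(N)
b = rng.standard_normal(p + 1)

# Row n of X holds [x[n], x[n-1], ..., x[n-p]] for n = p, ..., N-1
X = np.column_stack([x[p - k : N - k] for k in range(p + 1)])
e = X @ b

# The same residual via direct FIR filtering with B(z),
# discarding the first p samples where the filter is still filling
e_ref = lfilter(b, [1.0], x)[p:]
print(np.allclose(e, e_ref))  # → True
```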

where the (N−p) × (p+1) matrix X is

    X = [ x[p]     x[p−1]   ···  x[0]
          x[p+1]   x[p]     ···  x[1]
          ⋮        ⋮             ⋮
          x[N−1]   x[N−2]   ···  x[N−p−1] ]

and b = (b₀ b₁ ... b_p)ᵀ. Note that b₀ is not constrained to be unity. We seek the b that minimizes E = ‖e‖₂² = eᵀe, subject to the constraint that the residual spectrum be constant at kω₀, k = 0, 1, ..., L. That is,

    min eᵀe   subject to   Wᵀe = d                 (2)

where

    W = [ 1   1              ···  1
          1   e^{jω₀}        ···  e^{jLω₀}
          1   e^{j2ω₀}       ···  e^{j2Lω₀}
          ⋮   ⋮                   ⋮
          1   e^{j(M−1)ω₀}   ···  e^{j(M−1)Lω₀} ]

and M = N − p. For the examples in Section 3, L was chosen such that Lω₀ < 2π ≤ (L+1)ω₀. The constant value of the spectrum at kω₀ has been set to 1 without loss of generality, and hence d = [1 1 ··· 1]ᵀ ≜ 1. For a general periodic input g[n], d = [G(e^{j0}) G(e^{jω₀}) G(e^{j2ω₀}) ... G(e^{jLω₀})]ᵀ. In particular, for the impulse train, G(e^{jkω₀}) is a real-valued constant.
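As a concrete sketch (our own illustration, not the authors' code), X, W, and d can be assembled for the impulse-train example of Section 3 and problem (2) solved using the closed-form solution derived below. The restored coefficient signs, the window offset that aligns the residual impulses, and the pseudo-inverse regularization are our assumptions:

```python
import numpy as np
from scipy.signal import lfilter

# All-pole filter of Section 3 (coefficient signs restored from the
# stated resonances/bandwidths); fs = 8 kHz, f0 = 1000 Hz, p = 6, L = 7
a_true = np.array([1.0, -4.1780, 8.2209, -9.8011, 7.5151, -3.5005, 0.7717])
fs, f0, p, L, N = 8000, 1000, 6, 7, 240

P = fs // f0                            # pitch period: 8 samples
g = np.zeros(600); g[::P] = 1.0         # impulse-train excitation
x_full = lfilter([1.0], a_true, g)

# The method is sensitive to the window start (see Fig. 4); this offset
# aligns the residual impulses with m = 0 mod P, making d = 1 exact
s = (-p) % P
x = x_full[s:s + N]

M = N - p
X = np.column_stack([x[p - k : N - k] for k in range(p + 1)])   # (M, p+1)
w0 = 2 * np.pi / P
W = np.exp(1j * w0 * np.outer(np.arange(M), np.arange(L + 1)))  # (M, L+1)
d = np.ones(L + 1)                      # unity constraint for impulse train

# b = R^{-1} C (C^T R^{-1} C)^{-1} d; pinv guards against the rank
# deficiency of C^T R^{-1} C when L + 1 > p + 1
R, C = X.T @ X, X.T @ W
RinvC = np.linalg.solve(R, C)
lam = np.linalg.pinv(C.T @ RinvC, rcond=1e-8) @ d
b = RinvC @ lam

b_hat = (b / b[0]).real                 # normalize b0 to unity
print(np.round(b_hat, 4))
```

With the window aligned as above, `b_hat` recovers the true coefficients (up to numerical precision), matching the exact identification shown in Fig. 3.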
The solution to the constrained minimization of (2) is well known and is obtained using the method of Lagrange multipliers [9]. We begin by replacing e with Xb, to get

    E = bᵀRb − 2λᵀ(Cᵀb − d)                        (3)

where R = XᵀX and C = XᵀW. Differentiating E with respect to b, we get

    ∂E/∂b = 2Rb − 2Cλ                              (4)

Setting the above to zero yields b = R⁻¹Cλ. The parameter λ is obtained from the constraint Cᵀb = d,

leading us to the inverse filter that yields the minimum error:

    b = R⁻¹C(CᵀR⁻¹C)⁻¹d                            (5)

Note that the corresponding B(z) is not guaranteed to be minimum phase.

The solution to the problem of minimizing a quadratic form with linear constraints has been known for a long time. What is novel in this work is formulating our system identification problem within this framework and demonstrating, through simulation examples, exact identification.

3. Simulation Results

Consider an all-pole filter with coefficients [1, −4.1780, 8.2209, −9.8011, 7.5151, −3.5005, 0.7717]ᵀ. This is the same filter used in Figs. 1 and 2. The corresponding resonant frequencies and bandwidths (in Hz) are 350 (90), 750 (110), and 1500 (130). The excitation is an impulse train with f0 = 1000 Hz, and the sampling frequency is fs = 8 kHz. These parameters have been chosen such that the formants are located well away from the pitch harmonics. In particular, note that there are two formants between DC and the fundamental.

The result of applying (5) to a 30 ms segment of the above filter's output with p = 6 and L = 7 is shown in Fig. 3, which shows the magnitude spectra of 1/A(z) and 1/B(z), along with the result of LPC analysis (autocorrelation method, sixth order). Equation (5) is not guaranteed to give a solution with b₀ = 1, but after normalization it yields the exact answer. On the other hand, the LPC spectrum is severely biased towards the pitch harmonics. Although not shown here, it can easily be verified that the LPC method produces very good results when f0 = 100 Hz.

Figure 3: Frequency response of the original and estimated filters when the input is an impulse train. They match exactly. The estimated filter's spectrum was computed after normalizing b₀ to unity. f0 = 1000 Hz, p = 6, L = 7. Also shown is the sixth-order LPC filter's response (autocorrelation method). Its bias towards the pitch harmonics is clearly seen.

The performance of the proposed method comes at a price, namely, it is very sensitive to where the analysis window is located. In Fig. 4 the result of applying (5) to a different segment is shown. It is clear that for this analysis window position, the new method gives an estimate that is significantly worse than the exact answer obtained previously. In sharp contrast, the autocorrelation method of LPC analysis is far less sensitive to the analysis position. In [11] we have modified our method such that it is no longer sensitive to the analysis window location.

Figure 4: The proposed method gives an estimate that is very sensitive to the analysis window position. The result for a different position is shown above, which is significantly worse than the exact answer obtained previously. On the other hand, the LPC result has hardly changed.

An intuitively appealing criterion for locating the analysis window is to choose the position that gives the minimum ‖e‖. That is, for a window beginning at m, if we denote the corresponding error vector by e_m, the optimal choice is given by min_m ‖e_m‖. In the examples that we tried, this proved to be effective.

We present one more example, in which the excitation is a synthetic glottal pulse train rather than impulses. The chosen resonant frequencies and pitch are closer to a natural speech example. The true coefficients are [1, −2.0535, 2.4818, −2.1442, 2.2336, −1.6961, 0.7717]ᵀ, corresponding to the following resonances and bandwidths (in Hz): 750 (90), 1100 (110), and 2550 (130). One period of the excitation pulse was constructed as follows [10]: the first ten samples were zero; the next 13 samples were generated using sin(πn/26), 0 ≤ n ≤ 12; the last four samples were generated using sin(πn/8 + π/8), 1 ≤ n ≤ 4. These three sequences were concatenated to get one period (27 samples), giving an f0 of 296.3 Hz. The complex spectrum of a 30 ms segment of the input was evaluated at kω₀ to obtain d. The results for p = 6 and L = 26 are given in Fig. 5 for the window location that minimized ‖e‖. As in the previous example, the proposed method gives virtually exact results.

Figure 5: Exact results are also obtained when the excitation is a periodic pulse. f0 = 296.3 Hz, p = 6, L = 26.

When the excitation is not an impulse train, if we set d = 1 instead of using G(e^{jkω₀}) as the constraint values, then in the light of the above simulation results we cannot hope to get the exact estimate. For the window location that gave the exact answer when G(e^{jkω₀}) was used, the result of using d = 1 in the second example is given in Fig. 6. Also shown in that figure is the estimate that minimizes ‖e_m‖, which occurs at a different analysis window location.

Figure 6: When the excitation is not an impulse train and yet we constrain d = 1, the performance of the proposed method suffers. "Exact answer window location": the analysis window is in the same position as in Fig. 5, i.e., the location that will yield the exact answer if the correct constraint is used. "Minimum norm window location": the window position that yields min ‖e_m‖ with d = 1.

4. Discussion

In both the examples given in the previous section and the others that we have tried, the solution of (5) gives an answer that is real-valued, even though we did not impose any such constraint.

Our method requires knowledge of G(e^{jω}), which is almost never known if we want to apply our method to a segment of natural speech. In such cases we need to solve the problem in a blind manner, i.e., by assuming that G(e^{jω}) is not known. This is currently being investigated.

In Fig. 6 we have given an example of how the performance suffers if we use the constraint d = 1 even when the input is not an impulse train. Both LPC and our method estimate the first two formants reasonably well, although for this example our method gives smaller bandwidth estimates. Both methods failed to capture the third formant at 2550 Hz. We have not yet carefully studied or attempted to quantify how badly the performance suffers in such cases.

As mentioned in Section 3, in [11] we have addressed the method's sensitivity to the starting point of the analysis window. In that work, we have also investigated the effects of analysis window size, model order, choice of constraint frequencies, and errors in ω₀, and analyzed natural speech in cases where the corresponding electroglottograph signal was available.

5. Conclusion

We have proposed a method of identifying exactly an all-pole system from its response to a periodic input. We need knowledge of the input spectrum at kω₀, where ω₀ is the fundamental frequency. We obtained the inverse filter by minimizing the norm of the residual subject to the constraint that its spectrum be equal to that of the input at kω₀. The method, as presented in this paper, is sensitive to the analysis window location, and based on the examples that we have tried, the criterion of choosing the position that minimizes ‖e_m‖ seems to be effective.

6. Acknowledgment

The author wishes to thank Prof. S. Umesh of IIT Kanpur for his comments on the paper.

7. References

[1] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals. Englewood Cliffs, NJ: Prentice-Hall, 1978.

[2] J. I. Makhoul, "Linear prediction: A tutorial review," Proceedings of the IEEE, vol. 63, pp. 561-580, Apr. 1975.

[3] B. S. Atal and S. L. Hanauer, "Speech analysis and synthesis by linear prediction of the speech wave," Journal of the Acoustical Society of America, vol. 50, no. 2, pp. 637-655, Aug. 1971.

[4] A. El-Jaroudi and J. Makhoul, "Discrete all-pole modeling," IEEE Transactions on Signal Processing, vol. 39, no. 2, pp. 411-422, Feb. 1991.

[5] A. H. Gray, Jr. and J. D. Markel, "A spectral-flatness measure for studying the autocorrelation method of linear prediction of speech analysis," IEEE Trans. Acoust., Speech, Signal Processing, vol. 22, no. 3, pp. 207-217, Jun. 1974.

[6] J. D. Markel and A. H. Gray, Jr., Linear Prediction of Speech. New York, NY: Springer-Verlag, 1976.

[7] C. S. Ramalingam and R. J. Vaccaro, "A simplified computational algorithm to obtain sequences with non-negative Fourier transforms," IEEE Transactions on Signal Processing, vol. 39, pp. 1459-1462, Jun. 1991.

[8] J. Cadzow and Y. Sun, "Sequences with positive semidefinite Fourier transforms," IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, no. 6, pp. 1502-1510, 1986.

[9] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods. Nashua, NH: Athena Scientific, 1996.

[10] P. Hedelin, "High quality glottal LPC-vocoding," in Proceedings of IEEE ICASSP '86, Tokyo, Japan, Apr. 1986, pp. 465-468.

[11] C. S. Ramalingam and B. H. Sri Hari, "A constrained least-squares method for all-pole modeling of speech," in preparation.
