
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 45, NO. 1, FEBRUARY 1998

A Recurrent Neural-Network-Based Real-Time Learning Control Strategy Applying to Nonlinear Systems with Unknown Dynamics
Tommy W. S. Chow and Yong Fang
Abstract: In this paper, we present a real-time learning control scheme for unknown nonlinear dynamical systems using recurrent neural networks (RNNs). Two RNNs, based on the same network architecture, are utilized in the learning control system: one is used to approximate the nonlinear system, and the other is used to mimic the desired system response output. The learning rule is achieved by combining the two RNNs to form the neural network control system. A generalized real-time iterative learning algorithm is developed and used to train the RNNs. The algorithm is derived by means of two-dimensional (2-D) system theory, which differs from the conventional algorithms that employ steepest-descent optimization to minimize a cost function. This paper shows that an RNN using the real-time iterative learning algorithm can track any trajectory to a very high degree of accuracy. The proposed learning control scheme is applied to numerical problems, and simulation results are included. The results are very promising and suggest that the 2-D-system-theory-based RNN learning algorithm provides a new dimension in real-time neural control systems.

Index Terms: Learning control, nonlinear systems, recurrent neural networks, 2-D system theory.

I. INTRODUCTION

DUE to increasingly complex dynamical systems, more system designers have turned away from conventional control methods to intelligence-based control methods. Ku [1] pointed out that the fundamental shortcomings of current adaptive control techniques, such as nonlinear control laws that are difficult to derive, geometrically increasing complexity with the number of unknown parameters, and general unsuitability for real-time applications, have compelled researchers to look for solutions elsewhere. Recently, much success has been achieved in the use of neural networks (NNs) for the control of nonlinear dynamic systems [2]-[6], and NNs have emerged as a successful tool in the field of dynamical control systems. In this paper, we exploit a real-time iterative learning control of unknown nonlinear systems using NNs.

There are two major classes of NNs that have become enormously important in recent years, namely, feedforward neural networks (FNNs) and recurrent neural networks (RNNs). It is known that a three-layer FNN is capable of approximating arbitrarily closely not only any continuous map, but also
Manuscript received September 23, 1996; revised May 6, 1997. The authors are with the Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong. Publisher Item Identifier S 0278-0046(98)00402-X.

the derivatives of such maps. However, the FNN realizes only a static mapping; without the aid of tapped delays, it is unable to represent a dynamic system mapping. Although the FNN is commonly used together with tapped delays to deal with dynamical problems, it requires a large number of neurons to represent dynamical responses in the time domain. In addition, the tapped-delay representation is limited to a finite number of previously measured outputs or imposed inputs. On the other hand, the RNN consists of both feedforward and feedback connections between layers and neurons, forming complicated dynamics, and is able to deal with time-varying inputs or outputs through its own natural temporal operation. Thus, the RNN is a dynamic mapping and is more appropriate than the FNN when applied to dynamical systems.

Recently, there have been some significant results on the approximation capability of RNNs. Funahashi and Nakamura [7] proved that any finite-time trajectory of a given n-dimensional dynamical system can be approximately realized by the internal states of the output units of a continuous-time RNN when appropriate network topologies together with appropriate initial conditions are used. Jin et al. [8] showed that some of the states of a class of discrete-time RNNs described by a set of difference equations may be used to approximate uniformly a state-space trajectory produced by either a discrete-time nonlinear system or a continuous function on a closed discrete-time interval. However, this approximation process has to be carried out by an adaptive learning process.

When RNNs are used to approximate and control an unknown nonlinear system through an on-line learning process, they may be considered as subsystems of an adaptive control system. The weights of the networks need to be updated by a dynamical learning algorithm during the control process. In the training of RNNs, most algorithms were developed in accordance with particular problems. The learning algorithms adjust the weights of the network so that the trained network exhibits the required properties. The most general algorithms used for RNNs are the gradient-descent learning algorithms. By evaluating the gradient of a cost function with respect to the weights of the network, it is possible to iteratively adjust the weights in the direction opposite to the gradient. These algorithms include the backpropagation (BP) through time, recurrent BP, dynamic BP, and real-time recurrent learning (RTRL) algorithms. Recently, Olurotimi [9] presented a systematic


method for the training of the RNN. The method depends upon an exact transformation to reveal an embedded feedforward structure in every RNN. Catfolis [10] proposed an improved implementation of the RTRL algorithm to enhance its performance during the training phase. It is worth noting that a decoupled extended Kalman filter algorithm for the training of RNNs, with special emphasis on application to control problems, was recently developed by Puskorius and Feldkamp [11]. This training algorithm is more effective than simple gradient-descent algorithms.

It is well known that there are feedback paths, or recurrent links, in RNNs; i.e., the current output depends upon the past outputs of the network. Furthermore, the level of error depends not only on the current parameter set, but also on the past parameter sets. Obviously, it is necessary to consider these dependencies in the learning schemes. If the gradient method is used, the calculated gradient is called the gradient amidst dynamics [6]. Methods to calculate the gradient amidst dynamics include the sensitivity methods, such as recurrent BP and dynamic BP. The gradient is determined under the assumption that the weights do not vary with time. However, this assumption no longer holds when the weights are adjusted during the learning process, so the true gradient is not calculated. As a result, convergence cannot always be assured [6]. Inevitably, we have to face the fact that the weights are time varying in the learning scheme. One of the intents of this paper is to develop an RNN learning scheme that deals with time-varying weights.

In this study, we exploit two-dimensional (2-D) system theory, which itself has a wide range of applications in digital filtering, digital image processing, and many other areas [12]-[15]. As there are two independent dynamics in a 2-D system, we can use one of them to reflect the RNN dynamics in the time domain and the other to reflect the iterative learning process. A 2-D nonlinear system mathematical model is used to describe the dynamics of the RNN and the dynamics of the learning process. The error dynamic equation is expressed in the form of a 2-D Roesser model with time-varying coefficients [18]. Based on 2-D system theory, we have successfully derived a real-time iterative learning algorithm for trajectory tracking. Consequently, we are able to deal with the learning control of nonlinear systems with unknown dynamics. In this learning control scheme, two RNNs are utilized: one is used to approximate the nonlinear system response behavior, and the other is used to mimic the desired system response output.

This paper is organized as follows. The model of a discrete-time RNN is given and the problem of iterative learning with an RNN is described in Section II. In Section III, we present the 2-D representation of the learning process. In Section IV, a real-time iterative learning algorithm is derived for trajectory tracking based on 2-D system theory; a convergence analysis of the learning process is also given. The main results of this paper are detailed in Section V. Finally, a conclusion is drawn in Section VI.

II. NETWORK DYNAMICS AND LEARNING IMPLEMENTATION

There are two types of RNNs: continuous-time RNNs and discrete-time ones. In this paper, our study is mainly of the latter. Let us consider a class of RNNs called the real-time recurrent network (RTRN). The network consists of a total of N neurons with M external input connections. Let u(t) denote the M-by-1 external input vector applied to the network at discrete time t, and let y(t) denote the corresponding N-by-1 vector of individual neuron outputs produced by a one-time-step delay. The input vector u(t) and the one-step-delayed output vector y(t) are concatenated to form the (M+N)-by-1 vector x(t), the j-th element of which is denoted by x_j(t). Let A denote the set of indexes j for which x_j(t) is an external input, and let B denote the set of indexes j for which x_j(t) is the output of a neuron; we thus get

$$x_j(t) = \begin{cases} u_j(t), & j \in A \\ y_j(t), & j \in B. \end{cases} \qquad (2.1)$$

Fig. 1. The architecture of the real-time recurrent neural network.

The network shown in Fig. 1 [16] is fully interconnected, with MN forward connections and N^2 feedback connections, N of which are self-feedback connections. Let W denote the N-by-(M+N) recurrent weight matrix of the network. In order to make provision for a threshold in the operation of each neuron, one of the inputs is always constrained to a value of +1. The net internal activity of neuron i at time t, for i = 1, ..., N, is given by

$$v_i(t) = \sum_{j \in A \cup B} w_{ij}(t)\, x_j(t) \qquad (2.2)$$

where $A \cup B$ is the union of the sets A and B. At the next time step t+1, the output of neuron i is computed by passing v_i(t) through the nonlinearity $\varphi(\cdot)$. Hence, we obtain

$$y_i(t+1) = \varphi\big(v_i(t)\big). \qquad (2.3)$$

The system equations (2.2) and (2.3), where the index i ranges over the set B and where x_j(t) is defined in terms of the external inputs and neuron outputs by (2.1), constitute the entire dynamics of the network. The external input vector u(t) at time t does not affect the output of any neuron until time t+1. The learning process is to match the outputs of certain neurons to desired values at specific time instants.
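To make the network equations concrete, the following is a minimal sketch of one RTRN time step, assuming a logistic sigmoid for $\varphi(\cdot)$ and the +1 bias convention described above; the function name and array layout are choices of this sketch, not of the paper.

```python
import numpy as np

def rtrn_step(W, u, y_prev):
    """One RTRN time step, eqs. (2.1)-(2.3).

    W      : N-by-(M+N) weight matrix at time t.
    u      : M-vector of external inputs; u[0] is clamped to +1 as the threshold.
    y_prev : N-vector of neuron outputs delayed by one time step.
    Returns the N-vector of neuron outputs at time t+1.
    """
    x = np.concatenate([u, y_prev])    # concatenated vector x(t), eq. (2.1)
    v = W @ x                          # net internal activities v_i(t), eq. (2.2)
    return 1.0 / (1.0 + np.exp(-v))    # y_i(t+1) = phi(v_i(t)), eq. (2.3)
```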

Let d_i(t) denote the desired (target) response of neuron i at time t. Let C denote the set of neurons that are chosen to act as visible units providing externally reachable outputs; the remaining neurons are regarded as hidden. A time-varying N-by-1 error vector e(t) is defined as follows:

$$e_i(t) = \begin{cases} d_i(t) - y_i(t), & i \in C \\ 0, & \text{otherwise.} \end{cases} \qquad (2.4)$$

The notation allows for the possibility that values of the desired response are specified for different neurons at different times. Define the instantaneous sum of squared errors at time t as

$$E(t) = \tfrac{1}{2} \sum_{i \in C} e_i^2(t). \qquad (2.5)$$

Then, the cost function is the sum of E(t) over all time t, that is,

$$E_{\text{total}} = \sum_t E(t). \qquad (2.6)$$

Based on the gradient-descent method, Williams and Zipser [17] proposed the RTRL algorithm, in which an instantaneous estimate of the gradient of E(t) with respect to the weight matrix is used. The algorithm is capable of training an RTRN in a real-time fashion, but, like other gradient-descent-type methods, it may suffer from the drawback of slow convergence. In this paper, we use 2-D system theory to derive a new iterative learning algorithm for RTRNs with time-varying weights. The weights of an RTRN can be estimated in a real-time fashion, and the error between the network output and the desired response can meet a very high level of accuracy. The proposed algorithm is then applied to the learning control of nonlinear systems with unknown dynamics.

Firstly, the essential state-space representation of the RTRN should be established. The state-space nonlinear dynamics of the above RTRN are represented by

$$y_i(t+1) = \varphi\Big(\sum_{j \in A \cup B} w_{ij}(t)\, x_j(t)\Big), \qquad i = 1, \ldots, N. \qquad (2.7)$$

We can rewrite the state-space model of the network in the following matrix form:

$$y(t+1) = \Phi\big(W(t)\, x(t)\big) \qquad (2.8)$$

where $\Phi(\cdot)$ applies the nonlinear activation function componentwise, $W(t) = [w_{ij}(t)]$, and $x(t) = [u^T(t),\, y^T(t)]^T$.

The problem of real-time iterative learning considered in this paper may now be formulated as follows. For an RTRN described by (2.8) with initial values $y(0)$, a desired (target) response $d(t)$, and a required tolerance $\varepsilon > 0$, update the weights $W(t)$ step by step, such that the network output follows the desired response:

$$\|d(t) - y(t)\| \le \varepsilon \quad \text{for } t = 1, 2, \ldots \qquad (2.9)$$

or

$$\|e(t)\| \le \varepsilon \qquad (2.10)$$

where $e(t)$ is defined as in (2.4), and $\|\cdot\|$ denotes the norm of a vector or matrix. To simplify the expressions, we first suppose that all neurons are output neurons, namely, $C = \{1, \ldots, N\}$; thus, $e(t) = d(t) - y(t)$. The extension to the general case is straightforward.

III. THE 2-D REPRESENTATION OF THE LEARNING PROCESS

The idea of real-time iterative learning is to record the error between the network outputs and the desired responses, the current network states, and the weights at each k-th execution of the learning algorithm, so as to modify the network weights with the aim of reducing the errors in the next, (k+1)-th, execution of the learning algorithm. After a number of learning iterations, the network at time t should obtain appropriate weights that drive the network outputs to approximate the desired responses at that time step. In order to derive the real-time iterative learning algorithm, we first consider the problem of iterative learning of the RTRN with time-varying weights for finite-time trajectory tracking; that is, we first set a limit $T$ on the time index in (2.9).

Now, we present the 2-D representation of the learning process. In the RTRN, there is a dynamical process described by the evolution of the network variables in terms of the time history. There is also another dynamical process described by the change of those variables over the learning iterations. Therefore, during the learning process, each variable of an RTRN depends upon two independent variables: the discretized time t and the learning iteration k. For example, $y(t, k)$ and $W(t, k)$ represent the network output and weights at the t-th time step of the k-th learning iteration. In this 2-D notation, the RTRN model (2.8) can be rewritten as

$$y(t+1, k) = \Phi\big(W(t, k)\, x(t, k)\big). \qquad (3.1)$$

This is a 2-D dynamical system which clearly describes the 2-D dynamical behavior of iterative learning of the RTRN. The learning error can be expressed as

$$e(t, k) = d(t) - y(t, k) \qquad (3.2)$$

where the desired response $d(t)$ is independent of $k$; the network input $u(t)$ at time $t$ is also independent of $k$. The learning process of updating the weights can be given as

$$W(t, k+1) = W(t, k) + \Delta W(t, k) \qquad (3.3)$$

where $\Delta W(t, k)$ is the learning rule that adjusts the network weights at the k-th iteration of the time step from t to t+1. Fig. 2 shows the iterative learning process in 2-D notation. For every iteration of the learning process, the network executes a cycle. If every cycle begins with the same nonzero initial state, we get

$$y(0, k) = y(0) \quad \text{for } k = 0, 1, 2, \ldots \qquad (3.4)$$
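In computational terms, the 2-D process (3.1)-(3.4) is an iteration loop (index k) wrapped around a time loop (index t). The following schematic makes this explicit, with `delta_W` standing in for the learning rule $\Delta W(t,k)$ of (3.3), which is derived in Section IV, and `rtrn_step` as sketched in Section II; the shared-initial-weight condition appears as (3.5) below.

```python
def iterative_learning(W0, y0, inputs, targets, n_iters, delta_W):
    """Schematic 2-D learning process, eqs. (3.1)-(3.4)."""
    W = [W0 for _ in inputs]   # W(t, k): one weight matrix per time step t
    for k in range(n_iters):   # iteration axis k of the 2-D system
        y = y0                 # same initial state every cycle, eq. (3.4)
        for t, (u, d) in enumerate(zip(inputs, targets)):
            y_next = rtrn_step(W[t], u, y)        # y(t+1, k), eq. (3.1)
            e = d - y_next                        # learning error, eq. (3.2)
            W[t] = W[t] + delta_W(W[t], u, y, e)  # W(t, k+1), eq. (3.3)
            y = y_next                            # advance along the time axis
    return W
```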

Fig. 2. The iterative learning process of the k-th iteration in 2-D notation, from time step t to t+1.

The initial weights are randomized from a uniform distribution:

$$W(t, 0) = W_0 \quad \text{for } t = 0, 1, \ldots, T. \qquad (3.5)$$

Equations (3.4) and (3.5) are the boundary conditions of the 2-D system (3.1)-(3.3). We can derive the learning rule (3.3) according to the 2-D system scheme. It is also clear that, under any initial condition, such a learning rule should be able to reduce the error as the number of learning iterations increases; the error should approach zero as $k \to \infty$. We have the following definition.

Definition 1: A learning rule (3.3) is said to be convergent if

$$e(t, k) \to 0 \quad \text{as } k \to \infty \qquad (3.6)$$

for arbitrary initial boundary conditions (3.4) and (3.5).

IV. LEARNING RULE AND CONVERGENCE ANALYSIS

Our objective is to derive a convergent learning rule. The derivation is in accordance with the error equation, which is expressed in the form of the 2-D Roesser model [18]; we then derive the learning rule in accordance with the definition of convergence. From (3.1)-(3.3), for $t = 0, 1, \ldots, T$, the mean-value theorem gives the error difference equation (4.1), where $\theta(t, k)$ takes a value between $W(t, k)\,x(t, k)$ and $W(t, k+1)\,x(t, k+1)$. With the definition introduced in (4.2), this becomes (4.3). Obviously, if $x(t, k) = 0$, the network outputs are zero for arbitrary weights; in this case, $\Delta W(t, k)$ may be arbitrary. Otherwise, we apply the learning rule given in (4.4).
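Before stating the error equation, it may help to recall the general 2-D Roesser model with variable coefficients [18], whose horizontal axis here corresponds to the time t and whose vertical axis corresponds to the iteration k. The following is the standard textbook form, given as a reference sketch rather than the specific matrices of (4.5):

$$
\begin{bmatrix} x^{h}(t+1,\,k) \\ x^{v}(t,\,k+1) \end{bmatrix}
=
\begin{bmatrix} A_{1}(t,k) & A_{2}(t,k) \\ A_{3}(t,k) & A_{4}(t,k) \end{bmatrix}
\begin{bmatrix} x^{h}(t,\,k) \\ x^{v}(t,\,k) \end{bmatrix}.
$$

The convergence analysis rests on bounding the state-transition matrix associated with this structure, as in Lemma 1 below.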

In (4.4), $D(t, k)$ denotes the diagonal matrix of the activation-function derivatives $\varphi'(\cdot)$ evaluated at $\theta(t, k)$, and $I$ denotes an identity matrix of compatible dimension. Therefore, using (4.2)-(4.4), we obtain the error equation in the form of a 2-D Roesser model, given by (4.5). The boundary conditions of this 2-D system are given by (4.6), for $t = 0, 1, \ldots, T$ and $k = 0, 1, 2, \ldots$.

It is clear that these are boundary values for the 2-D system. Note that this is a 2-D linear system with variable coefficients, because the system matrix depends upon the system variables. According to [18], for the 2-D Roesser model with variable coefficients, the solution of (4.5) and (4.6) is given by (4.7), where the state-transition matrix is defined in (4.8) as follows: 1) it satisfies the standard 2-D transition recursion for nonnegative index pairs; 2) it equals the identity matrix at the origin; and 3) it vanishes for negative indexes.

On the state-transition matrix, we have developed the following new results for the convergence analysis of the proposed learning rule.

Lemma 1: Suppose that the coefficient matrices of (4.5) are finite in norm; then the state-transition matrix satisfies the bound (4.9), where $\|\cdot\|$ denotes a matrix norm.

Based on (4.8) and mathematical induction, the proof of Lemma 1 is straightforward and, hence, is omitted. Therefore, using Lemma 1, we obtain the error bound (4.10). Because of the boundary conditions, we have the following result.

Lemma 2: For the state vector of the 2-D system (4.5) and (4.6), $e(t, k) \to 0$ as $k \to \infty$ for $t = 1, \ldots, T$ if the contraction condition of (4.11) holds. This lemma implies that $e(t, k)$ converges to 0 as $k \to \infty$ for any $t$. Therefore, we have proved the following theorem.

Theorem 1: The iterative learning rule (4.4), for any given boundary conditions, is convergent if the condition (4.11) is satisfied.

Note that the convergence of the learning algorithm does not depend upon the boundary conditions and, thus, does not depend upon the network's initial values. In the following, the analysis of the convergence condition is considered. Because $\varphi(\cdot)$ is a sigmoid function, its derivative satisfies $\varphi'(\cdot) > 0$; thus, the diagonal matrix $D(t, k)$ has an inverse. Rewriting the convergence condition accordingly, we obtain (4.12). It is very clear that the condition is satisfied when the weight increment is chosen appropriately for $x(t, k) \neq 0$ and $t = 0, 1, \ldots, T$. Hence, the following theorem is obtained.

Theorem 2: The iterative learning rule (4.13) drives the error between the network outputs and the desired responses to zero after one learning iteration.

It is noted that Theorem 2 requires no condition; in other words, the iterative learning rule is always capable of solving the problem, and the learning procedure requires only one step. However, $\varphi'(\theta(t, k))$ is not available, because the value of $\theta(t, k)$ is only known to lie between $W(t, k)\,x(t, k)$ and $W(t, k+1)\,x(t, k+1)$. From the learning rule (4.13) and the network equation (3.1), we can obtain the approximation (4.14). This implies that we are able to approximate the unavailable derivative term using quantities computed at the current iteration.

The above algorithm deals with finite-time trajectory tracking. However, to allow the algorithm to apply to infinite-time trajectory tracking, a modification of the algorithm is necessary. It is noted that the above algorithm can be used for all time steps from 0 to $T$; in particular, we can use the algorithm for a single time step. Within this time step, the learning iteration stops at a specified iteration count or when the specified trajectory-tracking tolerance is met. Subsequently, the learning iteration of the next time step starts. The algorithm starts from the zero time step, and the converged weights are regarded as the initial weights of the next time step. This enables the algorithm to be carried out step by step to infinity. The mechanism of this learning scheme is detailed in Fig. 3. Generally, the weights at time step $t$ are represented as in (4.15), where $K_t$ denotes the number of iterations at the $t$-th time step. Because the real-time algorithm has the same input at each iteration of a given time step (the input consists of the network outputs of the last time step and the new external input at that particular time step), the learning rule can be expressed as in (4.16).

Note that the learning rule (4.16) is also suited to the case in which the network contains hidden neurons; the error term is then defined as in (2.4). It is very clear that the weights of an RTRN are adaptively determined under the real-time algorithm. The algorithm is efficient because the basic iterative learning algorithm is able to track the desired responses with arbitrary accuracy. Compared with Williams and Zipser's real-time recurrent learning, the proposed real-time learning algorithm in (4.16) has a dynamic learning rate.

To demonstrate the performance of the proposed real-time learning algorithm, it is applied to an example of trajectory tracking.

Example 1: The problem of tracking a dynamical system governed by (4.17), where $u(t)$ is the system input and $y_p(t)$ is the system output, was considered. For generating the mapping data, an unknown nonlinear function and an initial value were selected, and the input is represented by a sequence of random numbers between 0 and 1. Because this is a single-input single-output system, an RTRN with a single output neuron, a hidden neuron, and an external input connection is used to track the system. Using the real-time algorithm, the weights of the RTRN are adjusted in real time. The initial weights are randomized between 0 and 0.1, and a sigmoid nonlinear function $\varphi(\cdot)$ is used. The number of iterations at every time step is fixed at two. The absolute error over the first 180 time steps is shown in Fig. 4. Generally, one is able to improve the tracking accuracy by increasing the number of iterations at every time step and by using hidden neurons in the RTRN. In our experiment, although only two iterations per time step and a single hidden neuron were used, the tracking error was already very small, as shown in Fig. 4.
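The step-by-step mechanism of Fig. 3 and (4.15), as used in Example 1, can be sketched as follows. The concrete weight update, a normalized outer-product step whose dynamic learning rate $1/(x^T x)$ is motivated by the approximation (4.14), is an illustrative stand-in for the displayed rule (4.16), which is not reproduced here.

```python
import numpy as np

def real_time_learning(W, u_seq, d_seq, y0, max_iters=2, tol=1e-6):
    """Real-time iterative learning, one time step at a time (cf. Fig. 3).

    At each time step, the same concatenated input x is presented for up to
    max_iters iterations (max_iters >= 1) or until the tolerance tol is met;
    the resulting weights are carried forward to the next step, eq. (4.15).
    """
    y = y0
    outputs = []
    for u, d in zip(u_seq, d_seq):
        x = np.concatenate([u, y])        # last outputs plus the new input
        for _ in range(max_iters):        # iterations at this time step
            y_next = 1.0 / (1.0 + np.exp(-(W @ x)))   # network output, eq. (3.1)
            e = d - y_next                             # tracking error
            if np.linalg.norm(e) <= tol:
                break
            W = W + np.outer(e, x) / (x @ x)  # illustrative stand-in for (4.16)
        y = y_next
        outputs.append(y_next)
    return W, outputs
```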

Fig. 3. Block diagram of the proposed real-time iterative learning algorithm.

Fig. 4. Example 1: the absolute errors between the nonlinear system output and the RTRN output at every time step using the real-time learning algorithm.

Clearly, the real-time algorithm is very effective in handling nonlinear system tracking.

V. CONTROL OF DYNAMICAL SYSTEMS

We have shown that an RTRN with time-varying weights using the real-time iterative learning algorithm can approximate the input-output relationships of nonlinear systems to any desired degree of accuracy. In this section, an RTRN combined with the real-time learning algorithm is used for the development of the real-time learning control of nonlinear systems with unknown dynamics. The idea of the learning control is to utilize two RTRNs, based on the same network architecture: one approximates the nonlinear system responses, and the other mimics the desired system response output. The learning rule is then obtained by contrasting the two RTRNs with the current system output and control input. The detailed learning control scheme is stated as follows.

Consider a general class of unknown multi-input multi-output (MIMO) discrete-time nonlinear systems of the form

$$Y(t+1) = \Phi_s\big(Y(t),\, U(t)\big) \qquad (5.1)$$

where $Y(t)$ is an $n$-dimensional output vector and $U(t)$ is an $m$-dimensional input vector. The mapping $\Phi_s$ is assumed to be unknown and analytic, and its mapping region is assumed to be bounded. The objective is to determine a control input sequence such that the system output tracks the desired output. For the nonlinear system described in (5.1), Fig. 5 details the learning control scheme, in which an initial input is given for every time step. For the first time step, which starts from $t = 0$, the following three steps are iteratively carried out.

Fig. 5. The general structure of the iterative learning control scheme applied to a nonlinear system with unknown dynamics (U*(t) denotes the new control input).

1) Let an RTRN be used to model the nonlinear system. By using the real-time learning algorithm described in Section IV, (5.1) may be approximately represented by the first RTRN as in (5.2), where $n$ denotes the number of output neurons, $q$ denotes the number of hidden neurons, $x$ is defined as in (2.8), and $\varphi$ is a strictly increasing sigmoid function.

2) Based on the same network architecture, the input $U(t)$, and the desired output, the second RTRN is trained using the real-time learning algorithm, giving (5.3).

3) If the weight matrix acting on the control input has a generalized inverse, the new control input $U^*(t)$ can be obtained from (5.4).

When the determined control input is within the specified error, this process stops and moves on to the next time step; otherwise, it returns to step 1). Note that in steps 1) and 2), both RTRNs are trained using the proposed real-time learning algorithm, and only a few training iterations may be required; in the following examples, only one iteration in both steps 1) and 2) is used. Obviously, the control rule (5.4) is able to drive the system output toward the desired output. The following example demonstrates the performance of the control scheme.
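As an illustration of one pass of steps 1)-3), consider the scalar single-neuron case used in the examples below. The normalized updates stand in for the Section IV learning rule, and inverting the logistic activation plays the role of the generalized inverse in (5.4); the helper names and the specific update form are assumptions of this sketch, not the paper's displayed equations.

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def logit(p):
    p = np.clip(p, 1e-6, 1.0 - 1e-6)   # keep the inverse sigmoid well defined
    return np.log(p / (1.0 - p))

def control_time_step(w_sys, w_ref, u, y, d):
    """One pass of steps 1)-3) for a single-neuron RTRN pair (illustrative).

    w_sys : weights [w_u, w_y, w_b] of the RTRN modeling the plant, cf. (5.2).
    w_ref : weights of the RTRN mimicking the desired output, cf. (5.3).
    u, y  : current control input and measured plant output.
    d     : desired output for the next time step.
    """
    x = np.array([u, y, 1.0])            # external input, fed-back output, +1 bias
    # Step 1: one learning iteration of the plant-model RTRN toward the measured output.
    e_sys = y - logistic(w_sys @ x)
    w_sys = w_sys + e_sys * x / (x @ x)  # normalized update, cf. (4.16)
    # Step 2: one learning iteration of the reference RTRN toward the desired output.
    e_ref = d - logistic(w_ref @ x)
    w_ref = w_ref + e_ref * x / (x @ x)
    # Step 3: new control input, cf. (5.4): solve logistic(w_u*u + w_y*y + w_b) = d
    # for u (assumes w_sys[0] != 0; the MIMO case uses a generalized inverse).
    u_new = (logit(d) - w_sys[1] * y - w_sys[2]) / w_sys[0]
    return u_new, w_sys, w_ref
```

If the control input so determined is not within the specified error, the pass is repeated, exactly as in the loop between steps 3) and 1) above.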

Example 2: To test the control performance, an unknown nonlinear system described by a difference equation of the form (5.5) is considered, with an initial value that is a random number between 0 and 1 and a prescribed desired output trajectory. Two RTRNs with only an output neuron and an external input connection are used to model the single-input single-output system and the desired output, using the real-time learning algorithm with one iteration at every time step. A sigmoid nonlinear function is used, and an initial input is given at every time step. The initial weights of the RTRNs at the beginning time step are randomized between 0 and 1, and the subsequent weights are adaptively determined using the real-time learning algorithm. In this nonlinear system control application, the control input at every time step is adaptively determined by means of the weights of the RTRNs and the current input. Figs. 6 and 7 show the improved tracking performance when steps 1)-3) were iterated one time and two times, respectively. As can be seen, the obtained results are promising and demonstrate that the learning control scheme is effective. It should be emphasized that only a single output neuron and an external input, without any hidden neuron, were used in these experiments; this is the simplest architecture of the RTRN. Although there is a larger control error when steps 1)-3) were iterated one time, as shown in Fig. 6, the two curves of the control output and the desired output are almost indistinguishable when steps 1)-3) were iterated two times, as shown in Fig. 7. A higher degree of accuracy can be achieved by performing steps 1)-3) more times. However, the control error is larger for a more demanding desired output. In this case, we are able to change the architecture of the RTRNs to improve the control error: RTRNs with a hidden neuron were used for this desired output, and the resulting curves of the control output and the desired output are shown in Fig. 8. After a few time steps, the two curves are indistinguishable. Obviously, the RNN-based control scheme with the real-time learning algorithm is very effective.

Example 3: In this simulation, the model reference control problem for a nonlinear system [1], which is not a bounded-input bounded-output (BIBO) system, is considered using the proposed real-time learning control scheme. The system model is described by (5.6), and the reference model is given by (5.7). In [1], Ku and Lee pointed out that the system is unstable in the sense that, given a sequence of uniformly bounded controls, the system output may diverge; the output diverges when a sufficiently large step input is applied to the system. If a uniform step input is to be used, the control input must be less than 0.83 to guarantee the stability of the control system, and the reference signal needs to be restricted accordingly. Ku and Lee [1] gave simulation results in which the total numbers of neurons and weights used in the control system are 14 and 67, respectively; however, there are still significant errors after 100 training epochs (one epoch is equal to 50 time steps). In our study, the proposed real-time learning control scheme is applied to this model reference control problem. The total numbers of neurons and weights for the two RTRNs are only 4 and 12, respectively. That is, two RTRNs

Fig. 6. Example 2: the output of the learning control of a nonlinear system using RTRNs [with one iteration of steps 1)-3)].

Fig. 7. Example 2: the output of the learning control of a nonlinear system using RTRNs [with two iterations of steps 1)-3)].

with only an output neuron, a hidden neuron, and an external input connection are applied to model the nonlinear system and the reference model. As in Example 2, only one iteration of the real-time learning algorithm is used in steps 1) and 2) for every time step. Fig. 9 shows the outputs of the reference model and the system when steps 1)-3) were iterated two times. Fig. 10 shows the absolute errors between the outputs of the reference model and the system when steps 1)-3) were iterated four times. Both Figs. 9 and 10 demonstrate that the learning control scheme provides excellent performance.
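The effect of iterating steps 1)-3) several times per time step, as in Figs. 7, 9, and 10, can be reproduced with a simple harness around the sketch above. The plant and desired trajectory are passed in as a callable and a sequence, respectively, because the specific difference equations (5.5)-(5.7) are tied to the examples.

```python
def run_control(plant, d_seq, w_sys, w_ref, u0, y0, passes=2):
    """Iterate steps 1)-3) `passes` times per time step; returns the
    absolute tracking errors (cf. Fig. 10)."""
    u, y = u0, y0
    errors = []
    for d in d_seq:
        for _ in range(passes):    # repeat steps 1)-3) at this time step
            u, w_sys, w_ref = control_time_step(w_sys, w_ref, u, y, d)
        y = plant(y, u)            # apply the new input to the unknown plant
        errors.append(abs(d - y))  # absolute error between desired and actual output
    return errors
```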

VI. CONCLUSION In this paper, a real-time learning control scheme for unknown nonlinear dynamical systems using RNNs has been presented. In this control scheme, two RTRNs that are based on the same network architecture are utilized in the learning control system. One is used to approximate the nonlinear system and the other is used to mimic the desired response for the system control input. The approximation is based

on the real-time iterative learning problem of RTRNs with time-varying weights, which was developed in this paper. A mathematical model using 2-D system theory was derived to describe the dynamics of an RNN together with the dynamics of the learning process. The model, which is a 2-D nonlinear system, can be viewed as a generalized iterative learning of the RTRN with time-varying weights. A 2-D expression of the error dynamics was derived in the form of a 2-D linear system with variable coefficients. Based on this 2-D expression of the error dynamics and on 2-D system theory, a convergent learning rule was derived. From the iterative learning algorithm for finite-time trajectory tracking, a real-time iterative learning algorithm for infinite-time trajectory tracking was then derived. It has been shown that an RTRN with time-varying weights using these algorithms can approximate any trajectory to any desired degree of accuracy. Generally, the newly developed algorithm requires only a few iterations. Considering the network architecture, it was found that hidden neurons are usually not necessary, and that the number of neurons required equals the dimension of the trajectory.

Fig. 8. The output of the learning control of a nonlinear system using RTRNs with a hidden neuron.

Fig. 9. Example 3: the output of the learning control of a non-BIBO nonlinear system using RTRNs [with two iterations of steps 1)-3)].

Fig. 10. Example 3: the absolute error between the output of the reference model and the learning control of a non-BIBO nonlinear system using RTRNs [with four iterations of steps 1)-3)].

Thus, the RTRN with the real-time learning algorithm is better suited for control of dynamical systems. The experimental results strongly demonstrate that the learning control scheme is very effective for nonlinear systems with unknown dynamics. In this paper, we have derived a new methodology for the learning of the RNN. The obtained results are very promising, and the developed 2-D-system-theory-based RNN provides a new dimension in real-time neural control systems.

REFERENCES
[1] C. C. Ku and K. Y. Lee, "Diagonal recurrent neural networks for dynamic systems control," IEEE Trans. Neural Networks, vol. 6, pp. 144-156, Jan. 1995.
[2] A. Karakasoglu, S. I. Sudharsanan, and M. K. Sundareshan, "Identification and decentralized adaptive control using dynamical neural networks with application to robotic manipulators," IEEE Trans. Neural Networks, vol. 4, pp. 919-930, Nov. 1993.
[3] C. C. Ku, K. Y. Lee, and R. M. Edwards, "Improved nuclear reactor temperature control using diagonal recurrent neural networks," IEEE Trans. Nucl. Sci., vol. 39, pp. 2298-2308, Dec. 1992.
[4] L. Jin, P. N. Nikiforuk, and M. M. Gupta, "Dynamic recurrent neural networks for control of unknown nonlinear systems," J. Dyn. Syst., Meas., Contr., vol. 116, no. 4, pp. 567-576, Dec. 1994.
[5] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamic systems using neural networks," IEEE Trans. Neural Networks, vol. 1, pp. 4-27, Jan. 1990.
[6] B. Srinivasan, U. R. Prasad, and N. J. Rao, "Back propagation through adjoints for the identification of nonlinear dynamic systems using recurrent neural models," IEEE Trans. Neural Networks, vol. 5, pp. 213-228, Mar. 1994.
[7] K. I. Funahashi and Y. Nakamura, "Approximation of dynamical systems by continuous time recurrent neural networks," Neural Networks, vol. 6, no. 6, pp. 801-806, Aug. 1993.
[8] L. Jin, P. N. Nikiforuk, and M. M. Gupta, "Approximation of discrete-time state-space trajectories using dynamic recurrent neural networks," IEEE Trans. Automat. Contr., vol. 40, pp. 1266-1270, July 1995.
[9] O. Olurotimi, "Recurrent neural network training with feedforward complexity," IEEE Trans. Neural Networks, vol. 5, pp. 185-197, Mar. 1994.
[10] T. Catfolis, "A method for improving the real-time recurrent learning algorithm," Neural Networks, vol. 6, no. 6, pp. 807-821, Aug. 1993.
[11] B. A. Pearlmutter, "Gradient calculations for dynamic recurrent neural networks: A survey," IEEE Trans. Neural Networks, vol. 6, pp. 1212-1228, Sept. 1995.
[12] M. Morf, B. C. Levy, and S. Y. Kung, "New results in 2-D systems theory, Part I," Proc. IEEE, vol. 65, pp. 861-872, June 1977.
[13] T. Kaczorek, Two-Dimensional Linear Systems. New York: Springer-Verlag, 1985.
[14] T. Kaczorek, Linear Control Systems, vol. II. New York: Wiley, 1993.
[15] J. E. Kurek and M. B. Zaremba, "Iterative learning control synthesis based on 2-D system theory," IEEE Trans. Automat. Contr., vol. 38, pp. 121-125, Jan. 1993.
[16] S. Y. Kung, Digital Neural Networks. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[17] R. J. Williams and D. Zipser, "A learning algorithm for continually running fully recurrent neural networks," Neural Computation, vol. 1, no. 2, pp. 270-280, 1989.
[18] T. Kaczorek and J. Klamka, "Minimum energy control of 2-D linear systems with variable coefficients," Int. J. Control, vol. 44, no. 3, pp. 645-650, 1986.

Tommy W. S. Chow received the B.Sc. (Hons.) degree and the Ph.D. degree from the University of Sunderland, Sunderland, U.K. He is currently an Associate Professor in the Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong. His current research activities include artificial neural networks and applications, signal processing, and machine fault analysis.

Yong Fang received the B.S. degree in mathematics from Sichuan Normal University, Chengdu, China, in 1984 and the M.S. degree in control theory and application from Nanjing University of Science and Technology, Nanjing, China, in 1990. He is currently working toward the Ph.D. degree in the Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong. From 1990 to 1995, he was a Lecturer in the Department of Mathematics, Neijiang Normal College, China. His research interests include 2-D system theory, adaptive control, neural networks and applications, and signal processing.
