
A Control Theory Approach to Machinery Health Prognostics

Eric Bechhoefer, Praneet Menon
Goodrich SIS, Vergennes, VT

David He
Department of Mechanical & Industrial Engineering
The University of Illinois at Chicago, Chicago, IL, USA, 60607

ABSTRACT

Evaluation of component health, or the ability to predict incipient failure, remains difficult. A shift from traditional vehicle health management tools, such as trend analysis, to techniques from other disciplines may be required to achieve the promise of on-condition maintenance and prognostics. In this paper, we apply concepts from control theory to the evaluation of vehicle health. Presented are techniques taken from control theory to observe hidden states associated with component health. Two example techniques are given: removal of the effects of torque split, and an accurate prediction of the remaining useful life of a component. The techniques are shown to be general and robust. Additionally, the techniques provide measures of confidence, which facilitate the display of prognostic information to an operator to better manage fleet assets.

INTRODUCTION

Control theory provides a powerful set of modeling tools to study the interactions and behavior of a system when subjected to conditions or inputs. The purpose of a control system is to regulate, or gain knowledge of, the flow of energy, information, or other quantities within a system. In the application of health monitoring, we wish to gain knowledge of the condition of the monitored components. In general, a system can be represented as either open loop or closed loop. In an open loop system, the input is selected based on the output goals of the system; all available knowledge about the system is known a priori, and the input is not influenced by the output of the system. In a closed loop system, the control variables are modified by the behavior of the system. This simple representation of a system can be used to model machinery components.
In such a model, the input variables could be torque and RPM into a gearbox, and the output variables, along with useful work, are heat and vibration. Additionally, any number of derived outputs can be calculated that indicate the condition of the system's components (a Condition Indicator (CI)). These inputs and outputs can be used to represent the state of the system. A system state that represents component health, remaining useful life (RUL), or the rate of change of remaining useful life (dRUL/dt) would be of immense value. From a logistics point of view, RUL would facilitate a number of cost-saving processes:
- Improved readiness and a reduction in unscheduled maintenance
- Opportunistic maintenance practices
- Activation of just-in-time part delivery
- A reduction in the overall number of spares, and optimization of deployable assets

From a safety perspective, RUL would reduce the chances of a class A mishap due to premature machinery failure. It is taken for granted that, in order to facilitate RUL, an accurate diagnostic capability is present: only with functioning diagnostics can an accurate RUL, or prognostic, exist. The system states that we are interested in, health and RUL, are hidden from the operator. While there are a number of observable states (lift, airspeed, vibration-based CIs, etc.), none of the observable states is a direct measure of health or RUL. This paper presents an approach to machinery health and prognostics based on a control theory paradigm. Kalman filter models are developed and real-world examples are given.

INTRODUCTION TO CONTROL THEORY

In control theory, a system is said to be observable if it is possible to determine the state X of the system from observations of the output over a finite time interval [1]. The system is completely observable if every transition of the state eventually affects every element of the output vector. This concept of observability was introduced by Kalman and is useful in solving the problem of reconstructing hidden state variables, such as RUL, from measurable variables.

The State Observer

A state observer estimates the state variables on the basis of measurements of the output and the input control variables [1]. In general, a system plant can be defined by:

Presented at the American Helicopter Society 66th Annual Forum, Phoenix, AZ, May 11-13, 2010. Copyright 2010 by the American Helicopter Society International, Inc. All rights reserved.

$$\dot{x} = Ax + Bu, \qquad y = Cx \tag{1}$$

where $x$ is the state variable, $\dot{x}$ is the rate of change of the state variable, $u$ is the input, and $y$ is the output of the system.
An observer is a subsystem used to reconstruct the state vector of the plant. The model of the observer is the same as that of the plant, except that we add a term containing the estimation error to account for inaccuracies in the A and B matrices. Essentially, this means that any hidden state (such as RUL) can be reconstructed if we can model the plant (e.g., failure propagation) successfully. Given this, the observer is defined as:

$$\dot{\hat{x}} = A\hat{x} + Bu + K\left(y - C\hat{x}\right) \tag{2}$$

where $\hat{x}$ is the estimated state and $C\hat{x}$ is the estimated output. The matrix K is called the Kalman gain matrix; it is a weighting matrix that maps the difference between the measured output y and the estimated output $C\hat{x}$ into a correction of the state estimate. A Kalman filter can be used to optimally set the Kalman gain matrix. Figure 1 represents a system and its full state observer.
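As an illustration of (2), the sketch below discretizes a hypothetical two-state plant (a level and its rate of change, with no input, so u = 0) and runs an observer that sees only the level. The plant, the fixed gain k, and the unit time step are assumptions chosen for illustration, not values from this paper; a Kalman filter would instead compute the gain from Q and R at every step. The point is that the rate, which is never measured, is reconstructed from the output error alone.

```python
# Minimal discrete-time sketch of the observer in (2), assuming a
# hypothetical two-state plant: x = [level, rate], only level is measured.
DT = 1.0  # assumed time step between measurements


def step_plant(x, dt=DT):
    """Propagate the true state: level grows by rate*dt, rate is constant."""
    level, rate = x
    return [level + rate * dt, rate]


def step_observer(x_hat, y, k=(0.5, 0.1), dt=DT):
    """One observer update: a copy of the plant model plus k*(y - C*x_hat).

    C = [1, 0], so the estimated output is just the estimated level.
    The gain k is a fixed, hand-picked assumption (not a Kalman gain).
    """
    level, rate = x_hat
    innovation = y - level              # measured output minus estimated output
    pred = [level + rate * dt, rate]    # same model as the plant (A * x_hat)
    return [pred[0] + k[0] * innovation, pred[1] + k[1] * innovation]


def run(n_steps=60):
    x = [0.0, 2.0]        # true state: the hidden rate is 2.0
    x_hat = [0.0, 0.0]    # observer starts with no knowledge of the rate
    for _ in range(n_steps):
        y = x[0]                          # noise-free measurement of level only
        x_hat = step_observer(x_hat, y)
        x = step_plant(x)
    return x, x_hat


if __name__ == "__main__":
    true_state, estimate = run()
    print("true rate:", true_state[1], "estimated rate:", round(estimate[1], 4))
```

With this gain the error dynamics matrix (A - kC) has eigenvalues of magnitude about 0.77, so the estimation error shrinks geometrically; a gain that places those eigenvalues outside the unit circle would make the observer diverge.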

Figure 1 Example of Plant and State Observer

Kalman Filtering as a State Observer

A Kalman filter is a recursive algorithm that optimally filters the measured state based on a priori information such as the measurement noise, the unknown behavior of the state, the relationship between the input and output states (e.g., the plant), and the time between measurements. Computationally, it is attractive because it is a one-step, recursive process; when the measurement is a scalar, the bracketed term in the gain equation is also a scalar, so no matrix inversion is required. The filtering process is given as:

$$X_{t|t-1} = A X_{t-1|t-1} \qquad \text{(state propagation)}$$
$$P_{t|t-1} = A P_{t-1|t-1} A^T + Q \qquad \text{(covariance propagation)}$$
$$K = P_{t|t-1} H^T \left[ H P_{t|t-1} H^T + R \right]^{-1} \qquad \text{(gain)}$$
$$P_{t|t} = \left(I - KH\right) P_{t|t-1} \qquad \text{(state covariance update)}$$
$$X_{t|t} = X_{t|t-1} + K\left(Y - H X_{t|t-1}\right) \qquad \text{(state update)}$$

where:
- t|t-1 is the conditioning statement (i.e., the value at t given the information at t-1)
- X is the state information (x, $\dot{x}$, $\ddot{x}$)
- Y is the measured data
- K is the Kalman gain
- P is the state covariance matrix
- Q is the process noise model
- H is the measurement matrix
- R is the measurement variance

For nonlinear systems, an extended Kalman filter (EKF) is used, which is based on a Taylor series linearization of the nonlinear system and measurement models [2]. For nonlinear, non-Gaussian noise problems, particle filters (PF) are attractive [3]. A PF represents the filtering distribution as a set of particles. The particles are generated using sequential importance resampling (a Monte Carlo technique), in which a proposal distribution is used to approximate the posterior distribution through appropriate weighting. Selection of the state observer technique is based on the linearity of the problem and the distribution of the process noise.

HEALTH AND PROGNOSTICS USING A STATE OBSERVER

Two examples are presented in which a state observer is used to measure and track component states that would otherwise remain unobserved and hidden. In the first example, a state observer is used to track transmission error (TE; i.e., the effect of torque on the CI value, see [4]). Increasing TE with torque suggests that a component is in the process of failing. Additionally, a torque split in the data increases the noise, complicating trend analysis. The state observer effectively removes the torque split while tracking the increase in TE prior to component failure. In the second example, a state observer is designed to track the RUL and health of a component. This is a powerful tool that can facilitate condition-based maintenance for monitored components.

A State Observer for Torque Split

In vibration-based mechanical diagnostics, algorithms extract some feature of a component, which is used as a statistic of the component's health. For example, in shaft analysis, accelerations associated with the first (SO1), second (SO2) and third (SO3) harmonics of the shaft rotation are indicative of various faults. SO1 is the result of an imbalance or a slightly damaged shaft, SO2 is present with a bent shaft, and SO3 alone, or with SO2, can indicate a shaft coupling failure. The measured condition indicator (CI) can at times be a function of TE; some part of the transmitted power in a shaft imparts an imbalance, driving SO1 or higher order accelerations. Because a relationship exists between TE and the CI (the TE in effect causes a change in the CI), for a faulted component the CI is correlated with the power transmitted. In drive-train components operating at constant speed, power is proportional to measured torque. In terms of diagnostics or trending of a component, it is desirable to reduce the scatter in the measured CI by accounting for this torque. While the relationship between torque and the CI can be viewed by plotting torque vs. CI, from a diagnostics perspective it is important to capture this relationship as a function of time: a time vs. shaft order trend captures the progression of the fault. Figure 2 shows the relationship between the CI and torque, but does not adequately relate the change in this relationship over time. It must be noted that an increase in the correlation of torque with the CI would indicate an increase of TE. This in turn can indicate the propagation of a fault.

Figure 2 Example of Relationship between CI and Torque

In figure 2, the relationship between the CI and the torque (slope) changes between early data and late data due to a propagating fault. Presented here is a methodology to estimate the effect of torque on the CI dynamically over time using a state observer. This allows a reduction in CI variance and gives insight into the condition of the component by observing when the effect of torque is statistically significant, which is indicative of a propagating fault or a manufacturing error.

Initially, a simple model for the CI/torque relationship is proposed: CI_Measured = CI + B*torque. The actual CI (e.g., SO1) is affected by torque in some linear manner. Without loss of generality, a linear model is used, although in some cases a quadratic or other nonlinear relationship may exist. In a batch process mode, this linear relationship could be estimated by linear regression:

$$\begin{bmatrix} \widehat{CI} \\ \hat{B} \end{bmatrix} = \left(H^T H\right)^{-1} H^T \, \mathbf{CI}_{measured}$$

where each row of H is $[1 \;\; torque_i]$. This methodology does not capture the time evolution of component degradation and thus is not appropriate for this case. Because the effect of the torque is effectively hidden in the measurement, a state observer, such as a recursive least squares estimator or a Kalman filter, can be used to reconstruct the hidden state.

In recursive least squares estimation, regression is performed on a window of data k units long:

$$z_k = \begin{bmatrix} z(i-k) \\ \vdots \\ z(i) \end{bmatrix}, \qquad H_k = \begin{bmatrix} H(i-k) \\ \vdots \\ H(i) \end{bmatrix} \tag{3}$$

where z is the measured CI and H is the measurement matrix (i.e., the appropriate torque values applied), and the recursive algorithm is:

$$P = \sigma^2 \left(H^T H\right)^{-1}, \qquad K = \frac{P H^T}{\sigma^2}, \qquad Y_i = Y_{i-1} + K\left(z - H Y_{i-1}\right) \tag{4}$$

so that $K = (H^T H)^{-1} H^T$ and each update reproduces the least squares fit over the window.

While a powerful methodology, this algorithm is computationally complex: it requires the inversion of a matrix, and the window also reduces the system response to changes in the monitored system (similar to an FIR filter). We would like to measure the relationship between CI and torque nearly instantaneously. In the design of this Kalman filter, a 4-state model is used to reconstruct:
- the torque-corrected, or hidden, CI
- dCI/dt, the rate of change of the CI
- B, the hidden effect of torque (torque coefficient)
- dB/dt, the rate of change of the torque coefficient

The measurement matrix, defined as H, is [1 0 torque 0], whereas the transition matrix is:

$$A = \begin{bmatrix} 1 & t & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & t \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{5}$$

where t is the time between measurements, and the plant noise (the unknown behavior of the tracked state) is modeled as a white noise process:

$$Q = \begin{bmatrix} \tfrac{1}{4}t^4 & \tfrac{1}{2}t^3 & 0 & 0 \\ \tfrac{1}{2}t^3 & t^2 & 0 & 0 \\ 0 & 0 & \tfrac{1}{4}t^4 & \tfrac{1}{2}t^3 \\ 0 & 0 & \tfrac{1}{2}t^3 & t^2 \end{bmatrix} \tag{6}$$

Again, we assume a nearly linear relationship between the CI and torque, and that the plant noise is Gaussian. The first example is a generator drive shaft that was recommended for maintenance. The spline adaptor, which connects the generator to the power takeoff, was found to be cracked and fretted. Figure 3 shows the reduction in CI variance as a result of normalizing for torque.
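The torque-split observer defined by (5), (6) and H = [1 0 torque 0] is a plain linear Kalman filter, and can be sketched in pure Python. The sketch runs under assumptions not taken from the paper: a unit time step, a hypothetical intensity q multiplying the Q of (6), a measurement variance r, and synthetic data with a constant hidden CI and torque coefficient. Because the measurement is a scalar, the gain needs only a division.

```python
import random

# Minimal pure-Python sketch of the 4-state torque-split Kalman filter.
# States: [CI, dCI/dt, B, dB/dt]; measurement: z = CI + B * torque.


def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def make_model(t, q):
    """Transition matrix (5) and process noise (6), scaled by an assumed intensity q."""
    A = [[1, t, 0, 0], [0, 1, 0, 0], [0, 0, 1, t], [0, 0, 0, 1]]
    blk = [[t**4 / 4, t**3 / 2], [t**3 / 2, t**2]]
    Q = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            Q[i][j] = q * blk[i][j]          # CI block
            Q[i + 2][j + 2] = q * blk[i][j]  # torque-coefficient block
    return A, Q


def kf_step(x, P, z, torque, A, Q, r):
    """One predict/update cycle; the scalar innovation variance avoids a matrix inverse."""
    # Predict: X = A X, P = A P A^T + Q
    x = [sum(A[i][j] * x[j] for j in range(4)) for i in range(4)]
    APAt = mat_mul(mat_mul(A, P), transpose(A))
    P = [[APAt[i][j] + Q[i][j] for j in range(4)] for i in range(4)]
    # Update with measurement row H = [1, 0, torque, 0]
    H = [1.0, 0.0, torque, 0.0]
    PHt = [sum(P[i][j] * H[j] for j in range(4)) for i in range(4)]
    s = sum(H[i] * PHt[i] for i in range(4)) + r
    K = [v / s for v in PHt]
    innovation = z - sum(H[i] * x[i] for i in range(4))
    x = [x[i] + K[i] * innovation for i in range(4)]
    HP = [sum(H[k] * P[k][j] for k in range(4)) for j in range(4)]
    P = [[P[i][j] - K[i] * HP[j] for j in range(4)] for i in range(4)]    # (I - K H) P
    P = [[0.5 * (P[i][j] + P[j][i]) for j in range(4)] for i in range(4)]  # keep symmetric
    return x, P


def run(n=500, ci_true=1.0, b_true=0.5, noise=0.01, q=1e-8, seed=0):
    """Filter synthetic data with a constant hidden CI and torque coefficient B."""
    rng = random.Random(seed)
    A, Q = make_model(1.0, q)
    x = [0.0] * 4
    P = [[10.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(n):
        torque = rng.uniform(0.2, 1.0)  # fraction of full torque (assumed range)
        z = ci_true + b_true * torque + rng.gauss(0.0, noise)
        x, P = kf_step(x, P, z, torque, A, Q, noise ** 2)
    return x


if __name__ == "__main__":
    ci, dci, b, db = run()
    print(f"torque-corrected CI ~ {ci:.3f}, torque coefficient B ~ {b:.3f}")
```

Torque must vary between acquisitions for the filter to separate the torque-corrected CI (state 0) from B (state 2); at constant torque, CI and B*torque cannot be told apart from the measurement. Raising q shortens the filter's effective memory, which speeds the response to a propagating fault at the cost of noisier estimates.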

Figure 3 Generator Shaft with Torque Split

It is interesting to note that the relationship between torque and CI increased dramatically once the fault started to propagate (figure 4, at 17 hours).

Figure 4 Observed Torque Coefficient for Failing Shaft

Figure 4 shows that at the point of maintenance, 100% torque added 1.4 Gs of acceleration to the CI. The current maintenance practice recommends maintenance at 1.5 Gs.

In the second example (figure 5), we measure SO1 over time. Note the high degree of correlation between torque and SO1 (figure 2) and how this results in deterministic variance in SO1 over time (figure 5). Application of the Kalman filter reduces the deterministic variance and allows a better measure of the true SO1 values.

Figure 5 Torque Split on Input Shaft

Note that in this case the TE from torque became negative over time and, at 100% torque, affected the CI by almost 3 Gs (figure 6), greatly increasing the deterministic variance in the SO1 trend of that shaft. The maintenance procedure for this shaft is to use SO1 measurements only at low torque (on the ground). Evidently this relationship between torque and CI was known empirically and controlled for by maintenance practice.

Figure 6 Time Dependent Torque Coefficient for Input Shaft

A State Observer for Health and Prognostics

The state observer can be constructed as a parallel system to the plant (or system under study). The question for health and prognostics is then: how to model the plant. This is a hard problem in itself. Failure modes propagating in mechanical systems are difficult to model at a level of fidelity that would generate meaningful results. We needed a generalized, data-driven process that would model the plant adequately enough to generate RUL with small error. Since 1953, a number of fault growth theories have been proposed, such as net area stress theories, accumulated strain hypotheses, dislocation theories, and others (see [5], [9]). Through substitution of variables, many of these theories can be generalized by the Paris Law:

$$\frac{da}{dN} = D\left(\Delta K\right)^m \tag{7}$$

which governs the rate of crack growth in a homogeneous material, where:
- da/dN is the rate of change of the half crack length
- D is a material constant of the crack growth equation
- ΔK is the range of K during a fatigue cycle
- m is the exponent of the crack growth equation

The range of strain, ΔK, is given as:

$$\Delta K = 2\varepsilon\alpha\left(\pi a\right)^{1/2} \tag{8}$$

where:
- ε is gross strain
- α is a geometric correction factor
- a is the half crack length

Most of these variables are specific to a given material and test article. In practice, the variables are unknown. This requires some simplifying assumptions to be made to facilitate analysis. For many components/materials, the crack growth exponent is 2. The geometric correction factor α is set to 1, which allows equations (7) and (8) to be reduced to:

$$\frac{da}{dN} = D\left(4\varepsilon^2\pi a\right) \tag{9}$$

The goal is to determine the number of cycles, N, remaining until a crack length a is reached. Taking the reciprocal of (9) gives:

$$\frac{dN}{da} = \frac{1}{D\left(4\varepsilon^2\pi a\right)} \tag{10}$$

Integrating gives the number of cycles (N) remaining. Note that for synchronous systems (e.g., constant RPM), N is equivalent to time multiplied by a constant.

$$N = \int_{a_o}^{a_f} \frac{dN}{da}\,da = \int_{a_o}^{a_f} \frac{1}{D\left(4\varepsilon^2\pi a\right)}\,da = \frac{1}{D\left(4\varepsilon^2\pi\right)}\left(\ln a_f - \ln a_o\right) \tag{11}$$

Equation (11) gives the number of cycles N from the current measured crack length $a_o$ to the final crack length $a_f$. The measured component condition indicator (CI) or health indicator (HI) will be used as a surrogate for crack length a, in keeping with a logistics paradigm [4]. Then N is the RUL times some constant (RPM for a synchronous system). The only remaining unknown is D, the material constant of the crack growth equation, which can be estimated from (9) as:

$$D = \frac{da/dN}{4\varepsilon^2\pi a} \tag{12}$$

In practice, gross strain will not be known. Again, a surrogate value, such as torque, will be used as appropriate. For further discussion on using the Paris Law, see [8].

We used a Paris Law model to represent the plant. The relationship between the measured data and the unknown variable D is nonlinear. The process noise is approximately Gaussian. As such, an EKF state observer was designed in which the inputs are either a CI or HI and torque, and the reconstructed states are component health, rate of change in health, RUL and dRUL/dt. Measuring the rate of change in the RUL is important because it indicates when the RUL value is accurate and is returning a valid result. For example, if dRUL/dt is approximately -1 (RUL is decreasing by 1 per unit time, cycle, etc.) and the second derivative is close to zero, then the plant model can be assumed to be correct, and the predicted RUL is an accurate measure of the time until maintenance is appropriate.

Validation of the State Observer Model

In order to validate the assumptions used in estimating the number of cycles remaining, a crack growth data set from [9] was used. The data set contains crack length vs. number of cycles in austenitic steel, where the cyclic loading was 62 MN/m². The raw and state observer data are plotted vs. millions of cycles in figure 7. The state observer was a four-state Kalman filter.

Figure 7 Crack Length in Steel Test Article

The state observer reconstructs the rate of change in crack length, which is an input into the RUL state observer (figure 8).
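The closed-form pieces of the Paris Law plant, (11) and (12), can be checked numerically. The sketch below uses hypothetical values for the surrogate strain, D, and crack lengths (none are from the paper); it estimates D from a growth rate and then recovers the elapsed cycles with (11). It is not the EKF observer described above, only the relations that observer relies on.

```python
import math


def paris_d(da_dn, a, strain):
    """Material constant D from (12): D = (da/dN) / (4 * strain^2 * pi * a)."""
    return da_dn / (4.0 * strain ** 2 * math.pi * a)


def cycles_remaining(a_o, a_f, d, strain):
    """Cycles to grow from a_o to a_f, equation (11)."""
    return (math.log(a_f) - math.log(a_o)) / (d * 4.0 * strain ** 2 * math.pi)


if __name__ == "__main__":
    # Hypothetical surrogate values: an HI standing in for crack length, torque for strain.
    strain, d_true, a_start, n_elapsed = 0.8, 1e-4, 0.2, 500.0
    # Integrating (9) at constant strain gives exponential growth in a.
    a_now = a_start * math.exp(4.0 * math.pi * strain ** 2 * d_true * n_elapsed)
    rate_now = d_true * 4.0 * strain ** 2 * math.pi * a_now  # da/dN from (9)
    d_hat = paris_d(rate_now, a_now, strain)
    print("estimated D:", d_hat)
    print("cycles from a_start to a_now:", cycles_remaining(a_start, a_now, d_hat, strain))
```

In a RUL setting, a_o would be the current (filtered) HI and a_f the HI maintenance limit; for a synchronous system, dividing the resulting cycles by the cycle rate converts them to time.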

Figure 8 Actual and Estimated RUL

Note that the estimated RUL begins to converge after just 4 measurement updates and has converged after just 6 measurements. This is verified in figure 9: the rate of change in the RUL is approximately -1 after 8 measurements.

Figure 9 dRUL/dt, Validating State Observer Model

Prognostic on a Hydraulic Pump

The hydraulic pump on a utility helicopter is driven at constant RPM by an auxiliary gearbox off of the drive train gearbox. The helicopter has a Health and Usage Monitoring System (HUMS) that generates CIs associated with the hydraulic pump drive shaft, but is not configured to measure the health of the hydraulic pump. While reviewing HUMS data, it was observed that the time synchronous average (TSA) RMS values were trending upward. The shaft order 1, 2 and 3 values (which give indications of shaft condition) were nominal. Along with CI values, raw time domain data was also collected on this shaft. Analysis of this time domain data showed that the elevated TSA RMS was driven exclusively by a 9 per rev component (acceleration at 9 times the shaft RPM), which is associated with the 9-piston hydraulic pump driven by this shaft. The peak-to-peak acceleration of the pump was seen to be up to 30 Gs prior to failure. Because there are no vibration-based limits applied to this component, we can set a limit that would be appropriate for industrial monitoring, such as 0.75 inch per second (ips) peak-to-peak. Given the shaft operating speed of 11,806 RPM, the conversion from RMS to ips peak-to-peak is:

$$HI = TSA_{RMS} \times 32.174\,\tfrac{ft}{sec^2} \times 12\,\tfrac{in}{ft} \,/\, (2\pi) \times 60\,\tfrac{sec}{min} \,/\, 11806\,rpm \times \sqrt{2} \,/\, 9\,rev = TSA_{RMS} \times 0.049$$

Figure 10 displays the raw pump and state observer health vs. flight hours. Note that the state observer is primarily used to reconstruct da/dN (i.e., the rate of change of health) and RUL.

Figure 10 Pump HI Values and State Observer

Figure 11 is the RUL of the pump. Prior to time 100, the RUL is effectively infinite because dH/dt is close to zero.
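The 0.049 coefficient in the HI conversion can be reproduced from its factors. The sketch below is a check under stated assumptions (a pure sinusoid at 9 times the 11,806 RPM shaft speed, standard gravity); the function name and its amplitude_factor parameter are illustrative, not part of any HUMS software.

```python
import math

G_FT_PER_S2 = 32.174   # standard gravity, ft/s^2
IN_PER_FT = 12.0
SHAFT_RPM = 11806.0    # pump drive shaft speed from the text
ORDER = 9              # 9 per rev (9-piston pump)


def rms_g_to_ips(rms_g, amplitude_factor=math.sqrt(2)):
    """Convert an acceleration RMS (in g) at ORDER x shaft speed to velocity in in/s.

    velocity = acceleration / (2*pi*f), with f = ORDER * RPM / 60 Hz.
    For a pure sinusoid, sqrt(2) turns RMS into peak; 2*sqrt(2) gives peak-to-peak.
    """
    accel_in_s2 = rms_g * G_FT_PER_S2 * IN_PER_FT
    freq_hz = ORDER * SHAFT_RPM / 60.0
    return accel_in_s2 / (2.0 * math.pi * freq_hz) * amplitude_factor


if __name__ == "__main__":
    print("factor with sqrt(2):", round(rms_g_to_ips(1.0), 4))
    print("factor with 2*sqrt(2):", round(rms_g_to_ips(1.0, 2 * math.sqrt(2)), 4))
```

The √2 factor is the one that reproduces the 0.049 coefficient; for a pure sinusoid it converts RMS to peak amplitude, and a 2√2 factor would give the peak-to-peak value, roughly twice as large.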

Figure 11 Pump Actual and Estimated RUL

At time 120, corresponding to an increase in the pump HI value, the RUL decreases rapidly. At time 270 (50 remaining flight hours), the estimated RUL tracks the actual RUL. The derivative of the RUL is given in figure 12. An automated maintenance action could be triggered when dRUL/dt is close to -1, reporting the hours remaining.

Figure 12 Derivative of Estimated RUL

The derivative converges to a value of -1 at 270 to 300 hours (the pump failed at 335 hours). Consider that the HI is not a direct measure of any feature on the hydraulic pump. With a CI directly measuring SO9, one could assume that the prognostic capability would be improved.

Application: Confidence in Prognostics Information

In practice, HUMS prognostic functionality would be used to schedule maintenance or assist in asset selection for fleet management. Maintainers and operators will perform management of the aircraft assets. They will need an intuitive, simple display that conveys information on current health, RUL, and confidence in the RUL prediction. Model confidence is essential in any RUL prediction. For any RUL calculation, given 1 hour of nominal usage, the RUL should decrease by 1 (i.e., dN/dt is approximately -1). Further, a measure of model drift or convergence would be the second derivative d²N/dt²: a value close to zero indicates convergence. When these conditions are met, the model used for calculation of the RUL is consistent, and is indicative of a good estimate of the life of the component.

Another aspect of the prognostic model is to predict what the health of the component will be at some time in the future. With the EKF model as proposed, the RUL or any predicted health is an expectation based on the current state and future usage (e.g., damage or strain). The Paris law is driven by delta strain: changes in strain will affect the RUL (eq 11). Future health is then based on the mean strain, and a bound on that strain. This strain information could be based on mission type (e.g., high strain: high/hot mission; low strain: ferry flight). The health at any time in the future is then:

$$a_f = \exp\!\left(N D \left(4\varepsilon^2\pi\right) + \ln a_o\right) \tag{12}$$

The upper and lower bounds on the future health $a_f$ can then be calculated by bounding the delta strain (e.g., the 5% and 95% values of delta strain). As noted, model fit and confidence need to be taken into account: a poor model should be reflected in a large bound on RUL. One way to model this uncertainty is based on dN/dt, for example:

$$u = 10\,\left|1 + \frac{dN}{dt}\right| \tag{13}$$

In (13), as dN/dt goes to -1, the model uncertainty goes to zero, and the RUL is then solely a function of delta strain. Figure 13 shows the evolution of the prognostic at time 100 hours. The error bound is large. The actual health trajectory is displayed as a reference to gauge the accuracy of the prognostic (yellow in figure 13).
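The future-health expression and the dN/dt-based uncertainty term can be sketched directly. The values below are hypothetical, and the absolute value in (13) is an interpretation chosen so that the uncertainty is never negative. The text does not say how u is combined with the strain-based bounds, so this sketch keeps the two separate rather than guessing.

```python
import math


def future_health(a_o, n, d, strain):
    """Expected health after n more cycles: a_f = exp(N*D*4*strain^2*pi + ln a_o)."""
    return math.exp(n * d * 4.0 * strain ** 2 * math.pi + math.log(a_o))


def model_uncertainty(dn_dt):
    """Uncertainty term of (13), taken here as 10*|1 + dN/dt| (zero when dN/dt = -1)."""
    return 10.0 * abs(1.0 + dn_dt)


def health_bounds(a_o, n, d, strain_low, strain_high):
    """Lower/upper future health from bounding strain (e.g., its 5% and 95% values)."""
    return future_health(a_o, n, d, strain_low), future_health(a_o, n, d, strain_high)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    lo, hi = health_bounds(a_o=0.3, n=100.0, d=1e-3, strain_low=0.7, strain_high=0.9)
    print("health bounds after 100 cycles:", round(lo, 3), round(hi, 3))
    print("uncertainty at dN/dt = -1.5:", model_uncertainty(-1.5))
```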

Figure 13 Prognostics with Error Bounds at time 100 Hours

The prognostic color reflects the confidence:
- Low Confidence: Yellow, abs(dN/dt + 1) > 3 and abs(d2N/dt2) > 0.5
- Medium Confidence: Blue, abs(dN/dt + 1) > 2 and abs(d2N/dt2) > 0.5
- High Confidence: Green, abs(dN/dt + 1) < 2 and abs(d2N/dt2) < 0.5

From figures 11 and 12, it is seen that at time 100 hours the model is not converged. There is little degradation in health and dH/dt is near zero: it is clear that fault propagation does not begin until approximately time 200 hours. Figure 12 indicates that the model does not converge until approximately time 280 hours (dN/dt = -1). Figure 14 displays the prognostics at time 200 hours, corresponding to an increase in dH/dt (fault propagating). The RUL reflects the improved model fit with a reduced error bound, giving an indication of medium confidence.
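The display color logic can be written as a small function. This is a sketch under two assumptions: the deviation is measured from the ideal value dN/dt = -1, and the combinations the listed thresholds leave uncovered are assigned to medium (blue), which is a choice made here rather than something the text specifies.

```python
def confidence(dn_dt, d2n_dt2):
    """Map RUL derivatives to a display color using the thresholds listed for figure 13.

    Assumption: deviation is abs(dN/dt + 1), i.e. distance from the ideal -1.
    Assumption: combinations not covered by the thresholds fall back to 'blue'.
    """
    deviation = abs(dn_dt + 1.0)
    drift = abs(d2n_dt2)
    if deviation < 2.0 and drift < 0.5:
        return "green"    # high confidence
    if deviation > 3.0 and drift > 0.5:
        return "yellow"   # low confidence
    return "blue"         # medium confidence


if __name__ == "__main__":
    # Values reported for the pump at 275 hours: dN/dt ~ -1, d2N/dt2 = 0.133.
    print(confidence(-1.0, 0.133))  # green
```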

Figure 14 Prognostics with Error Bounds at time 200 Hours

The prognostic has changed to blue as the model fit has improved in quality and confidence in the RUL has improved. At time 275, the model is well converged, with dN/dt ≈ -1 and d²N/dt² of 0.133. Figure 15 shows a tight error bound and the prognostic is green: high confidence. If the maintenance paradigm is to schedule maintenance at 0.75, the operator has close to 70 flight hours remaining.

Figure 15 Prognostics with Error Bounds at 275 hours

Note that the prognostic is remarkably close to the actual fault propagation trajectory once the model has converged. The ability to reconstruct the damage or health state and to estimate RUL using the Paris Law fault propagation model is remarkably robust. Equally important to the generation of the RUL is a means to quantify model fit and confidence, which is conveniently calculated from the first derivative of RUL. One explanation as to why the model did not converge until 270 flight hours could be that the CI does not directly measure SO9. Initially, the TSA RMS is driven by shaft orders 1, 2, 3, and a 92/rev spur gear. The performance likely would be improved using only SO9 amplitude data. At some future point, this may become a new CI for this component, which would result in a better model fit and a corresponding increase in prognostic capability.

CONCLUSIONS

Goodrich SIS has over 22 years of experience in vehicle health monitoring. Yet, even with this experience, evaluation of component health or the ability to predict incipient failure remains difficult. Using tools from different disciplines (control theory vs. mechanical engineering) may be required to achieve the promise of on-condition maintenance and prognostics. In this paper, we apply the concept of a state observer, taken from control theory, as a tool for vehicle health monitoring. Presented are techniques taken from control theory to observe hidden states associated with component health. The examples given, such as removal of the effects of torque split, or the ability to predict the remaining useful life, have not been achieved through traditional trend analysis, Bayesian statistical approaches or other decision logic widely used in HUMS [7]. The power of the techniques is in the generality of the approach and the ability to successfully determine the remaining useful life with limited data. In one example, only signal average RMS from a hydraulic pump is used to predict failure 70 flight hours in the future. This technique has been applied to both shaft and bearing prediction with good success [8]. While this technique appears a step closer to achieving reliable prediction of remaining useful life, additional work needs to be done to implement it in a system. The next goals would be to:
- Implement in an actual vehicle health monitoring system.
- Develop verification and validation procedures in accordance with AC-29 MG-15 (civil certification for credit) or ADS-79-HDBK (US Army design standard for condition based maintenance) so that some logistic benefit can be obtained.

Over the next two years, we will work toward engaging a customer to demonstrate this technology in a real-world application.

REFERENCES

[1] Brogan, W., Modern Control Theory, Prentice Hall, Upper Saddle River, NJ, 07458, 1991.
[2] Bar-Shalom, Y., Li, X., Estimation and Tracking, Artech House, Boston, 1993, pages 371-410.
[3] Candy, J., Bayesian Signal Processing, John Wiley & Sons, Hoboken, 2009, pages 237-267.
[4] Smith, D., Gear Noise and Vibration, Marcel Dekker, Inc., New York, 1999, page
[5] Frost, N. E., Marsh, K. J., Pook, L. P., Metal Fatigue, Dover Publications, Mineola, NY, 1999, pages 228-244.
[6] Bechhoefer, E., Bernhard, A., A Generalized Process for Optimal Threshold Setting in HUMS, IEEE Aerospace Conference, Big Sky, 2007.
[7] Bechhoefer, E., Bernhard, A., He, D., Banerjee, P., Use of Hidden Semi-Markov Models in the Prognostics of Shaft Failure, AHS International Forum 62, Phoenix, AZ, Mar 2006.
[8] Bechhoefer, E., A Method for Generalized Prognostics of a Component Using Paris Law, American Helicopter Society 64th Annual Forum, Montreal, CA, April 29 - May 1, 2008.
[9] Frost, N. E., DSR, NEL Rep. No. PM 287, 1959.
