
IEEE TRANSACTIONS ON RELIABILITY, VOL. 62, NO. 2, JUNE 2013

Optimal Preventive Maintenance Rate for Best Availability With Hypo-Exponential Failure Distribution
Meng-Lai Yin, Senior Member, IEEE, John E. Angus, and Kishor S. Trivedi, Fellow, IEEE
Abstract: The optimal rate of periodic preventive maintenance to achieve the best availability is studied for Markov systems with multiple degraded operational stages, where the time-to-failure has a hypo-exponential distribution. An analytical expression is developed for the availability of such systems having $n$ operational stages, and a necessary and sufficient condition is derived for a non-trivial optimal rate of periodic maintenance to exist. Numerical procedures for finding the optimal rate of periodic maintenance are given, and examples are presented.

Index Terms: Degraded-stage systems, hypo-exponential failure distribution, periodic preventive maintenance, system availability.

Manuscript received January 04, 2012; revised July 26, 2012 and October 30, 2012; accepted November 29, 2012. Date of publication April 11, 2013; date of current version May 29, 2013. Associate Editor: L. Walls. M.-L. Yin is with the Electrical and Computer Engineering Department, California State Polytechnic University, Pomona, CA 91768 USA (e-mail: myin@csupomona.edu). J. E. Angus is with the School of Mathematical Sciences, Claremont Graduate University, Claremont, CA 91711 USA (e-mail: john.angus@cgu.edu). K. S. Trivedi is with the Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708 USA (e-mail: kst@ee.duke.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TR.2013.2256672

0018-9529/$31.00 © 2013 IEEE

NOTATION
$n$: number of system operational stages
$\lambda_i$: transition rate from stage $i$ to the next degraded stage
$\delta$: preventive maintenance rate
$\mu$: repair rate
$m$: rate to complete a preventive maintenance task
$A(\delta)$: availability as a function of $\delta$

ACRONYMS
PM: Preventive Maintenance
CM: Corrective Maintenance

I. INTRODUCTION
BALANCING maintenance cost and system availability for economic efficiency is the motivation of this study. In particular, we are interested in developing optimal preventive maintenance schedules to achieve the best availability.

Usually, maintenance activities fall into two categories: corrective or preventive [1]. Corrective maintenance (CM) occurs when the system has failed, while preventive maintenance (PM) takes place periodically to reduce or eliminate accumulated degradation. CM is not the focus here; this paper focuses on the rate at which PM is performed. For many real-life systems such as aircraft [2], power systems [1], and mission-critical systems, determining the best PM rate is a critical design issue. Thus, we address the issue of finding the optimal PM rate with a hypo-exponential failure distribution for the system, while all other system parameters remain fixed.

A. Previous Works on PM Optimization
PM optimization has been studied extensively, particularly in the power systems literature, and mainly in the context of multi-objective optimization, where maintenance activities are managed with multiple objectives considered. Hilber et al. [3] used multi-objective optimization to study the problem of balancing PM and CM for electrical network systems, with the goal of obtaining the lowest total cost. Monte Carlo simulations and a heuristic approach were applied to reach the goal. Stopczyk et al. [4] developed a maintenance policy model using a semi-Markov process and a modified simulated annealing algorithm, also for a multi-objective optimization. Yang and Chang [5] proposed a method for multi-objective maintenance scheduling where a two-level modeling scheme is applied. Their method was implemented using simulations. In the present work, we consider the single-objective optimization problem of finding the best availability for a type of system that is common in electronic systems modeling.

With regard to restoration, minimal PM and major PM can be distinguished. Minimal PM restores the system to a previous, less degraded stage, while major PM restores the system to a "good as new" state. Sim and Endrenyi [6] studied the optimal PM problem with minimal PM considered. Due to the maintenance transitions considered, they found that the availability could be obtained using a recursive computational scheme [6]. On the other hand, Chen and Trivedi [7] studied systems with major PM, and formulated general expressions for system availability under general failure, repair, and maintenance distributions. Models considering both major and minor PM activities can be found in Endrenyi et al. [8], and Chen and Trivedi [9], where simulations and analytical approaches were respectively employed. Our work considers major PM, and derives the availability function analytically for the purpose of obtaining the optimal


PM rate, for systems with a hypo-exponential time-to-failure distribution. Note that our work is different from [9] where condition-based PM was studied. For condition-based PM, whether or not to conduct PM depends on the state of the system, while for time-based PM [7], PM takes place at set intervals or predetermined times. Our work is also different from [10], where Shirmohammadi et al. applied an ad-hoc linear cost function in the exploration of an optimal PM rate, assuming imperfect repair and time-based PM. In our model, all restorations are perfect, and the PM is age-based.

B. The Challenge of Complexity and Fidelity
To provide optimal maintenance rates for operational systems, accuracy and efficiency are both crucial. This point leads to the challenge of dealing with two conflicting factors, i.e., complexity and fidelity. In general, analytical methods are efficient. However, they are difficult to develop when the system is complex, and a high-fidelity model is required [11]. On the other hand, discrete-event simulations can represent the system with high fidelity, but do not produce accurate results efficiently, especially in regard to rare events that are critical to safety or performance.

Numerous efforts have been devoted to addressing the issues of modeling complexity. In Chapter 6 of [11], Singh discussed methods to deal with the difficulty of non-Markovian systems in reliability modeling. One promising method mentioned there was the device-of-stages method, which converts a non-Markov model into a Markov model by redefining the state space. As stated in [11], the device-of-stages method is a method of representing a non-exponentially distributed state by a combination of stages each of which is exponentially distributed. In particular, the stages-in-series model described in [11] is applied here to model failures with the hypo-exponential distribution [12].

Thus, assuming the hypo-exponential failure distribution, a general availability model with major PM (explained next) is developed which serves as the basis for our study. An analytical expression for the availability for such a model is then derived, and a necessary and sufficient condition is specified for a nontrivial optimal rate of periodic maintenance to exist. Based on our findings, numerical procedures to find the optimal PM rate are given, and then demonstrated by examples and a simulation.

Although finding the optimal periodic maintenance rate analytically for systems with non-exponentially distributed state-occupancy times is inherently difficult, we will show that the problem can be alleviated by the device-of-stages method, and that the analytical and numerical efforts can go hand-in-hand to achieve efficiency.

The work by van Casteren shows another way of attacking the issues of modeling complexity, which is related to PM. In [13], van Casteren proposed a calculation method for assessing the interruption costs, where PM can be a cause for the interruptions. His work is based on the Weibull-Markov model, which extends Markov models to include Weibull-distributed state occupancy times. The method developed in [13] takes advantage of pre-processing to achieve efficiency.

As highlighted in [11], assessing the reliability for non-Markovian systems in an analytical manner is difficult, not to mention obtaining the optimal PM rate. To the authors' knowledge, no attempt has been made to find the optimal periodic maintenance rate analytically for best system availability with a hypo-exponential failure distribution.

II. PRELIMINARIES
A. Problem Description
In this study, the evolution of the system in continuous time is described by a homogeneous continuous time Markov chain. That is, all state-occupancy times are exponentially distributed. The maintenance policy assumed is age replacement, which means a unit is always replaced at the time of failure or $T$ hours after its installation, whichever occurs first [14]. $T$ can be random or deterministic. It is also assumed that PM as well as CM will bring the system back to its "good as new" state.

While our main focus is on a model in which the PM policy is random, it is instructive to review here the special case when $T$ is deterministic (with all other assumptions above maintained). This case is studied extensively in Chapters 3 and 4 of [14]. Consider a system whose time to failure (for a new system) has a finite first moment, and has a continuous cumulative distribution function $F$. The system is repaired upon failure, or $T$ units of time after it was last restored to a new condition, whichever comes first. All maintenance actions bring the system to "like new" condition. Suppose the repair and PM durations are randomly distributed with means $1/\mu$, and $1/m$, respectively. Using renewal arguments (or the general equations that are derived in [15]), the availability of this system is given by

$$A(T) = \frac{\int_0^T [1 - F(t)]\,dt}{\int_0^T [1 - F(t)]\,dt + \frac{1}{\mu}F(T) + \frac{1}{m}[1 - F(T)]}. \tag{1}$$

If, in the cost function in Equation 2.2 of Chapter 4 in [14], we make the corresponding substitutions for the cost parameters, and use the distribution that places probability 1 at the value $T$ as the distribution of the replacement age there, then the optimum (minimum) of that cost function would give the maximum availability. Also, from Equation 2.3 in [14], we find that the optimal $T$ is the solution (if one exists) to

$$r(T)\int_0^T [1 - F(t)]\,dt - F(T) = \frac{\mu}{m - \mu}, \tag{2}$$

where $r(t) = f(t)/[1 - F(t)]$ is the failure rate function for the distribution $F$ with density $f$. It is shown in [14] that there is a solution to this equation if $F$ corresponds to an IFR (Increasing Failure Rate) distribution, though it is possible that $T = \infty$ (i.e. do not perform any PM, and only perform repairs when the system fails). It is also shown in [14] that the optimal availability corresponding to that solution is at least as large as any optimal availability for the case in which the PM policy (i.e. $T$) is random.

The foregoing discussion addresses optimum PM policies that are nonrandom, that is, age-based, but with PM triggered at a deterministic age $T$. However, in mission-critical systems where down-time cannot be scheduled in advance, nonrandom PM is rarely feasible or desirable. Thus, this paper will focus on models in which PM occurs at random, and the foregoing discussion will merely serve to provide a means of finding a general upper bound to the availability achievable from PM.

B. A Simple Case
A simple system considered here can go through one degraded stage before it is completely failed, at which time a repair (CM) will take place. When the system is still operating, periodic PM activities are triggered with constant rate $\delta$. The rate for (corrective) repair is $\mu$. Both repair and PM take the system back to the original (as good as new) state. A Markov chain to represent this system with two distinct operational stages (i.e. one degraded operational state) is shown in Fig. 1. The discussion below helps clarify and gives insights into solving the more general case.

Fig. 1. Markov chain model for a PM or CM system with two operational stages.

The initial state is UP, where the system is fully operational. With the degradation rate $\lambda_1$, the system transits to state D1, which is a degraded operational state. With rate $\lambda_2$, the system can move from state D1 to state D2, which is the failed state. The CM, with repair rate $\mu$, will take the system back to state UP, and all PM actions take the system back to state UP. The system is available when it is in states UP or D1. While the system is available, periodic maintenance will take place with rate $\delta$. The system is not available when it is in state PM, or in state D2. The rate to finish PM is denoted as $m$. The steady state availability can be obtained by solving the usual set of balance equations. Because we are interested in $\delta$, the availability is represented as a function of $\delta$ with the other parameters fixed. This model is fairly simple to analyze, yielding

$$A(\delta) = \left[1 + \frac{\delta}{m} + \frac{\lambda_1\lambda_2}{\mu(\delta + \lambda_1 + \lambda_2)}\right]^{-1}. \tag{3}$$

The optimal maintenance rate can be obtained by differentiating the availability expression above with respect to $\delta$, and solving for $\delta$. The resulting quadratic equation for $\delta$ has a non-negative root only when

$$\frac{\mu}{m} \le \frac{\lambda_1\lambda_2}{(\lambda_1 + \lambda_2)^2}. \tag{4}$$

When this condition is met, the maximum availability occurs when

$$\delta = \sqrt{\frac{m\lambda_1\lambda_2}{\mu}} - (\lambda_1 + \lambda_2). \tag{5}$$

If condition (4) is not met, then the availability is a decreasing function of $\delta$, and the maximum occurs at $\delta = 0$. Note that from (5), as $m$ increases (i.e. average duration of PM decreases), the optimal $\delta$ increases, as it should.

If we ignored the maintenance (i.e. set $\delta = 0$), and make the failed state D2 absorbing, then the model depicted in Fig. 1 corresponds to a system that has a two-stage hypo-exponential failure time distribution, with parameters $\lambda_1$ and $\lambda_2$. Let $X$ be the random variable representing the failure times using this two-stage hypo-exponential distribution. Then $X$ has the probability density function $f(t) = \frac{\lambda_1\lambda_2}{\lambda_2 - \lambda_1}\left(e^{-\lambda_1 t} - e^{-\lambda_2 t}\right)$, and the cumulative distribution function $F(t) = 1 - \frac{\lambda_2 e^{-\lambda_1 t} - \lambda_1 e^{-\lambda_2 t}}{\lambda_2 - \lambda_1}$, for $t \ge 0$. Note that these formulae also work for the case $\lambda_1 = \lambda_2$ if one takes the limit as $\lambda_2 \to \lambda_1$. This failure distribution clearly does not have the memoryless property. Therefore, if we were to combine the UP and D1 states into one operational state with the time-to-failure modeled directly by the hypo-exponential distribution, then the occupancy time in the operational state is no longer exponentially distributed. However, such a non-Markovian model can be converted to a Markovian one, as shown in Fig. 1, which facilitates the pursuit of the optimization problem analytically.

Thus, by modeling the system as shown in Fig. 1, we have created a continuous time Markov model for this system in which CM takes place upon failure (the transition labeled with $\mu$), and PM occurs at a constant rate $\delta$ while the system is operational. Note that a variation on this model could be to increase the rate of entering PM from the degraded state D1 (e.g. replace $\delta$ on that transition with a larger rate). However, this tacitly assumes that the state D1 is visible, meaning one is always aware of which degraded state the system is in. Recalling the device-of-stages method discussed in Section I, the degraded states here are hidden, and only used so as to convert the process into a Markovian one so that an analytical solution can be obtained.

The availability as a function of the PM rate is displayed for two examples in Figs. 2 and 3. The system in Fig. 2 has a repair rate of $\mu = 0.1$ (per hour); the maintenance rate is set to $m = 6$ per hour (i.e. 10 minute average duration), with fixed degradation rates $\lambda_1$, $\lambda_2$ (per hour). The optimal $\delta$ occurs at 0.007955 per hour. Fig. 3 has a repair rate equal to 1 (ten times faster in terms of repair activity) with all other parameters the same as in the previous figure. The optimal $\delta$ is 0.0004641 per hour.
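The two-stage case is small enough to check numerically. The sketch below is an illustration, not the authors' code: it evaluates the steady-state availability obtained from the balance equations of the four-state chain (UP, D1, D2, PM) and the closed-form maximizer of that expression. The degradation rates `LAM1 = 0.001` and `LAM2 = 0.002` are assumed values; they were chosen because, together with $m = 6$ and the two repair rates quoted for Figs. 2 and 3, they reproduce the reported optima. The function names are hypothetical.

```python
import math


def availability_two_stage(delta, lam1, lam2, mu, m):
    """Steady-state availability of the chain UP -> D1 -> D2 with PM.

    Obtained by solving the balance equations by hand:
    A = 1 / (1 + delta/m + lam1*lam2 / (mu * (delta + lam1 + lam2))).
    """
    return 1.0 / (1.0 + delta / m + lam1 * lam2 / (mu * (delta + lam1 + lam2)))


def optimal_pm_rate_two_stage(lam1, lam2, mu, m):
    """PM rate maximizing availability; 0.0 when PM cannot help."""
    root = math.sqrt(m * lam1 * lam2 / mu) - (lam1 + lam2)
    return max(root, 0.0)


# Assumed degradation rates (per hour); m = 6 per hour as stated for Figs. 2 and 3.
LAM1, LAM2, M = 0.001, 0.002, 6.0

for mu in (0.1, 1.0):  # repair rates of the Fig. 2 and Fig. 3 examples
    d_opt = optimal_pm_rate_two_stage(LAM1, LAM2, mu, M)
    a_opt = availability_two_stage(d_opt, LAM1, LAM2, mu, M)
    a_none = availability_two_stage(0.0, LAM1, LAM2, mu, M)
    print(f"mu={mu}: optimal delta={d_opt:.7f}, A(opt)={a_opt:.6f}, A(0)={a_none:.6f}")
```

Comparing `A(opt)` with `A(0)` shows how much PM gains over repair-only operation; the gain shrinks as the repair rate grows, which is why the optimal PM rate is so much smaller for the faster repair.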

Fig. 2. Availability vs. PM rate, example 1.

Fig. 3. Availability vs. PM rate, example 2.

III. THE GENERAL MODEL
We now turn to the general case. Fig. 4 illustrates a general continuous time Markov chain model for the PM system with $n$ operational stages (i.e. $n - 1$ degraded stages). The system is available if it is in state UP, or in any of the D1 through D($n-1$) states; the system is down if it is in state PM where the system is under PM, or if it is in state Dn where the system has failed. This system can be solved in general the same way as the $n = 2$ case. Alternatively, the general solution for availability in systems having multiple outage types has been worked out in [15] for the more general semi-Markov case, which can be directly applied here. In [14] and [9], methods were developed to obtain the availability for systems with periodic PM with general system failure distributions. This work is also related to the research presented in [7] where the system availability assuming non-exponentially distributed times-to-outages was derived.

Fig. 4. A generalized Markov chain model for PM systems with $n$ operational stages.

Fig. 5. A state diagram model for PM systems with general distributions.

In particular, the model shown in Fig. 5 describes such a system's behavior when there are two outage types. The UP state is the initial state where the system is functional. $F$ is the cumulative distribution function for the time to failure. $G$ represents the cumulative distribution function for the time to trigger PM. $H$ is the cumulative distribution function for the PM task to be completed, and $R$ is the cumulative distribution function for the corrective repair times. For such a system, availability is defined as the long-run probability of being in state UP or alternatively the long-run fraction of time spent in state UP. Denoting state UP as state 0, state Down as state 1, and state PM as state 2, the availability can be calculated using the formula from [15]:

$$A = \frac{h_0}{h_0 + p_{01}h_1 + p_{02}h_2}, \tag{6}$$

where $p_{01}$, and $p_{02}$ are the one-step transition probabilities, given by

$$p_{01} = \int_0^\infty [1 - G(t)]\,dF(t), \tag{7}$$

$$p_{02} = \int_0^\infty [1 - F(t)]\,dG(t). \tag{8}$$

Moreover, $h_0$, $h_1$, and $h_2$ are the mean sojourn times in the corresponding states. Thus,

$$h_0 = \int_0^\infty [1 - F(t)][1 - G(t)]\,dt, \tag{9}$$

$$h_1 = \int_0^\infty [1 - R(t)]\,dt, \tag{10}$$

$$h_2 = \int_0^\infty [1 - H(t)]\,dt. \tag{11}$$

Comparing the state diagram model shown in Fig. 5 with the Markov chain model displayed in Fig. 4, we see that, for our model, $F$ corresponds to the cumulative distribution of the sum of $n$ $s$-independent exponentially distributed random variables having respective rates $\lambda_1, \lambda_2, \ldots, \lambda_n$, i.e. the hypo-exponential distribution [12]. This distribution describes the process of degradation of our system towards failure, whereas the cumulative distributions $G$, $H$, and $R$ correspond to those of exponential distributions with rates $\delta$, $m$, and $\mu$ respectively. Therefore, we have $h_1 = 1/\mu$, and $h_2 = 1/m$. Also, from the diagram, it is clear that $p_{02} = 1 - p_{01}$. To get the remaining


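quantities, we make use of the fact that the Laplace Transform of the exponential density with rate $\lambda$ is

$$\int_0^\infty e^{-st}\,\lambda e^{-\lambda t}\,dt = \frac{\lambda}{s + \lambda}, \tag{12}$$

and that the Laplace Transform of a convolution of densities (i.e. the distribution of the sum of $s$-independent random variables) is the product of the Laplace Transforms of those densities. Hence, the Laplace Transform of the aforementioned hypo-exponential density $f$ (with cumulative distribution function $F$) is

$$\int_0^\infty e^{-st} f(t)\,dt = \prod_{i=1}^{n}\frac{\lambda_i}{s + \lambda_i}. \tag{13}$$

From (13), and integration by parts, we obtain the probability $p_{01}$ that failure occurs before PM is triggered,

$$p_{01} = \int_0^\infty e^{-\delta t} f(t)\,dt = \prod_{i=1}^{n}\frac{\lambda_i}{\delta + \lambda_i}, \tag{14}$$

and the mean sojourn time $h_0$ in the operational states,

$$h_0 = \int_0^\infty e^{-\delta t}[1 - F(t)]\,dt = \frac{1}{\delta}\left[1 - \prod_{i=1}^{n}\frac{\lambda_i}{\delta + \lambda_i}\right]. \tag{15}$$

Putting these together into (6) immediately gives

$$A(\delta) = \left[1 + \frac{\delta}{m} + \frac{\delta}{\mu}\cdot\frac{\prod_{i=1}^{n}\lambda_i}{\prod_{i=1}^{n}(\delta + \lambda_i) - \prod_{i=1}^{n}\lambda_i}\right]^{-1}. \tag{16}$$

From this expression, it is clear that the availability depends on the parameters $\lambda_1, \lambda_2, \ldots, \lambda_n$ only through the values of their elementary symmetric polynomials $\sum_i \lambda_i$, $\sum_{i<j}\lambda_i\lambda_j$, $\ldots$, and $\prod_i \lambda_i$. Define the function

$$\psi(\delta) = \frac{P(\delta) - P(0)}{\delta}, \quad \delta > 0, \quad \text{where } P(\delta) = \prod_{i=1}^{n}(\delta + \lambda_i). \tag{17}$$

Note that $P(\delta) - P(0)$ is a polynomial in $\delta$ with zero constant term. Hence, the apparent singularity in $\psi$ at $\delta = 0$ is removable (i.e. the limit as $\delta$ approaches 0 is finite), and we may define

$$\psi(0) = P'(0) = \sum_{j=1}^{n}\prod_{i \ne j}\lambda_i \tag{18}$$

so that $\psi$ is defined for all $\delta \ge 0$ and continuous. Then the availability in (16) can be written more compactly as

$$A(\delta) = \left[1 + \frac{\delta}{m} + \frac{P(0)}{\mu\,\psi(\delta)}\right]^{-1}. \tag{19}$$

It follows immediately that, when $\delta = 0$, the availability is given by

$$A(0) = \left[1 + \frac{P(0)}{\mu P'(0)}\right]^{-1} = \left[1 + \frac{1}{\mu\sum_{i=1}^{n} 1/\lambda_i}\right]^{-1}. \tag{20}$$

Fig. 6 illustrates the availability vs. PM rate using (19), for a system with $n = 4$ operational stages. In the next section, we will describe how to find the optimal PM rate that maximizes the availability.

IV. OPTIMIZING THE AVAILABILITY
Maximizing availability is equivalent to minimizing the quantity inside the brackets in (19). Differentiating that quantity with respect to $\delta$ and setting it equal to 0 yields

$$\frac{1}{m} - \frac{P(0)}{\mu}\cdot\frac{\psi'(\delta)}{[\psi(\delta)]^2} = 0. \tag{21}$$

As in the $n = 2$ case discussed earlier, roots of this equation are candidate values of $\delta$ at which the availability is maximized. Though there is no closed form for the roots, we can develop a necessary and sufficient condition for there to be a solution to (21) that occurs at some value $\delta > 0$. Moreover, we can argue that such a root is unique, and yields the maximum availability. By direct differentiation, and using the fact that

$$\psi'(\delta) = \frac{\delta P'(\delta) - P(\delta) + P(0)}{\delta^2}, \tag{22}$$

we have

$$\frac{\delta P'(\delta) - P(\delta) + P(0)}{[P(\delta) - P(0)]^2} = \frac{\mu}{m P(0)}. \tag{23}$$

Applying L'Hospital's rule to the left side of (23), it is seen that, as $\delta$ approaches infinity, the limit is 0, and as $\delta$ approaches 0, the limit is

$$\frac{P''(0)}{2[P'(0)]^2}. \tag{24}$$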


Because the derivative $P'(\delta)$ of the polynomial $P(\delta) = \prod_{i=1}^{n}(\delta + \lambda_i)$ is a polynomial with positive coefficients, and so is increasing in $\delta$, the function $\delta P'(\delta) - P(\delta) + P(0)$ is also increasing, and thus

$$\delta P'(\delta) - P(\delta) + P(0) \ge 0, \quad \delta \ge 0. \tag{25}$$

It can also be shown that the left side of (23), namely $\frac{\delta P'(\delta) - P(\delta) + P(0)}{[P(\delta) - P(0)]^2}$, is decreasing in $\delta$ by arguing that

$$\delta P''(\delta)[P(\delta) - P(0)] - 2P'(\delta)[\delta P'(\delta) - P(\delta) + P(0)] < 0$$

for $\delta > 0$. Note that the left hand side of this inequality is a polynomial in $\delta$ of degree $2n - 1$ with zero constant term, and all coefficients of powers of $\delta$ are negative. This result can be seen by multiplying out the polynomials involved. We omit the tedious details.

Because the left hand side of (23) is non-negative and decreasing for $\delta \ge 0$, taking value $P''(0)/\{2[P'(0)]^2\}$ at $\delta = 0$, and value 0 as $\delta$ approaches infinity, then as long as

$$\frac{\mu}{m P(0)} < \frac{P''(0)}{2[P'(0)]^2}, \tag{26}$$

which is equivalent to

$$\frac{\mu}{m} < \frac{P(0)\,P''(0)}{2[P'(0)]^2}, \tag{27}$$

there will be a unique positive solution to (21). This solution yields a minimum to the function inside the brackets in (19). This result follows from the negativity of the left hand side of the inequality above for $\delta > 0$, which implies that the solution yields a maximum availability as defined by (19). When $n = 2$, (21) is easily seen to agree with (4). Also, when (27) is not true, it follows that the maximum availability occurs when $\delta = 0$. For the 4-operational-stage example shown in Fig. 6, the right side of (27) is 0.336, which exceeds $\mu/m$ for that system. Thus the optimal PM rate exists, as Fig. 6 confirms. By (27), as long as the average time to complete a PM activity is less than 33.6% of the average corrective repair time, the availability can be increased by performing PM.

Fig. 6. Availability vs. PM rate for the system with $n = 4$ operational stages.

When (27) is satisfied, we see from (23) that the optimal PM rate can be obtained uniquely by solving

$$\frac{\delta P'(\delta) - P(\delta) + P(0)}{[P(\delta) - P(0)]^2} = \frac{\mu}{m P(0)}. \tag{28}$$

If we denote the function

$$g(\delta) = \frac{\delta P'(\delta) - P(\delta) + P(0)}{[P(\delta) - P(0)]^2} - \frac{\mu}{m P(0)}, \tag{29}$$

then the secant method, a variation on Newton's iterative method, will determine a numerical approximation to the root of $g$. This method starts with two initial guesses, $\delta_0$, $\delta_1$, and proceeds by forming a sequence iteratively as

$$\delta_{k+1} = \delta_k - g(\delta_k)\,\frac{\delta_k - \delta_{k-1}}{g(\delta_k) - g(\delta_{k-1})}, \quad k = 1, 2, \ldots \tag{30}$$

The sequence can be stopped whenever $|\delta_{k+1} - \delta_k|$ is less than a pre-specified error tolerance. A plot of availability (19) will quickly identify a region where the maximum occurs, so that the initial points $\delta_0$, $\delta_1$ can be selected.

To illustrate the foregoing, consider the system in the example of Fig. 6 with four distinct operational stages, i.e., $n = 4$. Solving (28) via (30) for this system gives an optimal availability of 0.99877. To help validate these results, we simulated the example of Fig. 6 using the MATLAB code shown in the Appendix. The same set of parameter values is used. For the PM rate, we took the analytical (optimum) result of solving (28) and (30). The simulation generates 10,000 cycles (returns to the full up state UP), and accumulates all the up-time and down-time incurred during all these cycles, returning the total up-time divided by the total up-time plus total down-time as the simulation estimate of steady state availability. We ran this simulation 10 times (a total of 100,000 cycles), yielding a mean (average of the 10 simulation estimates) availability of 0.99875 with a standard error (standard deviation of the 10 simulation estimates, divided by the square root of 10) of 0.000017, in excellent agreement with the analytical optimum of 0.99877.
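The procedure of this section (check the existence condition, then find the root by the secant method) can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the polynomial $\prod_i(\delta + \lambda_i)$ is expanded into coefficients, the root equation is scaled to the dimensionless form "left side minus $\mu/m$", and the function names are hypothetical. The rates `HOT` (a four-stage hot standby with unit failure rate 0.001), `MU = 0.1`, and `M = 6` are assumed values, chosen because they are consistent with the threshold 0.336 and the optimal availability 0.99877 quoted above. The sketch assumes at least two stages.

```python
def poly_from_rates(rates):
    """Coefficients, in ascending powers of d, of P(d) = prod(d + lam_i)."""
    coeffs = [1.0]
    for lam in rates:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += lam * c      # contribution of the constant lam
            nxt[k + 1] += c        # contribution of the factor d
        coeffs = nxt
    return coeffs


def psi(coeffs, d):
    """(P(d) - P(0)) / d, evaluated as a polynomial so d = 0 is safe."""
    return sum(c * d ** (k - 1) for k, c in enumerate(coeffs) if k >= 1)


def dpsi(coeffs, d):
    """Derivative of psi with respect to d."""
    return sum((k - 1) * c * d ** (k - 2) for k, c in enumerate(coeffs) if k >= 2)


def availability(d, rates, mu, m):
    """Steady-state availability at PM rate d."""
    c = poly_from_rates(rates)
    return 1.0 / (1.0 + d / m + c[0] / (mu * psi(c, d)))


def pm_threshold(rates):
    """Upper bound on mu/m below which a positive optimal PM rate exists."""
    c = poly_from_rates(rates)
    return c[0] * c[2] / c[1] ** 2


def optimal_pm_rate(rates, mu, m, d0, d1, tol=1e-12, max_iter=100):
    """Secant iteration on the stationarity equation; 0.0 if PM cannot help."""
    c = poly_from_rates(rates)
    if mu / m >= pm_threshold(rates):
        return 0.0
    g = lambda d: c[0] * dpsi(c, d) / psi(c, d) ** 2 - mu / m
    for _ in range(max_iter):
        g0, g1 = g(d0), g(d1)
        if g1 == g0:               # flat secant: cannot improve further
            break
        d0, d1 = d1, d1 - g1 * (d1 - d0) / (g1 - g0)
        if abs(d1 - d0) < tol:
            break
    return d1


# Assumed parameter values (per hour); see the lead-in above.
HOT = (0.004, 0.003, 0.002, 0.001)
MU, M = 0.1, 6.0

d_star = optimal_pm_rate(HOT, MU, M, d0=0.003, d1=0.006)
print("threshold on mu/m:", pm_threshold(HOT))
print("optimal PM rate  :", d_star)
print("availability     :", availability(d_star, HOT, MU, M))
```

The two starting points would normally be read off a plot of the availability, as the text suggests. The same functions accept any stage rates, for example `(0.0025, 0.002, 0.0015, 0.001)` for a four-unit warm standby with active rate 0.001 and standby rate 0.0005.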


V. APPLICATIONS
The hypo-exponential distribution is a general model which can describe many widely-applied systems [16], including hot standby, cold standby, and warm standby systems. This section demonstrates the methods derived above through these commonly used systems.

A. Standby Systems With Identical Units
An important special case that occurs frequently in fault tolerant system design is the parallel system with $n$ identical units (i.e., one unit providing service with $n - 1$ units as redundant, or standby, units). Two cases are prevalent. One, called hot standby, has all standby units switched on at all times. The other, named cold standby, has all standby units switched off (and hence assumed not to fail) until they are called into service. These are easily handled by the general model developed in Sections III and IV. For the cold standby case, we have $\lambda_i = \lambda$, $i = 1, \ldots, n$; and for the hot standby case, we have $\lambda_i = (n - i + 1)\lambda$, $i = 1, \ldots, n$. For these special cases, it is seen from (28) defining the optimal $\delta$ for availability that a further reduction is possible. We set $\theta = \delta/\lambda$. For the cold standby case, (28) becomes

$$\frac{n\theta(1 + \theta)^{n-1} - (1 + \theta)^n + 1}{[(1 + \theta)^n - 1]^2} = \frac{\mu}{m}. \tag{31}$$

The condition (27) for an optimal PM rate to exist becomes

$$\frac{\mu}{m} < \frac{n - 1}{2n}, \tag{32}$$

and the availability becomes

$$A = \left[1 + \frac{\lambda}{m}\theta + \frac{\lambda}{\mu}\cdot\frac{\theta}{(1 + \theta)^n - 1}\right]^{-1}. \tag{33}$$

For the hot standby case, writing $\Pi_n(\theta) = \prod_{k=1}^{n}(\theta + k)$, (28) becomes

$$\frac{\theta\,\Pi_n'(\theta) - \Pi_n(\theta) + n!}{[\Pi_n(\theta) - n!]^2} = \frac{\mu}{m\,n!}, \tag{34}$$

expression (27) becomes

$$\frac{\mu}{m} < \frac{1}{2}\left[1 - \frac{\sum_{k=1}^{n} k^{-2}}{\left(\sum_{k=1}^{n} k^{-1}\right)^2}\right], \tag{35}$$

and the availability becomes

$$A = \left[1 + \frac{\lambda}{m}\theta + \frac{\lambda}{\mu}\cdot\frac{n!\,\theta}{\Pi_n(\theta) - n!}\right]^{-1}. \tag{36}$$

Thus, in both cases, a given combination of $n$ and $\mu/m$ satisfying (31) and (32) (or (34) and (35)) determines $\theta$, and hence the availability via (33) (or (36)) for any choice of $\lambda$. In Tables I and II, we present solutions to (31) and (34). For each value of $n$ up to 10, a range of $\mu/m$ satisfying (32) and (35), respectively, is presented. For values of $\mu/m$ outside this range, PM cannot improve availability, so $\delta = 0$ would be optimal. These tables aid in designing optimal PM for these types of systems.

TABLE I. VALUES OF $\theta = \delta/\lambda$ DEFINING THE OPTIMAL PM RATE FOR THE COLD STANDBY CASE

TABLE II. VALUES OF $\theta = \delta/\lambda$ DEFINING THE OPTIMAL PM RATE FOR THE HOT STANDBY CASE

As an example of the use of these tables, let us revisit the example leading to Fig. 6, which is a hot standby case with $n = 4$. As we stated earlier, the general equations and numerical methods of Section IV were used to derive the optimal $\theta$, and the corresponding optimal PM rate $\delta = \theta\lambda$. The optimal availability is 0.99877. Not knowing these results, suppose that a systems engineer wants a quick assessment of the optimal PM for this design. Because this is a case of a parallel system with hot standby units, Table II can be applied. For this system, the values of $n$ and $\mu/m$ fall in the range covered by Table II, so PM can improve availability in an optimal fashion. Note that had $\mu/m$ exceeded the value determined by (35), which in this case is 0.336, PM could not improve the availability for this system (i.e. $\delta = 0$ is optimal). Interpolating the value of $\theta$ from the column with $n = 4$, between the $\mu/m$ values 0.01 and 0.02 in Table II, yields an approximate $\theta$, from which the PM rate $\delta$ and the availability follow, all reasonably close to the more precise calculations using the methods of Section IV. Plugging in the value of $\theta$ interpolated from the table into (36) gives the optimal availability approximately as 0.99877, which matches exactly the more precise calculation to the precision reported.

Now suppose we wish to repeat the analysis with a different value of the transition rate $\lambda$. The previous estimate of $\theta$ does not change, but the PM rate $\delta = \theta\lambda$ scales with $\lambda$. Using the new value of the transition rate in (36) yields 0.99816 as the approximate optimal availability, which also agrees with the more precise calculation using the methods of Section IV.

B. Warm Standby Systems
Consider a hybrid system where one unit is required to be operating with the remaining $n - 1$ units in a de-energized standby status, referred to as the warm standby system. The active unit has an exponentially distributed failure time with transition rate $\lambda$; while the warm standby units can fail with transition rate $\lambda_w$. Thus, $\lambda_1, \lambda_2, \ldots$, and $\lambda_n$ in the discussions in Sections III and IV can be described as: $\lambda_i = \lambda + (n - i)\lambda_w$, where $i = 1, \ldots, n$. The optimal $\delta$ is obtained by solving (28), which can be re-expressed as

$$\frac{\delta P'(\delta) - P(\delta) + P(0)}{[P(\delta) - P(0)]^2} = \frac{\mu}{m P(0)}, \quad P(\delta) = \prod_{j=0}^{n-1}(\delta + \lambda + j\lambda_w). \tag{37}$$

Denoting $\theta = \delta/\lambda$, $\rho = \lambda_w/\lambda$, and $Q(\theta) = \prod_{j=0}^{n-1}(\theta + 1 + j\rho)$, then (28) becomes

$$\frac{\theta\,Q'(\theta) - Q(\theta) + Q(0)}{[Q(\theta) - Q(0)]^2} = \frac{\mu}{m\,Q(0)}. \tag{38}$$

To check the existence condition for an optimal PM rate, (27) becomes

$$\frac{\mu}{m} < \frac{Q(0)\,Q''(0)}{2[Q'(0)]^2}, \tag{39}$$

and availability is derived as

$$A = \left[1 + \frac{\lambda}{m}\theta + \frac{\lambda}{\mu}\cdot\frac{Q(0)\,\theta}{Q(\theta) - Q(0)}\right]^{-1}. \tag{40}$$

Because (38) depends on both $n$ and $\rho$, no single table (like Tables I and II) can be derived. As an example, suppose $n$ is 4, and let $\lambda$ be 0.001, $\lambda_w$ be 0.0005. Thus, $\rho$ is 0.5. According to (39), the condition for an optimal PM rate to exist is $\mu/m < 0.359$ (approximately). If the previous repair rate is applied, then the condition is satisfied, and an optimal PM rate exists. Applying the secant method described in (30), the optimal PM rate for the warm-standby case is 0.0036 per hour, which is a mean of about 278 hours. The corresponding availability is 0.9991.

Fig. 7. Availability vs. PM rate for a system with warm-, hot- and cold-standby units.

A plot of the availability for the systems described in this section, under the conditions of warm-, hot- and cold-standby, is shown in Fig. 7. From this figure, we see that, to achieve the same availability, the cold-standby system has the smallest PM rate due to the assumption that cold standby units will not fail until used.

VI. CONCLUSIONS
We have analyzed a system with two types of outages (system failure, and PM), and a hypo-exponential time-to-failure distribution, for the purpose of finding the optimal periodic PM rate to achieve the best system availability. The hypo-exponential distribution is represented by decomposing the operational state into a fully UP state and degraded stages of operation so that a continuous time Markov chain model can be used in the analysis. In so doing, we have determined general conditions

under which PM can improve availability for this system, and shown that under these conditions with other system parameters fixed, a unique optimal rate of PM exists. Moreover, we have outlined stable numerical methods for computing the optimal rate of PM, which are more efficient than Monte Carlo simulation methods. Finally, we have applied the general model to some special cases of a parallel system with identical units in cold or hot standby, as well as a system with warm-standby units. For the two systems with hot or cold standby units, a further reduction of the number of parameters determining the optimal PM has been developed, leading to the construction of two useful tables (one for the cold, and one for the hot standby case) giving the optimal ratio of PM rate to unit failure rate as a function of the product of the repair rate and the mean time to perform PM. These two tables, developed for parallel systems with $n$ units, provide the systems engineer a way to quickly determine whether PM can improve availability, and if so, quickly compute the optimal PM rate and optimal availability.

We note that the optimum PM rate is sensitive to the values of the other system parameters, as demonstrated in the examples corresponding to Figs. 2 and 3 above. The methods developed here for computing the optimum PM rate are stable, relatively efficient, and easy to implement, making any needed sensitivity analysis straightforward to compute.

Having determined an optimum rate of PM, we have in effect solved an optimal cost problem. We have found the rate of PM that minimizes the maintenance cost, where cost is measured solely in terms of long run average down time between system renewals. We acknowledge that there are other economic costs to performing PM that may also be worthwhile to consider. For example, one could consider finding the optimum rate of PM subject to various economic constraints on maintenance. Moreover, in practice there is often considerable uncertainty in setting values for the fixed parameters in this model (failure or degradation rates, and repair rates). For this reason, it may be useful to implement other optimization methods that would allow the development of confidence bounds on the optimum PM rate that reflect these uncertainties. We leave these as areas for future research.

Finally, it is clear from the work in [2] that all the results of this paper remain true if the PM duration, and repair time have general distributions with means $1/m$, and $1/\mu$, respectively.

APPENDIX
MATLAB SIMULATION CODE FOR THE EXAMPLE OF FIG. 6

% Simulation for the example of Fig. 6; delta from (22) & (23); optimum
% states: 0 full up; 1 degraded; 2 degraded; 3 degraded; 4 failed (down); 5 prev. maint. (down)
% cycles: successive visits to state 0
[The remaining MATLAB statements (parameter values, initial state, the while loop over cycles, and the per-state if/else blocks) are not recoverable from the source.]
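The state list in the listing comments (0 to 3 operational, 4 failed, 5 preventive maintenance) suggests a simulation over renewal cycles, each starting at a visit to state 0. The following Python sketch shows one way such a simulation could be written. It is not the authors' MATLAB code, and the parameter values of Fig. 6 are not reproduced here. It assumes exponential stage sojourn times with rates `lam`, an exponential PM trigger with rate `delta`, exponential repair with rate `mu`, and exponential PM duration with mean `pm_mean`; these names and the values in the usage example are hypothetical.

```python
import random


def simulate_availability(lam, delta, mu, pm_mean, cycles, seed=0):
    """Estimate availability as total up time / total time over renewal cycles.

    A cycle starts in state 0 (full up) and ends on the next return to
    state 0, after either a repair (state 4) or a PM action (state 5).
    """
    rng = random.Random(seed)
    up = 0.0
    down = 0.0
    for _ in range(cycles):
        stage = 0
        while stage < len(lam):
            t_degrade = rng.expovariate(lam[stage])
            # By memorylessness the PM clock can be redrawn in each stage.
            t_pm = rng.expovariate(delta) if delta > 0.0 else float("inf")
            if t_pm < t_degrade:
                # PM happens first: go to the PM state, then back to state 0.
                up += t_pm
                down += rng.expovariate(1.0 / pm_mean)
                break
            up += t_degrade
            stage += 1
        else:
            # All operational stages were exhausted: the system failed.
            down += rng.expovariate(mu)
    return up / (up + down)


if __name__ == "__main__":
    # Hypothetical values: four operational stages, as in states 0 to 3.
    lam = (0.02, 0.03, 0.04, 0.05)
    estimate = simulate_availability(lam, delta=0.02, mu=0.1, pm_mean=1.0,
                                     cycles=20000, seed=1)
    print("estimated availability:", estimate)
```

Because the estimate is a ratio of accumulated up time to accumulated total time, its accuracy improves only with the square root of the number of cycles, which is consistent with the remark in the conclusions that direct numerical methods are more efficient than Monte Carlo simulation.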

REFERENCES
[1] W. Li, Risk Assessment of Power Systems: Models, Methods, and Applications. New York, NY, USA: Wiley-IEEE, 2005, 0-471-63168.
[2] J. Moubray, Reliability-Centered Maintenance. New York, NY, USA: Industrial Press, 1997, 0-8311-3146-2.
[3] P. Hilber, V. Miranda, M. A. Matos, and L. Bertling, "Multiobjective optimization applied to maintenance policy for electrical networks," IEEE Trans. Power Syst., vol. 22, no. 4, pp. 1675-1682, Nov. 2007.
[4] M. Stopczyk, B. Sakowicz, and G. J. Anders, "Application of a semi-Markov model and a simulated annealing algorithm for the selection of an optimal maintenance policy for power equipment," Int. J. Rel. Safety, vol. 2, no. 1/2, pp. 129-145, 2008.
[5] F. Yang and C. S. Chang, "Multiobjective evolutionary optimization of maintenance schedules and extents for composite power systems," IEEE Trans. Power Syst., vol. 24, no. 4, pp. 1694-1702, Nov. 2009.
[6] S. H. Sim and J. Endrenyi, "Optimal preventive maintenance with repair," IEEE Trans. Rel., vol. 37, no. 1, pp. 92-96, Apr. 1988.
[7] D. Chen and K. S. Trivedi, "Analysis of periodic preventive maintenance with general component failure distribution," presented at the Pacific Rim Int. Symp. Dependable Comput. (PRDC 2001), Seoul, Korea, Dec. 2001.
[8] J. Endrenyi, G. J. Anders, and A. M. Leite da Silva, "Probabilistic evaluation of the effect of maintenance on reliability - An application," IEEE Trans. Power Syst., vol. 13, no. 2, pp. 576-583, May 1998.
[9] B. D. Chen and K. S. Trivedi, "Closed-form analytical results for condition-based maintenance," Rel. Eng. Syst. Safety, vol. 76, pp. 43-51, 2002.
[10] A. H. Shirmohammadi, Z. G. Zhang, and E. Love, "A computational model for determining the optimal preventive maintenance policy with random breakdowns and imperfect repairs," IEEE Trans. Rel., vol. 56, no. 2, pp. 332-339, Jun. 2007.
[11] C. Singh and R. Billinton, System Reliability Modeling and Evaluation. London, U.K.: Hutchinson Educational, 1977 [Online]. Available: http://www.ece.tamu.edu/People/bios/singh/sysreliability
[12] S. M. Ross, Introduction to Probability Models, 9th ed. Hoboken, NJ, USA: Wiley, 2007, pp. 298-301.
[13] J. Van Casteren, "Assessment of Interruption Costs in Electric Power Systems using the Weibull-Markov Model," Dissertation [Online]. Available: http://webfiles.portal.chalmers.se/et/PhD/VanCasterenJasperPhD.pdf
[14] R. E. Barlow and R. Proschan, Mathematical Theory of Reliability. Philadelphia: SIAM, 1996, 978-0-898713-69-5.
[15] Y. Cao, H. Sun, J. Han, and K. Trivedi, "System availability with non-exponentially distributed outages," IEEE Trans. Rel., vol. 51, no. 2, pp. 193-198, 2002.
[16] K. S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, 2nd ed. Wiley, 2002, 0-471-33341-7.

Meng-Lai Yin is a Professor of Electrical and Computer Engineering at the California State Polytechnic University, Pomona. She received her MS and Ph.D. degrees in Information and Computer Science from the University of California, Irvine, in 1989 and 1995, respectively. She also holds a Master's degree in Electrical and Computer Engineering from National Cheng-Kung University, Taiwan. She has years of industrial experience at Hughes Aircraft and Raytheon. Her research interests include performance and reliability analysis, embedded systems, and parallel processing.


John E. Angus has been a Professor of Mathematics in the School of Mathematical Sciences at Claremont Graduate University since 1990. He received his MA in Mathematics from UCLA in 1977, and his MS and Ph.D. in Statistics from the University of California at Riverside in 1981. After receiving his BA in Mathematics from the University of San Diego in 1975, he worked for Hughes Aircraft Company as a Systems Engineer until 1990, and has been an active consultant to Raytheon since 1996 in system engineering and algorithm development. His research interests include survival analysis, applied probability, and statistics.

Kishor S. Trivedi holds the Hudson Chair in the Department of Electrical and Computer Engineering at Duke University, Durham, NC. He has been on the Duke faculty since 1975. He is the author of a well known text entitled, Probability and Statistics with Reliability, Queuing and Computer Science Applications, published by Prentice-Hall; a thoroughly revised second edition (including its Indian edition) of this book has been published by John Wiley. He has also published two other books entitled, Performance and Reliability Analysis of Computer Systems, published by Kluwer Academic Publishers and Queueing Networks and Markov Chains, John Wiley. He is a Fellow of the Institute of Electrical and Electronics Engineers. He is a Golden Core Member of IEEE Computer Society. He has published over 490 articles, and has supervised 44 Ph.D. dissertations. He is the recipient of the IEEE Computer Society Technical Achievement Award for his research on Software Aging and Rejuvenation. His research interests are in reliability, availability, performance, and survivability of computer and communication systems, and in software dependability. He works closely with industry in carrying out reliability and availability analysis, providing short courses on reliability, availability, and in the development and dissemination of software packages such as SHARPE, SREPT, and SPNP.
