
Reliability Engineering and System Safety 169 (2018) 364–379


Resilience-based network design under uncertainty


Xiaoge Zhang a, Sankaran Mahadevan a,∗, Shankar Sankararaman b, Kai Goebel b
a Department of Civil and Environmental Engineering, Vanderbilt University, Nashville, TN 37235, USA
b NASA Ames Research Center, Intelligent Systems Division, Moffett Field, CA 94035, USA

Article info

Keywords: Resilience; Networks; Recovery; Design optimization; Uncertainty

Abstract

This paper introduces an approach to quantify resilience for the design of systems that can be described as a network. A key characteristic of resilience is the ability to restore functionality and performance in response to a disruptive event. Therefore, the restoration behavior is encapsulated via a non-linear function that makes it possible to model more refined attributes of restoration at the component level. In particular, it considers the remaining capacity (absorptive ability), the degree to which capability can be recovered (restoration ability), and the recovery speed. The component restoration functions can then be used to impose a resilience target at a given time as a design constraint.

The resilience-based design optimization is then formulated for both deterministic and stochastic cases of a network system. The objective is to obtain, as the design solution, a network that incurs the least cost while meeting system resilience constraints. Maximum flow through the network is used as a measure of system performance. Several possible links are examined with regard to flow performance from an origin node to a destination node. A probabilistic solution discovery algorithm is combined with stochastic ranking to approach this problem. Two numerical examples are used to illustrate the procedure and the effectiveness of the proposed method.
© 2017 Elsevier Ltd. All rights reserved.

∗ Corresponding author.
E-mail address: sankaran.mahadevan@vanderbilt.edu (S. Mahadevan).
https://doi.org/10.1016/j.ress.2017.09.009
Received 22 March 2017; Received in revised form 19 July 2017; Accepted 22 September 2017
Available online 28 September 2017
0951-8320/© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Considering the ubiquitous nature of multiple infrastructure networks in our daily life, there is an ever-increasing demand for ensuring their regular function by minimizing the adverse effects of disruptive events. A common strategy is to reduce the likelihood of system malfunctions by increasing system redundancy, which is often the focus of reliability engineering [1,2]. Nevertheless, strategies are also needed to mitigate the consequences of undesirable events, because recent natural disasters demonstrate that not all undesirable random events are preventable. For example, in 2008, a deadly earthquake with a magnitude of 8.0 hit Wenchuan county in Sichuan province, China, causing 69,197 confirmed deaths, 374,176 injured, and 18,222 missing [3]. Several infrastructure systems, e.g., the power network and the telecommunication network, were heavily undermined by this earthquake, which severely impeded the subsequent rescue activities. The 14 August 2003 blackout cost between US $4 billion and $6 billion according to the U.S. Department of Energy [4]. In 2016, heavy rains devastated the transportation system of Beijing, and hundreds of flights and trains were cancelled after the capital was hit by persistent rain [5].

The above large-scale disruptive events highlight the need for innovative design of infrastructure systems, whose performance can be restored in a timely manner after the disruption. As a result, recent efforts have expanded from reliability engineering to a new paradigm: resilience engineering [6,7]. In general, while reliability engineering focuses on enhancing the ability of a system (or component) to work properly for a specified period of time by reducing its probability of failure through testing and simulation, resilience engineering emphasizes improving the system's capability to bounce back quickly from disruptive events and offer a desired level of performance (perhaps not the original level of performance) after the disruption. Three types of strategies have been advocated for resilience, namely, preparedness, timely response, and rapid recovery [8].

Since Holling [9] first introduced the concept of resilience and demonstrated its significant role in maintaining the stability of ecological systems, this research topic has received increasing attention in other domains [10,11]. Subsequently, much effort has been dedicated to defining and measuring system resilience. For example, Haimes [12] defined system resilience as the ability of a system to withstand and to recover from a major disruption, and compared it with reliability, robustness, vulnerability, and risk. In 2009, Attoh-Okine et al. [13] formulated a resilience index for urban infrastructure using Dempster–Shafer theory. In 2012, Henry and Ramirez-Marquez [14] identified several important parameters associated with system resilience quantification, e.g., disruptive events and component restoration, and measured system resilience as a time-dependent function. In 2012, Ouyang et al. [15] assessed the resilience of the power transmission grid in Harris County, Texas, USA, under hurricane and other hazards. In 2013, Barker et al. [16] developed an indicator to measure component importance by quantifying the adverse impact on system resilience when the disruption affects that component. Likewise, Fang et al. [17] utilized two metrics, i.e., optimal repair time and resilience reduction worth, to measure the importance of the components in a network system from the perspective of their contribution to system resilience. In 2016, Adjetey-Bahun et al. [18] proposed a simulation-based model to quantify resilience in mass railway systems by using passenger delay and passenger load as system performance indicators. In 2016, Zhen and Mahadevan [19] modeled system resilience as a function of time-dependent system reliability, system failure paths and recovery probability, and utilized sensitivity analysis to measure component importance. In 2017, Fotouhi et al. [20] developed a bi-level, mixed-integer, stochastic program to quantify the resilience of a coupled traffic-power network under a host of potential natural hazard-impact scenarios. Recently, Hosseini et al. [21] presented a comprehensive review focusing on qualitative and quantitative modelling of system resilience in engineering systems.

Many different definitions of resilience are available [22], depending on the specific subject area. All of the definitions have one common goal: to understand system resilience in different contexts so as to design and deploy resilient infrastructure systems. However, methods for resilient infrastructure system design are yet to be explored. To the best of the authors' knowledge, only a few research studies have investigated this issue. For example, Youn et al. [23] proposed a conceptual resilience definition for engineered systems by incorporating system reliability and restoration, developed a Resilience-Driven System Design (RDSD) framework, and demonstrated its usefulness in a simplified aircraft control actuator optimization problem. In order to achieve the target system resilience, Yodo and Wang [24] presented a three-step framework to allocate resilience optimally for the early stage design of complex engineering systems: quantification of system resilience through a Bayesian network, identification of critical components by utilizing sensitivity analysis, and allocation of resilience to critical components. Christopher and Peck [25] explored resilience in supply chains, and identified a number of discernible general principles that underpin resilience in supply chains.

However, most of these studies focus on resilient design in engineering systems, whereas in reality many infrastructure systems, e.g., transportation networks, power grids, and telecommunication networks, are organized in the form of networks. By enabling resilience in these infrastructure systems, they can be equipped with the capability to return to the original performance level or some other desired state in the presence of disruptive events, which has the potential to reduce the economic losses.

Unfortunately, the design of resilient infrastructure systems is still a largely unexplored topic. Only a few studies have investigated resilient design [26–28] and communication systems [29,30]. For example, in 2012, Chen and Miller-Hooks [31] identified an optimal post-event course of actions for an intermodal freight model transport network in the immediate aftermath of the disruptive event so as to fulfill target operational levels while adhering to a fixed budget. In 2014, Faturechi et al. [32] modeled airport resilience as the expected fraction of total pre-event demand, in terms of arrival and departure flows, that can be met post-repair within limited repair time and budget, formulated the problem as a stochastic integer program, and identified the optimal allocation of limited resources to maximize system resilience. Later, Faturechi and Miller-Hooks [33] formulated a bi-level, three-stage Stochastic Mathematical Program to characterize and optimize the travel time resilience in road networks. In 2015, Fang et al. [34] considered the problem of optimizing the power transmission network under the objective of maximizing system resilience to cascading failures and minimizing investment costs. In the same year, Bhatia et al. [35] quantified several different recovery strategies in restoring the critical functionality of the Indian Railways Network by considering three disruptive events, namely the 2004 Indian Ocean Tsunami, the 2012 North Indian blackout, and a cyber-physical attack. However, how to carry out a priori analysis during the design phase to optimize the topology of the current network and strengthen its resilience against these disruptions has not been addressed yet. In 2017, Fang and Sansavini [36] adopted a planner-attacker-defender model to optimize power system investments and resilience against attacks through capacity expansion and switch installation. Asadabadi and Miller-Hooks [37] modelled the impact of climate change in terms of sea level rise (SLR) on roadway performance as a multi-stage, stochastic, bi-level, mixed integer program and developed a recursive noisy genetic algorithm to identify optimal investment location, timing and extent.

However, there are some common shortcomings in the above studies. First, they do not characterize the significant features of a system or a component (e.g., a link in a traffic network) in the aftermath of a disruptive event, such as absorptive ability (the remaining capacity), restorative ability (the restoration magnitude), and restoration speed. Such characteristics are important considerations in designing resilient infrastructure systems. In addition, when a disruptive event happens, system restoration and recovery usually take a certain amount of time to bounce back to the original performance. From this perspective, system resilience is a time-varying variable, whereas most current studies do not model system resilience in this manner.

In this paper, we are motivated to fill the above gaps by mathematically formulating the resilience-based network design problem and developing an effective approach to identify a solution to this problem. With respect to system resilience, we adopt the definition proposed by Henry and Ramirez-Marquez [14] because their definition helps to model system resilience quantitatively as an attribute of a system's delivery function. Compared to the current state of the art, we make the following contributions:

• To design a resilient network system, the first step is to understand how each system component recovers over time after the disruption. In this paper, we introduce a flexible nonlinear function to characterize the component restoration process after the disruption.
• We formulate the resilience-based network design optimization problem. Our goal is to design a system with minimal cost while satisfying a resilience constraint. The resilience constraint requires that the system performance spring back to a desired level after the disruptive event.
• Since uncertainty arises in modeling the component restoration, we consider two cases: deterministic and stochastic. The system resilience constraint varies from case to case. In the first case, it is a deterministic constraint. In the latter, it is a probabilistic constraint.
• To solve the resilience-based network design optimization problem, we develop a probabilistic solution discovery algorithm and integrate it with a stochastic ranking approach to identify the optimal solution.

The rest of this paper is organized as follows. In Section 2, we introduce system resilience and the system performance metric used in this paper. In Section 3, we define the resilience-based network design problem and develop an algorithm to solve this problem. In Section 4, two numerical examples are used to illustrate the proposed method and demonstrate its efficiency. In Section 5, we conclude this paper with a brief summary and suggest possible directions for future research.

2. Background

In this section, we review the definition of system (network) resilience, which was originally discussed in Refs. [14,38], and the modelling of network performance.


2.1. System resilience

Consider an infrastructure system, such as a power network system, a transportation system, or an inland waterway network. If unexpected disruptions occur, its state transitions can be modelled as shown in Fig. 1. To quantify the system resilience $\mathbf{R}$, we introduce a system performance function $\psi(t)$ to describe the system behavior at time $t$. Commonly used representations of this function include network capacity, travel time, traffic flow, system throughput, or network connectivity, depending on the specific system under consideration. As can be observed in Fig. 1, several distinct stages characterize the transition of the system over time:

• Before the occurrence of the disruption, the original system is operated at the as-designed state $S_0$.
• Once the disruptive event $e^q$ happens at time $t_e$, the system performance starts to degrade over time due to the failure of system components or the loss of partial functionality. The system performance continues to degrade until it reaches a maximum disrupted state $\psi(S_d)$ at time $t_d$.
• In response to the disturbance, certain measures are carried out to recover the system functionality. At this stage, two different activities get involved: repair and system recovery. Preparation refers to the time required for identifying the system malfunction and repairing or replacing the impaired components. When all the preparation work is completed, the system performance begins to recover from the disrupted state $S_d$ at time $t_s$. With time, the system performance restores to a new stable state $S_f$, and is maintained thereafter.

Fig. 1. Concept of system resilience [14].

Let $\mathbf{R}(t)$ denote the resilience of a system at time $t$. Since resilience describes the ratio of system recovery at time $t$ to the loss suffered by the system at some previous point in time $t_d$, $\mathbf{R}(t)$ can be expressed by the following equation [16]:

$$\mathbf{R}(t) = \frac{\mathrm{Recovery}(t)}{\mathrm{Loss}(t_d)}, \quad t \geqslant t_d. \tag{1}$$

As shown in Fig. 1, $\psi(t_0)$ describes the value of the system service function corresponding to the stable state $S_0$. The system performance remains at this level until the occurrence of the disruptive event $e^q$ at time $t_e$, upon which the system resilience is exhibited. Once the disruptive event $e^q$ occurs, the system performance degrades gradually until it converges to a stable disrupted state $S_d$ at time $t_d$, and the system delivery function value corresponding to this disrupted state is $\psi(t_d)$, which is lower than its original value $\psi(t_0)$. After a duration $t_s - t_d$, the recovery action is taken at time $t_s$, which restores the system from the disrupted state $S_d$ to a new stable state $S_f$ with system performance function value $\psi(t_f)$ at time $t_f$. Based on the above definitions, the system resilience given the disruptive event $e^q$ can be defined as [16]:

$$\mathbf{R}_F(t_f \mid e^q) = \frac{\psi(t_f \mid e^q) - \psi(t_d \mid e^q)}{\psi(t_0) - \psi(t_d \mid e^q)} \tag{2}$$

2.2. System performance in traffic networks

Network concepts have been widely used to model numerous infrastructure systems, e.g., traffic networks, airline networks, power transmission networks, telecommunication networks, and railway networks. The primary service of these infrastructure systems is to transport goods or products, e.g., data packets, passengers, goods, or electricity, from many sources to multiple destinations. In such circumstances, it is important to know the maximum service level that the system can offer because it strongly impacts the service quality. For example, telecommunication service providers might be concerned with the maximum number of telephone calls that can be made simultaneously between two cities. Traffic engineers might want to know the maximum flow rate of vehicles from the downtown to the freeway ramp. Mathematically, these issues can be modeled as a maximum flow problem, where the interest is to quantify the maximum amount of flow that can be transported through a network. In this paper, we use the maximum flow as a metric to measure the system performance because of its extensive applications in the real world, its simplicity, and the ease of adapting it to other problems.

Consider a directed network $G(V, E)$, where $V$ denotes the set of vertices, and $E$ represents the set of arcs in $G$. Each link $(i, j) \in E$ is associated with a nonnegative flow variable $f_{ij}$. The capacity of each link $(i, j)$ is denoted by $u_{ij}$. Let $o$ and $d$ ($o, d \in V$) be the origin node and the destination node of network $G$, respectively. The maximum flow problem is to send as much flow as possible from $o$ to $d$, while the flow $f_{ij}$ along the link $(i, j)$ cannot exceed its capacity $u_{ij}$. Mathematically, the problem can be formulated in the following form:

$$\begin{aligned} \max\ & f_{od} \\ \text{s.t.}\ & \sum_{(i,j) \in E} f_{ij} - \sum_{(j,i) \in E} f_{ji} = 0, \quad \text{for each } j \in V \setminus \{o, d\}, \\ & 0 \leqslant f_{ij} \leqslant u_{ij}, \quad \forall (i, j) \in E. \end{aligned} \tag{3}$$

In this paper, the maximum flow from the starting node $o$ to the ending node $d$ is used as an indicator of system performance. In the remaining sections, the system performance refers to the maximum flow from $o$ to $d$ unless otherwise indicated.

3. Resilience-based design optimization

In this section, we first define the network design problem and formulate it as a resilience-based optimization problem, then solve it in three steps. First, we introduce a function to describe component restoration over time. Next, two mathematical models are formulated to characterize resilience-based design optimization in deterministic and stochastic cases. Finally, a probabilistic solution discovery algorithm is integrated with stochastic ranking to find the solution to each problem.

3.1. Describing the behavior of network components after the disruption

As mentioned earlier, the first step in designing a resilient infrastructure system is to understand how each system component behaves when a disruptive event $e^q$ happens. Typically, a system component experiences two stages following a disruptive event: performance degradation (from the original state to a disrupted state) and performance recovery (from the disrupted state to a recovered state). With respect to the performance degradation, the system component performance drops down to a new stable state, in which only partial or no capability is retained because of the disruptive event. The performance restoration is concerned with the speed at which the network component bounces back to normal or improved operations after the disruption; the recovery may be complete or partial.
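The system-level quantities introduced in Section 2, namely the maximum-flow performance measure of Eq. (3) and the resilience ratio of Eq. (2), can be made concrete with a short sketch. The code below is an illustration only: it uses the standard Edmonds-Karp augmenting-path method (the paper does not say which max-flow solver is used), and the four-node network, its capacities, and the function names `max_flow` and `resilience` are hypothetical.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow; `capacity` maps a link (i, j) to u_ij."""
    residual, neighbors = {}, {}
    for (i, j), c in capacity.items():
        residual[(i, j)] = residual.get((i, j), 0.0) + c
        residual.setdefault((j, i), 0.0)          # reverse arc for cancelling flow
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in parent and residual[(node, nxt)] > 1e-12:
                    parent[nxt] = node
                    queue.append(nxt)
        if sink not in parent:
            return total                          # no augmenting path is left
        # Collect the path, find its bottleneck, and push that much flow.
        path, node = [], sink
        while parent[node] is not None:
            path.append((parent[node], node))
            node = parent[node]
        push = min(residual[edge] for edge in path)
        for (i, j) in path:
            residual[(i, j)] -= push
            residual[(j, i)] += push
        total += push

def resilience(psi_t, psi_d, psi_0):
    """Eq. (2): share of the lost performance that has been recovered."""
    return (psi_t - psi_d) / (psi_0 - psi_d)

# Hypothetical network: origin "o", destination "d", two intermediate nodes.
original = {("o", "1"): 10.0, ("o", "2"): 5.0, ("1", "2"): 4.0,
            ("1", "d"): 7.0, ("2", "d"): 8.0}
disrupted = dict(original)
disrupted[("o", "1")] = 2.0   # assumed: the event hits link (o, 1) only
recovered = dict(original)
recovered[("o", "1")] = 8.0   # assumed: partial restoration of that link

psi_0 = max_flow(original, "o", "d")
psi_d = max_flow(disrupted, "o", "d")
psi_f = max_flow(recovered, "o", "d")
print(psi_0, psi_d, psi_f, resilience(psi_f, psi_d, psi_0))
```

For this toy network the three maximum flows are 15, 7 and 13, so the resilience ratio is (13 − 7) / (15 − 7) = 0.75: the partial repair of a single link recovers three quarters of the lost system performance.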
To model the complex component restoration behavior, a function needs to satisfy the following properties:
• As recognized by Cagnan and Davidson [39], it is desirable for a restoration model to include the effect of possible decision variables (e.g., the number of response personnel, the amount of repair materials, and repair prioritization rules) on the restoration speed.
• The model should be able to accommodate uncertainty in the restoration process. Since component restoration is a complicated process, it is essential to represent the epistemic uncertainty in the restoration curve.
• The model needs to be flexible enough to deal with multiple types of hazards. Such flexibility enables us to characterize the restoration process in a way that generalizes to different situations.
Any function satisfying the above requirements can be utilized to model the component restoration behavior. In this paper, for the sake of illustration, we develop a nonlinear function, which meets these requirements, to model the characteristics of each system component in the presence of the disruptive event $e^q$:

$$u^*_{ij}(t) = u_{ij}\left[a_{ij} + \lambda_{ij} \cdot \left(1 - a_{ij}\right) \cdot \left(1 - e^{-b_{ij} t}\right)\right] \tag{4}$$

where $u_{ij}$ denotes the original performance of link $(i, j)$ (in this case, the link capacity), $t$ represents the duration after the disruption, $u^*_{ij}(t)$ represents the restored link capacity at time $t$, $a_{ij}$ denotes the fraction of the original capacity retained in link $(i, j)$ after the disruption, $b_{ij}$ characterizes the restoration speed of link $(i, j)$, and $\lambda_{ij}$ is the ratio used to denote the degree to which the link is able to recover compared to its original performance.

Fig. 2. Link capacity recovery following the disruptive event $e^q$ ($b = 2$, $\lambda = 0.8$, and $a = [0, 1]$ with step size of 0.1).

Fig. 3. Link capacity recovery following the disruptive event $e^q$ ($a = 0.2$, $\lambda = 0.8$, and $b = [0, 10]$ with step size of 1).

This nonlinear function incorporates core resilience concepts (i.e., absorptive capacity and restorative ability) and the time to recovery in modelling the component resilience. In addition, the non-linear function defined in Eq. (4) is able to characterize the variability of the component performance restoration speed over time. Often, when we restore a complex component, the part that takes the least amount of time or resources will be repaired first, whereas the part that consumes the largest amount of resources or time will be repaired last. From this point of view, the speed of component performance restoration gets slower and slower over time [40]. This feature is captured by the non-linear function defined in Eq. (4).
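As a minimal sketch, Eq. (4) can be evaluated directly for a single link. The function name `restored_capacity` and the numerical values below are illustrative assumptions, not values taken from the paper's examples.

```python
import math

def restored_capacity(u, a, lam, b, t):
    """Eq. (4): capacity of one link at time t after the disruption.

    u   -- original link capacity u_ij
    a   -- fraction of the original capacity retained after the disruption
    lam -- lambda_ij in Eq. (4), scaling how much of the lost share (1 - a)
           is eventually recovered
    b   -- restoration speed b_ij (b >= 0)
    t   -- time elapsed since the disruption (t >= 0)
    """
    return u * (a + lam * (1.0 - a) * (1.0 - math.exp(-b * t)))

# Illustrative values only.
u, a, lam, b = 100.0, 0.2, 0.8, 2.0
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"t = {t:3.1f}  capacity = {restored_capacity(u, a, lam, b, t):6.2f}")
```

With these values the capacity starts at $u a = 20$ and approaches $u[a + \lambda(1 - a)] = 84$ as $t$ grows, with the largest gains occurring early, which matches the slowing restoration speed described above.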
With respect to time $t$, this non-linear function also has the following properties:

• When $t = 0$, Eq. (4) degenerates to $u^*_{ij}(t) = u_{ij} a_{ij}$. In other words, the component $(i, j)$ stays at the disrupted state because the recovery actions have not been taken.
• Suppose all the parameters are constants and $b_{ij} > 0$. With the increase of $t$, the term $\left(1 - e^{-b_{ij} t}\right)$ increases accordingly. In this case, the capacity of link $(i, j)$ restores gradually over time.
• When $t \to \infty$ and $b_{ij} > 0$, $1 - e^{-b_{ij} t} \to 1$ and $u^*_{ij}(t) \to u_{ij}\left[a_{ij} + \lambda_{ij} \cdot (1 - a_{ij})\right]$. As can be observed, the degree to which the link $(i, j)$ is able to recover compared to its original state is determined by two other variables: $a_{ij}$ and $\lambda_{ij}$. This characteristic can also be noticed from Fig. 2: when $t \to \infty$, the component capacity converges to a stable value.

In addition to the variable $t$, the extra parameters $a$, $b$, and $\lambda$ also increase the flexibility of this function, which enables us to handle the component recovery process in multiple different applications.

Since each system component has different characteristics, each has a different post-disruption performance level. In this paper, we account for this factor by using the parameter $a$, which determines the remaining capacity along each link after the disruption. Fig. 2 illustrates these concepts. Specifically, we fix the parameters $b$ and $\lambda$ at 2 and 0.8, respectively, and increase the value of $a$ from 0 to 1 with a step size of 0.1. As can be observed, when $a = 0$, the link loses all of its capacity. With the increase of parameter $a$, more capacity along the link is retained. In particular, when $a = 1$, the link is immune to this disruptive event, and no capacity is lost. The interpretation of parameter $a$ is simple: the more functionality retained relative to the original capacity, the higher the absorptive capacity.

The parameter $b$ enables us to handle different restoration speeds. As illustrated in Fig. 3, we fix the parameters $a$ and $\lambda$ at 0.2 and 0.8, respectively. When $b$ is increased from 0 to 10 with a step size of 1, the link capacity springs back to the same recovered state with different speeds. In particular, when $b = 0$, the term $\lambda_{ij} \cdot \left(1 - a_{ij}\right) \cdot \left(1 - e^{-b_{ij} t}\right)$ in Eq. (4) becomes zero, which indicates that the link capacity is not recoverable. Another observation is that the recovery speed along a given link increases with the value of $b$ if all the other parameters are constants. Thus, the function defined in Eq. (4) has the flexibility to model the restoration speed, and is therefore capable of handling different disruptive events given their distinct effects on the link recoverability.

The degree to which the link could recover relative to its original capacity is represented by the parameter $\lambda$ in Eq. (4). Fig. 4 displays the recoverability along a given link when we fix $a$ and $b$ at 0.3 and 2, and increase $\lambda$ from 0 to 1 with a step size of 0.1. In particular, when $\lambda = 0$, the term $\lambda_{ij} \cdot \left(1 - a_{ij}\right) \cdot \left(1 - e^{-b_{ij} t}\right)$ in Eq. (4) becomes zero. This means that the link capacity cannot be restored from the disrupted state. When $\lambda = 1$, the link can restore to its original performance if $b \neq 0$,


people injured in the devastating Bam earthquake in Iran in 2003. As


a result, only 23% of the blood units were supplied to the affected res-
idents, which increased the number of fatalities [42]. Hence, in order
to respond to disruptive events more effectively, we incorporate a re-
silience constraint to require that the system performance be restored
to a desired level.
Considering the importance that the system recoverability plays in
alleviating the consequence of the disruptive event eq , we include the
system resilience as a constraint in our system design. Specifically, given
the different restoration capability of each system component, our goal
is to design an infrastructure system so that its performance at a given
time instant can be restored to a desired level if the disruptive event eq
occurs. Specifically, we consider the following two cases depending on
the specific values of the relevant parameters.

3.2.1. Deterministic case


As illustrated in Eq. (4), each system component is modeled by a
number of parameters, e.g., 𝜆, a, and b. If we have precise knowledge
Fig. 4. Link capacity recovery following the disruptive event eq (a = 0.3, b = 2, and about these parameters, then there is no uncertainty in our estimation
𝜆 = [0, 1] with step size of 0.1).
about them. To calibrate the values of these parameters, a simple way is
to leverage various optimization techniques to minimize the deviation
between our prediction and the actual component capacity recovery his-
and 𝑡 → +∞. Obviously, the larger the 𝜆, the higher the post-disruption
tory based on the data collected over the past years, by which a deter-
performance level.
ministic value is derived for each parameter. In this case, the problem
Last but not the least, all three parameters, a, b, and 𝜆, can also
can be formulated as:
be functions of other variables. For example, the parameter a can be a ∑
function of the specific disruption scenario characteristics, e.g., damage min 𝑓 = 𝛿𝑖𝑗 𝑐𝑖𝑗 ,
type, disruption intensity, location, and its effect on other individual (𝑖,𝑗 )∈𝐸
( ) ( )
processes. Similarly, the restoration speed relies closely on the avail- 𝜓 𝒖∗ (𝑡)||𝑒𝑞 − 𝜓 𝑡𝑑 ||𝑒𝑞
ability of relevant resources [32], e.g., machinery/tools, personnel, and 𝑠.𝑡. ( ) ( ) > 𝜅. (5)
𝜓 𝑡0 − 𝜓 𝑡𝑑 ||𝑒𝑞
money. From this point of view, this model allows us to characterize
even more complicated component restoration behavior. Such features where 𝜑(.td |eq ) is the disrupted system state at time td , 𝜓(.u∗ (t)|eq ) is the
increases the flexibility of the proposed restoration function. The intro- system state at time t, 𝜓(t0 ) is the original system performance before
duction of the three parameters a, b, and 𝜆 also enables us to quantify the occurrence of the disruptive event eq , and 𝛿 ij is a binary decision
the contribution of each parameter to the system resilience by changing variable,
{
the value of one parameter and fixing the values of all the remaining 1, if link (𝑖, 𝑗) is included in our network,
𝛿𝑖𝑗 = (6)
parameters. Such analysis enables us to measure the sensitivity of the 0, otherwise.
system response to each parameter, thus relating our decisions with the and cij denotes the cost incurred to build link (i, j). The term
parameters. In addition, the above figures demonstrate the developed 𝜓 ( 𝒖∗ (𝑡)|𝑒𝑞 )−𝜓 ( 𝑡𝑑 |𝑒𝑞 )
function can handle the restoration process if it is continuous. When the 𝜓 (𝑡 0 )− 𝜓 ( 𝑡 𝑑 | 𝑒 𝑞 )
denotes the system resilience at time t, and 𝜅 is a
component restoration is a stepwise/discrete process, we can make the parameters a, b, and 𝜆 discrete functions to model such a process.

3.2. Optimization problem formulation

Consider the network G(V, E), where V is the set of vertices and E is the set of all candidate links. Suppose the number of nodes in network G is N, and the number of candidate links in set E is m. Each link (i, j) ∈ E is associated with a recoverability function $u^*_{ij}$ as described above. To build a link (i, j), a cost $c_{ij}$ is incurred. The original capacity associated with the link (i, j) before the disruption is denoted by $u_{ij}$. After the disruption, the restored capacity of the link (i, j) at time t is defined by $u^*_{ij}(t)$. The network state vector at time t, denoted by $\boldsymbol{u}^*(t) = \left(\cdots, u^*_{ij}(t), \cdots\right)$, ∀(i, j) ∈ E, expresses the performance of all the links at time t. The system performance function $\psi(\boldsymbol{u}^*(t))$ can be analyzed for any possible network state $\boldsymbol{u}^*$ according to specific applications. In this way, we map a collection of link states to a system state, which enables us to check whether the designed network meets our requirement or not.

We define a system resilience constraint, which requires that the system performance must restore to a desired level after some time. This resilience constraint has significant implications. For example, in emergency logistics planning after a disaster, it is essential to guarantee the effective delivery of a desired amount of goods and supplies to the affected area. As indicated by Jabbarzadeh et al. [41], ineffective blood planning and transportation caused untimely delivery of blood for those

predefined system resilience threshold. Our objective is to minimize the system design cost subject to the system resilience constraint.

3.2.2. Non-deterministic case

In many cases, the parameters 𝜆, a, and b are likely uncertain for several reasons, such as: (1) there is not enough data available to estimate these parameters; (2) there is uncertainty associated with the nature of the event $e_q$; (3) we do not have complete knowledge about the disruptive event; or (4) we are not able to analyze the restoration behavior of each system component after the disruption. All these factors increase our uncertainty in evaluating these parameters. Using a Bayesian perspective, we represent our uncertainty regarding these parameters through probability distributions. We formulate the following equation to address the system optimization problem in the presence of uncertainty.

\[
\begin{aligned}
\min\ & f = \sum_{(i,j)\in E} \delta_{ij} c_{ij}, \\
\text{s.t.}\ & P\left( \frac{\psi(\boldsymbol{u}^*(t)\mid e_q) - \psi(t_d\mid e_q)}{\psi(t_0) - \psi(t_d\mid e_q)} > \kappa \right) > \beta.
\end{aligned}
\tag{7}
\]

As can be observed, the only difference between the deterministic case and the stochastic case is the resilience constraint. In the stochastic case, the uncertainty regarding the parameters propagates through the system model to the estimation of the system state at time t. As a result, we formulate a probabilistic resilience constraint, which requires that the probability of the system having a resilience larger than 𝜅 at time t be larger than 𝛽.
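The probabilistic constraint in Eq. (7) can be checked by sampling. The following is a minimal sketch, not the authors' implementation: the function names and the five sample values are hypothetical, and the system performance samples are assumed to come from some upstream maximum-flow evaluation.

```python
from typing import Sequence, Tuple


def resilience(psi_t: float, psi_d: float, psi_0: float) -> float:
    """Recovered performance divided by the performance lost in the disruption."""
    return (psi_t - psi_d) / (psi_0 - psi_d)


def check_probabilistic_constraint(
    psi_t_samples: Sequence[float],
    psi_d_samples: Sequence[float],
    psi_0: float,
    kappa: float,
    beta: float,
) -> Tuple[float, bool]:
    """Estimate P(resilience > kappa) from paired samples and compare it with beta."""
    hits = sum(
        resilience(pt, pd, psi_0) > kappa
        for pt, pd in zip(psi_t_samples, psi_d_samples)
    )
    prob = hits / len(psi_t_samples)
    return prob, prob > beta


# Hypothetical Monte Carlo samples of restored and disrupted performance.
psi_t = [7.3, 7.0, 6.2, 7.5, 6.9]
psi_d = [1.28, 1.30, 1.25, 1.40, 1.20]
prob, satisfied = check_probabilistic_constraint(
    psi_t, psi_d, psi_0=8.0, kappa=0.8, beta=0.9
)
```

With these illustrative samples, four of the five realizations exceed 𝜅 = 0.8, so the estimated probability is 0.8 and the constraint with 𝛽 = 0.9 is not met.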


3.2.3. Additional constraints

Although the system resilience shown in Eq. (2) embraces the temporal dimension of system recovery, the definition is still "memoryless" in the sense that different performance restoration curves might have the same value of system resilience $\mathbf{R}_F(t_r \mid e_q)$. For example, suppose we have two different systems. In the first system, the resilience at t = 7 is $\mathbf{R}_F(t = 7 \mid e_q) = \frac{50-10}{60-10} = 0.8$, while in the second system the resilience at t = 7 is $\mathbf{R}_F(t = 7 \mid e_q) = \frac{6-2}{7-2} = 0.8$. If only the resilience constraint is imposed, it is not adequate to distinguish these two different restoration processes.

Additional constraints could be imposed to overcome this drawback; for example, the system performance at a given time instant must be larger than a predefined threshold, or the system performance within a time interval must exceed a desired level [19]. For the sake of illustration, in this paper, we consider the system resilience constraint along with another constraint requiring that $\psi(\boldsymbol{u}^*(t)\mid e_q) \geqslant \xi$. In the deterministic case, it is mathematically expressed as:

\[
\begin{aligned}
\min\ & \sum_{(i,j)\in E} \delta_{ij} c_{ij}, \\
\text{s.t.}\ & \frac{\psi(\boldsymbol{u}^*(t)\mid e_q) - \psi(t_d\mid e_q)}{\psi(t_0) - \psi(t_d\mid e_q)} > \kappa, \\
& \psi(\boldsymbol{u}^*(t)\mid e_q) \geqslant \xi.
\end{aligned}
\tag{8}
\]

Likewise, in the non-deterministic case, the problem can be formulated as:

\[
\begin{aligned}
\min\ & \sum_{(i,j)\in E} \delta_{ij} c_{ij}, \\
\text{s.t.}\ & P\left( \frac{\psi(\boldsymbol{u}^*(t)\mid e_q) - \psi(t_d\mid e_q)}{\psi(t_0) - \psi(t_d\mid e_q)} > \kappa \right) > \beta, \\
& P\left( \psi(\boldsymbol{u}^*(t)\mid e_q) \geqslant \xi \right) > \theta.
\end{aligned}
\tag{9}
\]

3.3. Resilience-based system design

Given the above two problem formulations, we develop an effective approach to identify the optimal solution. Both combinatorial and heuristic algorithms can be utilized to tackle this problem. However, combinatorial techniques need to search the entire solution space and are therefore computationally expensive. As a result, the use of combinatorial techniques in the literature is restricted to small networks. On the other hand, heuristic algorithms are able to provide near-optimal solutions within an acceptable computational time, even for large-scale problems. Hence, several heuristic algorithms have been proposed to tackle complicated problems, such as Genetic Algorithm [43–45], Particle Swarm Optimization [46,47], and others [48–51].

In this paper, we leverage a probabilistic solution discovery algorithm due to its efficiency in solving system optimization problems. This algorithm was originally proposed by Ramirez-Marquez and Rocco [52], and it has been widely used to address various network optimization problems, e.g., network interdiction [53], network protection [54], all-terminal network reliability allocation [52], and container inspection strategy optimization [55]. However, the original probabilistic solution discovery algorithm [53] introduces a penalty term to transform a constrained optimization problem into an unconstrained one, and deciding the optimal value for the penalty term turns out to be a difficult optimization problem by itself. To overcome this challenge, we utilize the stochastic ranking technique developed by Runarsson and Yao [56] and integrate it with the probabilistic solution discovery algorithm to deal with the resilience-based network design problem.

In general, this algorithm consists of three steps: Strategy Development, Strategy Analysis, and Solution Discovery. In the Strategy Development step, since we do not know whether to include a given link (i, j) in the network design or not, we use a vector $\boldsymbol{\gamma}_\tau$ to define the probability that a given link will be present in our optimal network design. Based on this vector, Monte Carlo (MC) simulations are employed to generate a fixed number, called SAMPLE, of potential network design strategies. The algorithm terminates when the probability of each link being present in the optimal network design converges to either 0 or 1. In other words, when the vector $\boldsymbol{\gamma}_\tau$ can no longer be updated, the algorithm stops, and the links with probability equal to 1 constitute the optimal solution.

In the Strategy Analysis step, we analyze the maximum flow from o to d corresponding to each network design strategy previously generated, as denoted by $\psi(\boldsymbol{u}^*(t))$. Many methods are available to compute the maximum flow from an origin node to a destination node, e.g., the Goldberg method [57], the push-relabel approach [58], and the maximum adjacency (MA) ordering method [59]. In this paper, we use the Ford-Fulkerson algorithm in Ref. [60] to calculate the maximum flow between the origin node o and the destination node d. The Ford-Fulkerson method calculates the maximum flow in a network by iteratively identifying augmenting paths in residual graphs, and it has a time complexity of O(Ef), where E and f denote the number of edges and the maximum flow in network G, respectively.

The next step is to rank all the generated solutions based on their objective and constraint values. A commonly used method in constrained optimization problems is to introduce a penalty term into the objective function to penalize the solutions violating the constraints [61]. Consider the following problem,

\[
\begin{aligned}
\min\ & f(\boldsymbol{x}), \quad \boldsymbol{x} = \{x_1, \cdots, x_n\} \in R^n \\
\text{s.t.}\ & \underline{x}_i \leqslant x_i \leqslant \overline{x}_i, \quad i \in \{1, \cdots, n\} \\
& g_j(\boldsymbol{x}) \leqslant 0, \quad \forall j \in \{1, \cdots, m\}
\end{aligned}
\tag{10}
\]

The penalty term $r_g$ helps to transform a constrained optimization problem into an unconstrained one:

\[
\psi(\boldsymbol{x}) = f(\boldsymbol{x}) + r_g\, \phi\left(g_j(\boldsymbol{x}),\ j = 1, \cdots, m\right), \tag{11}
\]

where $\phi(g_j(\boldsymbol{x}))$ is a real-valued function to quantify the degree of violation of the j-th constraint.

The penalty function was implemented in the original probabilistic solution discovery algorithm [52], and the solutions were ranked based on the updated objective value. However, one challenge in the penalty method is how to determine the optimal value for each penalty term $r_g$. If $r_g$ is too large, a feasible solution could be found, but the solution quality might be very poor. If $r_g$ is too small, an infeasible solution might not get penalized enough and might be selected as optimal. For a given penalty term $r_g > 0$, let the ranking of 𝜇 individuals be:

\[
\psi(\boldsymbol{x}_1) \leqslant \psi(\boldsymbol{x}_2) \leqslant \cdots \leqslant \psi(\boldsymbol{x}_\mu), \tag{12}
\]

where 𝜓 is the function given by (11). Then the relationship between individual i and individual i + 1 can be modeled as:

\[
f_i + r_g \phi_i \leqslant f_{i+1} + r_g \phi_{i+1}, \quad i \in \{1, \cdots, \mu - 1\}, \tag{13}
\]

where the notations $f_i = f(\boldsymbol{x}_i)$ and $\phi_i = \phi\left(g_j(\boldsymbol{x}_i),\ j = 1, \cdots, m\right)$ are used for convenience. A parameter $\hat{r}_i$, referred to as the critical penalty coefficient, is introduced:

\[
\hat{r}_i = \frac{f_{i+1} - f_i}{\phi_i - \phi_{i+1}}, \quad \text{for } \phi_i \neq \phi_{i+1}. \tag{14}
\]

Three propositions are derived to show the dominance of either the penalty function or the objective function depending on the specific value of $r_g$:

• Suppose $f_i < f_{i+1}$ and $\phi_i > \phi_{i+1}$; then $\phi_i - \phi_{i+1} > 0$. When $r_g > \hat{r}_i$, the comparison is said to be dominated by the penalty function because $r_g(\phi_i - \phi_{i+1}) > f_{i+1} - f_i$. When $0 < r_g \leqslant \hat{r}_i$, the comparison is said to be dominated by the objective function because $r_g(\phi_i - \phi_{i+1}) \leqslant f_{i+1} - f_i$. If both individuals do not violate the constraints, we have $\phi_i = \phi_{i+1} = 0$ and $\hat{r}_i \to \infty$.


• If $f_i \geqslant f_{i+1}$ and $\phi_i < \phi_{i+1}$, we have $\phi_i - \phi_{i+1} < 0$, $f_{i+1} - f_i \leqslant 0$, and $\hat{r}_i \geqslant 0$. The comparison is said to be dominated by the penalty function if $0 < r_g < \hat{r}_i$ because the penalty function 𝜙 plays the dominant role in determining the inequality. When $r_g \geqslant \hat{r}_i$, the comparison is said to be objective function dominated.
• If $f_i < f_{i+1}$ and $\phi_i < \phi_{i+1}$, the comparison is non-dominated because neither the penalty function nor the objective function can determine the inequality by itself.

It is noteworthy that the value of $r_g$ has no impact on the inequality (13) when we rank non-dominant or feasible individuals. Only in the first two cases is $r_g$ a pivotal factor, because it determines whether the inequality (13) is objective or penalty function dominated. In particular, in the first case, if we increase $r_g$ to a value larger than $\hat{r}_i$, the individual i + 1 changes from a fitter individual to a less-fit one. Hence, the chosen value of $r_g$ used for comparisons determines the order of all the individuals. As discussed earlier, neither under- nor over-penalization is a good constraint-handling technique, and there should be a balance between preserving feasible individuals and rejecting infeasible ones [62].

A simple way to achieve such a balance is to count how many comparisons of two adjacent individuals are dominated by the objective functions and penalty functions, respectively, and determine the fraction of individuals dominated by the objective and penalty functions. In this paper, we introduce a probability $P_f$ to facilitate the ranking of the generated solutions. To be specific, given any pair of two adjacent individuals, the probability of comparing them according to the objective function is 1 if both individuals are feasible; otherwise, it is $P_f$. We rank all the solutions using the bubble sort algorithm, as shown in Algorithm 1. In this algorithm, the variable H denotes the number of sweeps going through the whole population. We rank 𝜇 individuals by comparing them against their adjacent individuals. The algorithm terminates when the ordering of the population no longer changes. Based on the stochastic ranking method, we obtain an order of the whole population.

Algorithm 1: Stochastic ranking.
    I_k = k, ∀k ∈ {1, ⋯, 𝜇}
    for h = 1 to H do
        for k = 1 to 𝜇 − 1 do
            sample u ∈ U(0, 1)
            if [𝜙(x_{I_k}) = 𝜙(x_{I_{k+1}}) = 0] or (u < P_f) then
                if f(x_{I_k}) > f(x_{I_{k+1}}) then
                    swap(I_k, I_{k+1})
                end if
            else
                if 𝜙(x_{I_k}) > 𝜙(x_{I_{k+1}}) then
                    swap(I_k, I_{k+1})
                end if
            end if
        end for
    end for

In the Solution Discovery step, a fixed number of solutions, denoted by TOP, are saved in the set K. A small fraction of size S of the whole set of ordered strategies is used to update the probability defined by $\boldsymbol{\gamma}_\tau$. Given the new probability vector $\boldsymbol{\gamma}_\tau$, the algorithm goes back to the first step to generate new candidate solutions for the next iteration. The program terminates when $\boldsymbol{\gamma}_\tau$ does not change any more, i.e., all the probabilities converge to zero or one. The optimal design strategy is saved in the vector $\boldsymbol{x}^*$. The steps of the improved probabilistic solution discovery algorithm are summarized in Algorithm 2 below:

Algorithm 2: Improved probabilistic solution discovery algorithm.
    Initialize SAMPLE, S, 𝜸_1 = 0.5, 𝜏 = 1, K = ∅
    while 𝜏 < T do
        STEP 1: (Strategy Development)
        for h = 1 to SAMPLE do
            Generate potential network design strategies
                x^h_𝜏 = (⋯, x^h_{ij𝜏}, ⋯), ∀(i, j) ∈ E
            as dictated by 𝜸_𝜏 = (⋯, 𝛾_{ij𝜏}, ⋯), ∀(i, j) ∈ E, where 𝛾_{ij} = P(x_{ij} = 1), ∀(i, j) ∈ E
        end for
        if 𝛾_{ij𝜏} = 1 or 𝛾_{ij𝜏} = 0, ∀(i, j) ∈ E then
            x* = argmin{K}
            break
        end if
        STEP 2: (Strategy Analysis)
        (b): Compute the maximum flow from the origin node o to the destination node d corresponding to each randomly generated strategy x^h_𝜏
        (c): Calculate the design cost of each strategy x^h_𝜏
        (d): Use stochastic ranking to order all the generated solutions
        (e): List all the solutions in ascending order
                f(x_𝜏(1)) ⩽ f(x_𝜏(2)) ⩽ ⋯ ⩽ f(x_𝜏(h)) ⩽ ⋯ ⩽ f(x_𝜏(SAMPLE))
        STEP 3: (Solution Discovery)
        K ← K ∪ {f(x_𝜏(1)), f(x_𝜏(2)), …, f(x_𝜏(TOP))}
        𝜏 ← 𝜏 + 1
        Update the probability of each link being present in the optimal solution:
        for ∀(i, j) ∈ E do
            𝛾_{ij𝜏} = (Σ_{k=1}^{S} x_{ij}^{(k)}) / S
        end for
    end while

As can be noticed, the stochastic ranking explicitly balances the objective and the penalty functions in optimization. The introduction of a single probability $P_f$ helps to avoid assigning one fixed value to the penalty term and to specify conveniently an agreeable bias toward the objective function in ranking individuals. In this manner, we enable the search to move towards the optimum in the feasible space, not just toward the optimum in the combined feasible and infeasible space.

With respect to the initial values of the vector $\boldsymbol{\gamma}_1$, since we do not know which links will be part of the optimal network design strategy, we assign the same probability to each link (i, j) ∈ E of being present in the network. In the absence of any information, $\boldsymbol{\gamma}_1 = 0.5$ for every link (i, j) ∈ E.

The above algorithm can be applied directly to solve the deterministic resilience-based design optimization problem. In the stochastic case, Monte Carlo simulations are utilized to sample the uncertain parameters, e.g., a, b, and 𝜆, thus resulting in a distribution of the maximum flow from the origin node o to the destination node d. Apart from that, all the other procedures are the same as in the deterministic case.

4. Numerical examples

In this section, two numerical examples are used to illustrate the procedures and usefulness of the proposed method in designing resilient network systems.

Example 4.1. Consider the network shown in Fig. 5, originally used by Hillier and Lieberman [63] as an example to discuss the shortest path and maximum flow problems. We modify this problem here to illustrate our approach in addressing the resilience-based system design problem.

In Fig. 5, all the links in this network represent the possible roads we can build to connect the source node 1 to the ending node 7. The two numbers along each link represent the capacity and the design cost of that link. The service function for this network is the maximum flow


between nodes 1 and 7. Our objective is to design a network system that enables the system service function to restore to a certain level when the disruptive event $e_q$ occurs.

Fig. 5. A 7-node network.

(a): Deterministic case

The first step is to model the system component performance recovery process in the presence of the disruptive event $e_q$. Table 1 lists the parameters related to each component. For the sake of simplicity, we assume that the parameter 𝜆 = 0.9 for all the links. In the deterministic case, our objective is to design a system having the minimal cost subject to its resilience constraint 𝐑(t = 6) ⩾ 0.8.

Table 1
Link characteristics (deterministic case).

Link    a      b       Link    a      b
(1,2)   0.28   0.82    (3,5)   0.22   0.90
(1,3)   0.15   0.78    (3,6)   0.25   0.96
(1,4)   0.10   0.80    (4,6)   0.28   0.84
(2,3)   0.37   0.94    (5,7)   0.27   0.76
(2,5)   0.33   0.82    (6,5)   0.31   0.81
(3,4)   0.26   0.85    (6,7)   0.24   0.88

From Table 1, for any generated network strategy, we are capable of estimating its original state, disrupted state, and restored state at time t = 6. All these states are essential to compute the system resilience index. For example, for a given network design shown in Fig. 6, its original performance is computed using the initial capacity along each link based on the Ford-Fulkerson algorithm. As a result, without any disruption, we have the system performance as $\psi(t_0) = 8$. The specific solution is shown in Fig. 7(a). The numbers outside the brackets represent the flow along each link, and the underlined numbers inside the brackets denote the initial capacity associated with each link. As can be observed, the flow along each path is constrained by the link that has the minimum capacity.

Fig. 6. Illustration of resilience index computation.

With respect to the disrupted state, the link capacity is computed as $u^*_{ij}(t = 0) = u_{ij} a_{ij}$. For example, the capacity of link (1, 3) at the disrupted state is $u^*_{13}(t = 0) = u_{13} a_{13} = 7 \times 0.15 = 1.05$. Then the Ford-Fulkerson method is utilized to compute the maximum flow through the network, which is $\psi(t_d \mid e_q) = 1.28$. Similarly, as illustrated in Fig. 7(b), the numbers outside the brackets and inside the brackets denote the flow and capacity along each link.

The system performance at t = 6 is computed in a similar manner. We first compute the recovered link capacity according to Eq. (4). For link (1, 3), we have $u^*_{13}(t = 6) = u_{13}\left[a_{13} + 0.9 \times (1 - a_{13}) \times \left(1 - e^{-b_{13} \times 6}\right)\right] = 6.3$, which is also indicated by the underlined number of Fig. 7(c). The capacity for all the other links can be calculated in the same way. Given the restored capacity for every link, the same Ford-Fulkerson algorithm is employed to map the link performance into system performance, which is $\psi(\boldsymbol{u}^*(t = 6) \mid e_q) = 7.3$. Then the system resilience can be calculated as:

\[
\mathbf{R}(t = 6) = \frac{\psi(\boldsymbol{u}^*(t = 6)\mid e_q) - \psi(t_d\mid e_q)}{\psi(t_0) - \psi(t_d\mid e_q)} = \frac{7.3 - 1.28}{8 - 1.28} = 0.8958. \tag{15}
\]

The design cost corresponding to the network shown in Fig. 6 is easily computed, that is f = 35. By using the above example, we have described the computation of system resilience for a specific network. But our goal is to find an appropriate network that has the minimal cost but satisfies the system resilience constraint. To achieve this objective, a number of random networks are generated. The same evaluation procedures are carried out on each of these randomly generated networks. Specifically, in the probabilistic solution discovery algorithm, we let SAMPLE = 400, T = 10, S = 40, TOP = 20, and $P_f = 0.45$. Table 2 displays the convergence process of the vector 𝜸 during the iterations. As can be observed, the probabilistic solution discovery algorithm converges to a stable solution in five iterations. The links with probabilities equal to 1 form the ultimate solution, and these are (1, 4), (4, 6) and (6, 7). These three links form one path from node 1 to node 7, that is: 1 → 4 → 6 → 7. Fig. 8 demonstrates the resilience of this network over time. With time, the system recovers from the disrupted state gradually, as observed from the increase of the system resilience in Fig. 8. When t = 6, the system resilience is 0.8926, and the total cost is 17.

(b): Non-deterministic case

In the non-deterministic case, the uncertain parameters related to the function defined in Eq. (4) are represented by probability distributions. In this circumstance, we let the resilience constraint be $P\left( \frac{\psi(\boldsymbol{u}^*(t = 8)\mid e_q) - \psi(t_d\mid e_q)}{\psi(t_0) - \psi(t_d\mid e_q)} > 0.8 \right) > 0.9$. To characterize the uncertainty related to the parameters, uniform distributions are used due to their simplicity and flexibility. The values of these variables are shown in Table 3. As a result, Monte Carlo simulations are utilized to sample these parameters within their ranges. Fig. 9 illustrates the variability in the restoration capability of link (1, 2).

Similar to the deterministic case, the probabilistic solution discovery algorithm generates 400 random networks according to the initialized vector $\boldsymbol{\gamma}_1$, evaluates the resilience of each network at t = 8, and calculates the corresponding system cost. All the other parameters (i.e., TOP, T, S, and $P_f$) in the non-deterministic case are the same as in the deterministic case. For each generated network, to characterize the variability in the link capacity recovery, 5000 realizations of the parameters a, b, and 𝜆 are sampled. Fig. 9 shows the 5000 realizations of link capacity recovery over time. This uncertainty propagates from the component level to the system level. As a result, in this case, the maximum flow through the network is probabilistic rather than a deterministic value. Table 4 illustrates the convergence process of 𝜸 during the iterations. In only four iterations, the proposed method converges to a stable state. As can be observed, the links (1, 3), (3, 6) and (6, 7) are retained in the network; they form one route 1 → 3 → 6 → 7 to transmit the flow from


Fig. 7. System performance over time.
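The three states shown in Fig. 7 can be reproduced numerically for a single link and for the system as a whole. The sketch below uses the recovery form and the numbers quoted in the worked example for link (1, 3) and Eq. (15); the function names are illustrative and not part of the paper.

```python
import math


def restored_capacity(u: float, a: float, b: float, lam: float, t: float) -> float:
    """Link capacity t time units after the disruption, using the form of the
    worked example: u * [a + lam * (1 - a) * (1 - exp(-b * t))]."""
    return u * (a + lam * (1.0 - a) * (1.0 - math.exp(-b * t)))


def resilience(psi_t: float, psi_d: float, psi_0: float) -> float:
    """Resilience index: recovered performance over lost performance."""
    return (psi_t - psi_d) / (psi_0 - psi_d)


# Link (1, 3): capacity 7, a = 0.15, b = 0.78 (Table 1), lambda = 0.9.
u13, a13, b13, lam = 7.0, 0.15, 0.78, 0.9
disrupted_13 = restored_capacity(u13, a13, b13, lam, t=0.0)  # equals u13 * a13
restored_13 = restored_capacity(u13, a13, b13, lam, t=6.0)   # close to the reported 6.3

# System-level values quoted in the text for the network of Fig. 6.
r6 = resilience(psi_t=7.3, psi_d=1.28, psi_0=8.0)            # Eq. (15)
```

At t = 0 the exponential term vanishes, so the function returns the disrupted capacity $u_{13} a_{13} = 1.05$; the system-level ratio evaluates to about 0.8958, matching Eq. (15).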

Table 2
Values of 𝜸 u during each iteration in the deterministic case.

u 𝛾 12u 𝛾 13u 𝛾 14u 𝛾 23u 𝛾 25u 𝛾 34u 𝛾 35u 𝛾 36u 𝛾 46u 𝛾 57u 𝛾 65u 𝛾 67u

1 0.3500 0.5250 0.5750 0.5250 0.4000 0.5250 0.5500 0.4000 0.6250 0.3000 0.4250 0.7250
2 0.1000 0.4750 0.7000 0.3500 0.0750 0.5500 0.5250 0.2750 0.7250 0.1250 0.3750 0.8750
3 0 0.1000 0.9750 0.2250 0.0250 0.3750 0.4500 0 1.0000 0 0.2000 0.9500
4 0 0 1.0000 0 0 0 0.0250 0 1.0000 0 0 0.9750
5 0 0 1.0000 0 0 0 0 0 1.0000 0 0 1.0000
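The entries of Table 2 are all multiples of 1/S = 0.025, which is what the update rule in the Solution Discovery step produces with S = 40. The following sketch illustrates that update and the convergence test; the helper names and the toy set of best strategies are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Link = Tuple[int, int]
Strategy = Dict[Link, int]  # 1 if the link is built, 0 otherwise


def sample_strategy(gamma: Dict[Link, float], rng: random.Random) -> Strategy:
    """Strategy Development: include each link with its current probability."""
    return {link: int(rng.random() < p) for link, p in gamma.items()}


def update_gamma(best: List[Strategy]) -> Dict[Link, float]:
    """Solution Discovery: fraction of the S best strategies containing each link."""
    s = len(best)
    return {link: sum(x[link] for x in best) / s for link in best[0]}


def has_converged(gamma: Dict[Link, float]) -> bool:
    """The search stops once every probability is exactly 0 or 1."""
    return all(p in (0.0, 1.0) for p in gamma.values())


# Hypothetical ranking outcome: 14 of the S = 40 best strategies contain
# link (1, 2), and all 40 contain link (6, 7).
best = [{(1, 2): int(k < 14), (6, 7): 1} for k in range(40)]
gamma = update_gamma(best)
```

In this toy case the updated probability of link (1, 2) is 14/40 = 0.35, the same kind of value that appears in the first row of Table 2.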

Table 3
Link characteristics (non-deterministic case).

Link a b 𝜆 Link a b 𝜆

(1,2) [0.28, 0.43] [0.82, 4.95] [0.77, 0.85] (3,5) [0.22, 0.38] [0.90, 2.67] [0.76, 0.86]
(1,3) [0.15, 0.31] [0.78, 3.36] [0.76, 0.84] (3,6) [0.25, 0.42] [0.96, 2.21] [0.79, 0.92]
(1,4) [0.10, 0.13] [0.80, 1.16] [0.73, 0.92] (4,6) [0.28, 0.29] [0.84, 1.97] [0.76, 0.91]
(2,3) [0.37, 0.49] [0.94, 1.16] [0.71, 0.84] (5,7) [0.27, 0.33] [0.76, 2.98] [0.76, 0.86]
(2,5) [0.33, 0.45] [0.82, 4.51] [0.72, 0.92] (6,5) [0.31, 0.37] [0.81, 3.39] [0.75, 0.89]
(3,4) [0.26, 0.39] [0.85, 2.26] [0.78, 0.92] (6,7) [0.24, 0.42] [0.88, 4.90] [0.77, 0.90]
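The intervals in Table 3 can be turned into sampled recovery curves in a few lines. The sketch below draws a, b, and 𝜆 for link (1, 2) from uniform distributions over the tabulated ranges and evaluates the recovery form used in the deterministic worked example, expressed as a fraction of the original capacity. The function names, the seed, and the chosen time grid are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence

# Ranges for link (1, 2) taken from Table 3.
A_RANGE = (0.28, 0.43)
B_RANGE = (0.82, 4.95)
LAM_RANGE = (0.77, 0.85)


def recovery_fraction(a: float, b: float, lam: float, t: float) -> float:
    """Restored capacity as a fraction of the original capacity at time t."""
    return a + lam * (1.0 - a) * (1.0 - math.exp(-b * t))


def sample_curves(n: int, times: Sequence[float], seed: int = 0) -> List[List[float]]:
    """Draw n parameter realizations and evaluate one recovery curve for each."""
    rng = random.Random(seed)
    curves = []
    for _ in range(n):
        a = rng.uniform(*A_RANGE)
        b = rng.uniform(*B_RANGE)
        lam = rng.uniform(*LAM_RANGE)
        curves.append([recovery_fraction(a, b, lam, t) for t in times])
    return curves


times = [0.0, 1.0, 2.0, 4.0, 8.0]
curves = sample_curves(5000, times)
final = [c[-1] for c in curves]
spread_at_t8 = (min(final), max(final))  # range of recovery levels at t = 8
```

Each curve starts at the sampled absorptive level a and rises toward a + 𝜆(1 − a), so the spread at a given time reflects the joint uncertainty in all three parameters.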

Table 4
Values of 𝜸 u during each iteration in the non-deterministic case.

u 𝛾 12u 𝛾 13u 𝛾 14u 𝛾 23u 𝛾 25u 𝛾 34u 𝛾 35u 𝛾 36u 𝛾 46u 𝛾 57u 𝛾 65u 𝛾 67u

1 0.4000 0.8250 0.6000 0.5250 0.4250 0.3500 0.5750 0.6750 0.6750 0.4000 0.5500 0.8000
2 0.3250 1.0000 0.3000 0.3750 0.3000 0.3500 0.5250 0.9500 0.3500 0.1500 0.5750 1.0000
3 0 1.0000 0.0500 0.2250 0 0.1500 0.2500 1.0000 0 0 0.3000 1.0000
4 0 1 0 0 0 0 0 1 0 0 0 1
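The ordering that feeds these probability updates comes from the stochastic ranking procedure of Algorithm 1. A minimal sketch of that bubble-sort-like procedure is given below; the function signature is an assumption, and the early exit when a sweep makes no swap is one way to realize the stated stopping rule.

```python
import random
from typing import List, Optional, Sequence


def stochastic_rank(
    f: Sequence[float],
    phi: Sequence[float],
    p_f: float,
    sweeps: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return indices of the individuals ordered from best to worst.

    f[i] is the objective value and phi[i] >= 0 the constraint violation of
    individual i. Adjacent pairs are compared by objective when both are
    feasible or with probability p_f, and by violation otherwise.
    """
    rng = rng or random.Random()
    order = list(range(len(f)))
    for _ in range(sweeps):
        swapped = False
        for k in range(len(order) - 1):
            a, b = order[k], order[k + 1]
            u = rng.random()
            if (phi[a] == 0 and phi[b] == 0) or u < p_f:
                if f[a] > f[b]:
                    order[k], order[k + 1] = b, a
                    swapped = True
            elif phi[a] > phi[b]:
                order[k], order[k + 1] = b, a
                swapped = True
        if not swapped:  # ordering no longer changes
            break
    return order


# All individuals feasible: the result is a plain sort by objective value.
feasible_order = stochastic_rank([3.0, 1.0, 2.0], [0.0, 0.0, 0.0], p_f=0.45, sweeps=3)

# p_f = 0: pairs with any infeasible member are ordered by violation only.
violation_order = stochastic_rank([1.0, 2.0, 3.0], [2.0, 0.0, 1.0], p_f=0.0, sweeps=3)
```

The two calls show the limiting behaviors: with no violations the objective alone decides the order, and with $P_f = 0$ the constraint violation decides it. Intermediate values such as the $P_f = 0.45$ used in the examples mix the two criteria at random.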

Fig. 8. System resilience over time (deterministic case).
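A resilience curve of this kind can be produced by recomputing the maximum flow on the restored capacities at each time step. The sketch below uses a Ford-Fulkerson scheme with breadth-first augmenting paths on a small illustrative sub-network. The parameters a and b come from Table 1 and the capacity of link (1, 3) from the text, while the other capacities, the choice of node 5 as destination, and all function names are hypothetical.

```python
import math
from collections import deque
from typing import Dict, Optional, Tuple

Link = Tuple[int, int]


def max_flow(cap: Dict[Link, float], s: int, t: int) -> float:
    """Ford-Fulkerson with shortest (BFS) augmenting paths."""
    res: Dict[int, Dict[int, float]] = {s: {}, t: {}}
    for (i, j), c in cap.items():
        res.setdefault(i, {})
        res.setdefault(j, {})
        res[i][j] = res[i].get(j, 0.0) + c  # forward residual capacity
        res[j].setdefault(i, 0.0)           # reverse arc starts empty
    flow = 0.0
    while True:
        parent: Dict[int, Optional[int]] = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            v = queue.popleft()
            for w, c in res[v].items():
                if c > 1e-12 and w not in parent:
                    parent[w] = v
                    queue.append(w)
        if t not in parent:
            return flow                     # no augmenting path is left
        path, w = [], t
        while parent[w] is not None:
            path.append((parent[w], w))
            w = parent[w]
        push = min(res[x][y] for x, y in path)
        for x, y in path:
            res[x][y] -= push
            res[y][x] += push
        flow += push


def restored(u: float, a: float, b: float, lam: float, t: float) -> float:
    return u * (a + lam * (1.0 - a) * (1.0 - math.exp(-b * t)))


# (capacity, a, b); capacities are hypothetical except u13 = 7.
LINKS = {(1, 2): (4.0, 0.28, 0.82), (1, 3): (7.0, 0.15, 0.78), (2, 3): (2.0, 0.37, 0.94),
         (2, 5): (3.0, 0.33, 0.82), (3, 5): (6.0, 0.22, 0.90)}
LAM = 0.9


def performance(t: Optional[float] = None) -> float:
    """Maximum flow from node 1 to node 5; t=None is the undisrupted network."""
    cap = {l: (u if t is None else restored(u, a, b, LAM, t))
           for l, (u, a, b) in LINKS.items()}
    return max_flow(cap, 1, 5)


psi_0, psi_d = performance(), performance(0.0)
curve = {t: (performance(t) - psi_d) / (psi_0 - psi_d) for t in (0.0, 2.0, 6.0)}
```

Because every restored capacity grows with time but stays below its original value when 𝜆 < 1, the computed resilience starts at zero and increases toward, without reaching, one.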


Fig. 9. Variability in component recoverability: Link (1, 2).


node 1 to node 7. At t = 8, $P\left( \frac{\psi(\boldsymbol{u}^*(t = 8)\mid e_q) - \psi(t_d\mid e_q)}{\psi(t_0) - \psi(t_d\mid e_q)} > 0.8 \right) = 0.9774$. The corresponding system cost is 22. Fig. 10 shows the histogram of system resilience in the non-deterministic case when t = 8. As indicated by the red dashed line, only the resilience of very few samples is less than 0.8. Fig. 11 demonstrates the increase of system resilience over time. On the other hand, the variability associated with the system resilience reduces gradually.

Fig. 10. Histogram of system resilience at t = 8.

Fig. 11. System resilience over time in the non-deterministic case.

Example 4.2. The second example is taken from Dai and Poh [64], which is shown in Fig. 12. This network consists of 20 nodes and 30 candidate links. The nodes S and T are designated as the source and ending nodes, respectively. The numbers along each link denote the capacity and the cost of that link. In the following sections, we consider the resilience-based design optimization according to the recovery function for each candidate link given the disruptive event $e_q$.

(a): Deterministic case

In this case, the parameters (i.e., a, b and 𝜆) of the restoration function associated with each link are deterministic, as illustrated in Table 5. The system resilience constraint in the deterministic case is defined as: 𝐑(t = 8) ⩾ 0.8. The parameters related to the probabilistic solution discovery algorithm are the same as in the first example.

Table 5
Link characteristics (deterministic case).

Link     a      b      𝜆       Link      a      b      𝜆
(S,1)    0.32   0.75   0.75    (4,14)    0.24   0.75   0.75
(S,2)    0.41   0.60   0.76    (5,15)    0.18   0.56   0.87
(S,3)    0.28   0.85   0.59    (6,16)    0.32   0.72   0.79
(S,4)    0.15   0.48   0.73    (7,16)    0.42   0.45   0.85
(S,5)    0.20   0.68   0.84    (8,16)    0.19   0.58   0.78
(1,6)    0.18   0.48   0.80    (9,17)    0.12   0.70   0.70
(1,7)    0.35   0.50   0.75    (10,17)   0.26   0.93   0.79
(1,8)    0.25   0.75   0.74    (11,17)   0.30   0.54   0.84
(2,8)    0.15   0.54   0.68    (12,T)    0.13   0.72   0.89
(2,9)    0.10   0.63   0.74    (13,18)   0.25   0.81   0.77
(2,10)   0.40   0.96   0.82    (14,18)   0.15   0.78   0.78
(3,10)   0.25   0.84   0.86    (15,18)   0.20   0.85   0.65
(3,11)   0.08   0.72   0.61    (16,T)    0.28   0.90   0.50
(3,12)   0.24   0.90   0.79    (17,T)    0.34   0.75   0.75
(4,13)   0.38   0.68   0.72    (18,T)    0.10   0.88   0.78

Fig. 13 illustrates the convergence of 𝜸 for different links as a function of iteration number. As can be observed, with the increase in the number of optimization iterations, the appearance probability of some links (e.g., links (1, 6), (1, 7) and (S, 5)) decreases to zero, while the probability of other links (i.e., links (S, 3), (3, 10), (10, 17) and (17, T)) increases to one gradually. In addition, the appearance probability of all these links converges to a stable state within 10 iterations. Table 6 presents the convergence of the appearance probability of each link as the iteration proceeds. As can be observed, only the links (S, 3), (3, 10), (10, 17) and (17, T) are retained in the ultimate network configuration, and they form a path S → 3 → 10 → 17 → T to connect the source node S and the ending node T. The total cost for such a network configuration is 21 (7 + 5 + 3 + 6). For this network, the system performance before the occurrence of the disruptive event $e_q$ equals 5, the system performance in the disrupted state is 1.25, and the restored system performance at t = 8 is 4.471. As a result, the system resilience at t = 8 is calculated as: $\frac{4.471 - 1.25}{5 - 1.25} = 0.8587$. Fig. 14 shows the trend of system recovery and resilience over time.

In addition, this problem also demonstrates the capability of the probabilistic solution discovery algorithm. If we tackle this problem by enumerating all the possible solutions, there are a total of $2^{30}$ potential network design strategies. In contrast, we identify the quasi-optimal solution by exploring 4000 solutions with the evolutionary algorithm, which is approximately 0.0004% of the whole population. In this sense, the developed probabilistic solution discovery algorithm enables us to find near-optimal solutions with affordable computational effort.

(b): Non-deterministic case

In the non-deterministic case, the parameters (a, b and 𝜆) that characterize the component restoration process are uncertain, as indicated in Table 7. In this case, the system resilience constraint becomes $P\left( \frac{\psi(\boldsymbol{u}^*(t = 8)\mid e_q) - \psi(t_d\mid e_q)}{\psi(t_0) - \psi(t_d\mid e_q)} > 0.8 \right) > 0.9$. All the other parameters of the probabilistic solution discovery algorithm are the same as in the first numerical example.

Table 8 illustrates the convergence of the appearance probability of each link. Only three links (S, 3), (3, 12) and (12, T) are present in the final network strategy, and they constitute one path from S to T, that is: S → 3 → 12 → T. For a specific sample of a, b and 𝜆, the maximum flow ($\psi(t_0)$, $\psi(t_d \mid e_q)$, and $\psi(\boldsymbol{u}^*(t = 8) \mid e_q)$) for such a network configuration at different phases can be derived easily. Fig. 15 demonstrates the distribution of system resilience of this network at t = 8. As can be observed, out of 5000 samples, only a few (453 samples) have resilience less than the predefined threshold of 0.8. The total cost for the aforementioned network is 31. Fig. 16 shows the change of system resilience over time.

(c): Additional constraint

As stated earlier, the proposed method is also able to handle other system constraints. For the sake of demonstration, we add one more


Table 6
Values of 𝜸 u during each iteration in the deterministic case .

Iteration Number Probability Convergence

1 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.5500 0.6500 0.6750 0.4500 0.6000 0.5250 0.3500 0.4000 0.4750 0.5000
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.5000 0.3500 0.2750 0.5250 0.4250 0.5750 0.4500 0.4750 0.4000 0.5250
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.5250 0.6250 0.3750 0.5750 0.4000 0.5000 0.5250 0.4500 0.5750 0.5500
2 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.5000 0.5000 0.8500 0.2250 0.6000 0.5500 0.3750 0.3750 0.2750 0.3750
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.4250 0.3750 0.2000 0.6750 0.4250 0.5750 0.5000 0.4000 0.3000 0.4250
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.4500 0.7500 0.2000 0.7500 0.2750 0.4750 0.4250 0.2750 0.5250 0.6250
3 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.3250 0.4500 0.9000 0.1000 0.5500 0.5250 0.4250 0.3750 0.1500 0.4500
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.4750 0.3250 0.0750 0.6250 0.3250 0.5250 0.4500 0.3250 0.3000 0.4750
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.2500 0.7750 0.2250 0.7250 0.2750 0.2750 0.4750 0.1250 0.7000 0.7000
4 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.1500 0.7500 0.5000 0 0.0500 0.0750 0.3000 0.6000 0.0500 0.1750
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.8500 0.3250 0 0.0500 0.2250 0.3250 0.2500 0.1000 0.4250 0.4500
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.1500 1.0000 0.5250 0.1500 0.1000 0.1750 0.3000 0 1.0000 0.3250
5 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.2250 0.3250 0.9000 0.0500 0.5000 0.5500 0.2750 0.2250 0.1500 0.3000
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.5750 0.5000 0.0500 0.3250 0.2500 0.5250 0.5000 0.2750 0.3500 0.4750
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.2000 0.8750 0.1500 0.7500 0.2250 0.1500 0.6000 0.0250 0.7750 0.7750
6 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.1500 0.1750 0.8750 0.0250 0.2500 0.5500 0.2750 0.1750 0.1000 0.2250
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.6250 0.8000 0 0.0750 0.2500 0.4000 0.4250 0.2750 0.1500 0.3250
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.0750 0.9750 0.2500 0.5750 0.1500 0.1000 0.4500 0 0.9500 0.7250
7 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.0250 0.0250 0.9750 0.0250 0.0250 0.4500 0.2750 0.0750 0.0500 0.0500
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.5250 0.9750 0 0 0.0750 0.2750 0.1750 0.2000 0.1250 0.2750
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.0500 1.0000 0.2000 0.3000 0.0750 0.0500 0.1250 0 1.0000 0.4750
8 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0 0 1 0 0 0 0 0 0 0
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0 1 0 0 0 0 0 0 0 0
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0 1 0 0 0 0 0 0 1 0
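The performance values reported for the converged path S → 3 → 10 → 17 → T are consistent with link (3, 10) acting as the bottleneck. The check below is a sketch under that assumption: the capacity of 5 for link (3, 10) is inferred from the reported pre-disruption performance and is not stated explicitly in this section, and the function name is illustrative. The parameters a, b, and 𝜆 come from Table 5.

```python
import math


def restored_capacity(u: float, a: float, b: float, lam: float, t: float) -> float:
    """Restored link capacity: u * [a + lam * (1 - a) * (1 - exp(-b * t))]."""
    return u * (a + lam * (1.0 - a) * (1.0 - math.exp(-b * t)))


# Link (3, 10) from Table 5; the capacity value is an assumption (see lead-in).
u, a, b, lam = 5.0, 0.25, 0.84, 0.86

psi_0 = u                                         # performance before the disruption
psi_d = restored_capacity(u, a, b, lam, t=0.0)    # disrupted state, u * a
psi_8 = restored_capacity(u, a, b, lam, t=8.0)    # restored state at t = 8
resilience_8 = (psi_8 - psi_d) / (psi_0 - psi_d)
```

Under this assumption the disrupted and restored values agree with the reported 1.25 and 4.471, and the resulting ratio is about 0.859, in line with the reported 0.8587 up to rounding.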

Table 7
Link characteristics (non-deterministic case).

Link a b 𝜆 Link a b 𝜆

(S,1) [0.32, 0.5953] [0.75,1.2874] [0.75,0.9475] (4,14) [0.24,0.5324] [0.75,1.0202] [0.75,0.8083]
(S,2) [0.41, 0.4241] [0.60,0.8881] [0.76,0.8286] (5,15) [0.18,0.2952] [0.56,1.4783] [0.87,0.9902]
(S,3) [0.28, 0.5278] [0.85,1.6579] [0.59,0.7236] (6,16) [0.32,0.3733] [0.72,1.1570] [0.79,0.8016]
(S,4) [0.15, 0.4134] [0.48,0.7874] [0.73,0.7511] (7,16) [0.42,0.4448] [0.45,0.7202] [0.85,0.9893]
(S,5) [0.20, 0.2955] [0.68,1.1893] [0.84,0.9969] (8,16) [0.19,0.3346] [0.58,1.5490] [0.78,0.9322]
(1,6) [0.18, 0.3534] [0.48,0.5097] [0.80,0.8595] (9,17) [0.12,0.1640] [0.70,0.9666] [0.70,0.8435]
(1,7) [0.35, 0.4156] [0.50,0.7288] [0.75,0.8005] (10,17) [0.26,0.3949] [0.93,1.6438] [0.79,0.9767]
(1,8) [0.25, 0.4219] [0.75,0.9449] [0.74,0.8959] (11,17) [0.30,0.4804] [0.54,0.9973] [0.84,0.9562]
(2,8) [0.15, 0.3875] [0.54,0.8874] [0.68,0.7516] (12,T) [0.13,0.2007] [0.72,1.0156] [0.89,0.9120]
(2,9) [0.10, 0.1113] [0.63,1.5348] [0.74,0.7818] (13,18) [0.25,0.3154] [0.81,1.3597] [0.77,0.8595]
(2,10) [0.40, 0.6553] [0.96,1.6829] [0.82,0.8821] (14,18) [0.15,0.4210] [0.78,1.3044] [0.78,0.9765]
(3,10) [0.25, 0.3024] [0.84,1.2124] [0.86,0.8709] (15,18) [0.20,0.2029] [0.85,1.7236] [0.65,0.8169]
(3,11) [0.08, 0.1980] [0.72,1.4373] [0.61,0.7019] (16,T) [0.28,0.5746] [0.90,1.8199] [0.50,0.5536]
(3,12) [0.24, 0.3774] [0.90,1.2354] [0.79,0.8560] (17,T) [0.34,0.4481] [0.75,1.4458] [0.75,0.9010]
(4,13) [0.38, 0.5037] [0.68,0.7092] [0.72,0.8006] (18,T) [0.10,0.3476] [0.88,1.1567] [0.78,0.9725]
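
To make the role of the interval data in Table 7 concrete, the following minimal sketch draws one random realization of (a, b, 𝜆) for a few links. It assumes, purely for illustration, that each parameter is sampled uniformly and independently from its interval; the distribution, the names `LINK_INTERVALS` and `sample_link_parameters`, and the choice of three links are hypothetical and are not taken from the paper.

```python
import random

# Intervals for (a, b, lambda) copied from Table 7 for three links only.
LINK_INTERVALS = {
    ("S", "3"): {"a": (0.28, 0.5278), "b": (0.85, 1.6579), "lam": (0.59, 0.7236)},
    ("3", "12"): {"a": (0.24, 0.3774), "b": (0.90, 1.2354), "lam": (0.79, 0.8560)},
    ("12", "T"): {"a": (0.13, 0.2007), "b": (0.72, 1.0156), "lam": (0.89, 0.9120)},
}


def sample_link_parameters(intervals, rng):
    """Draw one realization of (a, b, lam) for every link.

    ASSUMPTION: each parameter is uniform on its interval and independent
    of the others; the actual sampling scheme may differ.
    """
    return {
        link: {name: rng.uniform(lo, hi) for name, (lo, hi) in params.items()}
        for link, params in intervals.items()
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
realization = sample_link_parameters(LINK_INTERVALS, rng)
for link, params in realization.items():
    print(link, {name: round(value, 4) for name, value in params.items()})
```

Repeating the draw many times yields the sample set over which probabilistic constraints can be evaluated.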


Fig. 12. A 20-node network.

Table 8
Values of $\boldsymbol{\gamma}_u$ during each iteration in the non-deterministic case.

Iteration Number Probability Convergence

1 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.5250 0.5000 0.5250 0.3250 0.4500 0.5500 0.4500 0.3750 0.5000 0.3750
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.4750 0.5250 0.4250 0.6750 0.5250 0.6000 0.4750 0.5750 0.4250 0.5000
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.3500 0.6500 0.4250 0.5250 0.5250 0.5500 0.5250 0.4250 0.4500 0.4750
2 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.4500 0.5000 0.6500 0.0750 0.3250 0.4000 0.3250 0.3750 0.4500 0.4000
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.5500 0.5250 0.3750 0.7000 0.3250 0.5500 0.4250 0.6250 0.3750 0.4250
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.1750 0.8000 0.4750 0.6000 0.3750 0.5250 0.5000 0.3500 0.4750 0.4250
3 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.2500 0.5250 0.9000 0 0.1750 0.3250 0.3750 0.2750 0.2750 0.4500
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.4500 0.5500 0.2250 0.8500 0.2750 0.5750 0.2500 0.6000 0.3000 0.4500
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.1250 0.7750 0.5500 0.7250 0.3500 0.6000 0.5000 0.1750 0.4250 0.2500
4 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.1250 0.2500 1.0000 0 0 0.2750 0.2500 0.2500 0.1750 0.4000
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.4250 0.4250 0.1250 1.0000 0.2000 0.5500 0.1250 0.5500 0.2500 0.3750
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.1500 0.7750 0.3750 0.9000 0.1500 0.5750 0.4000 0.0250 0.2750 0.0750
5 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.0250 0.0250 1.0000 0 0 0.1500 0.0500 0.2000 0.1000 0.1750
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.1750 0.2750 0 1.0000 0.1500 0.3750 0 0.3500 0.1250 0.3250
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.0750 0.6000 0.2000 0.9750 0.1000 0.3500 0.1750 0 0.1250 0
6 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0 0 1.0000 0 0 0.0500 0 0.1250 0 0.0500
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.0250 0.0250 0 1.0000 0 0.1500 0 0.2250 0 0.2000
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.0250 0.3750 0.1750 1.0000 0.0250 0.0750 0.0250 0 0 0
7 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0 0 1 0 0 0 0 0 0 0
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0 0 0 1 0 0 0 0 0 0
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0 0 0 1 0 0 0 0 0 0
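
The appearance probabilities tabulated above change from one iteration to the next as the algorithm concentrates on promising designs. The sketch below shows one plausible update rule, stated here as an assumption rather than as the paper's exact procedure: each link's probability is set to the fraction of retained top-ranked designs that contain it. The function name, the toy designs, and the four-link subset are hypothetical.

```python
def update_appearance_probabilities(top_designs, links):
    """Return gamma[link] = fraction of retained designs that include link.

    ASSUMPTION: a frequency-based update over the retained designs; the
    paper's own update rule is not restated in this section.
    """
    if not top_designs:
        raise ValueError("at least one retained design is required")
    n = len(top_designs)
    return {link: sum(link in design for design in top_designs) / n for link in links}


# Toy data: four retained designs over a four-link subset of the network.
links = [("S", "1"), ("S", "3"), ("3", "12"), ("12", "T")]
top_designs = [
    {("S", "3"), ("3", "12"), ("12", "T")},
    {("S", "3"), ("3", "12"), ("12", "T")},
    {("S", "1"), ("S", "3"), ("3", "12"), ("12", "T")},
    {("S", "3"), ("12", "T")},
]
gamma = update_appearance_probabilities(top_designs, links)
print(gamma)
```

Under such a rule, a link that appears in every retained design reaches probability 1 and a link that appears in none drops to 0, which is the kind of convergence pattern visible in the final iteration of the table.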


Fig. 13. Probability of choosing each link during the iterations.

Fig. 16. System resilience over time in the non-deterministic case.

Fig. 17. Optimal network design with the additional constraint.

constraint in the second numerical example, as illustrated below:


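$$
\begin{aligned}
\min \quad & \sum_{(i,j)\in E} \delta_{ij} c_{ij}, \\
\text{s.t.} \quad & P\left( \frac{\psi\left(\boldsymbol{u}^{*}(t=8) \mid e^{q}\right) - \psi\left(t_d \mid e^{q}\right)}{\psi\left(t_0\right) - \psi\left(t_d \mid e^{q}\right)} > 0.8 \right) > 0.9, \\
& P\left( \psi\left(\boldsymbol{u}^{*}(t=8) \mid e^{q}\right) \geq 10 \right) > 0.75.
\end{aligned}
$$

Fig. 14. System resilience over time in the deterministic case.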

Fig. 15. Histogram of system resilience at 𝑡 = 8.

The new additional constraint is $P(\psi(\boldsymbol{u}^{*}(t=8) \mid e^{q}) \geq 10) > 0.85$. The probabilistic solution discovery algorithm tackles this problem in the same manner: strategy generation, strategy analysis, and solution discovery. All the relevant parameters (i.e., TOP, T, S, and Pf) are the same as in the second example. The transformation of $\boldsymbol{\gamma}_u$ for even-numbered iterations is shown in Table 9. Within 10 optimization iterations, the algorithm converges to a stable state. The links with appearance probabilities equal to 1 form the solution to the above problem, and the corresponding network is shown in Fig. 17. Whereas only one path is retained in each of the previous two examples, two paths are present in this network, namely S → 1 → 6 → 16 → T and S → 3 → 12 → T, because the additional constraint restricts the system performance at 𝑡 = 8. The system resilience and system performance at 𝑡 = 8 are shown in Fig. 18(a) and 18(b), respectively. From Fig. 18(a), it can be observed that the resilience of only 5.48% of the samples is less than 0.8, and the system performance of all the samples at 𝑡 = 8 is larger than 10. Hence, both system constraints are satisfied, and the design cost for this network configuration is 41.
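
The feasibility check described in this paragraph can be sketched as a sample-based estimate of the two probabilistic constraints. The resilience ratio follows the constraint stated in the formulation, (𝜓(t) − 𝜓(t_d)) / (𝜓(t_0) − 𝜓(t_d)); everything else, including the function names, the default thresholds passed as arguments, and the toy sample values, is a hypothetical illustration and not the paper's data.

```python
def resilience(psi_t, psi_td, psi_t0):
    """Recovered fraction of the performance lost to the disruption."""
    lost = psi_t0 - psi_td
    if lost <= 0:
        raise ValueError("performance after disruption must be below the nominal level")
    return (psi_t - psi_td) / lost


def fraction_satisfying(values, predicate):
    """Empirical probability that predicate holds over the samples."""
    return sum(1 for v in values if predicate(v)) / len(values)


def check_chance_constraints(samples, psi_t0, res_level=0.8, res_prob=0.9,
                             perf_level=10.0, perf_prob=0.75):
    """samples: list of (psi_t, psi_td) pairs, one per random realization."""
    res = [resilience(psi_t, psi_td, psi_t0) for psi_t, psi_td in samples]
    p_res = fraction_satisfying(res, lambda r: r > res_level)
    p_perf = fraction_satisfying([psi_t for psi_t, _ in samples],
                                 lambda p: p >= perf_level)
    feasible = p_res > res_prob and p_perf > perf_prob
    return p_res, p_perf, feasible


# Toy samples (hypothetical): nominal performance 14, disrupted performance 4.
samples = [(13.0, 4.0)] * 19 + [(10.5, 4.0)]
p_res, p_perf, feasible = check_chance_constraints(samples, psi_t0=14.0)
print(p_res, p_perf, feasible)
```

In this toy set, 19 of 20 samples exceed the resilience level and every sample meets the performance level, so the design would be accepted under the default thresholds.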


Table 9
Transformation of $\boldsymbol{\gamma}_u$ for even-numbered iterations in the non-deterministic case.

Iteration Number Probability Convergence

2 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.5750 0.6000 0.8000 0.2750 0.2000 0.7000 0.3750 0.5000 0.3250 0.3500
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.6750 0.3250 0.3000 0.6250 0.3500 0.4500 0.4750 0.7000 0.2750 0.6250
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.3500 0.8750 0.3250 0.6750 0.6000 0.5000 0.4250 0.5250 0.6000 0.4000
4 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.7500 0.7750 0.9500 0.3000 0.0500 0.8250 0.2250 0.6000 0.2250 0.2500
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.7750 0.2250 0.0750 0.7500 0.3250 0.4500 0.5750 0.9250 0.3000 0.8250
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.3750 0.9500 0.2000 0.9250 0.5000 0.4500 0.4250 0.8250 0.6000 0.3500
6 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 0.8750 0.4250 1.0000 0.0500 0 0.8500 0.0250 0.5250 0.0500 0.1250
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.7750 0 0 1.0000 0.1500 0.2500 0.1250 1.0000 0.0500 0.7500
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0.1750 0.9500 0.1000 1.0000 0.0750 0.0250 0.2250 0.8500 0.4750 0.0500
8 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 1.0000 0 1.0000 0 0 1.0000 0 0.0250 0 0
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0.0500 0 0 1.0000 0 0.0500 0 1.0000 0 0.5500
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0 0.8250 0.0500 1.0000 0 0 0 1.0000 0 0
10 Link (S, 1) (S, 2) (S, 3) (S, 4) (S, 5) (1, 6) (1, 7) (1, 8) (2, 8) (2, 9)
Probability 1 0 1 0 0 1 0 0 0 0
Link (2, 10) (3, 10) (3, 11) (3,12) (4, 13) (4, 14) (5, 15) (6, 16) (7, 16) (8, 16)
Probability 0 0 0 1 0 0 0 1 0 0
Link (9, 17) (10, 17) (11, 17) (12, T) (13, 18) (14, 18) (15, 18) (16, T) (17, T) (18, T)
Probability 0 0 0 1 0 0 0 1 0 0
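
As a small illustration of how the converged probabilities translate into a network design, the sketch below takes the iteration-10 row of Table 9, keeps the links whose appearance probability equals 1, and enumerates the resulting S-to-T paths. The link order and the probability values are copied from the table; the helper names `selected_links` and `enumerate_paths` are hypothetical.

```python
# Link order as listed in Table 9 (30 candidate links).
LINKS = [
    ("S", "1"), ("S", "2"), ("S", "3"), ("S", "4"), ("S", "5"),
    ("1", "6"), ("1", "7"), ("1", "8"), ("2", "8"), ("2", "9"),
    ("2", "10"), ("3", "10"), ("3", "11"), ("3", "12"), ("4", "13"),
    ("4", "14"), ("5", "15"), ("6", "16"), ("7", "16"), ("8", "16"),
    ("9", "17"), ("10", "17"), ("11", "17"), ("12", "T"), ("13", "18"),
    ("14", "18"), ("15", "18"), ("16", "T"), ("17", "T"), ("18", "T"),
]

# Appearance probabilities at iteration 10, copied from Table 9.
GAMMA_ITER_10 = [
    1, 0, 1, 0, 0, 1, 0, 0, 0, 0,
    0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
    0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
]


def selected_links(links, gamma, threshold=1.0):
    """Keep the links whose appearance probability has reached the threshold."""
    return [link for link, g in zip(links, gamma) if g >= threshold]


def enumerate_paths(edges, source="S", target="T"):
    """List all simple source-to-target paths by depth-first search."""
    adjacency = {}
    for tail, head in edges:
        adjacency.setdefault(tail, []).append(head)
    paths = []

    def dfs(node, path):
        if node == target:
            paths.append(path)
            return
        for nxt in adjacency.get(node, []):
            if nxt not in path:  # avoid revisiting a node
                dfs(nxt, path + [nxt])

    dfs(source, [source])
    return paths


design = selected_links(LINKS, GAMMA_ITER_10)
paths = enumerate_paths(design)
for path in paths:
    print(" -> ".join(path))
```

The seven retained links form exactly the two paths named in the text, S → 1 → 6 → 16 → T and S → 3 → 12 → T.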

Fig. 18. System behavior at 𝑡 = 8 with the additional constraint: (a) System resilience (b) System performance.

5. Conclusion

In this paper, we consider a resilience-based design optimization problem whose objective is to identify a minimum-cost network design solution that meets system resilience constraints. Unlike reliability-based design optimization, resilience-based design optimization emphasizes the system restoration capability after the occurrence of a disruptive event. We propose a mathematical model to describe the component restoration behavior, in which the absorptive capability, the restoration ability, and the recovery speed of each system component are taken into consideration. Given the component restoration function, we formulate a system resilience constraint, which requires the resilience of a given system to reach a desired level within a desired duration after the disruption. A probabilistic solution discovery algorithm is adopted to solve the design optimization problem in three steps: Strategy Development, Strategy Analysis, and Solution Discovery, enhanced by a stochastic ranking approach. Two numerical examples are used to illustrate the procedure and the effectiveness of the proposed method.

Our contributions are fourfold. First, the introduction of a system resilience constraint ensures that the system performance can be restored to a certain level after the system is disrupted by an event. Such a capability is critical in common infrastructure systems, e.g., power grids and traffic networks. Enhancing a system's ability to respond to disruptive events reduces the undesired consequences resulting from these disruptions. Second, we develop a nonlinear function to describe the complex behavior of component restoration. The three parameters (a, b, and 𝜆) in this function enable us to characterize the core features (i.e., absorptive capacity, restoration capability, and recovery speed) of a component after it is undermined by a disruptive event. The construction of this flexible function allows us to model the component restoration in a comprehensive manner. Third, we integrate the probabilistic solution discovery algorithm with stochastic ranking to form an effective
approach for solving the resilience-based design optimization problem. Last but not least, we account for additional constraints on the system, e.g., system performance at a given time instant, which shows that the proposed model is flexible enough to incorporate other system constraints and address more complicated real-world scenarios.

The proposed non-linear restoration function and design methodology can provide several benefits. By leveraging the data on past disruptive events, practitioners can characterize the complex component restoration behavior as a time-dependent function and calibrate the relevant parameters (a, b, and 𝜆) through regression or elicit them from expert opinion. This will strengthen quantitative modeling of the possible consequences of each disruptive event on the system components, such as the amount of damage in each component and the difficulty in restoring the component performance. In addition, the proposed framework offers insights into the design of a resilient system: it helps to identify the optimal combination of system components that fulfills the desired system performance. The inclusion of the system resilience constraint will accelerate the post-disaster performance restoration. Finally, since the proposed model has the flexibility to accommodate additional constraints, if the project budget and the cost of each component are available, the proposed method can be utilized for resource allocation, i.e., to identify the optimal investment that maximizes the system resilience within a given budget.

Several assumptions were made in creating the non-linear function and formulating the resilience-based network design problem. Future work is needed to verify the suitability of the non-linear function for practical systems. Toward this end, relevant data need to be extracted from past disruptive events, and the actual sub-system or component restoration process needs to be compared against the non-linear function developed in this paper. Machine learning techniques can be used to calibrate the restoration function parameters using the available data. Future extensions of the design problem formulation could consider additional cost and resource constraints for more realistic scenarios.

Acknowledgement

This research was financially supported partly by a research internship for the first author at NASA Ames Research Center, and partly by the Air Force Office of Scientific Research (Grant No. FA9550-15-1-0018). The support is gratefully acknowledged.