Abstract: In this paper, we develop a Co-design methodology for generating gracefully degrading multiprocessor architectures that fulfill the dual objectives of achieving real-time performance and ensuring high levels of system availability at acceptable cost.
1 INTRODUCTION
…systems employ redundancy at multiple levels, such as instruction and data replication [3]. These hardware-only and software-only fault tolerance approaches have their own advantages and disadvantages; fault-tolerant Co-design is employed to obtain the optimum benefit from both, and some recent works have shown the viability of this hybrid strategy. The work in [4], called Co-synthesis of Fault-Tolerant Architectures (COFTA), employs a cluster-based fault tolerance approach to maximize reliability and availability. Tasks are clustered on the basis of error transparency and fault detection latency. The co-synthesis algorithm uses assertion and duplicate-and-compare tasks for architecture construction while ensuring that processing element capacity is not exceeded. Their system specification, in the form of a task graph, considers only data dependencies. The average cost increase due to their fault-tolerant synthesis is almost 60% compared to a system without fault-tolerant features, because the addition of duplicate tasks and resources leads to increased system cost.
Bolchini et al. [5][6] proposed a two-step Co-design framework for fault tolerance. The fault tolerance constraints are introduced in the specification at system level so that the whole Co-design process may also be driven by the fault tolerance requirements. During the first step of the design algorithm, constraints like area, power and cost are addressed. In the second step, the reliability of critical tasks is increased, and factors like fault coverage, area overhead, detection latency and performance degradation are examined. However, this design approach does not support time masking of transient errors.
[Figure 1: Co-design flow — Specification Refinement (library of tasks and resources) → Architecture Refinement → library-based design space exploration (computation architecture exploration; module allocation, task scheduling and cost estimation; communication architecture exploration) → Co-Simulation (Hardware, Interfaces, Software) → Co-verification and Testing]
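The duplicate-and-compare tasks used in COFTA-style co-synthesis can be illustrated with a short sketch. This is a minimal illustration of the general technique, not the authors' implementation; the function and template names are hypothetical:

```cpp
#include <functional>
#include <optional>

// Duplicate-and-compare: execute two replicas of a task on the same
// input and accept the result only if both copies agree; a mismatch
// signals a detected fault. In a real system the two replicas would
// run on different processing elements so that a single hardware
// fault cannot corrupt both results.
template <typename In, typename Out>
std::optional<Out> duplicate_and_compare(const std::function<Out(In)>& task,
                                         const In& input) {
    Out primary = task(input);    // first replica
    Out duplicate = task(input);  // second replica
    if (primary == duplicate)
        return primary;           // results agree: accept the output
    return std::nullopt;          // mismatch: fault detected
}
```

The empty `std::optional` plays the role of the error assertion raised to the fault-handling layer; the comparison itself is the "compare" task that COFTA schedules alongside the duplicated computation.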
A reliability-centric co-design framework discussed in [7][8] tries to maximize reliability while meeting area and performance constraints. The most reliable design is selected among several available alternatives; this involves binding conditional data flow graph tasks onto the resources and performing task scheduling. The framework then gradually lowers the reliability of the design until the specified performance and area bounds are met, exchanging a resource for a smaller or faster, but less reliable, one. This approach allows only one objective (reliability) to be optimized while area and performance remain constraints. The critical tasks are dupli-

3 FAULT TOLERANT CO-DESIGN FRAMEWORK
Hardware/Software Co-design is the synergetic design of both the software and the hardware of a computing system targeting a specific application. HW/SW Co-design addresses the increased complexity of embedded systems by raising the level of abstraction. It differs from behavioral synthesis: behavioral synthesis maps an algorithm into ASICs, while Co-design maps a high-level abstract specification model of an entire system onto a target architecture [12]. The Co-design framework is the ideal place where the designer validates the system functionality and evaluates different trade-off alternatives
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2010, ISSN 2151-9617
before proceeding with the low-level phase of the design, where automated tools perform the synthesis and the integration between the parts is carried out before the final low-level Co-verification. Figure 1 illustrates the Co-design flow. We develop a four-phase, top-down Co-design methodology for fault-tolerant system design:
1. Requirement Specification and Analysis
2. Library-Based Design Exploration
3. Co-Simulation
4. Co-Verification

3.1 Co-Specification and Analysis
The system requirements and specification phase involves multiple iterations of acquiring the user's requirements, converting them to formal specifications, and scrutinizing these specifications with formal verification methods. It requires repeated interaction between user and designer so as to finalize a set of desired qualitative metrics alongside the functional specifications for the given application.
There are no formal languages currently available that can fully describe system functionality and requirements, so initially a natural-language requirements document is written to capture this information.
System specification is a formal description of the functional as well as the global qualitative (sometimes referred to as non-functional) requirements of the target application. Functional specifications describe the complete behavior of the system that is expected by the user. Non-functional specifications are restrictions or constraints such as deadlines, area, power, reliability, availability, maintainability, cost and risk. Even though the implementation platform (hardware or software) should not be considered at this stage, adequate attention to the qualitative requirements is paramount to the success of the Co-design process. Traditionally, software development followed the deployment of the hardware platform, so qualitative requirements were somewhat ignored in comparison with the conceptual functional requirements. In the Co-design methodology, however, software and hardware development take place concurrently and in a mutually synergetic manner. The qualitative requirements primarily guide the decision process of choosing from the plethora of available implementation alternatives. Therefore, the application-specific quality metrics must be unambiguously specified at this early stage to avoid costly errors uncovered at a later stage.
For high-availability applications, the end user and the system designer must concur on factors such as:
1. Fault types: the different types of faults that the system must be protected against. Given the huge number of possible fault combinations as well as the complexity of VLSI circuits, it is virtually impossible to protect a system against all of them. Therefore, one must cull out the set of most probable hardware and software faults that can affect the system and clearly spell out realistic fault assumptions.
2. Fault sequences: the maximum number and sequences of different types of faults that can be tolerated. The designer must fix a limit on the maximum length of a fault sequence. It is reasonable to assume that faults occur one at a time and that, unless a faulty unit is repaired, a new fault will not manifest itself. This assumption is based on the fact that the failure rates of components are far lower than their repair rates: for example, a processor may typically fail once a year, whereas it can be repaired or replaced within hours.
3. Transparency requirements: in order to provide for debuggability and testability of the system, errors must be visible at the primary outputs. Co-design often entails System-on-a-Chip design wherein IP cores from different vendors are integrated, and it is a challenge to test the entire system in an integrated manner. Therefore, the interoperability and testability standards should be clearly specified at the outset. One can then justify the procurement of cores that have built-in self-test features allowing offline testing with a fault coverage of nearly 100%.
4. Criticality specifications: the availability requirements of critical as well as non-critical tasks must be specified. This is useful in identifying tasks on the critical path that should be equipped with online fault detection and recovery.
5. User's preferences: availability is a highly user-sensitive requirement. We underline the need to allow the user to specify his/her own perspective on what is important and what is not during the co-specification stage. In [14], the authors propose a user-driven co-synthesis mechanism that assigns user-specified Importance-of-Availability weights to the range of possible functional states in a gracefully degrading system.
The functional specifications can be obtained in the form of a task flow graph of the application or through a behavioral description language such as VHDL. We use the Conditional Task Precedence Graph (CTPG) to model the system specification [13].
A typical real-time application repeatedly executes a set of tasks in a periodic or aperiodic fashion and is time-bound for execution of the entire task set. There is a specific order in which the tasks need to be executed. We model the application by means of an acyclic, partially ordered CTPG. A CTPG is a pair (V, E), where V is a set of vertices and E ⊆ {(n_i, n_j) | n_i, n_j ∈ V} is the set of edges between the vertices. The nodes in V represent the computation sub-tasks; the edges in E represent the communication sub-tasks that carry data from source to sink tasks and also indicate the precedence constraints between the computational tasks. Each computational task has a granularity that embodies a tightly coupled major sub-function of an application that can be implemented on a single processing unit. The set T = V ∪ E contains all the sub-tasks; an individual element is denoted in lower case, e.g. n_i. All edges carry data-volume weights with Beta distribution parameters {a, b, c, d}. The sets PI and PO represent the set of primary input tasks and the set of primary output tasks, respectively. A representative CTPG
is shown in Figure 2 (data volumes are omitted to make the graph clearer).
[Figure 2: A representative CTPG with nodes V0, V1, V2, V3 and task attributes TA0 = 0, TA1 = 2, TA2 = 1, TA3 = 0]
The graph shows some AND, OR and MIXED nodes. An AND-node computational task is activated only if all of its inputs are available; an AND node is depicted in the task graph with all the fan-in paths braced by a single arc. OR-node tasks are activated by one or more inputs; an OR node is shown in the task graph in the normal way, without any arc encircling the fan-in paths. MIXED nodes are activated on the basis of some pre-specified conditions; a MIXED node is depicted in the task graph with all the fan-in paths braced by a double arc. The edge weights represent random inter-task data transfer volumes (for clarity, edge weights are omitted from the figure).

3.2 Library-Based Design Exploration
…to utilize heterogeneous platforms such as general-purpose processors, ASICs and rFPGAs to implement the functional as well as the qualitative specifications [10]. This approach provides the most time-economical co-design process. A library of processing elements and communication links is assumed to be present. The library contains a plethora of soft, firm or hard cores for hardware platforms as well as a variety of software functions. It also contains the technical specifications of each module, such as power dissipation, speed, testability features, failure/repair rates, cost, self-error-detecting capabilities and other relevant technical parameters of these platforms.
Besides this technology library, there is a database of alternative implementation details for each sub-task of the application to be developed. This database includes task execution times on each platform.
Platform-based design exploration refers to these libraries and the application specifications to conduct the following activities: hardware-software partitioning and task graph modification for fault tolerance, task allocation, task scheduling, and architecture optimization. Let us see how the design exploration process can usher in availability considerations.
These conditional tasks are bypassed during normal operation. Such if-then-else control flow, or EXOR semantics, is incorporated within the task flow representation. In a combined passive-active replica scheme, any of the available replicate tasks may be invoked, implying ORed inputs. As another example, the active replicate tasks in a Triple Modular Redundancy system must be available simultaneously at the input of a voter task, expressing AND logic. Once the task flow is modified to include the tasks and data flows that add fault tolerance, the process of selecting suitable processors and links to support them can be initiated.

3.2.2 Task allocation and binding constraints
This is the process of mapping the tasks onto computation and communication elements. This step determines the hardware implementation platforms, the software processors and the communication links that are actually utilized in the target system. For fault-tolerant and highly available architectures, components with sufficient levels of redundancy and high reliability need to be chosen.
It may be noted that the mappings of replicate tasks are not independent of each other, since they must be mapped to different computation nodes; this introduces an exclusive constraint. Other tasks may require an inclusive constraint so that they are mapped to the same processor. For example, an audit task in a certification trail must execute on the same processor as the main task. Thus, binding constraints must be satisfied while tasks are allocated to processors.

3.2.3 Task scheduling
Task scheduling assigns start and completion times to the tasks on their allocated processing units. The Co-design system must employ fault-tolerant scheduling algorithms. For example, the scheduler can optimally assign recovery tasks, such as roll-back, during slack periods. Active replicates must be scheduled so that their outputs are available simultaneously to a voter task. Tasks that share the same resource must execute sequentially according to well-formulated priority criteria. Real-time applications primarily use soft, firm or hard deadline-based scheduling policies. Secondary considerations such as load balancing and power optimization also play a role.

3.2.4 Cost estimation and architecture optimization
The crux of the design exploration process is to optimize a multi-objective function that incorporates application-specific costs such as deadline-miss ratio, system availability and reliability, accuracy of results, and cost of realization. Since task allocation and scheduling are NP-complete problems, probabilistic multi-objective metaheuristics such as genetic algorithms and ant colony optimization can be employed to traverse the state space with a view to optimizing all the costs. It is desirable to finally obtain a range of architectural solutions that offer a trade-off between the various quality factors. These alternative solutions can be further evaluated and can help the designer/user refine the system specifications; the specification-design exploration cycle can be repeated till the most desirable architectural solution is generated.

3.3 Co-Simulation
In this step, the hardware and software parts are implemented. Hardware is synthesized in the form of layout specifications in the case of ASICs and as a bit-map stream for FPGAs. Software functions are compiled into executable code for their allocated processors. In the case of ASIP-centric applications, a retargetable compiler must be used, since hardware and software development take place simultaneously. [11] presents an embedded architecture description language for platform-based co-design.

3.4 Co-Verification and Testing
After synthesis, all parasitics become known, allowing a finer estimation of all timing parameters, power dissipation and silicon area consumed. Co-verification allows the software executable to run on its designated platform along with the synthesized hardware. This step requires the simultaneous verification of the synthesized hardware and software components at different levels of abstraction; in fact, Co-verification includes validations that are performed at each step of the Co-design process. The simulation environment usually supports several bus encoding protocols so that designers can easily evaluate different design alternatives. Simulation or emulation techniques can be employed to verify both functionality and quality parameters.
If all performance specifications are met, the Co-design of the target application is finalized. Otherwise, the process loops back to the design exploration step to refine the architecture, or even to the system specification stage to refine the specifications once again.

4 RESULTS
The above Co-design methodology is implemented in C++. We used a flexible fuzzy-logic interface that allows the user to specify rules that effectively capture the users' perceived availability expectations under different working conditions. The design exploration is guided by these perceived availability expectations and the deadline-meeting probabilities of the primary output tasks. A multi-objective genetic algorithm is employed to explore the design space with the following objectives and constraints:
1. The number of deadlines met is maximized.
2. The system's steady-state availability meets the users' perceived availability expectations.
3. The cost remains within the prescribed budget and the realization is feasible.
The resultant architecture demonstrates that the tasks and communication channels involved in producing the users' perceived critical services are allocated to the most reliable resources and replicated using space and time redundancy.
5 CONCLUSION
Earlier approaches to designing fault-tolerant systems analyze the reliability and availability of a pre-existing system and introduce redundancy geared towards fault tolerance in the critical parts of the system. Unfortunately, this leads to suboptimal designs, as other qualitative objectives are invariably compromised. In this article, we presented a Co-design flow that systematically incorporates fault tolerance and availability considerations on a par with other objectives during each step of the Co-design flow, from specification to synthesis. This ensures that costly design errors are detected and obviated at the very early stages of system design.
REFERENCES
[1] R. Edwards, C. Dyer, and E. Normand, "Technical standard for atmospheric radiation single event effects (SEE) on avionics electronics", IEEE Radiation Effects Data Workshop (REDW), 2004.
[2] M. Gomaa, C. Scarbrough, T. Vijaykumar, and I. Pomeranz, "Transient-fault recovery for chip multiprocessors", IEEE Micro, vol. 23, no. 6, Nov.-Dec. 2003.
[3] Goutam Kumar Saha, "Software based Fault Tolerance: a Survey", ACM Ubiquity, vol. 7, no. 25, pp. 1-15, July 2006.
[4] B. P. Dave and N. K. Jha, "COFTA: Hardware-Software Co-Synthesis of Heterogeneous Distributed Embedded System for Low Overhead Fault Tolerance", IEEE Trans. on Computers, vol. 48, no. 4, pp. 417-441, 1999.
[5] C. Bolchini, L. Pomante, F. Salice, and D. Sciuto, "Reliability properties assessment at system level: a co-design framework", Proceedings of the Seventh International On-Line Testing Workshop, pp. 165-171, 2001.
[6] C. Bolchini, L. Pomante, F. Salice, and D. Sciuto, "A system level approach in designing dual-duplex fault tolerant embedded systems", Proceedings of the Eighth International On-Line Testing Workshop, pp. 32-36, 2002.
[7] Y. Xie, L. Li et al., "Reliability aware cosynthesis for embedded systems", Proceedings of the 15th Conference on Application-Specific Systems, Architectures and Processors (ASAP '04), 2004.
[8] S. Tosun, N. Mansouri, E. Arvas, M. Kandemir, and Y. Xie, "Reliability-Centric High-Level Synthesis", Proceedings of Design Automation and Test in Europe (DATE '05), Munich, Germany, pp. 1258-1263, 2005.
[9] Adeel Israr and Sorin A. Huss, "Specification and Design Considerations for Reliable Embedded Systems", Proceedings of the Conference on Design, Automation and Test in Europe, pp. 1111-1116, 2008.
[10] Marcio F. da S. Oliveira, Eduardo W. Brião, Francisco A. Nascimento, and Flávio R. Wagner, "Model Driven Engineering for MPSoC Design Space Exploration", Proceedings of the 20th Annual Conference on Integrated Circuits and Systems Design, pp. 81-86, 2007.
[11] Juncao Li, Nicholas T. Pilkington, Fei Xie, and Qiang Liu, "Embedded architecture description language", The Journal of Systems and Software, vol. 83, pp. 235-252, 2010.
[12] A. D. Pimentel, C. Erbas, and S. Polstra, "A systematic approach to exploring embedded system architectures at multiple abstraction levels", IEEE Trans. on Computers, vol. 55, no. 2, pp. 99-112, Feb. 2006.
[13] Michele Lombardi and Michela Milano, "Allocation and scheduling of Conditional Task Graphs", Artificial Intelligence, vol. 174, no. 7-8, pp. 500-529, 2010.
[14] Shampa Chakraverty and Anil Kumar, "A rule-based availability-driven cosynthesis scheme", Design Automation for Embedded Systems, vol. 11, no. 2-3, pp. 193-222, 2007.