Keywords: hardware-software gradual refinement, multimedia applications, abstraction levels

Abstract
SystemC-based design methodology has been widely adopted for heterogeneous multiprocessor System on Chip (MPSoC) design. However, SystemC is a hardware-oriented language, and it is not the standard language used by designers to specify complex applications at the algorithm level. On the other hand, Simulink is a popular choice for algorithm designers to specify complex systems, but there are few design tools to implement Simulink models on MPSoC. To deal with the increasing complexity of embedded applications and MPSoC architectures, concurrent hardware/software design and verification at different abstraction levels is an essential technique. In this paper, we present a Simulink-SystemC based multiprocessor SoC design flow that enables mixed hardware/software refinement and simulation at different abstraction levels, in addition to opening new facilities like communication mapping exploration and interconnection component refinement. We applied the proposed approach for software and communication architecture refinement for three multimedia applications: MP3, Motion JPEG and H.264.

1. INTRODUCTION

Current embedded systems require a flexible and high-performance architecture to concurrently execute specific applications, including MPEG 2/4, H.263/4, CDMA 2000, WCDMA, and MP3. The heterogeneous multi-processor System on Chip (MPSoC) architecture is becoming an attractive solution because it provides highly concurrent computation and flexible programmability [1]. However, as the complexity of the system increases, it becomes more difficult and time consuming to develop and verify MPSoC software and hardware. To solve this problem, new design methodologies and languages are needed.

SystemC [2] has become the preferred hardware/software development language to overcome the design complexity problem. SystemC, which is based on C/C++, provides the abstraction and constructs needed for high-level system modeling and verification. Such abstraction, primarily at the transaction level, allows much faster simulation and analysis, and enables design issues to be detected early in the process. Several academic [3] and industrial research projects [4] proposed automatic MPSoC design flows starting from a SystemC model. Therefore, SystemC can be a good candidate to design complex MPSoCs. However, SystemC is still a hardware-oriented language and is not suitable to specify, analyze, design and verify the overall application at the algorithm level.

For algorithm modeling, simulation and validation, Simulink has become a popular choice for many system designers [5]. Simulink offers a block set of algorithms for a variety of applications. A system designer can develop a model easily by combining pre-defined blocks or user-defined blocks called S-Functions. Simulink also provides system designers with a simulation environment to verify the developed algorithm model. Real-Time Workshop (RTW) and Simulink HDL Coder can automatically generate software (C code) and hardware (VHDL/Verilog) code from the algorithm model. However, mapping and refining algorithm models onto complete MPSoCs is still an open issue in the Simulink community.

In this paper, we propose a hardware-software co-design flow that enables gradual MPSoC design from a high-level application model. The high-level application model combines the algorithm with partitioning and mapping information. From this initial model captured in Simulink, the hardware and software are gradually refined and simulated at different abstraction levels using SystemC. Additionally, the proposed flow allows for experimentation with different communication mapping schemes and integration of different types of network components for the global interconnection: bus-based or Network on Chip (NoC) based architectures. The main contribution of this work is the definition of the representation models at the different abstraction levels.

The rest of the paper is organized as follows. Section II presents previous work on Simulink and SystemC based design flows. Section III defines the four abstraction levels adopted in the proposed flow and the hardware/software representation models at each abstraction level. Section IV
4.3. Transaction Accurate Architecture Design and Simulation
The Transaction Accurate Architecture model is described in SystemC TLM and is generated according to the annotated architecture parameters of the initial Simulink model and the results of the Virtual Architecture model simulation.

At the Transaction Accurate Architecture level, the software is composed of task code, an OS, and a communication layer which implements the HdS APIs. The OS and communication components make use of HAL APIs to access the hardware resources. The intra-subsystem communication is managed fully by the OS and communication libraries. The tasks are scheduled by the OS.

At the Transaction Accurate Architecture level, the hardware is refined to a more detailed architecture. This includes the local components of the different subsystems, such as peripherals, synchronization components, local memories and network interfaces. The different subsystems are interconnected by an explicit network communication component (bus or NoC). During the generation, design decisions such as NoC size definition, positioning of the subsystems over the global interconnect component, NoC topology, NoC routing algorithm and communication buffer size are implemented at the Transaction Accurate Architecture level.

The simulation at the Transaction Accurate Architecture level is based on native software execution and SystemC simulation for the hardware platform [18]. Each software stack is a SystemC thread which creates a UNIX process for the software execution. At the beginning of the simulation, the SystemC platform launches a GNU debugger (gdb) UNIX process for each software stack in order to start its execution. The software stack interacts with

4.4. Virtual Prototype Design and Simulation
The Virtual Prototype model is described in SystemC at RTL level. The software stack is fully explicit, including the HAL layer to access the hardware resources, and it is detailed down to the ISA (Instruction Set Architecture) level for a specific processor.

The hardware architecture incorporates an ISS for each processor to execute the final binary code.

The simulation performed at this level is cycle accurate. It allows validating the memory mapping of the target architecture and the final software code. It also provides precise performance information such as software execution time, computation load for the processors and the number of cycles spent on communication.

5. EXPERIMENTAL RESULTS

To show the efficiency of the proposed MPSoC design flow, we applied the presented approach to three real multimedia applications: an MP3 Decoder, a Motion JPEG Decoder and an H.264 Main Profile Encoder. First, we developed the System Architecture model to validate the application’s algorithm and to specify the partitioning and mapping.

As target architecture we used a heterogeneous architecture based on [19]. The architecture is composed of an ARM9 processor, an ATMEL DSP subsystem, an I/O subsystem and a global memory. The communication between the processors may use different resources for mapping the data buffers (i.e. the local memories of both processors or the global memory). The different subsystems may be interconnected using different components (such as an AMBA bus, or a NoC with Mesh or Torus topology) according to the application’s requirements.
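As an illustration only, the design space described above (choice of global interconnect and placement of each inter-processor data buffer) can be sketched in plain C++. All type and function names below are hypothetical; they are not part of the proposed flow or of SystemC. The sketch also assumes that every inter-processor buffer is placed independently in one of three memories, which is a simplification of the mapping schemes explored in the experiments.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types, introduced for illustration only.
enum class Interconnect { AmbaBus, NocMesh, NocTorus };
enum class BufferLocation { ArmLocalMemory, DspLocalMemory, GlobalMemory };

struct Candidate {
    Interconnect interconnect;
    std::vector<BufferLocation> buffers;  // one entry per inter-processor channel
};

// Enumerate every combination of interconnect and per-channel buffer location.
std::vector<Candidate> enumerate_candidates(std::size_t channels) {
    const Interconnect nets[] = {Interconnect::AmbaBus, Interconnect::NocMesh,
                                 Interconnect::NocTorus};
    const BufferLocation locs[] = {BufferLocation::ArmLocalMemory,
                                   BufferLocation::DspLocalMemory,
                                   BufferLocation::GlobalMemory};

    // Number of buffer placements: 3 locations for each channel.
    std::size_t placements = 1;
    for (std::size_t i = 0; i < channels; ++i) placements *= 3;

    std::vector<Candidate> out;
    for (Interconnect net : nets) {
        for (std::size_t code = 0; code < placements; ++code) {
            // Read `code` in base 3: digit i selects the location of channel i.
            Candidate c{net, {}};
            std::size_t rest = code;
            for (std::size_t i = 0; i < channels; ++i) {
                c.buffers.push_back(locs[rest % 3]);
                rest /= 3;
            }
            out.push_back(c);
        }
    }
    return out;
}
```

Under these assumptions, an application with three inter-processor channels gives 3 x 3^3 = 81 candidate configurations, which suggests why the exploration is better driven from an annotated high-level model than by hand.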
5.1. System Architecture Simulation

We developed the MPEG-1 Audio (layer 3) decoder (the MP3 Decoder for short) model in Simulink. During this step, the main functions of the MP3 decoder were isolated into separate tasks (figure 4). Then, we mapped two tasks (T1 and T3) on the ARM processor and task T2 on the DSP. To represent the communication protocol between the tasks, we inserted 4 communication units: 1 between the 2 tasks mapped on the ARM and 3 between the processors. The simulation time in Simulink for an 80 KB input MP3 audio file was 5 s on a PC running at 1.73 GHz with 1 GByte of RAM. The simulation allowed validating the application algorithm.

Figure 4. MP3 Decoder (blocks: Packet Decoder, Huffman Decoder, IQ, IMDCT, Synthesis Filter bank; MP3 input, PCM output; tasks T1, T2, T3)

In the same manner, we built the Motion-JPEG Decoder in Simulink using 7 S-functions and we grouped them into 3 tasks mapped on the two processors. The decoded image is displayed using an LCD panel connected to the I/O Peripherals subsystem (POT). The System Architecture Model of MJPEG is illustrated in figure 5. It contains 8 communication units: 5 between the tasks mapped on the ARM processor and 3 between the different subsystems. The simulation time for a 10-frame input bitstream encoded using the QVGA YUV 444 format was 17 s in Simulink.

Figure 5. Motion JPEG System Architecture in Simulink (subsystems ARM, DSP and POT; communication units comm1 to comm8)

Figure 6. H.264 Encoder algorithm

During the experimentation, we tried different communication mapping schemes between the processors. In the first scheme, the data exchange is made only via the global memory. The second communication scheme makes use of the local memories to store the communication buffers. The third scheme uses both local and global memories for the communication (mixed case). In all cases, the communication between tasks mapped on the same processor makes use of the software FIFO protocol. In the following sections, we consider the worst-case scenario, in which all the communication buffers are mapped on the global memory. Thus, every data exchange between the processors has to pass at least twice through the global interconnect component (the producer writes to the global memory, the consumer reads the data from the global memory).

5.2. Virtual Architecture Simulation
We generated the Virtual Architecture model. The software is composed of C code for the application tasks. The hardware is composed of three abstract subsystems (DSP, ARM and I/O), the global memory and an abstract network component, described in SystemC. At this level, the software tasks are scheduled by the SystemC simulation engine.

The simulation at the Virtual Architecture level allowed gathering important early performance measurements, e.g. the amount of data exchanged between the processors, the buffer size requirements and the number of read/write operations performed at the storage modules.

Table 1 illustrates the total amount of exchanged bytes during the execution of the applications and the simulation time for the 3 multimedia applications.
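The worst-case rule stated in Section 5.1 (two passes through the global interconnect per inter-processor exchange, none for tasks sharing a processor) can be made concrete with a minimal C++ sketch. The names `CommUnit`, `min_interconnect_traversals` and `mp3_comm_units` are hypothetical. The text only gives the number of communication units of the MP3 Decoder (1 on the ARM, 3 between the processors), so the endpoints and names assigned to the units below are an assumption made for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Processor { Arm, Dsp };

// Hypothetical description of one communication unit of the application model.
struct CommUnit {
    std::string name;
    Processor producer;
    Processor consumer;
};

bool is_inter_processor(const CommUnit& u) { return u.producer != u.consumer; }

// Worst case: every inter-processor buffer lives in the global memory, so one
// exchange costs a producer write plus a consumer read (2 traversals).
// Units between tasks on the same processor use a software FIFO (0 traversals).
std::size_t min_interconnect_traversals(const std::vector<CommUnit>& units,
                                        std::size_t exchanges_per_unit) {
    std::size_t total = 0;
    for (const CommUnit& u : units) {
        if (is_inter_processor(u)) total += 2 * exchanges_per_unit;
    }
    return total;
}

// MP3 Decoder mapping from the text: T1 and T3 on the ARM, T2 on the DSP.
// Unit names and endpoints are assumed; only the 1 + 3 split is from the text.
std::vector<CommUnit> mp3_comm_units() {
    return {
        {"comm_arm", Processor::Arm, Processor::Arm},  // between T1 and T3
        {"comm_a", Processor::Arm, Processor::Dsp},
        {"comm_b", Processor::Dsp, Processor::Arm},
        {"comm_c", Processor::Arm, Processor::Dsp},
    };
}
```

With one exchange per unit, `min_interconnect_traversals(mp3_comm_units(), 1)` counts 6 traversals, all of them caused by the three inter-processor units. This is a lower bound on interconnect traffic, not a performance estimate.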
A routing request is performed at least once when a packet arrives at a router and, depending on the NoC specification, can request as many times as necessary to

Biography
Katalin Popovici is currently a PhD student in Microelectronics at TIMA Laboratory, National Polytechnic Institute of Grenoble, France. She received the Computer