
Riding the Next Wave of Embedded Multicore Processors
Maximizing CPU Performance in a Power-Constrained World

G. Balaji
The Issues

The variety of methods for increasing processor performance.
What commercial end customers are demanding.
What the scaling factors for a processor are.
How industry-standard PCs and servers can stay on the performance curve in the coming days.
What is a Processor?

A single chip package that fits in a socket (a bundle of many transistors).
Contains one or more cores.
– Cores can have functional units, cache, etc. associated with them (processing elements).
– Cores can be fast or slow.
The number of signal pins doesn’t scale with the number of cores.
Why We Need an Alternative Solution

With each generation of microprocessors, the goal is to make them run as fast as possible.
A current processor requires thousands of meters of microscopic “wire”, which causes path delays and synchronization difficulties.
Transistor leakage current adds to the heat produced.
Each generation dissipates more heat as clock speed increases.
Transistors consume more power and produce more heat.
Internal path delays are becoming unworkable.
The number of transistors doubles about every 18 months.
Transistors Are Not Free

The number of transistors in a core determines its basic power consumption.
Efficiency matters when designing new cores.
More functional units mean more transistors.
Deeper pipelines mean more transistors.
Larger caches mean more transistors.
Solution

Dedicated hardware accelerators.
Reducing a processor’s frequency and voltage results in a reduction in its overall power requirements.
Semiconductor manufacturers have therefore adopted the approach of building cores that run at somewhat lower frequencies and voltages, but integrating two or more of these processing cores on a single chip.
This is called a MULTI-CORE PROCESSOR.
Multiprocessor and multicore systems are the future.
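
The power argument on this slide can be made concrete with a small calculation. The sketch below uses the common textbook approximation for dynamic CMOS power, P ≈ C · V² · f, which is an assumption brought in for illustration and does not come from the slides. The capacitance, voltage and frequency figures are hypothetical, and leakage (static) power is ignored.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Textbook approximation of dynamic CMOS power: P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical figures, chosen only to illustrate the trend.
C_EFF = 1.0e-9  # effective switched capacitance per core, in farads

# One core pushed to a high clock and the voltage needed to sustain it.
single_fast = dynamic_power(C_EFF, voltage_v=1.2, frequency_hz=3.0e9)

# Two cores, each at a lower clock and a lower voltage.
one_slow = dynamic_power(C_EFF, voltage_v=1.0, frequency_hz=2.0e9)
dual_slow = 2 * one_slow

print(f"one fast core : {single_fast:.2f} W at 3.0 GHz")
print(f"two slow cores: {dual_slow:.2f} W at 2 x 2.0 GHz")
```

Under these assumed numbers the two slower cores draw less dynamic power than the single fast core while offering more aggregate clock cycles. That extra capacity only helps if the workload can actually be split across the cores.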
Hardware

Software
Industry Needs

Aerospace and defense (A&D) embedded computer systems strive to meet the constantly increasing demand for more processing power, driven by compute-intensive applications such as image and radar processing.
They must simultaneously address the challenge of constraints on size, weight and power (SWAP).
With today’s most advanced processors, the problem is heat.
Multicore processors enable designers to add more processing power per slot without the burden of additional heat dissipation or power consumption.
Multi-Core Processors
A processor that combines two or more independent processors into a single package.
(or) Multiple cores linked together that work in parallel on the same chip.
Performance increases.
Scalability.
Hardware
Multi-Core Processor Architecture
Types of Multicore

• Heterogeneous
Specialization among processors. Often different instruction sets.

• Homogeneous
Processors have the same instruction set and can run any task.
Three Architectures

Symmetric multi-processing (SMP)

Distributed processing (DP)

Asymmetric multiprocessing (AMP)


Symmetric Multi-processing (SMP)

What is symmetric multiprocessing?

SMP is especially well suited to the real-time processing used in radar, image processing and other military applications.
Each node may have two or more processors, and memory is global to all processors.
The processors may also have both local cache and shared cache, and the cache is coherent between all processors and memory.
A single OS is used to control all the nodes.
Such applications often prefer large global memories that can be accessed at higher data rates.
The advantages of SMP include a large global memory and better performance per watt, which is important for SWAP (size, weight and power). SMP’s large global memory is accessible to all of the processor cores.
The disadvantages of SMP include the fact that the memory latency and bandwidth of a given node can be affected by other nodes, and cache “thrashing” may occur in some applications.
Asymmetric multiprocessing (AMP)

What is AMP?
Application tasks are sent to the system’s separate processors.
Each processor is essentially a separate computing system with its own OS and memory partition within the common global memory.
One advantage of an AMP design is that asymmetric memory partitions can be assigned from one large global memory, making more efficient use of memory resources.
Independent copies of the OS can run on each node.
It offers superior node-to-node communication compared to other architectures.
On the downside, memory latency and bandwidth can be affected by other nodes, and cache “thrashing” may occur in some applications.
Distributed processing (DP)

Distributed processing is based on independent nodes.
Each node has its own processor and memory, and the nodes communicate over buses.
Separate copies of the operating system run on each of the nodes.
Advantages of a DP approach include predictable performance and higher memory bandwidth, since memory is not shared.
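
The "no shared memory, communicate over a bus" idea can be sketched in a few lines. This is only a model under stated assumptions: Python threads stand in for independent nodes, `queue.Queue` objects stand in for the buses, and the names `node`, `bus_to_a`, `bus_to_b` and `results` are hypothetical. A real DP system would run separate processors, each with its own OS copy.

```python
import queue
import threading


def node(name, inbox, outbox):
    """A node with private memory; it interacts only through messages."""
    local_memory = []  # private to this node, never shared with others
    while True:
        message = inbox.get()
        if message is None:  # sentinel value: no more work, shut down
            break
        local_memory.append(message)
    # Report a summary of the private data back over the "bus".
    outbox.put((name, sum(local_memory)))


bus_to_a, bus_to_b, results = queue.Queue(), queue.Queue(), queue.Queue()
nodes = [
    threading.Thread(target=node, args=("A", bus_to_a, results)),
    threading.Thread(target=node, args=("B", bus_to_b, results)),
]
for t in nodes:
    t.start()

# Distribute work: even values go to node A, odd values to node B.
for value in range(10):
    (bus_to_a if value % 2 == 0 else bus_to_b).put(value)
bus_to_a.put(None)
bus_to_b.put(None)

for t in nodes:
    t.join()

totals = dict(results.get() for _ in nodes)
print(totals)
```

Because each node touches only its own `local_memory`, no locks are needed. The cost is that every piece of data has to be sent explicitly.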
Advantages and disadvantages
Software
Few Facts

The Software Becomes the Problem

Parallelism is required to gain performance.
– Parallel hardware is “easy” to design.
– Parallel software is (very) hard to write.
True concurrency is fundamentally hard to grasp.
– Especially in complex software environments.
Existing software assumes a single processor.
– Might break in new and interesting ways.
– Multitasking software is not guaranteed to run correctly on a multiprocessor.
Coding Approach

Many different programming languages, tools, methodologies and styles are available.
The choice of programming model can have a huge impact on performance and ease of programming.

Fine-grained parallelism

Can be done incrementally (one loop at a time).
Does not require deep knowledge of the code.
Compiler-assisted is best.
Tedious if done by hand.
Large loops have to be parallelized for speedup to occur.
Potentially many synchronization points.


Coarse-grained programming

Coarse-grained parallelism (task level).
Make loops parallel at a higher level of the tree.
More code is parallel at once.
Fewer synchronization points.
Requires deeper knowledge of the code.
May lead to load imbalance.
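
The difference between the two granularities can be illustrated with a short sketch. It uses Python's standard `concurrent.futures.ThreadPoolExecutor`, and the functions `pixel_filter` and `process_frame` and the tiny "frames" are hypothetical stand-ins. In CPython, threads show the structure of the two approaches rather than a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def pixel_filter(value):
    """Stand-in for the per-element work inside an inner loop."""
    return value * value


def process_frame(frame):
    """Stand-in for a whole task, for example filtering one image frame."""
    return sum(pixel_filter(v) for v in frame)


# Three tiny "frames" of four values each.
frames = [list(range(n, n + 4)) for n in range(0, 12, 4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Fine-grained: parallelize the inner loop. One work item per element,
    # and results are gathered (a synchronization point) once per frame.
    fine = [sum(pool.map(pixel_filter, frame)) for frame in frames]

    # Coarse-grained: parallelize the outer loop. One work item per frame,
    # with a single gather point for the whole job.
    coarse = list(pool.map(process_frame, frames))

print(fine)
print(coarse)
```

Both versions compute the same result. The fine-grained one only needed a change to a single loop, while the coarse-grained one has fewer gather points but would suffer if one frame were much larger than the others.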


Software Back locks
Disabling Interrupts is not Locking
Single processor: disabling interrupts (DI) = cannot be interrupted
– Guaranteed exclusive access to whole machine
– Cheap mechanism, used in many drivers & kernels
Multiprocessor: DI = stop interrupts on one core
– Other cores keep running
– Shared data can be modified from the outside
Race Condition
Tasks “race” to a common point
– Result depends on who gets there first
– Occurs due to insufficient synchronization
Present with regular multitasking, but much more severe in multiprocessing
Solution: protect all shared data with locks, and synchronize to ensure the expected order of events
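
A minimal sketch of "protect all shared data with locks", using Python's standard `threading.Lock`. The `Counter` class and `worker` function are hypothetical names for illustration. The increment is a read-modify-write sequence, which is where an unprotected update could be lost when two threads interleave.

```python
import threading


class Counter:
    """Shared data guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Critical section: without the lock, two threads could read the
        # same old value and one of the two updates would be lost.
        with self._lock:
            self.value += 1


def worker(counter, iterations):
    for _ in range(iterations):
        counter.increment()


counter = Counter()
threads = [
    threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000: every increment is accounted for
```

The lock makes the result independent of which thread "gets there first".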
Debugging process

Debuggers are the tools that give visibility into the inner workings of an application.
Debugging a multicore target throws a whole new wrench into the works.
How does one control each core: with one debugger connected to all the cores, or with a separate debugger for each core?
JTAG-based connection devices allow “on-chip debugging.” They allow the IDE to interact with the target and provide services such as remotely starting, stopping or suspending program execution (setting a breakpoint), and allow one to view memory and register contents as well as I/O and peripheral devices.
Multiple debuggers can each be assigned to individual cores and can send debug service control packets to their assigned core without impacting the other cores.
JTAG is a communication mechanism used to control an embedded processor. It does not directly have anything to do with debugging. On the cores themselves there must be debug logic that controls the core.
The “Nexus 5001 Forum” is an industry group that has advanced a new IEEE standard (IEEE-ISTO 5001) that defines just such a debug logic block to support embedded development.
Features for industries

From medical imaging to military and aerospace, there is a multi-core CPU-based single-board computer (SBC) that can provide the system developer with increased processing power.
In automotive applications, an important benefit of this multi-core approach is that it allows redundancy in critical applications. For example, safety monitors can readily be established, in which one core monitors the other.
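
One common way to let one core monitor another is a heartbeat with a deadline. The sketch below is a hypothetical illustration, not a scheme described in the slides: the `HeartbeatMonitor` class, its method names and the 0.5 second timeout are all assumptions, and timestamps are passed in explicitly so the example stays deterministic.

```python
class HeartbeatMonitor:
    """Hypothetical safety monitor: flags a peer whose heartbeat is overdue."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_beat = None

    def beat(self, now):
        """Record a check-in from the monitored core at time `now`."""
        self.last_beat = now

    def peer_alive(self, now):
        """Return False if the peer never checked in or missed its deadline."""
        if self.last_beat is None:
            return False
        return (now - self.last_beat) <= self.timeout_s


monitor = HeartbeatMonitor(timeout_s=0.5)
monitor.beat(now=10.0)  # monitored core checks in at t = 10.0 s

print(monitor.peer_alive(now=10.2))  # still within the timeout
print(monitor.peer_alive(now=11.0))  # heartbeat is overdue
```

In a real system the monitoring core would react to an overdue heartbeat, for example by resetting the peer or switching to a safe state. That policy is application-specific and is left out here.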
Conclusion
Hardware is leading the move
– Parallelism is a major paradigm shift for software
– Software and software tools are racing to catch up
– Education and training need to be updated
– Programmers need to relearn programming
To manage the software, we need:
– New programming paradigms
– New or modified programming languages
– New debug and analysis techniques
Finally, we are going to see many interesting hardware-software combinations.
