
IFETCE/ME/CSE/B.V.R.Raju/Iyear/Isem/CP7103/MCA/Unit-1/PPt/Ver1.

Classes of computers
Based on their physical size, performance, and
application areas, computers are divided into four categories:

Classification of computers

Micro: Desktop, Laptop, Hand-held
Mini
Mainframe
Super


Micro Computers
Small, low-cost digital computer, which usually consists of a
microprocessor, a storage unit, an input channel, and an
output channel, all of which may be on one chip inserted into
one or several PC boards.
Adding a power supply, connecting cables, peripherals, an OS, and
software programs provides a complete microcomputer
system.
Smallest of the computer family.
They were originally designed for individual users only, but
nowadays they have become powerful tools for many
businesses and, when networked together, can serve more
than one user.
e.g., IBM-PC, Pentium 100, Pentium 200, Apple Macintosh

Desktop Computer:
PC (personal computer) intended for
stand-alone use by an individual.
Consists of a system unit, a display monitor, a keyboard,
internal hard disk storage,
and other peripheral devices.
Not very expensive.
Manufacturers: Apple, IBM, Dell, and Hewlett-Packard.


Laptop:
Portable computer. It resembles a
notebook and is also known as a notebook computer.
Has the features of a normal desktop.
Advantage: it can be used anywhere, at any
time (for example, while traveling).
No need for an external power supply while in use; a
rechargeable battery is enough.
Expensive when compared to desktop
computers.


Hand held Computers:


PDA (Personal Digital Assistant): can
conveniently be stored in a pocket and used
while the user is holding it.
Slightly bigger than a calculator.
Uses a pen or electronic stylus as the input device.
Also called palmtop computers.
No disk drive; instead they have small
cards to store programs and data.
They can be connected to a printer to
get output.
Limited memory, and not as powerful as
desktop computers.


Mini Computers
Small digital computer whose
processing and storage capacity is less than that of a
mainframe but greater than that of a microcomputer.
Speed lies between that of a mainframe and
a microcomputer.
Size: about that of a two-drawer filing cabinet.
Also called a mid-range computer.
Supports 4 to about 200 simultaneous users.
Multi-user system for real-time
control and engineering work.

Mainframe Computer
Large, expensive computer that supports simultaneous data processing (dp)
for hundreds or thousands of users.
Used to store, manage, and process large
amounts of data that need to be reliable,
secure, and centralized.
Supports large volumes of dp, high-performance
online transaction
processing systems, and extensive data
storage and retrieval.
Mainframes are the second largest of the computer family.

Super Computers
Special-purpose machines, specially
designed to maximise the number of
FLOPS (Floating Point Operations Per
Second): more than 1 gigaflop.
Highest processing speed, used to solve
engineering and scientific problems.
Contain a number of CPUs that operate in parallel.
Speed: 400-10000 MFLOPS.
Resolve complex mathematical equations in
a few hours.
Fastest, costliest, and most powerful
computers.
Application areas: aerodynamics, meteorology, plasma
physics, military strategy, and cinematics.

Why multi-core?
Difficult to make single-core clock frequencies even higher
Deeply pipelined circuits:
heat problems
clock problems
efficiency (stall) problems
Doubling issue rates above today's 3-6 instructions per clock, say
to 6 to 12 instructions, is extremely difficult. The processor would have to:
issue 3 or 4 data memory accesses per cycle,
rename and access more than 20 registers per cycle, and
fetch 12 to 24 instructions per cycle.
Many new applications are multithreaded
General trend in computer architecture (shift towards more
parallelism)

Multi-Core Processor is a Special Kind of a Multiprocessor


Multi-Core Architectures
Replicate multiple processor cores on a single die.


Multi-Core CPU Chip


The cores fit on a single processor socket
Also called CMP (Chip Multi-Processor)


Trends in Technology

          Capacity          Speed (latency)
Logic:    2x in 3 years     2x in 3 years
DRAM:     4x in 3 years     2x in 10 years
Disk:     4x in 3 years     2x in 10 years

DRAM Generations

Year      Size          Cycle Time
1980      64 Kb         250 ns
1983      256 Kb        220 ns
1986      1 Mb          190 ns
1989      4 Mb          165 ns
1992      16 Mb         120 ns
1996      64 Mb         110 ns
1998      128 Mb        100 ns
2000      256 Mb        90 ns
2002      512 Mb        80 ns
2006      1024 Mb       60 ns
Ratio     16000:1       4:1
          (Capacity)    (Latency)
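
The two summary ratios under the DRAM Generations table can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes 1 Mb = 1024 Kb, and the names DRAM_GENERATIONS and improvement_ratios are made up for this example.

```python
# (year, capacity in kilobits, cycle time in ns), copied from the DRAM table.
# Assumption: 1 Mb is treated as 1024 Kb.
DRAM_GENERATIONS = [
    (1980, 64, 250),
    (1983, 256, 220),
    (1986, 1 * 1024, 190),
    (1989, 4 * 1024, 165),
    (1992, 16 * 1024, 120),
    (1996, 64 * 1024, 110),
    (1998, 128 * 1024, 100),
    (2000, 256 * 1024, 90),
    (2002, 512 * 1024, 80),
    (2006, 1024 * 1024, 60),
]


def improvement_ratios(generations):
    """Return (capacity ratio, latency ratio) between the last and first rows."""
    _, first_kb, first_ns = generations[0]
    _, last_kb, last_ns = generations[-1]
    capacity_ratio = last_kb / first_kb   # how much larger the chip became
    latency_ratio = first_ns / last_ns    # how much shorter the cycle became
    return capacity_ratio, latency_ratio


capacity_ratio, latency_ratio = improvement_ratios(DRAM_GENERATIONS)
print(f"capacity: {capacity_ratio:.0f}:1")
print(f"latency:  {latency_ratio:.1f}:1")
```

The exact capacity ratio is 16384:1, which the slide rounds to 16000:1, while the cycle time improved only by a factor of about 4 over the same period.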


Power, Energy & Cost

A multi-core processor uses less power than an equivalent set of separate single-core processors.

Cache coherency circuitry can operate at a much higher clock rate
than is possible if the signals have to travel off-chip.
Because signals between different CPUs travel shorter distances, those
signals degrade less.
These higher-quality signals allow more data to be sent in a given
time period, since individual signals can be shorter and do not need
to be repeated as often.
The ability of multi-core processors to increase application performance
depends on the use of multiple threads within applications.
Most current video games will run faster on a 3 GHz single-core
processor than on a 2 GHz dual-core processor (of the same core
architecture).
Two processing cores sharing the same system bus and memory
bandwidth limit the real-world performance advantage.
If a single core is close to being memory-bandwidth limited, going
to multi-core might give only a 30% to 70% improvement.
If memory bandwidth is not a problem, a 90% improvement can be
expected.
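
The 3 GHz single-core versus 2 GHz dual-core comparison can be made concrete with a rough model. The sketch below assumes an Amdahl-style split of the work into a serial part and a perfectly parallel part, and it ignores the shared-bus and memory-bandwidth limits mentioned above. The function names and the parallel fractions tried are illustrative assumptions, not measurements.

```python
def run_time(work, clock_ghz, cores, parallel_fraction):
    """Relative run time under a simple Amdahl-style model (an assumption)."""
    serial_part = (1.0 - parallel_fraction) * work
    parallel_part = parallel_fraction * work / cores
    return (serial_part + parallel_part) / clock_ghz


def faster_chip(parallel_fraction, work=1.0):
    """Compare a 3 GHz single-core chip with a 2 GHz dual-core chip."""
    single = run_time(work, clock_ghz=3.0, cores=1,
                      parallel_fraction=parallel_fraction)
    dual = run_time(work, clock_ghz=2.0, cores=2,
                    parallel_fraction=parallel_fraction)
    return "single-core 3 GHz" if single < dual else "dual-core 2 GHz"


for fraction in (0.0, 0.5, 0.9):
    print(fraction, faster_chip(fraction))
```

Under this model the dual-core chip only wins once more than two thirds of the work runs in parallel, which is consistent with mostly single-threaded games running faster on the higher-clocked single core.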

Dependability

Applications that benefit from multi-core processors:
Database servers
Web servers (Web commerce)
Compilers
Multimedia applications
Scientific applications, CAD/CAM
In general, applications with thread-level parallelism
(as opposed to instruction-level parallelism)

Measuring, Reporting and Summarizing Performance


All computers are now parallel computers!

Multi-core processors represent an important new trend in computer
architecture.
Decreased power consumption and heat generation.
Minimized wire lengths and interconnect latencies.

They enable true thread-level parallelism with great energy efficiency
and scalability.
To utilize their full potential, applications will need to move from a
single-threaded to a multi-threaded model.
Parallel programming techniques are likely to gain importance.

The difficult problem is not building multi-core hardware, but
programming it in a way that lets mainstream applications benefit from
the continued exponential growth in CPU performance.

The software industry needs to get back into the state where existing
applications run faster on new hardware.

Processor-DRAM Performance
To illustrate the performance impact, assume a single-issue
pipelined CPU with CPI = 1 using non-ideal memory.
The minimum cost of a full memory access in terms of
number of wasted CPU cycles:

Year    CPU speed   CPU cycle   Memory access   Minimum CPU cycles or
        (MHz)       (ns)        (ns)            instructions wasted
1986    8           125         190             190/125 - 1  = 0.5
1989    33          30          165             165/30 - 1   = 4.5
1992    60          16.6        120             120/16.6 - 1 = 6.2
1996    200         5           110             110/5 - 1    = 21
1998    300         3.33        100             100/3.33 - 1 = 29
2000    1000        1           90              90/1 - 1     = 89
2003    2000        0.5         80              80/0.5 - 1   = 159
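
The last column of the Processor-DRAM table follows from dividing the memory access time by the CPU cycle time and subtracting the one cycle that would have been spent anyway. A minimal sketch of that calculation, with the rows copied from the table and the names wasted_cycles and ROWS chosen only for this example:

```python
def wasted_cycles(memory_access_ns, cpu_cycle_ns):
    """Minimum CPU cycles lost while one full memory access completes."""
    return memory_access_ns / cpu_cycle_ns - 1


# (year, CPU cycle time in ns, memory access time in ns), from the table.
ROWS = [
    (1986, 125, 190),
    (1989, 30, 165),
    (1992, 16.6, 120),
    (1996, 5, 110),
    (1998, 3.33, 100),
    (2000, 1, 90),
    (2003, 0.5, 80),
]

for year, cycle_ns, access_ns in ROWS:
    print(year, round(wasted_cycles(access_ns, cycle_ns), 1))
```

The memory access time falls by a little more than half across these rows, while the CPU cycle time falls by a factor of 250, so the number of wasted cycles per access grows from under 1 to 159.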

Main Memory

Main memory generally uses DRAM (Dynamic Random Access Memory),
which uses a single transistor to store a bit, but requires a periodic data
refresh (~every 8 msec).
Cache uses SRAM (Static Random Access Memory):
No refresh (6 transistors/bit vs. 1 transistor/bit for DRAM)
Size ratio, DRAM/SRAM: 4-8
Cost and cycle time ratio, SRAM/DRAM: 8-16
Main memory performance:
Memory latency:
Access time: The time between a memory access
request and the moment the requested information is available to
cache/CPU.
Cycle time: The minimum time between requests to memory
(greater than access time in DRAM to allow address lines to be
stable)
Memory bandwidth: The maximum sustained data transfer rate
between main memory and cache/CPU.
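
Cycle time and bandwidth are linked: if memory accepts one request per cycle time, an upper bound on the transfer rate is the bytes moved per request divided by the cycle time. The sketch below is a back-of-the-envelope illustration only. The 8-byte transfer width is an assumed value, the 60 ns cycle time is taken from the DRAM Generations table, and the function name is hypothetical.

```python
def peak_bandwidth_bytes_per_s(bytes_per_request, cycle_time_ns):
    """Upper bound on transfer rate with one request per memory cycle."""
    return bytes_per_request / (cycle_time_ns * 1e-9)


# Assumption: each request moves 8 bytes; 60 ns is the 2006 DRAM cycle time.
bandwidth = peak_bandwidth_bytes_per_s(bytes_per_request=8, cycle_time_ns=60)
print(f"{bandwidth / 1e6:.1f} MB/s")
```

Real memory systems raise this bound with wider buses, burst transfers, and interleaved banks, none of which this simple estimate models.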

Quantitative Principles of Computer Design


Exploits smaller feature size and increased density
Increases functional units per chip (spatial efficiency)
Limits energy consumption per operation
Constrains growth in processor complexity


The Cores Run in Parallel


Classes of Parallelism
Concurrent events: multiprogramming,
multiprocessing, or multicomputing.
Parallelism: pipelining, vectorization,
concurrency, simultaneity, data parallelism,
partitioning, interleaving, overlapping,
multiplicity, replication, time sharing, space
sharing, multitasking, multiprogramming,
multithreading, and distributed computing at
different processing levels.

Classes of Parallelism

Parallelism in Hardware (Uniprocessor)
  Pipelining, Superscalar, VLIW etc.
Parallelism in Hardware (SIMD, Vector processors, GPUs)
Parallelism in Hardware (Multiprocessor)
  Shared-memory multiprocessors
  Distributed-memory multiprocessors
  Chip-multiprocessors, Multi-cores
Parallelism in Hardware (Multicomputer, clusters)
Parallelism in Software
  Task parallelism
  Data parallelism

ILP, DLP, TLP and RLP

1 ILP: Instruction Level Parallelism
2 LLP: Loop Level Parallelism
3 DLP: Data Level Parallelism
4 TLP: Thread Level Parallelism(S)
5 RLP: Request Level Parallelism


Loop Level Parallelism

Instructions: iterations in a loop

Accomplished by unrolling the loop
H/W: dynamic, if the block size is increased
S/W: static, by the compiler

Loop unrolling is achieved by replicating the
loop body multiple times and adjusting the loop
termination code (creating multiple copies of
the loop body).
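
The unrolling described above can be sketched by hand. The example below unrolls a summation loop by a factor of 4: the body is replicated four times, the termination test is adjusted to cover only full groups of four, and a short cleanup loop handles the leftovers. The unroll factor and function names are arbitrary choices for illustration. In practice a compiler or the hardware applies this transformation, and hand-unrolling in an interpreted language mainly shows the shape of the result rather than a speedup.

```python
def sum_rolled(values):
    """Original loop: one element per iteration."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Same loop with the body replicated four times per iteration."""
    total = 0
    n = len(values)
    limit = n - n % 4  # adjusted termination: only full groups of four
    i = 0
    while i < limit:
        # Four independent copies of the loop body.
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    while i < n:
        # Cleanup loop for the remaining 0 to 3 elements.
        total += values[i]
        i += 1
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))
```

The unrolled version executes the loop test and index update once per four elements instead of once per element, and the four copies of the body give the scheduler more independent instructions to overlap.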

Thread Level Parallelism


A thread is a short sequence of instructions schedulable as a
unit by a processor.
This is parallelism on a coarser scale
Server can serve each client in a separate thread (Web server,
database server)
A computer game can do AI, graphics, and physics in three
separate threads
Single-core superscalar processors cannot fully exploit TLP
Multi-core architectures are the next step in processor
evolution: explicitly exploiting TLP
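
The server example above, one thread per client, can be sketched with the standard threading module. The handler handle_client and the client ids are hypothetical stand-ins for real request processing. Note also that standard CPython builds use a global interpreter lock, so this sketch shows the thread-per-client structure rather than a CPU-bound speedup.

```python
import threading


def handle_client(client_id, responses):
    # Hypothetical handler: stands in for parsing and answering one request.
    responses[client_id] = f"response for client {client_id}"


def serve(client_ids):
    """Serve every client in its own thread and collect the responses."""
    responses = {}
    threads = [
        threading.Thread(target=handle_client, args=(client_id, responses))
        for client_id in client_ids
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every client has been answered
    return responses


print(serve([1, 2, 3]))
```

On a multi-core processor the operating system is free to schedule such threads on different cores, which is the thread-level parallelism that a single superscalar core cannot fully exploit.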

Inter-Core Bus


Multithreading
Permits multiple independent threads to execute SIMULTANEOUSLY on the SAME core
Weaving together multiple threads on the same core
Example: if one thread is waiting for a floating point operation to complete,
another thread can use the integer units

SMT Architecture


Multi-Core vs SMT
Advantages/disadvantages?
Multi-core:
Since there are several cores, each is smaller and not as powerful,
but easier to design and manufacture

Great with thread-level parallelism

SMT
Can have one large and fast superscalar core
Great performance on a single thread
Mostly still only exploits instruction-level parallelism

CMP Architecture


Limitations of Single core processors


Smarter Brain
(e.g. x386 -> x486 -> Pentium -> P2 -> P3 -> P4)
Larger Memory
Larger caches, DRAM, Disk
Smaller Head
Fewer chips (integrate more things onto a chip)
More Power Consumption
few Watts -> 120+ Watts!
More Complex
1 billion transistors; design + verification complexity

Multi-Core System
In multi-core systems, the term multi-CPU refers to
multiple physically separate processing-units
(which often contain special circuitry to facilitate
communication between each other).
The terms many-core and massively multi-core are
sometimes used to describe multi-core architectures
with an especially high number of cores (tens or
hundreds).


A multi-core processor is a single computing
component with 2 or more independent actual
CPUs (called "cores"), which are the units that
read and execute program instructions. The
instructions are ordinary CPU instructions such
as add, move data, and branch, but the multiple
cores can run multiple instructions at the same
time, increasing overall speed for programs
amenable to parallel computing. Manufacturers
typically integrate the cores onto a single
integrated circuit die (known as a chip
multiprocessor or CMP), or onto multiple dies in
a single chip package.

Number of cores    Common names

1                  single-core
2                  dual-core
3                  tri-core / triple-core
4                  quad-core
5                  penta-core
6                  hexa-core
7                  hepta-core
8                  octa-core / octo-core
9                  nona-core
10                 deca-core
11                 hendeca-core
12                 dodeca-core
13                 trideca-core
14                 tetradeca-core
15                 pentadeca-core
16                 hexadeca-core
17                 heptadeca-core
18                 octadeca-core
19                 enneadeca-core
20                 icosa-core

Multicore processors share the cache and MMU with short interconnects


What Is Multicore Processing?

Multicore (or multi-core) processing uses software
designed to run as multiple applications, processed in parallel
or asynchronously, on a multicore
processor. The efficiency of multicore processing
depends on how well the software application is
optimized to take advantage of the multiple cores,
the composition of those multicore processors, and
the speed of the external interfaces and related
hardware components in a system. Multicore
processing can be especially useful for low-latency
applications, with the largest boost in performance
likely to be noticed in improved response time while
running CPU-intensive processes.

The Multi core era

A multi-core processor is a single computing
component with two or more independent actual
central processing units (called "cores"), which
are the units that read and execute program
instructions.[1] The instructions are ordinary CPU
instructions such as add, move data, and branch,
but the multiple cores can run multiple
instructions at the same time, increasing overall
speed for programs amenable to parallel
computing.[2] Manufacturers typically integrate
the cores onto a single integrated circuit die
(known as a chip multiprocessor or CMP), or onto
multiple dies in a single chip package.

Case studies of Multi core Architectures
