
High Performance Cluster Computing

(Architecture, Systems, and Applications)

ISCA
2000

Rajkumar Buyya, Monash University, Melbourne.


Email: rajkumar@csse.monash.edu.au / rajkumar@buyya.com
Web: http://www.ccse.monash.edu.au/~rajkumar / www.buyya.com
1
Objectives

❃ Learn and Share Recent advances in cluster
  computing (both in research and commercial
  settings):
  – Architecture
  – System Software
  – Programming Environments and Tools
  – Applications
❃ Cluster Computing Infoware (tutorial online):
  – http://www.buyya.com/cluster/

2
Agenda

☞ Overview of Computing
☞ Motivations & Enabling Technologies
☞ Cluster Architecture & its Components
☞ Clusters Classifications
☞ Cluster Middleware
☞ Single System Image
☞ Representative Cluster Systems
☞ Resources and Conclusions

3
Computing Elements

[Figure: layered view of computing elements - Applications run over Programming Paradigms and a Threads Interface on a Microkernel Operating System, which runs on a Multi-Processor Computing System (hardware: P = processor, hosting processes and threads).]
4
Two Eras of Computing

[Figure: two eras of computing - a Sequential Era and a Parallel Era, each evolving through Architectures, System Software, Applications, and Problem Solving Environments (P.S.Es), shown on a timeline from 1940 to about 2030, moving from R&D to commercialization to commodity.]
5
Computing Power and
Computer Architectures

6
Computing Power (HPC)
Drivers

Solving grand challenge applications using
computer modeling, simulation and analysis:

Life Sciences, Aerospace, CAD/CAM, Digital Biology,
Military Applications, E-commerce/anything

7
How to Run App. Faster?
❃ There are 3 ways to improve performance:
– 1. Work Harder
– 2. Work Smarter
– 3. Get Help
❃ Computer Analogy
– 1. Use faster hardware: e.g. reduce the time per
instruction (clock cycle).
– 2. Optimized algorithms and techniques
– 3. Multiple computers to solve problem: That is,
increase no. of instructions executed per clock cycle.

8
Computing Platforms Evolution
Breaking Administrative Barriers

[Figure: performance grows as administrative barriers are broken - computing platforms scale from the Individual to Group, Department, Campus, State, National, Globe, Inter-Planet, and Universe levels.]
9
Application Case Study

“Web Serving and E-Commerce”

10
E-Commerce and PDC ?

❃ What are/will be the major problems/issues in


eCommerce? How will or can PDC be applied
to solve some of them?
❃ Other than “Compute Power”, what else can
PDC contribute to e-commerce?
❃ How would/could the different forms of PDC
(clusters, hypercluster, GRID,…) be applied to
e-commerce?
❃ Could you describe one hot research topic for
PDC applying to e-commerce?
❃ A killer e-commerce application for PDC ?
❃ …...
11
Killer Applications of Clusters

❃ Numerous Scientific & Engineering Apps.


❃ Parametric Simulations
❃ Business Applications
– E-commerce Applications (Amazon.com, eBay.com ….)
– Database Applications (Oracle on cluster)
– Decision Support Systems
❃ Internet Applications
– Web serving / searching
– Infowares (yahoo.com, AOL.com)
– ASPs (application service providers)
– eMail, eChat, ePhone, eBook, eCommerce, eBank, eSociety, eAnything!
– Computing Portals
❃ Mission Critical Applications
– command control systems, banks, nuclear reactor control, star-war, and handling life threatening
situations.

12
Major problems/issues in E-
commerce
 Social Issues
 Capacity Planning
❃ Multilevel Business Support (e.g., B2P2C)
❃ Information Storage, Retrieval, and Update
❃ Performance
❃ Heterogeneity
❃ System Scalability
❃ System Reliability
❃ Identification and Authentication
❃ System Expandability
❃ Security
❃ Cyber Attacks Detection and Control
(cyberguard)
❃ Data Replication, Consistency, and Caching
 Manageability (administration and control)
13
Amazon.com: Online sales/trading
killer E-commerce Portal
❃ Several Thousands of Items
– books, publishers, suppliers
❃ Millions of Customers
– Customers details, transactions details, support for transactions
update
❃ (Millions) of Partners
– Keep track of partners' details, tracking referral links to partners,
and sales and payments
❃ Sales based on advertised price
❃ Sales through auction/bids
– A mechanism for participating in the bid (buyers/sellers define
rules of the game)

14
Can these drive E-Commerce?

[Figure: a row of networked commodity computers forming a cluster.]

❃ Clusters are already in use for web serving, web-hosting, and


number of other Internet applications including E-commerce
– scalability, availability, performance, reliable-high performance-
massive storage and database support.
– Attempts to support online detection of cyber attacks (through data
mining) and control
❃ Hyperclusters and the GRID:
– Support for transparency in (secure) Site/Data Replication for high
availability and quick response time (taking site close to the user).
– Compute power from hyperclusters/Grid can be used for data mining for
cyber attacks and fraud detection and control.
– Helps to build Compute Power Market, ASPs, and Computing Portals.

15
Science Portals - e.g., PAPIA
system

Pentiums
Myrinet
NetBSD/Linux
PM
Score-D
MPC++

RWCP Japan: http://www.rwcp.or.jp/papia/ (PAPIA PC Cluster)


16
PDC hot topics for E-
commerce

❃ Cluster based web-servers, search engines, portals…


❃ Scheduling and Single System Image.
❃ Heterogeneous Computing
❃ Reliability and High Availability and Data Recovery
❃ Parallel Databases and high performance-reliable-mass storage systems.
❃ CyberGuard! Data mining for detection of cyber attacks, frauds, etc.,
and online control.
❃ Data Mining for identifying sales pattern and automatically tuning portal
to special sessions/festival sales
❃ eCash, eCheque, eBank, eSociety, eGovernment, eEntertainment,
eTravel, eGoods, and so on.
❃ Data/Site Replications and Caching techniques
❃ Compute Power Market
❃ Infowares (yahoo.com, AOL.com)
❃ ASPs (application service providers)
❃ ...

17
Sequential Architecture
Limitations

➘ Sequential architectures are reaching physical
  limitations (speed of light, thermodynamics)
➘ Hardware improvements like pipelining,
  superscalar execution, etc., are non-scalable and
  require sophisticated compiler technology.
➘ Vector processing works well for certain kinds
  of problems.

18
Computational Power Improvement

[Graph: computational power improvement (C.P.I.) vs. number of processors - a multiprocessor keeps improving as processors are added, while a uniprocessor levels off.]

19
Human Physical Growth Analogy:
Computational Power Improvement

[Graph: human growth analogy - vertical growth up to a certain age, then horizontal growth, plotted against age (5 to 45+).]

20
Why Parallel Processing
NOW?
➘ The Tech. of PP is mature and can be
exploited commercially; significant
R & D work on development of tools
& environment.
➘ Significant development in
Networking technology is paving a
way for heterogeneous computing.

21
History of Parallel
Processing

✺ PP can be traced to a tablet
  dated around 100 BC.
  ◆ The tablet has 3 calculating positions.
  ◆ We infer that multiple positions were meant for
    reliability and/or speed.

22
Motivating Factors

➟ The aggregate speed with which complex calculations
  are carried out by millions of neurons in the human
  brain is amazing, even though an individual
  neuron's response is slow (milliseconds) -
  this demonstrates the feasibility of PP.

23
Taxonomy of
Architectures

➤ Simple classification by Flynn:


(No. of instruction and data streams)
 SISD - conventional
 SIMD - data parallel, vector computing
 MISD - systolic arrays
 MIMD - very general, multiple approaches.

➤ Current focus is on MIMD model, using


general purpose processors or
multicomputers.

24
Main HPC
Architectures..1a

❃ SISD - mainframes, workstations, PCs.


❃ SIMD Shared Memory - Vector machines,
Cray...
❃ MIMD Shared Memory - Sequent, KSR,
Tera, SGI, SUN.
❃ SIMD Distributed Memory - DAP, TMC CM-
2...
❃ MIMD Distributed Memory - Cray T3D,
Intel, Transputers, TMC CM-5, plus recent
workstation clusters (IBM SP2, DEC, Sun,
HP).

25
Motivation for using
Clusters

❃ The communications bandwidth


between workstations is increasing
as new networking technologies and
protocols are implemented in LANs
and WANs.
❃ Workstation clusters are easier to
integrate into existing networks
than special parallel computers.

26
Main HPC
Architectures..1b.

❃ NOTE: Modern sequential machines are


not purely SISD - advanced RISC
processors use many concepts from
– vector and parallel architectures (pipelining,
parallel execution of instructions, prefetching of
data, etc) in order to achieve one or more
arithmetic operations per clock cycle.

27
Parallel Processing
Paradox

❃ Time required to develop a


parallel application for solving
GCA is equal to:

– Half Life of Parallel Supercomputers.

28
The Need for
Alternative
Supercomputing
Resources
❃ Vast numbers of under utilised workstations
available to use.
❃ Huge numbers of unused processor cycles and
resources that could be put to good use in a
wide variety of applications areas.
❃ Reluctance to buy Supercomputer due to their
cost and short life span.
❃ Distributed compute resources “fit” better into
today's funding model.

29
Technology Trend

30
Scalable Parallel
Computers

31
Design Space of
Competing Computer
Architecture

32
Towards Inexpensive
Supercomputing

It is:

Cluster
Computing..
The Commodity
Supercomputing!
33
Cluster Computing -
Research Projects
❃ Beowulf (CalTech and NASA) - USA
❃ CCS (Computing Centre Software) - Paderborn, Germany
❃ Condor - University of Wisconsin-Madison, USA
❃ DQS (Distributed Queuing System) - Florida State University, US.
❃ EASY - Argonne National Lab, USA
❃ HPVM (High Performance Virtual Machine) - UIUC & now UCSD, US
❃ far - University of Liverpool, UK
❃ Gardens - Queensland University of Technology, Australia
❃ MOSIX - Hebrew University of Jerusalem, Israel
❃ MPI (MPI Forum, MPICH is one of the popular implementations)
❃ NOW (Network of Workstations) - Berkeley, USA
❃ NIMROD - Monash University, Australia
❃ NetSolve - University of Tennessee, USA
❃ PBS (Portable Batch System) - NASA Ames and LLNL, USA
❃ PVM - Oak Ridge National Lab./UTK/Emory, USA

34
Cluster Computing -
Commercial Software

❃ Codine (Computing in Distributed Network Environment) - GENIAS


GmbH, Germany
❃ LoadLeveler - IBM Corp., USA
❃ LSF (Load Sharing Facility) - Platform Computing, Canada
❃ NQE (Network Queuing Environment) - Craysoft Corp., USA
❃ OpenFrame - Centre for Development of Advanced Computing, India
❃ RWPC (Real World Computing Partnership), Japan
❃ UnixWare (SCO - Santa Cruz Operation), USA
❃ Solaris-MC (Sun Microsystems), USA
❃ ClusterTools (a number of free HPC cluster tools from Sun)
❃ A number of commercial vendors worldwide are offering clustering
solutions, including IBM, Compaq, Microsoft, and a number of startups like
TurboLinux, HPTI, Scali, BlackStone…

35
Motivation for using
Clusters

❃ Surveys show utilisation of CPU cycles of


desktop workstations is typically <10%.
❃ Performance of workstations and PCs is
rapidly improving
❃ As performance grows, percent utilisation
will decrease even further!
❃ Organisations are reluctant to buy large
supercomputers, due to the large expense
and short useful life span.

36
Motivation for using
Clusters
❃ The development tools for workstations are
more mature than the contrasting proprietary
solutions for parallel computers - mainly due
to the non-standard nature of many parallel
systems.
❃ Workstation clusters are a cheap and readily
available alternative to specialised High
Performance Computing (HPC) platforms.
❃ Use of clusters of workstations as a
distributed compute resource is very cost
effective - incremental growth of system!!!

37
Cycle Stealing

❃ Usually a workstation will be owned


by an individual, group, department,
or organisation - they are dedicated
to the exclusive use by the owners.
❃ This brings problems when
attempting to form a cluster of
workstations for running distributed
applications.

38
Cycle Stealing

❃ Typically, there are three types of


owners, who use their workstations
mostly for:
1. Sending and receiving email and
preparing documents.
2. Software development - edit,
compile, debug and test cycle.
3. Running compute-intensive
applications.

39
Cycle Stealing

❃ Cluster computing aims to steal spare cycles from


(1) and (2) to provide resources for (3).
❃ However, this requires overcoming the ownership
hurdle - people are very protective of their
workstations.
❃ Usually requires organisational mandate that
computers are to be used in this way.
❃ Stealing cycles outside standard work hours (e.g.
overnight) is easy, stealing idle cycles during
work hours without impacting interactive use
(both CPU and memory) is much harder.

40
Rise & Fall of
Computing
Technologies

[Figure: rise and fall of computing technologies - waves of Mainframes, Minis, PCs, and Network Computing across 1970, 1980, and 1995.]

41
Original Food Chain Picture

42
1984 Computer Food Chain

Mainframe
PC
Workstation
Mini Computer

Vector Supercomputer

43
1994 Computer Food Chain

[Figure: 1994 computer food chain - PC, Workstation, Mini Computer (hitting wall soon), Mainframe (future is bleak), Vector Supercomputer, MPP.]

44
Computer Food Chain (Now and Future)

45
What is a cluster?

❃ A cluster is a type of parallel or distributed
  processing system which consists of a collection of
  interconnected stand-alone/complete computers
  cooperatively working together as a single, integrated
  computing resource.
❃ A typical cluster:

– Network: Faster, closer connection than a typical network (LAN)


– Low latency communication protocols
– Looser connection than SMP

46
Why Clusters now?
(Beyond Technology and
Cost)
❃ Building block is big enough
– complete computers (HW & SW) shipped in millions: killer
micro, killer RAM, killer disks,
killer OS, killer networks, killer apps.
❃ Workstations performance is doubling every 18
months.
❃ Networks are faster
❃ Higher link bandwidth (vs. 10 Mbit Ethernet)
❃ Switch based networks coming (ATM)
❃ Interfaces simple & fast (Active Msgs)
❃ Striped files preferred (RAID)
❃ Demise of Mainframes, Supercomputers, & MPPs

47
Architectural Drivers…
(cont)
❃ Node architecture dominates performance
– processor, cache, bus, and memory
– design and engineering $ => performance
❃ Greatest demand for performance is on large systems
– must track the leading edge of technology without lag
❃ MPP network technology => mainstream
– system area networks
❃ System on every node is a powerful enabler
– very high speed I/O, virtual memory, scheduling, …

48
...Architectural Drivers

❃ Clusters can be grown: Incremental scalability (up, down,


and across)
– Individual nodes performance can be improved by adding additional
resource (new memory blocks/disks)
– New nodes can be added or nodes can be removed
– Clusters of Clusters and Metacomputing
❃ Complete software tools
– Threads, PVM, MPI, DSM, C, C++, Java, Parallel C++, Compilers,
Debuggers, OS, etc.
❃ Wide class of applications
– Sequential and grand challenging parallel applications

49
Clustering of Computers
for Collective Computing:
Trends
[Figure: timeline of clustering computers for collective computing - 1960, 1990, 1995+, 2000 and beyond.]


Example Clusters:
Berkeley NOW
❃ 100 Sun UltraSparcs
  – 200 disks
❃ Myrinet SAN
  – 160 MB/s
❃ Fast comm.
  – AM, MPI, ...
❃ Ether/ATM switched external net
❃ Global OS
❃ Self Config

51
Basic Components

[Figure: a basic NOW node - a Sun Ultra 170 workstation (processor and memory) with a Myricom NIC on the I/O bus, attached to a 160 MB/s Myrinet.]

52
Massive Cheap Storage
Cluster
❃ Basic unit:
2 PCs double-
ending four SCSI
chains of 8 disks
each

Currently serving Fine Art at http://www.thinker.org/imagebase/

53
Cluster of SMPs
(CLUMPS)

❃ Four Sun E5000s


– 8 processors
– 4 Myricom NICs each
❃ Multiprocessor,
Multi-NIC, Multi-
Protocol

❃ NPACI => Sun 450s

54
Millennium PC Clumps

❃ Inexpensive,
easy to manage
Cluster
❃ Replicated in
many
departments
❃ Prototype for
very large PC
cluster

55
Adoption of the
Approach

56
So What’s So Different?

❃ Commodity parts?
❃ Communications Packaging?
❃ Incremental Scalability?
❃ Independent Failure?
❃ Intelligent Network Interfaces?
❃ Complete System on every node
– virtual memory
– scheduler
– files
– ...

57
OPPORTUNITIES
&
CHALLENGES

58
Opportunity of Large-scale
Computing on NOW

[Figure: a shared pool of computing resources (processors, memory, disks) connected by an interconnect - the pool can guarantee at least one workstation to many individuals (when active) while delivering a large percentage of the collective resources to a few individuals at any one time.]
59
Windows of
Opportunities
❃ MPP/DSM:
– Compute across multiple systems: parallel.
❃ Network RAM:
  – Use idle memory in other nodes: page across other
    nodes' idle memory.
❃ Software RAID:
  – file system supporting parallel I/O and reliability,
    mass storage.
❃ Multi-path Communication:
– Communicate across multiple networks: Ethernet, ATM,
Myrinet

60
Parallel Processing

❃ Scalable Parallel Applications require
  – good floating-point performance
  – low overhead communication
  – scalable network bandwidth
  – parallel file system

61
Network RAM

❃ Performance gap between processor and


disk has widened.

❃ Thrashing to disk degrades performance


significantly

❃ Paging across networks can be effective


with high performance networks and OS
that recognizes idle machines

❃ Typically thrashing to network RAM can be


5 to 10 times faster than thrashing to disk

62
Software RAID:
Redundant Array of
Workstation Disks
❃ I/O Bottleneck:
  – Microprocessor performance is improving by more than 50% per year.
  – Disk access improvement is < 10% per year.
  – Applications often perform I/O.
❃ RAID cost per byte is high compared to single disks
❃ RAIDs are connected to host computers, which are often a
  performance and availability bottleneck
❃ RAID in software, writing data across an array of
  workstation disks, provides performance, and some degree of
  redundancy provides availability.

63
Software RAID, Parallel
File Systems, and
Parallel I/O

64
Cluster Computer and
its Components

65
Clustering Today

❃ Clustering gained momentum when 3


technologies converged:
– 1. Very HP Microprocessors
• workstation performance = yesterday's supercomputers

– 2. High speed communication


• Comm. between cluster nodes >= between processors in an SMP.

– 3. Standard tools for parallel/ distributed computing


& their growing popularity.

66
Cluster Computer
Architecture

67
Cluster
Components...1a
Nodes
❃ Multiple High Performance Components:
– PCs
– Workstations
– SMPs (CLUMPS)
– Distributed HPC Systems leading to Metacomputing
❃ They can be based on different
architectures and run different OSs

68
Cluster
Components...1b
Processors
❃ There are many
(CISC/RISC/VLIW/Vector..)
– Intel: Pentiums, Xeon, Merced….
– Sun: SPARC, ULTRASPARC
– HP PA
– IBM RS6000/PowerPC
– SGI MIPS
– Digital Alphas
❃ Integrate Memory, processing and
networking into a single chip
– IRAM (CPU & Mem): (http://iram.cs.berkeley.edu)
– Alpha 21364 (CPU, Memory Controller, NI)
69
Cluster Components…2
OS

❃ State of the art OS:


– Linux (Beowulf)
– Microsoft NT (Illinois HPVM)
– SUN Solaris (Berkeley NOW)
– IBM AIX (IBM SP2)
– HP UX (Illinois - PANDA)
– Mach (Microkernel based OS) (CMU)
– Cluster Operating Systems: Solaris MC, SCO UnixWare, MOSIX
  (academic project)
– OS gluing layers: (Berkeley Glunix)

70
Cluster Components…3
High Performance
Networks
❃ Ethernet (10Mbps),
❃ Fast Ethernet (100Mbps),
❃ Gigabit Ethernet (1Gbps)
❃ SCI (Dolphin - MPI - 12 microsecond
latency)
❃ ATM
❃ Myrinet (1.2Gbps)
❃ Digital Memory Channel
❃ FDDI

71
Cluster Components…4
Network Interfaces

❃ Network Interface Card


– Myrinet has NIC
– User-level access support
– Alpha 21364 processor integrates
processing, memory controller,
network interface into a single chip..

72
Cluster Components…5
Communication
Software
❃ Traditional OS supported facilities (heavy weight
due to protocol processing)..
– Sockets (TCP/IP), Pipes, etc.
❃ Light weight protocols (User Level)
– Active Messages (Berkeley)
– Fast Messages (Illinois)
– U-net (Cornell)
– XTP (Virginia)
❃ Systems can be built on top of the above
protocols (a minimal socket sketch follows this slide)

73
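As a small illustration of the "heavyweight" traditional path listed above (sockets over TCP/IP), here is a minimal C sketch of one cluster node sending a message to a peer node; the peer address 192.168.1.2 and port 5000 are placeholder assumptions, not values from the slides.

/* Minimal TCP send sketch (assumes a peer node is listening on port 5000). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in peer;
    const char *msg = "hello from node1";
    int sock = socket(AF_INET, SOCK_STREAM, 0);       /* TCP socket */
    if (sock < 0) { perror("socket"); return 1; }

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(5000);                      /* placeholder port */
    peer.sin_addr.s_addr = inet_addr("192.168.1.2");  /* placeholder node address */

    if (connect(sock, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        return 1;
    }
    /* Each send() goes through the kernel's full TCP/IP protocol stack -
       the per-message overhead that user-level protocols such as Active
       Messages, Fast Messages, U-Net, and XTP are designed to avoid. */
    send(sock, msg, strlen(msg), 0);
    close(sock);
    return 0;
}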
Cluster Components…
6a
Cluster Middleware
❃ Resides between the OS and applications and
  offers an infrastructure for supporting:
  – Single System Image (SSI)
  – System Availability (SA)
❃ SSI makes the collection appear as a single
  machine (globalised view of system resources),
  e.g., telnet cluster.myinstitute.edu
❃ SA - checkpointing and process migration..

74
Cluster Components…
6b
Middleware
Components
❃ Hardware
– DEC Memory Channel, DSM (Alewife, DASH) SMP Techniques
❃ OS / Gluing Layers
– Solaris MC, Unixware, Glunix
❃ Applications and Subsystems
– System management and electronic forms
– Runtime systems (software DSM, PFS etc.)
– Resource management and scheduling (RMS):
• CODINE, LSF, PBS, NQS, etc.

75
Cluster Components…7a
Programming environments

❃ Threads (PCs, SMPs, NOW..)


– POSIX Threads
– Java Threads
❃ MPI
– Linux, NT, on many Supercomputers
❃ PVM
❃ Software DSMs (Shmem)

76
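To make the "Threads" entry on the slide above concrete, here is a small sketch (not from the original slides) using POSIX Threads in C; the thread count of 4 and the trivial worker function are illustrative assumptions.

#include <stdio.h>
#include <pthread.h>

#define NTHREADS 4   /* arbitrary number of threads for the illustration */

/* Each thread just reports its logical id; a real code would give each
   thread its own slice of the data or work. */
static void *worker(void *arg)
{
    long id = (long)arg;
    printf("thread %ld running\n", id);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    long i;

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);   /* wait for all workers to finish */
    return 0;
}

Compiled with something like cc -pthread, the threads can be scheduled onto different processors of an SMP cluster node.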
Cluster Components…
7b
Development Tools ?
❃ Compilers
– C/C++/Java/ ;
– Parallel programming with C++ (MIT Press book)
❃ RAD (rapid application
development tools).. GUI based
tools for PP modeling
❃ Debuggers
❃ Performance Analysis Tools
❃ Visualization Tools
77
Cluster Components…8
Applications
❃ Sequential
❃ Parallel / Distributed (Cluster-aware
app.)
– Grand Challenging applications
• Weather Forecasting
• Quantum Chemistry
• Molecular Biology Modeling
• Engineering Analysis (CAD/CAM)
• ……………….
– PDBs, web servers,data-mining

78
Key Operational Benefits of Clustering

❃ System availability (HA). Clusters offer inherent high system
  availability due to the redundancy of hardware, operating
  systems, and applications.
❃ Hardware Fault Tolerance. Redundancy for most system
  components (e.g. disk-RAID), covering both hardware and
  software.
❃ OS and application reliability. Run multiple copies of the
  OS and applications, and tolerate failures through this redundancy.
❃ Scalability. Add servers to the cluster, add more clusters
  to the network as the need arises, or add CPUs to an SMP.
❃ High Performance. (running cluster enabled programs)

79
Classification
of Cluster Computer

80
Clusters
Classification..1

❃ Based on Focus (in Market)


– High Performance (HP) Clusters
• Grand Challenging Applications
– High Availability (HA) Clusters
• Mission Critical applications

81
HA Cluster: Server Cluster with
"Heartbeat" Connection

82
Clusters
Classification..2

❃ Based on Workstation/PC
Ownership
– Dedicated Clusters
– Non-dedicated clusters
• Adaptive parallel computing
• Also called Communal multiprocessing

83
Clusters
Classification..3

❃ Based on Node Architecture..


– Clusters of PCs (CoPs)
– Clusters of Workstations (COWs)
– Clusters of SMPs (CLUMPs)

84
Building Scalable Systems:
Cluster of SMPs (Clumps)

[Figure: performance of SMP systems vs. four-processor servers in a cluster.]

85
Clusters
Classification..4

❃ Based on Node OS Type..


– Linux Clusters (Beowulf)
– Solaris Clusters (Berkeley NOW)
– NT Clusters (HPVM)
– AIX Clusters (IBM SP2)
– SCO/Compaq Clusters (Unixware)
– …….Digital VMS Clusters, HP clusters,
………………..

86
Clusters
Classification..5

❃ Based on node components


architecture & configuration
(Processor Arch, Node Type:
PC/Workstation.. & OS: Linux/NT..):
– Homogeneous Clusters
• All nodes will have similar configuration

– Heterogeneous Clusters
• Nodes based on different processors and
running different OSes.
87
Clusters Classification..6a
Dimensions of Scalability & Levels of
Clustering
[Figure: dimensions of scalability - (1) technology (CPU, memory/I/O, OS), (2) platform (uniprocessor, SMP, cluster, MPP), and (3) network scope (workgroup, department, campus, enterprise, public metacomputing/GRID).]
88
Clusters
Classification..6b
Levels of Clustering
❃ Group Clusters (#nodes: 2-99)
– (a set of dedicated/non-dedicated computers - mainly connected
by SAN like Myrinet)
❃ Departmental Clusters (#nodes: 99-999)
❃ Organizational Clusters (#nodes: many 100s)
  – (using ATM-based networks)
❃ Internet-wide Clusters = Global Clusters (#nodes:
  1000s to many millions)
  – Metacomputing
  – Web-based Computing
  – Agent Based Computing
    • Java plays a major role in web and agent based computing

89
Major issues in cluster
design
 Size Scalability (physical & application)
 Enhanced Availability (failure management)
 Single System Image (look-and-feel of one system)
 Fast Communication (networks & protocols)
 Load Balancing (CPU, Net, Memory, Disk)
 Security and Encryption (clusters of clusters)
 Distributed Environment (Social issues)
 Manageability (admin. and control)
 Programmability (simple API if required)
 Applicability (cluster-aware and non-aware app.)

90
Cluster Middleware
and
Single System Image

91
A typical Cluster Computing
Environment

Application

PVM / MPI/ RSH

???

Hardware/OS

92
CC should support

❃ Multi-user, time-sharing environments

❃ Nodes with different CPU speeds and memory sizes


(heterogeneous configuration)

❃ Many processes, with unpredictable requirements

❃ Unlike SMP: insufficient “bonds” between nodes

– Each computer operates independently

– Inefficient utilization of resources

93
The missing link is provided by
cluster middleware/underware

Application

PVM / MPI/ RSH


Middleware or
Underware

Hardware/OS

94
SSI Clusters--SMP services
on a CC

“Pool Together” the “Cluster-Wide” resources

❃ Adaptive resource usage for better performance


❃ Ease of use - almost like SMP
❃ Scalable configurations - by decentralized control

Result: HPC/HAC at PC/Workstation prices

95
What is Cluster Middleware
?
❃ An interface between user
applications and the cluster hardware and OS
platform.
❃ Middleware packages support each other
at the management, programming, and
implementation levels.
❃ Middleware Layers:
– SSI Layer
– Availability Layer: It enables the cluster services of
• Checkpointing, Automatic Failover, recovery from
failure,
• fault-tolerant operating among all cluster nodes.

96
Middleware Design Goals

❃ Complete Transparency (Manageability)


– Lets the user see a single cluster system..
• Single entry point, ftp, telnet, software loading...
❃ Scalable Performance
– Easy growth of cluster
• no change of API & automatic load distribution.
❃ Enhanced Availability
– Automatic Recovery from failures
• Employ checkpointing & fault tolerant technologies
– Handle consistency of data when replicated..
97
What is Single System
Image (SSI) ?

❃ A single system image is the


illusion, created by software or
hardware, that presents a
collection of resources as one,
more powerful resource.
❃ SSI makes the cluster appear
like a single machine to the
user, to applications, and to
the network.
❃ A cluster without a SSI is not a
cluster 98
Benefits of Single
System Image

❃ Usage of system resources transparently


❃ Transparent process migration and load balancing
across nodes.
❃ Improved reliability and higher availability
❃ Improved system response time and performance
❃ Simplified system management
❃ Reduction in the risk of operator errors
❃ User need not be aware of the underlying system
architecture to use these machines effectively

99
Desired SSI Services

❃ Single Entry Point


– telnet cluster.my_institute.edu
– telnet node1.cluster.my_institute.edu
❃ Single File Hierarchy: xFS, AFS, Solaris MC Proxy
❃ Single Control Point: Management from single GUI
❃ Single virtual networking
❃ Single memory space - Network RAM / DSM
❃ Single Job Management: Glunix, Codine, LSF
❃ Single User Interface: Like a workstation/PC windowing
environment (CDE in Solaris/NT); maybe it can use Web
technology

100
Availability Support
Functions
❃ Single I/O Space (SIO):
– any node can access any peripheral or disk devices without the
knowledge of physical location.
❃ Single Process Space (SPS)
  – Any process on any node can create processes with cluster-wide
    process ids, and they communicate through signals, pipes, etc.,
    as if they were on a single node.
❃ Checkpointing and Process Migration.
  – Saves the process state and intermediate results in memory or to disk to
    support rollback recovery when a node fails; process migration (PM) supports load balancing...

❃ Reduction in the risk of operator errors

❃ User need not be aware of the underlying system architecture


to use these machines effectively 101
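The checkpointing idea above can be illustrated with a deliberately simplified, application-level sketch in C; this is an illustrative assumption, not the mechanism of any particular SSI layer, and the file name app.ckpt is made up.

#include <stdio.h>

/* Application state we want to be able to recover after a node failure. */
struct state { long iteration; double partial_result; };

/* Write the state to a checkpoint file (hypothetical name "app.ckpt"). */
static void checkpoint(const struct state *s)
{
    FILE *f = fopen("app.ckpt", "wb");
    if (f) { fwrite(s, sizeof(*s), 1, f); fclose(f); }
}

/* Try to restore the last checkpoint; returns 1 on success. */
static int restore(struct state *s)
{
    FILE *f = fopen("app.ckpt", "rb");
    int ok = 0;
    if (f) { ok = (fread(s, sizeof(*s), 1, f) == 1); fclose(f); }
    return ok;
}

int main(void)
{
    struct state s = {0, 0.0};

    if (restore(&s))                     /* roll back to the last checkpoint */
        printf("restarting from iteration %ld\n", s.iteration);

    for (; s.iteration < 1000000; s.iteration++) {
        s.partial_result += 1.0 / (s.iteration + 1);  /* the "real" work */
        if (s.iteration % 100000 == 0)
            checkpoint(&s);              /* periodic checkpoint */
    }
    printf("result = %f\n", s.partial_result);
    return 0;
}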
Scalability Vs. Single
System Image

[Figure: the scalability vs. single-system-image trade-off, from UP (uniprocessor) and SMP towards clusters and distributed systems.]
102
SSI Levels/How do we
implement SSI ?

❃ It is a computer science notion of levels of


abstractions (house is at a higher level of
abstraction than walls, ceilings, and
floors).
Application and Subsystem Level

Operating System Kernel Level

Hardware Level

103
SSI at Application and
Subsystem Level

Level: application
  Examples: cluster batch system, system management
  Boundary: an application
  Importance: what a user wants

Level: subsystem
  Examples: distributed DB, OSF DME, Lotus Notes, MPI, PVM
  Boundary: a subsystem
  Importance: SSI for all applications of the subsystem

Level: file system
  Examples: Sun NFS, OSF DFS, NetWare, and so on
  Boundary: shared portion of the file system
  Importance: implicitly supports many applications and subsystems

Level: toolkit
  Examples: OSF DCE, Sun ONC+, Apollo Domain
  Boundary: explicit toolkit facilities: user, service name, time
  Importance: best level of support for heterogeneous system

104
(c) In search of clusters
SSI at Operating System
Kernel Level

Level: kernel / OS layer
  Examples: Solaris MC, Unixware, MOSIX, Sprite, Amoeba / GLUnix
  Boundary: each name space: files, processes, pipes, devices, etc.
  Importance: kernel support for applications, adm subsystems

Level: kernel interfaces
  Examples: UNIX (Sun) vnode, Locus (IBM) vproc
  Boundary: type of kernel objects: files, processes, etc.
  Importance: modularizes SSI code within kernel

Level: virtual memory
  Examples: none supporting operating system kernel
  Boundary: each distributed virtual memory space
  Importance: may simplify implementation of kernel objects

Level: microkernel
  Examples: Mach, PARAS, Chorus, OSF/1AD, Amoeba
  Boundary: each service outside the microkernel
  Importance: implicit SSI for all system services

105
(c) In search of clusters
SSI at Hardware Level

Level: memory
  Examples: SCI, DASH
  Boundary: memory space
  Importance: better communication and synchronization

Level: memory and I/O
  Examples: SCI, SMP techniques
  Boundary: memory and I/O device space
  Importance: lower overhead cluster I/O

106
(c) In search of clusters
SSI Characteristics

❃ 1. Every SSI has a boundary


❃ 2. Single system support can
exist at different levels within
a system, one able to be built
on another

107
SSI Boundaries -- an
applications SSI
boundary

[Figure: a batch system's SSI boundary encloses the cluster nodes it spans.]

(c) In search of clusters
108
Relationship Among
Middleware Modules

109
SSI via OS path!

❃ 1. Build as a layer on top of the existing OS


– Benefits: makes the system quickly portable, tracks vendor
software upgrades, and reduces development time.
– i.e. new systems can be built quickly by mapping new
services onto the functionality provided by the layer beneath.
Eg: Glunix
❃ 2. Build SSI at kernel level, True Cluster OS
– Good, but can't leverage OS improvements from the
vendor
– E.g. Unixware, Solaris-MC, and MOSIX

110
SSI Representative
Systems

❃ OS level SSI
– SCO NSC UnixWare
– Solaris-MC
– MOSIX, ….
❃ Middleware level SSI
– PVM, TreadMarks (DSM), Glunix,
Condor, Codine, Nimrod, ….
❃ Application level SSI
– PARMON, Parallel Oracle, ... 111
SCO NonStop® Cluster for
UnixWare
http://www.sco.com/products/clustering/

[Figure: two UP or SMP nodes, each running users, applications, and systems
management over a standard SCO UnixWare kernel with modular clustering
extensions; kernel calls pass through the extensions and hooks, devices are
accessible cluster-wide, and the nodes (and other nodes) are joined by
ServerNet.]

112
How does NonStop Clusters
Work?

❃ Modular Extensions and Hooks to Provide:


– Single Clusterwide Filesystem view
– Transparent Clusterwide device access
– Transparent swap space sharing
– Transparent Clusterwide IPC
– High Performance Internode Communications
– Transparent Clusterwide Processes, migration,etc.
– Node down cleanup and resource failover
– Transparent Clusterwide parallel TCP/IP networking
– Application Availability
– Clusterwide Membership and Cluster timesync
– Cluster System Administration
– Load Leveling
113
Solaris-MC: Solaris for
MultiComputers
❃ global file system
❃ globalized process management
❃ globalized networking and I/O

[Figure: Solaris MC architecture - applications use the system call interface;
file system, process, and networking modules sit on a C++ object framework
with object invocations to Solaris MC on other nodes, all layered on the
existing Solaris 2.5 kernel.]

Solaris MC Architecture

http://www.sun.com/research/solaris-mc/ 114
Solaris MC components

❃ Object and communication support
❃ High availability support
❃ PXFS global distributed file system
❃ Process management
❃ Networking

[Figure: the same Solaris MC architecture diagram as on the previous slide.]
115
Multicomputer OS for UNIX
http://www.mosix.cs.huji.ac.il/ (MOSIX)
❃ An OS module (layer) that provides the
applications with the illusion of working on a
single system
❃ Remote operations are performed like local
operations
❃ Transparent to the application - user
interface unchanged

Application

PVM / MPI / RSH

MOSIX

Hardware/OS

116
Main tool

Preemptive process migration that can


migrate--->any process, anywhere, anytime

❃ Supervised by distributed algorithms that


respond on-line to global resource
availability - transparently
❃ Load-balancing - migrate process from over-
loaded to under-loaded nodes
❃ Memory ushering - migrate processes from a
node that has exhausted its memory, to
prevent paging/swapping

117
MOSIX for Linux at HUJI

❃ A scalable cluster configuration:


– 50 Pentium-II 300 MHz
– 38 Pentium-Pro 200 MHz (some are SMPs)
– 16 Pentium-II 400 MHz (some are SMPs)
❃ Over 12 GB cluster-wide RAM
❃ Connected by the Myrinet 2.56 Gb/s LAN
❃ Runs Red-Hat 6.0, based on Kernel 2.2.7
❃ Upgrade: HW with Intel, SW with Linux
❃ Download MOSIX:
– http://www.mosix.cs.huji.ac.il/

118
NOW @ Berkeley

❃ Design & Implementation of higher-level system


❃ Global OS (Glunix)
❃ Parallel File Systems (xFS)
❃ Fast Communication (HW for Active Messages)
❃ Application Support
❃ Overcoming technology shortcomings
❃ Fault tolerance
❃ System Management
❃ NOW Goal: Faster for Parallel AND Sequential

http://now.cs.berkeley.edu/ 119
NOW Software
Components

[Figure: NOW software stack - Parallel Apps and Large Sequential Apps run over Sockets, Split-C, MPI, HPF, and vSM; a Global Layer Unix (with name server and scheduler) and Active Messages sit over Unix (Solaris) on each workstation; each node has a VN segment driver and an AM L.C.P. over the Myrinet scalable interconnect.]

120
3 Paths for Applications
on
NOW?
❃ Revolutionary (MPP Style): write new programs from
scratch using MPP languages, compilers, libraries,…
❃ Porting: port programs from mainframes, supercomputers,
MPPs, …
❃ Evolutionary: take sequential program & use
1) Network RAM: first use memory of many computers to
reduce disk accesses; if not fast enough, then:
2) Parallel I/O: use many disks in parallel for accesses not
in file cache; if not fast enough, then:
3) Parallel program: change the program until it uses enough
processors to be fast => large speedup without a fine-
grain parallel program

121
Comparison of 4 Cluster
Systems

122
Cluster Programming
Environments
❃ Shared Memory Based
– DSM
– Threads/OpenMP (enabled for clusters)
– Java threads (HKU JESSICA, IBM cJVM)
❃ Message Passing Based
– PVM (PVM)
– MPI (MPI)
❃ Parametric Computations
– Nimrod/Clustor
❃ Automatic Parallelising Compilers
❃ Parallel Libraries & Computational Kernels (NetSolve)

123
Levels of Parallelism

Code granularity and how each level is exploited:

– Large grain (task level): tasks / whole programs - exploited via PVM/MPI
– Medium grain (control level): functions / threads - exploited via Threads
– Fine grain (data level): loops / data items (e.g. a(0)=.., a(1)=.., b(0)=..) - exploited by Compilers
– Very fine grain (multiple issue): individual instructions (+, x, load) - exploited by the CPU hardware

124
MPI (Message Passing
Interface)
http://www.mpi-forum.org/

❃ A standard message passing interface.


– MPI 1.0 - May 1994 (started in 1992)
– C and Fortran bindings (now Java)
❃ Portable (once coded, it can run on virtually all
HPC platforms, including clusters!)
❃ Performance (by exploiting native hardware
features)
❃ Functionality (over 115 functions in MPI 1.0)
– environment management, point-to-point & collective
communications, process group, communication world,
derived data types, and virtual topology routines.
❃ Availability - a variety of implementations
available, both vendor and public domain.

125
A Sample MPI
Program...

#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
  int my_rank;        /* process rank */
  int p;              /* no. of processes */
  int source;         /* rank of sender */
  int dest;           /* rank of receiver */
  int tag = 0;        /* message tag, like "email subject" */
  char message[100];  /* buffer */
  MPI_Status status;  /* function return status */

  /* Start up MPI */
  MPI_Init(&argc, &argv);
  /* Find our process rank/id */
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  /* Find out how many processes/tasks are part of this run */
  MPI_Comm_size(MPI_COMM_WORLD, &p);

126
A Sample MPI Program

  if (my_rank == 0)   /* Master Process */
  {
    for (source = 1; source < p; source++)
    {
      MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
      printf("%s \n", message);
    }
  }
  else                /* Worker Process */
  {
    sprintf(message, "Hello, I am your worker process %d!", my_rank);
    dest = 0;
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
  }
  /* Shutdown MPI environment */
  MPI_Finalize();
  return 0;
}

127
Execution

% cc -o hello hello.c -lmpi
% mpirun -p2 hello
Hello, I am your worker process 1!
% mpirun -p4 hello
Hello, I am your worker process 1!
Hello, I am your worker process 2!
Hello, I am your worker process 3!
% mpirun hello
(no output - there are no workers, so no greetings)

128
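The sample program uses only point-to-point calls (MPI_Send/MPI_Recv); as a companion, here is a minimal hedged sketch of the collective communication mentioned on the MPI slide, summing every process's rank at the root with MPI_Reduce. It is an illustrative addition, not part of the original tutorial code.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int my_rank, p, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    /* Collective operation: every process contributes its rank and the
       result 0 + 1 + ... + (p-1) arrives at the root (rank 0). */
    MPI_Reduce(&my_rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0)
        printf("sum of ranks across %d processes = %d\n", p, sum);

    MPI_Finalize();
    return 0;
}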
PARMON: A Cluster
Monitoring Tool
PARMON Client (parmon) on JVM          PARMON Server (parmond) on each node

[Figure: PARMON clients connect over a high-speed switch to parmond server
daemons running on every cluster node.]
http://www.buyya.com/parmon/ 129
Resource Utilization at
a Glance

130
Globalised Cluster Storage

Single I/O Space and


Design Issues

Reference:
"Designing SSI Clusters with Hierarchical Checkpointing and Single I/O
Space", IEEE Concurrency, March 1999,
by K. Hwang, H. Jin et al.
131
Clusters with & without
Single I/O Space

[Figure: two views of a cluster - without Single I/O Space, users see separate per-node disks; with Single I/O Space Services, users see one cluster-wide I/O space.]

132
Benefits of Single I/O Space

❃ Eliminate the gap between accessing local disk(s) and remote


disks
❃ Support persistent programming paradigm
❃ Allow striping on remote disks, accelerate parallel I/O
operations
❃ Facilitate the implementation of distributed checkpointing and
recovery schemes

133
Single I/O Space Design Issues

❃ Integrated I/O Space

❃ Addressing and Mapping Mechanisms

❃ Data movement procedures

134
Integrated I/O Space

[Figure: the integrated I/O space assigns sequential addresses across the local disks LD1..LDn of all nodes (RADD space), the shared RAIDs SD1..SDm (NASD space), and attached peripherals P1..Ph (NAP space).]

135
Addressing and Mapping

[Figure: user applications go through a name agent and a Disk/RAID/NAP mapper with a block mover (user-level middleware plus some modified OS system calls); I/O agents on the nodes then access the RADD, NASD, and NAP spaces.]


136
Data Movement Procedures

[Figure: two data-movement procedures - a user application on node 1 requests block A, and the I/O agents and block mover transfer it between node 1 and node 2's local disk LD2 (or a shared disk SDi of the NASD).]
137
What Next ??

Clusters of Clusters (HyperClusters)


Global Grid
Interplanetary Grid
Universal Grid??

138
Clusters of Clusters
(HyperClusters)
[Figure: three clusters, each with its own scheduler, master daemon, execution daemons, and graphical control clients for job submission, interconnected over a LAN/WAN to form a hypercluster.]

139
Towards Grid Computing….

140
[Figure: Grid resources on a world map - for illustration, resources are placed arbitrarily on the GUSTO test-bed.]
What is Grid ?
❃ An infrastructure that couples
– Computers (PCs, workstations, clusters, traditional supercomputers, and even laptops,
notebooks, mobile computers, PDA, and so on)
– Software ? (e.g., renting expensive special purpose applications on demand)
– Databases (e.g., transparent access to human genome database)
– Special Instruments (e.g., radio telescopes - SETI@Home searching for life in the galaxy,
  Astrophysics@Swinburne for pulsars)
– People (maybe even animals, who knows?)
❃ across local/wide-area networks (enterprise,
organisations, or Internet) and presents them as a
unified integrated (single) resource.

141
Conceptual view of the Grid

Leading to Portal (Super)Computing

http://www.sun.com/hpc/
142
Grid Application-Drivers

❃ Old and New applications getting enabled due


to coupling of computers, databases,
instruments, people, etc:
– (distributed) Supercomputing
– Collaborative engineering
– high-throughput computing
• large scale simulation & parameter studies
– Remote software access / Renting Software
– Data-intensive computing
– On-demand computing

143
Grid Components

Applications and Portals (Grid Apps):
  Scientific, Engineering, Collaboration, Problem Solving Environments, Web enabled Apps

Development Environments and Tools (Grid Tools):
  Languages, Libraries, Debuggers, Monitoring, Resource Brokers, Web tools

Distributed Resources Coupling Services (Grid Middleware):
  Comm., Sign on & Security, Information, Process, Data Access, QoS

Local Resource Managers:
  Operating Systems, Queuing Systems, Libraries & App Kernels, TCP/IP & UDP

Networked Resources across Organisations (Grid Fabric):
  Computers, Clusters, Storage Systems, Data Sources, Scientific Instruments

144
Many GRID Projects and
Initiatives

❃ Public Grid Initiatives
  – Distributed.net
  – SETI@Home
  – Compute Power Grid
❃ PUBLIC FORUMS
  – Computing Portals
  – Grid Forum
  – European Grid Forum
  – IEEE TFCC!
  – GRID'2000 and more.
❃ USA
  – Globus
  – Legion
  – JAVELIN
  – AppLes
  – NASA IPG
  – Condor
  – Harness
  – NetSolve
  – NCSA Workbench
  – WebFlow
  – EveryWhere
  – and many more...
❃ Australia
  – Nimrod/G
  – EcoGrid and GRACE
  – DISCWorld
❃ Europe
  – UNICORE
  – MOL
  – METODIS
  – Globe
  – Poznan Metacomputing
  – CERN Data Grid
  – MetaMPI
  – DAS
  – JaWS
  – and many more...
❃ Japan
  – Ninf
  – Bricks
  – and many more...

http://www.gridcomputing.com/ 145
NetSolve
Client/Server/Agent-Based Computing
An easy-to-use tool to provide efficient and uniform
access to a variety of scientific packages on UNIX platforms

• Client-Server design
• Network-enabled solvers
• Seamless access to resources
• Non-hierarchical system
• Load Balancing
• Fault Tolerance
• Interfaces to Fortran, C, Java, Matlab, more
• Software is available

[Figure: a NetSolve client sends a request to the NetSolve agent, which
consults the software repository and network resources and replies with a
choice of server.]

146
HARNESS Virtual Machine
Scalable distributed control and CCA-based daemon

[Figure: hosts A, B, C, and D join a virtual machine (another VM can coexist)
through discovery and registration; operation within the VM uses distributed
control, and a component-based daemon provides process control and
customization - user features extend the HARNESS daemon by dynamically
adding plug-ins.]

http://www.epm.ornl.gov/harness/

147
HARNESS Core Research
Parallel Plug-ins for a Heterogeneous Distributed Virtual Machine

One research goal is to understand and implement
a dynamic parallel plug-in environment.

This provides a method for many users to extend Harness
in much the same way that third-party serial plug-ins
extend Netscape, Photoshop, and Linux.

Research issues with parallel plug-ins include:
heterogeneity, synchronization, interoperation, partial success

Three typical cases:
• load plug-in into a single host of the VM w/o communication
• load plug-in into a single host, broadcast to the rest of the VM
• load plug-in into every host of the VM w/ synchronization

148
Nimrod - A Job Management
System

http://www.dgs.monash.edu.au/~davida/nimrod.html 149
Job processing with Nimrod

150
Nimrod/G Architecture
[Figure: Nimrod/G architecture - Nimrod/G clients drive the Nimrod engine, which uses a persistent store, a schedule advisor, a trading manager, a dispatcher, and a grid explorer; these talk to middleware services (GE, GIS, TM, TS), Grid Information Services, and local resource managers and trade servers on the GUSTO test bed. RM: Local Resource Manager, TS: Trade Server.]
151
Compute Power Market

[Figure: Compute Power Market - on the user side, an application control agent works with a trade manager, schedule advisor, grid explorer (backed by a grid information server), and deployment agent through a resource broker; in each resource domain, a trade server with a trading algorithm handles charging, accounting, resource reservation, and resource allocation over resources R1 … Rn.]

152
Pointers to Literature
on Cluster Computing

153
Reading Resources..1a
Internet & WWW

– Computer Architecture:
• http://www.cs.wisc.edu/~arch/www/
– PFS & Parallel I/O
• http://www.cs.dartmouth.edu/pario/

– Linux Parallel Processing


• http://yara.ecn.purdue.edu/~pplinux/Sites/

– DSMs
• http://www.cs.umd.edu/~keleher/dsm.html

154
Reading Resources..1b
Internet & WWW

– Solaris-MC
• http://www.sunlabs.com/research/solaris-mc
– Microprocessors: Recent Advances
• http://www.microprocessor.sscc.ru

– Beowulf:
• http://www.beowulf.org

– Metacomputing
• http://www.sis.port.ac.uk/~mab/Metacomputing/

155
Reading Resources..2
Books

– In Search of Clusters
• by G. Pfister, Prentice Hall (2nd ed.), 1998

– High Performance Cluster Computing


• Volume1: Architectures and Systems
• Volume2: Programming and Applications
– Edited by Rajkumar Buyya, Prentice Hall, NJ, USA.

– Scalable Parallel Computing


• by K. Hwang & Z. Xu, McGraw Hill, 1998

156
Reading Resources..3
Journals

– A Case for NOW (Networks of Workstations), IEEE Micro, Feb '95
• by Anderson, Culler, Patterson
– Fault Tolerant COW with SSI, IEEE
Concurrency, (to appear)
• by Kai Hwang, Chow, Wang, Jin, Xu
– Cluster Computing: The Commodity
Supercomputing, Journal of Software
Practice and Experience-(get from my web)
• by Mark Baker & Rajkumar Buyya

157
Cluster Computing
Infoware

http://www.csse.monash.edu.au/~rajkumar/cluster/

158
Cluster Computing
Forum

IEEE Task Force on Cluster


Computing
(TFCC)

http://www.ieeetfcc.org

159
TFCC Activities...

❃ Network Technologies
❃ OS Technologies
❃ Parallel I/O
❃ Programming Environments
❃ Java Technologies
❃ Algorithms and Applications
❃ Analysis and Profiling
❃ Storage Technologies
❃ High Throughput Computing

160
TFCC Activities...

❃ High Availability
❃ Single System Image
❃ Performance Evaluation
❃ Software Engineering
❃ Education
❃ Newsletter
❃ Industrial Wing
❃ TFCC Regional Activities
– All the above have their own pages, see pointers from:
– http://www.ieeetfcc.org

161
TFCC Activities...

❃ Mailing list, Workshops, Conferences,


Tutorials, Web-resources etc.

❃ Resources for introducing subject in senior


undergraduate and graduate levels.
❃ Tutorials/Workshops at IEEE Chapters..
❃ ….. and so on.
❃ FREE MEMBERSHIP, please join!
❃ Visit TFCC Page for more details:
– http://www.ieeetfcc.org (updated daily!).

162
Clusters Revisited

163
Summary

☞ We have discussed Clusters


☞Enabling Technologies
☞Architecture & its Components
☞Classifications
☞Middleware
☞Single System Image
☞Representative Systems

164
Conclusions

☞ Clusters are promising..


☞Solve parallel processing paradox
☞Offer incremental growth and matches with funding
pattern.
☞New trends in hardware and software technologies are
likely to make clusters more promising..so that
☞Clusters based supercomputers can be seen everywhere!

165
Computing Platforms Evolution
Breaking Administrative Barriers

[Figure: performance grows as administrative barriers are broken - computing platforms scale from the Individual to Group, Department, Campus, State, National, Globe, Inter-Planet, and Universe levels.]

166
Thank You ...

167
Backup Slides...

168
SISD : A Conventional
Computer

[Figure: SISD - a single processor takes one data input stream and one instruction stream and produces one data output stream.]

 Speed is limited by the rate at which computer


can transfer information internally.
Ex:PC, Macintosh, Workstations

169
The MISD Architecture
[Figure: MISD - one data input stream passes through processors A, B, and C, each driven by its own instruction stream, producing a data output stream.]

 More of an intellectual exercise than a practical configuration.


Few built, but commercially not available
170
SIMD Architecture
[Figure: SIMD - one instruction stream drives processors A, B, and C, each
working on its own data input stream and producing its own data output
stream; e.g. Ci <= Ai * Bi.]

Ex: Cray vector processing machines, Thinking Machines CM*


171
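As a small illustration of the data-parallel style above, here is a hedged C sketch of the Ci = Ai * Bi computation written as a plain loop that a vectorizing compiler or SIMD hardware can execute element-wise in parallel; the vector length of 8 is an arbitrary choice.

#include <stdio.h>

#define N 8   /* arbitrary vector length for the illustration */

int main(void)
{
    double a[N], b[N], c[N];
    int i;

    /* initialise the input data streams */
    for (i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* the SIMD-style operation: the same instruction (multiply)
       applied to every element of the data streams */
    for (i = 0; i < N; i++)
        c[i] = a[i] * b[i];

    for (i = 0; i < N; i++)
        printf("c[%d] = %g\n", i, c[i]);
    return 0;
}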
MIMD Architecture
[Figure: MIMD - processors A, B, and C each execute their own instruction stream on their own data input stream, producing separate data output streams.]

Unlike SISD and MISD, an MIMD computer works asynchronously.


Shared memory (tightly coupled) MIMD
Distributed memory (loosely coupled) MIMD
172
Shared Memory MIMD
machine
[Figure: shared memory MIMD - processors A, B, and C each connect through a memory bus to a global memory system.]

Comm: Source PE writes data to GM & destination retrieves it


 Easy to build; conventional OSes of SISD can easily be ported
 Limitation : reliability & expandability. A memory component or any
processor failure affects the whole system.
 Increase of processors leads to memory contention.

Ex. : Silicon graphics supercomputers....

173
Distributed Memory MIMD
[Figure: distributed memory MIMD - processors A, B, and C each have their own memory system, connected by a memory bus, and communicate with each other over IPC channels.]

● Communication : IPC on High Speed Network.


● Network can be configured to ... Tree, Mesh, Cube, etc.
● Unlike Shared MIMD
 easily/ readily expandable
 Highly reliable (any CPU failure does not affect the whole
system) 174
