
management of groups of systems and their applications.

The goal of cluster computing is to facilitate sharing a computing load over several systems without either the users of the system or the administrators needing to know that more than one system is involved. The Windows NT Server edition of the Windows operating system is an example of a base operating system that has been modified to include an architecture that allows a cluster computing environment to be established.

Cluster computing has been employed for over fifteen years, but it is the recent demand for higher availability in small businesses that has caused an explosion in this field. Electronic databases and electronic malls have become essential to the daily operation of small businesses, and access to this critical information has created a large demand for the principal features of cluster computing.

There are some key concepts that must be understood when forming a cluster computing resource. Nodes, or systems, are the individual members of a cluster. They can be computers, servers, and other such hardware, although each node generally has memory and processing capabilities. If one node becomes unavailable, the other nodes can carry the demand load so that applications or services are always available. There must be at least two nodes to compose a cluster; otherwise they are just called servers. The collection of software on each node that manages all cluster-specific activity is called the cluster service. The cluster service manages all of the resources, the canonical items in the system, and sees them as identical opaque objects. Resources can be physical hardware devices, like disk drives and network cards, or logical items, like logical disk volumes, TCP/IP addresses, applications, and databases.

When a resource is providing its service on a specific node, it is said to be on-line. A collection of resources to be managed as a single unit is called a group. A group contains all of the resources necessary to run a specific application and, in the case of client systems, to connect to the service provided by the application. Groups allow administrators to combine resources into larger logical units so that they can be managed as a unit; this, of course, means that all operations performed on a group affect all resources contained within that group. Normally the development of a cluster computing system occurs in phases. The first phase involves establishing the underpinnings in the base operating system and building the foundation of the cluster components; this phase should focus on providing enhanced availability to key applications using storage that is accessible to two nodes. The following stages occur as demand increases and should allow much larger clusters to be formed. These larger clusters should have a true distribution of applications, higher-performance interconnects, widely distributed storage for easy accessibility, and load balancing. Cluster computing will become even more prevalent in the future because of the growing needs and demands of businesses as well as the spread of the Internet.

Clustering Concepts

Clusters are in fact quite simple: a group of computers tied together with a network, working on a large problem that has been broken down into smaller pieces. There are a number of different strategies we can use to tie them together, and a number of different software packages that can be used to make the software side of things work.

Parallelism

The name of the game in high performance computing is parallelism. It is the quality that allows something to be done in parts that work independently, rather than as a task with so many interlocking dependencies that it cannot be broken down further. Parallelism operates at two levels: hardware parallelism and software parallelism.

Hardware Parallelism

On one level, hardware parallelism deals with the CPU of an individual system and how we can squeeze performance out of sub-components of the CPU that can speed up our code. At another level there is the parallelism that is gained by having multiple systems working on a computational problem in a distributed fashion. These forms are known as ‘fine grained’, for parallelism inside the CPU or among multiple CPUs in the same system, and ‘coarse grained’, for parallelism across a collection of separate systems acting in concert.

CPU-Level Parallelism

A computer’s CPU is commonly pictured as a device that operates on one instruction after another in a straight line, always completing one step or instruction before a new one is started. But new CPU architectures have an inherent ability to do more than one thing at once. The logic of the CPU chip divides the CPU into multiple execution units, which allow the CPU to attempt to process more than one instruction at a time. Two hardware features of modern CPUs support multiple execution units: the cache, a small memory inside the CPU that holds recently used instructions and data, and the pipeline, a small area of memory inside the CPU where the instructions that are next in line to be executed are stored. Both the cache and the pipeline allow impressive increases in CPU performance.
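
To make the effect of the cache concrete, the following small C program (an illustrative sketch, not part of the original text; the matrix size is an arbitrary choice) sums the same matrix twice. The first loop walks memory in the order it is stored and reuses cached data; the second jumps across rows and defeats the cache, so on most machines it runs noticeably slower even though it does the same arithmetic.

    #include <stdio.h>
    #include <time.h>

    #define N 2048

    static double a[N][N];   /* about 32 MB; static so it is not on the stack */

    int main(void)
    {
        double sum = 0.0;
        clock_t t0, t1;

        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = 1.0;

        /* Cache-friendly: consecutive elements, good reuse of cache lines. */
        t0 = clock();
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        t1 = clock();
        printf("row-major:    %.3f s (sum=%.0f)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC, sum);

        /* Cache-unfriendly: large strides, so each access may miss the cache. */
        sum = 0.0;
        t0 = clock();
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];
        t1 = clock();
        printf("column-major: %.3f s (sum=%.0f)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC, sum);
        return 0;
    }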

System-Level Parallelism

It is the parallelism of multiple nodes coordinating to work on a problem in parallel that gives the cluster its power, but there are other levels at which even more parallelism can be introduced into the system. For example, if we decide that each node in our cluster will be a multi-CPU system, we introduce a fundamental degree of parallel processing at the node level. Having more than one network interface on each node introduces communication channels that may be used in parallel to communicate with other nodes in the cluster. Finally, if we use multiple disk drive controllers in each node, we create parallel data paths that can be used to increase the performance of the I/O subsystem.

Software Parallelism

Software parallelism is the ability to find well-defined areas in a problem we want to solve that can be broken down into self-contained parts. These parts are the program elements that can be distributed and that give us the speedup we want to get out of a high performance computing system. Before we can run a program on a parallel cluster, we have to ensure that the problem we are trying to solve is amenable to being done in a parallel fashion. Almost any problem that is composed of smaller, quantifiable subproblems can be broken down into those parts and run on the nodes of a cluster.
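
As a minimal sketch (not taken from the original text) of what such a decomposition looks like in code, the C program below breaks an array sum into self-contained chunks, sums each chunk in its own POSIX thread, and then combines the partial results. The same pattern of split, compute independently, and recombine is what a cluster applies across nodes rather than threads.

    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define NCHUNKS  4          /* hypothetical choice: one chunk per worker */

    static double data[N];

    struct chunk { int lo, hi; double partial; };

    /* Each worker sums its own slice; no worker touches another's data. */
    static void *sum_chunk(void *arg)
    {
        struct chunk *c = arg;
        c->partial = 0.0;
        for (int i = c->lo; i < c->hi; i++)
            c->partial += data[i];
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NCHUNKS];
        struct chunk ch[NCHUNKS];
        double total = 0.0;

        for (int i = 0; i < N; i++)
            data[i] = 1.0;

        /* Break the problem into self-contained parts. */
        for (int k = 0; k < NCHUNKS; k++) {
            ch[k].lo = k * (N / NCHUNKS);
            ch[k].hi = (k == NCHUNKS - 1) ? N : (k + 1) * (N / NCHUNKS);
            pthread_create(&tid[k], NULL, sum_chunk, &ch[k]);
        }

        /* Recombine the partial results. */
        for (int k = 0; k < NCHUNKS; k++) {
            pthread_join(tid[k], NULL);
            total += ch[k].partial;
        }
        printf("total = %.0f\n", total);
        return 0;
    }

On a cluster, the chunks would be shipped to different nodes instead of different threads, but the decomposition of the problem is the same.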

System-Level Middleware

System-level middleware offers a Single System Image (SSI) and a high-availability infrastructure for processes, memory, storage, I/O, and networking. The single system image illusion can be implemented using hardware or software infrastructure. This unit focuses on SSI at the operating system or subsystem level.

A modular architecture for SSI allows services provided by lower-level layers to be used in the implementation of higher-level services. This unit discusses design issues, architecture, and representative systems for job/resource management, network RAM, software RAID, single I/O space, and virtual networking. A number of operating systems have proposed SSI solutions, including MOSIX, Unixware, and Solaris MC. It is important to discuss one or more such systems, as they help students to understand architecture and implementation issues.

Message Passing Primitives

Although new high-performance protocols are available for cluster computing, some instructors may want to provide students with a brief introduction to message passing programs using the BSD Sockets interface to the Transmission Control Protocol/Internet Protocol (TCP/IP) before introducing more complicated parallel programming with distributed memory programming tools. If students have already had a course in data communications or computer networks, then this unit should be skipped. Students should have access to a networked computer lab with the Sockets libraries enabled. Sockets usually come installed on Linux workstations.

Parallel Programming Using MPI

An introduction to distributed memory programming using a standard tool such as the Message Passing Interface (MPI) [23] is basic to cluster computing. Current versions of MPI generally assume that programs will be written in C, C++, or Fortran. However, Java-based versions of MPI are becoming available.
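
The sketch below, a typical first MPI program in C rather than one prescribed by the text, has every process report its rank and then passes a single integer from rank 0 to rank 1 with MPI_Send and MPI_Recv.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                    /* start the MPI runtime   */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* this process's id       */
        MPI_Comm_size(MPI_COMM_WORLD, &size);      /* total number of ranks   */

        printf("hello from rank %d of %d\n", rank, size);

        /* Simple point-to-point message: rank 0 sends an integer to rank 1. */
        if (size > 1) {
            int token;
            if (rank == 0) {
                token = 42;
                MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("rank 1 received token %d\n", token);
            }
        }

        MPI_Finalize();
        return 0;
    }

Such a program is normally compiled with mpicc and launched with mpirun, with one process per cluster node or CPU.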

Application-Level Middleware

Application-level middleware is the layer of software between the operating system and applications. Middleware provides various services required by an application to function correctly. A course in cluster programming can include some coverage of middleware tools such as CORBA, Remote Procedure Call, Java Remote Method Invocation (RMI), or Jini. Sun Microsystems has produced a number of Java-based technologies that can become units in a cluster programming course, including the Java Development Kit (JDK) product family, which consists of the essential tools and APIs for all developers writing in the Java programming language, through to APIs for telephony (JTAPI), database connectivity (JDBC), 2D and 3D graphics, security, and electronic commerce. These technologies enable Java to interoperate with many other devices, technologies, and software standards.

Single System Image

A single system image is the illusion, created by software or hardware, that presents a collection of resources as one more powerful resource. SSI makes the cluster appear like a single machine to the user, to applications, and to the network. A cluster without an SSI is not a cluster. Every SSI has a boundary, and SSI support can exist at different levels within a system, one able to be built on another.

Single System Image Benefits

• Provide a simple, straightforward view of all system resources and activities, from any node of the cluster
• Free the end user from having to know where an application will run
• Free the operator from having to know where a resource is located
• Let the user work with familiar interfaces and commands, and allow administrators to manage the entire cluster as a single entity
• Reduce the risk of operator errors, with the result that end users see improved reliability and higher availability of the system
• Allow centralized or decentralized system management and control, reducing the need for skilled administrators in system administration
• Present multiple, cooperating components of an application to the administrator as a single application
• Greatly simplify system management
• Provide location-independent message communication
• Help track the locations of all resources so that system operators no longer need to be concerned with their physical location
• Provide transparent process migration and load balancing across nodes
• Improve system response time and performance

High Speed Networks

The network is the most critical part of a cluster. Its capabilities and performance directly influence the applicability of the whole system for HPC. Cluster interconnects range from Local/Wide Area Networks (LAN/WAN), like Fast Ethernet and ATM, to System Area Networks (SAN), like Myrinet and Memory Channel. For example, Fast Ethernet provides:

• 100 Mbps over UTP or fiber-optic cable
• MAC protocol: CSMA/CD

COMPONENTS OF A CLUSTER COMPUTER
1. Multiple High Performance Computers
   a. PCs
   b. Workstations
   c. SMPs (CLUMPS)
2. State-of-the-art Operating Systems
   a. Linux (Beowulf)
   b. Microsoft NT (Illinois HPVM)
   c. SUN Solaris (Berkeley NOW)
   d. HP UX (Illinois - PANDA)
   e. OS gluing layers (Berkeley GLUnix)
3. High Performance Networks/Switches
   a. Ethernet (10 Mbps)
   b. Fast Ethernet (100 Mbps)
   c. Gigabit Ethernet (1 Gbps)
   d. Myrinet (1.2 Gbps)
   e. Digital Memory Channel
   f. FDDI
4. Network Interface Cards
   a. Myrinet NIC
   b. User-level access support
5. Fast Communication Protocols and Services
   a. Active Messages (Berkeley)
   b. Fast Messages (Illinois)
   c. U-net (Cornell)
   d. XTP (Virginia)
6. Cluster Middleware
   a. Single System Image (SSI)
   b. System Availability (SA) Infrastructure
7. Hardware
   a. DEC Memory Channel, DSM (Alewife, DASH), SMP techniques
8. Operating System Kernel/Gluing Layers
   a. Solaris MC, Unixware, GLUnix
9. Applications and Subsystems
   a. Applications (system management and electronic forms)
   b. Runtime systems (software DSM, PFS, etc.)
   c. Resource management and scheduling software (RMS)
10. Parallel Programming Environments and Tools
   a. Threads (PCs, SMPs, NOW, ...)
   b. MPI
   c. PVM
   d. Software DSMs (Shmem)
   e. Compilers
   f. RAD (rapid application development) tools
   g. Debuggers
   h. Performance analysis tools
   i. Visualization tools
11. Applications
   a. Sequential
   b. Parallel/Distributed (cluster-aware applications)

CLUSTER CLASSIFICATIONS
Clusters are classified into several categories based on factors such as 1) application target, 2) node ownership, 3) node hardware, 4) node operating system, and 5) node configuration.

Clusters based on Application Target are classified into two:
• High Performance (HP) Clusters
• High Availability (HA) Clusters

Clusters based on Node Ownership are classified into two:
• Dedicated clusters
• Non-dedicated clusters

Clusters based on Node Hardware are classified into three:
• Clusters of PCs (CoPs)
• Clusters of Workstations (COWs)
• Clusters of SMPs (CLUMPs)

Clusters based on Node Operating System are classified into:
• Linux Clusters (e.g., Beowulf)
• Solaris Clusters (e.g., Berkeley NOW)
• Digital VMS Clusters
• HP-UX Clusters
• Microsoft Wolfpack Clusters

Clusters based on Node Configuration are classified into:
• Homogeneous Clusters - all nodes have similar architectures and run the same OS
• Heterogeneous Clusters - nodes have different architectures and run different OSs

ISSUES TO BE CONSIDERED
Cluster Networking

If you are mixing hardware that has different networking technologies, there will be large differences in the speed with which data will be accessed and with which individual nodes can communicate. If it is in your budget, make sure that all of the machines you want to include in your cluster have similar networking capabilities and, if at all possible, network adapters from the same manufacturer.

Cluster Software
You will have to build versions of clustering software for each kind of
system you include in your cluster.
Programming

Our code will have to be written to support the lowest common denominator for data types supported by the least powerful node in our cluster. With mixed machines, the more powerful machines will have capabilities that cannot be matched by the less powerful ones.
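
One common way to stay within that lowest common denominator, shown in the hypothetical C sketch below (not part of the original text), is to exchange only fixed-width integer types in network byte order, so that nodes with different word sizes and endianness decode the same bytes identically.

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* htonl()/ntohl(): host <-> network byte order */

    /* Pack a 32-bit value into a buffer in network byte order before sending
       it to another node; any node, regardless of architecture, can decode it. */
    static void pack_u32(unsigned char *buf, uint32_t value)
    {
        uint32_t net = htonl(value);
        memcpy(buf, &net, sizeof net);
    }

    static uint32_t unpack_u32(const unsigned char *buf)
    {
        uint32_t net;
        memcpy(&net, buf, sizeof net);
        return ntohl(net);
    }

    int main(void)
    {
        unsigned char wire[4];
        pack_u32(wire, 123456u);
        return unpack_u32(wire) == 123456u ? 0 : 1;   /* round-trip check */
    }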

Timing
This is the most problematic aspect of a heterogeneous cluster. Since these machines have different performance profiles, our code will execute at different rates on the different kinds of nodes. This can cause serious bottlenecks if a process on one node is waiting for the results of a calculation on a slower node. The second kind of heterogeneous cluster is made from different machines in the same architectural family: e.g. a collection of Intel boxes where the machines are different generations, or machines of the same generation from different manufacturers.

Network Selection

There are a number of different kinds of network topologies, including buses, cubes of various degrees, and grids/meshes. These network topologies will be implemented using one or more network interface cards, or NICs, installed in the head node and compute nodes of our cluster.

Speed Selection

No matter what topology you choose for your cluster, you will want to get the fastest network that your budget allows. Fortunately, the availability of high speed computers has also forced the development of high speed networking systems. Examples are 10 Mbit Ethernet, 100 Mbit Ethernet, gigabit networking, channel bonding, etc.

FUTURE TRENDS - GRID COMPUTING

As computer networks become cheaper and faster, a new computing paradigm, called the Grid, has evolved. The Grid is a large system of computing resources that performs tasks and provides users a single point of access, commonly based on a World Wide Web interface, to these distributed resources. Users consider the Grid a single computational resource. Resource management software, frequently referred to as middleware, accepts jobs submitted by users and schedules them for execution on appropriate systems in the Grid, based upon resource management policies. Users can submit thousands of jobs at a time without being concerned about where they run. The Grid may scale from single systems to supercomputer-class compute farms that utilize thousands of processors. Depending on the type of application, the interconnection between the Grid parts can be made using dedicated high-speed networks or the Internet. By providing scalable, secure, high-performance mechanisms for discovering and negotiating access to remote resources, the Grid promises to make it possible for scientific collaborations to share resources on an unprecedented scale, and for geographically distributed groups to work together in ways that were previously impossible. Examples of new applications that benefit from Grid technology include the coupling of advanced scientific instrumentation or desktop computers with remote supercomputers; collaborative design of complex systems via high-bandwidth access to shared resources; ultra-large virtual supercomputers constructed to solve problems too large to fit on any single computer; and rapid, large-scale parametric studies.

Grid technology is currently under intensive development. Major Grid projects include NASA’s Information Power Grid, two NSF Grid projects (the NCSA Alliance’s Virtual Machine Room and NPACI), the European DataGrid Project, and the ASCI Distributed Resource Management project. The first Grid tools are already available for developers. The Globus Toolkit [20] is one such example and includes a set of services and software libraries to support Grids and Grid applications.

CONCLUSION
Clusters are promising:
• They solve the parallel processing paradox.
• They offer incremental growth and match funding patterns.
• New trends in hardware and software technologies are likely to make clusters even more promising and to fill the SSI gap.
• Cluster-based supercomputers (Linux-based clusters) can be seen everywhere!
