
CLUSTER COMPUTING

Introduction:

Computing is an evolutionary process. Across five generations of development, computing has improved in areas such as software, architecture, and data representation, and computing requirements have grown with it. There is therefore a need for faster and more cost-effective computer systems, and parallel and distributed computing is the solution. Today a wide range of applications demand higher computing power and faster execution. Cluster computing is defined as a type of parallel or distributed processing system which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource. These computers are linked together using high-speed network interfaces, and the actual binding together of all the individual computers in the cluster is performed by the operating system and the software used.

Clusters are built using commodity-off-the-shelf (COTS) hardware components and are used in many commercial applications. Computer clusters have a wide range of applicability and deployment, ranging from small business clusters with a handful of nodes to some of the fastest supercomputers in the world, such as the K computer. Cluster computing technology has been improving in several dimensions across many system vendors. It can achieve a very high level of availability, which is more than adequate for the vast majority of critical application requirements; it provides scalability as an alternative to extending or upgrading a single system; and, from many vendors, older installed systems can be coupled into the cluster, providing an economical route to increased compute power and high availability. These business drivers are powerful and make cluster computing attractive for a wide range of computing needs.

History:
SAGE was a cluster system built for NORAD under an Air Force contract by IBM in
the 1950s based on the MIT Whirlwind computer architecture. SAGE consisted of a number
of separate stand-alone systems cooperating to manage early warning detection of hostile
airborne intrusion. Breakthroughs in enabling technologies in the late 1970s, both in
hardware and software, had a significant long-term effect on future cluster computing. The
first commodity clustering product was ARCnet, developed by Datapoint in 1977. In the
1980s, a collection of 160 interconnected Apollo workstations was employed by the NSA as a
cluster to perform certain computational tasks.
An important milestone in the application of the message passing model is PVM (Parallel Virtual Machine). PVM is a library of linkable functions that allows routines running on separate but networked computers to exchange data and coordinate their operation. The history of cluster computing is closely linked with the evolution of networking technology, which has become steadily cheaper and faster. Microsoft, Sun Microsystems, and other leading hardware and software companies offer cluster packages.

Why clusters?

The first point is that clusters are inexpensive and powerful. Supercomputers are now used in many business applications, but when such machines are in continuous use, a data center is needed to maintain and upgrade them, which is a tedious task. Clusters, on the other hand, are very cheap and easy to combine into a single supercomputing resource, which sometimes makes them more advantageous than supercomputers: the distinction is that a cluster can be built easily from available components, so there is no need to build a new machine from scratch.
An application may desire more computational power for many reasons, but the
following three are the most common:
Real-time constraints: That is, a requirement that the computation finish
within a certain period of time. Weather forecasting is an example. Another is
processing data produced by an experiment; the data must be processed (or
stored) at least as fast as it is produced.
Throughput: A scientific or engineering simulation may require many
computations. A cluster can provide the resources to process many related
simulations. An example of using a Linux Beowulf cluster for throughput is
Google [13], which uses over 15,000 commodity PCs with fault-tolerant
software to provide a high-performance Web search service.
Memory: Some of the most challenging applications require huge amounts of
data as part of the simulation.
The most important reason for the wide growth of clusters is that they have significantly reduced the cost of processing power. One indication of this is the Gordon Bell Prize awarded for the Avalon cluster at Los Alamos National Laboratory.

Architecture:
A cluster is a type of parallel or distributed processing system which consists of a collection of interconnected stand-alone computers cooperatively working together as a single resource.

All the PCs, also referred to as nodes, are connected through a network over which the tasks are performed. Above the nodes sit the parallel programming environment and the applications that are to be carried out. Clusters with a special network such as Myrinet [18] use a communication protocol such as Active Messages [19] for fast communication among their nodes. This hardware interface bypasses the operating system and provides direct user-level access to the network interface, thus removing the critical communication overheads. Sequential applications are tasks that are performed one after the other.

In between, the cluster middleware helps to improve the management and utilization of the system. Cluster middleware such as PVM or similar technologies enables parallel processing and allows the software and hardware deployment to be monitored.

Programming environments can offer portable, efficient, and easy-to-use tools for
developing applications. Such environments include tools and utilities such as compilers,
message passing libraries, debuggers, and profilers. A cluster can execute both parallel and
sequential applications.
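As an illustration of the message-passing style supported by such libraries, the following minimal sketch uses the mpi4py Python bindings (an assumption; the text does not prescribe a particular library), with a master process sending small work items to worker processes and collecting their replies. The file name, tags, and payloads are purely illustrative, and the script would typically be launched across the nodes with an MPI launcher such as mpirun.

# minimal_mpi_demo.py -- hypothetical example; run with e.g. `mpirun -np 4 python minimal_mpi_demo.py`
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()          # this process's id within the cluster job
size = comm.Get_size()          # total number of cooperating processes

if rank == 0:
    # The master process sends a small work description to every worker...
    for worker in range(1, size):
        comm.send({"task_id": worker, "payload": worker * 10}, dest=worker, tag=0)
    # ...and then collects one result per worker.
    results = [comm.recv(source=w, tag=1) for w in range(1, size)]
    print("master collected:", results)
else:
    # Each worker receives its task, does some work, and replies to the master.
    task = comm.recv(source=0, tag=0)
    comm.send(task["payload"] ** 2, dest=0, tag=1)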

A node:
a single or multiprocessor system with memory, I/O facilities, and an OS
A cluster:
generally two or more computers (nodes) connected together
in a single cabinet, or physically separated and connected via a LAN
appears as a single system to its users and applications
provides a cost-effective way to gain features and benefits.

Working of a cluster:
Clusters work primarily through a mechanism called polling. Polling is an activity similar to pinging: a message is sent to a target device and the result is examined. If a successful response is received, the polling was successful; otherwise, the target is determined to have a problem. All servers participating in the clustered configuration keep polling each other to determine whether each one is working. Polling relies on the cluster interconnect system, the LAN, and the device controllers.
A typical two-node cluster configuration includes the cluster interconnect, the shared SCSI bus, and the LAN.

The cluster interconnect is used for polling to determine whether the other node is available, while the SCSI bus is used by the system for its shared disk resources. The interconnect device polls the other node in the cluster to see whether the node is available. If the node, or server, does not respond within an internally specified time, the polling node tries to reach the node through the LAN. Failure or success then indicates whether the problem lies with the target node or with the interconnect device.
If the polling node determines, through the LAN polling mechanism, that the other node in the cluster is offline, it will take over that node's disk resources in order to keep them available to the network. When the disabled node comes back online, all the disk resources that were taken over by the polling node are returned to it. This is an automatic process and is part of the failover feature of a clustered configuration.
For a failover to occur, the following conditions must be met:
The interconnect poll must fail.
The LAN poll must fail.
The polling node must succeed in taking control of the missing cluster node's disk resources.
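A minimal sketch of this two-stage polling logic in Python is shown below. The helper functions poll_over_interconnect, poll_over_lan, and take_over_disk_resources are hypothetical placeholders for whatever mechanisms the cluster software actually provides; only the decision flow follows the description above.

import time

POLL_INTERVAL = 5   # seconds between polls (illustrative value)

def poll_over_interconnect(peer):
    """Placeholder: poll the peer over the dedicated cluster interconnect."""
    raise NotImplementedError   # supplied by the real cluster software

def poll_over_lan(peer):
    """Placeholder: poll the peer over the ordinary LAN."""
    raise NotImplementedError

def take_over_disk_resources(peer):
    """Placeholder: claim the failed peer's disk resources."""
    raise NotImplementedError

def monitor_peer(peer):
    """Poll over the interconnect first, then the LAN, then fail over."""
    while True:
        if poll_over_interconnect(peer):
            time.sleep(POLL_INTERVAL)   # peer is healthy; poll again later
            continue
        if poll_over_lan(peer):
            # The peer answers on the LAN, so the fault lies with the
            # interconnect device, not the node itself; no failover occurs.
            time.sleep(POLL_INTERVAL)
            continue
        # Both polls failed: declare the peer offline and take over its disks.
        take_over_disk_resources(peer)
        break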

Handling of a failover cluster:


Once a failure has been detected, the cluster manager reconfigures the remaining nodes to form a new configuration. During this process, the remaining nodes drop the failed node from the node/member list, and the activities of checking, load balancing, processing, and so on continue among the reconfigured cluster group.
After the reconfiguration process has completed, the cluster manager starts the recovery process. During recovery, all user and system activities are brought to a new state that accounts for the failed instance; this includes removing or completing any activities left incomplete by the processes running on the failed node.
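A minimal sketch of the reconfiguration step follows, assuming each node keeps a simple member list and a map of the work assigned to each member; the data structures and the node/task names are invented for illustration and do not correspond to any real cluster manager's API.

def reconfigure(members, assignments, failed_node):
    """Drop the failed node and hand its unfinished work to the survivors."""
    members = [m for m in members if m != failed_node]
    orphaned = assignments.pop(failed_node, [])
    # Redistribute the failed node's incomplete tasks round-robin.
    for i, task in enumerate(orphaned):
        assignments[members[i % len(members)]].append(task)
    return members, assignments

# Example: node "n2" fails while holding two unfinished tasks.
members = ["n1", "n2", "n3"]
assignments = {"n1": ["t1"], "n2": ["t2", "t3"], "n3": ["t4"]}
members, assignments = reconfigure(members, assignments, "n2")
print(members)       # ['n1', 'n3']
print(assignments)   # {'n1': ['t1', 't2'], 'n3': ['t4', 't3']}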

Components of a cluster:
The main components of a cluster are the personal computers and the interconnection network. The computers can be built out of commodity-off-the-shelf (COTS) components and are available economically.
The cluster mainly consists of 4 major parts. They are:
1. Network,
2. Compute nodes
3. Master server
4. Gateway.
Each part has a specific function that is needed for the hardware to perform its function.
Network: Provides communication between the nodes, the server, and the gateway. It consists of a fast Ethernet switch, cables, and other networking hardware.
Compute Nodes: Serve as the processors of the cluster. Each node is interchangeable; there are no functionality differences between nodes, and the compute nodes comprise all computers in the cluster other than the gateway and the server.
Master Server: Provides network services to the cluster, actually runs the parallel programs, and spawns processes on the nodes; it has only minimal hardware requirements.
Gateway: Acts as a bridge/firewall between the outside world and the cluster, and should have two Ethernet cards.

The interconnection network can be an ATM (Asynchronous Transfer Mode) ring, which guarantees a fast and effective connection, or a Fast Ethernet connection, which is commonly available now. Gigabit Ethernet, which provides speeds up to 1000 Mbps, or Myrinet, a commercial interconnection network with high speed and reduced latency, are also viable options.

For high-end scientific clustering, there are a variety of network interface cards designed specifically for clustering. These include Myricom's Myrinet, Giganet's cLAN, and the IEEE 1596 standard Scalable Coherent Interface (SCI).

Myricom
Myricom offers cards and switches that interconnect at speeds of up to 1.28 Gbps in each direction. The cards come in two forms, copper-based and optical.
Giganet
Giganet is the first vendor of Virtual Interface (VI) architecture cards for the
Linux platform, in their cLAN cards and switches. It uses its own network
communications protocol rather than IP to exchange data directly between the servers.
Giganet's products can currently offer 1 Gbps unidirectional communications between
the nodes at minimum latencies of 7 microseconds.
IEEE SCI
The IEEE standard SCI has even lower latencies (under 2.5 microseconds),
and it can run at 400 MB per second (3.2 Gbps) in each direction. SCI is a ring-topology-based networking system.

Classification of clusters:
There are three types of clusters:
High Performance clusters
Load balancing clusters
High availability clusters.
High performance clusters:
In high performance cluster computing, computing performance is scaled by dividing the computational task to be solved into several processing steps or subtasks and bringing the results back together at the end of the process. The coordination of this large number of processes is organized by a management node, and the result is then stored on a result node. The failure of a compute node mostly leads only to a decrease in overall performance; the process as a whole is not in jeopardy. These clusters use commodity systems, such as a group of single- or dual-processor PCs linked via high-speed connections and communicating over a common messaging layer, to run parallel applications.
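The divide/compute/collect pattern described above can be sketched with a scatter/gather operation. The example below again assumes the mpi4py bindings and an MPI launcher; the work itself (summing chunks of integers) is only an illustration of a task split into subtasks whose results are brought back together.

# split_and_sum.py -- hypothetical example; run with e.g. `mpirun -np 4 python split_and_sum.py`
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    data = list(range(1_000_000))
    chunks = [data[i::size] for i in range(size)]   # one subtask per process
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)    # each node receives its subtask
partial = sum(chunk)                    # each node processes its own piece
totals = comm.gather(partial, root=0)   # partial results are brought back together

if rank == 0:
    print("total =", sum(totals))       # the management node assembles the final result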

Load balancing clusters:


Load-balancing clusters provide a more practical system for business needs. As the name implies, such a system entails sharing the processing load as evenly as possible across a cluster of computers. That load could be an application processing load or a network traffic load that needs to be balanced. Each node can handle part of the load, and the load can be dynamically reassigned between the nodes to balance it out. The same holds for network traffic.
A very important point is the quality of the load balancer. In this case, quality means the ability of the system to make an accurate forecast about which node will offer the best performance for an individual application.

If one node goes out of service, the whole system will still work. The load balancer will recognize the failed node and mark the crashed system as unavailable. The total performance of the load-balancing cluster will be reduced, but services will still be provided.
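A minimal sketch of the load-balancing decision described above: requests are routed to the live node predicted to offer the best performance, here approximated simply as the node with the lowest current load estimate, while nodes marked as crashed are skipped. The node names and load values are invented for illustration.

# Current load estimates per node and the set of nodes marked as failed.
nodes = {"node1": 0.35, "node2": 0.80, "node3": 0.10}
failed = {"node2"}

def pick_node(nodes, failed):
    """Return the live node with the lowest estimated load."""
    candidates = {name: load for name, load in nodes.items() if name not in failed}
    if not candidates:
        raise RuntimeError("no live nodes available")
    return min(candidates, key=candidates.get)

print(pick_node(nodes, failed))   # -> node3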

High availability clusters:


A single server by itself is a single point of failure. If something causes the server to shut down (a power failure, a software glitch, a virus attack, a natural disaster), the whole network shuts down and remains non-operational until the problem is solved, the server is restarted, and the whole network is brought up again.

A high-availability cluster will have (at minimum) two nodes, a primary and a backup, with the backup on standby until something happens to the primary. However, the actual design and implementation of a high-availability cluster is a complex undertaking involving both software design and hardware architecture.
An HA cluster must be designed in such a way that there will be no loss of data or work when something happens to the primary node, and so that switching over to the backup is automatic and effortless. This means that the data on the backup should be constantly updated in real time.
Even with two nodes in the cluster, there is still a risk of failure (particularly if both nodes are at the same location), which leads to the need for further redundancy.

Issues to be considered:
Cluster networking:
All the systems to be connected in a cluster should use the same network technology, with network adapters from the same manufacturer, for easy communication.
Cluster software:
You will have to build versions of clustering software for each kind of system
you include in your cluster.
Programming:
Your code will have to be written to support the lowest common denominator of data types supported by the least powerful node in your cluster.
Timing:
Timing is an important aspect of a cluster. Since each machine runs its own code when executing a task, a serious problem can arise when one node is left waiting for the result from another node.
Network selection:
The network topology will be implemented using one or more network interface cards (NICs) installed in the head node and compute nodes of your cluster.
Speed selection:
No matter what topology you choose for your cluster, you will want the fastest network that your budget allows.

Implementation:

The TOP 500 organization's semi-annual list of the 500 fastest computers usually
includes many clusters.
GridMathematica - computer algebra and 3D visualization.
High-powered gaming.
The 28th most powerful supercomputer on Earth as of June 2006 was a 12.25 TFlops computer cluster of 1100 Apple Xserve G5 2.3 GHz dual-processor machines (4 GB RAM, 80 GB SATA HD) running Mac OS X and using an InfiniBand interconnect.

Cluster Applications:
Cluster computing is rapidly becoming the architecture of choice for Grand Challenge Applications, which involve a high scale of complexity in processing time, memory space, and communication bandwidth. Typical application areas include:
Scientific computing.
Movie making (rendering).
Commercial servers (web, database, etc.).

Drawbacks:
Clusters are poorer than traditional supercomputers at non-parallel computation.
Dependency on the cluster head
- If the cluster head goes down, the whole setup goes down.
Adding an additional node can be expensive.

References:

http://www.scribd.com
http://en.wikipedia.org/wiki/Computer_cluster
www.buyya.com/
http://www.seminarsonly.com
http://www.slideshare.net/cluster-computing
http://flylib.com/books/en/3.396.1.17/1/
http://www.ccgrid.org

