Introduction:
History:
SAGE was a cluster system built for NORAD under an Air Force contract by IBM in
the 1950s, based on the MIT Whirlwind computer architecture. SAGE consisted of a number
of separate stand-alone systems cooperating to manage early-warning detection of hostile
airborne intrusion. Breakthroughs in enabling technologies in the late 1970s, in both
hardware and software, had a significant long-term effect on cluster computing. The
first commodity clustering product was ARCnet, developed by Datapoint in 1977.
Why clusters?
The first attraction of clusters is that they are inexpensive yet powerful. A
traditional supercomputer in continuous business use needs a data center to maintain and
upgrade it, which is a costly and tedious task. Clusters, on the other hand, are cheap and
easy to assemble: readily available commodity components can be combined into a single
supercomputer-class system, so there is no need to procure a purpose-built machine.
An application may desire more computational power for many reasons, but the
following three are the most common:
Real-time constraints: That is, a requirement that the computation finish
within a certain period of time. Weather forecasting is an example. Another is
processing data produced by an experiment; the data must be processed (or
stored) at least as fast as it is produced.
Throughput: A scientific or engineering simulation may require many
computations. A cluster can provide the resources to process many related
simulations. An example of using a Linux Beowulf cluster for throughput is
Google [13], which uses over 15,000 commodity PCs with fault-tolerant
software to provide a high-performance Web search service.
Memory: Some of the most challenging applications require huge amounts of
data as part of the simulation.
The most important factor behind the wide growth of clusters is that they have
significantly reduced the cost of processing power. One indication of this is the Gordon Bell
award won by the Avalon cluster at Los Alamos National Laboratory.
Architecture:
A cluster is a type of parallel or distributed processing system consisting of a
collection of interconnected stand-alone computers that cooperatively work together as a
single computing resource.
All the PCs, also referred to as nodes, are connected through a network over which
the tasks are performed. On top of the nodes sit the parallel programming environment and
the applications to be carried out. Clusters with a special network
such as Myrinet [18] use a communication protocol such as Active Messages [19] for fast
communication among their nodes. This hardware interface bypasses the operating system and
provides direct user-level access to the network interface, thus removing the critical
communication overheads. Sequential applications, by contrast, are tasks that are
performed one after the other.
In between, the cluster middleware helps to improve the management and utilization
of the system. Middleware such as PVM or similar technologies enables parallel
processing and monitors software and hardware deployment.
Programming environments can offer portable, efficient, and easy-to-use tools for
developing applications. Such environments include tools and utilities such as compilers,
message passing libraries, debuggers, and profilers. A cluster can execute both parallel and
sequential applications.
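The master/worker message-passing style that middleware such as PVM provides can be sketched in miniature. The following is only an illustration, using Python's standard multiprocessing module, with local processes standing in for cluster nodes; all function and variable names are invented for the example:

```python
# Minimal master/worker sketch: the master distributes task "messages" and
# gathers result "messages", mimicking a message-passing middleware layer.
from multiprocessing import Process, Queue

def worker(rank, tasks, results):
    """Each 'node' pulls messages from the task queue and replies with results."""
    while True:
        msg = tasks.get()
        if msg is None:                 # sentinel: no more work for this node
            break
        results.put((rank, msg * msg))  # the "computation": square the input

def run_master(n_workers=3, n_tasks=8):
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker, args=(r, tasks, results))
             for r in range(n_workers)]
    for p in procs:
        p.start()
    for t in range(n_tasks):            # scatter the work
        tasks.put(t)
    for _ in procs:                     # one shutdown sentinel per worker
        tasks.put(None)
    out = sorted(results.get()[1] for _ in range(n_tasks))  # gather results
    for p in procs:
        p.join()
    return out

if __name__ == "__main__":
    print(run_master(3, 8))  # squares of 0..7, sorted
```

A real middleware layer adds much more (node discovery, fault notification, heterogeneous data conversion), but the scatter/compute/gather message flow is the same.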
A node:
a single- or multiprocessor system with memory, I/O facilities, and an OS
A cluster:
generally two or more nodes connected together
in a single cabinet, or physically separated and connected via a LAN
appears as a single system to users and applications
provides a cost-effective way to gain features and benefits
Working of a cluster:
Clusters work primarily through a mechanism called polling. Polling is an activity
similar to pinging, where a message is sent to a target device and the results are examined. If
a successful result is received then the polling was successful; otherwise, it is determined to
have a problem. All servers participating in the clustered configuration keep polling each
other to determine whether each one is working. Polling relies on the cluster
interconnect system, the LAN and the device controllers.
The following figure illustrates a two-node cluster configuration with the cluster
interconnect, the LAN, and the shared SCSI bus.
The cluster interconnect is used for polling to determine whether the other node is
available, while the SCSI bus is used by the system for shared storage. The interconnect
device polls the other node in the cluster to see if it is available. If the node, or server,
does not respond within an internally specified time, the polling node tries to reach it
through the LAN. The failure or success of this second poll indicates whether the problem
lies with the target node or with the interconnect device.
If the polling node determines via the LAN polling mechanism that the other node in the
cluster is offline, it takes over that node's disk resources in order to keep them available to
the network. When the disabled node comes back online, the disk resources that were taken
over by the polling node are returned to it. This happens automatically and is part of the
failover feature of a clustered configuration.
If a failover occurs, the following conditions will have been met:
The interconnect poll must fail.
The LAN poll must fail.
The polling node must succeed in taking control of the missing cluster node's
disk resources.
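The failover decision above can be sketched as a small piece of logic. This is not a real cluster API; the three callables are hypothetical stand-ins for the interconnect poll, the LAN poll, and the disk takeover:

```python
# Sketch of the failover decision: poll the peer over the interconnect first,
# fall back to the LAN, and take over its disks only when both polls fail.
def decide_failover(interconnect_poll, lan_poll, take_over_disks):
    """Each argument is a callable returning True on success (illustrative)."""
    if interconnect_poll():
        return "peer alive"          # normal case: interconnect poll succeeded
    if lan_poll():
        return "interconnect fault"  # peer reachable via LAN: blame the interconnect
    if take_over_disks():
        return "failover complete"   # both polls failed: claim the disk resources
    return "failover failed"

# Example: the peer is down on both paths and the disk takeover succeeds.
print(decide_failover(lambda: False, lambda: False, lambda: True))
# prints "failover complete"
```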
Components of a cluster:
The main components of a cluster are the personal computers and the interconnection
network. The computers can be built from commercial off-the-shelf (COTS) components and
are available economically.
The cluster mainly consists of four major parts:
1. Network
2. Compute nodes
3. Master server
4. Gateway
Each part has a specific function that is needed for the hardware to perform its role.
Network: Provides communication between the nodes, the server, and the gateway. It
consists of a fast Ethernet switch, cables, and other networking hardware.
Compute nodes: Serve as the processors of the cluster. The nodes are interchangeable,
with no functional differences between them, and comprise all computers in the
cluster other than the gateway and the server.
Master server: Provides network services to the cluster, runs the parallel
programs, and spawns processes on the nodes; it has only modest hardware requirements.
Gateway: Acts as a bridge/firewall between the outside world and the cluster, and should
have two Ethernet cards.
Classification of clusters:
There are three types of clusters:
High performance clusters
Load balancing clusters
High availability clusters
High performance clusters:
High performance cluster computing scales up computing performance: the
computational task to be solved is divided into several processing steps or subtasks,
and at the end of the process the results are put back together. The coordination of
this large number of processes is organized by a management node, and the final result
is then stored on a result node. The failure of a compute node mostly leads to a
decrease in overall performance; the computation itself is not in jeopardy. These
clusters use commodity systems, such as a group of single- or dual-processor PCs
linked via high-speed connections and communicating over a common messaging layer,
to run parallel applications.
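The divide/compute/gather pattern described above can be sketched in miniature, with a local process pool standing in for the compute nodes; the function names and the sum-of-squares workload are invented for the example:

```python
# Sketch of an HPC cluster's work flow: the "management node" splits a task
# into subtasks, "compute nodes" (here a process pool) work in parallel, and
# the partial results are gathered back together.
from concurrent.futures import ProcessPoolExecutor

def subtask(chunk):
    """One compute node's share of the work: sum the squares of its chunk."""
    return sum(x * x for x in chunk)

def cluster_sum_of_squares(n, n_nodes=4):
    data = list(range(n))
    chunks = [data[i::n_nodes] for i in range(n_nodes)]  # divide into subtasks
    with ProcessPoolExecutor(max_workers=n_nodes) as pool:
        partials = pool.map(subtask, chunks)             # run on the "nodes"
    return sum(partials)                                 # gather the results

if __name__ == "__main__":
    print(cluster_sum_of_squares(10))  # 0^2 + 1^2 + ... + 9^2 = 285
```

Note that if one "node" here died, the pool would lose its partial sum; real HPC schedulers reassign failed subtasks, which is why a node failure degrades performance rather than the result.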
Load balancing clusters:
If one node goes out of service, the whole system will still work. The load
balancer recognizes the failed node and marks the crashed system. The total
performance of the load balancing cluster is reduced, but services are still
provided.
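The behaviour above can be sketched as a tiny round-robin balancer that marks crashed nodes and skips them; the class and node names are purely illustrative:

```python
# Sketch of a load balancer: requests rotate over the nodes, and a node that
# fails its health check is marked down and skipped while service continues.
class LoadBalancer:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.alive = {n: True for n in self.nodes}
        self._i = 0

    def mark_down(self, node):
        self.alive[node] = False          # health check failed: mark the node

    def next_node(self):
        """Return the next live node in round-robin order."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._i % len(self.nodes)]
            self._i += 1
            if self.alive[node]:
                return node
        raise RuntimeError("no nodes available")

lb = LoadBalancer(["node1", "node2", "node3"])
lb.mark_down("node2")                      # node2 crashes...
print([lb.next_node() for _ in range(4)])  # ...but requests keep being served
# prints ['node1', 'node3', 'node1', 'node3']
```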
High availability clusters:
A high-availability cluster has (at minimum) two nodes, a primary and a
backup, with the backup on standby until something happens to the primary. The
actual design and implementation of a high-availability cluster is a complex
undertaking involving both software design and hardware architecture.
An HA cluster must be designed so that no data or work is lost when
something happens to the primary node, and so that switching over to the backup is
automatic and effortless. This means that the data on the backup should be constantly
updated in real time.
Even with two nodes, there is still a risk of failure, especially if both
nodes are in the same location; this is why further redundancy is added.
Issues to be considered:
Cluster networking:
All the systems to be connected in a cluster should use the same network
technology, with network adapters from the same manufacturer, for easy
communication.
Cluster software:
You will have to build versions of the clustering software for each kind of system
you include in your cluster.
Programming:
Your code will have to be written to support the lowest common denominator
of data types supported by the least powerful node in your cluster.
Timing:
Timing is an important aspect of a cluster. Since every machine runs its own
code when executing a task, a serious problem can arise when one node is left
waiting for the result from another node.
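One common way to cope with the waiting problem is a bounded wait: a node waits for the peer's result only up to a deadline and then recovers. A minimal sketch, using threads to stand in for two nodes (the names and the sleep duration are invented for the example):

```python
# Sketch of the timing problem: one node waits for another's result and,
# without a timeout, would block forever if the peer is slow or has failed.
import queue
import threading
import time

def slow_node(results):
    time.sleep(0.2)   # this node runs its own code at its own pace
    results.put(42)   # eventually produces its result

results = queue.Queue()
threading.Thread(target=slow_node, args=(results,), daemon=True).start()

try:
    value = results.get(timeout=1.0)  # bounded wait instead of blocking forever
except queue.Empty:
    value = None                      # peer missed the deadline; recover

print(value)
```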
Network selection:
The network topology is implemented with one or more network interface
cards (NICs) installed in the head node and compute nodes of your
cluster.
Speed selection:
No matter what topology you choose for your cluster, you will want to get the
fastest network that your budget allows.
Implementation:
The TOP 500 organization's semi-annual list of the 500 fastest computers usually
includes many clusters.
GridMathematica - computer algebra and 3D visualization.
High-powered gaming.
The 28th most powerful supercomputer on Earth as of June 2006 was a 12.25 TFlops
computer cluster of 1100 Apple Xserve G5 2.3 GHz dual-processor machines
(4 GB RAM, 80 GB SATA HD) running Mac OS X and using an InfiniBand interconnect.
Cluster Applications:
Cluster computing is rapidly becoming the architecture of choice in Grand Challenge
Applications.
These applications involve a high scale of complexity in processing time, memory
space, and communication bandwidth. Typical cluster applications include:
Scientific computing
Movie making (rendering)
Commercial servers (web, database, etc.)
Drawbacks:
Clusters are poorer than traditional supercomputers at non-parallel computation.
Dependency on the cluster head: if the cluster head goes down, the whole setup
goes down.
Adding an additional node is expensive.
References:
http://www.scribd.com
http://en.wikipedia.org/wiki/Computer_cluster
http://www.buyya.com/
http://www.seminarsonly.com
http://www.slideshare.net/cluster-computing
http://flylib.com/books/en/3.396.1.17/1/
http://www.ccgrid.org