GRID COMPUTING

INTRODUCTION:

Grid computing is emerging as a new paradigm for next-generation computing. It enables the
sharing, selection, and aggregation of geographically distributed, heterogeneous resources for
solving large-scale problems in science, engineering, and commerce. Availability, usage, and cost
policies for these resources vary depending on the particular user, time, priorities, and goals.
Grid computing also enables the regulation of supply and demand for resources.

WHAT IS GRID COMPUTING?

Grid computing (or the use of computational grids) is the combination of computer resources
from multiple administrative domains applied to a common task, usually a scientific, technical,
or business problem that requires a great number of computer processing cycles or the
processing of large amounts of data.
One of the main strategies of grid computing is using software to divide and apportion pieces of a
program among several computers, sometimes up to many thousands. Grid computing is
distributed, large-scale cluster computing, as well as a form of network-distributed parallel
processing. A grid may vary in size from small (confined to a network of computer
workstations within a corporation, for example) to large (a public collaboration across many
companies and networks). "The notion of a confined grid may also be known as intra-nodes
cooperation whilst the notion of a larger, wider grid may thus refer to inter-nodes
cooperation". This inter-/intra-nodes cooperation "across cyber-based collaborative organizations
are also known as Virtual Organizations".
It is a form of distributed computing whereby a “super and virtual computer” is composed of a
cluster of networked loosely coupled computers acting in concert to perform very large tasks. This
technology has been applied to computationally intensive scientific, mathematical, and academic
problems through volunteer computing, and it is used in commercial enterprises for such diverse
applications as drug discovery, economic forecasting, seismic analysis, and back-office data
processing in support of e-commerce and Web services.
What distinguishes grid computing from conventional cluster computing systems is that grids tend
to be more loosely coupled, heterogeneous, and geographically dispersed. Also, while a
computing grid may be dedicated to a specialized application, it is often constructed with the aid
of general-purpose grid software libraries and middleware.

In a basic grid computing system, every computer can access the resources of every other
computer belonging to the network.

What Can a Grid Do?

• Exploiting underutilized resources:

In most organizations, there are large amounts of underutilized computing resources. Most desktop
machines are busy less than 5 percent of the time. In some organizations, even the server machines
can often be relatively idle. Grid computing provides a framework for exploiting the underutilized
resources and thus has the possibility of substantially increasing the efficiency of resource usage.
Another function of the grid is to better balance resource utilization. An organization may have
occasional unexpected peaks of activity that demand more resources. If the applications are grid-
enabled, they can be moved to underutilized machines during such peaks. In fact, some grid
implementations can migrate partially completed jobs. In general, a grid can provide a consistent
way to balance the loads on a wider federation of resources. This applies to CPU, storage, and
many other kinds of resources that may be available on a grid.

• Parallel CPU capacity

The potential for massive parallel CPU capacity is one of the most attractive features of a grid.
In addition to pure scientific needs, such computing power is driving a new evolution in
industries such as the bio-medical field, financial modeling, oil exploration, motion picture
animation, and many others.
The common attribute among such uses is that the applications have been written to use
algorithms that can be partitioned into independently running parts. A CPU intensive grid
application can be thought of as many smaller “sub jobs,” each executing on a different
machine in the grid. The less these sub jobs need to communicate with each other, the more
“scalable” the application becomes. A perfectly scalable application will, for
example, finish 10 times faster if it uses 10 times the number of processors. Barriers often
exist to perfect scalability. The first barrier depends on the algorithms used for splitting the
application among many CPUs. If the algorithm can only be split into a limited number of
independently running parts, then that forms a scalability barrier. The second barrier appears
if the parts are not completely independent; this can cause contention, which can limit
scalability. For example, if all of the sub jobs need to read and write from one common file or
database, the access limits of that file or database will become the limiting factor in the
application’s scalability.
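
The effect of these barriers can be estimated with Amdahl's law, a standard model that the text above does not name but that matches its reasoning. The sketch below assumes that a fixed fraction of the work is strictly serial (for example, contended access to a shared file); the function name and the sample values are illustrative only.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work cannot be split.

    serial_fraction = 0.0 models the "perfectly scalable" application from the
    text; anything larger models a scalability barrier.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part runs at full cost; the parallel part is divided evenly.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    for fraction in (0.0, 0.05, 0.25):
        speedup = amdahl_speedup(fraction, processors=10)
        print(f"serial fraction {fraction:.2f}: speedup on 10 CPUs = {speedup:.2f}")
```

Under this model, no number of processors can push the speedup beyond 1 / serial_fraction, which is why a shared file or database can cap the benefit of adding grid machines.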

• Virtual resources and virtual organizations for collaboration:


Another important grid computing contribution is to enable and simplify collaboration among
a wider audience. In the past, distributed computing promised this collaboration and achieved
it to some extent. Grid computing takes these capabilities to an even wider audience, while
offering important standards that enable very heterogeneous systems to work together to form
the image of a large virtual computing system offering a variety of virtual resources. The
users of the grid can be organized dynamically into a number of virtual organizations, each
with different policy requirements. These virtual organizations can share their resources
collectively as a larger grid. Sharing starts with data in the form of files or databases. A “data
grid” can expand data capabilities in several ways. First, files or databases can seamlessly
span many systems and thus have larger capacities than on any single system. Such spanning
can improve data transfer rates through the use of striping techniques. Data can be duplicated
throughout the grid to serve as a backup and can be hosted on or near the machines most
likely to need the data, in conjunction with advanced scheduling techniques. Sharing is not
limited to files, but also includes many other resources, such as equipment, software, services,
licenses, and others. These resources are “virtualized” to give them a more uniform
interoperability among heterogeneous grid participants. The participants and users of the grid
can be members of several real and virtual organizations. The grid can help in enforcing
security rules among them and implement policies, which can resolve priorities for both
resources and users.
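
The striping idea mentioned above can be illustrated with a minimal sketch. It assumes a simple round-robin placement of fixed-size stripes across storage nodes; the function names and the in-memory lists standing in for nodes are hypothetical and do not correspond to any particular data grid product.

```python
def stripe(data: bytes, nodes: int, stripe_size: int) -> list:
    """Split `data` into fixed-size stripes and place them round-robin on nodes.

    Returns one list of stripes per node (an in-memory stand-in for storage).
    """
    if nodes < 1 or stripe_size < 1:
        raise ValueError("nodes and stripe_size must be positive")
    placement = [[] for _ in range(nodes)]
    for index, start in enumerate(range(0, len(data), stripe_size)):
        placement[index % nodes].append(data[start:start + stripe_size])
    return placement


def reassemble(placement: list) -> bytes:
    """Read the stripes back in their original order."""
    chunks = []
    rounds = max((len(node) for node in placement), default=0)
    for round_number in range(rounds):
        for node in placement:
            if round_number < len(node):
                chunks.append(node[round_number])
    return b"".join(chunks)
```

Because consecutive stripes live on different nodes, a reader could fetch several of them at the same time, which is where the improved transfer rate would come from.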

• Access to additional resources


In addition to CPU and storage resources, a grid can provide access to increased quantities of
other resources and to special equipment, software, licenses, and other services. The
additional resources can be provided in additional numbers and/or capacity. For example, if a
user needs to increase his total bandwidth to the Internet to implement a data mining search
engine, the work can be split among grid machines that have independent connections to the
Internet. In this way, the total searching capability is multiplied, since each machine has a
separate connection to the Internet. If the machines had shared the connection to the Internet,
there would not have been an effective increase in bandwidth. Some machines may have
expensive licensed software installed that the user requires. The user's jobs can be sent to such
machines, more fully exploiting the software licenses. Some machines on the grid may have
special devices. Most of us have used remote printers, perhaps with advanced color
capabilities or faster speeds. Similarly, a grid can be used to make use of other special
equipment. For example, a machine may have a high-speed, self-feeding DVD writer that
could be used to publish a quantity of data faster. Some machines on the grid may be
connected to scanning electron microscopes that can be operated remotely. In this case,
scheduling and reservation are important. A specimen could be sent in advance to the facility
hosting the microscope. Then the user can remotely operate the machine, changing
perspective views until the desired image is captured. The grid can enable more elaborate
access, potentially to remote medical diagnostic and robotic surgery tools with two-way
interaction from a distance. The variations are limited only by one’s imagination. Today, we
have remote device drivers for printers. Eventually, we will see standards for grid-enabled
device drivers to many unusual devices and resources. All of these will make the grid look
like a large virtual machine with a collection of virtual resources beyond what would be
available on just one conventional machine.

• Resource balancing
A grid federates a large number of resources contributed by individual machines into a greater
total virtual resource. For applications that are grid-enabled, the grid can offer a resource
balancing effect by scheduling grid jobs on machines with low utilization. This feature can
prove invaluable for handling occasional peak loads of activity in parts of a larger
organization.
This can happen in two ways:
• An unexpected peak can be routed to relatively idle machines in the grid.
• If the grid is already fully utilized, the lowest priority work being performed on the
grid can be temporarily suspended or even cancelled and performed again later to
make room for the higher priority work.
Without a grid infrastructure, such balancing decisions are difficult to prioritize and execute.
Occasionally, a project may suddenly rise in importance with a specific deadline. A grid
cannot perform a miracle and achieve a deadline when it is already too close. However, if the
size of the job is known, if it is a kind of job that can be sufficiently split into sub jobs, and if
enough resources are available after preempting lower priority work, a grid can bring a very
large amount of processing power to solve the problem. In such situations, a grid can, with
some planning, succeed in meeting a surprise deadline.
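
The two balancing behaviours described in this section (routing work to idle machines, and preempting the lowest priority work when the grid is full) can be sketched as follows. This is a simplified illustration under stated assumptions, not a real grid scheduler: the `Job` and `Machine` classes, the slot model, and the rule that a larger number means higher priority are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    name: str
    priority: int  # assumption: a larger number means more important


@dataclass
class Machine:
    name: str
    slots: int  # assumption: at least 1 job slot per machine
    running: List[Job] = field(default_factory=list)

    def utilization(self) -> float:
        return len(self.running) / self.slots


def place(job: Job, machines: List[Machine], deferred: List[Job]) -> Optional[str]:
    """Place `job`; return the chosen machine name, or None if it must wait."""
    # Case 1: route the job to the least utilized machine if it has room.
    target = min(machines, key=lambda m: m.utilization())
    if target.utilization() < 1.0:
        target.running.append(job)
        return target.name
    # Case 2: the grid is full, so look for the lowest priority running job.
    victim_machine, victim = min(
        ((m, j) for m in machines for j in m.running),
        key=lambda pair: pair[1].priority,
    )
    if victim.priority >= job.priority:
        deferred.append(job)  # nothing less important to preempt
        return None
    victim_machine.running.remove(victim)
    deferred.append(victim)  # suspended work is retried later
    victim_machine.running.append(job)
    return victim_machine.name
```

Jobs that end up in `deferred` would be resubmitted once capacity frees up; that retry loop is left out to keep the sketch short.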

• Reliability and Management


High-end conventional computing systems use expensive hardware to increase reliability.
They are built using chips with redundant circuits that vote on results, and contain much logic
to achieve graceful recovery from an assortment of hardware failures. The machines also use
duplicate processors with hot-plug capability so that when one fails, it can be replaced without
turning the other off. Power supplies and cooling systems are duplicated. The systems are
operated on special power sources that can start generators if utility power is interrupted. All
of this builds a reliable system, but at a great cost, due to the duplication of high-reliability
components.
The grid offers management of priorities among different projects. In the past, each project
may have been responsible for its own IT resource hardware and the expenses associated with
it. Often this hardware might be underutilized while another project finds itself in trouble,
needing more resources due to unexpected events. With the larger view a grid can offer, it
becomes easier to control and manage such situations.

Grid computing can be used in a variety of ways to address various kinds of application
requirements. Often, grids are categorized by the type of solutions that they best address. The
three primary types of grids are:
• Computational grid
A computational grid is focused on setting aside resources specifically for computing power.
In this type of grid, most of the machines are high-performance servers.
• Scavenging grid
A scavenging grid is most commonly used with large numbers of desktop machines.
Machines are scavenged for available CPU cycles and other resources. Owners of the desktop
machines are usually given control over when their resources are available to participate in the
grid.
• Data grid
A data grid is responsible for housing and providing access to data across multiple
organizations. Users are not concerned with where this data is located as long as they have
access to the data. For example, you may have two universities doing life science research,
each with unique data. A data grid would allow them to share their data, manage the data, and
manage security issues such as who has access to what data.
Another common distributed computing model that is often associated with or confused with
Grid computing is peer-to-peer computing. In fact, some consider this to be another form of
Grid computing.

ARCHITECTURE:

Grid Architecture and Components


The components that are necessary to form a Grid (shown in Figure) are as follows:

• Grid fabric:
This consists of all the globally distributed resources that are accessible from anywhere on the
Internet. These resources could be computers (such as PCs or Symmetric Multi-Processors)
running a variety of operating systems (such as UNIX or Windows), storage devices,
databases, and special scientific instruments such as a radio telescope or particular heat
sensor.
• Core Grid middleware:
This offers core services such as remote process management, co-allocation of resources,
storage access, information registration and discovery, security, and aspects of Quality of
Service (QoS) such as resource reservation and trading.

• User-level Grid middleware:


This includes application development environments, programming tools and resource
brokers for managing resources and scheduling application tasks for execution on global
resources.

• Grid applications and portals:


Grid applications are typically developed using Grid-enabled languages and utilities such as
HPC++ or MPI. An example application, such as parameter simulation or a grand-challenge
problem, would require computational power, access to remote data sets, and may need to
interact with scientific instruments. Grid portals offer Web-enabled application services,
where users can submit jobs to remote resources and collect their results through the Web.

WORKING OF GRID COMPUTING:

GRID COMPONENTS:
Depending on the grid design and its expected use, some of these components may or may not
be required, and in some cases they may be combined to form a hybrid component.
Portal/user interface
Just as a consumer sees the power grid as a receptacle in the wall, a grid user should
not see all of the complexities of the computing grid. A grid portal provides the interface for a
user to launch applications that will use the resources and services provided by the grid. From
this perspective, the user sees the grid as a virtual computing resource just as the consumer of
power sees the receptacle as an interface to a virtual generator.

Possible user view of a grid


Security:
A major requirement for Grid computing is security. At the base of any grid environment,
there must be mechanisms to provide security, including authentication, authorization, data
encryption, and so on. The Grid Security Infrastructure (GSI) component of the Globus
Toolkit provides robust security mechanisms. The GSI includes an OpenSSL implementation.
It also provides a single sign-on mechanism, so that once a user is authenticated, a proxy
certificate is created and used when performing actions within the grid. When designing your
grid environment, you may use the GSI sign-in to grant access to the portal, or you may have
your own security for the portal. The portal will then be responsible for signing in to the grid,
either using the user's credentials or using a generic set of credentials for all authorized users
of the portal.
Security in a grid environment

Broker:
Once authenticated, the user will be launching an application. Based on the application, and
possibly on other parameters provided by the user, the next step is to identify the available and
appropriate resources to use within the grid. This task could be carried out by a broker
function. Although there is no broker implementation provided by Globus, there is an LDAP-
based information service. This service is called the Grid Information Service (GIS), or more
commonly the Monitoring and Discovery Service (MDS). This service provides information
about the available resources within the grid and their status. A broker service could be
developed that utilizes MDS.

Broker service
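
A broker of the kind described above could be sketched as a filter over resource descriptions. The example below is a hypothetical illustration: the `Resource` fields and the `find_candidates` function are assumptions made for this sketch, and the in-memory list merely stands in for whatever an information service such as MDS would report. It does not use the MDS or LDAP APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resource:
    host: str
    os: str
    free_cpus: int
    free_memory_gb: int


def find_candidates(directory: List[Resource], os: str,
                    cpus: int, memory_gb: int) -> List[Resource]:
    """Return resources that satisfy the job's needs, most free CPUs first."""
    matches = [
        r for r in directory
        if r.os == os and r.free_cpus >= cpus and r.free_memory_gb >= memory_gb
    ]
    return sorted(matches, key=lambda r: r.free_cpus, reverse=True)


if __name__ == "__main__":
    directory = [
        Resource("node1.example.org", "linux", 8, 32),
        Resource("node2.example.org", "windows", 16, 64),
        Resource("node3.example.org", "linux", 2, 4),
    ]
    for resource in find_candidates(directory, os="linux", cpus=4, memory_gb=16):
        print(resource.host)
```

A production broker would also weigh cost, policy, and current status; here the ranking uses free CPUs only to keep the idea visible.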
Scheduler:
Once the resources have been identified, the next logical step is to schedule the individual jobs
to run on them. If sets of stand-alone jobs are to be executed with no interdependencies, then a
specialized scheduler may not be required. However, if you want to reserve a specific resource
or ensure that different jobs within the application run concurrently (for instance, if they
require inter-process communication), then a job scheduler should be used to coordinate the
execution of the jobs. The Globus Toolkit does not include such a scheduler, but there are
several schedulers available that have been tested with and can be used in a Globus grid
environment. It should also be noted that there could be different levels of schedulers within a
grid environment. For instance, a cluster could be represented as a single resource. The cluster
may have its own scheduler to help manage the nodes it contains. A higher-level scheduler
(sometimes called a meta-scheduler) might be used to schedule work to be done on a cluster,
while the cluster's scheduler would handle the actual scheduling of work on the cluster's
individual nodes.

Scheduler
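
The two levels of scheduling described above can be sketched as a pair of cooperating classes. This is an illustrative toy under stated assumptions: the class names, the "most free nodes" selection rule, and the first-free-node policy are hypothetical and are not taken from the Globus Toolkit or any specific scheduler.

```python
class ClusterScheduler:
    """Local scheduler: decides which node inside one cluster runs a job."""

    def __init__(self, name, nodes):
        self.name = name
        self.free_nodes = list(nodes)
        self.assignments = {}

    def schedule(self, job):
        node = self.free_nodes.pop(0)  # simplest policy: first free node
        self.assignments[job] = node
        return node


class MetaScheduler:
    """Higher-level scheduler: sees each cluster as a single resource."""

    def __init__(self, clusters):
        self.clusters = clusters

    def submit(self, job):
        # Pick the cluster with the most free nodes, then delegate to it.
        cluster = max(self.clusters, key=lambda c: len(c.free_nodes))
        if not cluster.free_nodes:
            raise RuntimeError("no free nodes in any cluster")
        return cluster.name, cluster.schedule(job)
```

The meta-scheduler never touches individual nodes; it only chooses a cluster, and the cluster's own scheduler makes the final placement, mirroring the division of labour in the text.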

Data management:
If any data -- including application modules -- must be moved or made accessible to the nodes
where an application's jobs will execute, then there needs to be a secure and reliable method
for moving files and data to various nodes within the grid. The Globus Toolkit contains a data
management component that provides such services. This component, known as Grid Access to
Secondary Storage (GASS), includes facilities such as GridFTP. GridFTP is built on top of
the standard FTP protocol, but adds functions and utilizes the GSI for user
authentication and authorization. Therefore, once a user has an authenticated proxy certificate,
that user can use the GridFTP facility to move files without having to go through a login
process on every node involved. This facility provides third-party file transfer so that one
node can initiate a file transfer between two other nodes.

Data management

Job and resource management:


With all the other facilities we have just discussed in place, we now get to the core set of
services that help perform actual work in a grid environment. The Grid Resource Allocation
Manager (GRAM) provides the services to actually launch a job on a particular resource,
check its status, and retrieve its results when it is complete.
GRAM
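
The launch, status check, and result retrieval cycle can be illustrated with a local stand-in. The sketch below is not the GRAM API: `LocalJobManager` and its status strings are hypothetical, and a thread pool from the Python standard library plays the role of the remote resource.

```python
from concurrent.futures import ThreadPoolExecutor


class LocalJobManager:
    """Toy job manager: submit a job, poll its status, fetch its result."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}
        self._next_id = 0

    def submit(self, func, *args):
        """Launch `func(*args)` and return a job identifier."""
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id):
        future = self._jobs[job_id]
        if not future.done():
            return "RUNNING"  # queued or executing
        return "FAILED" if future.exception() is not None else "DONE"

    def result(self, job_id, timeout=None):
        """Block until the job finishes; re-raises the job's exception if it failed."""
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

In a real grid the submit call would carry a job description to a remote resource; the three-step interaction pattern is the part this sketch is meant to show.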

Job flow in a grid environment:


When enabling an application for a grid environment, it is important to keep in mind these
components and how they relate and interact with one another. Depending on your grid
implementation and application requirements, there are many ways in which these pieces can
be put together to create a solution.

Standards:

OGSA:

The Open Grid Services Architecture (OGSA) represents an evolution towards a Grid system
architecture based on Web services concepts and technologies. Since the release of the Globus
Toolkit 3.0, the Globus Project offers an open source collection of Grid services that follow
OGSA architectural principles. The Globus Toolkit also offers a development environment for
producing new Grid services that follow OGSA principles. OGSA is a product of the Grid
community at large, and it has a major focal point in the Global Grid Forum (GGF). Members
of the Globus Alliance have made significant contributions to the development of OGSA.

OGSI:
Building on both Grid and Web services technologies, the Open Grid Services Infrastructure
(OGSI) defines mechanisms for creating, managing, and exchanging information among
entities called Grid services. Succinctly, a Grid service is a Web service that conforms to a set
of conventions (interfaces and behaviors) that define how a client interacts with a Grid
service. These conventions, and other OGSI mechanisms associated with Grid service creation
and discovery, provide for the controlled, fault-resilient, and secure management of the
distributed and often long-lived state that is commonly required in advanced distributed
applications. In a separate document, we have presented in detail the motivation,
requirements, structure, and applications that underlie OGSI.

GSI:
GSI provides elements for secure authentication and communication in a grid. The
infrastructure is based on the Secure Sockets Layer (SSL) protocol, public key encryption, and
X.509 certificates. For single sign-on, Globus adds some extensions to GSI. It is based on the
Generic Security Service API, which is a standard API promoted by the Internet Engineering
Task Force (IETF).
These are the main functions implemented by GSI:
• Single/mutual authentication
• Confidential communication
• Authorization
• Delegation

APPLICATIONS:
There are different types of applications.
• Applications that can be broken up into several tasks, each of which can be performed
independently. Such applications run efficiently in the grid environment, where the
tasks are sent to different machines and executed in parallel.
• Applications in which no task can be split off as an independent task. Every task
depends on some other task, so the parallel capacity of the grid is not exploited.
• Applications in which some tasks are dependent and some are independent. The
independent tasks can exploit the parallel capacity of the grid, whereas the others can
still use other grid nodes for their processing.
The applications must satisfy the grid protocols and standards.
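
The three application types above differ only in their task dependencies, which can be made concrete by grouping tasks into "waves" that may run in parallel. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the function name `parallel_waves` and the sample task names are illustrative assumptions.

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies):
    """Group tasks into waves; every task in a wave can run at the same time.

    `dependencies` maps each task to the set of tasks it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    mixed = {"load": set(), "x": {"load"}, "y": {"load"}, "merge": {"x", "y"}}
    for number, wave in enumerate(parallel_waves(mixed), start=1):
        print(f"wave {number}: {wave}")
```

A fully independent application yields a single wave containing every task, a fully dependent chain yields one task per wave, and a mixed application falls in between.
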
Any organization having a network infrastructure and the following profile can implement
grid computing:
• There are many servers in the organization and they need to be consolidated.
• Applications in the organization are storage intensive.
• Applications are taking too much time to finish.
• Applications are CPU intensive.
• The organization is distributed geographically.

Some examples of grid computing are:

1) Simulation:
Grid computing is used in many aerospace industries for
• Performing simulations
• Calculating design problems.
Grids reduce the cost of processing in such industries, where calculations previously took
days or months.
In the automotive industry, the applications are
• Finite element analysis
• Computational fluid dynamics


• DOE (design of experiments)
• MCAD (mechanical computer-aided design)
Web interfaces are required in aerospace and automotive applications.

2) Entertainment
The need for grid computing is felt in multiplayer gaming applications. Online
gaming is an example of a computational grid.

3) Digital cancer imaging


An emerging area where grid computing is used:
• Patients' medical records may be stored at geographically separate sites. Grid computing
makes it easier to manage all of these files and patient records. Files may contain data
such as MRI, CT scans, ultrasounds, and mammograms.
Grids make access to local hospitals transparent and fast.

4) Spreadsheet
This is an example where personal computers are combined with applications that are
connected to a LAN. The combination is achieved with the help of grids.
Spreadsheet applications today consist of thousands of rows and columns.
An application requiring more space is split across multiple spreadsheets; each performs its
operation and returns the result.
Such a grid has the following features:
• Maintains application integrity
• Performs accurate calculations at a faster rate
• Complex tasks are split into subtasks
• Redundancy of data is controlled

ADVANTAGES:
Grids use a layer of middleware to communicate with and manipulate heterogeneous hardware
and data sets. In some fields (astronomy, for example), hardware cannot reasonably be moved
and is prohibitively expensive to replicate. In other instances, databases vital to research
projects cannot be duplicated and transferred to other sites. Grids overcome these logistical
obstacles and open the tools of research to distant faculty and students. A grid might
coordinate scientific instruments in one country with a database in another and processors in a
third. From a user’s perspective, these resources function as a single system; differences in
platform and location become invisible.
On a typical college or university campus, many computers sit idle much of the time. A grid
can provide significant processing power for users with extraordinary needs. Animation
software, for instance, which is used by students in the arts, architecture, and other
departments, eats up vast amounts of processor capacity. An industrial design class might use
resource-intensive software to render highly detailed three-dimensional images. In both cases,
a campus grid slashes the amount of time it takes students to work with these applications. All
of this happens not from additional capacity but through the efficient use of existing power.
Grids make research projects possible that formerly were impractical or unfeasible due to the
physical location of vital resources. Using a grid, researchers in Great Britain, for example,
can conduct research that relies on databases across Europe, instrumentation in Japan, and
computational power in the United States. Making resources available in this way exposes
students to the tools of the profession, facilitating new possibilities for research and
instruction, particularly at the undergraduate level. Although speeds and capacities of
processors continue to increase, resource-intensive applications are proliferating as well. At
many institutions, certain campus users face ongoing shortages of computational power, even
as large numbers of computers are underused. With grids, programs previously hindered by
constraints on computing power become possible.
• Virtualized Sharing of Resources
• Secure reliable access to Resources
• Autonomic management of Resources
• Proper Utilization of Resources
• Fast computation (approaching supercomputer performance through parallel computing)
• Virtually a very Large Capacity


• Economic
• No need for node homogeneity

CHALLENGES OF GRID COMPUTING:

Being able to access distant IT assets—and have them function seamlessly with tools on
different platforms—can be a boon to researchers, but it presents real security concerns to
organizations responsible for those resources. An institution that makes its IT assets available
to researchers or students on other campuses and in other countries must be confident that its
involvement does not expose those assets to unnecessary risks. Similarly, directors of research
projects will be reluctant to take advantage of the opportunities of a grid without assurances
that the integrity of the project, its data, and its participants will be protected. Another
challenge facing grids is the complexity in building middleware structures that can knit
together collections of resources to work as a unit across network connections that often span
oceans and continents. Scheduling the availability of IT resources connected to a grid can also
present new challenges to organizations that manage those resources. Increasing
standardization of protocols addresses some of the difficulty in creating smoothly functioning
grids.

The grid is not a silver bullet that can take any application and run it 1,000 times faster
without the need for buying any more machines or software. Not every application is suitable
or enabled for running on a grid. Some kinds of applications simply cannot be parallelized.
For others, it can take a large amount of work to modify them to achieve faster throughput.
The configuration of a grid can greatly affect the performance, reliability, and security of an
organization's computing infrastructure. For all of these reasons, it is important for us to
understand how far the grid has evolved today and which features are coming tomorrow or in
the distant future.
Some of the challenges are stated as follows:
• Non-determinism
• Infrastructure dependencies
• Distributed and partial failures
• Time-outs
• Dynamic nature of the structure
• Multiple heterogeneous platforms

CONCLUSION:

Grid computing introduces a new concept to IT infrastructures because it supports distributed
computing over a network of heterogeneous resources and is enabled by open standards. Grid
computing works to optimize underutilized resources, decrease capital expenditures, and
reduce the total cost of ownership. This solution extends beyond data processing and into
information management as well. Information in this context covers data in databases, files,
and storage devices. The Grid is analogous to the electricity (power) grid, and the vision is
to offer (almost) dependable, consistent, pervasive, and inexpensive access to resources
irrespective of their physical location and the location from which they are accessed. There are
currently a large number of projects and a diverse range of new and emerging Grid
developmental approaches being pursued. These systems range from Grid frameworks to
application test beds, and from collaborative environments to batch submission mechanisms.
It is difficult to predict the future in a field such as information technology where the
technological advances are moving very rapidly. Hence, it is not an easy task to forecast what
will become the ‘dominant’ Grid approach. Windows of opportunity for ideas and products
seem to open and close in the ‘blink of an eye’. However, some trends are evident. One of
those is growing interest in the use of Java and Web services for network computing.
