Enterprise Systems Management Concepts

History of ESM
ESM as a discipline has developed as a result of the widespread migration from
centralised mainframe and/or midrange computers to smaller distributed computers
that has taken place throughout the IT industry over the past 10 to 15 years. ESM
aims to solve one of the major difficulties associated with an IT infrastructure
comprising large numbers of small computers distributed over a network, namely
that these are inherently more difficult to manage than a smaller number of larger
computers. As well as the computer and network hardware and operating systems,
applications which run on the computers and which often span multiple computers,
networks and geographical locations must also be managed. Many modern
distributed computing infrastructures are highly complex; as a result, management
of these has become a major challenge.
ESM as a recognised discipline has been around for about the last 10 years, and
originated in the networking arena, where the problems of remotely monitoring and
administering multiple devices were encountered before the big rush toward
distributed computing. A special management communications protocol, SNMP
(Simple Network Management Protocol), was developed as part of the overall
protocol suite (TCP/IP) which allowed network devices from different
manufacturers to exchange data with each other. As the computers which were
attached to the network themselves became smaller, more numerous and more
widely dispersed, network management techniques were extended into the server
arena. SNMP is still used for the vast majority of network management
applications, and remains the only globally-supported standard for server and
application monitoring and management.

Basic Principles of ESM


The most basic challenge that any ESM solution aims to solve is monitoring the
status of all components of the IT infrastructure and, ideally, the applications that
run within it. Like any machine, computers need to be checked periodically to
ensure that they are functioning optimally. Consider a scenario where a single
systems administrator (SA) is available to monitor all the computers in a
company's IT infrastructure. It takes the SA five minutes to log on to each
computer and perform a basic set of health checks. It is not unusual for large
companies to have hundreds or even thousands of computers in their datacentres. At
five minutes per computer, every 100 computers take 500 minutes to check, so one
SA would spend a full 8-hour working day on nothing but basic daily healthchecks.
Clearly this is not an efficient use of resources.
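The arithmetic in this scenario can be sketched in a few lines of Python. This is only an illustration of the figures quoted above (five minutes per check, an 8-hour working day); the function name and its parameters are assumptions introduced for the example.

```python
def sa_days_required(computers: int,
                     minutes_per_check: float = 5.0,
                     checks_per_day: int = 1,
                     hours_per_day: float = 8.0) -> float:
    """Return how many SA working days are used up, per calendar day,
    by logging on to every computer and running manual healthchecks."""
    total_minutes = computers * minutes_per_check * checks_per_day
    return total_minutes / (hours_per_day * 60)


if __name__ == "__main__":
    # Estates of the sizes mentioned in the text: hundreds to thousands.
    for n in (100, 500, 1000):
        print(f"{n} computers: {sa_days_required(n):.2f} SA working days per day")
```

With one check per day, 100 computers already consume slightly more than one full working day, and the figure scales linearly with both the number of computers and the number of checks per day.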
The solution to this problem is to install agent software on each computer and
have this perform the healthchecks instead, freeing up the SA's time to perform
more useful activities. A further advantage is that the agent can be configured to
carry out healthchecks at frequent regular intervals. In practice, checking a
computer once a day is simply not sufficient: where business-critical functions are
supported, computers need to be checked at intervals of five minutes or less.
Although each agent software licence may cost a few hundred pounds, this will
quickly be recouped in savings of SA time, which may cost anything up to
£500 per day.
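A minimal sketch of such an agent is given below, assuming a single disk-space healthcheck and a fixed polling interval. It is not based on any particular ESM product: the names check_disk and run_agent and the 90% threshold are hypothetical, chosen only to illustrate the idea of scheduled local checks.

```python
import shutil
import time
from typing import Callable, Dict, List


def check_disk(path: str = "/", max_used_fraction: float = 0.9) -> bool:
    """Hypothetical healthcheck: healthy while the filesystem holding
    `path` is below the assumed 90% usage threshold."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total < max_used_fraction


def run_agent(checks: Dict[str, Callable[[], bool]],
              interval_seconds: float,
              cycles: int) -> List[Dict[str, bool]]:
    """Run every registered check once per cycle, pausing between cycles.
    A real agent would loop indefinitely; `cycles` keeps the sketch finite."""
    results = []
    for cycle in range(cycles):
        results.append({name: check() for name, check in checks.items()})
        if cycle < cycles - 1:
            time.sleep(interval_seconds)
    return results


if __name__ == "__main__":
    # 300 seconds matches the five-minute interval mentioned in the text;
    # a single cycle is run here so the example returns immediately.
    print(run_agent({"disk": check_disk}, interval_seconds=300, cycles=1))
```

Keeping the checks in a dictionary of callables means further healthchecks can be registered without changing the scheduling loop.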
If an agent detects a problem, it can be configured to generate an alert message in
order to let the SA know that there is something he or she needs to take a look at
and fix. This is known as reactive alerting. Ideally, an agent would detect that a
problem was about to happen and alert the SA in time for preventative measures
to be taken, so that there would be no interruption to service. This kind of
proactive monitoring is typically what is aimed for by ESM system designers.
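The difference between the two kinds of alerting can be sketched as follows, using disk usage as the monitored value. The reactive check fires only once a limit has been reached, whereas the proactive check extrapolates the recent trend and fires if the limit is likely to be reached soon. The function names, the 95% limit, the 60-minute horizon and the simple straight-line extrapolation are all assumptions made for illustration; real products may use quite different prediction methods.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in minutes, disk used in percent)


def reactive_alert(used_percent: float, limit: float = 95.0) -> bool:
    """Reactive: alert only when the limit has already been reached."""
    return used_percent >= limit


def minutes_until_limit(samples: List[Sample], limit: float = 95.0) -> Optional[float]:
    """Estimate minutes until `limit` is reached by drawing a straight line
    through the first and last samples. Returns None if usage is not rising."""
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if u1 >= limit:
        return 0.0
    if t1 <= t0 or u1 <= u0:
        return None
    rate = (u1 - u0) / (t1 - t0)  # percent per minute
    return (limit - u1) / rate


def proactive_alert(samples: List[Sample], limit: float = 95.0,
                    horizon_minutes: float = 60.0) -> bool:
    """Proactive: alert if the limit is predicted to be hit within the horizon."""
    eta = minutes_until_limit(samples, limit)
    return eta is not None and eta <= horizon_minutes


if __name__ == "__main__":
    history = [(0.0, 80.0), (5.0, 82.0), (10.0, 84.0)]  # made-up readings
    print("reactive:", reactive_alert(history[-1][1]))
    print("proactive:", proactive_alert(history))
```

With the made-up readings above, usage is still below the limit, so the reactive check stays silent, while the rising trend is steep enough for the proactive check to raise an alert.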
Another kind of action which can be taken by a monitoring agent when an actual or
impending fault is detected is to initiate a corrective action to prevent or resolve
the issue. This is usually possible only where the detected fault (actual or
impending) is a recognised one for which a standard fix is available. In such a case,
the agent may be configured to execute a command or script to apply the fix. A
notification may still be sent to the relevant support staff; however, in this case it
informs them that a known problem has occurred and has been automatically
resolved. No further action is required, although it may be desirable to investigate
further to prevent the problem from recurring.
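The corrective-action logic described above can be sketched as a lookup from recognised faults to standard fixes. In this illustration the fixes are plain Python callables so that the example is self-contained; a real agent would more likely run an external command or script at that point. The fault identifiers, the handle_fault function and the message wording are all hypothetical.

```python
from typing import Callable, Dict, List


def handle_fault(fault_id: str,
                 fixes: Dict[str, Callable[[], None]],
                 notify: Callable[[str], None]) -> bool:
    """Apply the standard fix for a recognised fault and tell support staff
    what happened. Returns True only if the fault was fixed automatically."""
    fix = fixes.get(fault_id)
    if fix is None:
        # Not a recognised fault: fall back to an ordinary alert.
        notify(f"ALERT: unrecognised fault '{fault_id}'; manual action required")
        return False
    try:
        fix()
    except Exception as exc:
        notify(f"ALERT: automatic fix for '{fault_id}' failed ({exc}); "
               "manual action required")
        return False
    notify(f"INFO: known fault '{fault_id}' occurred and was resolved "
           "automatically; no further action required")
    return True


if __name__ == "__main__":
    sent: List[str] = []
    # Hypothetical registry of known faults and their standard fixes.
    known_fixes = {"tmp_full": lambda: print("(pretend) clearing temporary files")}
    handle_fault("tmp_full", known_fixes, sent.append)
    handle_fault("disk_failure", known_fixes, sent.append)
    for message in sent:
        print(message)
```

Sending a notification in every branch reflects the point made in the text: support staff are kept informed even when no action is needed, so that recurring faults can still be investigated.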
