
IT performance management with hp OpenView

white paper

why performance management?

IT departments, whether they reside in enterprises or service providers, are facing increasing business pressures. Today's worldwide economy is forcing IT departments to be as cost-effective as possible, to make sure current IT resources are operating at maximum efficiency, and to ensure that future IT expenditures directly relate to business expense and revenue targets.

Additionally, IT departments are becoming more service-oriented to their customers, whether the customers are internal or external to the organization. Customers want guarantees that they can access the IT services they need when they need them. If an IT service goes down, customers want that service up and running again as quickly as possible. Increasingly, customers are asking for service commitments in terms of contracts or service level agreements (SLAs).

IT performance management plays a critical role in helping IT departments meet both their business objectives and customer needs for service delivery. This white paper, written for IT managers, discusses HP's strategy and capabilities for addressing the comprehensive needs of IT performance management. It describes the importance of the three primary needs of IT performance management: operational monitoring, problem diagnosis and resource optimization. It then provides an overview of the challenges in each area, the functionality that HP provides to address each area, and how these tools can be used together to achieve an organization's complete performance goals.

performance management ensures service delivery

An IT service is the deployment of a specific IT resource to an end-user customer. A service can be access to a networked application, access to database information, Internet access or even email. It can be revenue generating, in the case of a service provider, or an internal charge that is incurred by a business unit or cost center.

In order to make commitments for IT services, IT departments require a comprehensive set of tools to manage the service environment, ranging from provisioning specific services to assuring service delivery to tracking actual usage. As shown in Figure 1, performance management is a key component of service management. Through managing the performance of a service, enterprises and service providers can meet their commitments for the service's availability, its response time and its recovery time should the service fail.

[figure 1 diagram, labels only: solution enablement, delivery, assurance, customer experience, service management, usage, performance, fault, network, system, application, storage]

figure 1: performance management as a component of service management

As the figure shows, comprehensive, or end-to-end, performance management requires tools to detail how a service is performing, both from the user's perspective and from the perspective of the individual components of the underlying IT infrastructure. Tools representing the user's perspective, which we call a top-down perspective, show how a service is performing by reflecting the experience of the end-user customer and monitoring whether the service is in compliance with service commitments.

Additionally, understanding how the underlying IT infrastructure components, or elements, are performing sheds light on the overall service performance. IT infrastructure components include applications and their middleware, systems, networking and storage devices. IT operators can embed instrumentation within each component, then monitor the operation and performance of each component, a process we call bottom-up performance management. With bottom-up performance management, IT operators can monitor and prevent performance problems with specific components from happening in the first place, again allowing greater conformance to service commitments.

Figure 2 shows the relationship of customer-experience performance management and infrastructure-level performance management. A middle layer, transaction analysis, helps in determining where performance problems exist within the IT infrastructure once you detect a problem from the user's perspective.

[figure 2 diagram, labels only: your customer or end-user; end-user perspective; customer experience management; service level management; transaction perspective; transaction analysis; infrastructure perspective; event management; resource management; your computing environment]

figure 2: levels of performance management

IT requirements for performance management

IT departments, both in enterprises and service providers, are increasingly required to guarantee service availability and responsiveness to their customers, while at the same time reducing operating costs and capital expenditures. As technology evolves to web-based applications with layers of redundancy and load distribution, service brown-outs challenge traditional fault-oriented management paradigms. User productivity is impacted more often as a result of service performance degradation than a total service outage. Enterprise Management Associates has recently reported that on average it takes a user 20 minutes to discover that an application is working again and to find the place where they left off when degradation occurred.

Service brown-outs require you to develop the ability to detect service degradations before they are experienced by your customers, then quickly relate them to infrastructure problems. Additionally, you must understand how your current IT investments can be optimized for today's workload and leveraged for future expansion. Therefore, IT performance management achieves three goals for an organization:

- Minimizes lost business opportunities caused by poor availability and response time of IT services
- Reduces the costs of lost user productivity and of locating and resolving degradations in the IT infrastructure
- Minimizes the inefficient allocation of capital caused by lack of knowledge of current and anticipated resource needs

The first two goals are often quantified in a service level commitment of some type, including formal service level agreements (SLAs), in clauses that relate to availability, response time and time to recovery. Regularly negotiated SLAs help IT to offer continuous improvement in service levels while guarding against expectation creep. The third goal is quantified through the process of budget allocation and purchasing regulations. As shown in Figure 3, these three goals are achieved through the disciplines of operational monitoring, problem diagnosis and resource optimization.
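To make the three clause types concrete, the following minimal Python sketch checks measured availability, response time and time to recovery against a set of commitments. The target values, the `Outage` and `SlaTargets` names and the rule that 95 percent of requests must finish within the response-time limit are illustrative assumptions; they are not terms of any actual SLA or part of any HP OpenView product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One service interruption, from detection to restoration."""
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


@dataclass
class SlaTargets:
    """Hypothetical commitments; real values come from the negotiated SLA."""
    min_availability_pct: float = 99.5
    max_response_s: float = 2.0
    min_share_within_response: float = 0.95
    max_mean_recovery: timedelta = timedelta(hours=1)


def total_downtime(outages: list[Outage]) -> timedelta:
    return sum((o.duration for o in outages), timedelta())


def availability_pct(period: timedelta, outages: list[Outage]) -> float:
    """Percentage of the reporting period during which the service was up."""
    return 100.0 * (1 - total_downtime(outages) / period)


def mean_time_to_recovery(outages: list[Outage]) -> timedelta:
    """Average outage duration; zero when there were no outages."""
    if not outages:
        return timedelta()
    return total_downtime(outages) / len(outages)


def sla_compliance(period, outages, response_times_s, targets=None) -> dict[str, bool]:
    """Report, clause by clause, whether the measured service met its targets."""
    targets = targets or SlaTargets()
    within = sum(1 for t in response_times_s if t <= targets.max_response_s)
    share = within / len(response_times_s) if response_times_s else 1.0
    return {
        "availability": availability_pct(period, outages) >= targets.min_availability_pct,
        "response_time": share >= targets.min_share_within_response,
        "time_to_recovery": mean_time_to_recovery(outages) <= targets.max_mean_recovery,
    }
```

Reporting each clause separately, rather than a single pass or fail flag, mirrors the way the commitments are written: a service can meet its availability target while still missing its response time or recovery targets.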

[figure 3 diagram, labels only: service management (commit to service levels, monitor compliance, communicate to business); operational monitoring (service level objectives, customer experience, infrastructure elements); one transaction (users/customers, web servers, application servers, database servers); many infrastructure components; resource optimization (utilization trends/baselines, hot spots, forecasting and planning); problem diagnosis (performance problem resolution, bottleneck location)]

figure 3: disciplines of IT performance management

operational monitoring

Operational monitoring can be defined in two ways:

1. Operational monitoring is the process of continually monitoring a service against availability and response time goals. In this case, operational monitoring measures the customer's experience with a particular service and the response time that the user experiences. This top-down monitoring detects impending violations of service commitments. By integrating thresholds and alarms with contextually relevant performance management reporting, you can quickly identify IT problems and resolve them before service level objectives (SLOs) are violated.

2. Operational monitoring also monitors the critical performance parameters of the systems, networks and applications that support the service in order to catch problems before the service is impacted. This bottom-up monitoring examines components, such as LAN and WAN segments, application and system processes, log files, memory and disk resources, then determines the potential impact of failures and slow-downs on business services.

Performance monitoring requires both levels of monitoring for optimizing service delivery. Monitoring specific infrastructure components alone does not provide an accurate picture of what the end user experiences for service delivery. You cannot make meaningful service level commitments without understanding how the overall service is performing. At the same time, monitoring the customer experience alone does not enable you to identify potential problems within the infrastructure.

For example, many organizations are running web applications. Because web applications have a distributed architecture, top-down monitoring would not detect any performance problems if a mirrored disk went down. Reliability features such as automatic fail-over typically do not show up with top-down performance monitoring. By monitoring the application server, you would detect the faulty disk and solve the problem before delivery of the application to end users is affected.

With operational monitoring, you can save money by eliminating lost opportunity costs from having a resource such as a mission-critical business application go down or become unavailable due to poor performance.
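The difference between the two levels of monitoring can be sketched in a few lines of Python. The metric names, limits and sample values below are hypothetical and are not taken from any HP OpenView product; the point is only that, in the mirrored-disk case, the top-down check stays quiet while the bottom-up check raises an alarm.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """A limit on one metric; some metrics are bad when high, others when low."""
    metric: str
    limit: float
    higher_is_worse: bool = True

    def breached(self, value: float) -> bool:
        return value > self.limit if self.higher_is_worse else value < self.limit


def evaluate(samples: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one alarm message per threshold whose sampled metric is out of bounds."""
    alarms = []
    for t in thresholds:
        if t.metric in samples and t.breached(samples[t.metric]):
            alarms.append(f"{t.metric}={samples[t.metric]} breaches limit {t.limit}")
    return alarms


# Top-down: what the customer experiences (hypothetical service level objectives).
TOP_DOWN = [
    Threshold("response_time_s", 2.0),
    Threshold("availability_pct", 99.5, higher_is_worse=False),
]

# Bottom-up: health of the components behind the service (hypothetical limits).
BOTTOM_UP = [
    Threshold("cpu_util_pct", 85.0),
    Threshold("free_disk_pct", 10.0, higher_is_worse=False),
    Threshold("healthy_mirror_members", 2, higher_is_worse=False),
]

# One half of a mirrored disk has failed; fail-over hides it from end users.
service_samples = {"response_time_s": 0.8, "availability_pct": 100.0}
server_samples = {"cpu_util_pct": 40.0, "free_disk_pct": 35.0, "healthy_mirror_members": 1}

top_down_alarms = evaluate(service_samples, TOP_DOWN)
bottom_up_alarms = evaluate(server_samples, BOTTOM_UP)
```

Here `top_down_alarms` is empty because the service still responds within its objectives, while `bottom_up_alarms` contains the mirror warning that lets an operator replace the disk before users notice anything.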
Increased application availability means the business is up and running.

problem resolution

If a problem or SLO breach is detected through operational monitoring, you must isolate the problem so that the service can be restored as quickly as possible. Most services are delivered over a complex IT infrastructure that is difficult to troubleshoot. Poor service performance can result from delays in the network path, an overloaded server or a faulty application. Understanding and resolving the problem requires tools that can correlate the individual infrastructure components (network, system and application) to the performance of the service. By being able to drill down quickly to the infrastructure element at fault, you can resolve problems faster. With quick and accurate problem diagnosis tools, your IT department can:

- Shorten the time to resolve problems and get the service up and running in order to fulfill service level commitments
- Put infrastructure problems in the hands of the right IT resources to resolve them as quickly as possible and eliminate finger pointing
- Leverage your IT staff more efficiently and reduce the need for dedicated performance management specialists

resource optimization

Resource optimization results from knowing how current capital investments are being used and answers questions such as:

- Is any equipment over- or under-utilized?
- How can I reallocate resources to be more balanced?
- Is there enough capacity for me to offer new services without having to incur more capital expense?

Resource optimization tools must be able to monitor resource consumption at all levels. They need to identify baselines for resource utilization and identify trends that affect utilization. Additionally, they need to forecast future resource utilization. Linear forecasting based on past usage patterns can be achieved relatively simply. What-if analysis to determine the impact of server consolidation, improved bandwidth, right-sizing or increased workloads is more complex.

By understanding how current resources are utilized, businesses can improve their current service performance before problems are detected and provision new services to customers quickly and efficiently. Most organizations are fire fighting today: they are trying to solve problems as problems arise. Resource optimization tools help you get ahead of the curve and become more proactive in managing existing services and offering new services to your customers.

IT organizational models for performance management

reactive roles

Operators and Help Desk
  Job measure: MTTR; time to resolve problems; accuracy of escalations
  Business objectives: Maximize service availability
  Key challenges: Prioritizing problems; excessive noise; blame culture and finger-pointing

Performance Expert (network engineer, system admin, application manager)
  Job measure: Time to resolve problems
  Business objectives: Maximize service availability
  Key challenges: Effective analysis of voluminous data; vendor and OS differences; information from operations to help characterize problem

Application Developer
  Job measure: Assist application performance expert to resolve problems; provide bug fix and rerelease application
  Business objectives: Minimize app maintenance; develop manageable and supportable applications
  Key challenges: Difficulty in reproducing problems in the lab; performance engineering compromised to meet deadlines

proactive roles

IT Service Level Manager
  Job measure: Service levels achieved for deployment times, availability, reliability, recovery
  Business objectives: Maximize customer satisfaction and loyalty; demonstrate value to LOBs
  Key challenges: Guarding against expectation creep; managing outsourced services; multi-tiered SLAs

Capacity Planner / Strategic Architect (network planner, system planner, service planner)
  Job measure: Efficiency of accounting policies and cost allocation; minimize server and network underutilization; predicting need for bandwidth and server upgrades
  Business objectives: Reduce TCO of services; maximize return from infrastructure investments
  Key challenges: System simulations which can closely model real life; highly complex; special expertise required

figure 4: roles in IT performance management

IT organizations require both reactive and proactive internal processes to manage performance effectively. Reactive processes are established around event-based availability and performance management. These processes can be triggered automatically (software alarm) or manually (help desk request). Proactive processes are established around capacity and service level management and performance engineering. These processes involve taking control of IT infrastructure provisioning and managing user expectations.

The complexity of resolving performance degradations and the expertise required most often demand an approach to problem assignment based on core competencies. By its very nature, distributed performance management drives organizational silos. Forward-thinking organizations mitigate this problem by defining central problem-handling processes with the aim of resolving as many problems at level 1 as possible to drive down support costs. Level 1 operations can handle problems from across all technology domains: networks, system, application and storage. However, when problems are escalated to level 2 or level 3 specialists, technology silos become inevitable. Integrated network fault and performance management is typically one silo. Integrated availability and performance management for systems and applications is typically another. True cross-domain problem-solving is often a group exercise hampered by a lack of consolidated diagnostic data.

Service level management by definition demands a holistic cross-domain approach. Effective capacity management also demands a holistic approach across the IT infrastructure as well as a keen awareness of business and workload characteristics. Due to the mix of competencies needed, capacity management is often also performed within technology silos. Figure 4 lists the key roles in performance management.
Best practices such as the IT Infrastructure Library (http://www.itil.org) or Six Sigma (http://www.sixsigmaforum.com) offer excellent guidance on setting up effective IT processes and roles.

HP OpenView's performance management strategy

Our HP OpenView performance management strategy is to enable our customers to deliver services with predefined, committed levels of service at a minimum cost. While continuing to enhance our traditional performance management products, we are delivering enhancements and new products so that you can have complete, end-to-end service performance management. HP OpenView performance management tools enable IT organizations to:

- Monitor the availability and performance of IT services from the customer's perspective
- Quickly diagnose performance problems with systems and networks and isolate the cause of performance problems in today's web application infrastructure
- Monitor performance and usage trends and forecast resource usage across the complete IT infrastructure

The following sections describe our HP OpenView solutions in more detail and show how these management tools work together to assure service levels.

operational monitoring with HP OpenView

HP OpenView provides performance monitoring for the complete IT infrastructure. Our performance monitoring products conduct top-down, customer experience monitoring as well as infrastructure-level monitoring. This combination allows IT operators to avoid slowdowns in service performance by identifying potential infrastructure problems that may affect service delivery, and to detect and isolate problems as quickly as possible when SLO failures occur.

We provide top-down customer experience monitoring through HP OpenView Internet Services and HP OpenView Web Transaction Observer. Designed to model the customer's experience in using today's web applications, HP OpenView Internet Services uses synthetic transactions to simulate and monitor the performance of a web-based service. It also uses aggregated data from real transactions measured at the web server by HP OpenView Transaction Analyzer to monitor the performance of a web-based service. Service alarms notify operators when service performance objectives have been breached. HP OpenView Internet Services can be customized to extend to non-web applications as well.

HP OpenView Web Transaction Observer measures the real end-to-end response time of a web application from the client web browser. It also includes service alarms for notifying operators via HP OpenView Operations of SLO breaches. If a problem is detected when monitoring the customer's experience in using a service, web page usage and click-through analysis can determine the impact on customer behavior, and optional data from Keynote's subscription services can isolate problems beyond the firewall within particular metropolitan areas or ISPs.

We provide infrastructure-level monitoring through HP OpenView Operations:

o Basic network performance monitoring is included with HP OpenView Network Node Manager. By setting simple alarms for network devices, operators can monitor how their network is performing. Integration with HP OpenView Performance Insight expands upon these capabilities to identify whether a problem resulted from a real-time fault or from a configuration change made some time ago.
o For basic system performance monitoring, HP OpenView Operations agents come with an embedded performance component. This component collects global data such as CPU utilization or free disk space to identify whether a problem exists with a single system. More detailed diagnostic data from the HP OpenView Performance Agent, such as process-level or file-system metrics, is required for in-depth analysis.
o For application performance monitoring, HP works with the leading application vendors, such as Oracle, IBM, Microsoft, SAP and BEA, to develop HP OpenView Smart Plug-Ins that collect data on critical application metrics. This data, such as throughput, request times and cache hit ratios, is then fed into the HP OpenView Operations agent for thresholding analysis and the HP OpenView Performance Agent for in-depth diagnosis and persistent storage.
o Storage performance monitoring is available through the Storage Optimizer product included with HP OpenView Storage Area Manager.

HP OpenView Operations maintains the relationships between individual infrastructure components and the IT services that rely on them within Service Views. Using this critical information, top-down and infrastructure-level alarms can be prioritized based on the IT service or customer that will be impacted by a failed or slow component. Ideally, this allows you to identify and rectify an issue before users see any performance bottlenecks.

With the exception of the Smart Plug-Ins, which depend upon HP OpenView Operations, most of these monitoring products can run standalone. You get the greatest value, however, when the products are integrated. HP OpenView Operations is the primary console for collecting and processing alarms and for launching other management tools.

Thorough operational monitoring is the starting point for reliably managing service performance. Because HP OpenView collects such a rich set of data from operational monitoring, you can be much more efficient at isolating and addressing service delivery problems as they arise.

problem resolution with HP OpenView

Once a problem or SLO breach is detected through monitoring, the first task is to locate the problem. The challenge is in understanding which part of the IT infrastructure is at fault. HP OpenView offers a variety of tools that isolate and detail problems with service performance.

HP OpenView Transaction Analyzer bridges the gap from top-down service monitoring to component-level diagnosis for web applications based upon the Java 2 Enterprise Edition (J2EE) or Microsoft's Distributed interNet Applications Architecture (DNA). Once HP OpenView Internet Services detects an SLO breach for a web application, HP OpenView Transaction Analyzer further isolates the problem, showing the operator where a problem may be occurring within the infrastructure, whether the problem is with the application, the database, the server or the network. Without the capabilities of HP OpenView Transaction Analyzer, IT organizations can take hours and even days to investigate a performance problem and determine where detailed troubleshooting is required.
A common side effect is that IT specialists attempt to prove that their area of the infrastructure is not at fault, and a culture of blame results. HP OpenView Transaction Analyzer automatically discovers and tracks each transaction, highlights where a transaction spends the most time and singles out any constituent response times that exceed performance baselines. Problem areas are quickly identified, eliminating guesswork and blame. Once a problem is located, the operator can launch one of several diagnostic tools, depending upon where the problem occurs in the IT infrastructure. Figure 5 shows how HP OpenView products can help you get to the root cause of performance problems.
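The idea of comparing each constituent response time with its baseline can be illustrated with a short Python sketch. This is not how HP OpenView Transaction Analyzer is implemented; the segment names, timings, baselines and the 1.5x tolerance are assumptions chosen purely for illustration.

```python
def locate_bottleneck(segment_times_ms, baselines_ms, tolerance=1.5):
    """Find where a transaction spends its time and which segments look abnormal.

    segment_times_ms: measured time per infrastructure segment for one transaction.
    baselines_ms: normal time per segment, taken from historical measurements.
    tolerance: how far above its baseline a segment may run before it is flagged.

    Returns (slowest_segment, segments_over_baseline).
    """
    slowest = max(segment_times_ms, key=segment_times_ms.get)
    over_baseline = {
        name: elapsed
        for name, elapsed in segment_times_ms.items()
        if name in baselines_ms and elapsed > tolerance * baselines_ms[name]
    }
    return slowest, over_baseline


# Hypothetical breakdown of one slow web transaction, in milliseconds.
measured = {"network": 40, "web server": 60, "app server": 120, "database": 900}
baseline = {"network": 35, "web server": 50, "app server": 110, "database": 150}

slowest, suspects = locate_bottleneck(measured, baseline)
```

With these sample numbers only the database segment exceeds its tolerance, so the problem can be routed directly to the database specialist instead of being investigated by every team in parallel.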

[figure 5 diagram, labels only. Tools: Internet Services, Web Transaction Observer, Transaction Analyzer, Keynote, Performance Manager, GlancePlus, Problem Diagnosis, 3rd party. Infrastructure elements: end-to-end response, internet, ISP, desktop, firewall, web server, web page, DNS, DHCP, JSP, ASP, app server, EJB, COM, database, SQL, O/S, process, network segment, path, switch, router, legacy app, mainframe]

figure 5: getting to the root cause of performance problems with HP OpenView products

HP OpenView capabilities for diagnosing application problems:

- For real-time web application problems, HP OpenView Transaction Analyzer can also drill down to highlight poorly performing sub-transactions such as Java Beans or COM objects.
- For custom applications, HP co-founded and continues to support the Open Group Application Response Measurement (ARM) standard. This is an API that in-house developers can use within custom applications to measure business transactions from the end-user perspective, as well as to measure contributing components of response time in distributed applications. HP has implemented the ARM API in the HP OpenView Performance Agent and in GlancePlus across all major system platforms. HP OpenView Internet Services, Web Transaction Observer and Transaction Analyzer also leverage and extend the ARM capability internally to provide non-intrusive end-to-end transaction management.
- For diagnosing and solving application-specific problems that have recently occurred (near-time problems), HP OpenView Performance Manager graphs the application data that is collected during performance monitoring by the application-specific HP OpenView Smart Plug-Ins.
- For historical application problems, HP OpenView Reporter produces reports that display the performance data.
- All of the major applications include vendor-specific diagnostic tools that can be used with HP OpenView products to further isolate and solve application performance-related problems. An example is the Oracle Enterprise Manager Diagnostic Pack, designed for troubleshooting problems with the Oracle database, which can be launched from the HP OpenView Operations console and is integrated via the (gratis) HP OpenView Smart Plug-In for Oracle Enterprise Manager.
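As a rough illustration of what ARM-style instrumentation gives an in-house developer, the Python sketch below times named business transactions and the components nested inside them. It is not the ARM API itself, whose calls are defined by the Open Group standard; `TransactionTimer` and its methods are hypothetical names used only for this example.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class TransactionTimer:
    """Collects elapsed times for named transactions (illustrative only)."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the timer can be tested deterministically.
        self._clock = clock
        self.samples = defaultdict(list)

    @contextmanager
    def measure(self, name):
        """Time the enclosed block and record it under the transaction name."""
        start = self._clock()
        try:
            yield
        finally:
            # Record the sample even if the transaction raised an exception.
            self.samples[name].append(self._clock() - start)

    def summary(self):
        """Return {name: (count, mean_seconds)} for every measured transaction."""
        return {
            name: (len(values), sum(values) / len(values))
            for name, values in self.samples.items()
        }


def place_order(timer):
    """Hypothetical business transaction with one nested component."""
    with timer.measure("place_order"):
        with timer.measure("place_order.db_query"):
            pass  # the database call would go here
```

Nesting the measurements is what allows an end-to-end response time to be broken down into the contributing components that the standard is intended to expose.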

HP OpenView capabilities for diagnosing system problems:

- For diagnosing problems with system performance, the Java-based HP OpenView Performance Manager, which has superseded the old PerfView Analyzer, provides the visualization or graphic representation of the data collected by the HP OpenView Performance Agent or the embedded component of the HP OpenView Operations agent. The HP OpenView Performance Agent collects a comprehensive set of more than 600 metrics, including metrics specific to the individual processes that are running on a system and their individual utilization rates. The HP OpenView Performance Agent supports multiple system nodes, and its patented application grouping provides the ability to aggregate system metrics to the application level. For example, operators can see how one application's CPU utilization compares to other applications. The HP OpenView Performance Agent also performs highly efficient data collection, because it logs only interesting processes, such as processes that consume certain amounts of CPU or disk I/O, configurable by the user. This patented feature can mean a 20-to-1 reduction in overhead and the amount of data that is collected.
- For a real-time analysis of system performance on a single system node, OpenView provides GlancePlus. GlancePlus drills down into the processes that are running on a specific system.

For network performance diagnosis, HP OpenView Problem Diagnosis performs network path analysis, showing all possible paths in a best-effort network and a hop-by-hop latency analysis.

You can launch and refresh all of these infrastructure-level diagnostic tools from the HP OpenView Operations service console. Single-click alarm-based context-setting enables IT performance experts to identify and solve service-level problems quickly, regardless of where they are in the IT infrastructure, before they affect service commitments to customers.

resource optimization with HP OpenView

Through optimizing IT resources, you can get more from your IT investments, and you can address problems before problems impact service. HP OpenView's resource optimization capabilities enable you to:

- See which IT infrastructure components are over- or under-utilized
- Identify usage trends and forecast usage for infrastructure components
- Understand how infrastructure utilization and trends affect service performance, which infrastructure resources directly affect the service and what modifications should be made to optimize service delivery


HP OpenView optimization products analyze the rich set of performance data that is collected from all of the key infrastructure areas and generate reports that can be used as part of your budgeting and procurement process. We offer the following capabilities for resource optimization:

- HP OpenView Reporter provides unified reporting for system and application data collected by the HP OpenView Performance Agent, HP OpenView Internet Services and HP OpenView Smart Plug-Ins.
- For optimizing network resources, HP OpenView Performance Insight for Networks provides detailed reports that show over- and under-utilization of network resources, trending data and forecasting for network resources. Linear forecasts for 30, 60 and 90 days are made based on the previous 90 days' activity, and for metrics with thresholds a prediction is made as to when the threshold will be exceeded.
- For more advanced capacity planning, HP OpenView supports third-party tools that offer sophisticated analysis of HP OpenView data. Tools from SAS perform statistical analysis, while tools from OPNET and Hyperformix offer end-to-end analytic modeling or discrete simulation to enable what-if scenario planning.
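The linear forecasting described above can be sketched as an ordinary least-squares line fitted to daily samples. This is an illustrative Python example under stated assumptions (one utilization sample per day, a simple straight-line fit); it is not the algorithm used by HP OpenView Performance Insight, and the function names are hypothetical.

```python
def fit_line(values):
    """Least-squares fit of y = intercept + slope * day over days 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two daily samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(values, horizons=(30, 60, 90)):
    """Project the fitted line the given number of days past the last sample."""
    slope, intercept = fit_line(values)
    last_day = len(values) - 1
    return {h: intercept + slope * (last_day + h) for h in horizons}


def days_until_threshold(values, threshold):
    """Days from the last sample until the fitted line reaches the threshold.

    Returns 0.0 if the trend line is already at or above the threshold, and
    None if the trend is flat or falling and will never reach it.
    """
    slope, intercept = fit_line(values)
    last_day = len(values) - 1
    if intercept + slope * last_day >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - intercept) / slope - last_day


# Hypothetical history: link utilization growing 0.2 points per day for 90 days.
history = [40 + 0.2 * day for day in range(90)]
projection = forecast(history)
days_left = days_until_threshold(history, threshold=80.0)
```

A straight-line fit is deliberately simple. It answers the budgeting question of roughly when a resource will run out, but it does not model seasonality or the what-if scenarios that the third-party modeling tools address.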

The resource optimization capabilities from HP OpenView and its solution partners offer several advantages over other performance optimization products:

- Efficient data collection: data is collected without impacting the performance of the service
- Grouping capabilities: reports can group system or network metrics by application service to get a complete picture of the application
- Cross-vendor reporting: in addition to generating reports on a particular vendor's product, reports can reflect the entire environment or budgetary perspective
- End-to-end capacity planning of all the assets required to provision a service, from the wide area network to the web server farm to the back-end servers and storage devices

All of these capabilities show you how to allocate cash flow most efficiently.


future directions

By addressing the three areas of performance management, HP OpenView helps you manage your services more effectively. You can model the availability and performance of a service from your customer's viewpoint. If a problem or SLO breach is detected, you can follow the path and further diagnose where the problem exists. Then, you can use the appropriate infrastructure tool to drill down to the specific problem and solve it. With the addition of resource optimization products, HP OpenView and solution partners can provide you with the tools you need for maximizing your IT investments.

We are making investments in our performance management technologies in two dimensions: we are adding new capabilities while reducing the total cost of ownership of our solution. We plan to add capabilities in three areas:

- Broaden our solution to monitor and analyze performance data from more network devices, system platforms and business applications:
  o Extend out-of-the-box Report Pack coverage in HP OpenView Performance Insight (OVPI) to cover Cisco Powered Networks, RMON, system and application monitoring, MPLS/VPN, VLAN, VoIP and IPSec encryption
  o Extend system performance monitoring and diagnostics to manage RedHat, Turbo, Debian and SuSE Linux
  o Extend availability and response time monitoring and diagnostics to business applications such as SAP and Exchange and key Microsoft .NET services and servers such as ASP.Net and BizTalk
  o Improve the ease of extending OVPI to monitor additional devices and HP OpenView Internet Services (OVIS) to monitor additional applications

- Tighten the integration between top-down monitoring tools and infrastructure instrumentation in order to reduce mean time to recovery and locate the cause of service level breaches quickly or even automatically:
  o Extend HP OpenView's intelligent network diagnostics beyond rules and topology based event reduction, correlation and targeted polling to include network performance data as part of the root cause analysis process
  o Extend the HP OpenView Transaction Analyzer with key metrics extracted from the HP OpenView Performance Agent to facilitate faster root cause determination once a component in the infrastructure is isolated
  o Semi-automate the correlation of component bottleneck identification to SLO response time breaches to further speed up the triage process when managing complex distributed web applications

- Link performance management seamlessly with service level management:
  o Leverage the OVPI technology to provide consolidated reporting for IT Service Level Managers across networks, systems and applications
  o Integrate the functions of HP OpenView TeMIP ServiceCenter and HP OpenView Internet Services to create a robust SLM solution for telcos
  o Leverage HP OpenView Service Desk's SLM capabilities and the integration between Service Desk and OpenView Internet Services to create a robust SLM solution for enterprises


We plan to reduce the total cost of ownership of our solution along two vectors:

- Integrated fault and performance:
  o Tightly integrate Network Node Manager (NNM) and OVPI through common polling, thresholding, real-time graphing, database and administration interface
  o Enable central deployment of the HP OpenView Performance Agent (OVPA) from HP OpenView Operations (OVO) to all supported operating systems, build a mechanism for central configuration and control of OVPA and extend HP OpenView Performance Manager (OVPM) to UNIX
  o Empower operators within HP OpenView Operations for Windows (OVOW) with the capability that exists in OVO/Unix today to launch and position OVIS in context of an alarm and perform impact and root cause analysis from the service map

- End-to-end performance and reporting:
  o Leverage the OVPI technology to provide generalized reporting for all products in the OpenView portfolio and consolidated reporting across networks, systems, applications and storage devices
  o Leverage OVIS to alarm and report on availability and performance of real transactions measured by HP OpenView Transaction Analyzer (OVTA)
  o Reduce the number of user interfaces and homogenize the look and feel of performance product UIs by using common UI components
  o Extend operating system coverage and simplify product configuration and maintenance of performance products by using common measurement service and database components
  o Build out-of-the-box integrations with 3rd party solutions in the areas of load scripting and capacity planning
Hewlett-Packard Company 2002. All Rights Reserved. Reproduction, adaptation or translation without prior written permission is prohibited except as allowed under the copyright laws. November 2002.
