
CIM4280

VMware vCenter Operations in the Real World


Martin Klaus, Group Manager, VMware, Inc.

Disclaimer

This session may contain product features that are currently under development.
This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features discussed or presented have not been determined.

Key Takeaways from this Session

Hear from customers who have implemented vCenter Operations:
Three different environments, three different use cases
Results and improvements over status quo
Future direction and next steps

Each customer will present their story followed by Q&A
Please hold your questions for the Q&A

This is not a product deep-dive session.
Attend labs and vCenter Operations sessions at VMworld

Panelists
Fletcher Cocquyt, Stanford School of Medicine
Brian Mack, Maximus
Ian Dodd, Kaiser Permanente

VMware vCenter Operations Editions


vCenter Operations Enterprise
vCenter Operations Advanced
vCenter Operations Standard

Performance Analytics
Capacity Monitoring
vSphere Change Events
+ Full Capacity Planning
+ VM Right-sizing
+ Capacity Reports
+ Performance Analytics for external performance data (adapters for DB, storage, etc. monitoring)
+ Customizable Dashboards
+ Full Configuration & Compliance Management

vSphere

VMware Cloud / vCenter

Non-VMware (incl. physical) environments



Fletcher Cocquyt, Principal Engineer, VCP, vExpert, Stanford University, School of Medicine

Stanford School of Medicine
How does vCenter Operations address our real world VI monitoring requirements?
Virtual Infrastructure:
Servers: 310 VMs on 21 ESXi hosts
Storage: 20 TB NFS datastores replicated on campus, DR site
Networking: 10Gb ESXi upgrades are 75% complete

Metrics:
25830 metrics monitored by static thresholds with Zabbix, Cacti, Big Brother

Agenda
Virtual Infrastructure (VI) comes with unique requirements in terms of monitoring
Challenges posed when monitoring a consolidated VI
How vCenter Operations addresses these for us
Examples of issues vCenter Operations identified and helped us resolve

Problem with Static Thresholds:

Require manual configuration
Using templates assumes one threshold fits all -> false positives
Tuning can be done, but requires domain knowledge and a large investment of time -> over-tuned -> missed alerts
Miss inefficiencies and transient issues, which can be serious
Problems are amplified in a consolidated environment
No intelligence to filter the signal from the noise

Information Overload/Data Inundation

Of those 25830 metrics, which are of actual interest to the sysadmin at any one moment (where is the signal in the noise)?
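
The one-threshold-fits-all problem can be illustrated with a minimal Python sketch. The two VMs, their CPU samples, and the 85% threshold are invented for illustration; they are not taken from the Stanford environment or from any monitoring product.

```python
# Hypothetical CPU utilisation samples (percent) for two VMs.
batch_vm = [88, 91, 93, 90, 92, 89]  # normally busy: high load is healthy here
idle_vm = [4, 5, 3, 45, 48, 5]       # normally idle: the jump to ~45% is the real change

TEMPLATE_THRESHOLD = 85  # one static threshold applied to every VM via a template


def static_alerts(samples, threshold):
    """Return the indices of samples that breach a fixed threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


busy_alerts = static_alerts(batch_vm, TEMPLATE_THRESHOLD)
idle_alerts = static_alerts(idle_vm, TEMPLATE_THRESHOLD)

print("batch_vm alerts:", busy_alerts)  # every sample alerts although behaviour is normal
print("idle_vm alerts:", idle_alerts)   # nothing alerts although behaviour changed sharply
```

The busy VM produces an alert on every sample (false positives), while the idle VM's tenfold jump never crosses the template threshold (a missed alert). Lowering the threshold to catch the second case makes the first case worse, which is the tuning burden described above.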

Problem: Finding Signal is Non-Trivial


There exist algorithms to detect anomalies in time series
Holt-Winters detects aberrant behavior: interesting, but not necessarily actionable
Lacks the holistic approach provided by derivative metrics like health and workload
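
For readers unfamiliar with Holt-Winters, the following is a rough sketch of additive Holt-Winters smoothing with a deviation band, in plain Python. The smoothing parameters, the `min_band` floor, and the choice to skip model updates on flagged points are assumptions made to keep the example short; the classic formulation updates on every point. The sample series is invented.

```python
def holt_winters_aberrations(series, season_len, alpha=0.5, beta=0.1,
                             gamma=0.3, width=3.0, min_band=5.0):
    """Return indices whose value falls outside the forecast band."""
    # Initialise level, trend and seasonal terms from the first season.
    level = sum(series[:season_len]) / season_len
    trend = 0.0
    seasonal = [value - level for value in series[:season_len]]
    deviation = [0.0] * season_len  # smoothed absolute forecast error per season slot
    flagged = []

    for t in range(season_len, len(series)):
        s = t % season_len
        forecast = level + trend + seasonal[s]
        error = abs(series[t] - forecast)
        if error > width * deviation[s] + min_band:
            flagged.append(t)
            continue  # assumption: an outlier does not update the baseline
        previous_level = level
        level = alpha * (series[t] - seasonal[s]) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[s] = gamma * (series[t] - level) + (1 - gamma) * seasonal[s]
        deviation[s] = gamma * error + (1 - gamma) * deviation[s]
    return flagged


# Hypothetical metric with a repeating 4-sample pattern and one injected spike.
cpu = [10, 20, 30, 20] * 4
cpu[9] = 70
aberrant = holt_winters_aberrations(cpu, season_len=4)
print(aberrant)
```

The sketch flags the spike, but it says nothing about whether the spike matters to the service or which other metrics moved with it. That is the gap between "aberrant" and "actionable" noted above.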

Enter: vCenter Operations


Incorporating: Derivative Metrics and Dynamic Thresholds
Why are Derivative Metrics significant?
Aggregate 1st level metrics (CPU, Memory, IO)
Reduce false positive alerts from 1st level metric solutions
Provide intelligence to filter noise and only present signal
Sysadmin may still drill down through derivative high level metric to underlying root cause in short order
Allow for alerting based on dynamic thresholds without need for tuning based on domain knowledge
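
As a rough illustration of the idea of a derivative metric, the sketch below rolls per-metric anomaly scores up into a single health number and keeps the drill-down order. The scoring formula, the function names, and the sample values are all hypothetical; this is not how vCenter Operations computes health.

```python
def health_score(anomaly_scores):
    """Combine per-metric anomaly scores (0 = normal, 100 = highly abnormal)
    into one 0-100 health value, where 100 is fully healthy."""
    if not anomaly_scores:
        raise ValueError("at least one first-level metric is required")
    average_anomaly = sum(anomaly_scores.values()) / len(anomaly_scores)
    return round(100 - average_anomaly)


def drill_down(anomaly_scores):
    """List first-level metrics from most to least abnormal."""
    return sorted(anomaly_scores, key=anomaly_scores.get, reverse=True)


# Hypothetical anomaly scores for one VM's first-level metrics.
vm_scores = {"cpu": 10, "memory": 90, "disk_io": 20}

health = health_score(vm_scores)
suspects = drill_down(vm_scores)
print("health:", health)
print("look first at:", suspects[0])
```

The sysadmin watches one number per object instead of every underlying metric, and only drills into the ranked list when that number drops.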

Dynamic Thresholds
Obviate the requirement for domain knowledge and large investment of time tuning signal from noise
Work with derivative metrics to virtually eliminate false positives
Detect inefficiencies and transient conditions missed by static thresholds
Provide a holistic context/perspective for the sysadmin monitoring the virtual infrastructure
By incorporating derived metrics and dynamic thresholds, vCenter Operations presents a drop-in solution to provide immediate impact and value detecting issues across the consolidated infrastructure
Some real world examples showing the actionable value
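
A dynamic threshold can be sketched as a band learned from each object's own history rather than a fixed number. The example below uses a simple mean plus or minus three standard deviations; this is an assumed stand-in for illustration, not the analytics vCenter Operations actually uses, and the sample histories are invented.

```python
from statistics import mean, stdev


def dynamic_band(history, width=3.0):
    """Derive a (low, high) normal range from a metric's own recent history."""
    center = mean(history)
    spread = stdev(history)
    return center - width * spread, center + width * spread


def is_anomalous(value, history, width=3.0):
    """True when the value falls outside the band learned from history."""
    low, high = dynamic_band(history, width)
    return value < low or value > high


# Hypothetical CPU utilisation histories (percent) for two VMs.
busy_history = [88, 91, 93, 90, 92, 89]
idle_history = [4, 5, 3, 5, 4, 6]

busy_flag = is_anomalous(92, busy_history)  # busy VM staying busy
idle_flag = is_anomalous(45, idle_history)  # idle VM jumping to 45%
print(busy_flag, idle_flag)
```

Because each VM is judged against its own baseline, the busy VM at 92% raises no alert while the idle VM at 45% does, with no per-VM tuning by the sysadmin.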

CPU Anomaly: SAS Application


Memory Contention: Mail Server

VMworld 2011


Memory Leak: Puppet daemon


CPU, Network and Disk IO


vCenter Operations correlates CPU, Network, Disk IO
Storm due to simultaneous IDS scans

[Figure panels: CPU, Network, Disk IO]

VM IO Contention: Storage vMotion


Conclusions and Future Directions


vCenter Operations is a drop-in solution to immediately address real world monitoring requirements for consolidated virtual environments
Before vCenter Operations our sysadmin team spent a huge amount of time tuning our monitoring to avoid false alarms
After vCenter Operations we can successfully filter the noise, presenting only actionable signal to the busy sysadmin
Will be essential as we deploy new initiatives including virtual desktops and private clouds for departments
Look forward to possible future integration with triggers for vCenter actions such as DRS, dynamic load balancing, chargeback, etc.
Thank you!



Brian Mack, Manager, Systems Engineering, Maximus

Virtual Environment
Infrastructure
15 ESX Servers
300 Virtual Machines
25 TB of Fibre Channel storage in a 3-tiered environment

Monitoring Tools (pre-vCenter Operations)
Uptime Software
vFoglight
vKernel
CapacityIQ

What was needed?
Improved communication and reporting of performance statistics to our internal and external clients.
Improved visibility for internal engineers to our virtual environment.
Better problem solving tools.

vCenter Operations

MAXIMUS Real World Issue

MAXIMUS Real World Issue (continued)

vCenter Operations with CapacityIQ

vCenter Operations + CapacityIQ
Better report utilization in CapacityIQ
Better Root Cause Analysis for clients
Health monitoring / Better resource utilization

Return on Investment (ROI)
Where are we now?
We now have vCenter Operations running with production licenses for 250 VMs.
vCenter Operations has become our main dashboard for monitoring our production and development virtual environments.
Cost Savings
No other licenses required for other previously used tools
Better resource management
Less time and expense required for troubleshooting client problems
Shorter turnaround time for new VM deployment



Ian Dodd, Director, Service Delivery Management Infrastructure Management Group, Data Center Services, Kaiser Permanente

Enterprise Overview
8M Members
15,000 Servers
Virtualize-first policy
20 PB Storage
Large Mainframe environment
Enterprise Management Vendors
HP
IBM
VMware
Oracle
Quest
MSFT
In-House Developed

The Challenges
Multiple management tools and consoles
Multiple approaches to performance management
Inconsistent approach to threshold management
High Ticket Volume due to threshold breaches
Proactive approach is challenging
Automation not fully exploited
Metric Acquisition inconsistent
Workload is ticket driven

Goals
Reduce consoles to two operational views:
Availability
Performance

50% reduction in Static Thresholds
Increased Fidelity & Specificity of alerts
Apply consistent metric attributes across Enterprise
Create dedicated approach to Performance Management

Approach
Acquisition
Apply best practices for enterprise wide metric attribute gathering

Availability
Create single pane of glass for availability monitoring on automated alerting

Performance
Establish Critical Application Support Team and tools focused on Performance management

Automation
Develop methodology for Availability and Performance Automation

Visualization
Develop and publish role based visualization for Enterprise Availability and Performance

Architecture
[Diagram labels: Remedy; Availability Console; Orchestration / Automation; vCOPS Critical Applications Support Team Performance Console; vCOPS Web Client (x4); vCOPS Line Of Business Analytics (x2); vCOPS Storage Analytics; vCOPS VI Analytics]

Performance Environment Design


Accomplishments
Performance & Availability consoles selected
Comprehensive Best practice metrics defined across Enterprise
Critical Application Support Team (CAST) established
1st vCOPS Line Of Business (LOB) environment in production
Identification of Static thresholds to be removed in progress
Decreased Problem Life Cycle in Production LOB
Automation process established in CAST team
Increased Fidelity and routing of alerts

Q&A
Fletcher Cocquyt, Stanford School of Medicine
Brian Mack, Maximus
Ian Dodd, Kaiser Permanente

vCenter Operations Sessions at VMworld

CIM2285 - Performance, Capacity and Configuration Management Converge in the Cloud
Steve Henning, Jai Malkani

CIM2449 - Introduction to vCenter Operations
Kit Colbert, Marcelo Rodriguez

CIM2452 - vCenter Operations Technical Deep Dive
Kit Colbert

CIM3218 - Performance Management in a Heterogeneous Environment
Dave Overbeek, David LaVigna

CIM2921 - VC Ops - Best Practices for Managing Virtual Capacity
Hemaint Gaidhani, Rajesh Kasanagotu

EUC2045 - View and Virtual Cloud Infrastructure: What's New
Robert Baesman, Jessy Schoss

Martin Klaus mklaus@vmware.com


© 2011 VMware, Inc. All rights reserved.
