
. . : : IQMS : : . .



Availability Management
File Identifier: PR_MIT_AV
Effective Date: 20 Dec 2008
Version: 0.5
Information Classification: Internal;Restricted

1. Goals and Objectives
The goals and objectives of the Availability Management process are to:
- Define, analyze, plan, measure and improve all aspects of the availability of IT services. Availability Management is responsible for ensuring that all IT infrastructure, processes, tools, roles, etc. are appropriate for the agreed availability targets.
- Produce and maintain an appropriate and up-to-date Availability Plan that reflects the current and future needs of the business.
- Provide advice and guidance to all other areas of the business and IT on all availability-related issues.
- Ensure that service availability achievements meet or exceed all their agreed targets, by managing service- and resource-related availability performance.
- Assist with the diagnosis and resolution of availability-related incidents and problems.
- Assess the impact of changes on the Availability Plan and on the performance and capacity of all services and resources.
- Ensure that proactive measures to improve the availability of services are implemented wherever it is cost-justifiable to do so.

2. Scope
Availability Management is concerned with the design, implementation, measurement and management of IT infrastructure availability and application availability, to ensure that the agreed business requirements for availability are consistently met. Availability Management:
- Should be applied to new IT services, and to existing services where Service Level Agreements (SLAs) have been established with internal or external suppliers.
- Should appropriately include services deemed to be business critical (e.g. power supply (UPS), links in network monitoring) even if formal SLAs do not exist.
- Should include the suppliers (internal and external) that form the IT support organisation, as a precursor to the creation of a formal SLA.
- Includes all aspects of the IT infrastructure and supporting organisation that may impact availability. This also includes aspects such as training, skills, policy, process effectiveness, procedures and tools.

3. Entry criteria and inputs
- The availability requirements of the business for either a new or an enhanced IT service
- The availability, reliability and maintainability requirements for the IT infrastructure components that underpin the IT service(s)

http://fm.wipro.co.in/IQMS_FS/Pages_IMS/Procedures/Service_Design/Availability/Availability_Management... 9/21/2009


- Data on achievement of service levels vis-à-vis agreed targets for each IT service that has an agreed SLA
- Information from Incident and Problem records regarding failing services or components, and the frequency of failure of services or components
- Monitoring and configuration data pertaining to agreed IT services

4. Activity details

4.1 Identify Availability Requirements (M)
Requirements need to be identified based on the following inputs:
- Business requirements for availability from the Service Portfolio Management process
- Functional and technical specifications from Service Design
- Service Catalogue
- SLA/SLR
- Service reports
- SIP from the CSI process

4.2 New Availability Requirement?
Check whether the previous step has identified any new availability requirement (for example, a new business requirement or a change to existing services). This can be done by verifying the availability requirement against the SLA/Service Catalogue.
- If a new availability requirement is identified, proceed to step 4.3.
- If no new availability requirement is identified, proceed to step 4.10.

4.3 Verify SLA/Service Catalogue for new availability requirements
Verify the SLA or Service Catalogue for the following:
- Current capability and availability provisioning abilities
- Whether any new functions need to be covered as Vital Business Functions (VBFs)
- The impact of the new availability requirements on the existing availability design

4.4 Identify Vital Business Functions
Identify whether any new VBFs, or business-critical elements of the business processes supported by the IT services, are being added as part of the availability requirements.

4.5 Design for Availability (M)
Prepare the technical design of the IT service and align the internal and external suppliers required to meet the availability requirements of the business. The technical design will include infrastructure, environment, data and applications.
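When the technical design combines several components, one common way to estimate the end-to-end availability the design can deliver is to multiply the availabilities of components that must all be up (in series) and to combine redundant components in parallel. The sketch below is illustrative only: it assumes component failures are independent, and the component names and availability figures are hypothetical, not taken from any Wipro template.

```python
from math import prod


def series_availability(availabilities):
    """Every component must be up: A = A1 * A2 * ... * An."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """A redundant group is up if at least one member is up:
    A = 1 - (1 - A1) * (1 - A2) * ... * (1 - An)."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical design: two load-balanced web servers in front of a
# single application server and a single database server.
web_tier = parallel_availability([0.99, 0.99])
end_to_end = series_availability([web_tier, 0.995, 0.998])

if __name__ == "__main__":
    print(f"Web tier availability:   {web_tier:.4%}")
    print(f"End-to-end availability: {end_to_end:.4%}")
```

Comparing the result with the availability target in the SLA indicates whether the design needs further redundancy; in this hypothetical case the single application server contributes most of the expected unavailability.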
A risk assessment of the environment should be conducted to ensure that all environmental factors are considered as part of the availability design. The design for availability should also encompass the planned and preventative maintenance activities that enable the IT support organization to provide:
- Preventative maintenance to avoid failures
- Planned software or hardware upgrades to provide new functionality or additional capacity
- Business-requested changes to the business applications


- Implementation of new technology and functionality for exploitation by the business

The design for availability should also produce the Projected Service Outage (PSO) document, which records any variations from the service availability agreed within SLAs. The PSO should be produced based on input from:
- The Change Schedule (Forward Schedule of Changes)
- The release schedules
- Planned and preventative maintenance schedules
- Availability testing schedules
- ITSCM and Business Continuity Management testing schedules
The PSO document contains details of all scheduled and planned service downtime within the agreed service hours for all services. These documents should be agreed with all the appropriate areas and representatives of both the business and IT.
(Refer to the Availability Design Guidelines document for more information)

4.6 Design for Recovery
Prepare the technical design of the IT service to ensure that, in the event of an IT service failure, the service and its supporting components can be reinstated so that normal business operations can resume as quickly as possible.
To remain effective, the maintainability of IT services and components should be monitored, and their impact on the expanded incident lifecycle understood, managed and improved. This can be achieved by one or more of the following methods:
- Component Failure Impact Analysis (CFIA)
- Single Point of Failure (SPoF) analysis
- Fault Tree Analysis (FTA)
- Service Failure Analysis (SFA)
- Risk analysis and management
Wipro uses Component Failure Impact Analysis (CFIA) as its standard methodology to design recovery models for the availability of a service or component.
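A CFIA is usually recorded as a grid of infrastructure components against the IT services they support. The sketch below assumes a simplified coding in which "X" means the component's failure causes the service to fail and "A" means an alternative (redundant) component keeps the service running; the actual codes and layout of the CFIA Template may differ, and all component and service names are hypothetical. For each component it lists the services the component can take down on its own, i.e. candidate single points of failure, ranked by how many services they affect.

```python
# Hypothetical CFIA grid: component -> {service: code}.
# "X" = failure of this component causes the service to fail.
# "A" = an alternative (redundant) component keeps the service running.
# A service missing from a row is not affected by that component.
CFIA_GRID = {
    "UPS":         {"Email": "X", "Payroll": "X", "CRM": "X"},
    "Core switch": {"Email": "X", "Payroll": "X", "CRM": "X"},
    "Web server":  {"CRM": "A"},
    "DB server":   {"Payroll": "X", "CRM": "X"},
    "Mail server": {"Email": "A"},
}


def single_points_of_failure(grid):
    """Return {component: sorted services it can take down on its own}."""
    spofs = {}
    for component, impacts in grid.items():
        affected = sorted(s for s, code in impacts.items() if code == "X")
        if affected:
            spofs[component] = affected
    return spofs


def rank_by_impact(spofs):
    """Order components by how many services they can take down (most first)."""
    return sorted(spofs.items(), key=lambda item: len(item[1]), reverse=True)


if __name__ == "__main__":
    for component, services in rank_by_impact(single_points_of_failure(CFIA_GRID)):
        print(f"{component}: {len(services)} service(s) at risk: {', '.join(services)}")
```

Components at the top of the ranking are natural candidates for redundancy or documented recovery procedures in the recovery design.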
(Refer to the Recovery Design Guidelines document for more information)
(Refer to the Expanded Incident Lifecycle Guidelines document for more information)
(Refer to the Component Failure Impact Analysis (CFIA) Template for more information)

4.7 Test Availability Design (M)
Once the availability design and the recovery criteria are defined, the availability mechanisms need to be tested. Some availability mechanisms, such as load balancing, mirroring and grid computing, are used in the provision of normal service on a day-to-day basis; others are used on a fail-over or manual reconfiguration basis. It is therefore essential that all availability mechanisms are tested in a regular and scheduled manner to ensure that they work when they are actually needed.
This test schedule needs to be maintained and widely circulated so that all areas are aware of its content, and so that all other proposed activities can be synchronized with it, such as:
- The change schedule
- Release plans and the release schedule
- All transition plans, projects and programs
- Planned and preventative maintenance schedules


- The schedule for testing IT service continuity and recovery plans
- Business plans and schedules

4.8 Availability Test Successful?
- If the availability test is successful, proceed to step 4.15.
- If the test is unsuccessful, proceed to step 4.9.

4.9 Requirements to be gathered again?
If the availability tests fail, check whether the requirements need to be gathered again, to ensure that the design for availability meets the requirements and that the tests can therefore be conducted successfully.
- If the requirements are to be gathered again, proceed to step 4.1.
- If the requirements are not to be gathered again, proceed to step 4.5.

4.10 Monitor Availability as per defined metrics (M)
Monitor the actual service and component availability delivered versus the agreed targets, and store all monitoring-related data in the AMIS (Availability Management Information System) for further analysis and reporting.

4.11 Analyze service and component availability (M)
Based on the service and component availability monitoring data stored in the AMIS, perform a detailed analysis of actual performance against the availability targets. Investigation of the monitoring data should also bring out the contribution of events, incidents and SPoFs to the unavailability of services and components, with remedial actions being implemented within either the Availability Plan or the overall SIP (Service Improvement Plan).
Trends should be produced from this analysis to direct and focus activities such as Service Failure Analysis (SFA) on the areas causing the most impact or disruption to the business and the users. Trends related to the unavailability of a service or component should also focus on the cost of unavailability. This monetary value can be calculated as a combination of the tangible costs associated with failure, but can also include a number of intangible costs. It should also reflect the cost impact to the whole organization, i.e. both the business and the IT organization.
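The comparison of delivered availability against agreed targets (steps 4.10 and 4.11) and a rough cost-of-unavailability estimate can be sketched as below. This is a minimal sketch under stated assumptions: downtime is recorded in minutes within the agreed service hours, MTBF and MTRS are simple period averages (with MTBSI taken as MTBF + MTRS), and only a few of the tangible cost items are modelled; intangible costs are not. The outage durations, the 99.5% target and all rates are hypothetical.

```python
def availability_pct(agreed_service_minutes, downtime_minutes):
    """(Agreed service time - downtime) / agreed service time, as a percentage."""
    downtime = sum(downtime_minutes)
    return 100.0 * (agreed_service_minutes - downtime) / agreed_service_minutes


def reliability_metrics(agreed_service_minutes, downtime_minutes):
    """Period averages over the failures recorded in the period:
    MTRS  = mean downtime per failure,
    MTBF  = mean uptime per failure,
    MTBSI = MTBF + MTRS."""
    failures = len(downtime_minutes)
    if failures == 0:
        raise ValueError("no failures recorded in the period")
    downtime = sum(downtime_minutes)
    mtrs = downtime / failures
    mtbf = (agreed_service_minutes - downtime) / failures
    return {"MTBF": mtbf, "MTRS": mtrs, "MTBSI": mtbf + mtrs}


def tangible_cost(downtime_hours, affected_users, cost_per_user_hour,
                  revenue_per_hour, penalties=0.0):
    """Lost user productivity + lost revenue + imposed fines/penalties."""
    lost_productivity = downtime_hours * affected_users * cost_per_user_hour
    lost_revenue = downtime_hours * revenue_per_hour
    return lost_productivity + lost_revenue + penalties


# Hypothetical month: 30 days x 10 agreed service hours, three outages.
AGREED_MINUTES = 30 * 10 * 60
OUTAGES = [45, 90, 45]
TARGET_PCT = 99.5

if __name__ == "__main__":
    achieved = availability_pct(AGREED_MINUTES, OUTAGES)
    status = "met" if achieved >= TARGET_PCT else "breached"
    print(f"Achieved {achieved:.2f}% vs target {TARGET_PCT}%: {status}")
    print(reliability_metrics(AGREED_MINUTES, OUTAGES))
    hours_down = sum(OUTAGES) / 60
    print("Tangible cost:", tangible_cost(hours_down, 50, 40.0, 2000.0, 5000.0))
```

With these figures the service achieves 99.0% against the hypothetical 99.5% target, the kind of shortfall that the trend check in step 4.13 is meant to pick up.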
Tangible costs can include:
- Lost user productivity
- Lost IT staff productivity
- Lost revenue
- Overtime payments
- Wasted goods and material
- Imposed fines or penalty payments
Intangible costs can include:
- Loss of customers
- Loss of customer goodwill (customer dissatisfaction)
- Loss of business opportunity (to sell, gain new customers or revenue, etc.)
- Damage to business reputation
- Loss of confidence in the IT service provider
- Damage to staff morale

4.12 Report service and component availability (M)
The service and component monitoring data gathered in the AMIS is analyzed, mapped to the Key Performance Indicators (KPIs) of the Availability Management process and reported to the identified stakeholders.

4.13 Any trends identified in unavailability of service or component? (M)
From the reports generated, check whether any trends are identified in the unavailability of the service or component.
- If trends are identified, proceed to step 4.14.
- If no trends are identified, proceed to step 4.10.
The Availability Management team may also need to revisit the design for availability to re-check the configured parameters. This ensures that meaningful trends can be generated from the availability data within the limitations of the availability design for the service or component.

4.14 Update service improvement plan
If any trends are identified in the unavailability of a service or component, the steps identified to improve availability should be documented as part of the Service Improvement Plan and reported as part of the Continual Service Improvement process. The identified trends could also lead to changes to the availability design, the capacity plan or the SLAs maintained by Service Level Management. The Availability Manager is responsible for passing such inputs to the right process owner, to ensure that the corrective actions identified for improving availability are addressed.

4.15 Create/Update availability plan (M)
The Availability Plan is created for new availability requirements, or updated for existing services or components. The basic contents of the Availability Plan may include, but are not restricted to:
- Actual levels of availability versus agreed levels of availability for key IT services. Availability measurements should always be business- and customer-focused and report availability as experienced by the business and users.
- Activities being progressed to address shortfalls in availability for existing IT services. Where investment decisions are required, options with associated costs and benefits should be included.
- Details of changing availability requirements for existing IT services. The plan should document the options available to meet these changed requirements. Where investment decisions are required, the associated costs of each option should be included.
- Details of the availability requirements for forthcoming new IT services. The plan should document the options available to meet these new requirements. Where investment decisions are required, the associated costs of each option should be included.
(Refer to the Availability Plan Template for more information)

4.16 Implement availability plan
Once the Availability Plan is created or updated, the methodologies described in it are implemented by raising a Request for Change and following the Change Management process, so that changes to availability parameters, or to the infrastructure or component design needed to meet the availability requirements, are implemented in an authorized manner. After implementation, the services and components become part of existing operations, and regular monitoring of their availability commences, iteratively, as described from step 4.10 onwards.

5. Risk Assessment and Management (M)

5.1 Identify Risks
Identify potential failure modes and their impact on service availability. A series of risk categories is identified, and a suite of potential risks is listed for each category.
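The rating and prioritisation described in steps 5.3 to 5.5 that follow can be sketched as below. This is a minimal sketch that assumes the usual FMEA convention in which a higher detection rating means the risk is less likely to be detected before it affects availability, so that a higher RPN always means a higher priority; the authoritative rating scales are those in the Risk Register Template guideline, and the risks listed are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    severity: int    # 1-10: impact on service availability if the risk occurs
    occurrence: int  # 1-10: probability of occurrence
    detection: int   # 1-10: assumed FMEA convention, 10 = hardest to detect

    def __post_init__(self):
        # Reject ratings outside the 1-10 scale used in step 5.3.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number = severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritise(risks):
    """Highest RPN first, i.e. the order in which countermeasures are planned."""
    return sorted(risks, key=lambda r: r.rpn, reverse=True)


# Hypothetical entries for a Risk Register.
RISKS = [
    Risk("Single UPS feeding the server room", severity=9, occurrence=3, detection=2),
    Risk("Database cluster fail-over never tested", severity=8, occurrence=4, detection=6),
    Risk("Expired hardware support contract", severity=5, occurrence=5, detection=3),
]

if __name__ == "__main__":
    for risk in prioritise(RISKS):
        print(f"RPN {risk.rpn:4d}  {risk.description}")
```

After a countermeasure is implemented (step 5.7), the risk is re-scored in the register; for example, if regular fail-over testing lowered the hypothetical database risk's occurrence and detection ratings to 2 each, its RPN would fall from 192 to 32.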


5.2 Identify potential causes of failure and current process controls
Identify the likely causes of each availability risk. Each risk cause is a particular aspect of availability that is likely to be exposed to risk during the operational lifecycle.

5.3 Assign severity, occurrence and detection of risks
Rate the severity, the probability of occurrence and the likelihood of detection of each risk on a scale of 1 to 10. (Refer to the guideline in the Risk Register Template)

5.4 Define Risk Priority Number
The Risk Priority Number (RPN) is the product of the severity, probability of occurrence and likelihood of detection ratings of the identified risk. The higher the RPN, the greater the impact of the risk on the availability of services.

5.5 Identify countermeasures and recommended actions
For each identified risk, in priority order, list:
- The preventative actions to be taken to reduce or avoid the likelihood of the risk occurring
- The contingent actions to be taken to reduce the impact should the risk eventuate

5.6 Create/Update Risk Register (M)
Once risks have been identified and prioritized and countermeasures proposed, the Availability Manager needs to keep track of these risks by creating or updating the Risk Register. (Refer to the Risk Register Template for more information)

5.7 Implement Countermeasures (M)
Once a risk has been identified and recorded in the Risk Register, the identified countermeasures should be implemented by following the Change Management process. Once the countermeasures are implemented, return to step 5.1 to update the risk in the Risk Register. Because the countermeasures have been implemented, the severity, occurrence and detectability ratings of the risk are now reduced, and the Risk Priority Number is reduced accordingly.

6. KPIs
- Percentage improvement in overall end-to-end availability of services
- Improvement in the MTBF (Mean Time Between Failures)
- Improvement in the MTBSI (Mean Time Between System Incidents)
- Reduction in the MTRS (Mean Time to Restore Service)
- Percentage reduction in critical-time failures, e.g. where specific business peak and priority availability needs are planned for
- Percentage improvement in business and users satisfied with the service (by CSS results)
- Percentage reduction in the cost of unavailability
- Percentage improvement in service delivery costs
- Timely completion of regular risk analysis and system review
- Effective review and follow-up of all SLA, OLA and underpinning contract breaches

7. Work Items/Outputs
- Availability Plan
- Availability Design
- Availability Reports
- Availability Test Reports
- Risk Register

8. Summary

Tasks Identify Vital Business Function Design for Availability

Responsibility Availability Management Team Availability Management Team Availability Management Team Availability Management Team

Suggested Template CFIA Template

Verification /Validation Mechanism Review

Review Checklists

Approval Authority Availability Manager

Risk Register

Review

Availability Manager

Design for Recovery Test Availability Design Preparation of Availability Plan

Risk Register

Review

Availability Manager Availability Manager Availability Manager

Review Availability Plan Review -

Availability Management Team

Implement Availability Plan Monitoring of services

Availability Management Team Availability Management Team

Availability Plan Availability Plan Downtime Tracker

Review

Availability Manager

Review

Availability Manager

Analyze Service and Component Availability

Availability Management Team

Review

Availability Manager

Generation of Availability Reports Update Service Improvement Plan

Availability Management Team Availability Management Team

Availability Report SIP

Review

Availability Manager

Review

Availability Manager

9. References
- Availability Management Process
- Expanded Incident Lifecycle Guidelines
- Recovery Design Guidelines
- Service Level Management - Sustenance Process
- Incident Management Procedure
- Problem_Management Procedure
- Capacity_Management Procedure
- Change Management Procedure
- Service Asset_Configuration_Management Procedure


© 2009 Wipro Limited. All rights reserved. Internal;Restricted.
