Overview
Version 2.2
June 28, 2018
Matthew Wollman
Version  Date       Author                           Description
1.0      9/24/2012  Matthew Wollman                  First release of document after core team approval
2.0      2/13/2013  Matthew Wollman & Janet Crystal  Removed P1 and P2 differences. Combined Purpose and Scope, and objectives and policies. Reorganized roles and responsibilities by order of role involvement in process. Reorganized and reduced Process activities section to a high-level overview. Process activities will be detailed in separate documentation.
2.1      8/15/2014  Matthew Wollman                  Change to RACI, Service Owner is Accountable for External Communications, Removed C-Cure to Critical Services
A Major Incident is the interruption or degradation of a core production service (any centralized
HUIT-provided service that serves multiple customers and users) that disrupts its customers’ ability
to carry out University teaching, learning, research, and/or administration.
This document provides an overview of the processes that apply to every Major Incident for all HUIT
services and that all HUIT employees must follow. Once trained, all HUIT employees will be able to
identify a Major Incident and escalate it to the appropriate technical group for resolution.
Policies
1. HUIT’s focus is to alert the community to the occurrence of a Major Incident as quickly as possible.
Early notification of a potential issue is more important than an accurate description of the problem.
2. HUIT will use standardized methods and procedures to enable an efficient and prompt response,
analysis, documentation, ongoing coordination and ownership, communication, and reporting.
3. Escalation in a Major Incident will start with the Incident Commander and move to the HUIT
employees most responsible for each service.
4. HUIT will communicate with affected end-users regularly throughout the lifecycle of a Major
Incident.
5. HUIT will maintain a consistent and regular presence through open communications among HUIT
staff and will provide consistent updates to the Service Desk, Service Owner, Incident Manager, and
HUIT leadership.
6. HUIT will log and document all details of Major Incidents throughout the lifetime of each event.
Incident Commander
The Incident Commander has the highest level of responsibility during a Major Incident and is
accountable for its lifecycle through coordination, documentation, and communication. The roles of
HUIT Incident Commander and HUIT Incident Communicator may be combined in one person for
incidents that are of short duration or that are deemed less critical. For incidents of longer duration or
those with greater impact, the responsibility of the Incident Commander can be escalated to a Manager
or Director in HUIT.
1. A Major Incident affects one of the Critical Services listed in Appendix C of this document.
2. A Major Incident affects over 1,000 users of one or more services.
3. A Major Incident is not or cannot be resolved within four hours.
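The three criteria above can be sketched as a simple check. This is a minimal illustration rather than HUIT tooling: the `Incident` record, its field names, and the function name are hypothetical, and the sketch assumes that meeting any one criterion is enough, which the list itself does not state.

```python
from dataclasses import dataclass

# Thresholds taken from the criteria above.
USER_THRESHOLD = 1000        # "over 1,000 users"
RESOLUTION_LIMIT_HOURS = 4   # "within four hours"


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    affects_critical_service: bool  # service appears in Appendix C
    users_affected: int             # users affected across all services
    hours_to_resolve: float         # elapsed or estimated time to resolution


def is_major_incident(incident: Incident) -> bool:
    """Return True when any one criterion is met (an assumption of this sketch)."""
    return (
        incident.affects_critical_service
        or incident.users_affected > USER_THRESHOLD
        or incident.hours_to_resolve > RESOLUTION_LIMIT_HOURS
    )
```

If the criteria were meant to apply together, the `or` operators would become `and`; the source does not say which reading is intended.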
Service Desk
The Service Desk is responsible for the following activities:
SOC Operations
The SOC Operations group is responsible for the following activities:
Incident Coordination
The Incident Commander will involve and consult with all necessary parties to resolve the incident
as quickly as possible.
The Incident Commander will facilitate conference bridges to ensure that information is
disseminated in a timely manner, that time spent on the bridge is focused and that troubleshooting
can continue.
The Incident Commander will escalate the incident to additional resources, including hierarchical
escalations as necessary.
Conference Bridge
Once notified of a Major Incident, the Incident Commander will use a conference bridge that
includes all affected groups to maintain communication between the technical resources and the
service owner(s).
The Incident Commander will determine the appropriate schedule for calling a conference bridge
and its duration after the initial assessment.
External Communication
Throughout the Incident, HUIT will use its website as the primary location for information updates.
HUIT will distribute Incident notification(s) to external customers, add an outgoing message to the
Service Desk ACD system (as necessary), and send a tweet whose content will also appear on HUIT's
Facebook page and in Harvard’s Yammer community.
Investigation
HUIT will investigate continuously throughout a Major Incident and coordinate updates with
vendors, developers, and end-users.
Resolution
Service Owners have final sign-off authority on the resolution of a Major Incident and ensure
end-user notification.
Incident Documentation
The Incident Communicator will document the initial assessment of the incident's root cause (if
known), create a timeline, and establish the steps taken for investigation and resolution.
The Service Owner(s) and Technical Line Manager(s) will forward any notes or timelines that they
have maintained throughout the incident to the Incident Commander.
[Figure: Major Incident process flowchart. Labels recoverable from the diagram: Customer; Service Owner / Product Manager; Technical Line Manager; Technical Resources; Phone or Emails; Confirm Incident; Is this a Major Incident? (Yes / No: Assume Normal Process); Assume Incident Commander Role; Notify SD; Notify Service Owner / Product Manager; Escalate to ITSM (6-2831); Notify Huit-inf-alerts; Escalate to Appropriate Technical Resources; Coordinate add’l tech resources; Provide Technical Details / Log for Incident Report; Update to ITSM; Resolve Incident; Communicate Internally; Communicate Resolution; Open Incident Report; End. Legend: Incident Ownership Path; Coordination and Communication Tasks; Technical Resolution Tasks; Ownership Tasks.]
Incident Communicator
Technical Resource
SOC Operations
Service Owner
Service Desk
Activity
Incident Identification A R R R R R
Initial Communications A,R R R C C
Escalation A,R R R R R R
Incident Coordination A,R C C
Conference Bridge A,R R I C R
External Communication R R I I C A,R
Internal Communication C,I R A C,I
Investigation I I I R C,I A
Resolution A,R R R R
Incident Documentation A,R R C C C C C
T+30: Initial Communications
•Service Desk Updates ACD System
•Service Desk Logs Remedy Ticket
•Incident Commander Sends HUIT Alert
•Incident Commander Updates Website
•Service Owner or Incident Commander Sends External Notification
T+45: First Update
•Initial Diagnosis?
•Estimated Time to Resolution?
•Additional Communications Need to be Sent?
•Agree upon Update Times and Intervals (e.g., every 30 minutes)
T+Interval: Regular Updates
•Update on Progress?
•Additional Resources?
•Updated Communications?
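The communication timeline above can be turned into concrete clock times once an incident start time and an update interval are agreed. The sketch below is illustrative only. It assumes the T+ offsets are minutes after the incident is identified and that regular updates are counted from the first update at T+45; neither is stated in the source. The function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta


def communication_checkpoints(start, interval_minutes=30, regular_updates=3):
    """Return (time, label) pairs for the checkpoints in the timeline.

    Assumes T+30 and T+45 are minutes after `start`, and that regular
    updates follow the first update at the agreed interval.
    """
    checkpoints = [
        (start + timedelta(minutes=30), "Initial Communications"),
        (start + timedelta(minutes=45), "First Update"),
    ]
    for n in range(1, regular_updates + 1):
        when = start + timedelta(minutes=45 + n * interval_minutes)
        checkpoints.append((when, "Regular Update"))
    return checkpoints


# Example: incident identified at 09:00, updates agreed every 30 minutes.
start = datetime(2018, 6, 28, 9, 0)
for when, label in communication_checkpoints(start, 30, regular_updates=2):
    print(when.strftime("%H:%M"), label)
# 09:30 Initial Communications
# 09:45 First Update
# 10:15 Regular Update
# 10:45 Regular Update
```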
Critical Service—Any service whose failure or degradation creates an immediate and large-scale impact.
See Appendix C.
Incident Commander—The Incident Commander is responsible for the lifecycle of the Major Incident,
including coordination, documentation, and communication, and is the incident's owner.
Major Incident—A Major Incident occurs when a core production service is interrupted or degraded,
resulting in a noticeable disruption of the customers’ ability to carry out University teaching,
learning, research and administration.
Non-Core Service—Any HUIT service that is hosted or provided to one specific customer or group of
users for a non-centralized purpose.
Service Owner—In the context of the Major Incident process, the service owner is a HUIT staff member
who has a comprehensive view of the service including but not limited to customer and user
relationships, a broad understanding of the components required to deliver that service, and the
expectations for the quality set for that service.
Utility—The functionality offered by a service to meet a particular need. Utility can be summarized as
‘what a service does’, and can be used to determine whether a service is able to meet its
required outcomes or is ‘fit for purpose’. The business value of an IT service is created by a
combination of utility and warranty.
Warranty – Assurance that a product or service will meet agreed requirements. This may be a formal
agreement such as a service level agreement or contract, or it may be implied through ad-hoc
messages or agreements. Warranty refers to the ability of a service to be available when
needed, to provide the required capacity, and to provide the required reliability in terms of
continuity and security. Warranty can be summarized as 'how the service is delivered', and can
be used to determine whether a service is 'fit for use'. The business value of an IT service is
created by the combination of utility and warranty. See also service validation and testing.