MM3 - Reliability and Fault Tolerance in Networks
Service Level Agreements
Jens Myrup Pedersen
The General Problem
The Quality of Service (QoS) parameters for the Communication Network
specify the minimum requirements for, e.g.:

Network accessibility
Network availability
Network performance (capacity, delay etc.)
Network operation and maintenance

The Quality of Service (QoS) parameters for the Communication Network will
normally be specified in a Service Level Agreement (SLA) between the
Network Clients and the Network Operator.

The SLA also specifies how the QoS parameters should be measured in order
to validate the fulfilment, as well as actions to be taken in case of service
degradation.

In critical Distributed Real-Time System applications these parameters will
have a high impact on the network design in order to meet the overall system
requirements.

The basic parameters are concerned with accessibility and availability and are
expressed as a percentage of the agreed network service period.

Accessibility is the parameter for potential access to the network. Availability is
the parameter for the ability to communicate across the network.

The ability to meet the demands is based on the network architecture and the
Mean Time Between Failures (MTBF) and the Mean Time To Repair (MTTR)
of the individual nodes, lines and management system.

In critical Distributed Real-Time Systems the network QoS must be ensured by
a sufficient degree of redundancy and fast reaction in the network management
system.
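
As a rough illustration of how MTBF, MTTR and redundancy interact, the sketch below uses the standard steady-state relation A = MTBF / (MTBF + MTTR), multiplies availabilities for components that must all work, and combines redundant alternatives under the assumption of independent failures. The formulas are textbook approximations and all figures are invented for illustration; neither comes from the slides.

```python
from math import prod


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of a single component."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(availabilities):
    """All components must work (e.g. node, line, node)."""
    return prod(availabilities)


def parallel(availabilities):
    """Any one alternative suffices; assumes independent failures."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical figures, not taken from any SLA.
node = availability(mtbf_hours=1000.0, mttr_hours=4.0)
line = availability(mtbf_hours=2000.0, mttr_hours=8.0)

single_path = series([node, line, node])
redundant_lines = series([node, parallel([line, line]), node])

print(f"single path:     {single_path:.4%}")
print(f"redundant lines: {redundant_lines:.4%}")
```

Duplicating only the line improves the path availability, but the two nodes remain single points of failure, which is why the choice of topology matters.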

The cost of the network design can be very high. Optimising the choice of
network topology in the design phase therefore offers potentially large
savings.

For large networks this optimisation problem is unsolved in general as the
complexity in the network grows exponentially with the number of nodes.
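
One way to get a feel for this growth is to count candidate topologies. The sketch below assumes, as a simplification that is not stated in the slides, that every subset of the possible undirected links between n labelled nodes counts as one candidate topology.

```python
def possible_links(n):
    """Number of distinct undirected links between n nodes."""
    return n * (n - 1) // 2


def possible_topologies(n):
    """Each possible link is either present or absent."""
    return 2 ** possible_links(n)


for n in (4, 6, 8, 10):
    print(n, possible_links(n), possible_topologies(n))
```

Under this assumption the search space outgrows exhaustive evaluation after only a handful of nodes.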

Applying some basic design principles may not lead to optimal solutions, but it
at least gives solutions that solve the problem with predictable parameters.

Example from a general purpose network JANET, UK
Network availability for at least:
Availability of 99.6% to more than 90% of clients
Availability of 99% to more than 96.5% of clients
Availability of 97% to more than 98.5% of clients
Availability of 93% to more than 99.5% of clients
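
A minimal sketch of how the tiered availability targets above could be checked against measured data. The tier values are taken from the list; the function names and the sample client population are hypothetical.

```python
# (availability threshold, minimum share of clients) from the list above
TIERS = [(0.996, 0.90), (0.99, 0.965), (0.97, 0.985), (0.93, 0.995)]


def share_meeting(client_availabilities, threshold):
    """Fraction of clients whose availability reaches the threshold."""
    met = sum(1 for a in client_availabilities if a >= threshold)
    return met / len(client_availabilities)


def check_tiers(client_availabilities, tiers=TIERS):
    """Return (threshold, required share, achieved share, ok) per tier."""
    results = []
    for threshold, required in tiers:
        achieved = share_meeting(client_availabilities, threshold)
        # "to more than X% of clients" is read as a strict inequality
        results.append((threshold, required, achieved, achieved > required))
    return results


# Hypothetical measured availabilities for 1000 clients
clients = [0.999] * 950 + [0.992] * 30 + [0.975] * 12 + [0.94] * 5 + [0.90] * 3

for threshold, required, achieved, ok in check_tiers(clients):
    print(f"{threshold:.1%} to more than {required:.1%}: {achieved:.1%} ok={ok}")
```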
Mean time between failures of the service of at least:
1000 hours provided to 99% of clients
The target rate is less than 0.001 incidents per hour,
calculated each month by dividing the number of failures in
the best 99% of access points by the product of the number of
access points and the number of hours in the month.
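
The monthly incident rate described above can be sketched as follows. Note that 0.001 incidents per hour is the reciprocal of the 1000-hour MTBF target. The sketch assumes that "the number of access points" means the access points retained in the best 99%; the function name and the sample data are hypothetical.

```python
def monthly_incident_rate(failures_per_access_point, hours_in_month,
                          best_percent=99):
    """Failures per access point per hour over the best access points."""
    ranked = sorted(failures_per_access_point)  # fewest failures first
    keep = max(1, len(ranked) * best_percent // 100)
    best = ranked[:keep]
    return sum(best) / (len(best) * hours_in_month)


# Hypothetical month: 200 access points, 30 days
failures = [0] * 190 + [1] * 8 + [20] * 2
rate = monthly_incident_rate(failures, hours_in_month=30 * 24)
print(f"incident rate: {rate:.6f} per hour, target met: {rate < 0.001}")
```

Dropping the worst 1% of access points keeps a few badly affected sites from dominating the figure.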
End-to-end latency between any pair of clients for 128-octet packets of
less than a stated target time, which depends on the transmission
technology used, for 95% of transmissions over any thirty-minute period.
Latency is measured from the time the last bit of the packet enters the
first access line to the time the first bit of the packet exits the second
access line.
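
A minimal sketch of the 95% latency criterion above, applied to the samples from one thirty-minute window. The target time is not given in the slides, so the 20 ms value, the function name and the samples are hypothetical.

```python
def latency_target_met(latencies_ms, target_ms, required_share=0.95):
    """True if the required share of samples is below the target time."""
    within = sum(1 for t in latencies_ms if t < target_ms)
    return within / len(latencies_ms) >= required_share


# Hypothetical samples from one thirty-minute window
good_window = [10.0] * 96 + [50.0] * 4
bad_window = [10.0] * 94 + [50.0] * 6

print(latency_target_met(good_window, target_ms=20.0))
print(latency_target_met(bad_window, target_ms=20.0))
```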
Clients shall normally expect to be able to transmit and receive traffic
(from a number of sources) which, over any thirty-minute period, uses at
least 40% of the nominal capacity of their access line, once the overheads
of the data solely concerned with the transmission technology in use have
been discounted.

Performance Indicators and Service Levels for Domain Name Service:
Availability of the primary name server for the target domain of
99.5%
Availability of service from an available officially supported name
server of 99.95%.
Performance Indicators and Service Levels for NTP Time Service:
This service is intended for use by access points in constructing their own
distributed time services (RFC 1305).
Availability of each time reference of 98%,
MTBF of 800 hours.
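
To make these percentages concrete, the sketch below converts an availability target into the downtime it permits. The 30-day measurement period and the function name are assumptions for illustration.

```python
def allowed_downtime_hours(availability_percent, period_hours=30 * 24):
    """Downtime permitted by an availability target over the period."""
    return (100.0 - availability_percent) / 100.0 * period_hours


targets = [
    ("DNS primary name server", 99.5),
    ("DNS any supported name server", 99.95),
    ("NTP time reference", 98.0),
]
for name, percent in targets:
    hours = allowed_downtime_hours(percent)
    print(f"{name}: {percent}% allows {hours:.2f} h per 30 days")
```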

International Standards
To estimate and verify the quality of the various components in the network,
a number of measurements are specified in internationally agreed standards.
The ITU Recommendations G.821 and G.826 specify a set of communication
line parameters for SDH networks, primarily based on Bit Error Rates and
derived numbers.
The values will be part of the SLA between the end user and the network
service provider.

International Standards, SDH
The recommendation G.821 has the following definitions:
Errored second (ES), a one-second time interval in which one or more bit
errors occur.
Severely errored second (SES), a one-second time interval in which the bit
error rate exceeds 10^-3.
Unavailable second (US), a second in which the circuit is unavailable. A
circuit is considered to be unavailable from the first of at least 10
consecutive SES. The circuit is available again from the first of at least
10 consecutive seconds which are not SES.

Degraded minute (DM), a one-minute time interval in which the bit error
rate exceeds 10^-6.

Error-free second (EFS), a one-second time interval without any bit
errors.
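
The per-second definitions above can be sketched as a simple classifier over a record of bit-error counts. The function names, the 64 kbit/s bit rate and the sample data are assumptions for illustration; degraded minutes are left out to keep the sketch short.

```python
def classify_seconds(bit_errors, bit_rate):
    """Label each one-second interval 'EFS', 'ES' or 'SES'.

    An SES is also an errored second; only the most severe label is kept.
    """
    labels = []
    for errors in bit_errors:
        if errors / bit_rate > 1e-3:
            labels.append("SES")
        elif errors > 0:
            labels.append("ES")
        else:
            labels.append("EFS")
    return labels


def unavailable_flags(labels, run=10):
    """Mark unavailable seconds using the 10-consecutive-second rule."""
    flags = []
    unavailable = False
    for i in range(len(labels)):
        window = labels[i:i + run]
        # Near the end of the record the window is too short to decide,
        # so the current state is simply carried forward.
        if len(window) == run:
            if not unavailable and all(l == "SES" for l in window):
                unavailable = True
            elif unavailable and all(l != "SES" for l in window):
                unavailable = False
        flags.append(unavailable)
    return flags


# Hypothetical 30-second record on a 64 kbit/s circuit
errors = [0] * 3 + [100] * 12 + [0] * 15
labels = classify_seconds(errors, bit_rate=64_000)
flags = unavailable_flags(labels)
print(labels.count("SES"), sum(flags))
```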

In recommendation G.826 similar definitions are specified at the block
level.

The recommendation G.826 has the following definitions:
Errored second (ES), a one-second time interval containing one or
more errored blocks.
Errored block (EB), a block containing one or more errored bits.
Severely errored second (SES), a one-second time interval in which
more than 30% of the blocks are errored.
Unavailable second (US), as for G.821.
Background block error (BBE), an errored block that does not occur as
part of an SES.

A measurement time interval has to be specified, and the derived ratios for
ES, SES and BBE are the basis for the QoS parameters.

The recommended measurement time for G.821 and G.826 is 30 days.
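
A minimal sketch of how the block-based ratios could be derived from per-second counts of errored blocks. It assumes that unavailable seconds have already been excluded from the input and that the BBE ratio is taken over the blocks outside SES; the function name, the block rate and the sample data are hypothetical.

```python
def g826_ratios(errored_blocks_per_second, blocks_per_second):
    """Return the ES, SES and BBE ratios for a run of available seconds."""
    es = ses = bbe = 0
    for errored in errored_blocks_per_second:
        if errored == 0:
            continue
        es += 1
        if errored > 0.30 * blocks_per_second:
            ses += 1           # blocks inside an SES do not count as BBE
        else:
            bbe += errored
    seconds = len(errored_blocks_per_second)
    blocks_outside_ses = (seconds - ses) * blocks_per_second
    return {
        "ESR": es / seconds,
        "SESR": ses / seconds,
        "BBER": bbe / blocks_outside_ses if blocks_outside_ses else 0.0,
    }


# Hypothetical 10-second sample at 8000 blocks per second
sample = [0, 0, 5, 4000, 0, 1, 0, 0, 0, 0]
ratios = g826_ratios(sample, blocks_per_second=8000)
print(ratios)
```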
International Standards, ATM
The recommendation I.356 has the following definitions:
Cell Loss Ratio (CLR), the number of cells lost divided by the number of
cells transmitted.
Cell Error Ratio (CER), the number of errored cells divided by the
number of cells transmitted.
Cell Misinsertion Rate (CMR), the number of wrongly inserted cells in
a specified time interval.
Cell Transfer Delay (CTD), the time from when a cell enters a device under
test until it leaves the device.
Mean Cell Transfer Delay (mean CTD), the arithmetic mean of a number
of CTD values in a specified period.
Cell Delay Variation (CDV), the degree of variation in the cell
transfer delay (CTD) of a virtual connection.
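
The cell-level parameters above can be sketched from simple counters and delay samples. Using the peak-to-peak spread as the measure of CDV is an assumption made here for simplicity; the function names and the sample values are hypothetical.

```python
from statistics import mean


def cell_loss_ratio(cells_lost, cells_transmitted):
    return cells_lost / cells_transmitted


def cell_error_ratio(cells_errored, cells_transmitted):
    return cells_errored / cells_transmitted


def cell_misinsertion_rate(misinserted_cells, interval_seconds):
    return misinserted_cells / interval_seconds


def mean_ctd(delays_us):
    return mean(delays_us)


def peak_to_peak_cdv(delays_us):
    """One simple measure of delay variation: largest minus smallest CTD."""
    return max(delays_us) - min(delays_us)


# Hypothetical measurement: one million cells, delays in microseconds
delays_us = [100, 110, 105, 125]
print(cell_loss_ratio(5, 1_000_000))
print(cell_error_ratio(2, 1_000_000))
print(cell_misinsertion_rate(3, 3600))
print(mean_ctd(delays_us), peak_to_peak_cdv(delays_us))
```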
