5 Keys to Building High Availability Web Applications

for Service and Microservice Based Systems


Lee Atchison, Principal Cloud Architect and Advocate
Confidential ©2008-16 New Relic, Inc. All rights reserved.

New Relic Template 2015

You had power most of the time.
Why are you complaining?

How do you keep an application operational?

5 Keys to High Availability Web Applications

Key 1

Key 2

Key 3

Key 4

Key 5

Key 1: Build applications keeping availability in mind
OR: Develop for failure

Services will fail... always.

As a Service Developer

Your response to a dependency failure must be:

Predictable
Understandable
Reasonable for the given dependency failure

How should I respond when a dependency fails?

Provide a graceful backoff

Don't know something? Don't show it!

Don't show a drop-down list of accounts if you can't contact the account service
Don't show an image (or show a placeholder) if you can't determine which image to show
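
The "don't know it, don't show it" rule above can be sketched as follows. This is a minimal illustration, not code from the talk: `AccountServiceUnavailable`, `build_page_context`, and the two fetch functions are hypothetical names, and the sketch assumes the page template hides the drop-down when `show_account_picker` is false.

```python
from typing import Callable, List


class AccountServiceUnavailable(Exception):
    """Raised by the (hypothetical) account client when the service cannot be reached."""


def build_page_context(user_id: str,
                       fetch_accounts: Callable[[str], List[str]]) -> dict:
    """Build the data for a page, leaving out anything we could not determine."""
    context: dict = {"user_id": user_id, "show_account_picker": False, "accounts": []}
    try:
        accounts = fetch_accounts(user_id)
    except AccountServiceUnavailable:
        # Don't know the accounts? Don't show the drop-down. The page still renders.
        return context
    context["accounts"] = accounts
    context["show_account_picker"] = True
    return context


def failing_fetch(user_id: str) -> List[str]:
    """Stand-in for an account service that is down."""
    raise AccountServiceUnavailable("account service timed out")


def working_fetch(user_id: str) -> List[str]:
    """Stand-in for a healthy account service."""
    return ["checking", "savings"]
```

With `failing_fetch` the page context comes back without the account picker; with `working_fetch` the picker is enabled. Either way the caller always receives a usable context.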

Example (Real Life)

Our web application was showing a page
An avatar represented the customer on each page
A third-party system generated the avatar
One day, that third-party system failed
The app didn't know what to do, so it failed too

Our application was completely down, all because of a minor icon missing...

Why did this cause your application to fail?

It didn't know how to respond.

It could have:

Recognized the failure of the third-party provider as soon as possible
Substituted a generic image (or removed it) when the service failure was detected

The Circuit Breaker pattern would help a lot here
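
One way the Circuit Breaker pattern could apply to the avatar example is sketched below. This is a simplified illustration under stated assumptions, not the implementation used in the real incident: the `CircuitBreaker` class, its thresholds, and `PLACEHOLDER_AVATAR` are all hypothetical. After a run of consecutive failures the breaker stops calling the provider for a cool-down period and returns the generic image immediately.

```python
import time
from typing import Callable, Optional

PLACEHOLDER_AVATAR = "/static/generic-avatar.png"  # hypothetical fallback image


class CircuitBreaker:
    """Stops calling a failing dependency for a cool-down period."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def call(self, func: Callable[[], str], fallback: str) -> str:
        if self.is_open():
            return fallback  # fail fast without touching the dependency
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback
        self.failures = 0
        return result
```

A caller would wrap each avatar lookup, for example `breaker.call(fetch_avatar, PLACEHOLDER_AVATAR)`, where `fetch_avatar` is whatever function contacts the provider. The injectable `clock` is a design choice that lets the cool-down be tested without sleeping.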

How should I respond when a dependency fails?

Provide a graceful backoff

Fail as early as possible:

Don't propagate bad data: once you determine a piece of data is invalid, discard it as soon as possible
Validate input given: reject bad input immediately
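
Failing early can be sketched as validation at the service boundary, before any expensive work starts. The request shape here (`account_id`, `page_size`), the `MAX_PAGE_SIZE` limit, and the `InvalidRequest` exception are assumptions made for illustration only.

```python
from dataclasses import dataclass


class InvalidRequest(ValueError):
    """Raised as soon as a request is known to be bad."""


@dataclass(frozen=True)
class AccountLookup:
    account_id: int
    page_size: int


MAX_PAGE_SIZE = 100  # hypothetical limit


def parse_account_lookup(raw: dict) -> AccountLookup:
    """Validate at the service boundary; reject before doing any real work."""
    try:
        account_id = int(raw["account_id"])
        page_size = int(raw.get("page_size", 20))
    except (KeyError, TypeError, ValueError) as exc:
        raise InvalidRequest(f"malformed request: {exc!r}") from exc
    if account_id <= 0:
        raise InvalidRequest("account_id must be positive")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise InvalidRequest(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    return AccountLookup(account_id, page_size)
```

Only a fully validated `AccountLookup` travels further into the service, so bad data is discarded at the first point where it can be recognized.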

Example (Real Life)

The account service was having performance problems

Someone was sending bad requests
The service tried to process the requests
Customers felt a performance problem
The system had browned out (and eventually failed)

So, what brought our application to its knees?

Input to the service was obviously bad
Yet, we attempted to use the input
The result was a failed service

The Lesson

Key 1: Build applications keeping availability in mind
OR: Develop for failure

Key 2: Always think about scaling
OR: Just because your application works now does not mean it will work tomorrow

Just because your application works now does not mean it will work tomorrow

Why?

Most web applications have increasing traffic patterns
Traffic will increase, double, triple, 10x sooner than you think
Don't build it for today's traffic; build it for tomorrow's traffic

Build for tomorrow might mean:

Build in the ability to increase the size and capacity of your databases.

Determine what logical limits exist to your data scaling. What happens when your database tops out in its capabilities?

Build your application so that you can add additional application servers easily. This often involves being observant about where and how state is maintained, and how traffic is routed.*

Think about caching. What information can be cached? What can't? Why can't it?

Redirect static traffic to offline providers.

Think about whether specific pieces of dynamic content can actually be generated statically.

* This topic is large enough for an entire chapter, even an entire book, on its own.
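
As one illustration of the caching point above, here is a small time-to-live cache. It is only a sketch: `TTLCache` and `get_or_load` are hypothetical names, and the right expiry time depends on how stale each piece of information is allowed to be.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Caches values for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: skip the expensive call
        value = loader()  # missing or expired: reload and remember when
        self._entries[key] = (now, value)
        return value
```

Content that rarely changes can sit behind a long expiry, while per-user or fast-changing data may not be cacheable at all, which is exactly the "What can't? Why can't it?" question.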

Example: Is It Static or Dynamic?

Non-static content
Banner is now static
Personalized content can be added in browser

Key 1: Build applications keeping availability in mind
OR: Develop for failure

Key 2: Always think about scaling
OR: Just because your application works now does not mean it will work tomorrow

Key 3: Mitigate risk

All Systems Have Risk in Them

There is risk that a:

Server will crash
Database will get corrupted
Returned answer will be incorrect
Network connection will fail
Newly deployed piece of software will fail

Risk is a measure of the likelihood of a surprise occurring


Risk

Keeping a system available requires removing risk
Hence, removing surprise

But as systems become more and more complicated...
... this becomes less and less possible

Risk

Risk Management is at the heart of building highly available systems:

Managing what your risk is
Managing how much risk is acceptable
Knowing what you can do to mitigate the risk

Risk

Knowing what you can do to mitigate the risk

Risk mitigation

Risk Mitigation

Risk mitigation is part of risk management

Risk mitigation:

Knowing what to do when a problem occurs in order to reduce the impact of the problem
Making sure your application works as best and as completely as possible, even when services and resources fail

Risk Mitigation

Risk mitigation requires thinking about the things that can go wrong and putting a plan together now, to be able to handle the situation when it does happen.

Key 1: Build applications keeping availability in mind
OR: Develop for failure

Key 2: Always think about scaling
OR: Just because your application works now does not mean it will work tomorrow

Key 3: Mitigate risk

Key 4: Monitor availability
OR: Yes, we can help you

Monitor Availability

Understand how your application is performing

Use application monitoring:

Keep an eye on how your app is performing
Generate notifications when the application performs in abnormal ways

Make sure your app is properly instrumented:

Internal as well as external to your app

Monitor Availability

Have your tools monitor continuously

Establish a baseline for how your application is performing
Look for trends and patterns
Look for outliers and deviations from the trends
Treat these as potential availability issues

As your system grows:

Examine how your baseline changes
Make sure your scalability plan will continue to work
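
The baseline-and-outlier idea above can be sketched with a simple standard-deviation check. This is an assumption-laden illustration rather than how any particular monitoring product works: `is_outlier` is a hypothetical helper, the threshold of three standard deviations is an arbitrary starting point, and real traffic often needs a seasonal or rolling baseline.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_outlier(baseline: Sequence[float], sample: float, threshold: float = 3.0) -> bool:
    """Flag a sample more than `threshold` standard deviations from the baseline mean."""
    if len(baseline) < 2:
        return False  # not enough history to define a baseline
    center = mean(baseline)
    spread = pstdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return sample != center
    return abs(sample - center) / spread > threshold
```

Fed with recent response times, a flagged sample would be treated as a potential availability issue and investigated, not assumed to be noise.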

Service Level Agreements

Establish Internal SLAs

Critical to building a scalable application
Quick diagnoses
Hot spots to optimize performance

The only way to scale an organization in a reliable way is with reliable SLAs

Key 1: Build applications keeping availability in mind
OR: Develop for failure

Key 2: Always think about scaling
OR: Just because your application works now does not mean it will work tomorrow

Key 3: Mitigate risk

Key 4: Monitor availability
OR: Yes, we can help you

Key 5: Availability response
OR: Yes, that was your pager that went off

Responsiveness

When a problem occurs

Do you know what to do to fix the problem?
Does everyone on your team know what to do?
Do you have playbooks?
Does your pager rotation and notification system work?

Responsiveness

You must be prepared to act on issues. This means:

Alerts that reach the needed individuals
Prepared processes and procedures for common failure modes (this is part of the risk mitigation process)

Responsiveness

When an alert is triggered

The owner of that service must be the first one alerted

Other teams may want to be alerted as well:

Services that are tightly dependent on the triggered service
Early warning notification for upstream or downstream issues
May want a second-level notification for dependencies
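
The alerting order above (owner first, dependent teams second) could be derived from a service dependency map, as in this sketch. The service names, team names, and `notification_plan` function are all hypothetical, and only direct consumers are considered.

```python
from typing import Dict, List

# Hypothetical service graph: each service maps to the services it calls.
DEPENDS_ON: Dict[str, List[str]] = {
    "web": ["accounts", "avatars"],
    "accounts": ["database"],
    "avatars": [],
    "database": [],
}

OWNERS: Dict[str, str] = {  # hypothetical team names
    "web": "team-web",
    "accounts": "team-accounts",
    "avatars": "team-media",
    "database": "team-data",
}


def notification_plan(failing_service: str) -> Dict[str, List[str]]:
    """Owner is paged first; owners of direct consumers get a second-level notice."""
    consumers = sorted(
        service for service, deps in DEPENDS_ON.items() if failing_service in deps
    )
    owner = OWNERS[failing_service]
    second_level = [OWNERS[s] for s in consumers if OWNERS[s] != owner]
    return {"page_first": [owner], "notify_second": second_level}
```

For a failure in `accounts`, this pages `team-accounts` first and sends `team-web` an early warning, since `web` consumes `accounts`.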

Responsiveness

BEFORE the problem occurs:

Well-established plans
Documented processes and cheat sheets
Contact lists for critical consuming service owners

Clear, precise escalation plan:

Who to contact if the problem becomes too big for the responder to handle
If the scope of the problem extends significantly and critically beyond the failing system

Know who to escalate to if the first responder doesn't respond
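
An escalation plan for the case where the first responder does not respond can be written down as an ordered chain, sketched below. The chain contents and the `page` callback are assumptions; in practice `page` would wrap whatever paging system is in use and report whether the contact acknowledged in time.

```python
from typing import Callable, List, Optional

# Hypothetical escalation chain, in the order people should be paged.
ESCALATION_CHAIN: List[str] = ["on-call engineer", "secondary on-call", "engineering manager"]


def escalate(chain: List[str], page: Callable[[str], bool]) -> Optional[str]:
    """Page each contact in order until one acknowledges; return who acknowledged."""
    for contact in chain:
        if page(contact):  # page() returns True when the contact acknowledges in time
            return contact
    return None  # nobody responded: the plan itself needs attention
```

Keeping the chain as plain data makes it easy to review before an incident, which is the point of preparing it in advance.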

5 Keys to High Availability Web Apps

Key 1: Build applications keeping availability in mind
Key 2: Always think about scaling
Key 3: Mitigate risk
Key 4: Monitor availability
Key 5: Availability response

Thank you for your time!
Questions?

Lee Atchison

lee@newrelic.com
www.leeatchison.com

@leeatchison

Architecting for Scale

Published by: O'Reilly Media
Available: May 2016
www.architectingforscale.com

leeatchison