
FASTER PERFORMANCE FOR DYNAMIC HTML PAGES


BY SHARAD JAISWAL

INTRODUCTION
As the web evolves to deliver more
engaging and interactive experiences, one
prominent outcome is the increasing size
and complexity of web pages. As reported
by HTTP Archive, the average page size of the
top 100 websites has grown from 400KB to
1300KB over the past five years, due to the
increase in heavy images and complex
JavaScript and HTML code. This creates
some serious web performance challenges.

At Instart Logic we address this challenge with our software-defined application delivery (SDAD) platform that optimizes the
delivery of the underlying components of complex web pages, such as images (Image Streaming), and JavaScript
(JavaScript Streaming).
However, another prominent trend is that for a significant percentage of these sites, the underlying HTML itself is not
cacheable. According to HTTP Archive, nearly 40-50% of the analyzed sites have explicit do-not-cache directives in the HTTP
response headers. Our internal analysis of the Internet Retailer Top 100 websites (a collection of the most popular e-commerce websites) suggests that almost half of web pages are not cacheable.
A page is typically marked as non-cacheable when it involves a degree of personalization, a trend that is increasingly
common across a wide range of web sites. Since personalization requires the execution of some server-side business logic,
such pages can incur significant delays. Dynamic pages usually represent some of the most interactive, media-rich
(and thus latency-prone) pages on the web. Yet their non-cacheable nature conflicts with the traditional approach to
speeding up the delivery of web objects, which is to cache them and serve them from local browser storage or the edge.
So we asked ourselves: is there a systematic way to bring better performance to modern, hard-to-cache, dynamic HTML
web pages? Instart Logic's answer to this question is our new SmartSequence technology with HTML Streaming.

HTML STREAMING: WHAT IT IS AND HOW IT WORKS


HTML Streaming is a novel, principled and transparent approach to the delivery of dynamic HTML pages from our SDAD
service. The basic insight is that an HTML page should not be treated as a monolithic object, but as being made up of two
types of components:
- elements that change rarely across requests
- elements that change frequently (for example, across users due to personalization)
Given an HTML page, our goal is to identify and store the rarely-changing HTML elements on an Instart Logic edge server, so they
can be served quickly to an end user's browser when a request arrives. We term this cacheable subset a stub, and it includes
the client-side Nanovisor. The non-cacheable elements are freshly fetched from the origin, and then patched into the
elements already received.
Take a look at the timing diagram below, and let's assume a request is triggered from the browser at time t1. It is received by
the HTML Streaming service on an Instart Logic server, and if the stub for the HTML page is present, then the client immediately
gets a response, which arrives by time t3. If instead the stub had not been present, the request would have gone all the way to the
origin and waited through the server processing delay, and the response would have arrived at the client at time t7. The difference
t7 - t3 is the head start a browser gets because of HTML Streaming.
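
To make the head start concrete, here is a minimal sketch of the arithmetic. The latency figures and the function name timeline_ms are hypothetical. The model assumes the stub reaches the client one client-to-edge round trip after t1, and that the unaccelerated response additionally pays one edge-to-origin round trip plus the origin's processing delay. Transfer time and edge processing are ignored.

```python
def timeline_ms(client_edge_rtt, edge_origin_rtt, origin_processing, t1=0.0):
    """Return (t3, t7) in milliseconds under the simplified model above.

    t3: the cached stub arrives at the client.
    t7: the full page would arrive at the client if no stub were cached.
    All inputs are hypothetical latencies, not measured values.
    """
    t3 = t1 + client_edge_rtt
    t7 = t1 + client_edge_rtt + edge_origin_rtt + origin_processing
    return t3, t7


# Hypothetical numbers: 40 ms to the edge, 80 ms edge-to-origin, 300 ms of server-side logic.
t3, t7 = timeline_ms(client_edge_rtt=40, edge_origin_rtt=80, origin_processing=300)
head_start = t7 - t3  # time the browser can spend parsing the stub and fetching its resources
```

In this model the head start equals the edge-to-origin round trip plus the origin processing delay, so pages with heavier server-side logic have more to gain.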

Now, after sending the stub, the Instart Logic server will make a request to the origin. When the response arrives back at the
server (at time t5), the HTML Streaming service within the server compares the HTML in the response to the HTML sent out
earlier in the stub. Any differences are patched by sending instructions to the client Nanovisor. If the resulting patch is found
to be unsafe, then the server and the client work in conjunction to reload the page automatically before anything is shown to
the end user.
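
The compare-and-patch step can be sketched as follows. This is an illustrative model only, not Instart Logic's implementation: the function build_patch, the instruction format, and the safety rule are all assumptions. Here a patch counts as safe only if it inserts new elements. Any change to or removal of an element already sent in the stub is treated as unsafe, because the browser may already have acted on it.

```python
from difflib import SequenceMatcher


def build_patch(stub, fresh):
    """Compare the elements sent in the stub with the fresh origin response.

    Both arguments are lists of serialized elements (a simplification).
    Returns (instructions, safe). The safety rule is an assumption made
    for this sketch: only pure insertions are considered safe.
    """
    instructions = []
    safe = True
    matcher = SequenceMatcher(a=stub, b=fresh, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            # New elements to place before stub position i1.
            instructions.append(("insert", i1, fresh[j1:j2]))
        else:
            # "replace" or "delete" touches content the browser already has.
            safe = False
    return instructions, safe


stub = ["<link rel='stylesheet' href='site.css'>", "<script src='app.js'></script>"]
fresh = [
    "<link rel='stylesheet' href='site.css'>",
    "<meta name='user' content='alice'>",
    "<script src='app.js'></script>",
]
instructions, safe = build_patch(stub, fresh)
if not safe:
    instructions = [("reload",)]  # hypothetical fallback instruction
```

For the inputs above, the patch consists of a single insertion of the personalized <META> element between the two cached elements, and no reload is needed.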

The head start a browser receives when it processes the stub can result in substantial performance gains (up to 40% on certain
crucial web page performance metrics such as Start Render and DOM Content Loaded).

CHALLENGES FOR HTML STREAMING


Our first iteration of HTML Streaming could only be applied to pages where the <HEAD> portion of the HTML was the same
across all users. Now with our new enhancements, the service can handle even the most dynamic HTML.
However, transparently (for the origin) creating a subset of a dynamic HTML page that can be speculatively pre-executed and
patched to create the full page exposes several challenging technical problems. We will now discuss some of these challenges.
First, to safeguard end-user privacy, any user-specific information should be identified and removed from the stub. Second, the
patching should ensure the execution order of both cached and patched-in scripts is maintained as in the original page, and that
the page loads correctly. Third, the stub should evolve and keep up with changes in the origin content (while retaining the above
two properties). Finally, any unsafe or incorrect behavior caused by the pre-execution of the cached stub in the browser should be
detectable, and corrective action initiated if it happens.
A key design goal of HTML Streaming was thus to learn a "safe" stub, that is, a subset of the HTML which satisfies the criteria
outlined above, and allows us to detect unsafe pre-execution of its content, if any. Now let's take a deeper look into how we
compute a safe stub.

LEARNING A SAFE STUB


The starting point behind building a safe, cacheable stub is to periodically examine requests over a learning period, and identify
elements in the head which are common across requests. This, however, does not ensure safe execution. For example, consider
a <HEAD> with a <META> element of the form

<meta id="csrf-token" value="dkked32">

Assume that the value attribute of this <META> element changes across requests, and hence this element is not included in the
cached stub (and will be subsequently patched in). Now suppose there is a <SCRIPT> element, present in the stub, further down
the <HEAD> which accesses this <META> element. This could lead to problems in the page load, since the accessed <META>
element was not included in the stub. To deal with this issue, we virtualize the changing element. This entails removing all
sensitive (changing) attributes, and then using the Nanovisor to set up a watch by intercepting all access functions for this
element. The watch allows us to determine if a subsequent patching of this changing element is safe or not.
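
The idea of a watch on a virtualized element can be modeled in a few lines. This is a Python model of the logic, not the browser-side Nanovisor code: the class WatchedElement and its methods are hypothetical names. The assumed rule is that patching is safe only if nothing read a stripped attribute before the fresh value arrived.

```python
class WatchedElement:
    """Model of a virtualized element whose changing attributes were stripped."""

    def __init__(self, stable_attrs, stripped_names):
        self._attrs = dict(stable_attrs)      # attributes kept in the stub
        self._stripped = set(stripped_names)  # attributes removed from the stub
        self.premature_reads = []             # stripped attributes read too early

    def get_attribute(self, name):
        # Intercepted access function: record reads of attributes that are
        # not yet available, since the caller saw a value that will change.
        if name in self._stripped:
            self.premature_reads.append(name)
            return None
        return self._attrs.get(name)

    def patch(self, fresh_attrs):
        """Apply fresh values; return True if doing so is safe.

        Assumed rule: the patch is safe only if no stripped attribute
        was read before the patch arrived.
        """
        safe = not self.premature_reads
        if safe:
            self._attrs.update(fresh_attrs)
            self._stripped -= set(fresh_attrs)
        return safe


# The <META> element from the example above, with its changing value stripped.
meta = WatchedElement({"id": "csrf-token"}, ["value"])
meta.get_attribute("id")                        # harmless: "id" is stable
patch_ok = meta.patch({"value": "dkked32"})     # nothing read "value" early
```

If a script in the stub had called get_attribute("value") before the patch arrived, patch would return False, which corresponds to the case where the page has to be reloaded.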
In addition, there are other conditions (e.g. preserving the execution order of scripts) that must also be satisfied to
ensure correctness, which we will not go into in this blog post.
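
The starting point described at the beginning of this section, finding what is common across sampled responses, can be sketched as below. This is a simplified illustration under stated assumptions, not the production algorithm: learn_stub is a hypothetical name, each <HEAD> is represented as a list of (tag, attributes) pairs, and elements are aligned purely by position. Attributes whose values differ across samples are stripped, and the element is flagged as virtualized.

```python
def learn_stub(samples):
    """Derive a stub from several sampled <HEAD> sections.

    samples: list of heads, each a list of (tag, attrs_dict) pairs.
    Returns a list of (tag, stable_attrs, virtualized) triples.
    Positional alignment is a simplifying assumption of this sketch.
    """
    stub = []
    first = samples[0]
    for index, (tag, attrs) in enumerate(first):
        # Stop at the first position where the samples disagree structurally.
        if any(index >= len(head) or head[index][0] != tag for head in samples):
            break
        # Keep only attributes that have the same value in every sample.
        stable = {
            name: value
            for name, value in attrs.items()
            if all(head[index][1].get(name) == value for head in samples)
        }
        # If anything was dropped, the element needs a watch at run time.
        virtualized = len(stable) < max(len(head[index][1]) for head in samples)
        stub.append((tag, stable, virtualized))
    return stub


samples = [
    [("meta", {"id": "csrf-token", "value": "dkked32"}), ("script", {"src": "app.js"})],
    [("meta", {"id": "csrf-token", "value": "9fj2kd0"}), ("script", {"src": "app.js"})],
]
stub = learn_stub(samples)
```

With these two samples, the <META> element stays in the stub with only its id attribute and is marked as virtualized, while the <SCRIPT> element is cached unchanged.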

AUTO-TUNED LEARNING WITH SMARTSEQUENCE


HTML Streaming has several moving parts involved in the creation of a safe/performant stub, which have to adapt to a wide range
of web sites and updates in the site content. The SmartSequence technology powers and monitors the HTML Streaming feature
to ensure this adaptation is transparent to the origin and end users.
As dynamic HTML flows through the service, SmartSequence allows the system to learn which portions of the
HTML are unique and truly dynamic, and to monitor for any requests that trigger a reload, along with the reason. Based
on this information, the system automatically adjusts the length of the up-front learning period and even adjusts for which pages the feature
is active, on a per-URL (and even per-browser) basis, all by learning from live production traffic. This process is continuous and
allows the system to automatically evolve as the website or user behavior changes over time.
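
One way to picture this feedback loop is a per-URL, per-browser monitor that switches the feature off when reloads become too frequent. The sketch below is an assumption-laden illustration: the class ReloadMonitor, the threshold values, and the simple rate rule are invented for this example and do not describe how SmartSequence actually decides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UrlStats:
    served: int = 0      # responses served with a stub
    reloads: int = 0     # of those, how many triggered a reload
    active: bool = True  # whether HTML Streaming stays enabled


class ReloadMonitor:
    """Tracks reload rates per (url, browser) pair; thresholds are hypothetical."""

    def __init__(self, max_reload_rate=0.05, min_samples=20):
        self.max_reload_rate = max_reload_rate
        self.min_samples = min_samples
        self.stats = defaultdict(UrlStats)

    def record(self, url, browser, reloaded):
        """Record one served request; return whether the feature stays active."""
        entry = self.stats[(url, browser)]
        entry.served += 1
        entry.reloads += int(reloaded)
        enough_data = entry.served >= self.min_samples
        if enough_data and entry.reloads / entry.served > self.max_reload_rate:
            entry.active = False  # fall back to unaccelerated delivery and relearn
        return entry.active


monitor = ReloadMonitor(max_reload_rate=0.25, min_samples=4)
for reloaded in (False, False, False, True, True):
    still_active = monitor.record("/cart", "chrome", reloaded)
```

In this example the fifth request pushes the reload rate for that URL and browser to 2 out of 5, above the hypothetical 25% limit, so the feature is deactivated for that pair while other pairs are unaffected.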

CONCLUSION
In summary, modern web sites are moving towards personalization for better user engagement. However, often this comes at the
cost of performance due to non-cacheable dynamic HTML. At the same time, users are growing increasingly impatient and want
to view content as soon as possible. Performance is thus an important imperative for these web sites.
Instart Logic's HTML Streaming feature, powered by SmartSequence technology, is a new mechanism to accelerate dynamic web
page performance and improve user experience. Evaluations of HTML Streaming applied to Internet Retailer Top 100 sites with
dynamic HTML content demonstrate significant performance gains for a wide range of sites. In fact, we have observed gains
greater than 20-30% on a range of metrics such as Start Render, Load Time and Speed Index, for 20-40% of the sites considered
(depending on the metric). These gains hold across first and repeat views, and end user connection types (wired cable or mobile
3G).
HTML Streaming is being deployed today by several of our customers who are enjoying these great performance benefits.

Visit our Blog for more information
