Preliminary Project Report

On

Scaling REST Services


Submitted by
Ashwin Habbu
Luv Varma
Shreya Tripathi
Utsav Dusad
in partial fulfillment for the award of the degree
of

Bachelor of Engineering
of
University of Pune
in
Information Technology

Pune Institute of Computer Technology


2013-14

Preliminary Project Report


on

Scaling REST Services


submitted by

Ashwin Habbu
Luv Varma
Shreya Tripathi
Utsav Dusad

Guided By
Prof.S.S Pande
Mr.Amol Pujari
Mr. Prasad Joshi

I. T. Department
Pune Institute of Computer Technology.
S. No. 27, Dhankawadi, Pune Satara Road, Pune 411043.
University of Pune
2013-14

Department of Information Technology

Certificate
This is to certify that,
Ashwin Habbu
Luv Varma
Shreya Tripathi
Utsav Dusad
have successfully completed this project report entitled Scaling REST Services,
under my guidance in partial fulfillment of the requirements for the degree of
Bachelor of Engineering in Department of Information Technology of University
of Pune during the academic year 2013-14.

Date:- October 18, 2013


Place:- Pune

Prof.S.S.Pande
Guide

Dr.Emmanuel.M.
HoD

Acknowledgement

We take this opportunity to thank our project guide Prof.S.S.Pande and
Head of the Department Dr.Emmanuel.M. for their valuable guidance and for
providing all the necessary facilities, which were indispensable in the completion
of this project report. We are also thankful to all the staff members of the
Department of Information Technology of Pune Institute of Computer Technology,
Pune for their valuable time, support, comments, suggestions and persuasion. We
would also like to thank the institute for providing the required facilities, Internet
access and important books.

Ashwin Habbu
Luv Varma
Shreya Tripathi
Utsav Dusad

Abstract

REST services play a very important role in building modern web applications
and services. Good web application design involves several distributed
components, including one or more REST services backed by SQL and NoSQL data
streams. Scaled applications and services allow billions of users to access
them seamlessly. While static data can be cached at every stage, dynamic data
needs to be refreshed regularly and quickly. REST is of significant importance
in achieving this. Hence we focus on making a REST service serve its contents
at very high speed. This project aims to create a REST service deployed using a
new and/or custom micro web server or framework that supports only REST
features. It targets GET requests only.

Contents

List of Figures

1 Introduction
1.1 Need
1.2 Basic Concepts
1.2.1 Scaling
1.2.2 REST

2 Aim

3 Objective

4 Literature Survey
4.1 REST
4.1.1 SOAP, A Competitor to REST
4.2 HTTP
4.2.1 HTTP Verbs
4.2.2 GET
4.2.3 Representation
4.2.4 Response Codes
4.3 Scalability of REST Services
4.3.1 How does Caching help in Scaling?
4.4 MVC
4.5 Tools Studied
4.5.1 cURL
4.5.2 Apache Bench

5 Problem Statement

6 Project Requirements
6.1 Hardware Requirements
6.2 Software Requirements

7 Design
7.1 UML Diagrams

8 Planning and Scheduling

9 References

List of Figures

4.1 Behaviour of Passive Model
4.2 Behaviour of Active Model
7.1 Use Case Diagram
7.2 Class Diagram
7.3 Deployment Diagram
7.4 Sequence Diagram
7.5 Communication Diagram
7.6 Activity Diagram

Chapter 1
Introduction
1.1 Need

REST services play a crucial role in building modern web applications and
services. Good web application design involves several distributed components,
including one or more REST services backed by SQL and NoSQL data streams.
Caching is usually employed when serving content over HTTP, so that applications
and their services can scale seamlessly to serve up to billions of users at a
time. Caching works well for static data, but dynamic data must be refreshed
and served at extremely high speed. REST has a major role to play in this. This
project deals with creating REST services that can serve such content at
extremely high speed, without having to deal with session management or any
similar modules that are usually required for a full stack web application.

1.2 Basic Concepts

Understanding the scaling of REST services requires a basic understanding of
the terms REST and scaling. These terms are explained below.

1.2.1 Scaling

Scalability is a desirable property of a system: it indicates the system's
ability either to handle growing amounts of work gracefully or to be readily
enlarged as demand increases. In RPC-style architectures, for example SOA, a
service is composed of fine-grained, user-defined operations, and different
services have different interfaces. Each interface has its own semantics and
operating parameters, so the interface contract is critical to the service
definition. If a client wants to interoperate correctly with Web services in
SOA, it has to understand the semantics of each interface contract. This
approach causes no trouble in a relatively closed application environment. In
an open distributed environment with an unforeseeable number of operations,
however, it may cause problems of tight coupling and interface complexity.
These problems prevent the distributed system from achieving Web-scale
scalability. Given the current number of sites on the Web, imagine that each
Web site defined its own special interface and asked the Web browser to
download or write a plug-in to adapt to it. Without the plug-in, the browser
would not understand the semantics of the interface and could not interact with
that site. REST takes a different approach from RPC in the matter of
interfaces: the uniform interface constraint (constraint 3). It means that in a
REST architecture all resources expose the same interface to the client.

1.2.2 REST

REST stands for Representational State Transfer. It relies on a stateless,
client-server, cacheable communications protocol; in almost all cases this is
HTTP. REST is an architectural style. The basic idea behind it is that, instead
of using complex mechanisms such as CORBA, RPC or SOAP to connect machines,
simple HTTP is used to make calls between them. REST applications use HTTP
requests to post data, read data and delete data.

Chapter 2
Aim
The aim of this project is to create a REST service deployed using a new or custom
micro web server and/or framework that supports only REST features. By doing
this we want to make the service scalable according to the application and able
to serve a large number of clients at extremely high speed. The service will be
customized for GET requests only.

Chapter 3
Objective
A number of other protocols built on top of HTTP are designed specifically for
building web services. They use HTTP as a transport protocol for enormous
XML-encoded messages. The resulting service is very complex, almost impossible
to debug, and will not work unless the clients have exactly the same setup. The
main issue is that HTTP is used merely as a means of sending XML-encoded data,
when HTTP can send the contents as they are. RESTful web services require none
of this internal complexity, so the effort can instead be spent on additional
features or on making multiple services interact. We want to create or
customise a web framework that deploys a RESTful web service that is
lightweight, scalable and serves clients at very high speed. It will, however,
be customized for GET requests only.

Chapter 4
Literature Survey
Our project at GSLAB requires us to understand different web service
architectures, such as SOAP and REST, and the tools used to test the
performance of web services. Sections 4.1 to 4.3 cover REST, HTTP and its
verbs, and the scalability of REST services. Sections 4.4 and 4.5 cover the MVC
pattern and the tools used to test the performance of web services.

4.1 REST

REST stands for Representational State Transfer. It relies on a stateless,
client-server, cacheable communications protocol; in almost all cases this is
HTTP. REST is an architectural style. The basic idea behind it is that, instead
of using complex mechanisms such as CORBA, RPC or SOAP to connect machines,
simple HTTP is used to make calls between them. REST applications use HTTP
requests to post data, read data and delete data. Thus REST uses HTTP for the
create, read, update and delete operations. REST is a lightweight alternative
to these mechanisms. Some of its features are:
1. It is platform independent: it does not matter whether, for example, the
server runs Windows and the client runs Unix.
2. It is language independent: server and client can be written in different
languages.
3. It can easily be used in the presence of firewalls.
4. It runs on top of HTTP, so it is based on the standards of HTTP.

4.1.1 SOAP, A Competitor to REST

SOAP stands for Simple Object Access Protocol. It is an XML-based protocol for
accessing a Web service that lets applications exchange information over HTTP.
Binary data sent over it must first be encoded into a predefined format. We
mention SOAP because the significance of REST is easier to understand when it
is compared with another protocol. REST does not require binary data or
resources to be encoded; they can simply be delivered on request. The main
difference between the two is that SOAP transfers messages (typically over
HTTP) by first formatting each message in XML and then sending it, whereas in
REST the client and server can simply exchange data as JSON, XML or even plain
text. REST is lightweight, does not require all the complex standards that SOAP
does, and is simpler to use.
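
As a rough illustration of the difference in message overhead, the sketch below
encodes the same record once as a plain JSON document and once wrapped in a
SOAP 1.1 style envelope. The record and the GetUserResponse element name are
hypothetical and exist only for this example; a real SOAP service would define
its own message schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record used only for illustration.
record = {"id": "12345", "email": "myemail@gmail.com"}

def as_json(data):
    """REST-style representation: the data is sent as it is."""
    return json.dumps(data)

def as_soap(data):
    """SOAP-style representation: the same data wrapped in an XML envelope."""
    ns = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
    envelope = ET.Element("{%s}Envelope" % ns)
    body = ET.SubElement(envelope, "{%s}Body" % ns)
    response = ET.SubElement(body, "GetUserResponse")  # hypothetical element name
    for key, value in data.items():
        ET.SubElement(response, key).text = value
    return ET.tostring(envelope, encoding="unicode")

json_message = as_json(record)
soap_message = as_soap(record)
print(len(json_message), "characters as JSON")
print(len(soap_message), "characters as SOAP")
```

The envelope and body elements are pure overhead from the client's point of
view, which is the complexity a REST service avoids.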

4.2 HTTP

HTTP is the protocol that allows documents to be sent back and forth on the web.
A protocol is a set of rules that determines which messages can be exchanged and
which messages are appropriate replies to others. In HTTP there are two roles:
server and client. In general, the client always initiates the conversation and
the server replies. HTTP is text based; that is, messages are essentially
pieces of text, although the message body can also contain other media. HTTP
messages are made of a header and a body. The body contains the data to be
transmitted over the network and can often remain empty. The header contains
metadata, such as encoding information; in the case of a request, it also
contains the all-important HTTP method. In the REST style, header data is often
more significant than the body.

4.2.1 HTTP Verbs

Each request specifies an HTTP verb, or method, in the request header. It is
the first word, in capital letters, of the request's first line. An example of
the GET method being used is:
GET / HTTP/1.1
HTTP verbs tell the server what to do with the data identified by the URL. The
request can optionally contain additional information in its body, which might
be required to perform the operation, for instance data you want to store with
the resource. You can supply this data in cURL with the -d option. If you have
ever created HTML forms, you will be familiar with two of the most important
HTTP verbs: GET and POST. There are, however, more HTTP verbs available. The
most important ones for building a RESTful API are GET, POST, PUT and DELETE.
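
To make the position of the verb concrete, the following minimal sketch splits
a request line into its three parts. The function name parse_request_line is an
assumption made for this example, and the sketch handles only well-formed
request lines.

```python
def parse_request_line(line):
    """Split an HTTP request line into (method, target, version)."""
    method, target, version = line.strip().split(" ")
    return method, target, version

# The four verbs named above as most important for a RESTful API.
REST_VERBS = {"GET", "POST", "PUT", "DELETE"}

method, target, version = parse_request_line("GET / HTTP/1.1")
print(method, target, version)
print(method in REST_VERBS)
```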

4.2.2 GET

GET is the simplest type of HTTP request method; it is the one browsers use
each time you click a link or type a URL into the address bar. It instructs the
server to transmit the data identified by the URL to the client. Data should never
be modified on the server side as a result of a GET request. In this sense a
GET request is read-only, but once the client receives the data it is free to
do anything with it on its own side, for instance format it for display.

4.2.3 Representation

The HTTP client and HTTP server exchange information about resources identified
by URLs. Both the header and the body are pieces of the representation. The
HTTP headers, which contain metadata, are tightly defined by the HTTP
specification; they can only contain plain text and must be formatted in a
certain manner. The body can contain data in any format, and this is where the
power of HTTP truly shines: you can send plain text, pictures, HTML and XML in
any human language. Through request metadata or different URLs, you can choose
between different representations of the same resource. For example, you might
send a web page to browsers and JSON to applications.
The HTTP response should specify the content type of the body. This is done
in the header, in the Content-Type field; for instance:
1. Content-Type: application/json
For simplicity, an application may send only JSON back and forth, but it should
be architected in such a way that the format of the data can easily be changed
to suit different clients or user preferences.
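
Choosing a representation from request metadata can be sketched as follows.
This is a simplified illustration: the user record and the represent function
are hypothetical, and the Accept header is matched by substring only, without
the quality values that full content negotiation would take into account.

```python
import json

# Hypothetical resource used only for illustration.
user = {"id": 12345, "name": "myusername"}

def represent(resource, accept):
    """Return (content_type, body) for the representation the client asked for."""
    if "text/html" in accept:
        rows = "".join("<li>%s: %s</li>" % (k, v) for k, v in resource.items())
        return "text/html", "<ul>%s</ul>" % rows
    # Default to JSON for applications and unknown clients.
    return "application/json", json.dumps(resource)

content_type, body = represent(user, "application/json")
print("Content-Type:", content_type)
print(body)
```

Keeping the choice of format in one function is what makes it easy to add a
further representation later without touching the rest of the service.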

4.2.4 Response Codes

HTTP response codes standardize a way of informing the client about the result
of its request.
In a PHP application, for example, headers are set by calling the header()
function with the raw header line as a string argument. Headers must come first
in the response, so nothing else should be output before the headers are done.
The HTTP server may also be configured to add other headers in addition to
those specified in the code.
Headers contain all sorts of meta information, for example the text encoding
used in the message body or the MIME type of the body's content. The same
mechanism is used to specify the HTTP response code explicitly. By default,
PHP returns a 200 response code, which means that the request was successful.
The server should return the most appropriate HTTP response code; this way,
the client can attempt to repair its errors, if there are any. Most people
are familiar with the common 404 Not Found response code, but many more are
available to fit a wide variety of situations.
Keep in mind that the meaning of an HTTP response code is not extremely
precise; this is a consequence of HTTP itself being rather generic. Use the
response code that most closely matches the situation at hand, and do not worry
too much if you cannot find an exact fit.
Here are some HTTP response codes, which are often used with REST:
1. 200 OK: This response code indicates that the request was successful.
2. 201 Created: This indicates the request was successful and a resource was
created. It is used to confirm success of a PUT or POST request.
3. 400 Bad Request: The request was malformed. This happens especially with
POST and PUT requests, when the data does not pass validation, or is in
the wrong format.
4. 404 Not Found: This response indicates that the required resource could not
be found. This is generally returned to all requests which point to a URL
with no corresponding resource.
5. 401 Unauthorized: This error indicates that you need to perform authentication before accessing the resource.
6. 405 Method Not Allowed: The HTTP method used is not supported for this
resource.
7. 409 Conflict: This indicates a conflict. For instance, you are using a PUT
request to create the same resource twice.
8. 500 Internal Server Error: When all else fails; generally, a 500 response is
used when processing fails due to unanticipated circumstances on the server
side, which causes the server to error out.
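
The way these codes fit a GET-only service can be sketched with Python's
standard http.server module. This is an illustration under stated assumptions,
not the project's implementation: the /users/ path, the in-memory USERS table
and the handler name are hypothetical. The sketch answers 200 for a known
resource, 404 for an unknown one and 405 for any method other than GET.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data; a real service would read from a data store.
USERS = {"12345": {"id": "12345", "email": "myemail@gmail.com"}}

class GetOnlyHandler(BaseHTTPRequestHandler):
    def _send_json(self, status, payload, extra_headers=()):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        for name, value in extra_headers:
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        prefix = "/users/"
        user = USERS.get(self.path[len(prefix):]) if self.path.startswith(prefix) else None
        if user is None:
            self._send_json(404, {"error": "not found"})
        else:
            self._send_json(200, user)

    def _reject(self):
        # 405 responses name the methods that are allowed.
        self._send_json(405, {"error": "only GET is supported"}, [("Allow", "GET")])

    do_POST = do_PUT = do_DELETE = _reject

    def log_message(self, format, *args):
        pass  # keep the example quiet

def status_of(port, method, path):
    """Send one request to the local server and return the response code."""
    connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    connection.request(method, path)
    response = connection.getresponse()
    response.read()
    connection.close()
    return response.status

server = HTTPServer(("127.0.0.1", 0), GetOnlyHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

ok = status_of(server.server_port, "GET", "/users/12345")
missing = status_of(server.server_port, "GET", "/users/99999")
rejected = status_of(server.server_port, "POST", "/users/12345")
server.shutdown()
server.server_close()
print(ok, missing, rejected)
```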

4.3 Scalability of REST Services

Scalability is a desirable property of a system: it indicates the system's
ability either to handle growing amounts of work gracefully or to be readily
enlarged as demand increases.
In RPC-style architectures, for example SOA, a service is composed of
fine-grained, user-defined operations, and different services have different
interfaces. Each interface has its own semantics and operating parameters, so
the interface contract is critical to the service definition. If a client wants
to interoperate correctly with Web services in SOA, it has to understand the
semantics of each interface contract. This approach causes no trouble in a
relatively closed application environment. In an open distributed environment
with an unforeseeable number of operations, however, it may cause problems of
tight coupling and interface complexity. These problems prevent the distributed
system from achieving Web-scale scalability. Given the current number of sites
on the Web, imagine that each Web site defined its own special interface and
asked the Web browser to download or write a plug-in to adapt to it. Without
the plug-in, the browser would not understand the semantics of the interface
and could not interact with that site. The browser on the client would then
have to install millions of different plug-ins, which is unacceptable. REST
takes a different approach from RPC in the matter of interfaces: the uniform
interface constraint (constraint 3). It means that in a REST architecture all
resources expose the same interface to the client. The best-known
implementation of REST is the HTTP protocol, and REST is built on a thorough
understanding of HTTP. In REST, HTTP is a state transfer protocol rather than
merely a data transport protocol: it not only locates a resource uniquely but
also tells us how to operate on it.
The stateless interaction constraint (constraint 6) also enhances the
scalability of a RESTful architecture. It demands that each client request
contain all the application state necessary to understand that request. No
state information is kept on the server, and none is implied by previous
requests. Stateless interactions reduce the cost of enlarging the system.
Because all conversations are stateless, scaling up only requires adding more
load-balanced servers; no coordination between the servers is needed. Some RPC
conversations are stateful. Faced with the same problem, RPC has to use
additional techniques, such as data duplication or shared memory, to achieve
the same effect.
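
The effect of statelessness on load balancing can be sketched in a few lines.
In this hypothetical example the server names, the request fields and the
round-robin policy are all assumptions made for illustration. Because each
request carries everything needed to answer it, consecutive requests from the
same user can be sent to different servers without any coordination.

```python
import itertools

def handle(server_name, request):
    """A stateless handler: the reply depends only on the request itself."""
    return "%s served page %d for %s" % (server_name, request["page"], request["user"])

servers = ["server-a", "server-b", "server-c"]   # hypothetical server names
next_server = itertools.cycle(servers)           # simple round-robin balancer

# Each request carries all the state needed to understand it.
requests = [{"user": "alice", "page": n} for n in (1, 2, 3, 4)]
replies = [handle(next(next_server), request) for request in requests]
for reply in replies:
    print(reply)
```

Adding capacity then amounts to appending another name to the servers list; no
handler needs to know what any other handler has done.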

4.3.1 How does Caching help in Scaling?

Caching allows you to store your web assets at remote points along the way to
your visitors' browsers. The browser itself also maintains an aggressive cache,
which saves clients from having to ask your server for a resource each time it
is needed.
Cache configuration for your web traffic is critical for any well-performing
site. If you pay for bandwidth, earn revenue from an e-commerce site, or just
want to keep your reputation as a web-literate developer intact, you need to
know what caching gets you and how to set it up.
Assets such as your company logo, the favicon for your site or your core CSS
files are not likely to change from request to request, so it is safe to tell
the requester to hold on to its copy of the asset for a while.
Cutting down on the requests your server has to deal with reduces traffic at
the server side, so the server is able to handle more users and those users
enjoy a faster browsing experience; this in turn helps the web application
scale. Generally, assets like images, JavaScript files and style sheets can all
be cached fairly heavily, while content that is generated dynamically, as in
dashboards, forums or many types of web applications, benefits less, if at all.
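
Telling the requester how long to hold on to an asset is done with the
Cache-Control response header. The sketch below picks a header value by file
extension. The extension list, the one-day lifetime and the function name are
assumptions chosen for illustration, not recommended settings.

```python
import posixpath

# Assumed policy for illustration: one day for static assets, revalidation
# before every reuse for everything else.
STATIC_EXTENSIONS = {".png", ".jpg", ".ico", ".css", ".js"}

def cache_control_for(path):
    """Pick a Cache-Control header value based on the kind of asset requested."""
    extension = posixpath.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        return "public, max-age=86400"   # may be reused for 24 hours
    return "no-cache"                    # must be revalidated before reuse

for path in ("/static/logo.png", "/css/core.css", "/dashboard"):
    print(path, "->", cache_control_for(path))
```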

4.4 MVC

Model View Controller, or MVC as it is popularly called, is a software design
pattern for developing web applications. The pattern is made up of the
following three parts:
1. Model: The lowest level of the pattern, responsible for maintaining data.
2. View: Responsible for displaying all or a portion of the data to the user.
3. Controller: Software code that controls the interactions between the Model
and the View.
MVC is popular because it isolates the application logic from the user
interface layer and supports separation of concerns. The Controller receives
all requests for the application and then works with the Model to prepare any
data needed by the View. The View then uses the data prepared by the Controller
to generate a final presentable response. The components of the MVC abstraction
are described in more detail below.
1. The model: The model is responsible for managing the data of the
application. It responds to requests from the view and to instructions from
the controller to update itself.
2. The view: A presentation of data in a particular format, triggered by a
controller's decision to present the data. Views are typically script-based
templating systems such as JSP, ASP or PHP, and are very easy to integrate
with AJAX technology.
3. The controller: The controller is responsible for responding to user input
and performing interactions on the data model objects. It receives the input,
validates it and then performs the business operation that modifies the state
of the data model.
Types of Models:
1. Passive Model
2. Active Model
The passive model is employed when one controller manipulates the model
exclusively. The controller modifies the model and then informs the view that
the model has changed and should be refreshed. The model in this scenario is
completely independent of the view and the controller, which means that there
is no means for the model to report changes in its state. The HTTP protocol is
an example of this. There is no simple way in the browser to get asynchronous
updates from the server. The browser displays the view and responds to user
input, but it does not detect changes in the data on the server. Only when the
user explicitly requests a refresh is the server interrogated for changes.

Figure 4.1: Behaviour of Passive Model

The active model is used when the model changes state without the controller's
involvement. This can happen when other sources are changing the data and the
changes must be reflected in the views. Consider a stock-ticker display. You receive
stock data from an external source and want to update the views (for example,
a ticker band and an alert window) when the stock data changes. Because only
the model detects changes to its internal state when they occur, the model must
notify the views to refresh the display.

Figure 4.2: Behaviour of Active Model
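
The active model described above can be sketched with the observer pattern,
using the stock-ticker scenario. The class and function names, the stock symbol
and the alert threshold are hypothetical and serve only to show the model
notifying its views when its state changes.

```python
class StockModel:
    """Active model: notifies registered views whenever its state changes."""
    def __init__(self):
        self._prices = {}
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def update_price(self, symbol, price):
        self._prices[symbol] = price
        for observer in self._observers:
            observer(symbol, price)      # push the change to every view

shown = []   # records what each view displayed, in order

def ticker_band(symbol, price):
    shown.append("ticker: %s %.2f" % (symbol, price))

def alert_window(symbol, price):
    if price > 100:                      # hypothetical alert threshold
        shown.append("alert: %s above 100" % symbol)

model = StockModel()
model.attach(ticker_band)
model.attach(alert_window)
model.update_price("ABC", 99.5)    # data arriving from an external source
model.update_price("ABC", 101.0)
print(shown)
```

In a passive model the update_price method would only store the new value, and
a controller would have to tell the views to refresh.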

4.5 Tools Studied

4.5.1 cURL

cURL is a command line tool that is available on all major operating systems.
It is used for getting or sending files using URL syntax: it makes a request to
a given URL and writes the response to standard output. Once you have cURL
installed, type: curl -v google.com. This will display the complete HTTP
conversation. Since cURL uses libcurl, it supports a range of common Internet
protocols, currently including HTTP, HTTPS, FTP, FTPS, SCP, SFTP, TFTP, LDAP,
LDAPS, DICT, TELNET, FILE, IMAP, POP3, SMTP etc.
Examples of cURL use from the command line:
1. Make requests with different HTTP methods, without data:
(a) curl -X POST http://www.somedomain.com/
(b) curl -X DELETE http://www.somedomain.com/
(c) curl -X PUT http://www.somedomain.com/
2. Make a request with data, for example sending login data with a POST request:
curl --request POST http://www.somedomain.com/login/ --data 'username=myusername&password=myp'
3. Send a PUT request with data:
curl --request PUT http://www.somedomain.com/restapi/user/12345/ --data 'email=myemail@gmail.com'
4. Get the response with HTTP headers: if you need the HTTP headers along with
the response, the --include option can be used:
curl --request GET http://somedomain.com/ --include

4.5.2 Apache Bench

ApacheBench (ab) is a single-threaded command line computer program for
measuring the performance of HTTP web servers. Originally designed to test the
Apache HTTP Server, it is generic enough to test any web server. Example usage:
ab -n 100 -c 10 http://www.yahoo.com/
This executes 100 HTTP GET requests, processing up to 10 requests concurrently,
against the specified URL, in this example http://www.yahoo.com/.
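
The measurement that ab performs can be approximated in a few lines. The sketch
below is not how ab itself works internally (ab is single-threaded, whereas
this sketch uses a thread pool), and the local test server, the handler and the
report fields are assumptions made for illustration. It issues a fixed number
of GET requests at a fixed concurrency against a server on the local machine
and reports throughput.

```python
import http.client
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

class TestServer(ThreadingHTTPServer):
    request_queue_size = 64  # room for the concurrent connections below

def fetch(port):
    """Issue one GET request and return (status, seconds taken)."""
    started = time.perf_counter()
    connection = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
    connection.request("GET", "/")
    response = connection.getresponse()
    response.read()
    connection.close()
    return response.status, time.perf_counter() - started

def benchmark(port, total=100, concurrency=10):
    """Rough analogue of: ab -n total -c concurrency http://127.0.0.1:port/"""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch, [port] * total))
    elapsed = time.perf_counter() - started
    completed = sum(1 for status, _ in results if status == 200)
    return {"completed": completed,
            "requests_per_second": total / elapsed,
            "mean_seconds": sum(t for _, t in results) / total}

server = TestServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
report = benchmark(server.server_port, total=100, concurrency=10)
server.shutdown()
server.server_close()
print(report)
```

The figures printed depend on the machine and are only meaningful when the same
script is run against each service under comparison.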

Chapter 5
Problem Statement
We want to conceive and design a REST service deployed using a new or custom
micro web server and/or framework that supports only REST features, with no
session management or any similar module that is usually required for a full
stack web application. We also decide to target GET requests only.

Chapter 6
Project Requirements
6.1 Hardware Requirements

1. Test machine

6.2 Software Requirements

1. Windows 7/Vista OS
2. Tools: cURL, ApacheBench

Chapter 7
Design
7.1 UML Diagrams

Figure 7.1: Use Case Diagram

Figure 7.2: Class Diagram

Figure 7.3: Deployment Diagram

Figure 7.4: Sequence Diagram

Figure 7.5: Communication Diagram

Figure 7.6: Activity Diagram

Chapter 8
Planning and Scheduling
1. Months 1, 2, 3: Study of various web service protocols and architectures
such as HTTP, SOAP, the REST architecture and the MVC architecture, and of the
advantages of REST, with the focus on understanding the importance of the REST
architecture.
2. Month 4: Development of RESTful web services on various web frameworks.
These services will be analysed and their performance measured. The
different modules of a micro web server or framework will also be examined.
3. Month 5: Development of a testing environment in which the performance of
each service is measured and reported. Implementation will also begin, in
the form of customising, compiling and deploying framework modules.
4. Months 6, 7: Implementation of the RESTful architecture by customising
a web framework. Show how the RESTful server is more efficient than
other servers. Also highlight the performance before and after the
customization, how the change was brought about and what was actually changed.

Chapter 9
References
[1] C. Pautasso, E. Wilde, and A. Marinos. First International Workshop on
RESTful Design (WS-REST 2010), Apr. 2010.
[2] L. Richardson and S. Ruby. RESTful Web Services. O'Reilly, Oct. 2007.
[3] P. Adamczyk, P. H. Smith, R. E. Johnson, and M. Hafiz. REST and Web
Services: In Theory and in Practice.
[4] T. Berners-Lee, R. Fielding, and H. Frystyk. RFC 1945: Hypertext Transfer
Protocol HTTP/1.0, May 1996.
