
Fault-tolerant TCP

S. Nagaraju
M050216CS

Guided by: Dr. M. P. Sebastian,
Professor and Head of the Department, Computer Engineering.
April 14, 2006

Department of Computer Engineering
National Institute of Technology, Calicut
Kerala - 673601.

CERTIFICATE

This is to certify that the mini project Fault-Tolerant TCP is a bona fide
record of the mini project done by Mr. S. Nagaraju (M050216CS) under our
supervision and guidance. The project report has been submitted to the
Department of Computer Engineering of National Institute of Technology,
Calicut in partial fulfilment of the requirements for the award of the degree
of Master of Technology in Computer Science and Engineering.

Dr. M. P. Sebastian
Project Guide
Professor and Head of the Department
Dept. of Computer Engineering
NIT Calicut

Dr. M. P. Sebastian
Professor and Head of the Department
Dept. of Computer Engineering
NIT Calicut

ACKNOWLEDGEMENT

I have been very fortunate to have Dr. M. P. Sebastian, Professor and Head
of the Department, Computer Engineering, as my guide, whose timely guidance,
advice and inspiration helped me in the preparation of this Mini Project.
I express my sincere gratitude for having guided me through this work.

S.Nagaraju.

Contents
1 Introduction

2 Assumptions or requirements

3 Protocol
3.1 Variables
3.2 Opening the Initial Connection
3.3 Connection re-establishment

4 Performance

5 Network Configurations

6 Recovering from Logged Data

7 Conclusion

8 References

Abstract
We present an implementation of a fault-tolerant TCP (FT-TCP)
that allows a faulty server to keep its TCP connections open until
it either recovers or is failed over to a backup. The failure and
recovery of the server process are completely transparent to client
processes connected with it via TCP. FT-TCP does not affect the
software running on a client, does not require changes to the server's
TCP implementation, and does not use a proxy.

1 Introduction
Failure of the server is within the control of the organization, while
failure of the client is not, so server recovery is important. Previous
work follows three approaches.

1. Application-level Approach

Solution: Insert a software layer between the TCP and application layers
of both client and server. Checkpoint the TCP connection state for TCP
session recovery, and re-establish the connection between the old client
and the new server from logged data.
Pros: efficient, portable.
Problem: such software is complex, and the software layer must run at
both ends. This approach is shown in Fig. 1.

Figure 1: Application-level Approach.

2. Proxy-based Approach

Solution: Use a proxy on the network path.
Pros: the client is unchanged.
Problem: introduces a single point of failure and a potential
performance bottleneck.

Figure 2: Proxy-based Approach.

3. TCP Replication Approach

Solution: Modify the TCP implementation on the server to accommodate
re-establishing connections.
Pros: changes only the server; no single point of failure.
Problem: it is very difficult to replicate TCP deterministically.

Figure 3: TCP Replication Approach.

Proposed Approach: FT-TCP

Solution: Wrap the server's TCP driver on top and bottom.
This eliminates the drawbacks of the previous three approaches, i.e.:
. Does not change client-side software.
. Does not change the server's TCP implementation.
. Does not use a proxy, so there is no single point of failure.

Figure 4: FT-TCP.

FT-TCP is based on the concept of wrapping: a layer of software surrounds
the TCP layer and intercepts all its communication. The wrap on the IP
side is the South Side Wrap (SSW); the wrap on the application side is the
North Side Wrap (NSW). The two wraps communicate with a logger. Together,
the wraps and the logger maintain the current state of the TCP connection.
In case of failure, they cooperate to restart the server and restore the
TCP connection state.

Working of SSW

SSW intercepts data between the TCP and IP layers. For segments passing
from TCP to IP, SSW maps the seq # between the server's and the client's
connection state, translating the seq #s to be consistent with those used
in the original handshake. For packets passing from IP to TCP, SSW performs
the inverse mapping on the Ack numbers. SSW also sends packets to the
logger, and it modifies or generates ACKs from server to client; this
ensures that the logger saves data before the client discards it from its
send buffer.
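
The two mappings can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes that the offset between the two sequence spaces (called delta seq in Section 3.1) is already known and that arithmetic is done modulo 2^32, as TCP sequence numbers are 32-bit values.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def outgoing_seq(tcp_seq: int, delta_seq: int) -> int:
    """TCP -> IP: shift the seq # chosen by the server's TCP into the
    sequence space the client saw in the original handshake."""
    return (tcp_seq + delta_seq) % MOD


def incoming_ack(client_ack: int, delta_seq: int) -> int:
    """IP -> TCP: inverse mapping, applied to the client's Ack number."""
    return (client_ack - delta_seq) % MOD
```

Applying incoming_ack to the result of outgoing_seq returns the original value, including across a wraparound, which is the property the wrap relies on.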

Working of NSW

NSW intercepts read and write socket calls from the application layer to
the TCP layer. It logs the amount of data returned by each read socket
call (the read length). During recovery of a crashed server, NSW forces
read socket calls to return the same data and read lengths as before; this
ensures deterministic recovery. During recovery it also discards write
socket calls, to avoid resending data to the client.
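
The replay behaviour during recovery can be sketched as follows. This is an illustrative Python model, not the project's code: the class name ReplayReader and its methods are hypothetical, and the logged data and read lengths are assumed to have already been fetched from the logger.

```python
from collections import deque
from typing import Sequence


class ReplayReader:
    """Hypothetical model of NSW during recovery: reads are served from
    the log with their original lengths, and writes are swallowed."""

    def __init__(self, logged_data: bytes, logged_read_lengths: Sequence[int]):
        self._data = logged_data
        self._lengths = deque(logged_read_lengths)
        self._offset = 0

    def replaying(self) -> bool:
        """True while logged read lengths remain to be replayed."""
        return bool(self._lengths)

    def read(self, max_bytes: int) -> bytes:
        """Return exactly as many bytes as the original read returned."""
        length = self._lengths.popleft()
        if length > max_bytes:
            raise ValueError("read buffer smaller than in the original run")
        chunk = self._data[self._offset:self._offset + length]
        self._offset += length
        return chunk

    def write(self, payload: bytes) -> int:
        """Discard the data (the client already has it) but report success."""
        return len(payload)
```

Once replaying() returns False, a real wrap would hand reads and writes back to the underlying TCP socket; that hand-over is outside the scope of this sketch.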

Working of Logger

The logger runs on a processor independent of the server (failure
independence). It stores connection state information such as the
advertised window size, the acknowledgement seq #, the data and the read
lengths. It sends acknowledgements to NSW and SSW after logging.
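
As a rough illustration of the state the logger keeps, the following Python sketch uses an in-memory stand-in. The interface (log_segment, log_read_length) is an assumption made for this example; a real logger runs on a separate machine, and this sketch ignores seq # wraparound and simply drops out-of-order segments.

```python
class Logger:
    """In-memory stand-in for the FT-TCP logger (hypothetical interface)."""

    def __init__(self, first_client_seq: int):
        self.next_seq = first_client_seq  # smallest client seq # not yet logged
        self.data = bytearray()           # client data, in sequence order
        self.read_lengths = []            # read lengths reported by NSW
        self.last_ack = None              # latest acknowledgement seq # seen
        self.last_window = None           # latest advertised window size seen

    def log_segment(self, seq: int, payload: bytes, ack: int, window: int) -> int:
        """Store an in-order segment; the return value is the ack sent to SSW."""
        if seq == self.next_seq:
            self.data += payload
            self.next_seq += len(payload)
        self.last_ack, self.last_window = ack, window
        return self.next_seq

    def log_read_length(self, length: int) -> int:
        """Store a read length; the return value is the ack sent to NSW."""
        self.read_lengths.append(length)
        return len(self.read_lengths)
```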

2 Assumptions or requirements
A restarting server has the application restarting from its initial
state. The process issues the same sequence of read socket calls when
replayed. FT-TCP requires a mechanism that allows another process/processor
to take over the IP address of a process on a failed processor. This
mechanism also needs to update the ARP cache of any client on the same
physical network.

3 Protocol
3.1 Variables
. delta_seq = offset that allows SSW to map seq #s.
. stable_seq = smallest seq # that the SSW does not know to be
logged.
. server_seq = highest seq # acknowledged by the client.
. unstable_reads = no. of read socket calls whose read lengths
NSW does not know to be logged.
. restarting = true when server is not in normal operation.
3.2 Opening the Initial Connection


FT-TCP captures and logs the client's and server's initial seq #s. SSW
completes initialization of FT-TCP by assigning appropriate values to the
variables.
. On receipt of a packet from IP, SSW does the following:
- Forwards the packet to the logger.
- Subtracts delta_seq from the Ack#.
- On receiving an ack from the logger, updates stable_seq if necessary.
. On receipt of a segment from TCP, SSW does the following:
- Remaps the seq # by adding delta_seq.
- Sets the Ack# to stable_seq.
- This may reduce the window seen by the client, so to compensate SSW
increases the advertised window size by asn - stable_seq, where asn is
the Ack# originally set by TCP.
. NSW does the following:
- Read socket call = sends the read length to the logger and increments
unstable_reads.
- Write socket call = blocks the call until unstable_reads = 0.
- Logger ack = decrements unstable_reads.
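
The SSW steps above can be sketched in Python as follows. The Segment and SSW classes are hypothetical models, not the project's code. Two simplifications are assumed: seq # wraparound and the 16-bit limit of the window field are ignored, and the outgoing Ack# is taken as the smaller of asn and stable_seq, a conservative reading that never acknowledges more than TCP itself did.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    seq: int
    ack: int
    window: int


@dataclass
class SSW:
    delta_seq: int
    stable_seq: int
    to_logger: List[Segment] = field(default_factory=list)

    def from_ip(self, seg: Segment) -> Segment:
        """Client -> server: forward a copy to the logger, unmap the Ack#."""
        self.to_logger.append(seg)
        return replace(seg, ack=seg.ack - self.delta_seq)

    def on_logger_ack(self, logged_upto: int) -> None:
        """Advance stable_seq when the logger confirms more data."""
        if logged_upto > self.stable_seq:
            self.stable_seq = logged_upto

    def from_tcp(self, seg: Segment) -> Segment:
        """Server -> client: remap seq #, hold the Ack# back to logged data,
        and widen the window by the amount the Ack# was held back."""
        asn = seg.ack
        new_ack = min(asn, self.stable_seq)
        return replace(seg, seq=seg.seq + self.delta_seq, ack=new_ack,
                       window=seg.window + (asn - new_ack))
```

With this adjustment the right edge of the window (Ack# plus window size) that the client sees is the same as the one TCP intended, so holding the Ack# back does not shrink the amount of data the client may send.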

3.3 Connection re-establishment
. When a server crashes:
- The logger detects the server failure and temporarily takes over by
sending TCP segments with a closed window and acks up to stable_seq.
- The server restarts and FT-TCP reconnects with the logger. FT-TCP sets
stable_seq and server_seq from the logged data, sets unstable_reads to 0
and restarting to true.
- The logger implicitly relinquishes generation of acks to SSW.
- The restarting application executes either an accept or a connect socket
call.
- SSW fabricates a SYN that appears to come from the client and has an
initial seq # of stable_seq, and passes it to the TCP layer.
- The acknowledging SYN from the server's TCP is captured by SSW, which
sets delta_seq to the original initial seq # minus the newly proposed
initial seq #.
- SSW discards this segment, fabricates an ACK and passes it to the
server's TCP.
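
The last three steps can be made concrete with a small Python sketch. The Segment type and the function names are hypothetical, the server's TCP is represented only by the SYN-ACK it would produce, and the fabricated SYN uses stable seq as its initial seq # exactly as described above.

```python
from dataclasses import dataclass

MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


@dataclass
class Segment:
    flags: str
    seq: int
    ack: int = 0


def fabricate_syn(stable_seq: int) -> Segment:
    """Build a SYN that looks as if it came from the client."""
    return Segment("SYN", stable_seq)


def handle_syn_ack(syn_ack: Segment, original_server_isn: int):
    """Capture TCP's SYN-ACK (it is never sent to the client), derive
    delta_seq, and build the ACK that completes the local handshake."""
    delta_seq = (original_server_isn - syn_ack.seq) % MOD
    ack = Segment("ACK", seq=syn_ack.ack, ack=(syn_ack.seq + 1) % MOD)
    return delta_seq, ack
```

Adding the resulting delta_seq to the new initial seq # yields the original one, so later segments from the restarted TCP continue the sequence space the client already knows.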

4 Performance
. Prototype implementation:
- The client transmits a stream of bulk data to the server as fast as it
can.
- The server just discards this data.
. Quantities measured:
- Throughput of FT-TCP.
- Additional latency introduced by FT-TCP.
- Recovery time of the server.

5 Network Configurations
1. Client and server share a 10 Mbps Ethernet, and server and logger
share another 10 Mbps Ethernet (10-10).
2. All 3 are on the same 10 Mbps Ethernet (10 Shared).
3. Client and server share a 10 Mbps Ethernet, and server and logger
share a 100 Mbps Ethernet (10-100).

6 Recovering from Logged Data
To avoid large latency, FT-TCP sends recovery data to the logger
asynchronously. Some recovery data may therefore be lost, so recovery can
restore the server to a state earlier than the one the client knows
about. E.g., the ack seq # sent by the server is asn, but the logger has
only recorded up to asn - l. When the server recovers, TCP knows only about
asn - l, so its next packet has an ack # less than asn. But the client may
have already discarded the data up to asn, since it received an ack for
it. FT-TCP solves this problem by making the SSW not allow the outgoing
ack seq # to be larger than asn - l + 1.

A similar problem arises with the amount of data exchanged. E.g., a read
returns 900 bytes. NSW sends this read length to the logger, but the server
crashes before this message gets through. On recovery, the read returns
1500 bytes. This is an inconsistent state. If the client can detect the
inconsistency, then the server failure is not masked. FT-TCP's solution is
to delay all write socket calls from the server application until all prior
read lengths are known to be stored on the logger.
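
The write-delay rule can be modelled with a counter of unacknowledged read lengths. The Python sketch below is illustrative only: the class NSWGate is hypothetical, and it queues delayed writes instead of blocking the calling thread, which keeps the example single-threaded.

```python
class NSWGate:
    """Hypothetical model of the NSW rule: no write reaches TCP while a
    read length is still unconfirmed by the logger."""

    def __init__(self):
        self.unstable_reads = 0   # read lengths not yet known to be logged
        self.pending_writes = []  # writes held back by the rule
        self.sent = []            # writes released to TCP

    def on_read(self, length, send_to_logger):
        """Report the read length to the logger and count it as unstable."""
        send_to_logger(length)
        self.unstable_reads += 1

    def on_write(self, payload):
        """Release the write only if every prior read length is logged."""
        if self.unstable_reads == 0:
            self.sent.append(payload)
        else:
            self.pending_writes.append(payload)

    def on_logger_ack(self):
        """One read length is now stable; flush writes when none remain."""
        self.unstable_reads -= 1
        if self.unstable_reads == 0:
            self.sent.extend(self.pending_writes)
            self.pending_writes.clear()
```

In the 900-byte example, the reply that depends on that read stays in pending_writes until the logger acknowledges the read length, so the client never sees output based on a read the logger cannot reproduce.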

7 Conclusion
FT-TCP wraps an existing TCP layer to mask server failures from
unmodified clients. If the server-logger connection is fast, the additional
overhead on throughput and latency is low. FT-TCP solves the network part
of the fault-tolerance puzzle: to the client, the connection is
indistinguishable from non-fault-tolerant TCP.

8 References
[1] P. M. Chen et al. "The Rio file cache: Surviving operating system
crashes". In Proceedings of the Seventh International Conference on
Architectural Support for Programming Languages and Operating
Systems, October 1996, pp. 74-83.
[2] E. Elnozahy, L. Alvisi, Y. M. Wang, and D. B. Johnson. "A Survey
of Rollback-Recovery Protocols in Message Passing Systems". CMU
Technical Report CMU-CS-99-148, June 1999.
[3] D. Maltz and P. Bhagwat. "TCP splicing for application layer
proxy performance". IBM Research Report 21139 (Computer
Science/Mathematics), IBM Research Division, 17 March 1998.
