Motivation
Copying the same files onto each machine before launching a distributed application wastes resources and complicates debugging.
Replicating data by hand and serving the same copy of data to multiple clients leads to inconsistencies.
Introduction
Shark is a distributed file system designed for large-scale, wide-area deployment. It is scalable and provides a novel cooperative-caching mechanism that reduces the load on an origin file server.
Challenges
Scalability: what if a large program (say, in MBs) is executed from a file server by many clients? Because of limited bandwidth, the server might deliver unacceptable performance.
As in peer-to-peer file systems, Shark clients find nearby copies of data by using a distributed index. A client avoids transferring the file (or chunks of the file) from the server whenever the same data can be fetched from a nearby client.
Shark is compatible with existing backup and restore procedures. By exploiting shared reads, Shark greatly reduces server load.
Design Protocol
1. Files are divided into chunks using the Rabin fingerprint algorithm.
2. Shark interacts with the local host through an existing NFSv3 interface and runs in user space.
3. When the first client reads a particular file, it fetches the file from the server and registers itself as a replica proxy for the file's chunks in the distributed index.
4. When a second client wants to access the same file, it discovers proxies for the file's chunks by querying the distributed index, establishes a secure channel to the proxy (or proxies), and downloads the chunks in parallel.
5. After fetching, the client then registers itself as a replica proxy for these chunks.
6. The distributed index exposes two APIs: put and get.
SECURE DATA SHARING
1. Data is encrypted by the sender and can be decrypted only by a client with the appropriate read permissions.
2. Clients cannot download large amounts of data without proper read authorization.
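The Rabin-fingerprint chunking in step 1 of the protocol can be sketched as content-defined chunking with a rolling hash. The simple polynomial hash below stands in for a true Rabin fingerprint, and the window size, boundary mask, and chunk-size bounds are illustrative assumptions, not Shark's actual parameters:

```python
# Content-defined chunking sketch (stand-in for Rabin fingerprinting).
# A chunk boundary is declared where the rolling hash's low bits match
# a fixed pattern, so boundaries depend on content, not on offsets.

WINDOW = 48             # bytes in the rolling-hash window (assumed)
MASK = (1 << 13) - 1    # low 13 bits -> ~8 KB average chunk (assumed)
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024
PRIME = 1000000007
BASE = 257

def chunk_boundaries(data: bytes):
    """Yield (offset, size) pairs for content-defined chunks of data."""
    start = 0
    h = 0
    base_w = pow(BASE, WINDOW, PRIME)
    for i, b in enumerate(data):
        h = (h * BASE + b) % PRIME
        if i >= WINDOW:
            # Drop the byte leaving the window so h covers only WINDOW bytes.
            h = (h - data[i - WINDOW] * base_w) % PRIME
        size = i + 1 - start
        if ((h & MASK) == MASK and size >= MIN_CHUNK) or size >= MAX_CHUNK:
            yield (start, size)
            start = i + 1
    if start < len(data):
        yield (start, len(data) - start)  # tail chunk may be short
```

Because boundaries are content-defined, inserting bytes near the start of a file shifts only nearby chunks, so unchanged regions keep their chunk identities across file versions.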
Cross-file-system sharing
Shark uses a token generated by the file server as a shared secret between client and proxy, and the client can verify the integrity of the received data. For a sender (proxy) client to authorize a requester client, the requester proves knowledge of the token. Once authorized, the receiving client has established its read permission for the file.
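The token-as-shared-secret scheme can be sketched as a small challenge–response exchange. Shark defines the file token as tok(F) = HMAC_r(F); the message layout, the nonce-based proof, and the use of HMAC-SHA-256 below are illustrative assumptions, not Shark's wire protocol:

```python
# Hypothetical sketch of token-based authorization between a requester
# and a replica proxy. Only clients the server granted read access ever
# learn the file token, so proving knowledge of it proves authorization.
import hmac
import hashlib

def file_token(contents: bytes, server_secret: bytes) -> bytes:
    # tok(F) = HMAC_r(F): issued by the file server to authorized readers.
    return hmac.new(server_secret, contents, hashlib.sha256).digest()

def prove_knowledge(token: bytes, nonce: bytes) -> bytes:
    # Requester -> proxy: HMAC keyed by the token over a proxy-chosen nonce,
    # so the token itself is never sent in the clear.
    return hmac.new(token, nonce, hashlib.sha256).digest()

def proxy_authorizes(token: bytes, nonce: bytes, proof: bytes) -> bool:
    expected = hmac.new(token, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, proof)
```

The same token doubles as an integrity check: after downloading a chunk, the client can recompute the HMAC over the received bytes and compare it with the token it got from the server.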
File Consistency
Shark uses two standard network file system techniques: leases and AFS-style whole-file caching. A client first checks its own cache; if the file is not cached or is out of date, it fetches it from the file server or from replica proxies. When a client makes a read RPC to the file server, it gets a read lease on that particular file. In Shark, the default lease duration is 5 minutes.
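The lease-based read path above can be sketched as follows; the cache structure and function names are illustrative, not Shark's implementation:

```python
# Minimal sketch of lease-based cache revalidation: a read within the
# lease window is served from the local cache with no server RPC;
# otherwise the client goes back to the file server, which also renews
# the lease. Names here are assumptions for illustration.
import time

LEASE_SECONDS = 5 * 60  # Shark's default read-lease duration

class CachedFile:
    def __init__(self, data: bytes, fetched_at: float):
        self.data = data
        self.lease_expires = fetched_at + LEASE_SECONDS

def read(cache: dict, path: str, fetch_from_server, now=None) -> bytes:
    now = time.time() if now is None else now
    entry = cache.get(path)
    if entry is not None and now < entry.lease_expires:
        return entry.data             # lease still valid: no server RPC
    data = fetch_from_server(path)    # read RPC; server grants a fresh lease
    cache[path] = CachedFile(data, now)
    return data
```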
Cooperative Caching
A client (C1) issues GETTOK(fh, offset, count) to the server, which returns:
1. the file attributes,
2. the whole-file token, and
3. a list of chunk descriptors: (chunktok1, offset1, size1), (chunktok2, offset2, size2), (chunktok3, offset3, size3), ...
To serve this, the server issues a series of read calls to the kernel NFS server, computes the file token TF = tok(F) = HMACr(F), splits the data into chunks, computes tokens for the chunks, and caches the tokens for future reference.
Other clients (C2, C3, C4) then fetch the chunks cooperatively: a client multicasts a request for a chunk Fi and downloads different chunks (e.g., F1, F2, F3) from different peers in parallel rather than from the origin server.
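The server-side GETTOK computation can be sketched as below. Fixed-size chunks stand in for Shark's Rabin-fingerprint chunking, and HMAC-SHA-256 stands in for HMACr; both are simplifying assumptions:

```python
# Sketch of GETTOK's server side: read the file, compute the whole-file
# token TF = tok(F) = HMACr(F), split the data into chunks, and return
# per-chunk (token, offset, size) descriptors.
import hmac
import hashlib

def gettok(contents: bytes, r: bytes, chunk_size: int = 16384):
    """Return (file_token, [(chunk_token, offset, size), ...])."""
    file_token = hmac.new(r, contents, hashlib.sha256).digest()
    chunks = []
    for off in range(0, len(contents), chunk_size):
        piece = contents[off:off + chunk_size]
        chunk_token = hmac.new(r, piece, hashlib.sha256).digest()
        chunks.append((chunk_token, off, len(piece)))
    return file_token, chunks
```

Note that identical chunk contents produce identical tokens, which is what lets a client fetch a chunk from any peer that holds the same data, even if that peer obtained it through a different file.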
Distributed Indexing
Shark uses a global distributed index shared by all Shark clients. The system maps opaque keys onto nodes by hashing each value to a key ID; assigning IDs to nodes allows lookup in O(log n) hops. Shark stores only small records describing which clients store which data. Shark uses Coral as its distributed index; Coral provides a Distributed Sloppy Hash Table (DSHT).
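The key-to-node mapping described above can be illustrated with a simple consistent-hashing ring. Coral's actual DSHT uses Kademlia-style XOR distance and "sloppy" storage to avoid hot spots; the successor-style ring below is a deliberately simplified assumption:

```python
# Illustrative sketch of mapping opaque keys onto nodes: both node names
# and keys are hashed into one ID space, and a key belongs to the first
# node whose ID is >= the key's ID (wrapping around the ring).
import hashlib
import bisect

def key_id(value: bytes) -> int:
    return int.from_bytes(hashlib.sha256(value).digest()[:8], "big")

class Ring:
    def __init__(self, node_names):
        self.ring = sorted((key_id(n.encode()), n) for n in node_names)
        self.ids = [i for i, _ in self.ring]

    def lookup(self, key: bytes) -> str:
        """Return the node responsible for key."""
        pos = bisect.bisect_left(self.ids, key_id(key)) % len(self.ring)
        return self.ring[pos][1]
```

In Shark the values stored under a chunk token's key are small proxy records (which clients hold that chunk), so the index itself stays lightweight even for large files.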
Implementation
The main components of Shark are implemented in C++. The client-side daemon is the biggest component of Shark; it handles user requests.
Evaluation
Shark is evaluated against NFSv3 and SFS. Read tests are performed both within the controlled Emulab LAN environment and in the wide area on the PlanetLab v3.0 test-bed.
The server required 0.9 seconds to compute chunks for a 10 MB random file, and 3.6 seconds for a 40 MB random file.
For the local-area micro-benchmarks, local machines at NYU are used as Shark clients. In this micro-benchmark, Shark's chunking
Conclusion
Shark provides a locality-aware cooperative cache and supports cross-file-system sharing, enabling novel applications.
Questions