
Distributed Shared Memory

Outline

Introduction
Shared Memory vs. Distributed Memory
Distributed Shared Memory Architecture
DSM vs. Message Passing
Design and Implementation
Consistency Models
DSM Algorithms
Conclusion

Introduction

From the system interconnection perspective, parallel systems fall into two main categories:
Multiprocessors: tightly-coupled, shared-memory architecture
Multicomputers: loosely-coupled, distributed-memory architecture

[Figure: on the left, a multiprocessor in which several CPUs share a single memory; on the right, multicomputers in which each CPU has its own local memory and the nodes communicate over a network.]
Shared Memory vs. Distributed Memory

Shared Memory                      | Distributed Memory
-----------------------------------|---------------------------------------------------
Global address space               | No concept of a global address space
Cache coherent                     | No concept of cache coherency
Lacks scalability                  | Scalable performance
Expensive to build                 | Cost effective: can use commodity, off-the-shelf processors and networking
Easy to program; reusable          | Programmer is responsible for data communication
Data sharing is fast and uniform   | Data sharing by message passing, with non-uniform memory access times

Cache coherent means that if one processor updates a location in shared memory, all the other processors know about the update.

Distributed-Shared Memory Architecture

Definition

A Distributed Shared Memory (DSM) is an abstraction that allows physically separated memories to be addressed as one logically shared address space.

General Characteristics
Hybrid architecture
Virtual space shared between all processes
Shared-memory model implemented over physically distributed memory
  Shared-memory programming techniques can be used
  When reading and updating, processes see DSM as ordinary memory within their address space
Mapping Manager: maps shared-memory addresses to physical memory, remote or local (a sketch follows the figure below)

[Figure: DSM architecture. Processors P1 .. Pn, each with a local memory M1 .. Mn and a mapping manager (MM), are connected by an interconnection network; the mapping managers jointly present a single shared virtual space.]
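To make the mapping manager's role concrete, here is a minimal sketch in C (my illustration, not from the slides; the page-table layout and the remote_fetch helper are assumed names) of resolving a shared-virtual address to local or remote memory:

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Hypothetical per-page entry in the mapping manager's table. */
struct map_entry {
    int   is_local;   /* 1 if the page currently resides here      */
    void *local_addr; /* base of the local frame when is_local = 1 */
    int   home_node;  /* node to contact when the page is remote   */
};

extern struct map_entry page_table[];              /* indexed by page number */
extern void *remote_fetch(int node, size_t page);  /* assumed DSM helper     */

/* Resolve a shared-virtual address to a usable local pointer. */
void *resolve(uintptr_t shared_addr) {
    size_t page   = shared_addr / PAGE_SIZE;
    size_t offset = shared_addr % PAGE_SIZE;
    struct map_entry *e = &page_table[page];
    if (!e->is_local) {                    /* page lives elsewhere:      */
        e->local_addr = remote_fetch(e->home_node, page);  /* bring it in */
        e->is_local   = 1;
    }
    return (char *)e->local_addr + offset; /* ordinary memory access     */
}
```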

Distributed-Shared Memory Architecture (contd.)

General Characteristics
Covert communication operations
  Communications are still needed to exchange data, but they are hidden from the programmer: inter-process communication transparency
Heterogeneous nodes
  The shared-memory component can be a cache-coherent SMP machine and/or a graphics processing unit (GPU)
Processes on different computers observe the updates made by one another

Cache-coherent SMP or UMA (bus-based): a model with identical processors that have equal access times to a shared memory
  UMA: Uniform Memory Access
  SMP: Symmetric Multiprocessor


Distributed-Shared Memory Architecture (contd.)
Advantages

Implicit data sharing: shields the programmer from Send/Receive primitives
Less expensive to build and scalable (inherited from the distributed-memory architecture)
Very large total physical memory across all nodes
Large programs can run more efficiently
Software portability and reusability: programs written for shared-memory multiprocessors can run on DSM systems with minimal changes

Disadvantages

Little programmer control over the actual messages being generated
DSM implementations use asynchronous message passing, and so cannot be more efficient than message-passing implementations

Distributed-Shared Memory Architecture (contd.)

Best suited
  When individual shared data items can be accessed directly
  e.g. parallel applications

Less appropriate
  When data is accessed by request
  e.g. client-server systems
  A server may be used to assist in providing DSM functionality for data shared between clients

DSM vs. Message Passing

Property             | DSM                                   | Message Passing
---------------------|---------------------------------------|------------------------------------------
Marshalling          | No; variables are shared directly     | Yes; the programmer's job
Address space        | Single; interference may occur        | Private; processes are protected
Data representation  | Uniform                               | Heterogeneous
Synchronization      | Normal shared-memory constructs       | Message-passing primitives
Process execution    | Lifetimes need not overlap            | Processes must execute at the same time
Communication cost   | Invisible                             | Obvious

There is no evidence against or in favor of either of the two communication mechanisms.

Design and Implementation

Main Issues

Granularity: the size of the sharing unit, which can be uniform chunks of memory (a byte or a page) or data structures
  Small pages: increased parallelism, but an increase in directory size
  Large pages: reduced paging overhead, but increased sharing overhead

Structure: the arrangement of shared data
  Most systems view DSM as a linear array of words

Replacement strategies: similar to the caching mechanisms in multiprocessors
  In cache systems, LRU is often used
  In DSM, shared pages are given higher priority than exclusively owned pages, so exclusively owned pages are replaced first

Synchronization primitives
  DSM must allow simultaneous access to shared data on different machines (single writer/multiple readers, etc.)
  Coherence protocols must ensure the consistency of shared data

Consistency Models
Definition

A memory consistency model for a shared address space specifies constraints on the order in which memory operations must appear to be performed (i.e. to become visible to the processors) with respect to one another.

Strict Consistency Model

Any read of a memory location returns the value stored by the most recent write operation to that address, irrespective of the locations of the processors performing the read and the write operations.

Sequential Consistency Model

"The result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program." (Leslie Lamport)
Definition restated: sequential consistency requires that a shared-memory multiprocessor appear to be a multiprogramming uniprocessor system to any program running on it.

All instructions are executed in program order
Every write operation becomes instantaneously visible throughout the system

Consistency Models (contd.)

Sequential Consistency Model

Example 1 (initially Head = 0)

P1:               P2:
Data = 2000;      while (Head == 0) {}
Head = 1;         ... = Data;

Sequential consistency requires program order:
  The write to Data has to complete before the write to Head can begin
  The read of Head has to complete before the read of Data can begin
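As an illustration (not part of the original slides), Example 1 maps naturally onto C11 atomics: with the default sequentially consistent ordering, P2 can never read a stale Data after observing Head == 1. The thread and variable names follow the example above.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_int Data = 0;   /* payload written by P1                */
atomic_int Head = 0;   /* flag signalling that Data is ready   */

void *p1(void *arg) {
    /* Both stores default to memory_order_seq_cst, so the write
       to Data is globally visible before the write to Head.     */
    atomic_store(&Data, 2000);
    atomic_store(&Head, 1);
    return NULL;
}

void *p2(void *arg) {
    while (atomic_load(&Head) == 0) {}   /* spin until Head == 1 */
    /* Under sequential consistency this must print 2000.        */
    printf("%d\n", atomic_load(&Data));
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t2, NULL, p2, NULL);
    pthread_create(&t1, NULL, p1, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```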

Example 2 (initially A = B = 0)

P1:        P2:              P3:
A = 1;     if (A == 1)      if (B == 1)
             B = 1;           register = A;

Sequential consistency can be maintained if a process makes sure that everyone has seen an update before that value is read.

Consistency Models (contd.)

Causal Consistency Model
  Writes that are potentially causally related must be seen by all processors in the same order.
  Writes that are not potentially causally related may be seen in a different order on different machines.

Processor Consistency Model
  Writes done by a single processor are seen by all other processors in the order in which they were written on that processor.
  Writes from different processors may be seen in a different order by different processors.

Release Consistency Model
  Weak consistency with two types of synchronization operations: acquire and release (a sketch follows below).
  Each type of operation is guaranteed to be processor consistent.
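A hedged sketch of the acquire/release idea using C11 atomics (my illustration, not from the slides): every ordinary write made inside the critical section becomes visible to the next process that acquires the lock, which is the guarantee release consistency asks of a DSM.

```c
#include <stdatomic.h>

atomic_int lock = 0;    /* 0 = free, 1 = held                   */
int shared_counter = 0; /* ordinary (non-atomic) shared data    */

void acquire(void) {
    int expected = 0;
    /* memory_order_acquire: reads/writes after this point cannot
       be reordered before the lock acquisition.                  */
    while (!atomic_compare_exchange_weak_explicit(
               &lock, &expected, 1,
               memory_order_acquire, memory_order_relaxed))
        expected = 0;
}

void release(void) {
    /* memory_order_release: all preceding writes (e.g. to
       shared_counter) are made visible before the lock is freed. */
    atomic_store_explicit(&lock, 0, memory_order_release);
}

void increment(void) {
    acquire();
    shared_counter++;   /* protected update; propagated at release */
    release();
}
```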

DSM Algorithms

The Central Server Algorithm

A central server maintains all shared data
  Read request: the server just returns the data
  Write request: the server updates the data and sends an acknowledgement to the client
Two messages for each data access (see the sketch below)

[Figure: clients exchanging request/acknowledgement messages with the central server.]

Implementation
  A timeout is used to resend a request if the acknowledgement fails to arrive
  Associated sequence numbers can be used to detect duplicate write requests
  If an application's request to access shared data fails repeatedly, a failure condition is sent to the application

Issues: performance and reliability (bottleneck at the server)

Possible solutions
  Partition shared data between several servers
  Use a mapping function to distribute/locate data
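A minimal sketch of the two-message exchange, assuming a hypothetical wire format (the msg layout and field names are illustrative, not from the slides):

```c
/* Hypothetical message format for the central-server algorithm. */
enum op { READ, WRITE, ACK, DATA };

struct msg {
    enum op  op;
    unsigned seq;     /* sequence number: detects duplicate writes   */
    unsigned addr;    /* shared-data address                         */
    int      value;   /* payload for WRITE requests / DATA replies   */
};

/* Server side: every access costs exactly one request and one reply. */
void serve(struct msg *req, struct msg *reply, int store[]) {
    if (req->op == READ) {
        reply->op = DATA;
        reply->value = store[req->addr];  /* just return the data    */
    } else {                              /* WRITE                   */
        store[req->addr] = req->value;    /* update the data...      */
        reply->op = ACK;                  /* ...and acknowledge      */
    }
    reply->seq = req->seq;  /* echo seq so the client can pair
                               replies and drop duplicates           */
}
```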

DSM Algorithms (contd.)

The Migration Algorithm

Data is always migrated to the site where it is accessed
Only one node may access a shared data item at a time: a single-reader/single-writer (SRSW) protocol
Data is typically migrated in a fixed-size unit called a block, which facilitates management compared to migrating individual data items

[Figure: a node issuing a migration request and receiving the data block.]

Advantages
  Takes advantage of locality of reference
  No communication costs are incurred when a process accesses data held locally

Integration with virtual memory
  DSM can be integrated with the virtual memory of the OS at each node
  The block size is chosen to be equal to a virtual-memory page, or a multiple thereof
  A locally held shared-memory page is mapped into the application's virtual address space, so normal machine instructions can be used to access it
  Access to a data block not held locally triggers a page fault; the fault handler communicates with remote hosts to obtain the requested data (a sketch follows below)
  When a data block is migrated away, it is removed from any local address space it was mapped into

To locate a remote data object
  Use a location server, or
  Broadcast a query

Issues
  Pages can thrash between hosts; to minimize thrashing, set a minimum time for data objects to reside at a node
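A hedged sketch of how a DSM page-fault handler might migrate a block in; locate_owner, fetch_block, and map_page are illustrative names for assumed runtime helpers, not a real DSM API:

```c
#include <stddef.h>

#define BLOCK_SIZE 4096  /* one virtual-memory page */

/* Hypothetical helpers provided by the DSM runtime. */
extern int  locate_owner(void *addr);  /* location server or broadcast */
extern void fetch_block(int owner, void *addr, char buf[BLOCK_SIZE]);
extern void map_page(void *addr, char buf[BLOCK_SIZE]);

/* Invoked by the OS when a process touches a block not held locally. */
void dsm_page_fault(void *faulting_addr) {
    char buf[BLOCK_SIZE];
    int owner = locate_owner(faulting_addr);  /* who holds the block?   */
    fetch_block(owner, faulting_addr, buf);   /* migrate the block here;
                                                 the owner unmaps its copy */
    map_page(faulting_addr, buf);             /* map into the local address
                                                 space; the faulting access
                                                 is then simply retried  */
}
```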

DSM Algorithms (contd.)

The problem with the previous techniques is that access to a data block remains strictly sequential.

The Read-Replication Algorithm

Extends the migration algorithm
Replicates data blocks at multiple nodes for read access
  Replication can reduce the average cost of read operations
  Either multiple nodes have read access or one node has write access: a multiple-readers/single-writer (MRSW) protocol
After a write, all other copies are invalidated or updated
DSM has to keep track of the locations of all copies of data objects
  IVY: the owner node of a data object knows all nodes that have copies
  PLUS: a distributed linked list tracks all nodes that have copies

[Figure: a node requesting a data block and, on a write, multicasting an invalidate to all nodes holding replicas.]

Advantages
  Read replication can lead to substantial performance improvements if the ratio of reads to writes is large (a write-invalidate sketch follows below)

Disadvantages
  Write operations may be more expensive, since replicas may have to be invalidated or updated to maintain consistency
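A minimal MRSW write-invalidate sketch, assuming an IVY-style owner that keeps a copy set (the copyset structure and send_invalidate are illustrative assumptions):

```c
#define MAX_NODES 64

/* Hypothetical per-block metadata kept by the owner (IVY-style). */
struct block_meta {
    int copyset[MAX_NODES];  /* 1 if node i holds a read replica */
    int owner;               /* node currently allowed to write  */
};

extern void send_invalidate(int node, unsigned block_id);

/* Before a write, the owner invalidates every read replica so the
   single-writer rule of the MRSW protocol is restored.            */
void begin_write(struct block_meta *m, unsigned block_id, int self) {
    for (int i = 0; i < MAX_NODES; i++) {
        if (m->copyset[i] && i != self) {
            send_invalidate(i, block_id);  /* multicast invalidate */
            m->copyset[i] = 0;
        }
    }
    m->owner = self;  /* writer now holds the only valid copy */
}
```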

DSM Algorithms (contd.)

The Full-Replication Algorithm

Extends the read-replication algorithm
Multiple nodes have both read and write access to shared data blocks: a multiple-readers/multiple-writers (MRMW) protocol

[Figure: clients send writes to a sequencer, which multicasts sequenced updates to all sites holding copies.]

Issues
  Consistency of data with multiple writers

Solution: use a gap-free sequencer (a receive-side sketch follows below)
  All writes are sent to a sequencer
  The sequencer assigns a sequence number and sends the write request to all sites that have copies
  Each node performs writes according to sequence numbers
  A gap in sequence numbers indicates a missing write request: the node asks for retransmission of the missing write requests
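A sketch of the gap-free sequencing rule on the receiving node (apply_write, request_retransmission, and the buffering structure are illustrative assumptions, not from the slides):

```c
#define WINDOW 256

/* Hypothetical receive-side state for one node. */
struct seq_state {
    unsigned next_seq;        /* next sequence number to apply      */
    int      pending[WINDOW]; /* 1 if a write with that seq is buffered */
    int      value[WINDOW];   /* buffered write payloads            */
};

extern void apply_write(int value);
extern void request_retransmission(unsigned seq);

/* Deliver a sequenced write: apply in order, buffer out-of-order ones. */
void on_write(struct seq_state *s, unsigned seq, int val) {
    if (seq < s->next_seq) return;       /* duplicate: already applied  */
    s->pending[seq % WINDOW] = 1;        /* buffer this write           */
    s->value[seq % WINDOW] = val;
    if (seq > s->next_seq)               /* gap detected: an earlier    */
        request_retransmission(s->next_seq);  /* write is missing       */
    while (s->pending[s->next_seq % WINDOW]) {  /* apply in seq order   */
        apply_write(s->value[s->next_seq % WINDOW]);
        s->pending[s->next_seq % WINDOW] = 0;
        s->next_seq++;
    }
}
```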

DSM Algorithms
Performance Measure

A performance measure needs to take into account the cost of accessing local and remote data blocks.

Parameters
  p:  cost of sending or receiving a short packet
  P:  cost of sending or receiving a data block (assume P/p equals 20)
  S:  number of sites participating in the distributed shared memory
  r:  read/write ratio
  f:  probability of an access fault on a non-replicated data block
  f': probability of an access fault on replicated data blocks
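As a worked illustration of how these parameters combine (my derivation from the slides' own statements, not the original cost formulas): the central-server algorithm uses two messages per access, and each message is paid once at the sender and once at the receiver, so a remote access costs about 4p. The migration term below is an additional assumption for contrast; costs for the other algorithms would be derived analogously from f, f', r, S, and P.

```c
#include <stdio.h>

int main(void) {
    double p = 1.0;        /* cost of sending/receiving a short packet */
    double P = 20.0 * p;   /* cost of a data block, using P/p = 20     */

    /* Central server: two messages per access (request + reply),
       each message paid at both the sender and the receiver.       */
    double central_server_access = 4.0 * p;

    /* Migration (assumed model, for illustration only): an access
       fault with probability f costs a short request (2p) plus a
       whole-block transfer (2P); other accesses are local and free. */
    double f = 0.05;       /* assumed fault probability */
    double migration_access = f * (2.0 * p + 2.0 * P);

    printf("central server: %.2f p per access\n", central_server_access);
    printf("migration:      %.2f p per access (f = %.2f)\n",
           migration_access, f);
    return 0;
}
```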

Conclusion

Being a hybrid of the distributed- and shared-memory architectures, DSM systems offer a trade-off between the ease of programming of shared-memory machines and the efficiency and scalability of distributed-memory systems.

While the programmer is relieved of the communication details, he still has to take care of many design and implementation issues. The algorithms described above offer various solutions, with cost and performance varying for each.

No single algorithm is good for all applications.
Algorithms need to be adaptive to application characteristics.

Thank you for Paying Attention
