
Distributed Shared Memory

Outline

Introduction
Shared Memory vs. Distributed Memory
Distributed Shared Memory Architecture
DSM vs. Message Passing
Design and Implementation
Consistency Models
DSM Algorithms
Conclusion

Introduction

From the system interconnection perspective, parallel systems fall into two main categories:
Multiprocessors: tightly-coupled, shared-memory architecture
Multicomputers: loosely-coupled, distributed-memory architecture

[Figure: on the left, a multiprocessor in which several CPUs share a single memory; on the right, multicomputers in which each CPU has its own local memory and the nodes communicate over a network.]
Shared Memory vs. Distributed Memory

Shared Memory                      | Distributed Memory
-----------------------------------|---------------------------------------------------
Global address space               | No concept of a global address space
Cache coherent                     | No concept of cache coherency
Lacks scalability                  | Scalable performance
Expensive to build                 | Cost effective: can use commodity, off-the-shelf processors and networking
Easy to program; reusable          | Programmer is responsible for data communication
Data sharing is fast and uniform   | Data sharing by message passing, with non-uniform memory access times

Cache coherent means that if one processor updates a location in shared memory, all the other processors know about the update.

Distributed-Shared Memory Architecture

Definition

A Distributed Shared Memory (DSM) is an abstraction that allows physically separated memories to be addressed as one logically shared address space.

General Characteristics
Hybrid architecture
Virtual space shared between all processes
Shared-memory model implemented over physically distributed memory
  Shared-memory programming techniques can be used
  When reading and updating, processes see DSM as ordinary memory within their address space
Mapping Manager: maps shared-memory addresses to physical memory, remote or local (a sketch follows the figure below)

[Figure: DSM architecture. Processors P1 .. Pn, each with a local memory M1 .. Mn and a mapping manager (MM), are connected by an interconnection network; the mapping managers jointly present a single shared virtual space.]
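To make the mapping manager's role concrete, here is a minimal sketch in C (my illustration, not from the slides; the page-table layout and the remote_fetch helper are assumed names) of resolving a shared-virtual address to local or remote memory:

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Hypothetical per-page entry in the mapping manager's table. */
struct map_entry {
    int   is_local;   /* 1 if the page currently resides here      */
    void *local_addr; /* base of the local frame when is_local = 1 */
    int   home_node;  /* node to contact when the page is remote   */
};

extern struct map_entry page_table[];              /* indexed by page number */
extern void *remote_fetch(int node, size_t page);  /* assumed DSM helper     */

/* Resolve a shared-virtual address to a usable local pointer. */
void *resolve(uintptr_t shared_addr) {
    size_t page   = shared_addr / PAGE_SIZE;
    size_t offset = shared_addr % PAGE_SIZE;
    struct map_entry *e = &page_table[page];
    if (!e->is_local) {                    /* page lives elsewhere:      */
        e->local_addr = remote_fetch(e->home_node, page);  /* bring it in */
        e->is_local   = 1;
    }
    return (char *)e->local_addr + offset; /* ordinary memory access     */
}
```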

Distributed-Shared Memory Architecture (contd.)

General Characteristics
Covert communication operations
  Communications are still needed to exchange data, but they are hidden from the programmer: inter-process communication transparency
Heterogeneous nodes
  The shared-memory component can be a cache-coherent SMP machine and/or a graphics processing unit (GPU)
Processes on different computers observe the updates made by one another

Cache-coherent SMP or UMA (bus-based): a model with identical processors that have equal access times to a shared memory
  UMA: Uniform Memory Access
  SMP: Symmetric Multiprocessor


Distributed-Shared Memory Architecture (contd.)
Advantages

Implicit data sharing: shields the programmer from Send/Receive primitives
Less expensive to build and scalable (inherited from the distributed-memory architecture)
Very large total physical memory across all nodes
Large programs can run more efficiently
Software portability and reusability: programs written for shared-memory multiprocessors can run on DSM systems with minimal changes

Disadvantages

Little programmer control over the actual messages being generated
DSM implementations use asynchronous message passing, and so cannot be more efficient than message-passing implementations

Distributed-Shared Memory Architecture (contd.)

Best suited
  When individual shared data items can be accessed directly
  e.g. parallel applications

Less appropriate
  When data is accessed by request
  e.g. client-server systems
  A server may be used to assist in providing DSM functionality for data shared between clients

DSM vs. Message Passing

Property             | DSM                                   | Message Passing
---------------------|---------------------------------------|------------------------------------------
Marshalling          | No; variables are shared directly     | Yes; the programmer's job
Address space        | Single; interference may occur        | Private; processes are protected
Data representation  | Uniform                               | Heterogeneous
Synchronization      | Normal shared-memory constructs       | Message-passing primitives
Process execution    | Lifetimes need not overlap            | Processes must execute at the same time
Communication cost   | Invisible                             | Obvious

There is no evidence against or in favor of either of the two communication mechanisms.

Design and Implementation

Main Issues

Granularity: the size of the sharing unit, which can be uniform chunks of memory (a byte or a page) or data structures
  Small pages: increased parallelism, but an increase in directory size
  Large pages: reduced paging overhead, but increased sharing overhead

Structure: the arrangement of shared data
  Most systems view DSM as a linear array of words

Replacement strategies: similar to the caching mechanisms in multiprocessors
  In cache systems, LRU is often used
  In DSM, shared pages are given higher priority than exclusively owned pages, so exclusively owned pages are replaced first

Synchronization primitives
  DSM must allow simultaneous access to shared data on different machines (single writer/multiple readers, etc.)
  Coherence protocols must ensure the consistency of shared data

Consistency Models
Definition

A memory consistency model for a shared address space specifies constraints on the order in which memory operations must appear to be performed (i.e. to become visible to the processors) with respect to one another.

Strict Consistency Model

Any read of a memory location returns the value stored by the most recent write operation to that address, irrespective of the locations of the processors performing the read and the write operations.

Sequential Consistency Model

"The result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program." (Leslie Lamport)
Definition restated: sequential consistency requires that a shared-memory multiprocessor appear to be a multiprogramming uniprocessor system to any program running on it.

All instructions are executed in program order
Every write operation becomes instantaneously visible throughout the system

Consistency Models (contd.)

Sequential Consistency Model

Example 1 (initially Head = 0)

P1:               P2:
Data = 2000;      while (Head == 0) {}
Head = 1;         ... = Data;

Sequential consistency requires program order:
  The write to Data has to complete before the write to Head can begin
  The read of Head has to complete before the read of Data can begin
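As an illustration (not part of the original slides), Example 1 maps naturally onto C11 atomics: with the default sequentially consistent ordering, P2 can never read a stale Data after observing Head == 1. The thread and variable names follow the example above.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_int Data = 0;   /* payload written by P1                */
atomic_int Head = 0;   /* flag signalling that Data is ready   */

void *p1(void *arg) {
    /* Both stores default to memory_order_seq_cst, so the write
       to Data is globally visible before the write to Head.     */
    atomic_store(&Data, 2000);
    atomic_store(&Head, 1);
    return NULL;
}

void *p2(void *arg) {
    while (atomic_load(&Head) == 0) {}   /* spin until Head == 1 */
    /* Under sequential consistency this must print 2000.        */
    printf("%d\n", atomic_load(&Data));
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t2, NULL, p2, NULL);
    pthread_create(&t1, NULL, p1, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```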

Example 2 (initially A = B = 0)

P1:        P2:              P3:
A = 1;     if (A == 1)      if (B == 1)
             B = 1;           register = A;

Sequential consistency can be maintained if a process makes sure that everyone has seen an update before that value is read.

Consistency Models (contd.)

Causal Consistency Model
  Writes that are potentially causally related must be seen by all processors in the same order.
  Writes that are not potentially causally related may be seen in a different order on different machines.

Processor Consistency Model
  Writes done by a single processor are seen by all other processors in the order in which they were written on that processor.
  Writes from different processors may be seen in a different order by different processors.

Release Consistency Model
  Weak consistency with two types of synchronization operations: acquire and release (a sketch follows below).
  Each type of operation is guaranteed to be processor consistent.
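A hedged sketch of the acquire/release idea using C11 atomics (my illustration, not from the slides): every ordinary write made inside the critical section becomes visible to the next process that acquires the lock, which is the guarantee release consistency asks of a DSM.

```c
#include <stdatomic.h>

atomic_int lock = 0;    /* 0 = free, 1 = held                   */
int shared_counter = 0; /* ordinary (non-atomic) shared data    */

void acquire(void) {
    int expected = 0;
    /* memory_order_acquire: reads/writes after this point cannot
       be reordered before the lock acquisition.                  */
    while (!atomic_compare_exchange_weak_explicit(
               &lock, &expected, 1,
               memory_order_acquire, memory_order_relaxed))
        expected = 0;
}

void release(void) {
    /* memory_order_release: all preceding writes (e.g. to
       shared_counter) are made visible before the lock is freed. */
    atomic_store_explicit(&lock, 0, memory_order_release);
}

void increment(void) {
    acquire();
    shared_counter++;   /* protected update; propagated at release */
    release();
}
```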

DSM Algorithms

The Central Server Algorithm

A central server maintains all shared data
  Read request: the server just returns the data
  Write request: the server updates the data and sends an acknowledgement to the client
Two messages for each data access (see the sketch below)

[Figure: clients exchanging request/acknowledgement messages with the central server.]

Implementation
  A timeout is used to resend a request if the acknowledgement fails to arrive
  Associated sequence numbers can be used to detect duplicate write requests
  If an application's request to access shared data fails repeatedly, a failure condition is sent to the application

Issues: performance and reliability (bottleneck at the server)

Possible solutions
  Partition shared data between several servers
  Use a mapping function to distribute/locate data
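A minimal sketch of the two-message exchange, assuming a hypothetical wire format (the msg layout and field names are illustrative, not from the slides):

```c
/* Hypothetical message format for the central-server algorithm. */
enum op { READ, WRITE, ACK, DATA };

struct msg {
    enum op  op;
    unsigned seq;     /* sequence number: detects duplicate writes   */
    unsigned addr;    /* shared-data address                         */
    int      value;   /* payload for WRITE requests / DATA replies   */
};

/* Server side: every access costs exactly one request and one reply. */
void serve(struct msg *req, struct msg *reply, int store[]) {
    if (req->op == READ) {
        reply->op = DATA;
        reply->value = store[req->addr];  /* just return the data    */
    } else {                              /* WRITE                   */
        store[req->addr] = req->value;    /* update the data...      */
        reply->op = ACK;                  /* ...and acknowledge      */
    }
    reply->seq = req->seq;  /* echo seq so the client can pair
                               replies and drop duplicates           */
}
```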

DSM Algorithms (contd.)

The Migration Algorithm

Data is always migrated to the site where it is accessed
Only one node may access a shared data item at a time: a single-reader/single-writer (SRSW) protocol
Data is typically migrated in a fixed-size unit called a block, which facilitates management compared to migrating individual data items

[Figure: a node issuing a migration request and receiving the data block.]

Advantages
  Takes advantage of locality of reference
  No communication costs are incurred when a process accesses data held locally

Integration with virtual memory
  DSM can be integrated with the virtual memory of the OS at each node
  The block size is chosen to be equal to a virtual-memory page, or a multiple thereof
  A locally held shared-memory page is mapped into the application's virtual address space, so normal machine instructions can be used to access it
  Access to a data block not held locally triggers a page fault; the fault handler communicates with remote hosts to obtain the requested data (a sketch follows below)
  When a data block is migrated away, it is removed from any local address space it was mapped into

To locate a remote data object
  Use a location server, or
  Broadcast a query

Issues
  Pages can thrash between hosts; to minimize thrashing, set a minimum time for data objects to reside at a node
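A hedged sketch of how a DSM page-fault handler might migrate a block in; locate_owner, fetch_block, and map_page are illustrative names for assumed runtime helpers, not a real DSM API:

```c
#include <stddef.h>

#define BLOCK_SIZE 4096  /* one virtual-memory page */

/* Hypothetical helpers provided by the DSM runtime. */
extern int  locate_owner(void *addr);  /* location server or broadcast */
extern void fetch_block(int owner, void *addr, char buf[BLOCK_SIZE]);
extern void map_page(void *addr, char buf[BLOCK_SIZE]);

/* Invoked by the OS when a process touches a block not held locally. */
void dsm_page_fault(void *faulting_addr) {
    char buf[BLOCK_SIZE];
    int owner = locate_owner(faulting_addr);  /* who holds the block?   */
    fetch_block(owner, faulting_addr, buf);   /* migrate the block here;
                                                 the owner unmaps its copy */
    map_page(faulting_addr, buf);             /* map into the local address
                                                 space; the faulting access
                                                 is then simply retried  */
}
```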

DSM Algorithms (contd.)

The problem with the previous techniques is that access to a data block remains strictly sequential.

The Read-Replication Algorithm

Extends the migration algorithm
Replicates data blocks at multiple nodes for read access
  Replication can reduce the average cost of read operations
  Either multiple nodes have read access or one node has write access: a multiple-readers/single-writer (MRSW) protocol
After a write, all other copies are invalidated or updated
DSM has to keep track of the locations of all copies of data objects
  IVY: the owner node of a data object knows all nodes that have copies
  PLUS: a distributed linked list tracks all nodes that have copies

[Figure: a node requesting a data block and, on a write, multicasting an invalidate to all nodes holding replicas.]

Advantages
  Read replication can lead to substantial performance improvements if the ratio of reads to writes is large (a write-invalidate sketch follows below)

Disadvantages
  Write operations may be more expensive, since replicas may have to be invalidated or updated to maintain consistency
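A minimal MRSW write-invalidate sketch, assuming an IVY-style owner that keeps a copy set (the copyset structure and send_invalidate are illustrative assumptions):

```c
#define MAX_NODES 64

/* Hypothetical per-block metadata kept by the owner (IVY-style). */
struct block_meta {
    int copyset[MAX_NODES];  /* 1 if node i holds a read replica */
    int owner;               /* node currently allowed to write  */
};

extern void send_invalidate(int node, unsigned block_id);

/* Before a write, the owner invalidates every read replica so the
   single-writer rule of the MRSW protocol is restored.            */
void begin_write(struct block_meta *m, unsigned block_id, int self) {
    for (int i = 0; i < MAX_NODES; i++) {
        if (m->copyset[i] && i != self) {
            send_invalidate(i, block_id);  /* multicast invalidate */
            m->copyset[i] = 0;
        }
    }
    m->owner = self;  /* writer now holds the only valid copy */
}
```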

DSM Algorithms (contd.)

The Full-Replication Algorithm

Extends the read-replication algorithm
Multiple nodes have both read and write access to shared data blocks: a multiple-readers/multiple-writers (MRMW) protocol

[Figure: clients send writes to a sequencer, which multicasts sequenced updates to all sites holding copies.]

Issues
  Consistency of data with multiple writers

Solution: use a gap-free sequencer (a receive-side sketch follows below)
  All writes are sent to a sequencer
  The sequencer assigns a sequence number and sends the write request to all sites that have copies
  Each node performs writes according to sequence numbers
  A gap in sequence numbers indicates a missing write request: the node asks for retransmission of the missing write requests
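A sketch of the gap-free sequencing rule on the receiving node (apply_write, request_retransmission, and the buffering structure are illustrative assumptions, not from the slides):

```c
#define WINDOW 256

/* Hypothetical receive-side state for one node. */
struct seq_state {
    unsigned next_seq;        /* next sequence number to apply      */
    int      pending[WINDOW]; /* 1 if a write with that seq is buffered */
    int      value[WINDOW];   /* buffered write payloads            */
};

extern void apply_write(int value);
extern void request_retransmission(unsigned seq);

/* Deliver a sequenced write: apply in order, buffer out-of-order ones. */
void on_write(struct seq_state *s, unsigned seq, int val) {
    if (seq < s->next_seq) return;       /* duplicate: already applied  */
    s->pending[seq % WINDOW] = 1;        /* buffer this write           */
    s->value[seq % WINDOW] = val;
    if (seq > s->next_seq)               /* gap detected: an earlier    */
        request_retransmission(s->next_seq);  /* write is missing       */
    while (s->pending[s->next_seq % WINDOW]) {  /* apply in seq order   */
        apply_write(s->value[s->next_seq % WINDOW]);
        s->pending[s->next_seq % WINDOW] = 0;
        s->next_seq++;
    }
}
```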

DSM Algorithms
Performance Measure

A performance measure needs to take into account the cost of accessing local and remote data blocks.

Parameters
  p:  cost of sending or receiving a short packet
  P:  cost of sending or receiving a data block (assume P/p equals 20)
  S:  number of sites participating in the distributed shared memory
  r:  read/write ratio
  f:  probability of an access fault on a non-replicated data block
  f': probability of an access fault on replicated data blocks
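As a worked illustration of how these parameters combine (my derivation from the slides' own statements, not the original cost formulas): the central-server algorithm uses two messages per access, and each message is paid once at the sender and once at the receiver, so a remote access costs about 4p. The migration term below is an additional assumption for contrast; costs for the other algorithms would be derived analogously from f, f', r, S, and P.

```c
#include <stdio.h>

int main(void) {
    double p = 1.0;        /* cost of sending/receiving a short packet */
    double P = 20.0 * p;   /* cost of a data block, using P/p = 20     */

    /* Central server: two messages per access (request + reply),
       each message paid at both the sender and the receiver.       */
    double central_server_access = 4.0 * p;

    /* Migration (assumed model, for illustration only): an access
       fault with probability f costs a short request (2p) plus a
       whole-block transfer (2P); other accesses are local and free. */
    double f = 0.05;       /* assumed fault probability */
    double migration_access = f * (2.0 * p + 2.0 * P);

    printf("central server: %.2f p per access\n", central_server_access);
    printf("migration:      %.2f p per access (f = %.2f)\n",
           migration_access, f);
    return 0;
}
```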

Conclusion

Being a hybrid of the distributed- and shared-memory architectures, DSM systems offer a trade-off between the ease of programming of shared-memory machines and the efficiency and scalability of distributed-memory systems.

While the programmer is relieved of the communication details, he still has to take care of many design and implementation issues. The algorithms described above offer various solutions, with cost and performance varying for each.

No single algorithm is good for all applications.
Algorithms need to be adaptive to application characteristics.

Thank you for Paying Attention
