
MEMORY ORGANIZATION

• Memory Hierarchy

• Main Memory

• Associative Memory

• Cache Memory

• Virtual Memory

• Memory Management Hardware


Memory
• Ideally,
1. Fast
2. Large
3. Inexpensive

• Is it possible to meet all 3 requirements simultaneously?

Some Basic Concepts


• What is the max. size of memory?
• Address space
–16-bit : 2^16 = 64K memory locations
–32-bit : 2^32 = 4G memory locations
–40-bit : 2^40 = 1T memory locations

• What is Byte addressable?
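The powers of two above can be checked with a short sketch (plain Python, purely for illustration; the helper name is made up):

```python
# Number of addressable locations for a k-bit address is 2^k.
def address_space(k):
    return 2 ** k

for bits, label in [(16, "64K"), (32, "4G"), (40, "1T")]:
    print(f"{bits}-bit address -> {address_space(bits):,} locations ({label})")
```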


Introduction
• Even a sophisticated processor may perform well below an
ordinary one:
–Unless it is supported by a memory system of matching
performance.
• The focus of this module:
–Study how memory system
performance has been enhanced
through various innovations and
optimizations.
Memory Hierarchy

MEMORY HIERARCHY
The memory hierarchy aims to obtain the highest possible
access speed while minimizing the total cost of the memory system.

[Figure: the CPU connects to cache memory and main memory; an I/O
processor connects the auxiliary memory (magnetic disks and magnetic
tapes) to main memory.]

Register

Cache

Main Memory

Magnetic Disk

Magnetic Tape

Going down this list, size increases; going up, speed and cost per
bit increase.
Basic Concepts of Memory

[Figure: connection of the memory to the processor — the processor's
MAR drives a k-bit address bus and its MDR an n-bit data bus to a
memory of up to 2^k addressable locations with word length n bits;
control lines (R/W, MFC, etc.) come from the control unit.]


Basic Concepts of Memory
• Data transfer between memory & processor takes place through MAR &
MDR.
• If MAR is k bits wide then the memory unit contains 2^k addressable
locations. [k address lines]
• If MDR is n bits wide then n bits of data are transferred between
memory & processor in each memory cycle. [n data lines]
• The bus also includes control lines Read/Write & MFC for coordinating
data transfer.
• Processor read operation
→ MARin , Read/Write line = 1 , READ , WMFC , MDRin
• Processor write operation
→ MDRin , MARin , MDRout , Read/Write line = 0 , WRITE , WMFC
• Memory access is synchronized using a clock.
• Memory Access Time – time between the start of a Read and the MFC
signal. [Speed of memory]
• Memory Cycle Time – minimum time delay between the initiation of two
successive memory operations.
Basic Concepts of Memory
[Figure: the memory hierarchy seen from the processor — registers,
then L1 and L2 caches (SRAM), then main memory (DRAM), then secondary
storage; size increases downward, while speed and cost per bit
increase upward.]
Basic Concepts of Memory
• Random Access Memory → any location can be accessed for a read/write
operation in a fixed amount of time.
• Types of RAM →
1. Static RAM (SRAM): capable of retaining its state as long as power
is applied; volatile in nature. [High cost & speed]
2. Asynchronous DRAM: dynamic RAMs are less expensive, but they do not
retain their state indefinitely. Widely used in computers.
3. Synchronous DRAM: operation is directly synchronized with a clock
signal.
• Performance parameters: bandwidth & latency.
• Bandwidth: number of bytes transferred in 1 unit of time.
• Latency: amount of time it takes to transfer a word of data to & from
memory.
• Read Only Memory (ROM) → locations can be accessed for read operations
only, in a fixed amount of time. Retains its state without power:
non-volatile in nature.
• Programmable ROM (PROM): allows data to be loaded by the user.
• Erasable PROM (EPROM): stored data is erased [by UV rays] so that new
data can be loaded.
• Electrically Erasable PROM (EEPROM): erased by applying different voltages.
• Memory uses semiconductor integrated circuits to increase performance.
• To reduce memory cycle time → use cache memory: a small SRAM placed
physically very close to the processor, which works on locality of reference.
• Virtual memory is used to increase the apparent size of the physical memory.
Internal Organization of Memory Chips


Organization of bit cells in a memory chip


Internal Organization of Memory Chips
• Memory cells are organized in an array [row & column format] where
each cell is capable of storing one bit of information.
• Each row of cells holds a memory word, and all cells of a row are
connected to a common word line, which is driven by the address decoder
on the chip.
• Cells in each column are connected to a sense/write circuit by 2 bit lines.
• Sense/write circuits are connected to the data I/O lines of the chip.
• READ operation → the sense/write circuit senses/reads the information
stored in the cells selected by a word line & transmits it to the output
data lines.
• WRITE operation → the sense/write circuit receives input information &
stores it in the selected cells.
• If a memory chip consists of 16 memory words of 8 bits each, it is
referred to as a 16 x 8 organization (128 bits in total).
• The data I/O of each sense/write circuit is connected to a single
bidirectional data line that can be connected to the data bus of a computer.
• 2 control lines: Read/Write [specifies the required operation] & Chip
Select (CS) [selects a chip in a multichip memory].
• Such a chip stores 128 bits & needs 14 external connections for its
address, data & control lines.
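The 14-connection count follows directly from the organization; a small sketch (the function name is illustrative, and power/ground pins are deliberately left out, as in the slide):

```python
import math

def external_connections(words, bits_per_word):
    """Address + data + control (R/W, CS) lines for a words x bits chip."""
    address_lines = int(math.log2(words))  # 16 words -> 4 address lines
    control_lines = 2                      # Read/Write and Chip Select
    return address_lines + bits_per_word + control_lines

print(external_connections(16, 8))   # 4 + 8 + 2 = 14 external connections
print(16 * 8)                        # 128 bits stored
```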
Internal Organization of Memory Chip
An Example

[Figure: a 32 x 32 cell array with a 32-to-1 output MUX & input DMUX
feeding the data input & output line.]


Internal Organization of Memory Chip
An Example
1K [1024] Memory Cells

• Design a memory of 1K [1024] memory cells.

• For 1K, we require a 10-bit address.
• So 5 bits each for rows & columns are used to access the address of a
memory cell represented in the array.
• A row address selects a row of 32 cells, all of which are accessed in
parallel.
• According to the column address, only one of these cells is connected
to the external data line by the output MUX & input DMUX.
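The 10-bit address split above can be sketched in a few lines (the function name is illustrative):

```python
def split_address(addr, col_bits=5):
    """Split a 10-bit cell address into a 5-bit row and a 5-bit column."""
    row = addr >> col_bits               # high 5 bits select a row of 32 cells
    col = addr & ((1 << col_bits) - 1)   # low 5 bits drive the 32-to-1 MUX
    return row, col

print(split_address(700))  # 700 = 0b10101_11100 -> (21, 28)
```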
Static Memories
• Circuits capable of retaining their state as long as power is applied:
Static RAM (SRAM) (volatile).
• 2 inverters are cross-connected to form a latch.
• The latch is connected to 2 bit lines by transistors T1 & T2.
• Transistors T1 & T2 act as switches that can be opened & closed under
control of the word line.
• When the word line is at ground level the transistors are turned off
(suppose the cell is initially in state 1: X=1 & Y=0).
• Read Operation :-
1. The word line is activated to close switches T1 & T2.
2. Whether the cell state is 1 or 0, the signals on bit lines b and b'
are always complements of each other.
3. The Sense/Write circuit sets the end value of the bit lines as the
output.
• Write Operation :-
1. The state of the cell is set by placing the desired value on bit
line b & its complement on b', and then activating the word line.

[Figure: a static RAM cell — a latch of two cross-coupled inverters
(points X, Y) connected through T1 & T2 to bit lines b and b', with
the word line controlling both transistors and Sense/Write circuits
at the ends of the bit lines.]
Asynchronous DRAM
• SRAMs are fast but very costly, due to the large number of transistors
in their cells.
• A less expensive cell, which however cannot retain its state
indefinitely, gives a memory called dynamic RAM [DRAM].
• Data is stored in a DRAM cell in the form of charge on a capacitor, but
only for a period of tens of milliseconds.

An Example of DRAM
• A DRAM cell consists of a capacitor, C, & a transistor, T.
• To store information in the cell, transistor T is turned on & the
correct amount of voltage is applied to the bit line.
• After the transistor turns off, the capacitor begins to discharge.
• So a read operation must be completed before the capacitor's voltage
drops below some threshold value [sensed by a sense amplifier
connected to the bit line].

[Figure: a single-transistor dynamic memory cell.]
Design a 16-Mbit DRAM Chip

• A 2M x 8 memory chip.
• Cells are organized in the form of a 4K x 4K array.
• The 4096 cells in each row are divided into 512 groups of 8. Hence 512
bytes of data can be stored in each row.
• A 12-bit address [4096 = 2^12] selects a row & 9 bits [512 = 2^9]
specify a group of 8 bits in the selected row.
• The RAS [Row Address Strobe] & CAS [Column Address Strobe] signals latch
the row & column addresses to find the proper byte to read or write.
• The information on the D7-0 lines is transferred to the selected circuit
for a write operation.
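The 12 + 9 bit split for this chip can be sketched the same way (a hedged illustration, assuming a flat 21-bit byte address):

```python
def dram_address_split(byte_address):
    """Split a 21-bit address (2M locations) into RAS and CAS parts."""
    row = byte_address >> 9       # high 12 bits: one of 4096 rows
    col = byte_address & 0x1FF    # low 9 bits: one of 512 byte groups
    return row, col

print(dram_address_split(2**21 - 1))  # last byte -> (4095, 511)
```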
Synchronous DRAM
• A DRAM whose operation is directly synchronized with a clock signal = SDRAM.
• The address & data connections are buffered by means of registers.
• During a read operation, all cells in the selected row are loaded into
latches & then into the output register.
• A refresh counter refreshes the contents of the cells.
• An SDRAM can work in different modes, selected via a mode register,
such as burst mode.
Memory Controller
• Memory addresses are divided into 2 parts.
• The high-order address bits, which select a row in the cell array, are
provided first & latched into the memory chip under control of the RAS
signal.
• The low-order address bits, which select a column, are then provided on
the same address lines & latched using the CAS signal.
• The controller accepts a complete address & R/W signal from the
processor under control of a REQUEST signal, which indicates that a
memory access operation is needed.
• The controller forwards the row & column addresses with the proper
timing, performing the address multiplexing function.
• Then R/W & CS are sent to the memory.
• The data lines are connected directly between the processor & the memory.

[Figure: memory controller — the processor sends the address, R/W,
Request and clock to the controller; the controller drives the
row/column address, RAS, CAS, R/W, CS and clock to the memory; the
data lines connect the processor and memory directly.]
Associative Memory
• Reduces the search time
efficiently.
• The address is replaced by the
content of the data, hence the
name Content Addressable Memory
(CAM).
• Also called content-based access.
• Hardware requirements →
– It contains a memory array &
logic for m words with n bits
per word.
– The argument register (A) & key
register (K) each have n bits.
– The match register (M) has m bits,
one for each word in memory.
– Each word in memory is compared
in parallel with the content of
the argument register, in the bit
positions selected by the key
register.
– If a word matches the bits of the
argument register, its
corresponding bit in the match
register is set & the search for
the data word is over.
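A minimal sketch of the parallel compare (the names `argument` and `key` mirror the registers; everything else is illustrative):

```python
def cam_search(words, argument, key):
    """One match bit per word: compare only bit positions where key = 1."""
    return [1 if (w & key) == (argument & key) else 0 for w in words]

words = [0b1010, 0b1100, 0b1011]
# Compare only the three high-order bits (key masks out the lowest bit).
print(cam_search(words, argument=0b1010, key=0b1110))  # [1, 0, 1]
```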
Cache Memory
• A relatively small SRAM [having low access time] located
physically closer to the processor.
• Locality of Reference
• The references to memory at any given time interval tend to be confined
within localized areas.
• This area contains a set of information, and the membership changes
gradually as time goes by.
• Temporal Locality → information in use now is likely to be used again
in the near future (e.g. reuse of information in loops).
• Spatial Locality → if a word is accessed, adjacent (nearby) words are
likely to be accessed soon (e.g. related data items (arrays) are usually
stored together; instructions are executed sequentially).
• Cache is a fast, small-capacity memory that should hold the information
that is most likely to be accessed.

[Figure: the CPU accesses the cache memory, which is backed by main
memory.]
Performance Of Cache Memory
• All memory accesses are directed first to the cache.
• If the word is in the cache, access the cache to provide it to the
CPU → CACHE HIT.
• If the word is not in the cache, bring in a block (or line) including
that word, replacing a block now in the cache → CACHE MISS.
• Hit Ratio (h) – % of memory accesses satisfied by the cache memory
system.
• Te: effective memory access time of the cache memory system.
• Tc: cache access time.
• Tm: main memory access time.
• Te = h*Tc + (1 - h)*(Tc + Tm)
• Example:
Tc = 0.4 µs, Tm = 1.2 µs, h = 85%
Te = 0.85*0.4 + (1 - 0.85)*1.6 = 0.58 µs

Cache Write
• Write-Through
– If hit, both cache and memory are written in parallel.
– If miss, only memory is written.
– For a read miss, the missing block may be loaded into a cache block.
• Write-Back (Copy-Back)
– If hit, only the cache is written.
– If miss, the missing block is brought into the cache and then written.
– Update only the cache location & mark it as updated with an
associated flag bit called the dirty/modified bit.
– For a read miss, the candidate (victim) block must first be written
back to memory.
– Memory is not up-to-date, i.e., the same item in cache and memory may
have different values: the cache coherence problem.
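The effective access time formula and the worked example can be checked in a few lines:

```python
def effective_access_time(h, tc, tm):
    """Te = h*Tc + (1 - h)*(Tc + Tm): a miss pays the cache probe plus memory."""
    return h * tc + (1 - h) * (tc + tm)

te = effective_access_time(h=0.85, tc=0.4, tm=1.2)
print(round(te, 2))  # 0.58 (microseconds), matching the example
```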
Cache Memory
- MEMORY AND CACHE MAPPING - ASSOCIATIVE MAPPING -
Mapping Function → specification of the correspondence between main
memory blocks and cache blocks:
Associative mapping
Direct mapping
Set-associative mapping

Associative Mapping
- Any block location in the cache can store any block in memory
→ most flexible
- The mapping table is implemented in an associative memory
→ fast, very expensive
- The mapping table stores both the address and the content of the
memory word

[Figure: a 15-bit address in the argument register is compared against
a CAM whose entries hold (address, data) pairs, e.g.
Address  Data
01000    3450
02777    6710
22235    1234 ]
Cache Memory
- DIRECT MAPPING -
- Each memory block has only one place to load in the cache.
- The mapping table is made of RAM instead of CAM.
- An n-bit memory address consists of 2 parts: k bits of Index field and
n-k bits of Tag field.
- The n-bit address is used to access main memory and the k-bit Index is
used to access the cache.

[Figure: addressing relationships — main memory 32K x 12 (15-bit
address, 12-bit data, addresses 00000-77777 octal), cache memory
512 x 12 (9-bit address, 12-bit data); the address is split as
Tag(6) | Index(9).]

Direct Mapping Cache Organization


[Figure: main memory addresses 00000-02777 (octal) with sample data
(1220, 2340, 3450, 4560, 5670, 6710) feeding a cache indexed by the
9-bit index; e.g. index 000 holds tag 00, data 1220 and index 777
holds tag 02, data 6710.]
Cache Memory
DIRECT MAPPING
Operation
1. CPU generates a memory request with memory address (TAG ; INDEX).
2. Access the cache using INDEX; read out (tag ; data).
Compare the TAG in the memory address with the tag from the cache.
3. If the two tags match → cache Hit.
4. Provide the data from Cache[INDEX] to the CPU.
5. If they do not match → cache Miss; read the required data from main
memory.
6. Store the new data from memory M[TAG ; INDEX] in the cache together
with its tag:
7. Cache[INDEX] ← (TAG ; M[TAG ; INDEX])
8. CPU ← Cache[INDEX](data)
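The hit/miss steps above can be sketched as a tiny simulation (the class name and the tag-only modelling are illustrative; data contents are omitted):

```python
class DirectMappedCache:
    """Each address maps to exactly one line; a line stores the resident tag."""
    def __init__(self, index_bits):
        self.index_bits = index_bits
        self.lines = {}  # index -> tag currently resident

    def access(self, address):
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        hit = self.lines.get(index) == tag
        self.lines[index] = tag  # on a miss the new block replaces the old one
        return hit

cache = DirectMappedCache(index_bits=9)   # 512 lines, as in the slides
print(cache.access(0o00777))  # False: first reference, a miss
print(cache.access(0o00777))  # True: same tag and index, a hit
print(cache.access(0o02777))  # False: same index 777, new tag -> conflict miss
```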

Direct Mapping with a block size of 8 words

[Figure: the address is split as Tag(6) | Block(6) | Word(3); Block
and Word together form the index. Block 0 spans indices 000-007
(tag 01, data 3450 ... 6578), block 1 spans 010-017, ..., block 63
spans 770-777 (tag 02, data 6710).]
Cache Memory
- SET-ASSOCIATIVE MAPPING -
Each memory block has a set of locations in the cache where it may load.
A set-associative cache with a set size of two or more stores several
words of memory under the same index address:

Index  Tag  Data  Tag  Data
000    01   3450  02   5670
777    02   6710  00   2340

Operation
1. CPU generates a memory address (TAG ; INDEX).
2. Access the cache with INDEX; (cache word = (tag 0 : data 0) ; (tag 1 : data 1)).
3. Compare TAG with tag 0 and then with tag 1.
4. If tag i = TAG → cache Hit; the CPU gets data i.
5. If no tag matches → cache Miss {and the set is full}:
Replace either (tag 0, data 0) or (tag 1, data 1).
Assume (tag 0, data 0) is selected for replacement.
(Why (tag 0, data 0) instead of (tag 1, data 1)?)
Memory[tag 0, INDEX] ← Cache[INDEX](data 0)
Cache[INDEX](tag 0, data 0) ← Memory(TAG, M[TAG, INDEX])
CPU ← Cache[INDEX](data 0)
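A two-way set gives each index two tag slots; a hedged sketch that answers the "which way to replace?" question with LRU order (the slide leaves that choice open, so this is one possible policy):

```python
class TwoWaySetAssociativeCache:
    def __init__(self, index_bits):
        self.index_bits = index_bits
        self.sets = {}  # index -> tags, least recently used first

    def access(self, address):
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        ways = self.sets.setdefault(index, [])
        if tag in ways:              # hit: refresh the recency order
            ways.remove(tag)
            ways.append(tag)
            return True
        if len(ways) == 2:           # set full: evict the LRU way
            ways.pop(0)
        ways.append(tag)
        return False

cache = TwoWaySetAssociativeCache(index_bits=9)
a, b, c = (1 << 9) | 5, (2 << 9) | 5, (3 << 9) | 5  # three tags, same index
print(cache.access(a), cache.access(b), cache.access(a))  # False False True
print(cache.access(c))  # False: evicts b, the least recently used way
```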
Paging
• The external fragmentation problem can be avoided by
PAGING.
• The logical address space of a process can be
noncontiguous; the process is allocated physical memory
wherever space is available.
• Address mapping in the paging scheme →
– Divide physical memory into fixed-sized blocks called
frames (size is a power of 2, between 512 bytes and 8,192
bytes).
– Divide logical memory into blocks of the same size, called
pages.
– Keep track of all free frames.
– To run a program of size n pages, find n free frames and
load the program.
– Set up a page table to translate logical to physical
addresses.
– Internal fragmentation may still occur.
Address Translation Scheme

• The address generated by the CPU (logical address) is divided
into →
– Page number (p) – used as an index into a page table, which
contains the base address of each page in physical memory.
– Page offset (d) – combined with the base address to define the
physical memory address that is sent to the memory unit.
– Base address + page offset = physical address.
Paging Model of Logical and Physical Memory

[Figure: the page number indexes the page table, which holds the base
address of each page in physical memory.]
Paging Hardware

[Figure: the page number p indexes the page table to obtain the base
address of the page in physical memory; base address + page offset =
physical address.]
Paging Example

[Figure: pages 0-3 of logical memory are mapped through the page table
(page number as index, base address of each page in physical memory)
into frames of a 32-byte physical memory with 4-byte pages.]

Physical Address = (frame no. x page size) + page offset


LA → PA Address Mapping
• Physical Address = (frame no. x page size) + page offset
• Example:
– Page size is 4 bytes.
– Physical memory size is 32 bytes.
– Hence the total number of frames = 32 bytes / 4 bytes = 8.
• If the logical address is 0, what is its corresponding physical
address?
– Page offset (d) = displacement within the page = 0 - 0 = 0.
– The page number is 0, which is in frame 5 as given in the page table.
– So physical address = (5 x 4) + 0 = 20.
• If the logical address is 3, what is its corresponding physical
address?
– Page offset (d) = displacement within the page = 3 - 0 = 3.
– The page number is 0, which is in frame 5 as given in the page table.
– So physical address = (5 x 4) + 3 = 23.
Paging Example
• If the logical address is 4, what is its corresponding physical
address?
– Page offset (d) = displacement within the page = 4 - 4 = 0.
– The page number is 1, which is in frame 6 as given in the page table.
– So physical address = (6 x 4) + 0 = 24.
• If the logical address is 10, what is its corresponding physical
address?
– Page offset (d) = displacement within the page = 10 - 8 = 2.
– The page number is 2, which is in frame 1 as given in the page table.
– So physical address = (1 x 4) + 2 = 06.
• If the logical address is 13, what is its corresponding physical
address?
– Page offset (d) = displacement within the page = 13 - 12 = 1.
– The page number is 3, which is in frame 2 as given in the page table.
– So physical address = (2 x 4) + 1 = 09.
• If the logical address is 15, what is its corresponding physical
address?
– Page offset (d) = displacement within the page = 15 - 12 = 3.
– The page number is 3, which is in frame 2 as given in the page table.
– So physical address = (2 x 4) + 3 = 11.
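All six translations can be verified with a few lines (the page table [5, 6, 1, 2] is taken from the worked example above):

```python
def translate(logical_address, page_table, page_size=4):
    page_number = logical_address // page_size
    offset = logical_address % page_size   # displacement within the page
    return page_table[page_number] * page_size + offset

page_table = [5, 6, 1, 2]  # page number -> frame number
for la in (0, 3, 4, 10, 13, 15):
    print(la, "->", translate(la, page_table))
# 0->20, 3->23, 4->24, 10->6, 13->9, 15->11, matching the worked answers
```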
Paging Hardware With TLB

[Figure: the TLB (a set of associative registers) is checked first; on
a TLB miss the page table is consulted.]
Virtual memory
• Virtual memory – separation of user logical memory
from physical memory.
–Only part of the program needs to be in
memory for execution
–Logical address space can therefore be
much larger than physical address space
–Allows address spaces to be shared by
several processes
–Allows for more efficient process creation

• Virtual memory can be implemented via:
–Demand paging
–Demand segmentation
Virtual Memory That is Larger Than
Physical Memory


Page Replacement
Page Replacement Algorithms
• Want lowest page-fault rate

• Evaluate an algorithm by running it on a particular string of memory
references (a reference string) and computing the number of page
faults on that string.
First-In-First-Out (FIFO) Algorithm

• Replacement depends upon the arrival time of a page in memory.
• A page is replaced when it is the oldest (first in ascending order of
page arrival time in memory).
• With a FIFO queue there is no need to record the arrival time of a
page: the page at the head of the queue is replaced.
• Performance is not always good:
• When an active page is replaced to bring in a new page, a page fault
occurs almost immediately to retrieve the active page.
• To get the active page back, some other page has to be replaced.
Hence the number of page faults increases.
FIFO Page Replacement

[Figure: FIFO page replacement trace on a reference string with 3
frames — 15 page faults; the repeated references are hits.]
Problem with FIFO Algorithm

• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• 3 frames (3 pages can be in memory at a time per process):

frame 1: 1 4 5        → 9 page faults
frame 2: 2 1 3
frame 3: 3 2 4

• 4 frames — UNEXPECTED: the page fault count increases:

frame 1: 1 5 4
frame 2: 2 1 5
frame 3: 3 2
frame 4: 4 3          → 10 page faults

• Belady's Anomaly → more frames ⇒ more page faults
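A short FIFO simulation reproduces the anomaly on this reference string (a sketch; the queue records arrival order and, notably, hits do not update it):

```python
from collections import deque

def fifo_faults(reference_string, frames):
    resident, order, faults = set(), deque(), 0
    for page in reference_string:
        if page in resident:
            continue                           # hit: FIFO order is unchanged
        faults += 1
        if len(resident) == frames:
            resident.discard(order.popleft())  # evict the oldest page
        resident.add(page)
        order.append(page)
    return faults

ref = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(ref, 3), fifo_faults(ref, 4))  # 9 10 -> Belady's anomaly
```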
Optimal Algorithm

• To avoid Belady's anomaly → use the Optimal
page replacement algorithm.
• Replace the page that will not be used for the longest period of
time.
• This guarantees the lowest possible page fault rate for a fixed
number of frames.
• Example →

– First we incur 3 page faults to fill the frames.
– Then replace page 7 with page 2, because page 7 will
not be needed until the 18th place in the reference
string.
– Finally there are 9 page faults in total.
– Hence it is better than the FIFO algorithm (15 page
faults).
Optimal Page Replacement

[Figure: optimal replacement trace on the same reference string — 9
page faults; the remaining references are hits.]
Difficulty with Optimal Algorithm
• Replace the page that will not be used for the longest period of
time.
• 4 frames example:
1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

frame 1: 1 4
frame 2: 2            → 6 page faults
frame 3: 3
frame 4: 4 5

• Used for measuring how well your algorithm performs.
• It always needs future knowledge of the reference string.
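The optimal policy can be simulated only when the whole reference string is known in advance, which is exactly why it is unusable online; a sketch:

```python
def optimal_faults(reference_string, frames):
    resident, faults = [], 0
    for i, page in enumerate(reference_string):
        if page in resident:
            continue
        faults += 1
        if len(resident) < frames:
            resident.append(page)
            continue
        future = reference_string[i + 1:]
        # Victim: the resident page whose next use is farthest away (or never).
        victim = max(resident,
                     key=lambda p: future.index(p) if p in future else len(future))
        resident[resident.index(victim)] = page
    return faults

ref = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(optimal_faults(ref, 4))  # 6 page faults, as in the example
```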
Least Recently Used (LRU) Algorithm
• The LRU algorithm lies between the FIFO & Optimal algorithms (in
terms of page faults).
• FIFO → uses the time when a page was brought into memory.
• OPTIMAL → uses the time when a page will next be used.
• LRU → uses the recent past as an approximation of the near
future: a recently used page is likely to be used again (so it
should not be replaced), so we replace the page that has not been
used for the longest period of time. Hence "least recently used".
• Example →
– Up to the 5th page fault it behaves the same as the optimal algorithm.
– When page 4 occurs, LRU chooses page 2 for replacement.
– Here we find only 12 page faults.
LRU Page Replacement

[Figure: LRU replacement trace on the same reference string — 12 page
faults; the remaining references are hits.]
Least Recently Used (LRU) Algorithm
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

frame 1: 1 1 1 1 5
frame 2: 2 2 2 2 2
frame 3: 3 5 5 4 4
frame 4: 4 4 3 3 3

• Counter implementation
– Every page entry has a counter; every time the page is
referenced through this entry, copy the clock into the
counter.
– When a page needs to be replaced, look at the counters
to find the least recently used one.
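The counter idea above is what `OrderedDict` recency order captures compactly; a hedged LRU sketch on the same reference string:

```python
from collections import OrderedDict

def lru_faults(reference_string, frames):
    resident, faults = OrderedDict(), 0
    for page in reference_string:
        if page in resident:
            resident.move_to_end(page)       # "copy the clock": refresh recency
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)     # evict the least recently used page
        resident[page] = True
    return faults

ref = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(ref, 4))  # 8 faults: between Optimal (6) and FIFO (10)
```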
