
Tutorial: Networks on Chip

The 1st ACM/IEEE International Symposium on Networks-on-Chip Princeton, New Jersey, May 6, 2007

Network Layer Communication Performance in Network-on-Chips, A. Jantsch (Royal Institute of Technology, Sweden), 10:00 - 11:45 am
Power, Energy and Reliability Issues in NoC, R. Marculescu (Carnegie Mellon University, USA), 1:15 - 3:00 pm
Tooling, OS Services and Middleware, L. Benini (University of Bologna, Italy), 3:15 - 5:00 pm

Network on Chip Tutorial, Princeton, May 6, 2007

Network Layer Communication Performance In NoCs - 1

Network Layer Communication Performance in Network-on-Chips

Introduction
Communication Performance
Organizational Structure
Interconnection Topologies
Trade-offs in Network Topology
Routing
Quality of Service

A. Jantsch, KTH


Introduction
Interconnection Network

Topology: how switches and nodes are connected.
Routing algorithm: determines the route from source to destination.
Switching strategy: how a message traverses the route.
Flow control: schedules the traversal of the message over time.

[Figure: two processing nodes, each with processor (P), memory (Mem) and communication assist, attached to the network through network interfaces.]

Basic Definitions

Message is the basic communication entity.
Flit is the basic flow control unit. A message consists of one or many flits.
Phit is the basic unit of the physical layer.
Direct network is a network where each switch connects to a node.
Indirect network is a network with switches not connected to any node.
Hop is the basic communication action from node to switch or from switch to switch.
Diameter is the length of the maximum shortest path between any two nodes, measured in hops.
Routing distance between two nodes is the number of hops on a route.
Average distance is the average of the routing distance over all pairs of nodes.

Basic Switching Techniques

Circuit Switching: a real or virtual circuit establishes a direct connection between source and destination.
Packet Switching: each packet of a message is routed independently. The destination address has to be provided with each packet.
Store and Forward Packet Switching: the entire packet is stored and then forwarded at each switch.
Cut Through Packet Switching: the flits of a packet are pipelined through the network. The packet is not completely buffered in each switch.
Virtual Cut Through Packet Switching: the entire packet is stored in a switch only when the header flit is blocked due to congestion.
Wormhole Switching: cut through switching in which all flits are blocked on the spot when the header flit is blocked.
Performance

Latency

Time(n) = Admission + RoutingDelay + ContentionDelay

Admission is the time it takes to emit the message into the network.
RoutingDelay is the delay along the route.
ContentionDelay is the delay of a message due to contention.

[Figure: a message travelling from node A to node D through switches 1, 2 and 3.]

Routing Delay
Store and Forward: Tsf(n, h) = h(n/b + Δ)
Circuit Switching: Tcs(n, h) = n/b + hΔ
Cut Through: Tct(n, h) = n/b + hΔ
Store and Forward with fragmented packets: Tsf(n, h, np) = (n - np)/b + h(np/b + Δ)

n ... message size in bits; np ... size of message fragments in bits; h ... number of hops; b ... raw bandwidth of the channel; Δ ... switching delay per hop
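The four delay models can be evaluated directly. A minimal sketch in Python (function and variable names are mine; the formulas are the ones above):

```python
def t_sf(n, h, b, delta):
    """Store and forward: the whole packet is received before each forwarding."""
    return h * (n / b + delta)

def t_cs(n, h, b, delta):
    """Circuit switching: the per-hop switching delay is paid once to set up."""
    return n / b + h * delta

def t_ct(n, h, b, delta):
    """Cut through: flits are pipelined, so serialization is paid only once."""
    return n / b + h * delta

def t_sf_frag(n, h, b, delta, np_):
    """Store and forward with fragments of np_ bits pipelined over the route."""
    return (n - np_) / b + h * (np_ / b + delta)

# Example: 1024-bit message, 5 hops, 32 bits/cycle, 1-cycle switching delay.
print(t_sf(1024, 5, 32, 1))           # 165.0
print(t_ct(1024, 5, 32, 1))           # 37.0
print(t_sf_frag(1024, 5, 32, 1, 64))  # 45.0
```

Fragmenting recovers most of the cut-through advantage while each switch still stores complete (small) units.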

Routing Delay: Store and Forward vs Cut Through


[Figure: average latency vs packet size for SF vs CT switching (d=2, k=10) with b=1 and b=32, and average latency vs number of nodes (k=2 and d=2, m=8); cut through stays well below store and forward throughout, and the gap widens with packet size and network size.]
Local and Global Bandwidth

Local bandwidth = b · n/(n + nE + wΔ)
Total bandwidth = Cb [bits/second] = Cw [bits/cycle] = C [phits/cycle]
Bisection bandwidth ... the minimum bandwidth over all cuts that divide the net into two equal parts.

b ... raw bandwidth of a link; n ... message size; nE ... size of message envelope; w ... link bandwidth per cycle; Δ ... switching time for each switch in cycles; wΔ ... bandwidth lost during switching; C ... total number of channels

For a k × k mesh with bidirectional channels:
Total bandwidth = (4k² - 4k)b
Bisection bandwidth = 2kb
Link and Network Utilization

Total load on the network: L = N·h·l/M [phits/cycle]
Load per channel: ρ = N·h·l/(M·C) ≤ 1

M ... each host issues a packet every M cycles; C ... number of channels; N ... number of nodes; h ... average routing distance; l = n/w ... number of cycles a message occupies a channel; n ... average message size; w ... bitwidth per channel
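A small sketch of these two load measures (names and the 8×8-mesh example are mine; for the mesh, C counts each bidirectional link as two channels, and h = 16/3 is taken as the average distance):

```python
def network_load(N, h, n, w, M):
    """Total load L = N*h*l/M in phits/cycle, with l = n/w."""
    return N * h * (n / w) / M

def channel_load(N, h, n, w, M, C):
    """Per-channel load rho = L/C; must stay below 1 for a stable network."""
    return network_load(N, h, n, w, M) / C

# 64 nodes on an 8x8 mesh: 112 bidirectional links = 224 channels,
# 256-bit messages on 32-bit channels, one packet per 100 cycles per host.
rho = channel_load(64, 16 / 3, 256, 32, 100, 224)
print(round(rho, 3))  # 0.122
```

At roughly 12% channel load this configuration is far from the 40-70% saturation region discussed below.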

Network Saturation

[Figure: delivered bandwidth and latency vs offered bandwidth; both curves change sharply at the network saturation point.]

Typical saturation points are between 40% and 70%. The saturation point depends on:
Traffic pattern
Stochastic variations in traffic
Routing algorithm
Organizational Structure

Link
Switch
Network Interface
Link
Short link: at any time there is only one data word on the link.
Long link: several data words can travel on the link simultaneously.
Narrow link: data and control information are multiplexed on the same wires.
Wide link: data and control information are transmitted in parallel and simultaneously.
Synchronous clocking: both source and destination operate on the same clock.
Asynchronous clocking: the clock is encoded in the transmitted data to allow the receiver to sample at the right time instant.
Switch
[Figure: switch structure: input ports with receivers and input buffers, a crossbar, output buffers and transmitters at the output ports, and a control unit for routing and scheduling.]
Switch Design Issues


Degree: number of inputs and outputs
Buffering: input buffers, output buffers, shared buffers
Routing: source routing, deterministic routing, adaptive routing
Output scheduling
Deadlock handling
Flow control
Network Interface

Admission protocol
Reception obligations
Buffering
Assembling and disassembling of messages
Routing
Higher level services and protocols
Interconnection Topologies
Fully connected networks
Linear arrays and rings
Multidimensional meshes and tori
Trees
Butterflies
Fully Connected Networks


Bus:
switch degree = N
diameter = 1
distance = 1
network cost = O(N)
total bandwidth = b
bisection bandwidth = b

Crossbar:
switch degree = N
diameter = 1
distance = 1
network cost = O(N²)
total bandwidth = Nb
bisection bandwidth = Nb
Linear Arrays and Rings


Linear array:
switch degree = 2
diameter = N - 1
distance = 2N/3
network cost = O(N)
total bandwidth = 2(N - 1)b
bisection bandwidth = 2b

Torus (ring):
switch degree = 2
diameter = N/2
distance = N/3
network cost = O(N)
total bandwidth = 2Nb
bisection bandwidth = 4b

[Figure: linear array, torus and folded torus layouts.]
Multidimensional Meshes and Tori


k-ary d-cubes are d-dimensional tori with unidirectional links and k nodes in each dimension:

number of nodes N = k^d
switch degree = d
diameter = d(k - 1)
distance = d(k - 1)/2
network cost = O(N)
total bandwidth = 2Nb
bisection bandwidth = 2k^(d-1)·b

[Figure: 2-d mesh, 3-d cube and 2-d torus.]
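A quick calculator for these characteristic numbers (the function name and dictionary keys are mine; b defaults to unit link bandwidth):

```python
def kary_dcube_metrics(k, d, b=1):
    """Characteristic numbers of a k-ary d-cube with unidirectional links."""
    N = k ** d
    return {
        "nodes": N,
        "switch_degree": d,
        "diameter": d * (k - 1),
        "avg_distance": d * (k - 1) / 2,
        "total_bandwidth": 2 * N * b,
        "bisection_bandwidth": 2 * k ** (d - 1) * b,
    }

m = kary_dcube_metrics(k=4, d=3)
print(m["nodes"], m["diameter"], m["avg_distance"])  # 64 9 4.5
```

Comparing, say, (k=8, d=2) against (k=4, d=3) for the same N shows the usual trade: higher dimension shortens routes but raises switch degree.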

Routing Distance in k -ary n-Cubes


[Figure: average distance vs number of nodes (up to 1200) for k=2 and for d=2, 3, 4; average distance grows fastest for the low-dimensional networks.]
Projecting High Dimensional Cubes

[Figure: 2-d layouts of a 2-ary 2-cube, 3-cube, 4-cube and 5-cube.]
Binary Trees

number of nodes N = 2^d
number of switches = 2^d - 1
switch degree = 3
diameter = 2d
distance ≈ 2d - 2
network cost = O(N)
total bandwidth = 2(N - 1)b
bisection bandwidth = 2b
k -ary Trees

number of nodes N = k^d
number of switches = (k^d - 1)/(k - 1)
switch degree = k + 1
diameter = 2d
distance ≈ 2d - 2
network cost = O(N)
total bandwidth = 2(N - 1)b
bisection bandwidth = kb
Binary Tree Projection

Efficient and regular 2-d layout; longest wires, measured in resource widths:

d    2   3   4   5   6   7   8   9   10
N    4   8   16  32  64  128 256 512 1024
lW   0   1   1   2   2   4   4   8   8
k -ary n-Cubes versus k -ary Trees

k-ary n-cubes:
number of nodes N = k^d
switch degree = d
diameter = d(k - 1)
distance = d(k - 1)/2
network cost = O(N)
total bandwidth = 2Nb
bisection bandwidth = 2k^(d-1)·b

k-ary trees:
number of nodes N = k^d
number of switches = (k^d - 1)/(k - 1)
switch degree = k + 1
diameter = 2d
distance ≈ 2d - 2
network cost = O(N)
total bandwidth = 2(N - 1)b
bisection bandwidth = kb
Butterflies

[Figure: a 2×2 butterfly building block, and a 16-node butterfly with switch stages labelled 0 to 4.]
Butterfly Characteristics

number of nodes N = 2^d
number of switches = 2^(d-1)·d
switch degree = 2
diameter = d + 1
distance = d + 1
network cost = O(N·d)
total bandwidth = 2^d·d·b
bisection bandwidth = N·b/2
k-ary n-Cubes versus k-ary Trees versus Butterflies

                                 k-ary n-cube     binary tree    butterfly
cost                             O(N)             O(N)           O(N log N)
distance                         (d/2)·N^(1/d)    2 log N - 2    log N
links per node                   d                1              log N
bisection                        2N^((d-1)/d)     1              N/2
frequency limit of
random traffic                   2/N^(1/d)        1/N            1/2
Problems with Butterflies

Cost of the network is O(N log N).
The 2-d layout is more difficult than for binary trees.
The number of long wires grows faster than for trees.
For each source-destination pair there is only one route.
Each route blocks many other routes.
Benes Networks

Many routes; costly to compute non-blocking routes; high probability of finding a non-blocking route by randomly selecting an intermediate node [Leighton, 1992].
Fat Trees

[Figure: a 16-node 2-ary fat tree; the switches toward the root form fat nodes with higher bandwidth.]
k -ary n-dimensional Fat Tree Characteristics


number of nodes N = k^d
number of switches = k^(d-1)·d
switch degree = 2k
diameter = 2d
distance ≤ 2d
network cost = O(N·d)
total bandwidth = 2k^d·d·b
bisection bandwidth = 2k^(d-1)·b

[Figure: a 16-node 2-ary fat tree with fat nodes.]
k -ary n-Cubes versus k -ary d-dimensional Fat Trees

k-ary n-cubes:
number of nodes N = k^d
switch degree = d
diameter = d(k - 1)
distance = d(k - 1)/2
network cost = O(N)
total bandwidth = 2Nb
bisection bandwidth = 2k^(d-1)·b

k-ary n-dimensional fat trees:
number of nodes N = k^d
number of switches = k^(d-1)·d
switch degree = 2k
diameter = 2d
distance ≤ 2d
network cost = O(N·d)
total bandwidth = 2k^d·d·b
bisection bandwidth = 2k^(d-1)·b
Relation between Fat Tree and Hypercube

[Figures: a binary 2-dimensional fat tree corresponds to a binary 1-cube, a binary 3-dimensional fat tree to a binary 2-cube, and a binary 4-dimensional fat tree to a binary 3-cube.]
Trade-offs in Topology Design for the k-ary n-Cube

Unloaded Latency
Latency under Load
Network Scaling for Unloaded Latency


Latency(n) = Admission + RoutingDelay + ContentionDelay
RoutingDelay: Tct(n, h) = n/b + hΔ
RoutingDistance: h = d(k - 1)/2 = ((k - 1)/2)·log_k N = (d/2)·(N^(1/d) - 1)

[Figure: average latency vs number of nodes (up to 10000) for k=2 and d=2..5, with message sizes m=32 and m=128; the low-dimensional networks scale worst.]
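The three expressions for h are the same quantity written in terms of (k, d), (k, N) and (d, N); a quick numerical check, assuming N = k^d:

```python
import math

def h_kd(k, d):
    """Average routing distance from dimension count and radix."""
    return d * (k - 1) / 2

def h_kN(k, N):
    """Same quantity from radix and node count, using log_k N = d."""
    return (k - 1) / 2 * math.log(N, k)

def h_dN(d, N):
    """Same quantity from dimension count and node count, using k = N^(1/d)."""
    return d / 2 * (N ** (1 / d) - 1)

k, d = 4, 3
N = k ** d
print(h_kd(k, d))  # 4.5
```

All three agree up to floating-point rounding, which makes it easy to study scaling with whichever pair of parameters is held fixed.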

Unloaded Latency for Small Networks and Local Traffic


[Figure: average latency vs number of nodes (10 to 100) for k=2 and d=2..5 with m=128, under uniform traffic and under local traffic (h = dk/5); for small networks and local traffic the dimensions differ by only a few cycles.]
Unloaded Latency under a Free-Wire Cost Model

Free-wire cost model: wires are free and can be added without penalty.

[Figure: average latency vs dimension under the free-wire cost model for N=64..16K, with m=32 and m=128 (b=32).]
Unloaded Latency under a Fixed-Wire Cost Model

Fixed-wire cost model: the number of wires per node is constant. With 128 wires per node: w(d) = 64/d.

d     2   3   4   5   6   7   8   9   10
w(d)  32  21  16  12  10  9   8   7   6

[Figure: average latency vs dimension under the fixed-wire cost model for N=64..16K, with m=32 and m=128 (b=64/d).]
Unloaded Latency under a Fixed-Bisection Cost Model


Fixed-bisection cost model: the number of wires across the bisection is constant, here bisection = 1024 wires: w(d) = k/2 = N^(1/d)/2. Example for N = 1024:

d     2   3   4   5   6   7   8   9   10
w(d)  16  5   3   2   2   1   1   1   1

[Figure: average latency vs dimension under the fixed-bisection cost model for N=64..16K, with m=32B and m=128B (b=k/2).]
Unloaded Latency under a Logarithmic Wire Delay Cost Model


Fixed-bisection logarithmic wire delay cost model: the number of wires across the bisection is constant and the delay on wires increases logarithmically with their length [Dally, 1990]:
Length of the long wires: l = k^(d/2 - 1)
Tc ∝ 1 + log l = 1 + (d/2 - 1)·log k

[Figure: average latency vs dimension under this cost model for N=64..16K, with m=32B and m=128B (b=k/2).]
Unloaded Latency under a Linear Wire Delay Cost Model


Fixed-bisection linear wire delay cost model: the number of wires across the bisection is constant and the delay on wires increases linearly with their length [Dally, 1990]:
Length of the long wires: l = k^(d/2 - 1)
Tc ∝ l = k^(d/2 - 1)

[Figure: average latency vs dimension under this cost model for N=64..16K, with m=32B and m=128B (b=k/2); linear wire delay penalizes high dimensions heavily.]
Latency under Load

Assumptions [Agarwal, 1991]:
k-ary n-cubes
random traffic
dimension-order cut-through routing
unbounded internal buffers (to ignore flow control and deadlock issues)
Latency under Load - contd


Latency(n) = Admission + RoutingDelay + ContentionDelay
T(m, k, d, w, Δ) = RoutingDelay + ContentionDelay
T(m, k, d, w, Δ) = m/w + d·h_k·(Δ + W(m, k, d, w, ρ))
W(m, k, d, w, ρ) = (m/w) · (ρ/(1 - ρ)) · ((h_k - 1)/h_k^2) · (1 + 1/d)
h_k = (k - 1)/2, h = d(k - 1)/2

m ... message size; w ... bitwidth of link; ρ ... aggregate channel utilization; h_k ... average distance in each dimension; Δ ... switching time in cycles
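A sketch of the model (function names are mine; the formulas follow the slide, with Δ passed as `delta` and ρ as `rho`):

```python
def hk(k):
    """Average routing distance per dimension."""
    return (k - 1) / 2

def contention_delay(m, k, d, w, rho):
    """W: average waiting time per switch at channel utilization rho."""
    h = hk(k)
    return (m / w) * (rho / (1 - rho)) * ((h - 1) / h ** 2) * (1 + 1 / d)

def latency(m, k, d, w, delta, rho):
    """T = serialization + hops * (switching + contention)."""
    return m / w + d * hk(k) * (delta + contention_delay(m, k, d, w, rho))

# 256-bit messages, 8-bit links, d=2, k=32, 1-cycle switches.
print(latency(256, 32, 2, 8, 1, 0.0))  # 63.0 cycles unloaded
print(latency(256, 32, 2, 8, 1, 0.5))  # roughly 2.4x higher at 50% load
```

The model breaks down as rho approaches 1, where the queueing term diverges, matching the saturation behaviour shown earlier.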

Latency vs Channel Load


[Figure: average latency vs channel utilization (w=8, Δ=1) for m = 8, 32, 128 and for (d=3, k=10) and (d=2, k=32); latency grows steeply as the utilization approaches 1.]
Routing
Deterministic routing: the route is determined solely by the source and destination locations.
Arithmetic routing: the destination address of the incoming packet is compared with the address of the switch and the packet is routed accordingly (relative or absolute addresses).
Source based routing: the source determines the route and builds a header with one directive for each switch. Each switch strips off the top directive.
Table-driven routing: switches have routing tables, which can be configured.
Adaptive routing: the route can be adapted by the switches to balance the load.
Minimal routing allows only shortest paths, while non-minimal routing also allows longer paths.
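As a concrete instance of deterministic routing, a minimal sketch of dimension-order (XY) routing on a 2-d mesh; the function name and coordinate convention are illustrative, not from the slides:

```python
def xy_route(src, dst):
    """Dimension-order routing: correct the x offset fully, then the y offset.
    Returns the list of (x, y) switch coordinates visited, src included."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # route along x first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then along y
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))
# [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because every packet resolves dimensions in the same fixed order, the route depends only on source and destination, which is also what makes this scheme deadlock-free on a mesh.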
Quality of Service
Best Effort (BE):
optimization of the average case
loose or non-existent worst case bounds
cost effective use of resources

Guaranteed Service (GS):
maximum delay
minimum bandwidth
maximum jitter
requires additional resources
Regulated Flows
A flow F is (σ, ρ)-regulated if F(b) - F(a) ≤ σ + ρ(b - a) for all time intervals [a, b], 0 ≤ a ≤ b, where F(t) is the cumulative amount of traffic between 0 and t ≥ 0.
σ ≥ 0 is the burstiness constraint; ρ ≥ 0 is the maximum average rate.

[Figure: a staircase cumulative-traffic curve F(t), with bursts at t1..t5, bounded from above by the line σ + ρt.]
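The (σ, ρ) constraint can be checked directly against a recorded arrival trace. A brute-force sketch (the trace format and names are mine); it takes each interval to start just before a burst, which is the worst case:

```python
def is_regulated(arrivals, sigma, rho):
    """arrivals: time-sorted list of (time, bits) events.
    Checks F(b) - F(a) <= sigma + rho*(b - a) over all event intervals."""
    times = [t for t, _ in arrivals]
    # cumulative traffic F(t) just after each event
    F = []
    total = 0
    for _, bits in arrivals:
        total += bits
        F.append(total)
    for i in range(len(arrivals)):
        for j in range(i, len(arrivals)):
            traffic = F[j] - (F[i - 1] if i > 0 else 0)
            if traffic > sigma + rho * (times[j] - times[i]):
                return False
    return True

# A flow sending 4 bits every 2 time units is (4, 2)-regulated ...
print(is_regulated([(0, 4), (2, 4), (4, 4)], sigma=4, rho=2))   # True
# ... but not (2, 1)-regulated: the first burst alone exceeds sigma.
print(is_regulated([(0, 4), (2, 4), (4, 4)], sigma=2, rho=1))   # False
```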
Regulated Flows - Delay Element

[Figure: a delay element with maximum delay D; input flow F1, output flow F2.]

F1 ~ (σ, ρ) ⇒ F2 ~ (σ + ρD, ρ)
Regulated Flows - Work Conserving Multiplexer

[Figure: a work conserving multiplexer: input flows F1 and F2, output flow F3 on a link of bandwidth b, with maximum delay D and maximum backlog B.]

F1 ~ (σ1, ρ1), F2 ~ (σ2, ρ2), link bandwidth b with ρ1 + ρ2 < b.
F3 ~ ? maximum delay D = ? maximum backlog B = ?
Work Conserving Multiplexer - 1


[Figure: backlog over time: it grows at slope b until t1, then at slope ρ1 until taccu, peaks at Bmax, and drains at slope b - ρ1 - ρ2 until tdrain.]

Phase 1 (0 to t1): F1 and F2 transmit at full speed.
Assume: at t = 0 the queue is empty, and σ1 ≤ σ2.
Injection rate: 2b; drain rate: b.
F1 exhausts its burst when b·t1 = σ1 + ρ1·t1, hence t1 = σ1/(b - ρ1).

Work Conserving Multiplexer - 2


Phase 2 (t1 to taccu): F1 transmits at rate ρ1, F2 transmits at full speed.
Injection rate: b + ρ1; drain rate: b.
F2 exhausts its burst when b·taccu = σ2 + ρ2·taccu, hence taccu = σ2/(b - ρ2).
Work Conserving Multiplexer - 3


Phase 3 (taccu to tdrain): F1 transmits at rate ρ1, F2 transmits at rate ρ2.
Injection rate: ρ1 + ρ2; drain rate: b.
tdrain = Bmax/(b - ρ1 - ρ2)
Bmax = b·t1 + ρ1·(taccu - t1) = σ1 + ρ1·σ2/(b - ρ2)
Work Conserving Multiplexer - Summary


B max B 1 1 b Time 1 1 b 1 2

t= 0

t1 taccu

t2

tdrain

A. Jantsch, KTH

Network on Chip Tutorial, Princeton, May 6, 2007

Quality of Service - 56

Work Conserving Multiplexer - Summary


B max B 1 1 b Time 1 1 b 1 2

t= 0

t1 taccu

t2

tdrain

Bmax = 1 +

1 2 b 2

A. Jantsch, KTH

Network on Chip Tutorial, Princeton, May 6, 2007

Quality of Service - 56

Work Conserving Multiplexer - Summary


B max B 1 1 b Time 1 1 b 1 2

t= 0

t1 taccu

t2

tdrain

Bmax = 1 +

1 2 b 2 1 + 2 = b 1 2

Dmax = taccu + tdrain

A. Jantsch, KTH

Network on Chip Tutorial, Princeton, May 6, 2007

Quality of Service - 56

Work Conserving Multiplexer - Summary


B max B 1 1 b Time 1 1 b 1 2

t= 0

t1 taccu

t2

tdrain

Bmax = 1 + Dmax F3

1 2 b 2

1 + 2 = taccu + tdrain = b 1 2 (1 + 2, 1 + 2)
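These bounds are simple enough to check numerically. Below is a minimal sketch (the helper name and the example numbers are ours, not from the tutorial), assuming input links of normalized rate 1 and ρ1 + ρ2 < b ≤ 1 + ρ1:

```python
# Hypothetical helper evaluating the work-conserving multiplexer bounds for
# two (sigma, rho) flows sharing an output link of bandwidth b.
def mux_bounds(s1, r1, s2, r2, b):
    assert r1 + r2 < b <= 1 + r1, "bounds assume rho1 + rho2 < b <= 1 + rho1"
    bmax = s1 + s2 * (1 + r1 - b) / (1 - r2)   # peak buffer fill
    dmax = (s1 + s2) / (b - r1 - r2)           # taccu + tdrain
    f3 = (s1 + s2, r1 + r2)                    # output flow F3 ~ (s1+s2, r1+r2)
    return bmax, dmax, f3

bmax, dmax, f3 = mux_bounds(s1=4.0, r1=0.2, s2=2.0, r2=0.3, b=1.1)
print(bmax, dmax, f3)  # Bmax ≈ 4.29, Dmax ≈ 10, F3 ~ (6.0, 0.5)
```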
Quality of Service - 57

MPEG Encoding Case Study

[Figure: block diagram of the MPEG encoder platform: a Processor, two Custom HW blocks, and a Memory attached to the Interconnect.]

Quality of Service - 58

MPEG Encoding Case Study - contd

[Figure: node T sends flow F1 to the memory M.]

F1 ∼ (0, t)

Quality of Service - 59

MPEG Encoding Case Study - contd

[Figure: flow graph: T exchanges F1-F4 with M over C1 and C2, S exchanges F6-F8 over C3, and V receives F9 over C4.]

F1 ∼ (0, t)

C1 : (t, D1); C2 : (t, D2); C3 : (t, D3); C4 : (t, D4)

F2 ∼ (tD1, t)

Quality of Service - 60

MPEG Encoding Case Study - Memory

[Figure: F2 and F7 are multiplexed into the memory M ∼ (2t, DM); the memory output is demultiplexed again. FM1 is the multiplexer output, FM2 the memory output, FM3 and FM4 the demultiplexed per-flow outputs.]

For a general multiplexer we have:

Dmux = (σ1 + σ2) / (Cout − ρ1 − ρ2); Fmuxout ∼ (σ1 + σ2, ρ1 + ρ2)

FM3 ∼ (t(D1 + Dmux + DM), t)

FM4 ∼ ?
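The general multiplexer formula evaluates in a couple of lines; the helper name and the concrete numbers below are illustrative assumptions, not values from the case study:

```python
# Illustrative (sigma, rho) evaluation of the general multiplexer formula:
# Dmux = (s1 + s2) / (Cout - r1 - r2), Fmuxout ~ (s1 + s2, r1 + r2).
def mux(flow1, flow2, c_out):
    (s1, r1), (s2, r2) = flow1, flow2
    assert r1 + r2 < c_out, "the output channel must absorb both rates"
    d_mux = (s1 + s2) / (c_out - r1 - r2)  # worst-case multiplexer delay
    f_out = (s1 + s2, r1 + r2)             # merged output flow
    return d_mux, f_out

# F2 ~ (t*D1, t) and F7 ~ (Sbuffer + t*D3, t) with example numbers
# t=1, D1=2, D3=3, Sbuffer=8, Cout=4:
d_mux, f_m1 = mux((2.0, 1.0), (11.0, 1.0), 4.0)
print(d_mux, f_m1)  # 6.5 (13.0, 2.0)
```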
Quality of Service - 61

MPEG Encoding Case Study - contd

[Figure: regulators RM1, RM2, and RS inserted in front of M and S in the flow graph.]

RM1 ∼ (Sbuffer, t); RM2 ∼ (Sbuffer, t); RS ∼ (Sbuffer, t); Sbuffer is the size of the input buffer in S.

D(σ,ρ)-regulator = max(0, (σin − σ) / ρ)

B(σ,ρ)-regulator = max(0, σin − σ)

(σin is the burstiness of the flow entering the regulator.)

F6 ∼ (Sbuffer, t); C3 : (t, D3); F7 ∼ (Sbuffer + tD3, t)
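The regulator bounds translate directly into code; this sketch and its variable names (sigma_in for the burstiness of the incoming flow) are ours:

```python
# Backlog and delay of a (sigma, rho)-regulator fed by a flow of
# burstiness sigma_in and rate rho (illustrative sketch).
def regulator_backlog(sigma_in, sigma):
    return max(0.0, sigma_in - sigma)          # B of the regulator

def regulator_delay(sigma_in, sigma, rho):
    return max(0.0, (sigma_in - sigma) / rho)  # D of the regulator

# E.g. RM1 shapes FM3 ~ (t*(D1 + Dmux + DM), t) down to (Sbuffer, t);
# with example numbers t=1, D1=2, Dmux=1.5, DM=0.5, Sbuffer=3:
sigma_in = 1.0 * (2.0 + 1.5 + 0.5)
print(regulator_backlog(sigma_in, 3.0), regulator_delay(sigma_in, 3.0, 1.0))  # 1.0 1.0
```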

Quality of Service - 62

MPEG Encoding Case Study - Memory

[Figure: F2 and F7 multiplexed into the memory M ∼ (2t, DM), demultiplexed into FM3 and FM4.]

Dmux = (σ1 + σ2) / (Cout − ρ1 − ρ2) = (Sbuffer + t(D1 + D3)) / (Cout − 2t)

FM1 ∼ (Sbuffer + t(D1 + D3), 2t)

FM2 ∼ (Sbuffer + t(D1 + D3 + 2DM), 2t)

FM3 ∼ (t(D1 + Dmux + DM), t)

FM4 ∼ (Sbuffer + t(D3 + Dmux + DM), t)
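Chaining these formulas gives all four characterizations in a few lines; the numbers here are again illustrative assumptions, not values from the case study:

```python
# Numeric walk-through of the memory page: burstiness grows by rate*delay at
# every latency-rate element (mux, memory), and the demux splits per flow.
t, D1, D3, DM, Sbuf, Cout = 1.0, 2.0, 3.0, 0.5, 8.0, 4.0

d_mux = (Sbuf + t * (D1 + D3)) / (Cout - 2 * t)   # multiplexer delay
fm1 = (Sbuf + t * (D1 + D3), 2 * t)               # mux output
fm2 = (Sbuf + t * (D1 + D3 + 2 * DM), 2 * t)      # memory output
fm3 = (t * (D1 + d_mux + DM), t)                  # demuxed reply to F2
fm4 = (Sbuf + t * (D3 + d_mux + DM), t)           # demuxed reply to F7
print(d_mux, fm1, fm2, fm3, fm4)
```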

Quality of Service - 63

MPEG Encoding Case Study - contd

[Figure: flow graph with the regulators RM1 and RM2 in place.]

Backlog of the regulators:

BRM1 = max(0, t(D1 + Dmux + DM) − Sbuffer)

BRM2 = max(0, 128B + t(D3 + Dmux + DM) − Sbuffer)

Delay of the regulators:

DRM1 = BRM1 / t

DRM2 = BRM2 / t

Quality of Service - 64

MPEG Encoding Case Study - contd

[Figure: complete flow graph with S between the two memory accesses.]

The flow from the memory to S:

F3 ∼ (Sbuffer, t); C2 : (t, D2); F4 ∼ (Sbuffer + tD2, t)

A characterization of S and its output:

S : (t, Sbuffer/t); F5 ∼ (2 Sbuffer + tD2, t)

The flows between memory and V:

F8 ∼ (Sbuffer, t); C4 : (t, D4); F9 ∼ (Sbuffer + tD4, t)

Quality of Service - 65

MPEG Encoding Case Study - contd

[Figure: complete flow graph from T to V with all channels and regulators.]

End-to-end delay:

Dtotal = D1 + Dmux + DM + DRM1 + D2 + DS + DRS + D3 + Dmux + DM + DRM2 + D4

The flow at V: FTV ∼ (0 + tDtotal, t)

Quality of Service - 66

Modeling with Regulated Flows

Interconnect: model each channel by available bandwidth and maximum delay variation;
Model each node in the interconnect as an arbiter;
Model read requests and write acknowledges as separate flows;
Model synchronization as separate flows;
A simple generalization of (σ, ρ) flows is F ∼ min_i (σi, ρi), ρi > 0, meaning F(b) − F(a) ≤ min_i (σi + ρi(b − a));
Good analysis depends on good element models;

Quality of Service - 67

Network Calculus - Arrival Curves

[Figure: cumulative arrivals in bits; the increment over any interval of length b − a is bounded by α(b − a).]

Given a monotonically increasing function α, defined for t ≥ 0, α is an arrival curve for flow F if for all 0 ≤ a ≤ b: F(b) − F(a) ≤ α(b − a)

Quality of Service - 68

Network Calculus - Min-Plus Convolution

Given two monotonically increasing functions f and g, the min-plus convolution of f and g is the function

(f ⊗ g)(t) = inf_{0 ≤ s ≤ t} (f(t − s) + g(s))

If α is an arrival curve for F we have: F ≤ F ⊗ α, and F ∼ ᾱ, with the sub-additive closure ᾱ of α being the best bound that we can find based on information of α.
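On sampled curves the min-plus convolution is a one-liner; the discrete sketch below uses list indices as time steps (our simplification, not from the tutorial):

```python
# Discrete min-plus convolution: (f ⊗ g)[t] = min over 0<=s<=t of f[t-s] + g[s].
def min_plus_conv(f, g):
    n = min(len(f), len(g))
    return [min(f[t - s] + g[s] for s in range(t + 1)) for t in range(n)]

f = [0, 1, 2, 3, 4]   # f(t) = t
g = [0, 2, 4, 6, 8]   # g(t) = 2t
print(min_plus_conv(f, g))  # the slower curve wins: [0, 1, 2, 3, 4]
assert min_plus_conv(f, g) == min_plus_conv(g, f)  # commutativity
```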

Quality of Service - 69

Network Calculus - Service Curves

[Figure: input F(t) and output F*(t) of system S; F*(t) stays above (F ⊗ β)(t).]

Given a system S with an input flow F and an output flow F*, S offers the flow a service curve β if and only if β is a monotonically increasing function and F* ≥ F ⊗ β, which means that

F*(t) ≥ inf_{s ≤ t} (F(s) + β(t − s))
Quality of Service - 70

Network Calculus - Backlog Bound

[Figure: the backlog is the vertical deviation between F(t) and F*(t).]

Given a flow F constrained by arrival curve α and a system offering a service curve β, the backlog F(t) − F*(t) for all t satisfies

F(t) − F*(t) ≤ sup_{s ≥ 0} (α(s) − β(s))

Quality of Service - 71

Network Calculus - Delay Bound

[Figure: the delay is the horizontal deviation between F(t) and F*(t).]

Given a flow F constrained by arrival curve α and a system offering a service curve β, the delay d(t) at time t is d(t) = inf(τ ≥ 0 : F(t) ≤ F*(t + τ)). It satisfies

d(t) ≤ h(α, β) = sup_{t ≥ 0} (inf(τ ≥ 0 : α(t) ≤ β(t + τ)))
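Both bounds can be evaluated on a sampling grid; the sketch below uses an affine arrival curve and a rate-latency service curve with example parameters of our choosing:

```python
# Numeric sketch of the backlog and delay bounds for
# alpha(t) = sigma + rho*t (t > 0) and beta(t) = R*max(0, t - T).
sigma, rho, R, T = 2.0, 1.0, 3.0, 0.5
alpha = lambda t: 0.0 if t == 0 else sigma + rho * t
beta = lambda t: R * max(0.0, t - T)

ts = [i / 100 for i in range(501)]                  # sample t in [0, 5]
backlog = max(alpha(t) - beta(t) for t in ts)       # sup(alpha - beta)
delay = max(min(tau for tau in ts if alpha(t) <= beta(t + tau)) for t in ts)
print(backlog, round(delay, 2))  # closed forms: sigma + rho*T and T + sigma/R
```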

Quality of Service - 72

Network Calculus - Output Arrival Curve

[Figure: input F(t) and output F*(t) of system S.]

Given a flow F constrained by arrival curve α and a system offering a service curve β, the output flow F* is constrained by the arrival curve α* = α ⊘ β, where

(α ⊘ β)(t) = sup_{s ≥ 0} (α(t + s) − β(s))
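The deconvolution can likewise be approximated on a grid; the function name and parameter values below are example assumptions:

```python
# Discrete sketch of the min-plus deconvolution
# (alpha ⊘ beta)(t) = sup_{s>=0} alpha(t+s) - beta(s), evaluated on a grid.
def deconv(alpha, beta, t, s_max=10.0, step=0.01):
    return max(alpha(t + i * step) - beta(i * step)
               for i in range(int(s_max / step) + 1))

sigma, rho, R, T = 2.0, 1.0, 3.0, 0.5
alpha = lambda u: 0.0 if u == 0 else sigma + rho * u
beta = lambda u: R * max(0.0, u - T)

# Output arrival curve of a rate-latency node at t = 1:
print(deconv(alpha, beta, 1.0))  # sigma + rho*T + rho*1 = 3.5
```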

Quality of Service - 73

Network Calculus - Useful Functions

Peak rate function: λR(t) = R t

Rate-latency function: βR,T(t) = R [t − T]+

Affine function: γσ,ρ(t) = 0 for t = 0, σ + ρt for t > 0

Burst-delay function: δT(t) = 0 for t ≤ T, ∞ for t > T

Staircase function: vT,τ(t) = ⌈(t + τ)/T⌉

Step function: uT(t) = 0 for t ≤ T, 1 for t > T
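These functions transcribe directly into code; the sketch below uses ASCII names for the Greek symbols and math.inf for the unbounded branch of the burst-delay function:

```python
import math

peak_rate = lambda R: lambda t: R * t                              # lambda_R
rate_latency = lambda R, T: lambda t: R * max(0.0, t - T)          # beta_{R,T}
affine = lambda s, r: lambda t: 0.0 if t == 0 else s + r * t       # gamma_{sigma,rho}
burst_delay = lambda T: lambda t: 0.0 if t <= T else math.inf      # delta_T
staircase = lambda T, tau: lambda t: math.ceil((t + tau) / T)      # v_{T,tau}
step = lambda T: lambda t: 0.0 if t <= T else 1.0                  # u_T

print(rate_latency(2.0, 1.0)(3.0), affine(4.0, 0.5)(2.0), staircase(3.0, 1.0)(5.0))  # 4.0 5.0 2
```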


Quality of Service - 74

Network Calculus - Concatenation of Nodes

[Figure: flow F traverses S1 offering β1 and S2 offering β2; the concatenation S offers β1 ⊗ β2.]

Example: β1 = βR1,T1, β2 = βR2,T2

βR1,T1 ⊗ βR2,T2 = βmin(R1,R2), T1+T2

Useful properties:

f ⊗ g = g ⊗ f

(f ⊗ g) ⊗ h = f ⊗ (g ⊗ h)

(f + c) ⊗ g = (f ⊗ g) + c for any constant c ∈ ℝ
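The concatenation example can be spot-checked numerically (the rates, latencies, and sample points below are our example values):

```python
# Check that two rate-latency servers concatenate to beta_{min(R1,R2), T1+T2}:
# compare a discrete min-plus convolution against the closed form.
def conv(f, g, t, step=0.01):
    n = int(round(t / step))
    return min(f(t - i * step) + g(i * step) for i in range(n + 1))

beta = lambda R, T: (lambda t: R * max(0.0, t - T))
b1, b2 = beta(3.0, 0.5), beta(2.0, 1.0)
b12 = beta(min(3.0, 2.0), 0.5 + 1.0)

for t in [0.0, 1.0, 2.0, 4.0]:
    assert abs(conv(b1, b2, t) - b12(t)) < 1e-9
print("beta_{R1,T1} ⊗ beta_{R2,T2} = beta_{min(R1,R2),T1+T2} on all samples")
```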

Quality of Service - 75

Network Calculus - Pay Bursts Only Once

[Figure: rate-latency curves βR1,T1 and βR2,T2 and their concatenation with rate min(R1, R2) and latency T1 + T2.]

α = γσ,ρ

β1 = βR1,T1 = R1 max(0, t − T1)

β2 = βR2,T2 = R2 max(0, t − T2)

βR1,T1 ⊗ βR2,T2 = βmin(R1,R2), T1+T2 = min(R1, R2) max(0, t − (T1 + T2))

Summing the per-node delay bounds: D1 + D2 = σ/R1 + σ/R2 + ρT1/R2 + T1 + T2

Using the concatenated service curve: DS = σ/min(R1, R2) + T1 + T2
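A quick numeric comparison (example numbers, ours) shows why this is called "pay bursts only once": the end-to-end bound DS pays the burst σ only once, at the bottleneck rate, so it never exceeds the sum of the per-node bounds D1 + D2:

```python
# Per-node delay bounds vs. the end-to-end bound from the concatenated
# service curve, for an affine flow through two rate-latency servers.
sigma, rho = 4.0, 0.5
R1, T1 = 2.0, 0.3
R2, T2 = 1.0, 0.6

d1_plus_d2 = sigma / R1 + sigma / R2 + rho * T1 / R2 + T1 + T2
ds = sigma / min(R1, R2) + T1 + T2
print(round(d1_plus_d2, 2), round(ds, 2))  # 7.05 4.9
assert ds <= d1_plus_d2
```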
Summary - 76

Summary

Communication Performance: bandwidth, unloaded latency, loaded latency
Organizational Structure: NI, switch, link
Topologies: wire space and delay domination favors low-dimension topologies
Routing: deterministic vs. source-based vs. adaptive routing; deadlock
Quality of Service and flow regulation

Communication Performance In NoCs - 77

To Probe Further

Classic papers:

[Agarwal, 1991] Agarwal, A. (1991). Limit on interconnection performance. IEEE Transactions on Parallel and Distributed Systems, 4(6):613-624.

[Dally, 1990] Dally, W. J. (1990). Performance analysis of k-ary n-cube interconnection networks. IEEE Transactions on Computers, 39(6):775-785.

Text books:

[Duato et al., 1998] Duato, J., Yalamanchili, S., and Ni, L. (1998). Interconnection Networks - An Engineering Approach. Computer Society Press, Los Alamitos, California.

[Culler et al., 1999] Culler, D. E., Singh, J. P., and Gupta, A. (1999). Parallel Computer Architecture - A Hardware/Software Approach. Morgan Kaufmann Publishers.

[Dally and Towles, 2004] Dally, W. J. and Towles, B. (2004). Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers.

[DeMicheli and Benini, 2006] DeMicheli, G. and Benini, L. (2006). Networks on Chip. Morgan Kaufmann.

[Leighton, 1992] Leighton, F. T. (1992). Introduction to Parallel Algorithms and Architectures. Morgan Kaufmann, San Francisco.

[LeBoudec, 2001] Le Boudec, J.-Y. (2001). Network Calculus. Springer Verlag, LNCS 2050.
