Networks-on-Chip: From Implementations to Programming Paradigms

Ebook742 pages7 hours

Networks-on-Chip: From Implementations to Programming Paradigms

Name: Networks-on-Chip: From Implementations to Programming Paradigms
Author: Sheng Ma
ISBN: 9780128011782

By Sheng Ma, Libo Huang, Mingche Lai and Wei Shi

Rating: 0 out of 5 stars

()

Read preview

About this ebook

Networks-on-Chip: From Implementations to Programming Paradigms provides a thorough and bottom-up exploration of the whole NoC design space in a coherent and uniform fashion, from low-level router, buffer and topology implementations, to routing and flow control schemes, to co-optimizations of NoC and high-level programming paradigms.

This textbook is intended for an advanced course on computer architecture, suitable for graduate students or senior undergrads who want to specialize in the area of computer architecture and Networks-on-Chip. It is also intended for practitioners in the industry in the area of microprocessor design, especially the many-core processor design with a network-on-chip. Graduates can learn many practical and theoretical lessons from this course, and also can be motivated to delve further into the ideas and designs proposed in this book. Industrial engineers can refer to this book to make practical tradeoffs as well. Graduates and engineers who focus on off-chip network design can also refer to this book to achieve deadlock-free routing algorithm designs.

Provides thorough and insightful exploration of NoC design space. Description from low-level logic implementations to co-optimizations of high-level program paradigms and NoCs.
The coherent and uniform format offers readers a clear, quick and efficient exploration of NoC design space
Covers many novel and exciting research ideas, which encourage researchers to further delve into these topics.
Presents both engineering and theoretical contributions. The detailed description of the router, buffer and topology implementations, comparisons and analysis are of high engineering value.

Skip carousel

LanguageEnglish

PublisherElsevier Science

Release dateDec 4, 2014

ISBN9780128011782

Author

Sheng Ma

Sheng Ma received the B.S. and Ph.D. degrees in computer science and technology from the National University of Defense Technology (NUDT) in 2007 and 2012, respectively. He visited the University of Toronto from Sept. 2010 to Sept. 2012. He is currently an Assistant Professor of the College of Computer, NUDT. His research interests include on-chip networks, SIMD architectures and arithmetic unit designs.

Related authors

Skip carousel

Related to Networks-on-Chip

Related ebooks

Skip carousel

Heterogeneous Computing with OpenCL 2.0
Ebook
Heterogeneous Computing with OpenCL 2.0
byDavid R. Kaeli
Rating: 0 out of 5 stars
0 ratings
The Definitive Guide to the ARM Cortex-M3
Ebook
The Definitive Guide to the ARM Cortex-M3
byJoseph Yiu
Rating: 4 out of 5 stars
4/5
Open-Source Robotics and Process Control Cookbook: Designing and Building Robust, Dependable Real-time Systems
Ebook
Open-Source Robotics and Process Control Cookbook: Designing and Building Robust, Dependable Real-time Systems
byLewin Edwards
Rating: 3 out of 5 stars
3/5
Shared Memory Application Programming: Concepts and Strategies in Multicore Application Programming
Ebook
Shared Memory Application Programming: Concepts and Strategies in Multicore Application Programming
byVictor Alessandrini
Rating: 0 out of 5 stars
0 ratings
Principles and Practices of Interconnection Networks
Ebook
Principles and Practices of Interconnection Networks
byWilliam James Dally
Rating: 0 out of 5 stars
0 ratings
Real-Time Systems Development
Ebook
Real-Time Systems Development
byRob Williams
Rating: 0 out of 5 stars
0 ratings
High-Speed Analog-to-Digital Conversion
Ebook
High-Speed Analog-to-Digital Conversion
byMichael J. Demler
Rating: 0 out of 5 stars
0 ratings
Wireless Receiver Architectures and Design: Antennas, RF, Synthesizers, Mixed Signal, and Digital Signal Processing
Ebook
Wireless Receiver Architectures and Design: Antennas, RF, Synthesizers, Mixed Signal, and Digital Signal Processing
byTony J. Rouphael
Rating: 0 out of 5 stars
0 ratings
Power Converters with Digital Filter Feedback Control
Ebook
Power Converters with Digital Filter Feedback Control
byKeng C. Wu
Rating: 0 out of 5 stars
0 ratings
Foundations of Microprogramming: Architecture, Software, and Applications
Ebook
Foundations of Microprogramming: Architecture, Software, and Applications
byAshok K. Agrawala
Rating: 0 out of 5 stars
0 ratings
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Ebook
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
byScott Hauck
Rating: 0 out of 5 stars
0 ratings
Reliable Computer Systems: Design and Evaluatuion
Ebook
Reliable Computer Systems: Design and Evaluatuion
byDaniel Siewiorek
Rating: 5 out of 5 stars
5/5
Definitive Guide to Arm Cortex-M23 and Cortex-M33 Processors
Ebook
Definitive Guide to Arm Cortex-M23 and Cortex-M33 Processors
byJoseph Yiu
Rating: 5 out of 5 stars
5/5
Silicides for VLSI Applications
Ebook
Silicides for VLSI Applications
byShyam P. Murarka
Rating: 0 out of 5 stars
0 ratings
Parallel Computing
Ebook
Parallel Computing
byEduard L Lafferty
Rating: 0 out of 5 stars
0 ratings
Solid–State Devices and Applications
Ebook
Solid–State Devices and Applications
byRhys Lewis
Rating: 4 out of 5 stars
4/5
Practical Parallel Programming
Ebook
Practical Parallel Programming
byBarr E. Bauer
Rating: 0 out of 5 stars
0 ratings
USB Complete: The Developer's Guide
Ebook
USB Complete: The Developer's Guide
byJan Axelson
Rating: 4 out of 5 stars
4/5
Hardware/Firmware Interface Design: Best Practices for Improving Embedded Systems Development
Ebook
Hardware/Firmware Interface Design: Best Practices for Improving Embedded Systems Development
byGary Stringham
Rating: 5 out of 5 stars
5/5
Embedded System Design on a Shoestring: Achieving High Performance with a Limited Budget
Ebook
Embedded System Design on a Shoestring: Achieving High Performance with a Limited Budget
byLewin Edwards
Rating: 4 out of 5 stars
4/5
System on Chip Interfaces for Low Power Design
Ebook
System on Chip Interfaces for Low Power Design
bySanjeeb Mishra
Rating: 0 out of 5 stars
0 ratings
FPGA prototyping The Ultimate Step-By-Step Guide
Ebook
FPGA prototyping The Ultimate Step-By-Step Guide
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
RF Engineering for Wireless Networks: Hardware, Antennas, and Propagation
Ebook
RF Engineering for Wireless Networks: Hardware, Antennas, and Propagation
byDaniel M. Dobkin
Rating: 4 out of 5 stars
4/5
Introduction to Parallel Programming
Ebook
Introduction to Parallel Programming
bySteven Brawer
Rating: 0 out of 5 stars
0 ratings
Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition
Ebook
Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition
byJames Jeffers
Rating: 0 out of 5 stars
0 ratings
The Art of Designing Embedded Systems
Ebook
The Art of Designing Embedded Systems
byJack Ganssle
Rating: 0 out of 5 stars
0 ratings
PIC32 Microcontrollers and the Digilent Chipkit: Introductory to Advanced Projects
Ebook
PIC32 Microcontrollers and the Digilent Chipkit: Introductory to Advanced Projects
byDogan Ibrahim
Rating: 5 out of 5 stars
5/5
Modeling Embedded Systems and SoC's: Concurrency and Time in Models of Computation
Ebook
Modeling Embedded Systems and SoC's: Concurrency and Time in Models of Computation
byAxel Jantsch
Rating: 0 out of 5 stars
0 ratings
Modern Embedded Computing: Designing Connected, Pervasive, Media-Rich Systems
Ebook
Modern Embedded Computing: Designing Connected, Pervasive, Media-Rich Systems
byPeter Barry
Rating: 5 out of 5 stars
5/5
Electromagnetic Compatibility Engineering
Ebook
Electromagnetic Compatibility Engineering
byHenry W. Ott
Rating: 0 out of 5 stars
0 ratings

Systems Architecture For You

Skip carousel

Computer Science: A Concise Introduction
Ebook
Computer Science: A Concise Introduction
byIan Sinclair
Rating: 4 out of 5 stars
4/5
CompTIA Network+ CertMike: Prepare. Practice. Pass the Test! Get Certified!: Exam N10-008
Ebook
CompTIA Network+ CertMike: Prepare. Practice. Pass the Test! Get Certified!: Exam N10-008
byMike Chapple
Rating: 0 out of 5 stars
0 ratings
Wii U Architecture: Architecture of Consoles: A Practical Analysis, #21
Ebook
Wii U Architecture: Architecture of Consoles: A Practical Analysis, #21
byRodrigo Copetti
Rating: 0 out of 5 stars
0 ratings
Microsoft Certification: Complete step by step guide to pass all Microsoft Exams and get certifications real and unique practice tests included
Ebook
Microsoft Certification: Complete step by step guide to pass all Microsoft Exams and get certifications real and unique practice tests included
byDavid Mayer
Rating: 5 out of 5 stars
5/5
CompTIA ITF+ CertMike: Prepare. Practice. Pass the Test! Get Certified!: Exam FC0-U61
Ebook
CompTIA ITF+ CertMike: Prepare. Practice. Pass the Test! Get Certified!: Exam FC0-U61
byMike Chapple
Rating: 0 out of 5 stars
0 ratings
CompTIA A+ CertMike: Prepare. Practice. Pass the Test! Get Certified!: Core 1 Exam 220-1101
Ebook
CompTIA A+ CertMike: Prepare. Practice. Pass the Test! Get Certified!: Core 1 Exam 220-1101
byMike Chapple
Rating: 0 out of 5 stars
0 ratings
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
Ebook
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
byArthur T. Brooks
Rating: 0 out of 5 stars
0 ratings
Blockchain Basics: A Non-Technical Introduction in 25 Steps
Ebook
Blockchain Basics: A Non-Technical Introduction in 25 Steps
byDaniel Drescher
Rating: 5 out of 5 stars
5/5
Raspberry Pi Projects For Dummies
Ebook
Raspberry Pi Projects For Dummies
byMike Cook
Rating: 5 out of 5 stars
5/5
Wii Architecture: Architecture of Consoles: A Practical Analysis, #11
Ebook
Wii Architecture: Architecture of Consoles: A Practical Analysis, #11
byRodrigo Copetti
Rating: 0 out of 5 stars
0 ratings
Software Architecture with Python
Ebook
Software Architecture with Python
byAnand Balachandran Pillai
Rating: 0 out of 5 stars
0 ratings
Mastering Kubernetes
Ebook
Mastering Kubernetes
byGigi Sayfan
Rating: 5 out of 5 stars
5/5
A Practical Guide to SysML: The Systems Modeling Language
Ebook
A Practical Guide to SysML: The Systems Modeling Language
bySanford Friedenthal
Rating: 4 out of 5 stars
4/5
AutoCAD 2023 : Beginners And Intermediate user Guide
Ebook
AutoCAD 2023 : Beginners And Intermediate user Guide
byDaniel Smith
Rating: 0 out of 5 stars
0 ratings
Extending Docker
Ebook
Extending Docker
byMcKendrick Russ
Rating: 0 out of 5 stars
0 ratings
JavaScript Application Design: A Build First Approach
Ebook
JavaScript Application Design: A Build First Approach
byNicolas Bevacqua
Rating: 0 out of 5 stars
0 ratings
Professional Cloud Architect – Google Cloud Certification Guide: A handy guide to designing, developing, and managing enterprise-grade GCP cloud solutions
Ebook
Professional Cloud Architect – Google Cloud Certification Guide: A handy guide to designing, developing, and managing enterprise-grade GCP cloud solutions
byKonrad Cłapa
Rating: 0 out of 5 stars
0 ratings
The IT Support Handbook: A How-To Guide to Providing Effective Help and Support to IT Users
Ebook
The IT Support Handbook: A How-To Guide to Providing Effective Help and Support to IT Users
byMike Halsey
Rating: 0 out of 5 stars
0 ratings
Solution Architecture Foundations
Ebook
Solution Architecture Foundations
byMark Lovatt
Rating: 3 out of 5 stars
3/5
Microsoft Windows 7 Administrator's Reference: Upgrading, Deploying, Managing, and Securing Windows 7
Ebook
Microsoft Windows 7 Administrator's Reference: Upgrading, Deploying, Managing, and Securing Windows 7
byJorge Orchilles
Rating: 0 out of 5 stars
0 ratings
The Practice of Enterprise Architecture: A Modern Approach to Business and IT Alignment
Ebook
The Practice of Enterprise Architecture: A Modern Approach to Business and IT Alignment
bySvyatoslav Kotusev
Rating: 4 out of 5 stars
4/5
Collaborative Enterprise Architecture: Enriching EA with Lean, Agile, and Enterprise 2.0 practices
Ebook
Collaborative Enterprise Architecture: Enriching EA with Lean, Agile, and Enterprise 2.0 practices
byStefan Bente
Rating: 4 out of 5 stars
4/5
Computer System Organization: The B5700/B6700 Series
Ebook
Computer System Organization: The B5700/B6700 Series
byElliott I. Organick
Rating: 0 out of 5 stars
0 ratings
Arduino Projects For Dummies
Ebook
Arduino Projects For Dummies
byBrock Craft
Rating: 3 out of 5 stars
3/5
Internet of Things with ESP8266
Ebook
Internet of Things with ESP8266
bySchwartz Marco
Rating: 5 out of 5 stars
5/5
SysML in Action with Cameo Systems Modeler
Ebook
SysML in Action with Cameo Systems Modeler
byOlivier Casse
Rating: 5 out of 5 stars
5/5
Hardware/Firmware Interface Design: Best Practices for Improving Embedded Systems Development
Ebook
Hardware/Firmware Interface Design: Best Practices for Improving Embedded Systems Development
byGary Stringham
Rating: 5 out of 5 stars
5/5
Learn Git in a Month of Lunches
Ebook
Learn Git in a Month of Lunches
byRick Umali
Rating: 0 out of 5 stars
0 ratings
A Practical Guide for IoT Solution Architects
Ebook
A Practical Guide for IoT Solution Architects
byDr Mehmet Yildiz
Rating: 5 out of 5 stars
5/5
Systems Architecture Modeling with the Arcadia Method: A Practical Guide to Capella
Ebook
Systems Architecture Modeling with the Arcadia Method: A Practical Guide to Capella
byPascal Roques
Rating: 5 out of 5 stars
5/5

Related podcast episodes

Skip carousel

One Shot and Metric Learning - Quadruplet Loss (Machine Learning Dojo)
Podcast episode
One Shot and Metric Learning - Quadruplet Loss (Machine Learning Dojo)
byMachine Learning Street Talk (MLST)
0 ratings
0% found this document useful
41. Bob Nystrom
Podcast episode
41. Bob Nystrom
byIt's All Widgets! Flutter Podcast
0 ratings
0% found this document useful
[MINI] Long Short Term Memory: Thanks to our sponsor brilliant.org/dataskeptics A Long Short Term Memory (LSTM) is a neural unit, often used in Recurrent Neural Network (RNN) which attempts to provide the network the capacity to store information for longer periods of time. An...
Podcast episode
[MINI] Long Short Term Memory: Thanks to our sponsor brilliant.org/dataskeptics A Long Short Term Memory (LSTM) is a neural unit, often used in Recurrent Neural Network (RNN) which attempts to provide the network the capacity to store information for longer periods of time. An...
byData Skeptic
0 ratings
0% found this document useful
The Learning Curve: Can augmented reality help kids learn? As more learning goes online, education is shifting. Host Christina Warren discusses how new technology, enabled by the benefits of 5G, can make school more effective and fun.
Podcast episode
The Learning Curve: Can augmented reality help kids learn? As more learning goes online, education is shifting. Host Christina Warren discusses how new technology, enabled by the benefits of 5G, can make school more effective and fun.
byNetworked: The 5G Future
0 ratings
0% found this document useful
Living on the Edge (Computing)
Podcast episode
Living on the Edge (Computing)
byTechStuff
0 ratings
0% found this document useful
TSMC: It's time. We dive into the unbelievable history behind the quietest technology giant of them all — and as of recording the world's 9th (!) most valuable company — the Taiwan Semiconductor Manufacturing Company. This story checks every box in the Acqu
Podcast episode
TSMC: It's time. We dive into the unbelievable history behind the quietest technology giant of them all — and as of recording the world's 9th (!) most valuable company — the Taiwan Semiconductor Manufacturing Company. This story checks every box in the Acqu
byAcquired
100%
100% found this document useful
Episode 99. SHHH! It's a secret! (Storing API Keys / Passwords / tokens!): Ok, so is time to talk about something secretive! Like API Passwords, Auth tokens, or keys... these are things that we want to have as a Secret within our microservice. And yeah, adding them into your source code is a big no-no Here we cover the dos...
Podcast episode
Episode 99. SHHH! It's a secret! (Storing API Keys / Passwords / tokens!): Ok, so is time to talk about something secretive! Like API Passwords, Auth tokens, or keys... these are things that we want to have as a Secret within our microservice. And yeah, adding them into your source code is a big no-no Here we cover the dos...
byJava Pub House
0 ratings
0% found this document useful
Hacker-Proof Code Confirmed: Computer scientists can prove certain programs to be error-free with the same certainty that mathematicians prove theorems.
Podcast episode
Hacker-Proof Code Confirmed: Computer scientists can prove certain programs to be error-free with the same certainty that mathematicians prove theorems.
byQuanta Science Podcast
0 ratings
0% found this document useful
005 Linear Regression: Introduction to the first machine-learning algorithm, the 'hello world' of supervised learning - Linear Regression ocdevel.com/mlg/5 for notes and resources
Podcast episode
005 Linear Regression: Introduction to the first machine-learning algorithm, the 'hello world' of supervised learning - Linear Regression ocdevel.com/mlg/5 for notes and resources
byMachine Learning Guide
0 ratings
0% found this document useful
Engineering interview tips & tricks: with Emma Draper & Jonas
Podcast episode
Engineering interview tips & tricks: with Emma Draper & Jonas
byGo Time: Golang, Software Engineering
0 ratings
0% found this document useful
Jack Fitzsimons | Evil Models: Hiding Malware in Neural Networks: Jack Fitzsimons | Evil Models: Hiding Malware in Neural Networks Did you know that it's possible to hide malware in neural networks? Actually, you can hide malware in many statistical models. This is the subject of two recently-published papers (aptly ti...
Podcast episode
Jack Fitzsimons | Evil Models: Hiding Malware in Neural Networks: Jack Fitzsimons | Evil Models: Hiding Malware in Neural Networks Did you know that it's possible to hide malware in neural networks? Actually, you can hide malware in many statistical models. This is the subject of two recently-published papers (aptly ti...
byData & Science with Glen Wright Colopy
0 ratings
0% found this document useful
Run your home on a Raspberry Pi: with Mike Riley, author of Portable Python Projects
Podcast episode
Run your home on a Raspberry Pi: with Mike Riley, author of Portable Python Projects
byThe Changelog: Software Development, Open Source
0 ratings
0% found this document useful
What is distributed computing?: Sometimes using a single computer just won't cut it, and buying time on a supercomputer can be prohibitively expensive. So what do you do next? Tune in and learn more about distributed computing in this podcast.
Podcast episode
What is distributed computing?: Sometimes using a single computer just won't cut it, and buying time on a supercomputer can be prohibitively expensive. So what do you do next? Tune in and learn more about distributed computing in this podcast.
byTechStuff
100%
100% found this document useful
#51 Francois Chollet - Intelligence and Generalisation
Podcast episode
#51 Francois Chollet - Intelligence and Generalisation
byMachine Learning Street Talk (MLST)
0 ratings
0% found this document useful
Algorithms, Algorithmic Discrimination, and Autonomous Vehicles with Caleb Watney: Algorithms, Algorithmic Discrimination, and Autonomous Vehicles with Caleb Watney Today's guest is Caleb Watney of the R Street Institute. In our conversation, we discuss algorithms, particularly with respect to their role in judicial decision making....
Podcast episode
Algorithms, Algorithmic Discrimination, and Autonomous Vehicles with Caleb Watney: Algorithms, Algorithmic Discrimination, and Autonomous Vehicles with Caleb Watney Today's guest is Caleb Watney of the R Street Institute. In our conversation, we discuss algorithms, particularly with respect to their role in judicial decision making....
byEconomics Detective Radio
0 ratings
0% found this document useful
Morgan Senkal: Using Epics to Improve Code Quality Within Sprints: Robby speaks with Morgan Senkal, Software Architect at Metal Toad. Morgan recalls a challenging 15-year-old legacy project that was reminiscent of a Stephen King story and explains what to think about when considering a software rewrite. Morgan and Robby keep a running analogy of technical debt and automotive repairs.
Podcast episode
Morgan Senkal: Using Epics to Improve Code Quality Within Sprints: Robby speaks with Morgan Senkal, Software Architect at Metal Toad. Morgan recalls a challenging 15-year-old legacy project that was reminiscent of a Stephen King story and explains what to think about when considering a software rewrite. Morgan and Robby keep a running analogy of technical debt and automotive repairs.
byMaintainable
0 ratings
0% found this document useful
Ohm's Law
Podcast episode
Ohm's Law
byAmogha. A. K. Podcast: Electrical And Electronics Engineering
0 ratings
0% found this document useful
Episode 161: Trapped as a QA engineer and trapped as a generalist
Podcast episode
Episode 161: Trapped as a QA engineer and trapped as a generalist
bySoft Skills Engineering
0 ratings
0% found this document useful
Edge Databases with Glauber Costa: Picture a user interacting with a web app on their phone. When they tap the screen the app triggers communication with a server, which in turn communicates with a database. This process then happens in reverse to eventually update what the user sees on...
Podcast episode
Edge Databases with Glauber Costa: Picture a user interacting with a web app on their phone. When they tap the screen the app triggers communication with a server, which in turn communicates with a database. This process then happens in reverse to eventually update what the user sees on...
byCloud Engineering Archives - Software Engineering Daily
0 ratings
0% found this document useful
2: Pytest vs Unittest vs Nose: Choosing a test framework
Podcast episode
2: Pytest vs Unittest vs Nose: Choosing a test framework
byTest and Code
0 ratings
0% found this document useful
Recurrent Neural Nets: This week, we're doing a crash course in recurren…
Podcast episode
Recurrent Neural Nets: This week, we're doing a crash course in recurren…
byLinear Digressions
0 ratings
0% found this document useful
WebAssembly and wasmCloud
Podcast episode
WebAssembly and wasmCloud
byThe Cloudcast
0 ratings
0% found this document useful
018 Natural Language Processing 1: Introduction to Natural Language Processing (NLP) topics. ocdevel.com/mlg/18 for notes and resources
Podcast episode
018 Natural Language Processing 1: Introduction to Natural Language Processing (NLP) topics. ocdevel.com/mlg/18 for notes and resources
byMachine Learning Guide
0 ratings
0% found this document useful
Episode 1: Too Much Choice | LU1: Does the Linux community lean on the age old excuse of choice, to brush of the real limitations of desktop Linux environments? We debate that, and then discuss the growing reasons to roll your own email server.
Podcast episode
Episode 1: Too Much Choice | LU1: Does the Linux community lean on the age old excuse of choice, to brush of the real limitations of desktop Linux environments? We debate that, and then discuss the growing reasons to roll your own email server.
byLINUX Unplugged
0 ratings
0% found this document useful
gRPC & protocol buffers: with Askhay Shah
Podcast episode
gRPC & protocol buffers: with Askhay Shah
byGo Time: Golang, Software Engineering
0 ratings
0% found this document useful
S19:E2 - How to learn Swift and get into iOS development (Marc Aupont): You don't necessarily need expensive Apple devices to start learning Swift
Podcast episode
S19:E2 - How to learn Swift and get into iOS development (Marc Aupont): You don't necessarily need expensive Apple devices to start learning Swift
byCodeNewbie
0 ratings
0% found this document useful
Taking A Tour Of PostgreSQL with Jonathan Katz - Episode 42: A Whirlwind Tour Of The PostgreSQL Database (Interview)
Podcast episode
Taking A Tour Of PostgreSQL with Jonathan Katz - Episode 42: A Whirlwind Tour Of The PostgreSQL Database (Interview)
byData Engineering Podcast
100%
100% found this document useful
Ep. 34 - d'Oh My Zsh: In this episode, Oh My Zsh founder Robby Russell tells the story of how he unexpectedly launched one of the most popular zsh configuration frameworks out there. He shares his process, some mean tweets, and his advice for people starting open source...
Podcast episode
Ep. 34 - d'Oh My Zsh: In this episode, Oh My Zsh founder Robby Russell tells the story of how he unexpectedly launched one of the most popular zsh configuration frameworks out there. He shares his process, some mean tweets, and his advice for people starting open source...
byfreeCodeCamp Podcast
0 ratings
0% found this document useful
#2 How Data Science is Impacting Telecommunications Networks: Chris Volinsky, AT&T Labs' Assistant Vice President for Big Data Research and a member of the team that won the $1M Netflix Prize, an open competition for improving Netflix' online recommendation system, speaks with Hugo. We'll be discussing the role d...
Podcast episode
#2 How Data Science is Impacting Telecommunications Networks: Chris Volinsky, AT&T Labs' Assistant Vice President for Big Data Research and a member of the team that won the $1M Netflix Prize, an open competition for improving Netflix' online recommendation system, speaks with Hugo. We'll be discussing the role d...
byDataFramed
0 ratings
0% found this document useful
How Google Tracks Hackers: Tracking hacking groups has become a booming business. Dozens of so-called “threat intelligence” companies keep tabs on them and sell subscriptions to feeds where they provide customers with up to date information on what the most advanced cyber crimin...
Podcast episode
How Google Tracks Hackers: Tracking hacking groups has become a booming business. Dozens of so-called “threat intelligence” companies keep tabs on them and sell subscriptions to feeds where they provide customers with up to date information on what the most advanced cyber crimin...
byCYBER
0 ratings
0% found this document useful

Skip carousel

Silq Is An Easier Quantum Programming Language
Futurity
Article
Silq Is An Easier Quantum Programming Language
Jun 22, 2020
3 min read
The Virtual Garage
Racecar Engineering
Article
The Virtual Garage
Aug 6, 2021
11 min read
Rise Of The Robots
Linux Format
Article
Rise Of The Robots
Jan 12, 2021
7 min read
» Stochastic Algorithms
Linux Format
Article
» Stochastic Algorithms
Dec 14, 2021
If you’re up for some relatively maths-heavy computer-science reading (and who isn’t?), then consider looking into stochastic algorithms. Sometimes lumped together with machine-learning, stochastic algorithms is a loosely defined category that you co
1 min read
Quantum Entanglement Could Take GPS To The Next Level
Futurity
Article
Quantum Entanglement Could Take GPS To The Next Level
Apr 20, 2020
3 min read
Raspberry Pi 4 8GB
Linux Format
Article
Raspberry Pi 4 8GB
Jun 2, 2020
2 min read
Understanding CPUs
PC Powerplay
Article
Understanding CPUs
Sep 2, 2019
10 min read
Control Real-world Hardware On Your PC
Linux Format
Article
Control Real-world Hardware On Your PC
Mar 9, 2021
10 min read
“LoRa Is Able To Work Below The Noise Floor–this Is Completely At Odds With What You’ll Have Been Taught”
PC Pro Magazine
Article
“LoRa Is Able To Work Below The Noise Floor–this Is Completely At Odds With What You’ll Have Been Taught”
Feb 11, 2021
9 min read
Math’s Notes
CQ Amateur Radio
Article
Math’s Notes
Aug 1, 2020
4 min read
Computer Scientists Discover Limits of Major Research Algorithm
Quanta
Article
Computer Scientists Discover Limits of Major Research Algorithm
Aug 17, 2021
1 min read
We Need an FDA For Algorithms
Nautilus
Article
We Need an FDA For Algorithms
Nov 1, 2018
In the introduction to her new book, Hannah Fry points out something interesting about the phrase “Hello World.” It’s never been quite clear, she says, whether the phrase—which is frequently the entire output of a student’s first computer program—is
10 min read
Custom Embedded Linux Images
Linux Format
Article
Custom Embedded Linux Images
Jun 4, 2019
The Yocto Project (Yocto) www.yoctoproject.org is a system that uses the Linux kernel and packages contributed from the OpenEmbedded software team. The Yocto team points out that its product is not a Linux distribution, but instead builds custom dist
8 min read
Design Your Own Microprocessor
Linux Format
Article
Design Your Own Microprocessor
Jan 14, 2020
15 min read
Run Windows 11 on a Raspberry Pi 4
Maximum PC
Article
Run Windows 11 on a Raspberry Pi 4
Feb 1, 2022
5 min read
Gordo’s Short Circuits
CQ Amateur Radio
Article
Gordo’s Short Circuits
Oct 1, 2022
4 min read
Live-plotting Data
Linux Format
Article
Live-plotting Data
Jul 30, 2019
7 min read
Kernel Internals
Linux Format
Article
Kernel Internals
Aug 24, 2021
4 min read
Automakers Face A Threat To Ev Sales: Slow Charging Times
AppleMagazine
Article
Automakers Face A Threat To Ev Sales: Slow Charging Times
Jun 11, 2021
5 min read
Programmers: Stop Calling Yourselves Engineers
The Atlantic
Article
Programmers: Stop Calling Yourselves Engineers
Nov 5, 2015
10 min read
Superspeed Your Network For Free… (or Nearly)
PC Pro Magazine
Article
Superspeed Your Network For Free… (or Nearly)
Jan 6, 2022
A slow network hampers your productivity, and it’s also frankly embarrassing – it sends a message to your staff, and to any external parties who may access it. Yet it’s an area where major improvements can frequently be made with some tweaks to your
8 min read
Quantum Entanglement
Techfastly
Article
Quantum Entanglement
Oct 1, 2021
4 min read
Hello, World!
PC Gamer
Article
Hello, World!
Oct 17, 2019
3 min read
Technical The Power Of Data
Yachting Monthly
Article
Technical The Power Of Data
Jun 18, 2020
4 min read
Intel Charts Course To Trillion-transistor Chips
APC
Article
Intel Charts Course To Trillion-transistor Chips
Jan 23, 2023
2 min read
Here’s A Feasible Way For Our Devices To Send Data With Light
Futurity
Article
Here’s A Feasible Way For Our Devices To Send Data With Light
Apr 25, 2018
Researchers have developed a method to fabricate silicon chips that can communicate with light and are no more expensive than current chip technology. The new microchip technology capable of optically transferring data could solve a severe bottleneck
3 min read
Editor’s Note
Techfastly
Article
Editor’s Note
Apr 1, 2022
Dear Readers, eBPF is a ground-breaking technology derived from the Linux kernel that allows sandboxed programmes to operate within an operating system kernel. It stands for Extended Berkeley Packet Filter (eBPF). eBPF was first published in diminish
1 min read
Why The Future Needs Optical Data Centres
PC Pro Magazine
Article
Why The Future Needs Optical Data Centres
Sep 10, 2020
9 min read
Declutter Your Network
PC Pro Magazine
Article
Declutter Your Network
Apr 4, 2024
9 min read
Strategic Command
Racecar Engineering
Article
Strategic Command
Nov 4, 2022
9 min read

Related categories

Skip carousel

Reviews for Networks-on-Chip

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Networks-on-Chip - Sheng Ma

techniques.

Part I

Prologue

Chapter 1

Introduction

Abstract

The continual advancement of semiconductor technology and unconquerable obstacles for designing efficient single-core processors together are rapidly driving computer architecture to enter the many-core era. Mitigating development challenges for many-core processors needs a communication-centric cross-layer optimization method. The traditional bus and crossbar communication structures have several shortcomings, including poor scalability, low bandwidth, large latency, and high power consumption. To address these limitations, the network-on-chip (NoC) introduces a packet-switched fabric for on-chip communication, and it becomes the de facto many-core interconnection mechanism. The baseline NoC design exploration mainly consists of the design of the network topology, routing algorithm, flow control mechanism, and router microarchitecture. This chapter reviews the significant research progress in all these design aspects. The research progress mainly focuses on optimizing the pure NoC-layer performance, including the zero-load latency and saturation throughput. We also discuss the NoC design trends of several commercial or prototype processors. The NoCs of these real processors cover a significant design space, since their functionalities and requirements are rich and different. Finally, we provide an overview of the book content.

Keywords

Many-Core Era

Communication-Centric Cross-Layer Optimization

Networks-on-Chip

Chapter outline

1.1 The Dawn of the Many-Core Era 3

1.2 Communication-Centric Cross-Layer Optimizations 5

1.3 A Baseline Design Space Exploration of NoCs 7

1.3.1 Topology 8

1.3.2 Routing Algorithm 9

1.3.3 Flow Control 11

1.3.4 Router Microarchitecture 13

1.3.5 Performance Metric 16

1.4 Review of NoC Research 17

1.4.1 Research on Topologies 17

1.4.2 Research on Unicast Routing 18

1.4.3 Research on Supporting Collective Communications 19

1.4.4 Research on Flow Control 20

1.4.5 Research on Router Microarchitecture 22

1.5 Trends of Real Processors 23

1.5.1 The MIT Raw Processor 23

1.5.2 The Tilera TILE64 Processor 24

1.5.3 The Sony/Toshiba/IBM Cell Processor 26

1.5.4 The U.T. Austin TRIPS Processor 28

1.5.5 The Intel Teraflops Processor 29

1.5.6 The Intel SCC Processor 30

1.5.7 The Intel Larrabee Processor 32

1.5.8 The Intel Knights Corner Processor 34

1.5.9 Summary of Real Processors 36

1.6 Overview of the Book 38

References 40

1.1 The dawn of the many-core era

The development of the semiconductor industry has followed the self-fulfilling Moore's law [108] in the past half century: to maintain a competitive advantage for products, the number of transistors per unit area on integrated circuits doubles every 2 years. Currently, a single chip is able to integrate several billion transistors [114, 128], and Moore's law is expected to continue to be valid in the next decade [131]. The steadily increasing number of transistors, on the one hand, offers the computer architecture community enormous opportunities to design and discover innovative techniques to scale the processor's performance [54]. On the other hand, the huge number of transistors exacerbates several issues, such as power consumption [42, 111], resource utilization [38], and reliability problems [14], which bring about severe design challenges for computer architects.

Until the beginning of this century, the primary performance scaling methods focused on improving the sequential computation ability for single-core processors [110]. Since the birth of the first processor, the processor's sequential performance-price ratio has been improved by over 10 billion times [117]. Some of the most efficient performance enhancement techniques include leveraging deep pipelines to boost the processor frequency [56, 60], utilizing sophisticated superscalar techniques to squeeze the instruction-level parallelism [73, 83], and deploying large on-chip caches to exploit the temporal and spatial locality [70, 121]. Yet, since the switching power is proportional to the processor frequency, sophisticated superscalar techniques involve complex logic, and large caches configure huge memory arrays, these traditional performance-scaling techniques induce a sharp increase in processor complexity [3] and also power consumption [134]. In addition, there are diminishing performance returns as the already exploited instruction-level parallelism and temporal/spatial locality have nearly reached the intrinsic limitations of sequential programs [120, 143]. The unconquerable obstacles, including the drastically rising power consumption and diminishing performance gains for single-core processors, have triggered a profound revolution in computer architecture.

To efficiently reap the benefits of billions of transistors, computer architects are actively pursuing the multicore design to maintain sustained performance growth [50]. Instead of an increase in frequency or the use of sophisticated superscalar techniques to improve the sequential performance, the multicore designs scale the computation throughput by packing multiple cores into a single chip. Different threads of a single application or different concurrent applications can be assigned to the multiple cores to efficiently exploit the thread-level and task-level parallelism respectively. This can achieve significantly higher performance than using a single-core processor. Also, multicore processors reduce engineering efforts for the producers, since different amounts of the same core can be stamped down to form a family of processors. Since Intel's cancellation of the single-core processors Tejas and Jayhawk in 2004 and its switching to developing dual-core processors to compete with AMD, the community has been rapidly embracing the multicore design method [45].

Generally speaking, when the processor integrates eight or fewer cores, it is called a multicore processor. A processor with more cores is called a many-core processor. Current processors have already integrated several tens or hundreds of cores, and a new version of Moore's law uses the core count as the exponentially increasing parameter [91, 135]. Intel and IBM announced their own 80-core and 75-core prototype processors, the Teraflops [141] and Cyclops-64 [29], in 2006. Tilera released the 64-core TILE64 [11] and the 100-core TILE-Gx100 [125] processors in 2007 and 2009 respectively. In 2008, ClearSpeed launched the 192-core CSX700 processor [105]. In 2011, Intel announced its 61-core Knights Corner coprocessor [66], which is largely and successfully deployed in the currently most powerful supercomputer, the TianHe-2 [138]. Intel plans to release its next-generation acceleration co-processor, the 72-core Knights Landing co-processor, in 2015 [65]. Other typical many-core processors include Intel's Single-chip Cloud Computer (SCC) [58], Intel's Larrabee [130], and AMD's and NVIDIA's general purpose computing on graphics processing units (GPGPUs) [41, 90, 114, 146]. The many-core processors are now extensively utilized in several computation fields, ranging from scientific supercomputing to desktop graphic applications; the computer architecture and its associated computation techniques are entering the many-core era.

1.2 Communication-centric cross-layer optimizations

Although there have already been great breakthroughs in the design of many-core processors in the theoretical and practical fields, there are still many critical problems to solve. Unlike in single-core processors, the large number of cores increases the design and deployment difficulties for many-core processors; the design of efficient many-core architecture faces several challenges ranging from high-level parallel programming paradigms to low-level logic implementations [2, 7]. These challenges are also opportunities. Whether they can be successfully and efficiently solved may largely determine the future development for the entire computer architecture community. Figure 1.1 shows the three key challenges for the design of a many-core processor.

Figure 1.1 Key challenges for the design of a many-core processor.

The first challenge lies in the parallel programming paradigm layer [7, 31]. For many years, only a small number of researchers used parallel programming to conduct large-scale scientific computations. Nowadays, the widely available multicore and many-core processors make parallel programming essential even for desktop users [103]. The programming paradigm acts as a bridge to connect the parallel applications with the parallel hardware; it is one of the most significant challenges for the development of many-core processors [7, 31]. Application developers desire an opaque paradigm which hides the underlying architecture to ease programming efforts and improve application portability. Architecture designers hope for a visible paradigm to utilize the hardware features for high performance. An efficient parallel programming paradigm should be an excellent tradeoff between these two requirements.

The second challenge is the interconnection layer. The traditional bus and crossbar structures have several shortcomings, including poor scalability, low bandwidth, large latency, and high power consumption. To address these limitations, the network-on-chip (NoC) introduces a packet-switched fabric for the on-chip communication [25], and it becomes the de facto many-core interconnection mechanism. Although significant progress has been made in NoC research [102, 115], most works have focused on the optimizations of pure network-layer performance, such as the zero-load latency and saturation throughput. These works do not consider sufficiently the upper programming paradigms and the lower logic implementations. Indeed, the cross-layer optimization is regarded as an efficient way to extend Moore's law for the whole information industry [2]. The future NoC design should meet the requirements of both upper programming paradigms and lower logic implementations.

The third challenge appears in the logic implementation layer. Although multicore and many-core processors temporarily mitigate the problem of the sharply rising power consumption, the computer architecture will again face the power consumption challenge with the steadily increasing transistor count driven by Moore's law. The evaluation results from Esmaeilzadeh et al. [38] show that the power limitation will force approximately 50% of the transistors to be off for processors based on 8nm technology in 2018, and thus the emergence of dark silicon. The dark silicon phenomenon strongly calls for innovative low-power logic designs and implementations for many-core processors.

The aforementioned three key challenges directly determine whether many-core processors can be successfully developed and widely used. Among them, the design of an efficient interconnection layer is one of the most critical challenges, because of the following issues. First, the many-core processor design has already evolved from the computation-centric method into the communication-centric method [12, 53]. With the abundant computation resources, the efficiency of the interconnection layers largely and heavily determines the performance of the many-core processor. Second, and more importantly, mitigating the challenges for the programming paradigm layer and the logic implementation layer requires optimizing the design of the interconnection layer. The opaque programming paradigm requires the interconnection layer to intelligently manage the application traffic and hide the communication details. The visible programming paradigm requires the interconnection layer to leverage the hardware features for low communication latencies and high network throughput. In addition, the interconnection layers induce significant power consumption. In the Intel SCC [59], Sun Niagara [81], Intel Teraflops [57] and MIT Raw [137] processors, the interconnection layer's power consumption accounts for 10%, 17%, 28%, and 36% of the processor's overall power consumption respectively. The design of low-power processors must optimize the power consumption for the interconnection layer.

On the basis of the communication-centric cross-layer optimization method, this book explores the NoC design space in a coherent and uniform fashion, from the low-level logic implementations of the router, buffer, and topology, to network-level routing and flow control schemes, to the co-optimizations of the NoC and high-level programming paradigms. In Section 1.3, we first conduct a baseline design space exploration for NoCs, and then in Section 1.4 we review the current NoC research status. Section 1.5 summarizes NoC design trends for several real processors, including academia prototypes and commercial products. Section 1.6 briefly introduces the main content of each chapter and gives an overview of this book.

1.3 A baseline design space exploration of NoCs

Figure 1.2 illustrates a many-core platform with an NoC interconnection layer. The 16 processing cores are organized in a 4 × 4 mesh network. Each network node includes a processing node and a network router. The processing node may be replaced by other hardware units, such as special accelerators or memory controllers. The network routers and the associated links form the NoC. The router is composed of the buffer, crossbar, allocator, and routing unit. The baseline NoC design exploration mainly consists of the design of the network topology, routing algorithm, flow control mechanism, and router microarchitecture [26, 35]. In addition, we also discuss the baseline performance metrics for NoC evaluations.

Figure 1.2 A many-core platform with an NoC.

1.3.1 Topology

The topology is the first fundamental aspect of NoC design, and it has a profound effect on the overall network cost and performance. The topology determines the physical layout and connections between nodes and channels. Also, the message traverse hops and each hop's channel length depend on the topology. Thus, the topology significantly influences the latency and power consumption. Furthermore, since the topology determines the number of alternative paths between nodes, it affects the network traffic distribution, and hence the network bandwidth and performance achieved.

Since most current integrated circuits use the 2D integrated technology, an NoC topology must match well onto a 2D circuit surface. This requirement excludes several excellent topologies proposed for high-performance parallel computers; the hypercube is one such example. Therefore, NoCs generally apply regular and simple topologies. Figure 1.3 shows the structure of three typical NoC topologies: a ring, a 2D mesh, and a 2D torus.

Figure 1.3 Three typical NoC topologies. (a) Ring. (b) 2D mesh. (c) 2D torus.

The ring topology is widely used with a small number of network nodes, including the six-node Ivy Bridge [28] and the 12-node Cell [71] processors. The ring network has a very simple structure, and it is easy to implement the routing and flow control, or even apply a centralized control mechanism [28, 71]. Also, since Intel has much experience in implementing the cache coherence protocols on the ring networks, it prefers to utilize the ring topology, even in the current 61-core Knights Corner processor [66]. Yet, owing to the high average hop count, the scalability of a ring network is limited. Multiple rings may mitigate this limitation. For example, to improve the scalability, Intel's Knights Corner processor leverages ten rings, which is about two times more rings than the eight-core Nehalem-EX processor [22, 30, 116]. Another method to improve the scalability is to use 2D mesh or torus networks.

The 2D mesh is quite suitable for the wire routing in a multimetal layer CMOS technology, and it is also very easy to achieve deadlock freedom. Moreover, its scalability is better than that of the ring. These factors make the 2D mesh topology widely used in several academia and industrial chips, including the U.T. Austin TRIPS [48], Intel Teraflops [140], and Tilera TILE64 [145] chips. Also, the 2D mesh is the most widely assumed topology for the research community. A limitation of this topology is that its center portion is easily congested and may become a bottleneck with increasing core counts. Designing asymmetric topologies, which assign more wiring resources to the center portion, may be an appropriate way to address the limitation [107].

The node symmetry of a torus network helps to balance the network utilization. And the torus network supports a much higher bisection bandwidth than the mesh network. Thus, the 2D or higher-dimensional torus topology is widely used in the off-chip networks for several supercomputers, including the Cray T3E [129], Fujitsu K Computer [6], and IBM Blue Gene series [1, 16]. Also, the torus's wraparound links convert plentiful on-chip wires into bandwidth, and reduce the hop count and latency. Yet, the torus needs additional effort to avoid deadlock. Utilizing two virtual channels (VCs) for the dateline scheme [26] and utilizing bubble flow control (BFC) [18, 19, 100, 123] are two main deadlock avoidance methods for torus networks.

1.3.2 Routing algorithm

Once the NoC topology has been determined, the routing algorithm calculates the traverse paths for packets from the source nodes to the destination nodes. This section leverages the 2D mesh as the platform to introduce routing algorithms.

There are several classification criteria for routing algorithms. As shown in Figure 1.4, according to the traverse path length, the routing algorithms can be classified as minimal routing or nonminimal routing. The paths of nonminimal routing may be outside the minimal quadrant defined by the source node and the destination node. The Valiant routing algorithm is a typical nonminimal routing procedure [139]. It firstly routes the packet to a random intermediate destination (the inter. dest node shown in Figure 1.4a), and then routes the packet from this intermediate destination to the final destination.

Figure 1.4 Nonminimal routing and minimal routing. (a) Nonminimal routing. (b) Minimal routing.

In contrast, the packet traverse path of minimal routing is always inside the minimal quadrant defined by the source node and the destination node. Figure 1.4b shows two minimal routing paths for one source and destination node pair. Since minimal routing's packet traverse hops are fewer than those of nonminimal routing, its power consumption is generally lower. Yet, nonminimal routing is more appropriate to achieve global load balance and fault tolerance.

The routing algorithms can be divided into deterministic routing and nondeterministic routing on the basis of the path count provided. Deterministic routing offers only one path for each source and destination node pair. Dimensional order routing (DOR) is a typical deterministic routing algorithm. As shown in Figure 1.5a, the XY DOR offers only one path between nodes (3,0) and (1,2); the packet is firstly forwarded along the X direction to the destination's X position, and then it is forwarded along the Y direction to the destination node.

Figure 1.5 DOR, O1Turn routing, and adaptive routing. (a) XY DOR. (b) O1Turn routing. (c) Adaptive routing.

In contrast, nondeterministic routing may offer more than one path. According to whether it considers the network status, nondeterministic routing can be classified as oblivious routing or adaptive routing. Oblivious routing does not consider the network status when calculating the routing paths. As shown in Figure 1.5b, O1Turn routing is an oblivious routing algorithm [132]. It offers two paths for each source and destination pair: one is the XY DOR path, and the other one is the YX DOR path. The injected packet randomly chooses one of them. Adaptive routing considers the network status when calculating the routing paths. As shown in Figure 1.5c, if the algorithm detects there is congestion between nodes (3,1) and (3,2), adaptive routing tries to avoid the congestion by forwarding the packet to node (2,1) rather than node (3,2). Adaptive routing has some abilities to avoid network congestion; thus, it generally supports higher saturation throughput than deterministic routing.

One critical issue of the routing algorithm design is to achieve deadlock freedom. Deadlock freedom requires that there is no cyclic dependency among the network resources, typically the buffers and channels [24, 32]. Some proposals eliminate the cyclic resource dependency by forbidding certain turns for packet routing [20, 43, 46]. These routing algorithms are generally classified as partially adaptive routing since they do not allow the packets to utilize all the paths between the source and the destination. Other proposals [32, 33, 97, 99], named fully adaptive routing, allow the packet to utilize all minimal paths between the source and the destination. They achieve deadlock freedom by utilizing the VCs [24] to form acyclic dependency graphs for routing subfunctions, which act as the backup deadlock-free guarantee.

1.3.3 Flow control

Flow control allocates the network resources, such as the channel bandwidth, buffer capacity, and control state, to traversing packets. Store-and-forward (SAF) [26], virtual cut-through (VCT) [72], and wormhole [27] flow control are three primary flow control types.

Only after receiving the entire packet does the SAF flow control apply the buffers and channels of the next hop. Figure 1.6a illustrates an SAF example, where a five-flit packet is routed from router 0 to router 4. At each hop, the packet is forwarded only after receiving all five flits. Thus, SAF flow control induces serialization latency at each packet-forwarding hop.

Figure 1.6 Examples of SAF, VCT, and wormhole flow control. H, head flit; B, body flit; T, tail flit. (a) An SAF example. (b) A VCT example. (c) A wormhole example.

To reduce serialization latency, VCT flow control applies the buffers and channels of the next hop after receiving the header flit of a packet, as shown in Figure 1.6b. There is no contention at router 1, router 2, and router 3; thus, these routers forward the packets immediately after receiving the header flit. VCT flow control allocates the buffers and channels at the packet granularity; it needs the downstream router to have enough buffers for the entire packet before forwarding the header flit. In cycle 3 in Figure 1.6b, there are only two buffer slots at router 3, which is not enough for the five-flit packet. Thus, the packet waits at router 3 for three more cycles to be forwarded. VCT flow control requires the input port to have enough buffers for an entire packet.

In contrast to the SAF and VCT mechanisms, the wormhole flow control allocates the buffers at the flit granularity. It can send out the packet even if there is only one buffer slot downstream. When there is no network congestion, wormhole flow control performs the same as VCT flow control, as illustrated for routers 0, 1, and 2 in Figure 1.6b and c. However, when congestion occurs, VCT and wormhole flow control perform differently. Wormhole flow control requires only one empty downstream slot. Thus, as shown in Figure 1.6c, router 2 can send out the head flit and the first body flit at cycle 3, even if the two free buffer slots at router 3 are not enough for the entire packet. Wormhole flow control allocates the channels at the packet granularity. Although the channel between routers 2 and 3 is free from cycle 5 to cycle 7 in Figure 1.6c, this channel cannot be allocated to other packets until the tail flit of the last packet is sent out at cycle 10.

The VC techniques [24] can be combined with all three flow control types. A VC is an independent FIFO buffer queue. The VC can achieve many design purposes, including mitigating the head-of-line blocking and improving the physical channel utilization for wormhole flow control, avoiding the deadlock and supporting the quality of service (QoS). For example, to improve the physical channel utilization, multiple VCs can be configured to share the same physical channel to allow packets to bypass the current blocked packet. The router allocates the link bandwidth to different VCs and packets at the flit granularity. When the downstream VC for one packet does not have enough buffers, other packets can still use the physical channel through other VCs. Table 1.1 summarizes the properties of these flow control techniques [35].

Table 1.1

Summary of Flow Control Mechanisms

1.3.4 Router microarchitecture

The router microarchitecture directly determines the router delay, area overhead, and power consumption. Figure 1.7 illustrates the structure of a canonical VC NoC router. The router has five ports, with one port for each direction of a 2D mesh network and another port for the local node. A canonical NoC router is mainly composed of the input units, routing computation (RC) logic, VC allocator, switch allocator, crossbar, and output units [26]. The features and functionality of these components are as

Enjoying the preview?

Page 1 of 1

Networks-on-Chip: From Implementations to Programming Paradigms

About this ebook

Sheng Ma

Related authors

Related to Networks-on-Chip

Related ebooks

Systems Architecture For You

Related podcast episodes

Related articles

Related categories

Reviews for Networks-on-Chip

What did you think?

Book preview

Networks-on-Chip - Sheng Ma

Abstract

Keywords

Chapter outline

1.1 The dawn of the many-core era

1.2 Communication-centric cross-layer optimizations

1.3 A baseline design space exploration of NoCs

1.3.1 Topology

1.3.2 Routing algorithm

1.3.3 Flow control

1.3.4 Router microarchitecture