Overview
Why GPUs
GPU Architecture
Hardware Types
Software
Dev Tools
Programming Techniques
Applications
Gotchas
Q&A
Why GPUs
ASIC
Mobile
Embedded
Cloud
Most Vendors
84 Streaming Multiprocessors
64 FP32 Cores per SM = 5376
32 FP64 Cores per SM = 2688
10 Tensor Cores per SM = 840
GPU Speed-Up
https://blogs.nvidia.com/blog/2010/06/23/gpus-are-only-up-to-14-times-faster-than-cpus-says-intel/
Amdahl's Law
Speed-up Limited by Amdahl's Law
[Figure: two workload splits, 5% serial / 95% parallel vs. 50% serial / 50% parallel]
Software Model
Kernel Instances
Host
Driver
Code
GPU Code Example
#include "../common/book.h"
#define N 10
From: https://developer.download.nvidia.com/books/cuda-by-example/cuda-by-example-sample.pdf
Dev Stacks & Tools
CUDA
OpenCL
OpenACC
OpenMP
C++ AMP
Others
Debuggers
Libraries
http://www.seas.upenn.edu/~cis565/LECTURES/Lecture3.pdf
GPU Memory Hierarchy
https://www.bu.edu/pasi/files/2011/07/Lecture31.pdf
Parallel Algorithms
Traditional (Lock-Based) Model vs. Lock-Free Model
Applications
HPC: numerous
Finance
AI/ML: TensorFlow, ...
Computer Vision: OpenCV, ...
Video/Audio: FFmpeg, ...
Database: Kinetica, MapD, ...
Gotchas
Devices
Drivers
Memory Allocation
Memory Bandwidth
Tools
Debugging
Transfer Bandwidth
No Virtual Memory
No Interrupts
No O/S
SIMD
Summary
GPUs Offer Great Benefits