
Scalable Parallel Computing: Technology, Architecture, Programming

K. Hwang and Z. Xu, McGraw-Hill, New York, NY, 1998. ISBN 0-07-031798-4.

Chapter 1:

Scalable Computer Platforms and Models (p. 3-50)

Evolution of Computer Architectures
  - Five generations of machines
Scalable Computer Architectures
  - Functionality and performance, scaling in cost, compatibility
System Architectures
  - Shared Nothing, Shared Disk, Shared Memory
  - Macro-Architecture vs. Micro-Architecture
Dimensions of Scalability
  - Resource Scalability, Application Scalability, Technology Scalability
Parallel Computer Models: Semantic Attributes
  - Homogeneity, Synchrony, Interaction Mechanism, Address Space, Memory Model
Performance Attributes
  - Machine size, clock rate, workload, sequential execution, parallel execution, speed, speedup, efficiency, utilization, startup time, asymptotic bandwidth
Abstract Machine Models
  - PRAM: Tcomp and Tload imbalance; simple; shared-variable interaction

  - Bulk Synchronous Parallel (BSP): Tcomp, Tload imbalance, Tcommunication, and Tsynchronization; includes interaction overhead; superstep execution: compute, interact, synchronize
  - Phase Parallel: Tcomp, Tload imbalance, Tcommunication, Tsynchronization, and Tparallel; includes all overhead; execution phases: parallelism phase, computation phase, interaction phase
Physical Machine Models
  - Parallel Vector Processor (PVP): UMA, crossbar, shared memory
  - Symmetric Multiprocessor (SMP): UMA, crossbar or bus, shared memory, hard to scale
  - Massively Parallel Processor (MPP): NORMA, message passing, custom interconnection, classic supercomputers
  - Distributed Shared Memory (DSM): NUMA or NORMA, shared memory (hardware or software based), custom interconnections, possible cache directories
  - Cluster of Workstations (COW): NORMA, message passing, SSI challenged, commodity processors and interconnection
Basic Concept of Clustering
  - Cluster nodes, Single-System Image (SSI), internode connection, enhanced availability, better performance
Cluster Benefits and Difficulties
  - Usability, availability, scalability, available utilization, and performance/cost ratio
Scalable Design Principles
  - Independence, balanced design, design for scalability, latency hiding
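
For quick reference, a sketch of the performance attributes and the BSP superstep cost listed in the Chapter 1 outline above (standard textbook formulations; the symbols here are the common ones and may differ from the book's exact notation):

```latex
% n = machine size, T_1 = sequential execution time, T_n = parallel execution time
\[
  S_n = \frac{T_1}{T_n} \quad\text{(speedup)}, \qquad
  E_n = \frac{S_n}{n} \quad\text{(efficiency)}
\]

% BSP superstep cost: w = max local computation, h = max messages sent or
% received by any node, g = communication gap per message, l = barrier latency
\[
  T_{\text{superstep}} = w + g\,h + l
\]
```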

Chapter 2:

Basics of Parallel Programming (p. 59-77)

Comparison of parallel and sequential programming
Programming Components and Considerations
  - Processes, tasks, and threads
  - Process state and state table, process descriptor, process context
  - Execution mode: kernel, user
Parallelism Issues
  - Homogeneity in processes, language constructs, static versus dynamic parallelism, process grouping
Allocation Issues
  - Degree of parallelism (DOP)
  - Granularity (also called grain size)
Interaction/Communication Issues
  - Communication, synchronization, aggregation
Data and Resource Dependence
  - Flow dependence, anti-dependence, output dependence, I/O dependence, unknown dependence
  - Bernstein Conditions: two processes Pi and Pj (with input sets Ii, Ij and output sets Oi, Oj) can execute in parallel if
      Ii ∩ Oj = ∅,  Oi ∩ Ij = ∅,  Oi ∩ Oj = ∅
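
To make the dependence types above concrete, here is a small illustrative C fragment; the statement labels S1-S4 are invented for this sketch, not taken from the book:

```c
#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c, d;

    c = a + b;   /* S1 */
    d = c * 2;   /* S2: flow dependence on S1 (S2 reads c, which S1 writes) */
    a = d - 1;   /* S3: anti-dependence on S1 (S3 writes a, which S1 reads) */
    c = a + 5;   /* S4: output dependence on S1 (both statements write c)   */

    /* Bernstein's conditions fail for each of the pairs above, so these
       statements cannot safely run in parallel without renaming/reordering. */
    printf("%d %d %d %d\n", a, b, c, d);
    return 0;
}
```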

Chapter 3:

Performance Metrics and Benchmarks (p. 91-154)

Benchmarks have been defined to focus on specific machine characteristics
  - Micro benchmarks: specific functions or attributes
  - Macro benchmarks: functional programs representative of a class of applications
Performance of Parallel Computers
  - Computations
  - Parallelism and interaction overhead
Parallelism Overhead
  - Process management, grouping operations (creation/destruction of groups), process inquiry operations
Interaction Overhead
  - Synchronization, communication, aggregation
  - Broadcast, scatter, gather, total exchange
Performance Metrics
  - Sequential time, parallel time, critical path time
  - Speed, speedup, efficiency, utilization
  - Total overhead
Scalability and Speedup Analysis
  - Amdahl's Law: fixed problem size
  - Gustafson's Law: fixed time
  - Sun and Ni's Law: memory/resource bounded
  - Iso-performance models
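
A sketch of the speedup laws named above, in the common textbook form (alpha is the sequential fraction of the workload and n the number of processors; the book's exact notation may differ):

```latex
% Amdahl's Law: fixed problem size
\[
  S_n = \frac{1}{\alpha + \dfrac{1-\alpha}{n}}
  \;\xrightarrow[n\to\infty]{}\; \frac{1}{\alpha}
\]

% Gustafson's Law: fixed execution time (scaled workload)
\[
  S'_n = \alpha + n\,(1-\alpha) = n - \alpha\,(n-1)
\]
```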

Chapter 4:

Microprocessors as Building Blocks (p. 155-210)

Instruction Pipeline Design Issues
  - Pipeline cycle (processor cycle), instruction issue latency, cycles per instruction (CPI), instruction issue rate
  - Simple operations, complex operations, resource conflicts
  - Instruction execution ordering
From CISC to RISC and Beyond
  - Scalar, superscalar, superpipelined, superscalar-superpipelined, VLIW
  - Multimedia extensions
Future Microprocessors
  - Multiway superscalar, superspeculative processor, simultaneous multithreaded processor, trace (multiscalar) processor, vector IRAM processor, single-chip multiprocessors, raw (configurable) processors
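
A small worked instance of the pipeline metrics above, using the classic processor-performance equation (the numbers are invented for illustration):

```latex
% CPU time as a function of instruction count (IC), cycles per instruction (CPI),
% and clock cycle time
\[
  T_{\text{CPU}} = IC \times CPI \times T_{\text{cycle}}
\]

% Example (illustrative numbers): 10^9 instructions, CPI = 1.5, 500 MHz clock (2 ns cycle)
\[
  T_{\text{CPU}} = 10^{9} \times 1.5 \times 2\,\text{ns} = 3\,\text{s}
\]
```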

Chapter 5:

Distributed Memory and Latency Tolerance (p. 211-272)

Memory Hierarchy
  - Inclusion property, coherence, contention
  - Locality of reference properties: temporal, spatial, sequential
  - Memory planning: capacity, average access time
Cache Coherency Protocols
  - Sources of incoherence: writes by different processors, process migration, I/O operations
  - Snoopy protocols or cache directories
  - Snoopy coherency protocols: must be able to observe memory transfers; write-update vs. write-invalidate; MESI
Shared Memory Consistency
  - Memory event ordering
  - Memory consistency models: strict, sequential, processor, weak, release
Distributed Cache/Memory Architectures
  - UMA, NUMA, COMA, NORMA
  - SMPs are centralized memory architectures; the others are distributed memory architectures
  - Cache coherence considerations: cache coherent (cc), non-cache-coherent (ncc), software cache coherent (sc)
  - Cache directories
Latency Tolerance Techniques
  - Latency avoidance, reduction, and hiding
  - Distributed coherent caches, data prefetching, relaxed memory consistency, multithreaded latency hiding
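
As a minimal sketch of one latency-hiding technique from the list above, data prefetching, here is an illustrative C loop using the GCC/Clang __builtin_prefetch intrinsic; the function name and PREFETCH_DISTANCE value are invented for this example:

```c
#include <stddef.h>

/* Tuning parameter invented for this sketch: how many iterations ahead to prefetch. */
#define PREFETCH_DISTANCE 16

double sum_array(const double *a, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Request the cache line for a later element ahead of its use,
             * overlapping memory latency with the additions below. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0 /* read */, 1 /* low temporal locality */);
        }
        sum += a[i];
    }
    return sum;
}
```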

Chapter 6:

System Interconnections and Gigabit Networks (p. 273-342)

Basic Interconnection Network
  - Network components, network characteristics, network properties
  - Network topologies: node degree, network diameter, bisection width
  - Buses, crossbars, and multistage interconnection networks (MIN)
Gigabit Network Technology
  - Ethernet, ATM, Scalable Coherent Interface (SCI)
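
For concreteness, the standard values of the three topology metrics listed above for two common networks (textbook-standard figures quoted from memory, not copied from this chapter):

```latex
% Topology metrics for two common interconnection networks
\begin{tabular}{lccc}
  Topology & Node degree & Diameter & Bisection width \\
  \hline
  Binary $k$-cube ($N = 2^{k}$ nodes) & $k$ & $k$ & $N/2$ \\
  $k \times k$ 2-D mesh ($N = k^{2}$ nodes) & $\le 4$ & $2(k-1)$ & $k$ \\
\end{tabular}
% Example: a 16-node hypercube (k = 4) has degree 4, diameter 4, bisection width 8.
```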

Chapter 7:

Threading, Synchronization, and Communication (p. 343-402)

Software Multithreading
  - The thread concept: threads, thread states, and thread management
  - Lightweight process (LWP), LWP states, and LWP management
  - Heavyweight process
  - Kernel-level vs. user-level processing
Synchronization Mechanisms (see the Pthreads sketch below)
  - Synchronization problems faced by users
  - Language constructs employed by the user to solve the synchronization problem (high-level constructs)
  - Synchronization primitives available in multiprocessor architectures (low-level constructs)
  - Algorithms used to implement the high-level constructs with the available low-level constructs
The TCP/IP Communication Protocol Suite
  - OSI and Internet protocol stacks
  - Network addressing
  - TCP, UDP, and IP
Fast and Efficient Communication
  - Effective bandwidth
  - Network interface circuitry
  - Software communication libraries
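
As a minimal, self-contained sketch of implementing a high-level synchronization construct from low-level primitives, here is a counting barrier built on a POSIX mutex and condition variable; it is an illustrative example, not code from the book:

```c
#include <pthread.h>
#include <stdio.h>

/* Counting barrier built from a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_arrived;
    int count;       /* threads that have arrived in this round */
    int nthreads;    /* threads participating in the barrier    */
    int generation;  /* distinguishes successive barrier rounds */
} barrier_t;

static void barrier_init(barrier_t *b, int nthreads) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_arrived, NULL);
    b->count = 0;
    b->nthreads = nthreads;
    b->generation = 0;
}

static void barrier_wait(barrier_t *b) {
    pthread_mutex_lock(&b->lock);
    int my_generation = b->generation;
    if (++b->count == b->nthreads) {            /* last thread to arrive */
        b->count = 0;
        b->generation++;
        pthread_cond_broadcast(&b->all_arrived);
    } else {
        while (my_generation == b->generation)  /* guards against spurious wakeups */
            pthread_cond_wait(&b->all_arrived, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

#define NTHREADS 4
static barrier_t barrier;

static void *worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld: before barrier\n", id);
    barrier_wait(&barrier);
    printf("thread %ld: after barrier\n", id);
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    barrier_init(&barrier, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```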

Chapter 8:

Symmetric and CC-NUMA Multiprocessors (p. 407-452)

SMP and CC-NUMA Technology
  - Comparison criteria: availability, bottleneck, latency, memory bandwidth, I/O bandwidth, scalability, programming advantage
  - Typical applications
  - Commercial SMP servers
Comparison of CC-NUMA Architectures
  - Comparison criteria: architecture, shared memory access, enhanced scalability, concerns
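
One way to make the latency and memory-bandwidth comparison above concrete is the usual back-of-the-envelope model for average memory access time under a given local-access fraction; the model and numbers below are illustrative, not taken from the book:

```latex
% Average memory access time when a fraction h of references hit local memory
% and the rest go to a remote node
\[
  t_{\text{avg}} = h \, t_{\text{local}} + (1 - h)\, t_{\text{remote}}
\]
% Example (illustrative): h = 0.9, t_local = 100 ns, t_remote = 500 ns  =>  t_avg = 140 ns
```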

Chapter 9:

Support of Clusters and Availability (p. 453-504)

Challenges of Clustering
  - Classification attributes: dedicated cluster, enterprise cluster
  - Cluster design issues
Availability Support for Clustering
  - Reliability, availability, serviceability
  - Types of failures
  - Availability techniques: isolated redundancy; hot standby, mutual takeover, and fault-tolerant configurations
  - Failover and recovery schemes
Checkpointing and Failure Recovery
  - Methods, overhead, what to checkpoint, consistent snapshots
Support for Single System Image
  - Single system (application, above kernel, kernel/hardware)
  - Single control, use from any entry point, location transparency
Job Management in Clusters
  - Characteristics of cluster workload
  - Job scheduling issues
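
A sketch of the standard steady-state availability formula behind the reliability/availability/serviceability topics above (the numbers are illustrative):

```latex
% Steady-state availability in terms of mean time to failure (MTTF)
% and mean time to repair (MTTR)
\[
  \text{Availability} = \frac{MTTF}{MTTF + MTTR}
\]
% Example (illustrative): MTTF = 999 hours, MTTR = 1 hour  =>  availability = 0.999 (three nines)
```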
