Programming Massively Parallel Processors

Programming Massively Parallel Processors (大規模並行處理器程式設計) is a book by Kirk (柯克), published by Tsinghua University Press in 2010.

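Basic Information

  • Title: Programming Massively Parallel Processors (大規模並行處理器程式設計)
  • Author: Kirk (柯克)
  • ISBN: 9787302229735
  • Category: Programming
  • List price: 36.00 yuan
  • Publisher: Tsinghua University Press
  • Publication date: July 1, 2010
  • Format: 16mo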

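Synopsis

This book introduces the basic concepts of parallel programming and GPU architecture, and examines in detail the techniques used to build parallel programs. Case studies walk through the whole development process, from the initial computational thinking about a problem to a practical and efficient parallel program.
Features of the Book
Introduces the way of thinking behind parallel computing, so that readers can carry this approach to problems into high-performance parallel computing.
Introduces the use of CUDA, a software development tool created by NVIDIA specifically for massively parallel environments.
Shows how to use the CUDA programming model and OpenCL to achieve high performance and high reliability.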

Table of Contents

Preface
Acknowledgments
Dedication
CHAPTER 1 INTRODUCTION
1.1 GPUs as Parallel Computers
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism?
1.4 Parallel Programming Languages and Models
1.5 Overarching Goals
1.6 Organization of the Book
CHAPTER 2 HISTORY OF GPU COMPUTING
2.1 Evolution of Graphics Pipelines
2.1.1 The Era of Fixed-Function Graphics Pipelines
2.1.2 Evolution of Programmable Real-Time Graphics
2.1.3 Unified Graphics and Computing Processors
2.1.4 GPGPU: An Intermediate Step
2.2 GPU Computing
2.2.1 Scalable GPUs
2.2.2 Recent Developments
2.3 Future Trends
CHAPTER 3 INTRODUCTION TO CUDA
3.1 Data Parallelism
3.2 CUDA Program Structure
3.3 A Matrix-Matrix Multiplication Example
3.4 Device Memories and Data Transfer
3.5 Kernel Functions and Threading
3.6 Summary
3.6.1 Function declarations
3.6.2 Kernel launch
3.6.3 Predefined variables
3.6.4 Runtime API
CHAPTER 4 CUDA THREADS
4.1 CUDA Thread Organization
4.2 Using blockIdx and threadIdx
4.3 Synchronization and Transparent Scalability
4.4 Thread Assignment
4.5 Thread Scheduling and Latency Tolerance
4.6 Summary
4.7 Exercises
CHAPTER 5 CUDA™ MEMORIES
5.1 Importance of Memory Access Efficiency
5.2 CUDA Device Memory Types
5.3 A Strategy for Reducing Global Memory Traffic
5.4 Memory as a Limiting Factor to Parallelism
5.5 Summary
5.6 Exercises
CHAPTER 6 PERFORMANCE CONSIDERATIONS
6.1 More on Thread Execution
6.2 Global Memory Bandwidth
6.3 Dynamic Partitioning of SM Resources
6.4 Data Prefetching
6.5 Instruction Mix
6.6 Thread Granularity
6.7 Measured Performance and Summary
6.8 Exercises
CHAPTER 7 FLOATING POINT CONSIDERATIONS
7.1 Floating-Point Format
7.1.1 Normalized Representation of M
7.1.2 Excess Encoding of E
7.2 Representable Numbers
7.3 Special Bit Patterns and Precision
7.4 Arithmetic Accuracy and Rounding
7.5 Algorithm Considerations
7.6 Summary
7.7 Exercises
CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION
8.1 Application Background
8.2 Iterative Reconstruction
8.3 Computing F^H d
Step 1. Determine the Kernel Parallelism Structure
Step 2. Getting Around the Memory Bandwidth Limitation
Step 3. Using Hardware Trigonometry Functions
Step 4. Experimental Performance Tuning
8.4 Final Evaluation
8.5 Exercises
CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS
CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING
CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL™
CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK
APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE
APPENDIX B GPU COMPUTE CAPABILITIES
Index
