Description
CUDA Fortran for Scientists and Engineers (2nd Ed.)
Best Practices for Efficient CUDA Fortran Programming
Authors: Ruetsch Gregory, Fatica Massimiliano
Language: English
Subject for CUDA Fortran for Scientists and Engineers:
Keywords
1D stencil; Accuracy due to FMA instructions; Accuracy of reduction; Arithmetic throughput; Asynchronous data transfers; Bandwidth; Bank conflicts; CUDA events; CUFFT; CURAND; Coalescing; Compilation; Compute capability; Compute mode; Constant memory; Convolution; Data coalescing; Data parallelism; Device management; Direct access; Direct transfer; Error handling; Execution configuration; Host and device code; Hybrid computation; Hyper-Q; Instruction-level parallelism; Kernel; Message passing interface (MPI); NVIDIA system management interface (nvidia-smi); Nonuniform mesh; Overlapping of transfers and compute; Peer-to-peer; Performance metrics; Pinned memory; Profiling; Random-number generation; Reduction; Registers; Shared memory; Spectral derivative; Stream; Synchronization; Textures; Thread-level parallelism; Timing; Unified virtual addressing (UVA); Warp
350 p. · Paperback
CUDA Fortran for Scientists and Engineers: Best Practices for Efficient CUDA Fortran Programming shows how high-performance application developers can leverage the power of GPUs using Fortran, the familiar language of scientific computing and supercomputer performance benchmarking. The authors presume no prior parallel computing experience and cover the basics along with best practices for efficient GPU computing using CUDA Fortran. To add CUDA Fortran to existing Fortran codes, they explain how to understand the target GPU architecture, identify computationally intensive parts of the code, and modify the code to manage the data and parallelism and optimize performance, all in Fortran, without having to rewrite in another language. Each concept is illustrated with actual examples so you can immediately evaluate the performance of your code in comparison. This second edition provides much-needed updates on how to efficiently program GPUs in CUDA Fortran. It can be used either as a tutorial on GPU programming in CUDA Fortran or as a reference text.
PART I: CUDA Fortran Programming
1. Introduction
2. Correctness, Accuracy, and Debugging
3. Performance Measurements and Metrics
4. Synchronization
5. Optimization
6. Multi-GPU Programming
7. Porting Tips and Techniques
8. Interfacing with CUDA C, OpenACC, and CUDA Libraries
PART II: Case Studies
9. Monte Carlo Method
10. Finite Difference Method
11. Applications of the Fast Fourier Transform
12. Ray Tracing
Massimiliano Fatica is the manager of the Tesla HPC Group at NVIDIA, where he works in the area of GPU computing (high-performance computing and clusters). He holds a laurea in Aeronautical Engineering and a PhD in Theoretical and Applied Mechanics from the University of Rome “La Sapienza.” Prior to joining NVIDIA, he was a research staff member at Stanford University, where he worked at the Center for Turbulence Research and the Center for Integrated Turbulent Simulations on applications for the Stanford Streaming Supercomputer.
- Presents optimization strategies for current hardware, including Hopper generation GPUs
- Includes discussions of new language and hardware features, including managed memory, tensor cores, shuffle instructions, and new multi-GPU paradigms
- Offers resources and strategies for porting large codes to GPUs, including language features as well as library use