Chapter 1 – X86-64 Core Architecture
Chapter Goal: Explains the core architecture of an x86-64 processor. Topics discussed include fundamental data types, registers, status flags, memory addressing modes, and other important architectural subjects. The reader needs to understand this material to follow the book’s subsequent chapters.
Historical overview
Data types
Fundamental data types
Numerical data types
SIMD data types
Miscellaneous data types
Strings
Bit fields and bit strings
X86-64 processor internal architecture
Overview
General-purpose registers
Instruction pointer
RFLAGS
Floating-point and SIMD registers
MXCSR Register
Instruction operands
Memory addressing
Condition codes
Differences between x86-32 and x86-64
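The memory addressing topic above concerns how an x86-64 processor forms an effective address from a base register, an index register, a scale factor (1, 2, 4, or 8), and a displacement. The following C++ sketch models that arithmetic. The function name and parameter names are illustrative assumptions and do not come from the book's source code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the book): models the x86-64 effective
// address calculation  EA = base + index * scale + disp.
std::uint64_t EffectiveAddress(std::uint64_t base, std::uint64_t index,
                               unsigned scale, std::int32_t disp)
{
    // Only these four scale factors can be encoded in an instruction.
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    // The 32-bit displacement is sign-extended to 64 bits. Unsigned
    // arithmetic wraps modulo 2^64, like 64-bit address arithmetic.
    const std::uint64_t disp64 =
        static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));

    return base + index * scale + disp64;
}
```

For example, an operand written as `[rbx+rcx*4+8]` corresponds to `EffectiveAddress(rbx, rcx, 4, 8)` in this model.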
Chapter 2 – X86-64 Core Programming (Part 1)
Chapter Goal: Introduces the fundamentals of x86-64 assembly language programming. The programming examples illustrate essential x86-64 assembly language programming concepts including integer arithmetic, bitwise logical operations, and shift instructions. This chapter also explains basic assembler usage and x86-64 assembly language syntax.
Assembler basics
Instruction syntax
Assembler directives
Modern X86 Assembly Language Programming, Third Edition
Daniel Kusswurm
MASM vs. NASM
Source code overview
File and function naming conventions
Integer arithmetic
Integer (32-bit) addition and subtraction
Bitwise logical operations
Shift operations
Integer (64-bit) addition and subtraction
Integer multiplication and division
Chapter 3 – X86-64 Core Programming (Part 2)
Chapter Goal: Explores additional core x86-64 assembly language programming concepts. Topics discussed include advanced integer arithmetic, memory addressing modes, and condition codes. This chapter also covers important x86-64 assembly language programming concepts including proper stack use and for-loops.
Simple stack arguments
Mixed-type integer arithmetic
Memory addressing
Condition codes
Assembly language for-loops
Chapter 4 – X86-64 Core Programming (Part 3)
Chapter 4 explains how to use core x86-64 assembly language data constructs, including arrays and structures. It also describes how to use common x86-64 string processing instructions.
Arrays
1D integer array arithmetic calculations
1D integer array arithmetic calculations using multiple arrays
2D integer arrays
Strings
Overview of x86 string instructions
Counting characters
String/array compare
String/array copy
String/array reversal
Assembly language structures
Chapter 5 – Scalar Floating-Point
Chapter 5 teaches the reader how to perform scalar floating-point arithmetic and other operations using assembly language. It also outlines the calling convention requirements for scalar floating-point arguments and return values.
Floating-point programming concepts
Single-precision floating-point arithmetic
Temperature conversions
Cone volume/surface area calculation
Double-precision floating-point arithmetic
Sphere volume/surface area calculation
Floating-point compares and conversions
Floating-point compares using VUCOMIS[S|D]
Floating-point compares using VCMPS[S|D]
Floating-point conversions
Floating-point arrays
Array mean/standard deviation calculation
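As a point of reference for the array mean/standard deviation topic, the calculation can be sketched in scalar C++ as shown below. The function name, the argument order, and the choice of the sample standard deviation (divisor n - 1) are assumptions made for illustration. The outline does not specify which form the book uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical reference function (not from the book). Computes the mean
// and the sample standard deviation (n - 1 divisor, an assumption) of x.
// Returns false if fewer than two elements are supplied.
bool CalcMeanStDev(double* mean, double* st_dev, const double* x, std::size_t n)
{
    assert(mean != nullptr && st_dev != nullptr && x != nullptr);

    if (n < 2)
        return false;

    // First pass: arithmetic mean.
    double sum = 0.0;
    for (std::size_t i = 0; i < n; i++)
        sum += x[i];
    *mean = sum / static_cast<double>(n);

    // Second pass: sum of squared deviations from the mean.
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < n; i++)
    {
        const double d = x[i] - *mean;
        sum_sq += d * d;
    }

    *st_dev = std::sqrt(sum_sq / static_cast<double>(n - 1));
    return true;
}
```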
Chapter 6 – Assembly Language Calling Conventions
Chapter 6 formally defines the run-time calling conventions for x86-64 assembly language functions. The first section explains the requirements for Windows and Visual C++, while the second section covers Linux and GNU C++.
Calling convention requirements for Windows and Visual C++
Stack frames
Using non-volatile general-purpose registers
Using non-volatile SIMD registers
Calling external functions
Calling convention requirements for Linux and GNU C++
Stack arguments
Using non-volatile general-purpose registers
Calling external functions
Chapter 7 – Advanced Vector Extensions
Chapter 7 introduces Advanced Vector Extensions (AVX). It begins with a discussion of AVX architecture and related topics. Chapter 7 also explains elementary SIMD programming concepts. Understanding of this material is necessary for the reader to comprehend the AVX, AVX2, and AVX-512 programming examples in subsequent chapters.
X86-AVX architecture overview
AVX
AVX2
AVX-512
Merge masking and zero masking
Embedded broadcasts
Instruction level rounding
SIMD programming concepts
Basic arithmetic
Wraparound vs. saturated arithmetic
Packed floating-point
Packed integer
Programming differences between x86-SSE and x86-AVX
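The distinction between wraparound and saturated arithmetic listed above can be illustrated with unsigned 8-bit values. The scalar C++ sketch below uses hypothetical function names. In AVX code the same per-element behavior is provided by instructions such as `vpaddb` (wraparound) and `vpaddusb` (unsigned saturation).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not from the book).

// Wraparound addition: the sum is reduced modulo 256.
std::uint8_t AddWrapU8(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(a + b);
}

// Saturated addition: sums above 255 are clamped to 255.
std::uint8_t AddSatU8(std::uint8_t a, std::uint8_t b)
{
    const unsigned sum = static_cast<unsigned>(a) + b;
    return static_cast<std::uint8_t>(std::min(sum, 255u));
}
```

With inputs 200 and 100, the wraparound result is 44 (300 modulo 256), while the saturated result is 255. Saturation is often the preferred behavior for pixel data because an overflow does not turn a bright pixel into a dark one.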
Chapter 8 – AVX Programming – Packed Integers
Chapter 8 spotlights packed integer arithmetic and other operations using AVX. It also describes how to code functions that perform packed integer calculations on arrays using the AVX instruction set.
Integer arithmetic
Addition and subtraction
Multiplication
Bitwise logical operations
Arithmetic and logical shifts
Integer array algorithms
Pixel minimum and maximum
Pixel mean
Chapter 9 – AVX Programming – Packed Floating Point
Chapter 9 demonstrates packed floating-point arithmetic and other operations using AVX. This chapter also explains how to use AVX instructions to perform calculations with floating-point arrays and matrices.
Floating-point arithmetic
Basic arithmetic operations
Compares
Conversions
Floating-point arrays
Array mean and standard deviation
Array square roots and compares
Floating-point matrices
Matrix column means
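For the matrix column means topic, a scalar C++ baseline might look like the sketch below. The row-major storage layout, the function name, and the use of `std::vector` are assumptions made for illustration. An AVX version would process several adjacent columns of a row per instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference function (not from the book). The matrix x is
// assumed to be stored in row-major order with nrows * ncols elements.
std::vector<double> CalcColumnMeans(const std::vector<double>& x,
                                    std::size_t nrows, std::size_t ncols)
{
    assert(nrows > 0 && x.size() == nrows * ncols);

    std::vector<double> col_means(ncols, 0.0);

    // Accumulate column sums one row at a time. The inner loop walks
    // contiguous memory, which is the access pattern SIMD code favors.
    for (std::size_t i = 0; i < nrows; i++)
    {
        for (std::size_t j = 0; j < ncols; j++)
            col_means[j] += x[i * ncols + j];
    }

    for (double& m : col_means)
        m /= static_cast<double>(nrows);

    return col_means;
}
```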
Chapter 10 – AVX2 Programming – Packed Integers
Chapter 10 describes AVX2 integer programming using x86-64 assembly language. This chapter also elucidates the coding of common image processing algorithms using the AVX2 instruction set.
Integer arithmetic
Basic operations
Size promotions
Image processing
Pixel clipping
RGB to grayscale
Pixel conversions
Image histogram
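Pixel clipping, one of the image processing topics above, limits each pixel to a range [lo, hi]. A scalar C++ sketch of the operation follows. The function name, the signature, and the decision to return a count of modified pixels are illustrative assumptions rather than the book's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reference function (not from the book). Copies src to des,
// clamping each 8-bit pixel to [lo, hi]. Returns the number of pixels
// whose value was changed by the clamp.
std::size_t ClipPixels(std::uint8_t* des, const std::uint8_t* src,
                       std::size_t n, std::uint8_t lo, std::uint8_t hi)
{
    assert(des != nullptr && src != nullptr && lo <= hi);

    std::size_t num_clipped = 0;

    for (std::size_t i = 0; i < n; i++)
    {
        std::uint8_t p = src[i];

        if (p < lo)
        {
            p = lo;
            num_clipped++;
        }
        else if (p > hi)
        {
            p = hi;
            num_clipped++;
        }

        des[i] = p;
    }

    return num_clipped;
}
```

A SIMD version can replace the two branches with packed unsigned maximum and minimum operations applied to many pixels at once.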
Chapter 11 – AVX2 Programming – Packed Floating Point (Part 1)
Chapter 11 teaches the reader how to enhance the performance of general-purpose floating-point calculations using x86-64 assembly language and the AVX2 instruction set. The reader will also learn how to accelerate these types of calculations using fused multiply-add (FMA) instructions.
Floating-point arrays
Least squares with FMA
Floating-point matrices
Matrix multiplication F32
Matrix multiplication F64
Matrix (4x4) multiplication F32
Matrix (4x4) multiplication F64
Matrix (4x4) vector multiplication F32
Matrix (4x4) vector multiplication F64
Covariance matrix F64
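The least squares topic in this chapter pairs naturally with FMA because the required sums of products are multiply-accumulate operations. The C++ sketch below fits a line y = m*x + b and uses `std::fma`, which computes a*b + c with a single rounding. The function name, signature, and epsilon threshold are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical reference function (not from the book). Fits y = m*x + b
// by ordinary least squares. Returns false if the fit is degenerate.
bool CalcLeastSquares(double* m, double* b, const double* x,
                      const double* y, std::size_t n)
{
    assert(m != nullptr && b != nullptr && x != nullptr && y != nullptr);

    if (n < 2)
        return false;

    double sum_x = 0.0, sum_y = 0.0, sum_xx = 0.0, sum_xy = 0.0;

    for (std::size_t i = 0; i < n; i++)
    {
        sum_x += x[i];
        sum_y += y[i];
        sum_xx = std::fma(x[i], x[i], sum_xx);   // sum_xx += x[i] * x[i]
        sum_xy = std::fma(x[i], y[i], sum_xy);   // sum_xy += x[i] * y[i]
    }

    const double nd = static_cast<double>(n);
    const double denom = nd * sum_xx - sum_x * sum_x;

    // Assumed threshold for rejecting a (nearly) vertical data set.
    if (std::fabs(denom) < 1.0e-12)
        return false;

    *m = (nd * sum_xy - sum_x * sum_y) / denom;
    *b = (sum_xx * sum_y - sum_x * sum_xy) / denom;
    return true;
}
```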
Chapter 12 – AVX2 Programming – Packed Floating Point (Part 2)
Chapter 12 is a continuation of the previous chapter. It explicates the coding of advanced algorithms including matrix inversion and convolutions using AVX2 and FMA instructions.
Advanced matrix operations
Matrix inverse F32
Matrix inverse F64
Signal processing
1D convolution F32 variable-size kernel
1D convolution F64 variable-size kernel
1D convolution F32 fixed-size kernel
1D convolution F64 fixed-size kernel
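The variable-size kernel convolution listed above can be sketched in scalar C++ as follows. The function name, the requirement of an odd kernel size, and the choice to leave edge elements at zero are assumptions made for illustration. The outline does not state how the book handles signal boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference function (not from the book). Computes
//   y[i] = sum over j in [-ks2, ks2] of x[i - j] * kernel[j + ks2]
// for interior points only. Edge elements of y are left at 0.0.
std::vector<double> Convolve1D(const std::vector<double>& x,
                               const std::vector<double>& kernel)
{
    assert(kernel.size() % 2 == 1);   // odd kernel size assumed

    const std::size_t ks2 = kernel.size() / 2;
    std::vector<double> y(x.size(), 0.0);

    if (x.size() < kernel.size())
        return y;

    for (std::size_t i = ks2; i < x.size() - ks2; i++)
    {
        double sum = 0.0;

        // kernel[k] pairs with x[i + ks2 - k], so the kernel is applied
        // in reversed order as the convolution definition requires.
        for (std::size_t k = 0; k < kernel.size(); k++)
            sum += x[i + ks2 - k] * kernel[k];

        y[i] = sum;
    }

    return y;
}
```

A fixed-size kernel variant differs only in that the inner loop bound is a compile-time constant, which lets the loop be fully unrolled.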
Chapter 13 – AVX-512 Programming – Packed Integers
Chapter 13 highlights packed integer arithmetic and other operations using x86-64 assembly language and AVX-512. It also discusses how to code frequently used image processing algorithms using the AVX-512 instruction set.
Integer arithmetic
Addition and subtraction
Masked addition and subtraction
Image processing
Pixel clipping
Image statistics
Image histogram
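The masked addition topic in this chapter relies on the two AVX-512 masking modes introduced in Chapter 7: merge masking keeps the destination element when a mask bit is 0, while zero masking sets it to 0. The scalar C++ model below illustrates the difference for eight 32-bit lanes. The type alias, function name, and parameters are hypothetical and are not AVX-512 intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model (not from the book) of a masked packed add
// over eight 32-bit lanes. Bit i of mask controls lane i.
using Vec8 = std::array<std::uint32_t, 8>;

Vec8 MaskedAdd(const Vec8& des, const Vec8& a, const Vec8& b,
               std::uint8_t mask, bool zero_masking)
{
    Vec8 result{};

    for (std::size_t i = 0; i < result.size(); i++)
    {
        if (mask & (1u << i))
            result[i] = a[i] + b[i];                  // lane enabled
        else
            result[i] = zero_masking ? 0u : des[i];   // zero or merge
    }

    return result;
}
```

With a mask of 0x0F, lanes 0 through 3 receive the sums. Lanes 4 through 7 either keep their previous destination values (merge masking) or become zero (zero masking).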
Chapter 14 – AVX-512 Programming – Packed Floating Point (Part 1)
Chapter 14 explains basic operations using packed floating-point operands and the AVX-512 instruction set. It also teaches the reader how to code common floating-point algorithms using x86-64 assembly language and AVX-512.
Floating-point arithmetic
Floating-point arithmetic
Floating-point compares
Floating-point arithmetic and mask registers
Floating-point matrices
Covariance matrix
Matrix multiplication F32
Matrix multiplication F64
Matrix (4x4) vector multiplication F32
Matrix (4x4) vector multiplication F64
Chapter 15 – AVX-512 Programming – Packed Floating Point (Part 2)
Chapter 15 is a continuation of the previous chapter. It illustrates the coding of advanced algorithms using AVX-512 and FMA instructions.
Signal processing
1D convolution F32 variable-size kernel
1D convolution F64 variable-size kernel
1D convolution F32 fixed-size kernel
1D convolution F64 fixed-size kernel
Chapter 16 – Advanced Instructions and Optimization Guidelines
Chapter 16 demonstrates the use of advanced x86-64 assembly language instructions. It also discusses guidelines that the reader can exploit to improve the performance of their assembly language code.
Advanced instructions
CPUID instruction – processor information
CPUID instruction – AVX, AVX2, FMA, and AVX-512 detection
Integer non-temporal memory loads and stores
Floating-point non-temporal memory stores
SIMD text processing
Processor microarchitecture overview
X86-64 assembly language optimization guidelines
Appendix A – Source Code and Development Tools
Appendix A describes how to download, install, and execute the source code. It also includes some brief usage notes about the software development tools used to create the source code examples.
Source code
Download instructions
Setup and configuration
Executing a source code example
Software development tools for Windows
Microsoft Visual Studio
MASM
Software development tools for Linux
GNU make
GNU C++ compiler
NASM
Benchmarking notes
Appendix B – References and Additional Resources
Appendix B contains a list of references that were consulted during the writing of this book. It also lists supplemental resources that the reader can consult for additional x86-64 assembly language programming information.
X86-64 assembly language programming references
Algorithm references
C++ references
X86 processor software utilities and libraries
Additional resources