PyTorch CUDA memory profiling: visualize GPU memory with _record_memory_history and _dump_snapshot
AI Impact Summary
PyTorch provides a built-in GPU memory profiling workflow that records allocation events and exports a snapshot for visualization, letting engineers attribute peak allocations to model parameters, activations, gradients, and optimizer state. The content walks through concrete memory calculations (e.g., a 10,000 × 50,000 linear layer is roughly 2 GB of weights in 32-bit floats, with activations around 1 GB) and ties memory spikes to initialization, forward passes, backward passes, and optimizer steps, showing where memory pressure originates. This supports data-driven tuning of batch sizes, activation and gradient management, and optimizer footprint to prevent CUDA out-of-memory errors during training and to improve GPU utilization.
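A minimal sketch of the workflow, assuming PyTorch 2.x on a CUDA device. The layer shape and batch size below are illustrative choices picked so the arithmetic roughly matches the ~2 GB weight and ~1 GB activation figures above; they are not values taken from the original, and the run needs a GPU with several gigabytes of free memory.

```python
import torch
import torch.nn as nn

# Start recording CUDA allocation/free events, including stack traces for attribution.
torch.cuda.memory._record_memory_history(max_entries=100_000)

# Illustrative model: a 10,000 x 50,000 fp32 linear layer.
# Weights alone: 10_000 * 50_000 * 4 bytes ≈ 2 GB.
model = nn.Linear(10_000, 50_000).cuda()
optimizer = torch.optim.Adam(model.parameters())

# Synthetic batch (assumed size): output activations are 5_000 * 50_000 * 4 bytes ≈ 1 GB.
x = torch.randn(5_000, 10_000, device="cuda")

for _ in range(3):
    optimizer.zero_grad(set_to_none=True)
    out = model(x)              # forward pass: activation memory spike
    loss = out.square().mean()
    loss.backward()             # backward pass: gradient memory spike (~2 GB for the weights)
    optimizer.step()            # optimizer step: Adam state (exp_avg, exp_avg_sq) adds ~4 GB

# Export the recorded history to a pickle file for visualization.
torch.cuda.memory._dump_snapshot("memory_snapshot.pickle")

# Stop recording.
torch.cuda.memory._record_memory_history(enabled=None)
```

Dropping the resulting memory_snapshot.pickle into the viewer at https://pytorch.org/memory_viz plots allocations over time, with stack traces that attribute each spike to parameter initialization, the forward pass, backward(), or optimizer.step().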
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info