PyTorch GPU memory profiling with _record_memory_history and _dump_snapshot for training runs
AI Impact Summary
PyTorch exposes GPU memory profiling via torch.cuda.memory._record_memory_history and _dump_snapshot, enabling step-by-step visualization of memory usage during model creation, forward passes, backward passes, and optimization. The material breaks memory down by model parameters, activations, gradients, and optimizer state, and shows how workload changes such as batch size alter the peak memory profile. It demonstrates profiling on both a simple Linear model and a larger transformer-based model (Qwen/Qwen2.5-1.5B), showing how to interpret the memory history file (profile.pkl) and the visual graph. This capability supports engineering teams in capacity planning and memory optimization decisions to reduce CUDA OOM incidents.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info