Amazon EKS DRA for EFA: Topology-Aware RDMA Allocation
AI Impact Summary
Amazon EKS has introduced Dynamic Resource Allocation (DRA) for Elastic Fabric Adapter (EFA), enabling topology-aware RDMA allocation for AI/ML and HPC workloads. Workloads that use NVIDIA GPUs, AWS Trainium, or AWS Inferentia can optimize inter-node communication by routing traffic through the EFA interface closest to the accelerator. The EFA DRA driver also simplifies interface management and helps maximize EFA interface utilization, in particular by letting multiple workloads share an interface.
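To make "topology-aware" concrete, the following is a minimal Python sketch of the general idea: given an accelerator and a set of candidate network interfaces, prefer the interface that sits closest in the host topology. It is not the EFA DRA driver's implementation or API. The `Device` type, the `pcie_root` and `numa_node` attributes, the distance ranking, and all device names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Device:
    """Hypothetical device record; the real driver's attributes may differ."""
    name: str
    pcie_root: str  # assumed topology attribute, illustration only
    numa_node: int  # assumed topology attribute, illustration only


def topology_distance(accelerator: Device, nic: Device) -> int:
    """Lower is closer: 0 = same PCIe root, 1 = same NUMA node, 2 = otherwise."""
    if accelerator.pcie_root == nic.pcie_root:
        return 0
    if accelerator.numa_node == nic.numa_node:
        return 1
    return 2


def closest_interface(accelerator: Device, nics: Sequence[Device]) -> Optional[Device]:
    """Return the interface with the smallest topology distance, or None if the list is empty."""
    if not nics:
        return None
    return min(nics, key=lambda nic: topology_distance(accelerator, nic))


# Made-up inventory for a single node.
gpu = Device("gpu-0", pcie_root="pci0000:10", numa_node=0)
efa_interfaces = [
    Device("efa-2", pcie_root="pci0000:80", numa_node=1),
    Device("efa-1", pcie_root="pci0000:20", numa_node=0),
    Device("efa-0", pcie_root="pci0000:10", numa_node=0),
]

chosen = closest_interface(gpu, efa_interfaces)
print(chosen.name if chosen else "no interface available")
```

In a real cluster this matching would be expressed declaratively through Kubernetes DRA resource claims and resolved by the scheduler and driver, not by application code; the sketch only shows the selection logic that such matching implies.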
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium