Deploy GPT-J 6B on Amazon SageMaker via Hugging Face Transformers
AI Impact Summary
GPT-J 6B can be deployed on Amazon SageMaker by packaging the weights into a model.tar.gz on S3 and creating a HuggingFaceModel with matching Transformers and PyTorch versions. Memory is the main constraint: the roughly 24 GB FP32 footprint can be approximately halved by loading the model in float16, and low_cpu_mem_usage avoids double-allocating weights during load. Because SageMaker real-time endpoints enforce a 60-second invocation timeout (a hard limit, not an SLA), model load and cold-start times directly determine whether requests complete, and they shape end-to-end latency and throughput. The recommended path is to host the artifact on S3 (or use Hugging Face Hub-hosted weights), provision a GPU-backed endpoint, pin compatible container versions, and use warm-start strategies to meet production latency targets.
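The flow above can be sketched with the SageMaker Python SDK. This is a minimal sketch, not a definitive recipe: the bucket name, role ARN, instance type, and container versions below are assumptions, and the valid transformers/pytorch/py_version combinations must be taken from the Hugging Face Deep Learning Container compatibility matrix.

```python
def endpoint_config(model_data: str, role: str) -> dict:
    """Keyword arguments for sagemaker.huggingface.HuggingFaceModel.

    model_data: S3 URI of the packaged model.tar.gz (hypothetical bucket below).
    role: SageMaker execution role ARN.
    The version pins are illustrative; check the HF DLC matrix for valid pairs.
    """
    return {
        "model_data": model_data,
        "role": role,
        "transformers_version": "4.26",
        "pytorch_version": "1.13",
        "py_version": "py39",
    }


def deploy(cfg: dict, instance_type: str = "ml.g5.2xlarge"):
    """Create the model and a real-time endpoint on a GPU instance.

    Imported lazily so the sketch can be read without the SDK installed;
    the instance type is an assumption, sized for ~12 GB of fp16 weights.
    """
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(**cfg)
    return model.deploy(initial_instance_count=1, instance_type=instance_type)


def model_fn(model_dir: str):
    """Sketch of a custom inference.py entry point (hypothetical handler)
    that loads GPT-J in float16 to roughly halve the ~24 GB FP32 footprint.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.float16,  # ~12 GB of weights instead of ~24 GB
        low_cpu_mem_usage=True,     # avoid double-allocating weights on load
    )
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    return model, tokenizer
```

A usage sketch would be `deploy(endpoint_config("s3://my-bucket/gpt-j-6b/model.tar.gz", role_arn))`; keeping the float16 load inside model_fn shortens container start-up, which matters under the 60-second invocation timeout.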
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info