Llama 3.1 released: 8B/70B/405B models with 128K context and multilingual support on Hugging Face
AI Impact Summary
Llama 3.1 provides three model sizes (8B, 70B, 405B) with a 128K-token context window and multilingual support across eight languages, expanding open-weight availability with both base and instruction-tuned variants. It also introduces guard models (Llama Guard 3 and Prompt Guard) and tool-calling capabilities for agentic use cases, including built-in search and Wolfram Alpha integrations, supporting safer deployments and enhanced automation. However, memory and hardware requirements are substantial (e.g., the 405B model needs ~810 GB for FP16 weights alone, and the KV cache adds further memory), so production deployments must plan for high-memory GPUs or multi-node setups, potentially leveraging FP8/INT4 quantization or managed inference via Inference Endpoints, Google Cloud, SageMaker, or Dell Hub.
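The weight-memory figures above can be sanity-checked with a back-of-envelope calculation: parameter count times bytes per parameter. The sketch below is illustrative only; it counts weights alone and ignores KV cache, activations, and framework overhead, and the byte widths per precision are standard assumptions, not measured values.

```python
# Rough weight-only memory estimate for Llama 3.1 sizes.
# Billions of parameters x bytes/parameter = GB directly (1 GB = 1e9 bytes).
# Excludes KV cache, activations, and runtime overhead.

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_memory_gb(params_billion: float, dtype: str) -> float:
    """Approximate memory (GB) to hold the model weights at a given precision."""
    return params_billion * BYTES_PER_PARAM[dtype]

for size in (8, 70, 405):
    for dtype in ("fp16", "fp8", "int4"):
        print(f"{size}B @ {dtype}: ~{weight_memory_gb(size, dtype):.0f} GB")
```

This reproduces the ~810 GB figure for 405B at FP16 and shows why FP8 (~405 GB) or INT4 (~203 GB) quantization makes single-node deployment more tractable.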
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium