Llama 3.1 released: 8B/70B/405B models with 128K context and multilingual support on Hugging Face
AI Impact Summary
Llama 3.1 provides three model sizes (8B, 70B, 405B) with a 128K-token context window and multilingual support across eight languages, expanding open-weight availability with both base and instruction-tuned variants. It also introduces guard models (Llama Guard 3 and Prompt Guard) and tool-calling capabilities for agentic use cases, including built-in search and Wolfram Alpha integrations, supporting safer deployments and enhanced automation. However, memory and hardware requirements are substantial (e.g., the 405B model needs ~810 GB for FP16 weights alone, and the KV cache adds further memory), so production deployments must plan for high-memory GPUs or multi-node setups, potentially leveraging FP8/INT4 quantization or managed inference via Inference Endpoints, Google Cloud, SageMaker, or Dell Hub.
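The weight-memory figures above can be sanity-checked with a back-of-envelope calculation: parameter count times bytes per parameter. The sketch below is illustrative only; it counts weights alone and ignores KV cache, activations, and framework overhead, and the byte widths per precision are standard assumptions, not measured values.

```python
# Rough weight-only memory estimate for Llama 3.1 sizes.
# Billions of parameters x bytes/parameter = GB directly (1 GB = 1e9 bytes).
# Excludes KV cache, activations, and runtime overhead.

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_memory_gb(params_billion: float, dtype: str) -> float:
    """Approximate memory (GB) to hold the model weights at a given precision."""
    return params_billion * BYTES_PER_PARAM[dtype]

for size in (8, 70, 405):
    for dtype in ("fp16", "fp8", "int4"):
        print(f"{size}B @ {dtype}: ~{weight_memory_gb(size, dtype):.0f} GB")
```

This reproduces the ~810 GB figure for 405B at FP16 and shows why FP8 (~405 GB) or INT4 (~203 GB) quantization makes single-node deployment more tractable.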
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium