StarCoder2 released: open-code LLM family (3B/7B/15B) trained on The Stack v2
AI Impact Summary
StarCoder2 introduces an open family of code LLMs (3B, 7B, and 15B parameters) trained on The Stack v2, featuring a 16K-token context window and a fill-in-the-middle (FIM) training objective. By releasing the models, datasets, and training code, BigCode enables teams to experiment with open code LLMs using NVIDIA NeMo on NVIDIA-accelerated infrastructure and to deploy via the Hugging Face Hub. The Stack v2's repository-scoped context and broad language coverage expand applicability to diverse coding tasks, potentially accelerating development cycles for code completion, synthesis, and tooling in enterprise environments.
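The fill-in-the-middle objective mentioned above lets the model complete a span of code given both the code before and after it. A minimal sketch of building such a prompt is shown below; the sentinel token names (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) follow the convention used by earlier StarCoder-family tokenizers and are an assumption here — verify them against the released model's tokenizer special tokens before use.

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Format a prefix-suffix-middle (PSM) fill-in-the-middle prompt.

    The model is expected to generate the missing middle segment
    after the <fim_middle> sentinel. Sentinel token names are assumed
    from the StarCoder convention, not confirmed for StarCoder2.
    """
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"


# Example: ask the model to fill in the body of a function.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))\n",
)
```

The resulting string would be passed to the tokenizer and model as an ordinary causal-LM prompt; generation stops when the model emits its end-of-text token.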
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info