Transformers.js v3 adds WebGPU support, new quantization dtypes, and 120 architectures
AI Impact Summary
Transformers.js v3 enables WebGPU acceleration via device: 'webgpu', backed by ONNX Runtime Web, unlocking high-throughput in-browser inference for tasks such as feature extraction, automatic speech recognition, and image classification. It broadens quantization options with new dtype variants (fp32, fp16, q8, q4, and others) and supports per-module dtypes, giving fine-grained control over the trade-off between model size and accuracy. The release expands coverage to 120 architectures (including Phi-3, Gemma, Gemma2, LLaVa, Florence-2, Depth Pro, MusicGen, PyAnnote, and RT-DETR) and adds models such as Florence-2-base-ft and Qwen2.5-0.5B-Instruct, with examples drawing on mixedbread-ai and ONNX Community models.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info