Transformers.js v3 adds WebGPU support, new quantization dtypes, and 120 architectures
AI Impact Summary
Transformers.js v3 enables WebGPU acceleration via device: 'webgpu', backed by ONNX Runtime Web, unlocking high-throughput in-browser inference for tasks such as feature extraction, automatic speech recognition, and image classification. It broadens quantization options with new dtype variants (fp32, fp16, q8, q4, and others) and supports per-module dtypes, giving fine-grained control over the trade-off between model size and accuracy. The release expands coverage to 120 architectures (including Phi-3, Gemma, Gemma2, LLaVa, Florence-2, Depth Pro, MusicGen, PyAnnote, and RT-DETR) and adds models such as Florence-2-base-ft and Qwen2.5-0.5B-Instruct, with examples drawing on mixedbread-ai and ONNX Community models.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info