Encrypted LLM Inference with FHE: GPT-2 via Concrete-Python and TFHE integration
AI Impact Summary
Researchers demonstrate a privacy-preserving LLM workflow by executing parts of GPT-2 under Fully Homomorphic Encryption (FHE). The approach splits inference between client and server, uses Hugging Face GPT-2 components, and replaces selected attention operations with FHE-friendly equivalents via Concrete-Python/Concrete-ML and TFHE with programmable bootstrapping (PBS), aiming to protect both user data and model IP. This is a capability-level change that, if productionized, would enable on-premise or trusted-server deployments in sensitive domains, but it entails substantial compute cost, cryptographic integration work, and accuracy/latency trade-offs (4-bit quantization achieves ~96% accuracy on a small eval set).
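The accuracy/latency trade-off comes largely from quantization: FHE circuits operate on low-precision integers, so weights and activations must be mapped to a small integer range before encryption. The sketch below illustrates generic symmetric per-tensor 4-bit quantization; it is not the Concrete-ML API (whose compilation pipeline handles this internally), only an assumed minimal model of the precision loss the summary mentions.

```python
# Illustrative sketch: symmetric per-tensor 4-bit quantization, the kind of
# preprocessing an FHE toolchain applies so ciphertext ops stay integer-only.
# Hypothetical helper names; not taken from Concrete-ML.

def quantize_4bit(values):
    """Map floats to signed 4-bit integers in [-8, 7] with a shared scale."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 7.0  # 7 = largest positive signed 4-bit value
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [qi * scale for qi in q]

weights = [0.12, -0.53, 0.98, -0.06, 0.31]
q, scale = quantize_4bit(weights)
approx = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, approx))
print(q)        # integer codes an FHE circuit would operate on
print(max_err)  # reconstruction error, bounded by scale / 2 here
```

Running more of the model under this scheme trades accuracy for privacy; the summary's ~96% figure suggests the loss can stay small on narrow evaluations, though broader benchmarks would be needed for production claims.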
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info