Falcon-Arabic-7B: 32k-context Arabic LLM built on Falcon 3 by TII
AI Impact Summary
Falcon-Arabic is a 7B-parameter multilingual model built on Falcon 3 and natively pretrained on Arabic, covering Modern Standard Arabic as well as regional dialects. Its tokenizer is extended with 32k Arabic tokens, and an embedding-initialization strategy maps each new token onto the embeddings of related existing tokens, which the release credits with faster convergence on sentiment, reasoning, and dialect-handling tasks. TII's evaluations claim Falcon-Arabic outperforms other Arabic LLMs in its size class, and even models up to four times larger, while its 32k-token context window (distinct from the 32k added vocabulary tokens) supports retrieval-augmented generation and long-form content tasks. The release also documents supervised fine-tuning and DPO-based alignment, signaling stronger instruction following for Arabic prompts and conversational correctness.
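Mapping new tokens to related embeddings is a known trick for vocabulary extension. The sketch below shows one common variant under stated assumptions: the Hugging Face checkpoint id, the placeholder Arabic tokens, and the mean-of-subword initialization are all illustrative choices, not details confirmed by the release.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id; the release does not name the exact base artifact.
BASE = "tiiuae/Falcon3-7B-Base"

old_tokenizer = AutoTokenizer.from_pretrained(BASE)  # pre-extension vocabulary
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Placeholder stand-ins for the ~32k Arabic pieces added to the vocabulary.
arabic_tokens = ["اللهجة", "المغربية", "تقنية"]

tokenizer.add_tokens(arabic_tokens)
model.resize_token_embeddings(len(tokenizer))

# Assumed strategy: initialize each new token's embedding as the mean of the
# embeddings of the subwords the original tokenizer splits it into, so new
# tokens start close to semantically related existing ones.
emb = model.get_input_embeddings().weight.data
with torch.no_grad():
    for tok in arabic_tokens:
        new_id = tokenizer.convert_tokens_to_ids(tok)
        sub_ids = old_tokenizer(tok, add_special_tokens=False)["input_ids"]
        if sub_ids:
            emb[new_id] = emb[sub_ids].mean(dim=0)
# If input and output embeddings are untied, the lm_head rows for the new
# tokens would need the same initialization.
```

Starting new tokens near the average of their old subword pieces, rather than at random, is what makes the faster-convergence claim plausible: early training only has to refine already-meaningful vectors.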
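The release mentions DPO-based alignment without further detail. As a hedged illustration only, here is a minimal DPO pass using the TRL library; the model id, hyperparameters, and toy Arabic preference pairs are all assumptions, not the team's actual recipe.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumed starting checkpoint; in practice DPO would follow the SFT stage.
MODEL = "tiiuae/Falcon3-7B-Base"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Toy preference pairs; the actual alignment data is not described in the release.
pairs = Dataset.from_dict({
    "prompt": ["ما هي عاصمة الإمارات؟"],        # "What is the capital of the UAE?"
    "chosen": ["عاصمة الإمارات هي أبوظبي."],   # preferred answer
    "rejected": ["لا أعرف."],                   # dispreferred answer
})

args = DPOConfig(
    output_dir="falcon-arabic-dpo",  # hypothetical output path
    beta=0.1,                        # assumed preference-strength coefficient
    per_device_train_batch_size=1,
)
trainer = DPOTrainer(model=model, args=args, train_dataset=pairs, processing_class=tokenizer)
trainer.train()
```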
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info