Arc Virtual Cell Challenge: ST and SE transformers for context-generalized CRISPR perturbation predictions
AI Impact Summary
Arc Institute's Virtual Cell Challenge provides a benchmark for context generalization in biology: it offers ~300k scRNA-seq profiles and a concrete two-model baseline for predicting the transcriptomic response to CRISPR perturbations in unseen cell types. The State Transition (ST) model uses a Llama backbone with encoders for covariate-matched control cells and for the perturbation, plus a decoder, and is trained with a Maximum Mean Discrepancy (MMD) loss so that the distribution of predicted perturbed cells matches the observed perturbed distribution. The State Embedding (SE) model is a BERT-like autoencoder whose gene embeddings are derived from ESM2 protein embeddings; it represents each cell by its top 2048 genes ranked by expression. Together they form a testbed for evaluating cross-cell-type generalization and end-to-end perturbation prediction. This could speed up in silico screening workflows for biotech teams, provided the models are validated against biological variability and batch effects.
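To make the Maximum Mean Discrepancy objective mentioned above more concrete, here is a minimal sketch of a squared-MMD estimate between two sets of cell profiles. It is not the challenge's or the ST model's implementation: the RBF kernel, the bandwidth of 2.0, the helper names (`rbf_kernel`, `mean_kernel`, `mmd_squared`, `sample_cells`), and the toy Gaussian "expression profiles" are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=2.0):
    """Gaussian (RBF) kernel between two equal-length vectors.

    The bandwidth is an arbitrary illustrative choice, not a tuned value.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=2.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=2.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def sample_cells(rng, n_cells, n_genes, shift):
    """Hypothetical toy profiles: unit-variance Gaussian noise around `shift`."""
    return [
        [rng.gauss(shift, 1.0) for _ in range(n_genes)]
        for _ in range(n_cells)
    ]


rng = random.Random(0)
observed = sample_cells(rng, n_cells=50, n_genes=5, shift=1.0)
good_prediction = sample_cells(rng, n_cells=50, n_genes=5, shift=1.0)
poor_prediction = sample_cells(rng, n_cells=50, n_genes=5, shift=0.0)

# A prediction drawn from the same distribution as the observed cells
# should yield a smaller MMD than one that misses the shift entirely.
mmd_good = mmd_squared(good_prediction, observed)
mmd_poor = mmd_squared(poor_prediction, observed)
print(f"MMD^2 (good prediction): {mmd_good:.4f}")
print(f"MMD^2 (poor prediction): {mmd_poor:.4f}")
```

Because MMD compares whole sets of cells rather than pairing individual cells, it suits single-cell data, where the same cell cannot be measured both before and after a perturbation. In an actual training loop the estimate would be computed on model outputs with an autodiff framework so it can be minimized as a loss.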
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info