Fine-tuning CLIP with satellite images and captions using the RSICD dataset
AI Impact Summary
This work documents domain-adaptive fine-tuning of OpenAI's CLIP model on remote-sensing imagery from the RSICD dataset, run on TPUs with Flax/JAX on Google Cloud. Pairing satellite images with multilingual captions and applying image and text augmentation improves image-text alignment over the baseline CLIP model, which enables more accurate zero-shot retrieval on geospatial datasets. The project demonstrates applicability across several remote-sensing datasets (RSICD, UCM, Sydney). It also highlights potential dual-use concerns for surveillance and climate-related monitoring, which underscores the need for deployment risk assessment and governance.
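As a rough illustration of what zero-shot retrieval means here, the sketch below ranks candidate captions for one image by cosine similarity between embeddings. It is not the project's code: the function names `cosine_similarity` and `rank_captions`, the caption strings, and the 3-dimensional toy vectors are all invented for illustration. In practice the embeddings would come from the image and text encoders of a CLIP model and would have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_embedding, caption_embeddings):
    """Return caption strings sorted from best to worst match."""
    scores = {
        caption: cosine_similarity(image_embedding, vector)
        for caption, vector in caption_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (hypothetical values, not model outputs).
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "an aerial photo of an airport": [0.8, 0.2, 0.1],
    "an aerial photo of a forest": [0.1, 0.9, 0.3],
    "an aerial photo of a beach": [0.2, 0.3, 0.9],
}

ranking = rank_captions(image_embedding, caption_embeddings)
print(ranking[0])
```

Fine-tuning on in-domain image-caption pairs aims to pull matching image and caption embeddings closer together under this kind of similarity score, so the correct caption is more likely to rank first.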
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info