LeMaterial launches LeMat-Bulk: 6.7M-entry unified materials dataset from Materials Project, Alexandria, and OQMD
AI Impact Summary
LeMaterial unifies Materials Project, Alexandria, and OQMD into LeMat-Bulk, a standardized dataset of 6.7M entries covering 7 material properties, built to accelerate ML-driven materials discovery. It introduces a material fingerprint hashing approach that deduplicates materials and links the same material across databases, enabling faster novelty detection and cleaner training data. The project is built with OPTIMADE, Crystal Toolkit, pymatgen, Dash, and Hugging Face tooling. It reduces the cost of integrating and wrangling data from multiple sources, and it releases the data openly under the CC-BY-4.0 license, which requires attribution. Planned additions on the roadmap, including r2SCAN data, surface datasets, and broader model coverage, point to a growing role in enterprise ML pipelines and cross-database research workflows.
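The summary does not describe how the fingerprint is computed. As a hypothetical illustration only, the sketch below hashes a deliberately simplified key (reduced formula plus space-group number) with SHA-256, groups entries that share a hash, and treats an unseen hash as "novel". The function names, entry fields, and IDs are invented for this example; they are not LeMaterial's actual fingerprint or API.

```python
import hashlib
from collections import defaultdict
from functools import reduce
from math import gcd


def reduced_formula(composition: dict[str, int]) -> str:
    """Divide integer element counts by their GCD; list elements alphabetically."""
    divisor = reduce(gcd, composition.values())
    parts = []
    for element, count in sorted(composition.items()):
        n = count // divisor
        parts.append(element if n == 1 else f"{element}{n}")
    return "".join(parts)


def fingerprint(composition: dict[str, int], space_group: int) -> str:
    """Hypothetical fingerprint: SHA-256 of 'reduced formula|space group'."""
    key = f"{reduced_formula(composition)}|{space_group}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(entries: list[dict]) -> dict[str, list[str]]:
    """Group entry IDs from any source database by shared fingerprint."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        groups[fingerprint(entry["composition"], entry["space_group"])].append(entry["id"])
    return dict(groups)


def is_novel(entry: dict, known: set[str]) -> bool:
    """A candidate counts as novel if its fingerprint is not already known."""
    return fingerprint(entry["composition"], entry["space_group"]) not in known


# Hypothetical entries standing in for records from three source databases.
entries = [
    {"id": "db-a/001", "composition": {"Na": 4, "Cl": 4}, "space_group": 225},
    {"id": "db-b/017", "composition": {"Cl": 1, "Na": 1}, "space_group": 225},
    {"id": "db-c/042", "composition": {"Na": 1, "Cl": 1}, "space_group": 221},
]

groups = deduplicate(entries)
for fp, ids in groups.items():
    print(fp[:12], ids)

candidate = {"id": "new/1", "composition": {"Fe": 2, "O": 3}, "space_group": 167}
print(is_novel(candidate, set(groups)))  # True: no known entry shares its fingerprint
```

Here the first two records collapse into one group even though their raw compositions are written differently. Because this toy key ignores atomic positions, distinct polymorphs that share a space group would collide. A practical fingerprint would also need to encode structural information.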
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info