LeMaterial launches LeMat-Bulk: 6.7M-entry harmonized materials dataset from Materials Project, Alexandria, and OQMD
AI Impact Summary
LeMaterial has announced an open-source initiative that unifies three major materials databases (Materials Project, Alexandria, and OQMD) into LeMat-Bulk, a 6.7-million-entry dataset with standardized fields and seven material properties, intended to streamline ML model training and large-scale screening. The dataset uses the OPTIMADE standard for data standardization and introduces a material fingerprinting approach that deduplicates and links equivalent materials across the source databases. This can accelerate discovery, but adopters need to integrate it carefully with existing pipelines and check licensing terms. The roadmap lists additional data (e.g., charge data, r2SCAN, OC20/OC22), new models (EquiformerV2, FAENet), and tooling for similarity retrieval, so downstream systems should expect evolving schemas and hash-based identifiers, and must provide attribution under CC-BY-4.0.
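To make the idea of hash-based deduplication concrete, here is a minimal Python sketch. It is not LeMaterial's fingerprinting algorithm, which the summary does not describe. It assumes a deliberately simplified fingerprint (reduced composition plus space group number), and the entry IDs, field names, and the `toy_fingerprint` helper are all hypothetical.

```python
import hashlib
from collections import defaultdict
from functools import reduce
from math import gcd


def reduced_composition(species_counts):
    """Divide element counts by their GCD and sort by element symbol."""
    divisor = reduce(gcd, species_counts.values())
    return tuple(sorted((el, n // divisor) for el, n in species_counts.items()))


def toy_fingerprint(entry):
    """Hash reduced composition plus space group number.

    Illustrative assumption only: a real fingerprint would need to capture
    far more structural information than these two fields.
    """
    key = f"{reduced_composition(entry['composition'])}|{entry['spacegroup']}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def group_by_fingerprint(entries):
    """Map each fingerprint to the IDs of all entries that share it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[toy_fingerprint(entry)].append(entry["id"])
    return dict(groups)


# Hypothetical entries standing in for records from different databases.
entries = [
    {"id": "mp-hypothetical-1", "composition": {"Fe": 2, "O": 3}, "spacegroup": 167},
    {"id": "oqmd-hypothetical-7", "composition": {"Fe": 4, "O": 6}, "spacegroup": 167},
    {"id": "agm-hypothetical-3", "composition": {"Na": 1, "Cl": 1}, "spacegroup": 225},
]

linked = group_by_fingerprint(entries)
for fingerprint, ids in linked.items():
    print(fingerprint, ids)
```

In this toy setup the first two entries collapse to one fingerprint because their compositions reduce to the same ratio and they share a space group. A pipeline that consumes such identifiers would store the hash alongside the original database IDs so that links survive if the hashing scheme changes.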
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info