This dataset can be downloaded from this link and contains 561,888 materials. Its format is described here. The original data is available on the OQMD website.
Click the Colab link above to open a Colab notebook with a data loading tutorial.
The procedure used to create this dataset is described here.
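There is an obviously abnormal entry in this dataset: its `nelements` value of 2 contradicts both its single-element formula (Mg) and its single site (`nsites` = 1).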
index | name | formula | spacegroup | nelements | nsites |
---|---|---|---|---|---|
277145 | oqmd-753381 | Mg | 10 | 2 | 1 |
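Entries like this one can be flagged automatically by comparing `nelements` with the number of distinct element symbols in `formula`, and with `nsites`. The following is a minimal sketch that assumes each entry is available as a Python dict keyed by the table's column names (`index`, `formula`, `nelements`, `nsites`). The actual data format may differ, `find_inconsistent` is a hypothetical helper rather than part of the dataset's tooling, and the formula parsing only handles simple formulas without parentheses.

```python
import re

# Element symbols: one uppercase letter, optionally followed by one lowercase letter.
SYMBOL_RE = re.compile(r"[A-Z][a-z]?")


def find_inconsistent(entries):
    """Yield (index, reasons) for entries whose element/site counts are contradictory.

    Hypothetical helper: assumes each entry is a dict with the keys
    'index', 'formula', 'nelements' and 'nsites' (the table columns above).
    """
    for entry in entries:
        # Distinct element symbols in a simple formula such as "Fe2O3" (no parentheses)
        n_symbols = len(set(SYMBOL_RE.findall(entry["formula"])))
        reasons = []
        if entry["nelements"] != n_symbols:
            reasons.append(
                f"nelements={entry['nelements']} but formula {entry['formula']!r} "
                f"contains {n_symbols} element(s)"
            )
        if entry["nelements"] > entry["nsites"]:
            # A structure cannot contain more distinct elements than it has sites
            reasons.append(f"nelements={entry['nelements']} exceeds nsites={entry['nsites']}")
        if reasons:
            yield entry["index"], reasons


# The abnormal row from the table, plus a hypothetical consistent entry for comparison
entries = [
    {"index": 277145, "formula": "Mg", "nelements": 2, "nsites": 1},
    {"index": 0, "formula": "NaCl", "nelements": 2, "nsites": 2},
]
for index, reasons in find_inconsistent(entries):
    print(index, reasons)
```

Only the Mg row is reported here, and it fails both checks. Running such a scan over the full dataset would show whether any other entries share the same kind of inconsistency.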
The abnormal entry originates in the OQMD itself. Its calculation result can be viewed at this link on the online database (based on OQMD v1.5 as of April 19, 2022). You can remove this entry by modifying the split file as follows:
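```python
import json

# Load the split file (a JSON object mapping split names to lists of entry indices)
with open('split.json') as f:
    split = json.load(f)

# Drop the abnormal entry (index 277145) from the training split
split['train'].remove(277145)

# Overwrite the split file with the updated indices
with open('split.json', 'w') as f:
    json.dump(split, f)
```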
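After editing, it may be worth confirming that the index is gone and that the splits still do not overlap. This is a minimal sketch assuming, as the removal snippet implies, that `split.json` is a JSON object mapping split names (such as `train`) to lists of integer indices; `check_split` is a hypothetical helper.

```python
import json


def check_split(path, removed_index):
    """Check a split file mapping split names to lists of integer indices.

    Hypothetical helper: raises ValueError if `removed_index` is still present
    or if two splits share an index; otherwise returns the size of each split.
    """
    with open(path) as f:
        split = json.load(f)

    seen = set()
    for name, indices in split.items():
        if removed_index in indices:
            raise ValueError(f"index {removed_index} is still in split {name!r}")
        # Each index should belong to at most one split
        overlap = seen.intersection(indices)
        if overlap:
            raise ValueError(f"split {name!r} shares {len(overlap)} indices with another split")
        seen.update(indices)
    return {name: len(indices) for name, indices in split.items()}
```

For example, `check_split('split.json', 277145)` returns the number of entries in each split once the removal has been saved, and raises an error if the entry is still present.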
As of December 2022, the space groups of 15 entries have been corrected; the corrections are listed at this link. These incorrect determinations were uncovered after updating Spglib (https://spglib.github.io/spglib/).
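If you work with a local copy of the metadata, the corrections could be applied with something like the sketch below. It assumes each entry is a dict with `index` and `spacegroup` keys and that the corrections are available as a mapping from entry index to the corrected space group number; both are hypothetical formats, and the list at the link above may be structured differently. `apply_spacegroup_corrections` is likewise a hypothetical helper.

```python
def apply_spacegroup_corrections(entries, corrections):
    """Return new entries with corrected space group numbers.

    Hypothetical helper: `entries` is a list of dicts with 'index' and
    'spacegroup' keys; `corrections` maps an entry index to its corrected
    space group number. Neither the input list nor its dicts are modified.
    """
    updated = []
    for entry in entries:
        new_sg = corrections.get(entry["index"])
        if new_sg is not None:
            # International space group numbers run from 1 to 230
            if not 1 <= new_sg <= 230:
                raise ValueError(f"invalid space group {new_sg} for index {entry['index']}")
            # Copy the dict so the caller's data stays untouched
            entry = {**entry, "spacegroup": new_sg}
        updated.append(entry)
    return updated
```

Returning a new list rather than editing entries in place keeps the original values available for comparison.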