diff --git a/README.md b/README.md
index 5c9a151..d92c9c1 100644
--- a/README.md
+++ b/README.md
@@ -3,14 +3,13 @@
 This repository contains scripts and data files used in the creation of the SPICE dataset. It does not contain the
 dataset itself. That is available from Zenodo:
 
-[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8222043.svg)](https://doi.org/10.5281/zenodo.8222043)
-
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10835749.svg)](https://doi.org/10.5281/zenodo.10835749)
 
 SPICE (Small-Molecule/Protein Interaction Chemical Energies) is a collection of quantum mechanical data for training
 potential functions. The emphasis is particularly on simulating drug-like small molecules interacting
 with proteins. It is designed to achieve the following goals.
 
-- **Cover a wide range of chemical space**. It includes 15 elements (H, Li, C, N, O, F, Na, Mg, P, S, Cl, K, Ca, Br, I)
+- **Cover a wide range of chemical space**. It includes 17 elements (H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, I)
   and a wide range of chemical groups. It includes charged and polar molecules as well as neutral ones. It is designed
   to sample a wide range of both covalent and non-covalent interactions.
 - **Cover a wide range of conformations**. It includes both low and high energy conformations. It is
@@ -29,27 +28,33 @@ with proteins. It is designed to achieve the following goals.
   public domain equivalent [CC0 license](https://creativecommons.org/share-your-work/public-domain/cc0/).
 
 SPICE is made up of a collection of subsets. Each one is designed to provide a particular type of information.
-They include the following.
+The subsets in the current version (2.0) include the following.
 
 - **Dipeptides**. These provide comprehensive sampling of the covalent interactions found in proteins.
 - **Solvated amino acids**. These provide sampling of protein-water and water-water interactions.
 - **PubChem molecules**. These sample a very wide variety of drug-like small molecules.
+- **Solvated PubChem molecules**. These provide sampling of ligand-water interactions.
 - **Monomer and dimer structures from [DES370K](https://www.nature.com/articles/s41597-021-00833-x)**. These provide
   sampling of a wide variety of non-covalent interactions.
+- **Amino acid, ligand pairs**. These provide sampling of nonbonded protein-ligand interactions.
 - **Ion pairs**. These provide further sampling of Coulomb interactions over a range of distances.
+- **Water clusters**. These provide additional sampling of water-water interactions.
 
 This table summarizes the content of each subset: the number of molecules/clusters it contains, the total number of
 conformations, the range of sizes spanned by the molecules/clusters, and the list of elements that appear in the subset.
 
-|Subset|Molecules|Conformations|Atoms|Elements|
-|---|---|---|---|---|
-|Dipeptides|677|33850|26–60|H, C, N, O, S|
+|Subset|Molecules/Clusters|Conformations|Atoms|Elements|
+|------|------------------|-------------|-----|--------|
+|Dipeptides|677|33,850|26–60|H, C, N, O, S|
 |Solvated Amino Acids|26|1300|79–96|H, C, N, O, S|
-|DES370K Dimers|3490|345676|2–34|H, Li, C, N, O, F, Na, Mg, P, S, Cl, K, Ca, Br, I|
-|DES370K Monomers|374|18700|3–22|H, C, N, O, F, P, S, Cl, Br, I|
-|PubChem|14643|731856|3–50|H, C, N, O, F, P, S, Cl, Br, I|
+|DES370K Dimers|3490|345,676|2–34|H, Li, C, N, O, F, Na, Mg, P, S, Cl, K, Ca, Br, I|
+|DES370K Monomers|374|18,700|3–22|H, C, N, O, F, P, S, Cl, Br, I|
+|PubChem|28,039|1,398,566|3–50|H, B, C, N, O, F, Si, P, S, Cl, Br, I|
+|Solvated PubChem|1397|13,934|63–110|H, C, N, O, F, P, S, Cl, Br, I|
+|Amino Acid Ligand Pairs|79,967|194,174|24–72|H, C, N, O, F, P, S, Cl, Br, I|
 |Ion Pairs|28|1426|2|Li, F, Na, Cl, K, Br, I|
-|Total|19238|1132808|2–96|H, Li, C, N, O, F, Na, Mg, P, S, Cl, K, Ca, Br, I|
+|Water Clusters|1|1000|90|H, O|
+|Total|113,999|2,008,628|2–110|H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, I|
 
 ## Citing The Dataset