Updated README for 2.0 (#97)
peastman authored Mar 19, 2024
1 parent 4aea5f7 commit d859cc3
Showing 1 changed file with 16 additions and 11 deletions.
27 changes: 16 additions & 11 deletions README.md
@@ -3,14 +3,13 @@
 This repository contains scripts and data files used in the creation of the SPICE dataset. It does not contain the
 dataset itself. That is available from Zenodo:

-[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8222043.svg)](https://doi.org/10.5281/zenodo.8222043)
-
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10835749.svg)](https://doi.org/10.5281/zenodo.10835749)

 SPICE (Small-Molecule/Protein Interaction Chemical Energies) is a collection of quantum mechanical data for
 training potential functions. The emphasis is particularly on simulating drug-like small molecules interacting
 with proteins. It is designed to achieve the following goals.

-- **Cover a wide range of chemical space**. It includes 15 elements (H, Li, C, N, O, F, Na, Mg, P, S, Cl, K, Ca, Br, I)
+- **Cover a wide range of chemical space**. It includes 17 elements (H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, I)
 and a wide range of chemical groups. It includes charged and polar molecules as well as neutral ones. It is
 designed to sample a wide range of both covalent and non-covalent interactions.
 - **Cover a wide range of conformations**. It includes both low and high energy conformations. It is
@@ -29,27 +28,33 @@ with proteins. It is designed to achieve the following goals.
 public domain equivalent [CC0 license](https://creativecommons.org/share-your-work/public-domain/cc0/).

 SPICE is made up of a collection of subsets. Each one is designed to provide a particular type of information.
-They include the following.
+The subsets in the current version (2.0) include the following.

 - **Dipeptides**. These provide comprehensive sampling of the covalent interactions found in proteins.
 - **Solvated amino acids**. These provide sampling of protein-water and water-water interactions.
 - **PubChem molecules**. These sample a very wide variety of drug-like small molecules.
+- **Solvated PubChem molecules**. These provide sampling of ligand-water interactions.
 - **Monomer and dimer structures from [DES370K](https://www.nature.com/articles/s41597-021-00833-x)**.
 These provide sampling of a wide variety of non-covalent interactions.
+- **Amino acid, ligand pairs**. These provide sampling of nonbonded protein-ligand interactions.
 - **Ion pairs**. These provide further sampling of Coulomb interactions over a range of distances.
+- **Water clusters**. These provide additional sampling of water-water interactions.

 This table summarizes the content of each subset: the number of molecules/clusters it contains, the total number of
 conformations, the range of sizes spanned by the molecules/clusters, and the list of elements that appear in the subset.

-|Subset|Molecules|Conformations|Atoms|Elements|
-|---|---|---|---|---|
-|Dipeptides|677|33850|26–60|H, C, N, O, S|
+|Subset|Molecules/Clusters|Conformations|Atoms|Elements|
+|------|------------------|-------------|-----|--------|
+|Dipeptides|677|33,850|26–60|H, C, N, O, S|
 |Solvated Amino Acids|26|1300|79–96|H, C, N, O, S|
-|DES370K Dimers|3490|345676|2–34|H, Li, C, N, O, F, Na, Mg, P, S, Cl, K, Ca, Br, I|
-|DES370K Monomers|374|18700|3–22|H, C, N, O, F, P, S, Cl, Br, I|
-|PubChem|14643|731856|3–50|H, C, N, O, F, P, S, Cl, Br, I|
+|DES370K Dimers|3490|345,676|2–34|H, Li, C, N, O, F, Na, Mg, P, S, Cl, K, Ca, Br, I|
+|DES370K Monomers|374|18,700|3–22|H, C, N, O, F, P, S, Cl, Br, I|
+|PubChem|28,039|1,398,566|3–50|H, B, C, N, O, F, Si, P, S, Cl, Br, I|
+|Solvated PubChem|1397|13,934|63–110|H, C, N, O, F, P, S, Cl, Br, I|
+|Amino Acid Ligand Pairs|79,967|194,174|24–72|H, C, N, O, F, P, S, Cl, Br, I|
 |Ion Pairs|28|1426|2|Li, F, Na, Cl, K, Br, I|
-|Total|19238|1132808|2–96|H, Li, C, N, O, F, Na, Mg, P, S, Cl, K, Ca, Br, I|
+|Water Clusters|1|1000|90|H, O|
+|Total|113,999|2,008,628|2–110|H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, I|
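
The Total row of the 2.0 table can be cross-checked against the per-subset rows. The following is a minimal sketch, not part of the repository: the names `ROWS`, `STATED_TOTAL`, `parse_elements`, and `summarize` are hypothetical, and the figures are simply transcribed from the table rows shown here. With these figures, the Molecules/Clusters column sums to the stated 113,999 and the union of the element lists gives the stated 17 elements. The Conformations column as transcribed sums to 2,008,626, which is 2 fewer than the stated 2,008,628; nothing in this diff indicates where that difference comes from.

```python
# Rows transcribed from the 2.0 table in this diff:
# (subset, molecules/clusters, conformations, elements).
ROWS = [
    ("Dipeptides", 677, 33_850, "H, C, N, O, S"),
    ("Solvated Amino Acids", 26, 1_300, "H, C, N, O, S"),
    ("DES370K Dimers", 3_490, 345_676,
     "H, Li, C, N, O, F, Na, Mg, P, S, Cl, K, Ca, Br, I"),
    ("DES370K Monomers", 374, 18_700, "H, C, N, O, F, P, S, Cl, Br, I"),
    ("PubChem", 28_039, 1_398_566, "H, B, C, N, O, F, Si, P, S, Cl, Br, I"),
    ("Solvated PubChem", 1_397, 13_934, "H, C, N, O, F, P, S, Cl, Br, I"),
    ("Amino Acid Ligand Pairs", 79_967, 194_174,
     "H, C, N, O, F, P, S, Cl, Br, I"),
    ("Ion Pairs", 28, 1_426, "Li, F, Na, Cl, K, Br, I"),
    ("Water Clusters", 1, 1_000, "H, O"),
]

# The Total row exactly as stated in the table.
STATED_TOTAL = (
    113_999,
    2_008_628,
    "H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, I",
)


def parse_elements(text):
    """Split a comma-separated element list into a set of symbols."""
    return {symbol.strip() for symbol in text.split(",")}


def summarize(rows):
    """Return (molecule sum, conformation sum, union of elements) for the rows."""
    molecules = sum(row[1] for row in rows)
    conformations = sum(row[2] for row in rows)
    elements = set().union(*(parse_elements(row[3]) for row in rows))
    return molecules, conformations, elements


if __name__ == "__main__":
    molecules, conformations, elements = summarize(ROWS)
    print("molecules/clusters:", molecules, "stated:", STATED_TOTAL[0])
    print("conformations:", conformations, "stated:", STATED_TOTAL[1])
    print("elements:", len(elements),
          "match stated list:", elements == parse_elements(STATED_TOTAL[2]))
```

Running the script prints each computed sum next to the stated value, so any mismatch is visible without the check failing outright.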

## Citing The Dataset
