NeuralMD: A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics
Authors: Shengchao Liu*, Weitao Du*, Hannan Xu, Yanjing Li, Zhuoxinran Li, Vignesh Bhethanabotla, Divin Yan, Christian Borgs*, Anima Anandkumar*, Hongyu Guo*, Jennifer Chayes*
[Project Page] [ArXiv] [Datasets on HuggingFace] [Checkpoints on HuggingFace]
Set up Anaconda
wget https://repo.continuum.io/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-2019.10-Linux-x86_64.sh -b
export PATH=$PWD/anaconda3/bin:$PATH
Start with some basic packages.
conda create -n Geom3D python=3.9
conda activate Geom3D
conda install -y numpy networkx scikit-learn
conda install -y -c conda-forge rdkit
conda install -y pytorch==2.2 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y -c pyg -c conda-forge pyg=2.5
conda install -y -c pyg pytorch-scatter
conda install -y -c pyg pytorch-sparse
conda install -y -c pyg pytorch-cluster
pip install ogb==1.2.1
pip install sympy
pip install ase
pip install lie_learn # for TFN and SE3-Trans
pip install packaging # for SEGNN
pip3 install e3nn # for SEGNN
pip install transformers # for smiles
pip install selfies # for selfies
pip install atom3d # for Atom3D
pip install cffi # for Atom3D
pip install biopython # for Atom3D
pip install cython # for pyximport
conda install -y -c conda-forge py-xgboost-cpu # for XGB
pip install pymatgen # for CIF loading
pip install h5py
pip install torch-ema
git clone git@github.com:chao1224/torchdiffeq.git
cd torchdiffeq
pip install MDAnalysis
pip install -e .
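After running the installation commands above, a quick sanity check can confirm that the main packages are visible to the active environment. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` and the list `REQUIRED_MODULES` are hypothetical, and the import names are assumed from the usual conventions of each package (for example, scikit-learn imports as `sklearn`).

```python
from importlib.util import find_spec

# Import names assumed for the packages installed above; pip/conda package
# names and import names can differ (e.g. scikit-learn -> sklearn).
REQUIRED_MODULES = [
    "numpy", "networkx", "sklearn", "rdkit", "torch", "torch_geometric",
    "torch_scatter", "torch_sparse", "torch_cluster", "ogb", "e3nn",
    "h5py", "torch_ema", "MDAnalysis", "torchdiffeq",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

The check only locates each module without importing it, so it does not verify versions or CUDA availability.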
We provide two ways to obtain the datasets for MISATO.
- We provide the script under data/MISATO to generate two sub-datasets; check data/README.md for more details.
- You can download the datasets from Zenodo and HuggingFace directly.
2.1. You can download the MISATO MD.hdf5 data from the Zenodo link, or use the following command:
wget -O data/MD/h5_files/MD.hdf5 https://zenodo.org/record/7711953/files/MD.hdf5
2.2. Then download the datasets from the HuggingFace link we provide.
The data folder structure looks like the following:
.
`-- MISATO_1000
| `-- raw
| | `-- train_MD.txt
| | `-- test_MD.txt
| | `-- MD.hdf5
| | `-- val_MD.txt
`-- MISATO
| `-- raw
| | `-- train_MD.txt
| | `-- test_MD.txt
| | `-- MD.hdf5
| | `-- val_MD.txt
`-- README.md
`-- MISATO_100
| `-- raw
| | `-- train_MD.txt
| | `-- test_MD.txt
| | `-- MD.hdf5
| | `-- val_MD.txt
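To confirm that a local copy matches the layout above, a short script can list any expected files that are absent. This is a minimal sketch under stated assumptions: the helper `missing_raw_files` is hypothetical, and the root folder name `data` is assumed from the wget target used earlier; adjust it to wherever the three sub-dataset folders live.

```python
from pathlib import Path

# Sub-dataset folders and raw files taken from the tree above.
SUBSETS = ("MISATO", "MISATO_100", "MISATO_1000")
RAW_FILES = ("train_MD.txt", "val_MD.txt", "test_MD.txt", "MD.hdf5")


def missing_raw_files(data_root, subsets=SUBSETS):
    """Return relative paths of expected raw files that do not exist."""
    root = Path(data_root)
    return [
        str(Path(subset) / "raw" / name)
        for subset in subsets
        for name in RAW_FILES
        if not (root / subset / "raw" / name).is_file()
    ]


if __name__ == "__main__":
    # "data" is an assumed location, not something the layout above fixes.
    for path in missing_raw_files("data"):
        print("missing:", path)
```

An empty result means every sub-dataset has its three split files and its MD.hdf5 in place.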
Please check examples for semi-flexible binding experiments.
We have two types of tasks:
- multi_traj
- single_traj
and four ML methods:
- VerletMD
- GNNMD
- DenoisinLD
- NeuralMD
  - --NeuralMD_binding_model=NeuralMD_Binding01 for NeuralMD ODE
  - --NeuralMD_binding_model=NeuralMD_Binding02 or --NeuralMD_binding_model=NeuralMD_Binding04 for NeuralMD SDE
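The mapping between the NeuralMD variants and the --NeuralMD_binding_model flag listed above can be captured in a small lookup, which is handy when scripting several runs. This is only an illustrative sketch: `BINDING_MODELS` and `binding_model_flags` are hypothetical names, and the training script that consumes the flag is not shown here because its name is not given in this section.

```python
# Variant -> binding model names, as listed above.
BINDING_MODELS = {
    "ode": ("NeuralMD_Binding01",),
    "sde": ("NeuralMD_Binding02", "NeuralMD_Binding04"),
}


def binding_model_flags(variant):
    """Return the candidate --NeuralMD_binding_model flags for a variant."""
    key = variant.lower()
    if key not in BINDING_MODELS:
        raise ValueError(f"unknown variant {variant!r}; expected 'ode' or 'sde'")
    return [f"--NeuralMD_binding_model={name}" for name in BINDING_MODELS[key]]


if __name__ == "__main__":
    for variant in ("ode", "sde"):
        print(variant, binding_model_flags(variant))
```

For the SDE variant the helper returns both listed options, leaving the choice between them to the caller.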
We provide the optimal checkpoints and corresponding hyperparameters at this HuggingFace link.
Feel free to cite this work if you find it useful to you!
@article{liu2024NeuralMD,
title={A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics},
author={Liu, Shengchao* and Du, Weitao* and Xu, Hannan and Li, Yanjing and Li, Zhuoxinran and Bhethanabotla, Vignesh and Liang, Yan and Borgs, Christian* and Anandkumar, Anima* and Guo, Hongyu* and Chayes, Jennifer*},
journal={arXiv preprint arXiv:2401.15122},
year={2024}
}