Documentation coming soon!
Poster: Learning Meaningful Representations of Life
This repository integrates a number of codebases into a protein-protein docking system.
- Evolutionary Scale Modeling
- Atom3D
- Invariant Point Attention (IPA)
- SidechainNet
- Massively Parallel Natural Extension of Reference Frame (MPNerf)
IPA and MPNerf are included as submodules because they have been modified from their original versions.
The script data_preprocess.py sets up the data for training. It performs the following steps:
- Download LMDBs from Atom3D
- Map LMDBs to separated PDB chains
- Parse PDBs into SidechainNet format and unify the data into single files
- Extract sequences from the PDBs and encode them with ESM
- Wrap dataset into single SidechainNet Dataset format
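
To make the chain-separation and sequence-extraction steps above more concrete, here is a minimal sketch in plain Python. It is not the code in data_preprocess.py: the function names `split_chains` and `chain_sequence` are hypothetical, and it relies only on the fixed-column PDB layout (atom name in columns 13-16, residue name in columns 18-20, chain ID in column 22). The actual pipeline uses Atom3D, SidechainNet, and ESM for these steps.

```python
from collections import defaultdict

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def split_chains(pdb_text):
    """Group ATOM records by chain ID (hypothetical helper).

    The chain ID sits in column 22 of a PDB record, i.e. index 21.
    """
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) > 21:
            chains[line[21]].append(line)
    return dict(chains)


def chain_sequence(atom_lines):
    """Build a one-letter sequence from the CA atoms of one chain.

    Residues outside the 20 standard amino acids are written as "X".
    """
    sequence = []
    for line in atom_lines:
        if line[12:16].strip() == "CA":  # one alpha carbon per residue
            sequence.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(sequence)
```

Calling `split_chains` on the text of a PDB file returns a dictionary that maps each chain ID to its ATOM lines, and `chain_sequence` turns one of those lists into the string that would then be passed to a sequence encoder such as ESM.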