-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support openfe 0.14 #13
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #13 +/- ##
=======================================
Coverage ? 92.10%
=======================================
Files ? 7
Lines ? 418
Branches ? 0
=======================================
Hits ? 385
Misses ? 33
Partials ? 0 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments, otherwise this looks good to me.
# due to issues with partial charge generation in ambertools | ||
# we default to using the input conformer for charge generation | ||
off_mol.assign_partial_charges( | ||
'am1bcc', use_conformers=off_mol.conformers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll raise an issue, once we've got OpenFreeEnergy/openfe#598 sorted out, we should switch this to using a settings-selected backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that sounds great, having this option is indeed useful. Thanks
# Dict[SmallMoleculeComponent, openff.toolkit.Molecule] | ||
state_a_small_mols = {component: component.to_openff() for component in state_a.components.values() if | ||
isinstance(component, SmallMoleculeComponent)} | ||
state_b_small_mols = {component: component.to_openff() for component in state_b.components.values() if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct in understanding here that "common" small molecules between the two states are effectively being duplicated here?
If so there are two considerations:
- Given the complexity of some spectator molecules, this might add a bit of an overhead in duplicating partial charge generation. It's not the worst, but it's enough that if you have a multi-ring molecule you might find yourself idling for a little bit on what should be a short simulation.
- Because we're seeing a lack of consistency between repeats of a partial charge generation (i.e. calling antechamber multiple times on the same inputs), there is a small risk that you could end up with non-transforming molecules that have different partial charges. This is a bit rarer, and eventually we'll have a fix, but it is still something that is likely to persist for a bit until we can fix things upstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that makes a lot of sense, and I agree. We probably want to avoid having to run into those issues. That explains why using the Mapping
object here could be a better choice, just like you did on the openfe side.
That approach works, but it seems a bit hard to read/understand and might be prone to errors in the future. Wouldn't using a set
work here? Since the set operations should be okay to be used with the openff Molecule
objects. For example as in:
In [11]: mol_list = [benzene]*3 + [hexane]*4 + [cyclohexane]*7
In [12]: mol_list
Out[12]:
[Molecule with name '' and SMILES '[H][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H]',
Molecule with name '' and SMILES '[H][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H]',
Molecule with name '' and SMILES '[H][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H]',
Molecule with name '' and SMILES '[H][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[H]',
Molecule with name '' and SMILES '[H][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[H]',
Molecule with name '' and SMILES '[H][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[H]',
Molecule with name '' and SMILES '[H][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[H]',
Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]']
In [13]: len(mol_list)
Out[13]: 14
In [14]: mol_set = set(mol_list)
In [15]: len(mol_set)
Out[15]: 3
That way, we can iterate through the set to parametrize the molecules and guarantee that they will only be passed once. I hope that makes sense. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see the problem now, that doesn't work because there can be instances of the same molecule in different locations and/or with different conformers, and this would end up parametrizing only one of them. I think using the Mapping
object is the way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good @ijpulidos! Just a couple notes to follow up on; after that I think this is good to go if that's all that's needed to make feflow
openfe
0.14.0 compatible.
system_generator.create_system(off_mol.to_topology().to_openmm(), | ||
molecules=[off_mol]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarification question: are we just running create_system
here to make sure that the charged molecules can be parameterized by OpenMM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for the solvation step a bit later on, if you don't do this here you don't have necessary topology info registered to solvate your system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the topology info get registered? Looking at the definition of SystemGenerator.create_system, it's not clear to me that this method has any side effects or stores any state on the SystemGenerator
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry wrong word, not topology
but forcefield
- which you need to pass to addSolvent
.
It's been a long while, but my rough recollection is that by calling create_system
you update the TemplateGenerator which is hooked up to the app.ForceField
object, so that it remains in "template memory" the next time around.
Note: we could probably reduce the footprint to a single system_generator here by moving this out of the loop and passing list(chain(all_alchemical_mols.values(), common_small_mols.values()))
as the input to molecules
. I don't think it'll have significant performance differences, but it might be a tad bit cleaner?
|
||
# Assign charges if unassigned -- more info: Openfe issue #576 | ||
for off_mol in chain(state_a_small_mols.values(), state_b_small_mols.values()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Assign charges if unassigned -- more info: Openfe issue #576 | |
for off_mol in chain(state_a_small_mols.values(), state_b_small_mols.values()): | |
combined_small_mols = state_a_small_mols | state_b_small_mols | |
# Assign charges if unassigned -- more info: Openfe issue #576 | |
for off_mol in combined_small_mols.values(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're relying on the pre-existance of partial charges as the way to ensure that we then don't re-generate partial charges later.
Depending on how OFF molecule equality works, wouldn't doing this end up with some molecules without partial charges in some of the two object lists?
Just to clarify a few of the changes made. As far as I can tell, the equality between OFF molecules could mean that two molecules with different conformers (or in different locations) could be thought as equal, since it only cares about the cheminformatics, not about the structure. Therefore, I agree with @IAlibay, I think we need to pass all the off mols to the partial charge generation. If we try merging dictionaries between the two states we could miss some of them, as far as I can see. On the other hand, I tried to consistently use the typing This is a big lengthy but I hope it helps clarifying the changes I made. |
I just realized that we need a test system that has small molecules (common small molecules) other than the alchemical molecules to test the added functionality. @IAlibay how is this being tested on the repex protocol? |
@ijpulidos there's a couple of approaches you can take. Protein-ligand complexes: we've been using EG5 and pfkfb3 (the latter more in manual tests rather than CI). Those have cofactors you can pass through. Host-guest systems: those are by default a non alchemical SMC + an alchemical one. Although they take a while longer to charge and probably aren't amazing for CI unless you have charges ahead of time. |
We also have this fictitious system with a bunch of benzene modifications that have been shifted around: https://github.com/OpenFreeEnergy/openfe/blob/main/openfe/tests/protocols/conftest.py#L165 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! - as you mentioned a test case with multiple ligands would be good, but otherwise this seems good
These changes support the latest 0.14 release from openfe and charge transformations.
It creates a dictionary for the small molecules with the following structure
Dict[SmallMoleculeComponent, openff.toolkit.Molecule]
for bothstate_a
andstate_b
small molecules. Which can readily be used with the API changes in thesystem_creation.get_omm_modeller
function from openfe 0.14Resolves #11