Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support openfe 0.14 #13

Merged
merged 8 commits into from
Dec 7, 2023
Merged

Support openfe 0.14 #13

merged 8 commits into from
Dec 7, 2023

Conversation

ijpulidos
Copy link
Contributor

@ijpulidos ijpulidos commented Nov 28, 2023

These changes support the latest 0.14 release from openfe and charge transformations.

It creates a dictionary for the small molecules with the following structure Dict[SmallMoleculeComponent, openff.toolkit.Molecule] for both state_a and state_b small molecules. Which can readily be used with the API changes in the system_creation.get_omm_modeller function from openfe 0.14

Resolves #11

@codecov-commenter
Copy link

codecov-commenter commented Nov 28, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (main@40c966b). Click here to learn what that means.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #13   +/-   ##
=======================================
  Coverage        ?   92.10%           
=======================================
  Files           ?        7           
  Lines           ?      418           
  Branches        ?        0           
=======================================
  Hits            ?      385           
  Misses          ?       33           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ijpulidos ijpulidos requested review from dotsdl and IAlibay November 28, 2023 16:54
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, otherwise this looks good to me.

# due to issues with partial charge generation in ambertools
# we default to using the input conformer for charge generation
off_mol.assign_partial_charges(
'am1bcc', use_conformers=off_mol.conformers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll raise an issue, once we've got OpenFreeEnergy/openfe#598 sorted out, we should switch this to using a settings-selected backend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that sounds great, having this option is indeed useful. Thanks

# Dict[SmallMoleculeComponent, openff.toolkit.Molecule]
state_a_small_mols = {component: component.to_openff() for component in state_a.components.values() if
isinstance(component, SmallMoleculeComponent)}
state_b_small_mols = {component: component.to_openff() for component in state_b.components.values() if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in understanding here that "common" small molecules between the two states are effectively being duplicated here?

If so there are two considerations:

  1. Given the complexity of some spectator molecules, this might add a bit of an overhead in duplicating partial charge generation. It's not the worst, but it's enough that if you have a multi-ring molecule you might find yourself idling for a little bit on what should be a short simulation.
  2. Because we're seeing a lack of consistency between repeats of a partial charge generation (i.e. calling antechamber multiple times on the same inputs), there is a small risk that you could end up with non-transforming molecules that have different partial charges. This is a bit rarer, and eventually we'll have a fix, but it is still something that is likely to persist for a bit until we can fix things upstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that makes a lot of sense, and I agree. We probably want to avoid having to run into those issues. That explains why using the Mapping object here could be a better choice, just like you did on the openfe side.

That approach works, but it seems a bit hard to read/understand and might be prone to errors in the future. Wouldn't using a set work here? Since the set operations should be okay to be used with the openff Molecule objects. For example as in:

In [11]: mol_list = [benzene]*3 + [hexane]*4 + [cyclohexane]*7

In [12]: mol_list
Out[12]: 
[Molecule with name '' and SMILES '[H][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H]',
 Molecule with name '' and SMILES '[H][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H]',
 Molecule with name '' and SMILES '[H][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H]',
 Molecule with name '' and SMILES '[H][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[H]',
 Molecule with name '' and SMILES '[H][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[H]',
 Molecule with name '' and SMILES '[H][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[H]',
 Molecule with name '' and SMILES '[H][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]']

In [13]: len(mol_list)
Out[13]: 14

In [14]: mol_set = set(mol_list)

In [15]: len(mol_set)
Out[15]: 3

That way, we can iterate through the set to parametrize the molecules and guarantee that they will only be passed once. I hope that makes sense. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see the problem now, that doesn't work because there can be instances of the same molecule in different locations and/or with different conformers, and this would end up parametrizing only one of them. I think using the Mapping object is the way to go.

Copy link
Member

@dotsdl dotsdl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good @ijpulidos! Just a couple notes to follow up on; after that I think this is good to go if that's all that's needed to make feflow openfe 0.14.0 compatible.

Comment on lines +186 to +187
system_generator.create_system(off_mol.to_topology().to_openmm(),
molecules=[off_mol])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarification question: are we just running create_system here to make sure that the charged molecules can be parameterized by OpenMM?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the solvation step a bit later on, if you don't do this here you don't have necessary topology info registered to solvate your system.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the topology info get registered? Looking at the definition of SystemGenerator.create_system, it's not clear to me that this method has any side effects or stores any state on the SystemGenerator...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry wrong word, not topology but forcefield - which you need to pass to addSolvent.

It's been a long while, but my rough recollection is that by calling create_system you update the TemplateGenerator which is hooked up to the app.ForceField object, so that it remains in "template memory" the next time around.

Note: we could probably reduce the footprint to a single system_generator here by moving this out of the loop and passing list(chain(all_alchemical_mols.values(), common_small_mols.values())) as the input to molecules. I don't think it'll have significant performance differences, but it might be a tad bit cleaner?

Comment on lines 176 to 178

# Assign charges if unassigned -- more info: Openfe issue #576
for off_mol in chain(state_a_small_mols.values(), state_b_small_mols.values()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Assign charges if unassigned -- more info: Openfe issue #576
for off_mol in chain(state_a_small_mols.values(), state_b_small_mols.values()):
combined_small_mols = state_a_small_mols | state_b_small_mols
# Assign charges if unassigned -- more info: Openfe issue #576
for off_mol in combined_small_mols.values():

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe doing the above should avoid the duplication of charging @IAlibay is referring to here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're relying on the pre-existance of partial charges as the way to ensure that we then don't re-generate partial charges later.

Depending on how OFF molecule equality works, wouldn't doing this end up with some molecules without partial charges in some of the two object lists?

@ijpulidos ijpulidos requested review from IAlibay and dotsdl December 5, 2023 00:46
@ijpulidos
Copy link
Contributor Author

@IAlibay @dotsdl This one should be now ready to be re-reviewed. Thanks so much for the comments, they were really helpful. I hope it now makes more sense.

@ijpulidos
Copy link
Contributor Author

Just to clarify a few of the changes made. As far as I can tell, the equality between OFF molecules could mean that two molecules with different conformers (or in different locations) could be thought as equal, since it only cares about the cheminformatics, not about the structure. Therefore, I agree with @IAlibay, I think we need to pass all the off mols to the partial charge generation. If we try merging dictionaries between the two states we could miss some of them, as far as I can see.

On the other hand, I tried to consistently use the typing Dict[SmallMoleculeComponent, openff.toolkit.Molecule] to store the small molecules from the different states, as possible. This should avoid creating copies of the molecules (by referencing them in the dict) and they should also be ready to be used in the get_omm_modeller util function. Maybe in the future we can consider creating a specific model/container for these if that makes any sense.

This is a big lengthy but I hope it helps clarifying the changes I made.

@ijpulidos
Copy link
Contributor Author

ijpulidos commented Dec 5, 2023

I just realized that we need a test system that has small molecules (common small molecules) other than the alchemical molecules to test the added functionality. @IAlibay how is this being tested on the repex protocol?

@IAlibay
Copy link
Member

IAlibay commented Dec 5, 2023

@ijpulidos there's a couple of approaches you can take.

Protein-ligand complexes: we've been using EG5 and pfkfb3 (the latter more in manual tests rather than CI). Those have cofactors you can pass through.

Host-guest systems: those are by default a non alchemical SMC + an alchemical one. Although they take a while longer to charge and probably aren't amazing for CI unless you have charges ahead of time.

@IAlibay
Copy link
Member

IAlibay commented Dec 5, 2023

We also have this fictitious system with a bunch of benzene modifications that have been shifted around: https://github.com/OpenFreeEnergy/openfe/blob/main/openfe/tests/protocols/conftest.py#L165

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! - as you mentioned a test case with multiple ligands would be good, but otherwise this seems good

@ijpulidos ijpulidos merged commit 6c17003 into main Dec 7, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support openfe 0.14
4 participants