Support openfe 0.14 #13

ijpulidos · 2023-11-28T16:36:55Z

These changes support the latest 0.14 release from openfe and charge transformations.

It creates a dictionary of the small molecules, typed Dict[SmallMoleculeComponent, openff.toolkit.Molecule], for both the state_a and state_b small molecules. These dictionaries can be passed directly to the system_creation.get_omm_modeller function, following its API changes in openfe 0.14.

Resolves #11

codecov-commenter · 2023-11-28T16:49:47Z

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (main@40c966b).

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #13   +/-   ##
=======================================
  Coverage        ?   92.10%           
=======================================
  Files           ?        7           
  Lines           ?      418           
  Branches        ?        0           
=======================================
  Hits            ?      385           
  Misses          ?       33           
  Partials        ?        0


IAlibay

Just a few comments, otherwise this looks good to me.

IAlibay · 2023-11-28T21:52:03Z

feflow/protocols/nonequilibrium_cycling.py

+                # due to issues with partial charge generation in ambertools
+                # we default to using the input conformer for charge generation
+                off_mol.assign_partial_charges(
+                    'am1bcc', use_conformers=off_mol.conformers


I'll raise an issue: once we've got OpenFreeEnergy/openfe#598 sorted out, we should switch this to a settings-selected backend.

Ah, that sounds great; having this option is indeed useful. Thanks!

IAlibay · 2023-11-28T21:56:33Z

feflow/protocols/nonequilibrium_cycling.py

+        #    Dict[SmallMoleculeComponent, openff.toolkit.Molecule]
+        state_a_small_mols = {component: component.to_openff() for component in state_a.components.values() if
+                              isinstance(component, SmallMoleculeComponent)}
+        state_b_small_mols = {component: component.to_openff() for component in state_b.components.values() if


Am I correct in understanding that "common" small molecules between the two states are effectively being duplicated here?

If so there are two considerations:

Given the complexity of some spectator molecules, this might add a bit of an overhead in duplicating partial charge generation. It's not the worst, but it's enough that if you have a multi-ring molecule you might find yourself idling for a little bit on what should be a short simulation.

Because we're seeing a lack of consistency between repeats of a partial charge generation (i.e. calling antechamber multiple times on the same inputs), there is a small risk that you could end up with non-transforming molecules that have different partial charges. This is a bit rarer, and eventually we'll have a fix, but it is still something that is likely to persist for a bit until we can fix things upstream.

I see, that makes a lot of sense, and I agree. We probably want to avoid having to run into those issues. That explains why using the Mapping object here could be a better choice, just like you did on the openfe side.

That approach works, but it seems a bit hard to read and might be prone to errors in the future. Wouldn't using a set work here? Set operations should work fine with openff Molecule objects. For example:

In [11]: mol_list = [benzene]*3 + [hexane]*4 + [cyclohexane]*7

In [12]: mol_list
Out[12]:
[Molecule with name '' and SMILES '[H][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H]',
 Molecule with name '' and SMILES '[H][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H]',
 Molecule with name '' and SMILES '[H][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[H]',
 Molecule with name '' and SMILES '[H][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[H]',
 Molecule with name '' and SMILES '[H][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[H]',
 Molecule with name '' and SMILES '[H][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[H]',
 Molecule with name '' and SMILES '[H][C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]',
 Molecule with name '' and SMILES '[H][C]1([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]1([H])[H]']

In [13]: len(mol_list)
Out[13]: 14

In [14]: mol_set = set(mol_list)

In [15]: len(mol_set)
Out[15]: 3

That way, we can iterate through the set to parametrize the molecules and guarantee that they will only be passed once. I hope that makes sense. Thoughts?

Oh, I see the problem now: that doesn't work because there can be instances of the same molecule in different locations and/or with different conformers, and this would end up parametrizing only one of them. I think using the Mapping object is the way to go.
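To make the problem with the set approach concrete, here is a minimal sketch using a toy class. `ToyMolecule` is a hypothetical stand-in, not the openff Molecule class; it only assumes, as discussed in this thread, that equality and hashing look at the chemistry and ignore coordinates.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyMolecule:
    """Hypothetical stand-in: equality and hash ignore the coordinates."""

    smiles: str
    # compare=False keeps coords out of the generated __eq__ and __hash__.
    coords: tuple = field(default=(), compare=False)


# Two copies of the same chemical species placed at different locations.
benzene_site_1 = ToyMolecule("c1ccccc1", coords=((0.0, 0.0, 0.0),))
benzene_site_2 = ToyMolecule("c1ccccc1", coords=((5.0, 0.0, 0.0),))

mols = [benzene_site_1, benzene_site_2]

# The set collapses both instances into one, so only one would be processed.
unique_by_equality = set(mols)

# Keying on object identity instead keeps every instance.
unique_by_identity = {id(mol): mol for mol in mols}

print(len(mols), len(unique_by_equality), len(unique_by_identity))
```

In this toy setting, iterating over the set would visit only one of the two benzene instances, which is the failure mode described above.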

dotsdl

Looking good @ijpulidos! Just a couple notes to follow up on; after that I think this is good to go if that's all that's needed to make feflow openfe 0.14.0 compatible.

dotsdl · 2023-12-04T17:26:26Z

feflow/protocols/nonequilibrium_cycling.py

+            system_generator.create_system(off_mol.to_topology().to_openmm(),
+                                           molecules=[off_mol])


Clarification question: are we just running create_system here to make sure that the charged molecules can be parameterized by OpenMM?

This is for the solvation step a bit later on; if you don't do this here, you don't have the necessary topology info registered to solvate your system.

Where does the topology info get registered? Looking at the definition of SystemGenerator.create_system, it's not clear to me that this method has any side effects or stores any state on the SystemGenerator...

Sorry, wrong word: not topology but forcefield, which you need to pass to addSolvent.

It's been a long while, but my rough recollection is that by calling create_system you update the TemplateGenerator which is hooked up to the app.ForceField object, so that it remains in "template memory" the next time around.

Note: we could probably reduce the footprint to a single system_generator here by moving this out of the loop and passing list(chain(all_alchemical_mols.values(), common_small_mols.values())) as the input to molecules. I don't think it'll have significant performance differences, but it might be a tad bit cleaner?
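As a rough illustration of the note above, the sketch below flattens the values of two dictionaries into a single list with itertools.chain. The dictionary contents are placeholder strings rather than real OpenFF molecules, and the resulting list is only what would hypothetically be handed to `molecules`.

```python
from itertools import chain

# Placeholder contents; in the PR these would map components to OpenFF molecules.
all_alchemical_mols = {"ligand_a": "off_ligand_a", "ligand_b": "off_ligand_b"}
common_small_mols = {"cofactor": "off_cofactor"}

# One flat list covering every molecule, built once outside any loop.
molecules = list(chain(all_alchemical_mols.values(), common_small_mols.values()))

print(molecules)
```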

dotsdl · 2023-12-04T17:38:44Z

feflow/protocols/nonequilibrium_cycling.py

+
+        # Assign charges if unassigned -- more info: Openfe issue #576
+        for off_mol in chain(state_a_small_mols.values(), state_b_small_mols.values()):


Suggested change

-        # Assign charges if unassigned -- more info: Openfe issue #576
-        for off_mol in chain(state_a_small_mols.values(), state_b_small_mols.values()):
+        combined_small_mols = state_a_small_mols | state_b_small_mols
+        # Assign charges if unassigned -- more info: Openfe issue #576
+        for off_mol in combined_small_mols.values():

I believe doing the above should avoid the duplication of charging @IAlibay is referring to here.

We're relying on the pre-existence of partial charges to ensure that we don't re-generate partial charges later.

Depending on how OFF molecule equality works, wouldn't doing this end up leaving some molecules without partial charges in one of the two state dictionaries?
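The concern can be sketched with plain Python dictionaries. Everything here is a toy: `ToyOffMol` is a hypothetical stand-in for an OpenFF molecule, string keys stand in for SmallMoleculeComponent, and the sketch assumes a component shared by both states compares equal as a dictionary key while each state holds its own converted molecule object.

```python
class ToyOffMol:
    """Hypothetical stand-in for an OpenFF molecule (not the real class)."""

    def __init__(self, name):
        self.name = name
        self.partial_charges = None


# Each state converted its components separately, so the shared cofactor
# is represented by two distinct objects under the same key.
state_a_small_mols = {
    "ligand_a": ToyOffMol("ligand_a"),
    "cofactor": ToyOffMol("cofactor (state A copy)"),
}
state_b_small_mols = {
    "ligand_b": ToyOffMol("ligand_b"),
    "cofactor": ToyOffMol("cofactor (state B copy)"),
}

# Dict union (Python 3.9+): for duplicate keys the right-hand value wins.
combined_small_mols = state_a_small_mols | state_b_small_mols

for off_mol in combined_small_mols.values():
    if off_mol.partial_charges is None:
        off_mol.partial_charges = "assigned"

# The state A copy of the cofactor was never visited.
uncharged_in_a = [
    mol.name for mol in state_a_small_mols.values() if mol.partial_charges is None
]
print(uncharged_in_a)
```

Under these assumptions the merged dictionary charges each key once, but the object dropped by the merge stays uncharged in its original dictionary.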

ijpulidos · 2023-12-05T00:47:45Z

@IAlibay @dotsdl This one should be now ready to be re-reviewed. Thanks so much for the comments, they were really helpful. I hope it now makes more sense.

ijpulidos · 2023-12-05T15:17:29Z

Just to clarify a few of the changes made. As far as I can tell, equality between OFF molecules means that two molecules with different conformers (or in different locations) can be considered equal, since it only cares about the cheminformatics, not about the structure. Therefore, I agree with @IAlibay: I think we need to pass all the OFF molecules to the partial charge generation. If we try merging dictionaries between the two states, we could miss some of them, as far as I can see.

On the other hand, I tried to consistently use the type Dict[SmallMoleculeComponent, openff.toolkit.Molecule] to store the small molecules from the different states wherever possible. This should avoid creating copies of the molecules (the dict only references them), and they should also be ready to be used in the get_omm_modeller util function. Maybe in the future we can consider creating a specific model/container for these, if that makes sense.
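A minimal sketch of that idea follows. It is not the feflow implementation: `ToyOffMol`, the string components, and `mapped_components` (standing in for the two ends of the mapping) are all hypothetical. It only illustrates converting each distinct component once, charging each object once, and letting the per-state dictionaries reference the same objects.

```python
from itertools import chain


class ToyOffMol:
    """Hypothetical stand-in for an OpenFF molecule (not the real class)."""

    def __init__(self, name):
        self.name = name
        self.partial_charges = None


# Components per state, keyed by label; strings stand in for components.
state_a = {"ligand": "ligand_a", "cofactor": "cofactor"}
state_b = {"ligand": "ligand_b", "cofactor": "cofactor"}
# Stand-in for the two ends of the mapping (the alchemical molecules).
mapped_components = ("ligand_a", "ligand_b")

# Convert each distinct component exactly once.
alchemical_mols = {comp: ToyOffMol(comp) for comp in mapped_components}
common_mols = {
    comp: ToyOffMol(comp)
    for comp in state_a.values()
    if comp not in alchemical_mols and comp in state_b.values()
}

# Charge every molecule object once, recording each call.
charge_calls = []
for mol in chain(alchemical_mols.values(), common_mols.values()):
    if mol.partial_charges is None:
        mol.partial_charges = f"charges({mol.name})"
        charge_calls.append(mol.name)

# Per-state dictionaries reference the same objects instead of copies.
lookup = {**alchemical_mols, **common_mols}
state_a_small_mols = {comp: lookup[comp] for comp in state_a.values()}
state_b_small_mols = {comp: lookup[comp] for comp in state_b.values()}

print(charge_calls)
print(state_a_small_mols["cofactor"] is state_b_small_mols["cofactor"])
```

Because both state dictionaries point at the same cofactor object, it is charged once and cannot end up with different charges in the two states.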

This is a bit lengthy, but I hope it helps clarify the changes I made.

ijpulidos · 2023-12-05T15:46:59Z

I just realized that we need a test system that has small molecules (common small molecules) other than the alchemical molecules to test the added functionality. @IAlibay how is this being tested on the repex protocol?

IAlibay · 2023-12-05T18:13:32Z

@ijpulidos there's a couple of approaches you can take.

Protein-ligand complexes: we've been using EG5 and pfkfb3 (the latter more in manual tests rather than CI). Those have cofactors you can pass through.

Host-guest systems: those are by default a non alchemical SMC + an alchemical one. Although they take a while longer to charge and probably aren't amazing for CI unless you have charges ahead of time.

IAlibay · 2023-12-05T18:16:13Z

We also have this fictitious system with a bunch of benzene modifications that have been shifted around: https://github.com/OpenFreeEnergy/openfe/blob/main/openfe/tests/protocols/conftest.py#L165

IAlibay

lgtm! - as you mentioned a test case with multiple ligands would be good, but otherwise this seems good

ijpulidos added 4 commits November 28, 2023 11:31

Support openfe014. Charge transformations.

86604c6

import chain

ca5919a

use small mols dict

d1e2c17

cast to list (pickable)

e7d8b4a

Only pydantic 1 so far

109cd93

ijpulidos requested review from dotsdl and IAlibay November 28, 2023 16:54

IAlibay approved these changes Nov 28, 2023

Avoiding deprecated path API.

7881709

dotsdl approved these changes Dec 4, 2023

Avoiding redundant calls to charge assignment by using mapping obj.

3a5f1be

ijpulidos requested review from IAlibay and dotsdl December 5, 2023 00:46

Have to use values for common

fe06fa2

IAlibay approved these changes Dec 7, 2023

ijpulidos merged commit 6c17003 into main Dec 7, 2023
4 checks passed
