
Log entry for consumables item_codes_not_recognised in health system log is non-deterministic #1434

Closed
matt-graham opened this issue Jul 26, 2024 · 0 comments · Fixed by #1445
Labels
bug Something isn't working

In

logger.info(
    key="item_codes_not_recognised",
    data={
        _treatment_id if _treatment_id is not None else "": list(
            _item_codes
        )
        for _treatment_id, _item_codes in self._not_recognised_item_codes
    },
)

a dictionary mapping from treatment IDs to lists of item codes is logged at the end of the simulation for any instances of HSI events which requested consumables for which one or more of the corresponding item codes were not 'recognised' (in practice I think this corresponds to the consumable item codes not being present in the ResourceFile_Consumables_availability_small.csv resource file even if they are present in the ResourceFile_Consumable_Items_and_Packages.csv resource file).

This dictionary is constructed from a set, self._not_recognised_item_codes, stored as an attribute of the Consumables class instance. The elements of this set are 2-tuples whose first entry is the treatment ID string and whose second entry is a tuple of the unrecognised item codes (any requested item codes not present in the self.item_codes attribute, which is constructed from the item codes present in the ResourceFile_Consumables_availability_small.csv resource file).

The dictionary comprehension iterates over this set, mapping from the treatment ID in each item to the corresponding item codes.

There are a few distinct problems with this approach:

  • The iteration order over sets is not guaranteed to be the same across distinct Python processes (string hashes, for example, are randomised per process unless PYTHONHASHSEED is fixed) - that is, if we construct the same set in two Python processes and iterate over it, we will not necessarily have the items returned in the same order. This means the order of the keys / entries in the logged dictionary is non-deterministic across runs.
  • As the tuple of item codes in each entry in the self._not_recognised_item_codes set is itself constructed from a set (and creating a tuple from a set involves iterating over the set) the order of the item codes in each tuple is also non-deterministic across runs.
  • Most problematic, as the dictionary comprehension uses the treatment ID as a key, and in general there can be multiple entries in the self._not_recognised_item_codes set with the same treatment ID (if they differ in the unrecognised item codes, which can occur when different instances of an HSI event request different consumables), only the item codes for the last entry in self._not_recognised_item_codes in the (non-deterministic) iteration order for a given treatment ID are recorded in the logged dictionary. I introduced this bug in Ensure log entries use consistent ordering and types for columns #1404 🙈 as previously each entry in self._not_recognised_item_codes was iterated over and logged separately (which created issues with each entry having inconsistent column types), and so all entries were always logged even with repeated treatment IDs.
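The overwriting described in the last point can be reproduced in isolation. The following is a minimal sketch, not the actual TLOmodel code: the treatment IDs and integer item codes are made up, and `not_recognised` is a hypothetical stand-in for self._not_recognised_item_codes.

```python
# Hypothetical stand-in for Consumables._not_recognised_item_codes: a set of
# (treatment_id, item_codes_tuple) pairs. The IDs and codes are invented.
not_recognised = {
    ("TreatmentA", (101, 102)),
    ("TreatmentA", (103,)),
    ("TreatmentB", (104,)),
}

# Same shape as the dictionary comprehension in the logging call: later
# entries with a repeated treatment ID overwrite earlier ones.
logged = {
    treatment_id if treatment_id is not None else "": list(item_codes)
    for treatment_id, item_codes in not_recognised
}

# What we would actually want recorded for TreatmentA: every unrecognised code.
all_codes_a = sorted(
    code
    for treatment_id, item_codes in not_recognised
    if treatment_id == "TreatmentA"
    for code in item_codes
)

print(logged["TreatmentA"])  # either [101, 102] or [103], depending on iteration order
print(all_codes_a)           # [101, 102, 103]
```

Only two keys survive in `logged` even though the set holds three entries, and which TreatmentA entry survives depends on the set iteration order.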

This means that on different runs of the 'same' simulation (in the sense of equivalent fixed seed, parameters, modules loaded etc.) we will get different values logged for the item_codes_not_recognised key in the health system log, and the logged item codes for a given treatment ID will only be a subset of the actual unrecognised items if a given HSI event type makes different requests for (unrecognised) item codes on different runs of its apply method.
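One way to observe the run-to-run variation directly is to launch fresh interpreters with different string-hash seeds. This is a rough sketch under the assumption that the simulation seed controls only the random number generators and not PYTHONHASHSEED; the treatment ID strings and the `iteration_order` helper are hypothetical.

```python
import os
import subprocess
import sys

# Child program: build a set of made-up treatment ID strings and print the
# order in which iteration returns them.
CHILD = (
    "print(','.join({'TreatmentA', 'TreatmentB', 'TreatmentC', "
    "'TreatmentD', 'TreatmentE'}))"
)


def iteration_order(hash_seed):
    """Return the set iteration order seen by a fresh interpreter
    started with the given PYTHONHASHSEED value."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip().split(",")


orders = {seed: iteration_order(seed) for seed in (1, 2, 3)}
for seed, order in orders.items():
    # Same elements each time, but the order typically differs between seeds.
    print(seed, order)
```

When PYTHONHASHSEED is left unset, each process picks its own random hash seed, which is the situation for two ordinary runs of the same simulation.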

This is, I believe, the underlying issue behind the discrepancies noted in #1227 (comment).

As a solution, I think that rather than using a set for self._not_recognised_item_codes, it would be better to have this be a dictionary keyed by treatment ID with values corresponding to sets of unrecognised item codes. The log entry would then just correspond to this dictionary with the set values mapped to sorted lists. This should ensure both consistency in ordering of treatment IDs and item codes for each treatment ID, and avoid the issue with item codes for repeated treatment IDs being lost.
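A minimal sketch of this proposal follows. It is illustrative only and not necessarily how the fix was implemented: the `NotRecognisedItemCodes` class and its `record` and `as_log_data` methods are hypothetical names, and the item codes are assumed to be sortable values such as integers.

```python
from collections import defaultdict


class NotRecognisedItemCodes:
    """Hypothetical helper that tracks unrecognised item codes per treatment ID."""

    def __init__(self):
        # Dictionary keyed by treatment ID, with a set of item codes as each value.
        self._not_recognised_item_codes = defaultdict(set)

    def record(self, treatment_id, item_codes):
        """Merge newly seen unrecognised item codes into the set for this treatment ID."""
        key = treatment_id if treatment_id is not None else ""
        self._not_recognised_item_codes[key].update(item_codes)

    def as_log_data(self):
        """Return a plain dict with sorted lists as values, suitable for logging.

        Keys follow insertion order (the order in which treatment IDs were first
        recorded), and the values are sorted, so neither depends on set
        iteration order.
        """
        return {
            treatment_id: sorted(item_codes)
            for treatment_id, item_codes in self._not_recognised_item_codes.items()
        }


tracker = NotRecognisedItemCodes()
tracker.record("TreatmentA", [102, 101])
tracker.record("TreatmentB", {104})
tracker.record("TreatmentA", (103,))  # merged with, rather than replacing, earlier codes
print(tracker.as_log_data())
# {'TreatmentA': [101, 102, 103], 'TreatmentB': [104]}
```

Because repeated treatment IDs update an existing set instead of creating a competing entry, no item codes are dropped, and the logged output is reproducible as long as the sequence of HSI events is itself reproducible.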
