Log entry for consumables item_codes_not_recognised in health system log is non-deterministic
#1434
Labels
bug
In TLOmodel/src/tlo/methods/consumables.py (lines 279 to 287 in 625b4d9), a dictionary mapping from treatment IDs to lists of item codes is logged at the end of the simulation for any HSI events which requested consumables for which one or more of the corresponding item codes were not 'recognised'. In practice I think this corresponds to the consumable item codes not being present in the `ResourceFile_Consumables_availability_small.csv` resource file, even if they are present in the `ResourceFile_Consumable_Items_and_Packages.csv` resource file.

This dictionary is constructed from a set, `self._not_recognised_item_codes`, stored as an attribute of the `Consumables` class instance. The elements of this set are 2-tuples whose first entry is the treatment ID string and whose second entry is a tuple of the unrecognised item codes (any requested item codes not present in the `self.item_codes` attribute, which is constructed from the item codes present in the `ResourceFile_Consumables_availability_small.csv` resource file). The dictionary comprehension iterates over this set, mapping from the treatment ID in each item to the corresponding item codes.
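As a rough illustration of the structure described above, the following is a minimal sketch and not the actual TLOmodel code. The names `recognised_item_codes`, `not_recognised_item_codes` and `record_not_recognised`, the treatment IDs and the item codes are all hypothetical stand-ins.

```python
# Hypothetical stand-in for the structure described above; not the TLOmodel code.
recognised_item_codes = {1, 2, 3}  # assumed: codes present in the availability file
not_recognised_item_codes = set()  # set of (treatment_id, item_codes_tuple) entries


def record_not_recognised(treatment_id, requested_item_codes):
    """Record any requested item codes that are not in the recognised set."""
    unrecognised = set(requested_item_codes) - recognised_item_codes
    if unrecognised:
        # tuple(...) iterates over the set to build the tuple of item codes
        not_recognised_item_codes.add((treatment_id, tuple(unrecognised)))


record_not_recognised("TreatmentA", [1, 99])  # 99 is not recognised
record_not_recognised("TreatmentB", [2, 3])   # all recognised, nothing recorded

# Dictionary comprehension over the set: treatment ID -> list of item codes
logged = {
    treatment_id: list(item_codes)
    for treatment_id, item_codes in not_recognised_item_codes
}
```

With a single entry per treatment ID, as here, the comprehension yields one list of unrecognised item codes for each treatment ID that made such a request.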
There are a few distinct problems with this approach:

- The iteration order over the `self._not_recognised_item_codes` set is non-deterministic, so the order of the treatment IDs in the logged dictionary can differ across runs.
- As the tuple of item codes in each entry of the `self._not_recognised_item_codes` set is itself constructed from a set (and creating a tuple from a set involves iterating over the set), the order of the item codes in each tuple is also non-deterministic across runs.
- If there are multiple entries in the `self._not_recognised_item_codes` set with the same treatment ID (they differ in the unrecognised item codes, which can occur when different instances of an HSI event request different consumables), only the item codes for the last entry in `self._not_recognised_item_codes` in the (non-deterministic) iteration order for a given treatment ID are recorded in the logged dictionary. I introduced this bug in "Ensure log entries use consistent ordering and types for columns" #1404 🙈, as previously each entry in `self._not_recognised_item_codes` was iterated over and logged separately (which created issues with each entry having inconsistent column types), and so all entries were always logged even with repeated treatment IDs.

This means that on different runs of the 'same' simulation (in the sense of equivalent fixed seed, parameters, modules loaded etc.) we will get different values logged for the `item_codes_not_recognised` key in the health system log, and the actual logged item codes for a given treatment ID will only be a subset of the actual unrecognised items if a given HSI event type makes different requests for (unrecognised) item codes on different runs of its `apply` method.

This is, I believe, the underlying issue behind the discrepancies noted in #1227 (comment).
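The loss of item codes for repeated treatment IDs can be reproduced in isolation. This is a minimal sketch, assuming entries shaped as described above. The treatment IDs and item codes are made up for illustration.

```python
# Hypothetical entries: two share a treatment ID but differ in their item codes.
entries = {
    ("TreatmentA", (101,)),
    ("TreatmentA", (102, 103)),
    ("TreatmentB", (104,)),
}

# Later entries with the same key overwrite earlier ones in the comprehension.
logged = {treatment_id: list(codes) for treatment_id, codes in entries}

# All item codes actually recorded against TreatmentA across the entries
all_codes_a = {
    code
    for treatment_id, codes in entries
    if treatment_id == "TreatmentA"
    for code in codes
}

# Item codes recorded for TreatmentA that did not make it into the logged dict
missing = all_codes_a - set(logged["TreatmentA"])
print(logged["TreatmentA"], "missing:", sorted(missing))
```

Which of the two `TreatmentA` entries survives depends on the iteration order of the set. In CPython, string hashes are randomised per interpreter process by default (unless `PYTHONHASHSEED` is fixed), so that order can change between runs.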
As a solution, I think that rather than using a set for `self._not_recognised_item_codes`, it would be better to have this be a dictionary keyed by treatment ID with values corresponding to sets of unrecognised item codes. The log entry would then just correspond to this dictionary with the set values mapped to sorted lists. This should ensure consistency in the ordering of both treatment IDs and the item codes for each treatment ID, and avoid the issue with item codes for repeated treatment IDs being lost.
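The proposed change could look something like the following. This is only a sketch of the idea, not a patch against `consumables.py`. The function names `record_not_recognised` and `build_log_entry`, the treatment IDs and the item codes are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stand-ins; not the TLOmodel attributes themselves.
recognised_item_codes = {1, 2, 3}
not_recognised_item_codes = defaultdict(set)  # treatment ID -> set of item codes


def record_not_recognised(treatment_id, requested_item_codes):
    """Accumulate unrecognised item codes for a treatment ID."""
    unrecognised = set(requested_item_codes) - recognised_item_codes
    if unrecognised:
        # Merge into the existing set so repeated treatment IDs lose nothing
        not_recognised_item_codes[treatment_id].update(unrecognised)


def build_log_entry():
    """Map each set of item codes to a sorted list for logging."""
    # Dicts preserve insertion order, so keys follow the order of first request
    return {
        treatment_id: sorted(item_codes)
        for treatment_id, item_codes in not_recognised_item_codes.items()
    }


record_not_recognised("TreatmentA", [99, 1])
record_not_recognised("TreatmentB", [97])
record_not_recognised("TreatmentA", [98, 2])  # same treatment ID, different codes

log_entry = build_log_entry()
# log_entry == {"TreatmentA": [98, 99], "TreatmentB": [97]}
```

The keys follow the order in which treatment IDs were first recorded, which is reproducible as long as the sequence of requests is. If an order independent of request sequence were preferred, the keys could also be sorted when building the entry.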