Log entry for consumables `item_codes_not_recognised` in health system log is non-deterministic #1434

matt-graham · 2024-07-26T14:08:49Z

In

Lines 279 to 287 in 625b4d9

    
    logger.info(
        key="item_codes_not_recognised",
        data={
            _treatment_id if _treatment_id is not None else "": list(
                _item_codes
            )
            for _treatment_id, _item_codes in self._not_recognised_item_codes
        },
    )

a dictionary mapping from treatment IDs to lists of item codes is logged at the end of the simulation for any instances of HSI events which requested consumables for which one or more of the corresponding item codes were not 'recognised' (in practice I think this corresponds to the consumable item codes not being present in the ResourceFile_Consumables_availability_small.csv resource file even if they are present in the ResourceFile_Consumable_Items_and_Packages.csv resource file).

This dictionary is constructed from a set self._not_recognised_item_codes stored as an attribute of the Consumables class instance. The elements of this set are 2-tuples, with the first entry being the treatment ID string and the second entry a tuple of the unrecognised item codes (any requested item codes not present in the self.item_codes attribute, which is constructed from the item codes present in the ResourceFile_Consumables_availability_small.csv resource file).

The dictionary comprehension iterates over this set, mapping from the treatment ID in each item to the corresponding item codes.

There are a few distinct problems with this approach:

1. The iteration order over sets is non-deterministic in Python over distinct Python processes - that is, if we construct the same set in two Python processes and iterate over it, we will not necessarily have the items returned in the same order. This means the order of the keys / entries in the logged dictionary is non-deterministic across runs.
2. As the tuple of item codes in each entry in the self._not_recognised_item_codes set is itself constructed from a set (and creating a tuple from a set involves iterating over the set), the order of the item codes in each tuple is also non-deterministic across runs.
3. Most problematic, as the dictionary comprehension uses the treatment ID as a key, and in general there can be multiple entries in the self._not_recognised_item_codes set with the same treatment ID (if they differ in the unrecognised item codes, which can occur when different instances of a HSI event request different consumables), only the item codes for the last entry in self._not_recognised_item_codes in the (non-deterministic) iteration order for a given treatment ID are recorded in the logged dictionary. I introduced this bug in Ensure log entries use consistent ordering and types for columns #1404 🙈 as previously each entry in self._not_recognised_item_codes was iterated over and logged separately (which created issues with each entry having inconsistent column types), and so entries were always logged even with repeated treatment IDs.
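The loss of item codes for repeated treatment IDs can be reproduced in isolation. The following is a minimal sketch rather than the actual TLO code: the treatment IDs and item codes are made up for illustration, and a plain local variable stands in for self._not_recognised_item_codes.

```python
# Hypothetical stand-in for self._not_recognised_item_codes: a set of
# (treatment ID, tuple of unrecognised item codes) 2-tuples.
# The treatment IDs and item codes are invented for illustration.
not_recognised_item_codes = {
    ("Treatment_A", (1, 2)),
    ("Treatment_A", (3,)),
    ("Treatment_B", (4,)),
}

# Same shape as the dictionary comprehension used for the log entry:
# entries sharing a treatment ID overwrite each other, so only the last
# one in the set's iteration order survives.
logged = {
    treatment_id if treatment_id is not None else "": list(item_codes)
    for treatment_id, item_codes in not_recognised_item_codes
}

# Everything that was actually unrecognised for Treatment_A.
all_codes_for_a = {
    code
    for treatment_id, item_codes in not_recognised_item_codes
    if treatment_id == "Treatment_A"
    for code in item_codes
}

print(len(not_recognised_item_codes), "entries collapse to", len(logged), "keys")
print("logged for Treatment_A:", logged["Treatment_A"], "of", sorted(all_codes_for_a))
```

Which of the two Treatment_A entries ends up in the logged dictionary depends on the set's iteration order, so it can differ between processes, but in either case the logged list is a strict subset of the unrecognised item codes.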

This means that on different runs of the 'same' simulation (in the sense of equivalent fixed seed, parameters, modules loaded etc.) we will get different values logged for the item_codes_not_recognised key in the health system log, and the logged item codes for a given treatment ID will only be a subset of the actual unrecognised items if a given HSI event type makes different requests for (unrecognised) item codes on different runs of its apply method.
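The process-to-process variation in set iteration order can be illustrated by running the same snippet in separate interpreters with different string hash seeds. This is a rough sketch, assuming that string hash randomisation (controlled by the PYTHONHASHSEED environment variable) is the source of the differing order. The treatment ID strings are made up, and the iteration_order helper is hypothetical.

```python
import ast
import os
import subprocess
import sys

# Code run in each child interpreter: build a set of (made-up) treatment
# ID strings and print it in iteration order.
SNIPPET = "print(list({'Treatment_A', 'Treatment_B', 'Treatment_C', 'Treatment_D'}))"


def iteration_order(hash_seed):
    """Return the set iteration order seen by a fresh Python process
    started with the given PYTHONHASHSEED value."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return ast.literal_eval(result.stdout.strip())


# Same set contents in every process, but the order may differ by seed.
orders = {seed: iteration_order(seed) for seed in range(1, 6)}
for seed, order in orders.items():
    print(seed, order)
```

Every process sees the same four elements, but the printed order is not guaranteed to match between seeds. Fixing PYTHONHASHSEED makes the order repeatable for that seed, although sorting before logging avoids relying on the environment at all.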

This is, I believe, the underlying issue behind the discrepancies noted in #1227 (comment).

As a solution, I think that rather than using a set for self._not_recognised_item_codes, it would be better to have this be a dictionary keyed by treatment ID with values corresponding to sets of unrecognised item codes. The log entry would then just correspond to this dictionary with the set values mapped to sorted lists. This should ensure both consistency in ordering of treatment IDs and item codes for each treatment ID, and avoid the issue with item codes for repeated treatment IDs being lost.
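A minimal sketch of this proposal, not the change that was eventually merged: the record_not_recognised helper and the module-level dictionary are hypothetical stand-ins for whatever the Consumables class would do internally, and the treatment IDs and item codes are made up. Sorting the treatment IDs as well is an extra assumption here. The dictionary's insertion order would already be repeatable if requests are made in the same order on each run.

```python
from collections import defaultdict

# Hypothetical replacement for self._not_recognised_item_codes:
# treatment ID -> set of unrecognised item codes.
not_recognised_item_codes = defaultdict(set)


def record_not_recognised(treatment_id, item_codes):
    """Accumulate unrecognised item codes under their treatment ID
    (hypothetical helper; None is mapped to "" as in the log entry)."""
    key = treatment_id if treatment_id is not None else ""
    not_recognised_item_codes[key].update(item_codes)


# Repeated treatment IDs now merge their item codes instead of overwriting.
record_not_recognised("Treatment_B", {4})
record_not_recognised("Treatment_A", {2, 1})
record_not_recognised("Treatment_A", {3})
record_not_recognised(None, {5})

# Data for the log entry: set values mapped to sorted lists, with keys
# sorted too so the output does not depend on insertion order.
log_data = {
    treatment_id: sorted(not_recognised_item_codes[treatment_id])
    for treatment_id in sorted(not_recognised_item_codes)
}
print(log_data)
```

Because the values are accumulated with set.update and only converted to sorted lists at logging time, neither the order of requests nor the set iteration order affects the logged item codes.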


matt-graham added the bug Something isn't working label Jul 26, 2024

matt-graham self-assigned this Jul 26, 2024

matt-graham mentioned this issue Jul 26, 2024

Log entry for EquipmentEverUsed_ByFacilityID non-deterministic #1435

Closed

tbhallett mentioned this issue Jul 29, 2024

Warning generated about logged columns #1440

Open

matt-graham mentioned this issue Jul 30, 2024

Fix logging of unrecognised consumables item codes #1445

Merged

matt-graham closed this as completed in #1445 Sep 24, 2024
