Make the measures implementation more efficient #2294

Open
evansd opened this issue Dec 9, 2024 · 0 comments

evansd commented Dec 9, 2024

This is going to get increasingly important as measures get used more and more. The existing implementation is known to be sub-optimal in various ways, some of which are easy to fix and others which will require more thought.

This issue collects a grab bag of different ideas, in no particular order:

  1. Using a CASE expression to map from booleans to integers is silly and pretty easily fixable:
    Expand CastToInt operation to accept Series[bool] #1313

  2. Patient properties which do not vary from interval to interval (e.g. ethnicity) could be extracted once and joined back in to the per-interval properties. (See Slack thread.)

  3. We could do the summing and grouping in SQL on the temporary results tables and avoid pulling back all the individual patient rows (even if we no longer write them to disk).

  4. We could possibly do the summing and grouping on the fly while running the query which would build the results table and avoid creating it entirely.

  5. We do separate queries for each denominator. This isn't usually too bad, as often there is only one denominator across a series of measures. But where there are multiple denominators this becomes multiplicatively costly. Possibly we can identify sets of denominators which can be sensibly combined in a single query.

  6. For certain measure queries it might be possible to avoid doing one query per interval and instead query for some appropriate series of events which allows us to calculate the answer for each interval. For example, it should be possible to calculate the number of patients registered at each practice each month over a five year period by doing a single query for the relevant practice registration records rather than 60 queries, one for each month.
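
To make idea 6 concrete, here is a rough sketch in plain Python of deriving every monthly count from one batch of registration records, rather than issuing one query per month. Everything in it is an assumption for illustration: the `(patient_id, practice, start, end)` record shape, the function names, and the rule that "registered in a month" means "registered on the first day of that month". It also assumes a patient has no overlapping registrations at the same practice, so counting matching records is the same as counting patients.

```python
from collections import Counter
from datetime import date


def month_starts(first, n_months):
    """Yield the first day of n_months consecutive months, beginning with `first`."""
    year, month = first.year, first.month
    for _ in range(n_months):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def registered_counts(registrations, first, n_months):
    """Count registrations active at each practice on each month start.

    `registrations` is an iterable of (patient_id, practice, start, end)
    tuples (a hypothetical shape). `end` is None for an ongoing
    registration and is otherwise treated as inclusive.
    """
    starts = list(month_starts(first, n_months))
    counts = Counter()
    for _patient_id, practice, start, end in registrations:
        for interval_start in starts:
            # The registration covers this interval if it began on or before
            # the month start and had not ended before it.
            if start <= interval_start and (end is None or interval_start <= end):
                counts[(practice, interval_start)] += 1
    return counts


# Made-up records standing in for the result of a single query.
registrations = [
    (1, "A", date(2020, 1, 15), date(2020, 3, 10)),
    (1, "B", date(2020, 3, 11), None),
    (2, "A", date(2019, 6, 1), None),
]
counts = registered_counts(registrations, date(2020, 1, 1), 4)
```

The point is only that the per-interval answers fall out of one pass over the records; whether that pass is best done in Python or pushed down into SQL is a separate question.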

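Idea 3 could look something like the sketch below, which uses the standard library `sqlite3` module as a stand-in for the real database. The `results` table, its columns and the `summarise_measure` helper are all hypothetical, and it assumes the numerator and denominator are already stored as 0/1 integers. It is not how the current implementation works; it just shows the aggregation happening in SQL so that only one row per group comes back.

```python
import sqlite3


def summarise_measure(conn, table, group_by):
    """Return {group_values: (numerator_total, denominator_total)}.

    The summing and grouping is done by the database. Table and column
    names are interpolated directly into the SQL, which is only acceptable
    here because they are internal names and never user input.
    """
    cols = ", ".join(group_by)
    sql = (
        f"SELECT {cols}, SUM(numerator), SUM(denominator) "
        f"FROM {table} GROUP BY {cols} ORDER BY {cols}"
    )
    # Each row is (*group_values, numerator_total, denominator_total).
    return {row[:-2]: (row[-2], row[-1]) for row in conn.execute(sql)}


conn = sqlite3.connect(":memory:")
# Hypothetical per-patient results table for one interval.
conn.execute(
    "CREATE TEMP TABLE results "
    "(patient_id INTEGER, sex TEXT, numerator INTEGER, denominator INTEGER)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [(1, "F", 1, 1), (2, "F", 0, 1), (3, "M", 1, 1), (4, "M", 0, 0)],
)
totals = summarise_measure(conn, "results", ["sex"])
```

With four patient rows in the table, only two aggregated rows cross the wire, one per value of `sex`.
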

2 participants