Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support mixed direction sorts #2223

Open
evansd opened this issue Nov 14, 2024 · 2 comments
Open

Support mixed direction sorts #2223

evansd opened this issue Nov 14, 2024 · 2 comments

Comments

@evansd
Copy link
Contributor

evansd commented Nov 14, 2024

Problem

Our current API for sorts (the sort_by() and first/last_for_patient() methods) doesn't allow specifying a mixture of ascending and descending sorts.

This means that while we can write queries like "find events with the largest numeric value and return the latest":

events.sort_by(events.numeric_value, events.date).last_for_patient()

We can't straightforwardly support queries like "find events with the largest numeric value and then return the earliest of these" because that requires sorting numeric value and date in opposite directions.

For this particular example we can work around this by inverting the float which has the effect of reversing the sort direction:

events.sort_by(-events.numeric_value, events.date).first_for_patient()

But that trick is not applicable to many other types (e.g you can't invert a date).

Slack discussion:
https://bennettoxford.slack.com/archives/C069YDR4NCA/p1731501153163389

Possible API designs

We could allow sort_by() to take a reverse keyword argument. This is simple and matches Python's sorted() function. It allows you to support mixed direction sorts by chaining e.g. to find events with the largest numeric value and then return the earliest of these you could do:

events.sort_by(events.date, reverse=True).sort_by(events.numeric_value).last_for_patient()

However, chaining sorts like this is extremely counter-intuitive as you need to apply them in backwards order of priority, and remember which you have applied reverse to. I had to think quite hard to persuade myself that the above example is correct.

Another alternative is to have some wrapper like desc() which causes wrapped columns to sort in descending order as opposed to the default of ascending. (SQLAlchemy does something like this.) The same example would then look something like:

events.sort_by(desc(events.numeric_value), events.date).first_for_patient()

I'm sure there are other options we're missing as well. We should make sure we've explored the design space well here before settling on anything.

@madwort
Copy link
Contributor

madwort commented Nov 14, 2024

chaining sorts

I really don't like this & don't think we should suggest this to researchers. As discussed elsewhere, you also need to know that the first sort is stable after the second sort.

wrapper like desc()

would we have a corresponding (but superfluous) asc() wrapper? and/or I'd be tempted to require sort order everywhere, but I think that's going to be too prescriptive for your taste?

events.sort_by(desc(events.numeric_value), asc(events.date)).first_for_patient()
events.sort_by(asc(events.date)).first_for_patient()

@evansd
Copy link
Contributor Author

evansd commented Dec 3, 2024

I don't think we can require asc() everywhere for backwards compatibility reasons.

I was wondering if "reversed" is a more natural term here than "descending", partly because the user doesn't actually specify whether the ordering is ascending or descending until they call either first or last _for_patient(). What you're really saying by marking a particular sort field like this is that that field sorts backwards to whatever direction the other sorts go in – without saying what direction that is.

It then occurs to me that reversed() is a Python builtin which just calls __reversed__() on its argument so we could make that work without requiring another import. I can't decide whether that's quite neat, or horrific.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants