Better autocomplete behaviour in ehrQL #506
I had a spike query language which used parameterized types to give some of what we'd want here. I eventually ran into a limitation that type variables and parameterized types don't compose as you'd like in Python. It's possible that this will be a bit easier now we've got parallel hierarchies and mixins in the ehrQL definition.
I would like to tentatively retract this statement. Following on from the changes in #663 I have a proof-of-concept for how autocomplete and type checking could be made to work in ehrQL. In fact it works in the purely in-browser version of VSCode, which was slightly surprising to me. You should be able to open it directly in the GitHub editor here. Or you can also visit vscode.dev and copy-paste the example from:

In either case, you should be prompted to install the Python extension and should click "Install anyway" when it warns you that functionality will be limited in the browser. If you've just installed it then you might need to wait 10 seconds or so before autocomplete becomes available. You should find that the appropriate columns are offered as completions on each frame and, for each of those, the methods offered are just the appropriate ones for the type and dimension of the series.

The key trick here was using a descriptor to access series from frames and then adding lots of signature overloads.

There's obviously a combinatorial explosion of overloads here as we add more types, and as we start extending this approach to the rest of the ehrQL methods. I think the only sane way of maintaining these would be to generate them programmatically, i.e. have some script which introspects ehrQL/the Query Model and spits out the relevant signatures. Possibly they can live in a standalone typing stubs file. Or maybe all the concrete series classes should be generated this way, with just the mixins and base classes written by hand.

In any case, I think the only thing that needs immediate thought is making sure that the API surface we're currently exposing to users doesn't prevent us providing autocomplete later. I'm hopeful that the changes in #663 are sufficient here.
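The proof-of-concept code itself isn't reproduced in this thread, so here is a minimal sketch of the descriptor-plus-overloads idea under our own assumptions. `Column`, `EventSeries`, `PatientSeries`, `Events` and `Patients` are hypothetical names, not ehrQL's actual classes. The overloads on `__get__` dispatch on the kind of frame the column is read from, so the editor sees an event-level or patient-level series and only offers the methods that apply. The real proof-of-concept also varied by series type, which is where the combinatorial explosion comes from.

```python
from __future__ import annotations

import datetime
from typing import Any, Generic, TypeVar, overload

T = TypeVar("T")


class EventSeries(Generic[T]):
    """Hypothetical event-level (many rows per patient) series."""

    def __init__(self, name: str, type_: type[T]) -> None:
        self.name = name
        self.type_ = type_

    def count_for_patient(self) -> PatientSeries[int]:
        # Aggregations are only offered on event-level series.
        return PatientSeries(f"count({self.name})", int)


class PatientSeries(Generic[T]):
    """Hypothetical patient-level (one row per patient) series."""

    def __init__(self, name: str, type_: type[T]) -> None:
        self.name = name
        self.type_ = type_


class EventFrame:
    pass


class PatientFrame:
    pass


class Column(Generic[T]):
    """Descriptor: the series returned depends on the kind of frame it sits on."""

    def __init__(self, type_: type[T]) -> None:
        self.type_ = type_

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name

    @overload
    def __get__(self, instance: None, owner: Any) -> Column[T]: ...
    @overload
    def __get__(self, instance: EventFrame, owner: Any) -> EventSeries[T]: ...
    @overload
    def __get__(self, instance: PatientFrame, owner: Any) -> PatientSeries[T]: ...
    def __get__(self, instance: Any, owner: Any) -> Any:
        if instance is None:  # accessed on the class itself
            return self
        if isinstance(instance, EventFrame):
            return EventSeries(self.name, self.type_)
        return PatientSeries(self.name, self.type_)


class Events(EventFrame):
    date = Column(datetime.date)
    code = Column(str)


class Patients(PatientFrame):
    date_of_birth = Column(datetime.date)


events = Events()
patients = Patients()
num_events = events.code.count_for_patient()  # editor infers PatientSeries[int]
dob = patients.date_of_birth  # PatientSeries[date]: count_for_patient not offered
```

Column names complete because they are ordinary class attributes; the per-series method lists come from the overloaded `__get__` return types.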
Wow.
Seconded.
We're going to end up with a file called ehrql.xml, aren't we?
I've realised there's still an unsolved problem here, which is how to handle operations on frames which return different types of frame while retaining column name autocomplete. The basic EventFrame->EventFrame methods can be annotated like this:

```py
def take(self: T, condition: Whatever) -> T:
    ...
```

But for …
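A runnable sketch of the `self: T -> T` pattern for the easy EventFrame->EventFrame case, assuming a toy frame that just records its conditions as strings (the `conditions` tuple, the string conditions and the `for_code` helper are hypothetical). The bound type variable makes the checker infer the same subclass that came in, and building the result with `type(self)` keeps the runtime class in line with that annotation. On Python 3.11+ `typing.Self` (PEP 673) expresses the same thing more concisely. This does nothing for methods whose result is a different kind of frame, which is the unsolved part.

```python
from __future__ import annotations

from typing import TypeVar

F = TypeVar("F", bound="EventFrame")


class EventFrame:
    def __init__(self, conditions: tuple[str, ...] = ()) -> None:
        self.conditions = conditions

    def take(self: F, condition: str) -> F:
        # type(self)(...) builds an instance of whatever subclass we are,
        # matching the annotation: the same frame type comes out as went in.
        return type(self)(self.conditions + (condition,))


class Events(EventFrame):
    def for_code(self, code: str) -> Events:
        # A table-specific helper that stays available (and completes) after take().
        return self.take(f"code = {code!r}")


recent = Events().take("date > '2020-01-01'").for_code("xyz")
```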
@evansd Type parameters? There is a limit to their usefulness because they don't compose fully with type variables in Python, but they may do enough.
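A minimal sketch of what a parameterized type does buy, assuming a hypothetical generic `Series[T]` whose element type flows into method signatures. One concrete limit we are certain of is that Python's type system has no higher-kinded type variables: you cannot declare a type variable `S` ranging over `EventSeries`/`PatientSeries` and then write `S[bool]` as a return type. Whether that is exactly the composition problem meant above is our reading, not something stated in the thread.

```python
from __future__ import annotations

import datetime
from typing import Generic, TypeVar

T = TypeVar("T")


class Series(Generic[T]):
    """Hypothetical series parameterized by its element type T."""

    def __init__(self, type_: type[T]) -> None:
        self.type_ = type_

    def is_in(self, values: list[T]) -> Series[bool]:
        # T appears in the signature, so a checker can reject e.g. testing
        # a date series against a list of strings.
        return Series(bool)


# What cannot be written: no higher-kinded TypeVars, so there is no generic way
# to say "same container as self, element type bool". Each container class
# needs its own explicit signatures (or generated overloads).

dob: Series[datetime.date] = Series(datetime.date)
flag = dob.is_in([datetime.date(2000, 1, 1)])  # inferred as Series[bool]
```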
We define tables in ehrQL using subclasses of `EventFrame` and `PatientFrame`, e.g.

```py
import datetime

@table
class event(EventFrame):
    date = Series(datetime.date)
    code = Series(SNOMEDCTCode)
```

This gives us two big advantages:

* auto-complete for column names;
* the option to define [table-specific helper][1] methods.

Previously, however, as soon as you performed any kind of operation on one of these tables (i.e. called a method) you'd get back a plain `EventFrame` or `PatientFrame` with no column auto-completion and no helper methods. This PR ensures that we return the appropriate frame subclass from all methods. This also allows us to remove the `__getattr__` magic from the `BaseFrame` class.

**The nasty bits**

Returning an appropriate type in all cases requires two bits of trickery in the form of dynamic class compilation:

1. When we call `sort_by()` we need the result to have the `get_first/last_for_patient()` methods as well as its existing methods. So we construct a subclass which mixes in the necessary methods.
2. When we call one of the `get_first/last_for_patient()` methods we need to get back a `PatientFrame`. This should have all the columns defined on the original frame, but none of the methods. We introspect the class definition to extract all the columns and construct a new `PatientFrame` with those columns included.

**Static auto-complete**

The above gives us auto-complete in a dynamic context like an IPython session, where code is actually executed. We also get a limited form of static (type-based) auto-complete in VSCode. Previously this worked only on the original frame; this PR extends it so that it persists through `where/except_where` calls. However, it won't persist through `sort_by` or `get_first/last_for_patient()`.

After reasonably extensive investigation (which I need to write up in [this ticket][2]) I don't _think_ our ideal behaviour is achievable in Pylance (VSCode's type checker) as things currently stand. But I don't think anything in this PR makes things worse in that regard.

[1]: #1021
[2]: #506
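The two bits of dynamic class construction described above can be sketched with the built-in three-argument `type(name, bases, namespace)`. This is not the PR's code: `Series`, `BaseFrame.columns()`, `SortedMethods`, the generated class names and the `is_recent` helper are hypothetical stand-ins, and sorting itself is not modelled.

```python
from __future__ import annotations


class Series:
    """Hypothetical column definition; used only as a marker here."""

    def __init__(self, type_: type) -> None:
        self.type_ = type_


class BaseFrame:
    @classmethod
    def columns(cls) -> dict[str, Series]:
        # Walk the MRO base-first so inherited columns are included and
        # subclass definitions win.
        found: dict[str, Series] = {}
        for klass in reversed(cls.__mro__):
            for name, value in vars(klass).items():
                if isinstance(value, Series):
                    found[name] = value
        return found


class PatientFrame(BaseFrame):
    pass


class SortedMethods(BaseFrame):
    """Mixin with the methods that only make sense after sort_by()."""

    def get_first_for_patient(self) -> PatientFrame:
        # Trick 2: a brand-new PatientFrame subclass carrying the same column
        # definitions but none of the original class's methods.
        cls = type(self)
        patient_cls = type(f"{cls.__name__}Patient", (PatientFrame,), cls.columns())
        return patient_cls()


class EventFrame(BaseFrame):
    def sort_by(self, *keys: str) -> EventFrame:
        # Trick 1: subclass whatever frame class we are, mixing in the
        # sorted-only methods. (The sort keys are ignored in this sketch.)
        cls = type(self)
        sorted_cls = type(f"Sorted{cls.__name__}", (cls, SortedMethods), {})
        return sorted_cls()


class event(EventFrame):
    date = Series(str)
    code = Series(str)

    def is_recent(self) -> bool:
        # A table-specific helper method.
        return True


sorted_events = event().sort_by("date")
# Works at runtime, but a static checker only sees EventFrame here, which is
# the sort_by limitation described above.
first = sorted_events.get_first_for_patient()
```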
A lot of this has now been done in #2337. Here are the outstanding things:

A couple of other minor things that I've spotted:
Another thing. When you define a

```py
dataset.sex = patients.sex
dataset.
```

UPDATE
The dream of having a fully statically typed query language almost certainly isn't possible due to the limitations of Python's type system and the complexities of the query model. But we should still try to make the editing experience as helpful as possible.
Our target here should be VS Code with the default Python extension installed. That's what many users have locally, but it's also what's provided in online IDEs like Gitpod and GitHub Codespaces.
I think it's acceptable for the autocomplete behaviour here to be a bit "optimistic", in that it completes method names that probably exist on the object in question, even if it turns out that they don't always. At least, I think that's a lot better than not showing anything because we can't be certain exactly what methods exist, especially if we combine that with decent error reporting as in #505.
I don't know exactly how we'd achieve that though. Have some type which has all the methods and then use that as the type signature on everything?
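One possible reading of "some type which has all the methods", sketched under our own assumptions: an annotation-only class (`AnySeries`, hypothetical) declares the union of methods any series might have, public functions are annotated as returning it so the editor always offers those methods, and the precise runtime class turns an unsupported method into a readable error. `IntPatientSeries`, `age()` and the method names are illustrative only, and the error message is not what #505 proposes.

```python
from __future__ import annotations


class AnySeries:
    """Hypothetical annotation-only type declaring every method that *some*
    series might have, so the editor always has something to offer."""

    def __add__(self, other: object) -> AnySeries:
        raise NotImplementedError

    def is_null(self) -> AnySeries:
        raise NotImplementedError

    def count_for_patient(self) -> AnySeries:
        raise NotImplementedError


class IntPatientSeries:
    """Hypothetical runtime class implementing only the methods that apply."""

    def __add__(self, other: object) -> IntPatientSeries:
        return IntPatientSeries()

    def is_null(self) -> IntPatientSeries:
        return IntPatientSeries()

    def __getattr__(self, name: str) -> object:
        # Only reached for attributes that don't exist: turn an optimistic
        # completion into a readable error instead of a bare AttributeError.
        raise AttributeError(
            f"{name!r} is not available on a patient-level integer series"
        )


def age() -> AnySeries:
    # The runtime object is the precise class; the annotation drives completion.
    return IntPatientSeries()  # type: ignore[return-value]


s = age()
```

The trade-off is the one described above: completion lists are sometimes too generous, so the runtime error message carries more of the weight.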
It would also be really good to get column name completion if that's at all possible.