-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate the KJ event loop into Python's asyncio event loop #310
Conversation
Very cool to see this, I haven't looked that closely but from the description this sounds like the right way to solve this problem. It's nice to see it is finally happening. I'm pretty overloaded so will have to mostly leave this review to @haata but let me know if there are any specific questions I can help answer.
The license is BSD 2-clause, which does not require permission: https://github.com/capnproto/node-capnp/blob/node14/LICENSE I am not a lawyer and this is not legal advice, but what I typically do is put a comment in the code saying something like "derived from X, which has the following license:" then reproduce the copyright and license file. |
Thanks @kentonv. The licensing is a bit tricky, because the file this is taken from lists MIT at the top, while the repository license specifies BSD: https://github.com/capnproto/node-capnp/blob/node10/src/node-capnp/capnp.cc |
Oh interesting, apparently I changed the license in: capnproto/node-capnp@9b3e84b But I forgot to update the LICENSE file. The two licenses are functionally equivalent. I'm happy with you citing either one of them. |
Nice! I'll try to take some time this week to play around with the code as well.
|
Okay, everything is basically functional now. The main problem I'm trying to deal with now is the clean shutdown of the event loop. This leads to some problems in clients running this interface: The client connects to a server and registers a @kentonv could you suggest how to shut that down gracefully? I was looking at |
3ae9f25
to
f2c7473
Compare
All tests now pass on linux and macos. Windows still fails due to usage of unix-specific apis. But otherwise, this is probably ready for a round of review. |
Yeah, it looks like something like this for Windows https://stackoverflow.com/questions/20543940/where-do-i-get-arpa-inet-h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave it a good read over and I think it looks pretty good overall. I really like how it makes some of the examples a lot simpler.
I want to try using this with my existing pycapnp library to see how it performs. Fingers crossed I should be able to do this by the end of the week.
capnp/lib/capnp.pyx
Outdated
|
||
def __dealloc__(self): | ||
del self.thisptr | ||
|
||
cpdef _connect(self, host_string): | ||
# TODO: Make async |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can KJ help with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #311 for my proposal w.r.t. file descriptors. This would solve async connection issue.
examples/async_ssl_server.py
Outdated
@@ -37,7 +37,7 @@ def alive(self, **kwargs): | |||
|
|||
class Server: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it would be worthwhile to have an alternative api to use the internal KJ TLS as well?
It might be a bit simpler of an API (depending on how much work it is to use). I don't think it's necessary to do as part of this PR unless you think it's easy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #311 for my proposal w.r.t. file descriptors. This would solve the TLS issue.
a0cc27b
to
fafa8e7
Compare
@kentonv, I'm still having trouble cleanly shutting down the event loop. I cannot control when Python's event loop shuts down, and it makes sense to me to shut down the kj event loop as well at that point. Even if some promises are still running. Otherwise, errors will occur. The problem here, is that after shutting down the loop, references to objects like Is it just plain impossible to shut down the event loop while objects that are using the loop still exist? |
Yes, this leads to dangling pointers and use-after-free. I haven't looked at your code so this may or may not make sense, but I imagine the solution would be something like:
|
@kentonv that makes sense. However:
What is the best way to do this? As far as I can see, neither the EventPort nor the EventLoop can wholesale cancel all promises. Should I create a giant |
Sorry, to clarify, by "all current and future promises" I specifically meant promises created by the EventPort itself to represent fundamental events. Promises created by the app need not be canceled explicitly, as long as the EventLoop still exists. |
@kentonv I just tried my best interpretation of your approach:
I'm not exactly sure what you mean by 'fundamental events'. Are you talking about events in the Python loop? Those have already been cancelled during the closing of the event loop... Basically, what I need to decide is how to respond to calls of
|
I mean the promises that the EventPort itself returns. As opposed to other promises created by the app derived from those. My point here is you do not need to cancel and destroy every existing promise, you just need everything which is specifically waiting on an event from the EventPort to receive an exception. Presumably, once the Python event loop is gone, then the KJ event loop can only continue until it is empty. Once it's empty, then there's no way for it to become non-empty again, because no further events can ever be delivered... I think? I guess if Python code is still running it can still call pycapnp in a way that schedules new events, but maybe the answer is to have all the Python functions throw Python exceptions if the event loop no longer exists?
No, destructors never wait for the loop. They might queue an event, but they expect that event to run later, they don't wait for it. |
Unfortunately, I still don't understand what that means. As far as I can tell, the
I think that if (1) the Python event loop is gone and (2) the KJ event loop is non-empty, the KJ event loop will never become empty again in a non-exceptional way. Once Python's loop is gone, I have no way of scheduling events through
Yes, that is my plan.
Thanks, that's good to know. |
OK I see the confusion. Your code here is a bit different, the AsyncIoProvider implementation uses the Python APIs directly to listen for events, so the Anyway, what I meant was, for example, the promise OTOH, you do NOT need to wrap the promise returned by, say,
I see, for some reason I was imagining that the python event loop would be shut down as a result of a call made by the app during a KJ event, i.e. the KJ event loop is currently running and will continue to empty its queue. But I realize now I have no idea when the python event loop shuts down. In fact it sounds to me like something that shouldn't really happen except at program exit? Anyway, I guess I don't know the right answer in that case. |
Okay, yes, I get it now. That makes a lot of sense. Thanks.
For the record: I'm not very happy with Python's API, because its cross-platform support is somewhat lacking. I'd be happier to reuse some KJ API that lets me poll for events. But as far as I can see, the code for that is not publicly exposed. And I'm not super keen on copying all that code over and maintaining it separately. Do you have a better solution for that?
Python has a |
This is hard because at the bottom of the stack, kj::UnixEventPort is waiting for all I/O events using a single unified poll() loop (or using epoll on Linux, kqueue on BSD, etc.), and you can't really have two poll() loops running in the same thread. The Python asyncio system presumably has its own poll loop, so you can't use UnixEventPort at the same time. I tried to design the KJ APIs to allow building on top of some other system's event loop. This is exactly what you did in this PR. It's possible there's a different level of abstraction that would have worked better here, like maybe if UnixEventPort's API were abstract, you could implement that directly on top of the Python event loop, and then you'd be able to reuse the AsyncIoProvider implementation that sits on UnixEventPort. I'm not really sure, though, because UnixEventPort is very much tied to file descriptors and unix-isms -- it makes no sense on Windows. But I'd expect many of these other event loops are designed to be somewhat cross-platform. AsyncIoProvider is the only abstraction I've come up with which avoids being platform-specific. |
Okay, fair enough. I guess we'll just have to live with Python's API here. And if we ever do #311, then the situation will improve because Python's higher level API's are more cross-platform. Thanks! |
Fix capnproto#256 This PR attempts to remove the slow and expensive polling behavior for asyncio in favor of proper linking of the KJ event loop to the asyncio event loop.
It was originally copied from the nodejs implementation, which in turn copied from async-io-unix.c++. But that copy is pretty old.
beb5611
to
5a522bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the change from a_wait() to await(). We'll probably need to keep a_wait() around for a bit, but I think it would be good to deprecate it over a release or two.
@@ -317,7 +306,7 @@ ctypedef fused PromiseTypes: | |||
_Promise | |||
_RemotePromise | |||
_VoidPromise | |||
PromiseFulfillerPair | |||
# PromiseFulfillerPair |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the PromiseFulfillerPair does not at all support the same operations as the other classes in this fused type. So I had to remove it. (But I don't believe anything is lost by that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, please add a comment indicating this, or just remove this line.
raise KjException('Function passed to `then` call must take no arguments') | ||
Will still work with non-asyncio socket communication, but requires async handling of the function call. | ||
""" | ||
# TODO: Is keeping a separate _VoidPromise class really worth it? Does it make things faster? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I unfortunately, don't know the answer to this question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm keeping it like it is for now. But in the future, someone might look into refactoring/simplifying the class hierarchy of Promises more.
@kentonv, sorry for all the questions, but my systems programming skills are really being pushed to the limit here, especially on windows. I'm now trying to port to windows, where my strategy is basically to copy large parts of Question: |
Thinking about the cross-platform issue a bit more: I'm starting to think that we should just double down on my proposal in #311 and entirely get rid of the use of raw file descriptors when the asyncio loop is active. That way we can entirely circumvent all this rather tricky, low-level system programming. It will probably be a little slower, but I think that if we implement it cleverly using buffered streaming protocols the performance should be acceptable. That will allow us to get rid of all the new code in |
No, you can't just use a So, you will need to figure out how to wait on Win32 handles using the Python event loop APIs. Yeah, it does sound like #311 might be part of the answer here. |
This was something that I also had difficulty with, the file descriptors just didn't work well when interacting with Python. Working with something Python supports directly (cross-platform) would make a lot of this easier to think about/debug for a python developer. |
All asyncio now goes through _AsyncIoStream
I have now implemented This PR is now fully cross-platform. As such, I believe that this is now a candidate for merging. This PR introduces a number of backwards-incompatible changes:
As such, I would propose that after this is merged, a |
@haata Gentle reminder that this is ready for final review and merging. |
Thanks for the reminder (somehow I'll find some time to do this by the end of the weekend). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for taking so long, I think this looks great!
} | ||
|
||
kj::Promise<void> PyAsyncIoStream::whenWriteDisconnected() { | ||
// TODO: Possibly connect this to protocol.connection_lost? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you notice any issues doing this? Seems like it's supported in 3.7.
https://docs.python.org/3.7/library/asyncio-protocol.html#asyncio.BaseProtocol.connection_lost
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I believe that the semantics of connection_lost
are not exactly the same as the semantics of whenWriteDisconnected
, but @kentonv might be able to comment more accurately.
In any case, as I understand it, this function is basically an optimization, and there is no requirement for the the promise to ever return. So it wasn't a priority for me.
@@ -317,7 +306,7 @@ ctypedef fused PromiseTypes: | |||
_Promise | |||
_RemotePromise | |||
_VoidPromise | |||
PromiseFulfillerPair | |||
# PromiseFulfillerPair |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, please add a comment indicating this, or just remove this line.
@LasseBlaauwbroek can you summarize a changelog entry for me? You don't need to add it to the file (I can do that later), just add it as a comment here. After that I'll merge it in. |
# started. | ||
_C_DEFAULT_EVENT_LOOP_LOCAL.loop = _weakref.ref(kjloop) | ||
loop.close = oldclose() | ||
return oldclose() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be causing an issue in my local tests. Should it not be return oldClose
?
Just an example of what the error I'm seeing is:
@pytest.mark.asyncio
async def test_connect_async():
nr_client = tsnbox_ipc.NodeRouter._new_client(NodeRouter())
The library pytest_asyncio
does a bit of cleanup of event loops after each test,
def _close_event_loop() -> None:
policy = asyncio.get_event_loop_policy()
try:
loop = policy.get_event_loop()
except RuntimeError:
loop = None
if loop is not None:
if not loop.is_closed():
warnings.warn(
_UNCLOSED_EVENT_LOOP_WARNING % loop,
DeprecationWarning,
)
loop.close() <<<<<<<<<<<<<<<<<<
````
`loop.close()` fails with Nonetype object is not callable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Good eye.
I already have this fix as part of another PR. To be opened soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usage of this further down is also confusing, since we re-assign loop.close to the result of this function.
I understand a bit better now, seeing the changes in cull-sync
Fix #256
This PR attempts to remove the slow and expensive polling behavior for asyncio in favor of proper linking of the KJ event loop to the asyncio event loop.
This is a highly experimental proof-of-concept. There are probably memory leaks all over the place; my knowledge of both the C++ side and the Cython side is limited. However, early feedback from @kentonv, @haata, and others would be highly appreciated. Also, don't hesitate to push any and all improvement to this branch (or another branch).
Current state: The example in
examples/async_client.py
is functional, but its counterpartexamples/async_server.py
is not because timers are not yet implemented. So to run this, the server needs to be started with a version of pycapnp without these changes.Most of the code in
capnp/helpers/asyncProvider.cpp
was taken from @kentonv's nodejs implementation. So I'm hereby asking for permission to copy that. If that's okay, let me know what kind of attribution you might want.