ZEO.asyncio: switch to async/await style #195

Closed
wants to merge 87 commits into from

d-maurer
Contributor

@d-maurer d-maurer commented Apr 2, 2022

This PR tries to make the ZEO client interface implementation easier to understand; likely, the result is slightly less efficient than the current implementation.

The PR switches to standard async/await style for the asyncio part of the client interface implementation. Therefore, it must drop Python 2 support.
It uses throughout standard asyncio Futures (with scheduled callbacks) instead of special futures (with immediate callbacks) at some places.
It significantly extends (and at some places corrects) the source code documentation.
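
As a rough illustration of the style change described above (not code taken from the PR), the following sketch contrasts an explicit done-callback with the equivalent `async`/`await` form. The names `fetch_value`, `callback_style` and `await_style` are hypothetical and exist only for this example.

```python
import asyncio


async def fetch_value():
    # Hypothetical stand-in for an I/O operation; not part of ZEO.
    await asyncio.sleep(0)
    return 42


def callback_style(on_done):
    # Explicit-callback style: the continuation is registered on the future.
    fut = asyncio.ensure_future(fetch_value())
    fut.add_done_callback(lambda f: on_done(f.result() + 1))
    return fut


async def await_style():
    # async/await style: the continuation is simply the next expression.
    return await fetch_value() + 1


async def main():
    results = []
    await callback_style(results.append)
    await asyncio.sleep(0)  # give the done-callback a chance to run
    results.append(await await_style())
    return results


style_results = asyncio.run(main())
print(style_results)
```

Both variants compute the same value; the difference is only in where the continuation lives, which is the readability argument made in the PR description.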

@navytux The PR drops "credentials" support. Maybe, this is a mistake. Two aspects made me drop it:

  1. the parameters have been in a section with heading "mostly ignored"
  2. a comment stated that ZEO 5 dropped "credentials" in favor of SSL.

Thus, I definitely ignored the parameters. Only later did I notice that they might still be useful for the use case "ZEO 6 client + ZEO 4 server". If we want continued support for this use case, the "credentials" support can easily be restored.

The PR removes the testing dependencies on mock and Random2. Random2 was there to get the same randomization in Python 2 and Python 3. With Python 2 support dropped, Random2 is no longer necessary. However, replacing it with Python 3's random caused a huge diff for test_cache.

Note to reviewers: due to significant changes it is likely easier to look at the files directly rather than at the diffs.

Member

@dataflake dataflake left a comment

You mention a slight loss of efficiency. It may be worthwhile to see if you can use zodbshootout to get a better picture: https://pypi.org/project/zodbshootout/

@d-maurer
Contributor Author

d-maurer commented Apr 8, 2022

Failing uvloop tests

Locally, the uvloop tests succeed. Therefore, I assume a significant setup difference.

According to the uvloop documentation, uvloop delegates to the C library libuv. I know that uvloop does something even when libuv is not installed (but was unable to determine what that is). Nevertheless, I assume that the use of uvloop is meaningful only if libuv is installed. In my local environment, I have installed libuv1-dev 1.18.0-3.

The information about the GHA setup does not indicate that libuv is installed.
@icemac , @dataflake Do you have a possibility to check whether libuv is available for the uvloop checks - and if so in what version?

@dataflake
Member

A description of installed packages for those GHA runners is at https://github.com/actions/virtual-environments/blob/main/images/linux/Ubuntu2004-Readme.md. I don't see libuv on there.

You can install additional OS packages from the GHA tests configuration. Edit .github/workflows/tests.yml and add the following below pip install tox in the Install dependencies step: sudo apt-get update && sudo apt-get install -y libuv libuv-dev. If you look at the console output for the test run you should see that installation step and output about the version installed.

Anecdotal evidence from my own testing machine: I usually just run bin/tox, which never includes the uvloop tests. So I checked first if the uvloop library is installed at all - it wasn't. Running bin/tox -pall -epy36-uvloop -epy37-uvloop -epy38-uvloop -epy39-uvloop showed failures on Python 3.7 and 3.8. Then I manually installed the libraries, but I am still seeing (other) failures on Python 3.7 and 3.8. So having the libraries installed doesn't make the tests succeed on my own machine.

@dataflake
Member

dataflake commented Apr 8, 2022

P.S.: The test failures on both test runs (without the uvloop libraries first and then with them installed) are preceded by nearly identical tracebacks, which look just like the output from GHA:

Traceback (most recent call last):
  File "/opt/src/ZEO/.tox/py38-uvloop/lib/python3.8/site-packages/ZEO/tests/testZEO.py", line 333, in do_store
    store.store(oid, revid, b'x', '', t)
  File "/opt/src/ZEO/.tox/py38-uvloop/lib/python3.8/site-packages/ZEO/ClientStorage.py", line 602, in store
    tbuf = self._check_trans(txn, 'store')
  File "/opt/src/ZEO/.tox/py38-uvloop/lib/python3.8/site-packages/ZEO/ClientStorage.py", line 523, in _check_trans
    raise ClientDisconnected(meth, 'on a disconnected transaction')
ZEO.Exceptions.ClientDisconnected: ('store', 'on a disconnected transaction')

@d-maurer
Contributor Author

d-maurer commented Apr 8, 2022 via email

@dataflake
Member

Exactly, after looking at the code and noticing that the loop.cpython-3x-x86_64-linux-gnu.so file does not link to the uv library itself I thought the Python wheel must be bundling at least part of the uv library.

@navytux
Contributor

navytux commented Apr 8, 2022

@d-maurer, personally I do not have problems with dropping credentials support.
I also believe that now, when ZEO5 is established for years and relatively
stabilized, it is ok to drop support for "ZEO5 client wrt ZEO4 server". I might
be wrong, but I'm almost 100% sure that no one is using such ZEO5+ZEO4
configuration nowadays.

However, this pull request is a very large 26000-line diff doing many things at
once, and I strongly suggest splitting it into shorter, logically separate steps.
It took us ~7 years to stabilize ZEO5 and it would be a huge pity to have it
destabilized again. You probably understand that, but as a reminder, just in case:
here we are dealing with the kind of software where concurrency
mistakes turn into data corruption.

I have not looked closely yet, but please find some feedback below anyway. I hope
it might be useful.

It uses throughout standard asyncio Futures (with scheduled callbacks)
instead of special futures (with immediate callbacks) at some places.

(emphasis mine)

This change is a big source of potential concurrency problems in my view. The
"call immediately rather than soon" type of future that ZEO5 uses has an impact not
only on performance, but also on semantics. In particular, with an immediate call it
is guaranteed that if e.g. sending a message is complete, the callback is
invoked immediately and another, semantically related message could be sent
right after the first one. With switching to "invoke future callbacks
eventually" this property is lost. I see you already observed some problems
related to this change in
#193 (comment) and
I'm sure there are many places in the current ZEO implementation that
rely, without documenting it, on the "invoke immediately" Fut behaviour.

This change also significantly affects performance because the latency of many
operations will grow. There is a big difference between being notified right
after an operation completes and being notified after an operation
completes and a full event loop cycle is over.

Thus, I suggest not to make this change, or do it very carefully alone in a
step separated from everything else.
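
To make the ordering difference concrete, here is a minimal sketch, assuming a toy `ImmediateFuture` class written only for this illustration (it is not ZEO's actual future implementation). It records when a done-callback runs relative to `set_result()` for the toy future and for a standard asyncio future.

```python
import asyncio


class ImmediateFuture:
    """Hypothetical minimal future that runs callbacks inside set_result()."""

    def __init__(self):
        self._callbacks = []
        self._result = None
        self._done = False

    def add_done_callback(self, cb):
        if self._done:
            cb(self)
        else:
            self._callbacks.append(cb)

    def set_result(self, result):
        self._result = result
        self._done = True
        for cb in self._callbacks:
            cb(self)  # invoked synchronously, before set_result() returns

    def result(self):
        return self._result


async def compare():
    events = []

    imm = ImmediateFuture()
    imm.add_done_callback(lambda f: events.append("immediate callback"))
    imm.set_result(1)
    events.append("after immediate set_result")

    std = asyncio.get_running_loop().create_future()
    std.add_done_callback(lambda f: events.append("scheduled callback"))
    std.set_result(1)  # callback is only scheduled on the loop here
    events.append("after standard set_result")

    await asyncio.sleep(0)  # yield to the loop so scheduled callbacks run
    return events


events = asyncio.run(compare())
print(events)
```

With the toy future the callback has already run when `set_result()` returns; with the standard future it runs only once control goes back to the event loop, which is the window in which other code can interleave.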


Second. The drop of Python2 support.

I understand it is 2022, but still, even after it was deprecated, Python2 is
being used. For example we actually do currently use Python2 where I work,
because porting to Python3 takes a lot of effort and is slow to happen. I hope
it will eventually happen, but I do not see it coming in the near future.

PyPy is also not dropping Python2 support.

Dropping Python2 support for ZODB was already discussed several times some time ago, e.g. in

zopefoundation/ZODB#312
zopefoundation/ZODB#303
https://github.com/orgs/zopefoundation/teams/developers/discussions/2

In my personal view, dropping this support brings only cosmetic benefit and is
outweighed by the fact that people who still use Python2 are being cut off.

Kirill

P.S. it is easy to extract small commits from a big change with git checkout -p.

/cc @perrinjerome, @arnaud-fontaine, @mgedmin, @jamadden, @dataflake

@dataflake
Member

However, this pull request is a very large 26000-line diff doing many things at once, and I strongly suggest splitting it into shorter, logically separate steps. It took us ~7 years to stabilize ZEO5 and it would be a huge pity to have it destabilized again. You probably understand that, but as a reminder, just in case: here we are dealing with the kind of software where concurrency mistakes turn into data corruption.

The file src/ZEO/tests/test_cache.py is about 23000 lines; that is just test data that changed.

Second. The drop of Python2 support.

Please keep in mind:

  • no one is forced to upgrade to ZEO 6. Everyone who wants to stick with Python 2 can stay with ZEO 5.
  • Python's own release cadence has sped up a lot. With each new Python release it becomes harder to also support Python 2.7.
  • I hear the argument "people are still using Python 2.7" a lot, but it's always very unspecific. Do YOU know specific users that use ZEO on Python 2.7 right now and who for some reason cannot move? And if they cannot move, wouldn't they be served well just sticking with ZEO 5?
  • every time someone makes a demand "please keep support for totally outdated feature XXX" they make demands on volunteers who spend their spare time on a project. If it is really that important those people should be willing to spend their own time as well.

You keep bringing up worries about stability. We're trying our best with very limited resources to move the project forward while keeping existing users in mind. Anything that has the potential to cause disruption will lead to a major release and anyone who finds they cannot use that new major release can stick with an older version.

@d-maurer
Contributor Author

d-maurer commented Apr 9, 2022 via email

@dataflake
Member

... In my personal view, dropping this support brings only cosmetic benefit and is outweighed by the fact that people who still use Python2 are being cut off.

@dataflake reported recently that it becomes more and more difficult to feed Zope 4 with current versions of auxiliary packages - sometimes necessary to fix security problems - because they have dropped Python 2 support. A security fix for waitress (used as client facing IO component by Zope), available only for Python 3.7+ has caused this report. He suggested that, as a consequence, Python 2 support might need to get dropped in the not so far future. I remember that @dataflake has been one of those developers who advocated not to drop Python 2 support without good reason.

That is correct, I always asked to keep Python 2 support unless the extra work becomes too much of a burden. My main reason as a Zope release manager was Zope 4, which is supported on Python 2.7 and is still an officially supported Zope release.

In the meantime, many third party dependencies we have no control over dropped Python 2 support left and right. Zope 4 is now stuck on outdated and unsupported dependencies, and already one of them is an unfixed security issue (the waitress problem Dieter mentions). So Zope 4 itself has become "unsupportable" in a sense. All we can do is put a big warning label on it and ask people to either move their Zope 4 deployment to Python 3.7, or move to Zope 5.

As a result, my opposition to dropping Python 2 support is now gone. There is no good reason to spend more time on a platform that we simply cannot support anymore due to reasons out of our own control.

I do not know whether the reasons behind my PR are good enough to justify dropping Python 2 support. I understand much better how the implementation works with the "async/await" style introduced by the PR than the former future_generator. Not sure whether others agree and whether they think it is worth to drop Python 2 and accept potential instabilities.

My opinion: Only keep Python 2 support if you can reach the technical goal of your changes without bending over backwards to keep it compatible with Python 2. Users who need Python 2 or who are afraid of the changes have a viable workaround, they can stay on ZEO 5.
I don't have an opinion about the potential instability because I am not qualified to make a judgment call here.

navytux pushed a commit that referenced this pull request Jan 8, 2023
--------
kirr:

Extracted from #195
navytux pushed a commit that referenced this pull request Jan 8, 2023
…to ensure that we do not store the same information twice into the cache; this may or may not be important)

--------
kirr:

	for discussion

Extracted from #195
navytux pushed a commit that referenced this pull request Jan 8, 2023
--------
kirr:

	for discussion

	originally support for credentials was added in dbb066d2
	(#53)

Extracted from #195
@navytux navytux deleted the branch master January 8, 2023 18:29
@navytux navytux closed this Jan 8, 2023
@navytux
Contributor

navytux commented Jan 8, 2023

navytux closed this 7 minutes ago

I'm sorry. I did not close this. I just removed the y/py3 branch after superseding #200 with #218, and GitHub automatically closed this PR. I will try to reopen it.

@navytux navytux reopened this Jan 8, 2023
@navytux
Contributor

navytux commented Jan 8, 2023

Hello @d-maurer,

I have finished reviewing your work.

My review takes the form of your original changes in adjusted form, divided into two parts:

  1. the first part picks up most of your work, including all the bugfixes,
    improvements and the switch to async/await-like coroutines. However, I picked it up in
    a backward-compatible manner so that everything continues to work on
    both Python2 and Python3. Along the way I also fixed mistakes in your
    original changes here and there as I noticed them during review. And
    sometimes I added more functionality to support the work. For example, I
    fully implemented cancellation for our Task, Future and CoroutineExecutor
    because without cancellation support things were not working properly on my side.
    And I switched everything to our own Task and Future so that they are more
    widely tested through more uniform usage.

    The whole change is split into logical patches each of which tries to do one
    logical step at a time and do it well. For every patch I tried to preserve
    correct authorship and include corresponding notices and references into the
    patch description. I believe this way it makes it easier for people to both
    a) understand the changes now, and b) understand the changes later when
    studying the project via e.g. git gui blame.

    This part is represented by the knext branch and pull request #217 (asyncio.client improvements).

  2. the second part drops Python2 support and switches from async/await-style
    coroutines implemented with @coroutine/yield syntax to
    coroutines implemented with async/await syntax directly. Due to the way
    part 1 is organized, this change is small. It is represented by the
    7f574ecd patch and the further
    changes that drop py2 support elsewhere in the knext+py3 branch and
    pull request #218 (ZEO.asyncio: switch to async/await style).

    In #200 (comment) (Drop Python2 support (WIP: do not merge before #208 or #195)) you
    said that "dropping Python 2 support gives no gain by itself. Only if
    gainful things (I think of #195) require dropping Python 2 support, I would
    merge #200."
    So from this point of view, if we are ok with keeping our
    async/await-style coroutines implemented with the similar
    @coroutine/yield syntax, there is no real need to merge this and to drop
    py2 support. Still, if people believe that continuing py2 support adds
    maintenance burden, then, as I said in
    #195 (comment), I'm
    ok with spawning a ZEO 5 branch after part "1" and I'm volunteering to
    maintain Python2 support for the ZODB stack myself.

That's it.

If you are interested in checking the overall difference between your work and
mine, here is the link for δ(asyncio3.5+, knext+py3): https://github.com/zopefoundation/ZEO/compare/asyncio3.5++ksyncmaster..knext+kpy3
(list of patches) (*).

The full list of changes is:

  • report unconsumed exceptions from Future and Task instead of hiding them silently
  • add proper cancellation support to Future/AsyncTask/ConcurrentTask/CoroutineExecutor
  • improve tests for CoroutineExecutor
  • fix ConcurrentFuture.wait regarding timeout handling
  • ClientIo -> ClientIO
  • No NoBlobFileError, as explained in #203 (comment) (*: Introduce and use NoBlobFileError for situations when blob file is not found)
  • use higher timeouts in tests to avoid failures
  • use consistent naming in smp.py (e.g. it for iterator everywhere, not only in some places)
  • fix _smp.pyx raising "Processing message" exception
  • fix race in between IO loop startup and ClientThread.close
  • retain reporting about cancellation in connect()
  • use our Task instead of asyncio.Task for ClientIO._connecting and .verify_task
  • retain removed comments that remain relevant
  • retain original "disconnected" messages in await_operational
  • retain test_call_async
  • fix thinko related to future_mode in test_close_with_running_loop
    and test_close_with_stopped_loop. Probably there are more similar places -
    those two were just what I caught by hitting test failures.
  • replace "this should not happen" comments with explanations of how it can happen for real
  • undo addition of process_async() - this function is used only once and by the code that comes right above it
  • follow original master code structure more closely
  • do not introduce intermediate regressions
  • remove (object) from class X(object) everywhere
  • use super() instead of super(X, self) everywhere
  • fix typos, small cosmetics
  • probably something more I forgot...

Please see the diff and individual patches for details.

For completeness here are the other relevant deltas:

And there are a few things I did not pick up:

a) moving cache access out of ClientIO into Protocol (part of e370854d).

This change is questionable to me. In my understanding it is the invariant
of the current system that Protocol.load_before is run only when there is
no corresponding cache entry. In other words instead of moving the cache
access it should be ok to add an assert into Protocol.load_before that
corresponding cache entry is not there.

This change is self-contained and orthogonal to the rest. I've created #219 to discuss it further.

b) removal of def test_suite() in src/ZEO/asyncio/test.py.

Those test_suite() functions are there to support python setup.py test.
I agree this way of running tests is not used nowadays, but if we are to drop
it, we should do it similarly to
zopefoundation/ZODB@3577a9c4 - i.e. dropping
support for python setup.py test completely instead of leaving it broken.

I can do the corresponding PR if we decide that dropping python setup.py test is ok.

c) timeout=None vs timeout=absent handling in tpc_abort and
server_status in ClientStorage. Maybe I'm missing something and those
changes are needed for some reason. But offhand I did not see the need for them.

d) removal of credentials support. Credentials are different from
user/password passed directly to ClientStorage. They were added as an experimental feature in 2016 in
dbb066d2
(#53). I wanted to be confident
with removing this, but was in a hurry to finish the review. I created
#220 to discuss credentials removal.
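
For readers unfamiliar with the distinction raised in item c), the sketch below shows the general "absent argument" sentinel pattern, in which `timeout=None` and "no timeout given" are treated differently. It is only an illustration of the idea: `_ABSENT`, `DEFAULT_TIMEOUT` and `resolve_timeout` are hypothetical names, and nothing here is claimed to match how ClientStorage actually handles its timeouts.

```python
_ABSENT = object()  # hypothetical sentinel meaning "caller passed no timeout"

DEFAULT_TIMEOUT = 30.0  # hypothetical default, in seconds


def resolve_timeout(timeout=_ABSENT):
    """Return the effective timeout; None means "wait without limit"."""
    if timeout is _ABSENT:
        # Nothing was passed: fall back to the configured default.
        return DEFAULT_TIMEOUT
    # An explicit value, including None, is taken as given.
    return timeout


print(resolve_timeout(), resolve_timeout(None), resolve_timeout(5))
```

The sentinel is needed only when `None` itself is a meaningful value; if `None` and "absent" should behave the same, a plain `timeout=None` default is enough.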

Hope it helps.

And last but not least, I want to say thank you for being open, listening to my
arguments and amending your original work correspondingly. And for your
patience while my review was slow to come. If I may,
I would suggest that next time we move in smaller steps. It certainly helps to
keep the ultimate goal as the target. But it also helps if the whole change for
that ultimate goal is reached in smaller steps, each of which is easier to
perceive and review.

Kirill

(*) for the comparison I've merged asyncio3.5+ with master myself. This resulted in 751933ea.

@dataflake
Member

b) removal of def test_suite() in src/ZEO/asyncio/test.py.

Those test_suite() functions are there to support python setup.py test.
I agree this way of running tests is not used nowadays, but if we are to drop
it, we should do it similarly to
zopefoundation/ZODB@3577a9c4 - i.e. dropping
support for python setup.py test completely instead of leaving it broken.

I can do the corresponding PR if we decide that dropping python setup.py test is ok.

Yes, removal of all code that exists just to support running python setup.py test is OK and should be done. We have been removing it from other projects wherever we see it.

navytux pushed a commit that referenced this pull request Jan 10, 2023
kirr:

Similarly to ZODB `python setup.py test` was not working properly for
some time already. For example running it today executes only 6 tests:

    (py3.venv) kirr@deca:~/src/wendelin/z/ZEO$ python setup.py test
    running test
    WARNING: Testing via this command is deprecated and will be removed in a future version. Users looking for a generic test entry point independent of test runner are encouraged to use tox.
    ...
    running build_ext
    test_ZEO_DB_convenience_error (ZEO.tests.testZEO.Test_convenience_functions) ... ok
    test_ZEO_DB_convenience_ok (ZEO.tests.testZEO.Test_convenience_functions) ... ok
    test_ZEO_client_convenience (ZEO.tests.testZEO.Test_convenience_functions) ... ok
    test_ZEO_connection_convenience_ok (ZEO.tests.testZEO.Test_convenience_functions) ... ok
    test_ZEO_connection_convenience_value (ZEO.tests.testZEO.Test_convenience_functions) ... ok
    test_work_with_multiprocessing (ZEO.tests.testZEO.MultiprocessingTests)
    Client storage should work with multi-processing. ... ok

    ----------------------------------------------------------------------
    Ran 6 tests in 0.070s

    OK

Also @d-maurer and @dataflake say this way of running tests is being
removed throughout all Zopefoundation projects:

#217 (comment)
#195 (comment)

-> Let it go.

Cherry-picked from
zopefoundation/ZODB@3577a9c4 and adjusted a
bit for ZEO.
@navytux
Contributor

navytux commented Jan 10, 2023

Thanks, Jens. I've tried to address this in 7c8888a6, 05cb4849 and 87fef6b7.

navytux added a commit that referenced this pull request Jan 18, 2023
First major part of Dieter's work from #195 :

Reimplement and streamline the ``asyncio`` part of the ``ClientStorage`` implementation:
   - switch from futures with explicit callbacks to `async/await`-like style
   - use standard ``asyncio`` features to implement timeouts
   - redesign the API of the class implementing the ZEO client protocol
   - significantly improve source documentation and tests
   - fix several concurrency bugs
   - add optional ``cython`` based optimization;
     it speeds up reads but slows down writes.
     To use it, install ``cython`` (and its dependencies) and
     run ``cythonize -i *.pyx`` in ``src/ZEO/asyncio``.

NOTE: this works on both Python2 and Python3.

/reviewed-by @d-maurer
/reviewed-on #217
@navytux navytux deleted the branch master January 18, 2023 14:48
@navytux navytux closed this Jan 18, 2023
@navytux
Contributor

navytux commented Jan 18, 2023

Oops, I've removed y/py3 again and this closed this PR again. @d-maurer do we need to keep this open given that #217 was merged and #218 is hopefully on its way too?

If you prefer to keep #195 open, I will push the y/py3 branch again.

@d-maurer
Contributor Author

d-maurer commented Jan 18, 2023 via email

@navytux
Contributor

navytux commented Jan 18, 2023

Dieter, thanks for the feedback. I see. For reference, the differences between #195 and #217+#218 are explained in #195 (comment).

@navytux navytux reopened this Jan 18, 2023
@navytux navytux changed the base branch from y/py3 to master January 18, 2023 15:25
@navytux
Contributor

navytux commented Jan 18, 2023

Anyway I've reopened this PR and retargeted it to master.

navytux added a commit that referenced this pull request Jan 19, 2023
@d-maurer notes at #218 (comment) :

    Why do we need this special __await__ logic? It is not present in #195 and nevertheless all uvloop tests passed.

and he is right - in 7f574ec I did not fully remove the py2
support code from CoroutineExecutor:

in py2 it was

	@coroutine
	def dothing1():
	    yield dothing2()

here the yield for dothing2 worked as follows:

- dothing2() is called - it returns a coroutine object
- yield yields that coroutine object, not the objects that the second coroutine
  object would yield by itself. This is the difference between yield
  and `yield from`.
- CoroutineExecutor noticed such a yield and handled `yield coro()`
  with the same semantics as if it were `yield from coro()` by manually
  creating another AsyncTask.

but on py3 await has the semantics of `yield from` and so we do not need to
keep all that `yield from` emulating code. 7f574ec dropped part of
that, but did not drop the special handling of `yield coro()` if coro is
a non-native custom coroutine function.

Let that code go as well.
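
The difference between `yield` and `yield from` described in this commit message can be reproduced with plain generators. The sketch below is a toy model only: `drive` is a hypothetical stand-in for what an executor would have to do to emulate `yield from`, not ZEO's CoroutineExecutor.

```python
import types


def dothing2():
    yield "step-a"
    yield "step-b"
    return "result-of-dothing2"


def with_plain_yield():
    # Yields the generator object itself; dothing2's body does not run
    # unless whoever drives this generator notices and drives it too.
    got = yield dothing2()
    return got


def with_yield_from():
    # Delegates: the driver sees dothing2's own yields directly.
    got = yield from dothing2()
    return got


def drive(gen):
    """Toy driver that treats a yielded generator as if it were `yield from`-ed."""
    to_send = None
    while True:
        try:
            item = gen.send(to_send)
        except StopIteration as stop:
            return stop.value
        if isinstance(item, types.GeneratorType):
            to_send = drive(item)  # emulate delegation by hand
        else:
            to_send = None


plain_items = list(with_plain_yield())
delegated_items = list(with_yield_from())
emulated_result = drive(with_plain_yield())
native_result = drive(with_yield_from())
print(plain_items, delegated_items, emulated_result, native_result)
```

With native delegation the special case inside `drive` is never taken, which is why the emulation code becomes unnecessary once `await` is used directly.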
navytux added a commit that referenced this pull request Jan 23, 2023
Second part of Dieter's work from #195 :

- Switch to using `async/await` directly instead of `@coroutine/yield`
- Drop support for Python 2.7, 3.5 and 3.6

/reviewed-by @d-maurer, @dataflake
/reviewed-on #218
navytux added a commit that referenced this pull request Jan 24, 2023
Drop experimental support for credentials object: the corresponding
``ClientStorage.__init__`` parameter ``credentials`` is retained but ignored.
From now on ZEO supports authentication only via SSL certificates.

Note that ZEO 5 never supported authenticating via ``username`` and
``password`` - support for such basic auth was dropped in 2016 before ZEO 5.0
was released.

See c7f2138 for details.

Extracted from #195

/reviewed-by @dataflake, @d-maurer
/reviewed-on #220
@d-maurer
Contributor Author

d-maurer commented Oct 3, 2024

The essential changes in the PR have been repackaged by @navytux (a requirement for his approval) and merged into master. Therefore, I close this PR.

@d-maurer d-maurer closed this Oct 3, 2024