mvccadapter: check if the last TID changed without invalidation #307

jmuchemb · 2020-04-01T15:46:56Z

Since commit b5895a5, a ZEO test fails as follows:

File "src/ZEO/tests/drop_cache_rather_than_verify.txt", line 114, in drop_cache_rather_than_verify.txt
Failed example:
    conn.root()[1].x
Expected:
    6
Got:
    1

Earlier in the test, the ZEO server is restarted and then another client commits. When disconnected, the first client does not receive invalidations anymore and the connection gets stuck in the past until there's a new commit after it reconnected. It was possible to make the test pass with the following patch:

--- a/src/ZEO/ClientStorage.py
+++ b/src/ZEO/ClientStorage.py
@@ -357,6 +357,7 @@ def notify_connected(self, conn, info):
 
         # invalidate our db cache
         if self._db is not None:
+            self._db.invalidate(self.lastTransaction(), ())
             self._db.invalidateCache()
 
         logger.info("%s %s to storage: %s",

Other implementations like NEO are probably affected the same way.

Rather than changing interfaces in a backward-incompatible way, this commit revert to the original behaviour, and all the changes that were done in existing tests are reverted.

jmuchemb · 2020-04-01T15:47:37Z

This is a follow-up of #291.

I submit this PR because it solves an issue that has to be fixed quickly.

However, this is extra code that's useless most of the time so I prefer we change the interface, which may require a major version bump.

jmuchemb · 2020-06-02T14:25:26Z

I'll merge on Friday.

jamadden · 2020-06-02T14:44:42Z

Can we get CI green, please?

jamadden

I think I understand what's going on here and I think this looks OK, at least in isolation, once tests go green.

It needs a good change note (I think there's already something about #290 in there, so maybe just an updated change note mentioning this PR?).

jmuchemb · 2020-06-02T18:00:45Z

Can we get CI green, please?

Oops. Now fixed, with:

--- a/src/ZODB/tests/testmvcc.py
+++ b/src/ZODB/tests/testmvcc.py
@@ -87,10 +87,10 @@
 
 >>> cn._storage._start == p64(u64(st.lastTransaction()) + 1)
 True
->>> cn.db()._mvcc_storage.invalidate(100, dict.fromkeys([1, 2]))
+>>> cn.db()._mvcc_storage.invalidate(p64(100), dict.fromkeys([1, 2]))
 >>> cn._storage._start == p64(u64(st.lastTransaction()) + 1)
 True
->>> cn.db()._mvcc_storage.invalidate(200, dict.fromkeys([1, 2]))
+>>> cn.db()._mvcc_storage.invalidate(p64(200), dict.fromkeys([1, 2]))
 >>> cn._storage._start == p64(u64(st.lastTransaction()) + 1)
 True

I also needed #310 to be merged.

jamadden · 2020-06-02T18:40:44Z

Awesome!

navytux · 2020-06-02T20:05:44Z

src/ZODB/mvccadapter.py

+        # so we still have to poll.
+        ltid = self._storage.lastTransaction()
+        # But at this precise moment, a transaction may be committed and
+        # we have already received the new tid, along with invalidations.


@jmuchemb, @jamadden isn't the above true only if storage, upon receiving invalidation messages from server, first notifies DB.invalidate() and only after updates zstor.last_tid? For the reference this is indeed true for ZEO4:

https://github.com/zopefoundation/ZEO/blob/47d3fbe8cbf24cad91b183483df069ef20708874/src/ZEO/ClientStorage.py#L1473-L1482

but not true for NEO:

https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124

If zstor.last_tid is updated first before zstor sends invalidations to DB (to MVCCAdapter here), what could happen is:

poll_invalidations calls zstor.lastTransaction -> gets just committed tid;

poll invalidations proceeds and could retrieve ._invalidations before zstor could actually update them with invalidation data corresponding to tid;

tid, and incomplete set of objects to invalidate is returned.

I.e. it will be a data corruption, because objects might remain stale in live cache with similar data corruption scenario as demonstrated in #290.

P.S. I could not quickly see which way ZEO5 implements. Maybe it is something different, or actually related to zopefoundation/ZEO#155.

P.P.S RelStorage is unaffected as it implements IMVCCStorage natively.

There's indeed a problem between this MR & NEO :(

For ZEO4, the 2 operations are wrapped around the same lock so the order does not matter.

For ZEO5, it's first DB.invalidate() then lastTransaction() updated.

So I have to answer 2 questions:

Is ZEO5 behaviour enough ?

If it is, can I do the same in NEO ?

zopefoundation/ZEO#155 is unrelated.

@jmuchemb, thanks for feedback.

Yes, I believe if we change DB.invalidate() call to always happen before zstor.last_tid update that would be enough. If we go this way this change also requires corresponding change in IStorage interface definiton. Something like below:

--- a/src/ZODB/interfaces.py +++ b/src/ZODB/interfaces.py @@ -685,6 +685,15 @@ def isReadOnly(): def lastTransaction(): """Return the id of the last committed transaction. + Returned tid is ID of last committed transaction as observed from + some time _before_ lastTransaction call returns. In particular for + client-sever case, lastTransaction can return cached view of storage + that was learned some time ago. + + It is guaranteed that for all IStorageWrappers, that wrap the storage, + invalidation notifications have been completed for transactions + with ID ≤ returned tid. + If no transactions have been committed, return a string of 8 null (0) characters. """

EDIT: corrected to "... some time before lastTransaction call ~~was made~~returns." due to @jmuchemb notice.

Indeed, I found no way to cause any bug if lastTransaction() is updated just after invalidations are notified. I feared issues because NEO also uses lastTransaction() internally: note however it must do it before releasing its cache lock, otherwise "load current" fails on ZODB<5.

Updating lastTransaction() at the right moment also matters for "internal" invalidations: I mean that it must be done just after calling the tpc_finish callback.

https://lab.nexedi.com/nexedi/neoppod/commit/5870c5d79f68dce00267f3ae9c4e12d56d19dede should make NEO work with this MR. It includes tests for the 4 cases.

@jmuchemb, thanks for feedback and adapting things on NEO side. I see about the need to update app.last_tid while still under app._cache_lock - as app._cache and app.last_tid need to be kept coherent on NEO side internally.

Updating lastTransaction() at the right moment also matters for "internal" invalidations: I mean that it must be done just after calling the tpc_finish callback.

Yes, thanks for pointing this out. There app.last_tid has to be also set while still under app._cache_lock and after call to app._cache.invalidate(). As you say this matters for "load current" on ZODB<5. I thus suggest the following amendment for ZODB4 version of proposed IStorage.lastTransaction interface change (ZODB4 specifics in bold):

def lastTransaction():
"""Return the id of the last committed transaction.

Returned tid is ID of last committed transaction as observed from
some time before lastTransaction call returns. In particular for
client-sever case, lastTransaction can return cached view of storage
that was learned some time ago.

It is guaranteed that for all IStorageWrappers, that wrap the storage,
invalidation notifications have been completed for transactions
with ID ≤ returned tid.

It is guaranteed that after lastTransaction returns, "current" view of
the storage as observed by load() is ≥ returned tid.

If no transactions have been committed, return a string of 8
null (0) characters.

I believe it is very important we don't just fixup NEO and merge this patch without fixing IStorage interface. There are application-level cases that rely on exactly the same guarantees that MVCCAdapter needs here and got wrong in the first version of this patch. Also it has to be clear for all other potential storages and their authors - just from reading the interfaces - which semantic must be provided.

I created the following pull-requests to adjust the IStorage interface:

#313
#314

/cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gidzit, @jimfulton

EDIT: corrected to "... some time before lastTransaction call ~~was made~~returns." due to @jmuchemb notice.

I believe it is very important we don't just fixup NEO and merge this patch without fixing IStorage interface.

Yes. First, I wanted to be sure this MR is correct.

as observed from some time before lastTransaction call was made.

Strictly speaking, it's wrong because the value that lastTransaction reads could be updated between the call to lastTransaction start and the instruction that actually reads the value in question.

I suggest the following change:

--- a/src/ZODB/interfaces.py +++ b/src/ZODB/interfaces.py @@ -685,6 +685,14 @@ def isReadOnly(): def lastTransaction(): """Return the id of the last committed transaction. + For proper MVCC operation, the return value is the id of the last + transaction for which invalidation notifications are completed. + + In particular for client-server implementations, lastTransaction + should return a cached value (rather than querying the server). + A preliminary call to sync() can be done to get the actual last + TID at the wanted time. + If no transactions have been committed, return a string of 8 null (0) characters. """

@jmuchemb, thanks for feedback.

Strictly speaking, it's wrong ...

You are right, thanks for noticing it. It was a thinko on my side that I copied from ZODB/go. The correction is to require "as observed from some time before lastTransaction call ~~was made~~returns." I've amended ZODB/go and my two pull requests accordingly.

I suggest the following change:

I prefer my version, but am ok if ZODB/interfaces.py is updated with your text, or any other text that clearly documents required semantic.

@jamadden

Because if lastTransaction() returns tid, for which local database handle has not yet been updated with invalidations, it could lead to data corruption due to concurrency issues similar to zopefoundation#290: - DB refreshes a Connection for new transaction; - zstor.lastTransaction() is called to obtain database view for this connection. - objects in live-cache for this Connection are invalidated with invalidations that were queued through DB.invalidate() calls from storage. - if lastTransaction does not guarantee that all DB invalidations for transactions with ID ≤ returned tid have been completed, it can be that: incomplete set of objects are invalidated in live cache i.e. data corruption. This particular data corruption has been hit when working on core of ZODB and was not immediately noticed: zopefoundation#307 (review) this fact justifies the importance of explicitly stating what IStorage guarantees are / must be in the interface. This guarantee - already holds for FileStorage (no database mutations from outside of single process); - is already true for ZEO4 and ZEO5 zopefoundation#307 (review) zopefoundation#307 (comment) - holds for RelStorage because it implements IMVCCStorage natively; - is *not* currently true for NEO because NEO sets zstor.last_tid before calling DB.invalidate: https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124 However NEO is willing to change and already prepared the fix to provide this guarantee because of data corruption scenario that can happen without it: zopefoundation#307 (comment) https://lab.nexedi.com/nexedi/neoppod/commit/5870c5d7 In other words all storages that, to my knowledge, are in current use are either already providing specified semantic, or will be shortly fixed to provide it. This way we can fix up the interface and make the semantic clear. /cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton

@jamadden

… ... ... and to provide correct "current" view for load(). This is ZODB4 backport of commit 9a1d29f (zopefoundation#313) It has been amended with ZODB4-specific It is guaranteed that after lastTransaction returns, "current" view of the storage as observed by load() is ≥ returned tid. because on ZODB<5 - contrary to ZODB5 - there is "load current" for which correct semantic must be also provided: zopefoundation#307 (comment) Original description follows: ---- 8< ---- Because if lastTransaction() returns tid, for which local database handle has not yet been updated with invalidations, it could lead to data corruption due to concurrency issues similar to zopefoundation#290: - DB refreshes a Connection for new transaction; - zstor.lastTransaction() is called to obtain database view for this connection. - objects in live-cache for this Connection are invalidated with invalidations that were queued through DB.invalidate() calls from storage. - if lastTransaction does not guarantee that all DB invalidations for transactions with ID ≤ returned tid have been completed, it can be that: incomplete set of objects are invalidated in live cache i.e. data corruption. This particular data corruption has been hit when working on core of ZODB and was not immediately noticed: zopefoundation#307 (review) this fact justifies the importance of explicitly stating what IStorage guarantees are / must be in the interface. This guarantee - already holds for FileStorage (no database mutations from outside of single process); - is already true for ZEO4 and ZEO5 zopefoundation#307 (review) zopefoundation#307 (comment) - holds for RelStorage because it implements IMVCCStorage natively; - is *not* currently true for NEO because NEO sets zstor.last_tid before calling DB.invalidate: https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124 However NEO is willing to change and already prepared the fix to provide this guarantee because of data corruption scenario that can happen without it: zopefoundation#307 (comment) https://lab.nexedi.com/nexedi/neoppod/commit/5870c5d7 In other words all storages that, to my knowledge, are in current use are either already providing specified semantic, or will be shortly fixed to provide it. This way we can fix up the interface and make the semantic clear. /cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton

Since commit b5895a5 ("mvccadapter: fix race with invalidations when starting a new transaction"), a ZEO test fails as follows: File "src/ZEO/tests/drop_cache_rather_than_verify.txt", line 114, in drop_cache_rather_than_verify.txt Failed example: conn.root()[1].x Expected: 6 Got: 1 Earlier in the test, the ZEO server is restarted and then another client commits. When disconnected, the first client does not receive invalidations anymore and the connection gets stuck in the past until there's a new commit after it reconnected. It was possible to make the test pass with the following patch: --- a/src/ZEO/ClientStorage.py +++ b/src/ZEO/ClientStorage.py @@ -357,6 +357,7 @@ def notify_connected(self, conn, info): # invalidate our db cache if self._db is not None: + self._db.invalidate(self.lastTransaction(), ()) self._db.invalidateCache() logger.info("%s %s to storage: %s", Other implementations like NEO are probably affected the same way. Rather than changing interfaces in a backward-incompatible way, this commit revert to the original behaviour, and all the changes that were done in existing tests are reverted. However, the interfaces are clarified about the fact that storage implementations must update at a precise moment the value that is returned by lastTransaction(): just after invalidate() or tpc_finish callback.

@jamadden

Because if lastTransaction() returns tid, for which local database handle has not yet been updated with invalidations, it could lead to data corruption due to concurrency issues similar to zopefoundation#290: - DB refreshes a Connection for new transaction; - zstor.lastTransaction() is called to obtain database view for this connection. - objects in live-cache for this Connection are invalidated with invalidations that were queued through DB.invalidate() calls from storage. - if lastTransaction does not guarantee that all DB invalidations for transactions with ID ≤ returned tid have been completed, it can be that: incomplete set of objects are invalidated in live cache i.e. data corruption. This particular data corruption has been hit when working on core of ZODB and was not immediately noticed: zopefoundation#307 (review) this fact justifies the importance of explicitly stating what IStorage guarantees are / must be in the interface. This guarantee - already holds for FileStorage (no database mutations from outside of single process); - is already true for ZEO4 and ZEO5 zopefoundation#307 (review) zopefoundation#307 (comment) - holds for RelStorage because it implements IMVCCStorage natively; - is *not* currently true for NEO because NEO sets zstor.last_tid before calling DB.invalidate: https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124 However NEO is willing to change and already prepared the fix to provide this guarantee because of data corruption scenario that can happen without it: zopefoundation#307 (comment) https://lab.nexedi.com/nexedi/neoppod/commit/5870c5d7 In other words all storages that, to my knowledge, are in current use are either already providing specified semantic, or will be shortly fixed to provide it. This way we can fix up the interface and make the semantic clear. /cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton

@jamadden

… ... ... and to provide correct "current" view for load(). This is ZODB4 backport of commit b32d52f (zopefoundation#313) It has been amended with ZODB4-specific It is guaranteed that after lastTransaction returns, "current" view of the storage as observed by load() is ≥ returned tid. because on ZODB<5 - contrary to ZODB5 - there is "load current" for which correct semantic must be also provided: zopefoundation#307 (comment) Original description follows: ---- 8< ---- Because if lastTransaction() returns tid, for which local database handle has not yet been updated with invalidations, it could lead to data corruption due to concurrency issues similar to zopefoundation#290: - DB refreshes a Connection for new transaction; - zstor.lastTransaction() is called to obtain database view for this connection. - objects in live-cache for this Connection are invalidated with invalidations that were queued through DB.invalidate() calls from storage. - if lastTransaction does not guarantee that all DB invalidations for transactions with ID ≤ returned tid have been completed, it can be that: incomplete set of objects are invalidated in live cache i.e. data corruption. This particular data corruption has been hit when working on core of ZODB and was not immediately noticed: zopefoundation#307 (review) this fact justifies the importance of explicitly stating what IStorage guarantees are / must be in the interface. This guarantee - already holds for FileStorage (no database mutations from outside of single process); - is already true for ZEO4 and ZEO5 zopefoundation#307 (review) zopefoundation#307 (comment) - holds for RelStorage because it implements IMVCCStorage natively; - is *not* currently true for NEO because NEO sets zstor.last_tid before calling DB.invalidate: https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124 However NEO is willing to change and already prepared the fix to provide this guarantee because of data corruption scenario that can happen without it: zopefoundation#307 (comment) https://lab.nexedi.com/nexedi/neoppod/commit/5870c5d7 In other words all storages that, to my knowledge, are in current use are either already providing specified semantic, or will be shortly fixed to provide it. This way we can fix up the interface and make the semantic clear. /cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton

@jamadden

…eturning ... ... and to provide correct "current" view for load(). This is ZODB4 backport of zopefoundation#313, which itself is just a more explicit language of zopefoundation@4a6b0283#diff-881ceb274f9e538d4144950eefce8682R685-R695 It has been amended with ZODB4-specific It is guaranteed that after lastTransaction returns, "current" view of the storage as observed by load() is ≥ returned tid. because on ZODB<5 - contrary to ZODB5 - there is "load current" for which correct semantic must be also provided: zopefoundation#307 (comment) Original description follows: ---- 8< ---- Because if lastTransaction() returns tid, for which local database handle has not yet been updated with invalidations, it could lead to data corruption due to concurrency issues similar to zopefoundation#290: - DB refreshes a Connection for new transaction; - zstor.lastTransaction() is called to obtain database view for this connection. - objects in live-cache for this Connection are invalidated with invalidations that were queued through DB.invalidate() calls from storage. - if lastTransaction does not guarantee that all DB invalidations for transactions with ID ≤ returned tid have been completed, it can be that: incomplete set of objects are invalidated in live cache i.e. data corruption. This particular data corruption has been hit when working on core of ZODB and was not immediately noticed: zopefoundation#307 (review) this fact justifies the importance of explicitly stating what IStorage guarantees are / must be in the interface. This guarantee - already holds for FileStorage (no database mutations from outside of single process); - is already true for ZEO4 and ZEO5 zopefoundation#307 (review) zopefoundation#307 (comment) - holds for RelStorage because it implements IMVCCStorage natively; - is *not* currently true for NEO because NEO sets zstor.last_tid before calling DB.invalidate: https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124 However NEO is willing to change and already prepared the fix to provide this guarantee because of data corruption scenario that can happen without it: zopefoundation#307 (comment) https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec#note_112238 https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec#note_113331 In other words all storages that, to my knowledge, are in current use are either already providing specified semantic, or will be shortly fixed to provide it. This way we can fix up the interface and make the semantic clear. /cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton

@jamadden

…eturning ... ... and to provide correct "current" view for load(). This is ZODB4 backport of zopefoundation#313, which itself is just a more explicit language of zopefoundation@4a6b0283#diff-881ceb274f9e538d4144950eefce8682R685-R695 It has been amended with ZODB4-specific It is guaranteed that after lastTransaction returns, "current" view of the storage as observed by load() is ≥ returned tid. because on ZODB<5 - contrary to ZODB5 - there is "load current" for which correct semantic must be also provided: zopefoundation#307 (comment) Original description follows: ---- 8< ---- Because if lastTransaction() returns tid, for which local database handle has not yet been updated with invalidations, it could lead to data corruption due to concurrency issues similar to zopefoundation#290: - DB refreshes a Connection for new transaction; - zstor.lastTransaction() is called to obtain database view for this connection. - objects in live-cache for this Connection are invalidated with invalidations that were queued through DB.invalidate() calls from storage. - if lastTransaction does not guarantee that all DB invalidations for transactions with ID ≤ returned tid have been completed, it can be that: incomplete set of objects are invalidated in live cache i.e. data corruption. This particular data corruption has been hit when working on core of ZODB and was not immediately noticed: zopefoundation#307 (review) this fact justifies the importance of explicitly stating what IStorage guarantees are / must be in the interface. This guarantee - already holds for FileStorage (no database mutations from outside of single process); - is already true for ZEO4 and ZEO5 zopefoundation#307 (review) zopefoundation#307 (comment) - holds for RelStorage because it implements IMVCCStorage natively; - is *not* currently true for NEO because NEO sets zstor.last_tid before calling DB.invalidate: https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124 However NEO is willing to change and already prepared the fix to provide this guarantee because of data corruption scenario that can happen without it: zopefoundation#307 (comment) https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec#note_112238 https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec#note_113331 In other words all storages that, to my knowledge, are in current use are either already providing specified semantic, or will be shortly fixed to provide it. This way we can fix up the interface and make the semantic clear. /cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton

@jwolf083

This backports to ZODB4 Connection ZODB5's approach to handle MVCC via always calling storage.loadBefore() instead of "load for latest version + loadBefore if we were notified of database changes" approach. Why? ---- Short answer: because Wendelin.core 2 needs to know at which particular database state application-level ZODB connection is viewing the database, and it is hard to implement such functionality correctly without this backport. Please see appendix for the explanation. What ---- This backports to ZODB4 the minimum necessary part of upstream commit 227953b (Simplify MVCC by determining transaction start time using lastTransaction) + follow-up correctness fixes: zopefoundation#50 zopefoundation#56 zopefoundation#291 zopefoundation#307 In short: - a Connection is always opened with explicitly corresponding to a particular database revision - Connection uses only loadBefore with that revision to load objects - every time a Connection is (re)opened, the result of queued invalidations and explicit query to storage.lastTransaction is carefully merged to refresh Connection's idea about which database state it corresponds to. The "careful" in last point is important. Historically ZODB5 was first reworked in commit 227953b (zopefoundation#56) to always call lastTransaction to refresh state of Connection view. Since there was no proper synchronisation with respect to process of handling invalidations, that lead to data corruption issue due to race in Connection.open() vs invalidations: zopefoundation#290 That race and data corruption was fixed in commit b5895a5 (zopefoundation#291) by way of avoiding lastTransaction call and relying only on invalidations channel when refreshing Connection view. This fix in turn led to another data corruption issue because in presence of client-server reconnections, ZODB storage drivers can partly skip notifying client with detailed invalidation messages: zopefoundation#291 (comment) A fix to that issue (zopefoundation#307) proposed to change back to query storage for lastTransaction on every Connection refresh, but to implement careful merging of lastTransaction result and data from invalidation channel. However it was found that the "careful merging" can work correctly only if we require from storage drivers a particular ordering of invalidation events wrt lastTransaction return and result: zopefoundation#307 (comment) While ZEO was already complying with that requirements, NEO had to be fixed to support that: zopefoundation#307 (comment) https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec https://lab.nexedi.com/nexedi/neoppod/commit/96a5c01f Patch details ------------- We change Connection._txn_time to be a "before" for the database state to which Connection view corresponds. This state is hooked to be initialized and updated in Connection._flush_invalidations - the function that is called from both explicit Connection (re)open and at transaction boundaries via Connection.afterCompletion hook. Objects loading is factored into Connection._load which replaces old "load + check invalidated + fallback to loadBefore" game in Connection._setstate. Connection.open now calls Connection._flush_invalidations unconditionally - even if it was global cache reset event - because besides invalidation flushes the latter is now responsible for querying storage lastTransaction. TmpStore - a "storage" that provides runtime support for savepoints - is refactored correspondingly to delegate loading of original objects back to underlying Connection. DB.close is modified - similarly to ZODB5 - to release DB's Connections carefully with preventing connections from DB poll from implicitly starting new transactions via afterCompletion hook. ZODB.nxd_patches is introduced to indicate to client software that this particular patch is present and can be relied upon. Tests are updated correspondingly. In 227953b Jim talks about converting many tests - because "Lots of tests didn't clean up databases and connections properly" and because new MVCC approach "makes database and connection hygiene a bit more important, especially for tests, because a connection will continue to interact with storages if it isn't properly closed, which can lead to errors if the storage is closed." but finally implementing automatic cleanup at transaction boundaries because there are too many tests to fix. We backport only automatic cleanup + necessary explicit test fixes to keep the diff minimal. All tests pass. This includes tests for ZODB itself, ZEO and NEO test over hereby modified ZODB(*), my test programs from zopefoundation#290 and zopefoundation/ZEO#155 and ERP5 tests. Upcoming wendelin.core 2 also work with this change. (*) ZEO, NEO and ERP5 tests fail sometimes, but there is no regression here because ZEO, NEO and ERP5 tests are failing regularly, and in the same way, even with unmodified ZODB. Appendix. zconn_at ------------------ This appendix provides motivation for the backport: For wendelin.core v2 we need a way to know at which particular database state application-level ZODB connection is viewing the database. Knowing that state, WCFS client library interacts with WCFS filesystem server and, in simple terms, requests the server to provide data as of that particular database state. Let us call the function that for a client ZODB connection returns database state corresponding to its database view zconn_at. Now here is the problem: zconn_at is relatively easy to implement for ZODB5 - see e.g. here: https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.13-54-ga6a8f5b/lib/zodb.py#L142-181 https://lab.nexedi.com/nexedi/wendelin.core/commit/3bd82127 however, for ZODB4, since its operational models is not directly MVCC, it is not that straightforward. Still, even for older ZODB4, for every client connection, there _is_ such at that corresponds to that connection view of the database. We need ZODB4 support, because ZODB4 is currently the version that Nexedi uses, and my understanding is that it will stay like this for not a small time. I have the feeling that ZODB5 was reworked in better direction, but without caring enough about quality which resulted in concurrency bugs with data corruption effects like zopefoundation#290 zopefoundation/ZEO#155 etc. Even though the first one is now fixed (but it broke other parts and so both ZODB had to be fixed again _and_ NEO had to be fixed for that ZODB fix to work currently), I feel that upgrading into ZODB5 for Nexedi will require some non-negligible amount of QA work, and thus it is better if we move step-by-step - even if we eventually upgrade to ZODB5 - it is better we first migrate wendelin.core 1 -> wendelin.core 2 with keeping current version of ZODB. Now please note what would happen if zconn_at gives, even a bit, wrong answer: wcfs client will ask wcfs server to provide array data as of different database state compared to current on-client ZODB connection. This will result in that data accessed via ZBigArray will _not_ correspond to all other data accessed via regular ZODB mechanism. It is, in other words, a data corruptions. In some scenarios it can be harmless, but generally it is the worst that can happen to a database. It is good to keep in mind ZODB issue290 when imagining corner cases that zconn_at has to deal with. Even though that issue is ZODB5 only, it shows what kind of bugs it can be in zconn_at implementation for ZODB4. Just for the reference: in Wendelin-based systems there is usually constant stream of database ingestions coming from many places simultaneously. Plus many activities crunching on the DB at the same time as well. And the more clients a system handles, the more there will be level-of-concurrency increase. This means that the problem of correctly handling concurrency issues in zconn_at is not purely theoretical, but has direct relation to our systems. -------- With this backport, zconn_at for ZODB4 becomes trivial and robust to implement: https://lab.nexedi.com/kirr/wendelin.core/blob/484071b3/lib/zodb.py#L183-195 I would like to thank Joshua Wölfel whose internship helped this topic to shape up: https://www.erp5.com/project_section/wendelin-ia-project/forum/Joshua-internship-D8b7NNhWfz /cc @Nexedi, @jwolf083 Signed-off-by: Kirill Smelkov <[email protected]>

d-maurer · 2021-04-08T09:11:19Z

The change has caused a problem with ZEO: "zopefoundation/ZEO#166". The reason is obvious: ZEO does not yet fulfill the requirement that lastTransaction is updated after connection cache invalidation processing.

In "zopefoundation/ZEO#167" I tried to change this but failed: the (ZODB) check_tid_ordering_w_commit tests are now failing. This test uses the tpc_finish callback used by mvccadapter for invalidation processing to ensure that lastTransaction is called at this time. As lastTransaction now changes after invalidation processing, the test sees a value which is deemed too old.

@jmuchemb Do you have an idea how this problem might be fixed?

jmuchemb · 2021-04-08T13:59:59Z

In "zopefoundation/ZEO#167" I tried to change this but failed: the (ZODB) check_tid_ordering_w_commit tests are now failing. This test uses the tpc_finish callback used by mvccadapter for invalidation processing to ensure that lastTransaction is called at this time. As lastTransaction now changes after invalidation processing, the test sees a value which is deemed too old.

@jmuchemb Do you have an idea how this problem might be fixed?

check_tid_ordering_w_commit was fixed in #316 but no new release of ZODB since then.

d-maurer · 2021-04-09T07:37:07Z

Julien Muchembled wrote at 2021-4-8 07:00 -0700:

> In "[zopefoundation/ZEO#167](zopefoundation/ZEO#167)" I tried to change this but failed: the (ZODB) `check_tid_ordering_w_commit` tests are now failing. This test uses the `tpc_finish` callback used by `mvccadapter` for invalidation processing to ensure that `lastTransaction` is called at this time. As `lastTransaction` now changes after invalidation processing, the test sees a value which is deemed too old. > > @jmuchemb Do you have an idea how this problem might be fixed? `check_tid_ordering_w_commit` was fixed in zopefoundation/ZODB#316 but no new release of ZODB since then.

The `check_tid_ordering_w_commit` tests fail (for ZEO#167) even with the current ZODB master (which contains `ZODB#316`). This does not surprise me: the test samples `lastTransaction` at the precise place where during normal operations `MVCCAdapterInstance` would process invalidations. As (with ZODB>=5.6) `lastTransaction` must change after invalidation processing, it has not yet changed at this time and the test must fail. I believe that the original problem should be resolved in a different way: Let `MVCCAdapterInstance` override `lastTransaction` to return its `_ltid` (which is the tid of the last processed invalidation) and call the base `lastTransaction` if this is still invalid. There could still be a race condition - but only for newly created `MVCCAdapterInstance`s or after reconnections. I hope that this could be handle in `poll_invalidations`. I will explore that approach.

jmuchemb · 2021-04-09T18:08:45Z

You're not clear about where the bug would be. And if the code of ZODB is correct, I prefer not to complicate just to make the test pass. I don't understand what you suggest to do in MVCCAdapterInstance because check_tid_ordering_w_commit works directly with storages.

Note that check_tid_ordering_w_commit passes with NEO. The difference between NEO & ZEO is about the cache lock:

NEO: the lock is outside the cache instance and the caller must acquire it before accessing the cache
ZEO: the lock is inside the cache instance and cache methods lock internally

In NEO, we can then do several operations without releasing the lock. And in particular, as you can see within tpc_finish, the callback call & the last_tid update are done with the same acquired lock. The test starts with a load and NEO also acquires the cache lock for that: it means that when the load is processed, the last_tid is guaranteed to be updated (NEO needed that with ZODB<5).

For ZEO, I tried the following change (on top of zopefoundation/ZEO#167):

--- a/src/ZEO/asyncio/client.py
+++ b/src/ZEO/asyncio/client.py
@@ -661,6 +661,7 @@ def tpc_finish_threadsafe(self, future, wait_ready, tid, updates, f):
                 self.protocol.close()
                 self.disconnected(self.protocol)
             else:
+              with cache._lock:
                 # typically performs connection cache invalidations
                 f(tid)
                 # ZODB>=5.6.0 requires that ``LastTid``

and the test passes.

But what I said about ZODB is also true for ZEO: if ZEO is already correct about how it locks the cache, then we should rather change the test.

d-maurer · 2021-04-09T22:18:06Z

Julien Muchembled wrote at 2021-4-9 11:09 -0700:

You're not clear about where the bug would be. And if the code of ZODB is correct, I prefer not to complicate just to make the test pass. I don't understand what you suggest to do in `MVCCAdapterInstance` because `check_tid_ordering_w_commit` works directly with storages.

`MVCCAdapterInstance` puts the invalidations associated with `tpc_finish` into the callback to the storage's `tpc_finish`. The storage interface now (ZODB>=5.6) requests that `lastTransaction` changes **after** invalidation processing. For `ClientStorage`, this implies that `cache.setLastTID` (the source of `lastTransaction`) must be called after the callback call. In the `check_tid_ordering_w_commit`, this same callback is used to coordinate `lastTransaction` sampling. Therefore, it necessarily gets the wrong `lastTransaction`: the one before the `tpc_finish` not the one set up by `tpc_finish`. Do you now see the problem? If `lastTransaction` must change after invalidation processing, `check_tid_ordering_w_commit` cannot see the correct `lastTransaction` value because it monitors this value at the place where during normal operations the invalidation takes place.

Note that `check_tid_ordering_w_commit` passes with NEO. The difference between NEO & ZEO is about the cache lock: - NEO: the lock is outside the cache instance and the caller must acquire it before accessing the cache

I perfectly understand why the test fails with `ZEO`. I do not understand why it succeeds with any storage which is used via `MVCCAdapter`. Or more precisely: if the test succeeds, the storage would need to change `lastTransaction` **before** invalidation processing -- contrary to the new requirement of ZODB>=5.6.

... For ZEO, I tried the following change (on top of zopefoundation/ZEO#167): ```diff --- a/src/ZEO/asyncio/client.py +++ b/src/ZEO/asyncio/client.py @@ -661,6 +661,7 @@ def tpc_finish_threadsafe(self, future, wait_ready, tid, updates, f): self.protocol.close() self.disconnected(self.protocol) else: + with cache._lock: # typically performs connection cache invalidations f(tid) # ZODB>=5.6.0 requires that ``LastTid`` ``` and the test passes.

Really surprising: how can the test pass when the test setup tries hard to ensure that `lastTransaction` is sampled during the `f(tid)` call and `setLastTID` (which provides the value for `lastTransaction`) is called only thereafter?

But what I said about ZODB is also true for ZEO: if ZEO is already correct about how it locks the cache, then we should rather change the test.

As I read the test, it depends on the fact that `lastTransaction` changes before invalidation processing -- at least for all storages which are used with `MVCCAdapter` -- contrary to the new (ZODB>=5.6) ordering requirement.

navytux · 2021-04-13T17:48:26Z

@d-maurer, I quickly checked, and while I might be missing something, to me it looks like there should not be the problem you describe with check_tid_ordering_w_commit: there both commit (that calls storage.tpc_finish with callback to sample new tid) and load (that samples new serial and lastTransaction) are run in their own threads in parallel. The logic here is that while tpc_finish callback is run - it is holding a kind of "storage lock". This way thread that calls lastTransaction must be blocked to wait for the tpc_finish thread to first release that lock. And after the storage lock is released tpc_finish is complete and so the notion of last tid is already updated too.

ZODB/src/ZODB/tests/BasicStorage.py

Lines 321 to 379 in dad7780

    
           to_join = [] 
        
           def run_in_thread(func): 
        
               t = threading.Thread(target=func) 
        
               t.setDaemon(True) 
        
               t.start() 
        
               to_join.append(t) 
        
           started = threading.Event() 
        
           finish = threading.Event() 
        
           @run_in_thread 
        
           def commit(): 
        
               def callback(tid): 
        
                   started.set() 
        
                   tids.append(tid) 
        
                   finish.wait() 
        
               self._storage.tpc_finish(t, callback) 
        
           results = {} 
        
           started.wait() 
        
           attempts = [] 
        
           attempts_cond = utils.Condition() 
        
           def update_attempts(): 
        
               with attempts_cond: 
        
                   attempts.append(1) 
        
                   attempts_cond.notifyAll() 
        
           @run_in_thread 
        
           def load(): 
        
               update_attempts() 
        
               results['load'] = utils.load_current(self._storage, ZERO)[1] 
        
               results['lastTransaction'] = self._storage.lastTransaction() 
        
           expected_attempts = 1 
        
           if hasattr(self._storage, 'getTid'): 
        
               expected_attempts += 1 
        
               @run_in_thread 
        
               def getTid(): 
        
                   update_attempts() 
        
                   results['getTid'] = self._storage.getTid(ZERO) 
        
           if hasattr(self._storage, 'lastInvalidations'): 
        
               expected_attempts += 1 
        
               @run_in_thread 
        
               def lastInvalidations(): 
        
                   update_attempts() 
        
                   invals = self._storage.lastInvalidations(1) 
        
                   if invals: 
        
                       results['lastInvalidations'] = invals[0][0] 
        
           with attempts_cond: 
        
               while len(attempts) < expected_attempts: 
        
                   attempts_cond.wait() 
        
           time.sleep(.01) # for good measure :) 
        
           finish.set()

Existing storages already confirm to the requiredment to first call the tpc_finish callback, and only after that to set the last tid. And they indeed do so from under holding the "storage lock". Here are some examples:

BaseStorage:

ZODB/src/ZODB/BaseStorage.py

Lines 230 to 246 in dad7780

    
           def tpc_finish(self, transaction, f=None): 
        
               # It's important that the storage calls the function we pass 
        
               # while it still has its lock.  We don't want another thread 
        
               # to be able to read any updated data until we've had a chance 
        
               # to send an invalidation message to all of the other 
        
               # connections! 
        
               with self._lock: 
        
                   if transaction is not self._transaction: 
        
                       raise POSException.StorageTransactionError( 
        
                           "tpc_finish called with wrong transaction") 
        
                   try: 
        
                       if f is not None: 
        
                           f(self._tid) 
        
                       u, d, e = self._ude 
        
                       self._finish(self._tid, u, d, e) 
        
                       self._clear_temp()

FileStorage:

ZODB/src/ZODB/FileStorage/FileStorage.py

Lines 810 to 821 in dad7780

    
           def tpc_finish(self, transaction, f=None): 
        
               with self._files.write_lock(): 
        
                   with self._lock: 
        
                       if transaction is not self._transaction: 
        
                           raise StorageTransactionError( 
        
                               "tpc_finish called with wrong transaction") 
        
                       try: 
        
                           tid = self._tid 
        
                           if f is not None: 
        
                               f(tid) 
        
                           self._finish(tid, *self._ude) 
        
                           self._clear_temp()

MappingStorage

ZODB/src/ZODB/MappingStorage.py

Lines 286 to 321 in dad7780

    
               self._commit_lock.acquire() 
        
               with self._lock: 
        
                   self._transaction = transaction 
        
                   self._tdata = {} 
        
                   if tid is None: 
        
                       if self._transactions: 
        
                           old_tid = self._transactions.maxKey() 
        
                       else: 
        
                           old_tid = None 
        
                       tid = ZODB.utils.newTid(old_tid) 
        
                   self._tid = tid 
        
           # ZODB.interfaces.IStorage 
        
           @ZODB.utils.locked(opened) 
        
           def tpc_finish(self, transaction, func = lambda tid: None): 
        
               if (transaction is not self._transaction): 
        
                   raise ZODB.POSException.StorageTransactionError( 
        
                       "tpc_finish called with wrong transaction") 
        
               tid = self._tid 
        
               func(tid) 
        
               tdata = self._tdata 
        
               for oid in tdata: 
        
                   tid_data = self._data.get(oid) 
        
                   if tid_data is None: 
        
                       tid_data = BTrees.OOBTree.OOBucket() 
        
                       self._data[oid] = tid_data 
        
                   tid_data[tid] = tdata[oid] 
        
               self._ltid = tid 
        
               self._transactions[tid] = TransactionRecord(tid, transaction, tdata) 
        
               self._transaction = None 
        
               del self._tdata 
        
               self._commit_lock.release()

NEO:

https://lab.nexedi.com/nexedi/neoppod/blob/de0feb4e/neo/client/handlers/master.py#L69-86

Doing last tid update from under lock (and retrieving last tid with also taking the lock) is likely what @jimfulton was talking about "some locking strategy" in c3091e2.

Does that help? Did I missed something?

Thanks again for your work on to bring ZEO5 into a better shape,
Kirill

d-maurer · 2021-04-13T21:40:16Z

Kirill Smelkov wrote at 2021-4-13 10:48 -0700: ***@***.***, I quickly checked, and while I might be missing something, to me it looks like there should not be the problem you describe with `check_tid_ordering_w_commit`: there both `commit` (that calls storage.tpc_finish with callback to sample new tid) and `load` (that samples new serial and `lastTransaction`) are run in their own threads in parallel. The logic here is that while tpc_finish callback is run - it is holding a kind of "storage lock". This way thread that calls lastTransaction must be blocked to wait for the tpc_finish thread to first release that lock. And after the storage lock is released tpc_finish is complete and so the notion of last tid is already updated too. I am convinced: I have misread the test that `finish` is set (and therefore `tpc_finish` can complete) after the sampling threads have run; but in fact, `finish` is set after all sampling threads are started. The "storage" lock (not visible by looking at the test only) apparently prevents the sampling threads to complete before `tpc_finish` ends. But for ZEO5, `ClientStorage` seems to lack such a lock.

... Does that help? Did I missed something?

It does. We still need to find a way to implement the lock for `ClientStorage` in ZEO5. There, the test fails when I move the `setLastTID` call after the callback.

navytux · 2021-04-14T07:18:45Z

Thanks, @d-maurer.

jmuchemb force-pushed the issue290 branch 2 times, most recently from a9ecb96 to 8d20f20 Compare April 1, 2020 15:49

jmuchemb mentioned this pull request May 4, 2020

Release new version? #308

Closed

jamadden approved these changes Jun 2, 2020

View reviewed changes

jmuchemb force-pushed the issue290 branch from 7138f4d to e95d8d4 Compare June 2, 2020 15:33

navytux reviewed Jun 2, 2020

View reviewed changes

navytux mentioned this pull request Jun 8, 2020

IStorage: Require lastTransaction() to invalidate DB before returning #313

Closed

navytux mentioned this pull request Jun 8, 2020

[ZODB4] IStorage: Require lastTransaction() to invalidate DB before returning… #314

Closed

changelog: pull request #307

181fa37

jmuchemb force-pushed the issue290 branch from e95d8d4 to 181fa37 Compare June 10, 2020 14:52

jmuchemb merged commit 181fa37 into master Jun 11, 2020

jmuchemb deleted the issue290 branch June 11, 2020 09:37

jmuchemb mentioned this pull request Jun 15, 2020

Relax assertion in check_tid_ordering_w_commit test #316

Merged

navytux mentioned this pull request Apr 8, 2021

Another data corruption due to concurrency bug (ZEO5 regression; no external invalidations) zopefoundation/ZEO#166

Closed

navytux mentioned this pull request Apr 13, 2021

move setLastTID call after connection cache invalidation processing zopefoundation/ZEO#167

Closed

d-maurer mentioned this pull request Apr 14, 2021

mark MVCCAdapter instance's tpc_finish callbacks #344

Closed

navytux mentioned this pull request Apr 24, 2021

Switch from Travis CI to GitHub Actions for testing zopefoundation/ZEO#175

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mvccadapter: check if the last TID changed without invalidation #307

mvccadapter: check if the last TID changed without invalidation #307

jmuchemb commented Apr 1, 2020

jmuchemb commented Apr 1, 2020 •

edited

Loading

jmuchemb commented Jun 2, 2020

jamadden commented Jun 2, 2020

jamadden left a comment •

edited

Loading

jmuchemb commented Jun 2, 2020

jamadden commented Jun 2, 2020

navytux Jun 2, 2020

jmuchemb Jun 2, 2020

navytux Jun 3, 2020 •

edited

Loading

jmuchemb Jun 8, 2020

navytux Jun 8, 2020 •

edited

Loading

jmuchemb Jun 8, 2020

navytux Jun 9, 2020

d-maurer commented Apr 8, 2021

jmuchemb commented Apr 8, 2021

d-maurer commented Apr 9, 2021 via email

jmuchemb commented Apr 9, 2021

d-maurer commented Apr 9, 2021 via email

navytux commented Apr 13, 2021

d-maurer commented Apr 13, 2021 via email

navytux commented Apr 14, 2021

mvccadapter: check if the last TID changed without invalidation #307

mvccadapter: check if the last TID changed without invalidation #307

Conversation

jmuchemb commented Apr 1, 2020

jmuchemb commented Apr 1, 2020 • edited Loading

jmuchemb commented Jun 2, 2020

jamadden commented Jun 2, 2020

jamadden left a comment • edited Loading

Choose a reason for hiding this comment

jmuchemb commented Jun 2, 2020

jamadden commented Jun 2, 2020

navytux Jun 2, 2020

Choose a reason for hiding this comment

jmuchemb Jun 2, 2020

Choose a reason for hiding this comment

navytux Jun 3, 2020 • edited Loading

Choose a reason for hiding this comment

jmuchemb Jun 8, 2020

Choose a reason for hiding this comment

navytux Jun 8, 2020 • edited Loading

Choose a reason for hiding this comment

jmuchemb Jun 8, 2020

Choose a reason for hiding this comment

navytux Jun 9, 2020

Choose a reason for hiding this comment

d-maurer commented Apr 8, 2021

jmuchemb commented Apr 8, 2021

d-maurer commented Apr 9, 2021 via email

jmuchemb commented Apr 9, 2021

d-maurer commented Apr 9, 2021 via email

navytux commented Apr 13, 2021

d-maurer commented Apr 13, 2021 via email

navytux commented Apr 14, 2021

jmuchemb commented Apr 1, 2020 •

edited

Loading

jamadden left a comment •

edited

Loading

navytux Jun 3, 2020 •

edited

Loading

navytux Jun 8, 2020 •

edited

Loading