Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mvccadapter: check if the last TID changed without invalidation #307

Merged
merged 2 commits into from
Jun 11, 2020

Conversation

jmuchemb
Copy link
Member

@jmuchemb jmuchemb commented Apr 1, 2020

Since commit b5895a5, a ZEO test fails as follows:

File "src/ZEO/tests/drop_cache_rather_than_verify.txt", line 114, in drop_cache_rather_than_verify.txt
Failed example:
    conn.root()[1].x
Expected:
    6
Got:
    1

Earlier in the test, the ZEO server is restarted and then another client commits. When disconnected, the first client does not receive invalidations anymore and the connection gets stuck in the past until there's a new commit after it reconnected. It was possible to make the test pass with the following patch:

--- a/src/ZEO/ClientStorage.py
+++ b/src/ZEO/ClientStorage.py
@@ -357,6 +357,7 @@ def notify_connected(self, conn, info):
 
         # invalidate our db cache
         if self._db is not None:
+            self._db.invalidate(self.lastTransaction(), ())
             self._db.invalidateCache()
 
         logger.info("%s %s to storage: %s",

Other implementations like NEO are probably affected the same way.

Rather than changing interfaces in a backward-incompatible way, this commit revert to the original behaviour, and all the changes that were done in existing tests are reverted.

@jmuchemb
Copy link
Member Author

jmuchemb commented Apr 1, 2020

This is a follow-up of #291.

I submit this PR because it solves an issue that has to be fixed quickly.

However, this is extra code that's useless most of the time so I prefer we change the interface, which may require a major version bump.

@jmuchemb jmuchemb force-pushed the issue290 branch 2 times, most recently from a9ecb96 to 8d20f20 Compare April 1, 2020 15:49
@jmuchemb jmuchemb mentioned this pull request May 4, 2020
@jmuchemb
Copy link
Member Author

jmuchemb commented Jun 2, 2020

I'll merge on Friday.

@jamadden
Copy link
Member

jamadden commented Jun 2, 2020

Can we get CI green, please?

Copy link
Member

@jamadden jamadden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand what's going on here and I think this looks OK, at least in isolation, once tests go green.

It needs a good change note (I think there's already something about #290 in there, so maybe just an updated change note mentioning this PR?).

@jmuchemb
Copy link
Member Author

jmuchemb commented Jun 2, 2020

Can we get CI green, please?

Oops. Now fixed, with:

--- a/src/ZODB/tests/testmvcc.py
+++ b/src/ZODB/tests/testmvcc.py
@@ -87,10 +87,10 @@
 
 >>> cn._storage._start == p64(u64(st.lastTransaction()) + 1)
 True
->>> cn.db()._mvcc_storage.invalidate(100, dict.fromkeys([1, 2]))
+>>> cn.db()._mvcc_storage.invalidate(p64(100), dict.fromkeys([1, 2]))
 >>> cn._storage._start == p64(u64(st.lastTransaction()) + 1)
 True
->>> cn.db()._mvcc_storage.invalidate(200, dict.fromkeys([1, 2]))
+>>> cn.db()._mvcc_storage.invalidate(p64(200), dict.fromkeys([1, 2]))
 >>> cn._storage._start == p64(u64(st.lastTransaction()) + 1)
 True
 

I also needed #310 to be merged.

@jamadden
Copy link
Member

jamadden commented Jun 2, 2020

Awesome!

# so we still have to poll.
ltid = self._storage.lastTransaction()
# But at this precise moment, a transaction may be committed and
# we have already received the new tid, along with invalidations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmuchemb, @jamadden isn't the above true only if storage, upon receiving invalidation messages from server, first notifies DB.invalidate() and only after updates zstor.last_tid? For the reference this is indeed true for ZEO4:

https://github.com/zopefoundation/ZEO/blob/47d3fbe8cbf24cad91b183483df069ef20708874/src/ZEO/ClientStorage.py#L1473-L1482

but not true for NEO:

https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124

If zstor.last_tid is updated first before zstor sends invalidations to DB (to MVCCAdapter here), what could happen is:

  • poll_invalidations calls zstor.lastTransaction -> gets just committed tid;
  • poll invalidations proceeds and could retrieve ._invalidations before zstor could actually update them with invalidation data corresponding to tid;
  • tid, and incomplete set of objects to invalidate is returned.

I.e. it will be a data corruption, because objects might remain stale in live cache with similar data corruption scenario as demonstrated in #290.

P.S. I could not quickly see which way ZEO5 implements. Maybe it is something different, or actually related to zopefoundation/ZEO#155.

P.P.S RelStorage is unaffected as it implements IMVCCStorage natively.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's indeed a problem between this MR & NEO :(

For ZEO4, the 2 operations are wrapped around the same lock so the order does not matter.

For ZEO5, it's first DB.invalidate() then lastTransaction() updated.

So I have to answer 2 questions:

  1. Is ZEO5 behaviour enough ?
  2. If it is, can I do the same in NEO ?

zopefoundation/ZEO#155 is unrelated.

Copy link
Contributor

@navytux navytux Jun 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmuchemb, thanks for feedback.

Yes, I believe if we change DB.invalidate() call to always happen before zstor.last_tid update that would be enough. If we go this way this change also requires corresponding change in IStorage interface definiton. Something like below:

--- a/src/ZODB/interfaces.py
+++ b/src/ZODB/interfaces.py
@@ -685,6 +685,15 @@ def isReadOnly():
     def lastTransaction():
         """Return the id of the last committed transaction.
 
+        Returned tid is ID of last committed transaction as observed from
+        some time _before_ lastTransaction call returns. In particular for
+        client-sever case, lastTransaction can return cached view of storage
+        that was learned some time ago.
+
+        It is guaranteed that for all IStorageWrappers, that wrap the storage,
+        invalidation notifications have been completed for transactions
+        with ID ≤ returned tid.
+
         If no transactions have been committed, return a string of 8
         null (0) characters.
         """

EDIT: corrected to "... some time before lastTransaction call was madereturns." due to @jmuchemb notice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I found no way to cause any bug if lastTransaction() is updated just after invalidations are notified. I feared issues because NEO also uses lastTransaction() internally: note however it must do it before releasing its cache lock, otherwise "load current" fails on ZODB<5.

Updating lastTransaction() at the right moment also matters for "internal" invalidations: I mean that it must be done just after calling the tpc_finish callback.

https://lab.nexedi.com/nexedi/neoppod/commit/5870c5d79f68dce00267f3ae9c4e12d56d19dede should make NEO work with this MR. It includes tests for the 4 cases.

Copy link
Contributor

@navytux navytux Jun 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmuchemb, thanks for feedback and adapting things on NEO side. I see about the need to update app.last_tid while still under app._cache_lock - as app._cache and app.last_tid need to be kept coherent on NEO side internally.

Updating lastTransaction() at the right moment also matters for "internal" invalidations: I mean that it must be done just after calling the tpc_finish callback.

Yes, thanks for pointing this out. There app.last_tid has to be also set while still under app._cache_lock and after call to app._cache.invalidate(). As you say this matters for "load current" on ZODB<5. I thus suggest the following amendment for ZODB4 version of proposed IStorage.lastTransaction interface change (ZODB4 specifics in bold):

def lastTransaction():
"""Return the id of the last committed transaction.

Returned tid is ID of last committed transaction as observed from
some time before lastTransaction call returns. In particular for
client-sever case, lastTransaction can return cached view of storage
that was learned some time ago.

It is guaranteed that for all IStorageWrappers, that wrap the storage,
invalidation notifications have been completed for transactions
with ID ≤ returned tid.

It is guaranteed that after lastTransaction returns, "current" view of
the storage as observed by load() is ≥ returned tid.

If no transactions have been committed, return a string of 8
null (0) characters.

I believe it is very important we don't just fixup NEO and merge this patch without fixing IStorage interface. There are application-level cases that rely on exactly the same guarantees that MVCCAdapter needs here and got wrong in the first version of this patch. Also it has to be clear for all other potential storages and their authors - just from reading the interfaces - which semantic must be provided.

I created the following pull-requests to adjust the IStorage interface:

#313
#314

/cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gidzit, @jimfulton


EDIT: corrected to "... some time before lastTransaction call was madereturns." due to @jmuchemb notice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is very important we don't just fixup NEO and merge this patch without fixing IStorage interface.

Yes. First, I wanted to be sure this MR is correct.

as observed from some time before lastTransaction call was made.

Strictly speaking, it's wrong because the value that lastTransaction reads could be updated between the call to lastTransaction start and the instruction that actually reads the value in question.

I suggest the following change:

--- a/src/ZODB/interfaces.py
+++ b/src/ZODB/interfaces.py
@@ -685,6 +685,14 @@ def isReadOnly():
     def lastTransaction():
         """Return the id of the last committed transaction.
 
+        For proper MVCC operation, the return value is the id of the last
+        transaction for which invalidation notifications are completed.
+
+        In particular for client-server implementations, lastTransaction
+        should return a cached value (rather than querying the server).
+        A preliminary call to sync() can be done to get the actual last
+        TID at the wanted time.
+
         If no transactions have been committed, return a string of 8
         null (0) characters.
         """

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmuchemb, thanks for feedback.

Strictly speaking, it's wrong ...

You are right, thanks for noticing it. It was a thinko on my side that I copied from ZODB/go. The correction is to require "as observed from some time before lastTransaction call was madereturns." I've amended ZODB/go and my two pull requests accordingly.

I suggest the following change:

I prefer my version, but am ok if ZODB/interfaces.py is updated with your text, or any other text that clearly documents required semantic.

navytux added a commit to navytux/ZODB that referenced this pull request Jun 8, 2020
Because if lastTransaction() returns tid, for which local database
handle has not yet been updated with invalidations, it could lead to
data corruption due to concurrency issues similar to
zopefoundation#290:

- DB refreshes a Connection for new transaction;
- zstor.lastTransaction() is called to obtain database view for this connection.
- objects in live-cache for this Connection are invalidated with
  invalidations that were queued through DB.invalidate() calls from
  storage.
- if lastTransaction does not guarantee that all DB invalidations for
  transactions with ID ≤ returned tid have been completed, it can be
  that:

	incomplete set of objects are invalidated in live cache

  i.e. data corruption.

This particular data corruption has been hit when working on core of
ZODB and was not immediately noticed:

zopefoundation#307 (review)

this fact justifies the importance of explicitly stating what IStorage
guarantees are / must be in the interface.

This guarantee

- already holds for FileStorage (no database mutations from outside of
  single process);
- is already true for ZEO4 and ZEO5
  zopefoundation#307 (review)
  zopefoundation#307 (comment)
- holds for RelStorage because it implements IMVCCStorage natively;
- is *not* currently true for NEO because NEO sets zstor.last_tid before
  calling DB.invalidate:

  https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124

However NEO is willing to change and already prepared the fix to provide
this guarantee because of data corruption scenario that can happen
without it:

  zopefoundation#307 (comment)
  https://lab.nexedi.com/nexedi/neoppod/commit/5870c5d7

In other words all storages that, to my knowledge, are in current use
are either already providing specified semantic, or will be shortly
fixed to provide it.

This way we can fix up the interface and make the semantic clear.

/cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton
navytux added a commit to navytux/ZODB that referenced this pull request Jun 8, 2020
… ...

... and to provide correct "current" view for load().

This is ZODB4 backport of commit 9a1d29f
(zopefoundation#313)

It has been amended with ZODB4-specific

      It is guaranteed that after lastTransaction returns, "current" view of
      the storage as observed by load() is ≥ returned tid.

because on ZODB<5 - contrary to ZODB5 - there is "load current" for
which correct semantic must be also provided:
zopefoundation#307 (comment)

Original description follows:

---- 8< ----

Because if lastTransaction() returns tid, for which local database
handle has not yet been updated with invalidations, it could lead to
data corruption due to concurrency issues similar to
zopefoundation#290:

- DB refreshes a Connection for new transaction;
- zstor.lastTransaction() is called to obtain database view for this connection.
- objects in live-cache for this Connection are invalidated with
  invalidations that were queued through DB.invalidate() calls from
  storage.
- if lastTransaction does not guarantee that all DB invalidations for
  transactions with ID ≤ returned tid have been completed, it can be
  that:

	incomplete set of objects are invalidated in live cache

  i.e. data corruption.

This particular data corruption has been hit when working on core of
ZODB and was not immediately noticed:

zopefoundation#307 (review)

this fact justifies the importance of explicitly stating what IStorage
guarantees are / must be in the interface.

This guarantee

- already holds for FileStorage (no database mutations from outside of
  single process);
- is already true for ZEO4 and ZEO5
  zopefoundation#307 (review)
  zopefoundation#307 (comment)
- holds for RelStorage because it implements IMVCCStorage natively;
- is *not* currently true for NEO because NEO sets zstor.last_tid before
  calling DB.invalidate:

  https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124

However NEO is willing to change and already prepared the fix to provide
this guarantee because of data corruption scenario that can happen
without it:

  zopefoundation#307 (comment)
  https://lab.nexedi.com/nexedi/neoppod/commit/5870c5d7

In other words all storages that, to my knowledge, are in current use
are either already providing specified semantic, or will be shortly
fixed to provide it.

This way we can fix up the interface and make the semantic clear.

/cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton
Since commit b5895a5 ("mvccadapter:
fix race with invalidations when starting a new transaction"),
a ZEO test fails as follows:

    File "src/ZEO/tests/drop_cache_rather_than_verify.txt", line 114, in drop_cache_rather_than_verify.txt
    Failed example:
        conn.root()[1].x
    Expected:
        6
    Got:
        1

Earlier in the test, the ZEO server is restarted and then another
client commits. When disconnected, the first client does not receive
invalidations anymore and the connection gets stuck in the past until
there's a new commit after it reconnected. It was possible to make the
test pass with the following patch:

--- a/src/ZEO/ClientStorage.py
+++ b/src/ZEO/ClientStorage.py
@@ -357,6 +357,7 @@ def notify_connected(self, conn, info):

         # invalidate our db cache
         if self._db is not None:
+            self._db.invalidate(self.lastTransaction(), ())
             self._db.invalidateCache()

         logger.info("%s %s to storage: %s",

Other implementations like NEO are probably affected the same way.

Rather than changing interfaces in a backward-incompatible way,
this commit revert to the original behaviour, and all the changes
that were done in existing tests are reverted.

However, the interfaces are clarified about the fact that storage
implementations must update at a precise moment the value that is
returned by lastTransaction(): just after invalidate() or
tpc_finish callback.
navytux added a commit to navytux/ZODB that referenced this pull request Jun 9, 2020
Because if lastTransaction() returns tid, for which local database
handle has not yet been updated with invalidations, it could lead to
data corruption due to concurrency issues similar to
zopefoundation#290:

- DB refreshes a Connection for new transaction;
- zstor.lastTransaction() is called to obtain database view for this connection.
- objects in live-cache for this Connection are invalidated with
  invalidations that were queued through DB.invalidate() calls from
  storage.
- if lastTransaction does not guarantee that all DB invalidations for
  transactions with ID ≤ returned tid have been completed, it can be
  that:

	incomplete set of objects are invalidated in live cache

  i.e. data corruption.

This particular data corruption has been hit when working on core of
ZODB and was not immediately noticed:

zopefoundation#307 (review)

this fact justifies the importance of explicitly stating what IStorage
guarantees are / must be in the interface.

This guarantee

- already holds for FileStorage (no database mutations from outside of
  single process);
- is already true for ZEO4 and ZEO5
  zopefoundation#307 (review)
  zopefoundation#307 (comment)
- holds for RelStorage because it implements IMVCCStorage natively;
- is *not* currently true for NEO because NEO sets zstor.last_tid before
  calling DB.invalidate:

  https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124

However NEO is willing to change and already prepared the fix to provide
this guarantee because of data corruption scenario that can happen
without it:

  zopefoundation#307 (comment)
  https://lab.nexedi.com/nexedi/neoppod/commit/5870c5d7

In other words all storages that, to my knowledge, are in current use
are either already providing specified semantic, or will be shortly
fixed to provide it.

This way we can fix up the interface and make the semantic clear.

/cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton
navytux added a commit to navytux/ZODB that referenced this pull request Jun 9, 2020
… ...

... and to provide correct "current" view for load().

This is ZODB4 backport of commit b32d52f
(zopefoundation#313)

It has been amended with ZODB4-specific

      It is guaranteed that after lastTransaction returns, "current" view of
      the storage as observed by load() is ≥ returned tid.

because on ZODB<5 - contrary to ZODB5 - there is "load current" for
which correct semantic must be also provided:
zopefoundation#307 (comment)

Original description follows:

---- 8< ----

Because if lastTransaction() returns tid, for which local database
handle has not yet been updated with invalidations, it could lead to
data corruption due to concurrency issues similar to
zopefoundation#290:

- DB refreshes a Connection for new transaction;
- zstor.lastTransaction() is called to obtain database view for this connection.
- objects in live-cache for this Connection are invalidated with
  invalidations that were queued through DB.invalidate() calls from
  storage.
- if lastTransaction does not guarantee that all DB invalidations for
  transactions with ID ≤ returned tid have been completed, it can be
  that:

	incomplete set of objects are invalidated in live cache

  i.e. data corruption.

This particular data corruption has been hit when working on core of
ZODB and was not immediately noticed:

zopefoundation#307 (review)

this fact justifies the importance of explicitly stating what IStorage
guarantees are / must be in the interface.

This guarantee

- already holds for FileStorage (no database mutations from outside of
  single process);
- is already true for ZEO4 and ZEO5
  zopefoundation#307 (review)
  zopefoundation#307 (comment)
- holds for RelStorage because it implements IMVCCStorage natively;
- is *not* currently true for NEO because NEO sets zstor.last_tid before
  calling DB.invalidate:

  https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124

However NEO is willing to change and already prepared the fix to provide
this guarantee because of data corruption scenario that can happen
without it:

  zopefoundation#307 (comment)
  https://lab.nexedi.com/nexedi/neoppod/commit/5870c5d7

In other words all storages that, to my knowledge, are in current use
are either already providing specified semantic, or will be shortly
fixed to provide it.

This way we can fix up the interface and make the semantic clear.

/cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton
@jmuchemb jmuchemb merged commit 181fa37 into master Jun 11, 2020
@jmuchemb jmuchemb deleted the issue290 branch June 11, 2020 09:37
navytux added a commit to navytux/ZODB that referenced this pull request Jul 31, 2020
…eturning ...

... and to provide correct "current" view for load().

This is ZODB4 backport of zopefoundation#313,
which itself is just a more explicit language of
zopefoundation@4a6b0283#diff-881ceb274f9e538d4144950eefce8682R685-R695

It has been amended with ZODB4-specific

      It is guaranteed that after lastTransaction returns, "current" view of
      the storage as observed by load() is ≥ returned tid.

because on ZODB<5 - contrary to ZODB5 - there is "load current" for
which correct semantic must be also provided:
zopefoundation#307 (comment)

Original description follows:

---- 8< ----

Because if lastTransaction() returns tid, for which local database
handle has not yet been updated with invalidations, it could lead to
data corruption due to concurrency issues similar to
zopefoundation#290:

- DB refreshes a Connection for new transaction;
- zstor.lastTransaction() is called to obtain database view for this connection.
- objects in live-cache for this Connection are invalidated with
  invalidations that were queued through DB.invalidate() calls from
  storage.
- if lastTransaction does not guarantee that all DB invalidations for
  transactions with ID ≤ returned tid have been completed, it can be
  that:

	incomplete set of objects are invalidated in live cache

  i.e. data corruption.

This particular data corruption has been hit when working on core of
ZODB and was not immediately noticed:

zopefoundation#307 (review)

this fact justifies the importance of explicitly stating what IStorage
guarantees are / must be in the interface.

This guarantee

- already holds for FileStorage (no database mutations from outside of
  single process);
- is already true for ZEO4 and ZEO5
  zopefoundation#307 (review)
  zopefoundation#307 (comment)
- holds for RelStorage because it implements IMVCCStorage natively;
- is *not* currently true for NEO because NEO sets zstor.last_tid before
  calling DB.invalidate:

  https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124

However NEO is willing to change and already prepared the fix to provide
this guarantee because of data corruption scenario that can happen
without it:

  zopefoundation#307 (comment)
  https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec
  https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec#note_112238
  https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec#note_113331

In other words all storages that, to my knowledge, are in current use
are either already providing specified semantic, or will be shortly
fixed to provide it.

This way we can fix up the interface and make the semantic clear.

/cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton
navytux added a commit to navytux/ZODB that referenced this pull request Nov 30, 2020
…eturning ...

... and to provide correct "current" view for load().

This is ZODB4 backport of zopefoundation#313,
which itself is just a more explicit language of
zopefoundation@4a6b0283#diff-881ceb274f9e538d4144950eefce8682R685-R695

It has been amended with ZODB4-specific

      It is guaranteed that after lastTransaction returns, "current" view of
      the storage as observed by load() is ≥ returned tid.

because on ZODB<5 - contrary to ZODB5 - there is "load current" for
which correct semantic must be also provided:
zopefoundation#307 (comment)

Original description follows:

---- 8< ----

Because if lastTransaction() returns tid, for which local database
handle has not yet been updated with invalidations, it could lead to
data corruption due to concurrency issues similar to
zopefoundation#290:

- DB refreshes a Connection for new transaction;
- zstor.lastTransaction() is called to obtain database view for this connection.
- objects in live-cache for this Connection are invalidated with
  invalidations that were queued through DB.invalidate() calls from
  storage.
- if lastTransaction does not guarantee that all DB invalidations for
  transactions with ID ≤ returned tid have been completed, it can be
  that:

	incomplete set of objects are invalidated in live cache

  i.e. data corruption.

This particular data corruption has been hit when working on core of
ZODB and was not immediately noticed:

zopefoundation#307 (review)

this fact justifies the importance of explicitly stating what IStorage
guarantees are / must be in the interface.

This guarantee

- already holds for FileStorage (no database mutations from outside of
  single process);
- is already true for ZEO4 and ZEO5
  zopefoundation#307 (review)
  zopefoundation#307 (comment)
- holds for RelStorage because it implements IMVCCStorage natively;
- is *not* currently true for NEO because NEO sets zstor.last_tid before
  calling DB.invalidate:

  https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124

However NEO is willing to change and already prepared the fix to provide
this guarantee because of data corruption scenario that can happen
without it:

  zopefoundation#307 (comment)
  https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec
  https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec#note_112238
  https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec#note_113331

In other words all storages that, to my knowledge, are in current use
are either already providing specified semantic, or will be shortly
fixed to provide it.

This way we can fix up the interface and make the semantic clear.

/cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton
navytux added a commit to navytux/ZODB that referenced this pull request Nov 30, 2020
This backports to ZODB4 Connection ZODB5's approach to handle MVCC via
always calling storage.loadBefore() instead of "load for latest version
+ loadBefore if we were notified of database changes" approach.

Why?
----

Short answer: because Wendelin.core 2 needs to know at which particular
database state application-level ZODB connection is viewing the
database, and it is hard to implement such functionality correctly
without this backport. Please see appendix for the explanation.

What
----

This backports to ZODB4 the minimum necessary part of upstream commit 227953b
(Simplify MVCC by determining transaction start time using lastTransaction) +
follow-up correctness fixes:

zopefoundation#50
zopefoundation#56
zopefoundation#291
zopefoundation#307

In short:

- a Connection is always opened with explicitly corresponding to a particular database revision
- Connection uses only loadBefore with that revision to load objects
- every time a Connection is (re)opened, the result of queued invalidations and
  explicit query to storage.lastTransaction is carefully merged to refresh
  Connection's idea about which database state it corresponds to.

The "careful" in last point is important. Historically ZODB5 was first reworked
in commit 227953b (zopefoundation#56) to always
call lastTransaction to refresh state of Connection view. Since there
was no proper synchronisation with respect to process of handling
invalidations, that lead to data corruption issue due to race in
Connection.open() vs invalidations:

zopefoundation#290

That race and data corruption was fixed in commit b5895a5
(zopefoundation#291) by way of avoiding
lastTransaction call and relying only on invalidations channel when
refreshing Connection view.

This fix in turn led to another data corruption issue because in
presence of client-server reconnections, ZODB storage drivers can partly
skip notifying client with detailed invalidation messages:

zopefoundation#291 (comment)

A fix to that issue (zopefoundation#307)
proposed to change back to query storage for lastTransaction on every
Connection refresh, but to implement careful merging of lastTransaction
result and data from invalidation channel. However it was found that the
"careful merging" can work correctly only if we require from storage
drivers a particular ordering of invalidation events wrt lastTransaction
return and result:

zopefoundation#307 (comment)

While ZEO was already complying with that requirements, NEO had to be
fixed to support that:

zopefoundation#307 (comment)
https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec
https://lab.nexedi.com/nexedi/neoppod/commit/96a5c01f

Patch details
-------------

We change Connection._txn_time to be a "before" for the database state
to which Connection view corresponds. This state is hooked to be
initialized and updated in Connection._flush_invalidations - the
function that is called from both explicit Connection (re)open and at
transaction boundaries via Connection.afterCompletion hook.

Objects loading is factored into Connection._load which replaces old
"load + check invalidated + fallback to loadBefore" game in
Connection._setstate.

Connection.open now calls Connection._flush_invalidations
unconditionally - even if it was global cache reset event - because
besides invalidation flushes the latter is now responsible for querying
storage lastTransaction.

TmpStore - a "storage" that provides runtime support for savepoints - is
refactored correspondingly to delegate loading of original objects back
to underlying Connection.

DB.close is modified - similarly to ZODB5 - to release DB's Connections
carefully with preventing connections from DB poll from implicitly
starting new transactions via afterCompletion hook.

ZODB.nxd_patches is introduced to indicate to client software that this
particular patch is present and can be relied upon.

Tests are updated correspondingly. In 227953b Jim talks about
converting many tests - because

	"Lots of tests didn't clean up databases and connections properly"

and because new MVCC approach

	"makes database and connection hygiene a bit more important,
	especially for tests, because a connection will continue to interact
	with storages if it isn't properly closed, which can lead to errors if
	the storage is closed."

but finally implementing automatic cleanup at transaction boundaries
because there are too many tests to fix. We backport only automatic
cleanup + necessary explicit test fixes to keep the diff minimal.

All tests pass. This includes tests for ZODB itself, ZEO and NEO test
over hereby modified ZODB(*), my test programs from

zopefoundation#290	and
zopefoundation/ZEO#155

and ERP5 tests. Upcoming wendelin.core 2 also work with this change.

(*) ZEO, NEO and ERP5 tests fail sometimes, but there is no regression
here because ZEO, NEO and ERP5 tests are failing regularly, and in the
same way, even with unmodified ZODB.

Appendix. zconn_at
------------------

This appendix provides motivation for the backport:

For wendelin.core v2 we need a way to know at which particular database
state application-level ZODB connection is viewing the database. Knowing
that state, WCFS client library interacts with WCFS filesystem server
and, in simple terms, requests the server to provide data as of that
particular database state. Let us call the function that for a client
ZODB connection returns database state corresponding to its database
view zconn_at.

Now here is the problem: zconn_at is relatively easy to implement for
ZODB5 - see e.g. here:

https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.13-54-ga6a8f5b/lib/zodb.py#L142-181
https://lab.nexedi.com/nexedi/wendelin.core/commit/3bd82127

however, for ZODB4, since its operational models is not
directly MVCC, it is not that straightforward. Still, even for older
ZODB4, for every client connection, there _is_ such at that corresponds
to that connection view of the database.

We need ZODB4 support, because ZODB4 is currently the version that
Nexedi uses, and my understanding is that it will stay like this for not
a small time. I have the feeling that ZODB5 was reworked in better
direction, but without caring enough about quality which resulted in
concurrency bugs with data corruption effects like

zopefoundation#290
zopefoundation/ZEO#155
etc.

Even though the first one is now fixed (but it broke other parts and so
both ZODB had to be fixed again _and_ NEO had to be fixed for that ZODB
fix to work currently), I feel that upgrading into ZODB5 for Nexedi will
require some non-negligible amount of QA work, and thus it is better if
we move step-by-step - even if we eventually upgrade to ZODB5 - it is
better we first migrate wendelin.core 1 -> wendelin.core 2 with keeping
current version of ZODB.

Now please note what would happen if zconn_at gives, even a bit, wrong
answer: wcfs client will ask wcfs server to provide array data as of
different database state compared to current on-client ZODB connection.
This will result in that data accessed via ZBigArray will _not_
correspond to all other data accessed via regular ZODB mechanism.
It is, in other words, a data corruptions.
In some scenarios it can be harmless, but generally it is the worst
that can happen to a database.

It is good to keep in mind ZODB issue290 when imagining corner cases
that zconn_at has to deal with. Even though that issue is ZODB5 only, it
shows what kind of bugs it can be in zconn_at implementation for ZODB4.

Just for the reference: in Wendelin-based systems there is usually constant
stream of database ingestions coming from many places simultaneously. Plus many
activities crunching on the DB at the same time as well. And the more clients a
system handles, the more there will be level-of-concurrency increase. This
means that the problem of correctly handling concurrency issues in zconn_at is
not purely theoretical, but has direct relation to our systems.

--------

With this backport, zconn_at for ZODB4 becomes trivial and robust to implement:

https://lab.nexedi.com/kirr/wendelin.core/blob/484071b3/lib/zodb.py#L183-195

I would like to thank Joshua Wölfel whose internship helped this topic
to shape up:

https://www.erp5.com/project_section/wendelin-ia-project/forum/Joshua-internship-D8b7NNhWfz

/cc @Nexedi, @jwolf083
Signed-off-by: Kirill Smelkov <[email protected]>
@d-maurer
Copy link
Contributor

d-maurer commented Apr 8, 2021

The change has caused a problem with ZEO: "zopefoundation/ZEO#166". The reason is obvious: ZEO does not yet fulfill the requirement that lastTransaction is updated after connection cache invalidation processing.

In "zopefoundation/ZEO#167" I tried to change this but failed: the (ZODB) check_tid_ordering_w_commit tests are now failing. This test uses the tpc_finish callback used by mvccadapter for invalidation processing to ensure that lastTransaction is called at this time. As lastTransaction now changes after invalidation processing, the test sees a value which is deemed too old.

@jmuchemb Do you have an idea how this problem might be fixed?

@jmuchemb
Copy link
Member Author

jmuchemb commented Apr 8, 2021

In "zopefoundation/ZEO#167" I tried to change this but failed: the (ZODB) check_tid_ordering_w_commit tests are now failing. This test uses the tpc_finish callback used by mvccadapter for invalidation processing to ensure that lastTransaction is called at this time. As lastTransaction now changes after invalidation processing, the test sees a value which is deemed too old.

@jmuchemb Do you have an idea how this problem might be fixed?

check_tid_ordering_w_commit was fixed in #316 but no new release of ZODB since then.

@d-maurer
Copy link
Contributor

d-maurer commented Apr 9, 2021 via email

@jmuchemb
Copy link
Member Author

jmuchemb commented Apr 9, 2021

You're not clear about where the bug would be. And if the code of ZODB is correct, I prefer not to complicate just to make the test pass. I don't understand what you suggest to do in MVCCAdapterInstance because check_tid_ordering_w_commit works directly with storages.

Note that check_tid_ordering_w_commit passes with NEO. The difference between NEO & ZEO is about the cache lock:

  • NEO: the lock is outside the cache instance and the caller must acquire it before accessing the cache
  • ZEO: the lock is inside the cache instance and cache methods lock internally

In NEO, we can then do several operations without releasing the lock. And in particular, as you can see within tpc_finish, the callback call & the last_tid update are done with the same acquired lock. The test starts with a load and NEO also acquires the cache lock for that: it means that when the load is processed, the last_tid is guaranteed to be updated (NEO needed that with ZODB<5).

For ZEO, I tried the following change (on top of zopefoundation/ZEO#167):

--- a/src/ZEO/asyncio/client.py
+++ b/src/ZEO/asyncio/client.py
@@ -661,6 +661,7 @@ def tpc_finish_threadsafe(self, future, wait_ready, tid, updates, f):
                 self.protocol.close()
                 self.disconnected(self.protocol)
             else:
+              with cache._lock:
                 # typically performs connection cache invalidations
                 f(tid)
                 # ZODB>=5.6.0 requires that ``LastTid``

and the test passes.

But what I said about ZODB is also true for ZEO: if ZEO is already correct about how it locks the cache, then we should rather change the test.

@d-maurer
Copy link
Contributor

d-maurer commented Apr 9, 2021 via email

@navytux
Copy link
Contributor

navytux commented Apr 13, 2021

@d-maurer, I quickly checked, and while I might be missing something, to me it looks like there should not be the problem you describe with check_tid_ordering_w_commit: there both commit (that calls storage.tpc_finish with callback to sample new tid) and load (that samples new serial and lastTransaction) are run in their own threads in parallel. The logic here is that while tpc_finish callback is run - it is holding a kind of "storage lock". This way thread that calls lastTransaction must be blocked to wait for the tpc_finish thread to first release that lock. And after the storage lock is released tpc_finish is complete and so the notion of last tid is already updated too.

to_join = []
def run_in_thread(func):
t = threading.Thread(target=func)
t.setDaemon(True)
t.start()
to_join.append(t)
started = threading.Event()
finish = threading.Event()
@run_in_thread
def commit():
def callback(tid):
started.set()
tids.append(tid)
finish.wait()
self._storage.tpc_finish(t, callback)
results = {}
started.wait()
attempts = []
attempts_cond = utils.Condition()
def update_attempts():
with attempts_cond:
attempts.append(1)
attempts_cond.notifyAll()
@run_in_thread
def load():
update_attempts()
results['load'] = utils.load_current(self._storage, ZERO)[1]
results['lastTransaction'] = self._storage.lastTransaction()
expected_attempts = 1
if hasattr(self._storage, 'getTid'):
expected_attempts += 1
@run_in_thread
def getTid():
update_attempts()
results['getTid'] = self._storage.getTid(ZERO)
if hasattr(self._storage, 'lastInvalidations'):
expected_attempts += 1
@run_in_thread
def lastInvalidations():
update_attempts()
invals = self._storage.lastInvalidations(1)
if invals:
results['lastInvalidations'] = invals[0][0]
with attempts_cond:
while len(attempts) < expected_attempts:
attempts_cond.wait()
time.sleep(.01) # for good measure :)
finish.set()

Existing storages already confirm to the requiredment to first call the tpc_finish callback, and only after that to set the last tid. And they indeed do so from under holding the "storage lock". Here are some examples:

BaseStorage:

def tpc_finish(self, transaction, f=None):
# It's important that the storage calls the function we pass
# while it still has its lock. We don't want another thread
# to be able to read any updated data until we've had a chance
# to send an invalidation message to all of the other
# connections!
with self._lock:
if transaction is not self._transaction:
raise POSException.StorageTransactionError(
"tpc_finish called with wrong transaction")
try:
if f is not None:
f(self._tid)
u, d, e = self._ude
self._finish(self._tid, u, d, e)
self._clear_temp()

FileStorage:

def tpc_finish(self, transaction, f=None):
with self._files.write_lock():
with self._lock:
if transaction is not self._transaction:
raise StorageTransactionError(
"tpc_finish called with wrong transaction")
try:
tid = self._tid
if f is not None:
f(tid)
self._finish(tid, *self._ude)
self._clear_temp()

MappingStorage

self._commit_lock.acquire()
with self._lock:
self._transaction = transaction
self._tdata = {}
if tid is None:
if self._transactions:
old_tid = self._transactions.maxKey()
else:
old_tid = None
tid = ZODB.utils.newTid(old_tid)
self._tid = tid
# ZODB.interfaces.IStorage
@ZODB.utils.locked(opened)
def tpc_finish(self, transaction, func = lambda tid: None):
if (transaction is not self._transaction):
raise ZODB.POSException.StorageTransactionError(
"tpc_finish called with wrong transaction")
tid = self._tid
func(tid)
tdata = self._tdata
for oid in tdata:
tid_data = self._data.get(oid)
if tid_data is None:
tid_data = BTrees.OOBTree.OOBucket()
self._data[oid] = tid_data
tid_data[tid] = tdata[oid]
self._ltid = tid
self._transactions[tid] = TransactionRecord(tid, transaction, tdata)
self._transaction = None
del self._tdata
self._commit_lock.release()

NEO:

https://lab.nexedi.com/nexedi/neoppod/blob/de0feb4e/neo/client/handlers/master.py#L69-86

Doing last tid update from under lock (and retrieving last tid with also taking the lock) is likely what @jimfulton was talking about "some locking strategy" in c3091e2.

Does that help? Did I missed something?

Thanks again for your work on to bring ZEO5 into a better shape,
Kirill

@d-maurer
Copy link
Contributor

d-maurer commented Apr 13, 2021 via email

@navytux
Copy link
Contributor

navytux commented Apr 14, 2021

Thanks, @d-maurer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants