Patch from #30491 "Generate right-length node name" seems invalid #40256
Comments
/cc @mmusgrov (narayana) |
Interesting fact: |
Interesting, but it should be easy to fix, perhaps by using a different encoding or otherwise.
ArjunaCore is a toolkit for implementing various transaction protocols, but some integrations add further constraints on the length. The most restrictive of these is Kubernetes, which limits it to 23 characters because of a uniqueness restriction caused by the use of stateful sets.
This is the standard length and is sufficient for working within the constraints of the Xid data structure, since we embed the nodeIdentifier in the Xid bytes so that the recovery manager can determine whether or not it is responsible for recovering it - the only platform we've come across where this length can't be used is kubernetes.
I don't understand this statement. |
Look how it gets initialized in TxControl without further proof! |
My point was that I didn't understand what you wrote because I was unable to parse the grammar. |
But you got my point now? |
I don't recall you being this uncooperative and obstinate in our previous interactions so let's avoid conversation and I will just use your bug description, thanks for the initial report. |
The problem here arises when the original array of bytes is not UTF-8 encoded. I wonder if using a byte array instead of a String for the nodeName wouldn't be better here. |
Why not avoid UTF-8 characters and just change the charset to one that will give you back 28 or fewer bytes? The standard one would be ISO-LATIN-1 (java.nio.charset.StandardCharsets.ISO_8859_1). |
Note that the node name no longer needs to be readable by a human (which historically has been useful for debugging) since it gets hashed. |
This works because:
|
@mmusgrov even if the byte array is encoded in ISO/IEC 8859-1, I don't think that the resulting SHA-224 will be an ISO/IEC 8859-1 byte array. I picked Base64 but ISO/IEC 8859-1 would be ok too. |
@marcosgopen The bug report says that the UTF bytes for the SHA-224 hash produced by the narayana-jta extension are longer than the 28 byte limit. But the bug report reads the bytes back using UTF_8, and UTF-8 is a variable-length character encoding. If Benjamin uses ISO/IEC 8859-1 to get the bytes, it will produce a byte array of the correct length. I ran a Java version of Benjamin's Groovy test, and if you run it too you will find that the digest is 28 bytes, the UTF-8 bytes are about double that, depending on the input string, and the ISO/IEC 8859-1 bytes are less than or equal to 28, as we expect:
|
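The comparison described above can be reproduced with a sketch along the following lines. This is not the test that was originally posted; the class name `NodeNameLengthDemo`, its helper methods and the sample node name are hypothetical and chosen only for illustration. It hashes a name with SHA-224, decodes the digest into a String, and counts the bytes obtained when that String is encoded again.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class, not part of Quarkus or Narayana.
class NodeNameLengthDemo {

    // SHA-224 always yields a 28-byte digest, whatever the input length.
    static byte[] sha224(String input) {
        try {
            return MessageDigest.getInstance("SHA-224")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-224 is not available", e);
        }
    }

    // Decode raw bytes into a String with one charset, then count the bytes
    // produced when that String is encoded again with another charset.
    static int roundTripLength(byte[] raw, Charset decodeWith, Charset encodeWith) {
        return new String(raw, decodeWith).getBytes(encodeWith).length;
    }

    public static void main(String[] args) {
        // The node name below is an arbitrary sample value.
        byte[] digest = sha224("a-hypothetical-and-rather-long-node-name");

        System.out.println("digest bytes: " + digest.length);
        // Bytes that are not valid UTF-8 decode to U+FFFD, which re-encodes as 3 bytes.
        System.out.println("UTF-8 -> UTF-8: " + roundTripLength(
                digest, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // ISO-8859-1 maps every byte to exactly one char and back again.
        System.out.println("ISO-8859-1 -> ISO-8859-1: " + roundTripLength(
                digest, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));
        // Mixed case: decoded as UTF-8, encoded as ISO-8859-1 (at most one byte per char).
        System.out.println("UTF-8 -> ISO-8859-1: " + roundTripLength(
                digest, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

The exact UTF-8 length depends on the digest bytes, so no specific value is claimed for it here; the ISO-8859-1 round trip is always exactly 28 bytes.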
I declined #40613 because there is no bug here and the issue should be closed as not a bug. |
Thank you Mike for your clarification. I assume then that we need to change the way the TxControl reads the bytes here: https://github.com/jbosstm/narayana/blob/main/ArjunaCore/arjuna/classes/com/arjuna/ats/arjuna/coordinator/TxControl.java#L147 (invoked at https://github.com/quarkusio/quarkus/blob/main/extensions/narayana-jta/runtime/src/main/java/io/quarkus/narayana/jta/runtime/NarayanaJtaRecorder.java#L51) |
Yes, that's correct, we'll need to use ISO_8859_1 as well. When you create the PR, will you include a test? |
@marcosgopen We'll need to release in a major because it will break recovery unless we add a config flag since the nodeIdentifier will change and we won't be able to detect orphans from previous runs. |
Perhaps it would also help to use the ISO_8859_1 charset when quarkus shortens the node name. So I think that the following change to line 62 should do it:
|
@mmusgrov I tested your solution but unfortunately it doesn't work, because when bytes are read back using a different Charset the length can increase, which is the case here. |
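A minimal sketch of that mismatch, assuming (as the thread describes) that the name is built from raw bytes with ISO_8859_1 on one side and read with `getBytes(UTF_8)` on the other. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Quarkus or Narayana.
class CharsetMismatchDemo {

    // Length seen by a reader that encodes with UTF-8 a name
    // that was built from raw bytes using ISO-8859-1.
    static int utf8LengthOfLatin1Name(byte[] raw) {
        String name = new String(raw, StandardCharsets.ISO_8859_1);
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        byte[] ascii = new byte[28];
        Arrays.fill(ascii, (byte) 'a');   // bytes below 0x80 stay one byte in UTF-8
        byte[] high = new byte[28];
        Arrays.fill(high, (byte) 0xE9);   // 0xE9 is U+00E9 in ISO-8859-1, two bytes in UTF-8

        System.out.println(utf8LengthOfLatin1Name(ascii)); // prints 28
        System.out.println(utf8LengthOfLatin1Name(high));  // prints 56
    }
}
```

A hash digest typically contains many bytes at or above 0x80, so a 28-char ISO-8859-1 name usually exceeds 28 bytes once it is encoded as UTF-8.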
Here is the test to demonstrate that changing quarkus to use ISO_8859_1 will work as expected:
|
Sorry @mmusgrov , I think my previous comment was not so clear. So if we want to use your snippet we would need to change the TxControl and make a release. |
Ah I see, my original code snippet showed that problem too, I don't know why I didn't spot it. So it looks like the only way around it is to change Narayana to read it using ISO_8859_1. But that would need to go into a major release since it could break orphan detection. |
And we'd need to provide users with a migration path. |
That will impact uniqueness. We'll just have to "bite the bullet" and provide a migration path for our users. |
The current hash using SHA-224 already presents potential duplicates and rehashing it a second time increases that risk. |
Every hash has the potential to generate duplicates. I did some calculations on the original issue. The probability of a SHA-224 collision was astronomically small. We can afford to lose a few orders of magnitude. I haven't done the calculation for Base64 encoding, though. |
Okay thanks, note that as well as rehashing a second time, Marco's PR is truncating the hash which further increases the risk of duplicates. |
Even if we effectively halve the entropy we have, we still need something on the order of |
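For a rough sense of scale, the standard birthday-bound approximation p ≈ n(n-1)/2 / 2^bits can be evaluated directly. This sketch is not the calculation referred to above; the class name and the node count of one million are hypothetical, and it assumes the hash output behaves like uniformly random bits.

```java
// Hypothetical helper, not part of Quarkus or Narayana.
class CollisionEstimate {

    // Birthday-bound approximation of the probability that at least two of
    // n uniformly random values of the given bit length collide.
    // Only meaningful while the result is much smaller than 1.
    static double collisionProbability(double n, int bits) {
        return n * (n - 1) / 2.0 / Math.pow(2.0, bits);
    }

    public static void main(String[] args) {
        double nodes = 1_000_000; // assumed number of node names, for illustration only
        System.out.println("224 bits: " + collisionProbability(nodes, 224));
        System.out.println("112 bits: " + collisionProbability(nodes, 112));
    }
}
```

Under these assumptions, even with the entropy halved to 112 bits the estimate for a million names stays many orders of magnitude below 1.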
So @graben I assume that your original cryptic responses to my questions on this issue were referring to how Narayana works rather than how quarkus works and if so we all could have avoided a lot of wasted effort if you'd raised the issue against Narayana in the first place. |
@mmusgrov : sorry for the confusion. I was not sure whether the solution in Quarkus is buggy or Narayana needs improvement. The latter looked okay to me at first glance. :-) |
Ok @marcosgopen it looks like we'll be needing your PR after all. |
ok. And thanks again for bringing the issue to our attention. |
Done. Added a single remark. |
@marcosgopen So let's still go with your PR, but in the meantime I will raise a narayana issue to allow callers to specify the character set: [Edit] I raised issue JBTM-3883 to allow the charset to be specified when setting the narayana nodeIdentifier config property. |
I did, but we'll need a better title for it, since from a Quarkus maintainer's point of view (and also for a user looking at change logs) the title does not tell me anything :) |
@marcosgopen Can you rename the title to something like "Ensure the Transaction Manager node name is less than or equal to 28 bytes"? It's up to you whether or not to update the description to indicate why the limit exists (because we need to include it in the fixed-size Xid data structure so that the TM can determine which XA resource branches it is responsible for). |
I can see the PR is open now. @mmusgrov I updated the title and the description. |
Describe the bug
A port of #30491 to the narayana-spring-boot project (snowdrop/narayana-spring-boot#136) is causing enlisting issues with databases like MariaDB (snowdrop/narayana-spring-boot#140).
It seems that the SHA-224 hash is indeed 28 bytes long, but the UTF-8 String created from it is considerably bigger, somewhere between 40 and 60 bytes. For MariaDB this results in XID Strings that are too large when Narayana starts the XA resource during enlistment. In my opinion this behaviour should also affect Quarkus (not verified!)
Small Groovy script to prove that the length of the shortened node name exceeds the 28-byte limit
Expected behavior
Creation of a valid shortened node name
Actual behavior
Broken node name as described
How to Reproduce?
Enlist XAResource of MariaDB connection into Narayana JTA transaction.
Output of uname -a or ver
No response
Output of java -version
No response
Quarkus version or git rev
No response
Build tool (i.e. output of mvnw --version or gradlew --version)
No response
Additional information
No response