-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable bulk-memory by default #22873
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! So awesome to see this land!
@sbc100 @tlively @aheejin @kripken for opinions: edit: I should clarify that the case that would fail here with the current logic is when the build targets Safari 14.1 (the current default, but would be done manually in the future). The default behavior going forward would actually be the same as it is now, except that bulk memory would be enabled (i.e. the lowering pass would not be run, and linking an atomics-using library would cause the resulting binary to have atomics). More generally speaking, or feature-enabling code in emscripten is a little inconsistent. e.g. with the current logic enabling WASM_BIGINT will automatically cause bulk memory and nontrapping-fp to be enabled because it implicitly causes Safari 15 to be selected, and then that selection determines whether the lowering passes run. I find that a little surprising, and there currently isn't a way to override it other then selecting a different browser version manually. Also the -mbulk-memory et al. feature flags don't work at link time to select features at a fine grain, only the browser versions work. That seems kind of bad to me, but might also be a pain to fix, for not much benefit. |
Why not just allow the pass to run in this case? The pass related to memory.copy and memory.fill only right? The passive segements and memory.init would be unaffected, no? |
Thats a good question.. I can't remember the conclusion but I do remember that updated the spec such that browser would allow atomic instruction even in single threaded builds, so maybe this was why we did that. But if we want to support older browsers we would still want to be able to lower those atomics away at link time I think. How many such libraries do we have? Its certainly nice to be able to build libraries just once and avoid the |
Ah, right I forgot about allowing atomics in single-threaded builds. That makes sense then, that the default behavior for the default targets (which support bulk and atomics) would be to just allow atomics to pass through anytime. So for older browsers we currently do not support lowering away atomics at link time (only at compile time). Our current default target (Safari 14.1) actually does support atomics, but I guess we don't support targeting even older browsers while linking in atomic-using libraries. There aren't many emscripten bulitin library variants built this way (freetype might be the only one actually) so maybe this isn't a big problem. If we want to lean further into this direction, we could potentially even remove some '-mt' variants and just always link atomic versions.
We could do it, it would just be a pessimization for no reason; there are no targets AFAIK that support passive segments but not memory.copy/fill. |
After some discussion with @sbc100 we might do the following:
|
OK, I think the bulk memory part of this patch is ready; the tests are working and the comments so far are addressed. As written, this PR does not update the default features, it only builds with build memory and turns on the lowering. This means the lowering will run by default. This isn't a state we want to release with, but it means we can land the changes for nontrapping-fp and updating the default separately. The other option is to add to this PR to enable the nontrapping-fp lowering (once we have test_sse fixed), and then update the defaults. |
tools/feature_matrix.py
Outdated
@@ -156,6 +163,8 @@ def apply_min_browser_versions(): | |||
if settings.PTHREADS: | |||
enable_feature(Feature.THREADS, 'pthreads') | |||
enable_feature(Feature.BULK_MEMORY, 'pthreads') | |||
if settings.WASM_WORKERS or settings.SHARED_MEMORY: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe else
here? otherwise the feature will be add twice in pthread mode (which also enables SHARED_MEMORY)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Although I don't think it actually matters, it should be safe to enable it more than once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
This should fix the recent CI failures on the test-node-compat bot. Once #22873 lands this can, of course, be removed.
This should fix the recent CI failures on the test-node-compat bot. Once emscripten-core#22873 lands this can, of course, be removed.
system/lib/libc/emscripten_memcpy.c
Outdated
@@ -8,21 +8,25 @@ | |||
#include "libc.h" | |||
#include "emscripten_internal.h" | |||
|
|||
#if !defined(__wasm_bulk_memory__) | |||
#error "This file must be compile with bulk memory enabled" | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just drop these 3 lines. I'm not sure what they achieve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, they were more useful when I had inline asm in this file.
OK, I think this is actually completely working now. And I just realized @kripken doesn't have any comments so +cc just in case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm otherwise
@@ -15,6 +15,7 @@ function e(b) { | |||
for (var v, p = 0, t = a, w = f.length, y = a + (3 * w >> 2) - ("=" == f[w - 2]) - ("=" == f[w - 1]); p < w; p += 4) a = m[f.charCodeAt(p + 1)], | |||
v = m[f.charCodeAt(p + 2)], c[t++] = m[f.charCodeAt(p)] << 2 | a >> 4, t < y && (c[t++] = a << 4 | v >> 2), | |||
t < y && (c[t++] = v << 6 | m[f.charCodeAt(p + 3)]); | |||
return c; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea what this is? Looks odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that does look odd. Hard to know why without knowing which function this actually is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's plausible that this is some function that in some way uses bulk memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, based on the charCodeAt
and << 6
etc., this looks like base64Decode
:
emscripten/src/base64Decode.js
Lines 38 to 58 in 4cbcb26
function base64Decode(b64) { | |
#if ENVIRONMENT_MAY_BE_NODE | |
if (typeof ENVIRONMENT_IS_NODE != 'undefined' && ENVIRONMENT_IS_NODE) { | |
var buf = Buffer.from(b64, 'base64'); | |
return new Uint8Array(buf.buffer, buf.byteOffset, buf.byteLength); | |
} | |
#endif | |
#if ASSERTIONS | |
assert(b64.length % 4 == 0); | |
#endif | |
var b1, b2, i = 0, j = 0, bLength = b64.length, output = new Uint8Array((bLength*3>>2) - (b64[bLength-2] == '=') - (b64[bLength-1] == '=')); | |
for (; i < bLength; i += 4, j += 3) { | |
b1 = base64ReverseLookup[b64.charCodeAt(i+1)]; | |
b2 = base64ReverseLookup[b64.charCodeAt(i+2)]; | |
output[j] = base64ReverseLookup[b64.charCodeAt(i)] << 2 | b1 >> 4; | |
output[j+1] = b1 << 4 | b2 >> 2; | |
output[j+2] = b2 << 6 | base64ReverseLookup[b64.charCodeAt(i+3)]; | |
} | |
return output; | |
} |
That doesn't use bulk memory AFAICT. Strange that it changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyhow, it must be a weird closure quirk, as x()
is called once and the return value isn't used.
I did find one other issue that I won't get fixed today, so documenting it here.
Either would be straightforward to implement, I haven't thought about which yet. |
I think this makes the most sense. The lowering pass is really only designed to llvm output and I think if there are any bulk-memory usages outside of bulk-memory-core (such as active segements) then its ok for the binaryn pass to simply error out. This means it should also be fine to strip the data count section. Actually I think either approach is fine but that binaryen change will be simpler. |
I'm pretty sure test_offset_converter wasm64_4g is flake, I'm going to land this. |
Yay! |
Did we forget to update the Changelog for this? I wonder if we should just combine all 3 of the new features into a single changelog entry for 4.0? |
Ah yes we did. I can roll that into #23312 |
It looks like we still have this in emcc.py:
|
_emscripten_memcpy_js
and fold memcpy and memset into libc-mno-bulk-memory
at compile time (enabling it in object files)See #23184