-
Notifications
You must be signed in to change notification settings - Fork 374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CELEBORN-1123] Support fallback to non-columnar shuffle for schema that cannot be obtained from shuffle dependency #2101
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2101 +/- ##
==========================================
+ Coverage 46.61% 46.64% +0.03%
==========================================
Files 166 166
Lines 10695 10699 +4
Branches 977 977
==========================================
+ Hits 4984 4989 +5
Misses 5386 5386
+ Partials 325 324 -1 ☔ View full report in Codecov by Sentry. |
Hi @kerwin-zk , could you take a look at this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. cc @kerwin-zk
@waitinfuture this feature is not covered by CI, any changes require manual testing, maybe we can put a modified spark tgz in OSS, and write an IT to download it then run tests in CI? |
maybe the Github Action cache is enough? |
@kerwin-zk, could you help to review this pull request? |
...-shuffle/src/main/java/org/apache/spark/shuffle/celeborn/ColumnarHashBasedShuffleWriter.java
Outdated
Show resolved
Hide resolved
Add UT to shuffle writer and reader respectively when schema is null and not null. |
I don't get your point, we need a modified spark binary tgz to test it, also, we can apply patch and rebuild from spark source each time, but the cost is too high |
7097326
to
0d93af9
Compare
…hat cannot be obtained from shuffle dependency
0d93af9
to
5a8926c
Compare
@kerwin-zk, I have added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for @gaochao0509 updates.
What changes were proposed in this pull request?
Support fallback to non-columnar shuffle for schema that cannot be obtained from shuffle dependency.
Why are the changes needed?
When columnar shuffle is enabled, it was found that the shuffle class operator of Spark RDD is not supported. It's recommended to support fallback to non-columnar shuffle for schema that cannot be obtained from shuffle dependency.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
CelebornColumnarShuffleReaderSuite#columnarShuffleReaderNewSerializerInstance
ColumnarHashBasedShuffleWriterSuiteJ#createColumnarShuffleWriter