[CELEBORN-1123] Support fallback to non-columnar shuffle for schema that cannot be obtained from shuffle dependency #2101

Closed
gaochao0509 wants to merge 1 commit

gaochao0509
Contributor

@gaochao0509 gaochao0509 commented Nov 15, 2023

What changes were proposed in this pull request?

Support falling back to non-columnar shuffle when the schema cannot be obtained from the shuffle dependency.

Why are the changes needed?

When columnar shuffle is enabled, shuffle operators of the Spark RDD API are not supported. Shuffles whose schema cannot be obtained from the shuffle dependency should therefore fall back to non-columnar shuffle.
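
As a rough illustration of the intended behavior, the fallback decision could be sketched as below. This is not the actual Celeborn code: `ShuffleFallbackSketch`, `Schema`, `Dependency`, `WriterKind`, and `chooseWriter` are hypothetical names made up for this example, and treating "schema cannot be obtained" as a `null` schema is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; none of these names are Celeborn or Spark APIs. */
public class ShuffleFallbackSketch {

  /** Stand-in for a schema: an ordered list of column type names. */
  static final class Schema {
    final List<String> columnTypes;

    Schema(String... columnTypes) {
      this.columnTypes = Arrays.asList(columnTypes);
    }
  }

  /** Stand-in for a shuffle dependency. Assumption: schema is null when it cannot be obtained. */
  static final class Dependency {
    final Schema schema;

    Dependency(Schema schema) {
      this.schema = schema;
    }
  }

  enum WriterKind { COLUMNAR, ROW_BASED }

  /** Use the columnar path only when it is enabled and a schema is available. */
  static WriterKind chooseWriter(boolean columnarEnabled, Dependency dep) {
    if (columnarEnabled && dep.schema != null) {
      return WriterKind.COLUMNAR;
    }
    // Fallback: no schema (or columnar shuffle disabled), so use the non-columnar path.
    return WriterKind.ROW_BASED;
  }

  public static void main(String[] args) {
    Dependency withSchema = new Dependency(new Schema("int", "string"));
    Dependency withoutSchema = new Dependency(null);

    System.out.println(chooseWriter(true, withSchema));    // COLUMNAR
    System.out.println(chooseWriter(true, withoutSchema)); // ROW_BASED
    System.out.println(chooseWriter(false, withSchema));   // ROW_BASED
  }
}
```

The same check would presumably apply on both the writer and the reader side, so that the two agree on the serialization format for a given shuffle.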

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • CelebornColumnarShuffleReaderSuite#columnarShuffleReaderNewSerializerInstance
  • ColumnarHashBasedShuffleWriterSuiteJ#createColumnarShuffleWriter

codecov bot commented Nov 15, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (7263f64) 46.61% compared to head (0d93af9) 46.64%.
Report is 5 commits behind head on main.

❗ Current head 0d93af9 differs from pull request most recent head 5a8926c. Consider uploading reports for the commit 5a8926c to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2101      +/-   ##
==========================================
+ Coverage   46.61%   46.64%   +0.03%     
==========================================
  Files         166      166              
  Lines       10695    10699       +4     
  Branches      977      977              
==========================================
+ Hits         4984     4989       +5     
  Misses       5386     5386              
+ Partials      325      324       -1     

@waitinfuture
Contributor

Hi @kerwin-zk, could you take a look at this PR?

Member

@SteNicholas SteNicholas left a comment

LGTM. cc @kerwin-zk

@pan3793
Member

pan3793 commented Nov 16, 2023

@waitinfuture this feature is not covered by CI, so any change requires manual testing. Maybe we can put a modified Spark tgz in OSS and write an IT that downloads it and then runs the tests in CI?

@cfmcgrady
Contributor

> @waitinfuture this feature is not covered by CI, so any change requires manual testing. Maybe we can put a modified Spark tgz in OSS and write an IT that downloads it and then runs the tests in CI?

Maybe the GitHub Actions cache is enough?

@gaochao0509
Contributor Author

@kerwin-zk, could you help to review this pull request?

@kerwin-zk
Contributor

Please add UTs for the shuffle writer and the shuffle reader, each covering both the case where the schema is null and the case where it is not null.

@pan3793
Member

pan3793 commented Nov 17, 2023

> Maybe the GitHub Actions cache is enough?

I don't get your point. We need a modified Spark binary tgz to test it. Alternatively, we could apply the patch and rebuild from the Spark source each time, but the cost is too high.

…hat cannot be obtained from shuffle dependency
@gaochao0509
Contributor Author

@kerwin-zk, I have added CelebornColumnarShuffleReaderSuite#columnarShuffleReaderNewSerializerInstance and ColumnarHashBasedShuffleWriterSuiteJ#createColumnarShuffleWriter. PTAL.

Member

@SteNicholas SteNicholas left a comment

LGTM. Thanks for the updates, @gaochao0509.
