[CELEBORN-1841] Support custom implementation of EventExecutorChooser to avoid deadlock when calling await in EventLoop thread #3071

Open
wants to merge 3 commits into
base: main


@littlexyw commented Jan 19, 2025

What changes were proposed in this pull request?

Support a custom implementation of EventExecutorChooser to avoid a deadlock when await is called in an EventLoop thread.

Why are the changes needed?

In the Flink Celeborn client, a new connection can be created from within an EventLoop thread. To wait for the connection to complete, cf.await is called. This can deadlock, because the newly connected channel may be bound to the same EventLoop thread that is calling await: that thread is blocked in wait, yet it is also the only thread that could complete the connection and issue the notify. This change avoids binding the new channel to the calling thread.
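
The failure mode and the proposed remedy can be illustrated with a minimal sketch. It is not the code from this PR and does not use Netty: each event loop is modeled as a single-threaded java.util.concurrent executor, and the names Loop, ConflictAvoidChooser and awaitCompletes are hypothetical. In Netty the corresponding hook would be an implementation of EventExecutorChooserFactory.EventExecutorChooser, whose next() could skip any executor for which inEventLoop() returns true; the sketch assumes that is roughly the idea here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class ConflictAvoidChooserSketch {

  /** Hypothetical stand-in for one event loop: a single thread plus its task queue. */
  static final class Loop {
    final ExecutorService executor;
    volatile Thread thread; // set when the executor lazily creates its worker thread

    Loop(String name) {
      executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, name);
        t.setDaemon(true);
        thread = t;
        return t;
      });
    }

    boolean inEventLoop() {
      return Thread.currentThread() == thread;
    }
  }

  /** Round-robin chooser that skips the loop owned by the calling thread. */
  static final class ConflictAvoidChooser {
    private final List<Loop> loops;
    private final AtomicInteger idx = new AtomicInteger();

    ConflictAvoidChooser(List<Loop> loops) {
      this.loops = loops;
    }

    Loop next() {
      int n = loops.size();
      for (int i = 0; i < n; i++) {
        Loop candidate = loops.get(Math.floorMod(idx.getAndIncrement(), n));
        if (!candidate.inEventLoop()) {
          return candidate;
        }
      }
      // Only reached when the single available loop is the caller's own loop.
      return loops.get(0);
    }
  }

  /**
   * Runs a task on loop-0 that schedules a "connect" task and blocks on it.
   * Returns true if the blocking wait finished, false if it timed out.
   */
  static boolean awaitCompletes(boolean avoidConflict) throws Exception {
    List<Loop> loops = new ArrayList<>();
    loops.add(new Loop("loop-0"));
    loops.add(new Loop("loop-1"));
    ConflictAvoidChooser chooser = new ConflictAvoidChooser(loops);
    try {
      Future<Boolean> outer = loops.get(0).executor.submit(() -> {
        // Without avoidance, model the bad case: the caller's own loop is picked.
        Loop target = avoidConflict ? chooser.next() : loops.get(0);
        Future<?> connect = target.executor.submit(() -> { });
        try {
          // A bounded wait stands in for await so the demo terminates.
          connect.get(200, TimeUnit.MILLISECONDS);
          return true;
        } catch (TimeoutException e) {
          return false; // the connect task is queued behind the blocked caller
        }
      });
      return outer.get();
    } finally {
      for (Loop l : loops) {
        l.executor.shutdownNow();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("same loop, await completed: " + awaitCompletes(false));
    System.out.println("conflict-avoiding chooser, await completed: " + awaitCompletes(true));
  }
}
```

The sketch uses a timed wait only so that the bad case returns instead of hanging; with an unbounded await, as described above, the calling thread would never be released.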

Does this PR introduce any user-facing change?

Yes. A new configuration option, celeborn..io.conflictAvoidChooser.enable (boolean, default false), is introduced.

How was this patch tested?

Manual test.

… to avoid deadlock when calling await in EventLoop thread
s"it works for replicate client of worker replicating data to peer worker.")
.booleanConf
.createWithDefault(false)

Contributor

@zaynt4606 Jan 20, 2025

You need to run the following command to refresh the docs whenever there are configuration changes.

UPDATE=1 build/mvn clean test -pl common -am -Dtest=none -DwildcardSuites=org.apache.celeborn.ConfigurationSuite
