-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the broadcast joins issues caused by InputFileBlockRule[databricks] #9673
Changes from all commits
2c742c0
314e8ce
2ab7ad7
6cb8167
1da5d52
24e9b7f
69f05f1
6907c38
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -78,6 +78,22 @@ abstract class GpuBroadcastHashJoinMetaBase( | |
} | ||
} | ||
|
||
// Called in runAfterTagRules for a special post tagging for this broadcast join. | ||
def checkTagForBuildSide(): Unit = { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Make more sense to move this into There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do not do that because there are 4 shims for GpuBroadcastJoinMeta, which means I need to duplicate this code 4 times. The current option looks much simpler, only two times. |
||
val Seq(leftChild, rightChild) = childPlans | ||
val buildSideMeta = buildSide match { | ||
case GpuBuildLeft => leftChild | ||
case GpuBuildRight => rightChild | ||
} | ||
// Check both of the conditions to avoid duplicate reason string. | ||
if (!canThisBeReplaced && canBuildSideBeReplaced(buildSideMeta)) { | ||
buildSideMeta.willNotWorkOnGpu("the BroadcastHashJoin this feeds is not on the GPU") | ||
} | ||
if (canThisBeReplaced && !canBuildSideBeReplaced(buildSideMeta)) { | ||
willNotWorkOnGpu("the broadcast for this join must be on the GPU too") | ||
} | ||
} | ||
|
||
def convertToGpu(): GpuExec | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm why do we still need to return true given it's already converted to Gpu case? Given the reason mentioned above is
GPU plans may get incorrect file name or file start or file length from a CPU scan
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be used for two stages during the overiding process. The stage after inserting transitions for row and column may get a InputFileName or a GpuInputFileName.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerning this issue, we will never get a GpuInputFileName since plan conversion does not happen.