Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix an internal error in EXPLAIN ANALYZE in some cases #139071

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yuzefovich
Copy link
Member

f2be108 made it so that we embed the plan gist factory into optPlanningCtx that we reuse across queries, which is achieved by initializing it before the execbuild and then resetting right after it. This can be problematic in some EXPLAIN scenarios where we call the captured GetExplainPlan function for cascades and triggers with createIfMissing=true option. That option means that we might attempt to run the execbuilding for a cascade or a trigger at a different point in time than the execbuilding of the main query. In particular, this can be the case for EXPLAIN ANALYZE if the cascade or trigger was short-circuited. (There is a separate issue to highlight these short-circuited cascades and triggers.) Regular EXPLAIN is unaffected because we create all needed plans eagerly in startExec method; also, nested cascades and triggers are unaffected because they didn't capture the gist factory.

This commit fixes the problem by removing the only reference to the reset gist factory from the explain factory.

Fixes: #138974.

Release note: None

f2be108 made it so that we embed the
plan gist factory into `optPlanningCtx` that we reuse across queries,
which is achieved by initializing it before the execbuild and then
resetting right after it. This can be problematic in some EXPLAIN
scenarios where we call the captured `GetExplainPlan` function for
cascades and triggers with `createIfMissing=true` option. That option
means that we might attempt to run the execbuilding for a cascade or a
trigger at a different point in time than the execbuilding of the main
query. In particular, this can be the case for EXPLAIN ANALYZE if the
cascade or trigger was short-circuited. (There is a separate issue to
highlight these short-circuited cascades and triggers.) Regular EXPLAIN
is unaffected because we create all needed plans eagerly in `startExec`
method; also, nested cascades and triggers are unaffected because they
didn't capture the gist factory.

This commit fixes the problem by removing the only reference to the
reset gist factory from the explain factory.

Release note: None
@yuzefovich yuzefovich requested review from mgartner and a team January 14, 2025 21:20
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@mgartner
Copy link
Collaborator

pkg/sql/plan_opt.go line 891 at r1 (raw file):

		opc.gf.Init(f)
		defer func(f exec.Factory) {
			if explainFactory != nil {

I'm worried this closure or explainFactory will be heap allocated. I'll run some tests to try to figure that out.

@mgartner
Copy link
Collaborator

-- commits line 13 at r1:
What does it mean for a cascade or trigger to be short-circuited?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)


-- commits line 13 at r1:

Previously, mgartner (Marcus Gartner) wrote…

What does it mean for a cascade or trigger to be short-circuited?

If we have a DELETE from parent that cascades into child, but the DELETE doesn't actually remove any rows from parent, then we "short-circuit" execution of the child cascade plan since the cascade, in effect, wasn't triggered. Right now under EXPLAIN ANALYZE we would call GetExplainPlan that "augments" the PlanFn of the main execFactory, which might have the gist factory around it.

If the cascade is not short-circuited, then we create plan via the "vanilla" PlanFn of the main execFactory during the query execution, so later on when we populate the EXPLAIN output GetExplainPlan returns already created plan, so we don't hit the gist factory that has been reset.

One possible alternative is to avoid the plan creation with short-circuited cascade. I'm currently working on that and it would also resolve #125509.

@yuzefovich yuzefovich marked this pull request as draft January 14, 2025 22:23
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)


-- commits line 13 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

If we have a DELETE from parent that cascades into child, but the DELETE doesn't actually remove any rows from parent, then we "short-circuit" execution of the child cascade plan since the cascade, in effect, wasn't triggered. Right now under EXPLAIN ANALYZE we would call GetExplainPlan that "augments" the PlanFn of the main execFactory, which might have the gist factory around it.

If the cascade is not short-circuited, then we create plan via the "vanilla" PlanFn of the main execFactory during the query execution, so later on when we populate the EXPLAIN output GetExplainPlan returns already created plan, so we don't hit the gist factory that has been reset.

One possible alternative is to avoid the plan creation with short-circuited cascade. I'm currently working on that and it would also resolve #125509.

That alternative approach seems promising #139078, so let's hold off on this PR for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

explain: internal error with short-circuited post queries in EXPLAIN ANALYZE
3 participants