Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

explain: don't create missing postquery plans in some cases #139078

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Jan 14, 2025

bc50897 where we added the support
for showing plans of cascades in EXPLAIN output we needed to introduce
an ability to "create post-query plan if missing". This was needed since
in "vanilla" EXPLAIN we never actually run the main query, so the
cascades are never actually planned. We also enabled this logic in two
other places - in EXPLAIN ANALYZE as well as when populating "plan for
stats". (The latter is used in exec stats feature, e.g. as plan column
in system.statement_statistics.) Both of these cases result in
creating post-query plans after the query execution which could lead
to unexpected behavior. One such behavior was observed when we added the
EXPLAIN support for triggers when the txn captured by the trigger plan
function could have been committed. Another case was recently introduced
where we reset the gist factory after the execbuilding the main query,
yet the gist factory was captured by the cascade plan function.

This shows that the mechanism is fragile. Note that the "vanilla"
EXPLAIN code path is unaffected by these cases because

  1. we created all separate exec.Factory objects (in
    execFactory.ConstructExplain and execbuilder.Builder.buildExplain),
    so there is no concern about factories being reset after the
    "main" optimizer plan was created.
  2. the txn in which the EXPLAIN statement runs is still open
    since we're in the middle of the execution of the
    explainPlanNode.

Additionally, we want to highlight the cases when cascades and triggers
didn't actually run because the main query didn't modify any rows. We
can address this desire as well as the fragility of the mechanism by
disallowing "creating post-query plans if missing" in EXPLAIN ANALYZE
which is what this commit does. Now, when we try to emit the plan for
a cascade or a trigger and we realize that we didn't cache the plan, we
won't attempt to create it, and we'll instead add short-circuited
attribute to the output.

Due to the same fragility we also disallow "creating post-query plans if
missing" in the "plan for stats" case. This means that the "plan for
stats" could change depending on the data that the query ran over.
(For example, in one case the cascade could be triggered and in another
the cascade could be skipped ("short-circuited") depending on whether the
main query modified some rows or not.) This is a bit unfortunate because
we would produce different "plan for stats" even though the query is the
same, but it seems like an edge case that we can accept to avoid the
fragility.

Fixes: #125509.
Fixes: #135157.
Fixes: #138974.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

In bc50897 where we added the support
for showing plans of cascades in EXPLAIN output we needed to introduce
an ability to "create post-query plan if missing". This was needed since
in "vanilla" EXPLAIN we never actually run the main query, so the
cascades are never actually planned. We also enabled this logic in two
other places - in EXPLAIN ANALYZE as well as when populating "plan for
stats". (The latter is used in exec stats feature, e.g. as `plan` column
in `system.statement_statistics`.) Both of these cases result in
creating post-query plans _after_ the query execution which could lead
to unexpected behavior. One such behavior was observed when we added the
EXPLAIN support for triggers when the txn captured by the trigger plan
function could have been committed. Another case was recently introduced
where we reset the gist factory after the execbuilding the main query,
yet the gist factory was captured by the cascade plan function.

This shows that the mechanism is fragile. Note that the "vanilla"
EXPLAIN code path is unaffected by these cases because
1. we created all separate exec.Factory objects (in
`execFactory.ConstructExplain` and `execbuilder.Builder.buildExplain`),
so there is no concern about factories being reset after the
"main" optimizer plan was created.
2. the txn in which the EXPLAIN statement runs is still open
since we're in the middle of the execution of the
`explainPlanNode`.

Additionally, we want to highlight the cases when cascades and triggers
didn't actually run because the main query didn't modify any rows. We
can address this desire as well as the fragility of the mechanism by
disallowing "creating post-query plans if missing" in EXPLAIN ANALYZE
which is what this commit does. Now, when we try to emit the plan for
a cascade or a trigger and we realize that we didn't cache the plan, we
won't attempt to create it, and we'll instead add `short-circuited`
attribute to the output.

Due to the same fragility we also disallow "creating post-query plans if
missing" in the "plan for stats" case. This means that the "plan for
stats" could change depending on the data that the query ran over.
(For example, in one case the cascade could be triggered and in another
the cascade could be skipped ("short-circuited") depending on whether the
main query modified some rows or not.) This is a bit unfortunate because
we would produce different "plan for stats" even though the query is the
same, but it seems like an edge case that we can accept to avoid the
fragility.

Release note: None
@yuzefovich yuzefovich changed the title explain: indicate short-circuited cascades and triggers explain: don't create missing postquery plans in some cases Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants