release-24.3: colexec: fix memory leak in simpleProjectOp #139095

Conversation

@blathers-crl blathers-crl bot commented Jan 15, 2025

Backport 1/1 commits from #138804 on behalf of @yuzefovich.

/cc @cockroachdb/release


This commit fixes a bounded memory leak in `simpleProjectOp` that has
been present since version 20.2. The leak was introduced via the
combination of:

  • 57eb4f8, in which we began tracking all batches seen by the
    operator so that we could decide whether we needed to allocate a
    fresh projecting batch, and
  • 895125b, in which we started using dynamic sizing for batches
    (starting with size 1 and growing exponentially up to 1024, where
    previously we would always use 1024).

Both changes combined made it so that the `simpleProjectOp` would keep
_all_ batches with different sizes alive until query shutdown. The
problem was later exacerbated when we introduced dynamic batch sizing
all over the place (for example, in the spilling queue).
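The growth heuristic mentioned above can be sketched as follows. This is a minimal illustration, not the actual colexec code: the constant mirrors the 1024-row cap, and `nextBatchSize` is a hypothetical name.

```go
package main

import "fmt"

// maxBatchSize mirrors the 1024-row cap mentioned above.
const maxBatchSize = 1024

// nextBatchSize sketches the exponential growth: start at 1 and double
// until the cap is reached.
func nextBatchSize(current int) int {
	next := current * 2
	if next > maxBatchSize {
		next = maxBatchSize
	}
	return next
}

func main() {
	// A batch starts at size 1 and grows: 1, 2, 4, ..., 1024.
	for size := 1; ; size = nextBatchSize(size) {
		fmt.Println(size)
		if size == maxBatchSize {
			break
		}
	}
}
```

Each growth step may allocate a fresh, larger batch, which is what produces many distinct batch objects over the lifetime of a query.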

Let's discuss why we needed something like the tracking introduced in
the first change mentioned above. The simple project op works by putting
a small wrapper (a "lens"), `projectingBatch`, over the batch coming
from the input. That wrapper can be modified later by other operators
(for example, a new vector might be appended), which would also modify
the original batch coming from the input, so we need to allow the
wrapper to be mutated accordingly. At the same time, the input operator
might decide to produce a fresh batch (for example, because of the
dynamically growing size) which wouldn't have the same "upstream"
mutations applied to it, so we need to allocate a fresh projecting batch
and let the upstream do its modifications. In the first change we solved
this by keeping a map from the input batch to the projecting batch.
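A minimal sketch of that map-based scheme, with illustrative stand-in types rather than the actual colexec code: every distinct input batch gets its own wrapper, and the map retains all of them until query shutdown, which is exactly the leak once dynamic sizing produces many distinct batches.

```go
package main

import "fmt"

// batch stands in for coldata.Batch; projectingBatch for the wrapper
// ("lens") that the simple project op puts over it.
type batch struct{ id int }
type projectingBatch struct{ input *batch }

// leakyProjector sketches the original scheme: a map from every input
// batch ever seen to its wrapper. Entries are never evicted, so all
// wrappers (and the batches they reference) stay alive until shutdown.
type leakyProjector struct {
	wrappers map[*batch]*projectingBatch
}

func (p *leakyProjector) project(in *batch) *projectingBatch {
	if w, ok := p.wrappers[in]; ok {
		return w // batch seen before: reuse its wrapper
	}
	w := &projectingBatch{input: in}
	p.wrappers[in] = w
	return w
}

func main() {
	p := &leakyProjector{wrappers: make(map[*batch]*projectingBatch)}
	// With dynamic sizing, the input keeps allocating fresh batches,
	// and the map grows without bound.
	for i := 0; i < 5; i++ {
		p.project(&batch{id: i})
	}
	fmt.Println(len(p.wrappers)) // 5 retained wrappers
}
```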

This commit addresses the same issue by checking only whether the input
batch is the same one as was seen by the `simpleProjectOp` on the
previous call. If it is, we can reuse the last projecting batch;
otherwise, we need to allocate a fresh one and memoize it. In other
words, we effectively reduce the map to at most one entry.
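The single-entry scheme can be sketched like this, again with illustrative stand-in types rather than the actual colexec code: compare against the previously seen batch and memoize just one wrapper.

```go
package main

import "fmt"

type batch struct{ id int }
type projectingBatch struct{ input *batch }

// projector sketches the fix: remember only the input batch from the
// previous call. Same batch means reuse the memoized wrapper; a
// different batch means allocate a fresh wrapper and replace the old one.
type projector struct {
	lastInput *batch
	lastProj  *projectingBatch
	allocs    int // counts wrapper allocations, for illustration
}

func (p *projector) project(in *batch) *projectingBatch {
	if in == p.lastInput {
		return p.lastProj
	}
	p.lastInput = in
	p.lastProj = &projectingBatch{input: in}
	p.allocs++
	return p.lastProj
}

func main() {
	p := &projector{}
	b := &batch{id: 1}
	// The common case: the input reuses the same batch on every call,
	// so only one wrapper is ever allocated and nothing else is retained.
	for i := 0; i < 100; i++ {
		p.project(b)
	}
	fmt.Println(p.allocs) // 1
}
```

Only the most recent wrapper is reachable, so older wrappers (and the batches they reference) become garbage-collectable.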

This means that with dynamic batch size growth we'd create a few
`projectingBatch` wrappers, but that is acceptable given that we expect
the dynamic sizing heuristics to quickly settle on a size.

This commit adjusts the contract of `Operator.Next` to explicitly
mention that implementations should reuse the same batch whenever
possible. This is already the case pretty much everywhere, except when
the dynamic batch size grows or shrinks.

Before we introduced dynamic batch sizing, different batches could only
be produced by the unordered synchronizers, so that's another place this
commit adjusts. If we didn't do anything, the simplistic mapping with at
most one entry could result in "thrashing": in the extreme case where
the inputs to the synchronizer produce batches in round-robin fashion,
we'd end up creating a new `projectingBatch` every time, which would be
quite wasteful. In this commit we modify both the parallel and the
serial unordered synchronizers to always emit the same output batch,
which is populated by manually inserting vectors from the input batch.
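The synchronizer change can be sketched as follows, with simplified stand-in types (the real code works with coldata.Batch and its typed vectors): rather than forwarding whichever input batch arrived, the synchronizer copies the vector references into one long-lived output batch.

```go
package main

import "fmt"

// vec and batch are simplified stand-ins for coldata.Vec and
// coldata.Batch.
type vec struct{ data []int64 }
type batch struct {
	vecs   []*vec
	length int
}

// synchronizer sketches the adjusted behavior: a single output batch is
// reused across calls, populated by inserting the vectors of whichever
// input batch arrived. Downstream operators therefore always see the
// same batch object, so a single memoized projectingBatch suffices.
type synchronizer struct {
	output *batch
}

func (s *synchronizer) emit(in *batch) *batch {
	if s.output == nil {
		s.output = &batch{vecs: make([]*vec, len(in.vecs))}
	}
	copy(s.output.vecs, in.vecs) // reference copy, not a data copy
	s.output.length = in.length
	return s.output
}

func main() {
	s := &synchronizer{}
	b1 := &batch{vecs: []*vec{{data: []int64{1, 2}}}, length: 2}
	b2 := &batch{vecs: []*vec{{data: []int64{3}}}, length: 1}
	out1 := s.emit(b1)
	out2 := s.emit(b2)
	fmt.Println(out1 == out2) // true: same output batch either way
}
```

Because only vector references are swapped, this stays cheap even when inputs alternate in round-robin fashion.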

I did some manual testing on the impact of this change. I used a table
and a query (with some window functions) from a customer cluster that
has been seeing some OOMs. I had a one-node cluster running locally with
`--max-go-memory=256MiB` (so that the memory needed by the query forced
GC all the time, since it exceeded the soft memory limit) and
`distsql_workmem=128MiB` (since we cannot spill to disk some of the
state in the row-by-row window functions). Before this patch I observed
max RAM usage of 1.42GB, and after this patch 0.56GB (the latter is
pretty close to what we report in `EXPLAIN ANALYZE`).

Fixes: #138803.

Release note (bug fix): A bounded memory leak that could previously
occur when evaluating some memory-intensive queries via the vectorized
engine in CockroachDB has now been fixed. The leak had been present
since version 20.2.


Release justification: bug fix.

blathers-crl bot commented Jan 15, 2025

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Backports should only be created for serious issues or test-only
    changes.
  • Backports should not break backwards-compatibility.
  • Backports should change as little code as possible.
  • Backports should not change on-disk formats or node communication
    protocols.
  • Backports should not add new functionality (except as defined here).
  • Backports must not add, edit, or otherwise modify cluster versions;
    or add version gates.
  • All backports must be reviewed by the owning area's TL. For more
    information as to how that review should be conducted, please
    consult the backport policy.
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • Your backport must be accompanied by a post to the appropriate Slack
    channel (#db-backports-point-releases or #db-backports-XX-X-release)
    for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this backport.

