colexec: fix memory leak in simpleProjectOp #138804
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Force-pushed from b7fee6c to f3b94a4.
Force-pushed from f3b94a4 to e9980a1.
Reviewed 11 of 11 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)
pkg/sql/colexec/parallel_unordered_synchronizer.go
line 106 at r1 (raw file):
```go
// simpleProjectOp which needs to decide whether to allocate a fresh
// projectingBatch or not (which it needs to do whenever it observes a
// particular batch for the first time).
```
Is there a way this reasoning could be turned into / phrased as a contract for all `colexecop.Operator` that is independent of the specifics of `simpleProjectOp` and `ParallelUnorderedSynchronizer`? Maybe something like "operators should try to produce the same `coldata.Batch` across multiple calls to `Next` if _ hasn't changed, as a hint to other operators" or maybe the inverse: "operators should be sure to emit a different `coldata.Batch` if _ has changed because _"?
(I ask, because coupling together the behavior of two specific operators reduces the power of the Operator abstraction. When possible, it's nice to instead increase the power of the Operator abstraction by specifying the contract for all Operators in more detail. Then each operator can be reasoned about in isolation, as long as it maintains the contract.)
pkg/sql/colexec/parallel_unordered_synchronizer.go
line 457 at r1 (raw file):
```go
}
for i, vec := range msg.b.ColVecs() {
	s.outputBatch.ReplaceCol(vec, i)
```
Checking if I understand: so if a later operator added a vector to this batch, this mutation is still safe because it's only replacing vectors that this operator knows about?
Force-pushed from e9980a1 to d264306.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @michae2)
pkg/sql/colexec/parallel_unordered_synchronizer.go
line 106 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…

> Is there a way this reasoning could be turned into / phrased as a contract for all `colexecop.Operator` that is independent of the specifics of `simpleProjectOp` and `ParallelUnorderedSynchronizer`? Maybe something like "operators should try to produce the same `coldata.Batch` across multiple calls to `Next` if _ hasn't changed, as a hint to other operators" or maybe the inverse: "operators should be sure to emit a different `coldata.Batch` if _ has changed because _"?
> (I ask, because coupling together the behavior of two specific operators reduces the power of the `Operator` abstraction. When possible, it's nice to instead increase the power of the `Operator` abstraction by specifying the contract for all Operators in more detail. Then each operator can be reasoned about in isolation, as long as it maintains the contract.)
Good point. Adjusted the contract and mentioned it in the comment.
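For readers following along, here is a hedged sketch of what such a contract comment could look like; the `Batch` and `Operator` stand-ins below are illustrative toy types, not the actual `colexecop` source, and the wording paraphrases rather than quotes the adjusted contract:

```go
// Package sketch illustrates the batch-reuse hint discussed above with toy
// stand-ins; it is not the real colexecop package.
package sketch

// Batch is a toy stand-in for coldata.Batch.
type Batch interface {
	Length() int
}

// Operator is a toy stand-in for colexecop.Operator.
type Operator interface {
	// Next returns the next batch. As a hint to downstream operators,
	// implementations should strive to return the same Batch object across
	// calls whenever nothing has changed (e.g. unless the dynamically chosen
	// batch size grows or shrinks), so that consumers like simpleProjectOp
	// can cheaply detect "first time seeing this batch" and otherwise reuse
	// any per-batch state they maintain.
	Next() Batch
}
```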
pkg/sql/colexec/parallel_unordered_synchronizer.go
line 457 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Checking if I understand: so if a later operator added a vector to this batch, this mutation is still safe because it's only replacing vectors that this operator knows about?
Yes, exactly. At this point we are dealing with `coldata.Batch` and not `projectingBatch`, so we are modifying the vectors in exactly the same positions as they are in the input batch. If an upstream operator has appended a vector to `outputBatch`, that vector won't be affected in any case.
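A minimal self-contained model of why this is safe, using toy types (only `ColVecs` and `ReplaceCol` mirror the shape of the real `coldata.Batch` methods quoted above; everything else is invented for illustration):

```go
package main

import "fmt"

// batch is a toy stand-in for coldata.Batch: an ordered list of column
// vectors, modeled here as strings.
type batch struct{ vecs []string }

// ColVecs returns the batch's vectors, mirroring coldata.Batch.ColVecs.
func (b *batch) ColVecs() []string { return b.vecs }

// ReplaceCol swaps in vec at position i, mirroring coldata.Batch.ReplaceCol.
func (b *batch) ReplaceCol(vec string, i int) { b.vecs[i] = vec }

func main() {
	// A fresh batch arriving from one of the synchronizer's inputs.
	input := &batch{vecs: []string{"a'", "b'"}}
	// The synchronizer's reused output batch: same two positions, plus a
	// third vector appended by an operator downstream of the synchronizer.
	output := &batch{vecs: []string{"a", "b", "appended"}}

	// Only positions 0..len(input.ColVecs())-1 get replaced, so the
	// appended vector at position 2 is untouched.
	for i, vec := range input.ColVecs() {
		output.ReplaceCol(vec, i)
	}
	fmt.Println(output.vecs) // [a' b' appended]
}
```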
Encountered an error creating backports. Some common things that can go wrong:
- You might need to create your backport manually using the backport tool.

error creating merge commit from d264306 to blathers/backport-release-23.2-138804: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []; you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.2.x failed. See errors above.

error creating merge commit from d264306 to blathers/backport-release-24.1-138804: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []; you may need to manually resolve merge conflicts with the backport tool. Backport to branch 24.1.x failed. See errors above.

error creating merge commit from d264306 to blathers/backport-release-24.2-138804: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []; you may need to manually resolve merge conflicts with the backport tool. Backport to branch 24.2.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
This commit fixes a bounded memory leak in `simpleProjectOp` that has been present since version 20.2. The leak was introduced via the combination of
- 57eb4f8, in which we began tracking all batches seen by the operator so that we could decide whether we need to allocate a fresh projecting batch or not
- 895125b, in which we started using dynamic sizing for batches (where we'd start with size 1 and grow exponentially until 1024, while previously we would always use 1024).

Both changes combined made it so that the `simpleProjectOp` would keep _all_ batches with different sizes alive until the query shutdown. The problem later got further exacerbated when we introduced dynamic batch sizing all over the place (for example, in the spilling queue).
Let's discuss the reasoning for why we needed something like the tracking we did in the first change mentioned above. The simple project op works by putting a small wrapper (a "lens") `projectingBatch` over the batch coming from the input. That wrapper can be modified later by other operators (for example, a new vector might be appended), which would also modify the original batch coming from the input, so we need to allow for the wrapper to be mutated accordingly. At the same time, the input operator might decide to produce a fresh batch (for example, because of the dynamically growing size) which wouldn't have the same "upstream" mutation applied to it, so we need to allocate a fresh projecting batch and let the upstream do its modifications. In the first change we solved it by keeping a map from the input batch to the projecting batch.
This commit addresses the same issue by only checking whether the input batch is the same one as was seen by the `simpleProjectOp` on the previous call. If they are the same, then we can reuse the last projecting batch; otherwise, we need to allocate a fresh one and memoize it. In other words, we effectively reduce the map to have at most one entry.
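A minimal sketch of the resulting one-entry memoization, over toy types (the `projectingBatch` model, the field names, and the `project` helper here are all hypothetical simplifications of the real operator):

```go
package main

import "fmt"

// batch is a toy stand-in for coldata.Batch.
type batch struct{ vecs []string }

// projectingBatch is a toy model of the "lens" wrapper: it carries the
// wrapped batch plus the column projection applied to it.
type projectingBatch struct {
	*batch
	projection []int
}

// simpleProjectOp with the map-based tracking reduced to a single memoized
// (input, wrapper) pair, as the commit describes.
type simpleProjectOp struct {
	prevInput   *batch
	prevWrapper *projectingBatch
	projection  []int
}

func (o *simpleProjectOp) project(in *batch) *projectingBatch {
	if in != o.prevInput {
		// First time observing this batch object: allocate a fresh wrapper
		// and memoize it, forgetting the previous one.
		o.prevInput = in
		o.prevWrapper = &projectingBatch{batch: in, projection: o.projection}
	}
	return o.prevWrapper
}

func main() {
	op := &simpleProjectOp{projection: []int{1}}
	b := &batch{vecs: []string{"a", "b"}}
	w1 := op.project(b)
	w2 := op.project(b)   // same input object => the wrapper is reused
	fmt.Println(w1 == w2) // true
}
```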
This means that with dynamic batch size growth we'd create a few `projectingBatch` wrappers, but that is ok given that we expect the dynamic size heuristics to quickly settle on the size.
This commit adjusts the contract of `Operator.Next` to explicitly mention that implementations should reuse the same batch whenever possible. This is already the case pretty much everywhere, except when the dynamic batch size grows or shrinks.
Before we introduced the dynamic batch sizing, different batches could only be produced by the unordered synchronizers, so that's another place this commit adjusts. If we didn't do anything, then the simplistic mapping with at most one entry could result in "thrashing": in the extreme case where the inputs to the synchronizer produce batches in round-robin fashion, we'd end up creating a new `projectingBatch` every time, which would be quite wasteful. In this commit we modify both the parallel and the serial unordered synchronizers to always emit the same output batch, which is populated by manually inserting vectors from the input batch.
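A hedged sketch of the synchronizer side, again over toy types; the `ReplaceCol` loop shape follows the snippet quoted in the review thread above, while the surrounding `synchronizer` type is invented for illustration:

```go
package main

import "fmt"

// batch is a toy stand-in for coldata.Batch.
type batch struct {
	vecs   []string
	length int
}

func (b *batch) ColVecs() []string            { return b.vecs }
func (b *batch) ReplaceCol(vec string, i int) { b.vecs[i] = vec }

// synchronizer reuses one output batch across calls, swapping in the vectors
// of whichever input batch arrived, so downstream operators always observe
// the same batch object and never thrash their per-batch state.
type synchronizer struct {
	inputs      []*batch
	next        int
	outputBatch *batch
}

func (s *synchronizer) Next() *batch {
	in := s.inputs[s.next%len(s.inputs)] // round-robin arrival, worst case
	s.next++
	if s.outputBatch == nil {
		s.outputBatch = &batch{vecs: make([]string, len(in.vecs))}
	}
	for i, vec := range in.ColVecs() {
		s.outputBatch.ReplaceCol(vec, i)
	}
	s.outputBatch.length = in.length
	return s.outputBatch
}

func main() {
	s := &synchronizer{inputs: []*batch{
		{vecs: []string{"a1", "b1"}, length: 1},
		{vecs: []string{"a2", "b2"}, length: 2},
	}}
	b1, b2 := s.Next(), s.Next()
	fmt.Println(b1 == b2) // true: same object despite round-robin inputs
}
```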
I did some manual testing on the impact of this change. I used a table and a query (with some window functions) from the customer cluster that has been seeing some OOMs. I had a one-node cluster running locally with `--max-go-memory=256MiB` (so that the memory needed by the query was forcing GC all the time since it exceeded the soft memory limit) and `distsql_workmem=128MiB` (since we cannot spill to disk some state in the row-by-row window functions). Before this patch I observed the max RAM usage at 1.42GB, and after this patch 0.56GB (the latter is pretty close to what we report on EXPLAIN ANALYZE).
Fixes: #138803.
Release note (bug fix): Fixed a bounded memory leak that could previously occur when evaluating some memory-intensive queries via the vectorized engine in CockroachDB. The leak has been present since version 20.2.