Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sstable: fix restarts integer overflow #4232

Merged

Conversation

EdwardX29
Copy link
Contributor

@EdwardX29 EdwardX29 commented Jan 7, 2025

Fix bug with integer overflows while indexing into blocks with large KVs
in SeekGE() and SeekLT() in rowblk_iter. Updated members in blockEntry
that represent offsets in blocks to be type offsetInBlock (alias for int64).

Added check in rowblk_writer to ensure that block sizes do not exceed
rowblk.MaximumSize before writing more data to the block.

Wrote unit tests to verify correct behavior for SeekGE() and SeekLT() with large
blocks >2GB.

@EdwardX29 EdwardX29 requested a review from a team as a code owner January 7, 2025 19:35
@EdwardX29 EdwardX29 requested a review from RaduBerinde January 7, 2025 19:35
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@EdwardX29 EdwardX29 requested a review from jbowens January 7, 2025 19:35
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add a test with a very large block and verify we can iterate it.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @EdwardX29 and @jbowens)


sstable/rowblk/rowblk_iter.go line 116 at r3 (raw file):

	// offset is the byte index that marks where the current key/value is
	// encoded in the block.
	offset int64

You can consider adding a type offsetInBlock int64 which would make it evident what fields are offsets.


sstable/rowblk/rowblk_iter.go line 200 at r3 (raw file):

type blockEntry struct {
	offset   int64
	keyStart uint32

Aren't the rest of these also offsets?

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @RaduBerinde)


sstable/rowblk/rowblk_iter.go line 116 at r3 (raw file):

Previously, RaduBerinde wrote…

You can consider adding a type offsetInBlock int64 which would make it evident what fields are offsets.

Good idea—we can add a comment on the type discussing the motivation for using an int64, including our expectation that an offset may be > math.MaxUint32 (even though restart points are serialized as uint32s, because the offset fields are sometimes updated to point after the last KV in the block)


sstable/rowblk/rowblk_iter.go line 200 at r3 (raw file):

Previously, RaduBerinde wrote…

Aren't the rest of these also offsets?

+1, I think they're all offsets except for valSize

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, 1 of 1 files at r5.
Reviewable status: 2 of 3 files reviewed, 6 unresolved discussions (waiting on @EdwardX29 and @RaduBerinde)


sstable/rowblk/rowblk_iter_test.go line 481 at r8 (raw file):

	var largeKeySize int64 = 2 << 30   // 2GB key size
	var largeValueSize int64 = 2 << 30 // 2GB value size

nit: these can be const instead of var, and if you omit the int64too, then down below we won't have to cast into an int when calling bytes.Repeat


sstable/rowblk/rowblk_iter_test.go line 534 at r8 (raw file):

	}

	KVTestPairs := func() []KVTestPair {

curious why use an anonymous function here?


sstable/rowblk/rowblk_iter_test.go line 538 at r8 (raw file):

		for i := 0; i < numKVs; i++ {
			key := fmt.Sprintf("key-%04d", i)
			value := strings.Repeat("a", valueSize)

nit: since we're memory constrained, maybe lift this to the outer scope as:

value4MB := bytes.Repeat([]byte("a"), valueSize)

so that every KV we add is reusing the same constructed byte slice rather than re-creating it


sstable/rowblk/rowblk_iter_test.go line 584 at r8 (raw file):

	numKVs := 63
	valueSize := 4 * (1 << 20)
	var FourGB int64 = 4 * (1 << 30)

nit: use consts for those of these that are constant


sstable/rowblk/rowblk_iter.go line 198 at r8 (raw file):

}

// Type representing an offset in a block

nit: it's idiomatic to start a Go comment on an identifier with the name of the identifier, in this case:

// offsetInBlock represents an offset in a block.

Copy link
Contributor Author

@EdwardX29 EdwardX29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @jbowens and @RaduBerinde)


sstable/rowblk/rowblk_iter_test.go line 481 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: these can be const instead of var, and if you omit the int64too, then down below we won't have to cast into an int when calling bytes.Repeat

32 bit architectures are not happy with large constants being used as ints in bytes.Repeat as constant expressions are evaluated at compile time. Adding a check for strconv.IntSize == 32 doesn't solve the issue either. CI run. However, using var and int64 solves this issue as overflow is at runtime instead; run here


sstable/rowblk/rowblk_iter_test.go line 534 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

curious why use an anonymous function here?

Originally was going to have both Multi large KV tests in one test so I wrote a lambda to generate KVs for both. switched to standard initialization.

@EdwardX29 EdwardX29 requested a review from jbowens January 9, 2025 21:12
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Make sure to squash the commits before merging

Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @EdwardX29 and @RaduBerinde)


sstable/rowblk/rowblk_iter_test.go line 481 at r8 (raw file):

Previously, EdwardX29 (Edward Xiao) wrote…

32 bit architectures are not happy with large constants being used as ints in bytes.Repeat as constant expressions are evaluated at compile time. Adding a check for strconv.IntSize == 32 doesn't solve the issue either. CI run. However, using var and int64 solves this issue as overflow is at runtime instead; run here

Ah, got it.

@EdwardX29 EdwardX29 force-pushed the edwardx29/block-restarts-overflow2 branch from 10f183e to 1b3ca39 Compare January 9, 2025 22:15
@EdwardX29 EdwardX29 changed the title sstable: Fix restarts integer overflow sstable: fix restarts integer overflow Jan 10, 2025
Fix bug with integer overflows while indexing into blocks with large KVs
in SeekGE() and SeekLT() in rowblk_iter. Updated members in blockEntry
that represent  offsets in blocks to be type offsetInBlock (alias for int64).

Added check in rowblk_writer to ensure that block sizes do not exceed
rowblk.MaximumSize before writing more data to the block.

Wrote unit tests to verify correct behavior for SeekGE() and SeekLT() with large
blocks >2GB.
@EdwardX29 EdwardX29 force-pushed the edwardx29/block-restarts-overflow2 branch from 1b3ca39 to 8db3b62 Compare January 10, 2025 15:42
@EdwardX29 EdwardX29 merged commit 816fda5 into cockroachdb:master Jan 10, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants