
kgo: return decompression errors while consuming #883

Open: wants to merge 1 commit into master

Conversation

@twmb (Owner) commented Jan 8, 2025

Kafka can return partial batches, so decompression errors are common. If I ask for at most 100 bytes and the broker has two 60-byte batches, I will receive one valid 60-byte batch followed by a partial 40-byte batch. That second, partial batch will fail to decompress. This is why I previously never returned decompression errors.
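
As a rough illustration (the client setup and options below are from the kgo API, but the broker address, topic, and byte values are hypothetical), this is the kind of configuration under which the broker hands back a truncated tail batch:

```go
package main

import "github.com/twmb/franz-go/pkg/kgo"

func main() {
	// With a 100-byte partition fetch cap and two 60-byte batches on the
	// broker, a fetch response contains one complete batch plus a truncated
	// tail batch; the truncated batch cannot be decompressed.
	cl, err := kgo.NewClient(
		kgo.SeedBrokers("localhost:9092"), // hypothetical broker address
		kgo.ConsumeTopics("events"),       // hypothetical topic
		kgo.FetchMaxPartitionBytes(100),
	)
	if err != nil {
		panic(err)
	}
	defer cl.Close()
}
```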

However, if a client truly does produce somewhat-valid compressed data that some decompressors can process but others (Go's) cannot, then the first batch received could fail to decompress. The client would fail processing, return an empty batch, and retry consuming at the same spot. The client would spin-loop trying to consume, and the end user would never be aware of it.

Now, if the first error received is a decompression error, we bubble it up to the end user.
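
Continuing the sketch above (reusing cl; the context and log imports are assumed), a minimal poll loop showing where the new error surfaces:

```go
ctx := context.Background()
for {
	fetches := cl.PollFetches(ctx)
	fetches.EachError(func(topic string, partition int32, err error) {
		// Before this patch, a decompression failure on the very first batch
		// was swallowed and the client re-fetched the same offset forever;
		// now it surfaces here so the application can react.
		log.Printf("fetch error on %s/%d: %v", topic, partition, err)
	})
	fetches.EachRecord(func(r *kgo.Record) {
		_ = r // process records as usual
	})
}
```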

This is hard to test internally, so it was manually tested with hacks.

Scenario one:

  • I changed the code to ignore CRC errors, since they just got in the way
  • I ran a local kfake where the first five bytes of a RecordBatch.Records were overwritten with "aaaaa" (roughly as sketched after this list)
  • I consumed before this patch -- the client spin-looped, never progressing and never printing anything.
  • I consumed after this patch -- the client immediately received the error.
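
The corruption itself was a manual hack; conceptually it amounts to something like the following (hypothetical snippet against a decoded batch, not the actual kfake change):

```go
// Clobber the start of the compressed Records payload so that Go's
// decompressor rejects the batch outright.
copy(batch.Records[:5], "aaaaa")
```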

Scenario two:

  • Same CRC-ignoring change as above
  • I ran a local kfake where, when consuming, all batches AFTER the first had their RecordBatch.Records overwritten with "aaaaa".
  • I consumed before and after this patch -- in both cases, the client progressed to the end of the partition and no errors were printed.
  • To double-check that the decompression error was being encountered, I added a println in kgo where the decompression error is generated -- the println was always hit, confirming that decompression errors encountered after progress has been made are still ignored as before.

Closes #854.

Linked issue: Bubble up batch processing failures when fetching (#854)