
Added support for zstd compression #1247

Merged
merged 19 commits into from
Nov 10, 2024

Contributor

@EdgarModesto23 EdgarModesto23 commented Oct 6, 2024

Following #1200

This patch adds support for zstd compression and updates the build scripts and the main README to include the new -DPISTACHE_USE_CONTENT_ENCODING_ZSTD flag. I mostly followed the style suggested by @kiplingw in #1177 and #1178.

It uses version 1.5.6 of the C implementation of the zstd library.

@kiplingw added the enhancement, fix in progress, dependencies, and experimental labels on Oct 6, 2024
// Allocate a smart buffer to contain compressed data...
std::unique_ptr compressedData = std::make_unique<std::byte[]>(estimated_size);

// Compress data using compression_level = 5: https://facebook.github.io/zstd/zstd_manual.html#Chapter5
Member


We currently don't have a public API for the user to specify the compression level (each compressor can use a different range). Until we figure out an elegant way to do that, it might be best to set the compression level consistent with whatever we're using on the others (I think max).

Collaborator


I believe max is really max on zstd. We could also set compression level dynamically, or partially dynamically, taking into account available CPU bandwidth, available network bandwidth, number of cores (zstd is good with multithreading) etc.

But maybe a middling value is wisest until we do a well-thought-through API/algorithm for compression level?

Contributor Author


@dgreatwood @kiplingw I agree. Using a middle value makes the most sense to me. At least until we find a better solution as both of you guys mentioned.
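
One way to read the "middle value" idea is to derive the level from a range instead of hard-coding a number. The sketch below is an illustration only, not code from this PR: `middleCompressionLevel` is a hypothetical helper, and the choice of 1 as the lower bound is an assumption made so that zstd's negative "fast" levels stay out of the range. With zstd, the upper bound could be taken from `ZSTD_maxCLevel()`.

```cpp
#include <cassert>

// Hypothetical helper (not part of Pistache or zstd): returns the midpoint
// of an inclusive compression-level range [lowest, highest].
inline int middleCompressionLevel(int lowest, int highest)
{
    assert(lowest <= highest); // the range must not be empty

    // Adding half the width to the lower bound avoids computing
    // (lowest + highest), which could overflow for large bounds.
    return lowest + (highest - lowest) / 2;
}

// Assumed usage with zstd (shown as a comment so the sketch builds without
// libzstd):
//     const int level = middleCompressionLevel(1, ZSTD_maxCLevel());
// For a range of 1..22 the helper yields 11.
```

Whether a midpoint or the library's own default level is the better interim choice is a judgment call that the thread leaves open.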

@@ -77,7 +81,8 @@ foreach test_name : pistache_test_files
 gmock_dep,
 curl_dep,
 cpp_httplib_dep,
-brotli_dep
+brotli_dep,
+zstd_dep
Member


Remember that libzstd-dev also needs to be added to Build-Depends in debian/control. That's in a separate branch and so you'll need a separate PR for that.

Collaborator


Hi @EdgarModesto23 -

I see that a number of the GitHub workflows (https://github.com/pistacheio/pistache/actions) are failing, e.g. with:
tests/meson.build:71:1: ERROR: Unknown variable "zstd_dep".
Some of that may be related to the need for an update to debian/control. Certainly the way brotli is dealt with should be a good model.

The workflows linuxflibev, linux and macOS libevent should all pass (occasionally the debian:testing tests under linux workflow will fail if debian:testing is in a funny state, but in fact it seems OK at present). The conventional-commits and abidiff are less important (just make sure you bump version number if abidiff changes).

Collaborator

@dgreatwood dgreatwood left a comment


Regarding http_server_test.cpp -
There are a lot of formatting changes that amount to indenting code that lies within encapsulating (scope limiting) curly braces, like:

{ // encapsulate
    // Indent this line by 4 more spaces
    // Etc.
}

The extra indentation that has been added is 100% logical. However, it was deliberately left out before, so that the encapsulating braces can be added or removed without causing a "diff" on each of the enclosed lines.

Would it be practical to revert the formatting changes from http_server_test.cpp, leaving only the substantive changes to be reviewed?


codecov bot commented Oct 12, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 76.26%. Comparing base (e07dc0b) to head (acb30d4).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/common/http_header.cc | 33.33% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1247      +/-   ##
==========================================
- Coverage   76.29%   76.26%   -0.04%     
==========================================
  Files          58       58              
  Lines       10027     9698     -329     
==========================================
- Hits         7650     7396     -254     
+ Misses       2377     2302      -75     


@kiplingw
Member

@Tachi107, your feedback is appreciated.

Member

@Tachi107 Tachi107 left a comment


I've left some feedback in a few different comments, check it out!

Also, the 1.5.6 release of zstd seems to have specific improvements to usage in web contexts. You might want to see what changed and evaluate if tweaking your implementation is worthwhile.

@kiplingw
Member

kiplingw commented Nov 2, 2024

@Tachi107 and @dgreatwood, looks fine to me.

@kiplingw kiplingw merged commit 2e98845 into pistacheio:master Nov 10, 2024
8 of 10 checks passed
@kiplingw
Member

@EdgarModesto23, please review our PPA's build log. Your PR seems to break the build on all architectures on Jammy. Here is the log for amd64.

Can you look into your build dependencies? If we can't satisfy Jammy, we will have to either drop it, or at least find a way to disable support for it in Jammy in debian/rules.

@EdgarModesto23
Contributor Author

@kiplingw I'll fix it right away.

@EdgarModesto23
Contributor Author

@kiplingw I made a PR that should fix the problem. I'll stay around in case anything comes up. Thank you for reaching out!

@dgreatwood
Collaborator

Hi @EdgarModesto23 -

May I raise another question / concern, this time about the test code?

In your http_server_test.cc/server_with_content_encoding_zstd, you have:

const auto compressionStatus = ZSTD_getFrameContentSize(newlyDecompressedData.data(), newlyDecompressedData.size());

const auto decompressed_size = ZSTD_decompress((void*)newlyDecompressedData.data(), compressionStatus, newlyCompressedResponse.data(), newlyCompressedResponse.size());

Here are my concerns:

  1. Per spec, ZSTD_getFrameContentSize's first parameter points "to the start of a ZSTD encoded frame". However, newlyDecompressedData doesn't contain a ZSTD frame. It doesn't actually contain anything yet, since we have yet to decompress the data when ZSTD_getFrameContentSize is called.
  2. If ZSTD_getFrameContentSize fails, compressionStatus may be an error code. But compressionStatus is not checked for an error.
  3. Given (1) above, ZSTD_getFrameContentSize actually does fail, and compressionStatus actually is an error code.
  4. The only reason the subsequent call to ZSTD_decompress works is that the error code is a very large number (as an unsigned), and so it appears we're passing a sufficiently large - in fact very large - buffer to ZSTD_decompress.
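
Point 4 can be illustrated without linking zstd. The sketch below assumes the definition in zstd.h, where `ZSTD_CONTENTSIZE_ERROR` is `(0ULL - 2)`; `kContentSizeError`, `looksLargeEnough` and `demonstrateSentinelAsCapacity` are local, hypothetical names used only for this demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Local stand-in mirroring zstd.h, where ZSTD_CONTENTSIZE_ERROR is (0ULL - 2).
constexpr unsigned long long kContentSizeError = 0ULL - 2;

// Hypothetical check: does a claimed destination capacity appear big enough
// to hold `needed` bytes of decompressed output?
constexpr bool looksLargeEnough(unsigned long long claimedCapacity, std::size_t needed)
{
    return claimedCapacity >= needed;
}

// Runs the two observations from the review comment as runtime checks.
inline void demonstrateSentinelAsCapacity()
{
    // The sentinel wraps around to one below the largest unsigned value...
    assert(kContentSizeError == std::numeric_limits<unsigned long long>::max() - 1);

    // ...so it passes as a capacity for any realistic payload, even though
    // the real buffer is nowhere near that size.
    assert(looksLargeEnough(kContentSizeError, 4096));
}
```

The decompression call accepts the claimed capacity at face value, which is why the test passes while no longer guarding against a write beyond the real buffer.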

In short, it seems that ZSTD_getFrameContentSize should act on newlyCompressedResponse, and then ZSTD_getFrameContentSize's return code should be checked. I included this code:

// Decompress...
auto decompressedSzFromFrame = ZSTD_getFrameContentSize(newlyCompressedResponse.data(), newlyCompressedResponse.size());
if (ZSTD_isError(decompressedSzFromFrame))
{
    LOGGER("test", "getFrameContentSize result: " <<
           ((decompressedSzFromFrame == ZSTD_CONTENTSIZE_UNKNOWN) ?
            "Content Size Unknown" :
            (decompressedSzFromFrame == ZSTD_CONTENTSIZE_ERROR) ?
            "Content Size Error" : "Other"));
    decompressedSzFromFrame = newlyDecompressedData.size();
}
const auto decompressed_size = ZSTD_decompress(reinterpret_cast<void*>(newlyDecompressedData.data()), decompressedSzFromFrame, newlyCompressedResponse.data(), newlyCompressedResponse.size());

(cf. https://github.com/dgreatwood/pistachefork/blob/windows/tests/http_server_test.cc)
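
As a possible variation on the proposal above, the two sentinel values could be compared directly and an oversized frame rejected before decompressing. This is a sketch under stated assumptions rather than the code in the fork: `destinationCapacity` is a hypothetical helper, and the two constants are local stand-ins for `ZSTD_CONTENTSIZE_UNKNOWN` and `ZSTD_CONTENTSIZE_ERROR` so that the example builds without libzstd.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Local stand-ins mirroring zstd.h: ZSTD_CONTENTSIZE_UNKNOWN is (0ULL - 1)
// and ZSTD_CONTENTSIZE_ERROR is (0ULL - 2).
constexpr unsigned long long kContentSizeUnknown = 0ULL - 1;
constexpr unsigned long long kContentSizeError   = 0ULL - 2;

// Hypothetical helper: maps the value returned by ZSTD_getFrameContentSize
// to a destination capacity suitable for ZSTD_decompress.
//   error sentinel   -> no capacity; the caller should fail the test
//   unknown sentinel -> fall back to the size of the existing buffer
//   known size       -> that size, provided it fits in the existing buffer
inline std::optional<std::size_t> destinationCapacity(unsigned long long frameContentSize,
                                                      std::size_t bufferSize)
{
    if (frameContentSize == kContentSizeError)
        return std::nullopt;
    if (frameContentSize == kContentSizeUnknown)
        return bufferSize;
    if (frameContentSize > bufferSize)
        return std::nullopt; // frame claims more data than the buffer holds
    return static_cast<std::size_t>(frameContentSize);
}
```

In the test, an empty result would translate into a failed assertion, and a present value would be passed as the destination capacity.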

Could you please review the comments above and likewise my proposed code?

Thanks much!
