[EXPORTER] Support handling retry-able errors for OTLP/HTTP #3223

Merged
marcalff merged 35 commits into open-telemetry:main from the RetryableErrorHttp branch
Jan 17, 2025

Conversation

chusitoo
Contributor

@chusitoo chusitoo commented Dec 30, 2024

Contributes to #2049

Changes

This change introduces a retry mechanism for OTLP/HTTP that applies to select failures and mirrors the exponential backoff approach used in OTLP/gRPC.

  • Add support to set retry values via environment variables.
  • Retries are enabled by default, using the same configuration values as OTel dotnet, java and js.
  • Users can opt out of retries by setting any (or all) of the retry settings to zero.
  • Similar to OTLP/gRPC, retries are transparent to the user so only the last attempt is "bubbled up" as the actual response.
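
As a rough illustration of the behaviour described in the list above, the following sketch computes a capped exponential backoff and treats any zeroed setting as an opt-out. The `RetryPolicy` struct, its field names, its default values and both helper functions are hypothetical and are not taken from this PR; a real implementation would typically also randomize the delay (jitter), which is left out here for brevity.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>

// Hypothetical policy type: names and defaults are illustrative assumptions,
// not the values or identifiers used by opentelemetry-cpp.
struct RetryPolicy
{
  std::uint32_t max_attempts = 5;
  std::chrono::milliseconds initial_backoff{1000};
  std::chrono::milliseconds max_backoff{5000};
  double backoff_multiplier = 1.5;
};

// Retries count as disabled as soon as any single setting is zero.
bool RetryEnabled(const RetryPolicy &policy)
{
  return policy.max_attempts > 0 && policy.initial_backoff.count() > 0 &&
         policy.max_backoff.count() > 0 && policy.backoff_multiplier > 0.0;
}

// Upper bound of the delay before retry number `retry` (1-based):
// initial_backoff * multiplier^(retry - 1), never exceeding max_backoff.
std::chrono::milliseconds BackoffCeiling(const RetryPolicy &policy, std::uint32_t retry)
{
  const double cap = static_cast<double>(policy.max_backoff.count());
  double delay     = static_cast<double>(policy.initial_backoff.count());
  for (std::uint32_t i = 1; i < retry; ++i)
  {
    delay = std::min(delay * policy.backoff_multiplier, cap);
  }
  delay = std::min(delay, cap);  // also covers initial_backoff > max_backoff
  return std::chrono::milliseconds(static_cast<std::int64_t>(delay));
}
```

With these assumed defaults the ceiling grows from 1000 ms to 1500 ms, 2250 ms and 3375 ms, then stays at the 5000 ms cap. Setting, for example, `max_attempts` to zero makes `RetryEnabled` return false, which corresponds to the opt-out described above.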

The changes to support retries for OTLP/gRPC exporter are addressed in #3219

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed


netlify bot commented Dec 30, 2024

Deploy Preview for opentelemetry-cpp-api-docs canceled.

Name Link
🔨 Latest commit 082597a
🔍 Latest deploy log https://app.netlify.com/sites/opentelemetry-cpp-api-docs/deploys/6772c02f4c1c9800081accd4


netlify bot commented Dec 30, 2024

Deploy Preview for opentelemetry-cpp-api-docs canceled.

Name Link
🔨 Latest commit 905e7ae
🔍 Latest deploy log https://app.netlify.com/sites/opentelemetry-cpp-api-docs/deploys/678a5a452eb906000855d49b

@chusitoo chusitoo changed the title Support handling retry-able errors for OTLP/HTTP exporter [EXPORTER] Support handling retry-able errors for OTLP/HTTP Dec 30, 2024

codecov bot commented Dec 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.78%. Comparing base (02cda51) to head (905e7ae).
Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3223      +/-   ##
==========================================
+ Coverage   87.71%   87.78%   +0.07%     
==========================================
  Files         198      198              
  Lines        6273     6308      +35     
==========================================
+ Hits         5502     5537      +35     
  Misses        771      771              
Files with missing lines Coverage Δ
sdk/src/common/env_variables.cc 99.03% <100.00%> (+0.50%) ⬆️

@chusitoo chusitoo marked this pull request as ready for review January 1, 2025 23:53
@chusitoo chusitoo requested a review from a team as a code owner January 1, 2025 23:53
@chusitoo chusitoo force-pushed the RetryableErrorHttp branch from 99f7dd8 to f0abd5b Compare January 9, 2025 01:40
@chusitoo chusitoo force-pushed the RetryableErrorHttp branch from f0abd5b to 2e3ec77 Compare January 9, 2025 02:03
Member

@marcalff marcalff left a comment

Excellent work.

The only part remaining at this point is to add a feature flag:

In CMakeLists.txt, define a new option WITH_OTLP_RETRY_PREVIEW, default OFF

The same option name should be fine for both HTTP and gRPC.

In api/CMakeLists.txt, add a target_compile_definitions (i.e., a define) for ENABLE_OTLP_RETRY_PREVIEW

Use ifdef ENABLE_OTLP_RETRY_PREVIEW to protect the code path for this feature. A good place could be inside the implementation of IsRetryable().

Adjust the unit tests using the same ifdef, to make sure they pass when compiling with and without WITH_OTLP_RETRY_PREVIEW.

In CI, add WITH_OTLP_RETRY_PREVIEW=ON in all maintainer builds.

Having a feature flag is needed because:

  • the environment variables are not in the spec yet, so this code is considered experimental and subject to change
  • the code itself is new, so this mitigates risks for people in production with the OTLP exporter. Adoption will be voluntary since the default is OFF.

Much later, the feature flag will be ON by default, and then removed, assuming we get this in the spec at some point, to declare the whole thing stable.
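
A minimal sketch of what guarding the code path inside IsRetryable() could look like. The free-function signature below is an assumption made for illustration and may not match the PR's actual code. The status codes are the ones the OTLP specification lists as retryable for OTLP/HTTP (429, 502, 503, 504); the set actually checked in this PR is not shown in this conversation.

```cpp
// Sketch only: the signature is hypothetical, not the exporter's real API.
// ENABLE_OTLP_RETRY_PREVIEW is expected to be defined by the build system
// when the CMake option WITH_OTLP_RETRY_PREVIEW is ON.
bool IsRetryable(int http_status)
{
#ifdef ENABLE_OTLP_RETRY_PREVIEW
  switch (http_status)
  {
    case 429:  // Too Many Requests
    case 502:  // Bad Gateway
    case 503:  // Service Unavailable
    case 504:  // Gateway Timeout
      return true;
    default:
      return false;
  }
#else
  // Feature flag OFF: nothing is ever retried, so behaviour is unchanged.
  (void)http_status;
  return false;
#endif
}
```

Placing the guard at this single decision point means that, with the flag OFF, every response is reported as non-retryable and the rest of the retry machinery is never exercised. Unit tests can use the same ifdef to assert the expected result in both build configurations.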

Member

@marcalff marcalff left a comment

LGTM

Excellent work, thanks for the feature.

@marcalff
Member

OK to ignore CI failures for test BasicCurlHttpTests.FinishInAsyncCallback, due to #3172

@marcalff
Member

@chusitoo

Sorry for the merge conflicts.

This should be trivial, as we both added some members in the same place, let me know if you need help with resolving.

Please note that there is a new constructor in ext, for HttpClient, so your changes will be needed there as well.

I will review again after conflict resolution anyway.

@marcalff marcalff added the pr:fix-merge-conflicts Please fix merge conflicts for this pr label Jan 16, 2025
@marcalff
Member

@chusitoo

Please resolve the merge conflicts, or indicate if you need help with this.

Thanks (and sorry for the conflicts)

@chusitoo
Contributor Author

> @chusitoo
>
> Please resolve the merge conflicts, or indicate if you need help with this.
>
> Thanks (and sorry for the conflicts)

Apologies for the delay, I have had a hard time finding time for this during the week.

Conflicts are resolved, but I still have to look into addressing a concern raised in this discussion

@marcalff marcalff removed the pr:fix-merge-conflicts Please fix merge conflicts for this pr label Jan 16, 2025
Comment on lines +833 to +834
bool HttpClient::doRetrySessions(bool report_all)
{
Member

nit, optional:

Please use ifdef ENABLE_OTLP_RETRY_PREVIEW as well in HttpClient::doRetrySessions(), to protect this code path.

@marcalff
Member

To summarize remaining outstanding issues for this review:

I have a nit comment on HttpClient::doRetrySessions(bool)
@chusitoo please take a look.

HttpClient::doRetrySessions(bool) now takes a bool report_all parameter.

@owent
This should resolve the concern you raised, please take a look.

Member

@owent owent left a comment

LGTM now, thanks.

@marcalff marcalff merged commit 25f7a13 into open-telemetry:main Jan 17, 2025
57 checks passed
malkia added a commit to malkia/opentelemetry-cpp that referenced this pull request Jan 17, 2025
[EXPORTER] Support handling retry-able errors for OTLP/HTTP (open-telemetry#3223)