Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTEL SPEC violation: OTEL_EXPORTER_OTLP_ENDPOINT is not honored as an environment variable #5586

Open
julealgon opened this issue May 1, 2024 · 19 comments
Labels
bug Something isn't working pkg:OpenTelemetry.Exporter.OpenTelemetryProtocol Issues related to OpenTelemetry.Exporter.OpenTelemetryProtocol NuGet package

Comments

@julealgon
Copy link

Bug Report

List of all OpenTelemetry NuGet
packages
and version that you are
using (e.g. OpenTelemetry 1.0.2):

Tested with:

  • OpenTelemetry.Exporter.OpenTelemetryProtocol 1.8.0
  • OpenTelemetry.Exporter.OpenTelemetryProtocol 1.8.1

Runtime version (e.g. net462, net48, netcoreapp3.1, net6.0 etc. You can
find this information from the *.csproj file):

  • net472
  • net8.0

Symptom

When using either UseOtlpExporter or AddOtlpExporter to configure the OTLP exporter, the OTEL_EXPORTER_OTLP_ENDPOINT is not respected when provided as an environment variable. The setting is only honored if included in IConfiguration, which is not always the case in every project.

What is the expected behavior?

I would expect the environment variable to be honored regardless of the Configuration setup in the application: the setting should be read both from environment variables directly as well as from IConfiguration, so it respects both the spec (which only talks about environment variables) as well as the standard .NET configuration ecosystem.

What is the actual behavior?

The setting is only respected if it is found in IConfiguration. If for whatever reason the current project did not use .AddEnvironmentVariables without a prefix in the host setup, the setting is ignored completely.

Reproduce

https://github.com/julealgon/OpenTelemetry.UseOtlpExporterIssue

  1. Open the OpenTelemetry.UseOtlpExporterIssue solution
  2. Run the application with both profiles in launchsettings.json and observe the difference

Additional Context

This inconsistency appears to have been introduced between versions 1.3 and 1.4 of the package. Downgrading to 1.3 respects the environment variable, but stops reading it from IConfiguration.

Both options should work, with priority to IConfiguration.

@julealgon julealgon added the bug Something isn't working label May 1, 2024
@CodeBlanch
Copy link
Member

Thanks for this report @julealgon!

If for whatever reason the current project did not use .AddEnvironmentVariables without a prefix in the host setup, the setting is ignored completely.

There is default behavior which is to automatically load envvars into IConfiguration. If user has decided to NOT do that and instead opted to control things manually, why should we be opinionated? Seems they have indicated they do NOT want envvars to apply to their process. If we didn't respect that, wouldn't it be equally surprising?

As a workaround, users who decide to take full control of envvar loading could do this right?

builder.Configuration
   .AddEnvironmentVariables("MyCustomPrefix_")
   .AddEnvironmentVariables("OTEL_");

Both options should work, with priority to IConfiguration.

This seems problematic. I think the cure may be worse than the disease 🤣 IConfiguration is composited from any number of sources and users may control the order in which keys apply. If they manually set OTEL_EXPORTER_OTLP_ENDPOINT in IConfiguration to clear it out (could be done on command-line or via code or through ordering) and then we re-load it from the envvar directly because we think it wasn't set.... that would be equally surprising/bugged, no?

@julealgon
Copy link
Author

Thanks for this report @julealgon!

If for whatever reason the current project did not use .AddEnvironmentVariables without a prefix in the host setup, the setting is ignored completely.

There is default behavior which is to automatically load envvars into IConfiguration. If user has decided to NOT do that and instead opted to control things manually, why should we be opinionated? Seems they have indicated they do NOT want envvars to apply to their process. If we didn't respect that, wouldn't it be equally surprising?

I agree with you from one perspective, but disagree on others. First, keep in mind that the entire hosting model is optional. Not only that, but parts of the hosting model can be used separately in any process.

For example, I could setup just configuration and dependency injection, using IConfigurationBuilder and IServiceCollection parts of the hosting setup. When doing that, I could easily leverage AddOpenTelemetry from OpenTelemetry.Extensions.Hosting and end up in this situation where I'm actually now forced not only to rely on IConfiguration, but to also ensure I'm loading environment variables as well.

I think this goes against what the spec specifies: that this should be directly controllable by environment variables. IConfiguration is a higher level abstraction that can read from environment variables, but these are not interchangeable.

Additionally, you can also create a cmdline tool with only the DI aspect and still leverage AddOpenTelemetry. Sure, you get a "partially functional" setup if you do that since you don't get the hosted service and have to manually instantiate the various providers from the container yourself, but it works.

All I'm trying to say here is that we shouldn't force a higher level abstraction on people if they don't currently rely on it or for any reason can't rely on it.

I discovered this in a couple of our projects because we are using a "blank" and partial host setup just to leverage configuration, logging and dependency injection without necessarily tapping into hosted services. Yes, its not the most common use case, but it is a valid one. As is simpler usages as I outlined above especially with cmdline tools that don't use generic host but still want some level of abstraction like configuration and DI.

Similarly, you could be dealing with an old System.Web-based project, like WebForms, that cannot be migrated to modern .NET and that does not natively support the hosting and IConfiguration abstractions from Microsoft.Extensions, but that still do support Dependency Injection separately. We also have a couple of these types of projects on our side, and we use a "partial" hosting model on them as well, while still reading configuration using the old ConfigurationManager in a few cases.

To be very clear here: this is currently not a blocker for us. We are feeding that endpoint setting via our global Azure AppConfiguration provider (and not via environment variables). I stumbled upon this while debugging one of the apps locally, putting the environment variable in the launchsettings.json and expecting it to work, only to find a couple hours of attempts later that the library was ignoring that variable completely.

I thought that was a terrible-enough experience to warrant the creation of this issue for visibility, and because I trully believe that if you are not directly accepting an environment variable for this and relying on an abstraction, you are not compliying with the spec.

As a workaround, users who decide to take full control of envvar loading could do this right?

builder.Configuration
   .AddEnvironmentVariables("MyCustomPrefix_")
   .AddEnvironmentVariables("OTEL_");

This is not a valid workaround. When you call AddEnvironmentVariables with a prefix, that prefix is trimmed out of the included keys. So, if you add AddEnvironmentVariables("OTEL_"), then pass in OTEL_EXPORTER_OTLP_ENDPOINT, you'll actually get a setting with a key called EXPORTER_OTLP_ENDPOINT, which will not be recognized at all. If you had a setup like this, you'd need to then define multiple versions of the same environment variable:

  • OTEL_EXPORTER_OTLP_ENDPOINT for those projects who read all environment variables
  • SOMEPREFIX_OTEL_EXPORTER_OTLP_ENDPOINT for projects who only read SOMEPREFIX-prefixed environment variables

This would obviously be far from ideal as you just duplicated maintenance around those variables and they could eventually go out of sync.

Both options should work, with priority to IConfiguration.

This seems problematic. I think the cure may be worse than the disease 🤣 IConfiguration is composited from any number of sources and users may control the order in which keys apply. If they manually set OTEL_EXPORTER_OTLP_ENDPOINT in IConfiguration to clear it out (could be done on command-line or via code or through ordering) and then we re-load it from the envvar directly because we think it wasn't set.... that would be equally surprising/bugged, no?

The "with priority to IConfiguration" is just my immediate opinion but it could go either way. It would certainly need to be well docummented.

I do not disagree it could be confusing, but I just don't see any other possibility for you to both honor the spec, and leverage the IConfiguration abstraction when it exists. To me, as it is, it is also confusing because there are many situations where even though I have the environment variable set, it doesn't apply. Dropping support for IConfiguration and have this work only with raw environment variables would certainly be compliant with the spec, but it would be a terrible experience too, so I'm not advocating for that either.

If the team decides to default the value to the environment variable, and then potentially override it with the value from IConfiguration, sure... that's also an option and I could get behind that no problem.

@CodeBlanch
Copy link
Member

Dang there's a lot to unpack 🤣

I want to talk about the IConfiguration and hosting dependency referenced above a bit. I think there might be some confusion about what is going on.

The SDK doesn't require a host or IConfiguration for this to work!

Here are couple unit tests to try out:

    [Fact]
    public void EnvVarTestUsingHostingExtensionsWithoutHost()
    {
        var expectedEndpoint = "http://my_custom_endpoint";

        Environment.SetEnvironmentVariable("OTEL_EXPORTER_OTLP_ENDPOINT", expectedEndpoint);

        var services = new ServiceCollection();

        services.AddOpenTelemetry()
            .WithTracing(tracing => tracing.AddOtlpExporter());

        using var sp = services.BuildServiceProvider();

        sp.GetRequiredService<TracerProvider>();

        Assert.Equal(expectedEndpoint, sp.GetRequiredService<IConfiguration>()["OTEL_EXPORTER_OTLP_ENDPOINT"]);

        var options = sp.GetRequiredService<IOptions<OtlpExporterOptions>>().Value;

        Assert.Equal(expectedEndpoint, options.Endpoint.OriginalString);
    }

    [Fact]
    public void EnvVarTestUsingSdkCreateStyle()
    {
        var testRun = false;
        var expectedEndpoint = "http://my_custom_endpoint";

        Environment.SetEnvironmentVariable("OTEL_EXPORTER_OTLP_ENDPOINT", expectedEndpoint);

        using var tracerProvider = Sdk.CreateTracerProviderBuilder()
            .AddOtlpExporter()
            .ConfigureServices(services =>
            {
                services.ConfigureOpenTelemetryTracerProvider((sp, _) =>
                {
                    var options = sp.GetRequiredService<IOptions<OtlpExporterOptions>>().Value;

                    Assert.Equal(expectedEndpoint, options.Endpoint.OriginalString);

                    Assert.Equal(expectedEndpoint, sp.GetRequiredService<IConfiguration>()["OTEL_EXPORTER_OTLP_ENDPOINT"]);

                    testRun = true;
                });
            })
            .Build();

        Assert.True(testRun);
    }

In both cases there is no host or IConfiguration yet everything works!

How?

// Note: When using a host builder IConfiguration is automatically
// registered and this registration will no-op. This only runs for
// Sdk.Create* style or when manually creating a ServiceCollection. The
// point of this registration is to make IConfiguration available in
// those cases.
services!.TryAddSingleton<IConfiguration>(
sp => new ConfigurationBuilder().AddEnvironmentVariables().Build());

The SDK will create IConfiguration from envvars if there is no IConfiguration found (no host present).

So for non-hosting scenarios, I don't think the issue applies. Users will just get envvars in that case and the SDK will behave as the spec defines.

For hosting scenarios using the defaults, everything will also work as the spec defines.

The only case where things will get strange is if users decide to manually construct their IConfiguration in their host and don't load all the envvars. But what I'm asking is, if they decide to not load all envvars, why would they expect for them to apply?

For sure the spec doesn't talk about IConfiguration. It couldn't be that opinionated because it is a .NET thing 😄 The spec does say a lot of places though SDKs should use whatever is idiomatic to the SIG/language where it can. What we're trying to do with OTel .NET is bridge the two worlds. .NET users expect to be able to do things via IConfiguration.

I'm not saying I wouldn't like to fix this, I just don't see a way at the moment to make it perfect. Open to ideas!

@julealgon
Copy link
Author

Dang there's a lot to unpack 🤣

Sorry about that 😅

The SDK will create IConfiguration from envvars if there is no IConfiguration found (no host present).

This is insteresting. So in a sense, the fallback chain I mentioned already exists here, but goes the other way: the envvars are a fallback to lack of IConfiguration. Again... fair enough, I'm not 100% opposed to this, it's a sensible option.

So for non-hosting scenarios, I don't think the issue applies.
...
The only case where things will get strange is if users decide to manually construct their IConfiguration in their host and don't load all the envvars.

Exactly. And I think this might be a more common scenario than you are alluding to. If I built a console app for example and didn't want to go all the way with generic host + hosted services, creating my own IConfiguration and registering that as a singleton in the container manually would make a ton of sense. I believe there must be many such cases out there as I'm pretty sure I've seen recommendations like that in the wild even from Microsoft.

But what I'm asking is, if they decide to not load all envvars, why would they expect for them to apply?

You might have a point here, but that's just not how this works usually. Following your logic, if I didn't load envvars wouldn't I want my entire application to be shielded from them? That's just not what happens: they keep working normally and completely disregard IConfiguration.

Take something like the Azure credential variables. AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET... those don't care at all if you have or don't have IConfiguration in place, and if you do, and you have not loaded envvars, that`s also irrelevant. They still apply.

I think these OTEL envvars should behave like that. They should work regardless of IConfiguration. IConfiguration should be an additional avenue to define them, but not exclude the environment variables completely.

For sure the spec doesn't talk about IConfiguration. It couldn't be that opinionated because it is a .NET thing 😄 The spec does say a lot of places though SDKs should use whatever is idiomatic to the SIG/language where it can. What we're trying to do with OTel .NET is bridge the two worlds. .NET users expect to be able to do things via IConfiguration.

Sure, but what I'm proposing would still leverage IConfiguration when available (as it is so much more flexible and convenient than raw env vars) but still apply the env var in cases where it makes sense.

I'm not saying I wouldn't like to fix this, I just don't see a way at the moment to make it perfect. Open to ideas!

I will reiterate this: this is not a problem for us specifically, but I think it is a consistency/principle of least astonishment violation for sure. I think the scenarios where the envvars are ignored today are incredibly confusing, in particular when people refer to general OpenTelemetry documentation and see that they should work.

I think I presented enough cases to support my point but I'll leave this up to you of course. I still believe it is a mistake to flat out ignore the environment variables in some cases. They should, at least, always be the default value and overriden by higher level abstractions then.

I believe the approach this library takes, of creating an IConfiguration if it can't find one in the container might be a brittle approach to this problem and should be reconsidered. I would suggest that instead of attempting to standardize the code around IConfiguration by creating a "standard" one, that instead the options models should just be initialized to the environment variable values if they are available, and allow those options models to then have the values further modified by IConfiguration / IOptions framework.

@reyang reyang added the pkg:OpenTelemetry.Exporter.OpenTelemetryProtocol Issues related to OpenTelemetry.Exporter.OpenTelemetryProtocol NuGet package label May 15, 2024
@Aaronontheweb
Copy link
Contributor

I wasted an hour of my time trying to understand why my OTLP Collector DaemonSet wasn't picking up any of my traces - turns out, this issue is why! It's very frustrating that the .NET client doesn't just "do the thing" that is referenced everywhere else in the OpenTelemetry documentation for every other runtime.

I don't think there's any value-add to making this an IConfiguration-powered thing, given that it's a footgun everyone is going to have to discover for themselves if they spend any time reading the OpenTelemetry for Kubernetes introduction or anything else that references this environment variable.

Aaronontheweb added a commit to Aaronontheweb/DrawTogether.NET that referenced this issue Jun 19, 2024
Aaronontheweb added a commit to petabridge/DrawTogether.NET that referenced this issue Jun 19, 2024
* install OpenTelemetry plugins

* added OpenTelemetry

* trying to figure out OTEL resource conventions

* added assembly version data

* fixed DLL hell issues

* simplify OTEL exporter config

* fixed K8s OTEL config for pod

* fixed set

* Manually load `OTEL_EXPORTER_OTLP_ENDPOINT` issue

work around for open-telemetry/opentelemetry-dotnet#5586
@julealgon
Copy link
Author

@Aaronontheweb

I don't think there's any value-add to making this an IConfiguration-powered thing, given that it's a footgun everyone is going to have to discover for themselves if they spend any time reading the OpenTelemetry for Kubernetes introduction or anything else that references this environment variable.

Now now... that's is just crazy. We rely on IConfiguration being possible heavily in our projects now, by storing all OTEL settings in Azure AppConfiguration and sharing them across dozens of projects.

If only environment variables were supported, this would be an absolute mess to manage as we'd have to set that variable individually in several places.

It needs to support both.

@Aaronontheweb
Copy link
Contributor

My comment should be read as: only supporting Msft dependency injection is crazy.

@Aaronontheweb
Copy link
Contributor

Decided to check some of the other popular environment variables while I was at it:

        services
            .AddOpenTelemetry()
            .ConfigureResource(builder =>
            {
                builder
                    .AddEnvironmentVariableDetector() // need this
                    .AddTelemetrySdk()
                    .AddAttributes(new []
                    {
                        new KeyValuePair<string, object>("service.version", typeof(OpenTelemetryConfig).Assembly.GetName().Version?.ToString() ?? "unknown")
                    });
            })
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: "service.namespace=$(NAMESPACE),service.instance.id=$(POD_NAME)"
        - name: OTEL_SERVICE_NAME
          value: "drawtogether"

I see all of that data show up in Aspire / Grafana et al, so that works so long as the .AddEnvironmentVaribleDetector is included.

@cijothomas
Copy link
Member

@julealgon
Copy link
Author

My comment should be read as: only supporting Msft dependency injection is crazy.

I'm not following you at all here. Care to elaborate?

@CodeBlanch
Copy link
Member

@Aaronontheweb

Something in your report isn't adding up for me.

The issue @julealgon reported is when calling Host.CreateEmptyApplicationBuilder then environment variables are NOT automatically loaded.

But you are calling WebApplication.CreateBuilder:

https://github.com/petabridge/DrawTogether.NET/blob/12696faf5ae201a07f85b04fe3cc9cff03d57a1c/src/DrawTogether/Program.cs#L15-L19

WebApplication.CreateBuilder by default loads environment variables into IConfiguration.

(PS: You have code doing the envvar load and loading appSettings.json which should not be needed given the call to get the defaults.)

I don't think there's any value-add to making this an IConfiguration-powered thing, given that it's a footgun everyone is going to have to discover for themselves if they spend any time reading the OpenTelemetry for Kubernetes introduction or anything else that references this environment variable.

I don't think it is a "footgun" because if it was, people would be screaming at us 🤣 Lots of users are successfully doing what you are trying to do. And if you use the default patterns, everything works fine automatically. Only if you go out of your way to ask for an "empty" host do you then have to take some action.

  • I cloned your code. Thanks for open sourcing it!
  • Commented out manually setting the endpoint (.UseOtlpExporter(/*OtlpExportProtocol.Grpc, new Uri(otlpEndpoint)*/)).
  • Added OTEL_EXPORTER_OTLP_ENDPOINT set to some value.
  • Started the app in debug.

Everything worked fine!

I'm guessing the issue is/was in your k8s config loading somewhere. Not super familiar with k8s, did you make sure all updated config was deployed when you were testing?

@Aaronontheweb
Copy link
Contributor

I'm guessing the issue is/was in your k8s config loading somewhere. Not super familiar with k8s, did you make sure all updated config was deployed when you were testing?

Yes, numerous times - it never worked until I added the work-around.

@Aaronontheweb
Copy link
Contributor

i.e., I was inspecting the ENV section of each these containers individually in Docker Desktop (local K8s) to make sure this value was set correctly.

@Aaronontheweb
Copy link
Contributor

Aaronontheweb commented Jun 19, 2024

The relevant K8s configuration is OSS too:

https://github.com/petabridge/DrawTogether.NET/blob/12696faf5ae201a07f85b04fe3cc9cff03d57a1c/k8s/services/k8s-web-service.yaml#L89-L94

Edit: worth mentioning: I didn't use OTEL Collector DaemonSet when I was debugging this - used a separate one running on a well-known address in our private network that was already receiving metrics / traces / logs. That address received data no problem after I manually set the OTLP Uri.

@CodeBlanch
Copy link
Member

Here's some code you can use to inspect things with debugger:

using Microsoft.Extensions.Options;
using OpenTelemetry;

// Same test can be performed using real vars on the system
Environment.SetEnvironmentVariable("OTEL_EXPORTER_OTLP_ENDPOINT", "http://helloworld/");

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry().UseOtlpExporter();

var app = builder.Build();

var optionsType = typeof(OpenTelemetryBuilderOtlpExporterExtensions)
    .Assembly
    .GetType("OpenTelemetry.Exporter.OtlpExporterBuilderOptions");

var options = app.Services.GetService(typeof(IOptions<>).MakeGenericType(optionsType));

app.Run();

It is picking up the envvar:

image

So as far as I can tell, everything seems to be working fine.

@Aaronontheweb
Copy link
Contributor

My comment should be read as: only supporting Msft dependency injection is crazy.

I'm not following you at all here. Care to elaborate?

per @CodeBlanch 's comment, I'm not correct on that - so disregard

@zlepper
Copy link

zlepper commented Jun 27, 2024

As a point for always dealing with the env variable directly: the variables to control aspnetcore is not affected by this, for example the variables to control which ports should be bound.

@wjrogers
Copy link

@CodeBlanch

if they decide to not load all envvars, why would they expect for them to apply

I would expect them to apply because the spec says they do. I want components like the OTEL SDK to work consistently according to their own rules. I don't want them to guess my intentions based on how I use other components. Reading environment variables directly means there's a consistent way to configure OTEL components regardless of language or platform.

It's also a documentation problem:

/// <remarks>
/// Note: OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS,
/// OTEL_EXPORTER_OTLP_TIMEOUT, and OTEL_EXPORTER_OTLP_PROTOCOL environment
/// variables are parsed during object construction.
/// </remarks>

That's not strictly true. The IConfiguration values with the same names as the environment variables are parsed during construction. The environment variables themselves are not.

@codymullins
Copy link

So as far as I can tell, everything seems to be working fine.

@CodeBlanch I encountered the same issue as @Aaronontheweb and the workaround is the only thing that made this work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pkg:OpenTelemetry.Exporter.OpenTelemetryProtocol Issues related to OpenTelemetry.Exporter.OpenTelemetryProtocol NuGet package
Projects
None yet
Development

No branches or pull requests

8 participants