[8.x](backport #42016) [Kubernetes Integration] Fix for apiserver token expiration #42231

Merged: 3 commits, Jan 7, 2025

@mergify mergify bot commented Jan 7, 2025

  • Bug

Proposed commit message

WHAT: Adds the ability for NewPrometheusClient to refresh the authentication bearer token.

WHY: It is needed in specific K8s metricsets that use Prometheus to retrieve metrics. In such cases, when the token expires, the client keeps using the old connection and is eventually rejected with a 401 Unauthorized error.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Use the following documentation to build Metricbeat locally.

Use only the following manifest:

metricbeat.autodiscover:
  providers:
    - type: kubernetes
      scope: cluster
      node: ${NODE_NAME}
      unique: true
      templates:
        - config:
            - module: kubernetes
              metricsets:
                - apiserver
              hosts: ["https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"]
              # use_kubeadm: true
              # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              bearer_token_file: /service-account/token
              ssl.certificate_authorities:
                - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              period: 60s
...
volumeMounts:
  - name: token-vol
    mountPath: /service-account
    readOnly: true
...
volumes:
  - name: token-vol
    projected:
      sources:
        - serviceAccountToken:
            path: token
            expirationSeconds: 600

Related issues

Use cases

Reported here: https://github.com/elastic/sdh-beats/issues/5439

Screenshots

[Screenshot: apiserver]

Processing has continued for roughly the last 30 minutes.

Logs

Consecutive messages:
"message":"OLEEEE--- -> denotes that Prometheus metrics are being processed
"message":"PASSSS--- -> denotes that the Prometheus metrics request got a 401 and the token is then refreshed
"message":"OLEEEE--- -> denotes that Prometheus metrics processing continues

{"log.level":"info","@timestamp":"2024-12-12T14:32:07.296Z","log.logger":"PASSSSOLEEE","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/kubernetes/apiserver.(*Metricset).Fetch","file.name":"apiserver/metricset.go","file.line":80},"message":"OLEEEE--- TWe need to march 1:%!s(<nil>) and err: unexpected status code 401","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-12-12T14:32:34.199Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).logSnapshot","file.name":"log/log.go","file.line":192},"message":"Non-zero metrics in the last 30s","service.name":"metricbeat","monitoring":{"metrics":{"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":3196932096}}}},"cpu":{"system":{"ticks":790,"time":{"ms":20}},"total":{"ticks":3960,"time":{"ms":160},"value":3960},"user":{"ticks":3170,"time":{"ms":140}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":10},"info":{"ephemeral_id":"eef1f440-b9d3-4dcf-954c-555c1cf5d5fe","uptime":{"ms":1110038},"version":"9.0.0"},"memstats":{"gc_next":29111896,"memory_alloc":25035752,"memory_total":1164904664,"rss":103301120},"runtime":{"goroutines":25}},"libbeat":{"config":{"module":{"running":1}},"output":{"events":{"acked":528,"active":0,"batches":1,"total":528},"read":{"bytes":13880,"errors":1},"write":{"bytes":56510,"latency":{"histogram":{"count":19,"max":37,"mean":26.68421052631579,"median":27,"min":19,"p75":32,"p95":37,"p99":37,"p999":37,"stddev":5.2621051578736795}}}},"pipeline":{"clients":1,"events":{"active":0,"published":528,"total":528},"queue":{"acked":528,"added":{"bytes":451342,"events":528},"consumed":{"bytes":451342,"events":528},"filled":{"bytes":0,"events":0,"pct":0},"max_bytes":0,"max_events":3200,"removed":{"bytes":451342,"events":528}}}},"metricbeat":{"kubernetes":{"apiserver":{"events":528,"success":528}}},"system":{"load":{"1":5.87,"15":4.69,"5":4.9,"norm":{"1":0.8386,"15":0.67,"5":0.7}}}},"ecs.version":"1.6.0"}}
{"log.level":"info","@timestamp":"2024-12-12T14:33:04.201Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).logSnapshot","file.name":"log/log.go","file.line":192},"message":"Non-zero metrics in the last 30s","service.name":"metricbeat","monitoring":{"metrics":{"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":3199254528}}}},"cpu":{"system":{"ticks":800,"time":{"ms":10}},"total":{"ticks":4010,"time":{"ms":50},"value":4010},"user":{"ticks":3210,"time":{"ms":40}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":10},"info":{"ephemeral_id":"eef1f440-b9d3-4dcf-954c-555c1cf5d5fe","uptime":{"ms":1140036},"version":"9.0.0"},"memstats":{"gc_next":29111896,"memory_alloc":26091616,"memory_total":1165960528,"rss":103825408},"runtime":{"goroutines":25}},"libbeat":{"config":{"module":{"running":1}},"output":{"events":{"active":0},"write":{"latency":{"histogram":{"count":19,"max":37,"mean":26.68421052631579,"median":27,"min":19,"p75":32,"p95":37,"p99":37,"p999":37,"stddev":5.2621051578736795}}}},"pipeline":{"clients":1,"events":{"active":0},"queue":{"filled":{"bytes":0,"events":0,"pct":0},"max_bytes":0,"max_events":3200}}},"system":{"load":{"1":5.15,"15":4.67,"5":4.82,"norm":{"1":0.7357,"15":0.6671,"5":0.6886}}}},"ecs.version":"1.6.0"}}
{"log.level":"info","@timestamp":"2024-12-12T14:33:07.191Z","log.logger":"PASSSSOLEEE","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/kubernetes/apiserver.(*Metricset).Fetch","file.name":"apiserver/metricset.go","file.line":80},"message":"OLEEEE--- TWe need to march 1:unexpected status code 401 from server and err: unexpected status code 401","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-12-12T14:33:07.192Z","log.logger":"PASSSSOLEEE","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/kubernetes/apiserver.(*Metricset).Fetch","file.name":"apiserver/metricset.go","file.line":85},"message":"PASSSS--- This is the connection event with err: unexpected status code 401 from server","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-12-12T14:33:34.199Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).logSnapshot","file.name":"log/log.go","file.line":192},"message":"Non-zero metrics in the last 30s","service.name":"metricbeat","monitoring":{"metrics":{"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":3198558208}}}},"cpu":{"system":{"ticks":830,"time":{"ms":30}},"total":{"ticks":4170,"time":{"ms":160},"value":4170},"user":{"ticks":3340,"time":{"ms":130}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":10},"info":{"ephemeral_id":"eef1f440-b9d3-4dcf-954c-555c1cf5d5fe","uptime":{"ms":1170037},"version":"9.0.0"},"memstats":{"gc_next":20762024,"memory_alloc":13202504,"memory_total":1224858216,"rss":101543936},"runtime":{"goroutines":25}},"libbeat":{"config":{"module":{"running":1}},"output":{"events":{"acked":528,"active":0,"batches":1,"total":528},"read":{"bytes":13880,"errors":1},"write":{"bytes":56434,"latency":{"histogram":{"count":20,"max":37,"mean":26.45,"median":26.5,"min":19,"p75":31.75,"p95":36.849999999999994,"p99":37,"p999":37,"stddev":5.229483722127836}}}},"pipeline":{"clients":1,"events":{"active":0,"published":528,"total":528},"queue":{"acked":528,"added":{"bytes":450892,"events":528},"consumed":{"bytes":450892,"events":528},"filled":{"bytes":0,"events":0,"pct":0},"max_bytes":0,"max_events":3200,"removed":{"bytes":450892,"events":528}}}},"metricbeat":{"kubernetes":{"apiserver":{"events":528,"success":528}}},"system":{"load":{"1":5.35,"15":4.71,"5":4.93,"norm":{"1":0.7643,"15":0.6729,"5":0.7043}}}},"ecs.version":"1.6.0"}}
{"log.level":"info","@timestamp":"2024-12-12T14:34:04.202Z","log.logger":"monitoring","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).logSnapshot","file.name":"log/log.go","file.line":192},"message":"Non-zero metrics in the last 30s","service.name":"metricbeat","monitoring":{"metrics":{"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":3199209472}}}},"cpu":{"system":{"ticks":840,"time":{"ms":10}},"total":{"ticks":4220,"time":{"ms":50},"value":4220},"user":{"ticks":3380,"time":{"ms":40}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":10},"info":{"ephemeral_id":"eef1f440-b9d3-4dcf-954c-555c1cf5d5fe","uptime":{"ms":1200037},"version":"9.0.0"},"memstats":{"gc_next":20762024,"memory_alloc":14106328,"memory_total":1225762040,"rss":101543936},"runtime":{"goroutines":25}},"libbeat":{"config":{"module":{"running":1}},"output":{"events":{"active":0},"write":{"latency":{"histogram":{"count":20,"max":37,"mean":26.45,"median":26.5,"min":19,"p75":31.75,"p95":36.849999999999994,"p99":37,"p999":37,"stddev":5.229483722127836}}}},"pipeline":{"clients":1,"events":{"active":0},"queue":{"filled":{"bytes":0,"events":0,"pct":0},"max_bytes":0,"max_events":3200}}},"system":{"load":{"1":4.66,"15":4.68,"5":4.81,"norm":{"1":0.6657,"15":0.6686,"5":0.6871}}}},"ecs.version":"1.6.0"}}
{"log.level":"info","@timestamp":"2024-12-12T14:34:07.306Z","log.logger":"PASSSSOLEEE","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/kubernetes/apiserver.(*Metricset).Fetch","file.name":"apiserver/metricset.go","file.line":80},"message":"OLEEEE--- TWe need to march 1:%!s(<nil>) and err: unexpected status code 401","service.name":"metricbeat","ecs.version":"1.6.0"}

This is an automatic backport of pull request #42016 done by [Mergify](https://mergify.com).

* initial fix for apiserver

* adding fix for controller and schedule

(cherry picked from commit 7e25c4d)
@mergify mergify bot requested a review from a team as a code owner January 7, 2025 07:51
@mergify mergify bot added the backport label Jan 7, 2025
@mergify mergify bot requested review from MichaelKatsoulis and constanca-m and removed request for a team January 7, 2025 07:51
@mergify mergify bot assigned gizas Jan 7, 2025
@mergify mergify bot requested review from faec and leehinman and removed request for a team January 7, 2025 07:51
@botelastic botelastic bot added the needs_team label Jan 7, 2025
botelastic bot commented Jan 7, 2025

This pull request doesn't have a Team:<team> label.

@gizas gizas enabled auto-merge (squash) January 7, 2025 14:27
@gizas gizas merged commit ec2f69a into 8.x Jan 7, 2025
38 checks passed
@gizas gizas deleted the mergify/bp/8.x/pr-42016 branch January 7, 2025 14:56
Labels: backport, needs_team