Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support prom discovery #93

Closed

Conversation

lujiajing1126
Copy link

Closes #59

According to the discussion in the thread, an extra option is added,

--discovery-method           TEXT  Method to discover workload in the cluster. [default: api-server]

The default mode is api-server but can be switched to prometheus.

Currently, the following entity is supported,

  • Deployment
  • STS
  • ...

@LeaveMyYard
Copy link
Contributor

Planned to merge in 1.7

@danielhass
Copy link

@LeaveMyYard this apparently didn't make it into 1.7. Can you guys share the current plans for this PR?

@aantn
Copy link
Contributor

aantn commented Mar 28, 2024

We're looking into this! It needs some updating for the current codebase and is currently incomplete.

So we can't commit to a timeline yet, but it is something we want to merge in principle.

There are two scenarios we're familiar with where this would be useful:

  1. For people who don't have kubectl access to the cluster
  2. For ephemeral workloads that no longer exist at the time of the scan (e.g. gitlab runners)

Just to confirm we're on the right track with this, is your use case one of those two?

@aantn
Copy link
Contributor

aantn commented Mar 28, 2024

And if there are any volunteers for updating this PR to the current version of Robusta and completing it - that would definitely accelerate how fast we can merge it.

@danielhass
Copy link

@aantn thanks for providing this feedback here!

Our use case falls into these cases and extends beyond I would say: we too don't want to require kubectl to the scanned cluster to run krr. Additionally in our landscape we consolidate our metrics on multiple central "infrastructure clusters" via Prom remote-writes. These infra clusters also store metrics with a much larger retention than the local Prom instances in the source/workload clusters. Running krr on these infra clusters would allow us to fit the tool much better in the centralized metrics approach we chose and possibly lead to better results as we have much more data (larger metrics retention) available there.

Does this makes sense?

@aantn
Copy link
Contributor

aantn commented Apr 1, 2024

Yep, makes sense.

You may be able to do that today without waiting for the PR. You can pass an explicit Prometheus url with -p and use --prometheus-label and -l to specify which metrics belong to each cluster. See "Scanning with a Centralized Prometheus" in this part of the README.

Let me know if that works for your case?

edit: To clarify, you'll still need kubectl access to each cluster, so it wont solve that problem. But I think it will solve the second goal of letting krr use the centralized metrics with larger retention.

@danielhass
Copy link

danielhass commented Apr 9, 2024

@aantn we could reproduce your edited findings 💯 on our side.

We were indeed able to make use of the larger retention on the central Prom instance with mentioned parameters. However krr failed as soon as we removed the kubectl access to the target cluster as this connection was always used for workload discovery.

@aantn
Copy link
Contributor

aantn commented Apr 10, 2024

Thank you for the update! That makes sense.

LeaveMyYard added a commit that referenced this pull request Apr 22, 2024
@LeaveMyYard
Copy link
Contributor

Closing this one, as #266 is now a successor, updated for a new code version
Thanks @lujiajing1126 for the work done, it helped a lot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support metrics-based workload discovery
4 participants