-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: The External LB is getting stuck into "pending" state on AWS & few other clouds when protocol is TCP #330
Comments
Hi team! Is this being worked on? If not, I want to give it a try. |
@Bhargav-InfraCloud, Thanks you for addressing the issue. You will likely need more details, so please join this channel if you are not here yet - https://kubernetes.slack.com/archives/C03Q64HNEJF, and contact Mridul Gain or Bharath for a detailed explanation. |
Sure @narmidm. Will check with them. Thanks! |
@Bhargav-InfraCloud any update. did you get clarification from @mridulgain? |
@narmidm Yes, he has shared a few details on Slack. Unfortunately, I got caught up with some other work around the same time and it is still going on. If anyone is up for working on this, please feel free to pick. Otherwise, I'll try again sometime. |
Hey @narmidm @mridulgain, Is this still open?
It seems like above is already fixed in this PR: #325 where we are only adding annotations for udp protocol? |
hey @Rajan-226 yes the issue is still open because even if the issue was patched by adding AWS specific annotations to the LoadBalancer, we still don't have a way to pass other vendor specific annotations to the control plane as per the proposed solution in the issue #330.
|
@mridulgain I can try working on this, I would just need a start. [Ignorable] I checked the codebase of woker-operator, but didn't find any cluster CRD? I could see serviceexport, serviceimport, slice, slicegateway. Could you help me with some first timer guide so that I can get this done? [Ignorable] Edit 0: It seems like cluster CRD is defined in the other repo. Trying to see how both repos are connected and their roles Edit 1: I see cluster crd controller is present in kubeslice-controller and would be responsible of managing cluster resource.
I've two questions here:
|
Yes @Rajan-226 this is what we intend to implement.
We don't want to hard code these cloud provider specific annotations in reconciler logic. Still we can keep the logic to add AWS specific annotation in the cluster controller for now and remove it eventually.
Please checkout this installation doc: https://kubeslice.io/documentation/open-source/1.3.0/category/install-kubeslice |
@mridulgain @narmidm After analysing, it seems like this will complicate things a bit. We're not fetching cluster CRD currently in worker-operator and instead just getting cloudregion from nodes labels. So, we would need some identifier here to fetch corresponding cluster CRD which seems to be not there. Also, to default AWS specific annotation in the new field introduced, we would need a mutuating webhook for cluster CRD and fetch nodes there to see if cloud region is AWS - which I'm not sure if it's feasible or not. |
@Rajan-226 Thanks for the suggestion. By storing this in the cluster CR, we can easily manage it during service creation and avoid any complexity that comes with different cloud provider. If no annotation is provided, we can simply skip adding it to the Load Balancer, which should cover the use case where no special annotation is needed. Please share if you have any alternative suggestions on this matter. |
@Rajan-226, Do you need any further help? |
📜 Description
while testing the SGE on private clusters, we found some issues with AWS specific annotations that worker-operator is adding.
https://github.com/kubeslice/worker-operator/blob/master/controllers/slice/slice_gw_edge.go#L128C1-L137C3
the LB is getting stuck into "pending" state on AWS when protocol is TCP. The issues were resolved simply by removing the annotations.
👟 Reproduction steps
while testing the SGE on private clusters, we found some issues with AWS specific annotations that worker-operator is adding.
👍 Expected behavior
external lb start running without any issue.
👎 Actual Behavior
The External LB is getting stuck into "pending" state.
🐚 Relevant log output
No response
Version
1.2.0
🖥️ What operating system are you seeing the problem on?
Windows
✅ Proposed Solution
We can introduce a new field in the Cluster Custom Resource Definitions (CRDs) for a set of annotations for the External Load Balancer (Ext LB). If any cloud provider needs specific annotations for their Network Load Balancer (Network LB), these can be passed as input during the creation of the Cluster CRD. The Slice Gateway Edge (SGE) will then take these annotations and apply them to the External Load Balancer during its creation.
👀 Have you spent some time to check if this issue has been raised before?
Code of Conduct
The text was updated successfully, but these errors were encountered: