Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: The External LB is getting stuck into "pending" state on AWS & few other clouds when protocol is TCP #330

Open
2 tasks done
narmidm opened this issue Feb 1, 2024 · 12 comments
Assignees
Labels
bug Something isn't working

Comments

@narmidm
Copy link
Member

narmidm commented Feb 1, 2024

📜 Description

while testing the SGE on private clusters, we found some issues with AWS specific annotations that worker-operator is adding.

// Note: Special treatment for AWS EKS clusters. The LB is not provisioned unless we add AWS specific annotations
    // to the service. This is needed only for EKS.
    if clusterProvider, _ := getClusterProviderID(ctx, r.Client); clusterProvider == "aws" {
        if svc.ObjectMeta.Annotations == nil {
            svc.ObjectMeta.Annotations = make(map[string]string)
        }
        svc.ObjectMeta.Annotations["service.beta.kubernetes.io/aws-load-balancer-type"] = "external"
        svc.ObjectMeta.Annotations["service.beta.kubernetes.io/aws-load-balancer-nlb-target-type"] = "ip"
        svc.ObjectMeta.Annotations["service.beta.kubernetes.io/aws-load-balancer-scheme"] = "internet-facing"
    }

https://github.com/kubeslice/worker-operator/blob/master/controllers/slice/slice_gw_edge.go#L128C1-L137C3
the LB is getting stuck into "pending" state on AWS when protocol is TCP. The issues were resolved simply by removing the annotations.

👟 Reproduction steps

while testing the SGE on private clusters, we found some issues with AWS specific annotations that worker-operator is adding.

👍 Expected behavior

external lb start running without any issue.

👎 Actual Behavior

The External LB is getting stuck into "pending" state.

🐚 Relevant log output

No response

Version

1.2.0

🖥️ What operating system are you seeing the problem on?

Windows

✅ Proposed Solution

We can introduce a new field in the Cluster Custom Resource Definitions (CRDs) for a set of annotations for the External Load Balancer (Ext LB). If any cloud provider needs specific annotations for their Network Load Balancer (Network LB), these can be passed as input during the creation of the Cluster CRD. The Slice Gateway Edge (SGE) will then take these annotations and apply them to the External Load Balancer during its creation.

👀 Have you spent some time to check if this issue has been raised before?

  • I checked and didn't find any similar issue

Code of Conduct

  • I agree to follow this project's Code of Conduct
@narmidm narmidm added the bug Something isn't working label Feb 1, 2024
@narmidm narmidm moved this to In Progress in KubeSlice Roadmap Feb 2, 2024
@narmidm narmidm moved this from In Progress to Todo in KubeSlice Roadmap Feb 2, 2024
@Bhargav-InfraCloud
Copy link
Contributor

Hi team! Is this being worked on? If not, I want to give it a try.
/assign

@narmidm narmidm moved this from Todo to In Progress in KubeSlice Roadmap Mar 12, 2024
@narmidm
Copy link
Member Author

narmidm commented Mar 12, 2024

@Bhargav-InfraCloud, Thanks you for addressing the issue. You will likely need more details, so please join this channel if you are not here yet - https://kubernetes.slack.com/archives/C03Q64HNEJF, and contact Mridul Gain or Bharath for a detailed explanation.

@Bhargav-InfraCloud
Copy link
Contributor

Sure @narmidm. Will check with them. Thanks!

@narmidm
Copy link
Member Author

narmidm commented Apr 24, 2024

@Bhargav-InfraCloud any update. did you get clarification from @mridulgain?

@Bhargav-InfraCloud
Copy link
Contributor

@narmidm Yes, he has shared a few details on Slack. Unfortunately, I got caught up with some other work around the same time and it is still going on. If anyone is up for working on this, please feel free to pick. Otherwise, I'll try again sometime.

@Rajan-226
Copy link

Hey @narmidm @mridulgain, Is this still open?

the LB is getting stuck into "pending" state on AWS when protocol is TCP. The issues were resolved simply by removing the annotations.

It seems like above is already fixed in this PR: #325 where we are only adding annotations for udp protocol?

@mridulgain
Copy link
Contributor

mridulgain commented Aug 5, 2024

hey @Rajan-226 yes the issue is still open because even if the issue was patched by adding AWS specific annotations to the LoadBalancer, we still don't have a way to pass other vendor specific annotations to the control plane as per the proposed solution in the issue #330.

✅ Proposed Solution
We can introduce a new field in the Cluster Custom Resource Definitions (CRDs) for a set of annotations for the External Load Balancer (Ext LB). If any cloud provider needs specific annotations for their Network Load Balancer (Network LB), these can be passed as input during the creation of the Cluster CRD. The Slice Gateway Edge (SGE) will then take these annotations and apply them to the External Load Balancer during its creation.

@Rajan-226
Copy link

Rajan-226 commented Aug 6, 2024

@mridulgain I can try working on this, I would just need a start.

[Ignorable] I checked the codebase of woker-operator, but didn't find any cluster CRD? I could see serviceexport, serviceimport, slice, slicegateway. Could you help me with some first timer guide so that I can get this done?

[Ignorable] Edit 0: It seems like cluster CRD is defined in the other repo. Trying to see how both repos are connected and their roles
https://github.com/kubeslice/kubeslice-controller/blob/master/apis/controller/v1alpha1/cluster_types.go

Edit 1: I see cluster crd controller is present in kubeslice-controller and would be responsible of managing cluster resource.
So, let me know if my understanding regarding the steps is correct?

  1. Add a field for required annotations in cluster CRD.
  2. Use that field in worker operator for adding annotation and remove existing logic.

I've two questions here:

  1. For existing users, we can't stop supporting aws specific annotations which are already added in code. So, this needs to be taken care in cluster controller now?
  2. This is a secondary thing, Is there any guide to test the flow?

@mridulgain
Copy link
Contributor

So, let me know if my understanding regarding the steps is correct?

  1. Add a field for required annotations in cluster CRD.
  2. Use that field in worker operator for adding annotation and remove existing logic.

Yes @Rajan-226 this is what we intend to implement.

  1. For existing users, we can't stop supporting aws specific annotations which are already added in code. So, this needs to be taken care in cluster controller now?

We don't want to hard code these cloud provider specific annotations in reconciler logic. Still we can keep the logic to add AWS specific annotation in the cluster controller for now and remove it eventually.

  1. This is a secondary thing, Is there any guide to test the flow?

Please checkout this installation doc: https://kubeslice.io/documentation/open-source/1.3.0/category/install-kubeslice

@Rajan-226
Copy link

@mridulgain @narmidm After analysing, it seems like this will complicate things a bit.

We're not fetching cluster CRD currently in worker-operator and instead just getting cloudregion from nodes labels. So, we would need some identifier here to fetch corresponding cluster CRD which seems to be not there.

Also, to default AWS specific annotation in the new field introduced, we would need a mutuating webhook for cluster CRD and fetch nodes there to see if cloud region is AWS - which I'm not sure if it's feasible or not.

@narmidm
Copy link
Member Author

narmidm commented Aug 20, 2024

@Rajan-226 Thanks for the suggestion.
I think taking annotations as an input in the cluster CR as a user configuration seems like a viable approach. It would simplify handling cloud-specific annotations by allowing users to specify them directly.

By storing this in the cluster CR, we can easily manage it during service creation and avoid any complexity that comes with different cloud provider. If no annotation is provided, we can simply skip adding it to the Load Balancer, which should cover the use case where no special annotation is needed.

Please share if you have any alternative suggestions on this matter.

@narmidm
Copy link
Member Author

narmidm commented Oct 26, 2024

@Rajan-226, Do you need any further help?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: In Progress
Development

No branches or pull requests

8 participants