
Confluent Control Center stops working after a couple of hours #305

Open
MedAzizTousli opened this issue Jun 12, 2024 · 1 comment

@MedAzizTousli

Hello everybody.

The issue I am getting with Control Center is driving me insane. After I deploy Confluent Control Center using the CRDs provided by the Confluent for Kubernetes Operator, it works fine for a couple of hours. Then the next day it starts crashing over and over, throwing the error below. I have checked everywhere on the Internet and tried every possible configuration, yet I have not been able to fix it. Any help is much appreciated.

Aziz:~/environment $ kubectl logs controlcenter-0 | grep ERROR
Defaulted container "controlcenter" out of: controlcenter, config-init-container (init)
[2024-06-12 10:46:49,746] ERROR [_confluent-controlcenter-7-6-0-0-command-9a6a26f4-8b98-466c-801e-64d4d72d3e90-StreamThread-1] RackId doesn't exist for process 9a6a26f4-8b98-466c-801e-64d4d72d3e90 and consumer _confluent-controlcenter-7-6-0-0-command-9a6a26f4-8b98-466c-801e-64d4d72d3e90-StreamThread-1-consumer-a86738dc-d33b-4a03-99de-250d9c58f98d (org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor)
[2024-06-12 10:46:55,102] ERROR [_confluent-controlcenter-7-6-0-0-a182015e-cce9-40c0-9eb6-e83c7cbcaecb-StreamThread-8] RackId doesn't exist for process a182015e-cce9-40c0-9eb6-e83c7cbcaecb and consumer _confluent-controlcenter-7-6-0-0-a182015e-cce9-40c0-9eb6-e83c7cbcaecb-StreamThread-1-consumer-69db8b61-77d7-4ee5-9ce5-c018c5d12ad9 (org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor)
[2024-06-12 10:46:57,088] ERROR [_confluent-controlcenter-7-6-0-0-a182015e-cce9-40c0-9eb6-e83c7cbcaecb-StreamThread-7] [Consumer clientId=_confluent-controlcenter-7-6-0-0-a182015e-cce9-40c0-9eb6-e83c7cbcaecb-StreamThread-7-restore-consumer, groupId=null] Unable to find FetchSessionHandler for node 0. Ignoring fetch response. (org.apache.kafka.clients.consumer.internals.AbstractFetch)
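
In case it helps, here is how I check the restart count and pull the logs from the previous run of the crashed container (a minimal sketch, assuming the staging-kafka namespace from the manifest below):

kubectl get pods -n staging-kafka
kubectl logs controlcenter-0 -c controlcenter -n staging-kafka --previous | grep ERROR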

This is my Control Center deployment using the CRD provided by the Confluent for Kubernetes Operator. I am happy to provide any additional details if needed.

apiVersion: platform.confluent.io/v1beta1
kind: ControlCenter
metadata:
  name: controlcenter
  namespace: staging-kafka
spec:
  dataVolumeCapacity: 1Gi
  replicas: 1
  image:
    application: confluentinc/cp-enterprise-control-center:7.6.0
    init: confluentinc/confluent-init-container:2.8.0
  configOverrides:
    server:
      - confluent.controlcenter.internal.topics.replication=1
      - confluent.controlcenter.command.topic.replication=1
      - confluent.monitoring.interceptor.topic.replication=1
      - confluent.metrics.topic.replication=1
  dependencies:
    kafka:
      bootstrapEndpoint: kafka:9092
    schemaRegistry:
      url: http://schemaregistry:8081
    ksqldb:
      - name: ksqldb
        url: http://ksqldb:8088
    connect:
      - name: connect
        url: http://connect:8083
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: 'kafka'
              operator: In
              values:
              - 'true'
  externalAccess:
    type: loadBalancer
    loadBalancer:
      domain: 'domain.com'
      prefix: 'staging-controlcenter'
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: external
        service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
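
For reference, assuming the manifest above is saved as controlcenter.yaml, I apply it with:

kubectl apply -f controlcenter.yaml

(the namespace is taken from metadata.namespace, so no -n flag is needed).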
@MosheBlumbergX
Contributor

Hi @MedAzizTousli, while this is indeed an error, it is not a great indicator of the underlying issue.
I suggest looking at the Kubernetes events to see if there are any errors there; you could also add initialDelaySeconds to the CRD:

spec:
  podTemplate:
    probe: 
      readiness: 
        initialDelaySeconds: 60 
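
For example (a sketch, assuming the staging-kafka namespace from your manifest), the recent events and the pod's restart reason can be inspected with:

kubectl get events -n staging-kafka --sort-by='.lastTimestamp'
kubectl describe pod controlcenter-0 -n staging-kafka

The describe output also shows the probe settings and the termination reason for previous restarts (e.g. OOMKilled or a failed readiness probe).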
