
Confluent Control Center stops working after a couple of hours #305

Open
MedAzizTousli opened this issue Jun 12, 2024 · 1 comment

@MedAzizTousli

Hello everybody.

The issue I am getting with Control Center is driving me insane. After I deploy Confluent Control Center using the CRDs provided by the Confluent for Kubernetes Operator, it works fine for a couple of hours. Then the next day it starts crashing over and over, throwing the error below. I have checked everywhere on the Internet and tried every possible configuration, yet I have not been able to fix it. Any help is much appreciated.

Aziz:~/environment $ kubectl logs controlcenter-0 | grep ERROR
Defaulted container "controlcenter" out of: controlcenter, config-init-container (init)
[2024-06-12 10:46:49,746] ERROR [_confluent-controlcenter-7-6-0-0-command-9a6a26f4-8b98-466c-801e-64d4d72d3e90-StreamThread-1] RackId doesn't exist for process 9a6a26f4-8b98-466c-801e-64d4d72d3e90 and consumer _confluent-controlcenter-7-6-0-0-command-9a6a26f4-8b98-466c-801e-64d4d72d3e90-StreamThread-1-consumer-a86738dc-d33b-4a03-99de-250d9c58f98d (org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor)
[2024-06-12 10:46:55,102] ERROR [_confluent-controlcenter-7-6-0-0-a182015e-cce9-40c0-9eb6-e83c7cbcaecb-StreamThread-8] RackId doesn't exist for process a182015e-cce9-40c0-9eb6-e83c7cbcaecb and consumer _confluent-controlcenter-7-6-0-0-a182015e-cce9-40c0-9eb6-e83c7cbcaecb-StreamThread-1-consumer-69db8b61-77d7-4ee5-9ce5-c018c5d12ad9 (org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor)
[2024-06-12 10:46:57,088] ERROR [_confluent-controlcenter-7-6-0-0-a182015e-cce9-40c0-9eb6-e83c7cbcaecb-StreamThread-7] [Consumer clientId=_confluent-controlcenter-7-6-0-0-a182015e-cce9-40c0-9eb6-e83c7cbcaecb-StreamThread-7-restore-consumer, groupId=null] Unable to find FetchSessionHandler for node 0. Ignoring fetch response. (org.apache.kafka.clients.consumer.internals.AbstractFetch)
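
In case it helps, here is how I check the restart count and pull the logs from the previous run of the crashed container (a minimal sketch, assuming the staging-kafka namespace from the manifest below):

kubectl get pods -n staging-kafka
kubectl logs controlcenter-0 -c controlcenter -n staging-kafka --previous | grep ERROR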

This is my Control Center deployment using the CRD provided by the Confluent for Kubernetes Operator. I am happy to provide any additional details if needed.

apiVersion: platform.confluent.io/v1beta1
kind: ControlCenter
metadata:
  name: controlcenter
  namespace: staging-kafka
spec:
  dataVolumeCapacity: 1Gi
  replicas: 1
  image:
    application: confluentinc/cp-enterprise-control-center:7.6.0
    init: confluentinc/confluent-init-container:2.8.0
  configOverrides:
    server:
      - confluent.controlcenter.internal.topics.replication=1
      - confluent.controlcenter.command.topic.replication=1
      - confluent.monitoring.interceptor.topic.replication=1
      - confluent.metrics.topic.replication=1
  dependencies:
    kafka:
      bootstrapEndpoint: kafka:9092
    schemaRegistry:
      url: http://schemaregistry:8081
    ksqldb:
      - name: ksqldb
        url: http://ksqldb:8088
    connect:
      - name: connect
        url: http://connect:8083
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: 'kafka'
              operator: In
              values:
              - 'true'
  externalAccess:
    type: loadBalancer
    loadBalancer:
      domain: 'domain.com'
      prefix: 'staging-controlcenter'
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: external
        service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
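
For reference, assuming the manifest above is saved as controlcenter.yaml, I apply it with:

kubectl apply -f controlcenter.yaml

(the namespace is taken from metadata.namespace, so no -n flag is needed).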
@MosheBlumbergX
Contributor

Hi @MedAzizTousli, while this is indeed an error, it is not a great indicator of the underlying issue.
I suggest looking at the Kubernetes events to see if there are any errors there; you could also add initialDelaySeconds to the CRD:

spec:
  podTemplate:
    probe: 
      readiness: 
        initialDelaySeconds: 60 
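
For example (a sketch, assuming the staging-kafka namespace from your manifest), the recent events and the pod's restart reason can be inspected with:

kubectl get events -n staging-kafka --sort-by='.lastTimestamp'
kubectl describe pod controlcenter-0 -n staging-kafka

The describe output also shows the probe settings and the termination reason for previous restarts (e.g. OOMKilled or a failed readiness probe).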
