Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and Improve Eager Start Handling #393

Merged
merged 7 commits into from
Jan 14, 2025
Merged

Fix and Improve Eager Start Handling #393

merged 7 commits into from
Jan 14, 2025

Conversation

lucas-koehler
Copy link
Contributor

Requires:

Summary

This pull request addresses critical issues in the eager start handling, which was previously broken, and introduces improvements to naming, session management, pod synchronization, and performance in Kubernetes environments.

Details

Common

  • Name Handling for Eager Start:
    • Resolved an endless loop and a NullPointerException (NPE) in NamingUtil methods.
    • Fixed ID extraction from generated names to align with updates from an earlier PR.

Operator - Eager Session Handling

  • Added annotations to trigger ConfigMap updates for OAuth proxy synchronization in pods. This ensures, the assigned user can actually access the pod immediately after redirection
  • Implemented session cleanup and pod restart logic in the renamed EagerSessionHandler.

Service

  • Enhanced max instance checks to consider sessions relevant to the app definition instead of analyzing all pods.
  • Updated pod resolution logic to rely on owner references instead of environment variables, ensuring compatibility with eager start mode.
  • Improved logging and performance reporting in K8sUtil and the performance REST handler.

Testing

  • Added a Terraform test configuration for eager start mode.
  • Added a VS Code launch config to start the operator in eager mode for the new Terraform config
  • Extend testing page with a button to get session performance data

Notes on Testing

- Fix endless loop and a NPE in the NamingUtil's methods for eager start name creation (based on an app definition)
- Fix id extraction from generated names for the adapted name generation in an earlier PR.
Add an annotation to an eagerly started instance pod after the email config map was updated.
This is necessary to trigger a sync with the Kubelet.
Otherwise, the pod might not be updated with the new email list for the OAuth proxy in time.
This is the case because ConfigMap changes are not propagated to the pod immediately but during a periodic sync.
- Rename EagerSessionStartHandler to EagerSessionHandler
- Implement `sessionDeleted` method to clean up and make the instance available for another user.
  This includes restarting the pod to get a new temporary workspace.
Check whether max instances are reached by counting sessions relevant to the app definition that are not in error state.
This makes it unnecessary to analyze all pods belonging to the session. Thus, performance for large amounts of pods is increased.

K8sUtil#getPodForSession now finds a pod belonging to a session via owner references instead of ENV variables.
With this, the pod can also be resolved in eager mode where the session name cannot be injected as an env variable.

Also adds some logging to K8sUtil#reportPerformance and increases the log level for exceptions
in the performance REST handler method.
Copy link
Contributor

@jfaltermeier jfaltermeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very nice so far. Starting sessions works again. During testing I noticed an issue with deletion

- Extend LabelsUtil to add the session name to session labels
- Find service from session via labels instead of owner references in eager session delete handler
  With this, the handling also works if the owner reference has been removed automatically
- Add minor clarifying comments
@lucas-koehler
Copy link
Contributor Author

Hi @jfaltermeier , I added one commit that should fix session delete handling even if dangling owner references are cleaned up automatically. Please have a look again :)

Copy link
Contributor

@jfaltermeier jfaltermeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! The Ingress cleanup is working as expected now. Also, the Pod is restarted on session deletion, making sure that all previous user data is gone.

@lucas-koehler lucas-koehler merged commit 99dd2c8 into main Jan 14, 2025
14 checks passed
@lucas-koehler lucas-koehler deleted the lk/eager-start branch January 14, 2025 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants