
Releases: GoogleCloudPlatform/ai-on-gke

v1.0.2

17 Nov 22:00
48e6cfe

Ray Serve

  • Introduced support for Ray on Autopilot with 3 predefined worker groups - small (CPU only), medium (1 GPU), and large (8 GPUs): 7082b13
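
A minimal sketch of how application code might target these worker groups, assuming a Ray client connection to the cluster; the head service address below is a placeholder, and placement onto a particular group is driven purely by the CPU/GPU resource requests:

```python
import ray

# Connect to the Ray cluster running on GKE Autopilot (placeholder address).
ray.init(address="ray://example-cluster-head-svc:10001")

@ray.remote(num_cpus=2)
def preprocess(texts):
    # A CPU-only request lands on the "small" (CPU-only) worker group.
    return [t.lower() for t in texts]

@ray.remote(num_gpus=1)
def embed(texts):
    # Requesting one GPU steers the task to the "medium" worker group;
    # num_gpus=8 would target the "large" (8 GPU) group instead.
    return texts

print(ray.get(embed.remote(preprocess.remote(["Hello", "Ray"]))))
```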

Ray on GKE Storage
#87 provides examples for Ray on GKE storage solutions:

  • One-click deployment of a GCS bucket and Kuberay access control
  • Leveraging the GKE GCS Fuse CSI driver to access GCS buckets as a shared filesystem with standard file semantics (thereby eliminating the need for specialized fsspec libraries)
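
As an illustration of the second point, a short sketch of reading from the Fuse-mounted bucket with ordinary file I/O inside a Ray task; the /data mount path is an assumption for this example, not part of the template:

```python
import ray

ray.init(address="auto")  # running inside the Ray cluster on GKE

@ray.remote
def count_lines(path: str) -> int:
    # The GCS bucket is mounted by the GCS Fuse CSI driver, so standard
    # file semantics work; no gcsfs/fsspec client is required.
    with open(path) as f:
        return sum(1 for _ in f)

# "/data" is an assumed mount point for the shared bucket.
print(ray.get(count_lines.remote("/data/train/metadata.csv")))
```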

Ray Data
The Ray Data API tutorial with a Stable Diffusion end-to-end fine-tuning example (PR) deploys a Ray training job from a Jupyter notebook to a Ray cluster on GKE and illustrates the following:

  • Caching the HuggingFace Stable Diffusion model checkpoint in a GCS bucket and mounting it to the Ray workers in the Ray cluster hosted on GKE
  • Using Ray Data APIs to perform batch inference and generate the regularization images needed for fine-tuning
  • Using the Ray Train framework for distributed training with multiple GPUs in a multi-node GKE cluster
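
A condensed sketch of the batch-inference portion of that flow, assuming the checkpoint is available on a mounted path and that load_pipeline stands in for the notebook's actual model-loading code:

```python
import ray
import ray.data

ray.init(address="auto")

class GenerateImages:
    """Stateful callable so the model is loaded once per actor."""

    def __init__(self):
        # Placeholder: load the cached Stable Diffusion checkpoint from the
        # GCS bucket mounted on the Ray workers.
        self.pipeline = load_pipeline("/data/checkpoints/stable-diffusion")

    def __call__(self, batch):
        batch["image"] = [self.pipeline(prompt) for prompt in batch["prompt"]]
        return batch

prompts = ray.data.from_items([{"prompt": "a photo of a dog"}] * 64)
images = prompts.map_batches(
    GenerateImages,
    compute=ray.data.ActorPoolStrategy(size=4),  # four GPU-backed actors
    num_gpus=1,
)
images.write_parquet("/data/regularization_images")
```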

Kuberay

  • Pin Kuberay version to v0.6.0 and helm chart version to v0.6.1
  • Install Kuberay operator in a dedicated namespace (ray-system)

Jupyter Notebooks

  • Secure authentication via Identity-Aware Proxy (IAP) is now enabled by default for JupyterHub, for both Standard and Autopilot clusters. Here is the sample user guide for configuring the IAP client in your JupyterHub installation. This ensures the JupyterHub endpoint is no longer exposed to the public internet.

Distributed training of PyTorch CNN

  • JobSet example for distributed training of a PyTorch CNN handwritten digit classification model using the MNIST dataset.
  • Indexed Job example for distributed training of a PyTorch CNN handwritten digit classification model on the MNIST dataset using NVIDIA T4 GPUs.
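
A minimal sketch of the per-pod training entrypoint such a Job would run, assuming the manifest injects the standard torch.distributed environment variables (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, typically derived from the completion index); the model and loop are simplified relative to the actual examples:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler
from torchvision import datasets, transforms

def main():
    # RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT are set per pod by the
    # JobSet or Indexed Job manifest.
    dist.init_process_group("nccl" if torch.cuda.is_available() else "gloo")
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = DDP(nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device))
    dataset = datasets.MNIST("/tmp/mnist", train=True, download=True,
                             transform=transforms.ToTensor())
    loader = DataLoader(dataset, batch_size=64, sampler=DistributedSampler(dataset))

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```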

Inferencing using Saxml and an HTTP Server

  • Example to deploy an HTTP server that handles HTTP requests to Sax, with support for features such as model publishing, listing, updating, unpublishing, and generating predictions. With an HTTP server, interaction with Sax can also extend beyond the VM level. For example, integration with GKE and load balancing enables requests to Sax from both inside and outside the GKE cluster.
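
As a rough illustration of that pattern, a client can reach the server with plain HTTP from inside or outside the cluster; the service name, port, routes, and payload fields below are assumptions for this sketch, not the server's documented API:

```python
import requests

# Hypothetical in-cluster address of the Sax HTTP server.
BASE = "http://sax-http-server.sax.svc.cluster.local:8888"

# Publish a model to a Sax cell (route and fields are placeholders).
requests.post(f"{BASE}/publish", json={
    "model": "/sax/test/lm2b",
    "model_path": "saxml.server.pax.lm.params.lm_cloud.LmCloudSpmd2B",
    "checkpoint": "gs://my-bucket/checkpoints/checkpoint_00000000",
    "replicas": 1,
})

# Ask the published model for a prediction.
response = requests.post(f"{BASE}/generate", json={
    "model": "/sax/test/lm2b",
    "query": "Q: Who are you? A:",
})
print(response.json())
```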

Finetuning and Serving Llama on L4 GPUs

  • Example for finetuning the Llama 7B model on GKE using 8 x L4 GPUs
  • Example for serving the Llama 70B model on GKE with 2 L4 GPUs

Validation of Changes to Ray on GKE Templates

  • Pull requests now trigger Cloud Build tests to detect breaking changes to the GKE platform and Kuberay solution templates.

TPU support for Ray, persistent Ray logs & metrics, JupyterHub improvements

15 Sep 00:18
fad927e

AI on GKE 1.0.1

The 1.0.1 patch introduces TPU support for Ray, persistent and searchable Ray logs and metrics, and pre-configured resource profiles for JupyterHub.

Support for TPUs with Ray

TPUs are now a first-class citizen in Ray’s resource orchestration layer, making the experience just like using GPUs. The user guide outlines how to get started with TPUs on Ray.
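
A small sketch of what that looks like from application code, assuming a cluster whose TPU worker group advertises a "TPU" resource (the chip count below is a placeholder):

```python
import ray

ray.init(address="auto")

# Request TPU chips the same way GPUs are requested, via the custom
# "TPU" resource exposed by the TPU worker group.
@ray.remote(resources={"TPU": 4})
def run_on_tpu_host():
    import jax
    # Each task sees the TPU chips attached to its worker pod.
    return jax.device_count()

print(ray.get(run_on_tpu_host.remote()))
```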

Improvements to Ray observability

Ray on GKE now automatically writes Ray logs and metrics to GCP, so users can view persistent logs and metrics across multiple clusters. Even if your Ray cluster dies, you still have visibility into previous jobs via GCP.
See the Logging & Monitoring section for more details on usage.

  • Logs are exported via a Fluent Bit sidecar and tagged with the Ray job submission ID. The job submission ID can be used to filter Ray job logs in Cloud Logging (see the sketch after this list).


  • Metrics are exported via Prometheus and can be viewed in Cloud Monitoring.

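
As a sketch of that log filtering with the Cloud Logging Python client, assuming the submission ID is attached as a label named ray_job_submission_id (an assumption; the actual label key is documented in the Logging & Monitoring guide):

```python
from google.cloud import logging

client = logging.Client()

# Filter Ray job logs by the submission ID tag added by the Fluent Bit sidecar.
# The label key and value below are placeholders.
job_filter = (
    'resource.type="k8s_container" '
    'labels.ray_job_submission_id="raysubmit_abc123"'
)

for entry in client.list_entries(filter_=job_filter, max_results=20):
    print(entry.payload)
```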

Support for multiple user profiles in JupyterHub

JupyterHub now comes installed with several preconfigured user profiles; each profile specifies a different set of resources (GPU/CPU, memory, image). This user guide outlines how to get started with JupyterHub and configure profiles for your use case.
