
v1.0.2

@imreddy13 released this on 17 Nov 22:00
48e6cfe

Ray Serve

  • Added support for Ray on GKE Autopilot with three predefined worker groups: small (CPU only), medium (1 GPU), and large (8 GPUs) (7082b13).

Ray on GKE Storage
PR #87 adds examples of storage solutions for Ray on GKE:

  • One-click deployment that sets up a GCS bucket and Kuberay access control
  • Using the GKE GCS Fuse CSI driver to mount GCS buckets as a shared filesystem with standard file semantics, which removes the need for specialized fsspec libraries
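To illustrate what standard file semantics means in practice, here is a minimal Python sketch, assuming the bucket is mounted into each Ray worker pod at a path such as /data. The mount path, the GCS_MOUNT_PATH environment variable, and the checkpoint layout are hypothetical and not taken from the #87 examples. The code uses only pathlib and json; no GCS client or fsspec library is involved.

```python
import json
import os
from pathlib import Path

# Hypothetical mount point where the GCS Fuse CSI volume is mounted in the worker pod.
MOUNT_ROOT = Path(os.environ.get("GCS_MOUNT_PATH", "/data"))


def save_checkpoint(name: str, state: dict, root: Path = MOUNT_ROOT) -> Path:
    """Write `state` as JSON under the mounted bucket using ordinary file I/O."""
    path = root / "checkpoints" / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state))
    return path


def load_checkpoint(name: str, root: Path = MOUNT_ROOT) -> dict:
    """Read a checkpoint back through the same path; other pods mounting the bucket can use it too."""
    return json.loads((root / "checkpoints" / f"{name}.json").read_text())
```

Because the mount behaves like a local directory, the same functions can be exercised against a temporary directory in local tests.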

Ray Data
The Ray Data API tutorial, an end-to-end Stable Diffusion fine-tuning example (PR), deploys a Ray training job from a Jupyter notebook to a Ray cluster on GKE and illustrates the following:

  • Caching the HuggingFace StableDiffusion model checkpoint in a GCS bucket and mounting it on the Ray workers in the Ray cluster hosted on GKE
  • Using Ray Data APIs to run batch inference that generates the regularization images needed for fine-tuning
  • Using the Ray Train framework for distributed training across multiple GPUs in a multi-node GKE cluster

Kuberay

  • Pinned the Kuberay version to v0.6.0 and the Helm chart version to v0.6.1
  • The Kuberay operator is now installed in a dedicated namespace (ray-system)

Jupyter Notebooks

  • Secure authentication through Identity-Aware Proxy (IAP) is now enabled by default for JupyterHub on both Standard and Autopilot clusters, so the JupyterHub endpoint is no longer openly accessible from the public internet. A sample user guide describes how to configure the IAP client in your JupyterHub installation.

Distributed Training of a PyTorch CNN

  • JobSet example for distributed training of a PyTorch CNN handwritten-digit classification model on the MNIST dataset.
  • Indexed Job example for distributed training of a PyTorch CNN handwritten-digit classification model on the MNIST dataset using NVIDIA T4 GPUs.
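As a rough illustration of how an Indexed Job can split work across pods, the sketch below derives each worker's rank from the JOB_COMPLETION_INDEX environment variable, which Kubernetes sets on every pod of an Indexed Job, and assigns that worker a contiguous slice of the 60,000-image MNIST training split. The WORLD_SIZE variable and the contiguous sharding scheme are assumptions for illustration; the released example may partition data differently (for instance with PyTorch's DistributedSampler).

```python
import os

MNIST_TRAIN_SIZE = 60_000  # number of images in the MNIST training split


def worker_rank() -> int:
    """Rank of this pod, taken from the index Kubernetes assigns to each Indexed Job pod."""
    return int(os.environ.get("JOB_COMPLETION_INDEX", "0"))


def shard_bounds(rank: int, world_size: int, n: int = MNIST_TRAIN_SIZE) -> tuple[int, int]:
    """Half-open [start, end) range of sample indices owned by `rank`.

    The first n % world_size ranks get one extra sample so every sample is covered exactly once.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world_size {world_size}")
    base, extra = divmod(n, world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


if __name__ == "__main__":
    # WORLD_SIZE is a hypothetical env var; set it to the Job's `completions` value.
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    rank = worker_rank()
    start, end = shard_bounds(rank, world_size)
    print(f"rank {rank}/{world_size} trains on samples [{start}, {end})")
```

With completions set to 4, the pod with index 2 would train on samples [30000, 45000).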

Inferencing using Saxml and an HTTP Server

  • Example that deploys an HTTP server to handle HTTP requests to Sax. The server supports model publishing, listing, updating, and unpublishing, as well as generating predictions. An HTTP server also lets clients interact with Sax beyond the VM level: for example, integrating it with GKE and load balancing allows requests to reach Sax from both inside and outside the GKE cluster.

Finetuning and Serving Llama on L4 GPUs

  • Example of fine-tuning the Llama 7B model on GKE using 8 x L4 GPUs
  • Example of serving the Llama 70B model on GKE with 2 L4 GPUs

Validation of Changes to Ray on GKE Templates

  • Pull requests now trigger Cloud Build tests that detect breaking changes to the GKE platform and Kuberay solution templates.