This is a Terraform module for provisioning a Weights & Biases Cluster on Azure. Weights & Biases Server is our self-hosted distribution of wandb.ai. It offers enterprises a private instance of the Weights & Biases application, with no resource limits and with additional enterprise-grade architectural features like audit logging and single sign-on.
This module is intended to run in an Azure account with minimal preparation; however, it does have the following prerequisites:
By default, the type of Kubernetes instances, number of instances, Redis cluster size, and database instance sizes are standardized via configurations in ./deployment-size.tf, and are configured via the size input variable.
Available sizes are small, medium, large, xlarge, and xxlarge. Default is small.
All the values set via deployment-size.tf can be overridden by setting the appropriate input variables:
- kubernetes_instance_type - The instance type for the AKS nodes
- kubernetes_min_node_per_az - The minimum number of nodes in the AKS cluster
- kubernetes_max_node_per_az - The maximum number of nodes in the AKS cluster
- redis_capacity - The capacity (size) of the Redis instance
- database_sku_name - The SKU name for the database
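As a rough sketch of how a size preset and a partial override might be combined, the module call below picks the medium preset and overrides only the node settings. The registry source address, the namespace and location values, and the VM size are illustrative assumptions, not values taken from this module.

```hcl
variable "license" {
  type      = string
  sensitive = true
}

module "wandb" {
  # Assumed registry address; replace with the source you actually use.
  source = "wandb/wandb/azurerm"

  # Required inputs (placeholder values).
  namespace = "wandb-example"
  location  = "eastus2"
  license   = var.license

  # Start from the "medium" preset defined in deployment-size.tf.
  size = "medium"

  # Override only the node settings; redis_capacity and database_sku_name
  # are left unset (null) so the preset values still apply.
  kubernetes_instance_type   = "Standard_D8s_v3" # illustrative VM size
  kubernetes_min_node_per_az = 1
  kubernetes_max_node_per_az = 3
}
```

Leaving an override unset keeps the preset value, so only the settings that differ from the chosen size need to be listed.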
We have included documentation and reference examples for additional common installation scenarios for Weights & Biases, as well as examples for supporting resources that lack official modules.
- Route
Name | Version |
---|---|
terraform | ~> 1.9 |
azapi | ~> 1.0 |
azurerm | ~> 3.17 |
helm | ~> 2.6 |
kubernetes | ~> 2.23 |
Name | Version |
---|---|
azapi | ~> 1.0 |
azurerm | ~> 3.17 |
Name | Source | Version |
---|---|---|
app_aks | ./modules/app_aks | n/a |
app_lb | ./modules/app_lb | n/a |
cert_manager | ./modules/cert_manager | n/a |
clickhouse | ./modules/clickhouse | n/a |
cron_job | ./modules/cron_job | n/a |
database | ./modules/database | n/a |
identity | ./modules/identity | n/a |
networking | ./modules/networking | n/a |
pod_identity | ./modules/identity | n/a |
redis | ./modules/redis | n/a |
storage | ./modules/storage | n/a |
vault | ./modules/vault | n/a |
wandb | wandb/wandb/helm | 2.0.0 |
Name | Type |
---|---|
azapi_resource_list.az_zones | data source |
azurerm_subscription.current | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
allowed_ip_ranges | Allowed public IP addresses or CIDR ranges. | list(string) | [] | no |
allowed_subscriptions | Comma-separated list of allowed customer subscriptions. | string | "" | no |
app_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
azuremonitor | Enable OTel Azure Monitor metrics for SQL and Redis. Requires operator-wandb chart version 0.14.0 or later. | bool | false | no |
blob_container | Use an existing bucket. | string | "" | no |
bucket_path | Path where data is stored in the instance-level bucket. | string | "" | no |
clickhouse_private_endpoint_service_name | ClickHouse private endpoint 'Service name' (ends in .azure.privatelinkservice). | string | "" | no |
clickhouse_region | ClickHouse region (eastus2, westus3, etc). | string | "" | no |
cluster_sku_tier | The Azure AKS SKU Tier to use for this cluster (https://learn.microsoft.com/en-us/azure/aks/free-standard-pricing-tiers) | string | "Free" | no |
controller_image_tag | Tag of the controller image to deploy | string | "1.14.0" | no |
create_private_link | Create an Azure Private Link. | bool | false | no |
create_redis | Boolean indicating whether to provision a Redis instance (true) or not (false). | bool | false | no |
database_availability_mode | n/a | string | "SameZone" | no |
database_sku_name | Specifies the SKU Name for this MySQL Server. Defaults to null, in which case the value from deployment-size.tf is used | string | null | no |
database_version | Version for MySQL | string | "5.7" | no |
deletion_protection | If the instance should have deletion protection enabled. The database / bucket can't be deleted when this value is set to true. | bool | true | no |
disable_storage_vault_key_id | Flag to disable the customer_managed_key block. The properties 'encryption.identity, encryption.keyvaultproperties' cannot be updated in a single operation. | bool | false | no |
domain_name | Domain for accessing the Weights & Biases UI. | string | null | no |
enable_database_vault_key | Flag to enable managed key encryption for the database. Once enabled, cannot be disabled. | bool | false | no |
enable_helm_release | Enable or disable applying and releasing Helm chart | bool | true | no |
enable_storage_vault_key | Flag to enable managed key encryption for the storage account. | bool | false | no |
external_bucket | Configuration for an external bucket. | any | null | no |
kubernetes_cluster_oidc_issuer_url | OIDC issuer URL for the Kubernetes cluster. Can be determined using kubectl get --raw /.well-known/openid-configuration | string | "" | no |
kubernetes_instance_type | Instance type for primary node group. Defaults to null, in which case the value from deployment-size.tf is used | string | null | no |
kubernetes_max_node_per_az | Maximum number of nodes for the AKS cluster. Defaults to null, in which case the value from deployment-size.tf is used | number | null | no |
kubernetes_min_node_per_az | Minimum number of nodes for the AKS cluster. Defaults to null, in which case the value from deployment-size.tf is used | number | null | no |
license | Your wandb/local license | string | n/a | yes |
location | n/a | string | n/a | yes |
namespace | String used to prefix resource names. | string | n/a | yes |
node_max_pods | Maximum number of pods per node | number | 30 | no |
node_pool_num_zones | Number of availability zones to use for the node pool when node_pool_zones is not set. If neither are set, 3 zones will be used | number | 2 | no |
node_pool_zones | Availability zones for the node pool | list(string) | null | no |
oidc_auth_method | OIDC auth method | string | "implicit" | no |
oidc_client_id | The Client ID of the application in your identity provider | string | "" | no |
oidc_issuer | A URL to your OpenID Connect identity provider, e.g. https://cognito-idp.us-east-1.amazonaws.com/us-east-1_uiIFNdacd | string | "" | no |
oidc_secret | The Client secret of the application in your identity provider | string | "" | no |
operator_chart_version | Version of the operator chart to deploy | string | "1.3.4" | no |
other_wandb_env | Extra environment variables for W&B | map(any) | {} | no |
parquet_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
redis_capacity | Number indicating the size of a Redis instance. Defaults to null, in which case the value from deployment-size.tf is used | number | null | no |
size | Deployment size | string | "small" | no |
ssl | Enable SSL certificate | bool | true | no |
storage_account | Azure storage account name | string | "" | no |
storage_key | Azure primary storage access key | string | "" | no |
subdomain | Subdomain for accessing the Weights & Biases UI. Default creates record at Route53 Route. | string | null | no |
tags | Map of tags for resource | map(string) | {} | no |
use_internal_queue | Uses an internal Redis queue instead of the Azure queue. | bool | false | no |
wandb_image | Docker repository to pull the wandb image from. | string | "wandb/local" | no |
wandb_version | The version of Weights & Biases local to deploy. | string | "latest" | no |
weave_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
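To illustrate how several of the optional inputs above might be set together, here is a hedged sketch that supplies the three required inputs plus a domain, an IP allow list, tags, and OIDC settings. The registry source address and every literal value (domain, CIDR range, issuer URL, client ID, tag values) are placeholders chosen for illustration.

```hcl
variable "license" {
  type      = string
  sensitive = true
}

module "wandb" {
  # Assumed registry address; replace with the source you actually use.
  source = "wandb/wandb/azurerm"

  # Required inputs.
  namespace = "wandb-example"
  location  = "eastus2"
  license   = var.license

  # Where the UI is served (placeholder domain).
  domain_name = "example.com"
  subdomain   = "wandb"

  # Restrict public access to a placeholder CIDR range.
  allowed_ip_ranges = ["203.0.113.0/24"]

  # Single sign-on through a placeholder OIDC identity provider.
  oidc_issuer      = "https://idp.example.com"
  oidc_client_id   = "example-client-id"
  oidc_auth_method = "implicit"

  tags = {
    environment = "example"
  }
}
```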
Name | Description |
---|---|
address | n/a |
aks_max_node_count | n/a |
aks_min_node_count | n/a |
aks_node_instance_type | n/a |
client_id | n/a |
cluster_ca_certificate | n/a |
cluster_client_certificate | n/a |
cluster_client_key | n/a |
cluster_host | n/a |
database_instance_type | n/a |
fqdn | The FQDN to the W&B application |
oidc_issuer_url | n/a |
private_link_resource_id | n/a |
private_link_sub_resource_name | n/a |
standardized_size | n/a |
tenant_id | n/a |
url | The URL to the W&B application |
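As a small sketch of how these outputs might be consumed, the root configuration below re-exports the application URL and FQDN. It assumes the module was instantiated elsewhere in the same configuration under the hypothetical name "wandb".

```hcl
# Assumes a `module "wandb" { ... }` block exists in this configuration.
output "wandb_url" {
  description = "The URL to the W&B application"
  value       = module.wandb.url
}

output "wandb_fqdn" {
  description = "The FQDN to the W&B application"
  value       = module.wandb.fqdn
}
```

After an apply, `terraform output wandb_url` prints the exported value.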
3.0.0 introduced autoscaling to the AKS cluster and made the size variable the preferred way to set the cluster size.
Previously, unless the size variable was set explicitly, there were default values for the following variables:
- kubernetes_instance_type
- kubernetes_node_count
- redis_capacity
- database_sku_name
The size variable now defaults to small, and the following variables can be used to partially override the values set by the size variable:
- kubernetes_instance_type
- kubernetes_min_node_per_az
- kubernetes_max_node_per_az
- redis_capacity
- database_sku_name
For more information on the available sizes, see the Cluster Sizing section.
If having the cluster scale nodes in and out is not desired, kubernetes_min_node_per_az and kubernetes_max_node_per_az can be set to the same value to prevent the cluster from scaling.
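A minimal sketch of pinning the node count this way is shown below. The registry source address, the placeholder input values, and the node count of 2 are assumptions for illustration only.

```hcl
variable "license" {
  type      = string
  sensitive = true
}

module "wandb" {
  # Assumed registry address; replace with the source you actually use.
  source = "wandb/wandb/azurerm"

  # Required inputs (placeholder values).
  namespace = "wandb-example"
  location  = "eastus2"
  license   = var.license

  # Equal minimum and maximum leave the autoscaler no room to add or
  # remove nodes, so each zone keeps a fixed node count.
  kubernetes_min_node_per_az = 2
  kubernetes_max_node_per_az = 2
}
```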
When upgrading from 2.x to 3.x, the following changes are required:
- Add the azapi provider to the required_providers block:

      terraform {
        required_providers {
          azapi = {
            source  = "azure/azapi"
            version = "~> 1.0"
          }
        }
      }

- Add the azapi provider to the provider block:

      provider "azapi" {
        # azapi provider configuration should be the same as azurerm provider configuration
      }