Merge branch 'add-azure-credential' of https://github.com/MaggieZhang-01/cube into MaggieZhang-01-add-azure-credential

# Conflicts:
#	packages/cubejs-backend-shared/src/env.ts
#	packages/cubejs-databricks-jdbc-driver/package.json
#	packages/cubejs-databricks-jdbc-driver/src/DatabricksDriver.ts
KSDaemon committed Jan 16, 2025
2 parents d9bc147 + 63a3856 commit fd4d3b6
Showing 9 changed files with 380 additions and 25 deletions.
15 changes: 15 additions & 0 deletions docs/pages/product/configuration/data-sources/databricks-jdbc.mdx
@@ -134,6 +134,17 @@ CUBEJS_DB_EXPORT_BUCKET=wasbs://[email protected]
CUBEJS_DB_EXPORT_BUCKET_AZURE_KEY=<AZURE_STORAGE_ACCOUNT_ACCESS_KEY>
```

The access key grants full access to the storage account's configuration and data.
For fine-grained control over access to storage resources, follow [the Databricks
guide on authorizing with Azure Active Directory][authorize-with-azure-active-directory].

[Create a service principal][azure-authentication-with-service-principal] and replace the access key with the following variables:

```dotenv
CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID=<AZURE_TENANT_ID>
CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID=<AZURE_CLIENT_ID>
CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET=<AZURE_CLIENT_SECRET>
```
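As a rough illustration (not part of Cube's API), the service-principal path only
applies when all three variables are present together; a helper with a hypothetical
name like the one below captures that check:

```typescript
// Illustrative sketch: the three variables above form a complete
// service-principal identity. If any one is missing, client-secret
// authentication cannot be used and the access key (or Azure's default
// credential chain) applies instead.
function hasServicePrincipalConfig(
  env: Record<string, string | undefined>
): boolean {
  return Boolean(
    env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID &&
    env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID &&
    env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET
  );
}
```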

## SSL/TLS

Cube does not require any additional configuration to enable SSL/TLS for
@@ -150,6 +161,10 @@ bucket][self-preaggs-export-bucket] **must be** configured.
[azure-bs]: https://azure.microsoft.com/en-gb/services/storage/blobs/
[azure-bs-docs-get-key]:
https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?toc=%2Fazure%2Fstorage%2Fblobs%2Ftoc.json&tabs=azure-portal#view-account-access-keys
[authorize-with-azure-active-directory]:
https://learn.microsoft.com/en-us/rest/api/storageservices/authorize-with-azure-active-directory
[azure-authentication-with-service-principal]:
https://learn.microsoft.com/en-us/azure/developer/java/sdk/identity-service-principal-auth
[databricks]: https://databricks.com/
[databricks-docs-dbfs]: https://docs.databricks.com/en/dbfs/mounts.html
[databricks-docs-azure]:
60 changes: 60 additions & 0 deletions docs/pages/reference/configuration/environment-variables.mdx
@@ -457,6 +457,66 @@ with a data source][ref-config-multiple-ds-decorating-env].
| -------------------------------------- | ---------------------- | --------------------- |
| [A valid AWS region][aws-docs-regions] | N/A | N/A |

## `CUBEJS_DB_EXPORT_BUCKET_AZURE_KEY`

The Azure Access Key to use for the export bucket.

<InfoBox>

When using multiple data sources, this environment variable can be [decorated
with a data source][ref-config-multiple-ds-decorating-env].

</InfoBox>

| Possible Values | Default in Development | Default in Production |
| ------------------------ | ---------------------- | --------------------- |
| A valid Azure Access Key | N/A | N/A |

## `CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID`

The Azure tenant ID to use for the export bucket.

<InfoBox>

When using multiple data sources, this environment variable can be [decorated
with a data source][ref-config-multiple-ds-decorating-env].

</InfoBox>

| Possible Values | Default in Development | Default in Production |
| ----------------------- | ---------------------- | --------------------- |
| A valid Azure Tenant ID | N/A | N/A |

## `CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID`

The Azure client ID to use for the export bucket.

<InfoBox>

When using multiple data sources, this environment variable can be [decorated
with a data source][ref-config-multiple-ds-decorating-env].

</InfoBox>

| Possible Values | Default in Development | Default in Production |
| ----------------------- | ---------------------- | --------------------- |
| A valid Azure Client ID | N/A | N/A |

## `CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET`

The Azure client secret to use for the export bucket.

<InfoBox>

When using multiple data sources, this environment variable can be [decorated
with a data source][ref-config-multiple-ds-decorating-env].

</InfoBox>

| Possible Values | Default in Development | Default in Production |
| --------------------------- | ---------------------- | --------------------- |
| A valid Azure Client Secret | N/A | N/A |

## `CUBEJS_DB_EXPORT_BUCKET_MOUNT_DIR`

The mount path to use for a [Databricks DBFS mount][databricks-docs-dbfs].
13 changes: 13 additions & 0 deletions packages/cubejs-backend-shared/src/env.ts
@@ -795,6 +795,19 @@ const variables: Record<string, (...args: any) => any> = {
]
),

/**
* Client Secret for the Azure based export bucket storage.
*/
dbExportBucketAzureClientSecret: ({
dataSource,
}: {
dataSource: string,
}) => (
process.env[
keyByDataSource('CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET', dataSource)
]
),

/**
* Azure Federated Token File Path for the Azure based export bucket storage.
*/
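The getter above resolves its variable name per data source. The sketch below is a
hypothetical re-implementation of that name resolution, inferred from the tests in
this commit rather than taken from the actual `keyByDataSource` export: the
`default` data source uses the plain name, while any other source gets a
`CUBEJS_DS_<NAME>_` segment spliced in after the `CUBEJS_` prefix.

```typescript
// Hypothetical sketch of per-data-source variable-name resolution.
function keyByDataSourceSketch(origin: string, dataSource?: string): string {
  // The default data source reads the undecorated variable.
  if (!dataSource || dataSource === 'default') {
    return origin;
  }
  // Other data sources read a decorated variable, e.g.
  // CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET.
  if (origin.startsWith('CUBEJS_')) {
    const suffix = origin.slice('CUBEJS_'.length);
    return `CUBEJS_DS_${dataSource.toUpperCase()}_${suffix}`;
  }
  return origin;
}

console.log(
  keyByDataSourceSketch('CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET', 'postgres')
);
// CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET
```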
87 changes: 87 additions & 0 deletions packages/cubejs-backend-shared/test/db_env_multi.test.ts
@@ -956,6 +956,93 @@ describe('Multiple datasources', () => {
);
});

test('getEnv("dbExportBucketAzureTenantId")', () => {
process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'default1';
process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'postgres1';
process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'wrong1';
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'default' })).toEqual('default1');
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'postgres' })).toEqual('postgres1');
expect(() => getEnv('dbExportBucketAzureTenantId', { dataSource: 'wrong' })).toThrow(
'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
);

process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'default2';
process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'postgres2';
process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'wrong2';
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'default' })).toEqual('default2');
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'postgres' })).toEqual('postgres2');
expect(() => getEnv('dbExportBucketAzureTenantId', { dataSource: 'wrong' })).toThrow(
'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
);

delete process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID;
delete process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_TENANT_ID;
delete process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_TENANT_ID;
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'default' })).toBeUndefined();
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'postgres' })).toBeUndefined();
expect(() => getEnv('dbExportBucketAzureTenantId', { dataSource: 'wrong' })).toThrow(
'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
);
});

test('getEnv("dbExportBucketAzureClientId")', () => {
process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'default1';
process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'postgres1';
process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'wrong1';
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'default' })).toEqual('default1');
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'postgres' })).toEqual('postgres1');
expect(() => getEnv('dbExportBucketAzureClientId', { dataSource: 'wrong' })).toThrow(
'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
);

process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'default2';
process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'postgres2';
process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'wrong2';
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'default' })).toEqual('default2');
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'postgres' })).toEqual('postgres2');
expect(() => getEnv('dbExportBucketAzureClientId', { dataSource: 'wrong' })).toThrow(
'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
);

delete process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID;
delete process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_ID;
delete process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_CLIENT_ID;
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'default' })).toBeUndefined();
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'postgres' })).toBeUndefined();
expect(() => getEnv('dbExportBucketAzureClientId', { dataSource: 'wrong' })).toThrow(
'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
);
});

test('getEnv("dbExportBucketAzureClientSecret")', () => {
process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'default1';
process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'postgres1';
process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'wrong1';
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'default' })).toEqual('default1');
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'postgres' })).toEqual('postgres1');
expect(() => getEnv('dbExportBucketAzureClientSecret', { dataSource: 'wrong' })).toThrow(
'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
);

process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'default2';
process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'postgres2';
process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'wrong2';
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'default' })).toEqual('default2');
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'postgres' })).toEqual('postgres2');
expect(() => getEnv('dbExportBucketAzureClientSecret', { dataSource: 'wrong' })).toThrow(
'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
);

delete process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET;
delete process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET;
delete process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET;
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'default' })).toBeUndefined();
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'postgres' })).toBeUndefined();
expect(() => getEnv('dbExportBucketAzureClientSecret', { dataSource: 'wrong' })).toThrow(
'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
);
});

test('getEnv("dbExportIntegration")', () => {
process.env.CUBEJS_DB_EXPORT_INTEGRATION = 'default1';
process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_INTEGRATION = 'postgres1';
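In multi-data-source mode the tests above expect an undeclared source to throw
regardless of whether the variable is set. A minimal sketch of that guard
(illustrative, inferred from the error message, not the actual shared-package code):

```typescript
// Sketch: in multi-data-source mode, CUBEJS_DATASOURCES declares the allowed
// names, and a lookup for an undeclared source fails before any env read.
function assertDataSourceSketch(dataSource: string, declared: string[]): string {
  if (!declared.includes(dataSource)) {
    throw new Error(
      `The ${dataSource} data source is missing in the declared CUBEJS_DATASOURCES.`
    );
  }
  return dataSource;
}

const declared = ['default', 'postgres'];
assertDataSourceSketch('postgres', declared); // passes through
// assertDataSourceSketch('wrong', declared); // would throw, as in the tests above
```

In single-data-source mode (the companion test file below) there is no such guard:
every lookup falls back to the undecorated variable.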
51 changes: 51 additions & 0 deletions packages/cubejs-backend-shared/test/db_env_single.test.ts
@@ -618,6 +618,57 @@ describe('Single datasources', () => {
expect(getEnv('dbExportBucketAzureKey', { dataSource: 'wrong' })).toBeUndefined();
});

test('getEnv("dbExportBucketAzureTenantId")', () => {
process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'default1';
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'default' })).toEqual('default1');
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'postgres' })).toEqual('default1');
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'wrong' })).toEqual('default1');

process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'default2';
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'default' })).toEqual('default2');
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'postgres' })).toEqual('default2');
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'wrong' })).toEqual('default2');

delete process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID;
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'default' })).toBeUndefined();
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'postgres' })).toBeUndefined();
expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'wrong' })).toBeUndefined();
});

test('getEnv("dbExportBucketAzureClientId")', () => {
process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'default1';
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'default' })).toEqual('default1');
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'postgres' })).toEqual('default1');
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'wrong' })).toEqual('default1');

process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'default2';
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'default' })).toEqual('default2');
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'postgres' })).toEqual('default2');
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'wrong' })).toEqual('default2');

delete process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID;
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'default' })).toBeUndefined();
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'postgres' })).toBeUndefined();
expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'wrong' })).toBeUndefined();
});

test('getEnv("dbExportBucketAzureClientSecret")', () => {
process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'default1';
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'default' })).toEqual('default1');
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'postgres' })).toEqual('default1');
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'wrong' })).toEqual('default1');

process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'default2';
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'default' })).toEqual('default2');
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'postgres' })).toEqual('default2');
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'wrong' })).toEqual('default2');

delete process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET;
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'default' })).toBeUndefined();
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'postgres' })).toBeUndefined();
expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'wrong' })).toBeUndefined();
});

test('getEnv("dbExportIntegration")', () => {
process.env.CUBEJS_DB_EXPORT_INTEGRATION = 'default1';
expect(getEnv('dbExportIntegration', { dataSource: 'default' })).toEqual('default1');
34 changes: 33 additions & 1 deletion packages/cubejs-base-driver/src/BaseDriver.ts
@@ -27,6 +27,7 @@ import {
} from '@azure/storage-blob';
import {
DefaultAzureCredential,
ClientSecretCredential,
} from '@azure/identity';

import { cancelCombinator } from './utils';
@@ -73,6 +74,15 @@ export type AzureStorageClientConfig = {
* the Azure library will try to use the AZURE_TENANT_ID env
*/
tenantId?: string,
/**
* Azure service principal client secret.
* Enables authentication to Microsoft Entra ID using a client secret that was generated
* for an App Registration. More information on how to configure a client secret can be found here:
* https://learn.microsoft.com/entra/identity-platform/quickstart-configure-app-access-web-apis#add-credentials-to-your-web-application
* In case of DefaultAzureCredential flow if it is omitted
* the Azure library will try to use the AZURE_CLIENT_SECRET env
*/
clientSecret?: string,
/**
* The path to a file containing a Kubernetes service account token that authenticates the identity.
* In case of DefaultAzureCredential flow if it is omitted
@@ -760,7 +770,7 @@ export abstract class BaseDriver implements DriverInterface {
const parts = bucketName.split(splitter);
const account = parts[0];
const container = parts[1].split('/')[0];
let credential: StorageSharedKeyCredential | DefaultAzureCredential;
let credential: StorageSharedKeyCredential | ClientSecretCredential | DefaultAzureCredential;
let blobServiceClient: BlobServiceClient;
let getSas;

@@ -778,6 +788,28 @@
},
credential as StorageSharedKeyCredential
).toString();
} else if (azureConfig.clientSecret && azureConfig.tenantId && azureConfig.clientId) {
credential = new ClientSecretCredential(
azureConfig.tenantId,
azureConfig.clientId,
azureConfig.clientSecret,
);
getSas = async (name: string, startsOn: Date, expiresOn: Date) => {
const userDelegationKey = await blobServiceClient.getUserDelegationKey(startsOn, expiresOn);
return generateBlobSASQueryParameters(
{
containerName: container,
blobName: name,
permissions: ContainerSASPermissions.parse('r'),
startsOn,
expiresOn,
protocol: SASProtocol.Https,
version: '2020-08-04',
},
userDelegationKey,
account
).toString();
};
} else {
const opts = {
tenantId: azureConfig.tenantId,
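The branching this commit adds to `BaseDriver` amounts to a precedence order. The
sketch below mirrors it with illustrative names (not the driver's actual types): an
account access key selects shared-key auth, a complete tenant/client/secret triple
selects `ClientSecretCredential` (whose SAS tokens are then signed with a user
delegation key instead of the account key), and anything else falls back to
`DefaultAzureCredential`.

```typescript
// Illustrative model of the credential selection order in the diff above.
type AzureConfigSketch = {
  azureKey?: string;      // storage account access key
  tenantId?: string;      // service principal tenant
  clientId?: string;      // service principal application (client) ID
  clientSecret?: string;  // service principal secret
};

function pickAzureCredentialSketch(cfg: AzureConfigSketch): string {
  if (cfg.azureKey) {
    // Shared-key auth; SAS tokens signed directly with the account key.
    return 'StorageSharedKeyCredential';
  }
  if (cfg.clientSecret && cfg.tenantId && cfg.clientId) {
    // Service principal; SAS tokens signed with a user delegation key.
    return 'ClientSecretCredential';
  }
  // Azure's default chain (env vars, managed identity, workload identity, ...).
  return 'DefaultAzureCredential';
}
```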
2 changes: 2 additions & 0 deletions packages/cubejs-databricks-jdbc-driver/package.json
@@ -18,6 +18,8 @@
"build": "rm -rf dist && npm run tsc",
"tsc": "tsc",
"watch": "tsc -w",
"test": "npm run unit",
"unit": "jest dist/test --forceExit",
"lint": "eslint src/* --ext .ts",
"lint:fix": "eslint --fix src/* --ext .ts",
"postinstall": "node bin/post-install"