Bazel Migration

This document details the steps to migrate a package to build with Bazel. These steps are easiest to understand with a working example, so this doc references tfjs-core's setup as much as possible. Since this migration is still in progress, the steps and processes listed here may change as we improve on the process, add features to each package's build, and create tfjs-specific build functions.

Scope

Migrating a package to Bazel involves adding Bazel targets that build the package, run its tests, and pack it for publishing to npm. To ease the transition, we're migrating packages incrementally, starting with root packages (tfjs-core, tfjs-backend-cpu) and gradually expanding to leaf packages. This differs from our original approach of maintaining the current build and a new Bazel build in parallel, which did not work because Bazel required some changes to the ts sources.

Caveats

  • Bazel will only make a dependency or file available to the build if you explicitly declare it as a dependency / input to the rule you're using.
  • All Bazel builds use the root package.json, so you may have to add packages to it. As long as you use @npm//dependency-name in BUILD files to add dependencies, you won't need to worry about the build accidentally seeing the package's node_modules directory instead of the root node_modules. Bazel will only make the root node_modules directory visible to the build.
    • Even though the build doesn't use the package's node_modules, you may have to run yarn within the package to get code completion to work correctly. We're looking into why this is the case.
  • There may be issues with depending on explicitly pinned versions of @tensorflow scoped packages, which might affect some demos if they're migrated to use Bazel. We might just want to leave demos out of Bazel so they're easier to understand.

Steps

These steps are general guidelines for how to build a package with Bazel. They should work for most packages, but there may be some exceptions (e.g. wasm, react native).

Make sure all dependencies build with Bazel

A package's dependencies must be migrated before it can be migrated. Take a look at the package's issue, which can be found by checking #5287, to find its dependencies.

Add dependencies to the root package.json

Bazel (through rules_nodejs) uses a single root package.json for its npm dependencies. When converting a package to build with Bazel, dependencies in the package's package.json will need to be added to the root package.json as well.

Create a BUILD.bazel file in the package's root

Bazel looks for targets to run in BUILD and BUILD.bazel files. Use the .bazel extension since blaze uses BUILD. You may want to install an extension for your editor to get syntax highlighting. Here's the vscode extension.

This BUILD file will handle package-wide rules like bundling for npm.

Create another BUILD.bazel file in src

This BUILD file will compile the source files of the package using ts_library and may also define test bundles.

Compile the package with ts_library

In the src BUILD.bazel file, we use ts_library to compile the package's typescript files. ts_library is a rule provided by rules_nodejs. We wrap ts_library in a macro that sets some project-specific settings.

Here's an example of how tfjs-core uses ts_library to build.

tfjs-core/src/BUILD.bazel

load("//tools:defaults.bzl", "ts_library")

TEST_SRCS = [
    "**/*_test.ts",
    "image_test_util.ts",
]

# Compiles the majority of tfjs-core using the `@tensorflow/tfjs-core/dist`
# module name.
ts_library(
    name = "tfjs-core_src_lib",
    srcs = glob(
        ["**/*.ts"],
        exclude = TEST_SRCS + ["index.ts"],
    ),
    module_name = "@tensorflow/tfjs-core/dist",
    deps = [
        "@npm//@types",
        "@npm//jasmine-core",
        "@npm//seedrandom",
    ],
)

# Compiles the `index.ts` entrypoint of tfjs-core separately from the rest of
# the sources in order to use the `@tensorflow/tfjs-core` module name instead
# of `@tensorflow/tfjs-core/dist`.
ts_library(
    name = "tfjs-core_lib",
    srcs = ["index.ts"],
    module_name = "@tensorflow/tfjs-core",
    deps = [
        ":tfjs-core_src_lib",
    ],
)

ts_library is used twice so that the output files get the correct module_name. Most files are imported relative to @tensorflow/tfjs-core/dist/, but index.ts, the entrypoint of tfjs-core, should be importable as @tensorflow/tfjs-core.

If your package imports from dist (e.g. import {} from '@tensorflow/tfjs-core/dist/ops/ops_for_converter'), that import likely corresponds to a rule in that package's src/BUILD.bazel file. Look for a rule that includes the file you're importing and has module_name set correctly for that import.

Bundle the package

This step involves bundling the compiled files from the compilation step into a single file, and the rules are added to the package's root BUILD file (instead of src/BUILD.bazel). In order to support different execution environments, TFJS generates several bundles for each package. We provide a tfjs_bundle macro to generate these bundles.

tfjs-core/BUILD.bazel

load("//tools:tfjs_bundle.bzl", "tfjs_bundle")

tfjs_bundle(
    name = "tf-core",
    entry_point = "//tfjs-core/src:index.ts",
    external = [
        "node-fetch",
        "util",
    ],
    umd_name = "tf",
    deps = [
        "//tfjs-core/src:tfjs-core_lib",
        "//tfjs-core/src:tfjs-core_src_lib",
    ],
)

The tfjs_bundle macro generates several different bundles which are published in the package publishing step.

Compile the tests with ts_library

In the src/BUILD.bazel file, we compile the tests with ts_library. In the case of tfjs-core, we actually publish the test files, since other packages use them in their tests. Therefore, it's important that we set the module_name to @tensorflow/tfjs-core/dist. If a package's tests are not published, the module_name can probably be omitted. In a future major version of tfjs, we may stop publishing the tests to npm.

tfjs-core/src/BUILD.bazel

load("//tools:defaults.bzl", "ts_library")

ts_library(
    name = "tfjs-core_test_lib",
    srcs = glob(TEST_SRCS),
    # TODO(msoulanille): Mark this as testonly once it's no longer needed in the
    # npm package (for other downstream packages' tests).
    module_name = "@tensorflow/tfjs-core/dist",
    deps = [
        ":tfjs-core_lib",
        ":tfjs-core_src_lib",
    ],
)

Update the tests entrypoint

Many packages have a src/run_tests.ts file (or similar) that they use for selecting which tests to run. That file defines the paths to the test files that Jasmine uses. Since Bazel outputs appear in a different location, the paths to the test files must be updated. As an example, the following paths

const coreTests = 'node_modules/@tensorflow/tfjs-core/src/tests.ts';
const unitTests = 'src/**/*_test.ts';

would need to be updated to

const coreTests = 'tfjs-core/src/tests.js';
const unitTests = 'the-package-name/src/**/*_test.js';

Note that .ts has been changed to .js. This is because we're no longer running node tests with ts-node, so the input test files are now .js outputs created by the ts_library rule that compiled the tests.

It's also important to make sure the nodejs_test rule that runs the test has link_workspace_root = True. Otherwise, the test files will not be accessible at runtime.
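The path changes described above can be sketched as a small function. This is only an illustration: toBazelTestPath is a hypothetical helper, not something provided by tfjs or Bazel, and it assumes the old paths look like the two examples shown earlier in this section.

```typescript
// Hypothetical helper (not part of tfjs): maps a pre-Bazel test path or glob
// to the path where the compiled test is expected when running under Bazel.
function toBazelTestPath(packageDir: string, oldPath: string): string {
  // Tests are now compiled .js outputs rather than .ts sources run by ts-node.
  const asJs = oldPath.replace(/\.ts$/, '.js');

  // Paths into node_modules/@tensorflow/... become workspace-relative.
  const nodeModulesPrefix = 'node_modules/@tensorflow/';
  if (asJs.indexOf(nodeModulesPrefix) === 0) {
    return asJs.slice(nodeModulesPrefix.length);
  }

  // Paths local to the package gain the package directory as a prefix.
  return `${packageDir}/${asJs}`;
}

const coreTests = toBazelTestPath(
    'the-package-name', 'node_modules/@tensorflow/tfjs-core/src/tests.ts');
// coreTests is 'tfjs-core/src/tests.js'

const unitTests = toBazelTestPath('the-package-name', 'src/**/*_test.ts');
// unitTests is 'the-package-name/src/**/*_test.js'
```

In practice the paths are usually edited by hand in run_tests.ts; the function only makes the two rewrite rules explicit.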

Run node tests with nodejs_test

Our test setup allows fine-tuning of exactly what tests are run via setTestEnvs and setupTestFilters in jasmine_util.ts, which are used in a custom Jasmine entrypoint file setup_test.ts. This setup does not work well with jasmine_node_test, which provides its own entrypoint for starting Jasmine. Instead, we use the nodejs_test rule.

tfjs-core/BUILD.bazel

load("@build_bazel_rules_nodejs//:index.bzl", "js_library", "nodejs_test")

# This is necessary for tests to have access to
# the package.json so src/version_test.ts can 'require()' it.
js_library(
    name = "package_json",
    srcs = [
        ":package.json",
    ],
)

nodejs_test(
    name = "tfjs-core_node_test",
    data = [
        ":package_json",
        "//tfjs-backend-cpu/src:tfjs-backend-cpu_lib",
        "//tfjs-core/src:tfjs-core_lib",
        "//tfjs-core/src:tfjs-core_src_lib",
        "//tfjs-core/src:tfjs-core_test_lib",
    ],
    entry_point = "//tfjs-core/src:test_node.ts",
    link_workspace_root = True,
    tags = ["ci"],
)

It's important to tag tests with ci if you would like them to run in continuous integration.

Run browser tests with karma_web_test

We use esbuild to bundle the tests into a single file.

tfjs-core/src/BUILD.bazel

load("//tools:defaults.bzl", "esbuild")

esbuild(
    name = "tfjs-core_test_bundle",
    testonly = True,
    entry_point = "setup_test.ts",
    external = [
        # webworker tests call 'require('@tensorflow/tfjs')', which
        # is external to the test bundle.
        # Note: This is not a bazel target. It's just a string.
        "@tensorflow/tfjs",
        "worker_threads",
        "util",
    ],
    sources_content = True,
    deps = [
        ":tfjs-core_lib",
        ":tfjs-core_test_lib",
        "//tfjs-backend-cpu/src:tfjs-backend-cpu_lib",
        "//tfjs-core:package_json",
    ],
)

The esbuild bundle is then used in the tfjs_web_test macro, which uses karma_web_test to serve it to a browser to be run. Different browserstack browsers can be enabled or disabled in the browsers argument, and the full list of browsers is located in tools/karma_template.conf.js. Browserstack browser tests are automatically tagged with ci.

tfjs-core/BUILD.bazel

load("//tools:tfjs_web_test.bzl", "tfjs_web_test")

tfjs_web_test(
    name = "tfjs-core_test",
    srcs = [
        "//tfjs-core/src:tfjs-core_test_bundle",
    ],
    browsers = [
        "bs_chrome_mac",
        "bs_firefox_mac",
        "bs_safari_mac",
        "bs_ios_12",
        "bs_android_9",
        "win_10_chrome",
    ],
    static_files = [
        # Listed here so sourcemaps are served
        "//tfjs-core/src:tfjs-core_test_bundle",
        # For the webworker
        ":tf-core.min.js",
        ":tf-core.min.js.map",
        "//tfjs-backend-cpu:tf-backend-cpu.min.js",
        "//tfjs-backend-cpu:tf-backend-cpu.min.js.map",
    ],
)

Previously, tests were included based on the karma.conf.js file. Now, a test only runs if it is included in the test bundle, so make sure to import each test file in the test bundle's entrypoint. To help with this, we provide an enumerate_tests Bazel rule to generate a tests.ts file with the required imports.

tfjs-core/src/BUILD.bazel

load("//tools:enumerate_tests.bzl", "enumerate_tests")

# Generates the 'tests.ts' file that imports all test entrypoints.
enumerate_tests(
    name = "tests",
    srcs = [":all_test_entrypoints"], # all_test_entrypoints is a filegroup
    root_path = "tfjs-core/src",
)

Update the package.json

  1. Verify that the entrypoints in package.json match the outputs generated by tfjs_bundle and ts_library. tfjs-core/package.json is an example.
    1. The main entrypoint should point to the node bundle, dist/tf-package-name.node.js.
    2. jsnext:main and module should point to the ESModule output dist/index.js created by copy_ts_library_to_dist.
  2. If the package has browser tests, update the sideEffects field to include .mjs files generated by the ts_library under ./src (e.g. src/foo.mjs). Bazel outputs directly to src, and although we copy those outputs to dist with another Bazel rule, the browser test bundles still import from src, so we need to mark them as sideEffects.
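As a rough illustration of the entrypoint rules in step 1, the check below validates the three fields of a parsed package.json. checkEntrypoints is a hypothetical helper, not part of the tfjs tooling, and it assumes the paths are written without a leading ./ exactly as in the list above.

```typescript
// Minimal shape of the package.json fields discussed above.
interface PackageJsonEntrypoints {
  main?: string;
  module?: string;
  'jsnext:main'?: string;
}

// Hypothetical check (not part of the tfjs tooling): returns a list of
// problems found in the entrypoints, or an empty list if none were found.
function checkEntrypoints(pkg: PackageJsonEntrypoints): string[] {
  const problems: string[] = [];

  // `main` should point to the node bundle, e.g. dist/tf-package-name.node.js.
  if (!pkg.main || !/^dist\/.+\.node\.js$/.test(pkg.main)) {
    problems.push('main should point to the node bundle in dist');
  }

  // `module` and `jsnext:main` should point to the ESModule output.
  if (pkg.module !== 'dist/index.js') {
    problems.push('module should be dist/index.js');
  }
  if (pkg['jsnext:main'] !== 'dist/index.js') {
    problems.push('jsnext:main should be dist/index.js');
  }
  return problems;
}

const entrypointProblems = checkEntrypoints({
  main: 'dist/tf-package-name.node.js',
  module: 'dist/index.js',
  'jsnext:main': 'dist/index.js',
});
// entrypointProblems is empty for this input.
```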

Package for npm

We use the pkg_npm rule to create and publish the package to npm. However, there are a few steps needed before we can declare the package. For most packages, we distribute all our compiled outputs in the dist directory. However, ts_library writes its outputs to the same directory as the source files they were compiled from (except that they show up under Bazel's dist/bin output dir). We need to copy these from src to dist while making sure Bazel is aware of this copy (so we can still use pkg_npm).

We also need to copy several other files to dist, such as the bundles created by tfjs_bundle, and we need to create miniprogram files for WeChat.

To copy files, we usually use the copy_to_dist rule. This rule creates symlinks to all the files in srcs and places them in a filetree with the same structure in dest_dir (which defaults to dist).

However, we can't just copy the output of a ts_library, since its default output is the .d.ts declaration files. We need to extract the desired ES Module .mjs outputs of the rule and rename them to have the .js extension. The copy_ts_library_to_dist rule does this rename, and it also copies the files to dist (including the .d.ts declaration files).

load("//tools:copy_to_dist.bzl", "copy_to_dist", "copy_ts_library_to_dist")

copy_ts_library_to_dist(
    name = "copy_src_to_dist",
    srcs = [
        "//tfjs-core/src:tfjs-core_lib",
        "//tfjs-core/src:tfjs-core_src_lib",
        "//tfjs-core/src:tfjs-core_test_lib",
    ],
    root = "src", # Consider 'src' to be the root directory of the copy
                  # (i.e. create 'dist/index.js' instead of 'dist/src/index.js')
    dest_dir = "dist", # Where to copy the files to. Defaults to 'dist', so it can
                       # actually be omitted in this case.
)

We can also copy the bundles output by tfjs_bundle:

copy_to_dist(
    name = "copy_bundles",
    srcs = [
        ":tf-core",
        ":tf-core.node",
        ":tf-core.es2017",
        ":tf-core.es2017.min",
        ":tf-core.fesm",
        ":tf-core.fesm.min",
        ":tf-core.min",
    ],
)

We copy the miniprogram files as well, this time using the copy_file rule, which copies a single file to a destination.

load("@bazel_skylib//rules:copy_file.bzl", "copy_file")

copy_file(
    name = "copy_miniprogram",
    src = ":tf-core.min.js",
    out = "dist/miniprogram/index.js",
)

copy_file(
    name = "copy_miniprogram_map",
    src = ":tf-core.min.js.map",
    out = "dist/miniprogram/index.js.map",
)

Now that all the files are copied, we can declare a pkg_npm:

load("@build_bazel_rules_nodejs//:index.bzl", "pkg_npm")

pkg_npm(
    name = "tfjs-core_pkg",
    package_name = "@tensorflow/tfjs-core",
    srcs = [
        # Add any static files the package should include here
        "package.json",
        "README.md",
    ],
    tags = ["ci"],
    deps = [
        ":copy_bundles",
        ":copy_miniprogram",
        ":copy_miniprogram_map",
        ":copy_src_to_dist",
        ":copy_test_snippets", # <- This is only in core, so I've omitted its
                               # definition in these docs.
    ],
)

Now the package can be published to npm with bazel run //tfjs-core:tfjs-core_pkg.publish.

Configure Publishing to npm

With a pkg_npm rule defined, we add a script to package.json to run it. This script will be used by the main script that publishes the monorepo.

"scripts": {
    "publish-npm": "bazel run :tfjs-core_pkg.publish"
}

Since we now use the publish-npm script to publish this package instead of npm publish, we need to make sure the release tests and release script know how to publish it.

  1. In scripts/publish-npm.ts, add your package's name to the BAZEL_PACKAGES set.
  2. In e2e/scripts/publish-tfjs-ci.sh, add your package's name to the BAZEL_PACKAGES list.

You should also add a script to build the package itself without publishing (used for the link-package).

"build": "bazel build :tfjs-core_pkg",

Update Downstream package.json Paths

If no packages depend on your package (i.e. no package.json file includes your package via a link dependency), then you can skip this section.

As a core feature of its design, Bazel places outputs in a different directory than sources. Outputs are symlinked to dist/bin/[package-name]/..... instead of appearing in [package-name]/dist. Due to the different location, all downstream packages' package.json files need to be updated to point to the new outputs. However, due to some details of how Bazel and the Node module resolution algorithm work, we can't point a link: dependency directly at Bazel's output.

Instead, we maintain a link-package pseudopackage where we copy the Bazel outputs. This package allows for correct Node module resolution between Bazel outputs because it has its own node_modules folder. This package will never be published and will be removed once the migration is complete.

Add the Package to link-package

Add your package to the PACKAGES list in the build_deps.ts script in link-package. For a package with npm name @tensorflow/tfjs-foo, the package's directory in the monorepo and the value to add to PACKAGES should both be tfjs-foo. The name of the package's pkg_npm target should be tfjs-foo_pkg.

const PACKAGES: ReadonlySet<string> = new Set([
  ..., 'tfjs-foo',
]);  
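The naming convention described above can be written down as a small function. namesFor is a hypothetical helper used only for illustration; it is not part of build_deps.ts.

```typescript
// Names derived from a package's npm name under the convention above.
interface LinkPackageNames {
  // Directory in the monorepo, which is also the value added to PACKAGES.
  directory: string;
  // Bazel label of the package's pkg_npm target.
  pkgNpmTarget: string;
}

// Hypothetical helper (not part of build_deps.ts).
function namesFor(npmName: string): LinkPackageNames {
  const scope = '@tensorflow/';
  if (npmName.indexOf(scope) !== 0) {
    throw new Error(`Expected a name scoped with ${scope}, got ${npmName}`);
  }
  const directory = npmName.slice(scope.length);
  return {directory, pkgNpmTarget: `//${directory}:${directory}_pkg`};
}

const fooNames = namesFor('@tensorflow/tfjs-foo');
// fooNames.directory is 'tfjs-foo'
// fooNames.pkgNpmTarget is '//tfjs-foo:tfjs-foo_pkg'
```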

Change Downstream Dependency package.json Paths

Update all downstream dependencies that depend on the package to point to its location in the link-package.

"devDependencies": {
  "@tensorflow/tfjs-core": "link:../link-package/node_modules/@tensorflow/tfjs-core",
  "@tensorflow/tfjs-foo": "link:../link-package/node_modules/@tensorflow/tfjs-foo",
},

To find downstream packages, run grep -r --exclude=yarn.lock --exclude-dir=node_modules "link:.*tfjs-foo" . in the root of the repository.
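The rewrite applied to each downstream package.json can be sketched as follows. pointToLinkPackage is a hypothetical helper, not an existing script in the repository, and it assumes the downstream package sits one directory below the repository root, as in the example above.

```typescript
type Dependencies = {[name: string]: string};

// Hypothetical helper: returns a copy of `deps` in which the dependency on
// `npmName`, if it is a `link:` dependency, points into the link-package.
// Dependencies that are not `link:` dependencies are left unchanged.
function pointToLinkPackage(
    deps: Dependencies, npmName: string): Dependencies {
  const updated: Dependencies = {...deps};
  const current = deps[npmName];
  if (current !== undefined && current.indexOf('link:') === 0) {
    updated[npmName] = `link:../link-package/node_modules/${npmName}`;
  }
  return updated;
}

const updatedDeps = pointToLinkPackage(
    {
      '@tensorflow/tfjs-foo': 'link:../tfjs-foo',
      'some-other-dep': '1.0.0',  // Placeholder entry for illustration.
    },
    '@tensorflow/tfjs-foo');
// updatedDeps['@tensorflow/tfjs-foo'] is
// 'link:../link-package/node_modules/@tensorflow/tfjs-foo'
```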

Update Downstream Packages' build-link-package Scripts

In each downstream package's build-link-package script, make sure the new package is listed in the call to yarn build-deps-for. This includes packages that have tfjs-foo as a transitive dependency.

"scripts": {
  "build-link-package": "cd ../link-package && yarn build-deps-for tfjs-foo tfjs-some-other-dependency",
  "build-tfjs-foo": "remove this script", // <-- Don't forget to remove this earlier build script from downstream packages.
}

Move linting to the repo-wide lint script

Add the new Bazel package to the repo-wide tslint tsconfig:

Add the path mapping:

"paths": {
  ...,
  "@tensorflow/the-new-package": ["the-new-package/src/index.ts"],
  "@tensorflow/the-new-package/dist/*": ["the-new-package/src/*"]
}

Also, remove the package from the exclude list.
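The two path mappings follow a fixed pattern based on the package's directory name. The sketch below generates them; lintPathMappings is a hypothetical helper for illustration only, and the entries are normally added to the tsconfig by hand.

```typescript
type PathMappings = {[pattern: string]: string[]};

// Hypothetical helper: builds the two `paths` entries for a package whose
// directory in the monorepo is `packageDir`.
function lintPathMappings(packageDir: string): PathMappings {
  return {
    // The bare package name resolves to the package's entrypoint.
    [`@tensorflow/${packageDir}`]: [`${packageDir}/src/index.ts`],
    // Imports from `dist` resolve to the matching source files.
    [`@tensorflow/${packageDir}/dist/*`]: [`${packageDir}/src/*`],
  };
}

const newPackagePaths = lintPathMappings('the-new-package');
// newPackagePaths['@tensorflow/the-new-package'] is
// ['the-new-package/src/index.ts']
```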

It's a good idea to test that linting is working on the package. Create a lint error in one of its files, e.g. const x = "Hello, world!" (note the double quotes), and then run yarn lint in the root of the repository.

Remove the package's own TypeScript linting scripts:

Remove the package.json lint script, the tslint.json file, and the cloudbuild lint step from the package's cloudbuild.yml file. Remove tslint-related dependencies from the package's package.json and run yarn to regenerate the yarn.lock file.

Update or Remove cloudbuild.yml

Update the cloudbuild.yml to remove any steps that are now built with Bazel. These will be run by the bazel-tests step, which runs before other packages' steps. Any Bazel rule tagged as ci will be tested / built in CI.

Note that the output paths of Bazel-created outputs will be different, so any remaining steps that now rely on Bazel outputs may need to be updated. Bazel outputs are located in tfjs/dist/bin/....

If all steps of the cloudbuild.yml file are handled by Bazel, it can be deleted. Do not remove the package from tfjs/scripts/package_dependencies.json.

Rebuild the cloudbuild golden files by running yarn update-cloudbuild-tests in the root of the repository.

Push to Git

Before pushing to Git, run the Bazel linter by running yarn bazel:format and yarn bazel:lint-fix in the root of the repo. We run the linter in CI, so if your build is failing in CI only, incorrectly formatted files may be the reason.

Done!

🎉🎉🎉

Notes for Reviewing PRs

  • Make sure the package is added to BAZEL_PACKAGES in e2e/scripts/publish-tfjs-ci.sh
  • Make sure the package is added to BAZEL_PACKAGES in scripts/publish-npm.ts
  • Make sure the package generated by pkg_npm has all the files it needs, e.g. the README.
  • Make sure the package is added to the link-package's package.json and that downstream packages are updated to point to the link package's copy instead of the package's directory.
  • For browser tests, it may be worth checking that all desired browser configurations will run in nightly CI.
  • Make sure browser tests include all required tests. The enumerate_tests rule is usually necessary to make the browser actually run tests.
  • Make sure as many cloudbuild steps as possible are converted to Bazel, and that those steps are removed from the cloudbuild file.
  • If the build and tests are fully handled by Bazel and don't need any other cloudbuild steps, make sure the package's cloudbuild.yml file is removed. Do not remove the package from scripts/package_dependencies.json.
  • Make sure tests are tagged with nightly or ci (tfjs_web_test automatically tags tests with nightly and ci).
  • Make sure the main pkg_npm rule is tagged with ci or nightly so all parts of the build are tested.
  • Make sure the package.json scripts are updated and that the package.json includes @bazel/bazelisk as a dev dependency.
  • Make sure the package has a build-npm script and a publish-npm script. These are used by the release script.
  • Check the generated bundle sizes and make sure they don't include any unexpected files. Check the _stats files for info on this.
  • Make sure the package is added to the repo-wide tslint tsconfig and that its original lint scripts are removed.