Nightly Tests
Every night, a set of tests runs as part of the TeamCity project Nightlies. These tests have a few common characteristics:
- They set up a temporary CockroachDB cluster and run load against it.
- Their runtime is too long for them to be included in CI.
All nightly tests, except for Jepsen, use Terraform to create and destroy their temporary cluster. It may be wise to remove Terraform in the future, given the cognitive overhead of using a tool that provides much more functionality than we need.
TeamCity jobs execute various bash scripts that, in turn, run the relevant tests. These files are named build/teamcity-*.sh. Key files include:
- build/teamcity-nightly-acceptance.sh - Wrapper script for running allocator, continuous load, and backup & restore tests.
- build/teamcity-jepsen.sh - Runs Jepsen tests.
- The simplest way to run a nightly test is to go to the Nightlies project, find the test you want to run, and click the Run button.
- To specify flags for cockroach for a single test run, click the ... button next to the Run button for a test. Then, go to the Parameters tab and specify a value for env.COCKROACH_EXTRA_FLAGS.
- To launch a test locally, use the appropriate build/teamcity-*.sh script. For many nightlies, this is build/teamcity-nightly-acceptance.sh. See the comments at the top of that script for setup steps.
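TeamCity passes parameters prefixed with env. to the build as environment variables, so env.COCKROACH_EXTRA_FLAGS reaches the scripts as COCKROACH_EXTRA_FLAGS. The sketch below shows how a local run might set the same variable. It assumes the wrapper script also honors that variable outside TeamCity, which this page does not confirm; the helper name nightly_invocation and the flag value are hypothetical, and the script's own setup steps still apply.

```python
import os
import shlex


def nightly_invocation(script, extra_flags, base_env=None):
    """Return (argv, env) for running a nightly wrapper script locally.

    Hypothetical helper: it only prepares the command and environment,
    it does not run anything.
    """
    env = dict(os.environ if base_env is None else base_env)
    # TeamCity exposes the parameter env.COCKROACH_EXTRA_FLAGS to the build
    # as the environment variable COCKROACH_EXTRA_FLAGS.
    env["COCKROACH_EXTRA_FLAGS"] = extra_flags
    return [script], env


if __name__ == "__main__":
    argv, env = nightly_invocation(
        "build/teamcity-nightly-acceptance.sh",
        "--vmodule=allocator=2",  # hypothetical flag value, for illustration only
    )
    # Print the equivalent shell command instead of executing it, because the
    # script needs the setup described in its header comments.
    print(
        "COCKROACH_EXTRA_FLAGS=" + shlex.quote(env["COCKROACH_EXTRA_FLAGS"]),
        shlex.join(argv),
    )
```

To actually start the run, the returned pair could be handed to subprocess.run(argv, env=env, check=True) from the repository root.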
- pkg/acceptance/terrafarm - terrafarm.(*Farmer) is our thin wrapper around Terraform. It's used by most nightlies to set up, interact with, and destroy the temporary cluster for the test.
- pkg/acceptance/terraform/azure - Contains the Terraform config files. Reference: Terraform configuration docs.
- pkg/acceptance/allocator_test.go - Allocator tests, including the schema change test and test steady 6 nodes.
- pkg/acceptance/continuous_load_test.go - Continuous load tests.
The allocator tests stress the replica allocator under load. At a high level, they do the following:
- Create a temporary cluster.
- Restore tarballs of test data (TPC-H data sets with various scale factors) onto each node in the cluster.
- Add new nodes to the cluster. The only current exception to this is the "steady 6 nodes" test.
- Start load generators.
- Wait until the replica allocators reach equilibrium (no replicas added/removed in the last N minutes).
- The test passes only if the standard deviation of range counts is lower than the threshold (set to 5% of the mean range count). This must happen before TESTTIMEOUT elapses.
- Destroy the temporary cluster.
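The equilibrium wait and the pass condition can be illustrated with a short sketch. This is not the code in pkg/acceptance/allocator_test.go (which is written in Go); it is a Python illustration under stated assumptions: the function names are hypothetical, equilibrium is approximated as "per-node range counts unchanged for N minutes", the population standard deviation is used, and an unbalanced cluster at equilibrium is treated as an immediate failure.

```python
import statistics


def is_balanced(range_counts, threshold_fraction=0.05):
    """True if the stddev of per-node range counts is below 5% of their mean."""
    mean = statistics.mean(range_counts)
    # Assumption: population standard deviation; the real test may differ.
    stddev = statistics.pstdev(range_counts)
    return stddev < threshold_fraction * mean


def allocator_test_passes(samples, quiet_minutes, timeout_minutes,
                          threshold_fraction=0.05):
    """Decide pass/fail from time-ordered (minute, per-node range counts) samples.

    Equilibrium is approximated as: the counts have not changed for
    quiet_minutes. The balance check runs once, when equilibrium is reached.
    """
    previous, last_change = None, None
    for minute, counts in samples:
        if minute >= timeout_minutes:
            return False  # TESTTIMEOUT elapsed before equilibrium
        if counts != previous:
            previous, last_change = counts, minute
        elif minute - last_change >= quiet_minutes:
            return is_balanced(counts, threshold_fraction)
    return False  # ran out of samples without reaching equilibrium


if __name__ == "__main__":
    # Three original nodes shed ranges to three newly added nodes.
    converging = [
        (0, [10, 10, 10, 0, 0, 0]),
        (5, [8, 8, 8, 2, 2, 2]),
        (10, [5, 5, 5, 5, 5, 5]),
        (15, [5, 5, 5, 5, 5, 5]),
        (20, [5, 5, 5, 5, 5, 5]),
    ]
    print(allocator_test_passes(converging, quiet_minutes=10, timeout_minutes=60))
```

The threshold is relative to the mean, so it scales with the amount of test data: with a mean of 100 ranges per node, the standard deviation must stay below 5 ranges.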
The continuous load tests are straightforward: they set up test clusters and run load against them. They pass if TESTTIMEOUT elapses with no crashes and no periods with 0 QPS.
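The pass condition for these tests amounts to two checks over the whole TESTTIMEOUT window. The sketch below is a Python illustration, not the logic in pkg/acceptance/continuous_load_test.go; the function names, the idea of sampling QPS once per reporting interval, and the list of crashed nodes are all assumptions.

```python
def zero_qps_periods(qps_samples):
    """Return the indexes of the reporting intervals that saw 0 QPS."""
    return [i for i, qps in enumerate(qps_samples) if qps == 0]


def continuous_load_passes(qps_samples, crashed_nodes):
    """True if no node crashed and every reporting interval had nonzero QPS.

    qps_samples is assumed to cover the full TESTTIMEOUT window, one value
    per reporting interval; crashed_nodes lists nodes that crashed in it.
    """
    if crashed_nodes:
        return False
    return not zero_qps_periods(qps_samples)


if __name__ == "__main__":
    print(continuous_load_passes([120, 95, 130], crashed_nodes=[]))
    print(continuous_load_passes([120, 0, 130], crashed_nodes=[]))
```

Reporting which intervals stalled, rather than only a boolean, makes it easier to line a failure up with the cluster logs.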
- Care should be taken when upgrading Terraform. Various backward-incompatible changes have been introduced over time (e.g. terraform init).
- The cloud provider Terraform uses for the temporary clusters is independent of the cloud provider used by TeamCity agents. For example, at the time of writing, TeamCity agents run on GCE, and most Terraform clusters run on Azure.
- Azure-based tests can take a long time to iterate on.
- Core dumps aren't enabled for cockroach