Add ability for parallel Jenkins multi-arch builds #99

Merged
merged 1 commit into from
Dec 2, 2024
14 changes: 12 additions & 2 deletions Jenkinsfile.build
@@ -1,15 +1,25 @@
// one job per arch (for now) that just builds "the top thing" (triggered by the meta-update job)
// any number of jobs per arch that build the specified buildId (triggered by the respective trigger job)
properties([
disableConcurrentBuilds(),
Member


Oof, hmm, something to think about here:

https://github.com/docker-library/meta/blob/f595ec2375e11540c4e26352274b05247953d763/.github/workflows/build.yml#L28-L30

I wish Jenkins had something like GitHub's concurrency groups, which prevent the same buildId from even attempting to build twice at the same time. With this PR, the only thing that prevents that is that these builds are canonically triggered by the "trigger" job. That's fine for the normal case, but when things go wrong and we're running "build" by hand to debug, nothing will stop "trigger" from firing and potentially clobbering the thing we're testing. 🤔

I think the closest thing Jenkins has is "Lockable Resources" (https://plugins.jenkins.io/lockable-resources/), but they're a really awful experience and, IIRC, nothing cleans them up, so we can't reasonably put an arbitrary number of them into the system. 😭
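
For comparison, a per-buildId lock with the Lockable Resources plugin would look roughly like the sketch below. It is not part of this PR: it assumes the plugin is installed, and the `build-` resource prefix is a made-up name for illustration. Each distinct buildId means a distinct resource name, which is exactly the volume concern raised above.

```groovy
// Hypothetical sketch only (not in this PR).
// Assumes the Lockable Resources plugin is installed and "params.buildId" is set.
// The "build-" prefix is an invented naming convention.
lock(resource: 'build-' + params.buildId) {
	// Only one run holding this resource name executes this block at a time;
	// any other run asking for the same name waits here until it is released.
	echo('building ' + params.buildId)
}
```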

Member Author


I think the "Throttle Concurrent Builds" plugin can at least prevent builds with the same parameters from running at the same time: https://www.gusi.me/2022/05/06/Disable-concurrent-builds-based-on-parameters.html. The blocked build will still be quasi-queued, though.

Member Author


	throttleJobProperty(
		limitOneJobWithMatchingParams: true,
		paramsToUseForLimit: 'identifier, buildId',
		throttleEnabled: true,
		throttleOption: 'project',
	),

OK, I did a local test using the "Throttle Concurrent Builds" plugin with this job config and found that it can stop a job from being added to the Jenkins queue when the specified parameters match those of an already queued or running job.


If the build job is started by something like the trigger job, which uses `waitForStart: true`, then the trigger job will be stuck waiting to add it to the queue and will not start anything else. This should be fine: trigger is the only job that should be starting builds, so this would only happen if we had manually started a build.

// limited by one job per buildId so that the same build cannot run concurrently
throttleJobProperty(
limitOneJobWithMatchingParams: true,
paramsToUseForLimit: 'buildId',
throttleEnabled: true,
throttleOption: 'project',
),
disableResume(),
durabilityHint('PERFORMANCE_OPTIMIZED'),
parameters([
string(name: 'buildId', trim: true),
string(name: 'identifier', trim: true, description: '(optional) used to set <code>currentBuild.displayName</code> to a meaningful value earlier'),
]),
])

env.BASHBREW_ARCH = env.JOB_NAME.minus('/build').split('/')[-1] // "windows-amd64", "arm64v8", etc
env.BUILD_ID = params.buildId
if (params.identifier) {
currentBuild.displayName = params.identifier + ' (#' + currentBuild.number + ')'
}

node('multiarch-' + env.BASHBREW_ARCH) { ansiColor('xterm') {
stage('Checkout') {
125 changes: 78 additions & 47 deletions Jenkinsfile.trigger
@@ -102,69 +102,100 @@ if (breakEarly) { return } // thanks Jenkins...

// new data to be added to the past-jobs.json
// { lastTime: unixTimestamp, url: "" }
def buildCompletionData = [:]

for (buildObj in queue) {
stage(buildObj.identifier) {
//def json = writeJSON(json: buildObj, returnText: true)
//echo(json) // for debugging/data purposes

// "catchError" to set "stageResult" :(
catchError(message: 'Build of "' + buildObj.identifier + '" failed', buildResult: 'UNSTABLE', stageResult: 'FAILURE') {
if (buildObj.gha_payload) {
node {
withEnv([
'payload=' + buildObj.gha_payload,
]) {
withCredentials([
string(
variable: 'GH_TOKEN',
credentialsId: 'github-access-token-docker-library-bot-meta',
),
buildCompletionData = [:]

// list of closures that we can use to wait for the jobs on.
def waitQueue = [:]
def waitQueueClosure(identifier, buildId, externalizableId) {
return {
stage(identifier) {
// "catchError" to set "stageResult" :(
catchError(message: 'Build of "' + identifier + '" failed', buildResult: 'UNSTABLE', stageResult: 'FAILURE') {
def res = waitForBuild(
runId: externalizableId,
propagateAbort: true, // allow cancelling this job to cancel all the triggered jobs
)
buildCompletionData[buildId] = [
lastTime: (res.startTimeInMillis + res.duration) / 1000, // convert to seconds
url: res.absoluteUrl,
]
if (res.result != 'SUCCESS') {
// set stage result via catchError
error(res.result)
}
}
}
}
}

// stage to wrap up all the build job triggers that get waited on later
stage('trigger') {
for (buildObj in queue) {
if (buildObj.gha_payload) {
stage(buildObj.identifier) {
// "catchError" to set "stageResult" :(
catchError(message: 'Build of "' + buildObj.identifier + '" failed', buildResult: 'UNSTABLE', stageResult: 'FAILURE') {
node {
withEnv([
'payload=' + buildObj.gha_payload,
]) {
sh '''
set -u +x

# https://docs.github.com/en/free-pro-team@latest/rest/actions/workflows?apiVersion=2022-11-28#create-a-workflow-dispatch-event
curl -fL \
-X POST \
-H 'Accept: application/vnd.github+json' \
-H "Authorization: Bearer $GH_TOKEN" \
-H 'X-GitHub-Api-Version: 2022-11-28' \
https://api.github.com/repos/docker-library/meta/actions/workflows/build.yml/dispatches \
-d "$payload"
'''
withCredentials([
string(
variable: 'GH_TOKEN',
credentialsId: 'github-access-token-docker-library-bot-meta',
),
]) {
sh '''
set -u +x

# https://docs.github.com/en/free-pro-team@latest/rest/actions/workflows?apiVersion=2022-11-28#create-a-workflow-dispatch-event
curl -fL \
-X POST \
-H 'Accept: application/vnd.github+json' \
-H "Authorization: Bearer $GH_TOKEN" \
-H 'X-GitHub-Api-Version: 2022-11-28' \
https://api.github.com/repos/docker-library/meta/actions/workflows/build.yml/dispatches \
-d "$payload"
'''
}
}
// record that GHA was triggered (for tracking continued triggers that fail to push an image)
buildCompletionData[buildObj.buildId] = [
lastTime: System.currentTimeMillis() / 1000, // convert to seconds
url: currentBuild.absoluteUrl,
]
}
// record that GHA was triggered (for tracking continued triggers that fail to push an image)
buildCompletionData[buildObj.buildId] = [
lastTime: System.currentTimeMillis() / 1000, // convert to seconds
url: currentBuild.absoluteUrl,
]
}
} else {
}
} else {
// "catchError" to set "stageResult" :(
catchError(message: 'Build of "' + buildObj.identifier + '" failed', buildResult: 'UNSTABLE', stageResult: 'FAILURE') {

Member


This blank line is extraneous, but it doesn't bother me to leave it in. 👍

Suggested change

// why not parallel these build() invocations?
// jenkins parallel closures get started in a randomish order, ruining our sorted queue
def res = build(
job: 'build',
parameters: [
string(name: 'buildId', value: buildObj.buildId),
string(name: 'identifier', value: buildObj.identifier),
],
propagate: false,
quietPeriod: 5, // seconds
// trigger these quickly so they all get added to Jenkins queue in "queue" order (also using "waitForStart" means we have to wait for the entire "quietPeriod" before we get to move on and schedule more)
quietPeriod: 0, // seconds
// we'll wait on the builds in parallel after they are all queued (so our sorted order is the queue order)
waitForStart: true,
)
// record the job failure
buildCompletionData[buildObj.buildId] = [
lastTime: (res.startTimeInMillis + res.duration) / 1000, // convert to seconds
url: res.absoluteUrl,
]
if (res.result != 'SUCCESS') {
// set stage result via catchError
error(res.result)
}
waitQueue[buildObj.identifier] = waitQueueClosure(buildObj.identifier, buildObj.buildId, res.externalizableId)
}
}
}
}

// wait on all the 'build' jobs that were queued
if (waitQueue.size() > 0) {
parallel waitQueue
Member


I usually prefer to be explicit about the () in single-argument invocations like this (although I don't usually go as far as "naming" the parameter), but I don't think it really matters much either way. 👍

Suggested change
parallel waitQueue
parallel(waitQueue)
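
Stripped of the diff noise, the trigger-then-wait pattern in this file can be sketched roughly as follows. This is a condensed illustration, not the merged code: it drops the `stage`/`catchError` wrapping and the `buildCompletionData` bookkeeping, and it assumes `queue` is the already-sorted list of objects with `identifier` and `buildId` fields. It captures a per-iteration local (`runId`) instead of using a helper like `waitQueueClosure`, so that each closure sees its own value.

```groovy
// Condensed sketch under the assumptions stated above; not the merged code.
def waitQueue = [:]

// Phase 1: start the downstream runs one at a time, so they enter the
// Jenkins queue in the same order as the sorted "queue".
for (buildObj in queue) {
	def run = build(
		job: 'build',
		parameters: [
			string(name: 'buildId', value: buildObj.buildId),
			string(name: 'identifier', value: buildObj.identifier)
		],
		propagate: false,
		quietPeriod: 0,     // seconds; do not delay the next trigger
		waitForStart: true  // return once the downstream run has started
	)
	// Declared inside the loop body, so each closure captures its own value.
	def runId = run.externalizableId
	waitQueue[buildObj.identifier] = {
		waitForBuild(runId: runId, propagateAbort: true)
	}
}

// Phase 2: wait on all of the started runs at once.
if (waitQueue.size() > 0) {
	parallel(waitQueue)
}
```

The split matters because `parallel` branches start in no guaranteed order; doing only the waiting in parallel keeps the queue order deterministic.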

}

// save currentJobs so we can use it next run as pastJobs
node {
def buildCompletionDataJson = writeJSON(json: buildCompletionData, returnText: true)