Refactor data model to support more atomic index generation #27
cacheFunc, wasCached := cacheResolve.LoadOrStore(refString, sync.OnceValues(func() (*ocispec.Index, error) {
	return registry.SynthesizeIndex(ctx, ref)
}))
Technically now that we're transparently caching all <=4MiB registry responses in-memory we don't need this to cache, but it's currently acting as a mutex (and there are some TODOs in the cache code related to mutexes that we also need to resolve for this to fully parallelize properly). There's also enough work being done by SynthesizeIndex to make it seem pretty reasonable to leave this as-is for now (otherwise it'll re-parse/re-process the same JSON documents each time it hits the same ref).
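For anyone unfamiliar with the pattern in the snippet above, here is a minimal, self-contained sketch of how `sync.Map.LoadOrStore` combined with `sync.OnceValues` (Go 1.21+) behaves as a per-key mutex plus cache. The `onceCache` type, the `countCalls` helper, and the string payload are hypothetical stand-ins for illustration only; the real code stores the result of `registry.SynthesizeIndex`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// onceCache (hypothetical) deduplicates expensive lookups per key: the first
// caller for a key runs the function, and concurrent or later callers block
// on and then reuse that same result.
type onceCache[V any] struct {
	m sync.Map // key string -> func() (V, error)
}

// get runs compute at most once per key and returns the shared result.
func (c *onceCache[V]) get(key string, compute func() (V, error)) (V, error) {
	// Only the wrapper that wins the LoadOrStore race is ever invoked;
	// wrappers created by losing callers are simply discarded.
	f, _ := c.m.LoadOrStore(key, sync.OnceValues(compute))
	return f.(func() (V, error))()
}

// countCalls runs n concurrent lookups of the same key and reports how many
// times the stand-in "expensive" function actually executed.
func countCalls(n int) int64 {
	var cache onceCache[string]
	var calls atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = cache.get("docker:dind", func() (string, error) {
				calls.Add(1) // stands in for the parse/process work
				return "synthesized index", nil
			})
		}()
	}
	wg.Wait()
	return calls.Load()
}

func main() {
	fmt.Println("expensive calls for 8 concurrent lookups:", countCalls(8))
}
```

The point of the sketch is that the "mutex" behavior comes for free: callers that lose the race wait inside the `OnceValues` wrapper instead of redoing the work.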
Here's the easiest way to illustrate what the output of
$ ./.go-env.sh go run lookup.go docker:dind
+ exec docker run --interactive --rm --init --user 1000:1000 --mount type=bind,src=/home/tianon/docker/doi-meta/.scripts,dst=/app --workdir /app --tmpfs /tmp,exec --env HOME=/tmp --env GOPATH=/go --mount type=volume,src=doi-meta-gopath,dst=/go --env GOCACHE=/go/.cache --env CGO_ENABLED=0 --env GOTOOLCHAIN=local --env GODEBUG --env GOFLAGS --env GOOS --env GOARCH --env GO386 --env GOAMD64 --env GOARM --env DOCKERHUB_PUBLIC_PROXY --env DOCKERHUB_PUBLIC_PROXY_HOST --tty golang:1.21 go run lookup.go docker:dind
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.index.v1+json",
"manifests": [
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:c84968d89ea608b1c71c19f27346b6e4b215544c82a5825940073b454c3fc598",
"size": 4004,
"annotations": {
"com.docker.official-images.bashbrew.arch": "amd64",
"org.opencontainers.image.base.digest": "sha256:93fd3077b1577a8de4e8c6437cea5713b7c38b319d42373f09fb30f9c227bc87",
"org.opencontainers.image.base.name": "docker:25-cli",
"org.opencontainers.image.created": "2024-02-07T02:49:16Z",
"org.opencontainers.image.ref.name": "docker.io/library/docker:dind@sha256:c84968d89ea608b1c71c19f27346b6e4b215544c82a5825940073b454c3fc598",
"org.opencontainers.image.revision": "de55ce1ae86abd97836f99759ae8badc30d0e0e6",
"org.opencontainers.image.source": "https://github.com/docker-library/docker.git#de55ce1ae86abd97836f99759ae8badc30d0e0e6:25/dind",
"org.opencontainers.image.url": "https://hub.docker.com/_/docker",
"org.opencontainers.image.version": "25.0.3-dind"
},
"platform": {
"architecture": "amd64",
"os": "linux"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:e87ef0aa213c9b4a33325132ed784d79c4e6d6a6326ce7d31d60d5ccfc1851c4",
"size": 840,
"annotations": {
"com.docker.official-images.bashbrew.arch": "amd64",
"org.opencontainers.image.ref.name": "docker.io/library/docker:dind@sha256:e87ef0aa213c9b4a33325132ed784d79c4e6d6a6326ce7d31d60d5ccfc1851c4",
"vnd.docker.reference.digest": "sha256:c84968d89ea608b1c71c19f27346b6e4b215544c82a5825940073b454c3fc598",
"vnd.docker.reference.type": "attestation-manifest"
},
"platform": {
"architecture": "unknown",
"os": "unknown"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:a886b118a9c0f75fa6c6db47ff87717703e763824d56c8a35125feaf89f17338",
"size": 4006,
"annotations": {
"com.docker.official-images.bashbrew.arch": "arm32v6",
"org.opencontainers.image.base.digest": "sha256:6fc269faa9944670876f14c57619a64010a64534b7646cb21eeec402ba60cf64",
"org.opencontainers.image.base.name": "docker:25-cli",
"org.opencontainers.image.created": "2024-02-07T10:50:38Z",
"org.opencontainers.image.ref.name": "docker.io/library/docker:dind@sha256:a886b118a9c0f75fa6c6db47ff87717703e763824d56c8a35125feaf89f17338",
"org.opencontainers.image.revision": "de55ce1ae86abd97836f99759ae8badc30d0e0e6",
"org.opencontainers.image.source": "https://github.com/docker-library/docker.git#de55ce1ae86abd97836f99759ae8badc30d0e0e6:25/dind",
"org.opencontainers.image.url": "https://hub.docker.com/_/docker",
"org.opencontainers.image.version": "25.0.3-dind"
},
"platform": {
"architecture": "arm",
"os": "linux",
"variant": "v6"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:3b4f8e53edfcf31851d5f127b5e66880d829eecaa1429076e620f2796b2ae311",
"size": 567,
"annotations": {
"com.docker.official-images.bashbrew.arch": "arm32v6",
"org.opencontainers.image.ref.name": "docker.io/library/docker:dind@sha256:3b4f8e53edfcf31851d5f127b5e66880d829eecaa1429076e620f2796b2ae311",
"vnd.docker.reference.digest": "sha256:a886b118a9c0f75fa6c6db47ff87717703e763824d56c8a35125feaf89f17338",
"vnd.docker.reference.type": "attestation-manifest"
},
"platform": {
"architecture": "unknown",
"os": "unknown"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:97354094d74920f349c7687639528353c732159aeb91f76fcf083ccfaf031489",
"size": 4006,
"annotations": {
"com.docker.official-images.bashbrew.arch": "arm32v7",
"org.opencontainers.image.base.digest": "sha256:741c325d7b83c5341c593a5299983014f75ce031dec81cb8cdfa0ff316009d34",
"org.opencontainers.image.base.name": "docker:25-cli",
"org.opencontainers.image.created": "2024-02-07T03:39:51Z",
"org.opencontainers.image.ref.name": "docker.io/library/docker:dind@sha256:97354094d74920f349c7687639528353c732159aeb91f76fcf083ccfaf031489",
"org.opencontainers.image.revision": "de55ce1ae86abd97836f99759ae8badc30d0e0e6",
"org.opencontainers.image.source": "https://github.com/docker-library/docker.git#de55ce1ae86abd97836f99759ae8badc30d0e0e6:25/dind",
"org.opencontainers.image.url": "https://hub.docker.com/_/docker",
"org.opencontainers.image.version": "25.0.3-dind"
},
"platform": {
"architecture": "arm",
"os": "linux",
"variant": "v7"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:b836f12139e632159518a402a934a752ec1b9e2e7da1508c3483033a3a840671",
"size": 840,
"annotations": {
"com.docker.official-images.bashbrew.arch": "arm32v7",
"org.opencontainers.image.ref.name": "docker.io/library/docker:dind@sha256:b836f12139e632159518a402a934a752ec1b9e2e7da1508c3483033a3a840671",
"vnd.docker.reference.digest": "sha256:97354094d74920f349c7687639528353c732159aeb91f76fcf083ccfaf031489",
"vnd.docker.reference.type": "attestation-manifest"
},
"platform": {
"architecture": "unknown",
"os": "unknown"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:a0033ee30d87d9255fcec7bda4171da530dc134fd1d68a4ec7a1af1a9b54adc3",
"size": 4006,
"annotations": {
"com.docker.official-images.bashbrew.arch": "arm64v8",
"org.opencontainers.image.base.digest": "sha256:2ea7f9e11392461bb40958afd8c2ca0db345159dc84b61a6411a61426c091dd1",
"org.opencontainers.image.base.name": "docker:25-cli",
"org.opencontainers.image.created": "2024-02-07T03:26:48Z",
"org.opencontainers.image.ref.name": "docker.io/library/docker:dind@sha256:a0033ee30d87d9255fcec7bda4171da530dc134fd1d68a4ec7a1af1a9b54adc3",
"org.opencontainers.image.revision": "de55ce1ae86abd97836f99759ae8badc30d0e0e6",
"org.opencontainers.image.source": "https://github.com/docker-library/docker.git#de55ce1ae86abd97836f99759ae8badc30d0e0e6:25/dind",
"org.opencontainers.image.url": "https://hub.docker.com/_/docker",
"org.opencontainers.image.version": "25.0.3-dind"
},
"platform": {
"architecture": "arm64",
"os": "linux",
"variant": "v8"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:7c4319183a3e7fe811e619fd4752531245e4b1710149a96f47c2e84d8e58f9ef",
"size": 840,
"annotations": {
"com.docker.official-images.bashbrew.arch": "arm64v8",
"org.opencontainers.image.ref.name": "docker.io/library/docker:dind@sha256:7c4319183a3e7fe811e619fd4752531245e4b1710149a96f47c2e84d8e58f9ef",
"vnd.docker.reference.digest": "sha256:a0033ee30d87d9255fcec7bda4171da530dc134fd1d68a4ec7a1af1a9b54adc3",
"vnd.docker.reference.type": "attestation-manifest"
},
"platform": {
"architecture": "unknown",
"os": "unknown"
}
}
],
"annotations": {
"org.opencontainers.image.ref.name": "docker.io/library/docker:dind@sha256:915cd1624f521b6337f135075f712c8fb14c0b151595c6144d7ce05d2f257869"
}
}
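In the index output above, each attestation manifest points back at its image manifest through the `vnd.docker.reference.digest` annotation. As a rough illustration of why keeping the per-manifest annotations is useful, the sketch below pairs them up from the raw JSON. The `descriptor` and `index` types and the `attestationsByImage` helper are hypothetical, trimmed-down stand-ins (not the types used in this PR), and `sampleIndex` is a shortened copy of the amd64 entries from the output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// descriptor is a trimmed-down, hypothetical stand-in for an OCI descriptor.
type descriptor struct {
	MediaType   string            `json:"mediaType"`
	Digest      string            `json:"digest"`
	Size        int64             `json:"size"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// index holds only the part of an image index this sketch needs.
type index struct {
	Manifests []descriptor `json:"manifests"`
}

// attestationsByImage maps each image manifest digest to the digests of the
// attestation manifests whose annotations reference it.
func attestationsByImage(raw []byte) (map[string][]string, error) {
	var idx index
	if err := json.Unmarshal(raw, &idx); err != nil {
		return nil, err
	}
	out := map[string][]string{}
	for _, d := range idx.Manifests {
		if d.Annotations["vnd.docker.reference.type"] != "attestation-manifest" {
			continue // a regular image manifest, not an attestation
		}
		subject := d.Annotations["vnd.docker.reference.digest"]
		out[subject] = append(out[subject], d.Digest)
	}
	return out, nil
}

// sampleIndex is abbreviated from the docker:dind output (amd64 entries only).
const sampleIndex = `{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:c84968d89ea608b1c71c19f27346b6e4b215544c82a5825940073b454c3fc598",
      "size": 4004,
      "annotations": {"com.docker.official-images.bashbrew.arch": "amd64"}
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:e87ef0aa213c9b4a33325132ed784d79c4e6d6a6326ce7d31d60d5ccfc1851c4",
      "size": 840,
      "annotations": {
        "com.docker.official-images.bashbrew.arch": "amd64",
        "vnd.docker.reference.digest": "sha256:c84968d89ea608b1c71c19f27346b6e4b215544c82a5825940073b454c3fc598",
        "vnd.docker.reference.type": "attestation-manifest"
      }
    }
  ]
}`

func main() {
	pairs, err := attestationsByImage([]byte(sampleIndex))
	if err != nil {
		panic(err)
	}
	for image, attestations := range pairs {
		fmt.Println(image, "<-", attestations)
	}
}
```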
I guess it's even more interesting to post an example from something like
$ ./.go-env.sh go run lookup.go mcr.microsoft.com/windows/nanoserver@sha256:48aac1dc133f7b6b4985b17303a5c97d422f915b769a05fbca350a0fd438b4df
+ exec docker run --interactive --rm --init --user 1000:1000 --mount type=bind,src=/home/tianon/docker/doi-meta/.scripts,dst=/app --workdir /app --tmpfs /tmp,exec --env HOME=/tmp --env GOPATH=/go --mount type=volume,src=doi-meta-gopath,dst=/go --env GOCACHE=/go/.cache --env CGO_ENABLED=0 --env GOTOOLCHAIN=local --env GODEBUG --env GOFLAGS --env GOOS --env GOARCH --env GO386 --env GOAMD64 --env GOARM --env DOCKERHUB_PUBLIC_PROXY --env DOCKERHUB_PUBLIC_PROXY_HOST --tty golang:1.21 go run lookup.go mcr.microsoft.com/windows/nanoserver@sha256:48aac1dc133f7b6b4985b17303a5c97d422f915b769a05fbca350a0fd438b4df
{
"schemaVersion": 2,
"mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
"manifests": [
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"digest": "sha256:48aac1dc133f7b6b4985b17303a5c97d422f915b769a05fbca350a0fd438b4df",
"size": 429,
"annotations": {
"com.docker.official-images.bashbrew.arch": "windows-amd64",
"org.opencontainers.image.ref.name": "mcr.microsoft.com/windows/nanoserver@sha256:48aac1dc133f7b6b4985b17303a5c97d422f915b769a05fbca350a0fd438b4df"
},
"platform": {
"architecture": "amd64",
"os": "windows",
"os.version": "10.0.20348.2322"
}
}
],
"annotations": {
"org.opencontainers.image.ref.name": "mcr.microsoft.com/windows/nanoserver@sha256:48aac1dc133f7b6b4985b17303a5c97d422f915b769a05fbca350a0fd438b4df"
}
}
Converting to draft so I can spend more time splitting up the monolith that is
355721e to 51c004b
As a note for reviewers, it might be helpful to review the general shape of https://pkg.go.dev/cuelabs.dev/go/oci/ociregistry before reviewing this because it makes heavy use of that interface -- it's designed for writing registry servers, but if you make a server look like RPC, then a client is just a consumer of a server interface and that's what (This is the library that actually renewed my otherwise extremely dwindling faith in the interface model of Go because it models the problem space here so well 😅)
After being away from this code for a bit and now re-reviewing it with fresh eyes, I think it needs more godoc/function comments, so I'm going to add those (but generally still feel good about the shape).
397f082 to 8114545
Ok, added some godocs (and moved a few functions around so it's easier to follow the logic/flow and so things are grouped more correctly).
(Added a test for the function that's really trivial to test correctly -- intend to add more tests over time, but this seems OK to start since more tests require more refactoring or writing what are effectively integration tests which will likely fail often for unrelated reasons. Edit: and this code is tested in large part by our existing smoke test in this repository, so I'm not as worried about it being tested in Go.)
Full cache bust took ~24 minutes and 12 seconds. Running again immediately after (full cache hit) took 35s.
🔥 (Even though our
(This is the first half of the work for #22)
This adjusts our data model to store/track data from the registry as an image index (with more data fidelity) instead of as a custom data structure. One of the most notable benefits is that we can now track the annotation objects for each architecture, but even more importantly, this allows us to generate the indexes we need to push in deploy directly from data that's on-disk (where the old structure would require querying the registry).
Notably, however, this does not change deploy (yet) -- those changes are still in progress, but this was a large enough refactor/rewrite that I figured I should start here.
This also switches us from using containerd's registry client library to using [`cuelabs.dev/go/oci/ociregistry`](https://pkg.go.dev/cuelabs.dev/go/oci/ociregistry), which is much closer to the distribution-spec APIs (modeled/designed after them, in fact), where containerd's (and frankly most others) are a higher-level abstraction. This is important because we're running into rate limits on the raw number of requests, and almost every library for this always starts with a `HEAD` before pushing content, and this will allow us to go directly for a `PUT` (and then only do the "copy child objects" dance if the `PUT` fails).
This also drops our `allTags` data entry (which we only ever used the first value of, and then only to identify a `sourceId`/`buildId` in a human-meaningful way, because it's not useful/meaningful for any other use case), and moves `tags` from being arch-specific up to the per-source-object level because it's always identical across all arches for a given `sourceId` so it's silly to copy it N times for every architecture object.
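The "`PUT` first, copy children only on failure" idea can be sketched roughly as follows. This is not the deploy code (which this PR does not change) and it does not use the `ociregistry` API: the `pusher` interface, `pushOptimistic`, and `fakeRegistry` are all hypothetical names invented for illustration. It also assumes that any `PUT` failure triggers the fallback, where a real implementation would likely inspect the error first.

```go
package main

import (
	"errors"
	"fmt"
)

// pusher is a hypothetical, simplified stand-in for a registry client.
type pusher interface {
	PutManifest(ref string, manifest []byte) error
	CopyChildren(ref string, manifest []byte) error
}

// pushOptimistic tries the PUT straight away (no HEAD beforehand) and only
// copies child objects, then retries once, if that first PUT fails.
func pushOptimistic(p pusher, ref string, manifest []byte) error {
	if err := p.PutManifest(ref, manifest); err == nil {
		return nil // common case: a single request
	}
	if err := p.CopyChildren(ref, manifest); err != nil {
		return fmt.Errorf("copying child objects for %s: %w", ref, err)
	}
	return p.PutManifest(ref, manifest)
}

// fakeRegistry counts requests so the saving is visible; it rejects a PUT
// until the referenced children have been copied.
type fakeRegistry struct {
	hasChildren  bool
	puts, copies int
}

func (r *fakeRegistry) PutManifest(ref string, manifest []byte) error {
	r.puts++
	if !r.hasChildren {
		return errors.New("referenced content missing")
	}
	return nil
}

func (r *fakeRegistry) CopyChildren(ref string, manifest []byte) error {
	r.copies++
	r.hasChildren = true
	return nil
}

func main() {
	warm := &fakeRegistry{hasChildren: true}
	cold := &fakeRegistry{}
	for _, r := range []*fakeRegistry{warm, cold} {
		err := pushOptimistic(r, "example.invalid/repo:tag", []byte("{}"))
		fmt.Println("puts:", r.puts, "copies:", r.copies, "err:", err)
	}
}
```

When the children already exist in the target, the push costs one request instead of a `HEAD` plus a `PUT`, which is the saving that matters under request-count rate limits.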