
Custom docker images a lot larger than expected #253

Open
MiTPenguin opened this issue Sep 14, 2021 · 3 comments

Comments

@MiTPenguin

I have to preface this by saying I'm not very experienced with Docker, and most of what I'm doing stems from this Terra community guide.

I am trying to update my custom Terra images (in particular, moving to newer R and Bioconductor versions). I have done this successfully in the past, but haven't updated them in a while. So I decided to start from the latest images available and just added some additional packages to the Bioconductor install list (or from CRAN) in the Dockerfiles.

My built images ended up being a lot larger than I expected (>15 GB), whereas my old versions were around ~1 GB. I tried building one of these images directly from the Dockerfile, with no modification, and the result was still almost as large. Curiously, it is also much larger than the corresponding images listed on the Broad GCR (for example: us.gcr.io/broad-dsp-gcr-public/terra-jupyter-bioconductor:2.0.0).

Am I missing something here? Was there a compression step that I might've missed? With images this large, it's obviously very difficult to push to Docker Hub, which is how I had been accessing my images.

@rtitle
Collaborator

rtitle commented Sep 14, 2021

Hi @MiTPenguin,

The size jump is expected. The size increased when the Terra images were changed to extend from the Google Deep Learning image family; all the images in that family are 13-15 GB.

When an image is launched in Terra, the base images are cached on the VM boot disk, so the entire 15 GB doesn't need to be re-downloaded at launch time; only your extensions on top of it are pulled.
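If you want to confirm where the size is coming from, one quick check (the image tag here is just an example) is to look at the per-layer sizes:

# List the layers of the image with their sizes; the large layers
# come from the Deep Learning base image, not from your additions.
docker history us.gcr.io/broad-dsp-gcr-public/terra-jupyter-bioconductor:2.0.0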

The large size does make it unwieldy to work with and push locally, though. One trick I've used in the past is to build and push images from a cloud VM (it could even be a Terra VM), which is likely to have a faster internet connection. In terms of Docker repositories, we currently support Docker Hub, GCR, and GitHub Container Registry. We also have an open ticket to support quay.io as another free option (especially since Docker Hub started introducing rate limiting of requests).
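As a rough sketch of that workflow on a Google Cloud VM with Docker installed (the project and image names below are only placeholders):

# Authenticate Docker to Google Container Registry (one-time setup)
gcloud auth configure-docker

# Build the custom image from your Dockerfile, then push it; if the base
# layers already exist in the registry, only the layers you added on top
# of the base image need to be uploaded.
docker build -t us.gcr.io/my-project/my-terra-image:0.1 .
docker push us.gcr.io/my-project/my-terra-image:0.1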

@MiTPenguin
Author

Thanks @rtitle. Just to be clear about what's going on when I'm creating the image: all I need to carry with me is the Dockerfile, correct? I shouldn't need all the other files in this repo (there are some test scripts and other things that I'm not sure about).

Would building the image on a VM also solve some of the problems with apt-key? I've had to change that line to "apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 51716619E084DAB9" to get it to run on my end; otherwise it couldn't reach the keyserver.

@rtitle
Collaborator

rtitle commented Sep 14, 2021

You shouldn't really need this repo at all -- e.g. you could make a Dockerfile like:

FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-bioconductor:2.0.1

USER root

# install the extra package(s) you need on top of the base image
RUN R -e 'BiocManager::install("my-awesome-package")'

USER $USER

That Dockerfile can live outside of this repo (e.g. it could be pushed to another repo).
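As a variation on that sketch, since the original question mentions CRAN packages as well, both kinds of installs follow the same pattern (the package names below are only placeholders):

FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-bioconductor:2.0.1

USER root

# Bioconductor packages (placeholder names)
RUN R -e 'BiocManager::install(c("DESeq2", "limma"))'

# CRAN packages (placeholder names)
RUN R -e 'install.packages(c("data.table", "optparse"), repos = "https://cran.r-project.org")'

USER $USER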

With regard to apt-key, I think that's needed to install R from the Ubuntu repository: https://cloud.r-project.org/bin/linux/ubuntu/fullREADME.html#using-apt-key. Not really sure what exact issue you're seeing, but that should only be needed if you're installing R packages using apt-get, I think.
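For illustration, a minimal sketch of what that typically looks like in a Dockerfile when installing R packages via apt-get, using the keyserver workaround and key ID you quoted above (the package name is a placeholder):

# Register the key for the CRAN Ubuntu repository; pinning the keyserver
# to port 80 is the workaround mentioned above for restricted networks.
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 51716619E084DAB9

# Install an R package from the Ubuntu repository (placeholder package)
RUN apt-get update && apt-get install -y r-cran-data.table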
