
How strict is the 10GB VRAM requirement? #38

Open · MegaMraz opened this issue Aug 19, 2022 · 8 comments

@MegaMraz

I have 6GB VRAM, and here is what I got running txt2img:

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 5.80 GiB total capacity; 4.12 GiB already allocated; 682.94 MiB free; 4.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

with only the options --prompt "tower" and --plms.

Maybe I can reduce the quality or something?
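For reference, the max_split_size_mb hint mentioned in the error message is set through an environment variable before PyTorch touches the GPU; a minimal sketch (the value 128 is just an illustrative guess, not a recommendation):

import os

# Must take effect before the first CUDA allocation, so set it before importing torch.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch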

@leszekhanusz

You could try using the CPU only.
See stable-diffusion PR #33 or latent-diffusion PR #123.
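For example, a minimal CPU-only sketch with the diffusers pipeline (not the exact change from those PRs; the model id, prompt, and step count are illustrative):

from diffusers import StableDiffusionPipeline

# Not moving the pipeline with .to("cuda") keeps everything on the CPU.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
image = pipe("tower", num_inference_steps=25).images[0]
image.save("tower.png")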

@DrakeFruit

DrakeFruit commented Aug 20, 2022

I managed with a 256 x 512 resolution: just pass -W 256 -H 512, or vice versa. I have an RTX 2060 Super.
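On the stock CompVis scripts that would look roughly like the following (assuming the default txt2img.py entry point; some forks spell the flags -W/-H instead of --W/--H):

python scripts/txt2img.py --prompt "tower" --plms --W 256 --H 512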

@kybercore

According to most Reddit comments, this fork requires a lot less VRAM:

https://github.com/basujindal/stable-diffusion/
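Assuming that fork's optimized entry point, an invocation looks roughly like:

python optimizedSD/optimized_txt2img.py --prompt "tower" --H 512 --W 512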

@MegaMraz (Author)

I will try the new flags and forks. Meanwhile, the default setup on a 3080 10GB fails with this:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.00 GiB (GPU 0; 9.78 GiB total capacity; 5.62 GiB already allocated; 2.25 GiB free; 5.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have installed the nightly torch build, because the stable build only ships sm_70 kernels, which is not suitable for Ampere GPUs.
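Assuming a CUDA 11.6 build, the nightly install looks roughly like this (index URL per the PyTorch install matrix of the time):

pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu116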

@MegaMraz (Author)

Decreasing the resolution helped. The 10GB card now does its thing, thanks.

@Bendito999

I've been running 512x512 just fine on 1.4 on my 6GB RTX 2060 laptop to animate prompt-walk morphs.
I put it in a GitHub comment on this very cool script that does those: https://gist.github.com/nateraw/c989468b74c616ebbc6474aa8cdd9e53
Something similar can probably be set in the main scripts (I will try this later with the real txt2img and img2img), but this is how I set it in that 'walk between prompts' example above.
Basically, adding torch_dtype=torch.float16 optimizes it to work within 6GB, probably with some speed tradeoff, but it is fine.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16, use_auth_token=True)
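For completeness, a minimal usage sketch under the same half-precision setup (the prompt and filename are placeholders, and .images assumes a diffusers release where the pipeline output exposes it):

pipe = pipe.to("cuda")
image = pipe("a tower").images[0]
image.save("tower.png")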

I also had to make sure the rest of the desktop ran on the integrated GPU instead of taking up GPU VRAM, to save enough space to get it running. To do this:
NVIDIA X Server Settings -> PRIME Profiles -> select NVIDIA (On Demand) (instead of Performance).
Then I rebooted and could have a bunch of desktop windows open at the same time as running the diffusion.

Note that for 1.3 I was using the basujindal fork of txt2img and img2img, and those worked fine too; I think that fork pulls a lot of VRAM tricks to get it in under 4GB, but with speed tradeoffs. 1.4 probably works fine with that fork too, I just haven't tried it yet.

@breadbrowser

Or maybe just use this: https://huggingface.co/spaces/stabilityai/stable-diffusion

@Coskon

Coskon commented Nov 9, 2022

I'm currently generating up to 640x960 or 832x704 (around 600k pixels) with a 2GB GTX 1050 using --lowvram. It's pretty slow, but it's about 3 times faster than just using the CPU, and it's impressive that it even runs with 2GB of VRAM.
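For context, --lowvram is a launch flag of the AUTOMATIC1111 stable-diffusion-webui; assuming that is the tool in use here, the launch is roughly:

python launch.py --lowvram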
