
Inference hardware requirements #2

Open
abrichr opened this issue Oct 10, 2024 · 8 comments

abrichr commented Oct 10, 2024

Hello, and thank you for the excellent work!

In the paper it says:

The first stage takes about 50 hours on a single 4x NVIDIA A100 machine (global batch size 128 with gradient
accumulation). And for the large scale GUI data training, we use 112 NVIDIA H100 GPUs and finish the
training in about 6 hours (global batch size 448).

Can you please clarify what the inference-time hardware requirements are? Any chance of running this on CPU?

Thanks again!

boyugou (Collaborator) commented Oct 10, 2024

Overall, it's built on LLaVA with slight adaptations (mainly to the input image processing), so it's definitely possible to run it on CPU (take Ollama as a reference). I remember 4-bit LLaVA running very smoothly on my laptop.
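
For example, if you create a local Ollama model from a quantized GGUF of the checkpoint (UGround itself is not in the Ollama library, so the model name below is only a placeholder), screenshots can be sent as base64 through Ollama's /api/generate endpoint. A rough Python sketch:

```python
# Rough sketch: CPU inference through a local Ollama server.
# Assumes a local Ollama model named "uground" was created from a quantized
# GGUF via a Modelfile -- the name and file paths here are placeholders.
import base64
import json
import urllib.request

with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "uground",                      # placeholder local model name
    "prompt": 'Where is the "Submit" button?',
    "images": [image_b64],                   # Ollama accepts base64-encoded images
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```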

@dprokhorov17

@abrichr I'm running this within a Docker container and my memory footprint is the following:

[screenshot of container memory usage]

It's running at bf16 precision with cached=True.
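
For reference, a minimal sketch of what that setup looks like in plain transformers (the llava-hf checkpoint below is only a stand-in for the actual weights, and use_cache=True is the generate() flag I mean by cached=True):

```python
# Minimal sketch: bf16 inference with KV caching, using a Hub LLaVA checkpoint
# as a stand-in for the actual model weights.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"        # stand-in checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,              # bf16 halves memory vs. fp32
    device_map="auto",
)

image = Image.open("screenshot.png")
prompt = "USER: <image>\nWhere is the search box? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.bfloat16
)
out = model.generate(**inputs, max_new_tokens=64, use_cache=True)  # KV cache on
print(processor.decode(out[0], skip_special_tokens=True))
```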

@GuoHaoren

Hi, I am running the inference script "single_infer.py" on my Mac with CPU only, and it is very slow. Are there any suggestions on configuration settings for "load_pretrained_model", "tokenizer_image_token", and "model.generate()"?
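
For context, here is a rough sketch of the CPU setup I am trying (the helper signatures are assumed from the upstream LLaVA codebase that this repo builds on, and may differ slightly here):

```python
# Rough CPU-oriented sketch using LLaVA-style helpers (signatures assumed from
# the upstream LLaVA codebase; they may differ slightly in this repo).
import torch
from PIL import Image
from llava.model.builder import load_pretrained_model
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX

torch.set_num_threads(8)  # match physical cores; oversubscription slows CPU inference

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="osunlp/UGround",   # assumed Hub path of the LLaVA-based checkpoint
    model_base=None,
    model_name="UGround",
    load_4bit=False,               # bitsandbytes 4/8-bit generally requires a GPU
    device="cpu",
    device_map="cpu",
)
model = model.float()              # the builder defaults to fp16, which is slow on CPU

image = Image.open("screenshot.png").convert("RGB")
image_tensor = process_images([image], image_processor, model.config).float()

prompt = "USER: <image>\nWhere is the login button? ASSISTANT:"
input_ids = tokenizer_image_token(
    prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0)

with torch.inference_mode():
    out = model.generate(
        input_ids,
        images=image_tensor,
        do_sample=False,           # greedy decoding, no sampling overhead
        max_new_tokens=32,         # grounding outputs are short; cap generation
        use_cache=True,
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
```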

boyugou (Collaborator) commented Jan 4, 2025

@GuoHaoren @abrichr @dprokhorov17

We have trained and released a stronger yet smaller 2B model:

osunlp/UGround-V1-2B

I'm still trying to work out the best way to run it on CPU.

Maybe via quantization, e.g. GGUF (https://huggingface.co/mradermacher/UGround-V1-2B-GGUF) or AWQ/GPTQ, which are suggested for Qwen2-VL.

I tried the GGUF one, which runs pretty fast on my 16GB MacBook (with LM Studio), but so far I have no idea how to handle image inputs.
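
In case it helps anyone else: image inputs with a GGUF normally go through llama.cpp's multimodal path, which needs a separate mmproj (vision projector) GGUF next to the language-model GGUF; if a quant ships without the mmproj file, the vision side is simply missing, which would explain image inputs not working in LM Studio. A rough llama-cpp-python sketch of what that looks like for the LLaVA-based 7B (file names are placeholders; the Qwen2-VL-based 2B needs newer llama.cpp vision support):

```python
# Rough sketch: multimodal GGUF inference on CPU via llama-cpp-python.
# Assumes both a language-model GGUF and a matching mmproj (vision projector)
# GGUF are available locally -- the file names below are placeholders.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="uground-mmproj.gguf")
llm = Llama(
    model_path="uground-7b-q4_k_m.gguf",  # placeholder quantized LM file
    chat_handler=chat_handler,
    n_ctx=4096,        # leave room for image tokens
    n_gpu_layers=0,    # pure CPU
)

resp = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": [
            # a base64 "data:" URI also works here
            {"type": "image_url", "image_url": {"url": "file:///abs/path/screenshot.png"}},
            {"type": "text", "text": 'Where is the "Sign in" button?'},
        ],
    }],
    max_tokens=64,
)
print(resp["choices"][0]["message"]["content"])
```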

dprokhorov17 commented Jan 6, 2025

@boyugou And still, you didn't provide any fine-tuning/training scripts, and most importantly the data needed to reproduce your results.

boyugou (Collaborator) commented Jan 6, 2025

> @boyugou And still, you didn't provide any fine-tuning/training scripts, and most importantly the data needed to reproduce your results.

Let me clarify a little bit. Here are the main training scripts and code from the previous training:

Pretrain:
https://github.com/boyugou/llava_uground/blob/90ff02d24c3f8c7a9fb5c90050fa003b0512910f/scripts/ui_v1/pretrain_7b.sh

SFT:
https://github.com/boyugou/llava_uground/blob/90ff02d24c3f8c7a9fb5c90050fa003b0512910f/scripts/ui_v1/finetune_task_lora.sh

You will likely need to change the dataloader logic a little bit, since I assumed streaming data from a Parquet file on S3 and mistakenly deleted the naive implementation built on top of the original LLaVA train.py:
https://github.com/boyugou/llava_uground/blob/90ff02d24c3f8c7a9fb5c90050fa003b0512910f/llava/train/train_s3.py

The Qwen2-VL-based models were trained on MosaicML's infrastructure using an in-house codebase that I cannot share with you, but I will share the YAML files with the details of the hyper-parameters used. (I think the critical ones are: lr=1e-6, max_pixels=1344*1344, and ~1.5 epochs.)

The data used for the above two is exactly the same.
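
For reference, here is a rough sketch of how those hyper-parameters map onto the open-source Qwen2-VL stack (our actual runs used MosaicML's internal trainer, so treat every name below as illustrative rather than the exact recipe):

```python
# Rough sketch of the hyper-parameters mentioned above, expressed against the
# open-source Qwen2-VL / transformers stack (illustrative only; the actual
# training used an internal MosaicML codebase).
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration, TrainingArguments

model_id = "Qwen/Qwen2-VL-2B-Instruct"  # assumed base model for UGround-V1-2B

# max_pixels caps the resized image area before patching, which bounds the
# number of visual tokens generated per screenshot.
processor = AutoProcessor.from_pretrained(model_id, max_pixels=1344 * 1344)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id)

args = TrainingArguments(
    output_dir="uground-v1-2b-sft",    # placeholder
    learning_rate=1e-6,                # lr mentioned above
    num_train_epochs=1.5,              # ~1.5 epochs
    per_device_train_batch_size=1,     # placeholder; choose to hit your global batch size
    gradient_accumulation_steps=8,     # placeholder
    bf16=True,
)
```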

boyugou (Collaborator) commented Jan 6, 2025

> @boyugou And still, you didn't provide any fine-tuning/training scripts, and most importantly the data needed to reproduce your results.

Regarding the data, please give me a bit more time; I will release the code I used first. For big companies, it should be easy to collect better data than what I used (Web-Hybrid) with the same pipeline, by using:

  • a better webpage URL list (I randomly sampled from Common Crawl)
  • a better captioning MLLM and rewriting LM
  • a larger scale

I know I have been asked a lot about the data, especially in another issue. Sorry for the delay. For the raw data, fair use, copyright, and potentially harmful content are our main concerns. Hope you can understand.

Overall, the Qwen-based UGround-V1 models are not the only things in our release plan. The data and training code are still planned and will be released soon (along with a bunch of other stuff).

boyugou (Collaborator) commented Jan 6, 2025

@dprokhorov17

Do the above answers address your questions? I hope they do, and I will have everything ready for everyone soon.

If you have any urgent projects or need any resources, feel free to contact me directly via email. I will do my best to assist you. For some reason, GitHub does not seem to push every message to me via email.
