Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Introduced a comprehensive README for the Flux example, detailing environment setup and execution instructions.
  - Added a command-line interface for generating images using the Flux model, allowing users to specify parameters and optimization flags.
- **Documentation**
  - Enhanced README with sections on performance comparison, dynamic shape support, and quality of generated images.
  - Included detailed instructions for setting up OneDiff, the NexFort backend, and the Diffusers library.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Showing 5 changed files with 376 additions and 0 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,145 @@ | ||
# Run Flux with nexfort backend (Beta Release) | ||
|
||
1. [Environment Setup](#environment-setup) | ||
- [Set Up OneDiff](#set-up-onediff) | ||
- [Set Up NexFort Backend](#set-up-nexfort-backend) | ||
- [Set Up Diffusers Library](#set-up-diffusers) | ||
- [Download FLUX Model for Diffusers](#set-up-flux) | ||
2. [Execution Instructions](#run) | ||
3. [Performance Comparison](#performance-comparation) | ||
4. [Dynamic Shape for Flux](#dynamic-shape-for-flux) | ||
5. [Quality](#quality) | ||
|
||
## Environment setup | ||
### Set up onediff | ||
https://github.com/siliconflow/onediff?tab=readme-ov-file#installation | ||
|
||
### Set up nexfort backend | ||
https://github.com/siliconflow/onediff/tree/main/src/onediff/infer_compiler/backends/nexfort | ||
|
||
### Set up diffusers | ||
|
||
``` | ||
# Ensure the installed diffusers version includes the Flux pipeline. | ||
pip3 install --upgrade diffusers[torch] | ||
``` | ||
### Set up Flux | ||
Model version for diffusers: https://huggingface.co/black-forest-labs/FLUX.1-dev | ||
|
||
HF pipeline: https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/flux.md | ||
|
||
## Run | ||
|
||
### Run 1024*1024 without compile (the original pytorch HF diffusers baseline) | ||
``` | ||
python3 onediff_diffusers_extensions/examples/flux/text_to_image_flux.py \ | ||
--saved-image flux.png | ||
``` | ||
|
||
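### Run 1024*1024 with compile | ||
|
||
The compiled-run commands and their measured results are listed under [Performance Comparison](#performance-comparation) below. | ||
|
||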
## Performance comparation | ||
### Acceleration with OneDiff-Community | ||
|
||
``` | ||
NEXFORT_ENABLE_FP8_QUANTIZE_ATTENTION=0 python3 onediff_diffusers_extensions/examples/flux/text_to_image_flux.py \ | ||
--transform \ | ||
--saved-image flux_compile.png | ||
``` | ||
|
||
Testing on NVIDIA H20, with image size of 1024*1024, iterating 20 steps: | ||
| Metric | Value | | ||
| ------------------------------------------------ | ------------------- | | ||
| Data update date(yyyy-mm-dd) | 2024-11-13 | | ||
| PyTorch iteration speed | 1.38 it/s | | ||
| OneDiff iteration speed | 1.89 it/s (+37.0%) | | ||
| PyTorch E2E time | 14.94 s | | ||
| OneDiff E2E time | 11.30 s (-24.4%) | | ||
| PyTorch Max Mem Used | 33.849 GiB | | ||
| OneDiff Max Mem Used | 33.850 GiB | | ||
| PyTorch Warmup with Run time | 16.15 s | | ||
| OneDiff Warmup with Compilation time<sup>1</sup> | 166.22 s | | ||
| OneDiff Warmup with Cache time | 12.58 s | | ||
|
||
<sup>1</sup> OneDiff Warmup with Compilation time was measured on an Intel(R) Xeon(R) Platinum 8468V. This figure is for reference only; compilation time varies considerably across CPUs. | ||
|
||
### Acceleration with OneDiff-Enterprise (with quantization) | ||
``` | ||
NEXFORT_FORCE_QUANTE_ON_CUDA=1 python3 onediff_diffusers_extensions/examples/flux/text_to_image_flux.py \ | ||
--quantize \ | ||
--transform \ | ||
--saved-image flux_compile.png | ||
``` | ||
|
||
Testing on NVIDIA H20, with image size of 1024*1024, iterating 20 steps: | ||
| Metric | Value | | ||
| ------------------------------------------------ | ------------------- | | ||
| Data update date(yyyy-mm-dd) | 2024-11-13 | | ||
| PyTorch iteration speed | 1.38 it/s | | ||
| OneDiff iteration speed | 2.98 it/s (+115.9%) | | ||
| PyTorch E2E time | 14.94 s | | ||
| OneDiff E2E time | 7.17 s (-52.0%) | | ||
| PyTorch Max Mem Used | 33.849 GiB | | ||
| OneDiff Max Mem Used | 22.879 GiB | | ||
| PyTorch Warmup with Run time | 16.15 s | | ||
| OneDiff Warmup with Compilation time<sup>1</sup> | 229.56 s | | ||
| OneDiff Warmup with Cache time | 8.28 s | | ||
|
||
<sup>1</sup> OneDiff Warmup with Compilation time was measured on an Intel(R) Xeon(R) Platinum 8468V. This figure is for reference only; compilation time varies considerably across CPUs. | ||
|
||
``` | ||
# T5 must also be quantized (--speedup-t5) because the RTX 4090 has only 24 GB of memory. | ||
NEXFORT_FORCE_QUANTE_ON_CUDA=1 python3 onediff_diffusers_extensions/examples/flux/text_to_image_flux.py \ | ||
--quantize \ | ||
--transform \ | ||
--speedup-t5 \ | ||
--saved-image flux_compile.png | ||
``` | ||
|
||
|
||
Testing on RTX 4090, with image size of 1024*1024, iterating 20 steps: | ||
| Metric | Value | | ||
| ------------------------------------------------ | ------------------- | | ||
| Data update date(yyyy-mm-dd) | 2024-11-13 | | ||
| PyTorch iteration speed | OOM | | ||
| OneDiff iteration speed | 3.29 it/s | | ||
| PyTorch E2E time | OOM | | ||
| OneDiff E2E time | 6.50 s | | ||
| PyTorch Max Mem Used | OOM | | ||
| OneDiff Max Mem Used | 18.466 GiB | | ||
| PyTorch Warmup with Run time | OOM | | ||
| OneDiff Warmup with Compilation time<sup>2</sup> | 169.16 s | | ||
| OneDiff Warmup with Cache time | 7.12 s | | ||
|
||
<sup>2</sup> OneDiff Warmup with Compilation time is tested on AMD EPYC 7543 32-Core Processor | ||
|
||
|
||
## Dynamic shape for Flux | ||
|
||
Run: | ||
|
||
``` | ||
python3 onediff_diffusers_extensions/examples/flux/text_to_image_flux.py \ | ||
--quantize \ | ||
--transform \ | ||
--run_multiple_resolutions \ | ||
--saved-image flux_compile.png | ||
``` | ||
|
||
## Quality | ||
With nexfort as the backend for OneDiff compilation acceleration, the generated images are nearly lossless compared with the PyTorch baseline. (The following images were generated on an NVIDIA H20.) | ||
|
||
### Generated image with pytorch | ||
<p align="center"> | ||
<img src="../../../imgs/flux_base.png"> | ||
</p> | ||
|
||
### Generated image with nexfort acceleration (Community) | ||
<p align="center"> | ||
<img src="../../../imgs/nexfort_flux_community.png"> | ||
</p> | ||
|
||
### Generated image with nexfort acceleration (Enterprise) | ||
<p align="center"> | ||
<img src="../../../imgs/nexfort_flux_enterprise.png"> | ||
</p> |
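
The percentage figures in the NVIDIA H20 tables above can be re-derived from the raw numbers. The following is a minimal sketch of that arithmetic; the helper name `relative_change` and the `RESULTS` dictionary are illustrative assumptions, not part of the commit, and the inputs are copied from the tables.

```python
def relative_change(baseline: float, optimized: float) -> float:
    """Return the change of `optimized` relative to `baseline`, in percent."""
    return (optimized / baseline - 1.0) * 100.0


# Values copied from the NVIDIA H20 tables (1024*1024, 20 steps).
PYTORCH_ITS = 1.38   # PyTorch iteration speed, it/s
PYTORCH_E2E = 14.94  # PyTorch end-to-end time, s

# Hypothetical container for the two OneDiff configurations.
RESULTS = {
    "community": {"its": 1.89, "e2e": 11.30},
    "enterprise": {"its": 2.98, "e2e": 7.17},
}

for name, r in RESULTS.items():
    speed = relative_change(PYTORCH_ITS, r["its"])
    e2e = relative_change(PYTORCH_E2E, r["e2e"])
    print(f"{name}: iteration speed {speed:+.1f}%, E2E time {e2e:+.1f}%")
```

Rounded to one decimal place, these reproduce the +37.0% / -24.4% and +115.9% / -52.0% entries reported in the tables.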
231 changes: 231 additions & 0 deletions
onediff_diffusers_extensions/examples/flux/text_to_image_flux.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,231 @@ | ||
import argparse | ||
import json | ||
import time | ||
|
||
import nexfort | ||
|
||
import torch | ||
from diffusers import FluxPipeline | ||
|
||
|
||
def parse_args(): | ||
    parser = argparse.ArgumentParser( | ||
        description="Use nexfort to accelerate image generation with Flux." | ||
    ) | ||
    parser.add_argument( | ||
        "--model", | ||
        type=str, | ||
        default="black-forest-labs/FLUX.1-dev", | ||
        help="Model path or identifier.", | ||
    ) | ||
    parser.add_argument( | ||
        "--speedup-t5", | ||
        action="store_true", | ||
        help="Enable T5 optimization.", | ||
    ) | ||
    parser.add_argument( | ||
        "--quantize", | ||
        action="store_true", | ||
        help="Enable fp8 quantization.", | ||
    ) | ||
    parser.add_argument( | ||
        "--transform", | ||
        action="store_true", | ||
        help="Enable speedup with nexfort.", | ||
    ) | ||
    parser.add_argument( | ||
        "--prompt", | ||
        type=str, | ||
        default="evening sunset scenery blue sky nature, glass bottle with a galaxy in it", | ||
        help="Prompt for the image generation.", | ||
    ) | ||
    parser.add_argument( | ||
        "--height", type=int, default=1024, help="Height of the generated image." | ||
    ) | ||
    parser.add_argument( | ||
        "--width", type=int, default=1024, help="Width of the generated image." | ||
    ) | ||
    parser.add_argument( | ||
        "--guidance_scale", | ||
        type=float, | ||
        default=0.0, | ||
        help="The scale factor for the guidance.", | ||
    ) | ||
    parser.add_argument( | ||
        "--max_sequence_length", | ||
        type=int, | ||
        default=256, | ||
        help="Maximum sequence length to use with the `prompt`.", | ||
    ) | ||
    parser.add_argument( | ||
        "--num-inference-steps", type=int, default=20, help="Number of inference steps." | ||
    ) | ||
    parser.add_argument( | ||
        "--saved-image", | ||
        type=str, | ||
        default="./flux.png", | ||
        help="Path to save the generated image.", | ||
    ) | ||
    parser.add_argument( | ||
        "--seed", type=int, default=1, help="Seed for random number generation." | ||
    ) | ||
    parser.add_argument( | ||
        "--run_multiple_resolutions", | ||
        action="store_true", | ||
    ) | ||
    parser.add_argument( | ||
        "--run_multiple_prompts", | ||
        action="store_true", | ||
    ) | ||
    return parser.parse_args() | ||
|
||
|
||
args = parse_args() | ||
|
||
device = torch.device("cuda") | ||
|
||
|
||
def generate_texts(min_length=50, max_length=302): | ||
    base_text = "a female character with long, flowing hair that appears to be made of ethereal, swirling patterns resembling the Northern Lights or Aurora Borealis. The background is dominated by deep blues and purples, creating a mysterious and dramatic atmosphere. The character's face is serene, with pale skin and striking features. She" | ||
|
||
    additional_words = [ | ||
        "gracefully", | ||
        "beautifully", | ||
        "elegant", | ||
        "radiant", | ||
        "mysteriously", | ||
        "vibrant", | ||
        "softly", | ||
        "gently", | ||
        "luminescent", | ||
        "sparkling", | ||
        "delicately", | ||
        "glowing", | ||
        "brightly", | ||
        "shimmering", | ||
        "enchanting", | ||
        "gloriously", | ||
        "magnificent", | ||
        "majestic", | ||
        "fantastically", | ||
        "dazzlingly", | ||
    ] | ||
|
||
    for i in range(min_length, max_length): | ||
        idx = i % len(additional_words) | ||
        base_text += " " + additional_words[idx] | ||
        yield base_text | ||
|
||
|
||
class FluxGenerator: | ||
    def __init__( | ||
        self, | ||
        model, | ||
        enable_quantize=False, | ||
        enable_fast_transformer=False, | ||
        enable_speedup_t5=False, | ||
    ): | ||
        self.pipe = FluxPipeline.from_pretrained( | ||
            model, | ||
            torch_dtype=torch.bfloat16, | ||
        ) | ||
|
||
        # Move the quantization step after `self.pipe.to(device)` if you have more than 32 GB of RAM. | ||
        if enable_quantize: | ||
            print("quant...") | ||
            from nexfort.quantization import quantize | ||
|
||
            self.pipe.transformer = quantize( | ||
                self.pipe.transformer, quant_type="fp8_e4m3_e4m3_dynamic_per_tensor" | ||
            ) | ||
            if enable_speedup_t5: | ||
                self.pipe.text_encoder_2 = quantize( | ||
                    self.pipe.text_encoder_2, | ||
                    quant_type="fp8_e4m3_e4m3_dynamic_per_tensor", | ||
                ) | ||
|
||
        self.pipe.to(device) | ||
|
||
        if enable_fast_transformer: | ||
            print("compile...") | ||
            from nexfort.compilers import transform | ||
|
||
            self.pipe.transformer = transform(self.pipe.transformer) | ||
            if enable_speedup_t5: | ||
                self.pipe.text_encoder_2 = transform(self.pipe.text_encoder_2) | ||
|
||
    def warmup(self, gen_args, warmup_iterations=1): | ||
        warmup_args = gen_args.copy() | ||
|
||
        # warmup_args["generator"] = torch.Generator(device=device).manual_seed(0) | ||
        torch.manual_seed(args.seed) | ||
|
||
        print("Starting warmup...") | ||
        start_time = time.time() | ||
        for _ in range(warmup_iterations): | ||
            self.pipe(**warmup_args) | ||
        end_time = time.time() | ||
        print("Warmup complete.") | ||
        print(f"Warmup time: {end_time - start_time:.2f} seconds") | ||
|
||
    def generate(self, gen_args): | ||
        # gen_args["generator"] = torch.Generator(device=device).manual_seed(args.seed) | ||
        torch.manual_seed(args.seed) | ||
|
||
        # Run the model | ||
        start_time = time.time() | ||
        image = self.pipe(**gen_args).images[0] | ||
        end_time = time.time() | ||
|
||
        image.save(args.saved_image) | ||
|
||
        return image, end_time - start_time | ||
|
||
|
||
def main(): | ||
    flux = FluxGenerator(args.model, args.quantize, args.transform, args.speedup_t5) | ||
|
||
    if args.run_multiple_prompts: | ||
        dynamic_prompts = generate_texts(max_length=101) | ||
        prompt_list = list(dynamic_prompts) | ||
    else: | ||
        prompt_list = [args.prompt] | ||
|
||
    gen_args = { | ||
        "prompt": args.prompt, | ||
        "num_inference_steps": args.num_inference_steps, | ||
        "height": args.height, | ||
        "width": args.width, | ||
        "guidance_scale": args.guidance_scale, | ||
        "max_sequence_length": args.max_sequence_length, | ||
    } | ||
|
||
    flux.warmup(gen_args) | ||
|
||
    for prompt in prompt_list: | ||
        gen_args["prompt"] = prompt | ||
        print(f"Processing prompt of length {len(prompt)} characters.") | ||
        image, inference_time = flux.generate(gen_args) | ||
        print( | ||
            f"Generated image saved to {args.saved_image} in {inference_time:.2f} seconds." | ||
        ) | ||
        cuda_mem_after_used = torch.cuda.max_memory_allocated() / (1024**3) | ||
        print(f"Max used CUDA memory : {cuda_mem_after_used:.3f} GiB") | ||
|
||
    if args.run_multiple_resolutions: | ||
        gen_args["prompt"] = args.prompt | ||
        print("Test run with multiple resolutions...") | ||
        sizes = [1536, 1024, 768, 720, 576, 512, 256] | ||
        for h in sizes: | ||
            for w in sizes: | ||
                gen_args["height"] = h | ||
                gen_args["width"] = w | ||
                print(f"Running at resolution: {h}x{w}") | ||
                start_time = time.time() | ||
                flux.generate(gen_args) | ||
                end_time = time.time() | ||
                print(f"Inference time: {end_time - start_time:.2f} seconds") | ||
|
||
|
||
if __name__ == "__main__": | ||
    main() |
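
The script mixes hyphenated flags (`--saved-image`, `--speedup-t5`) with underscored ones (`--run_multiple_resolutions`), yet reads all of them as underscored attributes such as `args.saved_image`. This works because `argparse` derives the attribute name from the long option and converts hyphens to underscores. A minimal standalone sketch, using a reduced parser and a hand-written argument list chosen for illustration only:

```python
import argparse

# Reduced parser with three of the script's flags; the defaults mirror the script.
parser = argparse.ArgumentParser()
parser.add_argument("--saved-image", type=str, default="./flux.png")
parser.add_argument("--speedup-t5", action="store_true")
parser.add_argument("--run_multiple_resolutions", action="store_true")

# Illustrative command line; in the script this comes from sys.argv.
args = parser.parse_args(["--saved-image", "out.png", "--speedup-t5"])

# Hyphens in option names become underscores in attribute names.
print(args.saved_image, args.speedup_t5, args.run_multiple_resolutions)
```

Both spellings are therefore usable on the command line exactly as the README shows them, although using one style throughout would make the interface easier to remember.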