SD-WEB-UI | ComfyUI | decadetw-Auto-Prompt-LLM-Vision

Quick Links

Auto prompt by LLM and LLM-Vision (Trigger more details out inside model)
- SD-WEB-UI: https://github.com/xlinx/sd-webui-decadetw-auto-prompt-llm
- ComfyUI: https://github.com/xlinx/ComfyUI-decadetw-auto-prompt-llm
Auto msg to ur mobile (LINE | Telegram | Discord)
- SD-WEB-UI :https://github.com/xlinx/sd-webui-decadetw-auto-messaging-realtime
- ComfyUI: https://github.com/xlinx/ComfyUI-decadetw-auto-messaging-realtime
I'm SD-VJ. (share SD-generating-process in realtime by gpu)
- SD-WEB-UI: https://github.com/xlinx/sd-webui-decadetw-spout-syphon-im-vj
- ComfyUI: https://github.com/xlinx/ComfyUI-decadetw-spout-syphon-im-vj
CivitAI Info|discuss:

SD-WEB-UI | ComfyUI | decadetw-Auto-Prompt-LLM-Vision

Update Log

[add|20240730] | 🟢 LLM Recursive Prompt
[add|20240730] | 🟢 Keep ur prompt ahead each request
[add|20240731] | 🟢 LLM Vision
[add|20240803] | 🟢 translateFunction
- When LLM answered, use LLM translate result to your favorite language.
  - ex: Chinese. It's just for your reference, which won't affect SD.
[add|20240808] | 🟠 Before and After script | exe-command
[add|20240808] | 🟠 release LLM VRAM everytimes

Motivation💡

Call LLM : auto prompt for batch generate images
Call LLM-Vision: auto prompt for batch generate images
Image will get more details that u never though before.
prompt detail is important

Usage

LLM-Text

batch image generate with LLM
- a story
Using Recursive prompt say a story with image generate
Using LLM
- when generate forever mode
  - example as follows figure Red-box.
  - just tell LLM who, when or what
  - LLM will take care details.
- when a story-board mode (You can generate serial image follow a story by LLM context.)
  - its like comic book
  - a superstar on stage
  - she is singing
  - people give her flower
  - a fashion men is walking.

LLM-Vision 👀

batch image generate with LLM-Vision
- let LLM-Vision see a magazine
- see series of image
- see last-one-img for next-image
- make a serious of image like comic

Before and After script

support load script or exe-command Before-LLM and After-LLM
javascript fetch POST method (install Yourself )
- security issue, but u can consider as follows
- https://github.com/pmcculler/sd-dynamic-javascript
- https://github.com/ThereforeGames/unprompted
- https://github.com/adieyal/sd-dynamic-prompts
- https://en.wikipedia.org/wiki/Server-side_request_forgery
- and Command Line Arg --allow-code

[🟢] stable-diffusion-webui-AUTOMATIC1111
[🟢] stable-diffusion-webui-forge
[🟢] ComfyUI
1. SD-Prompt ✦
1girl
2.1 LLM-Text ✦	2.2 LLM-Vision ✦
a super star on stage.	Who is she in image?
2.3 LLM-Text-sys-prompt ✦	2.4 LLM-Vision-sys-prompt ✦
You are an AI prompt word engineer. Use the provided keywords to create a beautiful composition. Only the prompt words are needed, not your feelings. Customize the style, scene, decoration, etc., and be as detailed as possible without endings.	You are an AI prompt word engineer. Use the provided image to create a beautiful composition. Only the prompt words are needed, not your feelings. Customize the style, scene, decoration, etc., and be as detailed as possible without endings.
3. LLM will answer other detail ✦
The superstar, with their hair flowing in the wind, stands on the stage. The lights dance around them, creating a magical moment that fills everyone present with awe. Their eyes shine bright, as if they are ready to take on the world.
The superstar stands tall in their sparkling costume, surrounded by fans who chant and cheer their name. The lights shine down on them, making their hair shine like silver. The crowd is electric, every muscle tense, waiting for the superstar to perform

4. Main Interface \| sd-web-ui \| ComfyUI




ComfyUI Manager \| search keyword: auto

Usage

Input	Output
LLM-Text: a superstar on stage. LLM-Vision: What's pose in this image?. (okay, its cool.)
LLM-Text: a superstar on stage. LLM-Vision: with a zebra image (okie, cooool show dress. At least we don't have half zebra half human.)
LLM-Text: a superstar on stage. (okay, its cool.)
LLM: a superstar on stage. (Wow... the describe of light is great.)
LLM: a superstar on stage. (hnn... funny, it does make sense.)
CHALLENGE LLM-vision:A Snow White girl walk in forest. (detect ur LLM-Vision Model IQ; if u didnt get white dress and lot of snow.... plz let me know model name) SD model: Flux.1 D LLM model: llava-llama-3.1-8b LLM model: Eris_PrimeV4-Vision-32k-7B-IQ3_XXS
FLUX model hnn...NSFW show. I'm not mean that, but not a wrong answer. (Trigger more details; that u never thought about it.) SD model: Flux.1 D LLM model: llava-llama-3.1-8b LLM model: Eris_PrimeV4-Vision-32k-7B-IQ3_XXS
advanced use \| before-after-action in fact, u can run any u want script \| (storyboard) \| random read line from txt send into LLM
Special LLM Loop Connect 1st LLM-Text output to 2nd LLM-Text Input
Special LLM Loop - keep each feature assign to different obj not mix it on one. LLM-Text output ask looply : here
[new tool 20240915] Civitai Prompt Grabber quick prompt from civitai. u can pick some prompt from another area model(ex indoor design or building model) with ur 1girl, ex: 1girl(up figure)) + in-door design model-prompt. then u will get full detail in background(bottom figure) : https://civitai.com/models/85691 this is good present in FLUX model. trigger more detail in background. Make the photo getting more realistic feeling option1. just quick append prompt from other model from civitai or option2. of course u can send it into LLM too.
[update] LLM-ask-LLM🌀 [support] Cloud Service: Gemini Pro cloud service: https://generativelanguage.googleapis.com/v1 model: gemini-1.5-flash (vision) support text and vision it will get more🌀 and more🌀 and more🌀 like....(bottom to top)

Usage Tips

tips1:
- leave only 1 or fewer keyword(deep inside CLIP encode) for SD-Prompt, others just fitting into LLM
- SD-Prompt: 1girl, [xxx,]<--(the keyword u use usually, u got usually image)
- LLM-Prompt: xxx, yyy, zzz, <--(move it to here; trigger more detail that u never though.)
tips2:
- leave only 1 or fewer keyword(deep inside CLIP encode) for SD-Prompt, others just fit into LLM
- SD-Prompt: 1girl,
- LLM-Prompt: a superstar on stage. <--(say a story)
tips3:
- action script - Before
  - random/series pick prompt txt file random line fit into LLM-Text [read_random_line.bat]
  - random/series pick image path file fit into LLM-Vision
- action script - After
  - u can call what u want command
  - ex: release LLM VRAM each call: "curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'" @Pdonor
  - ex: bra bra. Interactive anything.
tipsX: Enjoy it, inspire ur idea, and tell everybody how u use this.

Installtion

You need install LM Studio or ollama first.
- LM Studio: Start the LLM service on port 1234. (suggest use this one)
- ollama: Start service on port 11434 .
Pick one language model from under list
- text base(small ~2G)
- text&vision base(a little big ~8G)
Start web-ui or ComfyUI install extensions or node
- stable-diffusion-webui | stable-diffusion-webui-forge:
  - go Extensions->Available [official] or Install from URL
    - https://github.com/xlinx/sd-webui-decadetw-auto-prompt-llm
- ComfyUI: using Manager install node
  - Manager -> Customer Node Manager -> Search keyword: auto
  - https://github.com/ltdrdata/ComfyUI-Manager
  - https://registry.comfy.org/
  - https://ltdrdata.github.io/
Open ur favorite UI
- Lets inactive with LLM. go~
- trigger more detail by LLM

Suggestion software info list

https://lmstudio.ai/ (win, mac, linux)
https://ollama.com/ (win[beta], mac, linux)
https://github.com/openai/openai-python
https://github.com/LostRuins/koboldcpp (all os)

Suggestion LLM Model

LLM-text (normal, chat, assistant)
- 4B VRAM<2G
  - CHE-72/Qwen1.5-4B-Chat-Q2_K-GGUF/qwen1.5-4b-chat-q2_k.gguf
    - https://huggingface.co/CHE-72/Qwen1.5-4B-Chat-Q2_K-GGUF
- 7B VRAM<8G
  - ccpl17/Llama-3-Taiwan-8B-Instruct-GGUF/Llama-3-Taiwan-8B-Instruct.Q2_K.gguf
  - Lewdiculous/L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix/L3-8B-Stheno-v3.2-IQ3_XXS-imat.gguf
- Google-Gemma
  - https://huggingface.co/bartowski/gemma-2-9b-it-GGUF
  - bartowski/gemma-2-9b-it-GGUF/gemma-2-9b-it-IQ2_M.gguf
    - small and good for SD-Prompt
LLM-vision 👀 (work with SDXL, VRAM >=8G is better )
- https://huggingface.co/xtuner/llava-phi-3-mini-gguf
  - llava-phi-3-mini-mmproj-f16.gguf (600MB,vision adapter)
  - ⭐⭐⭐llava-phi-3-mini-f16.gguf (7G, main model)
- https://huggingface.co/FiditeNemini/Llama-3.1-Unhinged-Vision-8B-GGUF
  - llava-llama-3.1-8b-mmproj-f16.gguf
  - ⭐⭐⭐Llama-3.1-Unhinged-Vision-8B-Q8.0.gguf
- https://huggingface.co/Lewdiculous/Eris_PrimeV4-Vision-32k-7B-GGUF-IQ-Imatrix#quantization-information
  - quantization_options = ["Q4_K_M", "Q4_K_S", "IQ4_XS", "Q5_K_M", "Q5_K_S","Q6_K", "Q8_0", "IQ3_M", "IQ3_S", "IQ3_XXS"]
  - ⭐⭐⭐⭐⭐for low VRAM super small: IQ3_XXS (2.83G)
  - in fact, it's enough uses.

Using Online LLM Service Setup example

OpenAI ChatGPT

In Auto-LLM Setup tab
- LLM-URL=https://api.openai.com/v1
get ur api key from openAI : https://platform.openai.com/api-keys
- LLM-API-KEY = xxxxxxxxxxxxxxxxxxxxxxx
- LLM-Model-Name = gpt-3.5-turbo

Google Gemini

LLM-URL= https://generativelanguage.googleapis.com/v1
get the api key: https://ai.google.dev/gemini-api/docs/api-key?hl=zh-tw

X Grok

register here: https://x.ai/geo-block
Its not open for my region. i cant test for u guys.

claude.ai

seems not for api call: https://claude.ai/upgrade

Hugging face space

https://huggingface.co/spaces

Javascript!

security issue, but u can consider as follows.

https://github.com/pmcculler/sd-dynamic-javascript
https://github.com/ThereforeGames/unprompted
https://github.com/adieyal/sd-dynamic-prompts
https://en.wikipedia.org/wiki/Server-side_request_forgery
and Command Line Arg --allow-code

Buy me a Coca cola ☕

https://buymeacoffee.com/xxoooxx

Colophon

Made for fun. I hope if brings you great joy, and perfect hair forever. Contact me with questions and comments, but not threats, please. And feel free to contribute! Pull requests and ideas in Discussions or Issues will be taken quite seriously! --- https://decade.tw

Name		Name	Last commit message	Last commit date
Latest commit History 82 Commits
before-after-actions		before-after-actions
images		images
javascript		javascript
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
install.py		install.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Quick Links

SD-WEB-UI | ComfyUI | decadetw-Auto-Prompt-LLM-Vision

Update Log

Motivation💡

Usage

LLM-Text

LLM-Vision 👀

Before and After script

Usage

Usage Tips

Installtion

Suggestion software info list

Suggestion LLM Model

Using Online LLM Service Setup example

OpenAI ChatGPT

Google Gemini

X Grok

claude.ai

Hugging face space

Javascript!

Buy me a Coca cola ☕

Colophon

About

Releases

Packages

Contributors 2

Languages

License

xlinx/sd-webui-decadetw-auto-prompt-llm

Folders and files

Latest commit

History

Repository files navigation

Quick Links

SD-WEB-UI | ComfyUI | decadetw-Auto-Prompt-LLM-Vision

Update Log

Motivation💡

Usage

LLM-Text

LLM-Vision 👀

Before and After script

Usage

Usage Tips

Installtion

Suggestion software info list

Suggestion LLM Model

Using Online LLM Service Setup example

OpenAI ChatGPT

Google Gemini

X Grok

claude.ai

Hugging face space

Javascript!

Buy me a Coca cola ☕

Colophon

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages