Needs fixes

# Latent Diffusion Multi-Model with CLIP guidance

This is a web application that generates images from a text prompt. A machine learning model uses the prompt as guidance for the kind of image to be denoised: starting from a completely noised image, it denoises iteratively until the loop terminates. The system combines multiple deep learning model architectures.
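At inference time, the denoising loop described above can be sketched as follows. This is a minimal illustration of DDPM-style ancestral sampling; `unet(x, t, text_emb)` is a hypothetical noise predictor and the noise schedule is an assumption, not this repository's exact code:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)        # cumulative product ᾱ_t

@torch.no_grad()
def sample(unet, text_emb, shape=(1, 3, 64, 64)):
    x = torch.randn(shape)                       # start from pure Gaussian noise
    for t in reversed(range(T)):                 # iteratively denoise x_T -> x_0
        eps = unet(x, torch.tensor([t]), text_emb)   # hypothetical noise predictor
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        # Add noise at every step except the last one (t = 0).
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x
```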

## During Training

1) A Gaussian distribution is used to noise the training image at particular timesteps. Instead of sequentially feeding the output of each previous timestep forward to apply noise up to timestep t, we can sample the noised image x_t at any timestep directly; this works because the sum of independent Gaussian random variables is itself Gaussian (see the first sketch after this list).

2) The noised image is then fed into a UNet model along with the text label for the corresponding image. The UNet maps the text label and the image into a lower-dimensional space, commonly known as the latent space: a compressed representation of the data in which similar data points lie closer together (it represents the probability distribution of the data). See the second sketch after this list.

3) Contrastive loss and cosine similarity are used as guidance to optimize the generation of these images (see the third sketch after this list). The weights in the attention modules are updated accordingly until the loss is minimized.
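First, a minimal sketch of the closed-form forward noising from step 1, assuming a standard DDPM-style linear noise schedule (the schedule values and tensor shapes are illustrative, not this repository's exact ones):

```python
import torch

T = 1000                                         # number of diffusion timesteps
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # cumulative product ᾱ_t

def q_sample(x0: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Sample x_t from x_0 in one step: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)      # broadcast over (B, C, H, W)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

# Usage: noise a batch of images at random timesteps in a single call.
x0 = torch.randn(4, 3, 64, 64)                   # stand-in for training images
t = torch.randint(0, T, (4,))
xt = q_sample(x0, t)
```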
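Second, a sketch of step 2's text conditioning. Cross-attention is the usual way latent image features attend to a text embedding inside a diffusion UNet; the dimensions and module names here are illustrative assumptions, not this repository's actual architecture:

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Latent image features (queries) attend to text-label embeddings
    (keys/values), letting the prompt steer the denoising."""
    def __init__(self, dim=256, text_dim=512, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, kdim=text_dim,
                                          vdim=text_dim, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, latents, text_emb):
        # latents: (B, N, dim) flattened latent features
        # text_emb: (B, L, text_dim) token embeddings of the text label
        out, _ = self.attn(self.norm(latents), text_emb, text_emb)
        return latents + out                     # residual connection

# Usage with toy shapes.
block = CrossAttentionBlock()
lat = torch.randn(2, 64, 256)                    # e.g. an 8x8 latent grid, flattened
txt = torch.randn(2, 16, 512)                    # 16 text tokens
out = block(lat, txt)
```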
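Third, a sketch of step 3's loss. This is a CLIP-style symmetric contrastive loss built on cosine similarity; the embedding names and temperature are illustrative assumptions rather than this repository's exact implementation:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # L2-normalize so the dot product equals cosine similarity.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))        # diagonal = matched pairs
    # Symmetric cross-entropy: image -> text and text -> image.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```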

## Run it Locally

```sh
git clone https://github.com/Aryan-Deshpande/Latent-Diffusion-AI
docker compose up
```

Then open http://localhost:3001 in your browser.

## Papers Referenced / Used