Showing posts from June, 2024

AI Reading Notes: Image And Video Gen

Image
  Overall highlights

  The common approach is to first encode the image (or add noise to it) into a low-dimensional latent representation, and then denoise/decode random noise to generate an image. This works because one complicated distribution can be modeled as a sequence of transformations of simple Gaussian distributions.

  The loss function usually involves the KL divergence between the encoder/downsampling model (which reflects the real training-data distribution) and the decoder/denoising model. There are many variations of the loss function: some compute an upper bound on the loss via KL divergence, while others derive the loss from variational Bayes and graphical models.

  The diffusion model is the most popular because it is both flexible and tractable, though generating images with it is expensive.

  Self-attention and cross-attention components can be added into each diffusion step to:
  - Compute the Gaussian distribution's mean and variance
  - Cross-reference different patches of an image
  - Cross-reference image a
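The forward-noising step and the Gaussian KL term described above can be sketched numerically. This is a minimal illustration, not a full diffusion model: the linear beta schedule, the number of steps, and the tiny 8x8 "image" are all illustrative assumptions, and the closed-form KL here is for univariate Gaussians.

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    # Closed-form KL(N(mu_q, var_q) || N(mu_p, var_p)) for univariate
    # Gaussians; this is the kind of term that appears in the
    # variational (ELBO) loss between encoder and denoiser.
    return 0.5 * (np.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p
                  - 1.0)

def forward_noise(x0, t, alpha_bar, rng):
    # Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I),
    # i.e. progressively replace the image with Gaussian noise.
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# Hypothetical linear noise schedule over T steps.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))          # a tiny stand-in "image"
xt, eps = forward_noise(x0, t=50, alpha_bar=alpha_bar, rng=rng)

# KL between identical Gaussians is 0; it grows as the means separate.
print(gaussian_kl(0.0, 1.0, 0.0, 1.0))    # → 0.0
print(gaussian_kl(1.0, 1.0, 0.0, 1.0))    # → 0.5
```

In a real model, a neural network would predict the noise `eps` (or the mean/variance of the reverse step) from `xt`, and the training loss would sum KL terms like the one above across timesteps.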