The model's outputs generally reflect the descriptions in the text prompts. As the number of steps increases, the output quality improves: the images look less cartoonish and match the text prompt more closely.
Iterative denoising yielded the best result. Compared with one-step denoising, it produces a clearly better background and more detail in the denoised image.
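The difference between the two approaches can be sketched as follows. This is a minimal illustration that assumes a standard DDPM-style noise schedule, not necessarily the exact code used for the results here. The function names (`one_step_estimate`, `iterative_step`) and the `alpha_bar` arguments are hypothetical, and the random noise term that a full sampler adds at each step is omitted for clarity.

```python
import numpy as np

def add_noise(x0, eps, alpha_bar_t):
    """Forward process: noise a clean image x0 to timestep t."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def one_step_estimate(x_t, eps_hat, alpha_bar_t):
    """One-step denoising: invert the forward process using the predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

def iterative_step(x_t, eps_hat, alpha_bar_t, alpha_bar_prev):
    """One iterative update from timestep t to a less noisy timestep t'.

    Blends the current clean-image estimate with the noisy image
    (the DDPM posterior mean; the added-variance term is left out).
    """
    alpha = alpha_bar_t / alpha_bar_prev   # effective alpha for this stride
    beta = 1.0 - alpha
    x0_hat = one_step_estimate(x_t, eps_hat, alpha_bar_t)
    w_clean = np.sqrt(alpha_bar_prev) * beta / (1.0 - alpha_bar_t)
    w_noisy = np.sqrt(alpha) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    return w_clean * x0_hat + w_noisy * x_t
```

One-step denoising jumps straight to a clean estimate from a single noise prediction, while iterative denoising applies `iterative_step` repeatedly with a fresh noise prediction each time, which gives the model several chances to refine detail.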
Although the diffusion model can generate some images, their quality is not very good. We will use a technique called Classifier-Free Guidance (CFG) to improve it.
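As a rough sketch of the idea, CFG combines a text-conditioned noise estimate with an unconditional one and extrapolates past the conditional estimate. The function name `cfg_noise_estimate` and the parameter `guidance_scale` are hypothetical names chosen for illustration, and the scale of 7.0 in the usage lines is only an example value, not a setting taken from the experiments here.

```python
import numpy as np

def cfg_noise_estimate(eps_cond, eps_uncond, guidance_scale):
    """Combine conditional and unconditional noise estimates.

    guidance_scale = 0 returns the unconditional estimate,
    guidance_scale = 1 returns the conditional estimate,
    guidance_scale > 1 pushes further in the direction of the prompt.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Example usage with toy arrays standing in for model outputs.
eps_cond = np.array([1.0, 2.0])
eps_uncond = np.array([0.0, 1.0])
eps_guided = cfg_noise_estimate(eps_cond, eps_uncond, guidance_scale=7.0)
```

The guided estimate then replaces the plain noise prediction inside each denoising step, at the cost of running the model twice per step (once with the prompt and once without).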
With CFG, the quality of the images is greatly improved, and their content "makes more sense".