Project 5: Fun With Diffusion Models (GitHub repo)


0 Sampling from Model

Number of steps = 5

"an oil painting of a snowy mountain village"
"a man wearing a hat"
"a rocket ship"

Number of steps = 20

"an oil painting of a snowy mountain village"
"a man wearing a hat"
"a rocket ship"

Number of steps = 50

"an oil painting of a snowy mountain village"
"a man wearing a hat"
"a rocket ship"

We can see that the model's outputs generally reflect the text prompts. As the number of steps increases, the outputs become higher in quality, less cartoonish, and closer to the description in the prompt.

1.1 Forward Process

t=0 (original)
t = 250
t = 500
t = 750
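
The noisy images above come from the forward process. As a minimal sketch, assuming the standard DDPM formulation x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, it can be written as below. The linear beta schedule and the helper names `alphas_cumprod` and `forward` are assumptions for illustration; the project itself would take these coefficients from the pretrained model's scheduler.

```python
import math
import random

def alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def forward(x0, t, abar, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps with eps ~ N(0, 1)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, s = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    x_t = [a * p + s * e for p, e in zip(x0, eps)]
    return x_t, eps

abar = alphas_cumprod()
rng = random.Random(0)
x0 = [0.5] * 4096  # a flat gray "image", flattened to a list of pixel values
for t in (250, 500, 750):
    x_t, _ = forward(x0, t, abar, rng)
    print(f"t={t}: signal scale {math.sqrt(abar[t]):.3f}, "
          f"noise scale {math.sqrt(1 - abar[t]):.3f}")
```

As t grows, the signal scale shrinks and the noise scale grows, which matches the progressively noisier images at t = 250, 500 and 750.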

1.2 Classical Denoising (Gaussian Blur)

t = 250
t = 500
t = 750
Classic Denoising t = 250
Classic Denoising t = 500
Classic Denoising t = 750

1.3 One-Step Denoising

t = 250
t = 500
t = 750
One-Step Denoising t = 250
One-Step Denoising t = 500
One-Step Denoising t = 750
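
One-step denoising can be sketched as inverting the forward equation once a noise estimate is available. The snippet assumes the standard DDPM relation x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps; the function name `estimate_x0` and the value of `abar_t` are hypothetical, and a known noise vector stands in for the UNet's prediction.

```python
import math

def estimate_x0(x_t, eps_hat, abar_t):
    """Solve x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps for x0."""
    a, s = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    return [(x - s * e) / a for x, e in zip(x_t, eps_hat)]

# Toy check: with the true noise as the "prediction", x0 is recovered exactly
# (up to floating-point error). A real UNet only approximates eps.
abar_t = 0.3  # hypothetical cumulative alpha at timestep t
x0 = [0.2, -0.4, 0.9]
eps = [0.5, -1.0, 0.25]
x_t = [math.sqrt(abar_t) * p + math.sqrt(1 - abar_t) * e for p, e in zip(x0, eps)]
x0_hat = estimate_x0(x_t, eps, abar_t)
```

Because the noise estimate is divided by sqrt(abar_t), errors in the prediction are amplified at large t, which is consistent with the one-step results degrading as t increases.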

1.4 Iterative Denoising

Step=0 (original noisy image)
Step=10
Step=15
Step=20
Step=25
Step = 30
Final

Comparison

Classic (Gaussian Blur) Denoising
One-step Denoising
Iterative Denoising

We can see that iterative denoising yielded the best result. Compared to one-step denoising, the iteratively denoised image clearly has a better background and more detail.
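
A single iterative denoising update can be sketched as a blend of the current noisy image and the current clean-image estimate. This assumes the usual DDPM posterior mean for a strided step from timestep t to an earlier timestep t'; the function name `denoise_step` and its arguments are hypothetical, and the variance term is left to the caller.

```python
import math

def denoise_step(x_t, x0_hat, abar_t, abar_prev, noise=None):
    """Move from timestep t to an earlier, less noisy timestep t' (abar_prev > abar_t)."""
    alpha = abar_t / abar_prev  # effective alpha for the strided step
    beta = 1.0 - alpha
    w_x0 = math.sqrt(abar_prev) * beta / (1.0 - abar_t)
    w_xt = math.sqrt(alpha) * (1.0 - abar_prev) / (1.0 - abar_t)
    out = [w_x0 * c + w_xt * x for c, x in zip(x0_hat, x_t)]
    if noise is not None:  # optional variance term supplied by the caller
        out = [o + n for o, n in zip(out, noise)]
    return out

# Toy check: a noise-free x_t at level abar_t maps to a noise-free x_t' at abar_prev.
abar_t, abar_prev = 0.2, 0.5
x0 = [0.3, -0.7, 1.0]
x_t = [math.sqrt(abar_t) * p for p in x0]
x_prev = denoise_step(x_t, x0, abar_t, abar_prev)
```

Repeating this update over a decreasing list of timesteps refines the estimate gradually instead of committing to a single prediction, which is one plausible reason the iterative result keeps more detail.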

1.5 Diffusion Model Sampling

We can see that although the diffusion model can generate some images, their quality is poor. We will use a technique called Classifier-Free Guidance (CFG) to improve it.

1.6 Classifier-Free Guidance (CFG)

We can see that with CFG the quality of the images is greatly improved, and the content of the images "makes more sense".
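
The CFG combination itself is a one-line extrapolation between two noise estimates. The sketch below assumes the standard form eps = eps_uncond + scale * (eps_cond - eps_uncond); the function name `cfg_noise`, the toy vectors and the scale values are illustrative only and are not taken from the project.

```python
def cfg_noise(eps_uncond, eps_cond, scale):
    """Combine noise estimates: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_u = [0.1, -0.2, 0.3]  # stand-in for the unconditional (empty prompt) estimate
eps_c = [0.3, 0.0, 0.2]   # stand-in for the prompt-conditioned estimate

no_guidance = cfg_noise(eps_u, eps_c, 0.0)   # scale 0: unconditional estimate
conditional = cfg_noise(eps_u, eps_c, 1.0)   # scale 1: conditional estimate
guided = cfg_noise(eps_u, eps_c, 7.0)        # scale > 1: pushed past the conditional
```

With a scale above 1, the estimate moves further in the direction the prompt suggests, trading sample diversity for images that follow the prompt more closely.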

1.7 Image-to-image Translation

Demo 1

Start index=1
Start index=3
Start index=5
Start index=7
Start index=10
Start index=20
Original image

Demo 2

Start index=1
Start index=3
Start index=5
Start index=7
Start index=10
Start index=20
Original image

Demo 3

Start index=1
Start index=3
Start index=5
Start index=7
Start index=10
Start index=20
Original image

1.7.1 Editing Hand-Drawn and Web Images

Demo 1

Start index=1
Start index=3
Start index=5
Start index=7
Start index=10
Start index=20
Original image

Demo 2 (Hand-Drawn Image)

Start index=1
Start index=3
Start index=5
Start index=7
Start index=10
Start index=20
Original image

Demo 3 (Hand-Drawn Image)

Start index=1
Start index=3
Start index=5
Start index=7
Start index=10
Start index=20
Original image

1.7.2 Inpainting

Demo 1

Original Image
Mask
Area to be replaced index=5
Inpaint Image

Demo 2

Original Image
Mask
Area to be replaced index=5
Inpaint Image

Demo 3

Original Image
Mask
Area to be replaced index=5
Inpaint Image
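
The masking step behind these inpainting results can be sketched as follows: after every denoising update, pixels outside the area to be replaced are overwritten with a freshly noised copy of the original image. The function name `inpaint_project`, the convention that mask value 1 marks the area to be replaced, and the toy values are assumptions for illustration.

```python
import math
import random

def inpaint_project(x_t, x_orig, mask, abar_t, rng):
    """Keep generated pixels where mask == 1; elsewhere force the noised original."""
    a, s = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    out = []
    for x, o, m in zip(x_t, x_orig, mask):
        noised_orig = a * o + s * rng.gauss(0.0, 1.0)  # forward process on the original
        out.append(m * x + (1 - m) * noised_orig)
    return out

rng = random.Random(0)
x_orig = [0.1, 0.2, 0.3, 0.4]
x_t = [9.0, 9.0, 9.0, 9.0]  # stand-in for the current sample from the denoising loop
mask = [0, 1, 1, 0]         # 1 = area to be replaced (assumed convention)
# abar_t = 1.0 corresponds to the final, noise-free step.
result = inpaint_project(x_t, x_orig, mask, abar_t=1.0, rng=rng)
```

At the final step the unmasked pixels equal the original image exactly, so only the masked region ends up being newly generated.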

1.7.3 Text-Conditional Image-to-image Translation

Demo 1, Prompt: "a rocket ship"

Start index = 1
Start index = 3
Start index = 5
Start index = 7
Start index = 10
Start index = 20
Original Image

Demo 2, Prompt: "an oil painting of a snowy mountain village"

Start index = 1
Start index = 3
Start index = 5
Start index = 7
Start index = 10
Start index = 20
Original Image

Demo 3, Prompt: "a lithograph of a skull"

Start index = 1
Start index = 3
Start index = 5
Start index = 7
Start index = 10
Start index = 20
Original Image

1.8 Visual Anagrams

Demo 1

"an oil painting of people around a campfire"
"an oil painting of an old man"

Demo 2

"a photo of a dog"
"an oil painting of an old man"

Demo 3

"an oil painting of a snowy mountain village"
"a lithograph of a skull"

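The noise estimate behind a visual anagram can be sketched as an average of two estimates, one for each orientation of the image. This assumes the usual construction in which the second prompt is meant to be seen when the image is flipped upside down; `predict_noise` is a hypothetical stand-in for the prompt-conditioned UNet, and `toy_model` exists only to make the flips visible.

```python
def flip(image):
    """Flip a 2D image (a list of rows) upside down."""
    return image[::-1]

def anagram_noise(x_t, predict_noise, prompt_a, prompt_b):
    """Average the upright estimate for prompt_a with the un-flipped estimate for prompt_b."""
    eps_a = predict_noise(x_t, prompt_a)
    eps_b = flip(predict_noise(flip(x_t), prompt_b))
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(eps_a, eps_b)]

def toy_model(x, prompt):
    # Placeholder "UNet": returns each row's index as the noise value.
    return [[float(i)] * len(row) for i, row in enumerate(x)]

x_t = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
eps = anagram_noise(x_t, toy_model,
                    "an oil painting of people around a campfire",
                    "an oil painting of an old man")
```

Flipping the second estimate back before averaging keeps both estimates in the same pixel coordinates, so a single denoising step serves both orientations.
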
1.9 Hybrid Images

Demo 1:

"a lithograph of waterfalls"
"a lithograph of a skull"

Demo 2:

"an oil painting of an old man"
"a photo of a dog"

Demo 3:

"an oil painting of a snowy mountain village"
"a lithograph of a skull"

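A hybrid image can be sketched as mixing the low frequencies of one prompt's noise estimate with the high frequencies of the other's. The snippet uses a 1D moving average as a stand-in for the 2D Gaussian blur an image pipeline would normally use; `low_pass`, `hybrid_noise` and the toy vectors are assumptions for illustration.

```python
def low_pass(signal, radius=1):
    """Moving-average blur with edge clamping (stand-in for a 2D Gaussian blur)."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def hybrid_noise(eps_far, eps_near, radius=1):
    """Low frequencies of eps_far plus high frequencies of eps_near."""
    low = low_pass(eps_far, radius)
    high = [e - l for e, l in zip(eps_near, low_pass(eps_near, radius))]
    return [l + h for l, h in zip(low, high)]

eps_far = [0.0, 1.0, 4.0, 1.0, 0.0]   # estimate for the prompt seen from far away
eps_near = [3.0, 3.0, 3.0, 3.0, 3.0]  # constant, so it has no high-frequency content
mixed = hybrid_noise(eps_far, eps_near)
```

Since the constant `eps_near` carries no high frequencies, `mixed` reduces to the blurred `eps_far` here; with a real second estimate, its fine detail would be layered on top.
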
Training a Single-Step Denoising UNet

Noising Process

UNet Training Loss Curve

Sampling Result Comparison (Epoch 1 vs Epoch 5)

Out-of-Distribution Result

Training a Diffusion Model: Time-Conditioned UNet

Training Loss Curve

Sampling Results

Time-Conditioned UNet Epoch 1
Time-Conditioned UNet Epoch 5
Time-Conditioned UNet Epoch 10
Time-Conditioned UNet Epoch 15
Time-Conditioned UNet Epoch 20

Training a Diffusion Model: Class-Conditioned UNet

Training Loss Curve

Sampling Results

Class-Conditioned UNet Epoch 1
Class-Conditioned UNet Epoch 5
Class-Conditioned UNet Epoch 10
Class-Conditioned UNet Epoch 15
Class-Conditioned UNet Epoch 20

Bells & Whistles

Course icon generated using a diffusion model and the inpainting technique: keep the logo "UC Berkeley CS 280A" and use the diffusion model to generate the background with a custom prompt: "a lithograph of a bear silhouette with grid pattern and blue and gold background with grid pattern".

Course Icon
logo image
logo mask