aizmin in Tutorial

Creating a Stable Diffusion (Deforum) Video Tutorial

Deforum is a tool for creating animation videos with Stable Diffusion. You only need to provide the text prompts and settings for how the camera moves.

In this article, we will go through the steps of making a Deforum video.

This post is for beginners who have not made a deforum video before. You will learn:

What deforum is.
How to install the deforum extension on AUTOMATIC1111 Stable Diffusion.
The basic settings.
How to create your first deforum video step-by-step.

What is deforum?

Deforum is open-source and free software for making animations. It uses Stable Diffusion’s image-to-image function to generate a series of images and stitches them together to create a video.

It applies small transformations to an image frame and uses the image-to-image function to create the next frame. Since the change between frames is small, it creates the perception of a continuous video.
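The loop described above can be sketched in a few lines of Python. This is only a conceptual illustration, not Deforum's actual code: `animate`, `shift_left`, and `fake_img2img` are hypothetical names, and the "image" is a short list of numbers so the sketch runs without Stable Diffusion.

```python
from typing import Callable, List

Frame = List[float]  # stand-in for an image; a real frame would be a pixel array


def animate(first_frame: Frame,
            transform: Callable[[Frame], Frame],
            img2img: Callable[[Frame, str], Frame],
            prompt: str,
            num_frames: int) -> List[Frame]:
    """Build a frame sequence by repeatedly transforming and re-rendering."""
    frames = [first_frame]
    for _ in range(num_frames - 1):
        warped = transform(frames[-1])          # small zoom, rotation, or shift
        frames.append(img2img(warped, prompt))  # Stable Diffusion would redraw here
    return frames


# Toy stand-ins (hypothetical) so the sketch runs on its own.
def shift_left(frame: Frame) -> Frame:
    return frame[1:] + frame[:1]


def fake_img2img(frame: Frame, prompt: str) -> Frame:
    return list(frame)  # a real model would repaint the frame guided by the prompt


frames = animate([0.0, 1.0, 2.0, 3.0], shift_left, fake_img2img, "a cat", num_frames=4)
print(len(frames))  # 4
```

Each new frame depends only on the previous frame and the prompt, which is why small per-frame changes add up to smooth motion.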

How to install deforum?

You will first install the deforum extension in AUTOMATIC1111 Stable Diffusion WebUI. You can use this GUI on Windows, Mac, or Google Colab.

Use deforum on Google Colab

If you use the Colab notebook in the Quick Start Guide, select the Deforum extension checkbox before starting AUTOMATIC1111.

That’s it!

You should see the Deforum tab in AUTOMATIC1111 GUI after startup.

Installing deforum on Windows or Mac

Follow these instructions to install deforum if you run AUTOMATIC1111 locally on Windows or Mac.

To install an extension in AUTOMATIC1111 Stable Diffusion WebUI:

1. Start AUTOMATIC1111 Web-UI normally.

2. Navigate to the Extensions page.

3. Click the Install from URL tab.

4. Enter the following URL in the URL for extension’s git repository field.

https://github.com/deforum-art/sd-webui-deforum

5. Click the Install button.

6. Wait for the confirmation message that the installation is complete.

7. Restart AUTOMATIC1111. You should see the Deforum tab after restarting the AUTOMATIC1111 GUI.

Generate a test video

This step is optional but will give you an overview of where to find the settings we will use.

Step 1: In the AUTOMATIC1111 GUI, navigate to the Deforum page.

Step 2: Navigate to the Keyframes tab.

You will see a Motion tab on the bottom half of the page. This is where you will set the camera parameters.
Max frames is the number of frames in your video. A higher value makes the video longer.

You can use the default values.

Step 3: Navigate to the Prompts tab. You will see a list of prompts with a number in front of each of them. The number is the frame that the prompt becomes effective.

With the default prompts, it will use the first prompt at the beginning of the video. It will then switch to the second prompt at the 30th frame, the third prompt at the 60th frame, and the fourth prompt at the 90th frame.
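To make the switching rule concrete, here is a minimal sketch of how the prompt for a given frame could be looked up. The function name `active_prompt` and the placeholder prompt texts are made up for illustration; only the frame numbers 0, 30, 60, and 90 come from the description above.

```python
from bisect import bisect_right


def active_prompt(prompts: dict, frame: int) -> str:
    """Return the prompt whose start frame is the latest one at or before `frame`."""
    starts = sorted(prompts)
    index = bisect_right(starts, frame) - 1
    if index < 0:
        raise ValueError("no prompt defined at or before this frame")
    return prompts[starts[index]]


# Placeholder texts; the real default prompts are longer.
prompts = {0: "first prompt", 30: "second prompt", 60: "third prompt", 90: "fourth prompt"}

for frame in (0, 29, 30, 75, 119):
    print(frame, active_prompt(prompts, frame))
```

Frame 29 still uses the first prompt, and frame 30 is the first frame that uses the second one.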

You can leave the prompts as they are.

Step 4: Click on Generate to start generating a video.

Step 5: When it is done, click on the button above the Generate button to see the video.

You can save the video to your local storage by clicking the three vertical dots in the bottom right corner. Or you can find your video in the output directory under the img2img-images folder.

Basic settings (with examples)

We will first go through the two most important settings:

Motions (2D and 3D)
Prompts

It’s important to understand what Deforum can do before going through the step-by-step examples for creating videos.

In this section, you will see examples of changing one parameter while keeping everything else fixed. These are the building blocks of your video.

By combining them and turning them on and off at different times, you can create stunning visual effects.

Motion settings

Motion settings are some of the most used options in Deforum. You can make a decent video by simply changing them and the prompts. So you should have a good grasp of how motion settings work and what they can do.

Let’s cover the two most used animation modes:

2D – treat the images as 2D and perform various transformations like zoom and rotation to create an illusion of motion.
3D – treat the images as a view of a 3D scene. You can move and rotate the camera in any direction.

2D motion settings

2D Zoom

Use the zoom function to zoom in or out of the image. Use a zoom value larger than 1 to zoom in and less than 1 to zoom out.

The further away the value is from 1, the faster the zoom is.
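A rough way to see why values close to 1 are enough: since each frame is generated from the previous one, the per-frame zoom compounds. The sketch below assumes the zoom factor simply multiplies from frame to frame and ignores whatever image-to-image redraws; `cumulative_zoom` is a hypothetical helper, not a Deforum function.

```python
def cumulative_zoom(zoom_per_frame: float, num_frames: int) -> float:
    """Total magnification after applying the same zoom on every frame."""
    return zoom_per_frame ** num_frames


for zoom in (1.01, 1.02, 0.98):
    print(zoom, round(cumulative_zoom(zoom, 100), 2))
```

Under this assumption, a zoom of 1.02 held for 100 frames magnifies the starting image roughly sevenfold, while 0.98 shrinks it to a fraction of its size.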

By default, the zoom is focused at the center. You can control the focus by setting Transform Center X and Transform Center Y. We will cover them further below.

2D Angle

Use a 2D Angle to rotate the images. A positive value rotates the image counterclockwise, and a negative value rotates the image clockwise.

A larger value rotates the image faster.

By default, the rotation is around the center of the image. You can control the center of rotation by setting Transform Center X and Transform Center Y. We will cover them further below.

2D Translation X

Use Translation X to move the image sideways. Use a positive value to move the image to the right and a negative value to move the image to the left.

2D Translation Y

Use Translation Y to move the image up and down. Use a positive value to move the image down and a negative value to move the image up.

2D translation Y: 5

2D translation Y: -5

2D Transform Center

Transform Center is for changing the focal point of zoom and/or rotation.

The default value is 0.5 for both X and Y, which is the center of the image. (X, Y) = (0, 0) is the top left corner, and (1, 1) is the bottom right corner. See the following diagram for other common locations.

You can specify values less than 0 or larger than 1. They will be outside of the image.
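As a small illustration of these normalized coordinates, the sketch below converts a Transform Center value into pixel coordinates for a given image size. The helper name `transform_center_to_pixels` and the 512×512 size are assumptions for the example.

```python
def transform_center_to_pixels(center_x: float, center_y: float,
                               width: int, height: int) -> tuple:
    """Map normalized (0..1) coordinates to pixels; (0, 0) is the top left corner."""
    return (center_x * width, center_y * height)


print(transform_center_to_pixels(0.5, 0.5, 512, 512))   # (256.0, 256.0), the center
print(transform_center_to_pixels(1, 1, 512, 512))       # (512, 512), bottom right
print(transform_center_to_pixels(-0.5, 0.5, 512, 512))  # (-256.0, 256.0), left of the image
```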

Below are two examples of zooming in at the top left corner (0, 0) and the bottom right (1, 1).

Transform Center (0,0) with zoom

Transform Center (1,1) with zoom

2D Perspective flip

Perspective flip performs 3D-like transformations to the image to create some cool effects.

You will need to select Enable perspective flip to enable these options.

theta: 12

phi: 12

gamma: 12

3D motion settings

3D motion is an alternative to 2D motion. Think of it as holding a camera that you can move and rotate any way you want.

3D Translation X

Translation X moves the camera sideways. A positive value moves the camera to the right. A negative value moves the camera to the left.

3D translation X: 2

3D Translation Y

Translation Y moves the camera up and down. Using a positive value moves the camera up. A negative value moves the camera down.

3D translation Y: 2

3D Translation Z

Translation Z in 3D is similar to zoom in 2D motions.

3D translation Z: 2

3D rotation X

Rotation X rotates the camera about the X-axis.

3D rotation X: 2

3D rotation Y

Rotation Y rotates the camera about the Y-axis.

3D rotation Y: 2

3D rotation Z

Rotation Z rotates the camera about the Z-axis.

3D rotation Z: 2

Motion schedule

Motion settings are entered in the form

frame1:(value1), frame2:(value2), frame3:(value3), ...

Each entry consists of two numbers: the frame number at which it takes effect and the motion’s value. The frame and value of each entry have to be separated by a colon, and the value has to be enclosed in parentheses.

You always need an entry for frame 0.

You can have as many entries as you want.
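The format rules above can be checked with a small parser. This is a sketch written for this article, not Deforum's own parser: `parse_schedule` is a hypothetical name, and it only accepts plain numbers inside the parentheses.

```python
import re

# One entry looks like "100:(1.02)": a frame number, a colon, a value in parentheses.
ENTRY = re.compile(r"\s*(\d+)\s*:\s*\(\s*(-?\d+(?:\.\d+)?)\s*\)\s*")


def parse_schedule(text: str) -> list:
    """Turn 'frame:(value), ...' into a sorted list of (frame, value) pairs."""
    keyframes = []
    for part in text.split(","):
        match = ENTRY.fullmatch(part)
        if match is None:
            raise ValueError(f"bad entry: {part!r}")
        keyframes.append((int(match.group(1)), float(match.group(2))))
    keyframes.sort()
    if keyframes[0][0] != 0:
        raise ValueError("schedule needs an entry for frame 0")
    return keyframes


print(parse_schedule("0:(1), 100:(1.02), 200:(1)"))
# [(0, 1.0), (100, 1.02), (200, 1.0)]
```

A schedule such as `180:(5)` is rejected because it has no entry for frame 0.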

It’s important to note that when you have two or more entries, the value is interpolated between consecutive entries.

For example, the following formula used in zoom means gradually increasing the zoom value from 1 to 1.02 over the first 100 frames and decreasing the zoom value back to 1 over the next 100 frames.

0:(1), 100:(1.02), 200:(1)
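Here is a minimal sketch of that interpolation, assuming it is linear between neighboring entries and that the last value is held after the final entry. `value_at` is a hypothetical helper, and the schedule is written as (frame, value) pairs rather than as the text form.

```python
def value_at(keyframes: list, frame: int) -> float:
    """Linearly interpolate between the keyframes surrounding `frame`."""
    if frame <= keyframes[0][0]:
        return keyframes[0][1]
    for (f0, v0), (f1, v1) in zip(keyframes, keyframes[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)  # 0 at the first keyframe, 1 at the second
            return v0 + t * (v1 - v0)
    return keyframes[-1][1]  # assumption: hold the last value after the final entry


zoom = [(0, 1.0), (100, 1.02), (200, 1.0)]  # same as 0:(1), 100:(1.02), 200:(1)
for frame in (0, 50, 100, 150, 200):
    print(frame, round(value_at(zoom, frame), 4))
```

Halfway to frame 100, at frame 50, the zoom is halfway between 1 and 1.02.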

If you want a new zoom value to take effect starting the 100th frame, you can write something like:

0:(1), 99:(1), 100:(1.02), 150:(1.02), 151:(1), 200:(1)

This formula will apply the zoom effect only between frames 100 and 150.
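Writing these "hold" schedules by hand is error-prone, so a small helper can generate them. The sketch below is a hypothetical convenience function, not part of Deforum; it pins the base value on the frame just before and just after the range so that interpolation has no room to ramp.

```python
def hold_schedule(base: float, value: float, start: int, end: int, last_frame: int) -> str:
    """Build a schedule that uses `value` only on frames start..end and `base` elsewhere."""
    if not 1 < start <= end < last_frame - 1:
        raise ValueError("need 1 < start <= end < last_frame - 1")
    points = [(0, base), (start - 1, base), (start, value),
              (end, value), (end + 1, base), (last_frame, base)]
    return ", ".join(f"{frame}:({val:g})" for frame, val in points)


print(hold_schedule(1, 1.02, start=100, end=150, last_frame=200))
# 0:(1), 99:(1), 100:(1.02), 150:(1.02), 151:(1), 200:(1)
```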

Each setting has its own motion schedule.

Zoom: 0:(1)

Angle: 0:(0)

Transform Center X: 0:(0.5)

Transform Center Y: 0:(1)

Translate X: 0:(0)

Translate Y: 0:(5), 60:(0)

Tip: You cannot just write 180:(5). Write 0:(0), 180:(5) instead. The first entry has to be for the 0th frame.
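If you build schedules in a script, the frame 0 rule is easy to enforce automatically. The helper below is a hypothetical sketch that assumes the entries are already in frame order and prepends a frame 0 entry when it is missing.

```python
def ensure_frame_zero(schedule: str, default: float = 0) -> str:
    """Prepend '0:(default)' when the schedule does not start at frame 0."""
    first_frame = schedule.split(":", 1)[0].strip()
    if first_frame == "0":
        return schedule
    return f"0:({default:g}), {schedule}"


print(ensure_frame_zero("180:(5)"))         # 0:(0), 180:(5)
print(ensure_frame_zero("0:(5), 60:(0)"))   # 0:(5), 60:(0)
```

Keep in mind that with `0:(0), 180:(5)` the value ramps up gradually from frame 0 to frame 180 because of interpolation.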

A step-by-step example

Step 1: Generate an initial image

The initial image is one of the few things you have total control over in a deforum video. It is arguably the most important because it sets the tone and color for the rest of the animation.

Take your time to generate a good starting image in the txt2img tab.

In this example, I used the following prompt.

portrait of henry cavill as james bond, casino, key art, sprinting, palm trees, highly detailed, digital painting, artstation, concept art, cinematic lighting, sharp focus, illustration, by gaston bussiere alphonse mucha

And this negative prompt.

deformed, disfigured

Set the seed to random (-1).

I used the Protogen v2.2 model to bring out a photorealistic illustration style.

Note down the seed value (highlighted in the screenshot above) once you see an image you like.

Step 2: Generate the first segment of the video

Enter the prompts in the Prompts tab. I decided to reuse the second of the default prompts. The prompts are:

{
    "0": "portrait of henry cavill as james bond, casino, key art, sprinting, palm trees, highly detailed, digital painting, artstation, concept art, cinematic lighting, sharp focus, illustration, by gaston bussiere alphonse mucha --neg deformed, disfigured",
    "50": "anthropomorphic clean cat, surrounded by fractals, epic angle and pose, symmetrical, 3d, depth of field, ruan jia and fenghua zhong"
}

The prompt switches to describing a cat at the 50th frame.
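The prompt box takes JSON keyed by frame number, with the negative prompt appended after `--neg`. The sketch below shows one way to read such a block and separate the two parts. It is an illustration only: `split_prompt` is a hypothetical helper, and the prompt texts are shortened versions of the ones above.

```python
import json


def split_prompt(text: str) -> tuple:
    """Split 'positive --neg negative' into its two parts (negative may be empty)."""
    positive, _, negative = text.partition("--neg")
    return positive.strip(), negative.strip()


# Shortened versions of the prompts used in this step.
prompts_json = """
{
    "0": "portrait of henry cavill as james bond, casino --neg deformed, disfigured",
    "50": "anthropomorphic clean cat, surrounded by fractals"
}
"""

prompts = json.loads(prompts_json)
for frame in sorted(prompts, key=int):
    positive, negative = split_prompt(prompts[frame])
    print(frame, "|", positive, "|", negative)
```

Running the prompts through `json.loads` before pasting them is also a quick way to catch a missing comma or quote.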

Now go to the Run tab.

Select the Protogen model.
Set the seed to 2020548858. Fixing the seed lets you start with the same image every time so you can keep building on the same video.

Since James Bond is facing left in the initial image, it is nice to have the camera moving right. We will use the 3D animation mode.

In the Keyframes tab,

Select the 3D Animation mode.
Set Max frames to 100. This generates enough frames to see the first two prompts.

In the Motion tab down below, set:

Translation X to 0:(2). This is for moving the camera to the right.
Translation Z to 0:(1.75). This zooms in at a slightly slower rate.

Keep the rest 0:(0) for doing nothing.

Press Generate to start making the video.

This is the video so far.

The camera is moving in the way we expected. James Bond transitioned to a fractal cat nicely.

So far so good.

Step 3: Add the next prompt

Now brainstorm the next prompt in the txt2img tab.

I decided it would be nice to transition to a space scene. This is the final deforum prompt.

{
    "0": "portrait of henry cavill as james bond, casino, key art, sprinting, palm trees, highly detailed, digital painting, artstation, concept art, cinematic lighting, sharp focus, illustration, by gaston bussiere alphonse mucha --neg deformed, disfigured",
    "50": "anthropomorphic clean cat, surrounded by fractals, epic angle and pose, symmetrical, 3d, depth of field, ruan jia and fenghua zhong",
    "90": "giant floating space station, futuristic, star war style, highly detailed, beautiful machine aesthetic, in space, galaxies, dark deep space <lora:epiNoiseoffset_v2:1> --neg bad art, amateur"
}

(I used the epi_noiseoffset LoRA model modifier in the third prompt. See the LoRA tutorial for details.)

Set the following parameters:

Max frames to 250.
Rotation 3D X to 0:(0), 70:(0), 71:(0.5). This adds a change of rotation at frame 71.
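With Max frames at 250 and prompts starting at frames 0, 50, and 90, it helps to see how many frames each prompt gets. The sketch below assumes frames are numbered from 0 to Max frames minus 1; `prompt_ranges` is a hypothetical helper.

```python
def prompt_ranges(start_frames: list, max_frames: int) -> list:
    """Return (first_frame, last_frame) for each prompt, in order of appearance."""
    starts = sorted(start_frames)
    ends = [next_start - 1 for next_start in starts[1:]] + [max_frames - 1]
    return list(zip(starts, ends))


print(prompt_ranges([0, 50, 90], 250))
# [(0, 49), (50, 89), (90, 249)]
```

The space station prompt covers most of the video, which is why Max frames was raised from 100 to 250.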

The rest of the settings are kept the same. Below are the final motion settings.

Press Generate.

We get the final video.

You will typically spend a lot of time messing with the motion and prompts to achieve the exact effect you want.

You can repeat this step and add as many prompts as you want.

Tips

Prompts with a large subject work better than scenes with many small objects.
The small details will frequently change. This is just how image-to-image works. So prompts with patterns (like a fractal) or imaginative subjects tend to work better as the second and later prompts.
If you see artifacts during a prompt transition, shifting the frame of the prompt by a few frames may eliminate the artifact.
Make an animated GIF by using the Output option Make GIF.
Use the Delete Imgs option in Output options to delete the intermediate images automatically and only keep the video.
Add a soundtrack by using the output option Add a soundtrack.

Useful Resources

FizzleDorf’s Animation Guide – Deforum – A comprehensive guide on parameters and settings.
Animation Video Examples Gallery – Video examples of some parameters.
Official deforum site
Quick Guide to deforum – Mostly about the deforum Colab Notebook but you will also find explanations of parameters.
Deforum Discord – A good page to ask for help and see what others are making.
Create Amazing Videos With AI (Deforum Deep-Dive) – The creator of Deforum
