Transforming videos into animation has never been easier with Stable Diffusion AI. You will find step-by-step guides for 6 video-to-video techniques in this article. Best of all, you can run them for FREE on your local machine!
- ControlNet-M2M script
- ControlNet img2img
- Mov2mov extension
- SD-CN Animation extension
- Temporal Kit
- AnimateDiff prompt travel
They all use a similar approach of transforming each video frame individually with the help of ControlNet.
At the end of the article, I will survey other video-to-video methods for Stable Diffusion.
Now you will see a sample video from each method. Below is the original video.
(Download this video here if you wish to use it in the tutorial.)
Below is an example of using method 1: ControlNet-M2M script method.
Below is an example of using method 2: ControlNet img2img.
Below is an example of using method 3: mov2mov extension.
Below is an example of using method 4: SD-CN Animation extension.
Here’s an example of Method 5: Temporal Kit.
Here’s an example of Method 6: AnimateDiff Prompt Travel.
Software
You will need AUTOMATIC1111 Stable Diffusion GUI. You can use this GUI on Windows, Mac, or Google Colab.
You will need to install the ControlNet extension. Follow the installation instructions in the ControlNet article.
Method 1: ControlNet m2m script
This video-to-video method is simpler to use but produces more flickering.
We will use the following video.
Use the following button to download the video if you wish to follow with the same video.
Step 1: Update A1111 settings
Before using the controlnet m2m script in AUTOMATIC1111, you must go to Settings > ControlNet. Select the following two options.
- Do not append detectmap to output: Yes
- Allow other script to control this extension: Yes
The first option disables saving the control image to the image output folder, so you can grab the frame images more easily.
The second setting lets the controlnet m2m script feed the video frames to the ControlNet extension.
Click Apply Settings. Reload the Web-UI page.
Step 2: Upload the video to ControlNet-M2M
In AUTOMATIC1111 Web-UI, navigate to the txt2img page.
In the Script dropdown menu, select the ControlNet m2m script.
Expand the ControlNet-M2M section.
Upload the mp4 video to the ControlNet-0 tab.
Step 3: Enter ControlNet settings
Expand the ControlNet section. Enter the following settings:
Enable: Yes
Pixel Perfect: Yes
Control Type: Lineart
Preprocessor: lineart realistic
Model: control_xxxx_lineart
Control weight: 0.6
For your own videos, you will want to experiment with different control types and preprocessors.
Step 4: Enter txt2img settings
Select a model you wish to use in the Stable Diffusion checkpoint at the top of the page. I will use deliberate v2.
Come up with a prompt and a negative prompt. I will use the following:
photo of Sci fi cute girl, pink hair, photorealistic, in the style of franciszek starowieyski, white porcelain sci fi, mecha, 32k uhd, machine aesthetics, dark white and azure, hans zatzka, silver and pink, science fiction city,shiny pink hair, half body, oil painting, white background
deformed, disfigured, ugly
Enter the following generation parameters:
Sampling method: Euler a
Sampling steps: 20
Width: 768
Height: 512
CFG Scale: 7
Seed: 100
The seed value needs to be fixed to reduce flickering. Changing the seed will change the background and the look of the character.
Click Generate.
Step 5: Make an animated GIF or mp4 video
The script converts the image with ControlNet frame-by-frame. You will find a series of png files in the txt2img output folder.
You have two options: (1) Combine the PNG files into an animated GIF, and (2) make an mp4 video.
Animated GIF
Use the EZGIF page to convert the png files to an animated GIF.
In the GIF option, set the Delay time to 5 (in 1/100 seconds) for 20 frames per second.
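The delay value follows from the frame rate: GIF delays are given in 1/100 seconds, so the delay is 100 divided by the frames per second. Here is a minimal sketch of that arithmetic. The function names are my own and are not part of EZGIF.

```python
def gif_delay_centiseconds(fps):
    """Return the per-frame GIF delay (in 1/100 s) closest to the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Delays are whole numbers, so round to the nearest 1/100 second.
    return max(1, round(100 / fps))


def effective_fps(delay_centiseconds):
    """Return the frame rate you actually get for a given delay value."""
    return 100 / delay_centiseconds
```

For example, gif_delay_centiseconds(20) gives 5 and gif_delay_centiseconds(10) gives 10, matching the values used in this article. Frame rates that do not divide 100 evenly get rounded, so check effective_fps if the speed looks off.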
Here’s the final animated GIF.
MP4 video
Use the following command to convert the png files to an MP4 video. (You will need to have ffmpeg installed on your local PC)
ffmpeg -framerate 20 -pattern_type glob -i '*.png' -c:v libx264 -pix_fmt yuv420p out.mp4
A reader pointed out that the above command didn’t work on Windows and suggested the command below instead.
ffmpeg -framerate 20 -pattern_type sequence -start_number 00000 -i "%05d-100.png" -c:v libx264 -pix_fmt yuv420p out.mp4
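If neither command suits your file names, one workaround is to copy the frames to plain sequential names first, so the same numbered pattern works on any platform. The sketch below is an assumption-laden example, not the method used to make the videos here: stage_frames and build_ffmpeg_command are hypothetical helpers, and the frames are assumed to sort correctly by file name (true for zero-padded names).

```python
import shutil
from pathlib import Path


def stage_frames(src_dir, staged_dir):
    """Copy PNG frames to staged_dir as 00000.png, 00001.png, ... in sorted name order.

    staged_dir should be a different folder from src_dir.
    Returns the number of frames copied.
    """
    src = Path(src_dir)
    dst = Path(staged_dir)
    dst.mkdir(parents=True, exist_ok=True)
    frames = sorted(src.glob("*.png"))
    for index, frame in enumerate(frames):
        shutil.copyfile(frame, dst / f"{index:05d}.png")
    return len(frames)


def build_ffmpeg_command(staged_dir, fps, output="out.mp4"):
    """Build the ffmpeg argument list for a numbered PNG sequence starting at 0."""
    pattern = str(Path(staged_dir) / "%05d.png")
    return [
        "ffmpeg",
        "-framerate", str(fps),
        "-start_number", "0",
        "-i", pattern,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        output,
    ]

# To actually encode, pass the list to subprocess.run(..., check=True).
# This requires ffmpeg to be installed and on your PATH.
```

Passing the arguments as a list avoids the shell quoting differences between Windows and Mac/Linux that caused the problem above.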
Notes for ControlNet m2m script
Unfortunately, as of the time of writing, multiple ControlNets do NOT work with the m2m script. As we will see later, using multiple ControlNets is a useful technique for reducing flickering. Hopefully, future updates will fix that.
Here’s another video transformed with ControlNet Line art realistic.
Experiment with different ControlNets to get different results. Here’s a video with the Tile resample.
Method 2: ControlNet img2img
This video-to-video method converts a video to a series of images and then uses Stable Diffusion img2img with ControlNet to transform each frame.
Use the following button to download the video if you wish to follow with the same video.
Step 1: Convert the mp4 video to png files
You can use the ezgif site to convert the mp4 video to png image files.
Upload the mp4 video file.
Use the following settings.
- Size: Original
- Frame rate: 10 fps
Click Convert to PNG!
Click Download frames as ZIP.
Extract the ZIP file to a folder of png image files.
Rename the folder to video. Now you should have a folder called video containing the png files.
Alternatively, below are the commands if you prefer to use the command line to convert the mp4 video to png files.
mkdir video
ffmpeg -i girl_dance.mp4 -r 10 video/%05d.png
Step 2: Enter Img2img settings
You will need image-to-image and ControlNet settings to apply to each frame.
Now open AUTOMATIC1111.
We need some special settings for ControlNet. Go to the Settings page.
Click Show all pages on the left panel.
Search the text (Ctrl+F for Windows. Cmd+F for Mac) “Quicksettings list”. Add “initial_noise_multiplier” and “img2img_color_correction” to the list.
It should look like this:
Go to the top of the page. Click Apply settings and then Reload UI.
You should see two new settings: Noise multiplier for img2img and Apply color correction… on top of the page.
Set Noise multiplier for img2img to 0.5. This scaling factor is applied to the random latent tensor for img2img. Lowering it reduces flickering.
Check the Apply color correction… setting. This option matches the colors to the original content, which helps color consistency across frames.
Now one more setting… Go to Settings > ControlNet. Select the following option.
- Do not append detectmap to output: Yes
Click Apply Settings. Reload the Web-UI.
Go to the img2img page.
In the Stable Diffusion checkpoint at the top of the page, select a model you wish to use. I will use deliberate v2.
Come up with a prompt and a negative prompt. I will use:
photo of Sci fi cute girl, pink hair, photorealistic, in the style of franciszek starowieyski, white porcelain sci fi, mecha, 32k uhd, machine aesthetics, dark white and azure, hans zatzka, silver and pink, science fiction city,shiny pink hair, half body, oil painting
deformed, disfigured, ugly
Upload one of the frames (i.e. the png files) to the img2img canvas.
Resize mode: Just resize
Sampling method: DPM++ 2M Karras
Sampling Steps: 20
Width: 908 (This is set to maintain the aspect ratio of the video)
Height: 512 (The shorter dimension is fixed to 512)
CFG scale: 20 (Experiment with this. The higher you set, the more it follows the prompt.)
Denoising strength: 0.4 (Experiment with this. The higher you set, the more changes but also more flickering)
Seed: -1 (random)
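The width and height above come from fixing the shorter side at 512 and scaling the other side to keep the aspect ratio. Here is a minimal sketch of that calculation for your own videos. The function name is hypothetical, and rounding to a multiple of 8 is my assumption (it is a common choice for Stable Diffusion image sizes), so the result may differ by a few pixels from a hand-picked value like the 908 used here.

```python
def fit_to_short_side(width, height, short_side=512, multiple=8):
    """Scale (width, height) so the shorter side is about short_side.

    Both sides are snapped to the nearest multiple of `multiple`
    (an assumption, not a requirement stated in this article).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = short_side / min(width, height)

    def snap(value):
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)
```

For a hypothetical 1920×1080 source video, fit_to_short_side(1920, 1080) returns (912, 512). For a portrait 1080×1920 video, it returns (512, 912).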
Step 3: Enter ControlNet settings
Now go to the ControlNet section…
Upload the same frame to the image canvas.
Enable: Yes
Pixel Perfect: Yes
Allow Preview: Yes
Control Type: Lineart
Preprocessor: Lineart Realistic
Model: control_xxxx_lineart
(Experiment with the control type, preprocessor, and model. Many of them will work just as well. The goal is to see details like the eyes, mouth, and hairstyle outlined in the preview.)
Control Weight: 0.6 (Lower it when you see color artifacts)
Step 4: Choose a seed
Press Generate to test the setting.
Select the image on the left panel once you are happy with the effect.
To fix the seed, click the recycle icon next to the Seed value box. You should see the value change from -1 to a positive number.
Step 5: Batch img2img with ControlNet
Now with all the hard work, you have generated one frame… The goal is to apply the same setting to ALL frames. Luckily, you can do that with batch processing.
First, remove the reference image in the ControlNet section. This step is important. Otherwise, you will be using this reference image for all frames! Click the cross icon on the top right to remove the image.
Confirm you see the reference image removed, like the screenshot below.
Keep the rest of the ControlNet settings untouched.
Now switch to the Batch tab on the Img2img page.
Enter the paths of the following:
- Input directory: The folder containing the PNG files of your video.
- Output directory: A new folder for your processed PNG files.
In Windows, the input directory is the folder location of the PNG files in File Explorer.
If you use Google Colab, copy the PNG files to your Google Drive and specify a path. You can find the path by right-clicking a folder in the file explorer on the left.
Click Generate to start the generation process.
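Batch runs on long videos can stop partway, so it helps to check that every input frame has a processed counterpart before making the video. Below is a minimal sketch of such a check. It assumes the output files keep the same base name as the input files, which you should confirm against your own output folder. missing_outputs is a hypothetical helper, not part of AUTOMATIC1111.

```python
from pathlib import Path


def missing_outputs(input_dir, output_dir):
    """List input PNG names that have no PNG with the same base name in output_dir."""
    done = {path.stem for path in Path(output_dir).glob("*.png")}
    return sorted(
        path.name
        for path in Path(input_dir).glob("*.png")
        if path.stem not in done
    )
```

An empty list means all frames were processed. Otherwise, the returned names tell you which frames to rerun.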
Step 6: Convert the output PNG files to video or animated gif
Animated GIF
Use the EZGIF page to convert the png files to an animated GIF.
In the GIF option, set the Delay time to 10 (in 1/100 seconds) for 10 frames per second.
Here’s the video-to-video result (Denoising strength 0.5):
Increasing the denoising strength to 0.7 changes the video more but also increases flickering.
MP4 video
Use the following command to convert the png files to an MP4 video.
ffmpeg -framerate 10 -pattern_type glob -i '*.png' -c:v libx264 -pix_fmt yuv420p out.mp4
Note on ControlNet img2img
This is probably the most laborious out of all video-to-video methods. The reason you want to use it is simple: To gain total control of the process.
You will see quite a few YouTubers advocate this method. I recommend Enigmatic_e's videos to learn more about this method (and, generally, video-making with Stable Diffusion). This video from Corridor Crew walks you through a laborious method that produces high-quality Stable Diffusion videos.
Method 3: Mov2mov extension
The Mov2mov extension automates many of the manual steps of video-to-video tasks.
Use the following button to download the video if you wish to follow with the same video.
Step 1: Install Mov2mov extension
In AUTOMATIC1111 Web-UI, navigate to the Extensions page.
Select the Install from URL tab.
In the URL for extension’s git repository field, enter
https://github.com/Scholar01/sd-webui-mov2mov
Click Install.
Completely close and restart the Web-UI.
Step 2: Enter mov2mov settings
You should see a new page called mov2mov.
Select a Stable Diffusion checkpoint in the dropdown menu at the page top. I used Deliberate v2.
Enter the prompt and the negative prompt.
photo of Sci fi cute girl, pink hair, photorealistic, in the style of franciszek starowieyski, white porcelain sci fi, mecha, 32k uhd, machine aesthetics, dark white and azure, hans zatzka, silver and pink, science fiction city,shiny pink hair, half body, oil painting, white background
deformed, disfigured, ugly
Upload the video by dropping it to the video canvas.
Resize mode: Crop and resize.
Set the width to 768 and the height to 512 for a landscape movie. (Adjust accordingly for your own video.)
The extension has a nice slider for noise multiplier. Keep it at 0 to reduce flickering.
Adjust the CFG scale to control how much the prompt should be followed. (7 in this video)
Adjust the denoising strength to control how much the video should be changed. (0.75 in this video)
The Max frame is the total number of frames to be generated. Set to a low number e.g. 10 for initial testing. Set to -1 to generate a full-length video.
The seed determines the seed value of the FIRST frame. All frames will use the same seed value even if you set the seed to -1 (random).
Step 3: Enter ControlNet settings
Enter the following settings for ControlNet.
Enable: Yes
Pixel Perfect: Yes
Control Type: Lineart
Preprocessor: lineart_realistic
Model: control_xxxx_lineart (See installation instructions)
Control weight: 0.6
Important: Don’t upload a reference image. Mov2mov will use the current frame for the reference image.
Step 4: Generate the video
Click Generate to start generating the video.
It will take a while… When it is done, your new video will appear on the right.
Click Save to save the video.
Go to the output/mov2mov-videos folder to find the video if it doesn’t show up.
Try a different Video Mode if there is an error.
If the video generation fails, make the video yourself from the image series. They are in the folder output/mov2mov-images. Follow this step to convert the images to a video.
Here’s the final video from Mov2mov.
Note for mov2mov
For some reason, deterministic samplers (e.g. Euler, LMS, DPM++2M Karras…) do NOT work well with this extension. Otherwise, it would be a good way to reduce flickering.
Method 4: SD-CN-Animation
SD-CN-Animation is an AUTOMATIC1111 extension that provides a convenient way to perform video-to-video tasks using Stable Diffusion.
SD-CN-Animation uses an optical flow model (RAFT) to make the animation smoother. The model tracks the movements of the pixels and creates a mask for generating the next frame.
Note that this extension does not work for all videos. For example, it produces poor results with the video used in the previous 3 methods. Presumably, it is because of its dark background.
So I switched to another video for this walkthrough.
Download this video here if you want to use it to follow this tutorial.
Step 1: Installing the extension
In AUTOMATIC1111 Web-UI, navigate to the Extensions page.
Select the Install from URL tab.
In the URL for extension’s git repository field, enter
https://github.com/volotat/SD-CN-Animation
Click Install.
Completely close and restart the Web-UI.
Step 2: Enter SD-CN-Animation parameters
In AUTOMATIC1111 Web-UI, navigate to the SD-CN-Animation page.
Make sure “Apply color correction to img2img results to match original colors.” is NOT selected. (You may have enabled this option when testing a previous method.) This color correction affects the RAFT model and produces poor results.
Upload the mp4 video file to the Input video section.
Set the width to 512. Set the height to 512. (Adjust accordingly for your video.)
Set the Prompt to
photo of Sci fi cute girl, pink hair, photorealistic, in the style of franciszek starowieyski, white porcelain sci fi, mecha, 32k uhd, machine aesthetics, dark white and azure, hans zatzka, silver and pink, science fiction city,shiny pink hair, half body, oil painting, white background
Set the Negative Prompt to
deformed, disfigured, ugly
Set the sampling method to DPM++2M Karras.
Step 3: Enter ControlNet Settings
We will use 2 ControlNets. If you don’t see multiple ControlNet tabs, go to Settings > ControlNet to enable them.
For ControlNet Unit 0:
- Enable: Yes
- Pixel Perfect: Yes
- Control Type: Line Art
- Preprocessor: Line art realistic
- Model: control_xxxx_lineart
- Control weight: 0.6
DON’T upload an image.
Leave all other settings as the default.
For ControlNet Unit 1:
- Enable: Yes
- Pixel Perfect: Yes
- Control Type: Tile
- Preprocessor: Tile resample
- Model: control_xxxx_tile
- Control weight: 0.5
DON’T upload an image.
Leave all other settings as the default.
Step 4: Generate the video
Click Generate to start processing.
Once it is done, right-click on the video, and you will find an option to save it.
Here’s what I got.
Notes for SD-CN-Animation
I like the fact that this extension is quite polished. Things work without error. (My expectation in software engineering is low when using A1111…)
Make sure you have unchecked “Apply Color correction…” for img2img. Otherwise, you won’t get the best results.
All samplers work in this extension. Make sure to pick a deterministic sampler to reduce flickering. (See comments on flickering below)
Method 5: Temporal Kit
Temporal Kit implements several methods for video-to-video conversion. I’m only going to tell you the best one here.
The basic idea is to pick keyframes across the video (e.g. 16), stylize them with image-to-image, and use them as references to paint adjacent frames.
This method was pioneered by EbSynth, a computer program for painting videos. It was created before Stable Diffusion, but img2img capability in Stable Diffusion has given it a new life.
However, the result will be poor if you do image-to-image on individual frames. The reason is that the resulting images lack coherence.
The trick is to transform ALL keyframes at once by stitching them together in one giant sheet. Like this:
We used to do it manually. But with Temporal Kit, you don’t have to.
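To make the stitching idea concrete, here is a minimal sketch of where each keyframe lands on the sheet. It only computes the boxes and does not touch any images. The row-by-row ordering (left to right, top to bottom) is my assumption for illustration and is not taken from Temporal Kit's code. sheet_boxes is a hypothetical helper.

```python
def sheet_boxes(side, cell_width, cell_height):
    """Return one (left, top, right, bottom) box per keyframe on a side x side sheet.

    Keyframes are laid out row by row, which is an assumption for illustration.
    """
    boxes = []
    for index in range(side * side):
        row, col = divmod(index, side)
        left = col * cell_width
        top = row * cell_height
        boxes.append((left, top, left + cell_width, top + cell_height))
    return boxes
```

With side 4 and 512×512 keyframes, you get 16 boxes on a 2048×2048 sheet. The same boxes work in both directions: pasting keyframes onto the sheet before img2img, and cutting the stylized sheet back into keyframes afterward. The tuples use the (left, upper, right, lower) convention that image libraries such as Pillow accept for cropping.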
Step 1: Install Temporal Kit extension
In AUTOMATIC1111 Web-UI, navigate to the Extensions page.
Select the Install from URL tab.
In the URL for extension’s git repository field, enter
https://github.com/CiaraStrawberry/TemporalKit
Click Install.
Completely close and restart the Web-UI.
Step 2: Install FFmpeg
Visit FFmpeg’s download page and download the FFmpeg program for your platform.
It should be a zip file. After unzipping, you should see a file called ffmpeg or ffmpeg.exe. This is the FFmpeg program you need!
But to let Temporal Kit use it, you need to put it in the PATH so that it can be accessed anywhere, by everyone.
If you know what PATH means, put it in one of the directories in the PATH.
Read on if not…
Windows
Press the Windows key. Type environment and click the item “Edit environment variables for your account”.
Select PATH, and then Edit.
Add a new entry by clicking New and then type
%USERPROFILE%\bin
After adding, you should see the new entry of the above path.
Click OK to save and exit.
Open File Explorer. In the address bar, type
%USERPROFILE%
And press Enter. You should have gone to your home folder.
Create a new folder called bin.
Test going there by putting the following in the address bar and pressing Enter.
%USERPROFILE%\bin
You should be in your newly created folder bin.
Now put ffmpeg.exe in this folder, and you are all set. Now the file is in your PATH.
To test, open a command prompt by pressing the Windows key and typing cmd. Press Enter.
In the command prompt, type
ffmpeg
and press Enter.
You should see ffmpeg’s help page.
Mac or Linux
Open the Terminal App.
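Create a new folder bin in your home directory.
mkdir ~/bin
Put the ffmpeg file in the new directory. You can use Finder.
Edit .zprofile in your home directory. (This file is read by zsh, the default shell on recent versions of macOS. If you use a different shell, edit its startup file instead.)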
Add the following line
export PATH=~/bin:$PATH
Start a new Terminal and type
ffmpeg
You should see the help page of ffmpeg displayed. This verifies FFmpeg is in your path.
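You can also run the same check from Python, which works the same way on Windows, Mac, and Linux. This is a minimal sketch using only the standard library. find_ffmpeg and ffmpeg_version_line are hypothetical helper names.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path of the ffmpeg executable found on PATH, or None."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(ffmpeg_path):
    """Run `ffmpeg -version` and return the first line of its output."""
    result = subprocess.run(
        [ffmpeg_path, "-version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0]
```

If find_ffmpeg() returns None, the folder holding ffmpeg is not in your PATH yet. Start a new terminal (or restart the Web-UI) after changing PATH, since running programs keep the old value.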
Step 3: Enter Pre-processing parameters
In AUTOMATIC1111, go to the Temporal Kit page.
Go to the Pre-Processing tab.
Upload your video to the Input video canvas. (Download this video if you want to use the same video to follow the tutorial.)
Next is to generate one giant sheet of keyframes. This sheet will go through img2img so that all keyframes will have the same style.
Set:
- Side: 4 (This sets a 4×4 grid of images)
- Height resolution: 2048 (Since each image is 512, 512×4 = 2048)
- frames per keyframe: 4 (How many frames each keyframe is responsible for)
- fps: 30
- EbSynth mode: Yes
- Target Folder: Enter the path of the folder you wish to save this project to. E.g. G:\temporalkit\test1
Click Run on the right panel. You should see a sheet of 4×4 keyframes generated.
Make sure these keyframes cover the whole video for your own video. Adjust those parameters if not.
If you encounter an out-of-memory issue in the next img2img step, reduce the side or resolution parameters.
Click Save Settings if you are happy with the result.
Click Send to img2img.
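The pre-processing numbers are linked by simple arithmetic, which helps when adjusting them for your own video. The sketch below assumes each keyframe stands for "frames per keyframe" consecutive frames, as the parameter description above says. It is my reading of the settings rather than Temporal Kit's actual code, and keyframe_plan is a hypothetical helper.

```python
import math


def keyframe_plan(total_frames, side, frames_per_keyframe, height_resolution):
    """Summarize what one side x side keyframe sheet covers for a video."""
    keyframes = side * side
    frames_covered = keyframes * frames_per_keyframe
    return {
        # Height of each keyframe on the sheet, in pixels.
        "cell_height": height_resolution // side,
        "keyframes": keyframes,
        "frames_covered": frames_covered,
        "covers_video": frames_covered >= total_frames,
        # Smallest frames-per-keyframe value that would cover the whole video.
        "min_frames_per_keyframe": math.ceil(total_frames / keyframes),
    }
```

With the settings above (side 4, 4 frames per keyframe, height 2048), one sheet holds 16 keyframes of height 512 and covers 64 frames, a little over 2 seconds at 30 fps. For a hypothetical 150-frame video, that is not enough, and you would need at least 10 frames per keyframe to cover it with one sheet.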
Step 4: Perform Img2img on keyframes
Go to the Img2img page. The giant sheet of keyframes should already be in the image canvas.
Switch to the Batch tab.
Input directory: The name of your target directory with input appended. E.g. G:\temporalkit\test1\input
Output directory: Similarly, but with output appended. E.g. G:\temporalkit\test1\output
The image size should be set correctly and automatically (2048×2048).
Enter a prompt. I used
photo of Sci fi cute girl, pink hair, photorealistic, in the style of franciszek starowieyski, white porcelain sci fi, mecha, 32k uhd, machine aesthetics, dark white and azure, hans zatzka, silver and pink, science fiction city,shiny pink hair, half body, oil painting, white background
And a negative prompt:
deformed, disfigured, ugly
Sampling method: DPM++2M Karras
Sampling steps: 20
CFG scale: 7
Denoising strength: 0.5 (adjust accordingly)
In ControlNet (Unit 0) section, set:
- Enable: Yes
- Pixel Perfect: Yes
- Control Type: Tile
- Preprocessor: tile_resample
- Model: control_xxxx_tile
Press Generate. After it is done, you will find the image in the batch output folder.
Open the image at full size and inspect the details. Make sure they look sharp and have a consistent style.
Step 5: Prepare EbSynth data
Now we need to generate data to put into EbSynth.
Go to the Temporal-Kit page and switch to the Ebsynth-Process tab.
Input Folder: Put in the same target folder path you put in the Pre-Processing page. E.g. G:\temporalkit\test1
Click read last_settings. If your input folder is correct, the video and the settings will be populated.
Click prepare ebsynth. After it is done, you should see the keys folder populated with your stylized keyframes, and the frames folder populated with your images.
Step 6: Process with EbSynth
Now open the EbSynth program.
Open the File Explorer and navigate to the project folder. You should see folders like the ones shown below. We need the keys folder and the frames folder for EbSynth.
Drag the keys folder from the File Explorer and drop it to the Keyframes field in EbSynth.
Drag the frames folder from the File Explorer and drop it to the frames field in EbSynth.
After these two steps, EbSynth should be populated with the correct settings and a bunch of Synth buttons. There is one row for each keyframe. Each keyframe acts as a reference and stylizes a certain number of frames.
Click Run All and wait for them to complete.
When it is done, you should see a series of out_##### directories generated in the target project folder.
Step 7: Make the final video
Now go back to AUTOMATIC1111. You should still be on the Temporal Kit page and Ebsynth-Process tab.
Click recombine ebsynth and you are done!
Look how smooth the video is. With some tweaking, you can probably make it better!
Method 6: AnimateDiff Prompt Travel
AnimateDiff Prompt Travel is a video-to-video method that uses AnimateDiff to maintain frame-to-frame consistency, ControlNet to copy the motion of a reference video, and Stable Diffusion prompts to control content at different time points.
Using the lineart ControlNet method, it can be used to stylize a video. See the AnimateDiff Prompt Travel tutorial for setup details. Here’s the workflow in ComfyUI.
Here’s the video generated.
Variations
Multiple ControlNets
Experiment with Multiple ControlNet to further fix small details and reduce flickering.
For example, you can add a second ControlNet to use reference only to fix the character’s look. I used a frame from a video generated before.
Here’s the video with lineart and reference-only ControlNets.
Other settings
Experiment with denoising strength for a trade-off between the amount of change and flickering.
Sometimes, the input video is too hard to process well with ControlNet. Try another one with a bigger and slow-moving subject.
Deflickering
Multiple ControlNet
Using multiple ControlNets to fix features in the video can significantly reduce flickering.
For example, this is with ONE ControlNet, Line art realistic.
What if we add one more ControlNet, the Tile Resample?
Not only does it flicker less, but it also helps to preserve the color of the original video.
A similar degree of deflickering can be achieved by adding the Canny ControlNet.
The tradeoff is that it takes longer to process a video. But I think it’s worth it!
Post-processing
Videos made using Stable Diffusion ControlNet still have some degree of flickering. Here are some things you can do about the flickering.
DaVinci Resolve has a deflickering plugin you can easily apply to the Stable Diffusion video. Unfortunately, it is only available in the paid version (Studio).
If you are not prepared to shell out for that and are tech-savvy, use this deflickering model to process your videos.
Deterministic samplers
Use a deterministic sampler to reduce flickering.
Below is using Euler a, a stochastic sampler. (Produced with SD-CN-animation.)
The video below uses the same settings except using DPM++2M Karras, a deterministic sampler.
Note her face and hair flicker less.
Some examples of deterministic samplers are
- Euler
- LMS
- Heun
- DPM++2M
- DPM++2M Karras
See the sampler article for an overview.
Other video-to-video Options
EbSynth
EbSynth is used to paint over a video, either manually or with AI image generators such as Stable Diffusion.
A common workflow is to stitch several keyframes (e.g. 4) into a single image and transform it with img2img in one pass. You will then dice the image back to 4 individual images and use them as keyframes in EbSynth.
The reason to go through this process is to improve the consistency across the keyframes. If the keyframes were transformed with img2img individually, they would normally have too much variation.
The EbSynth method is best executed with Temporal Kit.
Deforum
Deforum has a video-to-video function with ControlNet. (Writing about this soon.)
Stable WarpFusion
Stable WarpFusion is a paid Colab Notebook that produces amazing video transformations with custom algorithms.