Enhancing Eyes and Faces with VAE

An improved VAE is a partial update to Stable Diffusion 1.4 or 1.5 models that makes eyes render better. I will explain what a VAE is, what you can expect from it, where to get it, and how to install and use it.

What is VAE?

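VAE stands for variational autoencoder. It is the part of the neural network model that encodes images into, and decodes them from, a smaller latent space, so that computation can be faster. The diffusion process runs in this latent space, and the VAE decoder turns the final latent back into the image you see.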
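To get a feel for why working in latent space is faster, here is a back-of-the-envelope calculation. It is a minimal sketch assuming the Stable Diffusion v1 autoencoder's 8× spatial downsampling and 4 latent channels; the helper names `pixel_values` and `latent_values` are made up for illustration.

```shell
#!/bin/sh
# Rough size comparison between pixel space and SD v1 latent space.
# Assumes 8x spatial downsampling and 4 latent channels (SD v1 VAE).
DOWNSAMPLE=8
LATENT_CHANNELS=4

# pixel_values H W: number of values in an RGB image (3 channels).
pixel_values() {
  echo $(( 3 * $1 * $2 ))
}

# latent_values H W: number of values in the matching latent tensor.
# SD image sizes are multiples of 8, so integer division is exact.
latent_values() {
  echo $(( LATENT_CHANNELS * ($1 / DOWNSAMPLE) * ($2 / DOWNSAMPLE) ))
}

p=$(pixel_values 512 512)    # 786432
l=$(latent_values 512 512)   # 16384
echo "pixel values:  $p"
echo "latent values: $l"
echo "ratio:         $(( p / l ))x"   # 48x
```

For a 512×512 image, the diffusion process works on 48 times fewer numbers than the pixel image contains. The decoder has to reconstruct all the fine pixel detail from that compact representation, which is why a better decoder can sharpen small features like eyes.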

Do I need a VAE?

You don’t need to install a VAE file to run Stable Diffusion: any model you use, whether v1, v2, or custom, already has a default VAE.

When people talk about downloading and using a VAE, they mean using an improved version of it. This happens when the model trainer further fine-tunes the VAE part of the model with additional data. Instead of releasing a whole new model, which is a big file, they release only the much smaller part that has been updated.

What is the effect of using VAE?

Usually, the effect is small. An improved VAE decodes the image from the latent space more accurately, so fine details are better recovered. This helps with eyes and text, where fine details matter most.

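Stability AI released two variants of fine-tuned VAE decoders, EMA and MSE. (Only the decoder was fine-tuned, so both work as drop-in replacements for the original. EMA stands for exponential moving average: ft-EMA was trained with the original loss and uses EMA weights, a running average of the weights kept during training. MSE stands for mean squared error: ft-MSE continued training from ft-EMA with a loss that puts more emphasis on MSE reconstruction.)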

See their comparison reproduced below.

Stability AI’s comparison between EMA, MSE, and the original decoder. (256×256 images)

Which one should you use? Stability’s assessment with 256×256 images is that EMA produces sharper images while MSE’s images are smoother. (That matches my own testing.)
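The names EMA and MSE describe how the decoders were trained, not how they are scored. EMA keeps a running average of the weights during training, and MSE is a loss that penalizes the squared difference between the original and decoded pixels. A decoder pushed to minimize MSE tends to predict an average of plausible details, which looks smoother. The toy shell functions below (hypothetical names `ema_step` and `mse`, made-up numbers) only illustrate the two formulas; they are not Stability AI's training code.

```shell
#!/bin/sh
# Toy illustration of the two ideas behind the names "EMA" and "MSE".
# Not Stability AI's training code; the numbers are made up.

# ema_step DECAY AVG NEW: one exponential-moving-average update of a weight.
# The averaged copy moves only a small step (1 - DECAY) toward the new value.
ema_step() {
  awk -v d="$1" -v avg="$2" -v new="$3" \
    'BEGIN { printf "%.4f\n", d * avg + (1 - d) * new }'
}

# mse "A" "B": mean squared error between two equal-length,
# space-separated lists of pixel values.
mse() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); split(b, y, " ")
    for (i = 1; i <= n; i++) s += (x[i] - y[i]) ^ 2
    printf "%.4f\n", s / n
  }'
}

ema_step 0.99 1.0 2.0             # 1.0100
mse "0.2 0.5 0.9" "0.1 0.5 0.7"   # 0.0167
```

With a decay of 0.99, the averaged weight barely moves even when the raw weight jumps from 1.0 to 2.0, which is what makes EMA weights more stable than the raw training weights.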

In my own testing of Stable Diffusion v1.4 and v1.5 with 512×512 images, I see good improvements in rendering eyes in some images, especially when the faces are small. I didn’t see any improvement in rendering text, but I don’t think many people use Stable Diffusion for that anyway.

In none of my tests did the new VAE perform worse: it either improved the image or made no difference.

Below is a comparison between the original, EMA, and MSE decoders using the Stable Diffusion v1.5 model. (The prompt can be found here.) Enlarge the images to compare the differences.

[Image comparison: Original | EMA | MSE]

Improvements to text rendering are less clear (“holding a sign said Stable Diffusion” was added to the prompt):

[Image comparison: Original | EMA | MSE]

You can also use these VAEs with a custom model. I tested with some anime models but didn’t see any improvements. I encourage you to do your own test.

As a final note, EMA and MSE are compatible with Stable Diffusion v2.0. You can use them, but the effect is minimal because v2.0 is already very good at rendering eyes. Perhaps the improvement has already been incorporated into the model.

Should I use a VAE?

You don’t need to use a VAE if you are happy with the results you are getting. For example, you may already be using a face restoration tool like CodeFormer to fix eyes.

You should use a VAE if you are in the camp of taking all the little improvements you can get. You only need to go through the trouble of setting it up once. After that, the art creation workflow stays the same.

How to use VAE?

VAEs are ready to use in the Colab Notebook included in the Quick Start Guide.

Download

Currently, Stability has released two improved versions of the VAE. Below are direct download links.

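EMA VAE: https://huggingface.co/stabilityai/sd-vae-ft-ema-original/resolve/main/vae-ft-ema-560000-ema-pruned.ckpt

MSE VAE: https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.ckpt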

Installation

These installation instructions apply to the AUTOMATIC1111 GUI. Place the downloaded VAE files in the following directory:

stable-diffusion-webui/models/VAE

For Linux and macOS

For convenience, run the commands below from the stable-diffusion-webui directory on Linux or macOS to download and install the VAE files.

wget https://huggingface.co/stabilityai/sd-vae-ft-ema-original/resolve/main/vae-ft-ema-560000-ema-pruned.ckpt -O models/VAE/vae-ft-ema-560000-ema-pruned.ckpt
wget https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.ckpt -O models/VAE/vae-ft-mse-840000-ema-pruned.ckpt
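macOS ships with curl rather than wget, so the wget commands may fail there unless wget is installed separately. Below is a sketch of an equivalent using curl, plus a check that both files ended up in models/VAE. It assumes you run it from the stable-diffusion-webui directory; the function names `download_vaes` and `check_vaes` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: curl-based equivalent of the wget commands, plus an install check.
# Assumes the current directory is stable-diffusion-webui/.

EMA_FILE=vae-ft-ema-560000-ema-pruned.ckpt
MSE_FILE=vae-ft-mse-840000-ema-pruned.ckpt

# download_vaes: fetch both files. -f fails on HTTP errors instead of saving an
# error page, -L follows Hugging Face's redirects, -o sets the output path.
download_vaes() {
  curl -fL -o "models/VAE/$EMA_FILE" \
    "https://huggingface.co/stabilityai/sd-vae-ft-ema-original/resolve/main/$EMA_FILE" &&
  curl -fL -o "models/VAE/$MSE_FILE" \
    "https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/$MSE_FILE"
}

# check_vaes [DIR]: print "found" or "missing" for each file (empty files count
# as missing). The exit status is the number of missing files.
check_vaes() {
  dir="${1:-models/VAE}"
  missing=0
  for f in "$EMA_FILE" "$MSE_FILE"; do
    if [ -s "$dir/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, `download_vaes && check_vaes` downloads both files and then reports on each one. A zero exit status from `check_vaes` means both files are in place.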

Use

To use a VAE in the AUTOMATIC1111 GUI, click the Settings tab, then click the VAE section on the left.

In the SD VAE dropdown menu, select the VAE file you want to use.

Press the big red Apply Settings button on top. You should see the message

Settings: sd_vae applied

in the Settings tab when loading succeeds.

Other options in the dropdown menu are:

  • None: Use the original VAE that comes with the model.
  • Auto: see this post for its behavior. I don’t recommend Auto for beginners because it is easy to lose track of which VAE is being used.
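If you are unsure which VAE is selected, you can also read the saved setting from a terminal. This is a sketch under assumptions: that AUTOMATIC1111 stores its settings in config.json in the stable-diffusion-webui directory, and that the SD VAE choice is saved under the key sd_vae (the name shown in the "Settings: sd_vae applied" message). The stored value may differ between versions, and `current_sd_vae` is a hypothetical helper name.

```shell
#!/bin/sh
# current_sd_vae [CONFIG]: print the value saved under the "sd_vae" key.
# Assumptions: settings live in config.json in the webui root directory, and
# the "sd_vae" key and its value are written on one line.
# The pattern requires a quote right after sd_vae, so keys such as
# "sd_vae_as_default" are not matched.
current_sd_vae() {
  sed -n 's/.*"sd_vae"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    "${1:-config.json}"
}
```

Run `current_sd_vae` from the stable-diffusion-webui directory after pressing Apply Settings; it prints the stored value, which you can compare with the file names in models/VAE.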

Pro tip: If you cannot find a setting, click Show All Pages on the left. All settings will be shown on a single page. Use Ctrl-F to find the setting.

Summary

We have gone through how to use the two improved VAE decoders released by Stability AI. They provide small but noticeable improvements to rendering eyes. You can decide whether you want to use them.

I use them because I haven’t seen any case where they harm my images. I hope this article helps!
