Enhancing Images with Self-Attention: Upgrade Background

Self-attention guidance (SAG) enhances details in an image while preserving the overall composition. It is useful for fixing the nonsensical details that give away an AI-generated image.

In the example below, it fixes the man’s bow tie and modifies other details in the background.

Software

We will use ComfyUI, an alternative to AUTOMATIC1111.

Read the ComfyUI installation guide and ComfyUI beginner’s guide if you are new to ComfyUI.

Take the ComfyUI course to learn ComfyUI step-by-step.

How does SAG work?

The details of Self-attention guidance (SAG) can be found in the technical report “Improving Sample Quality of Diffusion Models Using Self-Attention Guidance” by Susung Hong and coworkers.

SAG works by adding an extra guidance term on top of classifier-free guidance (CFG). The general idea is similar to perturbed attention guidance (PAG). SAG predates PAG and comes from the same research group.

SAG adds additional guidance to CFG.
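To make "guidance on top of CFG" concrete, here is a minimal schematic in plain Python. It assumes the common formulation in which the SAG term is the difference between the unconditional prediction and a prediction made from a degraded (blurred) input, which the following sections describe. The function name and the toy numbers are illustrative only, and this is not ComfyUI's actual code.

```python
def guided_prediction(cond, uncond, degraded, cfg_scale, sag_scale):
    """Combine three noise predictions elementwise (schematic only).

    cond     -- prediction with the positive prompt
    uncond   -- prediction with the negative/empty prompt
    degraded -- prediction from the selectively blurred input
    """
    return [
        u + cfg_scale * (c - u) + sag_scale * (u - d)
        for c, u, d in zip(cond, uncond, degraded)
    ]

# With sag_scale = 0 the extra term vanishes and only CFG remains.
cfg_only = guided_prediction([1.0], [0.0], [0.5], cfg_scale=7.0, sag_scale=0.0)
with_sag = guided_prediction([1.0], [0.0], [0.5], cfg_scale=7.0, sag_scale=1.0)
print(cfg_only, with_sag)
```

The SAG term pushes the result away from what the model predicts for the blurred input, in the same way the CFG term pushes it away from the negative prompt.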

Blurring diffusion

To understand SAG, we first need to understand the idea behind its predecessor, the Blurring diffusion model. The idea is simple. It replaces the image used for the negative prompt with a blurred one.

The blurring process removes the fine details from the image, forcing the model to focus on the global composition.
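As a rough illustration of how blurring strips fine detail, the sketch below applies a Gaussian blur to a one-dimensional signal in plain Python. Real latents are multi-channel 2D tensors, so treat this as a simplified stand-in. The helper names and the edge-clamping choice are my own assumptions.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights, truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(signal, sigma):
    """Blur a list of floats, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

# Fine detail: values flip sign at every position.
detail = [1.0 if i % 2 == 0 else -1.0 for i in range(32)]
# Global structure: a flat region.
flat = [5.0] * 32

print(max(abs(v) for v in blur_1d(detail, 2.0)[8:24]))  # close to 0
print(blur_1d(flat, 2.0)[16])                            # still about 5
```

The rapidly alternating signal is almost erased while the flat one is untouched, which is why a blurred image carries the composition but not the details.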

SAG

SAG goes one step further by selectively blurring the parts of the image the model deems important based on the self-attention map. The Stable Diffusion model generates this map in the normal image generation process to decide which part of the image it should focus on.

SAG steers away from the negative prompt based on a selectively blurred image.

This forces the model to focus on the global composition of the selected areas. Because the attention map comes from the negative-prompt pass, SAG tends to fix details in the background rather than the main subject.
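The selective part can be sketched as a simple mask: wherever the attention score exceeds a threshold, take the blurred value, otherwise keep the original. This is a toy version on flat lists. The function name and the threshold of 1.0 are illustrative assumptions, and the blurred values are passed in rather than computed.

```python
def selective_blur(original, blurred, attention, threshold=1.0):
    """Use the blurred value only where attention is above the threshold."""
    return [
        b if a > threshold else o
        for o, b, a in zip(original, blurred, attention)
    ]

original = [1.0, 2.0, 3.0, 4.0]
blurred = [2.5, 2.5, 2.5, 2.5]       # stand-in for a blurred copy
attention = [0.2, 1.7, 0.4, 1.1]     # hypothetical per-position scores

print(selective_blur(original, blurred, attention))
# [1.0, 2.5, 3.0, 2.5]
```

Only the high-attention positions lose their detail, so the extra guidance concentrates on those regions.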

SAG is not a new model but an alternative guidance method, so it works with any Stable Diffusion model and sampling method. You may not get the full benefit with fast models such as Stable Diffusion Lightning or Hyper, which tend to ignore negative prompts.

However, SAG does come with a price: It makes one more U-Net call per sampling step (3 instead of 2). You can expect the image generation time to increase by 50%.
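The 50% figure follows directly from counting model evaluations. A back-of-the-envelope check, assuming the U-Net dominates generation time and each step costs 2 evaluations without SAG and 3 with it (the step count of 20 is just an example):

```python
def unet_calls(steps, use_sag):
    """Count U-Net evaluations: 2 per step for CFG, 1 extra with SAG."""
    calls_per_step = 3 if use_sag else 2
    return steps * calls_per_step

steps = 20
baseline = unet_calls(steps, use_sag=False)
with_sag = unet_calls(steps, use_sag=True)
print(baseline, with_sag)                                      # 40 60
print(f"Relative sampling cost: {with_sag / baseline:.2f}x")   # 1.50x
```

Actual wall-clock time will vary with batching and with the fixed costs of VAE decoding and model loading.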

ComfyUI

ComfyUI has native support for SAG. All you need to do is insert the Self-Attention Guidance node between the Load Checkpoint and KSampler nodes.
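If you drive ComfyUI with API-format workflow JSON instead of the graph editor, the same rewiring can be sketched in Python. This is a hedged example: the class_type "SelfAttentionGuidance" and the input names model, scale and blur_sigma are my understanding of the built-in node, so check them against a workflow exported from your own install. The node IDs, the checkpoint filename and the insert_sag helper are hypothetical.

```python
import copy

def insert_sag(workflow, sampler_id, sag_id="10", scale=0.5, blur_sigma=3.0):
    """Return a copy of the workflow with a SAG node feeding the sampler."""
    wf = copy.deepcopy(workflow)
    # Whatever model link the sampler used before now goes into SAG.
    upstream_model = wf[sampler_id]["inputs"]["model"]
    wf[sag_id] = {
        "class_type": "SelfAttentionGuidance",  # assumed node class name
        "inputs": {
            "model": upstream_model,
            "scale": scale,
            "blur_sigma": blur_sigma,
        },
    }
    # The sampler now takes its model from the SAG node's first output.
    wf[sampler_id]["inputs"]["model"] = [sag_id, 0]
    return wf

# Minimal fragment: other KSampler inputs are omitted for brevity.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},
    "2": {"class_type": "KSampler", "inputs": {"model": ["1", 0]}},
}

patched = insert_sag(workflow, sampler_id="2")
print(patched["2"]["inputs"]["model"])   # ['10', 0]
print(patched["10"]["inputs"]["model"])  # ['1', 0]
```

The original dictionary is left untouched, which makes it easy to queue both versions with the same seed for a side-by-side comparison.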

Or you can download the workflow below.

The following workflow compares images with and without SAG using the same seed.

The scale parameter controls the strength of the guidance. It is analogous to the CFG scale. Setting it to 0 disables SAG.

The blur_sigma parameter controls how strongly the latent image is blurred for the negative guidance.

Parameters

SAG scale

You can enhance the effect of SAG by increasing its scale value. The effect saturates around 1.5. Setting it to 0 disables it.

A good SAG value is between 0.5 and 1.

Blur Sigma

In theory, a higher sigma blurs the negative image more. In practice, the effect of changing sigma is very subtle. You can leave it at the default value of 3.

More examples

Below are a few more examples of SAG (scale 1 and blur sigma 3). You can see some subtle changes in the background. It is not always better but you should see a general improvement.

Many users report that SAG can clean up nonsensical objects in the background. In my testing, it does tend to change the background more than the main subject. It doesn’t always change things for the better, but I saw improvements more often than not.

So, you can use SAG as a “background fixer.” If you have an image you like, you can add the SAG node and see if you like the result.
