Perturbed Attention Guidance

Perturbed Attention Guidance (PAG) is a simple modification to the sampling process that improves the quality of your Stable Diffusion images.

I will cover:

  • What Perturbed Attention Guidance is.
  • How to use it in ComfyUI and AUTOMATIC1111.
  • Comparison of settings.

Software

AUTOMATIC1111

We will use AUTOMATIC1111, a popular and free Stable Diffusion software. Check out the installation guides for Windows, Mac, or Google Colab.

Check out the AUTOMATIC1111 Guide if you are new to AUTOMATIC1111.

ComfyUI

We will use ComfyUI in this section. It is an alternative to AUTOMATIC1111.

Read the ComfyUI installation guide and ComfyUI beginner’s guide if you are new to ComfyUI.

Take the Stable Diffusion Courses to learn ComfyUI and AUTOMATIC1111 step-by-step.

What is Perturbed Attention Guidance?

Perturbed Attention Guidance (PAG) is a change to the sampling process that enhances image quality. You can use this technique with SD 1.5 and SDXL models.

You can read the research article Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance by Donghoon Ahn and his coworkers.

Attentions in U-Nets

Stable Diffusion 1.5 and SDXL models use a deep neural network called a U-Net to denoise the image during sampling. The U-Net contains many attention operations, which come in two types:

  1. Cross-attention between the prompt and the latent image.
  2. Self-attention within the latent image.
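The two attention types differ only in where the keys and values come from. Here is a minimal NumPy sketch of that difference. The tensor sizes are toy values I picked for illustration, and the learned projection layers of a real U-Net are left out, so treat this as a picture of the idea rather than the model's actual code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d))
    return weights @ values, weights

rng = np.random.default_rng(0)
latent_tokens = rng.normal(size=(16, 8))  # 16 latent positions, 8 channels (toy sizes)
prompt_tokens = rng.normal(size=(5, 8))   # 5 prompt token embeddings (toy sizes)

# Cross-attention: queries from the latent image, keys and values from the prompt.
cross_out, cross_weights = attention(latent_tokens, prompt_tokens, prompt_tokens)

# Self-attention: queries, keys, and values all come from the latent image.
self_out, self_weights = attention(latent_tokens, latent_tokens, latent_tokens)

print(cross_weights.shape)  # (16, 5): each latent position looks at the prompt tokens
print(self_weights.shape)   # (16, 16): each latent position looks at the other positions
```

The self-attention map is the square one, and it is the piece that PAG later tampers with.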

The above applies to both the positive and negative latent images controlled by the positive and the negative prompts, respectively. The negative prompt is optional, but using it improves image quality.

The negative latent image is also called the unconditioned latent image because, originally, there was no negative prompt! The diffusion process steers away from a random, unconditioned image.

The negative prompt is a later invention that hacks the unconditioned latent image by injecting a prompt so that it steers away from the concepts in the negative prompt.
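The "steering away" can be written down in a few lines. This is a sketch of the standard classifier-free guidance formula with toy numbers. The function name and the example arrays are mine, not taken from any particular sampler.

```python
import numpy as np

def cfg_prediction(cond, uncond, cfg_scale=7.0):
    """Start at the unconditioned prediction and push toward the conditioned one."""
    return uncond + cfg_scale * (cond - uncond)

cond = np.array([1.0, 2.0])       # toy prediction for the positive prompt
uncond = np.array([0.0, 0.0])     # toy prediction with an empty prompt
negative = np.array([0.5, -1.0])  # toy prediction for a negative prompt

# Without a negative prompt, sampling steers away from the unconditioned prediction.
print(cfg_prediction(cond, uncond))    # [ 7. 14.]

# With a negative prompt, its prediction takes the place of the unconditioned one.
print(cfg_prediction(cond, negative))  # [ 4. 20.]
```

The only change a negative prompt makes here is what gets passed in as the second argument.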

Perturbed Attention Guidance (PAG)

PAG only modifies the diffusion of the unconditioned latent image, corresponding to the one specified by the negative prompt.

It also modifies only one small step: The self-attention operation of the middle block of the U-Net.

The authors argue that the unconditioned latent image is slow to form due to a lack of guidance (when the negative prompt is not used).

Instead of performing a self-attention to determine what part of the unconditioned latent image is important, PAG simply says the whole image is equally important.
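As I read the paper, the perturbation is done by swapping the self-attention map for an identity matrix, so each position just passes its own value through and ignores the rest of the image. Below is a toy NumPy sketch of that swap. The function name, the `perturbed` flag, and the tensor sizes are my own assumptions for illustration, and the learned projections of a real attention layer are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, perturbed=False):
    """Toy self-attention. With perturbed=True the attention map is the identity."""
    n, d = tokens.shape
    if perturbed:
        # Perturbed pass: no position looks at any other position.
        attn_map = np.eye(n)
    else:
        # Normal pass: the attention map is computed from the image itself.
        attn_map = softmax(tokens @ tokens.T / np.sqrt(d))
    return attn_map @ tokens

rng = np.random.default_rng(0)
latent_tokens = rng.normal(size=(16, 8))  # toy sizes

normal_out = self_attention(latent_tokens)
perturbed_out = self_attention(latent_tokens, perturbed=True)

# The perturbed pass returns the input values unchanged.
print(np.allclose(perturbed_out, latent_tokens))  # True
```

The difference between the normal and the perturbed result is what gives PAG a direction to steer along.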

In practice, as implemented in ComfyUI and A1111, the PAG doesn’t replace classifier-free guidance (CFG). Instead, both are used. The PAG diffusion direction is added to that of CFG and controlled by an independent scale factor analogous to the CFG scale.

The diffusion step is a combination of CFG and PAG.

Mathematically, the total guidance during sampling is:

Total guidance = CFG scale + PAG scale

That’s why the default setting is a CFG scale of 4 and PAG scale of 3, summing up to 7, a widely used CFG value.
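One way to see where the sum comes from: if the CFG direction and the PAG direction happened to point the same way, adding them with scales 4 and 3 would equal plain CFG at scale 7. The sketch below checks that with toy numbers. The function and argument names are hypothetical, and in real sampling the two directions differ, so read the sum as a rule of thumb rather than an exact identity.

```python
import numpy as np

def combine_guidance(base, cfg_direction, pag_direction, cfg_scale=4.0, pag_scale=3.0):
    """Add the CFG and PAG directions to a base prediction, each with its own scale."""
    return base + cfg_scale * cfg_direction + pag_scale * pag_direction

base = np.array([0.0, 1.0])       # toy starting prediction
direction = np.array([1.0, -0.5])  # toy guidance direction

# Special case: both directions coincide, so the scales simply add up.
combined = combine_guidance(base, direction, direction, cfg_scale=4.0, pag_scale=3.0)
plain_cfg_7 = base + 7.0 * direction
print(np.allclose(combined, plain_cfg_7))  # True

# Setting the PAG scale to 0 leaves only the CFG contribution.
cfg_only = combine_guidance(base, direction, direction, cfg_scale=4.0, pag_scale=0.0)
print(cfg_only)  # [ 4. -1.]
```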

Use PAG on ComfyUI

ComfyUI has native support for the Perturbed Attention Guidance node. To use it, you must update ComfyUI, which you can do easily with ComfyUI Manager.

Click Manager > Update ComfyUI, then restart ComfyUI.

Add the PerturbedAttentionGuidance node between the Model and KSampler nodes.
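If you drive ComfyUI with an API-format workflow (a JSON dictionary of nodes), the same rewiring can be sketched in Python. This is a rough illustration, not an official recipe: the helper `insert_pag_node`, the node IDs, the checkpoint filename, and the trimmed KSampler inputs are all made up for the example, and I am assuming the PAG node takes inputs named `model` and `scale`. Check the node in your own install before relying on it.

```python
import copy

def insert_pag_node(workflow, model_source_id, sampler_id, pag_id="pag", scale=3.0):
    """Return a copy of the workflow with a PAG node between the model and the sampler."""
    patched = copy.deepcopy(workflow)
    patched[pag_id] = {
        "class_type": "PerturbedAttentionGuidance",
        # Assumed input names; a link is written as [source node id, output index].
        "inputs": {"model": [model_source_id, 0], "scale": scale},
    }
    # Point the sampler at the PAG node instead of the model loader.
    patched[sampler_id]["inputs"]["model"] = [pag_id, 0]
    return patched

# Trimmed, hypothetical workflow fragment: a real KSampler needs more inputs.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "example_model.safetensors"}},
    "2": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "cfg": 4.0}},
}

patched = insert_pag_node(workflow, model_source_id="1", sampler_id="2", scale=3.0)
print(patched["2"]["inputs"]["model"])  # ['pag', 0]
```

The original dictionary is left untouched, so you can queue both versions to compare images with and without PAG.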

Or download the PAG txt2img workflow below.

The following workflow compares images with and without PAG using the same seed and image size.

Use PAG on AUTOMATIC1111

You can use Perturbed Attention Guidance with AUTOMATIC1111. You will need to install the Incantations extension.

Installing the Incantations extension

To install an extension in AUTOMATIC1111 Stable Diffusion WebUI:

  1. Start AUTOMATIC1111 Web-UI normally.
  2. Navigate to the Extensions page.
  3. Click the Install from URL tab.
  4. Enter the following URL in the URL for extension’s git repository field:

     https://github.com/v0xie/sd-webui-incantations

  5. Click the Install button.
  6. Wait for the confirmation message that the installation is complete.
  7. Restart AUTOMATIC1111.

Using PAG

To use Perturbed Attention Guidance, expand the Incantations section on the txt2img page.

Check the Active box.

Set PAG Scale to 3.

This setting works for SD 1.5 and SDXL models.

Enter a prompt and hit Generate to create an image.

PAG settings

I will use the following prompt and the Juggernaut XL v7 model.

realistic anime half body dark and gritty cinematic lighting vibrant and Final Fantasy, goth, dark angel, dynamic pose, japanese, asymmetrical goth fashion, sorcerer’s stronghold, silver hair, dimly lit, empty hall

PAG Scale

I will use the default CFG setting of 4.

Setting the PAG scale to 0 turns it off. So PAG 0 is the reference image without PAG.

The sweet spot is a PAG scale between 1 and 3. It’s a matter of choosing how saturated you want the images to be.

Setting it to higher than 3 over-saturates the image, an effect similar to setting a high CFG scale.

Overall, I think it’s an improvement (for this CFG setting).

Fixing total guidance

The comparison above is not entirely fair because each image has a different total guidance (CFG scale + PAG scale). You can expect a similar result of higher contrast by changing the CFG scale alone!

So, let’s fix the total guidance to 7 and see if PAG is really doing anything better.
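The settings for this comparison are easy to enumerate: for each PAG scale, the CFG scale is whatever is left of the total. A small sketch, with a hypothetical helper name:

```python
TOTAL_GUIDANCE = 7

def fixed_total_settings(total=TOTAL_GUIDANCE):
    """List (CFG scale, PAG scale) pairs whose sum stays at the given total."""
    return [(total - pag_scale, pag_scale) for pag_scale in range(total + 1)]

settings = fixed_total_settings()
for cfg_scale, pag_scale in settings:
    print(f"CFG {cfg_scale} + PAG {pag_scale} = {cfg_scale + pag_scale}")
```

The first pair (CFG 7, PAG 0) is the reference without PAG, and the last pair (CFG 0, PAG 7) relies on PAG alone.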

A low PAG value (1-3) indeed improves the image quality. We also see that PAG provides stronger guidance than CFG, as the image is fried at the PAG scale of 7.

Negative prompts

A missing piece of the research article is the negative prompt.

Even without PAG, we can get higher image quality by replacing the unconditioned latent image with one conditioned on the negative prompt.

How does PAG fare when negative prompts are used? Let’s find out.

Let’s add this negative prompt:

disfigured, ugly, deformed, low quality, beginner

The left column is with PAG 0 and CFG 7, while the right column is PAG 3 and CFG 4.

With negative prompts, using PAG still seems to be better.
