The negative prompt is an additional way to nudge Stable Diffusion to give you what you want. Unlike inpainting, which requires drawing a mask, you can use a negative prompt with all the convenience of text input. In fact, some images can only be generated by using negative prompts.
In this article, we will review a simple example of using a negative prompt. Then, you will learn how a negative prompt works in Stable Diffusion.
This is the first part of a two-part series on using negative prompts. See the second part, How to use negative prompts, for guidelines on building good negative prompts.
A simple example
Positive prompt only
Let’s try generating some images of a man. That’s right. We are going into uncharted territory here… I am using Stable Diffusion v1.5 with the prompt:
Portrait photo of a man.
OK, we got what we expected. No surprise. However, these men look a bit too serious. Let’s try removing their mustaches to lighten them up. Let’s try the prompt:
Portrait photo of a man without mustache.
We have a problem here. We get even more prominent mustaches! What’s going on? The culprit is likely the failure of cross-attention to associate “without” and “mustache”. Stable Diffusion understood the prompt as “man” and “mustache”. That’s why you see both of them.
Positive and negative prompts
So what can we do to generate men without a mustache? Is this something Stable Diffusion cannot do? The answer is to use a negative prompt. If we use the prompt
Portrait photo of a man.
together with the negative prompt
mustache
we can finally generate some men without a mustache! You will get similar results using v2 models.
This example demonstrates the principle of using negative prompts:
If you see something you don’t want, put it in the negative prompt.
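If you prefer scripting to a GUI, the same positive/negative pair can be passed in code. The sketch below assumes the Hugging Face diffusers library, whose StableDiffusionPipeline accepts a negative_prompt argument; it is not necessarily how the images in this article were produced. The helper names build_request and generate are hypothetical, and model_id is left for you to fill in with whichever v1.5 checkpoint you have access to.

```python
def build_request(prompt, negative_prompt=""):
    """Collect the text inputs for one generation call (hypothetical helper)."""
    return {"prompt": prompt, "negative_prompt": negative_prompt}


def generate(request, model_id):
    """Run the request through a diffusers pipeline (assumes a CUDA GPU)."""
    # Imported lazily so build_request works without these packages installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    # negative_prompt is forwarded as a keyword argument alongside prompt.
    return pipe(**request).images[0]


# The prompt pair from the example above.
request = build_request("Portrait photo of a man.", negative_prompt="mustache")
```

Calling generate(request, model_id) should then return a single PIL image that you can save with its save method.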
How does a negative prompt work?
Recall that in text-to-image conditioning, the prompt is converted to embedding vectors, which are in turn fed to the U-Net noise predictor. Well, that’s not the whole story. (Sorry, this has happened so many times…) There are actually two sets of embedding vectors, one for the positive prompt and the other for the negative prompt.
The positive and negative prompts are on equal footing. Both are encoded as sequences of 77 tokens (shorter prompts are padded to that length). You can always use one with or without the other.
The negative prompt is implemented in the sampler, the algorithm responsible for carrying out the reverse diffusion. To understand how a negative prompt works, we first need to understand how sampling works without a negative prompt.
Sampling without negative prompt
In each sampling step in Stable Diffusion, the algorithm first denoises the image a little bit with conditional sampling, guided by the text prompt. The sampler then denoises the same image a little bit with unconditional sampling. That is totally unguided, as if you don’t use a text prompt. Note that it would still diffuse towards a decent image, like a basketball or a wineglass below, but it could be anything. The diffusion step that’s actually taken is based on the difference between the conditional and unconditional samplings, scaled by the CFG (classifier-free guidance) scale. This process is repeated for the number of sampling steps.
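The combination of the two predictions is commonly written as the classifier-free guidance formula: guided = unconditional + scale × (conditional − unconditional). Below is a minimal sketch of that arithmetic in plain Python. The two-element lists are made-up toy values standing in for the U-Net's noise predictions, and guided_noise is a hypothetical name, not a function from any library.

```python
def guided_noise(eps_cond, eps_uncond, cfg_scale):
    """Classifier-free guidance: start from the unconditional prediction and
    move along the (conditional - unconditional) direction."""
    return [u + cfg_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


eps_cond = [0.5, -0.25]    # toy noise prediction with the text prompt
eps_uncond = [0.25, 0.25]  # toy noise prediction with an empty prompt

eps = guided_noise(eps_cond, eps_uncond, cfg_scale=7.0)
print(eps)  # [2.0, -3.25]
```

With cfg_scale set to 1 the result is just the conditional prediction; larger values push further along the difference between the two.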
Sampling with negative prompt
The negative prompt is implemented by hijacking the unconditional sampling. Instead of using an empty prompt, which generates random images, a negative prompt is used.
Technically, a positive prompt steers the diffusion toward the images associated with it, while a negative prompt steers the diffusion away from them. Note that the diffusion in Stable Diffusion happens in latent space, not in image space. The above figures in the image space are for illustration purposes only. See this great write-up if you are interested in how it is implemented at the code level.
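To make the "hijacking" concrete, here is a toy sketch in plain Python. It reuses the usual classifier-free guidance arithmetic and only swaps which prediction serves as the baseline: the empty prompt or the negative prompt. All numbers are invented for illustration, the second component is an assumed stand-in for a "mustache" direction, and guided_noise is a hypothetical name.

```python
def guided_noise(eps_pos, eps_base, cfg_scale):
    """Move from the baseline prediction toward the positive-prompt prediction."""
    return [b + cfg_scale * (p - b) for p, b in zip(eps_pos, eps_base)]


eps_pos = [1.0, 0.0]    # toy prediction for "Portrait photo of a man."
eps_empty = [0.0, 0.0]  # toy prediction for the empty prompt
eps_neg = [0.5, 0.5]    # toy prediction for the negative prompt "mustache"

# Without a negative prompt, the empty prompt is the baseline.
without_negative = guided_noise(eps_pos, eps_empty, cfg_scale=2.0)

# With a negative prompt, its prediction replaces the baseline.
with_negative = guided_noise(eps_pos, eps_neg, cfg_scale=2.0)

print(without_negative)  # [2.0, 0.0]
print(with_negative)     # [1.5, -0.5]
```

In this toy setup the second component turns negative once the negative prompt is used, meaning the guided step points away from whatever the negative prompt's prediction contributes along that axis.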
Sampling space
Let’s consider the following illustration of sampling space. When we use the prompt “Portrait photo of a man”, Stable Diffusion samples images from the whole latent space of all men, with and without a mustache. You should get images of men with and without it.
When the negative prompt “mustache” is added, the “Men with mustache” space is excluded. Effectively, we are sampling images only from men without a mustache.
Summary
I hope this article gives you a good overview of the negative prompt and how it works.
A negative prompt removes objects or styles in a way that may not be possible by tinkering with a positive prompt alone. It works by hijacking the unconditional sampling in each sampling step. The diffusion steers away from what’s described in the negative prompt.
Head to the second part, How to use negative prompts, if you want to learn how to build good ones.