aizmin in Tutorial

Stable Diffusion: Image Composition Control in Regional Prompter

Do you know you can specify the prompts for different regions of an image? You can do that on AUTOMATIC1111 with the Regional Prompter extension.

In this post, you will first go through a simple step-by-step example of using the regional prompting technqiue. Then you will learn more advanced usages of using regional prompting together with ControlNet.

Software

We will use AUTOMATIC1111 Stable Diffusion GUI. You can use this GUI on Windows, Mac, or Google Colab.

Installing Regional Prompter extension

Colab Notebook

Installing Regional Prompter extension in the Colab Notebook in the Quick Start Guide is easy. All you need to do is to check the Regional Prompter extension.

Windows or Mac

Follow these steps to install the Regional Prompter extension in AUTOMATIC1111.

Start AUTOMATIC1111 Web-UI normally.
Navigate to the Extension Page.
Click the Available tab.
Click Load from: button.
Find the extension “Regional Prompter”.
Click Install.
Restart the web-ui.

A simple example

Let’s go through a simple example. I will use a very simple prompt in order to illustrate the effect.

Let’s say you want to generate a man and a woman in the same image. Using the simple prompt

a man and a woman

and the negative prompt

disfigured, ugly

We get… a man and a woman.

So far so good. But what if you want to be more specific? Like generating a man with black hair and a woman with blonde hair? Naturally, you write that in the prompt.

a man with black hair, a woman with blonde hair

You will get what you describe sometimes, but more often than not Stable Diffusion confuses which hair color should go with whom. The situation will be even more difficult if you want to further specify the color of the clothing, etc.

What happened? Why can’t Stable Diffusion do even this simple thing? The self-attention mechanism incorrectly associates the hair color and the person.

There’s a solution to this problem: specify the prompt for a black-haired man only to the left-hand side of the image and the blonde-haired woman to the right-hand side of the image.

Regional Prompter

To use the Regional Prompter extension:

Unfold the Regional Prompter section on the txt2img page.

2. Check Active to activate the regional prompter.

3. Most default settings are good to go with this example. Check the following settings.

Main Splitting: Columns
Divide Ratio: 1, 1
Width and Height: They have to match the txt2img setting.

4. Click visualize and make template. You will see an image with two regions: region 0 is on the left, and Region 1 is on the right. They are divided equally in the 1-to-1 ratio.

5. Put in the prompt

a man and a woman, a man with black hair
BREAK
a man and a woman, a woman with blonde hair

The BREAK keyword separates the prompts. We have two prompts above. The first prompt will be applied to region 0. The second prompt will be applied to region 1.

Negative prompt:

disfigured, deformed, ugly

Since there’s no BREAK in the negative prompt, the whole prompt will be applied to both regions.

These are what we get:

Stable Diffusion correctly generates a man with black hair in region 0 (left) and a woman with blonde hair in region 1. (right)

Note that this doesn’t work 100% of the time. In my experience, it’s more like 75% of the time. But still, it’s much better than leaving it to pure chance.

Common prompt

You may have noticed the two prompts shared a common part “a man and a woman”.

a man and a woman, a man with black hair
BREAK
a man and a woman, a woman with blonde hair

Why do you need the common prompt? Stable Diffusion will only generate one person if you don’t have the common prompt:

a man with black hair
BREAK
a woman with blonde hair

Why? Both prompts for the left and the right regions describe a single person. So you get… one person! You need to tell Stable Diffusion that this is a picture of two persons: a man and a woman.

That’s why you need a common prompt, “a man and a woman”.

Unlike this toy example, the common prompt is typically pretty long. Instead of writing it out in each region, there’s a convenient way to deal with this.

Check the option Use common prompt.

2. Now you can add the common prompt (a man and a woman) at the beginning.

a man and a woman
BREAK
a man with black hair
BREAK
a woman with blonde hair

We have three prompts above: (1) common prompt, (2) prompt for region 0, and (2) prompt for region 1.

The common prompt is added to the beginning of the prompt for each region.

The common prompt is just a syntactic sugar. Using it is equivalent to what we had in the original prompt:

a man and a woman, a man with black hair
BREAK
a man and a woman, a woman with blonde hair

More complex regions

The secret of mastering Regional Prompter is to define regions accurately. In this section, I will explain how to set the divide ratio correctly to break up the image the way you want. It could be difficult to understand or remember how to specify the regions correctly. You can always click visualize and make template to generate the region image to confirm.

In one-dimensional division, you can divide the regions horizontally or vertically.

Horizontal division

To divide the regions horizontally, select Columns in Main Splitting. A number separated by commas represent each region. The number represents the size of the region.

Examples of Divide Ratio:

1,1

1,1,1

1,2,1

Vertical division

The Rows splitting is similar, except the regions are in rows. Below are some examples of the divide ratio.

1,1

1,1,1

1,2,1

2D regions

You can divide the region both vertically and horizontally in the same image. Select the Columns splitting. The rules are

Rows are separated by ;
Each row is a series of numbers separated by commas, e.g. 1,1,1
The first number in each row represents the height of the row. The subsequent numbers represent the width of the regions.

Let’s look at a few examples.

1,1,1; 1,1,1

This defines two rows, with each row of height 1. Both rows have two regions with equal widths (1,1).

There are 4 regions in total.

1,1,1; 2,1,1

This defines two rows.
The height of the first row is 1. The height of the second row is 2.
Each row has two regions with equal widths (1,1).
There are 4 regions in total.

Finally, let’s look at a more complicated example. If you understand this, you understand everything about region division!

1,1,1,1; 2,1,2

There are two rows.
The height of the first row is 1. The height of the second row is 2.
The first row has 3 regions with widths 1. (1,1,1)
The second row has two regions with widths 1 and 2. (1,2)
There are 5 regions in total.

A 2D regional prompting example

Let’s say I am trying images for real. I came up with the following prompt.

Model: Lyriel v1.5

Prompt:

a witch, highly detailed face, half body, studio lighting, dramatic lighting, highly detailed clothing, looking at you, mysterious, dramatic lighting, (full moon:1.3), (beautiful fire magic: 1.2)

Negative prompt:

underage, immature, disfigured, deformed

We get some decent images like the following.

Not bad, but there’s no way to control where the moon and the fire are. All you can do is keep hitting the Generate button until you get the placements you want.

This is where the Regional Prompter can help.

Use the following settings:

Main Split: Columns
Use common prompt: Yes
Divide ratio: 1,1,1;2,1,1

Prompt:

a witch, highly detailed face, half body, studio lighting, dramatic lighting, highly detailed clothing, looking at you, mysterious, dramatic lighting
BREAK
(full moon:1.3)
BREAK
BREAK
BREAK
(beautiful fire magic: 1.2)

This put the moon in region 0 (top left) and the fire in region 3 (bottom right).

We now have control over the positions!

Let’s put the moon at the top right (region 1) and fire at the bottom left (region 2).

a witch, highly detailed face, half body, studio lighting, dramatic lighting, highly detailed clothing, looking at you, mysterious, dramatic lighting
BREAK
BREAK
(full moon:1.3)
BREAK
(beautiful fire magic: 1.2)
BREAK

See the moon at the top right and the fire at the bottom left.

Again, you should be aware that regional prompting does not work 100% of the time. So, generate at least a few images at a time.

Region prompting with ControlNet

Regional prompter can specify prompts for each region but cannot control overall image composition. Well, we have a tool that can do precisely that: ControlNet.

Let’s go through two examples of using Regional Prompter and ControlNet together to achieve the degree of manipulation that we can only dream of without them.

Let’s say you want to generate an image of a wizard studying an old scroll in a small cellar space. In addition, you want to have a wolf next to him and some skulls on the floor.

That’s a lot of elements to take care of. You will see a range of compositions using the regular text-to-image.

Text-to-image

As a clueless Stable Diffusion user, I put in this prompt and hope for the best.

a mysterious wizard , highly detailed face, highly detailed clothing, cinematic, dark, horror, worn stone wall, ancient symbol, old mystical torn scroll, wolf, many skulls

Negative prompt:

underage, immature, disfigured, deformed

Model: Lyriel v1.5

These are decent images, thanks to my prompting skill. (!)

But that’s not quite what I wanted to generate. Maybe I didn’t say clearly he’s studying a scroll. Let’s rearrange the prompt a bit.

a mysterious wizard studying old mystical torn scroll, highly detailed face, highly detailed clothing, cinematic, dark, horror, worn stone wall, ancient symbol, wolf, many skulls

Now it’s closer to what’s in my mind. But I have little control over the Wizard’s pose and how close it zooms in.

Adding ControlNet

Naturally, the next step is to control the pose using ControlNet. I assume you already have it installed and know the basics.

I will guide you through using it in this workflow. Read the ControlNet article if you want to learn more.

I will use this stock image as a reference.

Reference image.

Step 1. Upload the reference image to the image canvas. You can drag and drop the reference image there.

Step 2. Check Enable.

Step 3. Select openpose in the Preprocessor dropdown menu.

Step 4. Select control_opepose in the Model dropdown menu.

Optionally, preview the extracted pose by performing the following steps.

Check Allow Preview.
A new icon that looks like an explosion will appear next to the Model dropdown menu. Click on the icon to preview the pose.

Press Generate to generate images with ControlNet.

Here’s what we get.

Now it’s one step forward. We have fixed the wizard’s pose. He now always sits down and shows his full body.

But it still lacks a mechanism to specify the prompts in certain areas. You probably know what I am going with. That’s right, add regional prompts!

Adding regional prompts

Now, activate the Regional Prompter extension by checking the Active checkbox.

We will still use the HorizontalDivide mode.

Check Use common prompt.

We will divide the image into 4 regions. The Divide Ratio is

1,1,1.5; 1,1,1.5

The 4 regions are like this.

We want to have the following:

Whole image: a wizard
Region 0: a stone wall with ancient symbols
Region 1: wizard reading a scroll
Region 2: a wolf beside a stone wall
Region 3: some skulls

So, the prompt is

a mysterious wizard , highly detailed face, highly detailed clothing, cinematic, dark, horror
BREAK
worn stone wall, (ancient symbols :1.3)
BREAK
old mystical (torn scroll :1.2)
BREAK
worn stone wall, (wolf:1.5)
BREAK
(many skulls:1.5), blurry

Note that I increase the weights of some keywords. Otherwise, the objects may not show up.

Now you have complete control of where the dog, the skull, and the mystical symbol are. See the images below.

Correct color assignment

Let’s say you want to generate some photoshoots of a woman with brown hair, a yellow blouse, and a blue skirt. Sounds easy?

You will know this is a challenge if you have tried generating something like that.

Let’s see some examples with the following prompt. (Modified from the Realistic People tutorial)

full body photo of young woman, natural brown hair, yellow blouse, blue skirt, busy street, rim lighting, studio lighting, looking at the camera, dslr, ultra quality, sharp focus, tack sharp, dof, film grain, Fujifilm XT3, crystal clear, 8K UHD, highly detailed glossy eyes, high detailed skin, skin pores

disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w

Model: Realistic Vision v2

Stable Diffusion improvised! The colors are all mixed up.

You will see it’s not that easy to tell Stable Diffusion which color should go where. The self-attention of the prompt tokens does not work well here.

You will get a correct assignment by chance. But I would rather use that chance to get a good composition…

Regional Prompter

The color assignment is something the regional prompter can help with. Let’s divide the image vertically into 3 parts.

Main Splitting: Rows

Divide ratio: 1, 1, 1.5

Use common prompt: Yes

Prompt:

full body photo of young woman, busy street, rim lighting, studio lighting, looking at the camera, dslr, ultra quality, sharp focus, tack sharp, dof, film grain, Fujifilm XT3, crystal clear, 8K UHD, highly detailed glossy eyes, high detailed skin, skin pores
BREAK
natural brown hair
BREAK
(yellow blouse: 1.3)
BREAK
(blue skirt: 1.3)

The negative prompt is the same:

disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w

Very nice! Regional prompts are effective solutions to the color assignment problem.

Use ControlNet Pose to get even more control.

Regional Prompter as a creative tool

We are blessed with Stable Diffusion in our hands. Regional prompter gives you the ability to prompt at different parts of the image. Let’s think about making something new! Create some visual effects that were not possible before!

Below is an example of dividing an image of a nature scene into four parts horizontally and assigning different weather to each part.

Divide mode: Horizontal

Divide ratio: 1,1,1,1

Use common prompt: Yes

Model: Lyriel v1.5

Prompt:

a beautiful wild park, path to freedom, courage and love, national geographic photo of the year
BREAK
spring, trees, birds, green grasses, (sunny, wild flowers:1.2), god ray, clear sky
BREAK
cloudy, dry
BREAK
thunderstorm, rain
BREAK
winter, heavy snow, barren trees

Negative prompt

BREAK
snow
BREAK
BREAK
BREAK
BREAK

I’m sure you can be more creative than I am. Let your ideas flow and start experimenting!

Final notes

Increase the weight of a keyword if you don’t see the object.
It’s pretty normal to get an imperfect image. Fix it up here or there with inpainting. Unlike many other extensions, Regional Prompter shares settings between txt2img and img2img. So make sure to uncheck Active if you don’t want to use it for inpainting.
This extension has more functions than the ones I have gone through. See the regional prompter GitHub page to learn more.
There’s an earlier plugin called Latent Couple that does something similar. Regional Prompter is getting updated and has some extra functions.
Experiment with Attention and Latent generation mode to see which one works best for you. (Attention works pretty well for me.)

Next Read: 1. Achieving Stable Diffusion: 5 Proven Techniques 2. Unlocking Consistent Facial Results with Stable Diffusion 3. Mastering Face Stability through Effective Diffusion Methods 4. Consistent Face Generation: 5 Key Strategies for Stable Diffusion 5. Enhancing Facial Consistency with Reliable Diffusion Techniques »

ExtensionTxt2img

aizmin: