Enhance Image Details with AI Upscaler Technology

AI image upscalers like ESRGAN are indispensable tools for improving the quality of AI images generated by Stable Diffusion. They are so commonly used that many Stable Diffusion GUIs have built-in support for them.

Here, we will learn what image upscalers are, how they work, and how to use them.

Why do we need an image upscaler?

The default image size of Stable Diffusion v1 is 512×512 pixels. This is pretty low by today’s standards. Take the iPhone 12 as an example. Its camera produces 12 MP images (4,032 × 3,024 pixels). Its screen displays 2,532 × 1,170 pixels, so a Stable Diffusion image would have to be enlarged to fill the screen and would look low quality as a result.

To complicate the matter, a complex scene generated by Stable Diffusion is often not as sharp as it should be. It struggles with fine details.

Why can’t we use a traditional upscaler?

You can, but the result won’t be as good.

Traditional algorithms for resizing images, such as nearest neighbor interpolation and Lanczos interpolation, use only the pixel values of the image itself. They enlarge the canvas and fill in the new pixels by performing mathematical operations on the existing pixel values. If the image is corrupted or distorted, these algorithms have no way to fill in the missing information accurately.
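To make the limitation concrete, here is a minimal sketch of nearest neighbor interpolation in plain Python. The function name `upscale_nearest` and the list-of-lists image layout are illustrative assumptions, not part of any library. Every output pixel is copied from an existing input pixel, so no new detail can appear.

```python
def upscale_nearest(image, factor):
    """Enlarge a 2D grid of pixel values by an integer factor.

    Each output pixel copies the value of the closest input pixel, so the
    result contains only values that were already present in the input.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    height, width = len(image), len(image[0])
    return [
        [image[y // factor][x // factor] for x in range(width * factor)]
        for y in range(height * factor)
    ]


# A 2x2 grayscale image enlarged 2x: each pixel becomes a 2x2 block.
small = [[0, 255],
         [128, 64]]
large = upscale_nearest(small, 2)
for row in large:
    print(row)
```

The enlarged image is bigger but carries exactly the same information as the small one, which is why the result looks blocky rather than more detailed.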

How does an AI upscaler work?

AI upscalers are neural network models trained with massive amounts of data. They can fill in details while enlarging an image.

During training, images are artificially corrupted to emulate real-world degradation. The AI upscaler model is then trained to recover the original images.
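The idea of building training pairs can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the `degrade` function, the 2x2 averaging, and the uniform noise are made up for this example, and real training pipelines use far more elaborate degradations.

```python
import random


def degrade(image, noise_level=10, seed=0):
    """Make a low-quality copy of a grayscale image (values 0-255).

    Step 1: average each 2x2 block to halve the resolution.
    Step 2: add random noise to each pixel and clamp to the valid range.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    degraded = []
    for y in range(0, height - 1, 2):
        row = []
        for x in range(0, width - 1, 2):
            block = (image[y][x] + image[y][x + 1]
                     + image[y + 1][x] + image[y + 1][x + 1])
            value = block // 4 + rng.randint(-noise_level, noise_level)
            row.append(max(0, min(255, value)))
        degraded.append(row)
    return degraded


# The (degraded, original) pair is one training example: the model receives
# the degraded image as input and is scored on how well it recovers the
# original.
original = [[10, 20, 200, 210],
            [30, 40, 220, 230],
            [90, 90, 50, 50],
            [90, 90, 50, 50]]
training_pair = (degrade(original), original)
```

Because the original is known for every degraded input, the model can be trained with ordinary supervised learning.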

A massive amount of prior knowledge is embedded in the model, so it can fill in the missing information. This is similar to how humans don’t need to study a person’s face in great detail to remember it: we mainly pay attention to a few key features.

Below is a comparison of the traditional (Lanczos) and AI (R-ESRGAN) upscalers. Because of the knowledge embedded in the AI upscaler, it can enlarge the image and recover the details simultaneously.

Compare image recovery between Lanczos (traditional upscaler) and R-ESRGAN (AI upscaler)

How to use an AI upscaler?

Let’s go through how to use an AI upscaler in AUTOMATIC1111 WebUI for Stable Diffusion.

See the Quick Start Guide for setting up AUTOMATIC1111 GUI.

Go to the Extras page (I know the name is confusing), and select Single Image.

Upload the image you want to upscale to the source canvas.

Set the Resize factor. Many AI upscalers enlarge images 4 times natively. So 4 is a good choice. Set it to a lower value, like 2, if you don’t want the image to be that big.

If your image is 512×512 pixels, resizing 2x gives 1024×1024 pixels, and 4x gives 2048×2048 pixels.
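The arithmetic is simple, but it is worth noting how quickly the pixel count grows. The helper below is a hypothetical illustration, not part of AUTOMATIC1111.

```python
def upscaled_size(width, height, factor):
    """Return the output size and how many times more pixels it has."""
    new_width = round(width * factor)
    new_height = round(height * factor)
    pixel_ratio = (new_width * new_height) / (width * height)
    return new_width, new_height, pixel_ratio


for factor in (2, 4):
    w, h, ratio = upscaled_size(512, 512, factor)
    print(f"{factor}x: {w}x{h} pixels, {ratio:.0f} times as many pixels")
# 2x: 1024x1024 pixels, 4 times as many pixels
# 4x: 2048x2048 pixels, 16 times as many pixels
```

The pixel count grows with the square of the resize factor, so a 4x upscale produces 16 times as many pixels as the source image.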

Select R-ESRGAN 4x+, an AI upscaler that works for most images.

Press Generate to start upscaling.

When it is done, the upscaled image will appear in the output window on the right. Right-click on the image to save.
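If you would rather script this than click through the GUI, the same operation can be sketched against the web UI’s HTTP API. This assumes the GUI was started with the `--api` option and is listening on the default local address. The endpoint path and field names below reflect my understanding of that API and may differ between versions, so treat them as assumptions to verify against your installation.

```python
import base64
import json
import urllib.request


def build_extras_payload(image_bytes, resize=4, upscaler="R-ESRGAN 4x+"):
    """Build the JSON body for the extras (single image) endpoint."""
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "upscaling_resize": resize,   # the Resize factor from the GUI
        "upscaler_1": upscaler,       # name as shown in the upscaler dropdown
    }


def upscale_via_api(image_path, output_path,
                    base_url="http://127.0.0.1:7860"):  # assumed local address
    """Send one image to a running web UI and save the upscaled result."""
    with open(image_path, "rb") as f:
        payload = build_extras_payload(f.read())
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/extra-single-image",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # ASSUMPTION: the response carries the output as a base64 string
    # under the "image" key.
    with open(output_path, "wb") as f:
        f.write(base64.b64decode(result["image"]))


# Example call (requires a running web UI; file names are placeholders):
# upscale_via_api("input.png", "output_4x.png")
```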

AI upscaler options

Let’s go through a few notable AI upscaler options.

LDSR

The Latent Diffusion Super Resolution (LDSR) upscaler was initially released along with Stable Diffusion 1.4. It is a latent diffusion model trained to perform upscaling tasks.

Although it delivers superior quality, it is extremely slow. I don’t recommend it.

ESRGAN 4x

Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) is an upscaling network that won the 2018 Perceptual Image Restoration and Manipulation Challenge. It is an enhancement to the previous SRGAN model.

It tends to retain fine details and produce crisp and sharp images. ESRGAN is the base model of many other upscalers.

R-ESRGAN 4x

The Real-ESRGAN (R-ESRGAN) is an enhancement to ESRGAN and can restore a variety of real-world images. It models various degrees of distortion from the camera lens and digital compression.

Compared to ESRGAN, it tends to produce smoother images.

R-ESRGAN performs best with realistic photo images.

Other Options

There’s a good comparison in this post to check out other options.

R-ESRGAN is a good choice for photographs or realistic paintings. Anime images require upscalers specifically trained on anime.

Visit the Open model database to find and download more upscalers.

Installing new upscaler

To install a new upscaler in AUTOMATIC1111 GUI, download a model from the Open model database and put it in the following folder:

stable-diffusion-webui/models/ESRGAN
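The copy step can also be scripted. The sketch below assumes the web UI lives in a `stable-diffusion-webui` directory relative to where you run it. The function name and the example file name are hypothetical.

```python
import shutil
from pathlib import Path


def install_upscaler(model_file, webui_dir="stable-diffusion-webui"):
    """Copy a downloaded upscaler model into the web UI's ESRGAN folder."""
    target_dir = Path(webui_dir) / "models" / "ESRGAN"
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(model_file).name
    shutil.copy2(model_file, destination)
    return destination


# Example call (the file name is a placeholder for your downloaded model):
# install_upscaler("Downloads/example_upscaler.pth")
```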

Restart the GUI. Your upscaler should now be available for selection in the upscaler dropdown menu. Below is what you should see after installing the Universal Upscaler V2.

The following models are good general-purpose upscalers.

Example of upscaled images

Below is an example of a complex scene upscaled using R-ESRGAN. Enlarge the images and switch between them to observe the difference, and compare them on both computer and cell phone screens.

Enhancing details with SD upscale

Using an upscaler alone is not ideal. If you have Stable Diffusion at hand, why not add it to your upscaler workflow?

SD Upscale is a script that comes with AUTOMATIC1111. It first enlarges the image with an upscaler and then runs an image-to-image pass to enhance details.

Step 1. Navigate to the Img2img page.

Step 2. Upload an image to the img2img canvas.

(Alternatively, use the Send to Img2img button to send the image to the img2img canvas.)

Step 3. In the Script dropdown menu at the bottom, select SD Upscale.

Step 4. Set Scale factor to 4 to scale to 4x the original size.

Step 5. Set denoising strength to between 0.1 and 0.3. The higher it is, the more the image will change. (You should experiment with this)

Step 6. Set the number of sampling steps to 100. Higher steps improve details. (You should experiment with this)

Step 7. You can use the original prompt and the negative prompt. If you don’t have one, use “highly detailed” as the prompt.

Step 8. Press Generate.
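The settings from the steps above can be collected into a single request body for the web UI’s img2img API, which is handy if you want to repeat the same workflow on many images. This is a sketch under several assumptions: the web UI runs with the `--api` option, the script is registered under the name "SD upscale", and the positional layout of `script_args` (including the tile overlap value, which the steps above do not mention) matches your version. Verify all of these against your installation before relying on them.

```python
import base64


def build_sd_upscale_payload(image_bytes, prompt="highly detailed",
                             negative_prompt="", denoising_strength=0.3,
                             steps=100, scale_factor=4,
                             upscaler="R-ESRGAN 4x+", tile_overlap=64):
    """Collect the settings from Steps 2-7 into one img2img request body."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return {
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,                          # Step 7
        "negative_prompt": negative_prompt,        # Step 7
        "denoising_strength": denoising_strength,  # Step 5
        "steps": steps,                            # Step 6
        "script_name": "SD upscale",               # Step 3 (assumed name)
        # ASSUMPTION: positional order of the script's inputs is
        # (placeholder, tile overlap, upscaler name, scale factor).
        "script_args": [None, tile_overlap, upscaler, scale_factor],
    }
```

The resulting dictionary would be sent as JSON to the img2img endpoint (assumed to be `/sdapi/v1/img2img`) of a running web UI. Lowering `denoising_strength` toward 0.1 keeps the result closer to the input image.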

Below is a comparison showing the effect of the additional image-to-image pass from the SD Upscale script.

  • Left: Universal Upscaler v2 to 4x.
  • Right: SD Upscale with Universal Upscaler v2 to 4x, prompt “highly detailed”, denoising strength 0.3 and 100 sampling steps.
Left: Universal Upscaler v2. Right: SD Upscale.

The SD Upscale script helps to improve details and reduce upscaling artifacts.

Hires Fix in txt2img page

You can optionally upscale every image generated on the txt2img page. To do so, simply check the Hires. fix checkbox.

Additional options will appear under the checkbox. The options are similar to those using the SD Upscale script.
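For scripted generation, the same option can be expressed as fields in a txt2img API request. The field names below (`enable_hr`, `hr_scale`, `hr_upscaler`) reflect my understanding of the AUTOMATIC1111 API and should be treated as assumptions to check against your version. The helper function itself is hypothetical.

```python
def build_hires_fix_payload(prompt, hr_scale=2, hr_upscaler="R-ESRGAN 4x+",
                            denoising_strength=0.3, width=512, height=512):
    """Build a txt2img request body with Hires. fix enabled.

    Returns the payload and the expected final image size.
    """
    payload = {
        "prompt": prompt,
        "width": width,            # size of the first, low-resolution pass
        "height": height,
        "enable_hr": True,         # the Hires. fix checkbox
        "hr_scale": hr_scale,      # how much to enlarge the first pass
        "hr_upscaler": hr_upscaler,
        "denoising_strength": denoising_strength,
    }
    final_size = (round(width * hr_scale), round(height * hr_scale))
    return payload, final_size


payload, final_size = build_hires_fix_payload("a castle on a hill")
print(final_size)  # (1024, 1024)
```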

Personally, I don’t use Hires fix much because it slows down image generation. Instead of upscaling all images, I would rather only upscale the ones I am going to keep.

Once you see a good image, you can send it to img2img for SD upscaling.

Learn about a new upscaling method for Stable Diffusion: ControlNet Tile Upscale.
