Civitai Guide to ControlNet – Part I – Basics

| Last Updated | Changes |
| --- | --- |
| 9/13/2023 | First version published |

What is ControlNet?

ControlNet is an implementation of the research paper Adding Conditional Control to Text-to-Image Diffusion Models.

It’s a neural network which exerts control over Stable Diffusion (SD) image generation by conditioning the output on an additional input image.

But what does it mean for us, as users? ControlNet is a collection of models which do a bunch of things, most notably subject pose replication, style and color transfer, and depth-map image manipulation. They’re tools allowing us to transfer one aspect of an image to another.

There’s a standalone GitHub repository for ControlNet, maintained by user lllyasviel, and an Extension for the popular SD interface Automatic1111, maintained by user Mikubill.
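
The guide below focuses on the WebUI, but ControlNet can also be driven entirely from code. Here’s a minimal sketch using Hugging Face’s diffusers library – the model IDs are the standard SD 1.5 + Canny pairing, and the detectmap file name is just an illustrative placeholder:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Load the Canny ControlNet and attach it to a base SD 1.5 checkpoint.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The conditioning image here is a pre-made Canny detectmap (see below).
detectmap = load_image("canny_detectmap.png")
image = pipe("a futuristic city street at night", image=detectmap).images[0]
image.save("output.png")
```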

Show me examples!

ControlNet is best described with example images. In the first example, we’re replicating the composition of an image, but changing the style and theme, using a ControlNet model called Canny. The top left image is the original output from SD.

The second example uses a model called OpenPose to extract a character’s pose from an input image (in this case a real photograph), duplicating the position of the body, arms, head, appendages, etc. in the generated image. The input image can be a photograph, or a generated image – anything in which a human body can be detected.

Real photograph on the left, SD image using ControlNet and OpenPose on the right.

The ControlNet Models

There are ControlNet models for SD 1.5, SD 2.X, and SDXL. There have been a few versions of the SD 1.5 ControlNet models – we’re only listing the latest 1.1 versions for download below, along with the most recent SDXL models.

Note that many developers have released ControlNet models – the models below may not be an exhaustive list of every model available!

Also note that many ControlNet models require additional .yaml configuration files to be placed in the same directory as the model. On Civitai, check the “Files” section while downloading to see if a config file is available.

lllyasviel ControlNet 1.1 Models for SD 1.5 – Civitai Model Page

| Model | Download Link (Civitai) |
| --- | --- |
| Canny | ControlNet 1.1 Models – Canny |
| Depth | ControlNet 1.1 Models – Depth |
| HED/SoftEdge | ControlNet 1.1 Models – Softedge |
| Normal | ControlNet 1.1 Models – Normal |
| Scribble | ControlNet 1.1 Models – Scribble |
| MLSD | ControlNet 1.1 Models – MLSD |
| OpenPose | ControlNet 1.1 Models – OpenPose |
| Seg | ControlNet 1.1 Models – Seg |
| Inpaint | ControlNet 1.1 Models – Inpaint |
| ip2p | ControlNet 1.1 Models – Pix2Pix |
| Lineart | ControlNet 1.1 Models – Lineart |
| Lineart Anime | ControlNet 1.1 Models – AnimeLine |
| Shuffle | ControlNet 1.1 Models – Shuffle |
| Tile | ControlNet 1.1 Models – Tile (e) |

TencentARC T2I Adapter Models for SD 1.5 – Civitai Model Page

| Model | Download Link (Civitai) |
| --- | --- |
| Color | ControlNet T2I – Color |
| Style | ControlNet T2I – Style |
| Sketch | ControlNet T2I – Sketch |
| OpenPose | ControlNet T2I – OpenPose |
| KeyPose | ControlNet T2I – KeyPose |
| Depth | ControlNet T2I – Depth |
| Canny | ControlNet T2I – Canny |
| BodyPose | ControlNet T2I – BodyPose |
| Seg | ControlNet T2I – Seg |

SDXL Models – Various Developers – Civitai Model Page

| Name | Developer | Download Link (Civitai) |
| --- | --- | --- |
| Real Blur | Kohya-ss | ControlNet – Kohya – Blur |
| Canny | Kohya-ss | ControlNet – Kohya – Canny |
| Depth (new) | Kohya-ss | ControlNet – Kohya – Depth |
| Anime Blur | Kohya-ss | ControlNet – Kohya – Anime Blur |
| Blur Beta | Kohya-ss | ControlNet – Kohya – Blur (Beta) |
| Canny | Kohya-ss | ControlNet – Kohya – Canny |
| Depth | Kohya-ss | ControlNet – Kohya – Depth |
| OpenPose | Kohya-ss | ControlNet – Kohya – OpenPose |
| OpenPose V2 | Kohya-ss | ControlNet – Kohya – OpenPose v2 |
| Scribble | Kohya-ss | ControlNet – Kohya – Scribble |
| Canny | Stability.ai | ControlNet – Stability – Canny |
| Depth | Stability.ai | ControlNet – Stability – Depth |
| Recolor | Stability.ai | ControlNet – Stability – Recolor |
| Revision | Stability.ai | ControlNet – Stability – Revision |
| Sketch | Stability.ai | ControlNet – Stability – Sketch |
| Depth-16b-zoe | SargeZT | ControlNet – SargeZT – Depth-16b-zoe |
| Depth-faid-vidit | SargeZT | ControlNet – SargeZT – faid-vidit |
| Depth-zeed | SargeZT | ControlNet – SargeZT – Depth-zeed |
| Depth-zoe | SargeZT | ControlNet – SargeZT – Depth-zoe |
| Softedge | SargeZT | ControlNet – SargeZT – Softedge |
| Binary | SargeZT t2i Adapter | ControlNet – SargeZT – T2I – Binary |
| Canny | SargeZT t2i Adapter | ControlNet – SargeZT – T2I – Canny |
| Color | SargeZT t2i Adapter | ControlNet – SargeZT – T2I – Color |
| Depth | SargeZT t2i Adapter | ControlNet – SargeZT – T2I – Depth |
| Seg | SargeZT t2i Adapter | ControlNet – SargeZT – T2I – Seg |
| Softedge | SargeZT t2i Adapter | ControlNet – SargeZT – T2I – Softedge |
| Canny | TencentARC t2i Adapter | ControlNet – TencentARC – T2I – Canny |
| Depth-midas | TencentARC t2i Adapter | ControlNet – TencentARC – T2I – Depth-midas |
| Depth-zoe | TencentARC t2i Adapter | ControlNet – TencentARC – T2I – Depth-zoe |
| Lineart | TencentARC t2i Adapter | ControlNet – TencentARC – T2I – Lineart |
| OpenPose | TencentARC t2i Adapter | ControlNet – TencentARC – T2I – OpenPose |
| Recolor | TencentARC t2i Adapter | ControlNet – TencentARC – T2I – Recolor |
| Sketch | TencentARC t2i Adapter | ControlNet – TencentARC – T2I – Sketch |
| OpenPose | Thibaud | ControlNet – Thibaud – OpenPose |
| OpenPose-LoRA | Thibaud | ControlNet – Thibaud – OpenPose-LoRA |
| H94 ip-adapter | Thibaud | ControlNet – Thibaud – H94 IP-Adapter |

What do the Models do?

Many of the models above are “duplicates”, in the sense that multiple models perform much the same task but produce slightly different outputs, having been trained with slightly different parameters. The list below gives one example from each of the popular model types.

You really have to try them out for yourself to see if you prefer Stability.ai’s Depth vs Kohya’s Depth, for example. There’s a lot of personal preference involved.

The Preprocessor (also called the Annotator) converts your uploaded image into a detectmap (examples below), which is fed into ControlNet to produce the output effect. The Preprocessor does not need to be set if you’re uploading a pre-made detectmap – if, for example, you created an OpenPose skeleton manually.

A number of the most popular models are demonstrated below.

Canny – Edge Detection

Canny creates simple sharp lines around areas of high/low contrast;

[Image sequence: license-free Unsplash stock photo → Preprocessor-generated detectmap → ControlNet + SD prompt output]
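
Under the hood, the Canny Preprocessor is ordinary Canny edge detection. A minimal sketch of producing a detectmap yourself with OpenCV (thresholds and file names are illustrative):

```python
import cv2
import numpy as np

# Read the source image and run Canny edge detection over it.
image = cv2.imread("source.jpg")
edges = cv2.Canny(image, 100, 200)  # low/high thresholds are tunable

# ControlNet expects a 3-channel image, so replicate the edge map.
detectmap = np.stack([edges] * 3, axis=-1)
cv2.imwrite("canny_detectmap.png", detectmap)
```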

MLSD – Mobile Line Segment Detection

Straight Line Detection model for architecture and man-made objects;

[Image sequence: license-free Unsplash stock photo → Preprocessor-generated detectmap → ControlNet + SD prompt output]

HED – Holistically-Nested Edge Detection (Also SoftEdge)

Uses soft-edge detection to create smooth lines around objects; especially useful for recoloring and stylizing.

[Image sequence: license-free Unsplash stock photo → Preprocessor-generated detectmap → ControlNet + SD prompt output]

Scribble/Sketch

Converts sketches and other line-drawn art to images.

[Image sequence: original sketch → Preprocessor-generated detectmap → ControlNet + SD prompt output]

OpenPose (and Derivatives, OpenPose v2, BodyPose, etc.)

OpenPose will detect a human pose and apply it to a subject in your image. It creates a “skeleton” with a head, trunk, and limbs, and can even include hands (with fingers) and facial orientation. Multiple OpenPose Skeletons can be combined to create dynamic crowd scenes;

[Image sequence: license-free Unsplash stock photo → Preprocessor-generated detectmap → ControlNet + SD generated output]
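
The OpenPose detectmap can also be generated in code via the controlnet_aux package, which wraps roughly the same annotator models the WebUI downloads. A sketch (the hand_and_face flag adds finger and facial keypoints; exact keyword names can vary between controlnet_aux versions):

```python
from controlnet_aux import OpenposeDetector
from diffusers.utils import load_image

# Download the annotator weights and extract a pose skeleton.
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
photo = load_image("person.jpg")
skeleton = openpose(photo, hand_and_face=True)  # roughly openpose_full
skeleton.save("openpose_detectmap.png")
```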

SEG – Semantic Segmentation

SEG detects, and segments, parts of images based on color and shape;

[Image sequence: license-free Unsplash stock photo → Preprocessor-generated detectmap → ControlNet + SD prompt output]

Depth

Replaces or re-draws the subject (or parts of an image) based on greyscale depth maps;

[Image sequence: SDXL image → Preprocessor-generated detectmap → ControlNet + SD generated output]

Normal Map

Normal Maps are somewhat similar to Depth Maps, but retain minor surface details and geometry;

[Image sequence: SDXL image → Preprocessor-generated detectmap → ControlNet + SD generated output]

Color

Produces a color swatch/palette based on the input image, which is then applied to the prompted image;

[Image sequence: SDXL image → Preprocessor-generated detectmap → ControlNet + SD generated output]

Style

Transfer the theme, style, or certain elements of an image into your generated image, without mentioning them in the prompt. Note that this Model uses the clip_vision preprocessor (more on Preprocessor types below) and does not produce a Detectmap. Also note that to function, this Model requires prompts to be under 75 tokens long.

[Image pair: SD 1.5 image → ControlNet + SD generated output]

Installing ControlNet & The Models

Automatic1111

Installing ControlNet for Automatic1111 is extremely straightforward; no different from installing any other extension. Make sure your Automatic1111 installation is up to date, then head to the Extensions tab.

Search for sd-webui-controlnet, and look for the Extension highlighted above – it’s easy to find, as it has over 12,000 stars on GitHub. If you’re having trouble finding it, you can order the list by Stars, and it will jump to the top.

Once you’ve installed the Extension and restarted the WebUI Console, you’ll need to download the models (links above) into the ControlNet models directory. By default, you’ll find that directory at;

stable-diffusion-webui\extensions\sd-webui-controlnet\models

We can also configure custom save directories for both the ControlNet models and the Preprocessor models (more details below!) under WebUI Settings > ControlNet.

ComfyUI

ComfyUI has native, out-of-the-box support for ControlNet; no third-party extensions are required. ControlNet models should be downloaded and placed in the following directory;

ComfyUI\models\controlnet

Using ControlNet (Automatic1111 WebUI)

Once installed, ControlNet will appear in the Automatic1111 WebUI as a collapsed drawer in the accordion menu below the Prompt and Image Configuration settings. It displays the version number of the currently installed extension. ControlNet is one of the most frequently updated extensions, with new features added (and broken!) on an almost weekly basis, so it’s extremely useful to know at a glance which version is installed.

Location of the ControlNet settings drawer, on the left of the WebUI

The ControlNet interface appears for use in both txt2img and img2img modes.

ControlNet Options

The ControlNet interface can appear intimidating at first glance, but we’ll step through all the options and explain what each does, and the choices will be demystified in no time! From top to bottom;

| Interface Option | Function |
| --- | --- |
| The Image Box | This is where we drop (or upload) our Source Image – the image which we want to extract some trait from, to pass to the new image to be generated. |
| Enable | Turns the ControlNet instance on/off. |
| Low VRAM | Checking this option allows ControlNet to function with less than 6GB of VRAM, at the expense of processing speed. |
| Pixel Perfect | When checked, automatically calculates the correct Preprocessor resolution for the input image (more details below!). |
| Allow Preview | When checked, displays the Preprocessor-created detectmap alongside the ControlNet input image – extremely useful to see exactly what the Preprocessor is doing. |
| Interface Option | Function |
| --- | --- |
| Control Type | Selecting a Control Type radio button will attempt to automatically set the Preprocessor and Model appropriately, but if we want control over which Preprocessor to use with a given Model, setting them manually is best. |
| Preprocessor | A list of available Preprocessors (more details below!). |
| “Bang” button (Preview Annotator Result) | Once you’ve selected a Preprocessor, clicking this button runs it against the input image and displays a preview of the output. |
| Model | The ControlNet model we wish to use. Note that the Preprocessor and the Model should be set appropriately – the Depth Preprocessors work with the Depth Models, and so forth, although there is some overlap (more details below!). |
| Model Refresh Button | If ControlNet models have been downloaded while WebUI is running, there’s no need to restart – simply click this button to refresh the Model list. |
| Interface Option | Function |
| --- | --- |
| Control Weight | How much emphasis to apply to the ControlNet image when generating the final output. |
| Starting Control Step | Rather than applying ControlNet to every step of image generation, this allows us to start applying ControlNet at a predefined step of the process. |
| Ending Control Step | Defines the step at which ControlNet should stop applying during image generation. |
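
For reference, these three options map directly onto call parameters of the diffusers pipeline shown earlier – the start/end values are fractions of the total step count, just like the WebUI sliders (values here are illustrative):

```python
# Reusing `pipe` and `detectmap` from the earlier diffusers sketch.
image = pipe(
    "a futuristic city street at night",
    image=detectmap,
    controlnet_conditioning_scale=0.8,  # Control Weight
    control_guidance_start=0.0,         # Starting Control Step
    control_guidance_end=0.6,           # Ending Control Step
).images[0]
```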

Below these we have the options for Control Mode. Control Mode was previously called “Guess Mode” in older ControlNet versions.

| Interface Option | Function |
| --- | --- |
| Control Mode – Balanced | Strikes a balance between the input prompt and ControlNet, putting ControlNet on both sides of the CFG scale. The same as having Guess Mode disabled in the old ControlNet. |
| My prompt is more important | Uses progressively reduced U-Net injections of ControlNet to ensure that your prompt is given more influence over the image generation. |
| ControlNet is more important | Puts ControlNet only on the Conditional Side. This means that ControlNet will be made N times stronger, based on your CFG setting! If your CFG Scale is set to 7, ControlNet will be injected at 7 times the strength. Note that this setting is distinct from Control Weight. Using this setting gives ControlNet more leeway to guess what is missing from the prompt when generating the final image. |

The next options are the Resize Modes. Resize Modes tell ControlNet how to handle input images whose dimensions differ from the txt2img settings.

| Interface Option | Function |
| --- | --- |
| Just Resize | The ControlNet input image will be stretched (or compressed) to match the height and width of the txt2img (or img2img) settings. This will alter the aspect ratio of the detectmap. |
| Crop and Resize | The detectmap will be cropped and re-scaled to fit inside the height and width of the txt2img settings. The default setting, and the most useful. |
| Resize and Fill | Fits the detectmap into the txt2img canvas, and extends the detectmap with “emptiness” to fill any spaces. |
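
As a rough mental model, the three modes behave like the following Pillow operations (a sketch only – the extension’s own implementation differs in detail):

```python
from PIL import Image, ImageOps

detectmap = Image.open("canny_detectmap.png")
W, H = 768, 512  # the txt2img width/height

just_resize = detectmap.resize((W, H))             # stretch; aspect ratio changes
crop_and_resize = ImageOps.fit(detectmap, (W, H))  # scale, then centre-crop overflow
resize_and_fill = ImageOps.pad(detectmap, (W, H))  # scale to fit, pad the gaps
```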

The last settings allow us to perform a Loopback, and set ControlNet Presets;

| Interface Option | Function |
| --- | --- |
| Loopback | Passes the generated image back into ControlNet for a second pass! |
| Presets | Gives the ability to save and reload ControlNet settings as Presets. |

Additionally, there are some buttons below the Input Image which perform some specialized functions;

Interface ButtonFunction
?Creates a new Canvas (see below)
?Enable your webcam! Take selfies and apply them as the ControlNet input image.
?Flip webcam horizontal orientation
??Send Dimension button – duplicates the dimensions from the ControlNet input image to the txt2img (or img2img) Width and Height.

Other ControlNet Options – Multiple ControlNet Instances

We have the ability to enable up to 10 (!!) ControlNet instances (called “Units”), which we can chain together to produce phenomenal results. Each instance will be displayed as a tab, nested under the ControlNet drawer.

ControlNet with three Instances/Units enabled

To enable multiple Instances/Units, open the WebUI Settings for ControlNet, and use the Multi ControlNet slider to specify how many Instances/Units to enable in the interface.
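
Chaining works in code too: diffusers accepts a list of ControlNets, with one conditioning image and one weight per Unit. A sketch combining OpenPose and Depth (model IDs are the standard 1.1 releases; the detectmaps are assumed to have been prepared as shown earlier):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnets = [
    ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained(
        "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "two dancers on a rooftop at sunset",
    image=[load_image("openpose_detectmap.png"),  # one detectmap per Unit
           load_image("depth_detectmap.png")],
    controlnet_conditioning_scale=[1.0, 0.6],     # per-Unit Control Weight
).images[0]
```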

Preprocessors (Annotators)

Preprocessors (also called Annotators in the original ControlNet research paper) often, but not always, correspond to a ControlNet model, and there are sometimes multiple Preprocessor choices for each model! The examples below show some of these Preprocessor outputs for each model type, using the same ControlNet input images for consistency.

[Input images used to test the Preprocessors below: a general test image, an OpenPose test image, and a scribble/sketch test image]

Note that the first time a Preprocessor is selected from the Preprocessor list and an image is generated, it may seem like nothing is happening for an extended period – on the initial run of each Preprocessor, additional required files and models are downloaded.

Depth

Depth provides four Preprocessors, which produce varying greyscale gradients between near and far areas.

depth_midas
depth_zoe
depth_leres++
depth_leres
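
To compare the annotators side by side without the WebUI, controlnet_aux exposes them individually – a sketch for the first two (file names are illustrative):

```python
from controlnet_aux import MidasDetector, ZoeDetector
from diffusers.utils import load_image

source = load_image("source.jpg")

# Each detector downloads its own weights on first use.
midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")

midas(source).save("depth_midas.png")
zoe(source).save("depth_zoe.png")
```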

NormalMap

There are two NormalMap Preprocessors, picking up different layers of detail.

Output examples to follow.

normal_bae
normal_midas

OpenPose

There are four OpenPose Preprocessors, becoming progressively more detailed, up to hand and finger posing and facial orientation. Note that the base openpose Preprocessor only captures the “body” of a subject, and openpose_full is a combination of openpose + openpose_hand (not shown) + openpose_face.

Output examples to follow.

openpose
openpose_face
openpose_faceonly
openpose_full

Lineart

Trained on line drawings, the Lineart Preprocessors can generate rough or detailed lineart from input images, and can be used to translate lineart images into full-color images.

Output examples to follow.

lineart_anime
lineart_anime_denoise
lineart_coarse
lineart_realistic

Softedge

Extremely versatile Preprocessor, great for capturing the outline and detail of many types of image.

When looking for the best result quality, softedge_hed is the clear winner, followed by softedge_pidinet.

softedge_hed
softedge_hedsafe
softedge_pidinet
softedge_pidisafe

Scribble

Four Preprocessors, each capable of turning hand-drawn scribbles into images.

scribble_hed
scribble_pidinet
scribble_xdog
t2ia_sketch_pidi

Segmentation

Three Preprocessors excelling in semantic segmentation.

seg_ofade20k
seg_ofcoco
seg_ufade20k

Reference

reference_only is an unusual Preprocessor in that it does not require any ControlNet model; instead, it guides diffusion directly, using the source image as a reference.

This can be used to make images of a similar style, especially anime and cartoons!

Revision

Similar to the reference_only Preprocessor, revision_clipvision and revision_ignore_prompt use the ControlNet image as a source for the generation of image variations – no prompt needed!

Note that the revision_clipvision Preprocessor is 3.4GB in size.

Part II – Coming Soon!

Part I has just scratched the surface of ControlNet! We’ve looked at what it does, how to install it and where to get the models from. We’ve covered the settings and options in the interface, and we’ve explored some of the Preprocessor options.

Part II will look at;

  • Real-world use-cases – how we can use ControlNet to level-up our generations.
  • Using ControlNet with ComfyUI – the nodes and sample workflows.
  • Companion Extensions, such as OpenPose 3D, which can be used to give us unparalleled control over subjects in our generations.
  • ControlNet resources on Civitai.com