Last Updated | Changes |
---|---|
9/13/2023 | First version published |
What is ControlNet?
ControlNet is an implementation of the research paper Adding Conditional Control to Text-to-Image Diffusion Models.
It’s a neural network which exerts control over Stable Diffusion (SD) image generation in the following way;

But what does it mean for us, as users? In practice, ControlNet is a collection of models, each transferring a particular trait of an input image to a generated one – most notably subject pose replication, style and color transfer, and depth-map image manipulation.
There’s a standalone GitHub repository for ControlNet, maintained by user lllyasviel, and an Extension for the popular SD interface Automatic1111, maintained by user Mikubill.
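This guide focuses on the WebUI Extension, but for context, the same models can also be driven directly from Python via the diffusers library. Here’s a minimal sketch (the model IDs are examples from the Hugging Face hub; any SD 1.5 checkpoint with a matching ControlNet model will do);
```python
# A hedged sketch: ControlNet (Canny) with Stable Diffusion 1.5 via diffusers.
# Model IDs are examples; swap in any compatible checkpoint.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The conditioning image is a detectmap – here, a pre-made Canny edge map
edges = load_image("canny_edges.png")  # hypothetical local file
image = pipe("a watercolor castle", image=edges, num_inference_steps=20).images[0]
image.save("output.png")
```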
Show me examples!
ControlNet is best described with example images. In the first example, we’re replicating the composition of an image, but changing the style and theme, using a ControlNet model called Canny. The top left image is the original output from SD.

The second example uses a model called OpenPose to extract a character’s pose from an input image (in this case a real photograph), duplicating the position of the body, arms, head, appendages, etc. in the generated image. The input image can be a photograph, or a generated image – anything in which a human body can be detected.

The ControlNet Models
There are ControlNet models for SD 1.5, SD 2.X, and SDXL. There have been a few versions of SD 1.5 ControlNet models – we’re only listing the latest 1.1 versions for download below, along with the most recent SDXL models.
Note that many developers have released ControlNet models – the models below may not be an exhaustive list of every model available!
Also note that many ControlNet models require additional .yaml configuration files to be placed in the same directory as the model. On Civitai, check the “Files” section while downloading to see if a config file is available.

lllyasviel ControlNet 1.1 Models for SD 1.5 – Civitai Model Page
Model | Download Link (Civitai) |
---|---|
Canny | ControlNet 1.1 Models – Canny |
Depth | ControlNet 1.1 Models – Depth |
HED/SoftEdge | ControlNet 1.1 Models – Softedge |
Normal | ControlNet 1.1 Models – Normal |
Scribble | ControlNet 1.1 Models – Scribble |
MLSD | ControlNet 1.1 Models – MLSD |
OpenPose | ControlNet 1.1 Models – OpenPose |
Seg | ControlNet 1.1 Models – Seg |
Inpaint | ControlNet 1.1 Models – Inpaint |
ip2p | ControlNet 1.1 Models – Pix2Pix |
Lineart | ControlNet 1.1 Models – Lineart |
lineart anime | ControlNet 1.1 Models – AnimeLine |
Shuffle | ControlNet 1.1 Models – Shuffle |
Tile | ControlNet 1.1 Models – Tile (e) |
TencentARC T2I Adapter Models for SD 1.5 – Civitai Model Page
Model | Download Link (Civitai) |
---|---|
Color | ControlNet T2I – Color |
Style | ControlNet T2I – Style |
Sketch | ControlNet T2I – Sketch |
OpenPose | ControlNet T2I – OpenPose |
KeyPose | ControlNet T2I – KeyPose |
Depth | ControlNet T2I – Depth |
Canny | ControlNet T2I – Canny |
BodyPose | ControlNet T2I – BodyPose |
Seg | ControlNet T2I – Seg |
SDXL Models – Various Developers – Civitai Model Page
What do the Models do?
Many of the models above are “duplicates”, in that multiple models perform much the same task but produce slightly different outputs, having been trained with slightly different parameters. The list below gives one example from each of the popular model types.
You really have to try them out for yourself to see if you prefer Stability.ai’s Depth vs Kohya’s Depth, for example. There’s a lot of personal preference involved.
The Preprocessor (also called the Annotator) converts your uploaded image into a detectmap (examples below), which is fed into ControlNet to produce the output effect. The Preprocessor does not need to be set if you’re uploading a pre-made detectmap – if you created an OpenPose skeleton manually, for example.
A number of the most popular models are demonstrated below.
Canny – Edge Detection
Canny creates simple sharp lines around areas of high/low contrast;
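Under the hood the Canny Preprocessor is plain edge detection; a rough equivalent with OpenCV (the threshold values are just a common starting point) looks like this;
```python
import cv2
import numpy as np

img = cv2.imread("input.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# The two thresholds control how much edge detail survives;
# 100/200 is a common starting point, not a magic number
edges = cv2.Canny(gray, 100, 200)

# ControlNet expects a 3-channel detectmap: white lines on black
detectmap = np.stack([edges] * 3, axis=-1)
cv2.imwrite("canny_detectmap.png", detectmap)
```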



MLSD – Mobile Line Segment Detection
Straight Line Detection model for architecture and man-made objects;



HED – Holistically-Nested Edge Detection (Also SoftEdge)
Creates smooth lines around objects, especially useful for recoloring and stylizing, using soft-edge detection.



Scribble/Sketch
Converts sketches and other line-drawn art to images.



OpenPose (and Derivatives, OpenPose v2, BodyPose, etc.)
OpenPose will detect a human pose and apply it to a subject in your image. It creates a “skeleton” with a head, trunk, and limbs, and can even include hands (with fingers) and facial orientation. Multiple OpenPose Skeletons can be combined to create dynamic crowd scenes;



SEG – Semantic Segmentation
SEG detects and segments parts of images based on color and shape;



Depth
Replace or re-draw the subject (or parts of an image) based on greyscale depth maps;
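A greyscale depth map of the kind this model consumes can be produced with any monocular depth estimator; a sketch using the transformers pipeline (the model ID is one example checkpoint – the WebUI’s Depth Preprocessors use MiDaS/LeReS/Zoe variants of the same idea);
```python
from PIL import Image
from transformers import pipeline

# Monocular depth estimation; Intel/dpt-large is one example checkpoint
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
result = depth_estimator(Image.open("input.png"))

# result["depth"] is a PIL image: lighter pixels are closer, darker further
result["depth"].convert("L").save("depth_detectmap.png")
```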



Normal Map
Normal Maps are somewhat similar to Depth Maps, but retain minor surface details and geometry;



Color
Produces a color swatch/palette based on the input image, which is then applied to the prompted image;
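As far as we can tell, the Color detectmap is essentially the input image pixelated into a coarse grid of blocks; a rough PIL equivalent (the 64-pixel cell size is an assumption – tune to taste);
```python
from PIL import Image

img = Image.open("input.png")
cell = 64  # block size in pixels; an assumption, tune to taste

# Downscale to one pixel per block, then NEAREST-upscale back:
# each pixel becomes a flat colour swatch
small = img.resize((max(1, img.width // cell), max(1, img.height // cell)))
palette = small.resize(img.size, Image.NEAREST)
palette.save("color_detectmap.png")
```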



Style
Transfer the theme, style, or certain elements of an image into your generated image, without mentioning them in the prompt. Note that this Model uses the clip_vision preprocessor (more on Preprocessor types below) and does not produce a Detectmap. Also note that to function, this Model requires prompts to be under 75 tokens long.


Installing ControlNet & The Models
Automatic1111
Installing ControlNet for Automatic1111 is extremely straightforward; no different than installing any other extension. Make sure your Automatic1111 installation is up to date, then head to the Extensions tab.

Search for sd-webui-controlnet, and look for the Extension highlighted above – it’s easy to find as it has over 12,000 stars on GitHub. If you’re having trouble finding it, you can Order the list by Stars, and it will jump to the top of the list.
Once you’ve installed the Extension and restarted the WebUI Console, you’ll need to download the models (links above) into the ControlNet models directory. By default, you’ll find that directory at;
stable-diffusion-webui\extensions\sd-webui-controlnet\models
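You can download the models through the Civitai links above and save them there manually; alternatively, here’s a sketch using the huggingface_hub package (the repo and filename are examples from lllyasviel’s Hugging Face mirror);
```python
from huggingface_hub import hf_hub_download

# Example: fetch the 1.1 Canny model straight into the extension's models folder
hf_hub_download(
    repo_id="lllyasviel/ControlNet-v1-1",
    filename="control_v11p_sd15_canny.pth",
    local_dir=r"stable-diffusion-webui\extensions\sd-webui-controlnet\models",
)
```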
We can also configure custom save directories for both the ControlNet models, and Preprocessor models (more details below!), by looking in the WebUI Settings > ControlNet Options.

ComfyUI
ComfyUI has native out-of-the-box support for ControlNet; no third-party extensions are required. ControlNet models should be downloaded and placed in the following directory;
ComfyUI\models\controlnet
Using ControlNet (Automatic1111 WebUI)
Once installed to the Automatic1111 WebUI, ControlNet will appear in the accordion menu below the Prompt and Image Configuration Settings as a collapsed drawer, with the currently installed version number in its header. ControlNet is one of the most frequently updated extensions, with new features being added (and broken!) on an almost weekly basis, so it’s extremely useful to know, at-a-glance, which version is installed.

The ControlNet interface appears for use in both txt2img and img2img modes.
ControlNet Options
The ControlNet interface can appear intimidating at first glance, but we’ll step through all the options and explain what each does, and the choices will be demystified in no time! From top to bottom;

Interface Option | Function |
---|---|
The Image Box | This is where we drop (or upload) our Source Image – the image from which we want to extract some trait to pass to the newly generated image. |
Enable | Turns the ControlNet instance on/off |
Low VRAM | Checking this option allows ControlNet to function with less than 6GB of VRAM, at the expense of processing speed. |
Pixel Perfect | When checked, automatically calculates the correct Preprocessor resolution for the input image (more details below!) |
Allow Preview | When checked, will display the Preprocessor-created Detectmap alongside the ControlNet input image – extremely useful to see exactly what the Preprocessor is doing. |

Interface Option | Function |
---|---|
Control Type | Selecting a Control Type radio button will attempt to automatically set the Preprocessor and Model appropriately, but if we want control over which Preprocessor to use with a given model, setting manually is best. |
Preprocessor | A list of available Preprocessors (more details below!) |
“Bang” button (Preview Annotator Result) | Clicking this button, once you’ve selected a Preprocessor, will run the Preprocessor against the input image, displaying a preview of the output. |
Model | The ControlNet model we wish to use. Note that the Preprocessor and the Model should be set appropriately. The Depth Preprocessors work with the Depth Models, and so forth, although there is some overlap (more details below!) |
Model Refresh Button | If ControlNet models have been downloaded while WebUI is running, there’s no need to restart – simply click this button to refresh the Model list. |

Interface Option | Function |
---|---|
Control Weight | How much emphasis to apply to the ControlNet image when generating the final output. |
Starting Control Step | Rather than applying ControlNet to every Step of image generation, this allows us to start the application of ControlNet at a predefined Step of the image generation process. |
Ending Control Step | This allows us to define at which Step ControlNet should stop applying, during image generation. |
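If it helps to think about these three sliders in code, they map onto parameters of the diffusers pipeline from the introduction (parameter names are diffusers’, not the WebUI’s internals);
```python
# Continuing the diffusers sketch from the introduction (pipe and edges defined there)
image = pipe(
    "a watercolor castle",
    image=edges,
    controlnet_conditioning_scale=0.8,  # Control Weight
    control_guidance_start=0.0,         # Starting Control Step (fraction of total steps)
    control_guidance_end=0.7,           # Ending Control Step
).images[0]
```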
Below these we have the options for Control Mode. Control Mode was previously called “Guess Mode” in older ControlNet versions.
Interface Option | Function |
---|---|
Control Mode – Balanced | Balanced strikes a balance between the input prompt and ControlNet, putting ControlNet on both sides of the CFG scale. The same as having Guess Mode disabled in older ControlNet versions. |
My prompt is more important | Uses progressively reduced U-Net injections of ControlNet to ensure that your prompt is given more influence over the image generation. |
ControlNet is more important | Puts ControlNet only on the Conditional Side. This means that ControlNet will be made N times stronger, based on your CFG setting! If your CFG Scale is set to 7, ControlNet will be injected at 7 times the strength. Note that this setting is distinct from Control Weight. Using this setting gives ControlNet more leeway to guess what is missing from the prompt, in generating the final image. |
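In diffusers terms, “ControlNet is more important” roughly corresponds to the guess_mode flag (Guess Mode being the old name, as noted above); continuing the earlier sketch;
```python
# guess_mode biases generation toward ControlNet, letting it fill in
# whatever the prompt leaves unsaid; a lower CFG scale is recommended here
image = pipe(
    "",                 # even an empty prompt can work in guess mode
    image=edges,
    guess_mode=True,
    guidance_scale=3.0,
).images[0]
```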
The next options are the Resize Modes. Resize modes tell ControlNet how to handle input images of different dimensions than those of the txt2img settings.
Interface Option | Function |
---|---|
Just Resize | The ControlNet input image will be stretched (or compressed) to match the height and width of the txt2img (or img2img) settings. This will alter the aspect ratio of the Detectmap. |
Crop and Resize | The ControlNet Detectmap will be cropped and re-scaled to fit inside the height and width of the txt2img settings. The Default setting, and the most useful. |
Resize and Fill | Fits the Detectmap into the txt2img canvas settings, and extends the Detectmap with “emptiness” to fill any spaces. |
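The three Resize Modes correspond to familiar image operations; a rough PIL sketch of each (target size and fill colour are arbitrary examples);
```python
from PIL import Image, ImageOps

detectmap = Image.open("detectmap.png")
target = (512, 768)  # txt2img Width/Height (example values)

just_resize = detectmap.resize(target)         # stretches; alters aspect ratio
crop_resize = ImageOps.fit(detectmap, target)  # scales, then crops to fill the canvas
resize_fill = ImageOps.pad(detectmap, target, color="black")  # scales, then pads the gaps
```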
The last settings allow us to perform a Loopback, and set ControlNet Presets;
Interface Option | Function |
---|---|
Loopback | Passes the generated image back into ControlNet for a second pass! |
Presets | Gives the ability to save and reload ControlNet settings as Presets. |
Additionally, there are some buttons below the Input Image which perform specialized functions;
Interface Button | Function |
---|---|
New Canvas button | Creates a new Canvas (see below) |
Webcam button | Enable your webcam! Take selfies and apply them as the ControlNet input image. |
Mirror Webcam button | Flip webcam horizontal orientation |
Send Dimensions button | Duplicates the dimensions from the ControlNet input image to the txt2img (or img2img) Width and Height. |
Other ControlNet Options – Multiple ControlNet Instances
We have the ability to enable up to 10 (!!) ControlNet instances (called “Units”), which we can chain together to produce phenomenal results. Each instance will be displayed as a tab, nested under the ControlNet drawer.

To enable multiple Instances/Units, open the WebUI Settings for ControlNet, and use the Multi ControlNet slider to specify how many Instances/Units to enable in the interface.
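Chaining Units has a direct analogue in diffusers, which accepts lists of ControlNets, conditioning images, and weights; a sketch with two hypothetical Units (pose + depth; model IDs are the Hugging Face 1.1 repos);
```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Two "Units": OpenPose and Depth, each with its own Control Weight
controlnets = [
    ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
    ),
    ControlNetModel.from_pretrained(
        "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16
    ),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets, torch_dtype=torch.float16
).to("cuda")

pose_map = load_image("pose.png")    # hypothetical pre-made detectmaps
depth_map = load_image("depth.png")
image = pipe(
    "a knight in a misty forest",
    image=[pose_map, depth_map],               # one conditioning image per Unit
    controlnet_conditioning_scale=[1.0, 0.6],  # per-Unit Control Weight
).images[0]
```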

Preprocessors (Annotators)
Preprocessors (also called Annotators in the original ControlNet research paper) often, but not always, correspond to a ControlNet model, and there are sometimes multiple Preprocessor choices for each model! The examples below show some of these Preprocessor outputs for each model type, using the same ControlNet input images, for consistency.



Note that the first time a Preprocessor is selected from the Preprocessor list and an image generated, it may seem like nothing is happening for an extended period. Upon the initial run of each Preprocessor, additional required files and models will be downloaded.
Depth
Depth provides four Preprocessors which produce varying gradients between high/low areas.








NormalMap
There are two NormalMap Preprocessors, picking up different layers of detail.
Output examples to follow.


OpenPose
There are four OpenPose Preprocessors, becoming progressively more detailed until featuring hand and finger posing, and facial orientation. Note that the base openpose Preprocessor only captures the “body” of a subject, and openpose_full is a combination of openpose + openpose_hand (not shown) + openpose_face.
Output examples to follow.
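The same detector family is packaged in the controlnet_aux Python library, where the body/hand/face combinations are flags (keyword names vary between library versions; this follows a recent one);
```python
from controlnet_aux import OpenposeDetector
from PIL import Image

# Downloads the annotator weights on first run, much like the WebUI does
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

img = Image.open("photo.png")
body_only = openpose(img)                                    # plain openpose
full = openpose(img, include_hands=True, include_face=True)  # openpose_full
full.save("openpose_full_detectmap.png")
```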




Lineart
Trained on line drawings, Lineart can generate rough or detailed lineart from input images, and can be used to translate lineart images into full-color images.
Output examples to follow.




Softedge
Extremely versatile Preprocessor, great for capturing the outline and detail of many types of image.
When looking for the best result quality, softedge_hed is the clear winner, followed by softedge_pidinet.








Scribble
Four Preprocessors, each capable of turning hand-drawn scribbles into images.








Segmentation
Three Preprocessors excelling in semantic segmentation.






Reference
The Preprocessor reference_only is an unusual type of Preprocessor which does not require any Control model, but guides diffusion directly using the source image as a reference.
This can be used to make images of a similar style, especially anime and cartoons!

Revision
Similar to the reference_only Preprocessor, revision_clipvision and revision_ignore_prompt use the ControlNet image as a source for the generation of image variations – no prompt needed!
Note that the revision_clipvision Preprocessor is 3.4GB in size.

Part II – Coming Soon!
Part I has just scratched the surface of ControlNet! We’ve looked at what it does, how to install it and where to get the models from. We’ve covered the settings and options in the interface, and we’ve explored some of the Preprocessor options.
Part II will look at;
- Real-world use-cases – how we can use ControlNet to level-up our generations.
- Using ControlNet with ComfyUI – the nodes, sample workflows.
- Companion Extensions, such as OpenPose 3D, which can be used to give us unparalleled control over subjects in our generations.
- ControlNet resources on Civitai.com