Last Updated | Changes |
---|---|
11/16/2023 | First version published |
11/24/2023 | Update on near real-time processing |
What is an LCM?
Latent Consistency Models (LCMs) are quickly becoming the hottest new Stable Diffusion technology, but there’s a lot of confusion about what they are, and how they can be used. In this guide we’ll try to dispel some of the fog surrounding the technology, and help you get started creating your own LCM-assisted images!
The LCM LoRA behaves exactly like a normal LoRA – there’s an SDXL version and an SD 1.5 version. You place it in the same LoRA directory you’d normally download your LoRAs to, and include it in your positive prompt just like any other LoRA. The LCM magic only happens when we drop the CFG down to ~1.0, and the Sample Steps down to between ~3 and 8 steps!
To seasoned Prompters, these values seem absolutely absurd, but they allow us to produce excellent quality images in a fraction of the time needed without the LCM LoRA – in some cases, in sub-one-second times.
You can read more about the math, and technical implementation here.
Why should I be excited by LCM LoRAs?
Speed! Images generated with the LCM LoRA active complete much faster than typical generations, and with reduced VRAM usage. We’re seeing near instantaneous generations on the highest-end hardware, and significant speed boosts across all GPUs!
A few examples below:
The following SDXL images were generated on an RTX 4090 at 1280×1024 and upscaled to 1920×1152, in 4.8 seconds each, in the Automatic1111 interface.
The following SDXL images were generated on an RTX 4090 at 1024×1024, with 0.7 second generation times, via the ComfyUI interface.
Prompts by Wizz
How can I use the LCM LoRAs?
ComfyUI is leading the pack when it comes to leveraging the LCM LoRAs, but it is possible to generate (and get excellent results) with Automatic1111.
There’s support for both SDXL and SD 1.5 models, with corresponding LoRA files (in .safetensors format) which can be downloaded from the following locations and placed within the normal LoRA folder for your preferred Stable Diffusion Interface.
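If you’d rather grab the weights from the command line, the official Hugging Face `latent-consistency` repos host both versions. The folder path below assumes an Automatic1111-style install – adjust it for your interface:

```shell
# Download the LCM LoRA weights into the webui's LoRA folder
# (path assumes a default Automatic1111 layout).
cd stable-diffusion-webui/models/Lora

# SDXL version
wget -O lcm-lora-sdxl.safetensors \
  https://huggingface.co/latent-consistency/lcm-lora-sdxl/resolve/main/pytorch_lora_weights.safetensors

# SD 1.5 version
wget -O lcm-lora-sdv1-5.safetensors \
  https://huggingface.co/latent-consistency/lcm-lora-sdv1-5/resolve/main/pytorch_lora_weights.safetensors
```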
We then need to include the LoRA in our prompt, as we would any other LoRA. Lastly, we need to drastically lower both the CFG and Step Count! It is necessary to experiment with the CFG, Steps, and LoRA strength to get good results – different models, and even different prompts, will require tweaks to the settings. Some guide ranges:
Setting | Suggested Value Range |
---|---|
LCM LoRA Strength | Between 1.0 (full strength) and 0.5 (half strength), depending on model |
Step Count | 3 to 8 |
CFG | 1 to 2.5 |
Sampler | ComfyUI now has LCM Sampler support! Select it from the sampler name in the KSampler node. There’s no need to use any third-party nodes. Automatic1111 users should use Euler a, or experiment with samplers until LCM support is added! |
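The same settings carry over if you prefer scripting generations in Python with the `diffusers` library. A minimal sketch, assuming a recent `diffusers` release with `LCMScheduler` support – the checkpoint ID, LoRA strength, and prompt are illustrative:

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load an SDXL pipeline (checkpoint ID is illustrative -- use your preferred model).
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Swap in the LCM scheduler -- this plays the role of ComfyUI's LCM sampler.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Attach the LCM LoRA, optionally at reduced strength (see the table above).
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.fuse_lora(lora_scale=0.8)

# The unusual LCM settings: very few steps, CFG near 1.0.
image = pipe(
    "a cinematic photo of a lighthouse at dusk",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("lcm_test.png")
```

As in the UI workflows, the step count, guidance scale, and LoRA scale are the knobs worth experimenting with per model.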
Caveats – What’s the Catch?
Image quality! There is a noticeable loss of image quality in many cases, and as mentioned previously, much tweaking is required to generate good images – there’s definitely a sweet spot to be found, balancing generation speed against visual fidelity.
The examples below are all sub-5-second generations, and showcase the kind of quality that can be achieved with a bit of experimentation:
Images/Prompts by Wizz
Advanced: Near Real-Time Usage
We’re able to leverage the rapid image generation speed to create near real-time interactive workflows, processing images on-the-fly. A couple of implementations of this include:
Webcam-to-Image
Using creator ToyXYZ’s experimental ComfyUI nodes, we can capture a .jpg or .png webcam image stream, bringing it into ComfyUI with a new node type:
This allows us to capture a webcam output image stream (see ToyXYZ’s CaptureCam GitHub), passing it into our flow as the input image (instead of a latent image) to create wonderful outputs like the following!
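The capture side of this pattern is simple at heart: a loop that keeps overwriting one image file, which the ComfyUI load node then watches. A rough stand-in using OpenCV and a default webcam – this is a sketch of the general idea, not ToyXYZ’s actual implementation, and the output path is whatever your image-load node points at:

```python
import time

import cv2  # OpenCV (pip install opencv-python)

OUTPUT_PATH = "webcam_frame.png"  # point your ComfyUI image-load node here

cap = cv2.VideoCapture(0)  # 0 = default webcam
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Overwrite the same file so the downstream node always sees the latest frame.
        cv2.imwrite(OUTPUT_PATH, frame)
        time.sleep(0.1)  # ~10 captures/sec; tune to your generation speed
finally:
    cap.release()
```

With sub-second LCM generations, the flow can re-render each new frame almost as fast as it arrives, which is what makes the near real-time effect possible.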
The possibilities for this tech are really exciting! Check back often, as we’ll keep this guide up to date to show off the latest developments.
Live OpenPose
We can leverage those same nodes, combined with OBS Studio and a Virtual Camera, to pose our characters in real-time with OpenPose and ControlNet!
We’ll create an LCM advanced usage guide, as this workflow and process are a little too complex to explain in a Quickstart guide! Also keep an eye on our Twitch stream where you’ll see this being demonstrated and explained soon!
The Future
We’ll continue to expand this quickstart guide with more information as it becomes available!