Last Updated | Changes |
---|---|
10/22/2024 | First Version |
10/28/2024 | Generator and File Update |
SD 3.5?
Stable Diffusion 3 is back! After the lacklustre, much-derided launch of Stable Diffusion 3, Stability have made sweeping architecture and training changes for their latest line of models, the 3.5 suite. The suite is released under an updated, more permissive Community License, with enhancements to image fidelity, prompt adherence, controllability, and text rendering.

Civitai-generated test images (prompts borrowed from community uploads), using SD 3.5 Large:





What’s New?
Stability AI describe the SD 3.5 models as excelling in the following areas:
- Customizability: Easily fine-tune the model to meet your specific creative needs, or build applications based on customized workflows.
- Efficient Performance: Optimized to run on standard consumer hardware without heavy demands, especially the Stable Diffusion 3.5 Medium and Stable Diffusion 3.5 Large Turbo models.
- Diverse Outputs: Creates images representative of the world, not just one type of person, with different skin tones and features, without the need for extensive prompting.
- Versatile Styles: Capable of generating a wide range of styles and aesthetics like 3D, photography, painting, line art, and virtually any visual style imaginable.
The revised license terms also bring:
- Ownership of outputs: Retain ownership of the media generated without restrictive licensing implications.
According to Stability AI’s own Elo scoring, the models lead the market in prompt adherence and rival much larger models in image quality, edging out Flux.1 Dev.

Using on Civitai’s Image Generator
SD 3.5 Large and Large Turbo are available on the Image Generator. Find them via the model search, or use the Create buttons on the models:

We’re still adjusting settings for the best quality images, and the pricing model – everything is so new that it’s all subject to change right now!

Weights / Downloads
SD 3.5 comes in three flavors: Large, Large Turbo, and Medium. The last of these will run on consumer GPUs with at least 12 GB of VRAM.
At the time of writing, only ComfyUI supports these models.
Stable Diffusion 3.5 Large: At 8 billion parameters, this base model is the most powerful in the Stable Diffusion family. It is ideal for professional use cases at 1 megapixel resolution.
Stable Diffusion 3.5 Large Turbo: A distilled version of Stable Diffusion 3.5 Large that generates high-quality images in just 4 steps, making it considerably faster than Stable Diffusion 3.5 Large.
Stable Diffusion 3.5 Medium: At 2.5 billion parameters, with an improved MMDiT-X architecture and training methods, this model is designed to run “out of the box” on consumer hardware, striking a balance between quality and ease of customization. It can generate images between 0.25 and 2 megapixels in resolution.
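To make the megapixel figures above concrete, here is a minimal Python sketch that turns a pixel budget and an aspect ratio into a width and height. It assumes that one megapixel means 1024 × 1024 pixels and that dimensions should be rounded to the nearest multiple of 64; both are assumptions of this example rather than documented SD 3.5 requirements, and `dimensions_for` is a hypothetical helper name.

```python
import math

# Assumption: "1 megapixel" is treated as 1024 * 1024 pixels.
PIXELS_PER_MEGAPIXEL = 1024 * 1024


def dimensions_for(megapixels: float, aspect_ratio: float, multiple: int = 64) -> tuple[int, int]:
    """Return a (width, height) close to the requested pixel budget.

    aspect_ratio is width / height. Rounding to `multiple` is an
    assumption of this sketch, so the result may land slightly above
    or below the requested budget.
    """
    total_pixels = megapixels * PIXELS_PER_MEGAPIXEL
    height = math.sqrt(total_pixels / aspect_ratio)
    width = height * aspect_ratio

    def snap(value: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


print(dimensions_for(1.0, 1.0))      # (1024, 1024)
print(dimensions_for(0.25, 1.0))     # (512, 512)
print(dimensions_for(1.0, 16 / 9))   # (1344, 768)
```

Under these assumptions, the 0.25 to 2 megapixel range quoted for Medium spans roughly 512 × 512 up to about 1472 × 1472 for square images.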
Model | Download Link |
---|---|
Stable Diffusion 3.5 Large | Civitai |
Stable Diffusion 3.5 Large Turbo | Civitai |
Stable Diffusion 3.5 Medium | Civitai |
Comfy-Org Stable Diffusion 3.5 Large fp8 (with text encoders built in) | Civitai / HuggingFace |
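As a rough guide to why the fp8 checkpoint and the Medium model are easier to fit on smaller GPUs, weight size can be estimated as parameter count times bytes per parameter. The sketch below uses the parameter counts quoted above and assumes 2 bytes per parameter for fp16 and 1 byte for fp8. It ignores the text encoders, the VAE, and activations, so real VRAM use will be higher; `weight_size_gb` is a hypothetical helper.

```python
# Assumed storage cost per parameter for each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_size_gb(params_billion: float, precision: str) -> float:
    """Estimate the size of the model weights alone, in GB (10^9 bytes).

    One billion parameters at one byte each is one GB, so the estimate
    is simply billions of parameters times bytes per parameter.
    """
    return params_billion * BYTES_PER_PARAM[precision]


print(weight_size_gb(8, "fp16"))    # 16.0 -> SD 3.5 Large at fp16
print(weight_size_gb(8, "fp8"))     # 8.0  -> SD 3.5 Large at fp8
print(weight_size_gb(2.5, "fp16"))  # 5.0  -> SD 3.5 Medium at fp16
```

These are back-of-the-envelope numbers only, but they suggest why Medium is the variant aimed at 12 GB cards.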
To run SD 3.5, the following text encoders are also required (only one of the two T5-XXL variants is needed):
Model | Download Link |
---|---|
OpenCLIP-ViT/G | Civitai / HuggingFace |
CLIP-ViT/L | Civitai / HuggingFace |
T5-xxl_fp16 | Civitai / HuggingFace |
T5-xxl_fp8_e4m3fn | Civitai / HuggingFace |
Place these files in the \ComfyUI\models\clip directory.
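If ComfyUI cannot find the encoders, a quick check of the folder contents can help. The following is a minimal sketch, not an official tool: the file names are assumptions based on the table above (your downloads may be named differently), and `missing_encoders` is a hypothetical helper. It treats the two T5-XXL files as alternatives, on the assumption that only one of them is loaded at a time.

```python
from pathlib import Path

# Assumed file names; adjust these to match the files you downloaded.
REQUIRED = ["clip_g.safetensors", "clip_l.safetensors"]
T5_VARIANTS = ["t5xxl_fp16.safetensors", "t5xxl_fp8_e4m3fn.safetensors"]


def missing_encoders(clip_dir: Path) -> list[str]:
    """Return the encoder files that are not present in clip_dir."""
    # A missing directory is treated as an empty one.
    present = {p.name for p in clip_dir.iterdir()} if clip_dir.is_dir() else set()
    missing = [name for name in REQUIRED if name not in present]
    # Assumption: any one T5-XXL variant is enough.
    if not any(name in present for name in T5_VARIANTS):
        missing.append(" or ".join(T5_VARIANTS))
    return missing


if __name__ == "__main__":
    # Relative path assumed; point this at your own ComfyUI install.
    for name in missing_encoders(Path("ComfyUI") / "models" / "clip"):
        print(f"Missing: {name}")
```

An empty result means every expected file name was found in the folder; it does not verify that the files are complete or uncorrupted.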
Model Comparison
We’ve put together a quick and extremely unscientific comparison of Large, Large Turbo, Medium, and the closest competitor, Flux Dev. These images share the same prompts and settings (with steps, CFG, and sampler adjusted for Large Turbo) and are slightly cherry-picked: each is the best of 5 generations for that model.



Workflows
We’ve provided a couple of simple ComfyUI text2img workflows, based on the official Stability AI SD 3.5 workflows and edited for clarity and ease of use:



License Considerations
Stable Diffusion 3.5 is released under the Stability AI Community License, which is free for research, non-commercial, and commercial use by organizations or individuals with less than $1M in total annual revenue. More details can be found in the Stability AI Community License Agreement.
Individuals and organizations with an annual revenue above $1M must contact Stability to obtain an Enterprise License.
Other Details
- We’ve read that the maximum context length (prompt length) is 256 tokens, half that of Flux. How this will affect prompting remains to be seen.
- There’s a Stability finetuning guide, and Kohya support for SD 3.5 Large. No Medium support yet!
- There’s a cut-down fp8 Text Encoder from Comfy-Org, designed for lower VRAM platforms.
- The models have some NSFW capability – much more so than SD3, or Flux!
- LoRAs created with SD 3.5 Large do not currently work with SD 3.5 Medium weights. This might be a limitation of ComfyUI – we’re not sure!
I need more help!
If you have a question about SD 3.5, don’t hesitate to reach out to one of our Community Managers or contact [email protected]