Stable Diffusion 3 Pre-Release Overview

Last Updated: 6/5/2024 (First Version)

Spot any juicy SD3 info I’ve missed? Shoot me an email – ally @ civitai.com!

What is Stable Diffusion 3?

Stable Diffusion 3 (SD3) is an eagerly anticipated text-to-image model set to be released by Stability AI. This next evolution in their model line follows the groundbreaking SDXL, which launched in July of last year.

The excitement for SD3 has been building, and we’ve gathered all the available information from various sources into one comprehensive guide for you. While there are still many details we don’t know, we’re thrilled to share what we do know about this upcoming release.

The official release date for SD3 is June 12th, a date announced by Stability AI Co-CEO, Christian Laforte, at the Computex Taipei conference.

So what do we know about SD3? Here’s some information from Stability’s SD3 Research Paper:

  • It reportedly outperforms both DALL·E 3 and Midjourney v6 in typography and prompt adherence, based on human preference evaluations.
  • It uses a new Multimodal Diffusion Transformer (MMDiT) architecture.
  • In early testing, the 8B Parameter model (not the one coming June 12th) could fit on a 24 GB RTX 4090, and took 34 seconds to generate a 1024×1024 image at 50 steps.
  • Much emphasis has been placed upon the text/typographical capabilities of the model, with SD3 slated to use “three different text embedders – two CLIP models and T5 – to encode text representations.” [Per SD3 Research Paper announcement]
  • SD3 employs a Rectified Flow (RF) formulation, based on a number of research papers (Arxiv Links: 1, 2, 3), which “…results in straighter inference paths, which then allow sampling with fewer steps.” [Per SD3 Research Paper announcement]
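To make the three-encoder idea concrete, here’s a toy NumPy sketch of the fusion scheme described in the research paper: channel-concatenate the two CLIP streams, zero-pad to T5’s width, then concatenate sequence-wise with the T5 tokens. The dimensions match those reported for CLIP-L, CLIP-G, and T5-XXL, but the random arrays are stand-ins for real encoder outputs, and the exact fusion details are an assumption drawn from the paper, not from released code.

```python
import numpy as np

# Illustrative sequence length and embedding widths
# (CLIP-L: 768, CLIP-G: 1280, T5-XXL: 4096; 77 tokens each).
seq_len = 77
clip_l = np.random.randn(seq_len, 768)   # stand-in for CLIP-L hidden states
clip_g = np.random.randn(seq_len, 1280)  # stand-in for CLIP-G hidden states
t5 = np.random.randn(seq_len, 4096)      # stand-in for T5-XXL hidden states

# 1) Concatenate the two CLIP streams channel-wise: (77, 2048)
clip_cat = np.concatenate([clip_l, clip_g], axis=-1)

# 2) Zero-pad the CLIP block out to T5's width: (77, 4096)
clip_padded = np.pad(clip_cat, ((0, 0), (0, 4096 - clip_cat.shape[-1])))

# 3) Concatenate sequence-wise with the T5 tokens: (154, 4096)
context = np.concatenate([clip_padded, t5], axis=0)

print(context.shape)
```

The upshot: the model conditions on one long token sequence that carries both CLIP-style and T5-style text information, which is part of why typography and prompt adherence improve.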

What does all that mean for image generation?

We know that the model being released on the 12th will be the Medium weights – a 2 Billion Parameter model. This has caused some consternation in the community, with various complaints, allegations, and memes, regarding the relatively small parameter size.

In comparison, SDXL was a “larger” model, leading some in the community to fear that SD3 will be inferior. However, a Stability AI staff member offered this response via Reddit:

SDXL has 2.6b unet, and it’s not using MMDiT. Not comparable at all. It’s like comparing 2kg of dirt and 1.9kg of gold. Not to mention the 3 text encoders, adding up to ~15b params alone. And the 16ch vae.

Note that SDXL has a 4ch VAE. Stability has been emphasizing the significance of the 16-channel VAE for a few months now (Source).

Other key information:

It has a native output of 1 MP; the demo picture was 832 × 1152, generated with the Medium 2B weights. [Via Lykon, on X]
  • Rectified Flow Transformers:
    SD3 introduces rectified flow transformers, a new generative formulation that connects data and noise along a straight line. The formulation is theoretically simpler and performs better than traditional diffusion models. [Per SD3 Research Paper announcement]
  • Improved Noise Sampling:
    The paper discusses enhancements in noise sampling techniques, which are biased towards perceptually relevant scales. This results in better image quality and more efficient training. [Per SD3 Research Paper announcement]
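The “straight line” idea behind rectified flow can be illustrated in a few lines of NumPy. The forward process interpolates linearly between a data sample and noise, the training target is the constant velocity along that line, and sampling follows the line back with Euler steps. This is a minimal sketch of the formulation, not SD3’s actual implementation; with a perfect velocity oracle (as below) a single step is exact, while a learned, imperfect model is why a handful of steps are still used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def rf_interpolate(x0, noise, t):
    """Rectified-flow forward process: a straight line from data (t=0) to noise (t=1)."""
    return (1.0 - t) * x0 + t * noise

x0 = rng.standard_normal(4)      # stand-in for an image latent
noise = rng.standard_normal(4)

# The training target is the constant velocity along the straight path:
velocity = noise - x0            # what the network learns to predict

# Sampling: start from pure noise (t=1) and Euler-step back toward t=0.
# Because the path is straight and our "oracle" velocity is exact,
# a single step recovers x0.
x = noise.copy()
steps = 1
dt = 1.0 / steps
for _ in range(steps):
    x = x - dt * velocity

print(np.allclose(x, x0))
```

Straighter inference paths mean each Euler step introduces less curvature error, which is exactly the “sampling with fewer steps” claim in the paper.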

2B Model sample images, generated at native resolution, single generation (no upscale, or further tweaks). Source.

Where do we get it?

You don’t (yet!) – the model weights will be released some time on June 12th, and will be available for download from Civitai.com shortly thereafter. Keep an eye out!

How do we use it NOW!?

A version of SD3 is currently available for inference on the Stability API, along with a Stable Diffusion 3 Turbo version.
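For those who want to try it via the API, here’s a minimal Python sketch of assembling a generation request. The endpoint URL and field names follow Stability’s public v2beta documentation at the time of writing and may change; treat them as assumptions and check the current API reference before relying on them.

```python
import os

# Assumed endpoint, per Stability's public v2beta docs at time of writing:
API_URL = "https://api.stability.ai/v2beta/stable-image/generate/sd3"

def build_sd3_request(prompt, model="sd3", aspect_ratio="1:1"):
    """Assemble headers and multipart form fields for an SD3 generation call."""
    return {
        "headers": {
            "authorization": f"Bearer {os.environ.get('STABILITY_API_KEY', '')}",
            "accept": "image/*",
        },
        "files": {"none": ""},  # forces multipart/form-data encoding
        "data": {
            "prompt": prompt,
            "model": model,            # "sd3" or "sd3-turbo"
            "aspect_ratio": aspect_ratio,
            "output_format": "png",
        },
    }

req = build_sd3_request("a photo of a red fox, sharp focus", model="sd3-turbo")

# To actually generate (requires a valid STABILITY_API_KEY and `pip install requests`):
# import requests
# resp = requests.post(API_URL, **req)
# open("fox.png", "wb").write(resp.content)
```

Generation is billed in credits, so expect a paid key to be required for anything beyond trial usage.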

Note: we do know that the model running on the API is not the same model which will be released on June 12th!

Hardware Requirements

GPU requirements for local inference haven’t been officially communicated, but we know the following, from Stability AI employees, via X:

  • “6GB (VRAM) can probably run this easily with TEs [text encoders] on CPU. 11GB all in GPU with offloading.” Source

License Considerations

SD3 will be released with a non-commercial use license only. Inference/Generation services wishing to host SD3 on their platforms must contact Stability to discuss Enterprise licensing options.

Local Interfaces for SD3

At time of writing, the favored interface for all Stability releases is ComfyUI. We’ve seen confirmation from Stability staff, via Reddit, that ComfyUI and Stable Swarm, along with anything powered by Diffusers, will have day 1 support.

It’s also been indicated that the Auto1111 development team is discussing the changes required in their private channels. Source.

Training SD3

We haven’t seen much talk from Stability regarding training against SD3; however, we do know to “…expect SD3-Medium training requirements to be similar and slightly lower than SDXL.” Source.

We also know, from the SD3 announcement email, that the model is “Capable of absorbing nuanced details from small datasets, making it perfect for customization and creativity.”

Any Other Details?

  • A Stability AI employee announced, via Reddit, that they will be releasing the other SD3 model versions – for free – as they’re finished training, including Small (1B parameter), Large (4B parameter), and Huge (8B Parameter) versions.