Civitai’s Quickstart Guide to Mochi 1 text2video!

| Last Updated | Changes |
| --- | --- |
| 11/6/2024 | First Version |
| 11/14/2024 | Mochi – now on-site at Civitai.com! |

Update! Mochi Available on Civitai.com!

That’s right! We’ve just pushed our first text2vid and img2vid tools to Civitai.com’s Generator, and Mochi is one of them! Check out the full Guide to Video in the Civitai Generator for details!

Mochi 1?

Mochi 1 (preview), created by Genmo, is an open-source, state-of-the-art video generation model with high-fidelity motion and strong prompt adherence.

This model dramatically closes the gap between closed and open video generation systems, and it’s released under the permissive Apache 2.0 license.

At time of writing, Genmo’s Mochi 1 ranks #2, based on ELO score, on the HuggingFace Video Generation Arena Leaderboard.

Even better, we can now run it locally on mid-tier consumer GPUs! A huge development from launch, when it required 3 x H100 GPUs to produce results!

The model can currently output videos in 480p, but an HD model is slated to appear later this year.

Output Examples

Mochi in Civitai’s Generator

It’s here! Check out our Guide to Video in the Civitai Generator!

Local Generation with Mochi

Required Files

The official Mochi weights are available on Civitai, along with the Mochi VAE and the required text encoders. You may already have one of the required text encoders from a previous SD 3/3.5 installation.

Note that local video generation with these Mochi models is, at time of writing, only available via ComfyUI:

| Model | Download Location | Download Source |
| --- | --- | --- |
| Mochi 1 Preview BF16 | ComfyUI/models/diffusion_models | Civitai |
| Mochi 1 FP8 Scaled | ComfyUI/models/diffusion_models | Civitai |
| Mochi VAE | ComfyUI/models/vae | Civitai |
| T5XXL FP16 | ComfyUI/models/clip | Civitai |
| T5XXL FP8 e4m3fn Scaled | ComfyUI/models/clip | Civitai |
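Once everything is downloaded, it's easy to misplace a file. As a quick sanity check, the layout above can be verified with a short script. This is purely an illustrative sketch (not part of ComfyUI or any official tooling) – the folder names come from the table above, but the helper itself is our own:

```python
# Illustrative pre-flight check (not part of ComfyUI): confirm that each
# folder from the table above contains at least one downloaded model file.
from pathlib import Path

# Expected folders, relative to the ComfyUI install root (from the table).
EXPECTED_FOLDERS = {
    "diffusion model": "models/diffusion_models",
    "VAE": "models/vae",
    "text encoder": "models/clip",
}

def missing_models(comfy_root):
    """Return the model categories whose folder has no .safetensors file."""
    root = Path(comfy_root)
    return [name for name, sub in EXPECTED_FOLDERS.items()
            if not list((root / sub).glob("*.safetensors"))]
```

If `missing_models("ComfyUI")` returns an empty list, everything is where ComfyUI expects it; otherwise it names the categories still to download.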

ComfyUI

ComfyUI added native Mochi support in early November 2024, allowing anyone with a consumer GPU to generate locally. 24 GB+ of VRAM is recommended, but we’ve seen reports of Mochi running on 12 GB VRAM systems with some third-party nodes and wrappers!

Getting started in ComfyUI is as simple as:

  1. Update ComfyUI to the latest version, which includes Mochi support
  2. Download the weights (see above)
  3. Download a text encoder (see above)
  4. Download the VAE (see above)
  5. Download and load the example workflow (see below), plug in the models, and start generating!

Sample Workflow

Low VRAM Options

If you have a GPU with less than 24 GB of VRAM, you can try the lower-precision options listed above to get running, but beware: generation times might be significant!

If you’re still having trouble running on your hardware, check out ComfyUI-MochiWrapper by creator Kijai, which offers significant speed boosts, at the expense of some quality, via quantized GGUF models and custom nodes:

| Kijai Model | Download Source |
| --- | --- |
| Mochi 1 preview GGUF Q4_0 V1 | HuggingFace |
| Mochi 1 preview GGUF Q4_0 V2 | HuggingFace |
| Mochi 1 preview GGUF Q8_0 | HuggingFace |
| Mochi 1 preview BF16 VAE Decoder | HuggingFace |
| Mochi 1 preview BF16 VAE Encoder | HuggingFace |
| Mochi 1 preview FP32 VAE Encoder | HuggingFace |
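The VRAM guidance in this section can be summarized in a small helper. This is purely illustrative – the thresholds mirror the numbers in this guide (24 GB+ recommended for the official weights, ~12 GB reported working via MochiWrapper), and the function is our own, not part of any official tooling:

```python
# Illustrative helper (not official tooling): map available VRAM to the
# setup suggested in this section of the guide.
def suggest_mochi_route(vram_gb):
    """Suggest a local-generation route based on GPU VRAM in GB."""
    if vram_gb >= 24:
        return "native ComfyUI + official weights"  # 24 GB+ recommended
    if vram_gb >= 12:
        return "ComfyUI-MochiWrapper + GGUF"        # reported working at 12 GB
    return "below reported minimum"                 # no reports under 12 GB yet
```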

Limitations

  • Pricing! The pricing of Mochi in the Civitai Generator is subject to change as we discuss options with our Partners. We’re aware that the cost for Mochi generation is high, and that’s something we’re hoping to address.
  • Complexity! While we now have native ComfyUI support for local generation, this is still a complex model to get running offline. The information above should cover the basics of getting started, but some further reading and experimentation will be required to get the most out of Mochi in an offline environment!

I need more help!

If you’re experiencing issues generating video with the Civitai Generator and a solution isn’t mentioned on this page, please reach out to our Support team at [email protected]. If you’re having trouble setting up Mochi for local generation, please join the Civitai Discord and seek assistance in the #ai-help channel!