Last Updated | Changes |
---|---|
11/14/2024 | First Version |
1/15/2025 | Bring Your Own Image – Img2Video From Any Image! |
8/27/2025 | New Video Generation options |
Text2Video & Image2Video in the Generator?
We’re thrilled to announce a major new feature in the Civitai Generator – video generation! You can now create short video clips directly from text prompts, or by using a “starter” image. This exciting addition opens up fresh creative possibilities, making it easier than ever to bring your ideas to life in motion!
Over time, we’ll be adding more video generation tools to give you even more style options, control, and flexibility in your video creations. As we continue to expand, you’ll notice new tools, interface tweaks, and enhancements that make creating and customizing your videos more intuitive and powerful. Stay tuned for updates as we refine and grow this feature based on your feedback and our ongoing improvements and partnerships!
Note: For detailed information on navigating the Civitai Generator’s image creation interface, see our Guide to the Civitai Image Generator.
Tools
We currently have a number of choices for video generation, with more in the pipeline! Please note that the Buzz costs for each generation service are current at time of writing, but are subject to change.
Google’s Veo3
Veo 3 is Google DeepMind’s most advanced text-to-video generation model, released in May 2025, and marks a major leap beyond silent AI video. It natively creates not only photorealistic visuals but also synchronized audio – everything from dialogue and ambient sound to music – all in one seamless output. The model excels at realistic physics, accurate lip-sync, and visual fidelity, enabling creators to bring cinematic scenes to life with just a prompt or an image input.
Generation Options | Details | Changes Buzz Cost? |
---|---|---|
Text to Video | Accepts descriptive text prompts which are translated into video | |
Image to Video | Accepts an image from which the video is derived | |
Fast Mode | A faster, cheaper, less high-fidelity mode | Reduces base Buzz cost |
Standard | The base, default model quality | |
Prompt Enhancer | Adds cinematic flair, detail, and polish to prompts. | Increases base Buzz cost |
Generate Audio | When enabled, will generate sound effects, speech, and/or music as specified. | Increases base Buzz cost |
Be aware that Veo 3 is a PG-only model due to Google’s strict policies. Any use of profanity or sexually explicit language in the prompt will result in a generic or unrelated PG video output, and you will not receive a refund.
Vidu Q1
Vidu delivers cutting-edge text-to-video generation, enabling users to create dynamic, visually compelling videos from simple prompts. Whether you’re aiming for cinematic flair, artistic stylization, or unique visual effects, Vidu offers a wide range of high-quality output styles to suit your creative needs. Support for image-to-video generation is also on the horizon! Check out Vidu’s prompt guide for more information on how to get the best out of this phenomenal model!
Generation Options | Details | Changes Buzz Cost? |
---|---|---|
Text to Video | Accepts descriptive text prompts which are translated into video | |
Image to Video | Accepts an image from which the video is derived | |
Reference to Video | Accepts multiple images (up to 7) from which aspects can be drawn to form the output video | |
Prompt Enhancer | Adds cinematic flair, detail, and polish to prompts. | Does not change base Buzz cost |
Style | General / Animation Style | Does not change base Buzz cost |
Movement Amplitude | The desired amount of movement – Auto / Small / Medium / Large | Does not change base Buzz cost |
Hailuo by MiniMax
Hailuo, developed by the Chinese company MiniMax, is an innovative AI video generator that transforms text prompts and images into dynamic, high-quality videos. Launched in September 2024, it allows users to create videos using both text-to-video and img-to-video workflows, with a range of aspect ratios supported.
Generation Options | Details | Changes Buzz Cost? |
---|---|---|
Text to Video | Accepts descriptive text prompts which are translated into video | |
Image to Video | Accepts an image from which the video is derived | |
Prompt Enhancer | Adds cinematic flair, detail, and polish to prompts. | Does not change base Buzz cost |
Kling
Kling AI, developed by Kuaishou Technology, is a cutting-edge text-to-video and image-to-video generation model that enables users to create high-quality videos from text descriptions or static images. Launched in June 2024, Kling produces videos that accurately simulate real-world physics and complex motion. Its features include advanced camera controls and motion brushes for precise object movement; these aren’t available in Civitai.com’s Generator just yet, but we’re working on them!
Generation Options | Details | Changes Buzz Cost? |
---|---|---|
Text to Video | Accepts descriptive text prompts which are translated into video | |
Image to Video | Accepts an image from which the video is derived | |
Version | v1.6 / v2 – Kling model versions | v2 model increases base Buzz cost |
Duration | 5 seconds / 10 seconds | 10 second option increases base Buzz cost |
Mode | Standard quality / Professional Quality selector | Professional option increases base Buzz cost |
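As a rough illustration of how to read the table above, the sketch below lists which Kling settings the table marks as raising the base Buzz cost. The setting names and the `cost_raising_options` helper are hypothetical, chosen only for this example; they are not part of the Civitai Generator, and no actual Buzz amounts are implied.

```python
# Choices that the Kling table above marks as increasing the base Buzz cost.
# The dictionary keys are illustrative names, not real Generator fields.
COST_RAISING_CHOICES = {
    "version": "v2",
    "duration": "10 seconds",
    "mode": "Professional",
}


def cost_raising_options(settings: dict[str, str]) -> list[str]:
    """Return the names of the selected settings that raise the base cost."""
    return [
        name
        for name, costly_value in COST_RAISING_CHOICES.items()
        if settings.get(name) == costly_value
    ]


selected = {"version": "v1.6", "duration": "10 seconds", "mode": "Standard"}
print(cost_raising_options(selected))
```

With the settings shown, only the 10 second duration is flagged; choosing v1.6, 5 seconds, and Standard quality leaves the base cost unchanged.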
Lightricks LTXV
Lightricks LTX-Video was the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768×512 resolution practically faster than they can be watched! It works in text-to-video and image-to-video modes, and it was the first video service running entirely in-house, on our own GPU hardware!
Generation Options | Details | Changes Buzz Cost? |
---|---|---|
Text to Video | Accepts descriptive text prompts which are translated into video | |
Image to Video | Accepts an image from which the video is derived | |
Haiper
Haiper.ai’s 2.5 model is a robust and advanced engine designed for producing both text-to-video and image-to-video content. This model brings a seamless, straightforward approach to video creation, requiring minimal setup to achieve impressive results.
Haiper.ai offers a range of helpful resources to ensure you get the most out of their video generation service. Since prompting for text-to-video can be a bit different from what we’re used to with text-to-image models on Civitai, these resources are designed to guide you through the nuances of video-focused prompts.
Generation Options | Details | Changes Buzz Cost? |
---|---|---|
Text to Video | Accepts descriptive text prompts which are translated into video | |
Image to Video | Accepts an image from which the video is derived | |
Duration | 2 seconds / 4 seconds / 8 seconds video length | Modifies base Buzz cost |
Mochi
Mochi 1 preview, from Genmo, is an open-source, state-of-the-art video generation model with high-fidelity motion and strong prompt adherence. Along with on-site generation, Mochi can be used offline/locally with ComfyUI and at least 12 GB of VRAM!
You can read our full Quickstart Guide to Mochi here!
Generation Options | Details | Changes Buzz Cost? |
---|---|---|
Text to Video | Accepts descriptive text prompts which are translated into video | |
Image to Video | Accepts an image from which the video is derived | |
Prompt Enhancer | Adds cinematic flair, detail, and polish to prompts | Does not change base Buzz cost |
Hunyuan
Hunyuan is Tencent’s powerful text-to-video generation model, now available on our platform with full video LoRA support. This allows for greater flexibility and fine-tuning of visual outputs to better match specific styles or themes. While currently offered in text-to-video mode, we are working to enhance fidelity, improve overall quality, and accelerate generation speeds to deliver an even better experience.
Generation Options | Details | Changes Buzz Cost? |
---|---|---|
Text to Video | Accepts descriptive text prompts which are translated into video | |
Image to Video | Accepts an image from which the video is derived | |
Duration | 3 seconds / 5 seconds | 5 second option increases base Buzz cost |
Wan 2.1
Wan 2.1 is Alibaba’s foray into text-to-video technology, offering visually rich and coherent video outputs from natural language prompts. This model emphasizes strong scene consistency, smooth motion, and detailed artistic styles. It’s an excellent choice for creators looking to produce high-quality videos with minimal input. We’re continuing to monitor performance and explore expanded functionality – it’s not a lightweight model and requires significant hardware resources! Note: Wan 2.1 runs on our own hardware and can be used for adult content generation.
Generation Options | Details | Changes Buzz Cost? |
---|---|---|
Text to Video | Accepts descriptive text prompts which are translated into video | |
Image to Video | Accepts an image from which the video is derived | |
Duration | 3 seconds / 5 seconds | 5 second option increases base Buzz cost |
Additional Resources | Accepts LoRAs | Each LoRA carries an additional Buzz cost |
Wan 2.2
Wan 2.2 is the latest open-source, multimodal video generation model from Alibaba’s Wan AI platform. It introduces a Mixture-of-Experts (MoE) architecture, blending a high-noise and a low-noise model to significantly boost video quality.
This upgrade also brings huge gains in complex motion generation, having been trained on 65.6% more images and 83.2% more videos than its predecessor, Wan 2.1 – resulting in better generalization across semantics, dynamics, and aesthetics.
Note that Wan 2.2 is currently leveraging Fal.ai’s generation API and is not hosted on our own in-house hardware. Generations must abide by Fal’s content policies. We are hoping to move Wan 2.2 generations onto our own Civitai hardware in the near future.
Generation Options | Details | Changes Buzz Cost? |
---|---|---|
v2.2 | Latest, full, Wan version | |
v2.2-5b | Latest, 5B parameter Wan version | |
Text to Video | Accepts descriptive text prompts which are translated into video | |
Image to Video | Accepts an image from which the video is derived | |
Turbo Mode (v2.2) | Applies an 8 step LoRA, faster generations, output quality somewhat reduced | Does not change base Buzz cost |
Draft Mode (v2.2-5b) | Switches to a Distilled Model, faster generations, output quality somewhat reduced | Does not change base Buzz cost |
Resolution | 480p / 720p / 580p (v2.2-5b) | Higher resolutions increase base Buzz cost |
Duration | 3 seconds / 5 seconds | Does not change base Buzz cost |
Shift | Influences the level of noise, and the level of motion/quality of output | Does not change base Buzz cost |
Image To Video
Some tools let you use a “starter” or initialization image as the starting point for a video.
To try this, pick any image from your Generation Queue or Feed, then choose the “Image to Video” option from the context menu, or the Magic Wand menu, shown on any image.
The image will be loaded into the Image to Video interface, where you can provide a prompt describing the desired movement.

Multiple video tools support image to video – when an image is loaded into the interface, you can select the one you’d prefer to use from the dropdown list.

Bring Your Own Image – Img2Video From Any Image!
Bring Your Own Image (BYOI) lets you easily import images for our img2video services.
Drag images directly from Civitai, other websites, or your local PC into the drop-zone, or use an image URL. Supported formats include .png, .jpeg, .jpg, and .webp, with a maximum file size of 16 MB.
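If you prepare images in bulk, you can check them against these limits before uploading. The following is a minimal sketch, not Civitai’s own validation: the `check_byoi_image` helper is hypothetical, it looks only at the file extension and size, and it assumes “16 MB” means 16 × 1024 × 1024 bytes.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats and size limit as stated in this guide.
SUPPORTED_EXTENSIONS = {".png", ".jpeg", ".jpg", ".webp"}
MAX_FILE_SIZE_BYTES = 16 * 1024 * 1024  # assumption: "16 MB" read as 16 MiB


def check_byoi_image(name_or_url: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # urlparse copes with plain file names and full URLs, dropping any query string.
    path = urlparse(name_or_url).path
    extension = PurePosixPath(path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than 16 MB")
    return problems


print(check_byoi_image("starter.PNG", 2_000_000))
print(check_byoi_image("clip.gif", 20 * 1024 * 1024))
```

The first call reports no problems; the second reports both an unsupported format and an oversized file. The check is deliberately simple and does not inspect the file contents.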
Images uploaded to the generator via BYOI are not posted to your Civitai Profile.
Be aware that uploaded images may be processed off-site by third-party video generation API providers and are subject to their policies and Terms of Service. Each service has its own content rules. If your request is denied by the partner API, you’ll receive a notification, and your Buzz will be refunded.
Note that when using BYOI, non-AI generated images, or AI images which we can’t read the metadata of, will only be valid for PG-rated outputs.
Limitations
- This is the first iteration of video generation to feature on Civitai.com, and as such it may lack some of the polish of other generation features! We’ll be improving video generation over time by tweaking the interface, introducing new quality and speed options, and implementing new Tools.
- Pricing! We know video gen is pricey; it’s cutting-edge, resource-demanding technology. We’re actively working with our inference partners to explore options that make it more accessible for everyone. Prices for all video generation options are subject to change!
- Terms of Service. While generating videos, you must adhere to Civitai’s standard Terms of Service. Additionally, please ensure you follow the Terms of Service specific to the tool you’re using for video generation.

- Results. Video generation can be unpredictable – getting the perfect output may take a few tries, as results can vary widely. Please note that we cannot refund generations that complete successfully but don’t meet your expectations.
Additionally, once a generation begins processing, it’s locked in and cannot be canceled. You might be able to cancel immediately after starting, but once generation is underway, it’s committed. Of course, if the generation fails, refunds will be provided.
- Provider instability. Since many of the video options available via the Civitai Generator depend on external API partners to handle video generation requests, occasional failures, outages, or disruptions may occur that are outside of Civitai’s direct control.
- Kling Generation may return a “No Provider Supports this Job” error which is related to their API and not something we have control over. Jobs which return this error will be refunded instantly.
- Hailuo by MiniMax does not have the concept of a seed. If you run the same prompt, or the same input image with the same prompt, you’ll receive the same output. You will not be charged for this duplicate output.
- Hailuo by MiniMax and Kling have relatively small concurrency limits (the number of jobs we can send to their API at the same time). At peak times, there may be a queue!
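
To show why a concurrency limit produces a queue, here is a small, purely illustrative sketch using Python’s `asyncio.Semaphore`. It does not reflect how Civitai or any provider actually schedules jobs; the limit of 2, the job count, and the `run_with_limit` helper are made up for the example.

```python
import asyncio


async def run_with_limit(job_ids, limit):
    """Run placeholder jobs with at most `limit` of them in flight at once."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def run_one(job_id):
        nonlocal in_flight, peak
        async with semaphore:  # jobs beyond the limit wait here, forming the queue
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the provider's processing time
            in_flight -= 1
        return job_id

    results = await asyncio.gather(*(run_one(j) for j in job_ids))
    return results, peak


# Six jobs submitted at once, but only two may run at the same time.
results, peak = asyncio.run(run_with_limit(range(6), limit=2))
print(results, peak)
```

All six jobs finish, but never more than two run together; the remaining jobs simply wait their turn, which is the delay you may notice at peak times.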
I need more help!
If you’re experiencing issues generating Video with the Civitai Image Generator and a solution isn’t mentioned on this page, please reach out to our Support Team at [email protected].