Guide to Fast WAN Video Generation

I've spent the past couple of days experimenting with a few of the WAN video models, and I noticed that many of the workflows I found were not optimized. By rebuilding a few of them, I managed to speed up generation by around 40-60%, so I thought I'd share some tips with you.
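A 40-60% speed-up can be read two ways. If it means each run takes 40-60% less time (my assumption here, not something the numbers above pin down), the number of clips you can render per hour grows by a larger factor. A minimal Python sketch of that conversion:

```python
def speedup_factor(time_reduction: float) -> float:
    """Turn a fractional cut in generation time (0.4 = 40% less time)
    into a throughput multiplier (clips per hour, new vs. old).
    Assumes the percentage describes time saved, not throughput gained."""
    if not 0.0 <= time_reduction < 1.0:
        raise ValueError("time_reduction must be in [0, 1)")
    return 1.0 / (1.0 - time_reduction)

for cut in (0.40, 0.60):
    print(f"{cut:.0%} less time -> {speedup_factor(cut):.2f}x throughput")
```

Under that reading, the range works out to roughly 1.7x to 2.5x as many videos in the same amount of time.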

WAN Multitalk: The Lip-Sync Specialist

What it is: This model’s superpower is audio-driven animation. It takes an audio file and a face, and it generates a video with remarkably accurate lip-syncing.

When to use it: Any time you need a character to speak. This is perfect for creating digital avatars, narrated shorts, or AI influencers who need to deliver dialogue.

The Power: This is the magic behind the “Nova” video I created. I fed it an audio file of my AI partner’s “voice,” and Multitalk animated her face to match the words. It’s a specialized tool, but for its specific purpose, it’s absolutely brilliant.

The only downside I can think of is that this model needs a lot of GPU power and VRAM, which forced me to use the quantized GGUF versions of the models.
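To show why the GGUF versions help, here is a rough estimate of how much memory the weights of a 14B-parameter model occupy at different precisions. It is a sketch for the main model's weights only: activations, the text encoder, the VAE and the other loaded models all come on top. The bits-per-weight values for the GGUF quantizations are approximate averages I'm assuming, and real file sizes can differ.

```python
PARAMS = 14e9  # "14B" from the model names above

# Approximate bits stored per weight. The GGUF figures are assumed averages,
# since k-quant formats mix block types and scales.
BITS_PER_WEIGHT = {
    "fp16/bf16": 16.0,
    "fp8 (e4m3fn)": 8.0,
    "GGUF Q5_K_M (approx.)": 5.7,
    "GGUF Q4_K_M (approx.)": 4.8,
}

def weight_gib(params: float, bits: float) -> float:
    """Memory needed to hold `params` weights at `bits` bits each, in GiB."""
    return params * bits / 8 / 2**30

for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:24s} ~{weight_gib(PARAMS, bits):5.1f} GiB")
```

Going from 16-bit weights to a 5-bit-class quantization cuts the weight footprint to roughly a third, which is often the difference between fitting on a consumer GPU and not.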

Download the Multitalk GGUF models here:

Main model: wan2.1-i2v-14b-480p-q5_k_m

Multitalk: WanVideo_2_1_Multitalk_14B_fp8_e4m3fn

Text Encoder: umt5-xxl-encoder-gguf

VAE: Wan2_1_VAE_bf16

ControlNet: Wan21_Uni3C_controlnet_fp16

CLIP Vision: clip_vision_vit_h

LoRA: Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32

Workflow: Multitalk workflow
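Before loading the workflow, it can save a failed run to confirm that every file in the list above is actually on disk. A minimal Python sketch: it searches a models folder recursively for files whose names start with each listed name, ignoring case. The folder you pass in is up to you, and the stems are assumptions: downloaded files carry extensions, and I dropped the "-gguf" suffix from the text-encoder entry because it reads like a repository name rather than a filename.

```python
from pathlib import Path

# Stems copied from the Multitalk list above (assumed to prefix the real
# filenames; "-gguf" removed from the text-encoder entry, see lead-in).
MULTITALK_FILES = [
    "wan2.1-i2v-14b-480p-q5_k_m",
    "WanVideo_2_1_Multitalk_14B_fp8_e4m3fn",
    "umt5-xxl-encoder",
    "Wan2_1_VAE_bf16",
    "Wan21_Uni3C_controlnet_fp16",
    "clip_vision_vit_h",
    "Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32",
]

def find_missing(models_dir: str, stems: list[str]) -> list[str]:
    """Return the stems with no file under models_dir whose name starts with them."""
    names = [p.name.lower() for p in Path(models_dir).rglob("*") if p.is_file()]
    return [s for s in stems if not any(n.startswith(s.lower()) for n in names)]
```

Usage: `print(find_missing("path/to/your/models", MULTITALK_FILES))`, replacing the placeholder path with your own models folder. An empty list means every stem matched at least one file.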

Independent research like this is self-funded. If this guide saved you hours of troubleshooting, consider fueling the lab.

Support the Project

WAN Vace: The Cinematic Animator

What it is: This is your workhorse for creating high-quality, motion-rich videos from either a text prompt or a reference image.

When to use it: When you want to animate a scene, create dynamic camera movements, or bring a static image to life with detailed motion.

The Power: After completely rebuilding the standard workflow, I’ve found Vace to be incredibly fast and controllable.

WAN Vace is really quick for text-to-video: with the GGUF models, I can generate a 4-second video in 260 seconds!
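To compare that number with runs at other lengths or settings, it helps to normalize it. A small Python sketch, assuming an output rate of 16 frames per second (an assumption; check the frame rate your workflow actually sets):

```python
def render_cost(clip_seconds: float, render_seconds: float,
                fps: float = 16.0) -> dict[str, float]:
    """Break a wall-clock render time into per-frame and per-second figures.
    fps defaults to an assumed 16; pass your workflow's real value."""
    frames = clip_seconds * fps
    return {
        "frames": frames,
        "seconds_per_frame": render_seconds / frames,
        "seconds_per_video_second": render_seconds / clip_seconds,
    }

cost = render_cost(clip_seconds=4, render_seconds=260)
for key, value in cost.items():
    print(f"{key}: {value:.2f}")
```

The per-second-of-video figure (65 s of compute per second of output here) does not depend on the frame-rate assumption, so it is the safer number to compare across setups.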

Download Vace here:

WAN Vace: Wan2.1_14B_VACE-Q4_K_M

Workflow: Wan Vace workflow

Use the same VAE, text encoder and LoRA as for Multitalk above.

WAN Vace: Reference Image

What it is: This variant creates high-quality, motion-rich videos from a text prompt combined with a reference image.

When to use it: When you want extra-high-quality output that closely matches an image you already have.

Videos created with a reference image are much higher quality, but they take a bit longer to generate.

Use the same model as WAN Vace above, and the same VAE, text encoder and LoRA as for Multitalk.

Download workflow: Wan Vace Reference image workflow
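Because both Vace workflows reuse files from the Multitalk setup, the total download list is shorter than it first looks. A small Python sketch that encodes the reuse described in this guide as sets; the workflow labels are my own names, and the file names are copied from the lists above:

```python
# Files every workflow here shares, per the guide: VAE, text encoder, LoRA.
SHARED = {
    "Wan2_1_VAE_bf16",
    "umt5-xxl-encoder-gguf",
    "Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32",
}

# Hypothetical labels mapped to the files each workflow needs.
WORKFLOWS = {
    "Multitalk": SHARED | {
        "wan2.1-i2v-14b-480p-q5_k_m",
        "WanVideo_2_1_Multitalk_14B_fp8_e4m3fn",
        "Wan21_Uni3C_controlnet_fp16",
        "clip_vision_vit_h",
    },
    "Vace": SHARED | {"Wan2.1_14B_VACE-Q4_K_M"},
    "Vace Reference Image": SHARED | {"Wan2.1_14B_VACE-Q4_K_M"},
}

def files_to_download(selected: list[str]) -> set[str]:
    """Union of the model files needed by the chosen workflows."""
    needed: set[str] = set()
    for name in selected:
        needed |= WORKFLOWS[name]
    return needed
```

For example, `files_to_download(["Vace", "Vace Reference Image"])` returns 4 files, while all three workflows together need 8, so adding Vace to a Multitalk setup costs a single extra model download.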


Creepybits Newsletter

Thank you!