Animate Any Image in ComfyUI with WAN 2.2 Animate (GGUF)

I built a complete image-to-video workflow in ComfyUI using the WAN 2.2 Animate model in GGUF format. The goal is simple: take any single image, transfer motion from a reference video, keep lighting stable, and extend the clip well past a few seconds while preserving visual quality.
This article shows my exact setup and node connections, how I organized the workflow into groups, and two reliable methods to extend duration. I also include frame interpolation to smooth the result from 16 fps to 32 fps. Every step mirrors the process I used from start to finish.
WAN Animate in ComfyUI
The workflow animates a still image using motion extracted from a source video. It runs in ComfyUI and uses:
- WAN 2.2 Animate (GGUF) as the core animation model
- Light X2V LoRA for image-to-video conditioning
- WAN Animate Relight LoRA for consistent lighting across frames
- DW Pose (OpenPose-style) to extract pose from the source video
- Clip Vision H to align the reference image with the motion
- UMT5-XL FP8 text encoder for prompts
- VAE 2.1 for decoding latent frames into images
- RIFE frame interpolation to smooth and double the frame rate
Everything is organized in modular groups so you can adjust width, height, prompts, and inputs from centralized nodes.
WAN Animate ComfyUI Workflow Overview
| Section | Purpose | Key Nodes | Outputs |
|---|---|---|---|
| Settings | Centralize resolution and FPS | Primitive (Width/Height), Set/Get | Width, Height, FPS constants |
| Load Video + Controller | Load and resize the video, extract pose | Load Video (Video Helper Suite), Upscale Image, DW Pose, Pixel Perfect Resolution | Input Video, Pose Video |
| Load Reference Image | Import still image and encode with Clip Vision | Load Image, Clip Vision Encode, Clip Vision Loader | Reference Image, Clip Vision Output |
| Load Models + Prompts | Load WAN 2.2 GGUF, LoRAs, CLIP text, VAE | UNet Loader GGUF, LoRA Loader (x2), Model Sampling SD3, Load Clip, Clip Text Encode (pos/neg), Load VAE | Model, Positive Prompt, Negative Prompt, VAE |
| Sampling | Assemble and sample latent video, decode, and save | WAN Animate to Video, KSampler, Trim Video Latent, VAE Decode, Video Combine | Base video output at 16 fps |
| Frame Interpolation | Double frame rate and smooth motion | RIFE, Video Combine | 32 fps smoothed video |
| Extension 1 & 2 | Extend length with frame offsets and continual motion | WAN Animate to Video (x2), Batch Image, Set/Get | Longer stitched video outputs |
Key Features
- Motion transfer from any video to any image using DW Pose and WAN 2.2 Animate
- Lighting consistency across frames with the Relight LoRA
- Centralized Set/Get automation for width, height, inputs, and outputs
- Promptable results with positive and negative conditioning
- Extendable length using Video Frame Offset and Batch Image stitching
- Smooth playback with RIFE frame interpolation to 32 fps
- Clean H.264 output at 16 or 32 fps with controllable CRF
Prepare Your Environment and Files
Download the Required Models
Place downloads in the indicated folders under your ComfyUI/models directory.
- WAN 2.2 Animate (GGUF)
- File: WAN 2.2 Animate GGUF (e.g., Q4_K_M)
- Folder: models/unet
- Light X2V LoRA (image-to-video, rank 64)
- Folder: models/loras
- WAN Animate Relight LoRA
- Folder: models/loras
- UMT5-XL FP8 Text Encoder
- Folder: models/text_encoders
- WAN 2.1 VAE
- Folder: models/vae
- Clip Vision H
- Folder: models/clip_vision
After placing the files, update ComfyUI with update.bat (Windows) or your usual update method.
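Exact filenames vary depending on where you download each model, so the names below are placeholders only. As a quick sanity check, a small script like this (adjust COMFYUI_DIR and the filenames to match your downloads) can confirm everything landed in the right folders:

```python
from pathlib import Path

# Adjust to your ComfyUI install location.
COMFYUI_DIR = Path("ComfyUI")

# (folder, filename) pairs; filenames are examples only and will differ
# depending on the source you downloaded each model from.
expected = [
    ("models/unet", "wan2.2-animate_Q4_K_M.gguf"),
    ("models/loras", "lightx2v_i2v_rank64.safetensors"),
    ("models/loras", "wan_animate_relight.safetensors"),
    ("models/text_encoders", "umt5-xl-fp8.safetensors"),
    ("models/vae", "wan_2.1_vae.safetensors"),
    ("models/clip_vision", "clip_vision_h.safetensors"),
]

for folder, filename in expected:
    path = COMFYUI_DIR / folder / filename
    status = "OK" if path.exists() else "MISSING"
    print(f"{status:8} {path}")
```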
Verify Custom Nodes in Manager
Install and confirm these custom nodes:
- ControlNet Preprocessor
- KJ Nodes
- Video Helper Suite
- Use Everywhere
- ComfyUI Easy
Everything else used below is native or part of the above sets.
Build the Base Workflow
Settings and Input Video
Create centralized settings and a clean video input path.
- Create width and height primitives:
  - Width: 720
  - Height: 1280 (vertical)
  - Add Set nodes to store them as constants named “width” and “height”.
  - Group this mini-setup as “Settings”.
- Load and resize the source video:
  - Node: Load Video (Video Helper Suite)
    - FPS: 16
    - Format: 1
  - Node: Upscale Image
    - Method: Lanczos
    - Use Get nodes for width and height; set Crop: Center
  - Preview with Preview Animation (16 fps) to confirm scaling
  - Add Set node for the resized video, constant name: “input video”
This ensures every part of the workflow uses the same resolution and frame rate.
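For reference, the Upscale Image step above (Lanczos method with Crop: Center) is roughly equivalent to the following standalone per-frame operation. This is a minimal sketch using Pillow, not how ComfyUI implements the node internally, and the filenames are illustrative:

```python
from PIL import Image, ImageOps

# Resize one frame to the workflow's 720x1280 target with Lanczos resampling,
# center-cropping so the aspect ratio is preserved (what Crop: Center does).
frame = Image.open("frame_00001.png")  # illustrative filename
resized = ImageOps.fit(frame, (720, 1280), method=Image.LANCZOS, centering=(0.5, 0.5))
resized.save("frame_00001_resized.png")
```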
Extract Pose with ControlNet
Generate pose guidance for motion transfer.
- Node: DW Pose Estimator
- Face: Disabled
- Node: Pixel Perfect Resolution (ControlNet Aux)
- Feed the original video and apply the same width and height
- Connect and preview with Preview Animation (16 fps)
- Add Set node for pose output, constant name: “pose video”
- Group these as “Load Video + Controller”
Load the Reference Image and Clip Vision
Import the still image that will be animated and prepare Clip Vision features.
- Node: Load Image (your reference image)
- Node: Clip Vision Encode
- Node: Clip Vision Loader → Model: clip_vision_h
- Crop: None
- Add Set node for image output, constant: “reference image”
- Add Set node for Clip Vision output, constant: “clip vision output”
- Group them as “Load Reference Image”
Load Models and Prompts
Load the model stack, LoRAs, text encoders, and VAE.
- WAN 2.2 Animate (GGUF):
  - Node: UNet Loader GGUF
  - Select WAN 2.2 Animate GGUF (e.g., Q4_K_M)
- LoRAs:
  - Node: LoRA Loader (Model Only) → Light X2V (image-to-video, rank 64)
  - Duplicate LoRA Loader → WAN Animate Relight LoRA
  - Chain the loaders to the UNet
  - Node: Model Sampling SD3 → Shift: 8
- Prompts and text encoders:
  - Node: Load Clip → UMT5-XL FP8 → Type: 1
  - Node: Clip Text Encode (Positive)
  - Node: Clip Text Encode (Negative)
  - Feed Load Clip into the negative encoder as well
- VAE:
  - Node: Load VAE → WAN 2.1 VAE
- Automation with Set nodes:
  - Model → Set constant: “model”
  - Positive conditioning → Set constant: “positive prompt”
  - Negative conditioning → Set constant: “negative prompt”
  - VAE → Set constant: “VAE”
- Group all as “Load Models + Prompts”
Assemble and Sample
Connect everything into the WAN Animate node, sample, decode, and save.
- Node: WAN Animate to Video
  - Use Get nodes (constants) to connect:
    - Positive Prompt → positive input
    - Negative Prompt → negative input
    - VAE → VAE input
    - Clip Vision Output → clip vision output input
    - Reference Image → reference image input
    - Pose Video → pose video input
    - Width and Height → resolution inputs
  - Length: 65
- Sampling and decoding:
  - Node: KSampler
    - Seed: Fixed
    - Steps: 6
    - CFG: 1
    - Sampler: Euler
    - Scheduler: Simple
  - Connect WAN Animate outputs to KSampler
  - Node: Trim Video Latent → feed Trim Latent from WAN Animate
  - Node: VAE Decode
    - Use Get node → constant “VAE” → VAE Decode input
- Save video:
  - Node: Video Combine (Video Helper Suite)
    - FPS: 16
    - Format: H.264
    - CRF: 15
    - File Name: set your path and base filename
  - Add Set node for combined output, constant: “output video”
  - Group as “Sampling”
Run the graph to generate your first animation. At this point, motion should transfer, but you may want to refine it with prompts.
Prompting for Control
Prompts influence visual stability and interactions.
- Add a clear positive prompt describing the subject, apparel, materials, and scene.
- Add targeted negative prompts to suppress unwanted artifacts.
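As an illustration only (these are not the exact prompts from the workflow), a positive prompt such as “a woman in a flowing red dress dancing in a bright studio, detailed fabric, natural skin texture, consistent lighting” paired with a negative prompt such as “blurry, flickering, extra limbs, distorted hands, artifacts” is a reasonable starting point. Describing the materials and contact points the motion involves (hands, feet, cloth) tends to help most.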
Run again and compare. You should observe better contact, cloth behavior, and scene coherence.
Extend Duration
Extension 1: Frame Offset and Stitch
Duplicate the sampling structure to create a second segment and stitch both segments into a longer clip.
- Copy the entire “Sampling” group and paste below (Ctrl+C, Ctrl+Shift+V).
- On the new WAN Animate to Video:
  - Connect Video Frame Offset: from the first WAN Animate’s Video Frame Offset → to the second WAN Animate’s Video Frame Offset.
  - Keep Length the same (e.g., 65).
- Merge both segments:
  - Node: Batch Image (native)
    - Image 1: Get → “output video” from the first result
    - Image 2: VAE Decode output from the new segment
  - This stacks the two sequences.
- Smooth the motion:
  - Node: RIFE (Frame Interpolation)
    - Connect Batch Image output to RIFE
  - Node: Video Combine (first instance)
    - Save Output: False (don’t save the pre-interpolated merge)
  - Duplicate Video Combine to the right:
    - Connect RIFE output
    - FPS: 32 (doubles the original 16)
    - File Name: use a new filename (e.g., frame_interpolation)
    - Save Output: True
- Group this interpolation setup as “Frame Interpolation” and run.
You should see a longer, smoother result. In my test, this extended a ~3–4 second sequence to ~8 seconds at 32 fps.
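As a quick sanity check on those numbers, here is the back-of-envelope arithmetic (a minimal sketch; it assumes each segment is the full 65 frames and that RIFE exactly doubles the frame count):

```python
# Duration check for stitched and interpolated output.
frames_per_segment = 65
base_fps = 16
interp_fps = 32

for segments in (1, 2, 3):
    base_frames = frames_per_segment * segments
    interp_frames = base_frames * 2  # RIFE 2x interpolation
    print(f"{segments} segment(s): "
          f"{base_frames / base_fps:.1f} s base, "
          f"{interp_frames / interp_fps:.1f} s after interpolation")

# 1 segment(s): 4.1 s base, 4.1 s after interpolation
# 2 segment(s): 8.1 s base, 8.1 s after interpolation
# 3 segment(s): 12.2 s base, 12.2 s after interpolation
```

Stitching segments is what adds duration; RIFE keeps the duration the same while doubling the frame count for smoother playback. The same arithmetic with three segments lands around 12 seconds, which matches the Extension 2 result below.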
Extension 2: Continual Motion and Second Stitch
Add a third segment with continual motion for even longer results.
- Copy the Extension 1 block (the second WAN Animate to Video chain) and paste below as “Extension 2”.
- Adjust settings:
  - On the new WAN Animate to Video:
    - Strength: 77
    - Video Frame Offset: connect from Extension 1’s WAN Animate (replace the previous offset connection so it continues from segment 2)
- Continual Motion VAE:
  - From Extension 1’s VAE Decode:
    - Add Set node → constant: “continue motion”
  - In Extension 2:
    - Use Get → “continue motion” to feed the VAE continual motion input (if present) or any continuation input that your node provides.
- Stitch Extension 2:
  - Batch Image:
    - Image 1: connect from Extension 1’s stitched output (replace any earlier Image 1 link)
    - Image 2: VAE Decode from Extension 2
  - Feed Batch Image into RIFE and then into a new Video Combine at 32 fps
  - Use a new file name for the final save
- Rename groups:
  - Extension 1
  - Extension 2
  - Frame Interpolation
Run the full graph. You should get a longer clip (e.g., ~12 seconds). If you notice pauses, that means you’ve exceeded the useful motion in the original video.
Recommendations and Practical Settings
Duration and Length
- Keep total output near or under 10 seconds for reliable continuity.
- A Length of 65 frames for WAN Animate produced strong motion transfer in testing.
- If your result is shorter than expected, add one extension at a time, then interpolate.
Frame Rate and Quality
- Base video: 16 fps in Video Combine for initial sampling
- Interpolated video: 32 fps via RIFE for fluid motion
- Encoding: H.264, CRF ≈ 15 for a good balance between size and quality
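If you ever need to reproduce these encoding settings outside ComfyUI (for example, to re-encode exported frames), a minimal sketch calling ffmpeg is shown below. The input pattern and output filename are placeholders, and it assumes ffmpeg is on your PATH:

```python
import subprocess

# Encode a numbered PNG sequence to H.264 at 32 fps with CRF 15,
# mirroring the Video Combine settings used in this workflow.
subprocess.run(
    [
        "ffmpeg", "-y",
        "-framerate", "32",
        "-i", "frames/frame_%05d.png",  # placeholder input pattern
        "-c:v", "libx264",
        "-crf", "15",
        "-pix_fmt", "yuv420p",          # widely compatible pixel format
        "wan_animate_32fps.mp4",        # placeholder output name
    ],
    check=True,
)
```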
KSampler
- Seed: Fixed (for reproducibility)
- Steps: 6 (works well with Light X2V)
- CFG: 1
- Sampler: Euler
- Scheduler: Simple
Organization and Automation
Group and automate to keep the workflow clean:
- Settings
- Load Video + Controller (DW Pose + Pixel Perfect Resolution)
- Load Reference Image (Clip Vision)
- Load Models + Prompts (WAN 2.2 GGUF + LoRAs + UMT5-XL FP8 + VAE)
- Sampling (WAN Animate → KSampler → Trim → VAE Decode → Video Combine)
- Extension 1 (offset and stitch)
- Extension 2 (continual motion and stitch)
- Frame Interpolation (RIFE → Video Combine @ 32 fps)
Use Set/Get nodes for:
- width, height
- input video
- pose video
- reference image
- clip vision output
- model
- positive prompt
- negative prompt
- VAE
- output video
- continue motion (for Extension 2)
Step-by-Step Quick Guide
1) Models and Nodes
- Download and place:
- WAN 2.2 Animate (GGUF) → models/unet
- Light X2V LoRA (image-to-video, rank 64) → models/loras
- WAN Animate Relight LoRA → models/loras
- UMT5-XL FP8 → models/text_encoders
- WAN 2.1 VAE → models/vae
- Clip Vision H → models/clip_vision
- Update ComfyUI
- Install custom nodes:
- ControlNet Preprocessor
- KJ Nodes
- Video Helper Suite
- Use Everywhere
- ComfyUI Easy
2) Global Settings and Video
- Primitives: width=720, height=1280 → Set constants
- Load Video (16 fps, format=1) → Upscale (Lanczos, center crop) → Set “input video”
3) Pose Extraction
- DW Pose Estimator (Face: disabled)
- Pixel Perfect Resolution (match width/height)
- Set “pose video”
4) Reference Image and Clip Vision
- Load Image → Clip Vision Encode
- Clip Vision Loader: clip_vision_h (Crop: None)
- Set: “reference image” and “clip vision output”
5) Models and Prompts
- UNet Loader GGUF → WAN 2.2 Animate
- LoRA Loader (Model Only) → Light X2V; duplicate → WAN Animate Relight
- Model Sampling SD3 (Shift: 8)
- Load Clip → UMT5-XL FP8 (Type: 1)
- Clip Text Encode: Positive and Negative
- Load VAE → WAN 2.1 VAE
- Set constants: “model”, “positive prompt”, “negative prompt”, “VAE”
6) Assemble and Sample
- WAN Animate to Video:
- Connect positive, negative, VAE, clip vision output, reference image, pose video, width, height
- Length: 65
- KSampler:
- Seed: Fixed; Steps: 6; CFG: 1; Sampler: Euler; Scheduler: Simple
- Trim Video Latent → VAE Decode (Get VAE)
- Video Combine (16 fps, H.264, CRF 15)
- Set: “output video”
- Run once to verify
7) Prompting
- Add a descriptive positive prompt
- Include negative prompts for artifacts
- Run and compare
8) Extension 1
- Copy Sampling group
- Connect Video Frame Offset from first WAN Animate → second WAN Animate
- Batch Image: Image 1 (Get “output video”), Image 2 (new VAE Decode)
- RIFE → Video Combine (32 fps, save true)
- Run to extend and smooth
9) Extension 2
- Copy Extension 1 and paste as Extension 2
- WAN Animate Strength: 77
- Video Frame Offset: from Extension 1
- Set from Extension 1 VAE Decode → “continue motion”; Get in Extension 2
- Batch Image: stitch Extension 1 with Extension 2
- RIFE → Video Combine (32 fps, new name)
- Run and finalize
File and Folder Overview (Table)
| Component | Recommended File | Folder |
|---|---|---|
| WAN 2.2 Animate (GGUF) | Q4_K_M or similar | models/unet |
| Light X2V LoRA | Light X2V (image-to-video, rank 64) | models/loras |
| WAN Animate Relight LoRA | Relight LoRA | models/loras |
| UMT5-XL FP8 | umt5-xl-fp8 | models/text_encoders |
| WAN 2.1 VAE | VAE 2.1 | models/vae |
| Clip Vision H | clip_vision_h | models/clip_vision |
Final Notes
- Keep clips under ~10 seconds for best consistency. Longer than the source motion can cause pauses.
- Length=65 produced reliable motion transfer across tests; you can experiment if you need different pacing.
- Matching the first reference frame to the source video helps, but it’s not mandatory. The setup still works with a different starting frame.
- The modular grouping and Set/Get constants keep the canvas clean and make adjustments fast.
This is a complete, reproducible setup to animate images with WAN 2.2 in ComfyUI using GGUF, extend durations, and output smooth videos at 32 fps.
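If you want to re-run the finished graph repeatedly (for example, swapping reference images in a batch), ComfyUI can also be driven over its HTTP API. A minimal sketch, assuming the workflow was exported with “Save (API Format)” as wan_animate_api.json (a placeholder name) and ComfyUI is listening on the default 127.0.0.1:8188:

```python
import json
import urllib.request

# Load the workflow exported from ComfyUI via "Save (API Format)".
with open("wan_animate_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

# Queue the graph on a locally running ComfyUI instance.
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # returns a prompt_id on success
```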