Animate Any Image in ComfyUI with WAN 2.2 Animate (GGUF)

I built a complete image-to-video workflow in ComfyUI using the WAN 2.2 Animate model in GGUF format. The goal is simple: take any single image, transfer motion from a reference video, keep lighting stable, and extend the clip well past a few seconds while preserving visual quality.

This article shows my exact setup and node connections, how I organized the workflow into groups, and two reliable methods to extend duration. I also include frame interpolation to smooth the result from 16 fps to 32 fps. Every step mirrors the process I used from start to finish.

WAN 2.2 Animate in ComfyUI

The workflow animates a still image using motion extracted from a source video. It runs in ComfyUI and uses:

  • WAN 2.2 Animate (GGUF) as the core animation model
  • Light X2V LoRA for image-to-video conditioning
  • WAN Animate Relight LoRA for consistent lighting across frames
  • DW Pose (OpenPose-style) to extract pose from the source video
  • Clip Vision H to align the reference image with the motion
  • UMT5-XL FP8 text encoder for prompts
  • VAE 2.1 for decoding latent frames into images
  • RIFE frame interpolation to smooth and double the frame rate

Everything is organized in modular groups so you can adjust width, height, prompts, and inputs from centralized nodes.

WAN Animate ComfyUI Workflow Overview

| Section | Purpose | Key Nodes | Outputs |
| --- | --- | --- | --- |
| Settings | Centralize resolution and FPS | Primitive (Width/Height), Set/Get | Width, Height, FPS constants |
| Load Video + Controller | Load and resize the video, extract pose | Load Video (Video Helper Suite), Upscale Image, DW Pose, Pixel Perfect Resolution | Input Video, Pose Video |
| Load Reference Image | Import still image and encode with Clip Vision | Load Image, Clip Vision Encode, Clip Vision Loader | Reference Image, Clip Vision Output |
| Load Models + Prompts | Load WAN 2.2 GGUF, LoRAs, CLIP text, VAE | UNet Loader GGUF, LoRA Loader (x2), Model Sampling SD3, Load Clip, Clip Text Encode (pos/neg), Load VAE | Model, Positive Prompt, Negative Prompt, VAE |
| Sampling | Assemble and sample the latent video, decode, and save | WAN Animate to Video, KSampler, Trim Video Latent, VAE Decode, Video Combine | Base video output at 16 fps |
| Frame Interpolation | Double frame rate and smooth motion | RIFE, Video Combine | 32 fps smoothed video |
| Extension 1 & 2 | Extend length with frame offsets and continual motion | WAN Animate to Video (x2), Batch Image, Set/Get | Longer stitched video outputs |

Key Features

  • Motion transfer from any video to any image using DW Pose and WAN 2.2 Animate
  • Lighting consistency across frames with the Relight LoRA
  • Centralized Set/Get automation for width, height, inputs, and outputs
  • Promptable results with positive and negative conditioning
  • Extendable length using Video Frame Offset and Batch Image stitching
  • Smooth playback with RIFE frame interpolation to 32 fps
  • Clean H.264 output at 16 or 32 fps with controllable CRF

Prepare Your Environment and Files

Download the Required Models

Place downloads in the indicated folders under your ComfyUI/models directory.

  • WAN 2.2 Animate (GGUF)
    • File: WAN 2.2 Animate GGUF (e.g., Q4_K_M)
    • Folder: models/unet
  • Light X2V LoRA (image-to-video, rank 64)
    • Folder: models/loras
  • WAN Animate Relight LoRA
    • Folder: models/loras
  • UMT5-XL FP8 Text Encoder
    • Folder: models/text_encoders
  • WAN 2.1 VAE
    • Folder: models/vae
  • Clip Vision H
    • Folder: models/clip_vision

After placing the files, update ComfyUI with update.bat (Windows) or your usual update method.
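
If you want to confirm the files landed in the right folders before launching ComfyUI, a quick check like the sketch below helps. The folder layout comes from the list above; the install path and glob patterns are assumptions, so adjust them to your actual filenames.

```python
# Sanity check: confirm each model folder under ComfyUI/models contains
# at least one file matching the expected pattern.
# The path and glob patterns are placeholders -- adjust to your setup.
from pathlib import Path

MODELS_DIR = Path("ComfyUI/models")  # adjust to your install location

expected = {
    "unet": "*.gguf",                      # WAN 2.2 Animate GGUF (e.g., Q4_K_M)
    "loras": "*.safetensors",              # Light X2V + Relight LoRAs
    "text_encoders": "*fp8*.safetensors",  # UMT5 FP8 text encoder
    "vae": "*.safetensors",                # WAN 2.1 VAE
    "clip_vision": "*.safetensors",        # Clip Vision H
}

for folder, pattern in expected.items():
    matches = list((MODELS_DIR / folder).glob(pattern))
    status = "OK" if matches else "MISSING"
    print(f"{status:8} models/{folder} ({pattern}): {len(matches)} file(s)")
```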

Verify Custom Nodes in Manager

Install and confirm these custom nodes:

  • ControlNet Preprocessor
  • KJ Nodes
  • Video Helper Suite
  • Use Everywhere
  • ComfyUI Easy

Everything else used below is native or part of the above sets.


Build the Base Workflow

Settings and Input Video

Create centralized settings and a clean video input path.

  1. Create width and height primitives:

    • Width: 720
    • Height: 1280 (vertical)
    • Add Set nodes to store them as constants named “width” and “height”.
    • Group this mini-setup as “Settings”.
  2. Load and resize the source video:

    • Node: Load Video (Video Helper Suite)
      • FPS: 16
      • Format: 1
    • Node: Upscale Image
      • Method: Lanczos
      • Use Get nodes for width and height; set Crop: Center
    • Preview with Preview Animation (16 fps) to confirm scaling
    • Add Set node for the resized video, constant name: “input video”

This ensures every part of the workflow uses the same resolution and frame rate.
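
For intuition, the Upscale Image step with Lanczos and center crop scales each frame until it covers 720×1280 and then crops away the overflow. Here is a minimal stand-alone sketch of that behavior using Pillow; it is illustration only (the node itself operates on image tensors inside ComfyUI), and the file path is a placeholder.

```python
# Rough stand-in for Upscale Image (Lanczos, center crop):
# scale the frame to cover 720x1280, then crop the center.
from PIL import Image

WIDTH, HEIGHT = 720, 1280  # the "width"/"height" constants from the Settings group

def resize_center_crop(frame: Image.Image) -> Image.Image:
    scale = max(WIDTH / frame.width, HEIGHT / frame.height)
    resized = frame.resize(
        (round(frame.width * scale), round(frame.height * scale)),
        Image.LANCZOS,
    )
    left = (resized.width - WIDTH) // 2
    top = (resized.height - HEIGHT) // 2
    return resized.crop((left, top, left + WIDTH, top + HEIGHT))

print(resize_center_crop(Image.open("frame_0001.png")).size)  # (720, 1280)
```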

Extract Pose with ControlNet

Generate pose guidance for motion transfer.

  1. Node: DW Pose Estimator
    • Face: Disabled
  2. Node: Pixel Perfect Resolution (ControlNet Aux)
    • Feed the original video and apply the same width and height
  3. Connect and preview with Preview Animation (16 fps)
  4. Add Set node for pose output, constant name: “pose video”
  5. Group these as “Load Video + Controller”
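
If you want to see what the pose video contains outside ComfyUI, here is a rough stand-in using the controlnet_aux package. It uses OpenposeDetector rather than the DW Pose Estimator node, so treat it only as an approximation of the skeleton frames the node produces; the frame path is a placeholder.

```python
# Illustrative only: extract an OpenPose-style skeleton from one frame.
# The workflow uses the DW Pose Estimator node; OpenposeDetector is a
# rough stand-in to show what the "pose video" frames look like.
from PIL import Image
from controlnet_aux import OpenposeDetector

detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
frame = Image.open("frame_0001.png")  # placeholder frame from the source video

# include_face=False mirrors "Face: Disabled" in the node
pose = detector(frame, include_hand=True, include_face=False)
pose.save("pose_0001.png")
```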

Load the Reference Image and Clip Vision

Import the still image that will be animated and prepare Clip Vision features.

  1. Node: Load Image (your reference image)
  2. Node: Clip Vision Encode
    • Node: Clip Vision Loader → Model: clip_vision_h
    • Crop: None
  3. Add Set node for image output, constant: “reference image”
  4. Add Set node for Clip Vision output, constant: “clip vision output”
  5. Group them as “Load Reference Image”
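
Conceptually, Clip Vision Encode turns the reference image into embedding features that WAN Animate uses to preserve identity and appearance. A rough equivalent with the transformers library is sketched below; the ViT-H checkpoint name is an assumption standing in for the clip_vision_h file, not necessarily the exact weights ComfyUI loads.

```python
# Illustrative: what a CLIP ViT-H vision encoder produces from the reference image.
# The repo name is an assumed stand-in for ComfyUI's clip_vision_h model.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

repo = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"  # assumption, not the workflow file
processor = CLIPImageProcessor.from_pretrained(repo)
encoder = CLIPVisionModelWithProjection.from_pretrained(repo)

image = Image.open("reference.png").convert("RGB")  # placeholder reference image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    out = encoder(**inputs)

print(out.image_embeds.shape)       # pooled, projected embedding
print(out.last_hidden_state.shape)  # per-patch features used for conditioning
```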

Load Models and Prompts

Load the model stack, LoRAs, text encoders, and VAE.

  1. WAN 2.2 Animate (GGUF):

    • Node: UNet Loader GGUF
    • Select WAN 2.2 Animate GGUF (e.g., Q4_K_M)
  2. LoRAs:

    • Node: LoRA Loader (Model Only) → Light X2V (image-to-video, rank 64)
    • Duplicate LoRA Loader → WAN Animate Relight LoRA
    • Chain the loaders to the UNet
    • Node: Model Sampling SD3 → Shift: 8 (see the shift sketch after this list)
  3. Prompts and text encoders:

    • Node: Load Clip → UMT5-XL FP8 → Type: 1
    • Node: Clip Text Encode (Positive)
    • Node: Clip Text Encode (Negative)
    • Feed Load Clip into negative as well
  4. VAE:

    • Node: Load VAE → WAN 2.1 VAE
  5. Automation with Set nodes:

    • Model → Set constant: “model”
    • Positive conditioning → Set constant: “positive prompt”
    • Negative conditioning → Set constant: “negative prompt”
    • VAE → Set constant: “VAE”
  6. Group all as “Load Models + Prompts”
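
The Shift: 8 setting on Model Sampling SD3 matters because only 6 sampling steps are used: it remaps the noise schedule so those steps concentrate at high noise levels. Below is a minimal sketch of the SD3-style shift formula this is based on; treat the exact correspondence to the node as my assumption.

```python
# Minimal sketch of SD3-style timestep shifting (assumed to match what
# Model Sampling SD3 applies): sigma' = s*sigma / (1 + (s-1)*sigma).
# With shift=8, the 6 sampling steps are pushed toward high noise.
def shift_sigma(sigma: float, shift: float = 8.0) -> float:
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

steps = 6
sigmas = [1.0 - i / steps for i in range(steps + 1)]  # simple linear schedule
shifted = [round(shift_sigma(s), 3) for s in sigmas]
print(shifted)  # [1.0, 0.976, 0.941, 0.889, 0.8, 0.615, 0.0]
```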

Assemble and Sample

Connect everything into the WAN Animate node, sample, decode, and save.

  1. Node: WAN Animate to Video

    • Use Get nodes (Constants) to connect:
      • Positive Prompt → positive input
      • Negative Prompt → negative input
      • VAE → VAE input
      • Clip Vision Output → clip vision output input
      • Reference Image → reference image input
      • Pose Video → pose video input
      • Width and Height → resolution inputs
    • Length: 65
  2. Sampling and decoding:

    • Node: KSampler
      • Seed: Fixed
      • Steps: 6
      • CFG: 1
      • Sampler: Euler
      • Scheduler: Simple
    • Connect WAN Animate outputs to KSampler
    • Node: Trim Video Latent → feed Trim Latent from WAN Animate
    • Node: VAE Decode
      • Use Get node → constant “VAE” → VAE Decoder input
  3. Save video:

    • Node: Video Combine (Video Helper Suite)
      • FPS: 16
      • Format: H.264
      • CRF: 15
      • File Name: set your path and base filename
    • Add Set node for combined output, constant: “output video”
    • Group as “Sampling”

Run the graph to generate your first animation. At this point, motion should transfer, but you may want to refine it with prompts.
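
Video Combine handles the H.264 encode inside the graph, but if you ever export frames and want to re-encode them with the same settings (16 fps, CRF 15), an equivalent ffmpeg call wrapped in Python looks roughly like this; the paths are placeholders.

```python
# Re-encode exported frames with the same settings the workflow uses in
# Video Combine: H.264, CRF 15, 16 fps. Paths are placeholders.
import subprocess

subprocess.run([
    "ffmpeg",
    "-framerate", "16",             # input frame rate (matches the base video)
    "-i", "frames/frame_%05d.png",  # numbered frames exported from ComfyUI
    "-c:v", "libx264",
    "-crf", "15",                   # same quality target as the workflow
    "-pix_fmt", "yuv420p",          # widest player compatibility
    "wan_animate_base.mp4",
], check=True)
```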

Prompting for Control

Prompts influence visual stability and interactions.

  • Add a clear positive prompt describing the subject, apparel, materials, and scene.
  • Add targeted negative prompts to suppress unwanted artifacts.

Run again and compare. You should observe better contact, cloth behavior, and scene coherence.
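
As a purely illustrative example of the kind of prompt pair I mean (adapt it to your own subject and scene):

```python
# Illustrative prompt pair only -- describe your own subject, materials, and scene.
positive = (
    "a woman in a flowing red dress dancing on a wooden stage, "
    "soft studio lighting, detailed fabric, natural skin texture, steady camera"
)
negative = (
    "blurry, extra limbs, distorted hands, flickering, watermark, "
    "text, low quality, duplicated frames"
)
```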


Extend Duration

Extension 1: Frame Offset and Stitch

Duplicate the sampling structure to create a second segment and stitch both segments into a longer clip.

  1. Copy the entire “Sampling” group and paste below (Ctrl+C, Ctrl+Shift+V).

  2. On the new WAN Animate to Video:

    • Connect Video Frame Offset:
      • From the first WAN Animate’s Video Frame Offset → to the second WAN Animate’s Video Frame Offset.
    • Keep Length the same (e.g., 65).
  3. Merge both segments:

    • Node: Batch Image (native)
      • Image 1: Get → “output video” from the first result
      • Image 2: VAE Decode output from the new segment
    • This stacks the two sequences.
  4. Smooth the motion:

    • Node: RIFE (Frame Interpolation)
      • Connect Batch Image output to RIFE
    • Node: Video Combine (first instance)
      • Save Output: False (don’t save the pre-interpolated merge)
    • Duplicate Video Combine to the right:
      • Connect RIFE output
      • FPS: 32 (doubles the original 16)
      • File Name: use a new filename (e.g., frame_interpolation)
      • Save Output: True
  5. Group this interpolation setup as “Frame Interpolation” and run.

You should see a longer, smoother result. In my test, this extended a ~3–4 second sequence to ~8 seconds at 32 fps.
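
The arithmetic behind those numbers is worth spelling out: each WAN Animate segment is 65 frames at 16 fps, stitching adds segments end to end, and RIFE doubles the frame count, so playback at 32 fps keeps the same duration while looking smoother. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope duration check for stitched + interpolated clips.
SEGMENT_FRAMES = 65   # Length on each WAN Animate to Video node
BASE_FPS = 16
OUT_FPS = 32          # playback rate after RIFE interpolation

def clip_stats(num_segments: int) -> None:
    frames = SEGMENT_FRAMES * num_segments
    interpolated = frames * 2  # RIFE doubles the frame count
    print(f"{num_segments} segment(s): {frames} frames "
          f"≈ {frames / BASE_FPS:.1f}s base, "
          f"{interpolated} frames ≈ {interpolated / OUT_FPS:.1f}s at {OUT_FPS} fps")

for n in (1, 2, 3):
    clip_stats(n)
# 1 segment(s): 65 frames ≈ 4.1s base, 130 frames ≈ 4.1s at 32 fps
# 2 segment(s): 130 frames ≈ 8.1s base, 260 frames ≈ 8.1s at 32 fps
# 3 segment(s): 195 frames ≈ 12.2s base, 390 frames ≈ 12.2s at 32 fps
```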

Extension 2: Continual Motion and Second Stitch

Add a third segment with continual motion for even longer results.

  1. Copy the Extension 1 block (the second WAN Animate to Video chain) and paste below as “Extension 2”.

  2. Adjust settings:

    • On the new WAN Animate to Video:
      • Strength: 77
      • Video Frame Offset: connect from Extension 1’s WAN Animate (replace previous offset connection so it continues from segment 2)
  3. Continual Motion VAE:

    • From Extension 1’s VAE Decode:
      • Add Set node → constant: “continue motion”
    • In Extension 2:
      • Use Get → “continue motion” to feed the VAE continual motion input (if present) or any continuation input that your node provides.
  4. Stitch Extension 2:

    • Batch Image:
      • Image 1: connect from Extension 1’s stitched output (replace any earlier Image 1 link)
      • Image 2: VAE Decode from Extension 2
    • Feed Batch Image into RIFE and then into a new Video Combine at 32 fps
    • Use a new file name for the final save
  5. Rename groups:

    • Extension 1
    • Extension 2
    • Frame Interpolation

Run the full graph. You should get a longer clip (e.g., ~12 seconds). If you notice pauses, that means you’ve exceeded the useful motion in the original video.
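
Under the hood, Batch Image simply concatenates the decoded frame batches along the frame dimension before they reach RIFE. A minimal tensor-level sketch of the three-segment stitch (dummy sizes, just to show the shape logic):

```python
# Minimal sketch of what stitching segments with Batch Image amounts to:
# concatenating decoded frame batches along the batch (frame) dimension.
import torch

# dummy batches in ComfyUI's IMAGE layout [frames, height, width, channels];
# real frames at the workflow's resolution would be [65, 1280, 720, 3]
segment_1 = torch.rand(65, 128, 72, 3)  # base Sampling group
segment_2 = torch.rand(65, 128, 72, 3)  # Extension 1
segment_3 = torch.rand(65, 128, 72, 3)  # Extension 2

stitched = torch.cat([segment_1, segment_2, segment_3], dim=0)
print(stitched.shape)  # torch.Size([195, 128, 72, 3]); this is what feeds RIFE
```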


Recommendations and Practical Settings

Duration and Length

  • Keep total output near or under 10 seconds for reliable continuity.
  • A Length of 65 frames for WAN Animate produced strong motion transfer in testing.
  • If your result is shorter than expected, add one extension at a time, then interpolate.

Frame Rate and Quality

  • Base video: 16 fps in Video Combine for initial sampling
  • Interpolated video: 32 fps via RIFE for fluid motion
  • Encoding: H.264, CRF ≈ 15 for a good balance between size and quality

KSampler

  • Seed: Fixed (for reproducibility)
  • Steps: 6 (works well with Light X2V)
  • CFG: 1
  • Sampler: Euler
  • Scheduler: Simple

Organization and Automation

Group and automate to keep the workflow clean:

  • Settings
  • Load Video + Controller (DW Pose + Pixel Perfect Resolution)
  • Load Reference Image (Clip Vision)
  • Load Models + Prompts (WAN 2.2 GGUF + LoRAs + UMT5-XL FP8 + VAE)
  • Sampling (WAN Animate → KSampler → Trim → VAE Decode → Video Combine)
  • Extension 1 (offset and stitch)
  • Extension 2 (continual motion and stitch)
  • Frame Interpolation (RIFE → Video Combine @ 32 fps)

Use Set/Get nodes for:

  • width, height
  • input video
  • pose video
  • reference image
  • clip vision output
  • model
  • positive prompt
  • negative prompt
  • VAE
  • output video
  • continue motion (for Extension 2)
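
If it helps to reason about the pattern, the Set/Get constants behave like a small named registry laid over the graph. The snippet below is only an analogy for how the workflow is wired, not how the nodes are implemented.

```python
# Analogy only: Set/Get constants act like a key-value registry over the graph.
registry: dict[str, object] = {}

def set_const(name: str, value: object) -> object:
    registry[name] = value  # what a Set node does with its input
    return value

def get_const(name: str) -> object:
    return registry[name]   # what a Get node feeds downstream

set_const("width", 720)
set_const("height", 1280)
print(get_const("width"), get_const("height"))  # 720 1280
```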

Step-by-Step Quick Guide

1) Models and Nodes

  • Download and place:
    • WAN 2.2 Animate (GGUF) → models/unet
    • Light X2V LoRA (image-to-video, rank 64) → models/loras
    • WAN Animate Relight LoRA → models/loras
    • UMT5-XL FP8 → models/text_encoders
    • WAN 2.1 VAE → models/vae
    • Clip Vision H → models/clip_vision
  • Update ComfyUI
  • Install custom nodes:
    • ControlNet Preprocessor
    • KJ Nodes
    • Video Helper Suite
    • Use Everywhere
    • ComfyUI Easy

2) Global Settings and Video

  • Primitives: width=720, height=1280 → Set constants
  • Load Video (16 fps, format=1) → Upscale (Lanczos, center crop) → Set “input video”

3) Pose Extraction

  • DW Pose Estimator (Face: disabled)
  • Pixel Perfect Resolution (match width/height)
  • Set “pose video”

4) Reference Image and Clip Vision

  • Load Image → Clip Vision Encode
  • Clip Vision Loader: clip_vision_h (Crop: None)
  • Set: “reference image” and “clip vision output”

5) Models and Prompts

  • UNet Loader GGUF → WAN 2.2 Animate
  • LoRA Loader (Model Only) → Light X2V; duplicate → WAN Animate Relight
  • Model Sampling SD3 (Shift: 8)
  • Load Clip → UMT5-XL FP8 (Type: 1)
  • Clip Text Encode: Positive and Negative
  • Load VAE → WAN 2.1 VAE
  • Set constants: “model”, “positive prompt”, “negative prompt”, “VAE”

6) Assemble and Sample

  • WAN Animate to Video:
    • Connect positive, negative, VAE, clip vision output, reference image, pose video, width, height
    • Length: 65
  • KSampler:
    • Seed: Fixed; Steps: 6; CFG: 1; Sampler: Euler; Scheduler: Simple
  • Trim Video Latent → VAE Decode (Get VAE)
  • Video Combine (16 fps, H.264, CRF 15)
  • Set: “output video”
  • Run once to verify

7) Prompting

  • Add a descriptive positive prompt
  • Include negative prompts for artifacts
  • Run and compare

8) Extension 1

  • Copy Sampling group
  • Connect Video Frame Offset from first WAN Animate → second WAN Animate
  • Batch Image: Image 1 (Get “output video”), Image 2 (new VAE Decode)
  • RIFE → Video Combine (32 fps, save true)
  • Run to extend and smooth

9) Extension 2

  • Copy Extension 1 and paste as Extension 2
  • WAN Animate Strength: 77
  • Video Frame Offset: from Extension 1
  • Set from Extension 1 VAE Decode → “continue motion”; Get in Extension 2
  • Batch Image: stitch Extension 1 with Extension 2
  • RIFE → Video Combine (32 fps, new name)
  • Run and finalize

File and Folder Overview

| Component | Recommended File | Folder |
| --- | --- | --- |
| WAN 2.2 Animate (GGUF) | Q4_K_M or similar | models/unet |
| Light X2V LoRA | Light X2V (image-to-video, rank 64) | models/loras |
| WAN Animate Relight LoRA | Relight LoRA | models/loras |
| UMT5-XL FP8 | umt5-xl-fp8 | models/text_encoders |
| WAN 2.1 VAE | VAE 2.1 | models/vae |
| Clip Vision H | clip_vision_h | models/clip_vision |

Final Notes

  • Keep clips under ~10 seconds for best consistency. Extending past the motion available in the source video can cause pauses.
  • Length=65 produced reliable motion transfer across tests; you can experiment if you need different pacing.
  • Matching the first reference frame to the source video helps, but it’s not mandatory. The setup still works with a different starting frame.
  • The modular grouping and Set/Get constants keep the canvas clean and make adjustments fast.

This is a complete, reproducible setup to animate images with WAN 2.2 in ComfyUI using GGUF, extend durations, and output smooth videos at 32 fps.
