Animate Any Image in ComfyUI with WAN 2.2 Animate (GGUF)

I built a complete image-to-video workflow in ComfyUI using the WAN 2.2 Animate model in GGUF format. The goal is simple: take any single image, transfer motion from a reference video, keep lighting stable, and extend the clip well past a few seconds while preserving visual quality.

This article shows my exact setup and node connections, how I organized the workflow into groups, and two reliable methods to extend duration. I also include frame interpolation to smooth the result from 16 fps to 32 fps. Every step mirrors the process I used from start to finish.

WAN 2.2 Animate in ComfyUI

The workflow animates a still image using motion extracted from a source video. It runs in ComfyUI and uses:

  • WAN 2.2 Animate (GGUF) as the core animation model
  • Light X2V LoRA for image-to-video conditioning
  • WAN Animate Relight LoRA for consistent lighting across frames
  • DW Pose (OpenPose-style) to extract pose from the source video
  • Clip Vision H to align the reference image with the motion
  • UMT5-XL FP8 text encoder for prompts
  • VAE 2.1 for decoding latent frames into images
  • RIFE frame interpolation to smooth and double the frame rate

Everything is organized in modular groups so you can adjust width, height, prompts, and inputs from centralized nodes.

WAN Animate ComfyUI Workflow Overview

| Section | Purpose | Key Nodes | Outputs |
| --- | --- | --- | --- |
| Settings | Centralize resolution and FPS | Primitive (Width/Height), Set/Get | Width, Height, FPS constants |
| Load Video + Controller | Load and resize the video, extract pose | Load Video (Video Helper Suite), Upscale Image, DW Pose, Pixel Perfect Resolution | Input Video, Pose Video |
| Load Reference Image | Import still image and encode with Clip Vision | Load Image, Clip Vision Encode, Clip Vision Loader | Reference Image, Clip Vision Output |
| Load Models + Prompts | Load WAN 2.2 GGUF, LoRAs, CLIP text, VAE | UNet Loader GGUF, LoRA Loader (x2), Model Sampling SD3, Load Clip, Clip Text Encode (pos/neg), Load VAE | Model, Positive Prompt, Negative Prompt, VAE |
| Sampling | Assemble and sample the latent video, decode, and save | WAN Animate to Video, KSampler, Trim Video Latent, VAE Decode, Video Combine | Base video output at 16 fps |
| Frame Interpolation | Double frame rate and smooth motion | RIFE, Video Combine | 32 fps smoothed video |
| Extension 1 & 2 | Extend length with frame offsets and continual motion | WAN Animate to Video (x2), Batch Image, Set/Get | Longer stitched video outputs |

Key Features

  • Motion transfer from any video to any image using DW Pose and WAN 2.2 Animate
  • Lighting consistency across frames with the Relight LoRA
  • Centralized Set/Get automation for width, height, inputs, and outputs
  • Promptable results with positive and negative conditioning
  • Extendable length using Video Frame Offset and Batch Image stitching
  • Smooth playback with RIFE frame interpolation to 32 fps
  • Clean H.264 output at 16 or 32 fps with controllable CRF

Prepare Your Environment and Files

Download the Required Models

Place downloads in the indicated folders under your ComfyUI/models directory.

  • WAN 2.2 Animate (GGUF)
    • File: WAN 2.2 Animate GGUF (e.g., Q4_K_M)
    • Folder: models/unet
  • Light X2V LoRA (image-to-video, rank 64)
    • Folder: models/loras
  • WAN Animate Relight LoRA
    • Folder: models/loras
  • UMT5-XL FP8 Text Encoder
    • Folder: models/text_encoders
  • WAN 2.1 VAE
    • Folder: models/vae
  • Clip Vision H
    • Folder: models/clip_vision

After placing the files, update ComfyUI with update.bat (Windows) or your usual update method.
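
If you want to confirm the files landed in the right folders before launching ComfyUI, a quick check like the sketch below helps. The folder layout comes from the list above; the install path and glob patterns are assumptions, so adjust them to your actual filenames.

```python
# Sanity check: confirm each model folder under ComfyUI/models contains
# at least one file matching the expected pattern.
# The path and glob patterns are placeholders -- adjust to your setup.
from pathlib import Path

MODELS_DIR = Path("ComfyUI/models")  # adjust to your install location

expected = {
    "unet": "*.gguf",                      # WAN 2.2 Animate GGUF (e.g., Q4_K_M)
    "loras": "*.safetensors",              # Light X2V + Relight LoRAs
    "text_encoders": "*fp8*.safetensors",  # UMT5 FP8 text encoder
    "vae": "*.safetensors",                # WAN 2.1 VAE
    "clip_vision": "*.safetensors",        # Clip Vision H
}

for folder, pattern in expected.items():
    matches = list((MODELS_DIR / folder).glob(pattern))
    status = "OK" if matches else "MISSING"
    print(f"{status:8} models/{folder} ({pattern}): {len(matches)} file(s)")
```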

Verify Custom Nodes in Manager

Install and confirm these custom nodes:

  • ControlNet Preprocessor
  • KJ Nodes
  • Video Helper Suite
  • Use Everywhere
  • ComfyUI Easy

Everything else used below is native or part of the above sets.


Build the Base Workflow

Settings and Input Video

Create centralized settings and a clean video input path.

  1. Create width and height primitives:

    • Width: 720
    • Height: 1280 (vertical)
    • Add Set nodes to store them as constants named “width” and “height”.
    • Group this mini-setup as “Settings”.
  2. Load and resize the source video:

    • Node: Load Video (Video Helper Suite)
      • FPS: 16
      • Format: 1
    • Node: Upscale Image
      • Method: Lanczos
      • Use Get nodes for width and height; set Crop: Center
    • Preview with Preview Animation (16 fps) to confirm scaling
    • Add Set node for the resized video, constant name: “input video”

This ensures every part of the workflow uses the same resolution and frame rate.
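
For intuition, the Upscale Image step with Lanczos and center crop scales each frame until it covers 720×1280 and then crops away the overflow. Here is a minimal stand-alone sketch of that behavior using Pillow; it is illustration only (the node itself operates on image tensors inside ComfyUI), and the file path is a placeholder.

```python
# Rough stand-in for Upscale Image (Lanczos, center crop):
# scale the frame to cover 720x1280, then crop the center.
from PIL import Image

WIDTH, HEIGHT = 720, 1280  # the "width"/"height" constants from the Settings group

def resize_center_crop(frame: Image.Image) -> Image.Image:
    scale = max(WIDTH / frame.width, HEIGHT / frame.height)
    resized = frame.resize(
        (round(frame.width * scale), round(frame.height * scale)),
        Image.LANCZOS,
    )
    left = (resized.width - WIDTH) // 2
    top = (resized.height - HEIGHT) // 2
    return resized.crop((left, top, left + WIDTH, top + HEIGHT))

print(resize_center_crop(Image.open("frame_0001.png")).size)  # (720, 1280)
```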

Extract Pose with ControlNet

Generate pose guidance for motion transfer.

  1. Node: DW Pose Estimator
    • Face: Disabled
  2. Node: Pixel Perfect Resolution (ControlNet Aux)
    • Feed the original video and apply the same width and height
  3. Connect and preview with Preview Animation (16 fps)
  4. Add Set node for pose output, constant name: “pose video”
  5. Group these as “Load Video + Controller”
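
If you want to see what the pose video contains outside ComfyUI, here is a rough stand-in using the controlnet_aux package. It uses OpenposeDetector rather than the DW Pose Estimator node, so treat it only as an approximation of the skeleton frames the node produces; the frame path is a placeholder.

```python
# Illustrative only: extract an OpenPose-style skeleton from one frame.
# The workflow uses the DW Pose Estimator node; OpenposeDetector is a
# rough stand-in to show what the "pose video" frames look like.
from PIL import Image
from controlnet_aux import OpenposeDetector

detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
frame = Image.open("frame_0001.png")  # placeholder frame from the source video

# include_face=False mirrors "Face: Disabled" in the node
pose = detector(frame, include_hand=True, include_face=False)
pose.save("pose_0001.png")
```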

Load the Reference Image and Clip Vision

Import the still image that will be animated and prepare Clip Vision features.

  1. Node: Load Image (your reference image)
  2. Node: Clip Vision Encode
    • Node: Clip Vision Loader → Model: clip_vision_h
    • Crop: None
  3. Add Set node for image output, constant: “reference image”
  4. Add Set node for Clip Vision output, constant: “clip vision output”
  5. Group them as “Load Reference Image”
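
Conceptually, Clip Vision Encode turns the reference image into embedding features that WAN Animate uses to preserve identity and appearance. A rough equivalent with the transformers library is sketched below; the ViT-H checkpoint name is an assumption standing in for the clip_vision_h file, not necessarily the exact weights ComfyUI loads.

```python
# Illustrative: what a CLIP ViT-H vision encoder produces from the reference image.
# The repo name is an assumed stand-in for ComfyUI's clip_vision_h model.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

repo = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"  # assumption, not the workflow file
processor = CLIPImageProcessor.from_pretrained(repo)
encoder = CLIPVisionModelWithProjection.from_pretrained(repo)

image = Image.open("reference.png").convert("RGB")  # placeholder reference image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    out = encoder(**inputs)

print(out.image_embeds.shape)       # pooled, projected embedding
print(out.last_hidden_state.shape)  # per-patch features used for conditioning
```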

Load Models and Prompts

Load the model stack, LoRAs, text encoders, and VAE.

  1. WAN 2.2 Animate (GGUF):

    • Node: UNet Loader GGUF
    • Select WAN 2.2 Animate GGUF (e.g., Q4_K_M)
  2. LoRAs:

    • Node: LoRA Loader (Model Only) → Light X2V (image-to-video, rank 64)
    • Duplicate LoRA Loader → WAN Animate Relight LoRA
    • Chain the loaders to the UNet
    • Node: Model Sampling SD3 → Shift: 8 (see the shift sketch after this list)
  3. Prompts and text encoders:

    • Node: Load Clip → UMT5-XL FP8 → Type: 1
    • Node: Clip Text Encode (Positive)
    • Node: Clip Text Encode (Negative)
    • Feed Load Clip into negative as well
  4. VAE:

    • Node: Load VAE → WAN 2.1 VAE
  5. Automation with Set nodes:

    • Model → Set constant: “model”
    • Positive conditioning → Set constant: “positive prompt”
    • Negative conditioning → Set constant: “negative prompt”
    • VAE → Set constant: “VAE”
  6. Group all as “Load Models + Prompts”
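
The Shift: 8 setting on Model Sampling SD3 matters because only 6 sampling steps are used: it remaps the noise schedule so those steps concentrate at high noise levels. Below is a minimal sketch of the SD3-style shift formula this is based on; treat the exact correspondence to the node as my assumption.

```python
# Minimal sketch of SD3-style timestep shifting (assumed to match what
# Model Sampling SD3 applies): sigma' = s*sigma / (1 + (s-1)*sigma).
# With shift=8, the 6 sampling steps are pushed toward high noise.
def shift_sigma(sigma: float, shift: float = 8.0) -> float:
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

steps = 6
sigmas = [1.0 - i / steps for i in range(steps + 1)]  # simple linear schedule
shifted = [round(shift_sigma(s), 3) for s in sigmas]
print(shifted)  # [1.0, 0.976, 0.941, 0.889, 0.8, 0.615, 0.0]
```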

Assemble and Sample

Connect everything into the WAN Animate node, sample, decode, and save.

  1. Node: WAN Animate to Video

    • Use Get nodes (Constants) to connect:
      • Positive Prompt → positive input
      • Negative Prompt → negative input
      • VAE → VAE input
      • Clip Vision Output → clip vision output input
      • Reference Image → reference image input
      • Pose Video → pose video input
      • Width and Height → resolution inputs
    • Length: 65
  2. Sampling and decoding:

    • Node: KSampler
      • Seed: Fixed
      • Steps: 6
      • CFG: 1
      • Sampler: Euler
      • Scheduler: Simple
    • Connect WAN Animate outputs to KSampler
    • Node: Trim Video Latent → feed Trim Latent from WAN Animate
    • Node: VAE Decode
      • Use Get node → constant “VAE” → VAE Decoder input
  3. Save video:

    • Node: Video Combine (Video Helper Suite)
      • FPS: 16
      • Format: H.264
      • CRF: 15
      • File Name: set your path and base filename
    • Add Set node for combined output, constant: “output video”
    • Group as “Sampling”

Run the graph to generate your first animation. At this point, motion should transfer, but you may want to refine it with prompts.
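
Video Combine handles the H.264 encode inside the graph, but if you ever export frames and want to re-encode them with the same settings (16 fps, CRF 15), an equivalent ffmpeg call wrapped in Python looks roughly like this; the paths are placeholders.

```python
# Re-encode exported frames with the same settings the workflow uses in
# Video Combine: H.264, CRF 15, 16 fps. Paths are placeholders.
import subprocess

subprocess.run([
    "ffmpeg",
    "-framerate", "16",             # input frame rate (matches the base video)
    "-i", "frames/frame_%05d.png",  # numbered frames exported from ComfyUI
    "-c:v", "libx264",
    "-crf", "15",                   # same quality target as the workflow
    "-pix_fmt", "yuv420p",          # widest player compatibility
    "wan_animate_base.mp4",
], check=True)
```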

Prompting for Control

Prompts influence visual stability and interactions.

  • Add a clear positive prompt describing the subject, apparel, materials, and scene.
  • Add targeted negative prompts to suppress unwanted artifacts.

Run again and compare. You should observe better contact, cloth behavior, and scene coherence.
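
As a purely illustrative example of the kind of prompt pair I mean (adapt it to your own subject and scene):

```python
# Illustrative prompt pair only -- describe your own subject, materials, and scene.
positive = (
    "a woman in a flowing red dress dancing on a wooden stage, "
    "soft studio lighting, detailed fabric, natural skin texture, steady camera"
)
negative = (
    "blurry, extra limbs, distorted hands, flickering, watermark, "
    "text, low quality, duplicated frames"
)
```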


Extend Duration

Extension 1: Frame Offset and Stitch

Duplicate the sampling structure to create a second segment and stitch both segments into a longer clip.

  1. Copy the entire “Sampling” group and paste below (Ctrl+C, Ctrl+Shift+V).

  2. On the new WAN Animate to Video:

    • Connect Video Frame Offset:
      • From the first WAN Animate’s Video Frame Offset → to the second WAN Animate’s Video Frame Offset.
    • Keep Length the same (e.g., 65).
  3. Merge both segments:

    • Node: Batch Image (native)
      • Image 1: Get → “output video” from the first result
      • Image 2: VAE Decode output from the new segment
    • This stacks the two sequences.
  4. Smooth the motion:

    • Node: RIFE (Frame Interpolation)
      • Connect Batch Image output to RIFE
    • Node: Video Combine (first instance)
      • Save Output: False (don’t save the pre-interpolated merge)
    • Duplicate Video Combine to the right:
      • Connect RIFE output
      • FPS: 32 (doubles the original 16)
      • File Name: use a new filename (e.g., frame_interpolation)
      • Save Output: True
  5. Group this interpolation setup as “Frame Interpolation” and run.

You should see a longer, smoother result. In my test, this extended a ~3–4 second sequence to ~8 seconds at 32 fps.
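
The arithmetic behind those numbers is worth spelling out: each WAN Animate segment is 65 frames at 16 fps, stitching adds segments end to end, and RIFE doubles the frame count, so playback at 32 fps keeps the same duration while looking smoother. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope duration check for stitched + interpolated clips.
SEGMENT_FRAMES = 65   # Length on each WAN Animate to Video node
BASE_FPS = 16
OUT_FPS = 32          # playback rate after RIFE interpolation

def clip_stats(num_segments: int) -> None:
    frames = SEGMENT_FRAMES * num_segments
    interpolated = frames * 2  # RIFE doubles the frame count
    print(f"{num_segments} segment(s): {frames} frames "
          f"≈ {frames / BASE_FPS:.1f}s base, "
          f"{interpolated} frames ≈ {interpolated / OUT_FPS:.1f}s at {OUT_FPS} fps")

for n in (1, 2, 3):
    clip_stats(n)
# 1 segment(s): 65 frames ≈ 4.1s base, 130 frames ≈ 4.1s at 32 fps
# 2 segment(s): 130 frames ≈ 8.1s base, 260 frames ≈ 8.1s at 32 fps
# 3 segment(s): 195 frames ≈ 12.2s base, 390 frames ≈ 12.2s at 32 fps
```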

Extension 2: Continual Motion and Second Stitch

Add a third segment with continual motion for even longer results.

  1. Copy the Extension 1 block (the second WAN Animate to Video chain) and paste below as “Extension 2”.

  2. Adjust settings:

    • On the new WAN Animate to Video:
      • Strength: 77
      • Video Frame Offset: connect from Extension 1’s WAN Animate (replace previous offset connection so it continues from segment 2)
  3. Continual Motion VAE:

    • From Extension 1’s VAE Decode:
      • Add Set node → constant: “continue motion”
    • In Extension 2:
      • Use Get → “continue motion” to feed the VAE continual motion input (if present) or any continuation input that your node provides.
  4. Stitch Extension 2:

    • Batch Image:
      • Image 1: connect from Extension 1’s stitched output (replace any earlier Image 1 link)
      • Image 2: VAE Decode from Extension 2
    • Feed Batch Image into RIFE and then into a new Video Combine at 32 fps
    • Use a new file name for the final save
  5. Rename groups:

    • Extension 1
    • Extension 2
    • Frame Interpolation

Run the full graph. You should get a longer clip (e.g., ~12 seconds). If you notice pauses, that means you’ve exceeded the useful motion in the original video.
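
Under the hood, Batch Image simply concatenates the decoded frame batches along the frame dimension before they reach RIFE. A minimal tensor-level sketch of the three-segment stitch (dummy sizes, just to show the shape logic):

```python
# Minimal sketch of what stitching segments with Batch Image amounts to:
# concatenating decoded frame batches along the batch (frame) dimension.
import torch

# dummy batches in ComfyUI's IMAGE layout [frames, height, width, channels];
# real frames at the workflow's resolution would be [65, 1280, 720, 3]
segment_1 = torch.rand(65, 128, 72, 3)  # base Sampling group
segment_2 = torch.rand(65, 128, 72, 3)  # Extension 1
segment_3 = torch.rand(65, 128, 72, 3)  # Extension 2

stitched = torch.cat([segment_1, segment_2, segment_3], dim=0)
print(stitched.shape)  # torch.Size([195, 128, 72, 3]); this is what feeds RIFE
```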


Recommendations and Practical Settings

Duration and Length

  • Keep total output near or under 10 seconds for reliable continuity.
  • A Length of 65 frames for WAN Animate produced strong motion transfer in testing.
  • If your result is shorter than expected, add one extension at a time, then interpolate.

Frame Rate and Quality

  • Base video: 16 fps in Video Combine for initial sampling
  • Interpolated video: 32 fps via RIFE for fluid motion
  • Encoding: H.264, CRF ≈ 15 for a good balance between size and quality

KSampler

  • Seed: Fixed (for reproducibility)
  • Steps: 6 (works well with Light X2V)
  • CFG: 1
  • Sampler: Euler
  • Scheduler: Simple

Organization and Automation

Group and automate to keep the workflow clean:

  • Settings
  • Load Video + Controller (DW Pose + Pixel Perfect Resolution)
  • Load Reference Image (Clip Vision)
  • Load Models + Prompts (WAN 2.2 GGUF + LoRAs + UMT5-XL FP8 + VAE)
  • Sampling (WAN Animate → KSampler → Trim → VAE Decode → Video Combine)
  • Extension 1 (offset and stitch)
  • Extension 2 (continual motion and stitch)
  • Frame Interpolation (RIFE → Video Combine @ 32 fps)

Use Set/Get nodes for:

  • width, height
  • input video
  • pose video
  • reference image
  • clip vision output
  • model
  • positive prompt
  • negative prompt
  • VAE
  • output video
  • continue motion (for Extension 2)
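
If it helps to reason about the pattern, the Set/Get constants behave like a small named registry laid over the graph. The snippet below is only an analogy for how the workflow is wired, not how the nodes are implemented.

```python
# Analogy only: Set/Get constants act like a key-value registry over the graph.
registry: dict[str, object] = {}

def set_const(name: str, value: object) -> object:
    registry[name] = value  # what a Set node does with its input
    return value

def get_const(name: str) -> object:
    return registry[name]   # what a Get node feeds downstream

set_const("width", 720)
set_const("height", 1280)
print(get_const("width"), get_const("height"))  # 720 1280
```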

Step-by-Step Quick Guide

1) Models and Nodes

  • Download and place:
    • WAN 2.2 Animate (GGUF) → models/unet
    • Light X2V LoRA (image-to-video, rank 64) → models/loras
    • WAN Animate Relight LoRA → models/loras
    • UMT5-XL FP8 → models/text_encoders
    • WAN 2.1 VAE → models/vae
    • Clip Vision H → models/clip_vision
  • Update ComfyUI
  • Install custom nodes:
    • ControlNet Preprocessor
    • KJ Nodes
    • Video Helper Suite
    • Use Everywhere
    • ComfyUI Easy

2) Global Settings and Video

  • Primitives: width=720, height=1280 → Set constants
  • Load Video (16 fps, format=1) → Upscale (Lanczos, center crop) → Set “input video”

3) Pose Extraction

  • DW Pose Estimator (Face: disabled)
  • Pixel Perfect Resolution (match width/height)
  • Set “pose video”

4) Reference Image and Clip Vision

  • Load Image → Clip Vision Encode
  • Clip Vision Loader: clip_vision_h (Crop: None)
  • Set: “reference image” and “clip vision output”

5) Models and Prompts

  • UNet Loader GGUF → WAN 2.2 Animate
  • LoRA Loader (Model Only) → Light X2V; duplicate → WAN Animate Relight
  • Model Sampling SD3 (Shift: 8)
  • Load Clip → UMT5-XL FP8 (Type: 1)
  • Clip Text Encode: Positive and Negative
  • Load VAE → WAN 2.1 VAE
  • Set constants: “model”, “positive prompt”, “negative prompt”, “VAE”

6) Assemble and Sample

  • WAN Animate to Video:
    • Connect positive, negative, VAE, clip vision output, reference image, pose video, width, height
    • Length: 65
  • KSampler:
    • Seed: Fixed; Steps: 6; CFG: 1; Sampler: Euler; Scheduler: Simple
  • Trim Video Latent → VAE Decode (Get VAE)
  • Video Combine (16 fps, H.264, CRF 15)
  • Set: “output video”
  • Run once to verify

7) Prompting

  • Add a descriptive positive prompt
  • Include negative prompts for artifacts
  • Run and compare

8) Extension 1

  • Copy Sampling group
  • Connect Video Frame Offset from first WAN Animate → second WAN Animate
  • Batch Image: Image 1 (Get “output video”), Image 2 (new VAE Decode)
  • RIFE → Video Combine (32 fps, save true)
  • Run to extend and smooth

9) Extension 2

  • Copy Extension 1 and paste as Extension 2
  • WAN Animate Strength: 77
  • Video Frame Offset: from Extension 1
  • Set from Extension 1 VAE Decode → “continue motion”; Get in Extension 2
  • Batch Image: stitch Extension 1 with Extension 2
  • RIFE → Video Combine (32 fps, new name)
  • Run and finalize

File and Folder Overview

| Component | Recommended File | Folder |
| --- | --- | --- |
| WAN 2.2 Animate (GGUF) | Q4_K_M or similar | models/unet |
| Light X2V LoRA | Light X2V (image-to-video, rank 64) | models/loras |
| WAN Animate Relight LoRA | Relight LoRA | models/loras |
| UMT5-XL FP8 | umt5-xl-fp8 | models/text_encoders |
| WAN 2.1 VAE | VAE 2.1 | models/vae |
| Clip Vision H | clip_vision_h | models/clip_vision |

Final Notes

  • Keep clips under ~10 seconds for best consistency. Extending past the motion available in the source video can cause pauses.
  • Length=65 produced reliable motion transfer across tests; you can experiment if you need different pacing.
  • Matching the first reference frame to the source video helps, but it’s not mandatory. The setup still works with a different starting frame.
  • The modular grouping and Set/Get constants keep the canvas clean and make adjustments fast.

This is a complete, reproducible setup to animate images with WAN 2.2 in ComfyUI using GGUF, extend durations, and output smooth videos at 32 fps.
