Co-Speech Gesture Video Generation
with Implicit Motion-Audio Entanglement

CVPR 2025


Videos for Ablation Studies

On this page, we present videos from the ablation studies, showing how removing individual model components affects overall performance and visual quality.

The incomplete model variants suffer from low visual quality, backgrounds inconsistent with the reference image, distorted hands, extra fingers, and hands that appear detached from the body. Their generated videos also show significant temporal inconsistency, including severe motion jitter.

[Ablation video grid: 11 examples, each with columns Ref Img | W/o Ref | W/o Motion | W/o First Stage | W/o Slow-Fast | Ours]