LongCat-Video-Avatar 1.5
Technical Report · Code · 🤗 Hugging Face
An expressive avatar model built upon LongCat-Video
Introduce what LongCat-Video-Avatar 1.5 is, why it upgrades the 1.0 release, and how the model combines stable avatar generation with practical inference speed.
Demonstrate stronger mouth-shape accuracy, smooth expression transitions, identity consistency, and coherent full-body motion across long speaking shots and hand-object interactions.
Singing examples for dynamic motion, musical expression, and stable full-body or upper-body performance.
Animation examples with expressive motion, stylized characters, and stable audio-driven performance.
Multi-speaker and group interaction cases with stable identities and natural turn-taking behavior.
Compare LongCat-Video-Avatar 1.5 with HeyGen, Kling Avatar 2.0, and OmniHuman-1.5 on the same or similar inputs, focusing on stability, consistency, and natural lip motion.
Highlight the upgrade from 1.0 to 1.5: better mouth-shape accuracy, stronger long-video identity preservation, broader interactive scenarios, and faster 8-step generation.
Some of the images and audio clips are derived from real videos solely to demonstrate the capabilities of this research, e.g., expressions, gestures, and naturalness. The generated content is for academic use only; commercial use is not permitted. If you have any concerns, please contact us (zhangyong202303@gmail.com) and we will remove the content promptly.