Paper page - 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

https://hjrphoebus.github.io/3DiMo/

\n","updatedAt":"2026-02-04T03:59:57.810Z","author":{"_id":"66a356f2f7352f4ffbb4e74a","avatarUrl":"/avatars/b905cc8719eb31b29e8a94717219a79b.svg","fullname":"He","name":"Phoebux","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.37316057085990906},"editors":["Phoebux"],"editorAvatarUrls":["/avatars/b905cc8719eb31b29e8a94717219a79b.svg"],"reactions":[],"isReport":false}},{"id":"6982d40a291c5930ff3cb2bd","author":{"_id":"65684c80a9a1a6a50d779f58","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65684c80a9a1a6a50d779f58/it534ZdH5LxRub1M_o3uM.jpeg","fullname":"Silin Chen","name":"Silin-Chen","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":1,"isUserFollowing":false},"createdAt":"2026-02-04T05:07:22.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"小黑子","html":"

小黑子

\n","updatedAt":"2026-02-04T05:07:22.982Z","author":{"_id":"65684c80a9a1a6a50d779f58","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65684c80a9a1a6a50d779f58/it534ZdH5LxRub1M_o3uM.jpeg","fullname":"Silin Chen","name":"Silin-Chen","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":1,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"zh","probability":0.630032479763031},"editors":["Silin-Chen"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/65684c80a9a1a6a50d779f58/it534ZdH5LxRub1M_o3uM.jpeg"],"reactions":[{"reaction":"👍","users":["fengyanzi","morphism42","micoraea","upyzwup","imzhengzx","fzxhhh","qizekun","Yann0122","ASHIDAKA","zcyeee"],"count":10}],"isReport":false}},{"id":"6982d64a73adc5323477847b","author":{"_id":"60efa4da8432bc401cd0abc6","avatarUrl":"/avatars/3c8d8db9aaa5bd5fd1f870ac0a6b655a.svg","fullname":"Changze Lv","name":"fdu-lcz","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":1,"isUserFollowing":false},"createdAt":"2026-02-04T05:16:58.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"小黑子,已取餐","html":"

小黑子,已取餐

\n","updatedAt":"2026-02-04T05:16:58.629Z","author":{"_id":"60efa4da8432bc401cd0abc6","avatarUrl":"/avatars/3c8d8db9aaa5bd5fd1f870ac0a6b655a.svg","fullname":"Changze Lv","name":"fdu-lcz","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":1,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"zh","probability":0.9966009855270386},"editors":["fdu-lcz"],"editorAvatarUrls":["/avatars/3c8d8db9aaa5bd5fd1f870ac0a6b655a.svg"],"reactions":[],"isReport":false}},{"id":"6983bea7036f5289e4767ae2","author":{"_id":"645a7855c4acfcf66401046b","avatarUrl":"/avatars/9e395c981b9afaae9a9284003f90b212.svg","fullname":"hz.Lin","name":"ahhhhhha","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2026-02-04T21:48:23.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"坤?","html":"

坤?

\n","updatedAt":"2026-02-04T21:48:23.698Z","author":{"_id":"645a7855c4acfcf66401046b","avatarUrl":"/avatars/9e395c981b9afaae9a9284003f90b212.svg","fullname":"hz.Lin","name":"ahhhhhha","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"zh","probability":0.9623088240623474},"editors":["ahhhhhha"],"editorAvatarUrls":["/avatars/9e395c981b9afaae9a9284003f90b212.svg"],"reactions":[],"isReport":false}},{"id":"6983f4c6ba30450e91182557","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false},"createdAt":"2026-02-05T01:39:18.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. \n\nThe following papers were recommended by the Semantic Scholar API \n\n* [DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning](https://huggingface.co/papers/2601.21716) (2026)\n* [FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint](https://huggingface.co/papers/2512.11645) (2025)\n* [STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits](https://huggingface.co/papers/2512.13247) (2025)\n* [Beyond Inpainting: Unleash 3D Understanding for Precise Camera-Controlled Video Generation](https://huggingface.co/papers/2601.10214) (2026)\n* [Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis](https://huggingface.co/papers/2601.14253) (2026)\n* [CETCAM: Camera-Controllable Video Generation via Consistent and Extensible Tokenization](https://huggingface.co/papers/2512.19020) (2025)\n* [EgoReAct: Egocentric Video-Driven 3D Human Reaction Generation](https://huggingface.co/papers/2512.22808) (2025)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

\n

The following papers were recommended by the Semantic Scholar API

\n\n

Please give a thumbs up to this comment if you found it helpful!

\n

If you want recommendations for any Paper on Hugging Face checkout this Space

\n

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: \n\n@librarian-bot\n\t recommend

\n","updatedAt":"2026-02-05T01:39:18.468Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.6647888422012329},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}},{"id":"69873bd023830cd9f863be3a","author":{"_id":"663f0504df93b468040b6f12","avatarUrl":"/avatars/d19c6c7082e2ee8bce2ef7c205adc794.svg","fullname":"theshy","name":"shyuc","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false},"createdAt":"2026-02-07T13:19:12.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"怎么还在这黑我家哥哥\n","html":"

怎么还在这黑我家哥哥

\n","updatedAt":"2026-02-07T13:19:12.590Z","author":{"_id":"663f0504df93b468040b6f12","avatarUrl":"/avatars/d19c6c7082e2ee8bce2ef7c205adc794.svg","fullname":"theshy","name":"shyuc","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"zh","probability":0.9665306806564331},"editors":["shyuc"],"editorAvatarUrls":["/avatars/d19c6c7082e2ee8bce2ef7c205adc794.svg"],"reactions":[],"isReport":false}},{"id":"69878356410898e152aa4046","author":{"_id":"65243980050781c16f234f1f","avatarUrl":"/avatars/743a009681d5d554c27e04300db9f267.svg","fullname":"Avi","name":"avahal","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":3,"isUserFollowing":false},"createdAt":"2026-02-07T18:24:22.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/3d-aware-implicit-motion-control-for-view-adaptive-human-video-generation-5023-55d74565\n- Executive Summary\n- Detailed Breakdown\n- Practical Applications","html":"

arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/3d-aware-implicit-motion-control-for-view-adaptive-human-video-generation-5023-55d74565

\n
    \n
  • Executive Summary
  • \n
  • Detailed Breakdown
  • \n
  • Practical Applications
  • \n
\n","updatedAt":"2026-02-07T18:24:22.208Z","author":{"_id":"65243980050781c16f234f1f","avatarUrl":"/avatars/743a009681d5d554c27e04300db9f267.svg","fullname":"Avi","name":"avahal","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":3,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7032208442687988},"editors":["avahal"],"editorAvatarUrls":["/avatars/743a009681d5d554c27e04300db9f267.svg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2602.03796","authors":[{"_id":"6982ba459084cb4f0ecb56b0","name":"Zhixue Fang","hidden":false},{"_id":"6982ba459084cb4f0ecb56b1","name":"Xu He","hidden":false},{"_id":"6982ba459084cb4f0ecb56b2","name":"Songlin Tang","hidden":false},{"_id":"6982ba459084cb4f0ecb56b3","name":"Haoxian Zhang","hidden":false},{"_id":"6982ba459084cb4f0ecb56b4","name":"Qingfeng Li","hidden":false},{"_id":"6982ba459084cb4f0ecb56b5","name":"Xiaoqiang Liu","hidden":false},{"_id":"6982ba459084cb4f0ecb56b6","name":"Pengfei Wan","hidden":false},{"_id":"6982ba459084cb4f0ecb56b7","name":"Kun Gai","hidden":false}],"mediaUrls":["https://cdn-uploads.huggingface.co/production/uploads/66a356f2f7352f4ffbb4e74a/eYfmS5epN6LuK7w-itRi8.mp4"],"publishedAt":"2026-02-03T17:59:09.000Z","submittedOnDailyAt":"2026-02-04T01:18:19.937Z","title":"3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation","submittedOnDailyBy":{"_id":"66a356f2f7352f4ffbb4e74a","avatarUrl":"/avatars/b905cc8719eb31b29e8a94717219a79b.svg","isPro":false,"fullname":"He","user":"Phoebux","type":"user"},"summary":"Existing methods for human motion control in video generation typically rely on either 2D poses or explicit 3D parametric models (e.g., SMPL) as control signals. However, 2D poses rigidly bind motion to the driving viewpoint, precluding novel-view synthesis. Explicit 3D models, though structurally informative, suffer from inherent inaccuracies (e.g., depth ambiguity and inaccurate dynamics) which, when used as a strong constraint, override the powerful intrinsic 3D awareness of large-scale video generators. In this work, we revisit motion control from a 3D-aware perspective, advocating for an implicit, view-agnostic motion representation that naturally aligns with the generator's spatial priors rather than depending on externally reconstructed constraints. We introduce 3DiMo, which jointly trains a motion encoder with a pretrained video generator to distill driving frames into compact, view-agnostic motion tokens, injected semantically via cross-attention. To foster 3D awareness, we train with view-rich supervision (i.e., single-view, multi-view, and moving-camera videos), forcing motion consistency across diverse viewpoints. Additionally, we use auxiliary geometric supervision that leverages SMPL only for early initialization and is annealed to zero, enabling the model to transition from external 3D guidance to learning genuine 3D spatial motion understanding from the data and the generator's priors. 
Experiments confirm that 3DiMo faithfully reproduces driving motions with flexible, text-driven camera control, significantly surpassing existing methods in both motion fidelity and visual quality.","upvotes":57,"discussionId":"6982ba459084cb4f0ecb56b8","ai_summary":"3DiMo enables view-agnostic human motion control in video generation by training a motion encoder alongside a pretrained video generator to distill driving frames into compact motion tokens that align with the generator's spatial priors.","ai_keywords":["motion encoder","video generator","motion tokens","cross-attention","view-rich supervision","geometric supervision","SMPL","camera control","motion fidelity","visual quality"],"organization":{"_id":"662c559b322afcbae51b3c8b","name":"KlingTeam","fullname":"Kling Team","avatar":"https://cdn-uploads.huggingface.co/production/uploads/60e272ca6c78a8c122b12127/ZQV1aKLUDPf2rUcxxAqj6.jpeg"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"662f93942510ef5735d7ad00","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/662f93942510ef5735d7ad00/ZIDIPm63sncIHFTT5b0uR.png","isPro":false,"fullname":"magicwpf","user":"magicwpf","type":"user"},{"_id":"66a356f2f7352f4ffbb4e74a","avatarUrl":"/avatars/b905cc8719eb31b29e8a94717219a79b.svg","isPro":false,"fullname":"He","user":"Phoebux","type":"user"},{"_id":"646f3418a6a58aa29505fd30","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/646f3418a6a58aa29505fd30/1z13rnpb6rsUgQsYumWPg.png","isPro":false,"fullname":"QINGHE WANG","user":"Qinghew","type":"user"},{"_id":"651ed7ef755e92f7f12742e6","avatarUrl":"/avatars/57a9cc189b4a59299aad6c96191b18d8.svg","isPro":false,"fullname":"yu li","user":"lyabc","type":"user"},{"_id":"64f94370c3c12b377cc51086","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64f94370c3c12b377cc51086/6CXcHhqAoykqXcShqM8Rd.jpeg","isPro":false,"fullname":"Minghong Cai","user":"onevfall","type":"user"},{"_id":"66078994c50f8393c56ed837","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/aYYde45zaFACRllyEhJyU.jpeg","isPro":false,"fullname":"Tianrui Zhu","user":"xilluill","type":"user"},{"_id":"637cba13b8e573d75be96ea6","avatarUrl":"/avatars/5eca230e63d66947b2a05c1ff964a96c.svg","isPro":false,"fullname":"Nina","user":"NinaKarine","type":"user"},{"_id":"6039478ab3ecf716b1a5fd4d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6039478ab3ecf716b1a5fd4d/_Thy4E7taiSYBLKxEKJbT.jpeg","isPro":true,"fullname":"taesiri","user":"taesiri","type":"user"},{"_id":"66743477ab975c859114d410","avatarUrl":"/avatars/ac692cc336e383fb2cb53db6d1e3fe8c.svg","isPro":false,"fullname":"yawenluo","user":"yawenluo","type":"user"},{"_id":"66d8539136aa505569fbd0b2","avatarUrl":"/avatars/bd2681946de858a342bab3e9a16dde1a.svg","isPro":false,"fullname":"jyyyyy","user":"jyyyyy67","type":"user"},{"_id":"646f38348180f35af535b03a","avatarUrl":"/avatars/2fc94ac35fc80d83219778ba29e815a9.svg","isPro":true,"fullname":"Du Ricky","user":"sddwt","type":"user"},{"_id":"66a0b434968fb59eb2434a0a","avatarUrl":"/avatars/67538f97ebf1995c530563f411d3cd95.svg","isPro":false,"fullname":"Zixuan Ye","user":"zixuan-ye","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0,"organization":{"_id":"662c559b322afcbae51b3c8b","name":"KlingTeam","fullname":"Kling Team","avatar":"https://cdn-uploads.huggingface.co/production/uploads/60e272ca6c78a8c122b12127/ZQV1aKLUDPf2rUcxxAqj6.jpeg"}}">
arxiv:2602.03796

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

Published on Feb 3 · Submitted by He on Feb 4
Authors: Zhixue Fang, Xu He, Songlin Tang, Haoxian Zhang, Qingfeng Li, Xiaoqiang Liu, Pengfei Wan, Kun Gai

AI-generated summary

3DiMo enables view-agnostic human motion control in video generation by training a motion encoder alongside a pretrained video generator to distill driving frames into compact motion tokens that align with the generator's spatial priors.

Abstract

Existing methods for human motion control in video generation typically rely on either 2D poses or explicit 3D parametric models (e.g., SMPL) as control signals. However, 2D poses rigidly bind motion to the driving viewpoint, precluding novel-view synthesis. Explicit 3D models, though structurally informative, suffer from inherent inaccuracies (e.g., depth ambiguity and inaccurate dynamics) which, when used as a strong constraint, override the powerful intrinsic 3D awareness of large-scale video generators. In this work, we revisit motion control from a 3D-aware perspective, advocating for an implicit, view-agnostic motion representation that naturally aligns with the generator's spatial priors rather than depending on externally reconstructed constraints. We introduce 3DiMo, which jointly trains a motion encoder with a pretrained video generator to distill driving frames into compact, view-agnostic motion tokens, injected semantically via cross-attention. To foster 3D awareness, we train with view-rich supervision (i.e., single-view, multi-view, and moving-camera videos), forcing motion consistency across diverse viewpoints. Additionally, we use auxiliary geometric supervision that leverages SMPL only for early initialization and is annealed to zero, enabling the model to transition from external 3D guidance to learning genuine 3D spatial motion understanding from the data and the generator's priors. Experiments confirm that 3DiMo faithfully reproduces driving motions with flexible, text-driven camera control, significantly surpassing existing methods in both motion fidelity and visual quality.
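
To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch, not the authors' released code: a hypothetical motion encoder distills driving-frame features into a small set of view-agnostic motion tokens, a cross-attention layer injects those tokens into the generator's video latents, and the auxiliary SMPL-based geometric loss weight is annealed to zero over training. All module names, dimensions, and the cosine schedule are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation) of the two mechanisms the
# abstract describes: (1) a motion encoder distilling driving frames into
# compact motion tokens injected via cross-attention, and (2) an auxiliary
# geometric (SMPL-based) loss weight annealed to zero during training.
import math
import torch
import torch.nn as nn


class MotionEncoder(nn.Module):
    """Distills driving-frame features into a compact set of view-agnostic motion tokens."""

    def __init__(self, frame_dim=1024, token_dim=1024, num_tokens=16, num_layers=4):
        super().__init__()
        # Learned query tokens that will attend to the driving-frame features.
        self.queries = nn.Parameter(torch.randn(num_tokens, token_dim) * 0.02)
        self.proj = nn.Linear(frame_dim, token_dim)
        layer = nn.TransformerDecoderLayer(token_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)

    def forward(self, frame_feats):                    # (B, T*P, frame_dim) patch features
        ctx = self.proj(frame_feats)                   # (B, T*P, token_dim)
        q = self.queries.unsqueeze(0).expand(frame_feats.size(0), -1, -1)
        return self.decoder(q, ctx)                    # (B, num_tokens, token_dim) motion tokens


class MotionCrossAttention(nn.Module):
    """Injects motion tokens into a generator block: video latents attend to the tokens."""

    def __init__(self, latent_dim=1024, token_dim=1024, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(latent_dim, num_heads,
                                          kdim=token_dim, vdim=token_dim,
                                          batch_first=True)
        self.norm = nn.LayerNorm(latent_dim)

    def forward(self, video_latents, motion_tokens):   # (B, N, latent_dim), (B, K, token_dim)
        out, _ = self.attn(self.norm(video_latents), motion_tokens, motion_tokens)
        return video_latents + out                     # residual injection


def geo_loss_weight(step, anneal_steps=10_000):
    """Auxiliary SMPL-based geometric loss weight, cosine-annealed from 1 to 0 (assumed schedule)."""
    if step >= anneal_steps:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * step / anneal_steps))


if __name__ == "__main__":
    enc, inject = MotionEncoder(), MotionCrossAttention()
    frames = torch.randn(2, 8 * 64, 1024)              # 8 driving frames x 64 patches each
    latents = torch.randn(2, 256, 1024)                # video latents inside one generator block
    tokens = enc(frames)
    latents = inject(latents, tokens)
    print(latents.shape, geo_loss_weight(5_000))
```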

Community

Paper submitter
edited 17 days ago

TL;DR: 3DiMo can faithfully transfer genuine 3D motion from a given driving video to a reference character, while enabling flexible free-view camera control.

Paper submitter

Project page: https://hjrphoebus.github.io/3DiMo/

Little hater

Little hater, order's been picked up

Kun?

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

  • DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning (2026): https://huggingface.co/papers/2601.21716
  • FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint (2025): https://huggingface.co/papers/2512.11645
  • STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits (2025): https://huggingface.co/papers/2512.13247
  • Beyond Inpainting: Unleash 3D Understanding for Precise Camera-Controlled Video Generation (2026): https://huggingface.co/papers/2601.10214
  • Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis (2026): https://huggingface.co/papers/2601.14253
  • CETCAM: Camera-Controllable Video Generation via Consistent and Extensible Tokenization (2025): https://huggingface.co/papers/2512.19020
  • EgoReAct: Egocentric Video-Driven 3D Human Reaction Generation (2025): https://huggingface.co/papers/2512.22808

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Why are you still hating on my bro here?

arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/3d-aware-implicit-motion-control-for-view-adaptive-human-video-generation-5023-55d74565

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.03796 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.03796 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.03796 in a Space README.md to link it from this page.

Collections including this paper 6