Paper page - ShapeGen4D: Towards High Quality 4D Shape Generation from Videos
Project page: https://shapegen4d.github.io/
Authors: Jiraphon Yenphraphai, Ashkan Mirzaei, Jianqi Chen, Jiaxu Zou, Sergey Tulyakov, Raymond A. Yeh, Peter Wonka, Chaoyang Wang
Published: 2025-10-07 (arXiv 2510.06208)

Librarian Bot (automated): I found the following papers similar to this paper, recommended by the Semantic Scholar API:

* VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models (https://huggingface.co/papers/2509.17985) (2025)
* UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation (https://huggingface.co/papers/2509.25079) (2025)
* FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction (https://huggingface.co/papers/2509.21657) (2025)
* WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving (https://huggingface.co/papers/2509.23402) (2025)
* PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos (https://huggingface.co/papers/2509.25183) (2025)
* Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation (https://huggingface.co/papers/2509.19296) (2025)
* Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation (https://huggingface.co/papers/2509.10687) (2025)
AI-generated summary

A video-to-4D shape generation framework uses temporal attention, time-aware point sampling, and noise sharing to produce dynamic 3D representations from videos, enhancing temporal stability and perceptual fidelity.

Abstract
Video-conditioned 4D shape generation aims to recover time-varying 3D
geometry and view-consistent appearance directly from an input video. In this
work, we introduce a native video-to-4D shape generation framework that
synthesizes a single dynamic 3D representation end-to-end from the video. Our
framework introduces three key components based on large-scale pre-trained 3D
models: (i) a temporal attention mechanism that conditions generation on all
frames while producing a time-indexed dynamic representation; (ii) time-aware
point sampling and 4D latent anchoring, which promote temporally consistent geometry
and texture; and (iii) noise sharing across frames to enhance temporal
stability. Our method accurately captures non-rigid motion, volume changes, and
even topological transitions without per-frame optimization. Across diverse
in-the-wild videos, our method improves robustness and perceptual fidelity and
reduces failure modes compared with the baselines.
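
The abstract describes these mechanisms only at a high level, and no code accompanies this page. As a rough illustration of components (i) and (iii), the sketch below shows, in PyTorch, what attention over the frame axis and a single noise draw shared across frames could look like. Every class name, tensor shape, and structural choice here is an assumption made for illustration, not the authors' implementation.

```python
# Hypothetical sketch only: names, shapes, and structure are illustrative
# assumptions, not the ShapeGen4D implementation.
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Component (i), loosely: self-attention along the frame axis, so each
    frame's tokens can condition on every other frame."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens, dim). Fold tokens into the batch so the
        # attention sequence dimension becomes the frame axis.
        b, f, n, d = x.shape
        h = x.permute(0, 2, 1, 3).reshape(b * n, f, d)
        q = self.norm(h)
        h = h + self.attn(q, q, q, need_weights=False)[0]  # residual update
        return h.reshape(b, n, f, d).permute(0, 2, 1, 3)


def shared_initial_noise(b: int, f: int, n: int, d: int) -> torch.Tensor:
    """Component (iii), loosely: sample one noise tensor and broadcast it to
    all frames, so every per-frame latent starts from the same realization."""
    eps = torch.randn(b, 1, n, d)
    return eps.expand(b, f, n, d).clone()


if __name__ == "__main__":
    x = shared_initial_noise(b=1, f=8, n=64, d=128)  # identical noise per frame
    y = TemporalAttention(dim=128)(x)
    print(y.shape)  # torch.Size([1, 8, 64, 128])
```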
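Component (ii), 4D latent anchoring, is not specified in detail in the abstract. One plausible reading is that per-frame latents are softly tied to a shared canonical latent so that geometry and texture stay consistent over time; the hypothetical sketch below illustrates only that consistency idea and should not be taken as the paper's method.

```python
# Hypothetical reading of "4D latent anchoring": softly tie each frame's
# latent to one shared canonical latent. The paper may do something
# different; this only illustrates the temporal-consistency idea.
import torch


def anchor_to_canonical(per_frame: torch.Tensor, weight: float = 0.5) -> torch.Tensor:
    """Blend each frame latent toward the mean (canonical) latent.

    per_frame: (batch, frames, tokens, dim)
    weight:    0 keeps frames unchanged, 1 collapses them onto the canonical.
    """
    canonical = per_frame.mean(dim=1, keepdim=True)  # shared anchor across time
    return (1.0 - weight) * per_frame + weight * canonical


if __name__ == "__main__":
    z = torch.randn(1, 8, 64, 128)
    z_anchored = anchor_to_canonical(z, weight=0.3)
    # Frame-to-frame variance shrinks, i.e. the latents become more
    # temporally consistent at the cost of per-frame flexibility.
    print(z.var(dim=1).mean().item(), z_anchored.var(dim=1).mean().item())
```

A fixed blend like this trades per-frame flexibility for temporal consistency; a learned model would presumably learn this coupling rather than use a constant weight.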