GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
\n","updatedAt":"2025-04-03T01:33:49.213Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.6748038530349731},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2504.01016","authors":[{"_id":"67ec958ebb1d6dd924f94a31","user":{"_id":"65f8e4778dc7bb5b4db97f92","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/gBmQdovmANmjV72k3gaW8.png","isPro":false,"fullname":"Tian-Xing Xu","user":"slothfulxtx","type":"user"},"name":"Tian-Xing Xu","status":"admin_assigned","statusLastChangedAt":"2025-04-02T09:51:06.226Z","hidden":false},{"_id":"67ec958ebb1d6dd924f94a32","user":{"_id":"64c0953a8137192a1e2474dc","avatarUrl":"/avatars/546405a7eaf2f60ad108ceaa0dda7d08.svg","isPro":false,"fullname":"xiangjun gao","user":"xiangjun0211","type":"user"},"name":"Xiangjun Gao","status":"admin_assigned","statusLastChangedAt":"2025-04-02T09:51:14.978Z","hidden":false},{"_id":"67ec958ebb1d6dd924f94a33","user":{"_id":"657a7458afbb0117ba15c59f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/657a7458afbb0117ba15c59f/8_iwTS1UG_mKnfylFbLsY.jpeg","isPro":false,"fullname":"Wenbo Hu","user":"wbhu-tc","type":"user"},"name":"Wenbo Hu","status":"claimed_verified","statusLastChangedAt":"2025-04-02T08:22:52.313Z","hidden":false},{"_id":"67ec958ebb1d6dd924f94a34","user":{"_id":"66d8284b6bddfb32e77ddafb","avatarUrl":"/avatars/edd371ecef6e7d58b945a30bbc6095ee.svg","isPro":false,"fullname":"Xiaoyu Li","user":"Xiaoyu521","type":"user"},"name":"Xiaoyu Li","status":"claimed_verified","statusLastChangedAt":"2025-09-03T09:06:53.629Z","hidden":false},{"_id":"67ec958ebb1d6dd924f94a35","name":"Song-Hai Zhang","hidden":false},{"_id":"67ec958ebb1d6dd924f94a36","user":{"_id":"63ca3ddc04c979828310bfcb","avatarUrl":"/avatars/615e0d8622950b4408b40d550f02a894.svg","isPro":false,"fullname":"Ying Shan","user":"yshan2u","type":"user"},"name":"Ying Shan","status":"admin_assigned","statusLastChangedAt":"2025-04-02T09:51:45.869Z","hidden":false}],"mediaUrls":["https://cdn-uploads.huggingface.co/production/uploads/657a7458afbb0117ba15c59f/6He3mQcB_AO1G_Sq8xYc0.mp4"],"publishedAt":"2025-04-01T17:58:03.000Z","submittedOnDailyAt":"2025-04-02T00:15:17.585Z","title":"GeometryCrafter: Consistent Geometry Estimation for Open-world Videos\n with Diffusion Priors","submittedOnDailyBy":{"_id":"657a7458afbb0117ba15c59f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/657a7458afbb0117ba15c59f/8_iwTS1UG_mKnfylFbLsY.jpeg","isPro":false,"fullname":"Wenbo Hu","user":"wbhu-tc","type":"user"},"summary":"Despite remarkable advancements in video depth estimation, existing methods\nexhibit inherent limitations in achieving geometric fidelity through the\naffine-invariant predictions, limiting their applicability in reconstruction\nand other metrically grounded downstream tasks. 
We propose GeometryCrafter, a\nnovel framework that recovers high-fidelity point map sequences with temporal\ncoherence from open-world videos, enabling accurate 3D/4D reconstruction,\ncamera parameter estimation, and other depth-based applications. At the core of\nour approach lies a point map Variational Autoencoder (VAE) that learns a\nlatent space agnostic to video latent distributions for effective point map\nencoding and decoding. Leveraging the VAE, we train a video diffusion model to\nmodel the distribution of point map sequences conditioned on the input videos.\nExtensive evaluations on diverse datasets demonstrate that GeometryCrafter\nachieves state-of-the-art 3D accuracy, temporal consistency, and generalization\ncapability.","upvotes":29,"discussionId":"67ec9593bb1d6dd924f94b3e","projectPage":"https://geometrycrafter.github.io/","githubRepo":"https://github.com/TencentARC/GeometryCrafter","githubRepoAddedBy":"user","ai_summary":"GeometryCrafter uses a point map VAE and video diffusion model to estimate high-fidelity, temporally coherent depth maps from open-world videos, enhancing 3D reconstruction and camera parameter estimation.","ai_keywords":["point map VAE","video diffusion model","high-fidelity point maps","temporal coherence","3D/4D reconstruction","camera parameter estimation","latent space","distribution modeling"],"githubStars":431},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"657a7458afbb0117ba15c59f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/657a7458afbb0117ba15c59f/8_iwTS1UG_mKnfylFbLsY.jpeg","isPro":false,"fullname":"Wenbo Hu","user":"wbhu-tc","type":"user"},{"_id":"67e9fc3797cd6860c81d5838","avatarUrl":"/avatars/6c37731156bf52c123bd390823890d28.svg","isPro":false,"fullname":"Jangho Park","user":"jhpark96","type":"user"},{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"65f8e4778dc7bb5b4db97f92","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/gBmQdovmANmjV72k3gaW8.png","isPro":false,"fullname":"Tian-Xing Xu","user":"slothfulxtx","type":"user"},{"_id":"6342796a0875f2c99cfd313b","avatarUrl":"/avatars/98575092404c4197b20c929a6499a015.svg","isPro":false,"fullname":"Yuseung \"Phillip\" Lee","user":"phillipinseoul","type":"user"},{"_id":"635964636a61954080850e1d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/635964636a61954080850e1d/0bfExuDTrHTtm8c-40cDM.png","isPro":false,"fullname":"William Lamkin","user":"phanes","type":"user"},{"_id":"63c5d43ae2804cb2407e4d43","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1673909278097-noauth.png","isPro":false,"fullname":"xziayro","user":"xziayro","type":"user"},{"_id":"6506b77a773ceaa8d52ecea1","avatarUrl":"/avatars/0e769a0795063e1491c44760a4a83097.svg","isPro":false,"fullname":"CJH","user":"Howe666","type":"user"},{"_id":"638f308fc4444c6ca870b60a","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/638f308fc4444c6ca870b60a/Q11NK-8-JbiilJ-vk2LAR.png","isPro":true,"fullname":"Linoy 
Tsaban","user":"linoyts","type":"user"},{"_id":"66fa882fa9312392f2d5a6e0","avatarUrl":"/avatars/b4fe2141925bc187636f2effa45d0cba.svg","isPro":false,"fullname":"BeeFX","user":"BeeFX","type":"user"},{"_id":"648eb1eb59c4e5c87dc116e0","avatarUrl":"/avatars/c636cea39c2c0937f01398c94ead5dad.svg","isPro":false,"fullname":"fdsqefsgergd","user":"T-representer","type":"user"},{"_id":"634dffc49b777beec3bc6448","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1670144568552-634dffc49b777beec3bc6448.jpeg","isPro":false,"fullname":"Zhipeng Yang","user":"svjack","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
AI-generated summary
GeometryCrafter uses a point map VAE and a video diffusion model to estimate high-fidelity, temporally coherent point maps from open-world videos, enhancing 3D reconstruction and camera parameter estimation.

Abstract
Despite remarkable advancements in video depth estimation, existing methods exhibit inherent limitations in geometric fidelity owing to their affine-invariant predictions, which restricts their applicability in reconstruction and other metrically grounded downstream tasks. We propose GeometryCrafter, a novel framework that recovers high-fidelity point map sequences with temporal coherence from open-world videos, enabling accurate 3D/4D reconstruction, camera parameter estimation, and other depth-based applications. At the core of our approach lies a point map Variational Autoencoder (VAE) that learns a latent space agnostic to video latent distributions for effective point map encoding and decoding. Leveraging this VAE, we train a video diffusion model to model the distribution of point map sequences conditioned on the input videos. Extensive evaluations on diverse datasets demonstrate that GeometryCrafter achieves state-of-the-art 3D accuracy, temporal consistency, and generalization capability.
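
For intuition, below is a minimal, self-contained PyTorch sketch of the two-stage design the abstract describes: a point map VAE that encodes and decodes per-frame XYZ point maps, and a conditional video diffusion model that denoises point-map latents given the input video. All module names, shapes, and the denoising update rule are illustrative placeholders, not the authors' actual architecture or API; see the GitHub repository linked above for the real implementation.

```python
# Hypothetical sketch of the pipeline described in the abstract:
# (1) a point map VAE maps per-frame point maps (3-channel XYZ) to a latent space,
# (2) a video diffusion model denoises point-map latents conditioned on the video,
# (3) the VAE decodes the denoised latents back to point maps.
# Every class and function here is an illustrative stand-in, not the authors' code.
import torch
import torch.nn as nn


class PointMapVAE(nn.Module):
    """Toy stand-in for the point map VAE: (B, T, 3, H, W) <-> (B, T, D, h, w)."""

    def __init__(self, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Conv2d(3, latent_dim, kernel_size=8, stride=8)
        self.decoder = nn.ConvTranspose2d(latent_dim, 3, kernel_size=8, stride=8)

    def encode(self, pointmaps: torch.Tensor) -> torch.Tensor:
        b, t = pointmaps.shape[:2]
        z = self.encoder(pointmaps.flatten(0, 1))   # fold time into batch
        return z.unflatten(0, (b, t))                # (B, T, D, h, w)

    def decode(self, latents: torch.Tensor) -> torch.Tensor:
        b, t = latents.shape[:2]
        x = self.decoder(latents.flatten(0, 1))
        return x.unflatten(0, (b, t))                # (B, T, 3, H, W)


class VideoDiffusionDenoiser(nn.Module):
    """Toy denoiser conditioned on video latents via channel concatenation
    (timestep embedding omitted in this sketch)."""

    def __init__(self, latent_dim: int = 8, cond_dim: int = 8):
        super().__init__()
        self.net = nn.Conv2d(latent_dim + cond_dim, latent_dim, 3, padding=1)

    def forward(self, noisy_latents, video_latents, t):
        x = torch.cat([noisy_latents, video_latents], dim=2)  # concat channels
        b, n = x.shape[:2]
        return self.net(x.flatten(0, 1)).unflatten(0, (b, n))


@torch.no_grad()
def infer_pointmaps(video_latents, vae, denoiser, num_steps: int = 25):
    """Naive iterative denoising loop: start from noise, denoise the point-map
    latents conditioned on the video, then decode with the point map VAE."""
    z = torch.randn_like(video_latents)
    for step in reversed(range(num_steps)):
        t = torch.full((z.shape[0],), step, device=z.device)
        pred = denoiser(z, video_latents, t)
        z = z - pred / num_steps                     # placeholder update rule
    return vae.decode(z)


# Example: 2 frames of a 64x64 "video", with latents from some video encoder.
vae = PointMapVAE()
denoiser = VideoDiffusionDenoiser()
video_latents = torch.randn(1, 2, 8, 8, 8)           # (B, T, D, h, w)
pointmaps = infer_pointmaps(video_latents, vae, denoiser)
print(pointmaps.shape)                                # torch.Size([1, 2, 3, 64, 64])
```

In the actual method, the point map VAE's latent space is designed to be agnostic to the video latent distribution; the toy encoder above is only a placeholder for that component and makes no attempt to reproduce it.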