Paper page - DepthMaster: Taming Diffusion Models for Monocular Depth Estimation


arxiv:2501.02576

DepthMaster: Taming Diffusion Models for Monocular Depth Estimation

Published on Jan 5, 2025
· Submitted by Ruijie Zhu on Jan 7, 2025

Abstract

AI-generated summary

DepthMaster, a single-step diffusion model with Feature Alignment and Fourier Enhancement modules, achieves state-of-the-art performance in monocular depth estimation by balancing generative and discriminative features.

Monocular depth estimation within the diffusion-denoising paradigm demonstrates impressive generalization ability but suffers from low inference speed. Recent methods adopt a single-step deterministic paradigm to improve inference efficiency while maintaining comparable performance. However, they overlook the gap between generative and discriminative features, leading to suboptimal results. In this work, we propose DepthMaster, a single-step diffusion model designed to adapt generative features for the discriminative depth estimation task. First, to mitigate overfitting to texture details introduced by generative features, we propose a Feature Alignment module, which incorporates high-quality semantic features to enhance the denoising network's representation capability. Second, to address the lack of fine-grained details in the single-step deterministic framework, we propose a Fourier Enhancement module to adaptively balance low-frequency structure and high-frequency details. We adopt a two-stage training strategy to fully leverage the potential of the two modules. In the first stage, we focus on learning the global scene structure with the Feature Alignment module, while in the second stage, we exploit the Fourier Enhancement module to improve the visual quality. Through these efforts, our model achieves state-of-the-art performance in terms of generalization and detail preservation, outperforming other diffusion-based methods across various datasets. Our project page can be found at https://indu1ge.github.io/DepthMaster_page.
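The abstract describes the Fourier Enhancement module as adaptively balancing low-frequency structure against high-frequency details. As a rough intuition for what frequency-domain reweighting of a depth map means, here is a minimal NumPy sketch. This is not the paper's module: DepthMaster learns the balance inside the network, whereas this sketch uses fixed, hand-chosen weights and a hard radial cutoff, and the function name and parameters are illustrative.

```python
import numpy as np

def fourier_enhance(depth, low_weight=1.0, high_weight=1.5, cutoff=0.1):
    """Reweight low- vs high-frequency components of a 2-D map.

    Illustrative sketch only: a hard radial mask splits the spectrum into
    a low-frequency band (global structure) and a high-frequency band
    (fine details), and each band is scaled by a fixed weight.
    """
    h, w = depth.shape
    # Centered 2-D spectrum of the input map
    spec = np.fft.fftshift(np.fft.fft2(depth))
    # Radial frequency grid in cycles/sample, matching the shifted spectrum
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    # Scale the two bands and transform back to the spatial domain
    weights = np.where(radius <= cutoff, low_weight, high_weight)
    return np.fft.ifft2(np.fft.ifftshift(spec * weights)).real
```

With `high_weight > 1.0` this sharpens fine detail at fixed global structure; with both weights equal to 1.0 it is the identity. The learned version in the paper would replace the fixed mask and weights with predicted, content-dependent ones.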

Community

Paper author Paper submitter

https://indu1ge.github.io/DepthMaster_page

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation (2024): https://huggingface.co/papers/2412.00671
* Amodal Depth Anything: Amodal Depth Estimation in the Wild (2024): https://huggingface.co/papers/2412.02336
* PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation (2025): https://huggingface.co/papers/2501.01121
* MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation (2024): https://huggingface.co/papers/2411.10886
* SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation (2024): https://huggingface.co/papers/2411.18229
* Balancing Shared and Task-Specific Representations: A Hybrid Approach to Depth-Aware Video Panoptic Segmentation (2024): https://huggingface.co/papers/2412.07966
* GVDepth: Zero-Shot Monocular Depth Estimation for Ground Vehicles based on Probabilistic Cue Fusion (2024): https://huggingface.co/papers/2412.06080

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Great work!

Is there a plan to release training code for this paper, or is the plan to continue to optimize the parameter count?

·
Paper author

Thank you for your interest in our work! The code is currently undergoing a confidentiality review, and we will release it soon. In addition, we do plan to further optimize the parameter count.


Models citing this paper 1

Datasets citing this paper 0


Spaces citing this paper 1

Collections including this paper 4