arxiv:2601.17895

Masked Depth Modeling for Spatial Perception

Published on Jan 25 · Submitted by Nan on Jan 27
Authors: Bin Tan, Changjiang Sun, Xiage Qin, Hanat Adai, Zelin Fu, Tianxiang Zhou, Han Zhang, Yinghao Xu, Xing Zhu, Yujun Shen, Nan Xue

Abstract

AI-generated summary

LingBot-Depth is a depth completion model that uses visual context to refine depth maps through masked depth modeling and automated data curation for improved spatial perception in robotics and autonomous systems.

Spatial visual perception is a fundamental requirement in physical-world applications like autonomous driving and robotic manipulation, driven by the need to interact with 3D environments. Capturing pixel-aligned metric depth using RGB-D cameras would be the most viable way, yet it usually faces obstacles posed by hardware limitations and challenging imaging conditions, especially in the presence of specular or texture-less surfaces. In this work, we argue that the inaccuracies from depth sensors can be viewed as "masked" signals that inherently reflect underlying geometric ambiguities. Building on this motivation, we present LingBot-Depth, a depth completion model which leverages visual context to refine depth maps through masked depth modeling and incorporates an automated data curation pipeline for scalable training. It is encouraging to see that our model outperforms top-tier RGB-D cameras in terms of both depth precision and pixel coverage. Experimental results on a range of downstream tasks further suggest that LingBot-Depth offers an aligned latent representation across RGB and depth modalities. We release the code, checkpoint, and 3M RGB-depth pairs (including 2M real data and 1M simulated data) to the community of spatial perception.
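
To make the "masked signal" view concrete, here is a minimal sketch of masked depth modeling, not the authors' implementation: pixels the sensor failed to return, plus a random fraction of valid ones, are hidden from the network, which must reconstruct them from RGB context; the loss is computed only on held-out pixels with known ground truth. All names here (`masked_depth_step`, `drop_ratio`) are illustrative.

```python
import torch
import torch.nn.functional as F


def masked_depth_step(model, rgb, sensor_depth, drop_ratio=0.5):
    """One illustrative training step of masked depth modeling.

    rgb:          (B, 3, H, W) color image
    sensor_depth: (B, 1, H, W) raw metric depth; 0 marks sensor dropout
    """
    valid = sensor_depth > 0                     # pixels the sensor did return
    # Hide a random fraction of the valid pixels so the model must
    # reconstruct depth it cannot simply copy from its input.
    held_out = valid & (torch.rand_like(sensor_depth) < drop_ratio)
    visible = valid & ~held_out

    masked_depth = sensor_depth * visible        # zeros at held-out + invalid
    pred = model(torch.cat([rgb, masked_depth], dim=1))  # (B, 1, H, W)

    # Supervise only pixels that have ground truth but were hidden.
    return F.l1_loss(pred[held_out], sensor_depth[held_out])
```

Any image-to-image network accepting four input channels could stand in for `model` here; the key design choice the abstract describes is that sensor dropout itself plays the role of the mask, so the objective matches what the sensor actually fails to measure.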

Community

Paper submitter

Website: https://technology.robbyant.com/lingbot-depth
Code: https://github.com/Robbyant/lingbot-depth

LingBot-Depth transforms incomplete and noisy depth sensor data into high-quality, metric-accurate 3D measurements. By jointly aligning RGB appearance and depth geometry in a unified latent space, LingBot-Depth serves as a powerful spatial perception foundation for robot learning and 3D vision applications.
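
The released code presumably exposes its own training and inference entry points; the toy sketch below only illustrates the latent-alignment idea described above, under the assumption of a shared patch grid for the two encoders. `TinyEncoder` and the cosine objective are stand-ins, not the paper's architecture or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyEncoder(nn.Module):
    """Toy patch encoder standing in for the real RGB / depth backbones."""

    def __init__(self, in_ch, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, kernel_size=16, stride=16),  # 16x16 patches
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=1),
        )

    def forward(self, x):
        return self.net(x)  # (B, dim, H/16, W/16)


rgb_enc, depth_enc = TinyEncoder(3), TinyEncoder(1)
rgb = torch.randn(2, 3, 224, 224)
depth = torch.randn(2, 1, 224, 224)

# Per-patch alignment: pull the RGB and depth tokens of the same
# spatial patch toward each other in the shared latent space.
z_rgb = F.normalize(rgb_enc(rgb).flatten(2), dim=1)      # (B, dim, N)
z_depth = F.normalize(depth_enc(depth).flatten(2), dim=1)
align_loss = 1 - (z_rgb * z_depth).sum(dim=1).mean()     # mean cosine distance
print(align_loss.item())
```

In the paper this kind of cross-modal alignment is paired with the masked depth reconstruction objective; the sketch isolates the alignment term only.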

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

- Robust Single-shot Structured Light 3D Imaging via Neural Feature Decoding (https://huggingface.co/papers/2512.14028) (2025)
- Pixel-Perfect Visual Geometry Estimation (https://huggingface.co/papers/2601.05246) (2026)
- MT-Depth: Multi-task Instance feature analysis for the Depth Completion (https://huggingface.co/papers/2512.04734) (2025)
- EAG3R: Event-Augmented 3D Geometry Estimation for Dynamic and Extreme-Lighting Scenes (https://huggingface.co/papers/2512.00771) (2025)
- Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation (https://huggingface.co/papers/2512.23705) (2025)
- BokehDepth: Enhancing Monocular Depth Estimation through Bokeh Generation (https://huggingface.co/papers/2512.12425) (2025)
- Geometry-Aware Sparse Depth Sampling for High-Fidelity RGB-D Depth Completion in Robotic Systems (https://huggingface.co/papers/2512.08229) (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper: 4

Datasets citing this paper: 0

No dataset linking this paper

Cite arxiv.org/abs/2601.17895 in a dataset README.md to link it from this page.

Spaces citing this paper: 0

No Space linking this paper

Cite arxiv.org/abs/2601.17895 in a Space README.md to link it from this page.

Collections including this paper: 1