Paper page - Theia: Distilling Diverse Vision Foundation Models for Robot Learning

http://theia.theaiinstitute.com/

Authors: Jinghuan Shang, Karl Schmeckpeper, Brandon B. May, Maria Vittoria Minniti, Tarik Kelestemur, David Watkins, Laura Herlant
arxiv:2407.20179

Theia: Distilling Diverse Vision Foundation Models for Robot Learning

Published on Jul 29, 2024 · Submitted by Jinghuan Shang on Jul 30, 2024

Abstract

AI-generated summary: Theia, a vision foundation model for robot learning, outperforms existing models with less data and smaller sizes by distilling knowledge from multiple vision foundation models.

Vision-based robot policy learning, which maps visual inputs to actions, necessitates a holistic understanding of diverse visual tasks beyond single-task needs like classification or segmentation. Inspired by this, we introduce Theia, a vision foundation model for robot learning that distills multiple off-the-shelf vision foundation models trained on varied vision tasks. Theia's rich visual representations encode diverse visual knowledge, enhancing downstream robot learning. Extensive experiments demonstrate that Theia outperforms its teacher models and prior robot learning models using less training data and smaller model sizes. Additionally, we quantify the quality of pre-trained visual representations and hypothesize that higher entropy in feature norm distributions leads to improved robot learning performance. Code and models are available at https://github.com/bdaiinstitute/theia.
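The abstract's hypothesis about entropy in feature norm distributions can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the helper name `feature_norm_entropy`, the equal-width histogram, and the default bin count are hypothetical choices, and the paper's own definition of the metric may differ. It takes a set of feature vectors (for example, per-patch tokens from a vision encoder), computes each vector's L2 norm, and returns the Shannon entropy of the histogram of those norms.

```python
import math
from typing import Sequence


def feature_norm_entropy(features: Sequence[Sequence[float]], num_bins: int = 8) -> float:
    """Shannon entropy (in nats) of a histogram of per-vector L2 norms.

    Hypothetical helper: the equal-width binning and the bin count are
    assumptions made for illustration, not the paper's definition.
    """
    # L2 norm of every feature vector.
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in features]
    if not norms:
        raise ValueError("features must contain at least one vector")

    lo, hi = min(norms), max(norms)
    if hi == lo:
        return 0.0  # all norms identical: one occupied bin, zero entropy

    # Equal-width histogram over [lo, hi]; the maximum falls in the last bin.
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for n in norms:
        idx = min(int((n - lo) / width), num_bins - 1)
        counts[idx] += 1

    # Entropy of the empirical bin probabilities, skipping empty bins.
    total = len(norms)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

Under this sketch, an encoder whose token norms are spread evenly across bins scores higher than one whose norms are concentrated around a single value. Whether that ordering predicts robot learning performance is the paper's hypothesis, not something this snippet establishes.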

Community

Paper author Paper submitter

Theia builds a robot vision foundation model by distilling existing vision foundation models. This improves downstream robot learning performance while keeping the model size smaller.

Paper author

http://theia.theaiinstitute.com/

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

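The following papers were recommended by the Semantic Scholar API:

* Learning Manipulation by Predicting Interaction (2024), https://huggingface.co/papers/2406.00439
* Pretrained Visual Representations in Reinforcement Learning (2024), https://huggingface.co/papers/2407.17238
* LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning (2024), https://huggingface.co/papers/2406.11815
* OpenVLA: An Open-Source Vision-Language-Action Model (2024), https://huggingface.co/papers/2406.09246
* HRP: Human Affordances for Robotic Pre-Training (2024), https://huggingface.co/papers/2407.18911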

