arxiv:2408.15980

In-Context Imitation Learning via Next-Token Prediction

Published on Aug 28, 2024
Submitted by Max (Letian) Fu on Aug 29, 2024
Authors: Letian Fu, Huang Huang, Gaurav Datta, Lawrence Yunliang Chen, William Chung-Ho Panitch, Fangchen Liu, Hui Li, Ken Goldberg

Abstract

In-Context Robot Transformer (ICRT) enables flexible, training-free execution of new tasks by interpreting sensorimotor trajectories provided during the input phase, without updating policy parameters.

AI-generated summary

We explore how to enhance next-token prediction models to perform in-context imitation learning on a real robot, where the robot executes new tasks by interpreting contextual information provided during the input phase, without updating its underlying policy parameters. We propose In-Context Robot Transformer (ICRT), a causal transformer that performs autoregressive prediction on sensorimotor trajectories without relying on any linguistic data or reward function. This formulation enables flexible and training-free execution of new tasks at test time, achieved by prompting the model with sensorimotor trajectories of the new task, composed of image observation, action, and state tuples, collected through human teleoperation. Experiments with a Franka Emika robot demonstrate that ICRT can adapt to new tasks specified by prompts, even in environment configurations that differ from both the prompt and the training data. In a multitask environment setup, ICRT significantly outperforms current state-of-the-art next-token prediction models in robotics at generalizing to unseen tasks. Code, checkpoints, and data are available at https://icrt.dev/
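To make the prompting formulation concrete, here is a minimal sketch of the idea, not the authors' implementation: a causal transformer over interleaved (image, state, action) tokens that predicts the next action at each step. All module names, embedding dimensions, and the interleaving layout are illustrative assumptions (positional embeddings and a real vision encoder are omitted for brevity).

```python
import torch
import torch.nn as nn

class CausalPolicy(nn.Module):
    """Causal transformer over interleaved (image, state, action) tokens.

    Hypothetical stand-in for ICRT: the 512-d image features, 8-d
    proprioceptive state, and 7-d actions are illustrative assumptions.
    (Positional embeddings omitted for brevity.)
    """
    def __init__(self, dim=256, img_dim=512, state_dim=8, action_dim=7,
                 n_layers=4, n_heads=4):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, dim)
        self.state_proj = nn.Linear(state_dim, dim)
        self.act_proj = nn.Linear(action_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.act_head = nn.Linear(dim, action_dim)

    def forward(self, img_emb, state, action):
        # Interleave tokens along time as (img_t, state_t, act_t): [B, 3T, dim].
        toks = torch.stack(
            [self.img_proj(img_emb), self.state_proj(state), self.act_proj(action)],
            dim=2,
        ).flatten(1, 2)
        mask = nn.Transformer.generate_square_subsequent_mask(toks.shape[1])
        h = self.backbone(toks, mask=mask)  # causal: each token sees only the past
        return self.act_head(h)             # per-position next-action predictions

# At test time, condition on one teleoperated prompt demo and read the action
# predicted at the current state token; the zero-filled action slot for the
# current step is hidden from that prediction by the causal mask.
policy = CausalPolicy()
policy.eval()
prompt = dict(img=torch.randn(1, 20, 512), state=torch.randn(1, 20, 8),
              action=torch.randn(1, 20, 7))   # one demo of the new task
cur = dict(img=torch.randn(1, 1, 512), state=torch.randn(1, 1, 8),
           action=torch.zeros(1, 1, 7))       # current action not yet known
with torch.no_grad():
    pred = policy(*(torch.cat([prompt[k], cur[k]], dim=1)
                    for k in ("img", "state", "action")))
next_action = pred[0, -2]  # output at the current (last) state token
```

The causal mask is what makes prompt tuples behave like in-context examples: the prediction at the current state token can attend to the entire prompt trajectory but not to any future tokens.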

Community

Paper author · Paper submitter

TL;DR: We approach in-context, multi-task imitation learning on a physical robot as a next-token prediction problem. We train a causal transformer on concatenated robot trajectories. At test time, the model can execute a new task in a different environment configuration, without fine-tuning, by prompting it with raw robot trajectories of the new task collected via human teleoperation.

Website: https://icrt.dev/
Code, checkpoints, dataset: https://github.com/Max-Fu/icrt
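As a rough illustration of the training recipe in the TL;DR, one next-token training step over concatenated demonstrations might look like the following. This is a sketch under the same assumptions as the hypothetical CausalPolicy above; the field names and the MSE action loss are illustrative, not taken from the ICRT codebase.

```python
import torch
import torch.nn.functional as F

def make_training_sequence(trajectories):
    """Concatenate several demos of the same task along the time axis.

    Each trajectory is assumed to be a dict with 'img' [T, 512],
    'state' [T, 8], and 'action' [T, 7] tensors.
    """
    return {k: torch.cat([traj[k] for traj in trajectories], dim=0)
            for k in ("img", "state", "action")}

def training_step(policy, optimizer, trajectories):
    """One next-token update for a policy with the CausalPolicy interface."""
    seq = make_training_sequence(trajectories)
    img = seq["img"].unsqueeze(0)        # [1, T, 512]
    state = seq["state"].unsqueeze(0)    # [1, T, 8]
    action = seq["action"].unsqueeze(0)  # [1, T, 7]
    pred = policy(img, state, action)    # [1, 3T, action_dim]
    # With tokens interleaved as (img_t, state_t, act_t), the prediction made
    # at each state token (index 3t + 1) is supervised to match the action
    # a_t actually taken at that step.
    loss = F.mse_loss(pred[:, 1::3, :], action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because every action is predicted from all preceding (image, state, action) tokens, demonstrations placed earlier in the concatenated sequence play the same role as the test-time prompt, so a single objective covers both training and in-context task specification.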


Congrats @mlfu7! I opened https://github.com/Max-Fu/icrt/issues/1 for some small improvements.

This is an automated message from the Librarian Bot (https://huggingface.co/librarian-bots). I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* Actra: Optimized Transformer Architecture for Vision-Language-Action Models in Robot Learning (2024): https://huggingface.co/papers/2408.01147
* Robotic Control via Embodied Chain-of-Thought Reasoning (2024): https://huggingface.co/papers/2407.08693
* Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts (2024): https://huggingface.co/papers/2407.14872
* GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy (2024): https://huggingface.co/papers/2408.14368
* Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals (2024): https://huggingface.co/papers/2407.05996

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2408.15980 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2408.15980 in a Space README.md to link it from this page.

Collections including this paper 2