arxiv:2507.07095

Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

Published on Jul 9, 2025 · Submitted by Runyi YU on Jul 10, 2025
Authors: Ke Fan, Shunlin Lu, Minyue Dai, Runyi Yu, Lixing Xiao, Zhiyang Dou, Junting Dong, Lizhuang Ma, Jingbo Wang
Abstract

Generating diverse and natural human motion sequences from textual descriptions is a fundamental and challenging research area in computer vision, graphics, and robotics. Despite significant advances in this field, current methods often struggle with zero-shot generalization, largely because of the limited size of training datasets. Moreover, the lack of a comprehensive evaluation framework impedes progress by failing to identify directions for improvement. In this work, we aim to push text-to-motion into a new era: achieving zero-shot generalization. To this end, we first develop an efficient annotation pipeline and introduce MotionMillion, the largest human motion dataset to date, featuring over 2,000 hours and 2 million high-quality motion sequences. We also propose MotionMillion-Eval, the most comprehensive benchmark for evaluating zero-shot motion generation. Leveraging a scalable architecture, we scale our model to 7B parameters and validate its performance on MotionMillion-Eval. Our results demonstrate strong generalization to out-of-domain and complex compositional motions, marking a significant step toward zero-shot human motion generation. The code is available at https://github.com/VankouF/MotionMillion-Codes.

AI-generated summary

A new dataset and evaluation framework for text-to-motion generation achieve zero-shot generalization using a large-scale model.
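At 7B parameters, the model sits in the scaling regime where text-to-motion is commonly framed like language modeling: encode the prompt, autoregressively sample discrete motion tokens, and decode them into pose frames. The sketch below illustrates only that general pattern with self-contained stub components; every name in it is hypothetical, and none of it reflects the actual MotionMillion-Codes API.

```python
import numpy as np

# Hypothetical sketch of a text-to-motion pipeline: encode a prompt,
# autoregressively sample discrete motion tokens, then decode tokens
# into a pose sequence. All components here are stubs, not the paper's code.

rng = np.random.default_rng(0)

VOCAB_SIZE = 512   # size of a discrete motion-token codebook (assumed)
NUM_JOINTS = 22    # joints per skeleton frame (assumed)
MAX_TOKENS = 64    # generation length cap

def encode_text(prompt: str) -> np.ndarray:
    """Stub text encoder: hash words into a fixed-size embedding."""
    vec = np.zeros(128)
    for word in prompt.lower().split():
        vec[hash(word) % 128] += 1.0
    return vec / max(np.linalg.norm(vec), 1e-8)

def next_token_logits(text_emb: np.ndarray, history: list[int]) -> np.ndarray:
    """Stub model: random logits; a real model conditions on text + history."""
    return rng.normal(size=VOCAB_SIZE)

def decode_tokens(tokens: list[int]) -> np.ndarray:
    """Stub decoder: map each token to a (NUM_JOINTS, 3) pose frame."""
    frames = [rng.normal(scale=0.01, size=(NUM_JOINTS, 3)) for _ in tokens]
    return np.stack(frames)

def generate(prompt: str) -> np.ndarray:
    """Autoregressive sampling loop: one motion token per step."""
    text_emb = encode_text(prompt)
    tokens: list[int] = []
    for _ in range(MAX_TOKENS):
        logits = next_token_logits(text_emb, tokens)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tokens.append(int(rng.choice(VOCAB_SIZE, p=probs)))
    return decode_tokens(tokens)  # shape: (frames, joints, xyz)

motion = generate("a person walks forward and waves")
print(motion.shape)  # (64, 22, 3)
```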

Community

Paper submitter

Good work with fancy demo!

Welcome to the age of zero-shot motion generation ~

grantsing

arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/go-to-zero-towards-zero-shot-motion-generation-with-million-scale-data


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2507.07095 in a model README.md to link it from this page.

Datasets citing this paper 1
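If the linked dataset is published on the Hub, it could in principle be loaded with the datasets library; the repository id below is a placeholder, not the dataset's real name.

```python
from datasets import load_dataset

# Placeholder repo id -- check the linked dataset card for the real one.
ds = load_dataset("some-org/MotionMillion", split="train")
print(ds[0])  # inspect one motion/text pair
```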

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2507.07095 in a Space README.md to link it from this page.

Collections including this paper 8