arxiv:2503.14456

RWKV-7 "Goose" with Expressive Dynamic State Evolution

Published on Mar 18, 2025 · Submitted by Zhang Ruichong on Mar 19, 2025
Authors: Bo Peng, Ruichong Zhang, Daniel Goldstein, Eric Alcaide, Haowen Hou, Janna Lu, William Merrill, Guangyu Song, Kaifeng Tan, Saiteja Utpala, Nathan Wilce, Johan S. Wind, Tianyi Wu, Daniel Wuttke, Christian Zhou-Zheng

Abstract

RWKV-7 "Goose" achieves state-of-the-art performance in multilingual tasks with optimal memory and inference efficiency, exceeding Transformer capabilities in complexity.

AI-generated summary

We present RWKV-7 "Goose", a new sequence modeling architecture, along with pre-trained language models that establish a new state-of-the-art in downstream performance at the 3 billion parameter scale on multilingual tasks, and match current SoTA English language performance despite being trained on dramatically fewer tokens than other top 3B models. Nevertheless, RWKV-7 models require only constant memory usage and constant inference time per token. RWKV-7 introduces a newly generalized formulation of the delta rule with vector-valued gating and in-context learning rates, as well as a relaxed value replacement rule. We show that RWKV-7 can perform state tracking and recognize all regular languages, while retaining parallelizability of training. This exceeds the capabilities of Transformers under standard complexity conjectures, which are limited to TC^0. To demonstrate RWKV-7's language modeling capability, we also present an extended open source 3.1 trillion token multilingual corpus, and train four RWKV-7 models ranging from 0.19 billion to 2.9 billion parameters on this dataset. To foster openness, reproduction, and adoption, we release our models and dataset component listing at https://huggingface.co/RWKV, and our training and inference code at https://github.com/RWKV/RWKV-LM all under the Apache 2.0 License.
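
The abstract's key mechanism, a generalized delta rule with vector-valued gating and in-context learning rates, can be illustrated with a short sketch. The following NumPy toy is an assumption-laden illustration: the variable names, normalization, and update order are mine, not the paper's exact formulation (see the paper for the real recurrence).

```python
import numpy as np

def delta_rule_step(S, w, k, v, a):
    """One step of a generalized delta-rule state update (illustrative toy).

    S : (d_v, d_k) matrix-valued state carried between tokens
    w : (d_k,)  vector-valued decay/gate in (0, 1), per key channel
    k : (d_k,)  current key
    v : (d_v,)  current value
    a : (d_k,)  vector-valued in-context learning rate in [0, 1]

    The state decays per channel, partially erases the value stored
    under key k (at rate a), then writes the new key-value pair --
    a "replace" rather than a pure "add", which is the delta-rule idea.
    """
    k = k / (np.linalg.norm(k) + 1e-8)           # unit-norm removal direction
    S = S * w[None, :]                           # vector-valued decay (gating)
    S = S - (S @ (a * k))[:, None] * k[None, :]  # erase old value at rate a
    return S + np.outer(v, k)                    # write new key-value pair

# Toy usage: the state never grows, regardless of sequence length.
d_k, d_v = 8, 8
rng = np.random.default_rng(0)
S = np.zeros((d_v, d_k))
for _ in range(16):
    S = delta_rule_step(S,
                        w=np.full(d_k, 0.95),
                        k=rng.standard_normal(d_k),
                        v=rng.standard_normal(d_v),
                        a=np.full(d_k, 0.5))
print((S @ rng.standard_normal(d_k)).shape)      # attention-free readout: (8,)
```

With a = 1 and a unit-norm k, the update exactly replaces the value stored under k, which is what lets a fixed-size state keep overwriting stale information instead of only accumulating it.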

Community

Paper author Paper submitter

RWKV-7 paper is finally out!

can't wait to see this paired with CoT-type reasoning

Paper author

RWKV7-G1 "GooseOne" reasoning models :)
https://x.com/BlinkDL_AI/status/1898579674575552558

Parallelizable training +
learns while it infers (inference speed and VRAM both stay constant) +
unlimited context +
unlimited CoT +
I'd like to know what "unlimited" above means concretely. For example, in actual applications, unlimited context? Unlimited CoT?

Paper author Paper submitter
edited Mar 19, 2025

> Parallelizable training +
> learns while it infers (inference speed and VRAM both stay constant) +
> unlimited context +
> unlimited CoT +
> I'd like to know what "unlimited" above means concretely. For example, in actual applications, unlimited context? Unlimited CoT?

This claim about "infinite context" and "infinite CoT" is slightly overclaimed. It isn't from the authors but probably from a product manager.
However, RWKV-7, as an RNN, can accept an "indefinite" context length: it accepts arbitrarily long input but still suffers from forgetting. This is different from "infinite", since RNNs may still forget certain information in a very lengthy context.
So "无限" ("unlimited") here means indefinite.
For example, RWKV-7 models trained with a 4k context length can retain perfect memory until around 8k-16k tokens; after that, they may drastically forget earlier information. If necessary, you can fine-tune RWKV-7 models with an arbitrarily long context length. Please test your specific use cases for clarity.
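
To make the constant-memory point above concrete, here is a minimal toy sketch; the step function is a hypothetical stand-in, not the real RWKV-7 kernel. It shows why per-token cost stays flat for an RNN, and also why very old tokens can be forgotten: the entire history must survive repeated compression into one fixed-size state.

```python
import numpy as np

STATE_DIM = 64
rng = np.random.default_rng(0)

def rnn_step(state, x):
    """Hypothetical stand-in for one recurrent step: fixed-size state in,
    fixed-size state out, so memory and compute per token are O(1) in
    sequence length."""
    return np.tanh(0.95 * state + x)

state = np.zeros(STATE_DIM)
for t in range(100_000):                  # an arbitrarily long input stream
    x = rng.standard_normal(STATE_DIM)    # stand-in token embedding
    state = rnn_step(state, x)            # state.shape never grows

# A Transformer would be caching ~100k key/value vectors at this point;
# the RNN holds only `state`. The same compression is why information
# from early tokens can fade: nothing outside `state` is retained.
```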

Paper author

New paper just dropped 👌👌

Transformers are toast this time for real

Paper author Paper submitter

The RWKV community features diverse backgrounds and expertise. This time, most of our authors come from fields outside traditional AI, including physics, mathematics, game development, materials science, biochemistry, and cybernetics.
Join our community at https://discord.com/invite/bDSBUMeFpc and have a nice journey!

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling (https://huggingface.co/papers/2502.14553) (2025)
* Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models (https://huggingface.co/papers/2501.13428) (2025)
* Adapting Large Language Models for Time Series Modeling via a Novel Parameter-efficient Adaptation Method (https://huggingface.co/papers/2502.13725) (2025)
* UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs (https://huggingface.co/papers/2502.00439) (2025)
* EmbBERT-Q: Breaking Memory Barriers in Embedded NLP (https://huggingface.co/papers/2502.10001) (2025)
* Scaling Inference-Efficient Language Models (https://huggingface.co/papers/2501.18107) (2025)
* Reasoning with Reinforced Functional Token Tuning (https://huggingface.co/papers/2502.13389) (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Paper author

The wait for the RWKV-7 paper has been long. The model was open-sourced in December last year, and after three months the paper is finally complete. It hasn't been easy.

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/rwkv-7-goose-with-expressive-dynamic-state-evolution-1949-e940c720

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications


Models citing this paper: 27

Datasets citing this paper: 3

Spaces citing this paper: 14

Collections including this paper: 18