arxiv:2601.22954

Residual Context Diffusion Language Models

Published on Jan 30 · Submitted by Yuezhou Hu on Feb 5
Authors: Yuezhou Hu, Harman Singh, Monishwaran Maheswaran, Haocheng Xi, Coleman Hooper, Jintao Zhang, Aditya Tomar, Michael W. Mahoney, Sewon Min, Mehrdad Farajtabar, Kurt Keutzer, Amir Gholami, Chenfeng Xu
Abstract

Residual Context Diffusion (RCD) enhances diffusion large language models by recycling discarded token information through contextual residuals, improving accuracy with minimal computational overhead.

AI-generated summary

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to purely autoregressive language models because they can decode multiple tokens in parallel. However, state-of-the-art block-wise dLLMs rely on a "remasking" mechanism that decodes only the most confident tokens and discards the rest, effectively wasting computation. We demonstrate that recycling computation from the discarded tokens is beneficial, as these tokens retain contextual information useful for subsequent decoding iterations. In light of this, we propose Residual Context Diffusion (RCD), a module that converts these discarded token representations into contextual residuals and injects them back for the next denoising step. RCD uses a decoupled two-stage training pipeline to bypass the memory bottlenecks associated with backpropagation. We validate our method on both long CoT reasoning (SDAR) and short CoT instruction following (LLaDA) models. We demonstrate that a standard dLLM can be efficiently converted to the RCD paradigm with merely ~1 billion tokens. RCD consistently improves frontier dLLMs by 5-10 points in accuracy with minimal extra computation overhead across a wide range of benchmarks. Notably, on the most challenging AIME tasks, RCD nearly doubles baseline accuracy and attains up to 4-5x fewer denoising steps at equivalent accuracy levels.
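To make the decoding idea concrete, here is a toy sketch of block-wise diffusion decoding with confidence-based remasking, plus an RCD-style residual that is carried into the next step instead of being discarded. Everything here is illustrative: `toy_model` stands in for the real dLLM, and the "residual" is just the discarded softmax distribution added back to the next step's logits. The actual RCD module operates on hidden representations via a learned component, which this sketch does not attempt to reproduce.

```python
import math
import random

MASK = "<mask>"

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def toy_model(tokens, residuals, vocab_size, rng):
    """Stand-in for the dLLM: returns per-position logits.
    If a position has a carried-over residual, it is added to the logits."""
    logits = []
    for i, _ in enumerate(tokens):
        base = [rng.random() for _ in range(vocab_size)]
        if residuals[i] is not None:
            base = [b + r for b, r in zip(base, residuals[i])]
        logits.append(base)
    return logits

def rcd_decode(block_len=8, vocab_size=16, keep_per_step=2, seed=0):
    rng = random.Random(seed)
    tokens = [MASK] * block_len
    residuals = [None] * block_len  # contextual residuals for the next step
    steps = 0
    while MASK in tokens:
        steps += 1
        logits = toy_model(tokens, residuals, vocab_size, rng)
        # Rank still-masked positions by confidence (max probability).
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        conf = {i: max(softmax(logits[i])) for i in masked}
        keep = sorted(masked, key=conf.get, reverse=True)[:keep_per_step]
        for i in masked:
            if i in keep:
                probs = softmax(logits[i])
                tokens[i] = probs.index(max(probs))  # commit the token id
                residuals[i] = None
            else:
                # RCD-style recycling: retain a residual derived from the
                # discarded distribution rather than throwing it away.
                # (Vanilla remasking would set residuals[i] = None here.)
                residuals[i] = softmax(logits[i])
    return tokens, steps

tokens, steps = rcd_decode()
```

With `block_len=8` and `keep_per_step=2`, the loop commits two tokens per denoising step and finishes in four steps; the paper's claim is that carrying the discarded-token context forward lets a trained model commit more tokens per step (or more accurate ones) than plain remasking.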

Community

Paper author Paper submitter

We introduce Residual Context Diffusion (RCD): a simple idea to boost diffusion LLMs—stop wasting “remasked” tokens.

Diffusion LLMs decode in parallel but often lag AR models because low-confidence tokens are discarded each step. RCD turns those discarded distributions into residual context and injects them into the next denoising step, recycling computation instead of throwing it away.

Results: consistent gains over Sequential Denoising (SeqD) on SDAR & LLaDA, with biggest jumps on hard math reasoning (AIME24/25, MinervaMath).

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/residual-context-diffusion-language-models-3072-46f69098

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2601.22954 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2601.22954 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2601.22954 in a Space README.md to link it from this page.

Collections including this paper 2