arxiv:2602.04884

Reinforced Attention Learning

Published on Feb 4, 2026 · Submitted by Li on Feb 6, 2026

Google

Authors:

Bangzheng Li, Jianmo Ni, Chen Qu, Ian Miao, Liu Yang, Xingyu Fu, Muhao Chen, Derek Zhiyuan Cheng

Abstract

AI-generated summary: Reinforced Attention Learning optimizes internal attention distributions in multimodal language models, improving information allocation and cross-modal alignment through policy-gradient methods.

Post-training with Reinforcement Learning (RL) has substantially improved reasoning in Large Language Models (LLMs) via test-time scaling. However, extending this paradigm to Multimodal LLMs (MLLMs) through verbose rationales yields limited gains for perception and can even degrade performance. We propose Reinforced Attention Learning (RAL), a policy-gradient framework that directly optimizes internal attention distributions rather than output token sequences. By shifting optimization from what to generate to where to attend, RAL promotes effective information allocation and improved grounding in complex multimodal inputs. Experiments across diverse image and video benchmarks show consistent gains over GRPO and other baselines. We further introduce On-Policy Attention Distillation, demonstrating that transferring latent attention behaviors yields stronger cross-modal alignment than standard knowledge distillation. Our results position attention policies as a principled and general alternative for multimodal post-training.
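
The abstract does not state the RAL objective, so the following is only a toy sketch of the general idea of treating an attention distribution as a policy and updating it with a policy gradient. Everything in it is an assumption made for illustration: a single query attending over four keys, a reward of 1 when the sampled key is a designated "relevant" one, and a group-mean baseline loosely in the spirit of GRPO. The names `train_attention_policy`, `target`, and `group_size` are hypothetical and do not come from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_attention_policy(num_keys=4, target=2, steps=200,
                           group_size=8, lr=1.0, seed=0):
    """Toy REINFORCE update on attention logits (illustrative assumption,
    not the paper's objective).

    The attention weights softmax(logits) are treated as a categorical
    policy over keys. Sampling the hypothetical relevant key `target`
    earns reward 1, any other key earns 0.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_keys          # uniform attention at the start
    history = []                       # attention mass on the target key

    for _ in range(steps):
        attn = softmax(logits)
        history.append(attn[target])

        # Sample a group of "attended" keys from the current policy.
        samples = rng.choices(range(num_keys), weights=attn, k=group_size)
        rewards = [1.0 if s == target else 0.0 for s in samples]
        baseline = sum(rewards) / group_size   # group-mean baseline

        # Score-function gradient of log softmax: onehot(sample) - attn.
        grad = [0.0] * num_keys
        for s, r in zip(samples, rewards):
            advantage = r - baseline
            for j in range(num_keys):
                indicator = 1.0 if j == s else 0.0
                grad[j] += advantage * (indicator - attn[j]) / group_size

        # Gradient ascent on expected reward.
        logits = [z + lr * g for z, g in zip(logits, grad)]

    history.append(softmax(logits)[target])
    return history


history = train_attention_policy()
print(f"attention on target key: {history[0]:.2f} -> {history[-1]:.2f}")
```

In this toy setting the update moves attention mass toward the rewarded key without ever changing an output token, which is the "where to attend" framing the abstract describes. A real multimodal model would have many heads and layers and a task-level reward, none of which are modeled here.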