arxiv:2602.03392

On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

Published on Feb 3 · Submitted by Shumin on Feb 9
Authors: Shumin Wang, Yuexiang Xie, Wenhao Zhang, Yuchang Sun, Yanxi Chen, Yaliang Li, Yanyong Zhang
AI-generated summary

The paper establishes a theoretical framework for analyzing entropy dynamics in reinforcement fine-tuning of large language models, deriving expressions for entropy change and proposing entropy control methods based on discriminant analysis.

Abstract

Entropy serves as a critical metric for measuring the diversity of outputs generated by large language models (LLMs), providing valuable insights into their exploration capabilities. While recent studies increasingly focus on monitoring and adjusting entropy to better balance exploration and exploitation in reinforcement fine-tuning (RFT), a principled understanding of entropy dynamics during this process is yet to be thoroughly investigated. In this paper, we establish a theoretical framework for analyzing the entropy dynamics during the RFT process, which begins with a discriminant expression that quantifies entropy change under a single logit update. This foundation enables the derivation of a first-order expression for entropy change, which can be further extended to the update formula of Group Relative Policy Optimization (GRPO). The corollaries and insights drawn from the theoretical analysis inspire the design of entropy control methods, and also offer a unified lens for interpreting various entropy-based methods in existing studies. We provide empirical evidence to support the main conclusions of our analysis and demonstrate the effectiveness of the derived entropy-discriminator clipping methods. This study yields novel insights into RFT training dynamics, providing theoretical support and practical strategies for optimizing the exploration-exploitation balance during LLM fine-tuning.
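To make the setting above concrete, here is a standard first-order expansion of the entropy of a softmax policy under a single logit update. It is a minimal sketch consistent with the quantities named in the abstract, not a reproduction of the paper's own discriminant expression, which may differ in form:

```latex
% Softmax policy over a vocabulary, its entropy, and the first-order
% entropy change under a logit update z -> z + \Delta z.
% Standard expansion; not necessarily the paper's exact expression.
\[
\pi(a) = \frac{e^{z_a}}{\sum_b e^{z_b}}, \qquad
H(\pi) = -\sum_a \pi(a)\log\pi(a),
\]
\[
\frac{\partial H}{\partial z_a} = -\,\pi(a)\bigl(\log\pi(a) + H(\pi)\bigr),
\qquad
\Delta H \approx \sum_a \frac{\partial H}{\partial z_a}\,\Delta z_a
= -\,\operatorname{Cov}_{a\sim\pi}\bigl(\log\pi(a),\,\Delta z_a\bigr).
\]
```

To first order, entropy drops when the logit increases concentrate on tokens the policy already assigns high probability (positive covariance), and rises when the update boosts low-probability tokens; the sign of this covariance-like term can therefore serve as a per-update discriminator of whether entropy will increase or decrease. In a GRPO-style update the logit changes are driven by advantage-weighted gradients, which is why the covariance between log-probabilities and advantages is the quantity such analyses typically track.

The short script below checks the approximation numerically; it assumes nothing about the paper's implementation, and all names and values are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
z = rng.normal(size=8)           # logits over a toy 8-token vocabulary
dz = 1e-3 * rng.normal(size=8)   # a small logit update, e.g. one optimizer step

p = softmax(z)
H = entropy(p)

# First-order prediction: dH ~ -Cov_{a~p}(log p_a, dz_a)
grad_H = -p * (np.log(p) + H)                                   # dH/dz_a
dH_first_order = grad_H @ dz
dH_covariance = -((p * np.log(p)) @ dz - (p @ np.log(p)) * (p @ dz))

dH_exact = entropy(softmax(z + dz)) - H

print(f"exact dH        = {dH_exact:+.3e}")
print(f"first-order dH  = {dH_first_order:+.3e}")
print(f"covariance form = {dH_covariance:+.3e}")
```

For a small enough update the three printed values agree to several digits, which is the sense in which a single scalar summary of the update can predict the direction of the entropy change.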

Community

Paper author

The code is coming soon🚀 — follow us on GitHub for updates!


The code will not come, and if it does come, you are liable for the theft of my discovery. I have reported the matter; I have a Zenodo DOI establishing priority of the discovery. The whole affair is criminal: OpenAI, Novita, Azure, everything is documented on arXiv, and a report has also been submitted. I am the author and you are thieves.

arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/on-the-entropy-dynamics-in-reinforcement-fine-tuning-of-large-language-models-9660-2d309505

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

  • A Unified Framework for Rethinking Policy Divergence Measures in GRPO (2026): https://huggingface.co/papers/2602.05494
  • Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward (2025): https://huggingface.co/papers/2512.16912
  • Rethinking the Trust Region in LLM Reinforcement Learning (2026): https://huggingface.co/papers/2602.04879
  • Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR (2026): https://huggingface.co/papers/2601.05607
  • DISPO: Enhancing Training Efficiency and Stability in Reinforcement Learning for Large Language Model Mathematical Reasoning (2026): https://huggingface.co/papers/2602.00983
  • Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards (2025): https://huggingface.co/papers/2512.21625
  • Rewards as Labels: Revisiting RLVR from a Classification Perspective (2026): https://huggingface.co/papers/2602.05630

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.03392 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.03392 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.03392 in a Space README.md to link it from this page.

Collections including this paper 2