Paper page - Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

Code: https://github.com/PRIS-CV/EAFT
✨ Project Page: https://ymxyll.github.io/EAFT/

arxiv:2601.02151

Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

Published on Jan 5 · Submitted by 杨乐乐 (ssl-asuka) on Jan 8
#1 Paper of the day
Authors: Muxi Diao, Lele Yang, Wuxuan Gong, Yutong Zhang, Zhonghao Yan, Yufei Han, Kongming Liang, Weiran Xu, Zhanyu Ma

Abstract

Entropy-Adaptive Fine-Tuning addresses catastrophic forgetting in supervised fine-tuning by using token-level entropy to distinguish uncertainty from knowledge conflict, enabling better preservation of general capabilities.

AI-generated summary

Supervised Fine-Tuning (SFT) is the standard paradigm for domain adaptation, yet it frequently incurs the cost of catastrophic forgetting. In sharp contrast, on-policy Reinforcement Learning (RL) effectively preserves general capabilities. We investigate this discrepancy and identify a fundamental distributional gap: while RL aligns with the model's internal belief, SFT forces the model to fit external supervision. This mismatch often manifests as "Confident Conflicts" tokens characterized by low probability but low entropy. In these instances, the model is highly confident in its own prediction but is forced to learn a divergent ground truth, triggering destructive gradient updates. To address this, we propose Entropy-Adaptive Fine-Tuning (EAFT). Unlike methods relying solely on prediction probability, EAFT utilizes token-level entropy as a gating mechanism to distinguish between epistemic uncertainty and knowledge conflict. This allows the model to learn from uncertain samples while suppressing gradients on conflicting data. Extensive experiments on Qwen and GLM series (ranging from 4B to 32B parameters) across mathematical, medical, and agentic domains confirm our hypothesis. EAFT consistently matches the downstream performance of standard SFT while significantly mitigating the degradation of general capabilities.

Community

Paper author · Paper submitter

🥳 Integration: EAFT has been merged into LLaMA-Factory! You can now enable EAFT there by setting the use_eaft_loss parameter.

Paper author

To investigate why SFT often degrades general capabilities while on-policy RL preserves them, we analyze their distributions across probability and entropy dimensions. Our findings reveal that SFT is characterized by a prevalence of "Confident Conflicts" (low-probability, low-entropy tokens). In these instances, the model is forced to make highly certain predictions that fundamentally conflict with the underlying data distribution.

![intro1-4](https://cdn-uploads.huggingface.co/production/uploads/6768c97367e4b4606a3c9cec/3N5ofNpPXn9iWXlfgNYlA.png)
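The two diagnostic quantities can be sketched in a few lines of plain Python (an illustration only: in practice this runs over model logits in a deep-learning framework, and the 0.15 thresholds below are hypothetical, not the paper's):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def token_stats(logits, target_idx):
    """Return (probability of the ground-truth token, normalized entropy)."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return probs[target_idx], entropy / math.log(len(probs))  # entropy in [0, 1]

def is_confident_conflict(logits, target_idx, p_thresh=0.15, h_thresh=0.15):
    """A 'Confident Conflict': the label gets low probability (the model
    disagrees with the data) while the distribution has low entropy
    (the model is certain of its own prediction)."""
    p, h = token_stats(logits, target_idx)
    return p < p_thresh and h < h_thresh
```

For example, a sharply peaked distribution whose mode disagrees with the label (logits [10, 0, 0, 0], target index 1) is flagged, while a uniform distribution over the same label is not.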

Paper author

To verify whether these tokens drive catastrophic forgetting, we perform a pilot study that masks the loss contribution of tokens in the lowest 15% of both probability and entropy. Simply ignoring the gradients from these "Confident Conflicts" significantly mitigates the loss of general capabilities compared to standard SFT.

![gate_lossce_general_curve](https://cdn-uploads.huggingface.co/production/uploads/6768c97367e4b4606a3c9cec/IVWAeYxIhXa91sjtTZp82.png)
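A minimal sketch of this hard-masking pilot, in plain Python over per-token statistics (the nearest-rank quantile and the joint low-probability/low-entropy rule are our simplified reading of the setup, not the paper's exact procedure):

```python
def quantile(values, q):
    """Nearest-rank quantile of a list (simple illustration, no interpolation)."""
    s = sorted(values)
    idx = min(int(q * len(s)), len(s) - 1)
    return s[idx]

def masked_mean_loss(losses, probs, entropies, q=0.15):
    """Drop the loss of tokens that fall simultaneously in the lowest-q
    probability and lowest-q entropy buckets; average the rest."""
    p_cut = quantile(probs, q)
    h_cut = quantile(entropies, q)
    kept = [l for l, p, h in zip(losses, probs, entropies)
            if not (p <= p_cut and h <= h_cut)]
    return sum(kept) / len(kept) if kept else 0.0
```

Tokens that are low-probability but high-entropy (genuinely hard, learnable tokens) keep their full gradient; only the confidently conflicting ones are silenced.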

Paper author

The pilot study confirms our hypothesis, but hard masking has several drawbacks: it discards training data, hinders target-domain learning, and relies on sensitive hyperparameters. To overcome these, we propose Entropy-Adaptive Fine-Tuning (EAFT), a soft gating mechanism that adjusts the learning signal based on model uncertainty.

![EAFT_loss](https://cdn-uploads.huggingface.co/production/uploads/6768c97367e4b4606a3c9cec/0oCAjNI2ra3q2CwamV4wH.png)

This normalization enables a self-regulating process:

- Conflict Suppression (H_t → 0 😨): when the model is certain (low entropy), the weight drops, masking harmful gradients from conflicting labels.
- Knowledge Acquisition (H_t → 1 😄): when the model is uncertain (high entropy), the weight stays high, recovering the standard SFT objective to learn new patterns.
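In spirit, the gate is entropy-weighted cross-entropy, with the weight equal to the token entropy normalized by its maximum, log V. The sketch below illustrates the idea only; the paper's exact loss (e.g. any detaching or rescaling of the weight) may differ:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def eaft_token_loss(logits, target_idx):
    """Entropy-gated cross-entropy for a single token position.
    w_t = H_t / log(V): ~0 when the model is certain (conflict
    suppression), ~1 when it is uncertain (knowledge acquisition)."""
    probs = softmax(logits)
    ce = -math.log(probs[target_idx])             # standard SFT loss term
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    w = entropy / math.log(len(probs))            # normalized to [0, 1]
    return w * ce
```

On a uniform distribution the weight is 1 and the standard SFT loss is recovered; on a confidently conflicting token (peaked logits, disagreeing label) the loss is strongly attenuated.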

Paper author

We demonstrate the effectiveness and universality of EAFT across multiple domains, including Math (reasoning), Agent tool-calling (format), and Medical (knowledge). Our evaluation spans different model families (Qwen, GLM) and scales (4B to 32B), confirming that EAFT is a robust solution for diverse fine-tuning scenarios.

![main](https://cdn-uploads.huggingface.co/production/uploads/6768c97367e4b4606a3c9cec/rc1AuDBylqHQw-a9AFNAF.png)

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

- Diversity or Precision? A Deep Dive into Next Token Prediction (2025)
- DaGRPO: Rectifying Gradient Conflict in Reasoning via Distinctiveness-Aware Group Relative Policy Optimization (2025)
- Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning (2026)
- SPINE: Token-Selective Test-Time Reinforcement Learning with Entropy-Band Regularization (2025)
- ICPO: Intrinsic Confidence-Driven Group Relative Preference Optimization for Efficient Reinforcement Learning (2025)
- In-Token Rationality Optimization: Towards Accurate and Concise LLM Reasoning via Self-Feedback (2025)
- ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper


Datasets citing this paper 0

No dataset linking this paper


Spaces citing this paper 0

No Space linking this paper


Collections including this paper 12