
Papers
arxiv:2601.09195

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Published on Jan 14 · Submitted by Taki WU on Jan 19

Abstract

ProFit mitigates SFT's overfitting to non-core expressions by masking low-probability reference tokens, which are mostly replaceable surface forms, so training focuses on the high-probability tokens that carry the core semantics.

AI-generated summary

Supervised fine-tuning (SFT) is a fundamental post-training strategy to align Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer, leading to the model overfitting to non-core expressions. Although our empirical analysis suggests that introducing multiple reference answers can mitigate this issue, the prohibitive data and computational costs necessitate a strategic shift: prioritizing the mitigation of single-reference overfitting over the costly pursuit of answer diversity. To achieve this, we reveal the intrinsic connection between token probability and semantic importance: high-probability tokens carry the core logical framework, while low-probability tokens are mostly replaceable expressions. Based on this insight, we propose ProFit, which selectively masks low-probability tokens to prevent surface-level overfitting. Extensive experiments confirm that ProFit consistently outperforms traditional SFT baselines on general reasoning and mathematical benchmarks.
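The masking idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `keep_ratio` hyperparameter and the top-k selection rule are hypothetical stand-ins for ProFit's actual selection criterion, which is specified in the paper and repository.

```python
import math

def profit_loss(token_logprobs, keep_ratio=0.6):
    """Probability-guided token selection (illustrative sketch).

    token_logprobs: log-probabilities the model assigns to each reference
    token. The lowest-probability tokens are masked out of the loss, so
    only high-probability ("core") tokens contribute to training.
    keep_ratio is a hypothetical hyperparameter, not from the paper.
    """
    n = len(token_logprobs)
    k = max(1, int(n * keep_ratio))
    # Keep the k highest-probability tokens; mask the low-probability rest.
    kept = sorted(token_logprobs, reverse=True)[:k]
    # Token-level cross-entropy averaged over the kept tokens only.
    return -sum(kept) / k
```

For example, with reference-token probabilities `[0.9, 0.8, 0.05, 0.7, 0.02]` and `keep_ratio=0.6`, only the three high-probability tokens enter the loss; the two low-probability tokens (likely replaceable wording) are ignored, yielding a much smaller loss than full-sequence cross-entropy.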

Community

Paper author · Paper submitter · edited Jan 19

Code is available at https://github.com/Utaotao/ProFit

Paper author Paper submitter

Quick Takeaway:

  1. We should restrict the SFT target to the semantically crucial tokens, rather than forcing alignment on every reference token.
  2. We find an intrinsic connection between predicted token probability and semantic importance: high-probability tokens carry the core logical framework, while low-probability tokens are mostly replaceable expressions.
  3. The proposed ProFit method selectively masks low-probability tokens to prevent surface-level overfitting.
I created a podcast to explain the key concepts:
https://researchpod-share.vercel.app/episode/ace13947-7c31-4ec2-b1d3-3cfe4115da3f

Paper author

Wow, many thanks!

arXivlens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/profit-leveraging-high-value-signals-in-sft-via-probability-guided-token-selection-5856-2d8ab01c

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

  • Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting (2026) — https://huggingface.co/papers/2601.02151
  • DaGRPO: Rectifying Gradient Conflict in Reasoning via Distinctiveness-Aware Group Relative Policy Optimization (2025) — https://huggingface.co/papers/2512.06337
  • AIR: Post-training Data Selection for Reasoning via Attention Head Influence (2025) — https://huggingface.co/papers/2512.13279
  • Rethinking Supervised Fine-Tuning: Emphasizing Key Answer Tokens for Improved LLM Accuracy (2025) — https://huggingface.co/papers/2512.21017
  • GIFT: Unlocking Global Optimality in Post-Training via Finite-Temperature Gibbs Initialization (2026) — https://huggingface.co/papers/2601.09233
  • Learning from Mistakes: Negative Reasoning Samples Enhance Out-of-Domain Generalization (2026) — https://huggingface.co/papers/2601.04992
  • Diversity or Precision? A Deep Dive into Next Token Prediction (2025) — https://huggingface.co/papers/2512.22955

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2601.09195 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2601.09195 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2601.09195 in a Space README.md to link it from this page.

Collections including this paper 1