
Papers
arxiv:2502.08130

Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models

Published on Feb 12, 2025
· Submitted by Asaf Yehudai on Feb 17, 2025
Authors: Sonam Gupta, Yatin Nandwani, Asaf Yehudai, Dinesh Khandelwal, Dinesh Raghu, Sachindra Joshi

Abstract

Selective Self-to-Supervised Fine-Tuning (S3FT) improves generalization and performance of fine-tuned Large Language Models (LLMs) by leveraging correct responses and reducing specialization.

AI-generated summary

Fine-tuning Large Language Models (LLMs) on specific datasets is a common practice to improve performance on target tasks. However, this performance gain often leads to overfitting, where the model becomes too specialized in either the task or the characteristics of the training data, resulting in a loss of generalization. This paper introduces Selective Self-to-Supervised Fine-Tuning (S3FT), a fine-tuning approach that achieves better performance than standard supervised fine-tuning (SFT) while improving generalization. S3FT leverages the existence of multiple valid responses to a query. By utilizing the model's correct responses, S3FT reduces model specialization during the fine-tuning stage. S3FT first identifies the correct model responses from the training set by deploying an appropriate judge. Then, it fine-tunes the model using the correct model responses and the gold response (or its paraphrase) for the remaining samples. The effectiveness of S3FT is demonstrated through experiments on mathematical reasoning, Python programming, and reading comprehension tasks. The results show that standard SFT can lead to an average performance drop of up to 4.4 points on multiple benchmarks, such as MMLU and TruthfulQA. In contrast, S3FT reduces this drop by about half, to 2.5 points, indicating better generalization capabilities than SFT while performing significantly better on the fine-tuning tasks.
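The abstract describes S3FT as a data-construction recipe: sample the base model's own answer for each training query, keep it as the fine-tuning target when a judge deems it correct, and otherwise fall back to the gold response (or a paraphrase of it). The Python sketch below illustrates that selection loop under stated assumptions; the `generate`, `judge`, and `paraphrase` callables are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative sketch of the S3FT target-selection step (not the authors' code).
# Assumes three user-supplied callables:
#   generate(query)            -> the base model's response to a query
#   judge(query, resp, gold)   -> True if the response is judged correct
#   paraphrase(query, gold)    -> the gold answer rewritten in the model's own words

def build_s3ft_targets(train_set, generate, judge, paraphrase):
    """Return (query, target) pairs for the subsequent supervised fine-tuning run."""
    pairs = []
    for example in train_set:
        query, gold = example["query"], example["gold_response"]

        # 1. Sample the base model's own answer to the training query.
        response = generate(query)

        # 2. An appropriate judge checks correctness (e.g. exact match for math,
        #    unit tests for code, an LLM judge for reading comprehension).
        if judge(query, response, gold):
            target = response          # keep the model's correct self-response
        else:
            # 3. Otherwise use the gold answer, optionally paraphrased by the
            #    model so the target stays close to its own output distribution.
            target = paraphrase(query, gold) or gold

        pairs.append({"query": query, "target": target})
    return pairs

# Standard SFT is then run on the resulting pairs, e.g. by minimizing the
# cross-entropy of each target conditioned on its query.
```

The design choice the abstract emphasizes is that whenever the model is already correct, its own phrasing becomes the training target; this keeps the targets close to the model's output distribution and is what reduces specialization relative to always training on gold responses.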

Community


This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2502.08130 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2502.08130 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2502.08130 in a Space README.md to link it from this page.

Collections including this paper 2