Stronger Models are NOT Stronger Teachers for Instruction Tuning

Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Radha Poovendran

arXiv:2411.07133 · Published November 11, 2024
AI-generated summary

The Larger Models' Paradox reveals that larger models are not always better teachers for fine-tuning smaller models, and a new metric, Compatibility-Adjusted Reward (CAR), is introduced to measure and improve the effectiveness of response generators.
Abstract

Instruction tuning has been widely adopted to ensure that large language models (LLMs) follow user instructions effectively. The resulting instruction-following capabilities of LLMs depend heavily on the instruction datasets used for tuning. Recently, synthetic instruction datasets have emerged as an economically viable way to provide LLMs with diverse, high-quality instructions. However, existing approaches typically assume that larger or stronger models are stronger teachers for instruction tuning, and hence simply adopt these models as response generators for the synthetic instructions. In this paper, we challenge this commonly adopted assumption. Our extensive experiments across five base models and twenty response generators reveal that larger and stronger models are not necessarily stronger teachers of smaller models. We refer to this phenomenon as the Larger Models' Paradox. We observe that existing metrics cannot precisely predict the effectiveness of response generators because they ignore the compatibility between teachers and the base models being fine-tuned. We therefore develop a novel metric, the Compatibility-Adjusted Reward (CAR), to measure the effectiveness of response generators. Our experiments across five base models demonstrate that CAR outperforms almost all baselines.
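The abstract does not state the CAR formula, only that it adjusts a reward-based measure of response quality by the compatibility between the teacher and the base model being fine-tuned. The minimal sketch below illustrates that idea under an assumed instantiation: each candidate response generator (teacher) is scored by the average reward of its responses, discounted by the base model's average perplexity on those responses, with perplexity standing in as a compatibility proxy. All names here (compatibility_adjusted_reward, reward_fn, base_ppl_fn), the toy scoring stand-ins, and the exact way the two terms are combined are hypothetical, not the paper's definition.

```python
from typing import Callable, Dict, List

def compatibility_adjusted_reward(
    responses: List[str],
    reward_fn: Callable[[str], float],
    base_ppl_fn: Callable[[str], float],
) -> float:
    """Score one response generator (teacher) against one base model.

    reward_fn scores response quality (e.g. via an off-the-shelf reward
    model); base_ppl_fn returns the base model's perplexity on a response,
    used here as an assumed proxy for teacher/base-model compatibility.
    """
    avg_reward = sum(map(reward_fn, responses)) / len(responses)
    avg_ppl = sum(map(base_ppl_fn, responses)) / len(responses)
    # Assumed form: reward discounted by incompatibility (high perplexity).
    # The paper's actual CAR definition may combine these terms differently.
    return avg_reward / avg_ppl

if __name__ == "__main__":
    # Toy data: responses from two hypothetical teachers to the same
    # synthetic instructions.
    teacher_responses: Dict[str, List[str]] = {
        "teacher-70b": ["long detailed answer", "another elaborate reply"],
        "teacher-8b": ["short answer", "plain reply"],
    }
    reward_fn = lambda r: float(len(r))        # stand-in for a reward model
    base_ppl_fn = lambda r: 1.0 + len(set(r))  # stand-in for base-model perplexity

    scores = {
        name: compatibility_adjusted_reward(rs, reward_fn, base_ppl_fn)
        for name, rs in teacher_responses.items()
    }
    # Rank teachers: the highest CAR is the predicted best teacher for
    # this base model, which need not be the largest model.
    for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: CAR={score:.3f}")
```

The point of the ranking step is the paper's central claim: once compatibility enters the score, the largest or strongest teacher is not guaranteed to come out on top.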