Paper page - On Teacher Hacking in Language Model Distillation
arxiv:2502.02671

On Teacher Hacking in Language Model Distillation

Published on Feb 4, 2025 · Submitted by Daniil Tiapkin on Feb 6, 2025
Authors: Daniil Tiapkin, Daniele Calandriello, Johan Ferret, Sarah Perrin, Nino Vieillard, Alexandre Ramé, Mathieu Blondel
Abstract

Experiments reveal that knowledge distillation can lead to teacher hacking, an issue akin to reward hacking, but it can be mitigated by using online data generation techniques that enhance data diversity.

AI-generated summary

Post-training of language models (LMs) increasingly relies on the following two stages: (i) knowledge distillation, where the LM is trained to imitate a larger teacher LM, and (ii) reinforcement learning from human feedback (RLHF), where the LM is aligned by optimizing a reward model. In the second RLHF stage, a well-known challenge is reward hacking, where the LM over-optimizes the reward model. Such a phenomenon is in line with Goodhart's law and can lead to degraded performance on the true objective. In this paper, we investigate whether a similar phenomenon, which we call teacher hacking, can occur during knowledge distillation. This could arise because the teacher LM is itself an imperfect approximation of the true distribution. To study this, we propose a controlled experimental setup involving: (i) an oracle LM representing the ground-truth distribution, (ii) a teacher LM distilled from the oracle, and (iii) a student LM distilled from the teacher. Our experiments reveal the following insights. When using a fixed offline dataset for distillation, teacher hacking occurs; moreover, we can detect it by observing when the optimization process deviates from polynomial convergence laws. In contrast, employing online data generation techniques effectively mitigates teacher hacking. More precisely, we identify data diversity as the key factor in preventing hacking. Overall, our findings provide a deeper understanding of the benefits and limitations of distillation for building robust and efficient LMs.
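To make the controlled setup more concrete, here is a minimal sketch that mimics it with toy categorical distributions instead of full language models: an oracle distribution, a teacher that only imperfectly approximates it, and a student fit to a fixed offline sample drawn from the teacher, while both the distance to the teacher and the distance to the oracle are tracked. Everything in the snippet (vocabulary size, noise level, sample size, learning rate, the use of KL divergence) is an illustrative assumption, not the authors' implementation.

```python
# Toy sketch of the oracle -> teacher -> student setup, using small categorical
# distributions in place of full LMs. All constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
V = 20  # toy "vocabulary" size


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def kl(p, q, eps=1e-12):
    # KL(p || q) between two categorical distributions.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


# (i) Oracle: stands in for the ground-truth distribution.
oracle = softmax(rng.normal(size=V))

# (ii) Teacher: an imperfect approximation of the oracle (oracle logits + noise).
teacher = softmax(np.log(oracle) + 0.5 * rng.normal(size=V))

# Fixed offline dataset: drawn once from the teacher and reused at every step.
offline_sample = rng.choice(V, size=200, p=teacher)
empirical_teacher = np.bincount(offline_sample, minlength=V) / len(offline_sample)

# (iii) Student: parameterised by logits, trained by gradient descent on the
# cross-entropy to the empirical teacher distribution (offline distillation).
student_logits = np.zeros(V)
lr = 0.5

for step in range(2001):
    student = softmax(student_logits)
    # d/dlogits CE(empirical_teacher, student) = student - empirical_teacher
    student_logits -= lr * (student - empirical_teacher)

    if step % 400 == 0:
        proxy = kl(teacher, student)   # proxy metric: closeness to the teacher
        golden = kl(oracle, student)   # golden metric: closeness to the oracle
        print(f"step {step:4d}  KL(teacher||student)={proxy:.4f}  "
              f"KL(oracle||student)={golden:.4f}")
```

In this picture, teacher hacking corresponds to the student getting ever closer to the (empirical) teacher while stalling or drifting away from the oracle. The abstract points to two further handles studied in the paper: a deviation from polynomial convergence laws as a detection signal in the offline setting, and online data generation, with data diversity as the key factor, as a mitigation.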

Community


This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

- Self-Evolution Knowledge Distillation for LLM-based Machine Translation (https://huggingface.co/papers/2412.15303) (2024)
- Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting (https://huggingface.co/papers/2412.17846) (2024)
- Knowledge Injection via Prompt Distillation (https://huggingface.co/papers/2412.14964) (2024)
- Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision (https://huggingface.co/papers/2501.07886) (2025)
- Chunk-Distilled Language Modeling (https://huggingface.co/papers/2501.00343) (2024)
- MoPD: Mixture-of-Prompts Distillation for Vision-Language Models (https://huggingface.co/papers/2412.19087) (2024)
- Online Preference Alignment for Language Models via Count-based Exploration (https://huggingface.co/papers/2501.12735) (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2502.02671 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2502.02671 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2502.02671 in a Space README.md to link it from this page.

Collections including this paper 6