IterPref: Focal Preference Learning for Code Generation via Iterative Debugging
Authors: Jie Wu, Haoling Li, Xin Zhang, Jianwen Luo, Yangyu Huang, Ruihang Chu, Yujiu Yang, Scarlett Li
Published: 2025-03-04
Code: https://github.com/JieWu02/Target-DPO
Abstract
IterPref refines Code LLMs by aligning error regions through iterative debugging, improving performance and reducing errors using the CodeFlow dataset.
Preference learning enhances Code LLMs beyond supervised fine-tuning by
leveraging relative quality comparisons. Existing methods construct preference
pairs from candidates based on test case success, treating the higher pass rate sample
as positive and the lower as negative. However, this approach does not pinpoint
specific errors in the code: aligning failing code as a whole lacks the
granularity needed to capture meaningful error-resolution relationships, which
prevents the model from learning more informative error-correction patterns.
To address these issues, we propose IterPref, a new preference
alignment framework that mimics human iterative debugging to refine Code LLMs.
IterPref explicitly locates error regions and aligns the corresponding tokens
via a tailored DPO algorithm. To generate informative pairs, we introduce the
CodeFlow dataset, where samples are iteratively refined until passing tests,
with modifications capturing error corrections. Extensive experiments show that
a diverse suite of Code LLMs equipped with IterPref achieves significant
performance gains in code generation and improves on challenging tasks like
BigCodeBench. In-depth analysis reveals that IterPref yields fewer errors. Our
code and data will be made publicly available.
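
The abstract does not spell out how error regions are located or how the tailored DPO objective is defined, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that an error region can be approximated by diffing a failing version against its fixed version at the token level, and that the rejected side of a standard DPO loss is restricted to those flagged tokens. The function names `changed_token_mask` and `masked_dpo_loss`, the whitespace tokenization, and the masking scheme are all hypothetical illustrations, not the authors' implementation.

```python
import difflib
import math


def changed_token_mask(rejected_tokens, chosen_tokens):
    """Return a 0/1 mask over rejected_tokens.

    A token is flagged 1 when the diff against chosen_tokens marks it as
    replaced or deleted, i.e. the fix touched it (assumed "error region").
    """
    matcher = difflib.SequenceMatcher(
        a=rejected_tokens, b=chosen_tokens, autojunk=False
    )
    mask = [0] * len(rejected_tokens)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                mask[i] = 1
    return mask


def masked_dpo_loss(chosen_logps, chosen_ref_logps,
                    rejected_logps, rejected_ref_logps,
                    rejected_mask, beta=0.1):
    """DPO-style loss where only flagged rejected tokens contribute.

    All *_logps arguments are per-token log-probabilities (plain floats).
    Assumption: the chosen sequence is used in full, while the rejected
    sequence is summed only over tokens with mask value 1.
    """
    chosen_ratio = sum(p - r for p, r in zip(chosen_logps, chosen_ref_logps))
    rejected_ratio = sum(
        m * (p - r)
        for p, r, m in zip(rejected_logps, rejected_ref_logps, rejected_mask)
    )
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy example: the failing version subtracts, the fixed version adds.
rejected = "return a - b".split()
chosen = "return a + b".split()
mask = changed_token_mask(rejected, chosen)  # only the "-" token is flagged
```

In this sketch, tokens shared by both versions (such as `return`, `a`, and `b` in the toy example) do not push the rejected log-ratio in either direction, so the gradient signal concentrates on the edited token. Whether IterPref masks in this way, or also treats the chosen side specially, is not stated in the abstract.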