Evaluating Language Models as Synthetic Data Generators

Authors: Seungone Kim, Juyoung Suk, Xiang Yue, Vijay Viswanathan, Seongyun Lee, Yizhong Wang, Kiril Gashteovski, Carolin Lawrence, Sean Welleck, Graham Neubig

Published: 2024-12-04 · arXiv: 2412.03679 · Code: https://github.com/neulab/data-agora
AI-generated summary

AgoraBench evaluates language models' data generation abilities, revealing their distinct strengths and demonstrating various factors that impact effectiveness.

Abstract
Given the increasing use of synthetic data in language model (LM)
post-training, an LM's ability to generate high-quality data has become nearly
as crucial as its ability to solve problems directly. While prior works have
focused on developing effective data generation methods, they lack systematic
comparison of different LMs as data generators in a unified setting. To address
this gap, we propose AgoraBench, a benchmark that provides standardized
settings and metrics to evaluate LMs' data generation abilities. Through
synthesizing 1.26 million training instances using 6 LMs and training 99
student models, we uncover key insights about LMs' data generation
capabilities. First, we observe that LMs exhibit distinct strengths. For
instance, GPT-4o excels at generating new problems, while Claude-3.5-Sonnet
performs better at enhancing existing ones. Furthermore, our analysis reveals
that an LM's data generation ability doesn't necessarily correlate with its
problem-solving ability. Instead, multiple intrinsic features of data
quality, including response quality, perplexity, and instruction
difficulty, collectively serve as better indicators. Finally, we demonstrate
that strategic choices in output format and cost-conscious model selection
significantly impact data generation effectiveness.
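
The abstract highlights intrinsic features of data quality, such as response perplexity under a reference model, as better predictors of data generation ability than raw problem-solving scores. As an illustration only (not the paper's actual pipeline), the sketch below scores a synthetic instruction-response pair by its response perplexity under a small Hugging Face causal LM; the model choice ("gpt2") and the instance format are assumptions.

```python
# Illustrative sketch only: scoring a synthetic training instance by response
# perplexity under a small reference LM. The model ("gpt2") and data format
# are assumptions, not AgoraBench's actual implementation.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # any causal LM works; the paper's choice may differ
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def response_perplexity(instruction: str, response: str) -> float:
    """Perplexity of `response` conditioned on `instruction` (approximate)."""
    prompt_ids = tokenizer(instruction, return_tensors="pt").input_ids
    full_ids = tokenizer(instruction + "\n" + response, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # exclude prompt tokens from the loss
    with torch.no_grad():
        loss = model(full_ids, labels=labels).loss  # mean NLL over response tokens
    return math.exp(loss.item())

# Hypothetical synthetic instance produced by a data-generator LM.
instance = {
    "instruction": "Write a function that returns the n-th Fibonacci number.",
    "response": "def fib(n):\n    a, b = 0, 1\n    for _ in range(n):\n        a, b = b, a + b\n    return a",
}
print(f"response perplexity: {response_perplexity(**instance):.2f}")
```

In the paper, such intrinsic signals (response quality, perplexity, instruction difficulty) are considered collectively; this snippet only illustrates one of them in isolation.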