Paper page - Making, not Taking, the Best of N
arXiv: 2510.00931 · Published 2025-10-01 · Cohere Labs
Authors: Ammar Khairi, Daniel D'souza, Marzieh Fadaee, Julia Kreutzer
AI-generated summary
The Fusion-of-N (FusioN) method improves LLM generation quality by synthesizing elements from multiple samples, outperforming Best-of-N across various settings and tasks.
Obtaining high-quality generations from modern LLMs has largely been framed as
a selection problem: identifying a single winning generation from a diverse
pool of N samples, known as Best-of-N (BoN). Yet this approach is inherently
zero-sum, discarding diverse and potentially useful information from the pool.
Instead, we explore a collaborative setup in which all candidates can
contribute to the final winning generation. To this end, we propose Fusion-of-N
(FusioN): a method that uses a general LLM judge to synthesize the most
informative elements of each sample into a single final answer. We compare
FusioN to BoN in two settings: (i) test-time scaling, where we sample and
aggregate from a single model at test time, and (ii) synthetic data generation,
where we fuse samples from a pool of diverse teachers to improve a student
model. We extensively benchmark both setups across 11 languages, 3 diverse
tasks, and varying model scales. Across the benchmark, FusioN consistently
outperforms BoN, showing versatility and robustness both in test-time scaling
and in downstream gains from synthetic data generation. Extensive analysis
further shows that FusioN retains surprising strengths and robustness under
challenging settings. These results suggest we should shift how we think about
evaluating and utilizing LLM generations: from a monolithic measure of quality
to embracing their polylithic nature. This shift allows us to integrate diverse
strengths, unlock latent potential, and achieve improvements that were
previously inaccessible through selection alone.
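To make the selection-versus-synthesis contrast concrete, here is a minimal sketch. It is not the paper's implementation: `score` (a toy quality proxy) and `fuse` (a stand-in for the LLM-judge call that synthesizes a pool of samples) are hypothetical placeholders, and each "sample" is reduced to the set of facts it covers.

```python
def best_of_n(samples, score):
    """BoN: select the single highest-scoring sample; all others are discarded."""
    return max(samples, key=score)

def fusion_of_n(samples, fuse):
    """FusioN: synthesize informative elements of ALL samples into one answer.
    Here `fuse` stands in for the LLM-judge fusion step described in the paper."""
    return fuse(samples)

# Toy pool: each sample covers a different subset of useful facts.
samples = [
    {"facts": {"a", "b"}},
    {"facts": {"b", "c"}},
    {"facts": {"d"}},
]

score = lambda s: len(s["facts"])  # hypothetical quality proxy: fact coverage
fuse = lambda pool: {"facts": set().union(*(s["facts"] for s in pool))}

bon = best_of_n(samples, score)        # keeps one sample's facts only
fusion = fusion_of_n(samples, fuse)    # combines facts across the whole pool

print(len(bon["facts"]), len(fusion["facts"]))  # → 2 4
```

The zero-sum nature of BoN is visible directly: no single sample covers all four facts, so selection caps coverage at two, while fusion recovers everything the pool collectively contains.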