
arxiv:2510.00931

Making, not Taking, the Best of N

Published on Oct 1, 2025 · Submitted by Ammar Khairi on Oct 2, 2025
Authors: Ammar Khairi, Daniel D'souza, Marzieh Fadaee, Julia Kreutzer
Abstract

The Fusion-of-N (FusioN) method improves LLM generation quality by synthesizing elements from multiple samples, outperforming Best-of-N across a range of settings and tasks.

AI-generated summary

Obtaining high-quality generations from modern LLMs has largely been framed as a selection problem: identifying a single winning generation from a diverse pool of N samples, known as Best-of-N (BoN). Yet this approach is inherently zero-sum, discarding diverse and potentially useful information from the pool. Instead, we explore a collaborative setup in which all candidates can contribute to the final winning generation. To this end, we propose Fusion-of-N (FusioN): a method that uses a general LLM judge to synthesize the most informative elements of each sample into a single final answer. We compare FusioN to BoN in two settings: (i) test-time scaling, where we sample and aggregate from a single model at test time, and (ii) synthetic data generation, where we fuse samples from a pool of diverse teachers to improve a student model. We extensively benchmark both setups across 11 languages, 3 diverse tasks, and varying model scales. Across the board, FusioN consistently outperforms BoN, showing versatility and robustness both in test-time scaling and in downstream gains from synthetic data generation. Extensive analysis further shows that FusioN retains surprising strengths and robustness under challenging settings. These results suggest a shift in how we evaluate and utilize LLM generations: from a monolithic measure of quality to embracing their polylithic nature. This shift allows us to integrate diverse strengths, unlock latent potential, and achieve improvements that were previously inaccessible through selection alone.
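The structural contrast between the two strategies can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the paper uses a general LLM judge for both scoring and fusion, while the `toy_judge_score` and sentence-merging logic below are hypothetical stand-ins that keep the sketch runnable and show why selection is zero-sum but fusion lets every candidate contribute.

```python
def toy_judge_score(candidate: str) -> float:
    """Stand-in for an LLM judge's quality score (here: a crude word-count proxy)."""
    return len(candidate.split())

def best_of_n(candidates: list[str]) -> str:
    """BoN: zero-sum selection -- keep the single highest-scored sample, discard the rest."""
    return max(candidates, key=toy_judge_score)

def fusion_of_n(candidates: list[str]) -> str:
    """FusioN-style synthesis: every candidate may contribute.

    A real implementation would prompt an LLM judge with all N samples and
    ask it to synthesize their most informative elements into one answer;
    merging unique sentences here is only a structural placeholder.
    """
    seen: set[str] = set()
    fused: list[str] = []
    for cand in candidates:
        for sent in cand.split(". "):
            key = sent.strip().rstrip(".").lower()
            if key and key not in seen:
                seen.add(key)
                fused.append(sent.strip().rstrip("."))
    return ". ".join(fused) + "."

samples = [
    "Paris is the capital of France.",
    "Paris is the capital of France. It lies on the Seine.",
    "The city hosted the 2024 Olympics.",
]
print(best_of_n(samples))    # one winner; the third sample's fact is discarded
print(fusion_of_n(samples))  # all three samples contribute to the final answer
```

Note how the fused output keeps the Olympics fact that BoN throws away: that is the information loss the paper's "zero-sum" framing refers to.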

Community

Paper author · Paper submitter

Introducing Fusion-of-N: a simple and powerful way to advance inference and distillation beyond Best-of-N.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2510.00931 in a model README.md to link it from this page.

Datasets citing this paper 5


Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2510.00931 in a Space README.md to link it from this page.

Collections including this paper 1