Fractured Chain-of-Thought Reasoning
Published on May 19, 2025
Abstract
Fractured Sampling optimizes inference in large language models by balancing token usage and accuracy through truncated reasoning trajectories.
Inference-time scaling techniques have significantly bolstered the reasoning
capabilities of large language models (LLMs) by harnessing additional
computational effort at inference without retraining. Similarly,
Chain-of-Thought (CoT) prompting and its extension, Long CoT, improve accuracy
by generating rich intermediate reasoning trajectories, but these approaches
incur substantial token costs that impede their deployment in latency-sensitive
settings. In this work, we first show that truncated CoT, which stops reasoning
before completion and directly generates the final answer, often matches full
CoT sampling while using dramatically fewer tokens. Building on this insight,
we introduce Fractured Sampling, a unified inference-time strategy that
interpolates between full CoT and solution-only sampling along three orthogonal
axes: (1) the number of reasoning trajectories, (2) the number of final
solutions per trajectory, and (3) the depth at which reasoning traces are
truncated. Through extensive experiments on five diverse reasoning benchmarks
and several model scales, we demonstrate that Fractured Sampling consistently
achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling
gains in Pass@k versus token budget. Our analysis reveals how to allocate
computation across these dimensions to maximize performance, paving the way for
more efficient and scalable LLM reasoning.
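
Below is a minimal sketch of how the three sampling axes described in the abstract could compose. It is an illustration only, not the authors' implementation (see the linked repo, https://github.com/BaohaoLiao/frac-cot, for that); the helper functions `sample_reasoning` and `sample_solution` are hypothetical wrappers around an LLM sampling API.

```python
# Illustrative sketch of Fractured Sampling's three axes (not the authors' code).
# Assumptions: `sample_reasoning(prompt)` returns one chain-of-thought trace as a
# list of steps, and `sample_solution(prompt, partial_trace)` samples a final
# answer conditioned on a (possibly truncated) reasoning trace. Both are
# hypothetical LLM-backed callables supplied by the user.

from typing import Callable, List


def fractured_sampling(
    prompt: str,
    sample_reasoning: Callable[[str], List[str]],
    sample_solution: Callable[[str, str], str],
    n_trajectories: int = 4,      # axis 1: number of independent reasoning trajectories
    n_solutions: int = 2,         # axis 2: final solutions sampled per truncation point
    truncation_fractions: tuple = (0.25, 0.5, 0.75, 1.0),  # axis 3: depths at which to cut the trace
) -> List[str]:
    """Collect candidate answers across trajectories, truncation depths, and solutions."""
    candidates = []
    for _ in range(n_trajectories):
        steps = sample_reasoning(prompt)              # draw one reasoning trajectory
        for frac in truncation_fractions:
            cut = max(1, int(len(steps) * frac))      # truncate the trace at this depth
            partial_trace = "\n".join(steps[:cut])
            for _ in range(n_solutions):
                candidates.append(sample_solution(prompt, partial_trace))
    return candidates
```

In this framing, full CoT sampling corresponds to keeping only the full-depth truncation with one solution per trace, while solution-only sampling corresponds to truncating at (near) zero depth; the three knobs let the caller trade token budget for candidate diversity, which is the accuracy-cost trade-off the abstract refers to.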