LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
Authors: Dacheng Li, Shiyi Cao, Tyler Griggs, Shu Liu, Xiangxi Mo, Shishir G. Patil, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica
Published: 2025-02-11 (paper ID 2502.07374)
Data-efficient supervised fine-tuning and parameter-efficient low-rank adaptation enable large language models to learn long chain-of-thought reasoning effectively, with structural correctness being more crucial than content accuracy in training samples.
AI-generated summary
Large reasoning models (LRMs) tackle complex reasoning problems by following
long chains of thought (Long CoT) that incorporate reflection, backtracking,
and self-validation. However, the training techniques and data requirements to
elicit Long CoT remain poorly understood. In this work, we find that a large
language model (LLM) can effectively learn Long CoT reasoning through
data-efficient supervised fine-tuning (SFT) and parameter-efficient low-rank
adaptation (LoRA). With just 17k Long CoT training samples, the
Qwen2.5-32B-Instruct model achieves significant improvements on a wide range of
math and coding benchmarks, including 56.7% (+40.0%) on AIME 2024 and 57.0%
(+8.1%) on LiveCodeBench, competitive with the proprietary o1-preview model's
scores of 44.6% and 59.1%. More importantly, we find that the structure of Long
CoT is critical to the learning process, whereas the content of individual
reasoning steps has minimal impact. Perturbations affecting content, such as
training on incorrect samples or removing reasoning keywords, have little
impact on performance. In contrast, structural modifications that disrupt
logical consistency in the Long CoT, such as shuffling or deleting reasoning
steps, significantly degrade accuracy. For example, a model trained on Long CoT
samples with incorrect answers achieves accuracy only 3.2% lower than a model
trained on fully correct samples. These insights deepen our understanding
of how to elicit reasoning capabilities in LLMs and highlight key
considerations for efficiently training the next generation of reasoning
models. This is the academic paper for our previously released Sky-T1-32B-Preview
model. Code is available at https://github.com/NovaSky-AI/SkyThought.
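
The contrast the abstract draws between structural and content perturbations can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not say how reasoning steps are delimited or which keywords were removed, so the blank-line step delimiter, the `REASONING_KEYWORDS` list, and all function names below are hypothetical choices made only for illustration.

```python
import random
import re

# Hypothetical keyword list; the abstract does not specify the actual one.
REASONING_KEYWORDS = ("wait", "alternatively", "hmm")


def split_steps(cot: str) -> list[str]:
    """Split a chain of thought into steps on blank lines (an assumed delimiter)."""
    return [s.strip() for s in cot.split("\n\n") if s.strip()]


def shuffle_steps(steps: list[str], fraction: float, rng: random.Random) -> list[str]:
    """Structural perturbation: permute the steps at a random subset of positions."""
    k = round(len(steps) * fraction)
    positions = rng.sample(range(len(steps)), k)
    targets = positions[:]
    rng.shuffle(targets)
    out = list(steps)
    # Each chosen position receives a step taken from another chosen position.
    for src, dst in zip(positions, targets):
        out[dst] = steps[src]
    return out


def delete_steps(steps: list[str], fraction: float, rng: random.Random) -> list[str]:
    """Structural perturbation: drop a random subset of steps, keeping the rest in order."""
    k = round(len(steps) * fraction)
    drop = set(rng.sample(range(len(steps)), k))
    return [s for i, s in enumerate(steps) if i not in drop]


def remove_keywords(steps: list[str], keywords=REASONING_KEYWORDS) -> list[str]:
    """Content perturbation: strip keywords while leaving step order and count intact."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(" + alternatives + r")\b[,.]?\s*", re.IGNORECASE)
    return [pattern.sub("", s).strip() for s in steps]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbations are reproducible
    cot = "Compute 2 + 3.\n\nWait, let me recheck.\n\nThe sum is 5.\n\nAnswer: 5"
    steps = split_steps(cot)
    print(shuffle_steps(steps, 1.0, rng))
    print(delete_steps(steps, 0.5, rng))
    print(remove_keywords(steps))
```

Only `shuffle_steps` and `delete_steps` change the order or number of steps; `remove_keywords` edits wording inside each step and leaves the sequence untouched, which corresponds to the kind of perturbation the abstract reports as having little impact.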