Paper page - Implicit Reasoning in Transformers is Reasoning through Shortcuts
\n","updatedAt":"2025-03-14T01:38:54.567Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7523351907730103},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2503.07604","authors":[{"_id":"67cfa4ecd8cb8688d7d6d8b5","name":"Tianhe Lin","hidden":false},{"_id":"67cfa4ecd8cb8688d7d6d8b6","user":{"_id":"62d65139667051e0a29bffe7","avatarUrl":"/avatars/0252aa2bcd4cf1c8e4b87e5f164b6da5.svg","isPro":false,"fullname":"Jian Xie","user":"hsaest","type":"user"},"name":"Jian Xie","status":"claimed_verified","statusLastChangedAt":"2025-03-12T08:42:36.765Z","hidden":false},{"_id":"67cfa4ecd8cb8688d7d6d8b7","user":{"_id":"62d62b333bf5e059f7d2b286","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1668513815771-62d62b333bf5e059f7d2b286.jpeg","isPro":false,"fullname":"Siyu Yuan","user":"siyuyuan","type":"user"},"name":"Siyu Yuan","status":"admin_assigned","statusLastChangedAt":"2025-03-12T15:48:31.002Z","hidden":false},{"_id":"67cfa4ecd8cb8688d7d6d8b8","name":"Deqing Yang","hidden":false}],"publishedAt":"2025-03-10T17:58:31.000Z","submittedOnDailyAt":"2025-03-12T01:01:59.633Z","title":"Implicit Reasoning in Transformers is Reasoning through Shortcuts","submittedOnDailyBy":{"_id":"62d65139667051e0a29bffe7","avatarUrl":"/avatars/0252aa2bcd4cf1c8e4b87e5f164b6da5.svg","isPro":false,"fullname":"Jian Xie","user":"hsaest","type":"user"},"summary":"Test-time compute is emerging as a new paradigm for enhancing language\nmodels' complex multi-step reasoning capabilities, as demonstrated by the\nsuccess of OpenAI's o1 and o3, as well as DeepSeek's R1. Compared to explicit\nreasoning in test-time compute, implicit reasoning is more inference-efficient,\nrequiring fewer generated tokens. However, why does the advanced reasoning\ncapability fail to emerge in the implicit reasoning style? In this work, we\ntrain GPT-2 from scratch on a curated multi-step mathematical reasoning dataset\nand conduct analytical experiments to investigate how language models perform\nimplicit reasoning in multi-step tasks. Our findings reveal: 1) Language models\ncan perform step-by-step reasoning and achieve high accuracy in both in-domain\nand out-of-domain tests via implicit reasoning. However, this capability only\nemerges when trained on fixed-pattern data. 2) Conversely, implicit reasoning\nabilities emerging from training on unfixed-pattern data tend to overfit a\nspecific pattern and fail to generalize further. Notably, this limitation is\nalso observed in state-of-the-art large language models. 
These findings suggest\nthat language models acquire implicit reasoning through shortcut learning,\nenabling strong performance on tasks with similar patterns while lacking\ngeneralization.","upvotes":23,"discussionId":"67cfa4edd8cb8688d7d6d908","githubRepo":"https://github.com/TianheL/LM-Implicit-Reasoning","githubRepoAddedBy":"user","ai_summary":"Language models can perform implicit reasoning for multi-step tasks but overfit when trained on unfixed-pattern data, leading to poor generalization.","ai_keywords":["test-time compute","implicit reasoning","GPT-2","multi-step mathematical reasoning","step-by-step reasoning","in-domain tests","out-of-domain tests","fixed-pattern data","unfixed-pattern data","shortcut learning","generalization"],"githubStars":17},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"62d65139667051e0a29bffe7","avatarUrl":"/avatars/0252aa2bcd4cf1c8e4b87e5f164b6da5.svg","isPro":false,"fullname":"Jian Xie","user":"hsaest","type":"user"},{"_id":"646dbd9fdf618b303b4073ef","avatarUrl":"/avatars/5b3bf8373c8ad403b7729e02e4fed224.svg","isPro":false,"fullname":"THLin","user":"nishinonana","type":"user"},{"_id":"63f86b099f87cc3e645b51d9","avatarUrl":"/avatars/27ca5ba425640bf67474cee871e8e53a.svg","isPro":false,"fullname":"Ellie Chen","user":"sheep33333","type":"user"},{"_id":"67919d669bed6c0d6aa573f5","avatarUrl":"/avatars/f115862341a80dec2336f27f8ee8d38f.svg","isPro":false,"fullname":"yang","user":"fengfan933","type":"user"},{"_id":"65192fc334c26962537eb300","avatarUrl":"/avatars/25705437259469e5fe69819304079849.svg","isPro":false,"fullname":"ssss","user":"ADOHAHA123","type":"user"},{"_id":"64f2a228f40f35cfa3e8edfd","avatarUrl":"/avatars/0671cb4df8f3d3bcaaa95aad3d0a46c2.svg","isPro":false,"fullname":"Siye Wu","user":"Siye01","type":"user"},{"_id":"65bef8ca05dbcdb7c17758f4","avatarUrl":"/avatars/e49f09824c09cbca3d9d4637d14a8587.svg","isPro":false,"fullname":"Jinghan Xu","user":"Y-0L0","type":"user"},{"_id":"643f9e2288d9d4488fd81c52","avatarUrl":"/avatars/e589c9cbd47022883cf33d7555bee89c.svg","isPro":false,"fullname":"Tinghui Zhu","user":"DarthZhu","type":"user"},{"_id":"66f612b934b8ac9ffa44f084","avatarUrl":"/avatars/6836c122e19c66c90f1673f28b30d7f0.svg","isPro":false,"fullname":"Tang","user":"tommysally","type":"user"},{"_id":"64994245f8069251837c3a0c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64994245f8069251837c3a0c/3Ig0jgUDmibzEPWqFxmUg.png","isPro":false,"fullname":"CaiyuHu","user":"caiyuhu","type":"user"},{"_id":"6749b9b54431ba7184411328","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/c2DvvGF_Ga5rKY9iJuyib.png","isPro":false,"fullname":"Xinfeng","user":"Joanna-Yuan","type":"user"},{"_id":"677314621db9d72552e9c4e5","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/RzZ5OVKejz_PpfmdC9aCR.png","isPro":false,"fullname":"Jerry Ji","user":"jerryji","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
AI-generated summary
Language models can perform implicit reasoning for multi-step tasks but overfit when trained on unfixed-pattern data, leading to poor generalization.
Abstract
Test-time compute is emerging as a new paradigm for enhancing language models' complex multi-step reasoning capabilities, as demonstrated by the success of OpenAI's o1 and o3, as well as DeepSeek's R1. Compared to explicit reasoning in test-time compute, implicit reasoning is more inference-efficient, requiring fewer generated tokens. Why, then, does advanced reasoning capability fail to emerge in the implicit reasoning style? In this work, we train GPT-2 from scratch on a curated multi-step mathematical reasoning dataset and conduct analytical experiments to investigate how language models perform implicit reasoning in multi-step tasks. Our findings reveal: 1) language models can perform step-by-step reasoning and achieve high accuracy in both in-domain and out-of-domain tests via implicit reasoning, but this capability emerges only when they are trained on fixed-pattern data; 2) conversely, implicit reasoning abilities that emerge from training on unfixed-pattern data tend to overfit a specific pattern and fail to generalize further; notably, this limitation is also observed in state-of-the-art large language models. These findings suggest that language models acquire implicit reasoning through shortcut learning, enabling strong performance on tasks with similar patterns while lacking generalization.
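The abstract's contrast between fixed-pattern and unfixed-pattern training data can be made concrete. The sketch below is a hypothetical reconstruction, not the authors' released code: the modulus, operator set, and corpus sizes are illustrative assumptions. Each example chains several arithmetic steps, the implicit format keeps only the question and final answer, and the unfixed-pattern variant varies the operator sequence across examples.

```python
import random

MOD = 100  # hypothetical choice: keeps every value inside a small, fixed vocabulary

def make_example(num_steps: int = 3, fixed_pattern: bool = True) -> str:
    """Build one multi-step arithmetic problem in *implicit* format:
    the training target is only the final answer, with no intermediate steps."""
    operands = [random.randrange(MOD) for _ in range(num_steps + 1)]
    if fixed_pattern:
        # Fixed pattern: every example uses the same operator sequence,
        # e.g. always (+, -, +, ...), so the reasoning procedure never varies.
        ops = (["+", "-"] * num_steps)[:num_steps]
    else:
        # Unfixed pattern: operators are sampled per example, so the
        # step-by-step procedure the model must internalize keeps changing.
        ops = [random.choice(["+", "-"]) for _ in range(num_steps)]

    question, result = str(operands[0]), operands[0]
    for op, x in zip(ops, operands[1:]):
        question += f"{op}{x}"
        result = (result + x) % MOD if op == "+" else (result - x) % MOD

    return f"{question}={result}"  # implicit: no chain-of-thought tokens

# Illustrative corpora for training GPT-2 from scratch on each regime.
fixed_corpus = [make_example(fixed_pattern=True) for _ in range(100_000)]
unfixed_corpus = [make_example(fixed_pattern=False) for _ in range(100_000)]
print(fixed_corpus[0], unfixed_corpus[0])
```

An explicit chain-of-thought version of the same example would serialize every intermediate result between "=" and the answer, which is exactly the extra inference-time token cost the abstract attributes to explicit reasoning.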
This paper finds that LMs can perform stepwise implicit reasoning when trained on fixed-pattern data, yet this capability arises through shortcuts and does not generalize.
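One way to test whether such a capability reflects genuine stepwise computation or a shortcut is to probe the trained model's hidden states for intermediate results, broadly the style of analytical experiment the abstract mentions. The outline below is a hypothetical sketch, not the paper's actual procedure: it assumes hidden states have already been exported to .npy files, and the file names, layer choice, and probe target are all illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed inputs: hidden_states[i] is the layer-6 activation of the trained
# model at the "=" token of example i; intermediate[i] is the true value
# after the first arithmetic step of that example. Both files are hypothetical.
hidden_states = np.load("hidden_states_layer6.npy")  # shape (N, d_model)
intermediate = np.load("first_step_values.npy")      # shape (N,)

X_tr, X_te, y_tr, y_te = train_test_split(
    hidden_states, intermediate, test_size=0.2, random_state=0
)

# If a linear probe recovers the intermediate value well above chance, the
# model plausibly computes that step internally; near-chance accuracy at
# every layer would instead point to a shortcut that skips the step.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
```

Repeating such a probe per layer and per reasoning step would distinguish a model that carries the full intermediate trace in its activations from one that maps surface patterns directly to answers.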