LLaMA Pro: Progressive LLaMA with Block Expansion
\n","updatedAt":"2024-06-08T23:34:17.928Z","author":{"_id":"6186ddf6a7717cb375090c01","avatarUrl":"/avatars/716b6a7d1094c8036b2a8a7b9063e8aa.svg","fullname":"Julien BLANCHON","name":"blanchon","type":"user","isPro":true,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":176,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.504166841506958},"editors":["blanchon"],"editorAvatarUrls":["/avatars/716b6a7d1094c8036b2a8a7b9063e8aa.svg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2401.02415","authors":[{"_id":"65978d26b4b5c254cb8b0924","name":"Chengyue Wu","hidden":false},{"_id":"65978d26b4b5c254cb8b0925","name":"Yukang Gan","hidden":false},{"_id":"65978d26b4b5c254cb8b0926","name":"Yixiao Ge","hidden":false},{"_id":"65978d26b4b5c254cb8b0927","user":{"_id":"635626a8ec32331b227f407b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/635626a8ec32331b227f407b/KRkAEN3eXN_mYzzJQ_8dO.jpeg","isPro":false,"fullname":"LuZeyu","user":"whlzy","type":"user"},"name":"Zeyu Lu","status":"claimed_verified","statusLastChangedAt":"2024-09-04T12:00:53.981Z","hidden":false},{"_id":"65978d26b4b5c254cb8b0928","user":{"_id":"65994517d3f2137415bb4d5b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65994517d3f2137415bb4d5b/PhjWetLTMn3bThWFLGCxR.png","isPro":false,"fullname":"WANG Jiahao","user":"techmonsterwang","type":"user"},"name":"Jiahao Wang","status":"claimed_verified","statusLastChangedAt":"2024-04-11T09:02:17.717Z","hidden":false},{"_id":"65978d26b4b5c254cb8b0929","name":"Ye Feng","hidden":false},{"_id":"65978d26b4b5c254cb8b092a","name":"Ping Luo","hidden":false},{"_id":"65978d26b4b5c254cb8b092b","name":"Ying Shan","hidden":false}],"publishedAt":"2024-01-04T18:59:12.000Z","submittedOnDailyAt":"2024-01-05T02:31:27.313Z","title":"LLaMA Pro: Progressive LLaMA with Block Expansion","submittedOnDailyBy":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},"summary":"Humans generally acquire new skills without compromising the old; however,\nthe opposite holds for Large Language Models (LLMs), e.g., from LLaMA to\nCodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with\nan expansion of Transformer blocks. We tune the expanded blocks using only new\ncorpus, efficiently and effectively improving the model's knowledge without\ncatastrophic forgetting. In this paper, we experiment on the corpus of code and\nmath, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from\nLLaMA2-7B, excelling in general tasks, programming, and mathematics. LLaMA Pro\nand its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced\nperformance among various benchmarks, demonstrating superiority over existing\nopen models in the LLaMA family and the immense potential of reasoning and\naddressing diverse tasks as an intelligent agent. 
Our findings provide valuable\ninsights into integrating natural and programming languages, laying a solid\nfoundation for developing advanced language agents that operate effectively in\nvarious environments.","upvotes":54,"discussionId":"65978d27b4b5c254cb8b0956","githubRepo":"https://github.com/tencentarc/llama-pro","githubRepoAddedBy":"auto","ai_summary":"A new post-pretraining method using expanded Transformer blocks for Large Language Models improves knowledge without catastrophic forgetting, yielding LLaMA Pro-8.3B that excels in general tasks, programming, and mathematics.","ai_keywords":["Large Language Models","LLMs","post-pretraining method","Transformer blocks","catastrophic forgetting","LLaMA","CodeLLaMA","LLaMA Pro-8.3B","instruction-following","advanced performance","benchmarks","reasoning","intelligent agent"],"githubStars":514},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"612ee6a7b960e78c6d2319d4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/612ee6a7b960e78c6d2319d4/2Hu9BaAyXbyh1vt0v1Qui.jpeg","isPro":false,"fullname":"Qian Liu","user":"SivilTaram","type":"user"},{"_id":"640e9762b03f4cd29f58d982","avatarUrl":"/avatars/81da37d628163fe3e094b247c7c3a3b5.svg","isPro":false,"fullname":"Yixiao Ge","user":"yxgeee","type":"user"},{"_id":"62ecdc18b72a69615d6bd857","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/62ecdc18b72a69615d6bd857/qAHhWJbSsmoezFHiErBUT.png","isPro":true,"fullname":"Daniel (Unsloth)","user":"danielhanchen","type":"user"},{"_id":"640eb45dfdeaae139086c107","avatarUrl":"/avatars/4468296a032446b1109cbf79a585858b.svg","isPro":true,"fullname":"Elad Rave","user":"erave02","type":"user"},{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"6572aa5849719ff0da8a9c8c","avatarUrl":"/avatars/516d1d3aaa0439ab738977787ce9c7d4.svg","isPro":false,"fullname":"CLEMENT L","user":"LVXXX","type":"user"},{"_id":"628c07605f7c5912e46f58f6","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1653344093596-noauth.jpeg","isPro":false,"fullname":"Ilia Sidorenko","user":"noway","type":"user"},{"_id":"6311bca0ae8896941da24e66","avatarUrl":"/avatars/48de64894fc3c9397e26e4d6da3ff537.svg","isPro":false,"fullname":"Fynn KrΓΆger","user":"fynnkroeger","type":"user"},{"_id":"63284f86cbc744f197050300","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63284f86cbc744f197050300/cGbUDe5fn-8A8Jcmz5lre.png","isPro":false,"fullname":"Hoptimizer","user":"bunnycore","type":"user"},{"_id":"658317edd6cc28d6bd53f498","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/658317edd6cc28d6bd53f498/Y2SRjgS_UY_L00eeWbWBq.jpeg","isPro":false,"fullname":"Arthur Thouvenin","user":"athouvenin","type":"user"},{"_id":"64849395c830787e011af5e9","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/-KlTJXknmhqqq3MwSKRXN.jpeg","isPro":false,"fullname":"Matin mollapur","user":"Matinmollapur01","type":"user"},{"_id":"6032802e1f993496bc14d9e3","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6032802e1f993496bc14d9e3/w6hr-DEQot4VVkoyRIBiy.png","isPro":false,"fullname":"Omar Sanseviero","user":"osanseviero","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":3}">
AI-generated summary

A new post-pretraining method using expanded Transformer blocks for Large Language Models improves knowledge without catastrophic forgetting, yielding LLaMA Pro-8.3B, which excels in general tasks, programming, and mathematics.

Abstract
Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs based on an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on corpora of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B, excelling in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance across various benchmarks, demonstrating superiority over existing open models in the LLaMA family and immense potential for reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
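
The recipe described in the abstract, copy a subset of Transformer blocks, initialize the copies so they act as identities, freeze the original weights, and train only the new blocks on the new corpus, can be sketched in a few lines. The sketch below is a minimal illustration, assuming the Hugging Face transformers Llama module layout (model.model.layers, self_attn.o_proj, mlp.down_proj); the expand_blocks helper, the interleaving pattern, and the example checkpoint are illustrative assumptions, not the authors' released implementation (see https://github.com/tencentarc/llama-pro for that).

# Minimal sketch of block expansion with identity-initialized blocks.
# Assumes the Hugging Face `transformers` Llama layout; the helper name,
# interleaving pattern, and defaults are illustrative, not the paper's code.
import copy

import torch
from transformers import LlamaForCausalLM


def expand_blocks(model: LlamaForCausalLM, num_new_blocks: int = 8) -> LlamaForCausalLM:
    layers = list(model.model.layers)
    group = len(layers) // num_new_blocks  # one new block per group of originals
    expanded, new_indices = [], []
    for i, layer in enumerate(layers):
        expanded.append(layer)
        if (i + 1) % group == 0 and len(new_indices) < num_new_blocks:
            new_layer = copy.deepcopy(layer)
            # Zero the output projections so the copied block initially adds
            # nothing to the residual stream, i.e. it acts as an identity map.
            torch.nn.init.zeros_(new_layer.self_attn.o_proj.weight)
            torch.nn.init.zeros_(new_layer.mlp.down_proj.weight)
            new_indices.append(len(expanded))
            expanded.append(new_layer)

    model.model.layers = torch.nn.ModuleList(expanded)
    model.config.num_hidden_layers = len(expanded)
    # Keep per-layer bookkeeping (e.g. KV-cache indices) consistent, if present.
    for idx, layer in enumerate(model.model.layers):
        if hasattr(layer, "self_attn") and hasattr(layer.self_attn, "layer_idx"):
            layer.self_attn.layer_idx = idx

    # Freeze all original parameters; only the newly inserted blocks are
    # trained on the new corpus, which is what avoids catastrophic forgetting.
    for p in model.parameters():
        p.requires_grad = False
    for idx in new_indices:
        for p in model.model.layers[idx].parameters():
            p.requires_grad = True
    return model


model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = expand_blocks(model, num_new_blocks=8)  # 32 -> 40 decoder layers

Because the new blocks start as identity maps and the original weights are never updated, the expanded model reproduces the base model's behavior at initialization; adding 8 such blocks to the 32 decoder layers of LLaMA2-7B is consistent with the growth from 7B to the 8.3B parameters reported for LLaMA Pro.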