Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
Abstract
Skywork-MoE, a high-performance mixture-of-experts language model, uses upcycling and innovative techniques like gating logit normalization and adaptive auxiliary loss coefficients to achieve strong performance across benchmarks.
In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initializations. Our findings suggest that the choice between these two approaches should consider both the performance of the existing dense checkpoints and the MoE training budget. We highlight two innovative techniques: gating logit normalization, which improves expert diversification, and adaptive auxiliary loss coefficients, allowing for layer-specific adjustment of auxiliary loss coefficients. Our experimental results validate the effectiveness of these methods. Leveraging these techniques and insights, we trained our upcycled Skywork-MoE on a condensed subset of our SkyPile corpus. The evaluation results demonstrate that our model delivers strong performance across a wide range of benchmarks.
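The gating logit normalization mentioned above can be sketched roughly as follows: the router's logits are standardized per token (zero mean, unit variance) and scaled by a coefficient before the softmax, which controls how sharply the gate concentrates on a few experts. This is a minimal illustrative sketch, not the paper's exact formulation; the scale `lam`, the epsilon, and the top-2 routing are assumptions for the example.

```python
import numpy as np

def normalized_gating(logits, lam=1.0, eps=1e-6):
    """Gating logit normalization (sketch): standardize the router
    logits per token, then scale by lam before applying softmax.
    Larger lam sharpens the gate distribution."""
    mean = logits.mean(axis=-1, keepdims=True)
    std = logits.std(axis=-1, keepdims=True)
    z = lam * (logits - mean) / (std + eps)
    z -= z.max(axis=-1, keepdims=True)          # numerical stability
    probs = np.exp(z)
    return probs / probs.sum(axis=-1, keepdims=True)

def top2_route(probs):
    """Pick the top-2 experts per token and renormalize their weights."""
    idx = np.argsort(probs, axis=-1)[..., -2:]  # indices of the 2 largest gates
    w = np.take_along_axis(probs, idx, axis=-1)
    return idx, w / w.sum(axis=-1, keepdims=True)

# Route a batch of 4 token representations over 16 experts,
# matching the expert count reported for Skywork-MoE.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 16))
probs = normalized_gating(logits, lam=2.0)
experts, weights = top2_route(probs)
```

In a real MoE layer the gate would additionally carry an auxiliary load-balancing loss; the paper's adaptive variant adjusts that loss coefficient per layer, which this sketch does not attempt to reproduce.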
Community

puffy310: Sure, sent in a request. Going to get my phone so I can join the WeChat group.

puffy310: The WeChat QR link seems to link to an HF blog post.

AdinaY: Added ✅ Feel free to contribute and share your advice :) For WeChat, it's our public account; we usually share blogs and some news on it.

puffy310: I misread it as a group-chat link, haha. It would be a decent idea to have a group, though.

puffy310: I wonder if it would be a good idea to mirror the ModelScope versions of current Chinese LLMs; some of them have slight differences (especially the instruction-tuned models).

Ethan1234567: Great job.

Ethan1234567: In Section 3, "Upcycling vs. From Scratch", training the MoE model from scratch is better than upcycling. But in Section 5, "Skywork-MoE", the model is "initialized from our in-house pre-trained Skywork-13B". Can you share the detailed reasoning?
Models citing this paper 2
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper