Spectrum: Targeted Training on Signal to Noise Ratio
Authors: Eric Hartford, Lucas Atkins, Fernando Fernandes Neto, David Golchinfar
Published: 2024-06-07
Code: https://github.com/cognitivecomputations/spectrum
AI-generated summary: Spectrum accelerates LLM training by training only the layer modules selected by signal-to-noise ratio and freezing the rest, reducing GPU memory usage while maintaining performance.
Efficient post-training of large language models remains a challenging task
due to the vast computational resources required. We present Spectrum, a method
that accelerates LLM training by selectively targeting layer modules based on
their signal-to-noise ratio (SNR), and freezing the remaining modules. Our
approach, which uses an algorithm to compute module SNRs prior to training,
has been shown to effectively match the performance of full fine-tuning while
reducing GPU memory usage. Experiments comparing Spectrum to existing methods
such as QLoRA demonstrate its effectiveness in terms of model quality and VRAM
efficiency in distributed environments.
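
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the per-module SNR values have already been computed and that the highest-SNR modules are the ones kept trainable; the abstract only says that selection is based on SNR, so that ranking direction, the function name `select_modules_by_snr`, the `top_fraction` parameter, and the example module names and scores are all hypothetical and not taken from the Spectrum repository.

```python
from typing import Dict, Set, Tuple


def select_modules_by_snr(
    snr_by_module: Dict[str, float], top_fraction: float
) -> Tuple[Set[str], Set[str]]:
    """Split module names into (trainable, frozen) sets by SNR rank.

    Assumption: higher SNR means the module is selected for training.
    """
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    # Rank module names from highest to lowest SNR.
    ranked = sorted(snr_by_module, key=snr_by_module.get, reverse=True)
    # Keep at least one module trainable, even for very small fractions.
    keep = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:keep]), set(ranked[keep:])


# Hypothetical SNR values, for illustration only (not from the paper).
example_snr = {
    "layers.0.mlp.down_proj": 0.8,
    "layers.1.mlp.down_proj": 2.4,
    "layers.2.mlp.down_proj": 1.1,
    "layers.3.mlp.down_proj": 3.0,
}
trainable, frozen = select_modules_by_snr(example_snr, top_fraction=0.5)
```

In a PyTorch training setup, the frozen set would typically be applied by disabling gradients on the parameters of those modules (for example with `requires_grad_(False)`), so that no gradients are computed for them and the optimizer can leave them out.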