arxiv:2402.04291

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Published on Feb 6, 2024 · Submitted by AK on Feb 7, 2024
#2 Paper of the day
Authors: Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi
Abstract

Pretrained large language models (LLMs) exhibit exceptional general language processing capabilities but come with significant demands on memory and computational resources. As a powerful compression technology, binarization can reduce model weights to a mere 1 bit, lowering the expensive computation and memory requirements. However, existing quantization techniques fall short of maintaining LLM performance under ultra-low bit-widths. In response to this challenge, we present BiLLM, a groundbreaking 1-bit post-training quantization scheme tailored for pretrained LLMs. Based on the weight distribution of LLMs, BiLLM first identifies and structurally selects salient weights, and minimizes the compression loss through an effective binary residual approximation strategy. Moreover, considering the bell-shaped distribution of the non-salient weights, we propose an optimal splitting search to group and binarize them accurately. BiLLM achieves, for the first time, high-accuracy inference (e.g., 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families and evaluation metrics, outperforming SOTA quantization methods for LLMs by significant margins. Moreover, BiLLM can binarize an LLM with 7 billion weights within 0.5 hours on a single GPU, demonstrating satisfactory time efficiency.

AI-generated summary

BiLLM, a 1-bit post-training quantization scheme for pretrained LLMs, achieves high-accuracy inference with reduced computational and memory requirements.
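To make the abstract's two core ideas concrete, here is a minimal, illustrative PyTorch sketch of (a) the binary residual approximation applied to salient weights and (b) a break-point search that splits the bell-shaped non-salient weights into two groups before binarizing each. The function names, the grid search, and the random test matrix are assumptions for illustration only; saliency identification and BiLLM's actual optimal splitting search are not reproduced here.

```python
import torch

def binarize(w: torch.Tensor):
    # 1-bit approximation w ≈ alpha * sign(w): for B = sign(w),
    # alpha = mean(|w|) minimizes ||w - alpha * B||^2.
    return w.abs().mean(), torch.sign(w)

def residual_binarize(w: torch.Tensor) -> torch.Tensor:
    # Salient weights: binarize w, then binarize the residual,
    # giving w ≈ a1*B1 + a2*B2 (the binary residual approximation).
    a1, b1 = binarize(w)
    a2, b2 = binarize(w - a1 * b1)
    return a1 * b1 + a2 * b2

def split_binarize(w: torch.Tensor, p: float) -> torch.Tensor:
    # Non-salient weights: split at |w| = p into a concentrated group
    # and a sparse-tail group; binarize each with its own scale.
    out = torch.zeros_like(w)
    for mask in (w.abs() <= p, w.abs() > p):
        if mask.any():
            out[mask] = w[mask].abs().mean() * torch.sign(w[mask])
    return out

def search_break_point(w: torch.Tensor, n_grid: int = 64) -> float:
    # Assumed stand-in for the paper's splitting search: grid-search
    # the break point p that minimizes the binarization error.
    grid = torch.linspace(0.0, w.abs().max().item(), n_grid)
    errs = torch.stack([(w - split_binarize(w, p.item())).pow(2).sum()
                        for p in grid])
    return grid[errs.argmin()].item()

# Toy demo on a random matrix standing in for one weight block.
w = torch.randn(256, 256)
p = search_break_point(w)
err_split = ((w - split_binarize(w, p)).norm() / w.norm()).item()
err_resid = ((w - residual_binarize(w)).norm() / w.norm()).item()
print(f"split at {p:.3f}: rel. error {err_split:.3f}; "
      f"residual binarization rel. error {err_resid:.3f}")
```

The mean-of-absolute-values scale appears throughout because, for a fixed sign matrix B = sign(W), the scale minimizing ||W - aB||^2 is a = mean(|W|); giving each group its own scale is what lets the split track the bell-shaped weight distribution.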

Community

Can't wait to read this; those are some impressive numbers.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

BiLLM: Supercharge LLMs with 1-Bit Quantization! 🚀

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2402.04291 in a model README.md to link it from this page.
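For illustration, a hypothetical README.md sentence like the one below would create the link; the wording around the URL is made up, only the arXiv reference matters:

```markdown
Weights were binarized with BiLLM ([arXiv:2402.04291](https://arxiv.org/abs/2402.04291)).
```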

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2402.04291 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2402.04291 in a Space README.md to link it from this page.

Collections including this paper 21