From PEFT to DEFT: Parameter Efficient Finetuning for Reducing
Activation Density in Transformers
\n","updatedAt":"2024-02-26T05:24:58.147Z","author":{"_id":"5f43448a79c1ba4c353d0d8f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5f43448a79c1ba4c353d0d8f/DiSygV3dn7A_OjmGVTrHD.jpeg","fullname":"Sugato Ray","name":"sugatoray","type":"user","isPro":true,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":46,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.6294185519218445},"editors":["sugatoray"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/5f43448a79c1ba4c353d0d8f/DiSygV3dn7A_OjmGVTrHD.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2402.01911","authors":[{"_id":"65dc1fc8ff744e508bd74332","user":{"_id":"60d4c068b102039f04db9021","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/60d4c068b102039f04db9021/cHgd6_cI_7MxAjJ9iG1e1.jpeg","isPro":false,"fullname":"Bharat Runwal","user":"bharatR","type":"user"},"name":"Bharat Runwal","status":"extracted_confirmed","statusLastChangedAt":"2024-02-26T05:29:21.737Z","hidden":false},{"_id":"65dc1fc8ff744e508bd74333","name":"Tejaswini Pedapati","hidden":false},{"_id":"65dc1fc8ff744e508bd74334","name":"Pin-Yu Chen","hidden":false}],"publishedAt":"2024-02-02T21:25:46.000Z","title":"From PEFT to DEFT: Parameter Efficient Finetuning for Reducing\n Activation Density in Transformers","summary":"Pretrained Language Models (PLMs) have become the de facto starting point for\nfine-tuning on downstream tasks. However, as model sizes continue to increase,\ntraditional fine-tuning of all parameters becomes challenging. To address this,\nparameter-efficient fine-tuning (PEFT) methods have gained popularity as a\nmeans to adapt PLMs effectively. In parallel, recent studies have revealed the\npresence of activation sparsity within the intermediate outputs of the\nmultilayer perception (MLP) blocks in transformers. Low activation density\nenables efficient model inference on sparsity-aware hardware. Building upon\nthis insight, in this work, we propose a novel density loss that encourages\nhigher activation sparsity (equivalently, lower activation density) in the\npre-trained models. We demonstrate the effectiveness of our approach by\nutilizing mainstream PEFT techniques including QLoRA, LoRA, Adapter,\nPrompt/Prefix Tuning to facilitate efficient model adaptation across diverse\ndownstream tasks. Experiments show that our proposed method DEFT,\nDensity-Efficient Fine-Tuning, can reduce the activation density consistently\nand up to 50.72% on RoBERTa_Large, and 53.19% (encoder density) and 90.60% (decoder density) on\nFlan-T5_XXL (11B) compared to PEFT using GLUE and QA\n(SQuAD) benchmarks respectively while maintaining competitive performance on\ndownstream tasks. 
We also showcase that DEFT works complementary with quantized\nand pruned models","upvotes":2,"discussionId":"65dc1fc9ff744e508bd74479","githubRepo":"https://github.com/ibm/deft","githubRepoAddedBy":"auto","ai_summary":"A density-efficient fine-tuning method enhances pre-trained language models by encouraging activation sparsity, improving inference efficiency without significantly affecting performance.","ai_keywords":["pretrained language models","parameter-efficient fine-tuning","activation sparsity","multilayer perception","MLP","transformers","density loss","QLoRA","LoRA","Adapter","Prompt/Prefix Tuning","DEFT","density-efficient fine-tuning","RoBERTa","Flan-T5","GLUE","QA","SQuAD","quantized models","pruned models"],"githubStars":7},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"5f43448a79c1ba4c353d0d8f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5f43448a79c1ba4c353d0d8f/DiSygV3dn7A_OjmGVTrHD.jpeg","isPro":true,"fullname":"Sugato Ray","user":"sugatoray","type":"user"},{"_id":"6538119803519fddb4a17e10","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6538119803519fddb4a17e10/ffJMkdx-rM7VvLTCM6ri_.jpeg","isPro":false,"fullname":"samusenps","user":"samusenps","type":"user"}],"acceptLanguages":["*"]}">
AI-generated summary
A density-efficient fine-tuning method enhances pre-trained language models by encouraging activation sparsity, improving inference efficiency without significantly affecting performance.
Pretrained Language Models (PLMs) have become the de facto starting point for
fine-tuning on downstream tasks. However, as model sizes continue to increase,
traditional fine-tuning of all parameters becomes challenging. To address this,
parameter-efficient fine-tuning (PEFT) methods have gained popularity as a
means to adapt PLMs effectively. In parallel, recent studies have revealed the
presence of activation sparsity within the intermediate outputs of the
multilayer perceptron (MLP) blocks in transformers. Low activation density
enables efficient model inference on sparsity-aware hardware. Building upon
this insight, in this work, we propose a novel density loss that encourages
higher activation sparsity (equivalently, lower activation density) in the
pre-trained models. We demonstrate the effectiveness of our approach by
utilizing mainstream PEFT techniques, including QLoRA, LoRA, Adapter, and
Prompt/Prefix Tuning, to facilitate efficient model adaptation across diverse
downstream tasks. Experiments show that our proposed method DEFT,
Density-Efficient Fine-Tuning, consistently reduces activation density: by up to
50.72% on RoBERTa_Large, and by 53.19% (encoder density) and 90.60% (decoder
density) on Flan-T5_XXL (11B), compared to PEFT on the GLUE and QA (SQuAD)
benchmarks respectively, while maintaining competitive performance on
downstream tasks. We also show that DEFT is complementary to quantized and
pruned models.
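
To make the idea of a density loss concrete, here is a minimal sketch (not the authors' DEFT implementation) of the general pattern the abstract describes: capture the intermediate MLP activations with a forward hook and add a differentiable sparsity surrogate (here a simple L1 term) to the task loss, so fine-tuning is nudged toward lower activation density. The toy block, the regularization weight `lambda_density`, the threshold `tau`, and the MSE task loss are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn


class ToyMLPBlock(nn.Module):
    """Stand-in for a transformer MLP block: up-projection -> GELU -> down-projection."""

    def __init__(self, d_model: int = 64, d_ff: int = 256):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.act = nn.GELU()
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.up(x)))


def activation_density(h: torch.Tensor, tau: float = 0.0) -> float:
    """Fraction of intermediate activations whose magnitude exceeds tau (assumed metric)."""
    return (h.abs() > tau).float().mean().item()


model = ToyMLPBlock()
captured = []  # intermediate (post-activation) MLP outputs gathered by the hook

def hook(_module, _inputs, output):
    captured.append(output)

model.act.register_forward_hook(hook)

x = torch.randn(8, 16, 64)      # (batch, seq_len, d_model)
target = torch.randn(8, 16, 64)
lambda_density = 0.1            # regularization strength (assumed value)

out = model(x)
task_loss = nn.functional.mse_loss(out, target)  # stand-in for the downstream task loss
# Differentiable L1 surrogate: a smaller mean |activation| pushes more
# intermediate activations toward zero, i.e. lower activation density.
density_penalty = torch.stack([h.abs().mean() for h in captured]).mean()
total_loss = task_loss + lambda_density * density_penalty
total_loss.backward()

print(f"activation density (tau=0): {activation_density(captured[0]):.3f}")
```

In a PEFT setting, only the adapter parameters (e.g. LoRA matrices) would receive gradients from this combined objective, while the pretrained weights stay frozen; the hooks and the density term attach to the frozen MLP activations in the same way as in this sketch.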