arxiv:2502.04411

Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing

Published on Feb 6, 2025 · Submitted by Xiang Liu on Feb 13, 2025
Authors: Kunfeng Lai, Zhenheng Tang, Xinglin Pan, Peijie Dong, Xiang Liu, Haolan Chen, Li Shen, Bo Li, Xiaowen Chu

AI-generated summary

A method for merging Large Language Models leverages layer-wise averaging and task-level expert routing to improve performance and reduce system cost.

Abstract

Model merging aggregates Large Language Models (LLMs) fine-tuned on different tasks into a stronger one. However, parameter conflicts between models lead to performance degradation when averaging. While model routing addresses this issue by selecting individual models during inference, it imposes excessive storage and compute costs and fails to leverage the common knowledge shared across models. In this work, we observe that different layers exhibit varying levels of parameter conflict. Building on this insight, we average layers with minimal parameter conflicts and use a novel task-level expert routing for layers with significant conflicts. To further reduce storage costs, inspired by task arithmetic sparsity, we decouple multiple fine-tuned experts into a dense expert and several sparse experts. To handle out-of-distribution samples, we select and merge appropriate experts based on the task uncertainty of the input data. We conduct extensive experiments on both LLaMA and Qwen at varying parameter scales and evaluate on real-world reasoning tasks. Results demonstrate that our method consistently achieves significant performance improvements while requiring lower system cost than existing methods.
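The abstract outlines four mechanisms: per-layer conflict estimation, a choice between averaging and task-level expert routing, decomposition of each expert into a shared dense part plus a sparse task vector, and uncertainty-weighted expert combination at inference. The sketch below shows one way these pieces could fit together; the sign-disagreement conflict metric, the threshold, the sparsity ratio, and all function names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's code) of the ideas described in the abstract.
# Inputs are state_dict-like mappings from layer name to weight tensor.
import torch

def layer_conflict(base_layer, expert_layers):
    """Proxy for parameter conflict in one layer: fraction of weights whose
    task-vector signs disagree across experts (an assumed metric)."""
    deltas = [e - base_layer for e in expert_layers]          # task vectors per expert
    signs = torch.stack([torch.sign(d) for d in deltas])      # [num_experts, *shape]
    agree = (signs == signs[0]).all(dim=0)                    # weights where all experts agree
    return 1.0 - agree.float().mean().item()

def sparsify(task_vector, keep_ratio=0.1):
    """Task-arithmetic-style sparsity: keep only the top fraction of entries by magnitude."""
    k = max(1, int(task_vector.numel() * keep_ratio))
    flat = task_vector.abs().flatten()
    threshold = flat.kthvalue(flat.numel() - k + 1).values    # k-th largest magnitude
    return torch.where(task_vector.abs() >= threshold, task_vector,
                       torch.zeros_like(task_vector))

def build_merged_model(base, experts, conflict_threshold=0.5, keep_ratio=0.1):
    """Average low-conflict layers; for high-conflict layers, store one shared dense
    copy plus sparse per-expert deltas to be routed at inference time."""
    merged, routed = {}, {}
    for name, base_layer in base.items():
        expert_layers = [e[name] for e in experts]
        if layer_conflict(base_layer, expert_layers) < conflict_threshold:
            merged[name] = torch.stack(expert_layers).mean(dim=0)   # low conflict: average
        else:
            merged[name] = base_layer                                # dense part shared by all tasks
            routed[name] = [sparsify(e - base_layer, keep_ratio) for e in expert_layers]
    return merged, routed

def apply_routing(name, merged, routed, task_weights):
    """Combine the shared dense layer with sparse expert deltas, weighted by per-task
    probabilities (e.g. derived from the input's estimated task uncertainty)."""
    layer = merged[name]
    for w, delta in zip(task_weights, routed.get(name, [])):
        layer = layer + w * delta
    return layer
```

In this sketch, the storage savings the abstract claims would come from the high-conflict layers: instead of one dense copy per expert, only a single dense layer plus sparse deltas is kept, and the uncertainty-derived `task_weights` decide how much each expert's delta contributes for a given input.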

Community


This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2502.04411 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2502.04411 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2502.04411 in a Space README.md to link it from this page.

Collections including this paper 2