
Papers
arxiv:2503.06594

Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

Published on Mar 9, 2025
Submitted by luoyingfeng on Mar 12, 2025
Authors: Yingfeng Luo, Tong Zheng, Yongyu Mu, Bei Li, Qinghong Zhang, Yongqi Gao, Ziqiang Xu, Peinan Feng, Xiaoqian Liu, Tong Xiao, Jingbo Zhu

Abstract

Combining large language models with the decoder of neural machine translation models results in improved inference speed, reduced memory usage, and strong generalization across various translation tasks.

AI-generated summary

The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems using a single pre-trained Transformer decoder, while encoder-decoder architectures, which were the standard in earlier NMT models, have received relatively less attention. In this paper, we explore translation models that are universal, efficient, and easy to optimize, by marrying the world of LLMs with the world of NMT. We apply LLMs to NMT encoding and leave the NMT decoder unchanged. We also develop methods for adapting LLMs to work better with the NMT decoder. Furthermore, we construct a new dataset involving multiple tasks to assess how well the machine translation system generalizes across various tasks. Evaluations on the WMT and our datasets show that results using our method match or surpass a range of baselines in terms of translation quality, while achieving 2.4x to 6.5x inference speedups and a 75% reduction in the memory footprint of the KV cache. The method also demonstrates strong generalization across a variety of translation-related tasks.
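The KV-cache claim can be made concrete with back-of-the-envelope arithmetic. The sketch below uses illustrative layer counts, head counts, and sequence lengths (not the paper's actual configuration) to compare a decoder-only LLM, which caches keys and values for source and target tokens across all of its layers, against an LLM-as-encoder setup where only a small NMT-style decoder grows a cache during generation:

```python
def kv_cache_bytes(n_layers, n_heads, head_dim, cached_tokens, bytes_per_elem=2):
    """Rough KV-cache size: K and V tensors, each holding
    n_layers * n_heads * head_dim values per cached token,
    stored in fp16 (2 bytes per element) by default."""
    return 2 * n_layers * n_heads * head_dim * cached_tokens * bytes_per_elem

# Hypothetical setup: 100 source tokens, 100 target tokens.
# Decoder-only LLM (say 32 layers): self-attention caches source + target.
dec_only = kv_cache_bytes(n_layers=32, n_heads=32, head_dim=128, cached_tokens=200)

# LLM-as-encoder + small decoder (say 8 layers): during generation only the
# decoder's self-attention cache grows, and only over target tokens.
# (Cross-attention K/V over the source are computed once and reused, so
# they are ignored in this rough comparison.)
small_dec = kv_cache_bytes(n_layers=8, n_heads=32, head_dim=128, cached_tokens=100)

reduction = 1 - small_dec / dec_only
print(f"decoder-only cache: {dec_only / 2**20:.1f} MiB")
print(f"small-decoder cache: {small_dec / 2**20:.1f} MiB")
print(f"reduction: {reduction:.1%}")
```

With these made-up numbers the cache shrinks by 87.5%; the paper's reported 75% reduction is in the same ballpark and depends on its actual model sizes and sequence lengths.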

Community

Paper author · Paper submitter

Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 1

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2503.06594 in a Space README.md to link it from this page.

Collections including this paper 1