Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
AI-generated summary

Combining large language models with the decoder of neural machine translation models results in improved inference speed, reduced memory usage, and strong generalization across various translation tasks.
The field of neural machine translation (NMT) has changed with the advent of
large language models (LLMs). Much of the recent emphasis in natural language
processing (NLP) has been on modeling machine translation and many other
problems using a single pre-trained Transformer decoder, while encoder-decoder
architectures, which were the standard in earlier NMT models, have received
relatively less attention. In this paper, we explore translation models that
are universal, efficient, and easy to optimize, by marrying the world of LLMs
with the world of NMT. We apply LLMs to NMT encoding and leave the NMT decoder
unchanged. We also develop methods for adapting LLMs to work better with the
NMT decoder. Furthermore, we construct a new dataset involving multiple tasks
to assess how well the machine translation system generalizes across various
tasks. Evaluations on the WMT datasets and our new dataset show that our
method matches or surpasses a range of baselines in translation quality,
while achieving 2.4× to 6.5× inference speedups and a 75% reduction in
the memory footprint of the KV cache. Our approach also demonstrates strong
generalization across a variety of translation-related tasks.
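
Below is a minimal, illustrative sketch of the idea the abstract describes: a decoder-only LLM used as the encoder of an otherwise conventional encoder-decoder NMT model, with a small adaptor bridging the two. The module names, dimensions, the frozen-LLM choice, and the "gpt2" checkpoint are assumptions for illustration only, not the authors' implementation; see the linked LaMaTE repository for the actual code.

```python
# Sketch (assumptions, not the paper's code): an LLM provides encoder
# representations, a linear adaptor maps them to the decoder width, and a
# standard Transformer decoder attends to them via cross-attention.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class LLMEncoderNMT(nn.Module):
    def __init__(self, llm_name="gpt2", d_model=512, n_layers=6, n_heads=8,
                 tgt_vocab_size=32000):
        super().__init__()
        # LLM as encoder: we only need its hidden states, not its LM head.
        self.encoder = AutoModel.from_pretrained(llm_name)
        for p in self.encoder.parameters():   # keep the LLM frozen (assumption)
            p.requires_grad = False
        # Adaptor: project the LLM hidden size down to the NMT decoder width.
        self.adaptor = nn.Linear(self.encoder.config.hidden_size, d_model)
        # Standard Transformer decoder, unchanged from conventional NMT.
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
        self.out_proj = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # 1) Encode the source with the LLM (frozen, so no gradients here).
        with torch.no_grad():
            enc = self.encoder(input_ids=src_ids,
                               attention_mask=src_mask).last_hidden_state
        memory = self.adaptor(enc)                        # (B, S, d_model)
        # 2) Decode with causal self-attention + cross-attention to `memory`.
        #    (Positional encodings for the decoder omitted for brevity.)
        tgt = self.tgt_embed(tgt_ids)                     # (B, T, d_model)
        causal = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        dec = self.decoder(tgt, memory, tgt_mask=causal,
                           memory_key_padding_mask=(src_mask == 0))
        return self.out_proj(dec)                         # (B, T, vocab) logits

# Toy usage with random target ids (shape check only, not a real translation).
tok = AutoTokenizer.from_pretrained("gpt2")
batch = tok(["The cat sat on the mat."], return_tensors="pt")
model = LLMEncoderNMT()
logits = model(batch["input_ids"], batch["attention_mask"],
               tgt_ids=torch.randint(0, 32000, (1, 7)))
print(logits.shape)  # torch.Size([1, 7, 32000])
```

Because the source is encoded once and only the lightweight decoder runs autoregressively, such a design can cut per-step compute and KV-cache memory relative to running a full decoder-only LLM over both source and target, which is consistent with the speedup and memory figures reported in the abstract.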