

arxiv:2502.18934

Kanana: Compute-efficient Bilingual Language Models

Published on Feb 26, 2025
Submitted by Minho Ryu on Feb 27, 2025
#2 Paper of the day
Authors: Kanana LLM Team, Yunju Bak, Hojin Lee, Minho Ryu, Jiyeon Ham, Seungjae Jung, Daniel Wontae Nam, Taegyeong Eo, Donghun Lee, Doohae Jung, Boseop Kim, Nayeon Kim, Jaesun Park, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Kyoung-Woon On, Seulye Baeg, Junrae Cho, Sunghee Jung, Jieun Kang, EungGyun Kim, Eunhwa Kim, Byeongil Ko, Daniel Lee, Minchul Lee, Miok Lee, Shinbok Lee, Gaeun Seo
Abstract

We introduce Kanana, a series of bilingual language models that demonstrate superior performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high-quality data filtering, staged pre-training, depth up-scaling, and pruning and distillation. Furthermore, the report outlines the methodologies utilized during post-training of the Kanana models, encompassing supervised fine-tuning and preference optimization, aimed at enhancing their capability for seamless interaction with users. Lastly, the report elaborates on approaches used to adapt the language models to specific scenarios, such as embedding, retrieval-augmented generation, and function calling. The Kanana model series spans 2.1B to 32.5B parameters, with the 2.1B models (base, instruct, embedding) publicly released to promote research on Korean language models.

AI-generated summary

Kanana, a series of bilingual language models, achieves superior performance in Korean and competitive performance in English at lower computational cost through efficient pre-training and post-training techniques.
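Of the pre-training techniques listed, depth up-scaling is the most mechanical to illustrate. The sketch below assumes the SOLAR-style scheme (two copies of the base model joined with a band of shared middle layers); the paper's exact recipe may differ, and `depth_upscale` and the layer counts here are purely illustrative.

```python
def depth_upscale(layers, overlap):
    """Join two copies of a layer stack that share `overlap` middle layers.

    Assumed SOLAR-style scheme: drop the last `overlap` layers of the
    first copy and the first `overlap` layers of the second copy, then
    concatenate. The result has 2*len(layers) - 2*overlap layers.
    """
    n = len(layers)
    first = layers[: n - overlap]   # early layers of copy 1
    second = layers[overlap:]       # late layers of copy 2
    return first + second


# Example: up-scaling a 32-layer stack with an 8-layer overlap
# yields a 48-layer stack (2*32 - 2*8).
base = list(range(32))
scaled = depth_upscale(base, 8)
print(len(scaled))  # 48
```

After up-scaling, the enlarged model is typically continued-pre-trained so the duplicated layers specialize, which is why the technique trades extra parameters for far less compute than training the larger model from scratch.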

Community

Paper author and submitter, edited Feb 27, 2025

Kakao has released the Kanana LLM technical report and the 2.1B model series (base, instruct, and embedding)!
models: https://huggingface.co/collections/kakaocorp/kanana-nano-21b-67a326cda1c449c8d4172259
github: https://github.com/kakao/kanana


Models citing this paper: 24

Datasets citing this paper: 0

No datasets link to this paper yet. Cite arxiv.org/abs/2502.18934 in a dataset README.md to link it from this page.

Spaces citing this paper: 7

Collections including this paper: 3