Gamayun's Path to Multilingual Mastery: Cost-Efficient Training of a 1.5B-Parameter LLM
Abstract
We present Gamayun, a 1.5B-parameter multilingual language model trained entirely from scratch on 2.5T tokens. Designed for efficiency and deployment in resource-constrained environments, Gamayun addresses the lack of research on small non-English-centric LLMs by adopting a novel two-stage pre-training strategy: balanced multilingual training for cross-lingual alignment, followed by high-quality English enrichment to transfer performance gains across languages. Our model supports 12 languages, with special focus on Russian. Despite a significantly smaller training budget than comparable models, Gamayun outperforms LLaMA3.2-1B (9T tokens) on all considered benchmarks, and surpasses Qwen2.5-1.5B (18T tokens) on a wide range of English and multilingual tasks. It matches or exceeds Qwen3 (36T tokens) on most tasks outside advanced STEM, achieving state-of-the-art results in Russian, including the MERA benchmark, among the models of comparable size (1-2B parameters).
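To make the two-stage strategy concrete, below is a minimal sketch (not the authors' code) of how such a pre-training data schedule could be expressed: stage 1 samples the 12 languages in a balanced way for cross-lingual alignment, and stage 2 shifts the mixture toward high-quality English. The language list, stage token budgets, and mixture weights are all illustrative assumptions; only the 2.5T total and the two-stage idea come from the abstract.

```python
# Hypothetical sketch of a two-stage multilingual pre-training schedule.
# All names, ratios, and budgets are assumptions for illustration.
import random
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    token_budget: float          # tokens for this stage (assumed split of the 2.5T total)
    mixture: dict[str, float]    # language -> sampling weight, summing to 1.0


# Assumed language set; the paper states 12 languages with a focus on Russian.
LANGUAGES = ["ru", "en", "de", "fr", "es", "zh", "ar", "hi", "pt", "it", "ja", "tr"]

stage1 = Stage(
    name="balanced_multilingual",
    token_budget=1.8e12,  # assumption
    mixture={lang: 1.0 / len(LANGUAGES) for lang in LANGUAGES},  # uniform for alignment
)

stage2 = Stage(
    name="english_enrichment",
    token_budget=0.7e12,  # assumption
    mixture={lang: (0.6 if lang == "en" else 0.4 / (len(LANGUAGES) - 1)) for lang in LANGUAGES},
)


def sample_language(stage: Stage, rng: random.Random) -> str:
    """Pick the language of the next training document according to the stage mixture."""
    langs, weights = zip(*stage.mixture.items())
    return rng.choices(langs, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    for stage in (stage1, stage2):
        counts = {lang: 0 for lang in LANGUAGES}
        for _ in range(10_000):
            counts[sample_language(stage, rng)] += 1
        top = sorted(counts.items(), key=lambda kv: -kv[1])[:3]
        print(stage.name, top)
```

The point of the sketch is only the shape of the schedule: a uniform mixture first, then an English-heavy mixture whose gains are expected to transfer across languages; the actual weights and the 1.8T/0.7T split used by the authors are not reported in the abstract.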
Community
Hello! Great work, the new Russian pretrain is inspiring! I wanted to know whether you plan to release: 1. the model, and 2. the RuBIN dataset?
Hello, @RefalMachine ! Thanks for your interest!
Unfortunately, our supervisors did not allow us to publish the weights of the current model: it needs to better meet our publishing standards and have fewer copyright issues first.
Good news: the second version of the model, trained once more from scratch, is underway. Although it required >1T additional tokens to recover, it should now be free of these issues (at least the critical ones). We have also scaled the compute by more than 16x, and it already yields better results both on benchmarks and in subjective conversations.
As for the RuBIN benchmark, it is currently available on request. We plan to publish it for everyone to test against.
Stay tuned! :3
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AfriqueLLM: How Data Mixing and Model Architecture Impact Continued Pre-training for African Languages (2026)
- Dicta-LM 3.0: Advancing The Frontier of Hebrew Sovereign LLMs (2026)
- "UberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset (2026)
- Bielik 11B v3: Multilingual Large Language Model for European Languages (2025)
- BYOL: Bring Your Own Language Into LLMs (2026)
- TabiBERT: A Large-Scale ModernBERT Foundation Model and A Unified Benchmark for Turkish (2025)
- Kakugo: Distillation of Low-Resource Languages into Small Language Models (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`