Gamayun's Path to Multilingual Mastery: Cost-Efficient Training of a 1.5B-Parameter LLM
Abstract
We present Gamayun, a 1.5B-parameter multilingual language model trained entirely from scratch on 2.5T tokens. Designed for efficiency and deployment in resource-constrained environments, Gamayun addresses the lack of research on small non-English-centric LLMs by adopting a novel two-stage pre-training strategy: balanced multilingual training for cross-lingual alignment, followed by high-quality English enrichment to transfer performance gains across languages. Our model supports 12 languages, with special focus on Russian. Despite a significantly smaller training budget than comparable models, Gamayun outperforms LLaMA3.2-1B (9T tokens) on all considered benchmarks, and surpasses Qwen2.5-1.5B (18T tokens) on a wide range of English and multilingual tasks. It matches or exceeds Qwen3 (36T tokens) on most tasks outside advanced STEM, achieving state-of-the-art results in Russian, including the MERA benchmark, among the models of comparable size (1-2B parameters).
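To make the two-stage strategy concrete, below is a minimal sketch (not the authors' code) of how such a pre-training data schedule could be expressed: stage 1 samples the 12 languages in a balanced way for cross-lingual alignment, and stage 2 shifts the mixture toward high-quality English. The language list, stage token budgets, and mixture weights are all illustrative assumptions; only the 2.5T total and the two-stage idea come from the abstract.

```python
# Hypothetical sketch of a two-stage multilingual pre-training schedule.
# All names, ratios, and budgets are assumptions for illustration.
import random
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    token_budget: float          # tokens for this stage (assumed split of the 2.5T total)
    mixture: dict[str, float]    # language -> sampling weight, summing to 1.0


# Assumed language set; the paper states 12 languages with a focus on Russian.
LANGUAGES = ["ru", "en", "de", "fr", "es", "zh", "ar", "hi", "pt", "it", "ja", "tr"]

stage1 = Stage(
    name="balanced_multilingual",
    token_budget=1.8e12,  # assumption
    mixture={lang: 1.0 / len(LANGUAGES) for lang in LANGUAGES},  # uniform for alignment
)

stage2 = Stage(
    name="english_enrichment",
    token_budget=0.7e12,  # assumption
    mixture={lang: (0.6 if lang == "en" else 0.4 / (len(LANGUAGES) - 1)) for lang in LANGUAGES},
)


def sample_language(stage: Stage, rng: random.Random) -> str:
    """Pick the language of the next training document according to the stage mixture."""
    langs, weights = zip(*stage.mixture.items())
    return rng.choices(langs, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    for stage in (stage1, stage2):
        counts = {lang: 0 for lang in LANGUAGES}
        for _ in range(10_000):
            counts[sample_language(stage, rng)] += 1
        top = sorted(counts.items(), key=lambda kv: -kv[1])[:3]
        print(stage.name, top)
```

The point of the sketch is only the shape of the schedule: a uniform mixture first, then an English-heavy mixture whose gains are expected to transfer across languages; the actual weights and the 1.8T/0.7T split used by the authors are not reported in the abstract.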
Community
Hello! Great work, the new Russian pretrain is inspiring! I wanted to know whether you plan to release: 1. the model, and 2. the RuBIN dataset?
Hello, @RefalMachine ! Thanks for your interest!
Unfortunately, our supervisors did not allow us to publish the weights of the current model: it needs to better meet our publishing standards and have fewer copyright issues first.
Good news: the second version of the model, trained once more from scratch, is underway. Although it required >1T additional tokens to recover, it should now be free of these issues (at least the critical ones). We have also scaled the compute by more than 16x, and it already yields better results both on benchmarks and in subjective conversations.
As for the RuBIN benchmark, it is currently available on request. We plan to publish it for everyone to test against.
Stay tuned! :3
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AfriqueLLM: How Data Mixing and Model Architecture Impact Continued Pre-training for African Languages (2026)
- Dicta-LM 3.0: Advancing The Frontier of Hebrew Sovereign LLMs (2026)
- "UberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset (2026)
- Bielik 11B v3: Multilingual Large Language Model for European Languages (2025)
- BYOL: Bring Your Own Language Into LLMs (2026)
- TabiBERT: A Large-Scale ModernBERT Foundation Model and A Unified Benchmark for Turkish (2025)
- Kakugo: Distillation of Low-Resource Languages into Small Language Models (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`