MamayLM can now see! We are releasing MamayLM v1.0, the best-performing efficient Ukrainian language model, which surpasses all similarly sized models in both English and Ukrainian while matching or surpassing models up to 10x larger.
We are delighted to announce the release of MamayLM v1.0, a new state-of-the-art LLM targeting the Ukrainian language. We are releasing the model in two sizes, 4B and 12B, both of which are cost-efficient, fast, multimodal and able to run on a single GPU, yet effective in both Ukrainian and English. The model outperforms open models of similar size in both languages, while matching or comparing favourably to much larger models. MamayLM is the result of a collaboration between researchers at INSAIT and ETH Zurich.
In our v0.1 release, we successfully adapted the Gemma 2 model to the Ukrainian language, building on our earlier research on language transfer.
In the previous version, our Ukrainian pre-training data was based on the FineWeb2 dataset.
During pretraining, we used best-fit packing, which assembles fixed-length training sequences from documents in a way that minimizes unnecessary truncation.
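As a rough illustration of the idea, here is a minimal best-fit packing sketch in Python. It assumes documents are already tokenized and splits over-long ones into sequence-length chunks; this is one common variant of the technique, not our exact implementation.

```python
def best_fit_pack(docs, seq_len):
    """Pack tokenized documents into sequences of at most seq_len tokens."""
    # Split over-long documents into chunks of at most seq_len tokens.
    chunks = []
    for doc in docs:
        for i in range(0, len(doc), seq_len):
            chunks.append(doc[i:i + seq_len])

    # Best-fit decreasing: place the longest chunks first, each into the
    # sequence whose remaining space is the tightest fit.
    chunks.sort(key=len, reverse=True)
    bins, space = [], []  # packed sequences and their remaining capacity
    for chunk in chunks:
        best = min(
            (i for i in range(len(bins)) if space[i] >= len(chunk)),
            key=lambda i: space[i],
            default=None,
        )
        if best is None:          # no sequence fits: open a new one
            bins.append(list(chunk))
            space.append(seq_len - len(chunk))
        else:                     # tightest fit keeps wasted space minimal
            bins[best].extend(chunk)
            space[best] -= len(chunk)
    return bins
```

Compared with naive concatenate-and-chunk packing, this keeps far more documents intact within a single training sequence.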
As in v0.1, for the post-training stage we extracted topics relevant to Ukrainian history and culture, which enabled the generation of a synthetic dataset of Ukrainian QA pairs via knowledge distillation from a larger model. We also employed our LLM-based translation pipeline to translate domain-specific data into Ukrainian, improving both quantity and quality in the target language.
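The sketch below illustrates the topic-conditioned distillation idea. The `teacher_complete` wrapper and the prompt template are hypothetical placeholders, not the actual teacher model or prompts used for MamayLM.

```python
def teacher_complete(prompt: str) -> str:
    """Placeholder: call the larger teacher LLM here."""
    raise NotImplementedError

def make_qa_pairs(topics, pairs_per_topic=5):
    """Generate synthetic Ukrainian QA pairs for a list of topics."""
    dataset = []
    for topic in topics:
        # Illustrative prompt: ask for QA pairs in Ukrainian on the topic.
        prompt = (
            f"Згенеруй {pairs_per_topic} пар запитання-відповідь українською "
            f"мовою на тему: {topic}. Формат: Q: ... A: ..."
        )
        raw = teacher_complete(prompt)
        # Parse "Q: ... A: ..." blocks into question/answer records.
        for block in raw.split("Q:")[1:]:
            q, _, a = block.partition("A:")
            if a.strip():
                dataset.append({"question": q.strip(), "answer": a.strip()})
    return dataset
```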
Our instruction-tuning dataset incorporates various open-source datasets, such as the Nemotron SFT and post-training datasets, the OpenCoder (OPC) SFT dataset, the Aya Collection and more. We acknowledge the significant contributions of the Ukrainian open-source community, particularly the creators of Spivavtor, UAlpaca, UA-Squad, Ukrainian StackExchange, Crimean Tatar Parallel Corpora and UA-Lawyer QA, whose work strengthens Ukrainian post-training.
In the pre-training stage, we split the data into two parts built around different massive web-sourced datasets, re-introducing the smaller domain-specific datasets into both splits. We then trained on each split separately and applied model souping, averaging the weights of the resulting checkpoints, which improved pre-training performance dramatically.
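A minimal model-souping sketch is shown below; it assumes the checkpoints share one architecture, and the file names are placeholders for the models trained on the two pre-training splits.

```python
import torch

def soup(paths, weights=None):
    """Average the parameters of several checkpoints into one state dict."""
    weights = weights or [1.0 / len(paths)] * len(paths)
    avg = None
    for path, w in zip(paths, weights):
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: w * v.float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += w * v.float()
    return avg

# Uniform soup of the two split-specific runs (placeholder paths).
souped = soup(["split_a.pt", "split_b.pt"])
```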
In the post-training stage, we instruction-tuned English-focused and Ukrainian-focused models separately and later combined them into a single, stronger final model. This separation lets each model train on data targeted at its language, further improving performance in both. To combine the two, we applied an advanced model merging technique inspired by Layer Swapping.
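As a rough sketch of a layer-swapping-style merge, the snippet below starts from one fine-tuned checkpoint and swaps in whole transformer layers from the other. It assumes Gemma-style layer naming ("model.layers.<i>."), and the choice of which layers come from which expert is illustrative, not the exact recipe used for MamayLM.

```python
import re
import torch

def layer_swap_merge(base_state, expert_state, expert_layers):
    """Replace selected transformer layers of `base_state` with the
    corresponding layers from `expert_state`."""
    merged = dict(base_state)
    for key, value in expert_state.items():
        m = re.match(r"model\.layers\.(\d+)\.", key)
        if m and int(m.group(1)) in expert_layers:
            merged[key] = value
    return merged

en_state = torch.load("sft_en.pt", map_location="cpu")  # English-focused SFT
uk_state = torch.load("sft_uk.pt", map_location="cpu")  # Ukrainian-focused SFT
# e.g. take the first and last few layers from the Ukrainian expert
merged = layer_swap_merge(en_state, uk_state, expert_layers={0, 1, 2, 45, 46, 47})
```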
The chosen pipeline allows us not just to preserve visual and long-context capabilities, but to improve them for both languages without any datasets targeted at those domains. We believe multilingual visual performance depends strongly on the model's linguistic capabilities in the given languages; accordingly, we observe improvements on visual benchmarks without training on any image-text data.
We evaluated MamayLM on a set of standard English benchmarks, a translated version of them in Ukrainian, as well as Ukrainian-specific benchmarks we collected:
We undertook the challenge of finding the best translation method for the English-only benchmarks. Although some effort has been made in this direction, existing translations often lose the shared context between questions and answers, and trade translation quality against costly human involvement.
To address these issues, we developed a translation framework that preserves the context of both questions and answers. It also employs multisampling and scoring of translation candidates to optimize the balance between machine translation quality and human involvement, ensuring maximum efficiency. All benchmarks adapted for Ukrainian are available in the accompanying GitHub repository.
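The sketch below illustrates the multisampling-and-scoring idea. Both `translate` and `quality_score` are hypothetical placeholders (e.g. an LLM translator and a reference-free quality estimator), not the exact components of our pipeline.

```python
def translate(text: str, temperature: float) -> str:
    """Placeholder: call the translation LLM here."""
    raise NotImplementedError

def quality_score(src: str, hyp: str) -> float:
    """Placeholder: reference-free machine-translation quality estimate."""
    raise NotImplementedError

def best_translation(src, n_samples=8, review_threshold=0.8):
    """Sample several candidates, keep the best, flag low scorers for review."""
    candidates = [translate(src, temperature=0.7) for _ in range(n_samples)]
    scored = [(quality_score(src, c), c) for c in candidates]
    score, best = max(scored)
    # Only low-scoring items are routed to a human reviewer, so manual
    # effort is spent where machine translation is least reliable.
    return best, score < review_threshold
```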
As illustrated by the figures below, MamayLM outperforms all similarly sized models across all benchmarks (even outperforming much larger 70B models on Ukrainian!). It does so in both English and Ukrainian, thanks to the training method described above.
We also evaluated MamayLM v1.0 against current state-of-the-art LLMs. Impressively, our model outperforms models up to 6 times larger across various benchmarks, including those specific to Ukrainian contexts, as shown in the figure below.
Importantly, as the figure below shows, MamayLM v1.0 achieves the highest score on the ZNO (Ukrainian national high school exam) among similarly sized models, while outperforming much larger models, including Gemma 2 27B, Llama 3.1 70B and Qwen 2.5 72B.
We also evaluated MamayLM v1.0 on visual benchmarks, where it demonstrates strong performance in both Ukrainian and English. The model's ability to understand and generate text based on visual inputs highlights its versatility and effectiveness across different modalities.
To assess English visual performance, we use the original MMMU benchmark.
To monitor Ukrainian visual performance, we use the ZNO-Vision benchmark.
Beyond benchmark evaluations, we assessed the generative capabilities of MamayLM v1.0 on a set of 500 complex questions. The results show that our model consistently outperforms significantly larger models, excelling both in the linguistic quality of the generated Ukrainian text and in the accuracy of its content. To ensure unbiased and high-quality evaluations, we relied on Gemini 2.0 Flash as the judge, as it has strong proficiency in Ukrainian and a deep understanding of its cultural and linguistic nuances.
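A minimal sketch of a side-by-side LLM-judge setup is shown below; the `judge` wrapper stands in for a call to Gemini 2.0 Flash, and the rubric and output format are illustrative, not our exact evaluation prompt.

```python
def judge(prompt: str) -> str:
    """Placeholder: call the judge model (e.g. Gemini 2.0 Flash) here."""
    raise NotImplementedError

def pairwise_verdict(question, answer_a, answer_b):
    """Ask the judge to compare two answers on language quality and accuracy."""
    prompt = (
        "You are grading two Ukrainian answers to the same question on "
        "linguistic quality and factual accuracy.\n"
        f"Question: {question}\nAnswer A: {answer_a}\nAnswer B: {answer_b}\n"
        "Reply with exactly one token: A, B, or TIE."
    )
    verdict = judge(prompt).strip().upper()
    return verdict if verdict in {"A", "B", "TIE"} else "TIE"
```

In practice, answer order would also be randomized per question to control for the judge's position bias.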
We evaluate the model on factual Ukrainian QA data, where it performs favourably against much larger models, as well as against GPT-4o and Claude 3.7 Sonnet.
We also evaluate the model on the Ukrainian subset of m-ArenaHard, which targets more domain-specific knowledge in math and coding; here, too, it holds up well against much larger models.
We assess the capabilities of MamayLM v1.0 4B using the same benchmarks, which evaluate text generation, comprehension, and domain-specific knowledge in both Ukrainian and English. The model performs strongly against similarly sized models, demonstrating its effectiveness across a range of tasks.
Furthermore, MamayLM v1.0 4B achieves 50% accuracy on the ZNO benchmark, a promising result on Ukrainian-focused tasks for a model of its size.
In the current technological landscape, the need for fast, adaptable, and locally optimized solutions has become critical. Available in 4B and 12B sizes, MamayLM is compact yet consistently outperforms models up to 10x larger in both English and Ukrainian. Its ability to run on a single GPU allows for faster adaptation, lower operational costs, and simpler deployment, making it particularly well-suited for environments with limited resources and evolving demands. Moreover, the new version adds visual and long-context capabilities, with improved performance in both languages.
This offers significant advantages for Ukrainian businesses and government institutions, which can integrate advanced AI technologies without the prohibitive costs or complex technical requirements typically associated with larger systems. The smaller size option gives additional flexibility in deployment and scaling to smaller businesses without extensive infrastructure. Additionally, the model's bilingual capabilities support applications in sectors such as education and healthcare, where addressing language barriers can have a meaningful impact. In particular, it helps meet immediate needs in Ukraine by enhancing service delivery across critical areas.
We make full-precision and quantized versions of MamayLM available on Hugging Face, alongside a detailed description of how to use them for inference:
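For a quick start, here is a minimal text-only inference sketch using the Hugging Face transformers library. The model id is a placeholder; use the exact repository name from our Hugging Face page.

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="INSAIT-Institute/MamayLM-v1.0-12B",  # placeholder model id
    device_map="auto",
    torch_dtype="bfloat16",
)

# Chat-style input; the model answers in Ukrainian or English as prompted.
messages = [{"role": "user", "content": "Розкажи коротко про Тараса Шевченка."}]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])
```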
The Ukrainian benchmarks are available in the accompanying GitHub repository.
If you use our models, please consider citing our work (citation below).
For any questions on MamayLM, please contact us at contact@insait.ai.
INSAIT is a world-class computer science and AI research institute, part of Sofia University and located in Sofia, Bulgaria. INSAIT was created in 2022 in partnership with Switzerland's ETH Zurich and EPFL. It is a strategic institution for Bulgaria, funded with an initial endowment of around 100M USD from the Bulgarian government over a period of 10 years, and is generously supported by donations of roughly 15M USD from SiteGround, Google, AWS, VMware and other big-tech companies. INSAIT is the first center of its kind in Eastern Europe, structured along the lines of top Western computer science and AI institutions: it provides world-class packages and conditions for outstanding tenure-track and tenured faculty, research scientists, post-docs, PhDs and many other positions. Currently, INSAIT hosts researchers of more than 23 nationalities and does research in areas spanning foundational models, safe and secure AI, robotics, computer vision, quantum computing, algorithms, information security, and other key areas.
For attribution in academic contexts, please cite this work as
"MamayLM v1.0: An efficient state-of-the-art multimodal Ukrainian LLM", 2025.
BibTeX citation
@misc{MamayLMv1,
  title={MamayLM v1.0: An efficient state-of-the-art multimodal Ukrainian LLM},
  author={Yukhymenko, Hanna and Alexandrov, Anton and Vechev, Martin},
  year={2025},
}
This blog was based on The Distill Template by Leandro von Werra.