The Mohamed Bin Zayed University of Artificial Intelligence, in collaboration with the start-up G42, unveils a reasoning LLM with only 32 billion parameters.
After recruiting top-level AI engineers at scale (including in France), the Emirates are already reaping the rewards of their strategy. On Tuesday, September 9, the Mohamed Bin Zayed University of Artificial Intelligence and G42, the local start-up specializing in AI, presented K2-Think, an open-source frontier reasoning model with only 32 billion parameters. It competes on benchmarks with models up to 20 times larger. No small feat.
An excellent model in mathematics
K2-Think hits hard on mathematical benchmarks. On the most demanding competitions, AIME 2024 and AIME 2025, the model reaches 90.83% and 81.24% respectively, rivaling OpenAI's latest open-source model, GPT-OSS 120B (89.58% and 84.59%), and DeepSeek V3.1 671B (91.87% and 82.49%). On the coding side, K2-Think scores 63.97% on LiveCodeBench, clearly surpassing Qwen3-235B-A22B (56.64%) but trailing GPT-OSS 120B (74.53%). In science, with 71.08% on GPQA-Diamond, it sits in the upper middle of the field without dominating.
The researchers make no secret of their bias: rather than communicating on generalist benchmarks, where K2-Think would probably post average scores, they deliberately emphasize its "frontier" capability in mathematics.
K2-Think is therefore not designed to be used as a general-purpose model. On the other hand, its excellent mathematical capabilities make it a model of choice for use cases around data analysis and manipulation, optimization, or simulation. K2-Think could serve, for example, as an excellent data-analysis agent within an agentic system.
K2-Think’s real highlight
The researchers first trained the base model Qwen2.5-32B (via supervised fine-tuning) to produce detailed "chains of thought", that is, to explain its reasoning step by step rather than giving the answer directly. The model thereby learns to structure its thinking. The researchers then applied reinforcement learning with verifiable rewards (rewarding correct answers).
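As an illustration of that reinforcement learning stage, here is a minimal sketch of what a verifiable-reward function could look like. The `extract_final_answer` helper and the `\boxed{}` answer format are assumptions for the example, not details published by the team.

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the last \\boxed{...} expression out of a chain-of-thought completion.
    (The \\boxed{} convention is an assumption, not the team's published format.)"""
    matches = re.findall(r"\\boxed\{([^{}]+)\}", completion)
    return matches[-1].strip() if matches else None

def correctness_reward(completion: str, reference_answer: str) -> float:
    """Binary verifiable reward: 1.0 if the model's final answer matches the
    reference solution, 0.0 otherwise. Math answers can be checked exactly,
    which is what makes this kind of RL signal cheap and reliable."""
    predicted = extract_final_answer(completion)
    return 1.0 if predicted is not None and predicted == reference_answer.strip() else 0.0

# Example: a completion that reasons step by step and ends with \boxed{42}
sample = "Step 1: ... Step 2: ... Therefore the answer is \\boxed{42}."
print(correctness_reward(sample, "42"))  # 1.0
```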
But the main trick happens at inference time. K2-Think does not simply answer directly: it starts by drafting a resolution plan, generates three different answers, then automatically selects the best one. Counterintuitively, this planning stage shortens responses by about 12% while making them more accurate. As a result, K2-Think matches the performance of models up to 20 times its size.
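In code, that plan-then-solve recipe with best-of-3 selection could look like the following minimal sketch. The prompts, the injected `generate` client, and the scoring function are illustrative assumptions, not the team's published pipeline.

```python
# Minimal sketch of the inference-time recipe described above: draft a plan,
# sample several candidate answers, keep the best one.
from typing import Callable

def plan_and_solve(question: str,
                   generate: Callable[[str], str],
                   score: Callable[[str, str], float],
                   n_candidates: int = 3) -> str:
    # Step 1: ask the model for a short resolution plan before any reasoning.
    plan = generate(f"Outline a short plan to solve this problem:\n{question}")

    # Step 2: sample several full answers conditioned on that plan (best-of-N).
    candidates = [
        generate(f"Plan:\n{plan}\n\nSolve step by step:\n{question}")
        for _ in range(n_candidates)
    ]

    # Step 3: keep the candidate the scorer prefers (e.g. a verifier model or a
    # self-consistency check; here simply an injected scoring function).
    return max(candidates, key=lambda c: score(question, c))
```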
The weights available in open source (Apache 2.0)
The Emirati researchers have made the K2-Think weights available on Hugging Face under the Apache 2.0 license, one of the most permissive on the market. Running inference requires around 60 to 70 GB of VRAM; a typical configuration is a single H100 or A100 (80 GB) to run the least compressed version.
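As a sketch, the weights can in principle be loaded with the Hugging Face transformers library. The repository identifier `LLM360/K2-Think` and the generation settings below are assumptions to check against the model card.

```python
# Hedged sketch: loading K2-Think with Hugging Face transformers in bf16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/K2-Think"  # assumed Hugging Face repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~64 GB of weights for 32B parameters in bf16
    device_map="auto",           # spreads layers across available GPUs
)

messages = [{"role": "user", "content": "What is the sum of the first 100 odd numbers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The 32 billion parameters stored in bf16 account for roughly 64 GB on their own, which is consistent with the 60 to 70 GB VRAM estimate above.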
To test the model’s capabilities right away, a dedicated chat interface is available at k2think.ai. Like Mistral AI, the Emiratis have chosen to deploy their service on Cerebras chips. The infrastructure delivers very short response times: where a traditional GPU would take almost 3 minutes to generate a complex 32,000-token response, K2-Think on Cerebras produces it in about 16 seconds, roughly 2,000 tokens per second.
K2-Think represents a rare opportunity for companies: a “frontier”-level reasoning model while keeping full control over their sensitive data. Its Apache 2.0 license allows unrestricted internal deployment (which is worth noting). Furthermore, companies can fine-tune the model on their own sector-specific data to create specialized assistants, all at a relatively controlled cost.
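As an illustration of that last point, a parameter-efficient fine-tuning pass could look like the following sketch using the peft library. The repository name, target modules, and hyperparameters are placeholders, not a recipe published by the K2-Think team.

```python
# Hedged sketch of parameter-efficient fine-tuning (LoRA) on in-house data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "LLM360/K2-Think"  # assumed Hugging Face repository name
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# LoRA only trains small adapter matrices, so the 32B base weights stay frozen
# and fine-tuning fits in far less memory than full training would.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # typical attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a fraction of a percent of the full model

# From here, any standard training loop (e.g. transformers' Trainer or trl's
# SFTTrainer) can be run on the company's own instruction/answer pairs.
```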