Should we adopt Kimi K2, the new open-source reference?

The Chinese lab Moonshot AI has unveiled a reasoning model that achieves state-of-the-art results on several reference benchmarks.

Is this the new DeepSeek? Moonshot AI, a start-up backed by Alibaba, has launched Kimi K2, an agentic reasoning model. It is establishing itself as the new open-source reference, with benchmark scores that rival proprietary models.

Code, mathematics… the strengths of Kimi K2

Kimi K2 stands out as a formidable competitor to the proprietary giants on coding and mathematics tasks. On LiveCodeBench v6 (real-time code generation), the model reaches 53.7% success, significantly surpassing DeepSeek-V3 (46.9%) and positioning itself on par with Claude Sonnet 4 (48.5%) and Claude Opus 4 (47.4%). On SWE-bench Verified (a benchmark used to evaluate an AI's agentic coding ability), Kimi K2 posts a notable 65.8% success rate, just behind Claude Sonnet 4 (72.7%). Even more impressive, in mathematics the model obtains 69.6% on AIME 2024, ahead of the Claude models.

However, Kimi K2 reveals some weaknesses on general-knowledge benchmarks. On SimpleQA (simple factual questions), the model caps out at 31% success, far behind GPT-4.1 at 42.3%. The gap also widens on benchmarks like Humanity's Last Exam (an advanced general-knowledge exam), where Kimi K2 obtains only 4.7% against 7.1% for Claude Opus 4. Moonshot AI also acknowledges excessive token generation on certain complex reasoning tasks, which can lead to truncated or incomplete outputs.

An MoE base, 1 trillion parameters

Kimi K2 is based on a mixture-of-experts (MoE) architecture, popularized by Mistral AI. The model has 1 trillion parameters in total (1,000 billion), of which only 32 billion are activated during inference. This sparse architecture reaches the performance of a dense model of much greater size while keeping computational costs under control.
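To see why only a fraction of the parameters run per token, here is a minimal top-k MoE routing sketch. It is purely illustrative (the expert count, dimensions, and gating function here are arbitrary and not Kimi K2's actual design): a router scores all experts, but only the k best are computed, so inference cost scales with k rather than with the total number of experts.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_moe(x, experts, gate, k=2):
    """Route one token vector through a sparse top-k mixture of experts.

    Illustrative sketch only. Only k experts are evaluated per token,
    which is the trick behind "1T total parameters, 32B active".
    """
    scores = gate @ x                       # one router logit per expert
    top = np.argsort(scores)[-k:]           # indices of the k best experts
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                            # softmax over selected experts only
    # Weighted sum of the k selected experts' outputs
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, top))

d, n_experts = 16, 8                        # toy sizes, not Kimi K2's
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate = rng.standard_normal((n_experts, d))
x = rng.standard_normal(d)
y = top_k_moe(x, experts, gate, k=2)        # only 2 of 8 experts computed
```

With k=2 out of 8 experts, roughly a quarter of the expert weights participate in each forward pass, mirroring (at toy scale) Kimi K2's 32B-active-out-of-1T ratio.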

For inference, the full Q8-quantized version of Kimi K2 requires approximately eight H200 GPUs for optimal performance, with at least 250 GB of unified memory. However, less than 72 hours after its publication, the open-source community had already produced optimized weight versions. Unsloth thus offers a version capable of running on a MacBook Pro M4 Max with 128 GB of unified memory (using memory offloading) or on a Mac Studio M3 Ultra with 512 GB.
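The hardware figures above follow from simple back-of-envelope arithmetic: weight memory is parameter count times bits per parameter. The sketch below ignores activations, KV cache, and runtime overhead, so real requirements are somewhat higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Rough weight-only memory estimate in GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache and framework overhead, so treat
    the result as a lower bound.
    """
    return n_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 1e12  # ~1 trillion parameters

q8 = weight_memory_gb(TOTAL_PARAMS, 8)  # Q8: ~1000 GB of weights
q4 = weight_memory_gb(TOTAL_PARAMS, 4)  # Q4: ~500 GB of weights

print(f"Q8: {q8:.0f} GB, Q4: {q4:.0f} GB")
```

At ~1000 GB, Q8 weights indeed span roughly eight H200 GPUs (141 GB each), while a ~4-bit quantization lands near the 512 GB of a maxed-out Mac Studio; the most aggressive community quantizations go lower still.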

Kimi K2 is distributed under an MIT license: use and modification are allowed, including for commercial purposes. The only constraint concerns very large-scale applications: if a commercial product exceeds 100 million monthly active users or generates more than $20 million in monthly revenue, its user interface must display the mention "Kimi K2".

Should we adopt Kimi K2?

Kimi K2 could establish itself as a reference for agentic coding. Its strong performance on SWE-bench Verified (65.8%) makes it a serious candidate to replace the Claude models, given an investment in suitable infrastructure. For companies, local inference of Kimi K2 could be economically advantageous compared with the prohibitive prices of proprietary models (for code-centric use cases). Be careful, however: to reproduce Kimi K2's benchmark performance, you will need the least quantized version, which demands far more resources and therefore a careful assessment of costs.

Nevertheless, outside of use cases centered on development or advanced agentic workflows, Kimi K2 struggles to justify its adoption. Its performance/compute ratio compares unfavorably with more compact open-source models like Phi, which offer better energy efficiency and faster inference. With only 31% on SimpleQA against 42.3% for GPT-4.1, its trillion parameters are oversized for most general-purpose assistant tasks.

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.
