Gemini outperforms most LLMs on the market, but while its benchmark scores are excellent, it is in product design that Google’s AI truly stands out.
“Congratulations to Google on Gemini 3! It looks like it’s a great model.” Even Sam Altman concedes that Google has hit hard. Unveiled on November 18, Gemini 3 Pro reaches the state of the art on a wide range of benchmarks, in particular multimodal analysis (image, video, audio) and code generation. Google describes it as a truly “universal” model, designed to work seamlessly in Search, in the Gemini app and in the developer-facing APIs, a first at this scale.
But the real difference lies above all in the scaffold Google has built around the model: agentic mode, Visual Layout, Dynamic View… These are key features, but they are still rolling out and unavailable to some users. For this reason, the JDN chose to test Gemini 3 Pro in its rawest form, without this scaffold, to evaluate what the model itself is really worth. We also compared it with GPT-5.1, the latest SOTA model on STEM benchmarks.
SVG, code, text in French: the JDN test
For the occasion, we tested Gemini 3 and GPT-5.1 on four use cases: generating a text in French, generating a static SVG, generating an animated SVG, and generating the C++ code of a random password generator.
1. Generate a text in French
Prompt: Write an analysis of exactly 500 words in French on the current state of the financial markets. The text must cover the current valuation of the main stock market indices, return expectations for the coming months, the risk of the AI bubble bursting with a focus on the gap between tech valuations and concrete deployment, as well as the overall macroeconomic context including inflation, interest rates and growth. Adopt a factual and analytical tone suitable for a professional readership. Write only the body of the text, without a title. Absolute constraint: exactly 500 words, not one more nor one less.
Gemini offers a clearer and more accessible analysis than GPT-5.1, which adopts a structure more segmented by theme, with distinct blocks (valuation, expectations, fragilities, macro). On the other hand, OpenAI’s model adopts a more neutral and less alarmist tone than Gemini. Stylistically, Gemini 3 favors longer sentences with more subordinate clauses, while GPT-5.1 uses shorter, factual sentences with a brisker cadence. Finally, on respecting the length constraint, Gemini is the clear winner: it generates 502 words where GPT-5.1 generates 522. We are surprised, however, that OpenAI’s model claims the markets are not at their highs, which is factually false.
2. Generate the SVG image of an iPhone
Prompt: Generate the full, standalone SVG code of an iPhone 16 Pro at maximum detail. Faithfully reproduce the model’s exact proportions, characteristic rounded curves, triple camera module in a triangular layout, Action button, volume buttons, USB-C port and Dynamic Island. Take particular care with the color gradients of the titanium, the reflections on the screen, the cast shadows and the details of the camera module. The SVG must be complete, ready to use and visually realistic, with professional finishes worthy of an Apple product rendering.
The most realistic image is clearly Gemini 3’s. It reproduces the design of the iPhone 16 Pro almost perfectly: the reflections, the glass and metal textures, and the proportions are extremely close to reality (one instantly recognizes a recent iPhone). Only flaw: the Apple logo is off-center. On the GPT-5.1 side, the camera sensors are misaligned, as is the charging ring. Gemini 3 wins hands down; Google’s progress here is quite notable.
3. Generate animated solar system SVG
Prompt: Generate the complete, self-contained SVG code of an animated representation of the solar system. The Sun should be at the center with the eight planets orbiting around it. Each planet must have its characteristic, true-to-life colors: orange hues for Mercury, yellowish-white for Venus, blue-green for Earth, red-orange for Mars, beige and ochre bands for Jupiter, golden rings for Saturn, pale blue-green for Uranus, deep blue for Neptune. Integrate CSS or SMIL animations to simulate orbital rotation with consistent relative speeds and a 3D depth effect in which planets pass in front of and behind the plane. Elliptical paths should be visible. The code must be complete, functional and visually immersive, with careful finishing.
For the animated solar system, the gap is clear. The code generated by GPT-5.1 is verbose and broken: the SVG does not load properly, the orbit animations do not run, and the solar system simply stays frozen. Conversely, Gemini 3 produces a valid, immediately usable SVG: the eight planets revolve around the Sun at consistent relative speeds, follow visible elliptical trajectories and pass alternately in front of and behind the plane, creating a credible depth effect. The advantage clearly goes to Gemini 3.
4. A random password generator in C++
Prompt: Generate complete C++ code for a cryptographically secure password generator designed to guarantee true unpredictability. The code must compile directly, be well commented and be structured in clear functions. Ensure uniform character distribution and maximum entropy for truly secure passwords.
GPT-5.1 objectively produces the better code. OpenAI’s model uses the OS’s native system APIs for better compatibility, its error handling is more robust, its code is more complete, and its documentation is much clearer. In short, GPT-5.1’s code is directly usable in production. The real question is whether Gemini 3 is genuinely weaker at backend generation, or whether its apparent weakness mainly stems from needing far more detailed instructions in the absence of a programming scaffold.
To verify this theory, we retried the generation in Antigravity, Google’s new agentic IDE, and the result is markedly better: in a Windows environment, Gemini 3’s code (200 lines) is indeed as secure as GPT-5.1’s.
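Neither model’s code is reproduced here, but the approach the prompt asks for can be sketched. The following is a minimal POSIX sketch (not either model’s output): it draws bytes from the OS CSPRNG via /dev/urandom (a Windows build would use BCryptGenRandom instead) and uses rejection sampling to keep the character distribution uniform. The helper names (`secureByte`, `uniformIndex`, `generatePassword`) are ours, chosen for illustration.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Reads one cryptographically secure byte from the OS CSPRNG.
// Assumption: a POSIX system exposing /dev/urandom.
static uint8_t secureByte() {
    static std::ifstream urandom("/dev/urandom", std::ios::binary);
    if (!urandom) throw std::runtime_error("cannot open /dev/urandom");
    char b;
    if (!urandom.read(&b, 1)) throw std::runtime_error("CSPRNG read failed");
    return static_cast<uint8_t>(b);
}

// Draws a uniform index in [0, n) with rejection sampling, so no
// character is favored by modulo bias. Assumes n <= 256.
static size_t uniformIndex(size_t n) {
    const size_t limit = 256 - (256 % n);  // largest multiple of n <= 256
    uint8_t b;
    do {
        b = secureByte();
    } while (b >= limit);
    return b % n;
}

// Builds a password of the requested length from a fixed alphabet.
std::string generatePassword(size_t length) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789"
        "!@#$%^&*()-_=+";
    std::string pw;
    pw.reserve(length);
    for (size_t i = 0; i < length; ++i)
        pw += alphabet[uniformIndex(alphabet.size())];
    return pw;
}
```

The rejection-sampling step is the detail a naive generator typically misses: taking `byte % n` directly would slightly over-represent the first characters of the alphabet whenever 256 is not a multiple of n.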
Gemini 3: sharply rising pricing
Gemini 3 Pro comes with significantly higher pricing than its predecessors. Google charges $2 per million input tokens for requests below 200,000 tokens, rising to $4 beyond this threshold. Output costs $12 per million tokens for standard requests, reaching $18 for the largest. These prices are a significant increase over Gemini 2.5 Pro, offered at $1.25 for input and $10 for output. Google seems to have aligned itself with OpenAI: the San Francisco company offers its flagship model at $1.25 for input and $10 for output, slightly lower amounts but of the same order of magnitude.
| Model | Input (< 200k tokens) | Input (> 200k tokens) | Output (< 200k tokens) | Output (> 200k tokens) |
|---|---|---|---|---|
| Gemini 3 Pro | $2.00 | $4.00 | $12.00 | $18.00 |
| Gemini 2.5 Pro | $1.25 | $2.50 | $10.00 | $15.00 |
| GPT-5.1 | $1.25 | $1.25 | $10.00 | $10.00 |
Conclusion
Gemini 3 Pro stands out as a technically solid model, particularly effective on visual and multimodal tasks. The comparison with GPT-5.1, however, reveals notable gaps in backend code, where OpenAI’s model produces more robust, production-ready implementations. Is this an intrinsic weakness of the model, or simply a need for more sophisticated prompting? Our test mainly shows that Gemini 3 needs its programming scaffold to give its best. Integrated into Antigravity, Google’s agentic IDE, Gemini 3 can better structure its outputs and adapt to environmental constraints. Real-world performance therefore depends largely on the tooling around it.