Google Veo 2 vs OpenAI's Sora: Which is the best video generation model?

The two American companies both offer state-of-the-art video generation models. Price, performance, realism: a comparison of Veo 2 and Sora.

This is a fast-emerging market. Video generation from a prompt is developing gradually as foundation models improve. Initially dominated by Chinese players, the market has seen American heavyweights arrive in recent months. Finally released in December 2024 by OpenAI, Sora quickly became a reference. A few days later, Google in turn unveiled its in-house solution: Veo 2, a model ranked slightly above OpenAI's according to user rankings in the Artificial Analysis Video Arena. Price, quality, realism, clip length: a comparison of these two titans of AI video.

Different technical approaches

Veo 2 is based on a cascade of several diffusion models. A base model first generates a low-resolution video from the user's prompt. Spatial super-resolution models then improve the visual quality. Finally, temporal refinement models ensure consistency between frames. This architecture allows Veo 2 to excel at reproducing the physical principles of the real world. In practice, this results in more natural movements, more credible interactions between objects, and a better grasp of the laws of physics in generated videos. Google particularly highlights Veo 2's ability to understand "the unique language of cinematography": the model correctly interprets instructions about lenses, angles and camera movements.
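To make the three-stage cascade concrete, here is a toy sketch in Python. It is not Google's implementation: the function names (`base_model`, `spatial_super_resolution`, `temporal_refinement`), the tiny grayscale frames, and the simple upsampling and blending operations are all assumptions, chosen only to show how each stage hands its output to the next.

```python
import random

# A "video" here is a list of frames; each frame is a 2D list of floats.
# Every stage below is a hypothetical stand-in, not a real diffusion model.

def base_model(prompt, frames=4, height=2, width=3, seed=0):
    """Stage 1 (stand-in): produce a low-resolution clip from the prompt."""
    rng = random.Random(f"{prompt}-{seed}")
    return [[[rng.random() for _ in range(width)] for _ in range(height)]
            for _ in range(frames)]

def spatial_super_resolution(video, factor=2):
    """Stage 2 (stand-in): nearest-neighbour upsampling of every frame."""
    upscaled = []
    for frame in video:
        new_frame = []
        for row in frame:
            wide_row = [pixel for pixel in row for _ in range(factor)]
            new_frame.extend(list(wide_row) for _ in range(factor))
        upscaled.append(new_frame)
    return upscaled

def temporal_refinement(video, weight=0.5):
    """Stage 3 (stand-in): blend each frame with the previous output frame
    so that consecutive frames change more smoothly."""
    refined = [video[0]]
    for frame in video[1:]:
        previous = refined[-1]
        refined.append([
            [weight * p + (1 - weight) * c for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, frame)
        ])
    return refined

def generate(prompt):
    """Run the three stages in order: base, spatial, temporal."""
    low_res = base_model(prompt)
    high_res = spatial_super_resolution(low_res)
    return temporal_refinement(high_res)

video = generate("a Tesla on the Champs-Élysées at sunset")
print(len(video), len(video[0]), len(video[0][0]))  # frames, height, width -> 4 4 6
```

The point of the sketch is the ordering: the expensive "creative" step happens once at low resolution, and the later stages only add detail and smooth the result.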

According to an analysis by researchers at Lehigh University (Pennsylvania), Sora uses a radically different architecture. While Google uses several cascaded models, Sora is based on a single pre-trained diffusion transformer with flexible scaling capabilities (the model can process videos of different dimensions without standardizing them). Conversely, conventional approaches, like the one potentially used by Veo 2, tend to standardize formats (conversion to squares, fixed resolutions), which can cause information loss or distortion in the final rendering. Put simply, Sora can in theory generate a vertical video for TikTok, a horizontal one for YouTube or a square one for Instagram without compromising visual composition or quality.
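The cost of standardizing formats is easy to quantify. The short Python sketch below computes the share of pixels discarded when a frame is center-cropped to a square before processing. The helper name `square_crop_loss` and the three resolutions are illustrative assumptions, not figures published by Google or OpenAI.

```python
def square_crop_loss(width, height):
    """Fraction of pixels lost when center-cropping a frame to a square."""
    side = min(width, height)
    return 1 - (side * side) / (width * height)

# Illustrative frame sizes for the three formats mentioned in the text.
formats = {
    "horizontal 1920x1080": (1920, 1080),
    "vertical 1080x1920": (1080, 1920),
    "square 1080x1080": (1080, 1080),
}

for name, (w, h) in formats.items():
    print(f"{name}: {square_crop_loss(w, h):.1%} of pixels discarded")
```

Under these assumptions, a 16:9 or 9:16 frame loses roughly 44% of its pixels to a square crop, which is the kind of loss a model working at native dimensions avoids.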

The JDN test

In practice, the Google and OpenAI models are ultimately quite close. According to our various tests, OpenAI offers a more photorealistic rendering than Veo 2. On the other hand, the Google model offers videos that are physically more credible.

For example, we ask the AI to generate a simple video sequence of a Tesla driving on the Champs-Élysées at sunset. Veo 2 offers a rear tracking shot of the car with a fairly credible overall rendering. Sora generates an even more realistic video with a rather beautiful aerial view.

Prompt:

Tesla Model 3 sleek electric car, driving slowly down Champs-Élysées Paris, golden sunset light, Arc de Triomphe visible in background, cinematic atmosphere, reflections on car surface, pedestrians turning heads, warm orange glow, iconic Parisian architecture, high-end fashion stores, mild traffic, 8K ultra HD, cinematic drone shot following car, realistic lighting, photorealistic quality, smooth tracking shot.

Veo 2:

Sora:

For our second test, we ask the models to generate the landing of an A380 on a runway at Charles de Gaulle airport during a thunderstorm. Both AIs seem to have trouble generating the exact moment of touchdown. Veo 2 generates a truly credible view of the plane rolling along the runway, with fairly realistic lightning. Sora, for its part, produces a slow-motion video of the plane flying over the (virtual) camera. The moment seems frozen: the lightning is static and the scene as a whole is not very coherent. The model also generates a rather unexpected element: a powerful jet of water shooting straight out of the plane. The result is visually striking but not very realistic.

Prompt:

Massive Airbus A380 aircraft landing at Charles de Gaulle Airport Paris, dramatic summer thunderstorm, lightning flashes illuminating dark skies, rain streaking across runway, tarmac reflecting puddles, airplane lights cutting through storm, water spray from landing wheels, cinematic slow motion, airport control tower visible, other planes waiting, industrial airport atmosphere, wet reflective surfaces, blue-purple storm lighting, 8K ultra HD, dynamic camera movement following aircraft descent, realistic thunder sound design, photorealistic quality, dramatic tension.

Veo 2:

Sora:

For our third test, we ask Veo 2 and Sora to generate a cartoon-style video of two mice driving a car in Paris near the Eiffel Tower. The result is broadly satisfactory on both sides. Google's Veo 2 comes out ahead, however: it generates a style closer to a true cartoon than OpenAI's output, which looks more like a 3D simulation.

Prompt:

Two mice drive a car on the Paris ring road, with the effiel tower in the background, cartoon-style.

Veo 2:

Sora:

Finally, for our last test, we ask the AI to generate a backward tracking shot of a cowboy on his horse in Death Valley in the United States. Both models produce very successful results. The build and general look of the cowboy are credible. Sora produces the best result, with a scene that could almost come from a film. Veo 2 chooses to frame the cowboy more closely. Both models still fall short at following exact instructions: the backward tracking shot is not respected in either version.

Veo 2:

Sora:

Prompt:

A rugged cowboy riding horseback through Death Valley, California. Wide cinematic shot with the camera slowly tracking backwards to reveal the vast, desolate landscape. Golden hour lighting casts long shadows across the desert floor. The cowboy wears a traditional Stetson hat, weathered leather vest, and has a determined expression. His horse kicks up small clouds of dust as they traverse the iconic cracked earth and salt flats. Mountains loom in the background against a clear blue sky with scattered clouds.

Which is the best model? As explained above, each model has its strengths and weaknesses. Veo 2 often generates more physically believable videos thanks to its more faithful respect for the laws of physics. Sora, on the other hand, produces more photorealistic videos.

Availability and price

OpenAI's Sora is one step ahead in terms of accessibility in Europe: it is now available to French and European users, while Google's Veo 2 is not yet officially accessible in France. To access Sora, users have two pricing options: the ChatGPT Plus subscription at €20/month, limited to 720p and 10 seconds per video, or the Pro plan at €200/month, which allows higher-quality video generation (up to 1080p and 20 seconds) with watermark-free downloads.

On the Google side, Veo 2 should be offered through two main channels: VideoFX for creatives and Vertex AI for developers who want to integrate it via API. In terms of pricing, Google has adopted a pay-per-use model, billing $0.50 per second of generated video (about $30 per minute). This model is potentially more advantageous for occasional use, but can quickly become expensive for intensive use.
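The trade-off between the two pricing models comes down to simple arithmetic, sketched below in Python. The per-second rate and subscription prices are the ones quoted in this article; the helper names and the `usd_per_eur=1.0` conversion are placeholder assumptions (not a real exchange rate), and the sketch ignores the usage caps that come with Sora's subscriptions.

```python
VEO2_USD_PER_SECOND = 0.50  # per-second rate quoted in the article

def veo2_cost(seconds, rate=VEO2_USD_PER_SECOND):
    """Pay-per-use cost in USD for a given number of generated seconds."""
    return seconds * rate

def break_even_seconds(subscription_eur, usd_per_eur=1.0,
                       rate=VEO2_USD_PER_SECOND):
    """Seconds of Veo 2 video per month that cost as much as a subscription.

    usd_per_eur=1.0 is a placeholder assumption, not a real exchange rate.
    """
    return subscription_eur * usd_per_eur / rate

print(veo2_cost(60))            # one minute of video -> 30.0
print(break_even_seconds(20))   # vs. ChatGPT Plus at 20 EUR -> 40.0
print(break_even_seconds(200))  # vs. ChatGPT Pro at 200 EUR -> 400.0
```

At currency parity, then, about 40 seconds of Veo 2 output per month already costs as much as the Plus subscription, and about 400 seconds as much as the Pro plan.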

While OpenAI's Sora stands out for its superior photorealism and its immediate availability in France with clear subscription plans, Google's Veo 2 impresses with its mastery of physical laws and its more credible rendering of movements and interactions. The choice between the two models will depend essentially on users' specific needs: professionals seeking exceptional visual quality may favor Sora, while those who need physically realistic simulations will turn to Veo 2. Both models can be used without difficulty to quickly create transition sequences between two scenes.

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.
