Google is now taking on OpenAI’s Sora with Veo 2, a next-gen video generation model. The launch comes a few months after Veo’s initial debut in May, and arrives alongside updates to Imagen 3, Google’s flagship image generation model, and Whisk, a new experiment that uses the enhanced Imagen 3 to remix existing images.

“Earlier this year, we introduced our video generation model, Veo, and our latest image generation model, Imagen 3. Since then, it’s been exciting to watch people bring their ideas to life with help from these models: YouTube creators are exploring the creative possibilities of video backgrounds for their YouTube Shorts, enterprise customers are enhancing creative workflows on Vertex AI and creatives are using VideoFX and ImageFX to tell their stories. Together with collaborators ranging from filmmakers to businesses, we’re continuing to develop and evolve these technologies,” Google announced in a blog post. “Today we’re introducing a new video model, Veo 2, and the latest version of Imagen 3, both of which achieve state-of-the-art results. These models are now available in VideoFX, ImageFX and our newest Labs experiment, Whisk.”

Google notes that its new video generation model has an improved understanding of real-world physics and human movement, which enables it to simulate motion and fluid dynamics more accurately and generate videos that portray more realistic actions. For example, the model is better at simulating the pouring of liquids, such as coffee or syrup. It also captures subtle human expressions more faithfully, making generated videos appear more lifelike. Videos can be generated at resolutions up to 4K, with a high level of clarity and detail. For now, the new model is being tested within Google Labs through the VideoFX platform.

In addition, the model can render objects or individuals from multiple angles to create more dynamic and varied shots, and it can replicate complex cinematic effects, including different lens types and lighting techniques. Google notes that the model also offers a degree of customization: creators can specify the desired genre, lens type, and cinematic effects when generating a video, giving them more control over the final output. For instance, a user might ask Veo 2 to simulate footage shot with a specific camera lens or to apply a particular lighting style, such as volumetric lighting that creates visible beams of light.

With the release of Veo 2, Google is upping the ante against competitors in the AI landscape, especially OpenAI. OpenAI recently rolled out Sora, which lets users create videos from text prompts. Sora can generate clips at up to 1080p resolution and up to 20 seconds in length, while Veo 2 surges ahead on both resolution (up to 4K) and video length (up to two minutes). Still, Veo 2 remains a work in progress: Google notes that it can still produce errors such as unnatural motion, even though the frequency of hallucinations, such as unrealistic or misplaced objects, has been reduced.