Google DeepMind today introduced Genie 3, a new general-purpose “world model” capable of generating immersive, interactive virtual environments from simple text prompts. Genie 3 supports real-time navigation at 720p resolution and 24 frames per second, producing worlds that remain visually and physically consistent for several minutes. In contrast, its predecessor, Genie 2, supported only brief interactions of 10 to 20 seconds at lower resolution.
A feature called “promptable world events” lets users alter a scene after it has been generated: adding weather effects like rain, spawning animals, or introducing new objects on the fly. This turns each generated world from a static backdrop into a mutable, responsive environment well suited to exploration. The clips on Google’s blog post depict impressively realistic scenes.
DeepMind views Genie 3 as foundational for embodied AI agents (robots and virtual assistants that interact with their surroundings). Shlomi Fruchter, a research director, called the model “the first real-time interactive general-purpose world model,” suitable for tasks like training simulated agents to navigate a warehouse or follow complex instructions. Currently, Genie 3 is being rolled out as a controlled research preview limited to a small group of academics and creators, giving DeepMind the chance to assess safety, address biases, and refine capabilities.
By generating worlds that can be explored, modified, and remembered, Genie 3 takes a step toward truly embodied AI: systems that can reason, experiment, and plan in simulation before acting in the real world. For AGI researchers, it offers a powerful new tool. For end users, particularly creators, educators, and game designers, it opens new possibilities: educators can build immersive teaching environments; artists and game developers can prototype levels, characters, or scenarios instantly; and everyday users can turn simple descriptions into personalized virtual spaces, whether riding a horse in New Zealand or watching the sea. The model’s responsiveness to real-time prompts makes it attractive to anyone interested in digital creativity, storytelling, or interactive learning.
“Today we are announcing Genie 3, a general purpose world model that can generate an unprecedented diversity of interactive environments. Given a text prompt, Genie 3 can generate dynamic worlds that you can navigate in real time at 24 frames per second, retaining consistency for a few minutes at a resolution of 720p,” Google announced in a blog post. “At Google DeepMind, we have been pioneering research in simulated environments for over a decade, from training agents to master real-time strategy games to developing simulated environments for open-ended learning and robotics. This work motivated our development of world models, which are AI systems that can use their understanding of the world to simulate aspects of it, enabling agents to predict both how an environment will evolve and how their actions will affect it,” it added.
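The quoted definition of a world model — a system that predicts how an environment will evolve and how an agent’s actions will affect it — can be pictured as a step function from (state, action) to the next observation. The sketch below is purely illustrative: Genie 3 has no public API, and the `WorldModel` interface and 1-D toy world here are hypothetical stand-ins for that idea, not DeepMind’s implementation.

```python
# Illustrative sketch only: a toy stand-in for the world-model interface
# described in DeepMind's post. Given the current state and an action,
# the model predicts the next observation.

from dataclasses import dataclass


@dataclass
class WorldModel:
    """Toy deterministic 'world': a 1-D position the agent can move along."""
    position: int = 0

    def step(self, action: str) -> int:
        # Predict the next observation from the chosen action.
        if action == "right":
            self.position += 1
        elif action == "left":
            self.position -= 1
        return self.position


def rollout(model: WorldModel, actions: list[str]) -> list[int]:
    """Simulate a sequence of actions, returning the predicted observations."""
    return [model.step(a) for a in actions]


print(rollout(WorldModel(), ["right", "right", "left"]))  # [1, 2, 1]
```

In a real world model the observations are video frames rather than integers, but the contract is the same: the simulator evolves state action-by-action, which is what lets an agent try things out before acting.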
DeepMind has already used Genie 3 to train its SIMA agent (Scalable Instructable Multiworld Agent), which accomplished multi-step tasks such as navigating to specific objects in virtual warehouses. Although the world model itself does not “know” the goal, SIMA succeeded by planning within a self-consistent simulation. Limitations remain, however: the range of agent actions is still narrow, simulations last only a few minutes, and modeling interactions among multiple agents is still a challenge. Text rendered within environments is also imprecise unless it is specified explicitly in the prompt.
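The division of labor described above — a goal-agnostic simulator, with the goal held only by the agent — can be sketched with a minimal planner. Everything here is a hypothetical illustration under toy assumptions (a 1-D world, exhaustive search over short action sequences), not SIMA’s or Genie 3’s actual mechanism.

```python
# Hypothetical sketch: the world model only simulates; it never sees the
# goal. The agent plans by rolling out candidate action sequences in the
# model and choosing the one whose predicted end state best matches its
# own goal. All names are illustrative, not DeepMind's API.

import itertools


def simulate(position: int, actions: tuple[str, ...]) -> int:
    """World model: predicts the final position after a sequence of moves."""
    for a in actions:
        position += 1 if a == "right" else -1
    return position


def plan(start: int, goal: int, horizon: int = 3) -> tuple[str, ...]:
    """Agent: searches action sequences in simulation before acting."""
    candidates = itertools.product(["left", "right"], repeat=horizon)
    return min(candidates, key=lambda seq: abs(simulate(start, seq) - goal))


print(plan(start=0, goal=3))  # ('right', 'right', 'right')
```

The same separation holds at scale: the simulator stays general-purpose, and task knowledge lives entirely in the agent that queries it.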