
Google DeepMind Unveils Genie 3, a World Model That Creates Interactive Virtual Worlds

August 7, 2025

Google DeepMind has unveiled Genie 3, a general-purpose world model that the company describes as its first to allow real-time interaction. It generates 3D virtual worlds in real time from text prompts alone. Users can directly manipulate these environments, and the worlds can also be used to train AI agents.

Genie 3 can generate consistent 3D environments for several minutes at 720p resolution and 24 frames per second. This is a significant improvement over its predecessor, Genie 2, which was limited to short simulations of roughly 10 to 20 seconds. The model remembers past interactions so that the environment stays consistent over time. It also supports "Promptable World Events," a feature that lets users trigger dynamic changes, such as a shift in the weather or the addition of new objects, in real time through text input.
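To put "real time at 24 frames per second" in concrete terms, the short calculation below derives the time budget per frame and the number of frames in a session from the figures quoted above. It is a back-of-the-envelope sketch only; treating 720p as a 1280x720 frame is an assumption, and nothing here reflects how Genie 3 is actually implemented.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the article:
# 720p output at 24 frames per second.
FPS = 24
WIDTH, HEIGHT = 1280, 720  # assumption: 720p taken as a standard 16:9 1280x720 frame


def frame_budget_ms(fps: int) -> float:
    """Milliseconds available to produce each frame if output must keep pace in real time."""
    return 1000.0 / fps


def frames_for_seconds(seconds: float, fps: int = FPS) -> int:
    """Number of frames generated over a session of the given length."""
    return round(seconds * fps)


if __name__ == "__main__":
    print(f"Per-frame budget: {frame_budget_ms(FPS):.1f} ms")
    print(f"Frames in one minute: {frames_for_seconds(60)}")
    print(f"Pixels per frame: {WIDTH * HEIGHT}")
    # Genie 2's clips of roughly 20 seconds, for comparison.
    print(f"Frames in 20 seconds: {frames_for_seconds(20)}")
```

At 24 fps, each frame must be ready in about 41.7 ms, and one minute of interaction already amounts to 1,440 frames, which is why sustaining consistency over several minutes is a notable step beyond clips of 10 to 20 seconds.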

DeepMind says Genie 3 can depict a range of natural and ecological elements, such as water, lighting, animal behavior, and plant growth, from prompts alone, without physically based rendering or predefined 3D models. Users can create a world by describing a setting in text, such as an ancient Roman city or a futuristic spaceport, and AI agents can then learn to navigate, make decisions, and act within it.

Genie 3 is attracting particular attention as an experimental tool for AGI research. DeepMind reported that in internal tests, its SIMA agent, acting inside environments generated by Genie 3, was able to recognize goals and take actions appropriate to the context. One example task was approaching a trash compactor or a forklift in a warehouse environment.

Technically, Genie 3 differs from existing 3D scene techniques such as NeRF and Gaussian Splatting, which rely on an explicit 3D representation. Instead, it works autoregressively: each new frame is generated from the user's input and the sequence of frames produced so far. The result is a world that looks and behaves realistically without a hard-coded physics engine.
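The frame-by-frame, autoregressive loop can be illustrated with a deliberately tiny toy. In the sketch below, a "frame" is just a position and a weather label, and `toy_model` is a hypothetical hand-written rule standing in for a learned network. The names `Frame`, `toy_model`, and `rollout`, the bounded context window, and the way text events are injected by step index are all illustrative assumptions, not details of Genie 3. What it does show is the general pattern: each output frame is appended to the history that conditions the next prediction.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional

# Toy illustration of autoregressive, frame-by-frame generation.
# Everything here is hypothetical: Genie 3 is a learned neural model whose
# architecture, inputs, and context length are not described in the article.


@dataclass(frozen=True)
class Frame:
    t: int         # frame index
    position: int  # stand-in for "what the world looks like"
    weather: str   # stand-in for world state that a text event can change


def toy_model(history: List[Frame], action: str, event: Optional[str]) -> Frame:
    """Hypothetical stand-in for a learned model that predicts the next frame.

    This toy rule only reads the most recent frame; a learned model would
    condition on the whole context window it is given.
    """
    last = history[-1]
    step = {"left": -1, "right": 1}.get(action, 0)
    # A text event overrides the weather; otherwise the previous state persists,
    # which is the "consistency" the frame history is meant to preserve.
    weather = event if event is not None else last.weather
    return Frame(t=last.t + 1, position=last.position + step, weather=weather)


def rollout(first: Frame, actions: List[str], events: Dict[int, str],
            context: int = 16) -> List[Frame]:
    """Generate frames one at a time, feeding each output back in as context."""
    history: Deque[Frame] = deque([first], maxlen=context)  # bounded memory window
    frames = [first]
    for i, action in enumerate(actions):
        nxt = toy_model(list(history), action, events.get(i))
        history.append(nxt)  # the new frame conditions the next prediction
        frames.append(nxt)
    return frames


if __name__ == "__main__":
    # Walk right twice, then left, with a hypothetical "rain" event at step 1.
    for f in rollout(Frame(0, 0, "clear"), ["right", "right", "left"], {1: "rain"}):
        print(f)
```

Running it prints four frames, ending at `Frame(t=3, position=1, weather='rain')`: the rain introduced mid-session persists because every later frame is derived from the frames before it rather than from a fixed scene file.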

However, the model still has limitations. The range of actions agents can perform is restricted, and precise interaction among multiple agents is still at an early stage. Real-world locations can be reproduced accurately, and text rendered legibly, only in some situations. The model is also designed for real-time simulations lasting a few minutes rather than for long-running interactions.

DeepMind anticipates that the model will find uses in a variety of fields, including education, gaming, film production, and robot training. For now, however, Genie 3 is available only as a limited research preview for a small group of selected researchers and creators. DeepMind stated that it is working closely with its responsible research teams to address safety and ethical considerations.

DeepMind believes Genie 3 could serve as a foundation for AI that learns from experience, much as humans do. According to the company, if AI can learn to plan through trial and error and to navigate uncertainty, that could be a key step toward AGI. Just as AlphaGo's unexpected strategy in Go in 2016 marked a turning point, Genie 3 may signal the beginning of a new era.