describe what they want and the machine builds it
With the release of Gemini Omni, Google has extended the logic of conversational AI into the dimension of time itself — moving from still images into motion, from menus into dialogue. Where once video editing demanded technical fluency with timelines and tools, a person may now simply describe what they wish to see, and the machine will attempt to render it with an understanding of gravity, physics, and continuity. This is less a product announcement than a quiet redrawing of who gets to be a creator, and what creation now requires.
- Google has launched Gemini Omni, a system that lets users edit and generate video through natural conversation rather than traditional software timelines.
- The model understands physical laws — gravity, fluid dynamics, motion — closing the uncanny gap that made early AI video feel weightless and unreal.
- Paid subscribers on the AI Plus, Pro, and Ultra tiers can access it immediately through the Gemini app, Google Flow, and YouTube Shorts; YouTube Shorts and YouTube Create users get access at no additional cost starting this week.
- Every AI-generated video is stamped with SynthID watermarking, embedding a layer of transparency and origin verification into the content itself.
- Developer and enterprise API access is due in the coming weeks, suggesting Google intends Omni to become foundational infrastructure rather than a consumer novelty.
Google has released Gemini Omni, an AI system designed to do for video what earlier tools did for still images — let people describe what they want and have the machine build it. The launch follows weeks of speculation, with traces of the project appearing in the Gemini dashboard before any official word.
Omni extends work Google began with Nano Banana, which brought AI image generation into the Gemini ecosystem. The new system moves into motion, and its first version, Gemini Omni Flash, is already rolling out to paid subscribers across the AI Plus, Pro, and Ultra tiers, accessible through the Gemini app, Google Flow, and YouTube Shorts.
The defining feature is conversational editing. Rather than navigating menus and timelines, a user can describe a change — transform a room, alter a material, introduce a new visual element — and the system applies it while preserving consistency across characters, environments, and motion. Google demonstrated this with a scene in which a person's arm gradually becomes reflective as it touches a rippling mirror.
The model has been trained on the physical logic of the real world, grasping gravity, fluid dynamics, and motion well enough to produce scenes that behave convincingly. Users can guide the system with multiple input types — text, images, existing video, and voice samples — though full audio editing remains in development. AI-generated personal avatars are also being introduced, allowing people to create digital versions of themselves that speak in their own voice.
All content produced by Gemini Omni will carry SynthID watermarking, Google's method for verifying AI-generated material. YouTube Shorts and YouTube Create users gain access at no additional cost this week, while developer and enterprise API access arrives in the coming weeks — a signal that Google sees Omni not as a feature, but as the foundation of a broader creative ecosystem.
Google has released Gemini Omni, a new artificial intelligence system built to handle video the way earlier tools handled still images—by letting people describe what they want and having the machine build it. The announcement came after weeks of rumors, with hints of the project surfacing in the Gemini dashboard just before the official unveiling.
The system represents an expansion of work Google started last year with Nano Banana, which brought AI-powered image generation and editing into the Gemini ecosystem. Where Nano Banana worked with photographs and illustrations, Omni moves into motion. The first version, called Gemini Omni Flash, began rolling out immediately to subscribers of Google's AI Plus, Pro, and Ultra tiers, accessible through the Gemini app, Google Flow, and YouTube Shorts.
What sets Omni apart is that it lets people edit video through conversation rather than menus and timelines. A user can describe a change they want (turn a character's arm into a mirrored surface, transform a room into a different setting, add new visual elements to an existing clip), and the system applies the edit while keeping characters, environments, and motion consistent across the scene. Google's demonstrations included a person touching a mirror that ripples like liquid as their arm gradually becomes reflective.
The model has been trained to understand how the physical world actually works. It grasps gravity, motion, and fluid dynamics well enough to generate scenes that behave realistically rather than in the uncanny, weightless way early AI video sometimes did. Beyond pure visual generation, Google has woven in the broader knowledge that powers Gemini itself, so the system can create explanatory videos and context-aware visuals, not just aesthetically pleasing clips.
Users can feed the system multiple types of input—text descriptions, images, existing video, voice samples—and combine them to guide the style, motion, or structure of what gets generated. Audio input is currently limited to voice references; fuller audio editing capabilities are still being tested. Google is also introducing AI-generated avatars, letting people create digital versions of themselves that can speak in their own voice for video generation, though broader speech editing features remain in development.
Every video created with Gemini Omni will carry SynthID watermarking, Google's method for marking AI-generated content and verifying its origins. Omni is launching first for paid subscribers, but Google is also bringing it to YouTube Shorts and the YouTube Create app at no additional cost starting this week. Developer and enterprise API access will arrive in the coming weeks, signaling that the company intends this to become infrastructure for a much wider ecosystem of creators and businesses.
Notable Quotes
"Users can modify clips using natural language prompts, with edits building on each other while maintaining consistency between characters, environments and motion across scenes." (Google's description of Omni's conversational editing capability)
The Hearth Conversation: Another perspective on the story
Why does Google need another video tool when editing software already exists?
Because editing software requires you to know what you're doing—you need to understand layers, keyframes, color grading. Omni lets you just say what you want and have it happen. That's a different kind of power.
But doesn't that risk flooding the world with AI video?
Probably. But the watermarking is there so people can tell what's real and what's made. That's the bet Google is making—transparency instead of restriction.
The physics understanding seems important. Why?
Because early AI video looked fake. Things floated. Liquids didn't behave like liquids. If Omni actually understands gravity and motion, the videos will look credible enough that people might actually use them instead of dismissing them as obviously artificial.
Who benefits most from this?
YouTube creators, probably. Anyone making shorts or explainer content. But also enterprises—imagine generating product demos or training videos without hiring a production crew. That's the real market.
What's the catch?
Outside YouTube Shorts and YouTube Create, you need a paid subscription to use it. And it's still early: fuller audio editing is still being tested, and speech editing remains in development. This is the opening move, not the finished product.