Google Upgrades Gemini With Conversational AI and Deeper App Integration

Gemini Live uses advanced speech models to create natural, adaptive conversations with users on mobile devices.

At its annual I/O conference, Google unveiled sweeping upgrades to its Gemini AI, deepening its conversational fluency and weaving it more tightly into the fabric of daily digital life. The announcement arrives at a moment when the contest for AI dominance is reshaping entire industries, with Google seeking to reclaim ground after a troubled debut. In offering a system that listens, interrupts, remembers, and integrates, the company is wagering that the future of AI is not a destination you visit, but an atmosphere you inhabit.

  • Google's Gemini upgrades enter a direct confrontation with OpenAI's GPT-4o: the two were unveiled within 24 hours of each other, and both claim more natural, interruptible conversation as their signature advance.
  • The credibility wound from Gemini's factual errors at launch and its controversial image generation failures still lingers, making these upgrades as much a reputational rescue as a technical leap.
  • A context window of 1 million tokens — enough to process a 1,500-page document in one pass — signals that Gemini is being built for depth, not just speed.
  • Google is threading Gemini through Gmail, Maps, YouTube Music, Calendar, and more, betting that omnipresence across familiar tools will matter more than any single feature.
  • Customizable 'Gems' let users sculpt Gemini into personal coaches or tutors, mirroring OpenAI's custom GPTs and signaling that personalization is becoming the new battleground.

Google used its annual I/O developer conference to announce a sweeping overhaul of Gemini, its AI assistant, signaling that the product is ready to move beyond its rocky beginnings and compete seriously with OpenAI's ChatGPT.

The most visible change is Gemini Live, a voice conversation feature rolling out to Gemini Advanced subscribers at $20 per month. Through iOS and Android apps, users can now speak to Gemini and have it respond in kind — and crucially, interrupt it mid-sentence without losing the thread. Google also plans to add camera integration later this year, allowing Gemini to respond to what it sees in a user's immediate surroundings.

Underpinning the upgrade is a dramatically expanded context window of 1 million tokens, meaning Gemini can process the equivalent of a 1,500-page document in a single pass. Google says it will double that capacity before year's end, enabling richer, more informed responses drawn from vastly more information at once.

The integration push is equally ambitious. Gemini is being embedded across Google's app ecosystem — Gmail, Maps, Calendar, YouTube Music, and more — transforming it from a standalone chatbot into a layer woven through services people already use daily. A new feature called Gems also lets users create customized versions of Gemini tailored to specific roles, such as a fitness coach or math tutor, echoing OpenAI's custom GPTs.

The timing is pointed. OpenAI unveiled GPT-4o just one day prior, offering similarly natural conversation and interruption capabilities to both free and paid users. The rivalry is intensifying as Meta, Nvidia, and Microsoft post record stock prices on investor enthusiasm for AI.

For Google, the stakes are personal as well as competitive. Gemini, formerly Bard, stumbled at launch with a factual error that shook investor confidence, and Google had to suspend its image generation feature earlier this year after it produced historically inaccurate depictions. These new capabilities represent Google's clearest attempt yet to close that credibility gap and establish Gemini as a tool worthy of everyday trust.

Google took the stage at its annual I/O developer conference on Tuesday with a message: Gemini, its answer to ChatGPT, is ready to talk. The company announced a major upgrade to Gemini 1.5 Pro that fundamentally changes how the AI chatbot works—making it conversational in ways that feel more natural, more capable, and more woven into the everyday tools millions of people already use.

The centerpiece is Gemini Live, a new feature rolling out to Gemini Advanced subscribers, who pay $20 monthly for premium access. Through the mobile app on iOS and Android, users can now speak directly to Gemini and have it speak back. The difference from typing is not trivial. The system uses advanced speech models that let the AI adapt in real time. Interrupt it mid-sentence and it adjusts, shifting its cadence and tone to match the flow of actual conversation. Later this year, Google plans to add camera integration, letting Gemini see what's around you and respond to your immediate surroundings.

The capacity behind this upgrade is substantial. Gemini 1.5 Pro now operates with a context window of 1 million tokens, a measure of how much text an AI can hold in mind at once. To put that in concrete terms: the system can parse a 1,500-page document containing 30,000 lines of text in a single pass. Google says it will double that capacity to 2 million tokens later in the year. The larger the context window, the more nuanced and informed the AI's responses can be, since it is working with more information simultaneously.

Beyond the chatbot itself, Google is embedding Gemini deeper into its ecosystem of apps. Users of Gemini Advanced will soon be able to ask the AI to build complex travel itineraries that factor in travel time between destinations, restaurant recommendations, and museum hours. That capability is due in the coming months. Integrations with Calendar, Tasks, and Keep (Google's notes app) will follow. YouTube Music integration launched today. The strategy is clear: make Gemini not a separate tool you visit, but a layer that sits across the services you already open every day.

Google is also introducing Gems, customizable versions of Gemini that work similarly to OpenAI's custom GPTs. Tell Gemini to become a personal training coach or a patient math tutor, and it will. The feature lets users shape the AI to their specific needs without building from scratch.

The timing of these announcements matters. OpenAI revealed GPT-4o a day earlier, a new version of its underlying model that also supports interruption mid-conversation and offers a more natural back-and-forth. GPT-4o is available to both free and paid users, though the paid tier offers five times greater capacity. The race between these companies is accelerating, with Meta, Nvidia, and Microsoft all posting record stock prices as Wall Street bets on AI as the next major computing frontier.

For Google, this upgrade is also a reset. Gemini, originally called Bard, stumbled badly at launch when it provided incorrect information during its initial public demonstration, rattling investors enough to move the stock. Earlier this year, the company had to disable Gemini's image generation feature after it began depicting historical figures—the pope, Nazi leaders—as women or people of color. Those missteps created a credibility gap. The new features and deeper integrations are Google's way of moving past those early failures and establishing Gemini as a serious, reliable tool that belongs in the hands of everyday users.

The Hearth Conversation: Another perspective on the story
Inventor

Why does Google need to make Gemini more conversational? Isn't typing to an AI already pretty straightforward?

Model

The difference is in how it feels and what becomes possible. When you can interrupt mid-sentence and have the AI actually adapt, you're not waiting for a full response before you can correct or redirect it. It's closer to how humans actually think together.

Inventor

And the context window—1 million tokens—that's the thing that lets it read a 1,500-page document at once?

Model

Exactly. Right now, most people paste in a document and the AI reads it. But with that much capacity, Gemini can hold the entire document in mind and cross-reference ideas across all 1,500 pages without losing the thread. It's the difference between skimming and actually understanding.

Inventor

Google's had some real public failures with Gemini. Does this upgrade actually fix the trust problem, or is it just moving past it?

Model

It's both. The failures were real—bad information at launch, the image generation mess. But this upgrade shows the company is listening. Deeper integration into Gmail and Maps means Gemini becomes useful in ways people actually care about. That builds trust faster than any apology.

Inventor

OpenAI just released GPT-4o. Is Google playing catch-up here?

Model

Not exactly. Both companies are converging on the same features—natural conversation, interruption, real-time response. The real competition now is about who gets it into the apps people use every day. Google has an advantage there because it owns Gmail, Maps, YouTube. OpenAI has to partner its way in.

Want the full story? Read the original at CNET ↗