Google tweaks Gemini usage limits after user backlash over quick quota depletion

Your quota is used only for successful completions.
Google clarified that system errors won't count against user limits, shifting responsibility for failures to the company.

A week after Google introduced compute-based usage limits for its Gemini AI app, the company found itself navigating a familiar tension in technology: the gap between how engineers price complexity and how users experience fairness. When a single ambitious request could silently consume a month's worth of allowance, the system felt less like a resource and more like a trap. Google's adjustments (capping per-prompt costs, making the lighter Flash-Lite model free, and promising greater transparency) reflect the ongoing negotiation between what a tool can do and what people believe they deserve from it.

  • Users discovered that one complex Gemini request — a video analysis, a large file upload, an intricate coding task — could quietly drain their entire monthly quota in a single sitting.
  • The backlash was swift enough that Google began reversing course within a week of launching the compute-based system at I/O 2026.
  • Google capped how much any single prompt can consume, ensuring no one request can monopolize a user's allowance, while making the lighter Flash-Lite model entirely free and unlimited.
  • A bug that left Google AI Ultra subscribers nearly out of video generation credits after just one or two Omni tasks has been addressed with doubled limits.
  • Google is building toward a pay-as-you-go top-up system and more granular usage dashboards, signaling that the compute-based model is staying — but needs a more human face.

Google's experiment with compute-based usage limits for Gemini barely survived its first week. Launched at I/O 2026, the system was designed to charge users according to the true computational cost of each request — a text prompt costing little, a video analysis costing far more. In theory, it was equitable. In practice, users found that a single complex task could consume a disproportionate share of their monthly allowance, leaving them stranded before the reset.

Google responded with a layered set of fixes. The most direct is a per-prompt quota cap for Gemini 3.1 Pro users, ensuring no individual request can monopolize a user's total allowance. Alongside this, all prompts using the lighter Gemini 3.1 Flash-Lite model became free and unlimited, a pressure valve for routine work. The company also clarified that failed requests, meaning errors on Google's end, would not count against user quotas.

Two groups received targeted relief. Ultra subscribers hit by a bug that drained video generation credits after just one or two Omni tasks now have double the previous limit. And Google announced a forthcoming pay-as-you-go top-up system, letting users purchase additional compute credits on demand rather than waiting for a monthly reset.

Underlying all of it is a transparency problem Google has acknowledged but not yet fully solved. The existing usage dashboard offered only a broad overview, leaving users unable to anticipate which tasks were expensive. Google has promised more granular breakdowns, real-time notifications, and persistent model preferences across sessions. The compute-based model remains — but Google is working to make it feel less like a penalty and more like a partnership.

Google's rollout of compute-based usage limits for its Gemini app at I/O 2026 lasted about a week before the company began walking back the strictest parts of the system. Users had complained that their quotas were vanishing too quickly—a single complex prompt or video task could consume what felt like an unfair share of their monthly allowance. On Thursday, Google announced a series of adjustments meant to address what the company called "feedback about hitting limits too quickly."

The core problem was architectural. When Google switched Gemini to a compute-based model, the company tried to price usage according to the actual computational cost of each request. A straightforward text prompt demands far less processing power than a video analysis or a coding task, so the system charged accordingly. But the math didn't feel right to users. A single complex request—say, uploading a large file or asking Gemini to write intricate code—could drain a substantial portion of a user's quota in one go, leaving them with little room for additional work before hitting the monthly ceiling.

Google's response came in layers. For Gemini 3.1 Pro users, the company introduced per-prompt quota caps, preventing any single request from consuming an outsized chunk of the total allowance. This was the most direct fix: you could still run complex tasks, but no individual prompt could monopolize your quota. Separately, Google made all prompts using the lighter Gemini 3.1 Flash-Lite model completely free and unlimited, giving users a way to handle routine requests without touching their quota at all. The company also clarified that failed requests—errors on Google's end—would not be charged against a user's limit. "Your quota is used only for successful completions," the company stated, shifting the burden of system failures away from users.
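To make the three rules concrete, here is a minimal sketch of how such accounting could work. It is not Google's implementation: the cap value, the unit of "compute cost," the model identifier strings, and the `Quota` class are all hypothetical, since Google has not published those details.

```python
from dataclasses import dataclass

# Hypothetical values and names; none of these are published by Google.
FREE_MODELS = {"gemini-3.1-flash-lite"}  # illustrative model id
PER_PROMPT_CAP = 50.0                    # assumed max units one prompt may deduct


@dataclass
class Quota:
    remaining: float  # units left in the current billing period

    def charge(self, model: str, compute_cost: float, succeeded: bool) -> float:
        """Deduct the cost of one request and return the units deducted."""
        if not succeeded:
            # Rule 1: failed requests are not charged.
            return 0.0
        if model in FREE_MODELS:
            # Rule 2: the lighter model is free and unlimited.
            return 0.0
        # Rule 3: no single prompt may deduct more than the per-prompt cap,
        # and the balance never goes below zero.
        deducted = min(compute_cost, PER_PROMPT_CAP, self.remaining)
        self.remaining -= deducted
        return deducted


quota = Quota(remaining=1000.0)
quota.charge("gemini-3.1-pro", 400.0, succeeded=True)         # capped at 50.0
quota.charge("gemini-3.1-pro", 400.0, succeeded=False)        # failure, 0.0
quota.charge("gemini-3.1-flash-lite", 10.0, succeeded=True)   # free, 0.0
```

Under these assumed numbers, the 400-unit request that would once have consumed 40 percent of the allowance deducts only 50 units, leaving 950.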

Beyond these immediate fixes, Google acknowledged that users needed better visibility into how their quota was being consumed. The existing usage dashboard on gemini.google.com offered only a high-level overview, leaving users in the dark about which tasks were expensive and which were cheap. Google promised more granular breakdowns and real-time notifications to help users understand their consumption patterns and plan accordingly. The company also said it would remember a user's preferred model choice across all future sessions, only reverting to a lighter model if the user hit a hard cap.

Two specific user groups received targeted relief. Google AI Ultra subscribers, who had been hit by a bug where just one or two Omni video generations would drain their quotas, now have double the number of video generations available. And looking ahead, Google said it would introduce a pay-as-you-go top-up system, allowing users to buy additional compute credits on demand rather than waiting for a monthly reset.

The adjustments reveal the tension in Google's pricing strategy: the company wants to charge fairly for the actual cost of computation, but users expect a simpler, more predictable model. By capping individual prompts, making the lighter model free, and improving transparency, Google is trying to thread that needle—keeping the compute-based system while making it feel less punitive. Whether users will feel the difference remains to be seen.

A simple text prompt uses far less compute than a complex video or coding prompt.
— Google, explaining the compute-based model
If a request fails, you won't be charged. Our system mistakes are on us, not you.
— Google, on failed request policy
The Hearth Conversation
Another angle on the story

Inventor

So Google switched to charging based on actual compute cost, and users immediately felt squeezed. Why did that surprise them?

Model

Because the old system—if there was one—probably felt more like a flat monthly allowance. You got X uses per month, period. This new system is more honest about what things actually cost, but honesty can sting when a single video task eats 20 percent of your budget.

Inventor

And the per-prompt cap fixes that how?

Model

It sets a ceiling on any single request. You can still run expensive tasks, but nothing can hog the whole quota. It's a guardrail.

Inventor

What about the free Flash-Lite tier? Isn't that just Google admitting the limits were too tight?

Model

Partly, yes. But it's also smart product design. Flash-Lite is fast and cheap to run. If you're doing routine work—summarizing, drafting, simple questions—you don't need the heavy model. Free tier for light work, paid quota for the serious stuff.

Inventor

The bug with Omni videos—that sounds like it was hitting people hard.

Model

It was. Imagine paying for a premium tier and then discovering that two video tasks drain your entire month. That's not a pricing problem; that's a broken product. Doubling the limit for Ultra users is the minimum fix.

Inventor

So what's the real issue Google is trying to solve here?

Model

Trust. Users need to feel like the system is fair and transparent. Right now, they're still learning how much things cost. Better dashboards and notifications help with that. But the deeper issue is that compute-based pricing is inherently opaque to regular people. Google's trying to make it less so.
