MIT and Microsoft develop system to slash AI agent energy use by 73%

It is very easy to over-allocate resources, wasting energy and money.

A researcher explains why intelligent optimization of AI workflows matters to cloud providers and their customers.

As artificial intelligence grows more capable, the systems that orchestrate its work have grown more wasteful — consuming energy and computation in ways that few pause to examine. Researchers at MIT and Microsoft have built a system called Murakkab, named for the Urdu word meaning a composition of things, that allows developers to describe what they need in plain language while the system silently handles the vast and opaque machinery of configuration beneath. In doing so, it reduces energy consumption by nearly three-quarters and computation by nearly two-thirds — a reminder that intelligence, artificial or otherwise, is most admirable not when it does more, but when it does more with less.

Agentic AI workflows — chains of models and tools solving complex tasks together — are quietly becoming the load-bearing infrastructure of cloud computing, yet building them efficiently has remained stubbornly manual and wasteful.
Developers are forced to hard-code every technical decision upfront, and when better tools emerge, the entire configuration must be rebuilt from scratch, creating a cycle of inefficiency baked into the industry's foundations.
Murakkab breaks this cycle by letting developers describe goals in plain language, then automatically assembling optimal models, tools, and hardware — and adapting in real time to each user's priorities of cost, speed, or accuracy.
In testing, the system cut energy use to roughly 27 percent of conventional levels and costs to under 25 percent, and in one case reduced energy consumption more than tenfold with only about a 2 percent loss in accuracy.
The research, to be presented at the USENIX Symposium on Operating Systems Design and Implementation, is still early-stage, but its implications point toward a future where cloud infrastructure wastes far less of the energy it consumes at enormous scale.

Agentic workflows — intricate chains of AI models and tools that collaborate to solve complex problems — are becoming the backbone of cloud computing. A video question-answering system, for example, might extract frames, generate a transcript, and then respond to user queries. But building these systems efficiently has proven surprisingly hard. Developers must manually specify nearly every technical choice upfront: which models, which tools, what hardware, how to balance speed against cost. When something better comes along, they start over. And cloud providers, blind to the internals of each workflow, cannot allocate resources intelligently.

Researchers at MIT and Microsoft designed Murakkab — an Urdu word meaning a composition of things — to change this. The concept is straightforward: let developers describe what they want in plain language, and let the system handle the rest. A developer says they need a video Q&A application; Murakkab identifies the best available models and tools, determines what can run in parallel, and schedules hardware accordingly. When new accelerators or models emerge, no retooling is required. When a real customer uses the application, the system optimizes in real time around that user's priorities — accuracy, speed, or cost — while giving cloud providers visibility across multiple workloads to share resources efficiently.

The results were striking. Compared to conventional approaches, Murakkab used roughly 35 percent of the computation, dropped energy consumption to about 27 percent, and reduced costs to under 25 percent. In one scenario, it cut energy use more than tenfold while sacrificing only about 2 percent of accuracy, and it identified an unexpectedly effective configuration that a developer would be very unlikely to find manually.

Graduate researcher Gohar Chaudhry, who led the work alongside MIT's Adam Belay and Microsoft Azure's Ricardo Bianchini, framed the urgency plainly: agentic workflows are growing complicated fast, and over-allocating resources wastes both energy and money at a scale the industry can no longer afford to ignore. The team plans to extend Murakkab to more complex workflows and larger clusters, but the broader question is already forming — whether the cloud industry will embrace intelligent optimization before agentic systems grow too large and too wasteful to course-correct.

Agentic workflows—those intricate chains of AI models and external tools that collaborate to solve complex problems—are becoming the backbone of what cloud providers do. A video question-answering system, for instance, might extract key frames, generate a transcript, and then field user queries about the content. But building and deploying these systems efficiently has proven surprisingly difficult. Developers must hard-code nearly every technical choice upfront: which models to use, in what order, which tools to invoke, what hardware to run it all on, how to balance speed against cost. The configuration space is vast and opaque. When a better model arrives, the developer starts over. And the cloud data center deploying the application has no visibility into the workflow's internals, so it cannot allocate resources intelligently at request time.

Researchers at MIT and Microsoft set out to solve this problem with a system called Murakkab—an Urdu word meaning a composition of things. The idea is simple in concept but complex in execution: let developers describe what they want their agentic workflow to do in plain language, and let the system figure out the rest. A developer might say, "I need a video Q&A application." Murakkab then automatically identifies the best existing models and tools to assemble, determines which components can run in parallel and which must run sequentially, and decides on hardware allocations and deployment schedules. The system makes these choices dynamically, so when a new GPU accelerator or model is released tomorrow, the developer doesn't need to retool anything.
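One way to picture the parallel-versus-sequential decision is as a dependency graph over the workflow's steps. The sketch below is not Murakkab's implementation. It uses Python's standard graphlib module, and the step names and dependencies are assumptions drawn from the video Q&A example, where frame extraction and transcription are taken to be independent of each other.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map for the video Q&A example: each step maps
# to the set of steps that must finish before it can start.
WORKFLOW = {
    "extract_frames": set(),
    "generate_transcript": set(),
    "answer_query": {"extract_frames", "generate_transcript"},
}


def parallel_stages(deps):
    """Group steps into ordered stages.

    Steps in the same stage have no unmet dependencies, so they could
    run at the same time; stages themselves run one after another.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(WORKFLOW)
print(stages)
# [['extract_frames', 'generate_transcript'], ['answer_query']]
```

Under these assumed dependencies, the first two steps form one parallel stage and the query step waits for both.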

When a cloud provider deploys the application for an actual customer, Murakkab optimizes the workflow in real time based on that user's constraints—whether they care most about accuracy, speed, or cost. The system adaptively configures components and allocates hardware to maximize efficiency while meeting those constraints. It also gives cloud providers visibility into multiple workloads simultaneously, enabling them to share computational resources across customers in the most efficient manner possible.
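Choosing a configuration under a user's constraints can be sketched as a filter followed by a minimization. The example below is illustrative only: the `Config` type, the `choose` function, the model and hardware names, and every number are invented for this sketch and do not come from the paper or from Murakkab itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    """One hypothetical way to run the workflow."""
    model: str
    hardware: str
    accuracy: float   # fraction of queries answered correctly
    latency_s: float  # seconds per request
    energy_wh: float  # watt-hours per request


# Made-up candidates for illustration; not measured results.
CANDIDATES = [
    Config("large-model", "gpu", 0.90, 2.0, 50.0),
    Config("large-model", "cpu", 0.90, 9.0, 30.0),
    Config("small-model", "gpu", 0.88, 1.0, 8.0),
    Config("small-model", "cpu", 0.88, 4.0, 5.0),
]


def choose(candidates, min_accuracy, max_latency_s) -> Optional[Config]:
    """Return the lowest-energy candidate that meets both constraints,
    or None if no candidate is feasible."""
    feasible = [
        c for c in candidates
        if c.accuracy >= min_accuracy and c.latency_s <= max_latency_s
    ]
    return min(feasible, key=lambda c: c.energy_wh, default=None)


strict = choose(CANDIDATES, min_accuracy=0.90, max_latency_s=3.0)
relaxed = choose(CANDIDATES, min_accuracy=0.85, max_latency_s=5.0)
print(strict.model, strict.hardware)    # large-model gpu
print(relaxed.model, relaxed.hardware)  # small-model cpu
```

In this toy data, relaxing the accuracy floor by a couple of points lets the selector pick a configuration that uses a tenth of the energy. That mirrors the kind of trade-off the article describes, though the real system searches a far larger space.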

The results from testing on diverse agentic workflows were striking. Murakkab used only about 35 percent of the computation required by traditional approaches. Energy consumption dropped to roughly 27 percent of what conventional systems consumed, and costs fell to less than 25 percent. In one scenario, the system reduced energy consumption more than tenfold while sacrificing only about 2 percent of accuracy. Perhaps most tellingly, Murakkab identified an unexpectedly optimal configuration for a model that selects video frames in a Q&A task, an optimization that would be nearly impossible for a developer to discover manually, given the sheer size of the configuration space.
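The figures above are stated as the share of the baseline that remains, while the headline states a reduction. The short calculation below converts one into the other using only the numbers reported in the article; the function name `percent_reduction` is just a label chosen for this sketch.

```python
def percent_reduction(remaining_fraction):
    """Convert 'uses X of the baseline' into 'reduced by Y percent'."""
    return round((1.0 - remaining_fraction) * 100, 1)


energy = percent_reduction(0.27)     # energy at roughly 27% of baseline
compute = percent_reduction(0.35)    # computation at about 35% of baseline
cost = percent_reduction(0.25)       # cost under 25%, so at least this much
tenfold = percent_reduction(1 / 10)  # a tenfold cut leaves one tenth

print(energy, compute, cost, tenfold)  # 73.0 65.0 75.0 90.0
```

So "27 percent of conventional levels" and "a 73 percent reduction" describe the same result, and a more-than-tenfold cut corresponds to a reduction of more than 90 percent.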

Gohar Chaudhry, the MIT graduate student who led the research alongside associate professor Adam Belay and Microsoft Azure's Ricardo Bianchini, emphasized the stakes. "Agentic workflows are getting very complicated and quickly becoming the backbone of what cloud providers are doing," he said. "Energy usage is a huge concern, so we need to be very careful about how efficient these workflows are. It is very easy to over-allocate resources, wasting energy and money." The paper describing Murakkab will be presented at the USENIX Symposium on Operating Systems Design and Implementation.

The researchers acknowledge that their work is still in early stages. They plan to expand the system to handle more complex workflows and larger computing clusters, and to explore optimization opportunities for new agentic applications. But the broader implication is clear: as agentic workflows become critical infrastructure at scale across major cloud platforms, intelligent optimization systems like this one could dramatically reduce wasted energy and operational costs. The question now is whether the cloud industry will adopt and build upon these techniques as agentic systems proliferate.

Notable Quotes

Agentic workflows are getting very complicated and quickly becoming the backbone of what cloud providers are doing. Energy usage is a huge concern, so we need to be very careful about how efficient these workflows are.

— Gohar Chaudhry, MIT graduate student and lead author

The Hearth Conversation

Another angle on the story

Inventor

Why does it matter that developers currently have to hard-code all these choices upfront? What goes wrong?

Model

The configuration space is enormous. You're choosing from dozens of models, multiple tools, different hardware options, and various ways to sequence them. Even if you wanted to optimize manually, you'd likely miss the best configuration simply because there are too many possibilities to explore. And when a better model comes out, you start from scratch.

Inventor

So Murakkab lets you avoid that problem by describing what you want in plain language?

Model

Exactly. You say "I need a video Q&A system" and the system figures out which models and tools to use, how to arrange them, and what hardware to allocate. It's not magic—it's systematic exploration of the configuration space, but automated.

Inventor

The energy numbers are striking. How does it actually achieve a 73 percent reduction?

Model

Part of it is eliminating over-allocation. Developers often provision more resources than necessary because they're uncertain. Murakkab right-sizes everything based on actual requirements. It also identifies which components can run in parallel versus sequentially, which reduces idle time and wasted computation.

Inventor

Does the system have to retrain or reoptimize every time a user makes a request?

Model

It adapts dynamically, yes, but not from scratch. It takes each user's priorities, whether accuracy, speed, or cost, and adjusts configurations accordingly. The cloud provider also gets visibility into multiple workloads, so resources can be shared more efficiently across customers.

Inventor

What's the catch? Is there a scenario where this approach fails?

Model

The researchers tested it on video Q&A and code generation tasks. Scaling to much larger, more complex workflows and bigger computing clusters is still ahead of them. But the fundamental approach seems sound.

Inventor

Why does this matter beyond just saving energy?

Model

At cloud scale, energy is money. It's also environmental. If every major cloud provider could cut energy consumption by 73 percent on agentic workflows, the cumulative impact would be enormous. And as these workflows become more central to what cloud platforms do, that impact only grows.

Want the full story? Read the original at MIT News ↗
