MIT and Microsoft Develop System to Cut AI Agent Energy Use by 73%

Even if you wanted to do this manually, you'd likely configure it suboptimally

The configuration space for AI workflows is so vast that human developers struggle to find optimal setups without automated help.

As artificial intelligence weaves itself ever deeper into the fabric of cloud infrastructure, the hidden cost of that complexity—wasted energy, inefficient computation, and the sheer impossibility of optimal human configuration—has grown into a quiet crisis. Researchers at MIT and Microsoft have answered with Murakkab, a system that lets developers describe what they want in plain language and then handles the vast, shifting puzzle of model selection, hardware allocation, and workflow orchestration automatically. In testing, it reduced energy consumption to roughly a quarter of conventional methods without meaningful loss in performance—a result that suggests the next frontier in AI is not only what these systems can do, but how wisely they can be made to do it.

Agentic AI workflows—chains of models, tools, and databases solving multi-step problems—have quietly become the backbone of cloud computing, yet they are built in ways that waste enormous energy and computing power.
The configuration space for even a single workflow runs into the thousands of combinations, making it practically impossible for any developer to find the optimal setup by hand.
Murakkab dissolves that burden by accepting plain-language descriptions of intent and dynamically selecting models, tools, and hardware—adapting automatically when better options become available.
In testing, the system cut computational load to about 35% of what conventional approaches require, slashed energy use to roughly 27%, and reduced costs to less than 25%, all while largely preserving accuracy.
The research team now aims to scale Murakkab to major cloud platforms, where cumulative savings in electricity and cost could reach a scale that reshapes how the industry thinks about AI infrastructure.

Picture an AI system that must watch a video, transcribe it, extract key moments, and answer questions about what it saw. Building it means chaining together multiple specialized models—vision, speech, reasoning—plus databases and tools, each with its own settings, each tunable for speed or cost. The combinations multiply into the thousands, and a developer configuring it by hand would almost certainly miss the optimal setup while burning far more energy than necessary.

This is the problem MIT graduate student Gohar Chaudhry, associate professor Adam Belay, and Microsoft Azure's Ricardo Bianchini set out to solve. Agentic workflows—these multi-step AI pipelines—now run behind the scenes of countless applications, yet they are typically built by hard-coding every technical choice upfront. When a better model is released, developers must start over. Cloud providers deploying the application can't see inside it to allocate hardware intelligently. The system is structurally wasteful.

Their answer is Murakkab—an Urdu word for a composition of things. Instead of demanding technical specificity, it asks developers only to describe their intent in plain language. From there, Murakkab determines which models and tools to use, which components can run in parallel, and what hardware makes sense—dynamically, so that new accelerators or models are absorbed without rewriting anything. When deployed for real users, it further tailors the workflow to what each user actually values: speed, accuracy, or cost.

The results were striking. Across video question-answering and code generation tasks, Murakkab used only about 35% of the computation that traditional approaches required, reduced energy consumption to roughly 27% of what comparable methods consumed, and cut costs to less than 25%, without meaningful performance loss. In one case, it found an unexpectedly optimal model configuration that a human developer would have been unlikely to discover manually.

The team presented the work at the USENIX Symposium on Operating Systems Design and Implementation and plans to scale Murakkab to larger clusters and more complex workflows. As data centers consume ever-growing amounts of electricity, the deeper promise of the project is a cloud infrastructure that is not merely powerful, but genuinely thoughtful about how it spends its resources.

Imagine you're building an artificial intelligence system that needs to watch a video, transcribe it, pull out key moments, and then answer questions about what it saw. You'd need to chain together multiple AI models—one for vision, one for speech, one for reasoning—plus databases and other tools. Each piece has its own settings. Each can run fast or slow, cheap or expensive. The combinations multiply into the thousands. A developer building this by hand would spend weeks guessing at the right configuration, and even then, would likely waste enormous amounts of computing power and energy.

This is the problem that researchers at MIT and Microsoft set out to solve. Agentic workflows—these complex chains of AI models and external tools working together to solve multi-step problems—are becoming the backbone of what cloud providers actually do. They're everywhere now, running behind the scenes of applications that users interact with every day. But the way they're typically built is deeply inefficient. Developers must hard-code every technical choice upfront: which models to use, in what order, on what hardware, with what resource allocations. When a new and better AI model gets released, the developer has to start over. The cloud provider deploying the application can't see inside the workflow to allocate its own hardware intelligently. The configuration space is so vast that even a developer trying their best would almost certainly miss the optimal setup.
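To give a rough sense of how the number of setups climbs into the thousands, the sketch below counts the combinations for a made-up video question-answering workflow. Every option name and count here is a hypothetical chosen for illustration; none of them come from the Murakkab research.

```python
from itertools import product

# Hypothetical tuning knobs for one workflow. These lists are illustrative
# assumptions, not the options the researchers actually evaluated.
OPTIONS = {
    "vision_model": ["vision-small", "vision-medium", "vision-large"],
    "speech_model": ["speech-small", "speech-large"],
    "reasoning_model": ["llm-small", "llm-medium", "llm-large"],
    "hardware": ["cpu", "gpu-a", "gpu-b", "gpu-c"],
    "gpu_count": [1, 2, 4, 8],
    "batch_size": [1, 4, 16, 64],
    "frames_sampled": [8, 16, 32],
}


def count_configurations(options):
    """Multiply the number of choices per knob to get the size of the space."""
    total = 1
    for choices in options.values():
        total *= len(choices)
    return total


def enumerate_configurations(options):
    """Yield every full configuration as a dict of knob -> chosen value."""
    keys = list(options)
    for combo in product(*(options[k] for k in keys)):
        yield dict(zip(keys, combo))


if __name__ == "__main__":
    print(count_configurations(OPTIONS))  # 3 * 2 * 3 * 4 * 4 * 4 * 3 = 3456
```

Even with only seven knobs and a handful of choices each, this toy space already holds several thousand configurations, and each one would have to be deployed and measured to be compared fairly.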

The team—led by MIT graduate student Gohar Chaudhry, working with associate professor Adam Belay and Microsoft Azure's Ricardo Bianchini—built a system called Murakkab, an Urdu word meaning a composition of things. The innovation is elegant: instead of forcing developers to specify every technical detail, Murakkab lets them describe what they want in plain language. A developer might say: extract key frames from video, generate a transcript, answer user questions about the content. Murakkab then automatically figures out which models and tools to use, which components can run in parallel versus sequentially, and what hardware configuration makes sense. Crucially, it makes these decisions dynamically. If a new GPU accelerator or AI model becomes available tomorrow, the system adapts without requiring the developer to rewrite anything.

When a cloud provider deploys the application for actual customers, Murakkab optimizes further in real time. It looks at what the user cares about—maybe they want answers fast, or they want high accuracy, or they want to minimize cost—and configures the workflow accordingly. It allocates hardware, schedules computation, and generates a deployment plan that's ready to run. The system also gives cloud providers visibility into multiple workflows running simultaneously, so they can share computational resources across customers in the most efficient way possible while still meeting everyone's constraints.
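The idea of configuring a workflow around what the user cares about can be sketched as a filter-and-minimize step over candidate deployment plans. This is only an illustration under stated assumptions: the candidate names, the measured numbers, and the `choose` function are all hypothetical, and Murakkab's actual optimizer is not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical deployment plan with made-up measurements."""
    name: str
    latency_s: float
    accuracy: float
    energy_wh: float
    cost_usd: float


# Illustrative numbers only; they are not results from the research.
CANDIDATES = [
    Candidate("large-models-many-gpus", 2.0, 0.92, 40.0, 0.080),
    Candidate("mixed-models-one-gpu", 5.0, 0.90, 9.0, 0.018),
    Candidate("small-models-cpu", 12.0, 0.81, 3.0, 0.006),
]

# Each priority maps to a quantity to minimize (accuracy is negated so that
# minimizing it picks the most accurate plan).
OBJECTIVES = {
    "speed": lambda c: c.latency_s,
    "accuracy": lambda c: -c.accuracy,
    "cost": lambda c: c.cost_usd,
    "energy": lambda c: c.energy_wh,
}


def choose(candidates, priority, min_accuracy=0.0, max_latency_s=float("inf")):
    """Drop plans that violate the constraints, then pick the best for the priority."""
    feasible = [
        c for c in candidates
        if c.accuracy >= min_accuracy and c.latency_s <= max_latency_s
    ]
    if not feasible:
        raise ValueError("no candidate meets the constraints")
    return min(feasible, key=OBJECTIVES[priority])


if __name__ == "__main__":
    print(choose(CANDIDATES, "speed").name)
    print(choose(CANDIDATES, "cost", min_accuracy=0.88).name)
```

In this toy data, a user who wants the cheapest plan with at least 88 percent accuracy gets the mixed single-GPU plan rather than the cheapest plan overall, which is the kind of constraint-aware tradeoff the paragraph above describes.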

The results from testing were striking. When researchers ran Murakkab on diverse agentic workflows, including video question-answering systems and code generation tasks, it used only about 35 percent of the computation that traditional approaches required. Energy consumption dropped to roughly 27 percent of what other methods consumed. Costs fell to less than 25 percent. Performance didn't meaningfully suffer. In one case, the system cut energy consumption more than tenfold with only a 2 percent drop in accuracy. It even discovered an unexpectedly optimal configuration for a video frame-selection model that a human developer would have been unlikely to find manually.
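Because the figures above are stated as the share of resources still used, while the headline states a reduction, a short calculation helps line them up. The helper names below are hypothetical; the input percentages are the ones reported in this article.

```python
def reduction_percent(remaining_percent):
    """Convert 'uses X percent of the baseline' into 'cut by Y percent'."""
    return 100.0 - remaining_percent


def remaining_after_fold_reduction(fold):
    """Convert an N-fold reduction into the percent of the baseline still used."""
    return 100.0 / fold


if __name__ == "__main__":
    print(reduction_percent(35))  # computation: 65.0 percent less
    print(reduction_percent(27))  # energy: 73.0 percent less, the headline figure
    print(reduction_percent(25))  # cost: at least 75.0 percent less
    print(remaining_after_fold_reduction(10))  # tenfold cut leaves 10.0 percent
```

So "roughly 27 percent of the energy" and "73 percent less energy" are the same result, and the more-than-tenfold case corresponds to using under 10 percent of the baseline energy.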

Chaudhry and his colleagues are clear about what comes next. Agentic workflows are only getting more complicated. Energy usage is a massive concern for cloud providers—data centers consume staggering amounts of electricity, and that consumption is growing. The researchers plan to scale Murakkab to handle more complex workflows and larger computing clusters. The real opportunity, Chaudhry says, is thinking about this at the scale of major cloud platforms, where the cumulative savings in energy and cost could be enormous. The system was presented at the USENIX Symposium on Operating Systems Design and Implementation, and it points toward a future where the infrastructure running AI applications is far more thoughtful about how it uses resources.

Notable Quotes

Agentic workflows are getting very complicated and quickly becoming the backbone of what cloud providers are doing. Energy usage is a huge concern, so we need to be very careful about how efficient these workflows are.

— Gohar Chaudhry, MIT graduate student and lead author

There is a lot of potential to make these workflows more resource-optimal so they consume far less energy, but we need to be thinking about this at the scale of major cloud platforms.

— Gohar Chaudhry

The Hearth Conversation: Another perspective on the story

Inventor

Why does it matter that developers currently have to hard-code these workflows? Can't they just try different configurations and pick the best one?

Model

The configuration space is too large. There are thousands of possible combinations of models, tools, hardware settings, and resource allocations. Even if a developer had the time to test many of them, they'd almost certainly miss the optimal one. And that's before a new model gets released and they have to start over.

Inventor

So Murakkab is basically automating what a very smart engineer would do if they had infinite time?

Model

Partly, yes. But it's also doing something a human couldn't do at all—it's making decisions dynamically, in real time, based on what the user actually needs in that moment. A developer can't predict every user's priorities. Murakkab adapts.

Inventor

The energy savings are huge. 73 percent less energy. Why hasn't anyone solved this before?

Model

Because agentic workflows are new and they're complicated. You're combining black-box models from different companies, each with their own options. There's no single lever to pull. You need a system that understands the entire pipeline and can see tradeoffs across all of it.

Inventor

What happens if Murakkab makes a choice that turns out to be wrong?

Model

It adjusts. That's the dynamic part. It's not making a decision once and locking it in. It's continuously optimizing based on how the workflow actually performs and what the user actually needs.

Inventor

Is this just about saving money for cloud providers, or does it matter to regular people?

Model

Both. Cloud providers pass savings on to customers through lower prices. But more importantly, it means less electricity consumed, less heat generated, less strain on the grid. At the scale of major cloud platforms, that's significant.

Inventor

What's the catch? There's always a catch.

Model

The researchers are still scaling it up. Right now they've tested it on specific workflows. Making it work reliably across the complexity of real cloud infrastructure at massive scale—that's the next challenge.

Want the full story? Read the original at Mirage News ↗
