Microsoft's AI system MDASH uncovers 16 Windows flaws, including 4 critical RCE bugs


Microsoft has unveiled MDASH, an AI system that independently discovered sixteen security vulnerabilities in Windows, four of them severe enough to give remote attackers full control of unpatched machines. Built by a team that includes researchers who previously won the DARPA AI Cyber Challenge, the system deploys more than a hundred specialized AI agents that reason across large codebases, debate their findings, and then attempt to prove each flaw is real before reporting it. MDASH found places where memory ownership breaks down and trust boundaries fail that human reviewers and simpler tools would likely miss, suggesting that the painstaking work of securing complex software may be entering a new phase.

  • Four of the sixteen discovered flaws are critical remote code execution vulnerabilities, meaning an attacker on the network could seize full control of a Windows machine before any user logs in.
  • The bugs were spread across kernel-mode and user-mode networking and authentication components. None is apparent from a single piece of code; each becomes visible only when reasoning across multiple files and execution paths.
  • A use-after-free in the TCP/IP stack and a double-free in the IKEv2 service—running at the highest Windows privilege level—represent the kind of subtle, cross-file logic errors that traditional scanners routinely miss.
  • MDASH's multi-agent architecture, which separates scanning, debate, deduplication, and proof-of-concept generation into distinct stages, achieved 96–100% recall in retrospective testing and outscored all published competitors on a real-world benchmark of 1,507 vulnerabilities.
  • Microsoft is now moving MDASH beyond its internal security teams into limited customer preview, signaling that AI-assisted vulnerability discovery may soon become a standard layer of the industry's defensive posture.

Microsoft has built an AI security system called MDASH that independently found sixteen vulnerabilities in Windows, four of them critical flaws allowing remote attackers to execute code on unpatched machines without any credentials. The system works by deploying more than a hundred specialized AI agents that examine code in coordinated stages—scanning for weaknesses, debating findings among themselves, and then attempting to prove each flaw is real by crafting inputs that trigger it.

Two discoveries illustrate the depth of what MDASH can find. The first, CVE-2026-33827, was a use-after-free bug in the Windows TCP/IP stack: freed memory could be accessed later in kernel context, but the lifetime violation only became visible when reasoning across multiple files, non-obvious control flow, and cleanup routines scattered throughout the networking stack. The second, CVE-2026-33824, struck the IKEv2 service used for IPsec encryption. A remote attacker could send crafted UDP packets to trigger a double-free condition—two parts of the code each believing they owned the same memory block. Because IKEv2 runs with LocalSystem privileges, the highest on Windows, this opened a path to full remote code execution before any user logged in.

MDASH was built by Microsoft's Autonomous Code Security team in collaboration with the Windows Attack Research and Protection group, drawing on researchers who previously won the DARPA AI Cyber Challenge. Its pipeline combines frontier and distilled AI models across preparation, scanning, validation, deduplication, and proof-of-concept stages—an orchestration that fundamentally differs from single-model tools, which struggle with bugs that span files, complex execution paths, or concurrent processes.

The results bear this out. Against a private test driver, MDASH found all twenty-one planted vulnerabilities with zero false positives. In retrospective testing against historical Microsoft Security Response Center cases, it achieved 96% recall over five years of confirmed bugs in one system file and 100% in another. On the public CyberGym benchmark covering 1,507 real-world vulnerability tasks, it scored 88.45%, the highest published result and roughly five points ahead of the next competitor.

Microsoft is currently deploying MDASH internally while a small group of customers tests it in private preview. The decision to expand access beyond internal teams suggests the industry may be approaching a meaningful shift—from manual code review and traditional static analysis toward AI systems capable of reasoning across complex codebases and proving what they find.

Microsoft has built an artificial intelligence system that found sixteen security flaws in Windows—four of them critical enough to allow attackers to run code remotely on unpatched machines. The system, called MDASH, works by deploying more than a hundred specialized AI agents that examine code in stages, looking for weaknesses, debating what they find, and then attempting to prove each flaw actually exists by crafting inputs that trigger it.

The vulnerabilities MDASH uncovered were embedded in Windows networking and authentication components. Ten lived in kernel-mode software, six in user-mode, and most could be reached by an attacker on the network without needing to log in first. Two examples illustrate why this matters. One flaw, tracked as CVE-2026-33827, was a use-after-free bug in the TCP/IP stack tied to how Windows handles a particular type of IPv4 packet routing. The problem was subtle: freed memory could be accessed later in kernel context, but the lifetime violation wasn't obvious when reading any single piece of code. It required understanding reference ownership across multiple files, control flow that wasn't straightforward, and cleanup routines happening elsewhere in the networking stack. A simpler scanning tool would likely miss it entirely.
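The lifetime problem described above can be illustrated with a toy model. The sketch below is not Windows code and does not reproduce CVE-2026-33827; `RefCounted`, `cleanup`, `buggy_path`, and `fixed_path` are hypothetical names invented for illustration. It shows how a pointer kept without its own reference becomes dangling once a cleanup routine elsewhere drops the last reference, with the toy object raising an error where real kernel code would touch freed memory.

```python
class FreedObjectError(Exception):
    """Raised when the toy object is touched after its last reference is gone."""


class RefCounted:
    """Toy reference-counted object (hypothetical; stands in for a kernel structure)."""

    def __init__(self, name):
        self.name = name
        self.refs = 1          # the creator holds the first reference
        self.freed = False

    def _check_live(self):
        if self.freed:
            raise FreedObjectError(f"{self.name} used after free")

    def acquire(self):
        self._check_live()
        self.refs += 1
        return self

    def release(self):
        self._check_live()
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # real code would return the memory here

    def use(self):
        self._check_live()
        return f"using {self.name}"


def cleanup(entry):
    """Cleanup routine living 'elsewhere': drops the owner's reference."""
    entry.release()


def buggy_path(entry):
    borrowed = entry           # BUG: keeps the pointer without taking a reference
    cleanup(entry)             # last reference dropped, object is now freed
    return borrowed.use()      # raises FreedObjectError


def fixed_path(entry):
    held = entry.acquire()     # take our own reference before cleanup can run
    cleanup(entry)
    try:
        return held.use()
    finally:
        held.release()
```

Read in isolation, `buggy_path` and `cleanup` each look reasonable; the violation only appears when the two are considered together, which is the kind of cross-file reasoning the article attributes to MDASH.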

The second example, CVE-2026-33824, affected the IKEv2 service used for IPsec encryption. A remote attacker could send specially crafted packets over UDP port 500 to trigger a double-free—a situation where two parts of the code believed they owned the same chunk of memory and both tried to free it. Because the IKEv2 service runs with LocalSystem privileges, the highest level of access on Windows, this flaw opened a direct path to remote code execution before any user even logged in. The bug spanned six files and stemmed from a shallow memory copy that duplicated pointers without duplicating the underlying data, creating the ownership confusion.
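The ownership confusion created by a shallow copy can be sketched with a simulated heap. This is a minimal illustration under stated assumptions, not the IKEv2 service's code: `ToyHeap`, `Message`, and their methods are hypothetical, and the heap raises an error on a second free where a real allocator could be corrupted.

```python
class DoubleFreeError(Exception):
    """Raised by the toy heap when a block is freed twice."""


class ToyHeap:
    """Simulated allocator that tracks live blocks by integer handle."""

    def __init__(self):
        self._blocks = {}
        self._next_handle = 1

    def alloc(self, data):
        handle = self._next_handle
        self._next_handle += 1
        self._blocks[handle] = bytes(data)
        return handle

    def read(self, handle):
        return self._blocks[handle]

    def free(self, handle):
        if handle not in self._blocks:
            raise DoubleFreeError(f"block {handle} already freed")
        del self._blocks[handle]

    def live_count(self):
        return len(self._blocks)


class Message:
    """Hypothetical structure that believes it owns one heap block."""

    def __init__(self, heap, payload_handle):
        self.heap = heap
        self.payload = payload_handle

    def shallow_copy(self):
        # Copies the pointer (handle) only: two owners, one block.
        return Message(self.heap, self.payload)

    def deep_copy(self):
        # Duplicates the underlying data: each owner gets its own block.
        return Message(self.heap, self.heap.alloc(self.heap.read(self.payload)))

    def destroy(self):
        self.heap.free(self.payload)
```

Destroying a message and its shallow copy frees the same block twice, while destroying a message and its deep copy frees two distinct blocks and leaves the heap empty.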

Microsoft's Autonomous Code Security team built MDASH in collaboration with the Windows Attack Research and Protection group. Several team members previously worked on Team Atlanta, which won the DARPA AI Cyber Challenge by creating an autonomous system to find and patch bugs in open-source software. The new system combines frontier and distilled AI models in a pipeline: it prepares code for analysis, scans for weaknesses, validates findings through a separate set of debating agents, removes duplicates, and attempts to prove each flaw with triggering inputs. This multi-model approach differs fundamentally from single-model systems, which can miss bugs requiring reasoning across multiple files, complex execution paths, or concurrent processes.
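The staged flow described above (prepare, scan, validate, deduplicate, prove) can be sketched as a small function pipeline. Everything here is an assumption made for illustration: the article does not publish MDASH's interfaces, so `Finding`, `run_pipeline`, the majority-vote rule standing in for the agents' debate, and the callable types are hypothetical, and plain functions stand in for AI agents.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    kind: str  # e.g. "possible-double-free"


Scanner = Callable[[dict[str, str]], Iterable[Finding]]
Judge = Callable[[Finding], bool]
Prover = Callable[[Finding], bool]


def run_pipeline(sources: dict[str, str], scanners: list[Scanner],
                 judges: list[Judge], prover: Prover) -> list[Finding]:
    # 1. Prepare: here this only drops empty files.
    prepared = {path: text for path, text in sources.items() if text.strip()}
    # 2. Scan: every scanner proposes candidate findings.
    candidates = [f for scan in scanners for f in scan(prepared)]
    # 3. Validate: keep a finding only if a strict majority of judges accept it
    #    (an assumed stand-in for the debate stage).
    validated = [f for f in candidates
                 if sum(judge(f) for judge in judges) * 2 > len(judges)]
    # 4. Deduplicate while preserving order.
    unique = list(dict.fromkeys(validated))
    # 5. Prove: report only findings the prover could actually trigger.
    return [f for f in unique if prover(f)]


def free_scanner(sources: dict[str, str]) -> Iterable[Finding]:
    """Toy scanner: flags every line that calls free()."""
    for path, text in sources.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if "free(" in line:
                yield Finding(path, number, "possible-double-free")
```

Keeping the proof step last means a candidate has to survive independent scrutiny and deduplication before any effort goes into building a triggering input for it.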

Benchmark results suggest the orchestration matters as much as any individual model. In testing against a private sample device driver called StorageDrive, which Microsoft uses to interview offensive security researchers, MDASH found all twenty-one deliberately planted vulnerabilities with zero false positives. In retrospective testing against historical Microsoft Security Response Center cases, it achieved a 96 percent recall rate for twenty-eight confirmed bugs in clfs.sys over five years and a 100 percent recall rate for seven confirmed bugs in tcpip.sys. On the public CyberGym benchmark, which covers 1,507 real-world vulnerability reproduction tasks from 188 open-source projects, MDASH scored 88.45 percent, the highest published score at the time and about five points ahead of the next entry.
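For readers unfamiliar with the metrics, recall is the share of known bugs that were found, and precision is the share of reported findings that were real. The sketch below applies the standard definitions to the figures quoted above. The article does not give the raw count behind the 96 percent figure, so the last line only checks which whole-number counts out of twenty-eight are consistent with it, assuming the percentage was rounded to the nearest integer.

```python
def recall(found: int, total_known: int) -> float:
    """Fraction of known bugs that the tool found."""
    return found / total_known


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported findings that were real bugs."""
    return true_positives / (true_positives + false_positives)


# tcpip.sys: seven of seven confirmed bugs found.
tcpip_recall = recall(7, 7)

# StorageDrive: twenty-one planted bugs found with zero false positives.
storagedrive_recall = recall(21, 21)
storagedrive_precision = precision(21, 0)

# clfs.sys: which counts out of 28 would be reported as "96 percent"?
# (Inference under the rounding assumption; not a figure from the article.)
clfs_consistent_counts = [k for k in range(29) if round(100 * k / 28) == 96]
```

Under that rounding assumption, the only count consistent with the reported figure is twenty-seven of twenty-eight.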

Microsoft faces unusual challenges in automated security auditing. Much of its code is proprietary and therefore absent from public model training data. The operational environment also leaves little room for false positives in core systems—a mistake in kernel code can crash millions of machines. To address this, MDASH allows plugins that inject specialist knowledge general-purpose models might not infer on their own: kernel calling conventions, lock rules, inter-process communication trust boundaries, file-system structures. One such extension was built for CLFS to help construct triggering log files from candidate findings, moving the system from identifying a possible issue to validating a real vulnerability that engineering teams can fix.
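One way to picture such plugins is as a registry that attaches component-specific guidance to an analysis request. The article does not describe MDASH's plugin interface, so this is a sketch under assumptions: `register`, `build_context`, and the hint strings are hypothetical placeholders, not real CLFS rules or a real API.

```python
from typing import Callable

# Hypothetical registry: component name -> functions returning extra hints.
_PLUGINS: dict[str, list[Callable[[], list[str]]]] = {}


def register(component: str):
    """Decorator that attaches a hint provider to a component."""
    def decorator(provider: Callable[[], list[str]]):
        _PLUGINS.setdefault(component, []).append(provider)
        return provider
    return decorator


@register("clfs.sys")
def clfs_hints() -> list[str]:
    # Placeholder text only; real plugins would carry actual domain rules.
    return [
        "hint: on-disk log structure notes (placeholder)",
        "hint: locking rules for this component (placeholder)",
    ]


def build_context(component: str, base_request: str) -> str:
    """Combine the base request with any hints registered for the component."""
    hints = [hint for provider in _PLUGINS.get(component, []) for hint in provider()]
    return "\n".join([base_request, *hints])
```

A component with no registered plugin simply gets the base request back unchanged, so specialist knowledge stays optional and local to the code that needs it.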

Microsoft is currently using MDASH internally within its security engineering teams while a small group of customers tests it in limited private preview. Taesoo Kim, vice president of Agentic Security at Microsoft, said the system is helping engineering teams meaningfully improve security outcomes using generally available AI models today. The company's decision to expand access beyond internal use signals a potential shift in how the industry approaches vulnerability discovery—moving from manual code review and traditional static analysis toward AI-assisted systems that can reason across complex codebases and prove their findings.

The Hearth Conversation: Another perspective on the story
Inventor

Why does MDASH find bugs that simpler AI tools miss?

Model

Because it reasons across multiple files and complex execution paths simultaneously. A single-model system might see a piece of code in isolation and miss the lifetime violation—the way memory ownership gets confused across the networking stack. MDASH's agents debate what they find, validate it, and then try to prove it works by crafting actual triggering inputs.

Inventor

The IKEv2 bug sounds particularly dangerous. Why?

Model

The affected service runs with LocalSystem privileges, the highest on Windows, and the bug requires no authentication. An attacker on the network can send crafted packets and gain complete control before anyone even logs in. That's the definition of critical.

Inventor

What makes Microsoft's situation different from scanning open-source code?

Model

Much of Microsoft's code is proprietary, so it wasn't in the training data for public AI models. Core systems also leave little room for false positives, because a mistake in kernel code can crash millions of machines. The stakes are higher, so the system needs to be more precise.

Inventor

How does MDASH actually prove a bug exists?

Model

It crafts inputs designed to trigger the flaw, like the specially constructed log files for CLFS bugs. It moves from "this might be a problem" to "here's an input that actually triggers it." That's the difference between a candidate finding and a validated vulnerability engineers can fix.

Inventor

The benchmark scores are impressive, but do they mean MDASH will catch everything?

Model

No. It achieved 96 to 100 percent recall in retrospective testing, which is remarkable, but that's looking backward at known bugs. Real-world code always has surprises. What matters is that it's catching things human reviewers and simpler tools would miss.
