Apple's Third-Generation Foundation Models: On-Device and Cloud AI Explained

Twenty billion parameters in your pocket, activating only what's needed.
Apple's new on-device model uses sparse architecture to deliver powerful AI without draining battery or slowing performance.

At WWDC26, Apple revealed the third generation of its foundation model family — a five-model lineup that quietly acknowledges the gap between idealism and the demands of building AI at scale. Where the company once pledged to keep everything in-house on Apple silicon, it now hosts its most powerful model on Google's servers, running on Nvidia chips, while engineering new layers of cryptographic trust to preserve the privacy promises that have long defined its identity. It is a portrait of a company navigating the tension between principle and pragmatism, arriving at something more complicated — and perhaps more honest — than where it began.

  • Apple's original promise — that powerful AI could live entirely within its own walls — has quietly given way to a partnership with Google Cloud, marking a significant philosophical concession.
  • The most capable model in the new lineup, AFM 3 Cloud Pro, runs on Nvidia GPUs inside Google's infrastructure, forcing Apple to extend its Private Cloud Compute architecture into territory it was never designed for.
  • To preserve privacy guarantees on third-party servers, Apple and Google built a layered security framework with cryptographically verifiable hardware ledgers and isolated inference environments — a technical workaround for a trust problem that didn't exist two years ago.
  • On the device side, AFM 3 Core Advanced packs twenty billion parameters into a phone using sparse architecture that activates only a fraction of them per request — a genuine leap that keeps Apple's on-device ambitions alive even as its cloud strategy shifts.
  • Human evaluators consistently preferred the new models over their predecessors across languages and tasks, suggesting the strategic compromises are producing real capability gains.

At WWDC26, Apple unveiled its third generation of foundation models — a family of five that tells a more complicated story than the company's original AI vision ever anticipated.

When Apple introduced foundation models in 2024, the philosophy was straightforward: keep everything in-house. A modest on-device model was paired with a larger server model running exclusively on Apple silicon, all wrapped in a Private Cloud Compute architecture that let independent researchers verify privacy claims. Capability and privacy, Apple insisted, didn't have to be enemies.

Two years later, the lineup has expanded and the boundaries have shifted. On the device side, AFM 3 Core offers incremental improvements to the original three-billion-parameter model, while AFM 3 Core Advanced represents something more ambitious — twenty billion parameters running locally through a sparse architecture that wakes only one to four billion of them per request, depending on the task. The result is a model capable of expressive voice synthesis and high-accuracy dictation, built on research Apple published a year ago.

The server side tells a different story. Three cloud models handle different roles: AFM 3 Cloud for speed and efficiency, ADM 3 Cloud (Image) for generation and editing tools, and AFM 3 Cloud Pro for the hardest problems — complex reasoning and multi-step agentic tasks. That last model runs not on Apple silicon, but on Nvidia GPUs inside Google Cloud — the first time Apple has ever extended Private Cloud Compute to third-party infrastructure.

To make that work without abandoning its privacy commitments, Apple and Google built a layered security framework: cryptographically verifiable hardware ledgers, multiple independent roots of trust, and isolated inference environments designed to resist everything from side-channel attacks to supply chain compromises.

All five models were trained on public, licensed, open-source, and synthetic data — never on user interactions — and Apple allows web publishers to opt out of training use. Evaluations showed consistent improvements over prior generations across languages, with AFM 3 Core Advanced outperforming Apple's existing dictation system across all seven quality dimensions tested. The technical papers are public for anyone who wants to look closer.

At this year's WWDC, Apple unveiled its third generation of foundation models—a lineup that tells a story about where the company's AI ambitions have landed after two years of development and a significant strategic pivot. The new family consists of five models split between three different homes: some live on your device, some run in Apple's own data centers, and one now lives on Google's servers, powered by Nvidia chips. It's a more complicated picture than Apple's original vision, but it's also more honest about what building AI at scale actually requires.

When Apple first introduced foundation models in 2024, the company's philosophy was clear: keep everything in-house. The initial lineup paired a modest three-billion-parameter model that ran locally on devices with a larger server-based model that operated exclusively on Apple silicon in Apple data centers. This approach served a purpose. Private Cloud Compute, as Apple called it, was designed to offer cloud-level AI capabilities while maintaining the same privacy guarantees users expected from processing that never left their phones. The company could even let independent security researchers verify those privacy claims. It was ambitious, and it reflected Apple's long-standing position that privacy and capability didn't have to be enemies.

But ambition and reality don't always align. As Apple struggled to advance its AI capabilities at the pace the market demanded, the company made a pragmatic choice: it partnered with Google to use Gemini as the foundation for its new AI work. The results of that collaboration arrived at WWDC26 this week.

The new lineup breaks down into two categories. On the device side, there's AFM 3 Core, an updated version of the original three-billion-parameter model with improved quality, and AFM 3 Core Advanced, which represents a genuine leap forward. That second model packs twenty billion parameters into a device that fits in your pocket—a number that would have seemed impossible for on-device AI just a few years ago. Apple achieved this through a technique called sparse architecture, which doesn't activate all twenty billion parameters at once. Instead, depending on what you're asking the model to do, it wakes up only one to four billion parameters per request. This selective activation is based on research Apple published a year ago, a method distinct from the more commonly known Mixture of Experts approach. The result is a model powerful enough to handle expressive voices and high-accuracy dictation, capabilities that require genuine understanding of language and context.
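The arithmetic behind that claim is easy to check. The sketch below uses only the two figures reported above (twenty billion parameters in total, one to four billion active per request). The function name is made up for illustration, and nothing here reflects how Apple's model actually chooses which parameters to wake.

```python
# Back-of-the-envelope arithmetic only; this is not Apple's implementation.
TOTAL_PARAMS = 20e9        # total parameters reported for AFM 3 Core Advanced
ACTIVE_RANGE = (1e9, 4e9)  # parameters reported active per request (low, high)


def active_fraction(active: float, total: float = TOTAL_PARAMS) -> float:
    """Return the share of the model's parameters used for one request."""
    if not 0 < active <= total:
        raise ValueError("active must be greater than 0 and at most total")
    return active / total


if __name__ == "__main__":
    for active in ACTIVE_RANGE:
        share = active_fraction(active)
        print(f"{active / 1e9:.0f}B active -> {share:.0%} of the model")
```

By these figures, any single request touches roughly 5 to 20 percent of the model, which is where the savings in compute and battery would come from.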

On the server side, Apple offers three options. AFM 3 Cloud is the workhorse—optimized for speed and efficiency. ADM 3 Cloud (Image) handles image generation and editing, powering tools like the new Image Playground. And then there's AFM 3 Cloud Pro, the most capable model in the entire lineup, designed for the hardest problems: complex reasoning and what Apple calls agentic tool use, where the AI acts as an agent taking multiple steps to solve a problem.

Here's where the story gets interesting. AFM 3 Cloud Pro doesn't run on Apple silicon. It runs on Nvidia GPUs hosted in Google Cloud. This marks the first time Apple has extended its Private Cloud Compute architecture to third-party infrastructure, a decision that required rethinking how to maintain security and privacy guarantees when the servers aren't in your own data centers. Apple and Google collaborated on a security framework that goes beyond standard confidential computing. The system maintains a cryptographically verifiable ledger of all the hardware involved, uses multiple independent roots of trust for critical components, and isolates different parts of the inference process into separate, dedicated environments. It's a layered approach designed to protect against everything from side-channel attacks to supply chain compromises.
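To make the ledger idea concrete, here is a minimal sketch of one generic way to build a tamper-evident record: a hash chain in which every entry commits to the one before it. This is an illustrative assumption, not Apple's or Google's design, which the reporting does not describe in detail. The component names, record fields, and function names are all hypothetical.

```python
import hashlib
import json

# Generic hash-chain sketch; NOT the actual Private Cloud Compute ledger.
GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(ledger: list, record: dict) -> None:
    """Add a record, linking it to the current tail of the ledger."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(ledger: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    demo = []
    append(demo, {"component": "gpu-node-001", "firmware": "1.0.0"})  # hypothetical
    append(demo, {"component": "gpu-node-002", "firmware": "1.0.0"})  # hypothetical
    print("untampered ledger verifies:", verify(demo))
    demo[0]["record"]["firmware"] = "9.9.9"  # simulate a silent hardware swap
    print("tampered ledger verifies:", verify(demo))
```

The point of the design is that an auditor who holds only the latest hash can detect a change to any earlier entry. A production system would add signatures and independent witnesses on top of this.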

All five models share a common foundation before specializing for their specific roles and hardware. They were trained on a mixture of publicly available information, licensed data, open-source material, data from dedicated studies, and synthetic data—but notably, not on user data or interactions. Apple also allows web publishers to opt out of having their content used in foundation model training.

When Apple evaluated these models against their predecessors, the improvements showed across the board. Human reviewers preferred responses from the new models in side-by-side comparisons, with consistent gains across different languages and locales. For dictation specifically, AFM 3 Core Advanced outperformed Apple's existing production system across all seven quality dimensions tested. The company has published detailed technical papers on all of this work, available for anyone who wants to dig deeper into how these models were built and evaluated.

We do not rely solely on confidential computing technologies to mitigate attacks that leverage privileged access outside of a confidential VM, including side-channel attacks.
— Apple, on security architecture for AFM 3 Cloud Pro
The Hearth Conversation: Another perspective on the story
Inventor

Why did Apple need to partner with Google? Couldn't they build this themselves?

Model

They could have, but not at the pace the market demanded. Building frontier AI models is expensive and time-consuming. Google had already done the hard work with Gemini. Rather than spend another year or two catching up, Apple made a strategic choice: use proven technology and focus their own research on the parts that matter most to their ecosystem—on-device efficiency and privacy architecture.

Inventor

So Apple is giving up on the idea of keeping everything in-house?

Model

Not entirely. Most of the work still happens on Apple silicon or in Apple data centers. But AFM 3 Cloud Pro, the most powerful model, runs on Google's infrastructure. Apple had to decide: do we limit our most capable model to what we can build ourselves, or do we extend our privacy framework to trusted partners? They chose the latter.

Inventor

Is that actually safe? Running AI on someone else's servers?

Model

That's the real question. Apple and Google built a security architecture specifically designed for this scenario. They're not relying on confidential computing alone—they've layered in hardware attestation, cryptographic verification of the infrastructure itself, and isolated processes for handling sensitive data. It's more complex than keeping everything in-house, but it's designed to be verifiable.

Inventor

What's the sparse architecture thing about? Why does that matter?

Model

Twenty billion parameters on a phone would normally be impractical: running all of them on every request would drain the battery and slow everything down. Sparse architecture means only the parameters needed for a given type of request are activated. For a simple question, maybe only one billion parameters activate. For something complex, up to four billion. You get much of the capability of a far larger model at a fraction of the computational cost.

Inventor

So these models are better than the last generation?

Model

Measurably so. Human reviewers preferred the new models in direct comparisons. For dictation, which is something people use every day, the improvement was consistent across every dimension Apple tested. It's not revolutionary, but it's the kind of steady progress that compounds over time.

Inventor

What happens to all the data these models process?

Model

That's where the privacy architecture comes in. Apple designed the system so that the data flowing through Google's servers is encrypted and isolated in ways that prevent even Google from seeing it. The company maintains a verifiable ledger of what hardware is involved and uses multiple independent security roots. It's not perfect—no system is—but it's designed to be auditable.

Want the full story? Read the original at 9to5Mac.