Meta hires UC Berkeley AI chief to lead safety efforts as agents enter economic mainstream

Help humans do this work better and provide more economic value
Song describes the goal of AI agents as augmentation, not replacement, in an interview at the World Economic Forum.

As artificial intelligence moves from the laboratory into the living economy, the question of whether it will serve humanity or supplant it has become one of the defining challenges of our era. Meta's decision to recruit Song — a Berkeley professor who spent her career building frameworks for trustworthy AI — to lead its superintelligence research reflects a growing recognition that capability without accountability is a foundation built on sand. Her arrival, carrying both a team of safety specialists and a benchmark designed to test AI against the texture of real work, suggests that the most consequential race in technology is no longer simply about who builds the most powerful system, but about who builds one the world can actually live with.

  • AI agents are no longer a distant promise — companies are deploying them into financial systems, supply chains, and complex industries right now, making the safety question urgent rather than theoretical.
  • The gap between what AI agents can do in a lab and what they can reliably do in the real world has exposed a dangerous blind spot, one that Meta is now racing to close.
  • UC Berkeley's 'Agents' Last Exam' benchmark, which spans more than 1,500 tasks across 55 industries, was built to hold AI to the standard of genuine economic usefulness, not just impressive performance under controlled conditions.
  • Meta's hiring of Song, the benchmark's co-architect and founder of an enterprise AI safety startup, signals that the company is betting its next phase of growth on trustworthiness as much as raw capability.
  • Song's guiding philosophy — that humans retain judgment while agents handle execution — is now being written into the architecture of one of the world's most powerful technology companies.

At UC Berkeley's Centre for Responsible, Decentralised Intelligence, researchers completed a benchmark they believed the industry was missing: a rigorous, grounded test of whether AI agents could perform genuinely useful work. Called Agents' Last Exam, it evaluated performance across more than 1,500 concrete tasks — financial analysis, supply chain management, and more — spanning 55 industries. It was a practical measure for a technology moving fast and raising hard questions.

Days after the benchmark's mid-June unveiling, Meta announced it had hired the researcher behind it. Song, a Berkeley computer science professor and co-director of the centre, was joining Meta's Superintelligence Labs as vice-president of AI research, bringing colleagues from Virtue AI — an enterprise safety startup she co-founded — to help shape how Meta's systems would behave as they grew more capable and autonomous.

The timing reflected the moment. AI agents — systems that perceive problems, make decisions, and act with minimal human intervention — were crossing from research into economic deployment. Song had spent her career at this exact tension, and she was unambiguous about her position: the goal was not replacement but augmentation. Agents would handle execution; humans would retain judgment. That was the model she believed in, and the one Meta was now asking her to build.

What made Song a distinctive hire was her insistence that safety is not a constraint on AI development but a precondition for it. Virtue AI was founded on the premise that you cannot deploy agents into the economy without confidence they will behave as intended — that trust is not a feature added later, but the foundation everything else rests on. The Agents' Last Exam embodied the same logic: not a test of what looks impressive, but of what actually works reliably in conditions resembling the real world.

Meta's decision to bring Song and her team aboard signals that the contest for AI leadership is expanding. The race for raw capability continues, but the race for safety and trustworthiness has become equally consequential. As agents move into production environments — into companies, supply chains, and financial systems — the cost of getting the safety question wrong has grown enormous. Song had spent years preparing for exactly this inflection point. Now she has the resources of one of the world's largest technology companies to meet it.

At UC Berkeley's Centre for Responsible, Decentralised Intelligence, researchers had just finished building something they believed the industry needed: a way to measure whether artificial intelligence agents could actually do useful work in the real world. The benchmark, called Agents' Last Exam, tested performance across more than fifteen hundred concrete tasks—everything from financial analysis to supply chain management—spread across fifty-five different industries. It was a practical yardstick for a technology that was moving fast and raising urgent questions about what it could and should do.

Days after the centre unveiled this benchmark in mid-June, Meta announced it had hired the woman who helped build it. Song, a computer science professor at Berkeley and co-director of the centre, was joining Meta's Superintelligence Labs as vice-president of AI research. She was bringing members of her team from Virtue AI, an enterprise safety startup she had co-founded, to help shape how Meta approached the safety and security of its AI systems as they became more capable and more autonomous.

The timing was not accidental. AI agents—systems that could perceive a problem, make decisions, and take action with minimal human intervention—were moving from research labs into economic reality. Companies were beginning to deploy them for real work. The question of whether they would augment human capability or displace it had shifted from theoretical to urgent. Song had spent her career thinking about this tension, and she was clear about where she stood.

When she spoke to journalists at the World Economic Forum in Dalian just before joining Meta, Song was deliberate about the framing. The goal was not to build machines that would replace people. Rather, the goal was to make AI agents effective enough to handle important, complex work in the real world—the kind of work that mattered economically—while keeping humans in the loop, making the decisions that counted. The agents would handle the execution; humans would handle the judgment. That was the model she believed in, and it was the model Meta was now paying her to help build.

Song's background made her a particular kind of hire. She was not a researcher chasing capability for its own sake. She had spent years thinking about how to make AI systems trustworthy, how to audit them, how to keep them aligned with human values even as they became more autonomous. Virtue AI had been built on the premise that safety was not a constraint on AI development—it was a prerequisite for it. You could not deploy agents into the economy without knowing they would behave as intended. You could not ask people to work alongside systems they did not understand or trust.

The Agents' Last Exam itself reflected this philosophy. It was not designed to push AI toward maximum capability. It was designed to test whether agents could handle real work—the kind of work that had economic value, that companies actually needed done. Fifteen hundred tasks across fifty-five industries meant the benchmark was grounded in what mattered, not in what looked impressive in a lab. It was a way of saying: can this system do something useful, reliably, in conditions that resemble the actual world?

Meta's decision to hire Song and her team signaled something about how the company was thinking about the next phase of AI development. The race for capability was not over, but the race for safety and trustworthiness was becoming just as important. As AI agents moved from research into production, into companies and supply chains and financial systems, the stakes of getting the safety question right had become enormous. Song had spent her career preparing for this moment. Now she had the resources of one of the world's largest technology companies behind her.

We want these AI agents to be more effective in important real-world domains and help humans do this work better and provide more economic value.
— Song, in interview at World Economic Forum in Dalian
The Hearth Conversation: Another perspective on the story
Inventor

Why does Meta need someone like Song specifically? There are plenty of AI researchers.

Model

Because Song has spent years thinking about the gap between what AI can do and what it should do. She's not just chasing capability—she's built a company around the idea that safety is foundational, not optional.

Inventor

The benchmark tests fifteen hundred tasks. That seems oddly specific. Why that number?

Model

It's not about the number itself. It's about the breadth. Fifty-five industries means you're testing whether agents can handle real economic work, not just impressive lab demonstrations. It's grounded in what companies actually need.

Inventor

Song said the goal is not to replace humans. But isn't that always what happens with automation?

Model

Not necessarily. There's a difference between displacement and augmentation. If an agent handles the routine execution while a human handles judgment and oversight, you've changed the work, not eliminated it. That's what Song is arguing for.

Inventor

Why would Meta care about this distinction? Wouldn't they prefer to replace humans if they could?

Model

Because you can't deploy agents into the economy without trust. If people don't understand how the system works, or if they believe it's working against them, adoption stalls. Safety and transparency become competitive advantages, not constraints.

Inventor

What does it mean that Song is bringing her whole team from Virtue AI?

Model

It means Meta is not just hiring an individual—they're importing a philosophy about how to build AI systems. The team has already thought through these problems. They're not starting from scratch.
