Measurement science is never finished work
In a quiet act of institutional evolution, the U.S. National Institute of Standards and Technology has transformed its AI safety consortium into a broader scientific enterprise, one aimed at building the measurement frameworks, evaluation standards, and collaborative infrastructure that will define how artificial intelligence is understood and deployed across American society. The move reflects a maturing recognition that safety and technological leadership are not competing priorities but deeply intertwined ones, and that the unglamorous work of standardization may prove as consequential as any breakthrough in the laboratory.
- A federal AI safety body with more than 280 member organizations has been deliberately reborn with a wider mandate, signaling that the government's original safety-first framing was too narrow for the scale of what AI now demands.
- Six specialized working groups will divide the hard technical labor — from testing AI fitness-for-purpose and annotating risk, to combating misinformation in intelligence systems and setting documentation standards across the industry.
- A dedicated group partnering with intelligence research agencies will tackle the vulnerabilities of large language models — hallucinations, data leaks, adversarial attacks — raising the stakes well beyond academic evaluation.
- The restructuring is anchored in executive and legislative mandates, positioning the consortium as a direct instrument of U.S. technological strategy rather than a purely scientific advisory body.
- New members are being recruited now on a first-come basis, while existing members must formally accept the consortium's expanded direction — a quiet but binding realignment of purpose.
The National Institute of Standards and Technology has restructured one of the federal government's most ambitious AI initiatives. What began two years ago as a focused safety effort has become something broader: a consortium dedicated to building the scientific foundations for measuring, testing, and deploying AI systems across the American economy.
The previous AI Safety Institute Consortium had assembled more than 280 organizations to develop evidence-based standards for measuring artificial intelligence — work that produced a globally recognized framework. But NIST determined that framework needed to grow. The new AI Consortium opens to additional members and reorients around three goals: building an ecosystem for evaluating AI systems, advancing science that AI itself can enable, and promoting the development of American-made AI technologies.
Craig Burkhardt, NIST's deputy director, framed the expansion as an effort to draw on a wider community of technical expertise. Safety concerns remain embedded in the work — but the agency now treats measurement science, standardization, and technological leadership as inseparable from safety itself.
The reorganization responds to the National Artificial Intelligence Initiative Act of 2020 and Executive Order 14179 from 2025, positioning the consortium as a vehicle for accelerating AI from laboratory to real-world adoption. Six working groups will carry out this mission: one developing testing tools for AI fitness-for-purpose; another building risk assessment frameworks; a third identifying gaps in evaluation science. A fourth group — the Benengal Group — will partner with intelligence research agencies to address misinformation, data leaks, and adversarial attacks on large language models. A fifth will standardize documentation practices across datasets, models, and testing processes. A sixth will reactivate work on AI applications in chemical and biological safety.
The call for new members is open now, with selection on a first-come basis. Existing members must sign an amendment accepting the consortium's new direction. What takes shape here will influence how American institutions understand and deploy artificial intelligence for years to come.
The National Institute of Standards and Technology has quietly restructured one of the government's most ambitious efforts to make sense of artificial intelligence. What began two years ago as a focused safety initiative has now become something broader: a consortium dedicated to building the scientific foundations for measuring, testing, and deploying AI systems across the American economy.
The shift marks a significant recalibration in how the federal government approaches AI governance. The previous incarnation, called the AI Safety Institute Consortium, had assembled more than 280 organizations to develop evidence-based guidelines and standards for measuring artificial intelligence. That work produced something foundational—a global framework for AI measurement science. But NIST determined that framework needed to expand. The new AI Consortium, as it is now called, opens its doors to new members and reorients its mission toward three interconnected goals: building an ecosystem for evaluating AI systems, advancing science that AI itself can enable, and promoting the development and use of American-made AI technologies and platforms.
Craig Burkhardt, NIST's deputy director, framed the expansion as an effort to leverage capabilities and priorities from a wider community. The agency is inviting organizations with relevant technical expertise to join the work of addressing the challenges that come with developing and deploying artificial intelligence at scale. This is not a pivot away from safety concerns—those remain embedded in the work—but rather a recognition that measurement science, standardization, and technological leadership are inseparable from safety itself.
The reorganization sits within a larger American strategy for technological dominance in the twenty-first century. It responds to the National Artificial Intelligence Initiative Act of 2020, aligns with Executive Order 14179 from 2025, and reflects the U.S. government's stated commitment to accelerating critical and emerging technologies from laboratory to real-world adoption. The consortium becomes a vehicle for that acceleration, a place where government, industry, and research institutions can collaborate on the unglamorous but essential work of defining what it means for an AI system to work as intended.
The new structure organizes this work into six specialized groups. One will develop tools for testing and evaluating whether an AI system meets its design requirements and is fit for its intended purpose. Another will support the creation of science-based tools for assessing AI risks and impacts. A third will identify gaps and open questions in AI evaluation science by drawing input from different types of organizations, sectors, and stakeholder profiles. A fourth, the Benengal Group, will work with the Intelligence Advanced Research Projects Activity to explore scalable solutions against misinformation, data leaks, faulty reasoning, and attacks on large language models, with an eye toward reliable use of these systems in intelligence analysis. A fifth group will develop standardized templates and practices for documenting datasets, models, AI systems, and testing processes. Finally, NIST will reactivate a sixth working group focused on chemical and biological safety, bringing emerging AI measurement and evaluation approaches to bear on those domains.
The call for new members is open now. NIST is seeking letters of interest from organizations with relevant technical capabilities. The agency will select participants on a first-come, first-served basis as complete applications arrive. Existing members do not need to reapply, though they will need to sign an amendment accepting the consortium's new direction. What emerges from this restructuring will shape how American companies, government agencies, and research institutions understand and deploy artificial intelligence for years to come.
Notable Quotes
The agency seeks to expand its AI measurement efforts by leveraging the capabilities and priorities of a broader community. (Craig Burkhardt, NIST deputy director)
The Hearth Conversation: Another angle on the story
Why expand a consortium that was already working? The previous one had more than 280 members and was producing standards.
Because measurement science is never finished. They had built a foundation, but the real work—making that foundation useful across different sectors and use cases—requires more hands, more perspectives, more technical capabilities than they had assembled.
So this is about scale, not direction?
It's about both. The original consortium was safety-focused. This one keeps that concern but broadens the mission to include how AI gets evaluated, how it gets documented, how it gets deployed reliably. Safety is part of that, but so is making sure American AI systems are competitive and trustworthy.
The Benengal Group working on misinformation and language models—that sounds like a direct response to real problems we're seeing now.
It is. They're not waiting for a crisis. They're trying to understand how large language models can be made reliable enough for intelligence work, which is about as high-stakes as it gets. If you can make them work there, you've solved something fundamental.
What happens to the organizations that were already in the old consortium?
They stay in, but they have to accept the new charter. It's not a purge. It's a reframing of what the group is trying to accomplish. Some of them will probably find the broader mandate more useful anyway.
Is this about competition with other countries?
Partly. The executive order language makes that clear—this is about American technological leadership. But it's also about the fact that you can't have reliable AI deployment without reliable measurement. That's true whether you're competing or not.