Each year, hundreds of thousands of Americans suffer cardiac arrest outside a hospital, and survival hinges on whether a bystander acts in the first minutes — yet most do not, and even trained dispatchers miss critical steps. A study published in JAMA Internal Medicine offers a quiet but consequential finding: a purpose-built AI agent called ChatCPR followed emergency resuscitation guidelines with near-perfect fidelity, outperforming both generic AI models and human dispatchers by wide margins. The research does not promise a solution so much as it illuminates a gap — between what emergency response could be and what it currently is — and asks whether technology might help close it.
- Only 9% of out-of-hospital cardiac arrests in the US end in survival, and fewer than half of victims receive any bystander CPR at all — a gap measured in lives lost every day.
- Human dispatchers, despite their training, take a median of nearly three minutes to get compressions started and miss significant portions of established guidelines, introducing variability at the worst possible moment.
- ChatCPR, built on an open-source model and trained specifically on dispatcher protocols, scored 100% on essential CPR criteria and 98.9% on comprehensive guidelines when tested against real 911 call transcripts — surpassing dispatchers by 20 to 60 percentage points.
- The AI's advantages were sharpest in the areas where dispatchers struggled most: checking patient responsiveness, guiding compression quality, deploying AED instructions, and adapting technique for pediatric emergencies.
- Researchers are candid that the study was text-based and retrospective — the harder test, involving panic, noise, grief, and poor connectivity, has not yet been run, and real-world validation must come before any public deployment.
Every year, roughly 350,000 Americans suffer cardiac arrest outside a hospital. Nine percent survive. The margin between life and death often rests on whether a bystander begins chest compressions in the first minutes — yet only about four in ten victims receive any CPR at all. People forget their training, freeze under pressure, or never learned at all. When they call 911, they depend on a dispatcher who may be tired, inconsistent, or slow to recognize what is happening. The median time from call to first compression is nearly three minutes.
A study published in JAMA Internal Medicine tested whether artificial intelligence could do better. Researchers evaluated six widely available AI models across simulated cardiac arrest scenarios, then built a specialized tool called ChatCPR — trained on dispatcher CPR protocols using the open-source Llama 3.3 model — and compared it against both generic AI and real 911 dispatchers. The results were striking. ChatCPR achieved perfect adherence to a 13-point checklist of essential CPR steps and 98.9% adherence to a comprehensive 27-point protocol drawn from 2024 American Heart Association guidelines. Human dispatchers, on the same real 911 call transcripts, averaged 84.5% and 62.8% respectively.
The AI's edge was sharpest where dispatchers struggled most: confirming patient unresponsiveness, guiding compression depth and rate, deploying AED instructions, and adjusting technique for children. Even generic models performed reasonably well on basic criteria, but ChatCPR's specialized training produced the largest and most consistent gains.
The implications are real but not yet proven. CPR training in the US is widespread but shallow: 65% of adults report having taken a class, but only 2% trained in the past year. Skills decay. An AI accessible instantly through a phone, immune to fatigue, and automatically updated to current guidelines could theoretically address all of this. Dispatcher-assisted CPR already improves the odds of 30-day survival by 60% over no bystander intervention; a more reliable, consistent guide could push that further.
But the study's limits matter. It was text-based and retrospective, testing transcripts rather than live emergencies. It did not measure whether panicked callers would actually follow AI instructions, whether the system would hold up under noise and poor connectivity, or whether it might mistakenly instruct CPR when it shouldn't. The researchers are clear: prospective real-world validation is still needed. What the study establishes is that the technology works on paper — and works better than what currently exists. Whether it can work in the world remains the open question.
Every year, roughly 350,000 people in the United States suffer cardiac arrest outside a hospital. Nine percent survive. The difference between life and death often comes down to whether a bystander starts chest compressions in those first minutes, yet only about four in ten victims receive any CPR at all. The barriers are familiar: people either never learned, or learned so long ago that they've forgotten. When they call 911, they're hoping the dispatcher on the other end knows exactly what to say. But dispatchers are human. They take a median of 75 seconds just to recognize what's happening, and another 101 seconds before compressions actually begin. They vary in skill. They get tired. They miss steps.
A new study published in JAMA Internal Medicine suggests that artificial intelligence might do better. Researchers tested six widely available AI models on their ability to deliver CPR instructions in simulated cardiac arrest scenarios, then built a specialized tool called ChatCPR and compared it to both generic AI and real 911 dispatchers. The results were striking: ChatCPR achieved perfect adherence to a 13-point checklist of essential CPR steps and 98.9 percent adherence to a comprehensive 27-point guideline. Human dispatchers, by contrast, hit 84.5 percent on the essential steps and 62.8 percent on the full protocol. The AI agent outperformed dispatchers by 20 to 60 percentage points depending on the measure.
The study measured performance against the 2024 American Heart Association and American Academy of Pediatrics guidelines, using a checklist that included everything from recognizing unresponsiveness to proper hand placement, compression depth, and when and how to use an automated external defibrillator. The researchers tested six major AI models—Claude, Grok, GPT-4o, Gemini, and others—across five emergency scenarios. Even the generic models performed well on basic criteria, with an average of 89.7 percent adherence to essential steps. But ChatCPR, built on the open-source Llama 3.3 model and trained specifically on dispatcher CPR protocols, showed the largest gains in the areas where dispatchers struggled most: patient responsiveness checks, compression quality, AED deployment, and correct positioning for pediatric CPR.
The researchers then tested ChatCPR against transcripts of 12 real 911 calls where dispatchers had provided CPR guidance. The AI agent scored 100 percent on essential criteria and 98.9 percent on comprehensive criteria—missing only one item related to the recommended sequence of patient assessment. The human dispatchers on those same calls averaged 84.5 and 62.8 percent respectively. This wasn't a theoretical exercise. These were actual calls, actual people, actual instructions that either worked or didn't.
The implications are significant but not yet proven. The United States has a training problem: 65 percent of adults report having taken a CPR class at some point, but only 18 percent trained in the past two years and just 2 percent in the past year. Skills decay. When cardiac arrest happens, people freeze or second-guess themselves. Dispatcher-assisted CPR, introduced in the 1980s, improved 30-day survival odds by 60 percent compared to no bystander intervention—but it's still less effective than CPR initiated immediately by someone who knows what they're doing. An AI system that could be accessed instantly through a phone, that never forgets a step, that adapts to new guidelines automatically, could theoretically close that gap.
But the study has real limits. It was text-based and retrospective. It didn't test what actually happens when a panicked person is on the phone with an AI, when there's noise and confusion and a loved one dying in front of them. It didn't measure whether people would actually follow AI instructions as readily as dispatcher instructions, or whether the system would work reliably under poor connectivity or in the chaos of a real emergency. It didn't test false positives—whether the AI might instruct CPR in situations where it shouldn't. The researchers are clear about this: prospective real-world validation is still needed before this technology could be deployed in public health systems.
What the study does show is that the technology works on paper, and works better than the current alternative. The question now is whether it can work in the world.
Notable Quotes
"The instructional agent not only surpasses generic AI models but also outperforms trained human dispatchers on checklist-based guideline adherence, particularly for advanced CPR guidance." - Study findings, JAMA Internal Medicine
"Prospective real-world validation is still needed before deployment: the study was text-based and retrospective, and did not test live bystander behavior, survival outcomes, or performance under real-world stress, noise, and connectivity constraints." - Study limitations and forward look
The Hearth Conversation: Another angle on the story
Why does it matter that an AI system beats dispatchers on a checklist? Isn't the real test whether people survive?
Absolutely. But the checklist is a proxy for survival. Every step, from recognizing unresponsiveness to starting compressions at the right rate to using an AED correctly, directly affects whether the heart restarts. Dispatchers miss steps. In this study, the AI almost never did. That's a measurable difference.
But dispatchers have something an AI doesn't: they can hear panic in someone's voice, adjust their tone, build trust. Can an AI do that?
Not yet, and maybe not at all. But here's the thing: most people calling 911 don't need emotional support. They need to know exactly what to do in the next 30 seconds. An AI can deliver that with zero ambiguity. Whether that's better than a human voice—that's the real-world test we haven't done.
So this is still theoretical.
It's theoretical about real-world deployment. But it's not theoretical about the gap. Right now, roughly six in ten cardiac arrest victims don't get CPR at all. People don't know what to do. An AI that's always available, always right, could change that.
What's the biggest risk?
That people trust it too much before we know it works in chaos. Or that it tells someone to do CPR when they shouldn't. The study didn't test that. We need to.