Three API calls to steal secrets from memory without a password
Across the internet, roughly 300,000 servers quietly running AI workloads have been left open to a vulnerability that requires no password, no privilege, and almost no effort to exploit. Dubbed 'Bleeding Llama,' the flaw lives in the way Ollama processes uploaded model files: the server trusts metadata that an attacker can simply falsify, reads beyond its intended boundaries, and surrenders whatever secrets happen to be nearby in memory. The attack is not theoretical; it is a precise, repeatable technique that extracts prompts, credentials, and system instructions. In the ongoing negotiation between convenience and security, this moment asks organizations to reckon honestly with what they left exposed in the name of making AI easy to run.
- A critical flaw rated CVSS 9.1 allows any unauthenticated attacker to drain sensitive data from an Ollama AI server using just three API calls — no credentials required.
- The exploit weaponizes a malicious model file that lies about its own dimensions, tricking the server into reading past memory boundaries and embedding stolen bytes into a file the attacker can then retrieve.
- Ollama uses Go's low-level unsafe memory operations for performance, and a float-16 to float-32 conversion preserves the leaked bytes losslessly, so stolen secrets arrive at the attacker's server exactly as they left the victim's.
- An estimated 300,000 internet-exposed Ollama deployments remain unpatched, many belonging to organizations that may not even realize their AI servers are publicly reachable.
- A patch exists in Ollama 0.17.1, but remediation demands more than an upgrade — organizations must also enforce authentication, isolate deployments from public networks, audit logs, and rotate any exposed credentials.
- For servers that ran unpatched and internet-facing, the prudent posture is to assume compromise has already occurred and act accordingly.
Somewhere on the internet right now, an attacker with modest technical skill could be quietly draining secrets from an AI server without ever logging in. The vulnerability is called Bleeding Llama — CVE-2026-7482, CVSS 9.1 — and it affects roughly 300,000 Ollama deployments worldwide. Three API calls are all it takes to pull user prompts, system instructions, API keys, and environment variables straight out of server memory.
Ollama's appeal is its simplicity: users upload model files in GGUF format and the platform handles conversion, preparation, and storage. Researchers at Cyera found that this conversion process contains a critical flaw. Ollama trusts the metadata inside uploaded files to accurately describe tensor shapes and sizes. An attacker can craft a malicious GGUF file that misrepresents its dimensions, declaring a tensor far larger than the data it actually contains. The server then reads past the intended buffer boundary, pulling whatever sits nearby in heap memory.
What makes the flaw especially dangerous is that the stolen data survives intact. Ollama uses Go's unsafe memory operations for performance — bypassing normal safety checks — and a forced conversion from float-16 to float-32 creates a lossless pathway that preserves the leaked bytes exactly. The malicious model file now carries that stolen memory embedded inside it. Using Ollama's push functionality, the attacker uploads the file to a server they control, completing the exfiltration cleanly.
In enterprise environments, the stakes are high. Ollama deployments routinely process customer data, proprietary instructions, internal code, and API credentials. A single compromised server could expose months of accumulated prompts and secrets. The patch — Ollama 0.17.1 — corrects the tensor validation logic, but 300,000 servers remain unpatched. Full remediation means upgrading immediately, placing Ollama behind authentication, restricting it to internal networks, auditing logs for signs of exploitation, and rotating any credentials that may have been exposed. For organizations that ran an unpatched, internet-facing instance, the honest assumption is that the window for silent theft has already been open — and may already have been used.
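One part of that checklist, keeping the service off public networks, can be sanity-checked mechanically. The Python sketch below classifies a listen address by how far it can be reached. The function name `classify_bind_host` and its four labels are hypothetical, chosen only for illustration; how a given deployment configures its listen address is outside the sketch.

```python
import ipaddress


def classify_bind_host(host: str) -> str:
    """Classify a listen address as 'loopback', 'all-interfaces', 'private' or 'public'.

    Accepts 'localhost' or a literal IPv4/IPv6 address; other hostnames
    raise ValueError because they are not resolved here.
    """
    if host == "localhost":
        return "loopback"
    addr = ipaddress.ip_address(host)
    # Order matters: the unspecified and loopback ranges also count as
    # "private" in the ipaddress module, so test them first.
    if addr.is_unspecified:
        return "all-interfaces"
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "private"
    return "public"
```

Anything other than "loopback" deserves a second look: an "all-interfaces" or "public" result means the service is only as private as the firewall in front of it.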
Somewhere in the world right now, an attacker with basic technical knowledge could be stealing secrets from an AI server without typing in a password. The vulnerability is called Bleeding Llama, and it affects roughly 300,000 Ollama servers sitting exposed on the internet—machines that run local AI models for everything from research to customer-facing applications. All it takes is three API calls to pull sensitive data straight out of memory: user prompts, system instructions, API keys, environment variables, anything the server happens to have loaded.
Ollama is popular precisely because it makes running AI models locally straightforward. Users upload model files in GGUF format—a packaging standard that bundles tensors, metadata, and other information needed for inference—and Ollama handles the rest. The platform converts these files, prepares them for use, and stores them. Researchers at Cyera discovered that this conversion process has a critical flaw. When Ollama processes an uploaded file, it trusts the metadata inside to accurately describe the tensor shapes and sizes. An attacker can craft a malicious GGUF file that lies about its dimensions, declaring a tensor far larger than the actual data present. The server then reads past the intended buffer boundary, pulling whatever happens to sit nearby in heap memory.
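The missing safeguard is easier to see in code. What follows is a minimal Python sketch, not Ollama's implementation (which is written in Go); `TensorInfo`, `declared_size`, `validate_tensor`, and the element-size table are hypothetical names introduced here. It shows the general idea of checking a declared tensor extent against the bytes actually present before reading anything.

```python
from dataclasses import dataclass
from math import prod
from typing import Tuple

# Bytes per element for two unquantized types (illustrative subset only).
ELEMENT_SIZE = {"F16": 2, "F32": 4}


@dataclass
class TensorInfo:
    """Hypothetical stand-in for one tensor entry in a model file's metadata."""
    name: str
    dtype: str
    shape: Tuple[int, ...]
    offset: int  # where the tensor's bytes start within the data section


def declared_size(info: TensorInfo) -> int:
    """Number of bytes the metadata claims the tensor occupies."""
    return prod(info.shape) * ELEMENT_SIZE[info.dtype]


def validate_tensor(info: TensorInfo, data_len: int) -> None:
    """Raise ValueError if the declared extent falls outside the data present."""
    if info.offset < 0 or any(dim < 0 for dim in info.shape):
        raise ValueError(f"{info.name}: negative offset or dimension")
    end = info.offset + declared_size(info)
    if end > data_len:
        raise ValueError(
            f"{info.name}: metadata claims bytes up to {end}, "
            f"but only {data_len} are present"
        )
```

With 32 bytes of data, an honest 4x4 F16 tensor passes the check, while one declaring a 1024x1024 shape over the same 32 bytes is rejected before any read takes place.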
What makes this particularly dangerous is that the leaked memory doesn't get corrupted or lost in translation. Ollama uses Go's unsafe functionality for low-level memory operations—a deliberate choice for performance that bypasses normal safety checks. When researchers forced a conversion from float-16 to float-32 format, they found a lossless pathway that preserves the stolen bytes exactly as they were. The malicious model file now contains this leaked data embedded inside it. Using Ollama's push functionality, an attacker can then upload that model to a server they control, exfiltrating the stolen memory from the target system entirely.
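Why the widening step loses nothing comes down to arithmetic: every float-16 value can be represented exactly in float-32, so the original 16-bit patterns can be recovered afterwards. The Python sketch below demonstrates that property with the standard `struct` module. It is an illustration of the number formats only, not a reproduction of Ollama's conversion code; the function names and the sample bytes are made up, and NaN bit patterns are deliberately left out because their payloads are a corner case.

```python
import struct


def widen_f16_to_f32(raw: bytes) -> bytes:
    """Reinterpret raw bytes as little-endian float16 values and widen each to float32."""
    count = len(raw) // 2
    halves = struct.unpack(f"<{count}e", raw[: count * 2])
    return struct.pack(f"<{count}f", *halves)


def narrow_f32_to_f16(raw: bytes) -> bytes:
    """Reverse step: narrow little-endian float32 values back to float16."""
    count = len(raw) // 4
    singles = struct.unpack(f"<{count}f", raw[: count * 4])
    return struct.pack(f"<{count}e", *singles)


# Made-up stand-in for bytes that happened to sit next to a tensor in memory.
leaked = b"API_KEY=abc123"
widened = widen_f16_to_f32(leaked)     # twice as long, looks like model weights
recovered = narrow_f32_to_f16(widened)  # identical to the original bytes
```

The widened buffer looks like ordinary floating-point weights, yet narrowing it again returns the original text byte for byte.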
The consequences in an enterprise setting are severe. Ollama deployments often process sensitive information: customer data, proprietary instructions, internal code, API credentials. When Ollama connects to external tools or coding assistants, those outputs flow through memory too and become fair game for theft. A single compromised server could leak months of accumulated prompts, system configurations, and secrets. The vulnerability carries a CVSS score of 9.1—critical—and has been assigned CVE-2026-7482 by the Echo CVE Numbering Authority.
The fix exists. Ollama versions 0.17.1 and later patch the tensor validation logic, preventing the out-of-bounds read. But roughly 300,000 servers worldwide remain unpatched, and many organizations may not even know their Ollama instances are internet-facing. The remediation path is straightforward but demanding: upgrade immediately, put Ollama behind authentication controls, restrict access to internal networks only, audit logs for signs of exploitation, and rotate any API keys or secrets that might have been exposed. For any organization that has run an unpatched, internet-accessible Ollama server, the prudent assumption is that prompts and environment data have already been stolen. The window for silent exploitation has been open long enough that it may already have been used.
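For teams triaging a fleet, the first question is simply whether each instance reports a version at or above the fix. A minimal Python sketch of that comparison follows, assuming the version string has already been collected by some other means; `parse_version` and `is_patched` are hypothetical helper names, and 0.17.1 is the patched release cited above.

```python
import re

PATCHED = (0, 17, 1)  # first fixed release according to the reporting above


def parse_version(text: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '0.17.1' or 'v0.17.1'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version_text: str) -> bool:
    """True when the reported version is at or above the patched release.

    Pre-release suffixes (for example '-rc1') are ignored, which is a
    simplification a real inventory tool may want to revisit.
    """
    return parse_version(version_text) >= PATCHED
```

Comparing numeric tuples rather than raw strings matters here: as text, "0.9.12" sorts after "0.17.1", but as numbers it is correctly identified as older and unpatched.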
Notable Quotes
"Leaked heap data can include user prompts, system prompts from other models, and environment variables stored by the host running Ollama, potentially exposing API keys, internal instructions, proprietary code, and customer-related content in enterprise environments." (Cyera researchers)
The Hearth Conversation: Another angle on the story
Why does this vulnerability matter more than other memory leaks we've seen in software?
Because it sits at the intersection of three things: AI infrastructure is new and often deployed without security-first thinking, the data flowing through these systems is increasingly sensitive and proprietary, and the attack requires almost no skill or authentication to execute.
Walk me through what an attacker actually does, step by step.
They craft a GGUF file with false tensor dimensions, upload it to the Ollama API, and the server reads beyond the buffer during conversion. The leaked memory gets preserved in the new model file. They push that file to their own server and now they have whatever secrets were in that heap.
Three API calls—is that really all it takes?
Yes. Upload the malicious file, trigger conversion, push the result. No authentication needed if the server is exposed. That's what makes it so dangerous at scale.
What kind of data are we talking about here? Is this theoretical or real?
Real. Prompts users typed into the AI, system instructions that guide model behavior, environment variables that often contain API keys and credentials. In enterprise settings, that's customer data, internal code, proprietary workflows. It's not abstract.
How long have these 300,000 servers been vulnerable?
The vulnerability exists in versions before 0.17.1. We don't know exactly when it was introduced, but the patch is recent. Many organizations haven't upgraded yet, so the window for exploitation is still open.
What should a company do if they've been running an exposed Ollama server?
Assume the worst. Upgrade immediately, audit logs for suspicious activity, rotate all API keys and secrets, and review what prompts and data might have been processed while the server was exposed. Then put it behind authentication and a firewall.