The robot's equivalent of a dolphin's sonar paired with a sea turtle's sight
For as long as humans have sent machines into the deep, murky water has been the great equalizer — blinding cameras and forcing vehicles to wait in patient stillness while sediment slowly settles. Engineers at MIT and the Woods Hole Oceanographic Institution have now built a system called Sonar-MASt3R that fuses the structural honesty of sonar with the visual richness of optical cameras, allowing underwater vehicles to navigate and map in near-zero visibility for the first time. It is a reminder that perception, like wisdom, often requires combining what is precise with what is detailed — neither alone being sufficient to see the whole truth.
- Every time an underwater vehicle disturbs the seafloor, it blinds itself — churning up sediment clouds that can halt operations for hours and make safe navigation impossible.
- The core tension is a decades-old tradeoff: sonar gives shape without detail, cameras give detail without scale, and no existing system has reliably delivered both at once in turbid conditions.
- Sonar-MASt3R breaks the impasse by using sonar as a structural scaffold to calibrate the relative depth estimates of an optical algorithm called MASt3R, producing accurate real-time 3D maps even when cameras can see nothing at all.
- Tank tests across eight turbidity levels, including maximum cloudiness, showed the system resolving centimeter-scale object detail and producing more accurate, finer-resolution maps than other opti-acoustic fusion approaches in the literature.
- The team is now preparing for open-ocean trials, where natural water conditions are expected to reduce the acoustic interference that complicated lab testing, potentially validating the system for real-world deployment.
When an underwater vehicle disturbs the seafloor, it blinds itself — sediment clouds render cameras useless, sometimes for hours. For decades, this has forced a hard choice: sonar for shape, or cameras for detail, but rarely both at once in murky conditions.
Engineers at MIT and the Woods Hole Oceanographic Institution have built a system called Sonar-MASt3R that dissolves this constraint. It treats sonar as a structural scaffold — a coarse but reliable map of surrounding shapes — and uses optical cameras to fill in visual detail once the vehicle is close enough to see. The key insight came from pairing an existing image-analysis algorithm, MASt3R, with sonar's absolute distance measurements. MASt3R can estimate relative depth by comparing camera views, but has no sense of true scale. Sonar provides exactly that, calibrating the algorithm's estimates into accurate, real-time 3D maps.
To validate the approach, the team mounted a camera and sonar sensor on a robotic arm inside a sediment-filled tank, sweeping it across objects including a boulder, a coffee mug, and a packing crate. A keyframe algorithm kept only images that revealed new information, preventing data overload. Across eight turbidity levels — up to total camera blindness — the system used sonar to navigate safely toward objects, then resolved them in centimeter-scale detail once close enough. It outperformed existing opti-acoustic fusion methods in both accuracy and resolution.
The work was motivated in part by the problem of unexploded underwater mines in shallow, turbid waters — environments where human divers cannot safely operate and robots have historically struggled to see. The same capability applies to scientific exploration, subsea construction, and deep-sea salvage. Senior scientist Richard Camilli offered a useful analogy: it is like having a rough mental map of a dark china shop, moving carefully toward a specific mug, then switching on a flashlight only once you are close enough to see it clearly.
The team plans to move testing into open ocean next, where they expect natural water to produce less acoustic interference than the tank environment. If field results match the lab, Sonar-MASt3R could open a category of underwater missions that have long been considered intractable — not for lack of technology, but for lack of the ability to see clearly in the dark.
When a remotely operated underwater vehicle disturbs the seafloor, it churns up clouds of sediment that render onboard cameras nearly useless. The vehicle must simply wait—sometimes for hours—until the murk settles enough to proceed safely. For decades, this has been the constraint: either you have visibility and can see detail, or you have sonar and can sense shape, but rarely both at once.
Engineers at MIT and the Woods Hole Oceanographic Institution have developed a system that breaks this impasse. Called Sonar-MASt3R, the technique merges acoustic sonar data with optical camera imagery in real time, allowing underwater vehicles to navigate and map environments so turbid that cameras alone would be blind. The approach treats sonar as a structural scaffold—a coarse but reliable map of what surrounds the vehicle—and uses optical cameras to fill in visual detail once the vehicle has moved close enough to see. The result is a hybrid perception system that works even when visibility drops to nearly zero.
The method builds on an existing image-analysis algorithm called MASt3R, developed by French researchers, which estimates depth by comparing multiple camera views of the same scene. On its own, MASt3R has no sense of absolute scale: it can tell you that one point is five units farther away than another, but not whether those units are meters or feet. Sonar supplies the missing scale. By measuring how long acoustic waves take to bounce off objects and return, sonar yields absolute distances and coarse shapes. MIT graduate student Amy Phung and her advisor Richard Camilli, a senior scientist at WHOI, realized they could use sonar's absolute measurements to calibrate MASt3R's relative depth estimates, creating a system that generates accurate 3D maps in real time, even in murky water.
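The calibration idea can be illustrated with a minimal sketch. It is not the team's implementation: it assumes a nominal sound speed of 1,500 m/s, assumes each relative depth has already been matched to a sonar return for the same point, and fits a single least-squares scale factor. The function names are hypothetical.

```python
# Illustrative sketch only; the actual Sonar-MASt3R calibration is not
# described in this article. All names and values here are assumptions.

SPEED_OF_SOUND_M_S = 1500.0  # assumed nominal speed of sound in water


def echo_range_m(round_trip_s, speed=SPEED_OF_SOUND_M_S):
    """Distance to a reflector from the round-trip echo time.

    The pulse travels out and back, so the one-way range is half the path.
    """
    return speed * round_trip_s / 2.0


def fit_scale(relative_depths, sonar_ranges_m):
    """Least-squares scale s that minimizes sum((s * d - r) ** 2).

    relative_depths: unitless depths from the optical algorithm.
    sonar_ranges_m:  absolute ranges in meters for the same points.
    """
    if len(relative_depths) != len(sonar_ranges_m):
        raise ValueError("each relative depth needs a matching sonar range")
    den = sum(d * d for d in relative_depths)
    if den == 0:
        raise ValueError("relative depths are all zero; scale is undefined")
    num = sum(d * r for d, r in zip(relative_depths, sonar_ranges_m))
    return num / den


def to_metric(relative_depths, scale):
    """Convert unitless depths to meters using the fitted scale."""
    return [scale * d for d in relative_depths]


# Hypothetical data: three points seen by both sensors.
relative = [1.0, 2.0, 4.0]               # optical depths, arbitrary units
echo_times_s = [0.002, 0.004, 0.008]     # sonar round-trip times
ranges_m = [echo_range_m(t) for t in echo_times_s]
scale = fit_scale(relative, ranges_m)    # about 1.5 meters per unit
metric_depths = to_metric(relative, scale)
```

A single global scale is the simplest possible choice. A real system would also have to handle noisy returns, outliers, and the geometric alignment between the two sensors, none of which is covered here.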
To test the concept, the team filled a tank with water, sediment, and objects—a boulder, a coffee mug, a packing crate—and mounted an underwater camera and sonar sensor on a robotic arm. In each trial, the arm swept slowly across the tank, gathering sonar and visual data simultaneously. Sonar-MASt3R first created a coarse map of shapes and contours using only the acoustic data. Then, guided by this rough map, the system directed the camera to capture close-up images of specific objects. A "keyframe" algorithm compared each new image to the previous one, keeping only frames that revealed new information and discarding redundant shots. This allowed the system to build a detailed map in real time without being overwhelmed by data.
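The keyframe step can be sketched in a few lines. The article does not say how the system measures "new information", so this example substitutes a simple stand-in: the mean absolute pixel difference against the last kept frame, with a hypothetical threshold. Frames are represented as flat lists of grayscale values for brevity.

```python
# Illustrative sketch only; the novelty measure and threshold are assumptions,
# not the criterion used by Sonar-MASt3R.


def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two equal-sized frames."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_keyframes(frames, threshold=10.0):
    """Return the indices of frames that differ enough from the last kept one.

    The first frame is always kept. Later frames are kept only when their
    difference from the most recent keyframe exceeds the threshold.
    """
    kept = []
    last_keyframe = None
    for index, frame in enumerate(frames):
        if last_keyframe is None or mean_abs_diff(frame, last_keyframe) > threshold:
            kept.append(index)
            last_keyframe = frame
    return kept


# Hypothetical 4-pixel frames: two near-duplicates and one clear change.
frames = [
    [10, 10, 10, 10],
    [11, 10, 9, 10],   # nearly identical to frame 0, so it is discarded
    [60, 60, 60, 60],  # clearly different, so it is kept
    [61, 59, 60, 60],  # nearly identical to frame 2, so it is discarded
]
keyframe_indices = select_keyframes(frames)  # [0, 2]
```

Comparing against the last kept frame rather than the immediately preceding one is a design choice made here so that slow drift across many near-identical frames still triggers a new keyframe eventually.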
The researchers tested eight levels of turbidity, stirring up sediment to simulate increasingly murky conditions. Even at maximum cloudiness—where the camera could see nothing—the sonar generated a usable map of the tank's contents. The system then guided the robotic arm safely through the sediment toward specific objects, which the camera could then resolve in centimeter-scale detail. Compared with other opti-acoustic fusion approaches in the literature, Sonar-MASt3R produced more accurate maps and finer resolution in cloudier conditions. Amy Phung, the MIT graduate student who led the work, presented the findings this week at the IEEE International Conference on Robotics and Automation.
The motivation for this work came partly from a sobering real-world problem: unexploded underwater mines in shallow, turbid waters where visibility is already poor. Recovering these safely requires robotic vehicles that can navigate and inspect without human divers entering the danger zone. But the same capability matters for scientific exploration, underwater construction, maintenance of subsea infrastructure, and deep-sea salvage operations—all scenarios where low visibility has historically made safe operation difficult or impossible.
Camilli offered a homely analogy: imagine walking through a dark china shop, trying to locate a specific coffee mug without knocking things over. Sonar-MASt3R is like having a rough mental map of where the shelves are, so you can move safely toward the mug, then switch on a flashlight once you're close enough to see it clearly. The team plans to test the system in natural ocean conditions next, where they expect the task to become easier: tank experiments create acoustic echoes and reverberations that complicate processing, and open water should be more forgiving. If the system performs as well in the field as it has in the lab, it could unlock a category of underwater missions that are currently intractable, not because the technology doesn't exist, but because robotic sensors have not been able to see clearly enough to work safely in dark, sediment-laden water.
Notable Quotes
"We hope that this work enables us to do more operations in those challenging, low-visibility environments, and helps provide more coverage in areas that are difficult to operate in today." - Amy Phung, MIT graduate student who led the research
"The real value in this effort is so we can use this technology in mission scenarios that are untractable right now. And there are plenty of untractable missions because we don't have the observational or perception capabilities." - Amy Phung
The Hearth Conversation: Another angle on the story
Why does this matter more than just waiting for the sediment to settle?
Because waiting isn't always an option. If you're trying to recover an unexploded mine, or inspect a damaged pipeline, or search for something on the seafloor, time matters—sometimes urgently. And in some environments, the sediment never fully settles. You're fighting against the physics of the place itself.
So you're essentially giving the robot two senses instead of one.
Exactly. Sonar is like echolocation—it works in complete darkness, but it's coarse. Cameras are like vision—they give you detail, but only if there's light and clarity. By fusing them, you get the robot's equivalent of a dolphin's sonar paired with a sea turtle's close-range sight.
The tank tests showed it worked in maximum turbidity. Will it actually work in the ocean?
That's the honest question. Tanks are echo chambers—sound bounces off walls and creates ghosts and distortions that make processing harder. The ocean is messier in some ways, but cleaner in others. There's no ceiling bouncing sound back at you. The team thinks it should actually be easier in real water.
What happens if the sonar and camera disagree about what's there?
That's built into the algorithm. The sonar gives you the structure, the scale, the shape of things. The camera fills in the visual detail. They're not competing—they're complementary. Sonar says "there's an object three meters away." The camera says "and it's a coffee mug with a handle."
How long before this is actually deployed somewhere?
The team is moving toward real-world testing now. But deployment depends on the application. For mine recovery or deep-sea work, you're talking about specialized operations where the technology can be integrated into existing robotic systems. It's not something that gets deployed overnight, but it's also not theoretical anymore. It works.