"We've spent a lot of time making sure our models are grounded in reaction mechanisms the way an expert chemist would be." (Connor Coley)
In the vast, largely uncharted territory between chemistry and artificial intelligence, MIT's Connor Coley is teaching machines not merely to predict, but to reason — embedding the intuitions of expert chemists into models capable of navigating a molecular universe so large it defies human comprehension. His work addresses one of medicine's most persistent bottlenecks: the staggering gap between the number of compounds that could theoretically become drugs and the few that human effort alone could ever evaluate. At stake is not just faster drug discovery, but a deeper question about how scientific knowledge itself can be encoded, transferred, and amplified through computation.
- The search space for potential drug compounds spans up to 10^60 molecules — a number so vast that experimental science alone will never close the distance.
- Most AI models treat chemistry as a pattern-matching exercise, missing the underlying physical principles that determine whether a reaction is even possible.
- Coley's lab built models like ShEPhERD and FlowER by forcing machine learning systems to respect chemical laws — conservation of mass, reaction mechanisms, feasibility of intermediate steps.
- Pharmaceutical companies have already adopted these models into their drug discovery pipelines, signaling a shift from academic proof-of-concept to real-world deployment.
- The lab continues expanding across structure elucidation, laboratory automation, and experimental design, pushing toward AI systems that think the way trained chemists do.
Connor Coley occupies a rare position at MIT — holding appointments across chemical engineering, computer science, and the Schwarzman College of Computing — and has spent years working on a problem that sits at the edge of what either field can solve alone. The number of chemical compounds that could theoretically function as small-molecule drugs is almost incomprehensibly large, somewhere between 10^20 and 10^60. No laboratory could test them all. AI offers a way through, but only if the models doing the searching actually understand chemistry rather than merely mimicking its outputs.
Coley's path to this work was shaped early. Growing up in Dublin, Ohio, in a family of scientists and mathematicians, he graduated high school at sixteen and arrived at Caltech drawn to chemical engineering as a bridge between science and mathematics. He taught himself programming along the way, writing Fortran code to help solve protein crystal structures. When he came to MIT for his PhD in 2014, he found his true focus: using machine learning to understand how chemicals behave and how they can be made.
His doctoral work, supported by a DARPA program aimed at improving compound synthesis, laid the foundation for what came next. After a postdoctoral fellowship at the Broad Institute — where he worked on identifying molecules capable of binding to disease-associated proteins — he returned to MIT in 2020 to build a lab with a clear mission: design new molecules with desired properties and discover new ways to manufacture them.
The models his lab has produced reflect a deliberate philosophy. FlowER, a generative model for predicting chemical reactions, was built with conservation of mass encoded directly into its architecture and was required to evaluate whether intermediate reaction steps are actually feasible — the kind of reasoning expert chemists apply instinctively. ShEPhERD assesses drug candidates by modeling how their three-dimensional shapes interact with target proteins, and has since been adopted by pharmaceutical companies. The consistent thread is Coley's insistence that machine learning, to be useful in chemistry, must be grounded in the principles that govern how molecules actually behave. His bet is that the most powerful systems will be those that combine AI's pattern recognition with the deep mechanistic reasoning that defines expert chemical thinking.
Connor Coley stands at the intersection of two disciplines that rarely speak to each other: the precise, rule-bound world of chemistry and the pattern-matching power of artificial intelligence. As an MIT associate professor with appointments spanning chemical engineering, computer science, and the Schwarzman College of Computing, he has spent the last several years teaching machines to think like chemists—not just to crunch numbers, but to understand why molecules behave the way they do.
The problem he's trying to solve is staggering in scale. Between 10^20 and 10^60 chemical compounds could theoretically serve as small-molecule drugs. No human chemist, no laboratory, no amount of experimental work could ever test them all. That's where AI enters. In recent years, researchers across the field have begun using machine learning to sift through this astronomical possibility space, identifying which compounds might actually work as medicines. Coley's contribution has been to embed chemical reasoning directly into these models, so that they don't just predict outcomes but also reflect the principles governing why those outcomes occur.
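To put that scale in perspective, here is a back-of-the-envelope calculation in Python. The screening rate of one million compounds per second is a purely hypothetical figure chosen for illustration, not a number from Coley's lab or any real facility, and the function name `years_to_screen` is likewise made up for this sketch.

```python
# Back-of-the-envelope estimate of how long exhaustive screening would take.
# ASSUMPTION: the throughput below is hypothetical and deliberately generous.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_screen(num_compounds: float, compounds_per_second: float) -> float:
    """Return the years needed to test every compound at a fixed rate."""
    return num_compounds / compounds_per_second / SECONDS_PER_YEAR


HYPOTHETICAL_RATE = 1e6  # compounds tested per second (illustrative only)

for exponent in (20, 60):
    years = years_to_screen(10.0 ** exponent, HYPOTHETICAL_RATE)
    print(f"10^{exponent} compounds: about {years:.1e} years")
```

Even at that imagined rate, the lower bound of 10^20 compounds would take roughly three million years to work through, and the upper bound is out of reach by dozens of orders of magnitude.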
His path to this work began in Dublin, Ohio, where he was the kind of student who competed in Science Olympiad and graduated high school at sixteen. His family had primed him for this: his father is a radiologist, his mother studied molecular biophysics and biochemistry before business school, his grandmother taught mathematics. At Caltech, he chose chemical engineering as a way to marry his love of science with mathematics, but he also taught himself computer science, writing code in Fortran to help solve protein crystal structures. When he arrived at MIT in 2014 as a PhD student, he found his true subject: the marriage of machine learning and cheminformatics—using computation to understand how chemicals behave and how they can be synthesized.
His doctoral work, advised by Klavs Jensen and William Green, focused on optimizing automated chemical reactions through machine learning. Much of it was supported by a DARPA program called Make-It, which aimed to use data science to improve how medicines and other compounds could be synthesized from basic building blocks. By the time he was finishing his degree, Coley had already begun applying for faculty positions. At twenty-five, he accepted an offer from MIT itself—the same institution where he'd just completed his PhD. Some colleagues warned him against staying in one place, but he saw something they didn't: MIT's rare ability to support work at the intersection of AI and science, with resources and collaborations that seemed impossible to refuse.
Before taking the job, he deferred for a year to do a postdoctoral fellowship at the Broad Institute, where he worked on identifying small molecules from vast DNA-encoded libraries that might bind to disease-associated proteins. When he returned to MIT in 2020, he built a lab with a specific mission: use AI not just to synthesize known compounds, but to design entirely new molecules with desired properties and discover new ways to manufacture them. The work has produced several notable models. ShEPhERD evaluates potential drug molecules by analyzing how their three-dimensional shapes will interact with target proteins, and it is now deployed by pharmaceutical companies in their own drug discovery pipelines. FlowER, a generative model, predicts which chemical products will result from combining a given set of starting materials.
What distinguishes Coley's approach is his insistence on grounding these models in actual chemistry. When his lab designed FlowER, they didn't just feed it data and let it learn patterns. They built in fundamental physical principles, like the law of conservation of mass. They forced the model to consider whether the intermediate steps in a reaction pathway are actually feasible—the kind of reasoning that expert chemists do intuitively, thinking through mechanisms and how reactions evolve step by step. This constraint-based approach improved the model's accuracy significantly. "Chemists think about intermediate steps and mechanisms naturally," Coley explains. "It's how chemistry is taught. But machine-learning models don't inherently think that way. We've spent a lot of time making sure our models are grounded in reaction mechanisms the way an expert chemist would be."
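The conservation-of-mass idea can be illustrated with a small Python sketch. This is not FlowER's code: the article says the constraint is built into the model's architecture, whereas the hypothetical helpers below (`atom_counts` and `is_mass_balanced`, both invented for this example) only check after the fact that a proposed reaction has the same atoms on both sides. It also assumes flat molecular formulas such as C2H6O, with no parentheses or charges.

```python
import re
from collections import Counter

# An element symbol (e.g. "C", "Cl") followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count atoms in a flat formula like 'C2H6O' (no parentheses, no charges)."""
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            # Something between tokens was not a valid element/count pair.
            raise ValueError(f"cannot parse formula: {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if not formula or pos != len(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    return counts


def is_mass_balanced(reactants: list[str], products: list[str]) -> bool:
    """Return True if every element appears equally often on both sides."""
    atoms_in = sum((atom_counts(f) for f in reactants), Counter())
    atoms_out = sum((atom_counts(f) for f in products), Counter())
    return atoms_in == atoms_out


# Acetic acid + ethanol -> ethyl acetate + water (balanced).
print(is_mass_balanced(["C2H4O2", "C2H6O"], ["C4H8O2", "H2O"]))
# The same prediction with the water left out loses two H and one O.
print(is_mass_balanced(["C2H4O2", "C2H6O"], ["C4H8O2"]))
```

A model that merely pattern-matches could propose the second, unbalanced outcome. Building the constraint into the model, as the article describes, is meant to rule such predictions out from the start rather than filter them afterward.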
His lab now pursues multiple research threads: computer-aided structure elucidation, laboratory automation, optimal experimental design. Each thread pulls in a different direction, but they all point toward the same horizon—advancing what AI can do in chemistry. The work matters because drug discovery is slow and expensive, and the space of possible compounds is so vast that human intuition alone will never be enough. But neither will pure machine learning, untethered from the principles that govern how molecules actually work. Coley's bet is that the future belongs to systems that combine both: the pattern recognition of AI with the deep chemical reasoning of expert practitioners.
Notable Quotes
"MIT is a very special place in terms of the resources and the fluidity across departments. The caliber of students and the strength of collaborations definitely outweighed any potential concerns of staying in the same place." (Connor Coley, on why he chose to stay at MIT for his faculty position)
"We're trying to give more of a medicinal chemistry intuition to the generative model, so the model is aware of the right criteria and considerations." (Connor Coley, on designing AI models that incorporate chemical reasoning)
The Hearth Conversation: Another angle on the story
Why does it matter that these models understand chemistry, rather than just predicting outcomes?
Because a model that only predicts can fail in ways you don't understand. If it tells you a reaction will work, but the intermediate steps are impossible, you've wasted time and resources. A chemist knows to think through the mechanism. We're trying to give machines that same intuition.
How did you decide which chemical principles to build in?
We started with the basics—conservation of mass, feasibility of reaction steps. But it's an ongoing conversation with chemists in the lab. They tell us what they naturally consider, and we figure out how to encode it mathematically.
ShEPhERD and FlowER are already being used by pharmaceutical companies. What does that feel like?
It's validating, but it also raises the stakes. We're not just publishing papers anymore. These models are making real decisions about which compounds to pursue. That responsibility pushes us to be more rigorous.
You stayed at MIT after your PhD, which is unusual. Do you regret it?
Not for a second. The fluidity across departments here is rare. I can collaborate with chemists, computer scientists, engineers—all in one ecosystem. That's not common.
What's the next frontier?
Designing molecules with multiple desired properties simultaneously—not just efficacy, but also manufacturability, safety, cost. That's where the real complexity lies.