In Vienna this June, a global gathering of researchers and engineers posed a quiet but consequential question to the field of artificial intelligence: does it actually work when the world pushes back? AGIBOT's 2026 World Challenge brought 526 teams from 27 countries to test embodied AI not in the clean abstraction of simulation, but against the friction, gravity, and unpredictability of physical reality. It was less a competition than a reckoning — a collective acknowledgment that intelligence which cannot act reliably in the world is, in the end, only a promise.
- For years, AI robotics has been quietly gaming its own exams — simulation scores rewarded algorithmic elegance while real-world robots stumbled on ordinary shelves and unexpected surfaces.
- AGIBOT forced the issue in Vienna, requiring finalist teams to control an actual humanoid robot through physical tasks, exposing the gap between what looks good on a benchmark and what holds up under real conditions.
- A total of 526 teams competed across two tracks: one testing whether robots could reason through tasks and execute them physically, the other probing whether AI could predict the messy consequences of real-world actions.
- More than a hundred teams cleared the official baseline, signaling that the field is no longer niche — institutions from Tsinghua to UC San Diego to Alibaba are all racing toward deployable, not just demonstrable, intelligence.
- The competition is now pointing toward an expanded ecosystem: online leaderboards, diversified benchmarks, and open toolchains designed to close the long-standing chasm between simulation and deployment.
In Vienna this June, AGIBOT staged a competition that amounted to a quiet ultimatum to the robotics world: stop hiding behind simulation. The AGIBOT World Challenge 2026, held alongside the International Conference on Robotics and Automation, drew 526 teams from 27 countries to test embodied AI — the kind of intelligence that must function in physical space, not just in code.
The field has long relied on simulation for evaluation. It is clean, fast, and reproducible. But robots that excel in virtual environments routinely fail when they meet actual objects, actual friction, actual gravity. AGIBOT's competition this year made a deliberate break from that tradition, centering its evaluation on closed-loop testing with real robots on real tasks.
The competition ran in two tracks. The Reasoning to Action track measured whether robots could interpret instructions, plan steps, and execute them physically — including adapting to disturbances and generalizing to unfamiliar tasks. The World Model track tested something more abstract: whether AI systems could accurately predict the physical consequences of robot actions. Participants included teams from the Chinese Academy of Sciences, Tsinghua University, UC San Diego, Sber Robotics Center, Alibaba, and vivo.
The format moved from online automated benchmarking to an offline final in Vienna, where teams controlled the AGIBOT G2 humanoid robot through physical environments. Scoring prioritized stability, adaptability, and the ability to complete long sequences of actions — the qualities that matter in actual deployment. PrismBot from vivo won the Reasoning to Action track; NeoVerse-ABot, a joint team from the Chinese Academy of Sciences and Amap CV Lab, took the World Model track.
Alongside the competition, AGIBOT released an open-source toolchain — including a dataset, a simulation platform, and access to the G2 robot — intended to help developers bridge the gap between training and deployment. The company plans to expand with online leaderboards, broader benchmarks, and deeper institutional partnerships. The message from Vienna was plain: the field is ready to hold itself to a harder standard.
In Vienna this June, more than five hundred research teams and companies gathered for a competition that marked a turning point in how the world measures artificial intelligence. AGIBOT WORLD CHALLENGE 2026, held alongside the International Conference on Robotics and Automation, brought together 526 teams from 27 countries to test embodied AI systems—the kind of intelligence that lives in robots and has to work in the physical world, not just in code.
For years, AI competitions have relied on simulation. A team trains a model in a virtual environment, runs it through standardized tests on a computer, and the one with the highest score wins. It's clean, reproducible, and fast. But there's a problem: a robot that performs beautifully in simulation often stumbles when it encounters the messy reality of actual objects, actual friction, actual gravity. AGIBOT's competition this year made a deliberate choice to change that. The evaluation framework moved beyond pure simulation scores toward what the company calls closed-loop testing on real robots, real tasks, and standardized benchmarks. In other words: does it actually work?
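The gap described above can be illustrated with a toy example. The sketch below is not AGIBOT's evaluation code; it is a minimal one-dimensional model in which every name and number (`SIM_GAIN`, `REAL_GAIN`, the feedback factor `0.5`) is an assumption chosen for illustration. A fixed plan tuned for the simulated response lands exactly on target in simulation but falls short when the real response is weaker, while a policy that observes its own position at each step recovers.

```python
# Toy model: each step the robot commands a displacement u, but the
# environment only delivers gain * u (think slip or friction).
TARGET = 1.0
STEPS = 20
SIM_GAIN = 1.0   # assumed response in simulation
REAL_GAIN = 0.7  # assumed (weaker) response on hardware


def rollout(policy, gain, steps=STEPS):
    """Run a policy for a fixed number of steps and return the final position."""
    x = 0.0
    for t in range(steps):
        x += gain * policy(t, x)
    return x


def open_loop(t, x):
    """Fixed plan computed in advance; ignores the observed position x."""
    return TARGET / STEPS


def closed_loop(t, x):
    """Feedback policy; commands half of the remaining error at each step."""
    return 0.5 * (TARGET - x)


open_sim = rollout(open_loop, SIM_GAIN)        # reaches the target in simulation
open_real = rollout(open_loop, REAL_GAIN)      # stops short at about 0.7
closed_real = rollout(closed_loop, REAL_GAIN)  # converges close to the target

print(f"open-loop, sim:    {open_sim:.3f}")
print(f"open-loop, real:   {open_real:.3f}")
print(f"closed-loop, real: {closed_real:.3f}")
```

A simulation-only score would rate the fixed plan as perfect; only a run against the weaker real response exposes the difference between the two policies.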
The competition split into two tracks. The Reasoning to Action track tested how well robots could understand a task, plan the steps needed to complete it, and then execute those steps in a physical space. The World Model track evaluated something more abstract but equally crucial: how well AI systems could predict what would happen in the physical world based on what they knew about robot actions and sensor inputs. Together, these tracks reflected a broader shift in embodied AI research—away from simple task execution and toward understanding, prediction, and decision-making. The participating teams came from prestigious institutions: the Chinese Academy of Sciences, Tsinghua University, UC San Diego, Sber Robotics Center in Russia, and major companies like Alibaba and vivo. More than one hundred teams surpassed the official baseline, a sign of genuine engagement across academia and industry.
The competition used a two-stage format. Teams first competed online, with their models evaluated automatically using AGIBOT's benchmarks—EWMBench and Genie Sim 3.0—which provided standardized metrics and reproducible results. Then the finalists traveled to Vienna for the real test: controlling the AGIBOT G2 humanoid robot to complete tasks in actual physical environments. This offline final forced teams to confront what matters in deployment: robot stability, the ability to adapt to unexpected physical conditions, and the capacity to complete complex tasks over long sequences of actions. The scoring system placed these practical concerns at the center, not as afterthoughts.
In the Reasoning to Action track, PrismBot, a team from vivo, took the championship. Shanghai RoboParty's RP-VLA and GreenVLA finished second and third. The track itself had evolved from the previous year's Manipulation competition—it now encompassed the full pipeline from understanding language instructions to spatial reasoning to executing atomic skills, adapting to disturbances, and even transferring knowledge to tasks the model had never seen before. AGIBOT also launched a separate real-supermarket benchmark alongside the main competition, where teams had to navigate robots through an actual retail environment, picking items from shelves and placing them elsewhere, all under real physical constraints like shelf heights and randomized item positions. Teams controlled the robots remotely through an API, making their algorithms directly responsible for what the physical machine did.
The World Model track, which focused on predicting physical interactions, was won by NeoVerse-ABot, a joint team from the Institute of Automation at the Chinese Academy of Sciences and Amap CV Lab. The PAI@IAII team from the Institute of Industrial Artificial Intelligence placed second, and the Loop team from the University of Science and Technology of China came third. This track deliberately included messy real-world scenarios—objects being dropped, grasping failures—to test whether models could handle the complexity of actual physical interaction, not just ideal conditions.
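Why failure scenarios matter for this kind of evaluation can be shown with a small sketch. It does not reproduce EWMBench or any official metric; the `Episode` type, the `accuracy` helper, and the four episodes are made up purely for illustration. The hypothetical "optimistic" model below never predicts a drop, so it looks strong overall and fails on exactly the cases the track was designed to include.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    observed_dropped: bool   # what actually happened on the robot
    predicted_dropped: bool  # what the world model forecast


def accuracy(episodes):
    """Fraction of episodes where the forecast outcome matched reality."""
    if not episodes:
        return 0.0
    hits = sum(e.observed_dropped == e.predicted_dropped for e in episodes)
    return hits / len(episodes)


# Made-up toy data: three clean grasps and one drop, all forecast as clean.
episodes = [
    Episode(observed_dropped=False, predicted_dropped=False),
    Episode(observed_dropped=False, predicted_dropped=False),
    Episode(observed_dropped=False, predicted_dropped=False),
    Episode(observed_dropped=True, predicted_dropped=False),
]

nominal = [e for e in episodes if not e.observed_dropped]
failures = [e for e in episodes if e.observed_dropped]

overall = accuracy(episodes)
nominal_acc = accuracy(nominal)
failure_acc = accuracy(failures)

print(f"overall: {overall:.2f}, nominal: {nominal_acc:.2f}, failures: {failure_acc:.2f}")
```

Reporting the failure subset separately keeps a model from hiding poor failure prediction behind a large number of easy, successful episodes.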
Beyond the competition itself, AGIBOT released a full-stack toolchain meant to help developers move from training to deployment. It includes the AGIBOT WORLD open-source dataset, Genie Sim 3.0, and access to the G2 robot platform. The company is positioning this infrastructure as a bridge across what has long been a chasm in robotics: the gap between what works in simulation and what works in the real world. By combining automated benchmarking with real-robot validation, AGIBOT is trying to establish a more consistent framework for evaluation—one that rewards not just algorithmic cleverness but deployability, generalization, and robustness.
Looking ahead, AGIBOT plans to expand the ecosystem further. The company intends to launch an online simulation leaderboard, introduce more diverse test tasks and benchmarks, and continue refining its toolchain in partnership with research institutions and industry. The underlying goal is clear: to move embodied AI forward not just through incremental algorithmic improvements, but toward systems that can actually be built, deployed, and scaled in real-world settings. The competition in Vienna was a statement that the field is ready to stop pretending simulation is enough.
Notable Quotes
The competition placed robot stability, physical-world adaptability, and long-horizon task reliability at the center of the scoring system, aligning technical evaluation more closely with practical deployment needs. (AGIBOT competition framework)
The Hearth Conversation: Another perspective on the story
Why does it matter that they tested on real robots instead of just in simulation?
Because a model that looks perfect in a virtual environment often fails when it touches actual objects. Real robots deal with friction, unexpected textures, objects that shift or fall. Simulation can't capture all of that. If you only measure performance in simulation, you're not actually measuring whether something will work when deployed.
So this competition is saying: prove it works in the real world, not just on paper.
Exactly. They brought finalists to Vienna with actual robots. The teams had to complete real tasks—picking items from shelves, navigating around obstacles. You can't fake that. It forces teams to build systems that are robust, not just clever.
What's the World Model track doing differently?
It's testing whether robots can predict what will happen next. If I push an object, will it slide or tip? If I try to grasp something, will my fingers slip? These predictions matter for planning. And they deliberately included failures—dropped objects, failed grasps—to make sure models could handle messy reality, not just ideal scenarios.
More than five hundred teams from 27 countries. That's a lot of people working on this problem.
It shows the field is maturing. You've got universities, startups, major tech companies all competing. More than a hundred teams beat the baseline, which means there's real progress happening. But the fact that AGIBOT is pushing everyone toward real-robot testing suggests the field was getting too comfortable with simulation scores.
What happens next?
AGIBOT is building infrastructure—open datasets, benchmarks, access to robots—so more teams can test their work in the real world. They're also planning online leaderboards and more diverse tasks. The goal is to make real-world validation the standard, not the exception. That's how you get embodied AI that actually works.