We all know how annoying real robots are. They’re expensive, they’re finicky, and teaching them to do anything useful takes an enormous amount of time and effort. One way of making robot learning slightly more bearable is to program robots to teach themselves things, which is not as fast as having a human instructor in the loop, but can be much more efficient because that human can be off doing something else more productive instead. Google industrialized this process by running a bunch of robots in parallel, which sped things up enormously, but you’re still constrained by those pesky physical arms.
The way to really scale up robot learning is to do as much of it as you can in simulation instead. You can use as many virtual robots running in virtual environments testing virtual scenarios as you have the computing power to handle, and then push the fast-forward button so that they’re learning faster than real time. Since no simulation is perfect, it’ll take some careful tweaking to get it to actually be useful and reliable in reality, and that means that humans have to get back involved in the process. Ugh.
A team of NVIDIA researchers, working at the company’s new robotics lab in Seattle, is taking a crack at eliminating this final human-dependent step in a paper that they’re presenting at ICRA today. There’s still some tuning that has to happen to match simulation with reality, but now, it’s tuning that happens completely autonomously, meaning that the gap between simulation and reality can be closed without any human involvement at all.
For everyone who does Sim-2-Real research, the proof is still in running it on the real robot, and showing that it transfers. And there’s a lot of iteration there. It’s not like you train in sim, you test on the real robot, and you’re done. It’s more like, you train in sim, test on the real robot, realize it’s not generalizable, rethink your approach, and train in a new sim, and hope that now it’ll generalize on the real robot. And this process can go on for a long time before you actually get that generalization behavior that you hope for. And in that process you’re constantly testing on a real robot to see if your generalization works, or doesn’t work.
While it would be amazing if we could just remove the real robot testing completely and just go straight from simulation to deployment, we can’t, because no simulation is a good enough representation of the real world for that to work. The way to deal with this reality gap is to just “mess with” the simulation in specific ways (what is known as “domain randomization”) to build in enough resiliency that you can cope with the inherent uncertainty and occasional chaos that reality throws your way.
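To make the idea of domain randomization a bit more concrete, here is a minimal Python sketch of the "messing with" step: each training episode gets its own randomly drawn physics parameters, so a policy never sees exactly the same simulated world twice. The parameter names (`friction`, `mass_kg`, `motor_latency_s`) and their ranges are hypothetical, chosen purely for illustration, and are not taken from the NVIDIA paper or any particular simulator.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Physics settings for one simulated episode (names are illustrative)."""
    friction: float
    mass_kg: float
    motor_latency_s: float


# Hypothetical randomization ranges: (low, high) for each parameter.
PARAM_RANGES = {
    "friction": (0.4, 1.0),
    "mass_kg": (0.8, 1.2),
    "motor_latency_s": (0.0, 0.05),
}


def sample_sim_params(rng, ranges=PARAM_RANGES):
    """Draw one set of simulation parameters uniformly from the given ranges."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()})


# Every episode trains against a slightly different version of "reality".
rng = random.Random(0)
episodes = [sample_sim_params(rng) for _ in range(5)]
for params in episodes:
    print(params)
```

The hard part, and the part that traditionally needs an experienced human, is choosing those ranges: too narrow and the policy breaks on the real robot, too wide and the task may become too hard to learn at all.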
It’s this process of messing with (and ideally optimizing) the parameters of the simulation that requires an experienced human, and even if you know what you’re doing, it can be tedious and time-consuming. Basically, you run the simulation for a while, try the learned task on a real robot, watch exactly how it fails, and then go in and alter the simulation parameters that you think will help get things closer.
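As a rough illustration of what that loop looks like when a program does the adjusting instead of a person, here is a toy Python sketch. It is not the method from the NVIDIA paper: it uses a generic cross-entropy-style search, a made-up one-parameter "simulator" (a block sliding to a stop under friction), and a stand-in function, `real_robot_trial`, that plays the role of a measurement on real hardware. All names and numbers are assumptions for the sake of the example.

```python
import random
import statistics

G = 9.81  # gravitational acceleration, m/s^2


def stopping_distance(v0, mu):
    """Toy simulator: distance a block slides before friction stops it."""
    return v0 ** 2 / (2.0 * mu * G)


def real_robot_trial(v0):
    """Stand-in for a real-world measurement (hypothetical hidden friction)."""
    hidden_real_mu = 0.62
    return stopping_distance(v0, hidden_real_mu)


def tune_friction(v0=1.5, mean=0.3, std=0.2, iters=15, n=200, n_elite=20, seed=0):
    """Shift a Gaussian over the simulated friction toward values that match reality."""
    rng = random.Random(seed)
    target = real_robot_trial(v0)  # what the "real robot" actually did
    for _ in range(iters):
        # Sample candidate simulation parameters (clamped to stay positive).
        candidates = [max(1e-3, rng.gauss(mean, std)) for _ in range(n)]
        # Rank candidates by how closely the simulation reproduces the real outcome.
        candidates.sort(key=lambda mu: abs(stopping_distance(v0, mu) - target))
        elite = candidates[:n_elite]
        # Refit the sampling distribution to the best-matching candidates.
        mean = statistics.fmean(elite)
        std = max(statistics.pstdev(elite), 1e-4)
    return mean, std


mean, std = tune_friction()
print(f"tuned friction distribution: mean={mean:.3f}, std={std:.4f}")
```

In this sketch the "watch how it fails" step is reduced to a single number, the gap between simulated and real stopping distance. A real system would compare far richer data, such as whole trajectories, and would tune many parameters at once.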