A Case for Robot Learning
What makes programming robots difficult? We’ll look at the promising idea of end-to-end learning and why it is (or maybe isn’t) the future of robotics.
By Ashwin Reddy
Recent advances in machine learning have led to superhuman performance in games like Go, chess, and Atari, and have even made headway on problems like predicting how proteins fold. We could conclude that artificial intelligence shows great promise in those fields.
What about robotics? This video from 2008 shows a PR1 robot that seems to be cleaning a house on its own:
Unfortunately, there’s a catch: a human (off-screen) had to teleoperate the robot. This video tells us that robots can physically perform useful and complicated tasks, but they are not quite intelligent enough to do so on their own. If robots could determine what to do on their own, they would be far more useful in the real world.
First, we’ll see why programming robots presents a unique set of challenges. Then, we’ll look at some of the ways that recent advances in machine learning can help bridge the gap between the robots we have and the robots we want.
The Challenge of Embodied Intelligence
Why should writing software for a robot be harder than any other kind of software?
In principle, there’s no difference between writing a program that will run on a personal computer (PC) and one that will run on a robot. At the end of the day, a developer writes source code in a programming language and pushes it onto a device, which executes the algorithm.
But to think along such lines is to miss a subtle but important change in how we expect to deploy these programs.
Take a spreadsheet program as an example of a program running on a PC. It takes inputs from a human to perform calculations. Most likely, the data the human enters is sensible, and there is no issue. If the human inputs nonsensical data, the program can simply reject it. In the worst case, the program fails and the computer might crash, a nuisance for the user but not a catastrophe.
Robots live in an entirely different space, one where we can’t be so indifferent to the realities of inputs and outputs. Where a spreadsheet asks a human for input, a robot must actively probe the local world for information. That information is noisy and laggy. Where a spreadsheet failure is simply annoying, a failing robot might wear out its own mechanical parts or wreck the home it’s in. Researchers can soften these practical difficulties by working with robots in computer simulations, but the fundamental problem remains.
Computer scientists have given a name to this difficulty in programming robots: Moravec’s paradox. It states that programming computers to do logic and arithmetic, tasks humans find finicky, is easier than programming them to understand and move around in their world, which humans do effortlessly. Since we learn motor skills while we’re young (not to mention we’re aided by millions of years of evolution), moving around in the world doesn’t seem so hard to us. However, it’s not clear how to break our motor skills down into an algorithm.
The term embodied intelligence better captures how the real world shapes decision making. The robot is physically embodied in the real world, so its sensors and motors must be tightly integrated with its decision making rather than treated as peripheral devices for the software.
Robot Learning
Embodied intelligence doesn’t fit neatly with traditional programming methods. A software engineer might sketch some logic on paper, think through design choices, code up the idea, run it a hundred times, debug, and test. This cycle isn’t tenable for a roboticist: every iteration has to run on physical hardware (or a simulation of it), which is slow to reset, expensive to operate, and easy to wear out or damage.
Robot learning presents a different philosophy. Here, we treat the robot like a newborn baby: it is capable of doing many things but it needs some support in learning to do them effectively.
Robot learning changes the focus from trying to break down robot tasks algorithmically to determining how a robot can learn from its own experience. That experience can come from the robot trying things out in the real world, from a robot in simulation, or potentially even from other robots.
End-to-end learning
A reasonable approach to algorithmic robotics would be breaking a system down into different modules (e.g. a perception module, a navigation module, a manipulation module, etc.) that interface with one another. This approach allows an engineer to build and debug the pieces separately, which keeps the system maintainable. However, the challenges of real-world robotics that we’ve discussed mean that modular systems can often be brittle.
One suggestion from robot learning is to replace these modules with deep neural networks. Then, any given robotics task is reduced to two steps:
Figure out how to collect high-quality robot experience.
Determine how best to use that experience to train the network on the task (a minimal sketch of this step follows the list).
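To make step 2 concrete, here is a minimal sketch of behavior cloning, the simplest way to use demonstration experience: treat every recorded (observation, action) pair as a supervised example and regress the action. The array shapes, network sizes, and training details below are placeholders, not taken from any particular system.

```python
import torch
import torch.nn as nn

# Step 1 (assumed done): experience collected from, say, a human teleoperating
# the robot. The shapes here are illustrative placeholders.
obs = torch.randn(1000, 14)      # e.g. joint angles and velocities for 7 joints
actions = torch.randn(1000, 7)   # e.g. the joint torques the demonstrator commanded

# Step 2: train a network to imitate the demonstrated actions.
policy = nn.Sequential(
    nn.Linear(14, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 7),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(100):
    pred = policy(obs)             # what the network would have done
    loss = loss_fn(pred, actions)  # how far that is from the demonstration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At deployment, the robot would call policy(current_obs) at every control step.
```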
End-to-end learning takes this idea to its limit. It proposes that a single deep neural network could digest raw sensory input to produce motor outputs with zero human-written code connecting these. In the video below, Prof. Sergey Levine describes a paper of his that helped prove the concept viable:
https://www.youtube.com/watch?v=W2fcJVtLspM
In short, there are still modules of a sort: the first part of the network typically processes images, the last part determines the motor actions, and so forth. However, we’re not interested in how effective any particular module is on its own. Instead, we only care about holistic performance, so it’s fine if one module is weaker as long as the next module picks up the slack.
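To give a rough sense of what such a network might look like, here is a sketch of a pixels-to-torques policy. The image size, layer widths, and joint count are arbitrary placeholders rather than details from the paper above; the point is that the "perception" and "control" stages are just successive pieces of one network, trained together by a single loss on the final motor output (for example, the behavior-cloning loss sketched earlier).

```python
import torch
import torch.nn as nn

class PixelsToTorques(nn.Module):
    """One network from raw camera images to motor commands."""
    def __init__(self, num_joints=7):
        super().__init__()
        # Implicit "perception module": convolutions over the raw image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Implicit "control module": maps image features to joint torques.
        # 13 x 13 is the spatial size left after two stride-2 convs on a 64 x 64 image.
        self.head = nn.Sequential(
            nn.Linear(32 * 13 * 13, 128), nn.ReLU(),
            nn.Linear(128, num_joints),
        )

    def forward(self, image):
        return self.head(self.encoder(image))

policy = PixelsToTorques()
torques = policy(torch.randn(1, 3, 64, 64))  # one 64x64 RGB frame in, 7 torques out
```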
Sidebar: Transfer Learning
One benefit of end-to-end learning is that a network trained to do task A can be fine-tuned to perform a similar but different task, B.
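A sketch of what that fine-tuning might look like, reusing the hypothetical PixelsToTorques network from the earlier snippet (the checkpoint file name and learning rates are placeholders):

```python
import torch

# Start from the weights learned on task A rather than from scratch.
policy = PixelsToTorques()                       # the network sketched above
policy.load_state_dict(torch.load("task_a.pt"))  # hypothetical task-A checkpoint

# Option 1: fine-tune the whole network on task-B data with a small learning rate.
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Option 2: freeze the perception layers and adapt only the control head.
for p in policy.encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(policy.head.parameters(), lr=1e-3)

# Either way, the training loop itself is the same as before, just on task-B data.
```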
This flexibility enables researchers to transfer skills acquired by a robot trained in simulation to a real instance of the robot. Discrepancies between the simulated environment and the real world mean that correct robot behavior in one might be incorrect in the other, so you can’t simply expect a trained network to do the right thing in the real world.
A simple technique known as domain randomization has the simulation randomly vary its parameters (lighting conditions, positions of objects, and so on) every episode, forcing the learned policy to be robust to those variations.
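A sketch of what that randomization might look like, with made-up parameter names and ranges rather than any particular simulator’s API:

```python
import random

def randomized_scene():
    """Sample a fresh set of simulator parameters for one training episode."""
    return {
        "light_intensity": random.uniform(0.5, 1.5),
        "light_direction": [random.uniform(-1, 1) for _ in range(3)],
        "object_x":        random.uniform(0.3, 0.7),    # metres across the table
        "object_y":        random.uniform(-0.2, 0.2),
        "table_friction":  random.uniform(0.4, 1.0),
        "camera_offset":   random.uniform(0.0, 0.02),   # metres of camera jitter
    }

# Every episode sees a differently lit, differently arranged scene, so the
# policy can't overfit to any single rendering of the world.
for _ in range(10_000):
    params = randomized_scene()
    # sim.reset(**params)   # hand the sampled parameters to the simulator
    # ...run the episode and collect experience as usual...
```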
Alternatives to end-to-end learning
To give a sense for what other kinds of techniques exist, I want to highlight some methods that don’t belong to the end-to-end camp.
For example, the Dex-Net project tries to solve the problem of robot grasping for everyday objects using machine learning, but it isn’t end-to-end. Instead, the model’s primary goal is to predict the success of various kinds of grasps for a given object. While this network takes in sensory inputs, its output is not directly connected to motor torques, so it isn’t in the end-to-end learning family.
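The sketch below shows the general shape of such a system rather than Dex-Net’s actual architecture: a network scores candidate grasps, and separate, conventional code then plans and executes the best one.

```python
import torch
import torch.nn as nn

class GraspQualityNet(nn.Module):
    """Scores candidate grasps instead of emitting motor commands."""
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.scorer = nn.Sequential(
            nn.Linear(32 * 13 * 13 + 4, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),   # predicted probability the grasp succeeds
        )

    def forward(self, depth_image, grasp):
        feats = self.image_encoder(depth_image)
        return self.scorer(torch.cat([feats, grasp], dim=1))

net = GraspQualityNet()
depth = torch.randn(8, 1, 64, 64)   # the same depth image, repeated per candidate
grasps = torch.randn(8, 4)          # 8 candidate grasps: (x, y, depth, angle)
best = net(depth, grasps).argmax()  # pick the highest-scoring candidate
# A separate, hand-written motion planner then moves the arm to execute it.
```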
In a talk, “Feedback control from pixels,” given at MIT’s Embodied Intelligence Seminar, Prof. Russ Tedrake argues that an entirely end-to-end model isn’t the only way to solve complex robotics problems. To him, this approach sacrifices the rigor and deeper understanding of classical robotics, which is informed by control theory. In one project, his group uses linear models and control theory to get a robot to push chopped carrots on a cutting board.
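To give a flavor of that control-theoretic style, here is a toy example (unrelated to the actual carrot demo): a linear model of a point mass being pushed along a surface, stabilized by a linear-quadratic regulator (LQR) computed directly from the model.

```python
import numpy as np

# Linear model of a point mass pushed along one axis: state = [position, velocity].
dt = 0.05
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.0],
              [dt]])              # control input = force on the mass
Q = np.diag([10.0, 1.0])          # penalize position and velocity error
R = np.array([[0.1]])             # penalize control effort

# Finite-horizon LQR: a backward Riccati recursion yields a feedback gain K.
P = Q
for _ in range(200):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

# Closed-loop simulation: the controller u = -K x drives the state to zero.
x = np.array([[0.5], [0.0]])      # start 0.5 m away from the target
for _ in range(100):
    u = -K @ x
    x = A @ x + B @ u
print(x.ravel())                  # should now be very close to [0, 0]
```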
In short, imagine a spectrum of robotics software. On one end, we have robots that are hardwired by humans to perform specific tasks. On the other end, we have robots that train themselves end-to-end on whatever task they’re assigned. Roboticists will need to figure out which segment of this spectrum is likely to be most fruitful.
Robots of Tomorrow
Let’s assume for the sake of argument that end-to-end learning for robotics is worthwhile. It is by no means an approach that everyone in the robotics community agrees with, but it opens up many fascinating possibilities for deploying robots.
For instance, suppose we discover that a certain neural network architecture works really well on three or four different robotic tasks. That would be promising, because we could then reuse that architecture to help robots acquire new skills quickly.
We might also start thinking about how robots can learn from one another. For example, is it possible for a humanoid robot in a factory to learn to assemble cars better using data collected from a Spot robot in an office space?
Whether or not end-to-end learning becomes the future of robotics, one thing is for sure: robot software has its own constraints and limitations that make it more complex than software that isn’t embodied. At a high level, roboticists are trying to build, from the ground up, the kind of dynamic behavior that animals exhibit. For that reason, roboticists will always have to think a little differently to apply artificial intelligence in their work.