Now imagine teaching the machines something as complicated as dribbling—which is exactly what researchers at Carnegie Mellon University and a startup called DeepMotion have done. Using motion-capture technology, they’ve shown an algorithm generally how humans move when they dribble. Then, thanks to a process called reinforcement learning, a simulated basketball player can teach itself through trial and error how to finely manipulate the ball, both while stationary and while running. It’s taught itself to expertly do what would thoroughly embarrass an … underactive type like myself.
The researchers began by putting people in motion-capture suits to watch them dribble. This gave the reinforcement learning algorithms a good head start. You could try to have an avatar learn from scratch: First to stand, then to walk, then to run, then to manipulate a ball. To do that, you give the system a goal—say, move forward as fast as possible—and it tries movements at random. If the avatar does something that gets it closer to its goal, like combining random movements in order to stand, it gets points. If it does something dumb, it gets dinged. With a point system like this, over time it teaches itself how to run.
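To make that point system concrete, here is a minimal sketch of trial-and-error learning. It is a toy, not the researchers' code: the "avatar" is just a marker on a six-cell track, and the function names, the track, and the learning settings are all assumptions made for illustration. The technique is tabular Q-learning, a textbook form of reinforcement learning. The marker earns a point for each step that moves it forward and gets dinged for anything else.

```python
import random

ACTIONS = (-1, +1)  # step backward, step forward


def train_runner(track_length=6, episodes=300, seed=0):
    """Toy tabular Q-learning: learn to reach the last cell of a 1-D track."""
    rng = random.Random(seed)
    goal = track_length - 1
    # One score per (position, action) pair, all starting at zero.
    q = {(s, a): 0.0 for s in range(track_length) for a in ACTIONS}
    alpha, gamma, epsilon = 0.5, 0.9, 0.2  # assumed learning settings

    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap on steps per episode
            # Mostly pick the best-known move, sometimes try one at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state = min(max(state + action, 0), goal)
            # Points for progress, a ding for anything else.
            reward = 1.0 if next_state > state else -1.0
            if next_state == goal:
                best_next = 0.0
            else:
                best_next = max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if state == goal:
                break
    return q


def greedy_policy(q, track_length=6):
    """Best-known move at every cell short of the goal."""
    return [max(ACTIONS, key=lambda a: q[(s, a)])
            for s in range(track_length - 1)]
```

Calling `greedy_policy(train_runner())` returns the move the marker has come to prefer at each cell. A real avatar has dozens of joints and a continuous range of motions rather than two moves on a line, which is why flailing from scratch gets expensive fast.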
That’s not a good way to go about it in this case, though. “If you’re trying to do something easy, then maybe you can just explore the space and flail around much like a baby does as it’s sort of figuring out how to grab things and so on,” says CMU roboticist Jessica Hodgins, who helped develop the system. “But it doesn’t make sense in this complicated space of doing something that requires as much agility as basketball dribbling.”
So instead of starting from scratch, the motion-capture information allows the avatar to mimic a dribbling human’s body movement. What the researchers couldn’t capture, though, was the ball itself—it moves too fast, and you can’t stick trackers on it. They had to add the ball into the simulation, and let the avatar play with it through reinforcement learning, or trial and error.
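One common way to blend imitation with trial and error is to score the avatar on two things at once: how closely its pose matches the motion-capture frame, and how well it handles the ball. The sketch below illustrates that idea only. It is not the reward the CMU and DeepMotion team used, and every name, weight, and formula in it is a hypothetical stand-in.

```python
import math


def imitation_reward(pose, reference_pose, sharpness=2.0):
    """1.0 when joint angles match the mocap frame, toward 0 as they drift."""
    squared_error = sum((p - r) ** 2 for p, r in zip(pose, reference_pose))
    return math.exp(-sharpness * squared_error)


def ball_reward(ball_height, hand_height, sharpness=5.0):
    """Hypothetical ball term: 1.0 when the ball meets the hand."""
    return math.exp(-sharpness * (ball_height - hand_height) ** 2)


def total_reward(pose, reference_pose, ball_height, hand_height,
                 w_imitate=0.6, w_ball=0.4):
    """Weighted blend of the two terms; the weights are assumed."""
    return (w_imitate * imitation_reward(pose, reference_pose)
            + w_ball * ball_reward(ball_height, hand_height))
```

With a score like this, the motion-capture data pins down the body while trial and error is left to sort out the part nobody recorded: the ball.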
Take a look at the GIF above. The avatar’s dribbling starts out awkward but soon improves. “You’re reinforcing the behaviors that you want and then negatively reinforcing the behaviors that you don’t want,” says Hodgins. “You’re doing that by running many, many trials and having the system learn through those trials to be more robust to different kinds of situations.”
Had the researchers dropped an avatar into a simulation with a perfectly tracked ball, that might have worked fine. But as soon as they changed something about the environment, like the flatness of the court, the avatar would have fallen to pieces.
Conversely, because it’s learning on its own to manipulate the ball—with the boost of already knowing how the rest of its body should be moving—it can then adapt to, say, a court that isn’t perfectly flat. It’s “robust,” as computer scientists say.
The adaptable avatar can even learn to dribble as it runs, via the same process. (Above, it loses the ball at first but then improves.) And because it copes better with perturbations in its environment, the researchers can give it a digital “push” as it moves across the court, and still it dribbles. Until, well, it falls on its face, as you can see below.