“Open the pod bay doors, HAL.”
Few other movie lines have captured the horror of artificial intelligence run amok more than HAL’s reply in “2001: A Space Odyssey,” when the sentient computer says regretfully, “I’m sorry, Dave, I’m afraid I can’t do that.”
The computer, capable of interpreting and expressing emotion (and committing murder, as it turns out), may be a bit on the extreme side when it comes to ideas of future robot-human collaborations. But Clifford Nass, director of the Communication Between Humans and Interactive Media Lab at Stanford University, is a big believer in the power of emotion-laden speech to smooth the way toward improved human-machine relations.
HAL notwithstanding, Nass has wondered whether disagreement in human-robot interactions might actually be fruitful. But how would a robot working with a moody astronaut successfully point out that the human was welding the wrong part of a moon-based colony, for example? In other words, as he said during last month’s Council for the Advancement of Science Writing annual conference in Palo Alto, Calif., “Can robots disagree and can they do that in such a way that we don’t hate them?”
As a social species, humans are primed from an early age to quickly detect emotion in speech, whether aggression, fear, happiness, boredom or other states signaled by easily identifiable combinations of pitch, range, speed or volume. For computers, the task is exceedingly difficult.
“On the simple level, listening is a way to gather data and speaking is a way to transmit data,” Nass said. “So social rules and expectations should be irrelevant to technology.”
Only they aren’t. As annoyed as we often are with the synthetic voice telling us our airline reservation request wasn’t understood, studies suggest we routinely respond by saying “please” and “thank you.”
Illah Nourbakhsh, an associate professor of robotics at The Robotics Institute of Carnegie Mellon University in Pittsburgh, Pa., admits that he even thanks himself while leaving to-do memos on his voicemail.
In a telephone interview, Nourbakhsh said some researchers have treated robots almost like aliens arriving from a distant planet and are approaching the problem of human-robot communication and collaboration by trying to install a new framework for mutual understanding.
Scientists like Nass, he said, are instead beginning with the idea that we have an inborn ability to recognize and respond to cues like emotion in faces and voices, and are designing systems with the idea that humans shouldn’t have to “read a whole new handbook on how to deal with robots.”
Under a collaborative project conducted at NASA Ames Research Center, Nourbakhsh and colleagues adopted a similar strategy to improve human-robot interactions with a future lunar colony in mind.
The idea of having robots do human-like things isn’t to try to recreate true anger or joy or sorrow, he said, but to create an interface that people quickly understand.
An emotional talking head
Among the many projects supported by a European Commission-backed program called Computers in the Human Interaction Loop (or CHIL), researchers at the Royal Institute of Technology in Stockholm, Sweden, have hewed to the same human-centered principle with their “Talking Head”: the animated head of a bald man capable of expressive visible speech with several emotional states.
But Nass’s studies on how humans interact with existing automated voice systems suggest how difficult it is to teach robots to pick their way through the minefield of human emotions.
Systems that consistently blame the user for misunderstandings, for example, are generally disliked — but deemed smart. Apologetic systems that consistently shoulder the blame are more likable, but considered dumb.
Beyond simple misunderstanding, outright disagreement could be particularly precarious for human-robot collaborations. So how can a robot disagree without getting socked?
Research suggests that humans can mitigate criticism by avoiding eye contact and distancing themselves from their critical comments — by lowering and softening their voice, for example.
To find out whether a robot could accomplish the same thing, Nass and his collaborators tried a variation of the classic “desert survival situation,” in which a human-robot team must choose between two objects that might help them survive in a hostile environment.
In some cases, the scientists programmed the robot to agree with its human partner’s suggestion to fetch a particular object — matches, for example. In other trials, the robot began walking toward the suggested item, only to stop, turn around, and suggest something else — a lantern, say.
When the robot was in agreement, the study participants preferred having their mechanical collaborator look right at them. During disagreements, however, their favorable impressions went up when the robot looked somewhere else.
Nass and his collaborators added another twist by manipulating where the robot’s speech originated: either from the robot itself or from a box a few feet away — a literal distancing of its voice.
When the robot agreed with its human partner’s choice and said so without distancing its voice, study participants were more likely to stick to their original picks. But when the robot disagreed while transferring its voice to the box, the participants were more persuadable regarding their choices.
Interestingly, people identified more with the dissenting robot when its voice originated from the box.
“Robots can provide contrary information — they are allowed to disagree, but it is very delicate,” Nass said. The key, he said, is distancing — an idea that also might lend itself to cars if a voice emanating from the backseat criticizes a driver for bad decisions (yes, a literal back-seat driver) while a more direct voice praises good driving skills.
Driven to distraction
In a similar vein, Nass began wondering how emotional speech might affect drivers. “Happy people benefit from happy people,” he said, but what about the cliché that misery loves company? “Totally wrong,” Nass said. “It loves miserable company.”
In a series of experiments conducted in 2005, he and collaborators from Stanford and Toyota first showed drivers a five-minute video that left them feeling either happy or upset, then had them use a driving simulator for 20 minutes.
As a “passenger,” a woman’s pre-recorded voice offered both comments and questions about the route (“My favorite part of this drive is the lighthouse”). Half the time, her voice was energetic. The other half, it was subdued.
The experiment showed that already happy drivers drove best and conversed more when the upbeat-sounding voice was along for the ride. Sad drivers, however, preferred the subdued voice, as suggested by a decreased number of accidents during the simulation.
“For sad drivers, they were actually distracted by the happy voice,” Nass said. And when asked, they clearly preferred the voice that more closely matched their somber mood. “It tells us that voices in cars manifest more than just content, they manifest emotion,” he said.
Installing an emotion-laden voice into car technology could be inexpensive and easy to implement, he said. Ah, but which voice? Drivers are emotionally fickle creatures, and who would take the blame if a cloyingly perky computer navigator led a suddenly depressed driver to distraction — and into a ditch?
Simulating a wide range of emotion in speech programs has become a major area of research for both industry and academic groups. The catch, according to Nourbakhsh, is that synthetic speech is much further along than computers’ ability to model the cognitive states of humans and to interact with them over time.
Automated call centers, for example, can be optimized so that an increasingly apologetic voice politely asks a misunderstood caller to try again. Perhaps the method could encourage more people to remain on the line instead of hanging up in frustration, but Nourbakhsh cautions that an over-reliance on the technique could hide deeper problems.
“We’re using an emotional short-circuiting to get over the fact that our voice-recognition stinks,” he said. “Do we really want to invent a back-story so that it’s OK for the robot to be wrong, when we really want to make them better?”
When the robot is actually right, communication can be even trickier. “It’s an interesting question as to when and how you should correct a human,” he said. “There are some obvious lines in the sand that you should draw.”
Trivial corrections are likely to quickly strain relations, but as the danger increases — a driver is driving the wrong way on a highway, for example — a robot would need to weigh the emotional impact on the driver against the consequences of not intervening.
The trick, Nourbakhsh said, is to make the system as cognitively nondisruptive as possible.
“In other words, if you’re kind of a quiet person and it blares at you, that’s bad.”
For toys, the dissonance may not matter. But until researchers find the proper balance, Nourbakhsh said, the use of voice — and especially of emotional voice — will likely proceed more cautiously in products like automobiles.
“It’s not going to be like Robin Williams in your car,” he said.