ChatGPT used by mental health tech app in AI experiment with users

Jan. 14, 2023, 2:10 PM UTC

When people log in to Koko, an online emotional support chat service based in San Francisco, they expect to swap messages with an anonymous volunteer. They can ask for relationship advice, discuss their depression or find support for nearly anything else — a kind of free, digital shoulder to lean on.

But for a few thousand people, the mental health support they received wasn’t entirely human. Instead, it was augmented by robots.

In October, Koko ran an experiment in which GPT-3, a newly popular artificial intelligence chatbot, wrote responses either in whole or in part. Humans could edit the responses and were still pushing the buttons to send them, but they weren’t always the authors.

About 4,000 people got responses from Koko at least partly written by AI, Koko co-founder Robert Morris said.

The experiment on the small and little-known platform has blown up into an intense controversy since he disclosed it a week ago, in what may be a preview of more ethical disputes to come as AI technology works its way into more consumer products and health services.

Morris thought it was a worthwhile idea to try because GPT-3 is often both fast and eloquent, he said in an interview with NBC News.

“People who saw the co-written GTP-3 responses rated them significantly higher than the ones that were written purely by a human. That was a fascinating observation,” he said.

Morris said that he did not have official data to share on the test.

Once people learned the messages were co-created by a machine, though, the benefits of the improved writing vanished. “Simulated empathy feels weird, empty,” Morris wrote on Twitter.

When he shared the results of the experiment on Twitter on Jan. 6, he was inundated with criticism. Academics, journalists and fellow technologists accused him of acting unethically and tricking people into becoming test subjects without their knowledge or consent when they were in the vulnerable spot of needing mental health support. His Twitter thread got more than 8 million views.

Senders of the AI-crafted messages knew, of course, whether they had written or edited them. But recipients saw only a notification that said: “Someone replied to your post! (written in collaboration with Koko Bot)” without further details of the role of the bot.

In a demonstration that Morris posted online, GPT-3 responded to someone who spoke of having a hard time becoming a better person. The chatbot said, “I hear you. You’re trying to become a better person and it’s not easy. It’s hard to make changes in our lives, especially when we’re trying to do it alone. But you’re not alone.”

No option was provided to opt out of the experiment aside from not reading the response at all, Morris said. “If you got a message, you could choose to skip it and not read it,” he said.

Leslie Wolf, a Georgia State University law professor who writes about and teaches research ethics, said she was worried about how little Koko told people who were getting answers that were augmented by AI.

“This is an organization that is trying to provide much-needed support in a mental health crisis where we don’t have sufficient resources to meet the needs, and yet when we manipulate people who are vulnerable, it’s not going to go over so well,” she said. People in mental pain could be made to feel worse, especially if the AI produces biased or careless text that goes unreviewed, she said.

Now, Koko is on the defensive about its decision, and the whole tech industry is once again facing questions over the casual way it sometimes turns unassuming people into lab rats, especially as more tech companies wade into health-related services.

Congress mandated the oversight of some tests involving human subjects in 1974 after revelations of harmful experiments including the Tuskegee Syphilis Study, in which government researchers denied proper treatment to Black men with syphilis and some of the men died. As a result, universities and others who receive federal support must follow strict rules when they conduct experiments with human subjects, a process enforced by what are known as institutional review boards, or IRBs.

But, in general, there are no such legal obligations for private corporations or nonprofit groups that don’t receive federal support and aren’t looking for approval from the Food and Drug Administration.

Morris said Koko has not received federal funding.

“People are often shocked to learn that there aren’t actual laws specifically governing research with humans in the U.S.,” Alex John London, director of the Center for Ethics and Policy at Carnegie Mellon University and the author of a book on research ethics, said in an email.

He said that even if an entity isn’t required to undergo IRB review, it ought to in order to reduce risks. He said he’d like to know which steps Koko took to ensure that participants in the research “were not the most vulnerable users in acute psychological crisis.”

Morris said that “users at higher risk are always directed to crisis lines and other resources” and that “Koko closely monitored the responses when the feature was live.”

After the publication of this article, Morris said in an email Saturday that Koko was now looking at ways to set up a third-party IRB process to review product changes. He said he wanted to go beyond current industry standard and show what’s possible to other nonprofits and services.

There are infamous examples of tech companies exploiting the oversight vacuum. In 2014, Facebook revealed that it had run a psychological experiment on 689,000 people showing it could spread negative or positive emotions like a contagion by altering the content of people’s news feeds. Facebook, now known as Meta, apologized and overhauled its internal review process, but it also said people should have known about the possibility of such experiments by reading Facebook’s terms of service — a position that baffled people outside the company due to the fact that few people actually have an understanding of the agreements they make with platforms like Facebook.

But even after a firestorm over the Facebook study, there was no change in federal law or policy to make oversight of human subject experiments universal.

Koko is not Facebook, with its enormous profits and user base. Koko is a nonprofit platform and a passion project for Morris, a former Airbnb data scientist with a doctorate from the Massachusetts Institute of Technology. It’s a service for peer-to-peer support — not a would-be disrupter of professional therapists — and it’s available only through other platforms such as Discord and Tumblr, not as a standalone app.

Koko had about 10,000 volunteers in the past month, and about 1,000 people a day get help from it, Morris said.

“The broader point of my work is to figure out how to help people in emotional distress online,” he said. “There are millions of people online who are struggling for help.”

There’s a nationwide shortage of professionals trained to provide mental health support, even as symptoms of anxiety and depression have surged during the coronavirus pandemic.

“We’re getting people in a safe environment to write short messages of hope to each other,” Morris said.

Critics, however, have zeroed in on the question of whether participants gave informed consent to the experiment.

Camille Nebeker, a University of California, San Diego professor who specializes in human research ethics applied to emerging technologies, said Koko created unnecessary risks for people seeking help. Informed consent by a research participant includes at a minimum a description of the potential risks and benefits written in clear, simple language, she said.

“Informed consent is incredibly important for traditional research,” she said. “It’s a cornerstone of ethical practices, but when you don’t have the requirement to do that, the public could be at risk.”

She noted that AI has also alarmed people with its potential for bias. And although chatbots have proliferated in fields like customer service, it’s still a relatively new technology. This month, New York City schools banned ChatGPT, a bot built on the GPT-3 tech, from school devices and networks.

“We are in the Wild West,” Nebeker said. “It’s just too dangerous not to have some standards and agreement about the rules of the road.”

The FDA regulates some mobile medical apps that it says meet the definition of a “medical device,” such as one that helps people try to break opioid addiction. But not all apps meet that definition, and the agency issued guidance in September to help companies know the difference. In a statement provided to NBC News, an FDA representative said that some apps that provide digital therapy may be considered medical devices, but that per FDA policy, the organization does not comment on specific companies.

In the absence of official oversight, other organizations are wrestling with how to apply AI in health-related fields. Google, which has struggled with its handling of AI ethics questions, held a “health bioethics summit” in October with The Hastings Center, a bioethics nonprofit research center and think tank. In June, the World Health Organization included informed consent in one of its six “guiding principles” for AI design and use.

Koko has an advisory board of mental-health experts to weigh in on the company’s practices, but Morris said there is no formal process for them to approve proposed experiments.

Stephen Schueller, a member of the advisory board and a psychology professor at the University of California, Irvine, said it wouldn’t be practical for the board to conduct a review every time Koko’s product team wanted to roll out a new feature or test an idea. He declined to say whether Koko made a mistake, but said it has shown the need for a public conversation about private sector research.

“We really need to think about, as new technologies come online, how do we use those responsibly?” he said.

Morris said he has never thought an AI chatbot would solve the mental health crisis, and he said he didn’t like how it turned being a Koko peer supporter into an “assembly line” of approving prewritten answers.

But he said prewritten answers that are copied and pasted have long been a feature of online help services, and that organizations need to keep trying new ways to care for more people. A university-level review of experiments would halt that search, he said.

“AI is not the perfect or only solution. It lacks empathy and authenticity,” he said. But, he added, “we can’t just have a position where any use of AI requires the ultimate IRB scrutiny.”

If you or someone you know is in crisis, call 988 to reach the Suicide and Crisis Lifeline. You can also call the network, previously known as the National Suicide Prevention Lifeline, at 800-273-8255, text HOME to 741741 or visit SpeakingOfSuicide.com/resources for additional resources.

David Ingram

David Ingram covers tech for NBC News.