Playing roulette with race, gender, data and your face

After an art project showed how AI can categorize people in offensive ways, a major research database is discarding half its images of people.
The online art project ImageNet Roulette classified this image of NBC News reporter Alex Johnson as "draftsman, drawer: an artist skilled at drawing," which couldn't be further from the truth. But very different and often problematic descriptions come up when the image is of a woman or a person of color.NBC News
By Alex Johnson

A few months ago, an art project appeared online that offered people the chance to see just how one of the most widely used photo research databases would categorize them.

People can upload photos of themselves to the project, called ImageNet Roulette, where they're matched against the "people" categories of ImageNet, the 10-year-old photo database at the heart of some of the biggest machine learning efforts. The system then classifies people based on similar photos tagged in the database.

The results are frequently wacky — it thought Donald Trump Jr. was a "steward/flight attendant" — but also often offensive. As exhaustively aggregated on social media since the project resurfaced recently, Caucasian-appearing people are generally classified in terms of jobs or other functional descriptors, while darker-skinned people — or even just dark pictures of anyone — are frequently described in terms of race.

Women, as well, are often classified by how the algorithm assesses their appearance.

The author of this article gave it a try, as seen in the illustration at the top, and was categorized as "draftsman, drawer: an artist skilled at drawing," which is about as wrong as it can be.

This is, in fact, the point of the project, created by Trevor Paglen, a noted technology artist who has received a MacArthur Fellowship "genius" grant.

Paglen and his co-creator, Kate Crawford, co-director of the AI Now Institute at New York University, say explicitly that the project is a "provocation designed to help us see into the ways that humans are classified in machine learning systems."

"That is by design: We want to shed light on what happens when technical systems are trained on problematic training data," Paglen and Crawford say. "AI classifications of people are rarely made visible to the people being classified. ImageNet Roulette provides a glimpse into that process — and to show the ways things can go wrong."

ImageNet, the giant image database used by the project, hasn't directly addressed the web tool. But as the tool went viral, reinvigorating the debate around the development of artificial intelligence systems and the biases that can be introduced through existing datasets, ImageNet announced this week that it would scrub more than half of the 1.2 million pictures of people cited in its sprawling collection.

"Science progresses through trial and error, through understanding the limitations and flaws of past results," ImageNet said in a statement. "We believe that ImageNet, as an influential research dataset, deserves to be critically examined, in order for the research community to design better collection methods and build better datasets."

Why now?

ImageNet Roulette returned to wide attention on Monday in connection with an exhibit called "Training Humans," which opened last week at Fondazione Prada, a modern art museum in Milan, Italy.


Paglen and Crawford say they don't generate the offensive descriptions, which they say come solely from the language categories that ImageNet uses.

It's the same language structure that ImageNet uses to catalog all of its 14 million images into 22,000 visual categories — the same language structure that has influenced the work of research teams from some of the biggest names in technology, including Google and Microsoft, which have used it in competitions to refine the algorithms driving their own object recognition systems.

Concern that such programs can embed racial and gender bias in artificial intelligence systems has been at the forefront of the AI debate in recent months as companies and law enforcement agencies increasingly adopt facial recognition technologies to identify everyday people with greater accuracy.

Last year, the American Civil Liberties Union, or ACLU, used Amazon's technology, called Rekognition, to build a database of 25,000 publicly available arrest photos. It then ran the official photos of all 535 members of Congress against the database — which, it said, identified 28 of the lawmakers as other people who had been arrested for alleged crimes.

Facial recognition surveillance by governments and large institutions "threatens to chill First Amendment-protected activity like engaging in protest or practicing religion, and it can be used to subject immigrants to further abuse from the government," the ACLU said.

Rep. Alexandria Ocasio-Cortez, D-N.Y., has been sounding similar alarms throughout this year.

In January, Ocasio-Cortez pointed out that facial recognition algorithms "always have these racial inequities that get translated, because algorithms are still made by human beings, and those algorithms are still pegged to basic human assumptions."

In May, in questioning AI experts at a hearing of the House Oversight and Government Reform Committee, she elicited testimony that today's facial recognition technology is ineffective, to a statistically significant extent, in recognizing anyone other than white men:

"So, we have a technology that was created and designed by one demographic that is only mostly effective on that one demographic, and they're trying to sell it and impose it on the entirety of the country," she said.

'Junk science' or a prod to Silicon Valley's conscience?

ImageNet Roulette would appear to substantiate that assertion, and to that extent, it accomplishes its goals in a vivid manner.

But notwithstanding how the project has been described in publicity materials and news reports this week, ImageNet Roulette isn't itself a sophisticated artificial intelligence system. It's an art project whose creators wrote their own algorithms to tell ImageNet how to process photos, and like any algorithm, those are subject to whatever biases their coders share.

Moreover, ImageNet is primarily intended to be used in recognizing and classifying objects, not people. It said using ImageNet to classify people has always been "problematic and raises important questions about fairness and representation," suggesting that projects like ImageNet Roulette aren't a rigorous test.

Other AI experts raised similar doubts.

Peter Skomoroch, an AI venture capital investor and former principal data scientist at LinkedIn, went so far as to call ImageNet Roulette "junk science," writing on Twitter: "We can and do examine these issues using real machine learning systems. That's not what is happening here.

"Intentionally building a broken demo that gives bad results for shock value reminds me of Edison's war of the currents."

(Skomoroch was referring to the campaign in the late 1880s by Thomas Edison, an advocate of using direct current systems, or DC, to deliver electricity, to discredit Nikola Tesla's alternating current system, or AC, which powers the United States' electric grid today.)

Paglen and Crawford couldn't be reached directly for comment, but they've been discussing ImageNet Roulette widely online this week as their exhibit opens in Milan.

In a 7,000-word essay they posted Wednesday, Paglen and Crawford said their purpose wasn't to discredit AI and facial recognition technologies.

Instead, they said, it was to demonstrate to everyday people that the algorithms used to train such systems — the rules the systems follow — are fundamentally flawed because they're written by people, and people are flawed.

"ImageNet is an object lesson, if you will, in what happens when people are categorized like objects," they wrote. "And this practice has only become more common in recent years, often inside the big AI companies, where there is no way for outsiders to see how images are being ordered and classified."

That's a valid criticism when it comes to ImageNet, even though it's considered to be among the most reliable and vital databases used to train object recognition systems.

ImageNet was built beginning in 2009 using a catalog of descriptive labels created by WordNet, an academic database designed in 1985 to slot all of the nouns, verbs, adjectives and adverbs in English into categories called synonym sets, or "synsets."

The word "dog," for example, is assigned to sets related to canines, carnivores, mammals, vertebrates, animals and so forth. It pops up in categories related to wildlife and sports ("sled dog" and "sleigh dog"), food ("frankfurter" and "hot dog"), smithwork ("dog-iron" and "firedog," which are other words for "andiron") and pursuit ("to dog," or to chase after).

Because WordNet is value-neutral, it seeks to recognize all synsets that a word like "dog" can fit into, and not all of those sets are politically palatable — "dog" also shows up in sets related to women's appearances ("frump, dog: a dull unattractive unpleasant girl or woman").

Because WordNet lists such meanings, they're picked up by ImageNet and, in turn, by ImageNet Roulette. Shift your attention to words that can relate to race, gender and the like, and you can quickly see where things go wrong.
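The synset structure described above can be sketched in a few lines of Python. This is a toy model with hand-picked entries — not the real WordNet data or its API — meant only to illustrate how a single word fans out into many categorized senses, offensive ones included:

```python
# A toy model of WordNet-style "synsets": hand-picked senses of "dog,"
# illustrating the structure, not reproducing the real WordNet database.
TOY_SYNSETS = {
    "dog": [
        ("dog, domestic dog, Canis familiaris", "canine > carnivore > mammal"),
        ("frank, frankfurter, hotdog", "sausage > food"),
        ("andiron, firedog, dog, dog-iron", "support > device"),
        ("chase, chase after, trail, tail, tag, dog", "pursue > follow"),
        ("frump, dog", "unpleasant woman > unpleasant person"),  # the offensive sense
    ],
}

def senses(word):
    """Return every (synset, hypernym chain) pair the word belongs to."""
    return TOY_SYNSETS.get(word, [])

# A system that labels images with every sense of a word inherits the
# offensive senses right along with the innocuous ones.
for synset, chain in senses("dog"):
    print(f"{synset:45} <- {chain}")
```

The real WordNet can be queried in much the same way (for instance, through NLTK's `wordnet` corpus reader), and a labeling pipeline that consumes every sense uncritically inherits the "frump, dog" sense along with the rest.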

Paglen and Crawford contend that datasets like ImageNet "aren't simply raw materials to feed algorithms, but are political interventions," because "at the image layer of the training set, like everywhere else, we find assumptions, politics and worldviews."

Racial assumptions in data systems, in particular, "hark back to historical approaches where people were visually assessed and classified as a tool of oppression and race science," they wrote.

ImageNet said this week that it recognizes that "WordNet contains offensive synsets that are inappropriate to use as image labels." Specifically, 437 subcategories of the "people" set are "unsafe" (that is, offensive regardless of context), and 1,156 more are "sensitive" (meaning they're offensive depending on the context).

ImageNet said it has been working on the problem for a year and is removing all 1,593 "unsafe" and "sensitive" subcategories. And it said it's removing its database links to all of the photos in those subsets — wiping out 600,040 of the images in the "people" set and leaving only 577,244 intact, or fewer than half.
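The figures in ImageNet's announcement are internally consistent, as a quick back-of-the-envelope check shows (the numbers are from ImageNet's statement; the variable names are mine):

```python
# Arithmetic check of the figures ImageNet reported.
unsafe = 437          # subcategories offensive regardless of context
sensitive = 1_156     # subcategories offensive depending on context
removed_subsets = unsafe + sensitive            # subcategories being dropped

removed_images = 600_040
remaining_images = 577_244
people_total = removed_images + remaining_images  # the ~1.2 million "people" images

print(removed_subsets)                  # 1593
print(people_total)                     # 1177284
print(remaining_images / people_total)  # about 0.49 -- "fewer than half" remain
```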

"Finally, our effort remains a work in progress," the project wrote. "Our research report is awaiting peer review and we will share it shortly. We welcome input and suggestions from the research community and beyond on how to build better and fairer datasets for training and evaluating AI systems."