Here's DALL-E: An algorithm learned to draw anything you tell it

Jan. 27, 2021, 9:18 PM UTC

By Melanie Ehrenkranz

OpenAI, one of the industry leaders in artificial intelligence development, released evidence in early January of a leap forward of its capabilities: An illustration of a baby daikon radish in a tutu walking a dog.

Also, a bunny in pajamas watching TV, a shrimp in a suit using a calculator and a variety of other bizarre combinations — all drawn by its new series of algorithms called DALL-E. The program can generate a variety of drawings and pictures based on simple text prompts. In other examples, the system generated a series of realistic looking pictures based on the prompt "a store front that has the word ‘openai’ written on it."

The drawings may look simple (some are better than others), but it's the kind of progress that highlights how artificial intelligence is continuing to gain humanlike capabilities.

It's also a cause for concern — that these programs can learn human biases.

“Text-to-image is very powerful in that it gives one the ability to express what they want to see in language,” said Mark Riedl, associate professor at the Georgia Tech School of Interactive Computing. “Language is universal, whereas artistic ability to draw is a skill that must be learned over time. If one has an idea to create a cartoon character of Pikachu wielding a lightsaber, that might not be something someone can sit down and draw even if it is something they can explain.”

OpenAI found that DALL-E is sometimes able to transfer some human activities and articles of clothing to animals and inanimate objects, such as food items. Here the text prompt was "an illustration of a baby daikon radish in a tutu walking a dog."OpenAI

DALL-E, which the company says is a portmanteau combining the name of the Spanish artist Salvador Dalí and the Pixar character WALL-E, is the second piece of technology from OpenAI in less than a year to draw the attention of technologists. In May, the company released Generative Pre-trained Transformer 3, or GPT-3, one of the most impressive and humanlike text generators, which with a prompt of just a few words can generate coherent essays.

OpenAI has said both DALL-E and GPT-3 are trained on massive datasets including public information on Wikipedia and are built on the transformer neural network model, which was first announced in December 2017 and has been lauded as "particularly revolutionary in natural language processing." The company has made public enough information to have a basic understanding on how DALL-E works, but the exact details of the data it was trained on remain unknown.

And therein lies the concern about that cute baby radish and other forms of media created by these systems. Academics and technology watchdogs have in recent years warned that the data used to train these systems can contain societal biases that end up in the output of these systems.

That might not have major societal ramifications for a drawing of a radish, but algorithmic bias has already begun to show up in algorithms that have powered crucial decisions such as predicting criminal behavior and grading high-level placement exams.

A study published this month by researchers from Stanford and McMaster universities found that GPT-3 was persistently biased against Muslims. In nearly a quarter of the study’s test cases, “Muslim” was correlated to “terrorist.”

“While these associations between Muslims and violence are learned during pre-training, they do not seem to be memorized,” the researchers wrote, “rather, GPT-3 manifests the underlying biases quite creatively, demonstrating the powerful ability of language models to mutate biases in different ways, which may make the biases more difficult to detect and mitigate.”

Software capable of generating an image from text isn’t new, but to date it’s been either confined to a limited genre ( such as birds and flowers or even just birds) or pretty wonky. DALL-E is impressive for its ability to blend together relatively complex concepts.

Like a snail made of a harp.

OpenAI found that DALL-E can generate animals synthesized from a variety of concepts, including musical instruments, foods, and household items. While not always successful, they found that DALL-E sometimes takes the forms of the two objects into consideration when determining how to combine them. For example, when prompted to draw "a snail made of harp," it sometimes relates the pillar of the harp to the spiral of the snail's shell.OpenAI

OpenAI's DALL-E generator is publicly available in a demo online but is limited to phrases chosen by the company. While the illustrated successes are undoubtedly impressive and accurate, it’s hard to know the weaknesses and ethical concerns of the model without being able to test a range of words and concepts on it.

“We do not know if the restricted demo prevents us from seeing more problematic results,” Riedl said. “In some cases, the full prompt used to generate the images is obscured as well. There is an art to phrasing prompts just right and results will be better if the phrase is one that triggers the system to do better.”

There are, of course, societal implications, both from malicious use cases of the technology or unintended biases. OpenAI said in its blog post that models like these have the power to harm society and that it has future plans to look at how DALL-E might contribute to them.

“Bias and misuse are important, industrywide problems that OpenAI takes very seriously as part of our commitment to the safe and responsible deployment of AI for the benefit of all of humanity,” an OpenAI spokesperson said. “Our policy and safety teams are closely involved with research on DALL-E.”

There are a number of positive creative potentials should DALL-E work across a broad range of blended concepts and generate images free of bias and discrimation. Namely, it lets people create a specific image tailored to their needs without having to learn certain skills, enabling a larger population of creators without automating skilled artists out of a job.

“I do not believe the output of DALL-E is of high enough quality to replace, for example, illustrators, though it could speed up this type of work,” Riedl said.

Speeding up work, however, comes with its own set of issues. While DALL-E might not put animators out of work, powerful new software also tends to be ripe for exploitation.

Riedl noted a few examples, including the generation of pornographic content. Deepfake technology that can seamlessly put faces of one human on another has been used to generate inauthentic media without the consent of people featured in it. Riedl also said people can use keywords and phrases to create images “that are meant to be threatening, disrespectful or hurtful.”

OpenAI said it has kept DALL-E from public use in an effort to make sure its new technology isn't used for nefarious means.

"We are committed to conducting additional research and we would not make DALL-E generally available before building in safeguards to mitigate bias and address other safety concerns," the company said.

Melanie Ehrenkranz

Melanie Ehrenkranz is a reporter focused on technology and culture.