In a recent episode, I spoke about the new artificial intelligence tool, GPT-3. The AI community was blown away by how well the tool could generate text, but what if AI could create images as well?
Just months after GPT-3 was announced, OpenAI unveiled a brand-new tool called DALL·E. While GPT-3 is shockingly good at generating text, DALL·E can transform text into images instead.
DALL·E is a breathtaking breakthrough in AI that can manipulate visual concepts through language to create images.
But how was DALL·E created, and what connection does it have to GPT-3? Find out in this episode of Short and Sweet AI.
To learn more about DALL·E, how it all works, and what it can do, keep reading or listen to the podcast episode below.
The name DALL·E is a blend of the surrealist artist Salvador Dalí and Pixar's animated robot WALL·E.
What DALL·E does is simple yet revolutionary. DALL·E is a natural extension of GPT-3: it is a 12-billion-parameter version of GPT-3, trained on a large dataset of text-image pairs rather than on text alone.
DALL·E takes a text prompt and responds not with words, as GPT-3 would, but with images. It's a very powerful text-to-image technology.
DALL·E can create an image of almost anything you can imagine, whether it exists or not. That's because DALL·E doesn't just recognize images; it draws them. As a result, the tool can come up with some really fun and unique designs.
For example, you could give it the text prompt “an armchair in the shape of an avocado,” and it will create images to match. In fact, DALL·E did just that: you can find a selection of its avocado chair images on the OpenAI website.
DALL·E can create both images of things that already exist and concepts that don’t. For example, you can find ordinary photographs of a capybara on the OpenAI website, created by DALL·E. But you can also find more surreal images of a radish in a tutu walking a dog. It seems anything is possible with DALL·E.
How Does it Work?
While text-to-image algorithms are not new, earlier systems were largely limited to narrow domains such as images of birds and flowers. DALL·E goes far beyond that thanks to its GPT-3-style neural network, which treats a caption and an image as one sequence of tokens and learns to model the two together.
DALL·E combines the language understanding it inherits from GPT-3 with this underlying structure to create images from text.
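To make the “one sequence” idea more concrete: OpenAI describes DALL·E as receiving a caption and an image as a single stream of up to 1,280 tokens, with up to 256 tokens for the text and 1,024 for the image, which is first compressed into a 32×32 grid of codes. The Python sketch below only illustrates how such a stream could be packed. The vocabulary size, the padding id, and the `build_stream` helper are hypothetical choices for illustration, not OpenAI's code.

```python
# Hypothetical sketch: pack caption tokens and image codes into one flat
# sequence, in the spirit of DALL·E's single token stream.
# All constants below are illustrative assumptions, not OpenAI's values or code.
TEXT_LEN = 256       # maximum number of caption tokens
IMAGE_GRID = 32      # image represented as a 32x32 grid of codes
TEXT_VOCAB = 16384   # assumed size of the text token vocabulary
PAD_ID = 0           # hypothetical padding id (a real model would reserve its own)


def build_stream(text_tokens, image_codes):
    """Return padded caption tokens followed by shifted image tokens."""
    if len(text_tokens) > TEXT_LEN:
        raise ValueError("caption is longer than TEXT_LEN tokens")
    if len(image_codes) != IMAGE_GRID * IMAGE_GRID:
        raise ValueError("expected a full 32x32 grid of image codes")
    # Pad the caption to a fixed length so the image always starts at position 256.
    padded_text = list(text_tokens) + [PAD_ID] * (TEXT_LEN - len(text_tokens))
    # Shift image codes past the text vocabulary so text and image
    # tokens never share an id in the combined vocabulary.
    shifted_image = [TEXT_VOCAB + code for code in image_codes]
    return padded_text + shifted_image


stream = build_stream([17, 512, 9], [5] * (IMAGE_GRID * IMAGE_GRID))
print(len(stream))  # 1280
```

During training, the model learns to predict each token from the ones before it. At generation time, only the text part is supplied, the model predicts the 1,024 image tokens one at a time, and a separate decoder turns that grid of codes back into pixels.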
Each time the tool generates images, it produces a large selection of candidates. A separate OpenAI model called CLIP then ranks those images to determine which ones best match the text.
As a result, the images you see are much more relevant to the text and can reflect a blend of more complex concepts.
This is what made DALL·E the most impressive and realistic text-to-image system of its time.
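The reranking step described above can be sketched roughly like this. CLIP maps a caption and an image into a shared embedding space and scores how well they match, for instance by cosine similarity. In this minimal Python sketch, the embeddings are made-up three-number vectors standing in for CLIP's real encoders, and `rerank` is a hypothetical helper; it only shows the idea of scoring every candidate against the caption and keeping the best ones.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(text_embedding, image_embeddings, top_k):
    """Return indices of the top_k candidate images, best match first."""
    scored = [(cosine(text_embedding, emb), idx)
              for idx, emb in enumerate(image_embeddings)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:top_k]]


# Made-up embeddings standing in for CLIP's caption and image encoders.
caption = [1.0, 0.0, 0.5]
candidates = [[0.9, 0.1, 0.4], [0.0, 1.0, 0.0], [1.0, 0.0, 0.6]]
print(rerank(caption, candidates, top_k=2))  # [2, 0]
```

Here candidate 2 scores highest, candidate 0 comes second, and candidate 1, which points in an unrelated direction, is dropped.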
Unintended but Useful Behaviors
Like GPT-3, DALL·E also surprised its developers with some unintended but useful behaviors. DALL·E shows what OpenAI calls “zero-shot visual reasoning,” a form of zero-shot learning, or ZSL for short. ZSL is the ability of a model to perform tasks it wasn't specifically trained to do.
GPT-3 managed to write computer code even though it wasn't specifically trained for coding. Similarly, DALL·E was trained only to generate images from captions, yet in some cases it could transform existing images, for example by turning them into sketches.
Another surprising talent of DALL·E was that it could render custom text, such as the words on a street sign. In a sense, DALL·E can act like an AI Photoshop.
DALL·E also shows an understanding of complex visual concepts, which means it can, in a sense, answer questions visually.
During tests, DALL·E was shown an incomplete grid of images following a hidden pattern, similar to the Raven's Progressive Matrices used in IQ tests, and asked to complete it. In many cases, DALL·E filled in the rest of the grid with images matching the pattern, even though it was never trained on such puzzles.
Creativity is a Measure of Intelligence
Many experts see language grounded in visual understanding as what makes DALL·E such a brilliant piece of AI technology.
The exciting thing about DALL·E is that it can take two unrelated concepts, such as “armchair” and “avocado,” and put them together into an image that makes complete sense. The ability to coherently blend concepts in this way is a stunning example of creativity.
In the AI world, creativity is one measure of human intelligence that is difficult to replicate in machines. In a sense, DALL·E stores information about our world and uses it in a very human-like way.
So this raises the question: is this how machine intelligence becomes human-like intelligence?