Microsoft is developing an artificial intelligence (AI) bot that draws what you describe to it. The bot is programmed to pay close attention to individual words when it generates images from caption-like text descriptions.
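The idea of paying attention to individual words can be illustrated with a toy sketch. It assumes that each word of the caption is represented by a small vector and that each region of the image being drawn has a "query" vector. A softmax over dot-product scores then decides how much each word contributes to that region. This is a generic word-attention sketch, not Microsoft's code; the function name `attend_to_words` and all vectors are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_words(region_query, word_vectors):
    """Return (weights, context) for one image region (hypothetical helper).

    weights[i] is how strongly the region attends to word i;
    context is the attention-weighted sum of the word vectors.
    """
    # Dot-product relevance score between the region and each word.
    scores = [sum(q * w for q, w in zip(region_query, word)) for word in word_vectors]
    weights = softmax(scores)
    dim = len(word_vectors[0])
    context = [
        sum(weights[i] * word_vectors[i][d] for i in range(len(word_vectors)))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical 2-D embeddings for the caption "red bird".
words = ["red", "bird"]
word_vectors = [[1.0, 0.0], [0.0, 1.0]]
# A region whose query vector points toward "red" (e.g. the bird's wing).
weights, context = attend_to_words([2.0, 0.0], word_vectors)
for word, w in zip(words, weights):
    print(f"{word}: {w:.3f}")  # prints "red: 0.881", then "bird: 0.119"
```

In a real model the vectors are learned, and every image region computes its own weights, so different parts of the picture can focus on different words of the same caption.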
The technology, which the researchers simply call the drawing bot, can generate images of everything from ordinary pastoral scenes, such as grazing livestock, to the absurd, such as a floating double-decker bus. Each image contains details that are absent from the text description, suggesting that the AI has a form of artificial imagination.
“If you go to Bing and you search for a bird, you get a bird picture. But here, the pictures are created by the computer, pixel by pixel, from scratch. These birds may not exist in the real world; they are just an aspect of our computer’s imagination of birds,” Xiaodong He of Microsoft’s research lab said in a blog post late on Thursday.
According to results on an industry-standard test, reported in a research paper posted on arXiv.org, the bot delivered a nearly threefold improvement in image quality over the previous state-of-the-art technique for text-to-image generation. At the core of the bot is a technology known as a generative adversarial network, or GAN.
A GAN consists of two machine learning models: one that generates images from text descriptions, and another, known as a discriminator, that uses the text descriptions to judge the authenticity of the generated images.
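To make the two roles concrete, here is a toy sketch of a text-conditioned GAN in plain Python. It assumes tiny hand-set linear models in place of the deep networks used in practice, and every function name, weight, and embedding is hypothetical. It only computes the standard adversarial losses for one real image and one generated image; it does not train anything and is not Microsoft's system.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def generator(noise, text_embedding, weights):
    # Toy generator: each output "pixel" mixes a noise value with a text feature.
    return [w_n * n + w_t * t for n, t, (w_n, w_t) in zip(noise, text_embedding, weights)]


def discriminator(image, text_embedding, weights):
    # Toy discriminator: scores how well the image agrees with the text,
    # then squashes the score into a probability that the pair is real.
    score = sum(w * x * t for w, x, t in zip(weights, image, text_embedding))
    return sigmoid(score)


def discriminator_loss(d_real, d_fake):
    # The discriminator wants real pairs scored near 1 and generated pairs near 0.
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    # Non-saturating generator loss: the generator wants its output scored near 1.
    return -math.log(d_fake)


random.seed(0)
text = [1.0, -1.0, 0.5]          # hypothetical caption embedding
real_image = [0.9, -0.8, 0.6]    # hypothetical real image matching the caption
noise = [random.gauss(0.0, 1.0) for _ in text]

g_weights = [(0.1, 1.0)] * len(text)
d_weights = [1.0] * len(text)

fake_image = generator(noise, text, g_weights)
d_real = discriminator(real_image, text, d_weights)
d_fake = discriminator(fake_image, text, d_weights)

print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:", generator_loss(d_fake))
```

In actual GAN training, the two losses are minimized in alternation: the discriminator is updated to tell real image-caption pairs from generated ones, and the generator is updated to fool it. Because the discriminator sees the text as well as the image, a picture that looks realistic but does not match its caption can still be judged fake.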
The researchers said that text-to-image generation technology could find practical applications as a sort of sketch assistant for painters and interior designers, or as a tool for voice-activated photo refinement.
For now, the technology is imperfect. Still, as the blog post explained, “For AI and humans to live in the same world, they have to have a way to interact with each other. The language and vision are the two most important modalities for humans and machines to interact with each other.”