Microsoft is developing a bot that can draw what you want it to by leveraging Artificial Intelligence (AI) technology. It is programmed to pay close attention to individual words when generating images from caption-like text descriptions.
The technology, which the researchers simply call the drawing bot, can generate images of everything from ordinary pastoral scenes, such as grazing livestock, to the absurd, such as a floating double-decker bus. Each image contains details that are absent from the text descriptions, indicating that this AI contains an artificial imagination.
“If you go to Bing and you search for a bird, you get a bird picture. But here, the pictures are created by the computer, pixel by pixel, from scratch. These birds may not exist in the real world; they are just an aspect of our computer’s imagination of birds,” Xiaodong He from Microsoft’s research lab said in a blog post late on Thursday.
According to results on an industry-standard test, reported in a research paper posted on arXiv.org, the bot produced a nearly three-fold boost in image quality compared to the previous state-of-the-art technique for text-to-image generation. The core of this bot is a technology known as a “Generative Adversarial Network” or GAN.
The network consists of two Machine Learning models: one, known as a generator, that generates images from text descriptions and another, known as a discriminator, that uses text descriptions to judge the authenticity of generated images.
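To make the two-model setup concrete, here is a minimal Python sketch of a text-conditioned GAN's forward pass and loss terms. It is an illustration only, not Microsoft's implementation: the hashing-based `embed_caption`, the single-layer `Generator` and `Discriminator`, and all sizes are hypothetical stand-ins, and no training loop is shown.

```python
import math
import random


def embed_caption(caption, dim=8):
    """Hypothetical stand-in for a text encoder: hash words into a fixed-size vector."""
    vec = [0.0] * dim
    for word in caption.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec


class Generator:
    """Maps a caption vector plus random noise to a flat list of 'pixels' in [-1, 1]."""

    def __init__(self, text_dim, noise_dim, image_size, rng):
        self.noise_dim = noise_dim
        self.rng = rng
        in_dim = text_dim + noise_dim
        self.weights = [[rng.gauss(0, 0.1) for _ in range(in_dim)]
                        for _ in range(image_size)]

    def generate(self, text_vec):
        noise = [self.rng.gauss(0, 1) for _ in range(self.noise_dim)]
        x = text_vec + noise  # condition the image on the caption
        return [math.tanh(sum(w * v for w, v in zip(row, x)))
                for row in self.weights]


class Discriminator:
    """Scores how likely an (image, caption) pair is to be real, in (0, 1)."""

    def __init__(self, text_dim, image_size, rng):
        self.weights = [rng.gauss(0, 0.1) for _ in range(image_size + text_dim)]

    def score(self, image, text_vec):
        x = image + text_vec  # the caption is part of the judgment
        z = sum(w * v for w, v in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))


def adversarial_losses(d_real, d_fake, eps=1e-12):
    """Standard GAN objectives: the discriminator wants d_real -> 1 and
    d_fake -> 0, while the generator wants d_fake -> 1."""
    d_loss = -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))
    g_loss = -math.log(d_fake + eps)
    return d_loss, g_loss


rng = random.Random(0)
text_vec = embed_caption("a yellow bird with black wings")
generator = Generator(text_dim=8, noise_dim=4, image_size=16, rng=rng)
discriminator = Discriminator(text_dim=8, image_size=16, rng=rng)

fake_image = generator.generate(text_vec)
real_image = [rng.uniform(-1, 1) for _ in range(16)]  # placeholder for a real photo
d_loss, g_loss = adversarial_losses(
    discriminator.score(real_image, text_vec),
    discriminator.score(fake_image, text_vec),
)
```

In a real system, both models would be deep networks trained in alternation, each updating its weights to lower its own loss. That competition is what pushes the generated images toward matching their captions.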
The researchers said that text-to-image generation technology could find practical applications acting as a sort of sketch assistant to painters and interior designers or as a tool for voice-activated photo refinement.
For now, the technology is imperfect. “For AI and humans to live in the same world, they have to have a way to interact with each other. The language and vision are the two most important modalities for humans and machines to interact with each other,” the blog post explained.