For all those who have a hard time captioning photos, Google has come to the rescue. The search giant has announced on its research blog that it is building a Neural Image Caption (NIC) generator that can describe an image in a few words.
Citing the use cases of this innovation, the company says that it will make it easier for users to search for images on Google, help visually impaired people understand image content and provide alternative text for images when Internet connections are slow.
NIC is the brainchild of Google’s research scientists Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan, who have published the full paper on arxiv.org. The program draws on computer vision, which allows machines to see the world, and natural language processing, which tries to make human language meaningful to computers. The latter has also been put to use by Microsoft in its new Skype translate feature.
NIC combines two different kinds of artificial neural networks, which are biologically inspired computer models. One network encodes the image into a compact representation, while the other generates a sentence to describe it.
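The two-network design can be sketched in miniature: one function stands in for the image encoder (a deep convolutional network in the actual system) and another for the sentence generator (an LSTM in the actual system). The tiny vocabulary, random untrained weights, and plain RNN cell below are illustrative assumptions, not the real NIC model:

```python
import numpy as np

# Assumed toy vocabulary; the real system learns over thousands of words.
VOCAB = ["<start>", "two", "pizzas", "on", "a", "stove", "<end>"]

rng = np.random.default_rng(0)

# "Encoder": NIC uses a deep CNN; a single linear map stands in for it here.
W_enc = rng.normal(size=(8, 16))  # raw image features -> compact representation

def encode(image_features):
    return np.tanh(W_enc @ image_features)

# "Decoder": NIC uses an LSTM; this toy uses a plain RNN cell for brevity.
W_h = rng.normal(size=(8, 8))             # recurrent weights
W_x = rng.normal(size=(8, len(VOCAB)))    # input-word weights
W_out = rng.normal(size=(len(VOCAB), 8))  # state -> word scores

def decode(h, max_len=6):
    """Generate words one at a time, seeded by the image representation h."""
    word, caption = "<start>", []
    for _ in range(max_len):
        x = np.eye(len(VOCAB))[VOCAB.index(word)]  # one-hot previous word
        h = np.tanh(W_h @ h + W_x @ x)             # update recurrent state
        word = VOCAB[int(np.argmax(W_out @ h))]    # greedy word choice
        if word == "<end>":
            break
        caption.append(word)
    return caption

h0 = encode(rng.normal(size=16))  # compact representation of a fake image
print(decode(h0))                 # with trained weights this would be a caption
```

With untrained random weights the output is gibberish; training adjusts all four weight matrices so that the decoder, seeded by the encoder's output, reproduces human-written captions.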
In its tests, Google says that NIC was able to come up with rather accurate descriptions of images. The program automatically generated the caption “Two pizzas sitting on top of a stove top oven,” which was exactly what the image showed. Another image shared on the blog was aptly captioned, “A group of people shopping at an outdoor market,” representing the scenario shown in the picture.
Scientists still feel that NIC has a long way to go as the model scored 59 on a particular dataset, where humans score around 69. “As the datasets suited to learning image descriptions grow and mature, so will the performance of end-to-end approaches like NIC. We look forward to continuing developments in systems that can read images and generate good natural-language descriptions,” they said.