Google open sources TensorFlow-based image captioning project 'Show and Tell'

Adding to its growing arsenal of machine learning-powered products, Google has today announced and open-sourced yet another of its projects. The latest addition is an image captioning model called ‘Show and Tell’ that learns how to describe the content of images. This means that the AI can interpret an image supplied by the user and describe it with a text caption.

This ‘image-to-text’ project is powered by a deep neural network running on TensorFlow, Google’s second generation machine learning system, which launched about a year ago. It has been developed by research scientists on the company’s Brain Team, who say the system is 93.9 per cent accurate, an improvement over the previous version, which fell short of expectations.

But how does the ‘Show and Tell’ AI predict text captions for the corresponding images?

To make the captions as accurate as possible, the research team trained both the vision and language components on captions written by real people. This approach to naming objects in a frame reduces redundancy and helps the system piece together a complete, descriptive sentence about the image in question. It also works at a more complex level, synthesizing original captions for previously unseen images.

The core strength of the ‘Show and Tell’ project is its ability to bridge logical gaps and connect the objects in an image to their context.

In machine learning terms (see the research paper), the Show and Tell project is an example of an encoder-decoder neural network: a convolutional network encodes the image into a fixed-length vector, which is then decoded into a natural language description. The system has been trained to work as a language model conditioned on the image encoding. The text representation relies on an embedding model, in which each word is also represented by a fixed-length vector that is learned during training.
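To make the encoder-decoder idea concrete, here is a minimal sketch in plain Python. It is not Google's code and does not use TensorFlow: the tiny vocabulary, the random weights and helper names such as `encode_image` and `greedy_caption` are all assumptions made for illustration. It only shows the shape of the approach, where an image becomes a fixed-length vector, every word has its own fixed-length embedding, and a caption is produced one word at a time, conditioned on the image.

```python
import math
import random

# Hypothetical toy vocabulary; a real model learns thousands of words.
VOCAB = ["<start>", "<end>", "a", "dog", "on", "the", "beach"]
OUTPUT_WORDS = VOCAB[1:]  # the decoder never emits "<start>"
EMBED_DIM = 4             # length shared by image and word vectors

rng = random.Random(0)

def random_vector(dim):
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

# One fixed-length vector per word. In a real system these are learned
# during training; here they are random placeholders.
EMBEDDINGS = {word: random_vector(EMBED_DIM) for word in VOCAB}
OUTPUT_WEIGHTS = {word: random_vector(EMBED_DIM) for word in OUTPUT_WORDS}

def encode_image(pixels):
    """Stand-in for the convolutional encoder: folds any number of
    pixel values into a vector of length EMBED_DIM."""
    vec = [0.0] * EMBED_DIM
    for i, value in enumerate(pixels):
        vec[i % EMBED_DIM] += value
    count = max(len(pixels), 1)
    return [v / count for v in vec]

def step(state, word):
    """Simplified recurrent update: mix the previous state with the
    embedding of the word that was just fed in."""
    return [math.tanh(s + e) for s, e in zip(state, EMBEDDINGS[word])]

def next_word_probs(state):
    """Softmax over the output words, given the current state."""
    scores = [
        sum(s * w for s, w in zip(state, OUTPUT_WEIGHTS[word]))
        for word in OUTPUT_WORDS
    ]
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return {word: e / total for word, e in zip(OUTPUT_WORDS, exps)}

def greedy_caption(pixels, max_len=10):
    """Decode a caption by always picking the most probable next word."""
    state = encode_image(pixels)  # the language model starts from the image
    word = "<start>"
    caption = []
    for _ in range(max_len):
        state = step(state, word)
        probs = next_word_probs(state)
        word = max(probs, key=probs.get)
        if word == "<end>":
            break
        caption.append(word)
    return caption

if __name__ == "__main__":
    print(greedy_caption([0.1, 0.5, 0.9, 0.3, 0.7]))
```

Because the weights here are random rather than trained, the words it prints are meaningless; training on human-written captions is what turns this loop into a usable captioner.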


It took the team a good one to two weeks to complete the initial training phase, which was conducted on a single machine with an NVIDIA Tesla K20m GPU. The second training phase may take a couple of additional weeks to reach peak performance, though it can deliver reasonable results along the way. The previous version of the image captioning model took an average of 3 seconds per training step, but today’s open-source project takes it a notch further and performs the same task in roughly a quarter of that time, 0.7 seconds.
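For a rough sense of what those per-step figures mean, the arithmetic can be sketched in a few lines of Python. The two timings come from the paragraph above; the function names and the assumption that training runs around the clock at a constant step time are ours, not Google's, so the step counts are illustrative only.

```python
OLD_SECONDS_PER_STEP = 3.0  # previous version, as quoted above
NEW_SECONDS_PER_STEP = 0.7  # open-sourced TensorFlow version, as quoted above

SECONDS_PER_DAY = 24 * 60 * 60

def speedup(old_seconds, new_seconds):
    """How many times faster the new step time is."""
    return old_seconds / new_seconds

def steps_in(days, seconds_per_step):
    """Training steps that fit in `days`, assuming non-stop training
    at a constant step time (an idealisation)."""
    return round(days * SECONDS_PER_DAY / seconds_per_step)

if __name__ == "__main__":
    ratio = speedup(OLD_SECONDS_PER_STEP, NEW_SECONDS_PER_STEP)
    print(f"speedup: {ratio:.2f}x")
    print(f"fraction of old step time: {1 / ratio:.2f}")
    for days in (7, 14):
        print(days, "days:", steps_in(days, NEW_SECONDS_PER_STEP), "steps")
```

On these numbers the new version is a little over four times faster per step, which is where the "quarter of that time" figure comes from.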

Once combined with Google’s huge catalog of images, this technology could be highly useful for visually impaired users, who could use it to recognize the content of images and interact with that content in ways that were not possible before. Google is not alone here, however: Microsoft (COCO) and Facebook are also working on similar technologies to integrate into their platforms. This could also be the same technology that Google Assistant currently uses to help determine the next smart reply on Allo.

Google has also previously announced a public alpha release of a TensorFlow-based cloud machine learning platform that powers various services offered by the company, including speech recognition in the Google app, search in Google Photos and the Smart Reply feature in Inbox by Gmail. Developers can use this service to build and train custom models for intelligent applications.

The Tech Portal is published by Blue Box Media Private Limited. Our investors have no influence over our reporting.

Staff@The Tech Portal

Our dedicated desk-team at The Tech Portal, bringing you breaking technology and startup coverage from the US and Europe.
