YouTube's automatic captioning feature will now describe sound effects as well

So [Music], [Applause] and [Laughter] will tell you what that noise is in the background. Sound effects in videos will now be described in captions near the bottom of the screen. This comes from improvements to YouTube’s automatic captioning system, made possible by progress in Google’s machine learning systems. The system has become good at transcribing what people are saying, and it is now going a step further to describe ambient sounds as well.

Announcing the news, YouTube said:

Since 2009, YouTube has provided automatic caption tracks for videos, focusing heavily on speech transcription in order to make the content hosted more accessible. However, without similar descriptions of the ambient sounds in videos, much of the information and impact of a video is not captured by speech transcription alone.

The company also said that music, applause and laughter were the sounds that were most easily recognized, and thus it was starting there. Its systems can also recognize other sounds; however, context proved to be a difficulty. For instance, if something “rang”, the question would be exactly what rang and how to proceed from there. The three sounds Google chose were also among the most frequently labeled.

The model applied here is a DNN, or Deep Neural Network. The company hopes to keep expanding its capabilities until it can offer finer distinctions and produce captions like [Mild Applause], [Raucous Applause] and so on. Google also said that, in the Viterbi algorithm it applied, the predicted segments for each sound effect correspond to the ON state.
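To make the Viterbi step a little more concrete, here is a minimal sketch of how a two-state pass could turn noisy per-frame scores for a single sound into clean caption segments. It is an illustration only, not Google's implementation: the OFF state, the function names `viterbi_on_off` and `on_segments`, the 0.9 "stay" probability and the sample scores are all assumptions made for this example.

```python
import math


def viterbi_on_off(probs, stay=0.9):
    """Return the most likely OFF(0)/ON(1) state per frame.

    probs: per-frame probability that the sound is present (hypothetical
           classifier output). stay: assumed probability of remaining in
           the same state from one frame to the next.
    """
    if not probs:
        return []
    eps = 1e-9
    log_stay = math.log(stay)
    log_switch = math.log(1.0 - stay)

    def emit(p):
        # Log-likelihood of the frame under (OFF, ON), clipped to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        return (math.log(1.0 - p), math.log(p))

    scores = list(emit(probs[0]))  # uniform prior over the two states
    back = []  # back[t][s] = best previous state for state s at frame t + 1
    for p in probs[1:]:
        e = emit(p)
        new_scores, pointers = [], []
        for s in (0, 1):
            cand = [
                scores[prev] + (log_stay if prev == s else log_switch)
                for prev in (0, 1)
            ]
            best_prev = 0 if cand[0] >= cand[1] else 1
            pointers.append(best_prev)
            new_scores.append(cand[best_prev] + e[s])
        back.append(pointers)
        scores = new_scores

    # Trace the best path backwards from the best final state.
    state = 0 if scores[0] >= scores[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def on_segments(path):
    """Collapse a 0/1 path into (start, end) frame ranges where the state is ON."""
    segments, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments


# Made-up scores for one sound, with a brief dip at frame 4.
scores = [0.05, 0.05, 0.95, 0.95, 0.4, 0.95, 0.95, 0.05, 0.05]
print(on_segments(viterbi_on_off(scores)))  # -> [(2, 7)]
```

A plain 0.5 threshold on these scores would split the sound into two short segments around the dip at frame 4. Because switching states is penalized, the smoothed path keeps a single ON segment, which is the span that would carry a caption such as [Applause].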

The company hopes to continue along this track for some time to come. Google also said that it had managed to develop a framework that would enrich the automatic caption track with sound effects, but added that there was still much to be done.

We hope that this will spur further work and discussion in the community around improving captions using not only automatic techniques, but also around ways to make creator-generated and community-contributed caption tracks richer (including perhaps, starting with the auto-captions) and better to further improve the viewing experience for our users.

You can learn more about the topic by going right here.

Author

Mudit Mohilay
