Google proposes new evaluation scheme for AI generated audio and video

Currently there are no widely accepted metrics for measuring the quality of audio or video produced by an AI system. For images, some argue that Fréchet Inception Distance (FID) is the best available measure: it compares the statistics of features that an Inception network extracts from a set of AI-generated images with those extracted from a set of real images.
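For reference, the standard definition comes from the FID literature, not from this article. FID fits one multivariate Gaussian to the Inception embeddings of real images (mean $\mu_r$, covariance $\Sigma_r$) and another to those of generated images ($\mu_g$, $\Sigma_g$). It then reports the squared Fréchet distance between the two Gaussians. Lower values mean the generated set is statistically closer to the real one:

$$
d^2 = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
$$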

Whatever metrics for audio and visual quality are currently making the rounds, none is universally accepted, and each is referenced only within its own specific domain.

To address this, Google today proposed Fréchet Audio Distance (FAD) and Fréchet Video Distance (FVD) for measuring the quality of generated audio and video, respectively. FVD assesses each video as a whole and does not require a reference video. Likewise, FAD is reference-free and can be used on any type of audio. This contrasts with metrics such as source-to-distortion ratio (SDR), which require a time-aligned ground-truth signal.
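As the names suggest, both metrics follow the FID recipe. A set of real samples and a set of generated samples are each embedded with a pretrained network. A Gaussian is fitted to each set, and the Fréchet distance between the two Gaussians is computed. The published versions use VGGish audio embeddings for FAD and I3D video features for FVD. This sketch does not depend on that detail: it assumes the embeddings have already been computed and passes them in as NumPy arrays. The function names are hypothetical. The demo uses random Gaussian data in place of real embeddings, so its numbers say nothing about any actual model.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(real_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """Squared Fréchet distance between Gaussians fitted to two embedding sets.

    Both arrays have shape (num_samples, embedding_dim). In FAD/FVD the rows
    would be embeddings from a pretrained audio or video network; here they
    are plain arrays (hypothetical stand-ins).
    """
    mu_r, mu_g = real_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.cov(real_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)

    # Tr((cov_r @ cov_g)^(1/2)) is the sum of the square roots of the eigenvalues of
    # cov_r^(1/2) @ cov_g @ cov_r^(1/2), a symmetric PSD matrix with the same spectrum.
    sqrt_r = _psd_sqrt(cov_r)
    middle = sqrt_r @ cov_g @ sqrt_r
    trace_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()

    mean_term = float(np.sum((mu_r - mu_g) ** 2))
    return mean_term + float(np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    # Random Gaussian data standing in for real and generated embeddings.
    rng = np.random.default_rng(0)
    real = rng.normal(size=(2000, 8))
    similar = rng.normal(size=(2000, 8))
    shifted = rng.normal(loc=1.0, size=(2000, 8))
    print(frechet_distance(real, similar))  # small: same underlying distribution
    print(frechet_distance(real, shifted))  # larger: means differ by 1 in each of 8 dims
```

Because only means and covariances are compared, no generated clip needs a matching ground-truth clip. That is what makes this family of metrics reference-free. The trace term is computed through a symmetric matrix so the sketch can rely on NumPy's `eigh`/`eigvalsh` rather than a general matrix square root such as `scipy.linalg.sqrtm`.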

Software engineers Kevin Kilgour and Thomas Unterthiner wrote in a blog post:

Access to robust metrics for evaluation of generative models is crucial for measuring (and making) progress in the fields of audio and video understanding, but currently no such metrics exist. Clearly, some [generated] videos shown below look more realistic than others, but can the differences between them be quantified?

To check how closely FAD and FVD track human judgement, the engineers ran a series of tests with human evaluators, who worked with 10,000 video pairs and 69,000 five-second audio clips. Based on the results, the duo said both metrics correlate "quite well" with human judgement.

Kilgour and Unterthiner said:

We are currently making great strides in generative [AI] models. FAD and FVD will help us [keep] this progress measurable and will hopefully lead us to improve our models for audio and video generation.

Author

Aditya Srivastava

Tags

Intel plans to lay off over 500 in Oregon amid financial and AI stress

Windows 11 surpasses Windows 10 to become the most used desktop operating system

Ruoming Pang, Apple’s head of fundational models, has reportedly joined Meta’s AI team

TikTok could launch a US-only version with a different algorithm ahead of possible sale

Intel plans to lay off over 500 in Oregon amid financial and AI stress

Windows 11 surpasses Windows 10 to become the most used desktop operating system

Ruoming Pang, Apple’s head of fundational models, has reportedly joined Meta’s AI team

Google proposes new evaluation scheme for AI generated audio and video

Up next

Author

Aditya Srivastava

Tags

Intel plans to lay off over 500 in Oregon amid financial and AI stress

X says Indian Govt. mandated mass account suspensions including Reuters, Govt. denies

Windows 11 surpasses Windows 10 to become the most used desktop operating system

Ruoming Pang, Apple’s head of fundational models, has reportedly joined Meta’s AI team

TikTok could launch a US-only version with a different algorithm ahead of possible sale