And Disney continues to churn out fascinating tech. Disney's research division has now developed a new technique for redubbing, a process used extensively in television and film.
The technique, described in the paper “Visually Consistent Acoustic Redubbing” by Sarah Taylor and Iain Matthews of Disney's lab in Pittsburgh, detects facial expressions and lip movements in a video, i.e. the visual elements of speech (visemes), and maps them to alternative phoneme sequences that stay in sync with the video.
When these phoneme sequences are composed with a pronunciation dictionary and a language model, the result is a vast number of word sequences that are in sync with the original video, literally putting plausible words into the speaker's mouth.
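To make the idea concrete, here is a toy Python sketch of that enumeration step under heavy simplifying assumptions. It is not the paper's method: the paper works with dynamic visemes and a full pronunciation dictionary and language model, whereas the viseme classes, the eight-word dictionary and the word counts below are all hypothetical. Each viseme is treated as compatible with several phonemes, dictionary words are matched against the observed viseme sequence, and a toy unigram model stands in for the language model that ranks the candidates.

```python
import math

# Hypothetical, heavily simplified viseme classes. Each visible mouth shape
# is compatible with several phonemes: this is the ambiguity being exploited.
VISEME_TO_PHONEMES = {
    "bilabial": {"p", "b", "m"},   # lips pressed together
    "open": {"aa", "ae"},          # open-jaw vowels
    "alveolar": {"t", "d", "n"},   # tongue tip behind the teeth
}

# Toy pronunciation dictionary (hypothetical): word -> phoneme sequence.
PRONUNCIATIONS = {
    "pat": ("p", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "mad": ("m", "ae", "d"),
    "man": ("m", "ae", "n"),
    "bad": ("b", "ae", "d"),
    "ban": ("b", "ae", "n"),
    "pa": ("p", "aa"),
    "ma": ("m", "aa"),
}

# Toy unigram counts standing in for a real language model (hypothetical).
WORD_COUNTS = {"man": 50, "bad": 40, "bat": 10, "mad": 8,
               "pat": 5, "ban": 2, "pa": 3, "ma": 6}


def matches(phonemes, visemes):
    """True if each phoneme is compatible with the viseme at the same position."""
    return len(phonemes) == len(visemes) and all(
        p in VISEME_TO_PHONEMES[v] for p, v in zip(phonemes, visemes)
    )


def word_sequences(visemes):
    """Enumerate every word sequence whose pronunciation fits the viseme sequence."""
    if not visemes:
        return [[]]
    results = []
    for word, phonemes in PRONUNCIATIONS.items():
        n = len(phonemes)
        if matches(phonemes, visemes[:n]):
            # Recurse on the remaining visemes after this word.
            for rest in word_sequences(visemes[n:]):
                results.append([word] + rest)
    return results


def rank(sequences):
    """Order candidate sequences by toy unigram log-probability, most likely first."""
    total = sum(WORD_COUNTS.values())

    def log_prob(words):
        return sum(math.log(WORD_COUNTS.get(w, 1) / total) for w in words)

    return sorted(sequences, key=log_prob, reverse=True)


if __name__ == "__main__":
    observed = ["bilabial", "open", "alveolar"]  # e.g. the mouth shapes of "bat"
    for words in rank(word_sequences(observed)):
        print(" ".join(words))
```

Running it prints six one-word candidates, from "man" down to "ban", all indistinguishable under this toy viseme model. Applied to whole sentences, the same principle is why the number of candidates grows so quickly.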
Put simply, by tracking your facial features and lip movements, the technology can generate a list of many other things you could plausibly be saying while you are actually saying something else. So you might originally be delivering a line as Batman, yet the result could be a funny line from a Bhojpuri movie, perfectly synced to your lip movements!
This method of redubbing differs from the traditional approach of visual-only speech recognition, which detects static lip shapes and maps them to a single phonetic sequence, aiming to recover the original speech in a “many-to-one” manner.
“This work highlights the extreme level of ambiguity in visual-only speech recognition,” Taylor said. So whereas a lip reader battles against ambiguity, using context to work out the most likely words spoken, the Disney team exploited that ambiguity to find alternative words.
“Dynamic visemes are a more accurate model of visual speech articulation than conventional visemes and can generate visually plausible phonetic sequences with far greater linguistic diversity,” she added.
Interestingly, the current internet sensation Dubsmash works along similar lines, though in a far more basic way. It is a video messaging app that lets users select an audio clip and shoot a selfie video performing to that clip; the app then creates a dubbed version of the video synced to the selected audio.
Disney's technology goes a lot further: it can generate hundreds of possible speech sequences just by detecting the visual elements of speech, each synchronised with the video. It is a novel idea in automatic speech redubbing, which remains a largely unexplored area of research, but it could prove useful for dubbing movies, television shows and video games more effectively for audiences who speak a language other than that of the original recording.