Further expanding its focus on 360-degree videos, Facebook today announced at its F8 developer conference that it has developed entirely new techniques to improve the viewing experience. The approach draws not only on machine learning but also on physics to predict which part of the scene a viewer is most likely to be watching at any given instant.
Back in early 2016, Facebook debuted Dynamic Streaming, which concentrates the highest number of pixels in the viewer's field of view to improve the performance of the stream. But how can an artificially intelligent program predict where the action will take place in a 360 video? And how can it push a higher number of pixels into that particular region, giving it rendering priority and improving performance even on spotty network connections?
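To make the idea concrete, here is a minimal Python sketch of how viewport-dependent streaming in the spirit of Dynamic Streaming might assign quality tiers to tiles of an equirectangular frame. The tile grid, field of view, and bitrate tiers are illustrative assumptions, not Facebook's published parameters.

```python
import math

# Hypothetical sketch: prioritize bitrate for tiles inside the viewer's
# field of view in a tiled equirectangular 360 video. Grid size, FOV,
# and quality tiers are illustrative assumptions only.

TILE_COLS, TILE_ROWS = 8, 4        # equirectangular frame split into tiles
FOV_DEG = 100.0                    # typical headset horizontal field of view

def tile_center(col, row):
    """Yaw/pitch (degrees) of a tile's center on the sphere."""
    yaw = (col + 0.5) / TILE_COLS * 360.0 - 180.0
    pitch = 90.0 - (row + 0.5) / TILE_ROWS * 180.0
    return yaw, pitch

def angular_distance(yaw1, pitch1, yaw2, pitch2):
    """Great-circle angle (degrees) between two view directions."""
    y1, p1, y2, p2 = map(math.radians, (yaw1, pitch1, yaw2, pitch2))
    cos_d = (math.sin(p1) * math.sin(p2)
             + math.cos(p1) * math.cos(p2) * math.cos(y1 - y2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))

def bitrate_plan(view_yaw, view_pitch):
    """Assign a quality tier per tile: high inside the FOV, low outside."""
    plan = {}
    for row in range(TILE_ROWS):
        for col in range(TILE_COLS):
            yaw, pitch = tile_center(col, row)
            dist = angular_distance(view_yaw, view_pitch, yaw, pitch)
            plan[(col, row)] = "high" if dist <= FOV_DEG / 2 else "low"
    return plan

# Example: viewer looking slightly left of center, level with the horizon.
print(bitrate_plan(view_yaw=-20.0, view_pitch=0.0))
```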
Facebook has today shed light on several of these techniques, including gravitational view prediction, new AI models, and a new encoding approach, which together make the AI-powered system more intuitive. They allow it to predict where a viewer will look, concentrate pixels in that region of the stream, and use the outcome to improve future predictions. This means you should soon be greeted not with hazy scenes that cause dizziness and frustration, but with higher-resolution video, without compromising the quality of the content.
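The gravitational metaphor can be sketched in a few lines: treat the viewport heading as a particle that is pulled toward points of interest, each weighted by a salience score. The weights, damping, and time step below are made-up values for demonstration, not Facebook's actual model.

```python
# Illustrative sketch of "gravitational" view prediction: the viewport's
# yaw is a particle accelerated toward points of interest. All constants
# here are hypothetical, chosen only to show the mechanism.

def predict_view(yaw, yaw_velocity, points_of_interest,
                 damping=0.9, dt=0.1, steps=10):
    """Extrapolate the viewer's yaw a few steps into the future.

    points_of_interest: list of (poi_yaw_degrees, salience_weight).
    """
    for _ in range(steps):
        # Net "gravitational" pull from each point of interest.
        accel = 0.0
        for poi_yaw, weight in points_of_interest:
            delta = (poi_yaw - yaw + 180.0) % 360.0 - 180.0  # shortest arc
            accel += weight * delta
        yaw_velocity = damping * yaw_velocity + accel * dt
        yaw = (yaw + yaw_velocity * dt + 180.0) % 360.0 - 180.0
    return yaw

# Example: viewer at yaw 0, drifting right, with a strong point of
# interest at yaw 60 and a weaker one behind at yaw -150.
print(predict_view(yaw=0.0, yaw_velocity=5.0,
                   points_of_interest=[(60.0, 0.8), (-150.0, 0.2)]))
```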
The company has been collecting data to understand where people actually look when viewing a 360-degree video. Facebook's VR team used this trove of data to build a heatmap tool that indicates how popular each spot is in every frame of a 360 video. The tool uses techniques such as computer vision, data filtering and aggregation, and temporal and 3D spatial interpolation to highlight the areas of greatest interest.
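As a rough illustration of the aggregation step, the hypothetical Python below bins gaze samples from many viewers into a per-frame grid and blends consecutive frames over time. The grid resolution and smoothing factor are assumptions, and the real tool's computer-vision and 3D spatial interpolation stages are omitted here.

```python
import numpy as np

# Hypothetical sketch of aggregating viewer gaze samples into a
# per-frame popularity heatmap on an equirectangular grid.

GRID_W, GRID_H = 64, 32  # longitude x latitude bins on the sphere

def frame_heatmap(gaze_samples):
    """gaze_samples: list of (yaw_deg, pitch_deg) from many viewers."""
    heat = np.zeros((GRID_H, GRID_W))
    for yaw, pitch in gaze_samples:
        col = int((yaw + 180.0) / 360.0 * GRID_W) % GRID_W
        row = min(GRID_H - 1, int((90.0 - pitch) / 180.0 * GRID_H))
        heat[row, col] += 1.0
    total = heat.sum()
    return heat / total if total else heat  # normalize to a popularity map

def smooth_over_time(heatmaps, alpha=0.3):
    """Temporal interpolation: exponentially blend consecutive frames."""
    blended, prev = [], None
    for h in heatmaps:
        prev = h if prev is None else alpha * h + (1 - alpha) * prev
        blended.append(prev)
    return blended

# Example: two frames of gaze data from three viewers each.
maps = [frame_heatmap([(10, 0), (12, -5), (100, 20)]),
        frame_heatmap([(15, 0), (14, -2), (-90, 10)])]
smoothed = smooth_over_time(maps)
```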
This enables the AI system to highlight the points of interest in a video, but which of them is most likely to capture your attention during playback remains unknown. The heatmapping tool also has a couple of drawbacks: because of scalability and accuracy limitations, it does not generate high-quality outputs. As a result, you might begin watching the video from the opening center scene, only to realize that the action was happening at the right edge of the frame.
Thus, Facebook decided to go a step further and bring in deep neural networks to generate a saliency map, which learns to predict which parts of a previously unseen video will be important and worth watching. The company trained deep learning models from scratch on a large set of original 360-degree videos, enabling it to offer a scalable predictive solution even for videos with no viewing statistics. Describing the saliency model, the official blog post added:
We trained our deep learning models from scratch, using original videos and behavioral signals as input to our training set. We also experimented with several model structures and cost functions, and built and compared several training and testing infrastructures throughout. The final model is based on a fully convolutional neural network.
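To give a rough sense of what a fully convolutional saliency network looks like, the PyTorch sketch below maps an equirectangular frame to a per-pixel saliency map. The layer sizes and overall depth are assumptions for illustration only, since the actual architecture has not been published.

```python
import torch
import torch.nn as nn

# Minimal sketch of a fully convolutional saliency network. The
# encoder/decoder shapes are hypothetical, not Facebook's real model.

class SaliencyFCN(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: downsample the equirectangular frame.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: upsample back to a one-channel saliency map.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, frames):
        # frames: (batch, 3, H, W); returns per-pixel saliency logits.
        return self.decoder(self.encoder(frames))

model = SaliencyFCN()
frame = torch.randn(1, 3, 128, 256)        # one equirectangular frame
saliency = torch.sigmoid(model(frame))     # probabilities in [0, 1]
print(saliency.shape)                      # torch.Size([1, 1, 128, 256])
```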
Using these models, Facebook wants to improve our viewing experience by helping us explore interesting visual content wherever it appears. The blog post notes that the models have enabled the company to increase resolution by 39 percent on VR devices. As usual, Facebook is studying how it can improve the delivery of videos to its users, and the data and techniques developed through these experiments should one day come in handy for content creators as well.