Google AI researchers are building an AI model that predicts depth in videos with moving people and cameras by leveraging a dataset of 2,000 Mannequin Challenge YouTube videos. Applications built on such a framework could soon help developers craft compelling augmented-reality experiences and 3-D effects in videos shot with hand-held cameras.
The Mannequin Challenge was a viral YouTube trend from November 2016 in which the people in a video stood still (like mannequins) while one person filmed the scene with a hand-held camera. Researchers say this provides an ideal dataset for precisely gauging depth in videos that feature both moving cameras and people.
“While there is a recent surge in using machine learning for depth prediction, this work is the first to tailor a learning-based approach to the case of simultaneous camera and human motion,” research scientist Tali Dekel and engineer Forrester Cole said in a blog post today.
The researchers report that this method outperforms currently available state-of-the-art tools for depth estimation.
“To the extent that people succeed in staying still during the videos, we can assume the scenes are static and obtain accurate camera poses and depth information by processing them with structure-from-motion (SfM) and multi-view stereo (MVS) algorithms,” the paper reads. “Because the entire scene, including the people, is stationary, we estimate camera poses and depth using SfM and MVS, and use this derived 3D data as supervision for training.”
The researchers built the model by training a neural network that takes an RGB image as input and generates a depth map, inferring the shapes and depths of the human figures along with the rest of the scene.
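As a rough illustration (not the researchers' actual code), training such a network amounts to regressing a per-pixel depth map against the MVS-derived depth used as supervision. A common objective in monocular depth work is a scale-invariant log-depth loss, sketched here in NumPy; the function name and mask convention are hypothetical:

```python
import numpy as np

def scale_invariant_loss(pred_depth, gt_depth, valid_mask):
    """Scale-invariant log-depth loss (in the style of Eigen et al.),
    a common choice for monocular depth regression.

    `valid_mask` marks pixels where MVS supervision is reliable
    (hypothetical convention); predictions that differ from the
    ground truth by a global scale factor incur zero loss.
    """
    # Per-pixel log-depth differences over supervised pixels only.
    d = np.log(pred_depth[valid_mask]) - np.log(gt_depth[valid_mask])
    # Variance of d: penalizes shape errors, ignores a global scale.
    return float((d ** 2).mean() - d.mean() ** 2)
```

Because the loss depends only on the variance of the log-depth differences, a prediction that is a uniformly scaled copy of the ground truth scores a loss of zero, which matters when SfM reconstructions are only recovered up to an unknown scale.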
Last year, AI researchers at the University of California, Berkeley used these same YouTube videos to develop something similar.