Supervised Monocular Depth Estimation Based on Machine and Deep Learning Models
Abstract: Depth estimation refers to measuring the distance of each pixel in an image relative to the camera. It is crucial for many applications, such as scene understanding and reconstruction, robot vision, and self-driving cars. Depth maps can be estimated from stereo or monocular images. Depth estimation is typically performed through stereo vision, which involves several time-consuming stages such as epipolar geometry computation, rectification, and matching. Predicting depth maps from single RGB images, however, remains challenging: object shapes must be inferred from intensity images strongly affected by viewpoint changes, texture content, and lighting conditions. Additionally, the camera captures only a 2D projection of the 3D world, so the apparent size and position of objects in the image can change significantly with their distance from the camera.
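After the matching stage of the stereo pipeline described above, depth follows from simple triangulation: it is inversely proportional to the disparity, the horizontal pixel shift of a point between the rectified left and right views. A minimal sketch of this relation, where the focal length and baseline values are illustrative assumptions rather than values from the thesis:

```python
def depth_from_disparity(disparity_px: float,
                         focal_length_px: float = 700.0,
                         baseline_m: float = 0.12) -> float:
    """Return depth in meters via the stereo relation Z = f * B / d.

    focal_length_px and baseline_m are hypothetical example values
    for a rectified stereo rig; real values come from calibration.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px

# A nearby object shifts more between the two views than a distant one:
near = depth_from_disparity(42.0)  # large disparity -> small depth (2.0 m)
far = depth_from_disparity(7.0)    # small disparity -> large depth (12.0 m)
print(near, far)
```

This inverse relation is also why monocular estimation is harder: with a single view there is no disparity to triangulate from, and depth must be inferred from pictorial cues alone.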
Stereo cameras have been deployed in many systems to obtain depth map information. Although stereo vision performs well, its main drawbacks are the complex and expensive hardware setup it requires and its time complexity, which limit its use. Monocular cameras, in turn, are simpler and cheaper; however, a single image inherently lacks the depth cues that a stereo pair provides. Many approaches for predicting depth maps from monocular images have recently been proposed, driven by advances in deep learning. However, most of these solutions produce blurry, low-resolution approximations of the depth map. In general, depth estimation requires suitable representation methods to extract the features shared between a single RGB image and its corresponding depth map.
Consequently, this thesis contributes to two research lines in estimating depth maps (also known as depth images). The first line estimates depth based on the objects present in a scene, reducing the complexity of processing the complete scene; to this end, we developed new techniques and concepts based on both traditional machine learning and deep learning methods. The second line estimates depth for a complete scene captured by a monocular camera; here, we developed more comprehensive techniques with high accuracy and acceptable computational cost to produce more precise depth maps.