On robust cross-view consistency in self-supervised monocular depth estimation

June 17, 2024

Visualization of the photometric loss

Credit: Beijing Zhongke Journal Publishing Co. Ltd.

Understanding the 3D structure of scenes is an essential topic in machine perception and plays a crucial part in autonomous driving and robot vision. Traditionally, this task has been accomplished by structure from motion or from multi-view and binocular stereo inputs. Since stereo images are more expensive and less convenient to acquire than monocular ones, solutions based on monocular vision have attracted increasing attention from the community. However, monocular depth estimation is generally more challenging than stereo methods because of scale ambiguity and unknown camera motion, and several works have been proposed to narrow the performance gap.

Recently, with the unprecedented success of deep learning in computer vision, convolutional neural networks (CNNs) have achieved promising results in depth estimation. In the supervised paradigm, depth estimation is usually treated as a regression or classification problem, which requires expensive labeled datasets. By contrast, there have also been successful attempts to learn monocular depth estimation and visual odometry prediction together in a self-supervised manner by exploiting cross-view consistency between consecutive frames. In most prior works of this pipeline, two networks predict the depth and the camera pose separately; the two predictions are then jointly used to warp source frames onto the reference ones, converting the depth estimation problem into a photometric error minimization process. The essence of this paradigm is using the cross-view geometric consistency of videos to regularize the joint learning of depth and pose.

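The view-synthesis step described above is easiest to see in code. Below is a minimal PyTorch sketch of the standard inverse-warping pipeline, assuming known camera intrinsics K, a predicted depth map and a predicted relative pose; the function names, tensor shapes and the plain L1 error are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def inverse_warp(src_img, depth, pose, K):
    """Warp a source frame into the reference view given predicted depth
    (B,1,H,W), a 4x4 relative pose (B,4,4) and camera intrinsics K (B,3,3)."""
    B, _, H, W = depth.shape
    device = depth.device
    # Reference-view pixel grid in homogeneous coordinates, shape (B,3,H*W).
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()
    pix = pix.reshape(1, 3, -1).expand(B, -1, -1)
    # Back-project to 3D camera points: X = depth * K^{-1} * pixel.
    cam = torch.linalg.inv(K) @ pix * depth.reshape(B, 1, -1)
    cam = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], 1)
    # Rigidly transform into the source view and project with K.
    proj = K @ (pose @ cam)[:, :3, :]
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    # Normalize coordinates to [-1,1] and bilinearly sample the source frame.
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,
                        2 * uv[:, 1] / (H - 1) - 1], -1).reshape(B, H, W, 2)
    return F.grid_sample(src_img, grid, padding_mode="border",
                         align_corners=True)

def photometric_loss(ref_img, src_img, depth, pose, K):
    # L1 photometric error between the reference frame and the warped source.
    return (ref_img - inverse_warp(src_img, depth, pose, K)).abs().mean()
```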
Previous self-supervised monocular depth estimation (SS-MDE) works have proved the effectiveness of the photometric loss between consecutive frames, but it is quite vulnerable and even problematic in some cases. First, photometric consistency rests on the assumption that pixel intensities projected from the same 3D point are constant across frames, which is easily violated by illumination variance, reflective surfaces and texture-less regions. Second, natural scenes almost always contain dynamic objects, which generate occlusion areas and likewise break photometric consistency. To demonstrate this vulnerability, the researchers conducted a preliminary study on Virtual KITTI, which provides dense ground-truth depth maps and precise poses. A visualization of the photometric loss shows that even when the ground-truth depth and pose are used, the photometric loss map is never zero, owing to factors such as occlusions, illumination variance and dynamic objects. To address this problem, recent work has adopted perceptual losses. In line with this research direction, the researchers set out to propose more robust loss terms that strengthen the self-supervision signal.

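The article does not spell out the exact photometric term used, but a common way to robustify it against mild appearance changes, popularized by Monodepth-style methods, is to blend a structural SSIM term with the L1 error. A minimal sketch, with the conventional 0.85/0.15 weighting as an assumption:

```python
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified per-pixel SSIM over 3x3 windows (average pooling)."""
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return (num / den).clamp(0, 1)

def robust_photometric_loss(ref_img, warped_src, alpha=0.85):
    # Blend structural dissimilarity with absolute intensity error.
    l1 = (ref_img - warped_src).abs().mean(1, keepdim=True)
    dssim = (1 - ssim(ref_img, warped_src)).mean(1, keepdim=True) / 2
    return (alpha * dssim + (1 - alpha) * l1).mean()
```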
Therefore, the work published in Machine Intelligence Research by researchers from the University of Sydney, Tsinghua University and the University of Queensland aims to explore more robust cross-view consistency losses that mitigate the side effects of these challenging cases. The researchers first propose a depth feature alignment (DFA) loss, which learns feature offsets between consecutive frames by reconstructing the reference frames from their adjacent frames via deformable alignment. These feature offsets are then used to align the temporal depth features. In this way, the consistency between adjacent frames is exploited at the level of feature representations, which are more representative and discriminative than raw pixel intensities. A figure in the paper shows that comparing photometric intensities between consecutive frames can be problematic: the intensities in the region surrounding the target pixel are very close, and this ambiguity is likely to cause mismatches.

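The press release does not give the DFA formulation in detail; the sketch below only illustrates the general mechanism it describes, i.e., learning per-location offsets between adjacent frames and using them to deformably align source features to the reference frame, via torchvision's deformable convolution. The layer shapes and the L1 penalty are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableAlign(nn.Module):
    """Align source-frame depth features to the reference frame using
    learned offsets (the general mechanism behind a DFA-style loss)."""
    def __init__(self, channels, k=3):
        super().__init__()
        self.k = k
        # Predict one (dy, dx) pair per kernel location from both frames.
        self.offset_net = nn.Conv2d(2 * channels, 2 * k * k, 3, padding=1)
        # Weights of the deformable convolution used for sampling.
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)

    def forward(self, ref_feat, src_feat):
        offsets = self.offset_net(torch.cat([ref_feat, src_feat], dim=1))
        return deform_conv2d(src_feat, offsets, self.weight,
                             padding=self.k // 2)

def dfa_loss(ref_feat, src_feat, aligner):
    # Penalize the discrepancy between reference features and the
    # deformably aligned source features.
    return (ref_feat - aligner(ref_feat, src_feat)).abs().mean()
```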
In addition, prior work proposed an ICP-based point cloud alignment loss that uses 3D geometry to enforce cross-view consistency, which helps alleviate the ambiguity of 2D pixels. However, rigid 3D point cloud alignment cannot work properly in scenes containing object motion and the resulting occlusion, making it sensitive to local object motion. To make the model more robust to moving objects and the occlusion areas they cause, the researchers propose voxel density as a new 3D representation and define a voxel density alignment (VDA) loss to enforce cross-view consistency. The VDA loss regards the point cloud as an integral spatial distribution: it only enforces the numbers of points inside corresponding voxels of adjacent frames to be consistent, and it does not penalize small spatial perturbations as long as a point stays within the same voxel.

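The counting idea behind the VDA loss can be sketched as follows. The voxel extents here are made up, and the hard bincount shown is non-differentiable, so a trainable implementation would presumably need a soft point-to-voxel assignment; this version only conveys the "numbers of points per voxel" comparison.

```python
import torch

def voxel_density(points, voxel_size=0.5, grid_min=-20.0, grid_max=20.0):
    """Histogram a point cloud (N,3) into a fixed voxel grid and return
    per-voxel point counts, flattened to a 1D tensor."""
    bins = int((grid_max - grid_min) / voxel_size)
    idx = ((points - grid_min) / voxel_size).long().clamp(0, bins - 1)
    flat = (idx[:, 0] * bins + idx[:, 1]) * bins + idx[:, 2]
    return torch.bincount(flat, minlength=bins ** 3).float()

def vda_loss(ref_points, src_points_in_ref, **kw):
    # Compare voxel occupancy of the reference point cloud and the source
    # point cloud transformed into the reference view; small perturbations
    # that keep a point in the same voxel incur no penalty.
    d_ref = voxel_density(ref_points, **kw)
    d_src = voxel_density(src_points_in_ref, **kw)
    return (d_ref - d_src).abs().mean()
```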
These two cross-view consistency losses exploit temporal coherence in the depth feature space and the 3D voxel space for SS-MDE, shifting the prior "point-to-point" alignment paradigm to a "region-to-region" one. The method achieves superior results to the state of the art (SOTA), and ablation experiments demonstrate the effectiveness and robustness of the proposed losses.

The SS-MDE paradigm, which mainly takes advantage of cross-view consistency in monocular videos, has become very popular in the community. In Section 2, the researchers review the categories of cross-view consistency used in previous self-supervised monocular depth estimation works: photometric cross-view consistency, feature-level cross-view consistency and 3D-space cross-view consistency.

Section 3 introduces the methods of the study. The researchers adopt the DFA and VDA losses as additional cross-view consistency terms alongside the widely used photometric and smoothness losses. The DFA loss exploits temporal coherence in the feature space to produce consistent depth estimates: compared with the photometric loss in RGB space, measuring cross-view consistency in the depth feature space is more robust in challenging cases such as illumination variance and texture-less regions, owing to the representation power of deep features. The VDA loss, in turn, exploits robust cross-view 3D geometry consistency by aligning point cloud distributions in the voxel space, and proves more effective at handling moving objects and occlusion regions than the rigid point cloud alignment loss.

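Putting the pieces together, the overall objective combines the four terms named above. A sketch of the combination and of the widely used edge-aware smoothness term follows; the loss weights are placeholders, not values from the paper.

```python
import torch

def edge_aware_smoothness(depth, img):
    """Standard edge-aware smoothness term: depth gradients are
    down-weighted where the image itself has strong gradients."""
    dd_x = (depth[:, :, :, 1:] - depth[:, :, :, :-1]).abs()
    dd_y = (depth[:, :, 1:, :] - depth[:, :, :-1, :]).abs()
    di_x = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean(1, keepdim=True)
    di_y = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean(1, keepdim=True)
    return (dd_x * torch.exp(-di_x)).mean() + (dd_y * torch.exp(-di_y)).mean()

def total_loss(l_photo, l_smooth, l_dfa, l_vda,
               w_smooth=1e-3, w_dfa=0.1, w_vda=0.1):
    # Weighted sum of the four terms; the weights here are illustrative.
    return l_photo + w_smooth * l_smooth + w_dfa * l_dfa + w_vda * l_vda
```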
Section 4 gives a detailed description of the experiments, in seven parts. Part one covers the network implementation: the network is composed of three branches for offset learning, depth estimation and pose estimation, respectively. Part two introduces the evaluation metrics, and part three presents the depth estimation evaluation. Part four reports the ablation study: the researchers first ablate different backbone networks and input resolutions, then conduct the remaining ablations on KITTI with the most lightweight version (R18 LR) to highlight the effectiveness of the two proposed cross-view consistency losses, DFA and VDA. Part five offers experimental analysis, including the effectiveness of the VDA loss in handling moving objects, its hyperparameters, the depth feature alignment offsets and their visualization, and a comparison with alignment based on optical flow. Part six evaluates generalization ability, and part seven analyses model complexity, which is consistent with that of the baseline method.

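For context, depth estimation in this line of work is conventionally evaluated with the Eigen-protocol metrics (absolute relative error, RMSE and threshold accuracies), applied after median scaling to resolve the monocular scale ambiguity; the article does not list them explicitly, so the sketch below reflects common practice rather than the paper's exact protocol.

```python
import torch

def depth_metrics(pred, gt):
    """Common SS-MDE evaluation metrics: absolute relative error, RMSE
    and the delta < 1.25 accuracy, after median scaling."""
    pred = pred * (gt.median() / pred.median())  # resolve scale ambiguity
    abs_rel = ((pred - gt).abs() / gt).mean()
    rmse = ((pred - gt) ** 2).mean().sqrt()
    ratio = torch.max(pred / gt, gt / pred)
    a1 = (ratio < 1.25).float().mean()
    return {"abs_rel": abs_rel.item(), "rmse": rmse.item(), "a1": a1.item()}
```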
This study is dedicated to the SS-MDE problem with a focus on robust cross-view consistency. Experimental results on outdoor benchmarks demonstrate that the proposed method outperforms state-of-the-art approaches and generates better depth maps in texture-less regions and moving-object areas. The researchers suggest that future work could improve the voxelization scheme in the VDA loss to enhance generalization ability and extend the method to indoor scenes.

See the article:

On Robust Cross-view Consistency in Self-supervised Monocular Depth Estimation



Journal

Machine Intelligence Research

DOI

10.1007/s11633-023-1474-0

Article Title

On Robust Cross-view Consistency in Self-supervised Monocular Depth Estimation

Article Publication Date

21-Mar-2024
