Robust Visual SLAM for UAV Navigation in GPS-Denied and Degraded Environments: A Multi-Paradigm Evaluation and Deployment Study

Prasoon Kumar, Akshay Deepak, Sandeep Kumar

May 5, 2026

arXiv:2605.03678v1

cs.RO (primary)

#2641 of 3576 · Robotics

Tournament Score: 1321 ± 34

Win Rate: 32%

Rating: 4.5 / 10

Significance 5 · Rigor 4 · Novelty 3.5 · Clarity 5.5

Abstract

Reliable localization in GPS-denied, visually degraded environments is critical for autonomous UAV operations. This paper presents a systematic comparative evaluation of five V-SLAM systems (ORB-SLAM3, DPVO, DROID-SLAM, DUSt3R, and MASt3R) spanning classical, deep learning, recurrent, and Vision Transformer (ViT) paradigms. Experiments are conducted on curated sequences from four public benchmarks (TUM RGB-D, EuRoC MAV, UMA-VI, SubT-MRS) and a custom monocular indoor dataset under five controlled degradation conditions (normal, low light, dust haze, motion blur, and combined), with sub-millimeter Vicon ground truth. Results show that ORB-SLAM3 fails critically under severe degradation (62.4% overall TSR; 0% under dense haze), while learning-based methods remain robust: MASt3R achieves the lowest degraded ATE (0.027 m) and DUSt3R the highest tracking success (96.5%). DPVO offers the best efficiency-robustness trade-off (18.6 FPS, 3.1 GB GPU memory, 86.1% TSR), making it the preferred choice for memory-constrained embedded platforms. Embedded deployment analysis across NVIDIA Jetson platforms provides actionable guidelines for SLAM selection under SWaP-constrained UAV scenarios.

AI Impact Assessments

(1 model)

Scientific Impact Assessment

1. Core Contribution

This paper presents a systematic comparative evaluation of five Visual SLAM systems—ORB-SLAM3 (classical), DPVO (deep patch CNN), DROID-SLAM (recurrent differentiable), DUSt3R (ViT), and MASt3R (ViT with learned descriptors)—under controlled visual degradation conditions relevant to UAV navigation in GPS-denied environments. The study spans four public benchmarks and a custom indoor dataset with five degradation categories (normal, low light, dust haze, motion blur, combined), accompanied by embedded deployment profiling on NVIDIA Jetson platforms.

The primary value proposition is a decision-support framework for practitioners selecting SLAM algorithms under SWaP (Size, Weight, and Power) constraints, rather than a novel algorithmic contribution. The paper fills a gap by providing cross-paradigm comparison under degraded conditions that are individually studied but rarely combined in existing benchmarks.

2. Methodological Rigor

Strengths in experimental design:

Statistical significance testing with paired t-tests and Bonferroni correction across 10 pairwise comparisons is appropriate and well-executed.

The degradation model is formally defined using physically motivated operators (Koschmieder's law for haze, intensity scaling for low light, motion kernels for blur).

Sub-millimeter Vicon ground truth for the custom dataset is gold-standard.

Multiple metrics (ATE, RPE, TSR, FPS, GPU memory) provide a multi-dimensional evaluation.
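The three degradation operators named above can be sketched in a few lines. This is a minimal illustration on grayscale images stored as nested lists of floats in [0, 1], not the paper's implementation: the function names, the box-shaped horizontal blur kernel, and every parameter (`gain`, `beta`, `airlight`, `length`) are assumptions made for the example, since the review does not report the values the authors used.

```python
import math


def low_light(image, gain):
    """Low light as intensity scaling: multiply each pixel by `gain`, clamp to [0, 1]."""
    return [[min(1.0, max(0.0, gain * p)) for p in row] for row in image]


def haze(image, depth, beta, airlight):
    """Haze via Koschmieder's law: I = J * t + A * (1 - t), with t = exp(-beta * d).

    `depth` holds a per-pixel scene distance d, `beta` is the scattering
    coefficient, and `airlight` is the ambient light level A (all hypothetical).
    """
    hazy = []
    for image_row, depth_row in zip(image, depth):
        row = []
        for clear, d in zip(image_row, depth_row):
            transmission = math.exp(-beta * d)
            row.append(clear * transmission + airlight * (1.0 - transmission))
        hazy.append(row)
    return hazy


def horizontal_motion_blur(image, length):
    """Motion blur as a 1-D box kernel: average each pixel with the next
    `length - 1` pixels to its right, repeating the last pixel at the border."""
    blurred = []
    for row in image:
        width = len(row)
        blurred.append([
            sum(row[min(width - 1, x + k)] for k in range(length)) / length
            for x in range(width)
        ])
    return blurred


def combined(image, depth, gain, beta, airlight, length):
    """One possible ordering for the 'combined' condition (an assumption)."""
    return horizontal_motion_blur(
        haze(low_light(image, gain), depth, beta, airlight), length
    )
```

Note that a spatially uniform `beta` and a fixed blur kernel are exactly the kind of simplification the concerns below question: real dust density and lighting vary across the frame.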

Significant concerns:

The custom dataset construction methodology is unclear. The paper states sequences are "hand-picked" from public datasets and augmented with synthetic degradation, but it also describes a physical 12 × 8 × 3 m testbed with a hazer and controlled lighting. The relationship between these two data sources is ambiguous: how many sequences come from each? This ambiguity undermines reproducibility.

The degradation conditions are primarily synthetic (intensity scaling, additive haze models), which may not capture the full complexity of real-world degradation (non-uniform lighting, spatially varying dust density, sensor noise characteristics). The claim of "controlled replication" overstates the fidelity.

DUSt3R and MASt3R are not SLAM systems per se—they are pairwise 3D reconstruction methods. The paper describes a "unified benchmark pipeline" with loop closure modules, but it's unclear how these were adapted into a full SLAM pipeline. Were additional components added? This is a critical detail that is insufficiently documented.

The paper claims to evaluate on "curated sequences from four public benchmarks," but the evaluation methodology essentially treats each dataset as representing a single degradation modality (TUM=normal, EuRoC=motion blur, UMA-VI=low light, SubT-MRS=dust). This conflation of dataset with degradation type means confounding factors (environment geometry, camera model, trajectory complexity) are not controlled.

The embedded deployment analysis (Tables VIII-IX) appears to be inference latency measurements rather than full SLAM pipeline benchmarking. Whether loop closure, map management, and other backend components are included is unclear.

3. Potential Impact

Practical utility: The deployment recommendations stratified by platform tier (no GPU → Jetson Xavier NX → Jetson AGX Orin → high-end) are genuinely useful for UAV system integrators. The finding that DPVO offers the best efficiency-robustness trade-off for memory-constrained platforms is actionable.

Academic impact: The cross-paradigm comparison is informative but primarily confirmatory—it is expected that learning-based methods outperform classical feature extraction under degradation, and that ViT-based methods with global attention are more robust than local patch methods. The architectural insights (global attention vs. local features, learned geometric priors) are well-articulated but not surprising.

Benchmark contribution: The custom dataset could be valuable if released, but the paper does not commit to public release of the custom dataset (only linking to existing public repositories). Without the dataset, reproducibility is limited.

4. Timeliness & Relevance

The paper addresses a timely and practically important problem. GPS-denied navigation is a genuine capability gap, and the emergence of ViT-based reconstruction methods (DUSt3R/MASt3R appearing in 2024) makes a comparative evaluation relevant. The defence framing, while sometimes overstated, highlights real deployment scenarios.

However, the paper arrives in a rapidly evolving landscape. Gaussian splatting-based SLAM, foundation model-based approaches, and newer ViT variants are emerging quickly. The evaluation may become dated relatively fast.

5. Strengths & Limitations

Key Strengths:

Comprehensive multi-paradigm comparison spanning four distinct architectural families

Well-structured failure mode analysis (Table X) providing qualitative understanding of when and why each system fails

Power efficiency analysis (FPS/Watt) is rarely reported but critical for UAV deployment

Statistical rigor with appropriate multiple comparison correction
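For readers unfamiliar with the correction mentioned above, the sketch below shows why five systems give 10 pairwise comparisons and how Bonferroni tightens the per-test threshold. It is an illustration only: the family-wise level of 0.05 is an assumption (the review does not state the paper's level), and the helper names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev

SYSTEMS = ["ORB-SLAM3", "DPVO", "DROID-SLAM", "DUSt3R", "MASt3R"]


def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees_of_freedom) for two systems scored on the same sequences.

    t = mean(d) / (stdev(d) / sqrt(n)), where d are the per-sequence differences.
    Raises ZeroDivisionError if all differences are identical (stdev is zero).
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def bonferroni_alpha(alpha, num_comparisons):
    """Per-comparison significance level that keeps the family-wise rate at `alpha`."""
    return alpha / num_comparisons


# Every unordered pair of the five systems: C(5, 2) = 10 comparisons.
pairs = list(combinations(SYSTEMS, 2))

# Assumed family-wise level of 0.05 gives a per-test threshold of 0.005.
corrected_alpha = bonferroni_alpha(0.05, len(pairs))
```

Turning the t statistic into a p-value needs the Student t distribution, which the standard library does not provide; in practice a routine such as `scipy.stats.ttest_rel` would be used, and each resulting p-value compared against `corrected_alpha`.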

Notable Weaknesses:

No novel algorithm, architecture, or dataset contribution—purely evaluative

The "custom dataset" description is contradictory and insufficiently detailed for reproduction

DUSt3R and MASt3R integration into a SLAM pipeline is inadequately described; these are fundamentally different from the other three systems (pairwise reconstruction vs. sequential SLAM)

Some numbers are inconsistent across the paper (e.g., ORB-SLAM3 TSR is reported as 62.4% in one section and 61.2% in another; DPVO TSR as 86.1% and 87.3%)

The paper is dated May 2026 on arXiv but references works through 2025, and some claims about "future work" include validation on datasets that already exist—suggesting the work may be incomplete

Missing comparison with other relevant systems (e.g., DSO, SVO, Kimera) and no comparison with visual-inertial configurations despite their obvious relevance

The defence framing occasionally borders on speculative and distracts from the technical content

Reproducibility: While official repositories for all five systems are linked, the evaluation scripts, custom dataset, configuration files, and adaptation code (especially for DUSt3R/MASt3R as SLAM) are not provided, limiting reproducibility.

Summary

This is a competent benchmarking study that provides useful practical guidance for SLAM system selection under degraded conditions. Its primary value lies in the breadth of comparison and the deployment-oriented analysis. However, the lack of algorithmic novelty, inconsistencies in reported numbers, ambiguous dataset methodology, and insufficient documentation of how non-SLAM systems (DUSt3R/MASt3R) were integrated into the evaluation pipeline limit its scientific contribution. The findings, while useful, are largely confirmatory of expected trends.

Rating: 4.5 / 10

Significance 5 · Rigor 4 · Novelty 3.5 · Clarity 5.5

Generated May 6, 2026

Comparison History (34)

vs. Task-Oriented Robot-Human Handovers on Legged Manipulators

gpt-5.2 · 5/6/2026

Paper 2 is more novel and broadly impactful: it introduces a new task-oriented handover framework combining LLM-based part/affordance reasoning with transferable 3D affordances for zero-shot generalization, validated via comparative experiments and a user study, and demonstrated on legged manipulators—directly relevant to human-robot collaboration. Paper 1 is valuable and rigorous but is primarily a comparative evaluation/deployment study of existing SLAM systems; its contribution is more incremental and domain-specific despite strong practical relevance for UAV navigation.

vs. Scalable and General Whole-Body Control for Cross-Humanoid Locomotion

gpt-5.2 · 5/6/2026

Paper 1 likely has higher scientific impact due to a more novel and broadly enabling contribution: a single learned whole-body controller that generalizes zero-shot across diverse humanoid morphologies, validated on many simulated models and multiple real robots. This advances a core robotics bottleneck (cross-embodiment control) with clear downstream applications and potential to influence learning, control, and robot design. Paper 2 is timely and practically valuable, but is primarily a comparative evaluation/deployment study of existing SLAM systems, with less methodological novelty and narrower scientific leap.

vs. A Hybrid Autoencoder for Robust Heightmap Generation from Fused Lidar and Depth Data for Humanoid Robot Locomotion

claude-opus-4.6 · 5/6/2026

Paper 2 offers broader impact through a comprehensive multi-paradigm evaluation of V-SLAM systems across diverse degradation conditions, providing actionable deployment guidelines for UAV platforms. Its systematic benchmarking methodology spanning classical to transformer-based approaches, combined with embedded deployment analysis on real hardware (Jetson platforms), serves a wider research community. Paper 1, while presenting a solid hybrid autoencoder for heightmap generation, addresses a narrower problem with incremental improvements (7-10% accuracy gains). Paper 2's timeliness (ViT-based SLAM evaluation) and practical relevance for GPS-denied navigation give it stronger impact potential.

vs. Multi-Agent Route Planning as a QUBO Problem

claude-opus-4.6 · 5/6/2026

Paper 1 addresses a critical real-world problem (UAV navigation in GPS-denied environments) with a comprehensive multi-paradigm evaluation across multiple benchmarks and degradation conditions, plus practical embedded deployment analysis. It provides actionable guidelines for practitioners and spans multiple active research communities (robotics, computer vision, deep learning, embedded systems). Paper 2 presents a valid QUBO formulation for multi-agent route planning with NP-hardness proof and quantum computing experiments, but addresses a narrower problem with more incremental contributions—the QUBO formulation is relatively straightforward, and the quantum annealing results show no significant advantage over classical solvers.

vs. Seeing Where to Deploy: Metric RGB-Based Traversability Analysis for Aerial-to-Ground Hidden Space Inspection

gemini-3 · 5/6/2026

Paper 1 introduces a novel methodological framework for heterogeneous multi-robot cooperation, addressing the complex problem of metric scale recovery and traversability analysis from monocular aerial images without LiDAR. This represents a significant conceptual advancement in autonomous deployment. In contrast, Paper 2 is primarily a benchmarking and evaluation study of existing SLAM systems. While highly useful for practical deployment, it lacks the foundational methodological innovation present in Paper 1, making Paper 1 more likely to drive future theoretical and algorithmic research in aerial-ground robotics.

vs. CyboRacket: A Perception-to-Action Framework for Humanoid Racket Sports

claude-opus-4.6 · 5/6/2026

CyboRacket presents a novel integrated perception-to-action framework for humanoid robots performing dynamic racket sports, combining onboard vision, physics-based prediction, and whole-body control on a real humanoid platform. This addresses a fundamental challenge in robotics (tight perception-action coupling under time constraints) with broader implications for humanoid manipulation and real-time decision-making. Paper 1, while practically useful, is primarily a benchmarking/evaluation study of existing SLAM systems rather than introducing a new method, limiting its novelty and broader scientific impact despite its engineering value.

vs. Residual Reinforcement Learning for Waste-Container Lifting Using Large-Scale Cranes with Underactuated Tools

claude-opus-4.6 · 5/6/2026

Paper 1 addresses a broader and more impactful problem—UAV navigation in GPS-denied environments—with a comprehensive multi-paradigm evaluation across multiple benchmarks and degradation conditions. It provides actionable deployment guidelines for embedded platforms, making it highly relevant to the growing autonomous systems community. Paper 2, while technically sound, addresses a narrower application (crane container lifting) in simulation only, with more limited generalizability. Paper 1's systematic comparison of classical vs. learning-based SLAM and its practical embedded deployment analysis give it wider relevance and citation potential.

vs. From Bench to Flight: Translating Drone Impact Tests into Operational Safety Limits

gemini-3 · 5/6/2026

Paper 1 provides a comprehensive, rigorous evaluation of cutting-edge V-SLAM paradigms, including modern Vision Transformers, under extreme environmental degradations. By benchmarking these advanced models against classical methods and providing actionable deployment guidelines for SWaP-constrained hardware, it addresses a critical bottleneck in autonomous robotics. While Paper 2 offers a highly valuable toolchain for MAV safety compliance, Paper 1 demonstrates broader scientific significance by advancing the fundamental understanding of how modern AI paradigms perform in real-world, visually degraded navigation tasks, likely driving future research in robust robotic perception.

vs. Visuo-Tactile World Models

gemini-3 · 5/6/2026

Paper 1 introduces a novel methodological advancement in embodied AI by integrating tactile sensing into world models, significantly improving physical reasoning (e.g., object permanence) and zero-shot robot planning. In contrast, Paper 2 is primarily a benchmarking and evaluation study of existing SLAM algorithms. While Paper 2 offers high practical value for UAV deployment, Paper 1 has a higher potential for foundational scientific impact by addressing critical limitations in how AI systems model and interact with the physical world.

vs. CommCP: Efficient Multi-Agent Coordination via LLM-Based Communication with Conformal Prediction

gpt-5.2 · 5/6/2026

Paper 2 likely has higher scientific impact due to greater novelty (introducing MM-EQA plus a decentralized LLM-based communication framework calibrated with conformal prediction), broader cross-field relevance (robotics, multi-agent systems, LLM communication, uncertainty calibration), and timeliness (LLM-assisted coordination is a fast-moving area). It also contributes a new benchmark and dataset, which can drive follow-on work. Paper 1 is rigorous and highly practical for UAV SLAM deployment, but is primarily an evaluation/deployment study rather than a fundamentally new method, making its impact potentially narrower.

vs. On Surprising Effects of Risk-Aware Domain Randomization for Contact-Rich Sampling-based Predictive Control

gpt-5.2 · 5/6/2026

Paper 1 likely has higher impact: it delivers a comprehensive, multi-paradigm empirical evaluation of widely used SLAM systems across multiple benchmarks, controlled degradation modes, and real embedded deployment constraints, producing actionable guidance for GPS-denied UAV navigation. The combination of rigor (diverse datasets, quantified failure modes, Vicon ground truth) and direct real-world applicability (Jetson deployment/SWaP trade-offs) broadens relevance across robotics and autonomy. Paper 2 is intriguing and potentially novel, but appears preliminary (single simple task, early observations) with narrower demonstrated scope and validation.

vs. Neural Control: Adjoint Learning Through Equilibrium Constraints

claude-opus-4.6 · 5/6/2026

Paper 1 presents a novel theoretical and methodological contribution—adjoint-based differentiable equilibrium control for multi-stable physical systems—with broad applicability across robotics, physical simulation, and differentiable physics. The memory-efficient gradient computation and MPC integration address fundamental challenges in learning-based control. Paper 2, while practically useful, is primarily a comparative benchmark study of existing SLAM systems under degraded conditions, offering incremental engineering insights rather than new methods. Paper 1's novelty, methodological depth, and potential to influence multiple research areas give it higher scientific impact.

vs. Jiao: Bridging Isolation and Customization in Mixed Criticality Robotics

gpt-5.2 · 5/6/2026

Paper 2 has higher potential impact due to greater novelty and broader relevance: it proposes a systems architecture that addresses a key barrier to deploying mixed-criticality robotics on shared multicore hardware (isolation plus end-user customization), with safety-oriented mechanisms (Safe IO Cell, sync service, IEC 61508-aligned layer). This is timely for consumer/industrial robotics and can generalize across platforms and applications beyond UAVs. Paper 1 is a valuable, rigorous benchmarking/deployment study, but it is largely evaluative rather than introducing a fundamentally new method, limiting transformative impact.

vs. Sensorless State Estimation and Control for Agile Cable-Suspended Payload Transport by Quadrotors

gpt-5.2 · 5/6/2026

Paper 2 is more scientifically innovative: it introduces an uncommon Udwadia–Kalaba constraint-based modeling framework for quadrotor–cable–payload dynamics, integrates the derived tension consistently into NMPC, and adds a sensorless load-state estimator—advances that can generalize to broader constrained aerial manipulation and control problems. It contributes new methodology validated on real robots, with potential impact across robotics, optimal control, and constrained dynamics. Paper 1 is timely and highly useful as a comparative/deployment study, but is largely evaluative/integrative rather than methodologically novel.

vs. Refining Compositional Diffusion for Reliable Long-Horizon Planning

claude-opus-4.6 · 5/6/2026

Paper 1 addresses a fundamental problem in diffusion-based planning (mode-averaging in compositional methods) with a novel, training-free guidance approach that has broad applicability across robotics tasks. It introduces a theoretically grounded mechanism (self-reconstruction error as log-density proxy) that advances the state of the art in long-horizon planning. Paper 2, while practically useful, is primarily a comparative evaluation/benchmarking study of existing SLAM systems rather than proposing a novel method, which limits its potential to drive new research directions and reduces its overall scientific impact.

vs. ManeuverNet: A Soft Actor-Critic Framework for Precise Maneuvering of Double-Ackermann-Steering Robots with Optimized Reward Functions

gpt-5.2 · 5/6/2026

Paper 1 likely has higher impact due to broader relevance (SLAM robustness affects UAVs, robotics, AR/VR, autonomy), timely focus on GPS-denied/degraded perception, and a rigorous multi-paradigm benchmark across multiple public datasets plus controlled degradations with Vicon ground truth and embedded deployment analysis. Its comparative, deployment-oriented findings can directly guide system selection and stimulate follow-up research on robustness. Paper 2 is valuable but more domain-specific (double-Ackermann agricultural maneuvers) and its core novelty (SAC variant + reward design) is narrower and less broadly generalizable.

vs. Force Generative Imitation Learning: Bridging Position Trajectory and Force Commands through Control Technique

claude-opus-4.6 · 5/6/2026

Paper 2 introduces a novel conceptual contribution—bridging position trajectories and force commands through a force generative model combined with feedback control, with an interesting finding about memory and stability. This addresses a fundamental gap in contact-rich manipulation and has broader implications for robotics foundation models. Paper 1, while thorough and practically useful, is primarily a benchmarking/evaluation study of existing SLAM systems rather than introducing new methods. Paper 2's novelty in combining generative models with control theory and its potential to enable force control in foundation models gives it higher scientific impact potential.

vs. Task-Aware Scanning Parameter Configuration for Robotic Inspection Using Vision Language Embeddings and Hyperdimensional Computing

claude-opus-4.6 · 5/6/2026

Paper 1 introduces a novel problem formulation (instruction-conditioned sensing parameter recommendation), a new multimodal dataset (Instruct-Obs2Param), and a novel framework (ScanHD) combining vision-language embeddings with hyperdimensional computing—representing genuine methodological innovation. Paper 2, while practically useful, is primarily a comparative benchmark evaluation of existing SLAM systems without introducing new algorithms. Paper 1's novelty in formulating and solving an overlooked problem (adaptive sensor configuration from task intent) has broader potential to influence autonomous inspection, robotic sensing, and multimodal reasoning research.

vs. Optically Sensorized Electro-Ribbon Actuator (OS-ERA)

gemini-3 · 5/6/2026

Paper 2 introduces a novel hardware innovation by integrating soft optical waveguide sensors into Electro-Ribbon Actuators, solving a critical bottleneck for closed-loop control in soft robotics. This fundamental advancement in actuator design offers higher scientific novelty and potential for follow-up research compared to Paper 1, which primarily provides an empirical benchmarking and evaluation of existing SLAM algorithms rather than introducing new underlying methodologies.

vs. BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation

gpt-5.2 · 5/6/2026

Paper 1 is more novel and potentially higher-impact: it introduces a robot-free, portable VR-based data collection pipeline for humanoid whole-body manipulation and links it to learning + retargeting + whole-body control, addressing a key bottleneck (scalable demonstrations) with broad relevance to humanoid robotics and imitation learning. Paper 2 is rigorous and timely with strong real-world applicability, but is primarily a comparative/deployment study over existing SLAM methods, offering less methodological novelty and likely narrower long-term scientific novelty despite high practical value.