Robust Visual SLAM for UAV Navigation in GPS-Denied and Degraded Environments: A Multi-Paradigm Evaluation and Deployment Study
Prasoon Kumar, Akshay Deepak, Sandeep Kumar
Abstract
Reliable localization in GPS-denied, visually degraded environments is critical for autonomous UAV operations. This paper presents a systematic comparative evaluation of five V-SLAM systems (ORB-SLAM3, DPVO, DROID-SLAM, DUSt3R, and MASt3R) spanning classical, deep learning, recurrent, and Vision Transformer (ViT) paradigms. Experiments are conducted on curated sequences from four public benchmarks (TUM RGB-D, EuRoC MAV, UMA-VI, SubT-MRS) and a custom monocular indoor dataset under five controlled degradation conditions (normal, low light, dust haze, motion blur, and combined), with sub-millimeter Vicon ground truth. Results show that ORB-SLAM3 fails critically under severe degradation, with a 62.4% overall tracking success rate (TSR) and 0% under dense haze, while learning-based methods remain robust: MASt3R achieves the lowest absolute trajectory error (ATE) under degradation (0.027 m) and DUSt3R the highest tracking success (96.5%). DPVO offers the best efficiency-robustness trade-off (18.6 FPS, 3.1 GB GPU memory, 86.1% TSR), making it the preferred choice for memory-constrained embedded platforms. Embedded deployment analysis across NVIDIA Jetson platforms provides actionable guidelines for SLAM selection under SWaP-constrained UAV scenarios.
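The abstract reports accuracy as ATE and robustness as TSR without defining either in this excerpt. A common way to compute ATE for monocular systems, whose metric scale is unobservable, is to align the estimated trajectory to ground truth with a least-squares similarity transform (Umeyama's method) and take the RMSE of the residual positions. The sketch below assumes time-associated (N, 3) position arrays and, as an assumption, defines TSR as the percentage of frames with a valid pose estimate; the function names are hypothetical and this is not the authors' evaluation code.

```python
import numpy as np


def umeyama_alignment(est, gt, with_scale=True):
    """Least-squares transform (s, R, t) with gt ~= s * R @ est + t (Umeyama, 1991).

    est, gt: (N, 3) arrays of time-associated positions.
    """
    mu_est, mu_gt = est.mean(axis=0), gt.mean(axis=0)
    x, y = est - mu_est, gt - mu_gt
    n = est.shape[0]
    cov = y.T @ x / n                  # 3x3 cross-covariance of gt and est
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                 # enforce a proper rotation (no reflection)
    R = U @ S @ Vt
    var_est = np.sum(x ** 2) / n
    s = np.trace(np.diag(D) @ S) / var_est if with_scale else 1.0
    t = mu_gt - s * R @ mu_est
    return s, R, t


def ate_rmse(est, gt, with_scale=True):
    """RMSE of positions after alignment, in the units of gt (e.g. metres)."""
    s, R, t = umeyama_alignment(est, gt, with_scale)
    aligned = s * est @ R.T + t        # apply s * R @ p + t to every row p
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))


def tracking_success_rate(tracked):
    """Assumed TSR definition: percentage of frames with a valid pose."""
    return 100.0 * float(np.mean(np.asarray(tracked, dtype=bool)))
```

Passing `with_scale=False` gives the rigid (SE(3)) alignment usually used for stereo or RGB-D runs, where metric scale is observable.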
AI Impact Assessments
Scientific Impact Assessment
1. Core Contribution
This paper presents a systematic comparative evaluation of five Visual SLAM systems—ORB-SLAM3 (classical), DPVO (deep patch CNN), DROID-SLAM (recurrent differentiable), DUSt3R (ViT), and MASt3R (ViT with learned descriptors)—under controlled visual degradation conditions relevant to UAV navigation in GPS-denied environments. The study spans four public benchmarks and a custom indoor dataset with five degradation categories (normal, low light, dust haze, motion blur, combined), accompanied by embedded deployment profiling on NVIDIA Jetson platforms.
The primary value proposition is a decision-support framework for practitioners selecting SLAM algorithms under SWaP (Size, Weight, and Power) constraints, rather than a novel algorithmic contribution. The paper fills a gap by providing a cross-paradigm comparison under degradation conditions that existing benchmarks study individually but rarely combine.
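The excerpt does not say whether the degraded conditions were captured physically or synthesized from clean frames, a point the assessment later flags as ambiguous. If they were synthesized, one common recipe is sketched below, purely as an assumption: gain, gamma and Gaussian noise for low light; the atmospheric scattering model I = J*t + A*(1 - t) with transmission t = exp(-beta*d) for dust haze; a horizontal box kernel for motion blur; and a composition of all three for the combined condition. All parameter values and the composition order are hypothetical, not taken from the paper. Images are floats in [0, 1], and haze needs a per-pixel depth map.

```python
import numpy as np


def low_light(img, gain=0.3, gamma=2.2, noise_std=0.02, rng=None):
    """Darken with a gain and gamma curve, then add Gaussian sensor noise."""
    rng = np.random.default_rng() if rng is None else rng
    dark = gain * np.power(img, gamma)
    return np.clip(dark + rng.normal(0.0, noise_std, img.shape), 0.0, 1.0)


def dust_haze(img, depth, beta=1.5, airlight=0.8):
    """Scattering model I = J*t + A*(1 - t) with transmission t = exp(-beta*depth)."""
    t = np.exp(-beta * depth)
    if img.ndim == 3:
        t = t[..., None]               # broadcast transmission over colour channels
    return img * t + airlight * (1.0 - t)


def motion_blur(img, length=9):
    """Horizontal linear blur: mean of `length` shifted copies, edges replicated."""
    pad = length // 2
    widths = [(0, 0), (pad, length - 1 - pad)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, widths, mode="edge")
    w = img.shape[1]
    return np.mean([padded[:, k:k + w] for k in range(length)], axis=0)


def combined(img, depth, rng=None):
    """Hypothetical 'combined' condition: haze, then blur, then low light."""
    return low_light(motion_blur(dust_haze(img, depth)), rng=rng)
```

Modelling haze physically (through depth-dependent transmission) rather than as a uniform grey overlay matters for SLAM evaluation, because distant features lose contrast first, which is where long-baseline tracking usually gets its constraints.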
2. Methodological Rigor
Strengths in experimental design:
Significant concerns:
3. Potential Impact
Practical utility: The deployment recommendations stratified by platform tier (no GPU → Jetson Xavier NX → Jetson AGX Orin → high-end) are genuinely useful for UAV system integrators. The finding that DPVO offers the best efficiency-robustness trade-off for memory-constrained platforms is actionable.
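The platform-tier recommendations can be read as a constrained selection: among systems that fit a platform's GPU memory and meet a frame-rate floor, pick the most robust one. The sketch below encodes that reading as an illustration only; it is not the paper's published decision procedure. The DPVO figures are those quoted in the abstract (the excerpt does not say on which platform the 18.6 FPS was measured), and `HYPOTHETICAL_VIT` is a placeholder with made-up values standing in for a heavier, more robust system.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SlamProfile:
    name: str
    gpu_mem_gb: float  # peak GPU memory in GB (0 for CPU-only systems)
    fps: float         # throughput on the target platform
    tsr: float         # tracking success rate, percent


def select_system(profiles: Sequence[SlamProfile], mem_budget_gb: float,
                  min_fps: float) -> Optional[SlamProfile]:
    """Most robust (highest-TSR) profile that fits the memory budget and meets
    the frame-rate floor; None if no profile qualifies."""
    feasible = [p for p in profiles
                if p.gpu_mem_gb <= mem_budget_gb and p.fps >= min_fps]
    return max(feasible, key=lambda p: p.tsr, default=None)


if __name__ == "__main__":
    dpvo = SlamProfile("DPVO", gpu_mem_gb=3.1, fps=18.6, tsr=86.1)  # abstract figures
    # Placeholder values, NOT results from the paper.
    heavy = SlamProfile("HYPOTHETICAL_VIT", gpu_mem_gb=12.0, fps=2.0, tsr=96.0)
    for budget in (4.0, 16.0):
        choice = select_system([dpvo, heavy], mem_budget_gb=budget, min_fps=1.0)
        print(budget, choice.name if choice else None)
```

A no-GPU tier corresponds to `mem_budget_gb=0`, which admits only CPU-based entries such as ORB-SLAM3.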
Academic impact: The cross-paradigm comparison is informative but primarily confirmatory—it is expected that learning-based methods outperform classical feature extraction under degradation, and that ViT-based methods with global attention are more robust than local patch methods. The architectural insights (global attention vs. local features, learned geometric priors) are well-articulated but not surprising.
Benchmark contribution: The custom dataset could be valuable if released, but the paper does not commit to releasing it; it links only to existing public repositories. Without the dataset, reproducibility is limited.
4. Timeliness & Relevance
The paper addresses a timely and practically important problem. GPS-denied navigation is a genuine capability gap, and the emergence of ViT-based reconstruction methods (DUSt3R/MASt3R appearing in 2024) makes a comparative evaluation relevant. The defence framing, while sometimes overstated, highlights real deployment scenarios.
However, the paper arrives in a rapidly evolving landscape. Gaussian splatting-based SLAM, foundation model-based approaches, and newer ViT variants are emerging quickly. The evaluation may become dated relatively fast.
5. Strengths & Limitations
Key Strengths:
Notable Weaknesses:
Reproducibility: While official repositories for all five systems are linked, the evaluation scripts, custom dataset, configuration files, and adaptation code (especially for DUSt3R/MASt3R as SLAM) are not provided, limiting reproducibility.
Summary
This is a competent benchmarking study that provides useful practical guidance for SLAM system selection under degraded conditions. Its primary value lies in the breadth of comparison and the deployment-oriented analysis. However, the lack of algorithmic novelty, inconsistencies in reported numbers, ambiguous dataset methodology, and insufficient documentation of how non-SLAM systems (DUSt3R/MASt3R) were integrated into the evaluation pipeline limit its scientific contribution. The findings, while useful, are largely confirmatory of expected trends.
Generated May 6, 2026
Comparison History (34)
Paper 2 is more novel and broadly impactful: it introduces a new task-oriented handover framework combining LLM-based part/affordance reasoning with transferable 3D affordances for zero-shot generalization, validated via comparative experiments and a user study, and demonstrated on legged manipulators—directly relevant to human-robot collaboration. Paper 1 is valuable and rigorous but is primarily a comparative evaluation/deployment study of existing SLAM systems; its contribution is more incremental and domain-specific despite strong practical relevance for UAV navigation.
Paper 1 likely has higher scientific impact due to a more novel and broadly enabling contribution: a single learned whole-body controller that generalizes zero-shot across diverse humanoid morphologies, validated on many simulated models and multiple real robots. This advances a core robotics bottleneck (cross-embodiment control) with clear downstream applications and potential to influence learning, control, and robot design. Paper 2 is timely and practically valuable, but is primarily a comparative evaluation/deployment study of existing SLAM systems, with less methodological novelty and narrower scientific leap.
Paper 2 offers broader impact through a comprehensive multi-paradigm evaluation of V-SLAM systems across diverse degradation conditions, providing actionable deployment guidelines for UAV platforms. Its systematic benchmarking methodology spanning classical to transformer-based approaches, combined with embedded deployment analysis on real hardware (Jetson platforms), serves a wider research community. Paper 1, while presenting a solid hybrid autoencoder for heightmap generation, addresses a narrower problem with incremental improvements (7-10% accuracy gains). Paper 2's timeliness (ViT-based SLAM evaluation) and practical relevance for GPS-denied navigation give it stronger impact potential.
Paper 1 addresses a critical real-world problem (UAV navigation in GPS-denied environments) with a comprehensive multi-paradigm evaluation across multiple benchmarks and degradation conditions, plus practical embedded deployment analysis. It provides actionable guidelines for practitioners and spans multiple active research communities (robotics, computer vision, deep learning, embedded systems). Paper 2 presents a valid QUBO formulation for multi-agent route planning with NP-hardness proof and quantum computing experiments, but addresses a narrower problem with more incremental contributions—the QUBO formulation is relatively straightforward, and the quantum annealing results show no significant advantage over classical solvers.
Paper 1 introduces a novel methodological framework for heterogeneous multi-robot cooperation, addressing the complex problem of metric scale recovery and traversability analysis from monocular aerial images without LiDAR. This represents a significant conceptual advancement in autonomous deployment. In contrast, Paper 2 is primarily a benchmarking and evaluation study of existing SLAM systems. While highly useful for practical deployment, it lacks the foundational methodological innovation present in Paper 1, making Paper 1 more likely to drive future theoretical and algorithmic research in aerial-ground robotics.
CyboRacket presents a novel integrated perception-to-action framework for humanoid robots performing dynamic racket sports, combining onboard vision, physics-based prediction, and whole-body control on a real humanoid platform. This addresses a fundamental challenge in robotics (tight perception-action coupling under time constraints) with broader implications for humanoid manipulation and real-time decision-making. Paper 1, while practically useful, is primarily a benchmarking/evaluation study of existing SLAM systems rather than introducing a new method, limiting its novelty and broader scientific impact despite its engineering value.
Paper 1 addresses a broader and more impactful problem—UAV navigation in GPS-denied environments—with a comprehensive multi-paradigm evaluation across multiple benchmarks and degradation conditions. It provides actionable deployment guidelines for embedded platforms, making it highly relevant to the growing autonomous systems community. Paper 2, while technically sound, addresses a narrower application (crane container lifting) in simulation only, with more limited generalizability. Paper 1's systematic comparison of classical vs. learning-based SLAM and its practical embedded deployment analysis give it wider relevance and citation potential.
Paper 1 provides a comprehensive, rigorous evaluation of cutting-edge V-SLAM paradigms, including modern Vision Transformers, under extreme environmental degradations. By benchmarking these advanced models against classical methods and providing actionable deployment guidelines for SWaP-constrained hardware, it addresses a critical bottleneck in autonomous robotics. While Paper 2 offers a highly valuable toolchain for MAV safety compliance, Paper 1 demonstrates broader scientific significance by advancing the fundamental understanding of how modern AI paradigms perform in real-world, visually degraded navigation tasks, likely driving future research in robust robotic perception.
Paper 1 introduces a novel methodological advancement in embodied AI by integrating tactile sensing into world models, significantly improving physical reasoning (e.g., object permanence) and zero-shot robot planning. In contrast, Paper 2 is primarily a benchmarking and evaluation study of existing SLAM algorithms. While Paper 2 offers high practical value for UAV deployment, Paper 1 has a higher potential for foundational scientific impact by addressing critical limitations in how AI systems model and interact with the physical world.
Paper 2 likely has higher scientific impact due to greater novelty (introducing MM-EQA plus a decentralized LLM-based communication framework calibrated with conformal prediction), broader cross-field relevance (robotics, multi-agent systems, LLM communication, uncertainty calibration), and timeliness (LLM-assisted coordination is a fast-moving area). It also contributes a new benchmark and dataset, which can drive follow-on work. Paper 1 is rigorous and highly practical for UAV SLAM deployment, but is primarily an evaluation/deployment study rather than a fundamentally new method, making its impact potentially narrower.
Paper 1 likely has higher impact: it delivers a comprehensive, multi-paradigm empirical evaluation of widely used SLAM systems across multiple benchmarks, controlled degradation modes, and real embedded deployment constraints, producing actionable guidance for GPS-denied UAV navigation. The combination of rigor (diverse datasets, quantified failure modes, Vicon ground truth) and direct real-world applicability (Jetson deployment/SWaP trade-offs) broadens relevance across robotics and autonomy. Paper 2 is intriguing and potentially novel, but appears preliminary (single simple task, early observations) with narrower demonstrated scope and validation.
Paper 1 presents a novel theoretical and methodological contribution—adjoint-based differentiable equilibrium control for multi-stable physical systems—with broad applicability across robotics, physical simulation, and differentiable physics. The memory-efficient gradient computation and MPC integration address fundamental challenges in learning-based control. Paper 2, while practically useful, is primarily a comparative benchmark study of existing SLAM systems under degraded conditions, offering incremental engineering insights rather than new methods. Paper 1's novelty, methodological depth, and potential to influence multiple research areas give it higher scientific impact.
Paper 2 has higher potential impact due to greater novelty and broader relevance: it proposes a systems architecture that addresses a key barrier to deploying mixed-criticality robotics on shared multicore hardware (isolation plus end-user customization), with safety-oriented mechanisms (Safe IO Cell, sync service, IEC 61508-aligned layer). This is timely for consumer/industrial robotics and can generalize across platforms and applications beyond UAVs. Paper 1 is a valuable, rigorous benchmarking/deployment study, but it is largely evaluative rather than introducing a fundamentally new method, limiting transformative impact.
Paper 2 is more scientifically innovative: it introduces an uncommon Udwadia–Kalaba constraint-based modeling framework for quadrotor–cable–payload dynamics, integrates the derived tension consistently into NMPC, and adds a sensorless load-state estimator—advances that can generalize to broader constrained aerial manipulation and control problems. It contributes new methodology validated on real robots, with potential impact across robotics, optimal control, and constrained dynamics. Paper 1 is timely and highly useful as a comparative/deployment study, but is largely evaluative/integrative rather than methodologically novel.
Paper 1 addresses a fundamental problem in diffusion-based planning (mode-averaging in compositional methods) with a novel, training-free guidance approach that has broad applicability across robotics tasks. It introduces a theoretically grounded mechanism (self-reconstruction error as log-density proxy) that advances the state of the art in long-horizon planning. Paper 2, while practically useful, is primarily a comparative evaluation/benchmarking study of existing SLAM systems rather than proposing a novel method, which limits its potential to drive new research directions and reduces its overall scientific impact.
Paper 1 likely has higher impact due to broader relevance (SLAM robustness affects UAVs, robotics, AR/VR, autonomy), timely focus on GPS-denied/degraded perception, and a rigorous multi-paradigm benchmark across multiple public datasets plus controlled degradations with Vicon ground truth and embedded deployment analysis. Its comparative, deployment-oriented findings can directly guide system selection and stimulate follow-up research on robustness. Paper 2 is valuable but more domain-specific (double-Ackermann agricultural maneuvers) and its core novelty (SAC variant + reward design) is narrower and less broadly generalizable.
Paper 2 introduces a novel conceptual contribution—bridging position trajectories and force commands through a force generative model combined with feedback control, with an interesting finding about memory and stability. This addresses a fundamental gap in contact-rich manipulation and has broader implications for robotics foundation models. Paper 1, while thorough and practically useful, is primarily a benchmarking/evaluation study of existing SLAM systems rather than introducing new methods. Paper 2's novelty in combining generative models with control theory and its potential to enable force control in foundation models gives it higher scientific impact potential.
Paper 1 introduces a novel problem formulation (instruction-conditioned sensing parameter recommendation), a new multimodal dataset (Instruct-Obs2Param), and a novel framework (ScanHD) combining vision-language embeddings with hyperdimensional computing—representing genuine methodological innovation. Paper 2, while practically useful, is primarily a comparative benchmark evaluation of existing SLAM systems without introducing new algorithms. Paper 1's novelty in formulating and solving an overlooked problem (adaptive sensor configuration from task intent) has broader potential to influence autonomous inspection, robotic sensing, and multimodal reasoning research.
Paper 2 introduces a novel hardware innovation by integrating soft optical waveguide sensors into Electro-Ribbon Actuators, solving a critical bottleneck for closed-loop control in soft robotics. This fundamental advancement in actuator design offers higher scientific novelty and potential for follow-up research compared to Paper 1, which primarily provides an empirical benchmarking and evaluation of existing SLAM algorithms rather than introducing new underlying methodologies.
Paper 1 is more novel and potentially higher-impact: it introduces a robot-free, portable VR-based data collection pipeline for humanoid whole-body manipulation and links it to learning, retargeting, and whole-body control, addressing a key bottleneck (scalable demonstrations) with broad relevance to humanoid robotics and imitation learning. Paper 2 is rigorous and timely with strong real-world applicability, but is primarily a comparative/deployment study of existing SLAM methods, offering less methodological novelty and likely narrower long-term scientific impact despite high practical value.